Copyright 1999-2021 Free Software Foundation, Inc.
Contributed by the AriC and Caramba projects, INRIA.

This file is part of the GNU MPFR Library.

The GNU MPFR Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

The GNU MPFR Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MPFR Library; see the file COPYING.LESSER.  If not, see
https://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.

Table of contents:
1. Documentation
2. Compiler/library detection
3. Changes in existing functions
4. New functions to implement
5. Efficiency
6. Miscellaneous
7. Portability


##############################################################################
1. Documentation
##############################################################################

- add a description of the algorithms used and a proof of correctness

- when mp*_srcptr is used in a prototype, add "const" in the prototype
  in the manual, like what is now done in the GMP manual, i.e.
    mpfr_srcptr -> const mpfr_t
    mpz_srcptr  -> const mpz_t
    mpq_srcptr  -> const mpq_t
    mpf_srcptr  -> const mpf_t


##############################################################################
2. Compiler/library detection
##############################################################################

- update ICC detection.
  * Use only __INTEL_COMPILER instead of the obsolete macro __ICC?


##############################################################################
3. Changes in existing functions
##############################################################################

- export mpfr_overflow and mpfr_underflow as public functions

- many functions currently taking into account the precision of the *input*
  variable to set the initial working precision (acosh, asinh, cosh, ...).
  This is nonsense since the "average" working precision should only depend
  on the precision of the *output* variable (and maybe on the *value* of
  the input in case of cancellation).
  -> remove those dependencies from the input precision.

- mpfr_can_round:
   change the meaning of the 2nd argument (err). Currently the error is
   at most 2^(MPFR_EXP(b)-err), i.e. err is the relative shift wrt the
   most significant bit of the approximation. I propose that the error
   is now at most 2^err ulps of the approximation, i.e.
   2^(MPFR_EXP(b)-MPFR_PREC(b)+err).

- mpfr_set_q first tries to convert the numerator and the denominator
  to mpfr_t. But this conversion may fail even if the correctly rounded
  result is representable. New way to implement:
  Function q = a/b.  nq = PREC(q) na = PREC(a) nb = PREC(b)
    If na < nb
       a <- a*2^(nb-na)
    n <- na-nb+ (HIGH(a,nb) >= b)
    if (n >= nq)
       bb <- b*2^(n-nq)
       a  = q*bb+r     --> q has exactly n bits.
    else
       aa <- a*2^(nq-n)
       aa = q*b+r      --> q has exactly n bits.
  If RNDN, takes nq+1 bits. (See also the new division function).

- revisit the conversion functions between a MPFR number and a native
  floating-point value.
  * Consequences if some exception is trapped?
  * Specify under which conditions (current rounding direction and
    precision of the FPU, whether a format has been recognized...),
    correct rounding is guaranteed. Fix the code if need be. Do not
    forget subnormals.
  * Provide mpfr_buildopt_* functions to tell whether the format of a
    native type (float / double / long double) has been recognized and
    which format it is?
  * For functions that return a native floating-point value (mpfr_get_flt,
    mpfr_get_d, mpfr_get_ld, mpfr_get_decimal64), in case of underflow or
    overflow, follow the convention used for the functions in <math.h>?
    See §7.12.1 "Treatment of error conditions" of ISO C11, which provides
    two ways of handling error conditions, depending on math_errhandling:
    errno (to be set to ERANGE here) and floating-point exceptions.
    If floating-point exceptions need to be generated, do not use
    feraiseexcept(), as this function may require the math library (-lm);
    use a floating-point expression instead, such as DBL_MIN * DBL_MIN
    (underflow) or DBL_MAX * DBL_MAX (overflow), which are probably safe
    as used in the GNU libc implementation.
  * For testing the lack of subnormal support:
    see the -mfpu GCC option for ARM and
    https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level


##############################################################################
4. New functions to implement
##############################################################################

- cr_xxx functions from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf
  (section 7.31.8, page 392):
  rootn (there is mpfr_rootn_ui but not mpfr_rootn_si yet)
  rsqrt (differs from mpfr_rec_sqrt in -0: mpfr_rec_sqrt gives +Inf whereas
         rsqrt gives -Inf)
- a function to compute the hash of a floating-point number
  (suggested by Patrick Pelissier)
- implement new functions from the C++17 standard:
  https://en.cppreference.com/w/cpp/numeric/special_functions
  assoc_laguerre, assoc_legendre, comp_ellint_1, comp_ellint_2, comp_ellint_3,
  cyl_bessel_i, cyl_bessel_j, cyl_bessel_k, cyl_neumann, ellint_1, ellint_2,
  ellint_3, hermite, legendre, laguerre, sph_bessel, sph_legendre,
  sph_neumann.
  Already in mpfr4: beta and riemann_zeta.
  See also https://isocpp.org/files/papers/P0226R1.pdf and §29.9.5 in the
  C++17 draft:
  https://github.com/cplusplus/draft/blob/master/source/numerics.tex
- implement mpfr_q_sub, mpfr_z_div, mpfr_q_div?
- implement mpfr_pow_q and variants with two integers (native or mpz)
  instead of a rational? See IEEE P1788.
- implement functions for random distributions, see for example
  https://sympa.inria.fr/sympa/arc/mpfr/2010-01/msg00034.html
  (suggested by Charles Karney <ckarney@Sarnoff.com>, 18 Jan 2010):
   * a Bernoulli distribution with prob p/q (exact)
   * a general discrete distribution (i with prob w[i]/sum(w[i]) (Walker
     algorithm, but make it exact)
   * a uniform distribution in (a,b)
   * exponential distribution (mean lambda) (von Neumann's method?)
   * normal distribution (mean m, s.d. sigma) (ratio method?)
- wanted for Magma [John Cannon <john@maths.usyd.edu.au>, Tue, 19 Apr 2005]:
  HypergeometricU(a,b,s) = 1/gamma(a)*int(exp(-su)*u^(a-1)*(1+u)^(b-a-1),
                                    u=0..infinity)
  JacobiThetaNullK
  PolylogP, PolylogD, PolylogDold: see https://arxiv.org/abs/math/0702243
    and the references herein.
  JBessel(n, x) = BesselJ(n+1/2, x)
  KBessel, KBessel2 [2nd kind]
  JacobiTheta
  (see http://www.ams.org/journals/mcom/0000-000-00/S0025-5718-2017-03245-2/home.html)
  LogIntegral
  ExponentialIntegralEn (formula 5.1.4 of Abramowitz and Stegun)
  DawsonIntegral
  GammaD(x) = Gamma(x+1/2)
- new functions of IEEE 754-2008, and more generally functions of the
  C binding draft TS 18661-4:
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1946.pdf
  Some propositions about rootn: mpfr_rootn_si, mpfr_rootn_sj, mpfr_rootn_z,
  and versions with an unsigned integer: mpfr_rootn_ui (now implemented, as
  similar to mpfr_root) and mpfr_rootn_uj.
- functions defined in the LIA-2 standard
  <https://standards.iso.org/ittf/PubliclyAvailableStandards/c024427_ISO_IEC_10967-2_2001(E).zip>
  + minimum and maximum (5.2.2): max, min, max_seq, min_seq, mmax_seq
    and mmin_seq (mpfr_min and mpfr_max correspond to mmin and mmax);
  + rounding_rest, floor_rest, ceiling_rest (5.2.4);
  + remr (5.2.5): x - round(x/y) y;
  + error functions from 5.2.7 (if useful in MPFR);
  + power1pm1 (5.3.6.7): (1 + x)^y - 1;
  + logbase (5.3.6.12): \log_x(y);
  + logbase1p1p (5.3.6.13): \log_{1+x}(1+y);
  + rad (5.3.9.1): x - round(x / (2 pi)) 2 pi = remr(x, 2 pi);
  + axis_rad (5.3.9.1) if useful in MPFR;
  + cycle (5.3.10.1): rad(2 pi x / u) u / (2 pi) = remr(x, u);
  + axis_cycle (5.3.10.1) if useful in MPFR;
  + cotu, secu, cscu, cossinu, arccotu, arcsecu, arccscu (5.3.10.{5..14}):
  + arcu (5.3.10.15): arctan2(y,x) u / (2 pi);
  + rad_to_cycle, cycle_to_rad, cycle_to_cycle (5.3.11.{1..3}).
- From GSL, missing special functions (if useful in MPFR):
  (cf https://www.gnu.org/software/gsl/manual/gsl-ref.html#Special-Functions)
  + The Airy functions Ai(x) and Bi(x) defined by the integral representations:
   * Ai(x) = (1/\pi) \int_0^\infty \cos((1/3) t^3 + xt) dt
   * Bi(x) = (1/\pi) \int_0^\infty (e^(-(1/3) t^3) + \sin((1/3) t^3 + xt)) dt
   * Derivatives of Airy Functions
  + The Bessel functions for n integer and n fractional:
   * Regular Modified Cylindrical Bessel Functions I_n
   * Irregular Modified Cylindrical Bessel Functions K_n
   * Regular Spherical Bessel Functions j_n: j_0(x) = \sin(x)/x,
     j_1(x)= (\sin(x)/x-\cos(x))/x & j_2(x)= ((3/x^2-1)\sin(x)-3\cos(x)/x)/x
     Note: the "spherical" Bessel functions are solutions of
     x^2 y'' + 2 x y' + [x^2 - n (n+1)] y = 0 and satisfy
     j_n(x) = sqrt(Pi/(2x)) J_{n+1/2}(x). They should not be mixed with the
     classical Bessel Functions, also noted j0, j1, jn, y0, y1, yn in C99
     and mpfr.
     Cf https://en.wikipedia.org/wiki/Bessel_function#Spherical_Bessel_functions
   *Irregular Spherical Bessel Functions y_n: y_0(x) = -\cos(x)/x,
     y_1(x)= -(\cos(x)/x+\sin(x))/x &
     y_2(x)= (-3/x^3+1/x)\cos(x)-(3/x^2)\sin(x)
   * Regular Modified Spherical Bessel Functions i_n:
     i_l(x) = \sqrt{\pi/(2x)} I_{l+1/2}(x)
   * Irregular Modified Spherical Bessel Functions:
     k_l(x) = \sqrt{\pi/(2x)} K_{l+1/2}(x).
  + Clausen Function:
     Cl_2(x) = - \int_0^x dt \log(2 \sin(t/2))
     Cl_2(\theta) = \Im Li_2(\exp(i \theta)) (dilogarithm).
  + Dawson Function: \exp(-x^2) \int_0^x dt \exp(t^2).
  + Debye Functions: D_n(x) = n/x^n \int_0^x dt (t^n/(e^t - 1))
  + Elliptic Integrals:
   * Definition of Legendre Forms:
    F(\phi,k) = \int_0^\phi dt 1/\sqrt((1 - k^2 \sin^2(t)))
    E(\phi,k) = \int_0^\phi dt   \sqrt((1 - k^2 \sin^2(t)))
    P(\phi,k,n) = \int_0^\phi dt 1/((1 + n \sin^2(t))\sqrt(1 - k^2 \sin^2(t)))
   * Complete Legendre forms are denoted by
    K(k) = F(\pi/2, k)
    E(k) = E(\pi/2, k)
   * Definition of Carlson Forms
    RC(x,y) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1)
    RD(x,y,z) = 3/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-3/2)
    RF(x,y,z) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2)
    RJ(x,y,z,p) = 3/2 \int_0^\infty dt
                          (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2) (t+p)^(-1)
  + Elliptic Functions (Jacobi)
  + N-relative exponential:
     exprel_N(x) = N!/x^N (\exp(x) - \sum_{k=0}^{N-1} x^k/k!)
  + exponential integral:
     E_2(x) := \Re \int_1^\infty dt \exp(-xt)/t^2.
     Ei_3(x) = \int_0^x dt \exp(-t^3) for x >= 0.
     Ei(x) := - PV(\int_{-x}^\infty dt \exp(-t)/t)
  + Hyperbolic/Trigonometric Integrals
     Shi(x) = \int_0^x dt \sinh(t)/t
     Chi(x) := Re[ \gamma_E + \log(x) + \int_0^x dt (\cosh[t]-1)/t]
     Si(x) = \int_0^x dt \sin(t)/t
     Ci(x) = -\int_x^\infty dt \cos(t)/t for x > 0
     AtanInt(x) = \int_0^x dt \arctan(t)/t
     [ \gamma_E is the Euler constant ]
  + Fermi-Dirac Function:
     F_j(x)   := (1/r\Gamma(j+1)) \int_0^\infty dt (t^j / (\exp(t-x) + 1))
  + Pochhammer symbol (a)_x := \Gamma(a + x)/\Gamma(a) : see [Smith01] in
          algorithms.bib
    logarithm of the Pochhammer symbol
  + Gegenbauer Functions
  + Laguerre Functions
  + Eta Function: \eta(s) = (1-2^{1-s}) \zeta(s)
    Hurwitz zeta function: \zeta(s,q) = \sum_0^\infty (k+q)^{-s}.
  + Lambert W Functions, W(x) are defined to be solutions of the equation:
     W(x) \exp(W(x)) = x.
    This function has multiple branches for x < 0 (2 funcs W0(x) and Wm1(x))
    From Fredrik Johansson:
    See https://cs.uwaterloo.ca/research/tr/1993/03/W.pdf, in particular
    formulas 5.2 and 5.3 for the error bound: one first computes an
    approximation w, and then evaluates the residual w e^w - x. There is an
    expression for the error in terms of the residual and the derivative W'(t),
    where the derivative can be bounded by piecewise simple functions,
    something like min(1, 1/t) when t >= 0.
    See https://arxiv.org/abs/1705.03266 for rigorous error bounds.
  + Trigamma Function psi'(x).
    and Polygamma Function: psi^{(m)}(x) for m >= 0, x > 0.
- functions from ISO/IEC 24747:2009 (Extensions to the C Library,
  to Support Mathematical Special Functions).
  Standard: https://www.iso.org/standard/38857.html
  Draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1292.pdf
  Rationale: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1244.pdf
  See also: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf
  (similar, for C++).
  Also check whether the functions that are already implemented in MPFR
  match this standard.

- from gnumeric (www.gnome.org/projects/gnumeric/doc/function-reference.html):
  - incomplete beta function, see message from Martin Maechler
    <maechler@stat.math.ethz.ch> on 18 Jan 2016, and Section 6.6 in
    Abramowitz & Stegun
  - betaln
  - degrees
  - radians
  - sqrtpi

- mpfr_inp_raw, mpfr_out_raw (cf mail "Serialization of mpfr_t" from Alexey
  and answer from Granlund on mpfr list, May 2007)
- [maybe useful for SAGE] implement companion frac_* functions to the rint_*
  functions. For example mpfr_frac_floor(x) = x - floor(x). (The current
  mpfr_frac function corresponds to mpfr_rint_trunc.)
- scaled erfc (https://sympa.inria.fr/sympa/arc/mpfr/2009-05/msg00054.html)
- asec, acsc, acot, asech, acsch and acoth (mail from Björn Terelius on mpfr
  list, 18 June 2009)

- function to reduce the precision of a variable, with a ternary value in
  input, i.e. taking care of double rounding. Two possible forms: like
  mpfr_set (i.e. with input and output) or like mpfr_prec_round (i.e. with
  a single variable). mpfr_subnormalize and mpfr_round_nearest_away_end
  could use it.

- UBF functions for +, -, *, fmma, /, sqrt.
  Support UBF in mpfr_check_range or add mpfr_ubf_check_range?
  Make this available in the API, e.g. for MPC.

- mpfr_cmp_uj, mpfr_cmp_sj, mpfr_mul_uj, mpfr_mul_sj. They would be useful
  to test MPFR with _MPFR_EXP_FORMAT=4.

- base conversion with the round-trip property using a minimal precision,
  such as the to_chars functions from the C++ standard:

    The functions [...] ensure that the string representation consists
    of the smallest number of characters such that there is at least
    one digit before the radix point (if present) and parsing the
    representation using the corresponding from_chars function
    recovers value exactly. [Note: This guarantee applies only if
    to_chars and from_chars are executed on the same implementation.
    — end note] If there are several such representations, the
    representation with the smallest difference from the
    floating-point argument value is chosen, resolving any remaining
    ties using rounding according to round_to_nearest.

  Text from: https://www.zsh.org/mla/workers/2019/msg01138.html

- Serialization / Deserialization. Suggested by Frédéric Pétrot:
  https://sympa.inria.fr/sympa/arc/mpfr/2020-02/msg00006.html
  like mpfr_fpif_{import,export}, but with memory instead of file.

  Idea of implementation to reuse most of the code and change very little:

  Instead of passing a FILE *fh, pass a struct ext_data *h, and instead of
  using fread and fwrite, use
    h->read (h, buffer, size)
    h->write (h, buffer, size)
  respectively.

  The struct ext_data structure could contain the following fields:
    * read: pointer to a wrapper function for the read method.
    * write: pointer to a wrapper function for the write method.
    * FILE *fh: to be used for operations with files.
    * unsigned char *arena: to be used for operations with memory.

  The wrapper functions for the read method could be:

  static int
  read_from_file (struct ext_data *h, unsigned char *buffer, size_t size)
  {
    return fread (buffer, size, 1, h->fh) != 1;
  }

  static int
  read_from_memory (struct ext_data *h, unsigned char *buffer, size_t size)
  {
    if (h->arena == NULL)
      return 1;
    memcpy (buffer, h->arena, size);
    h->arena += size;
    return 0;
  }

  So I expect very few changes in the existing code:
    * Write a few wrapper functions.
    * Rename mpfr_fpif_export to mpfr_fpif_export_aux and
      mpfr_fpif_import to mpfr_fpif_import_aux.
    * In the existing functions, replace FILE *fh, and fread/fwrite
      calls as mentioned above.
    * Add new mpfr_fpif_export, mpfr_fpif_import, mpfr_fpif_export_mem,
      mpfr_fpif_import_mem.

- Fused divide-add and fused divide-subtract (mpfr_fda, mpfr_fds).
  Reference: K. Pande, A. Parkhi, S. Jaykar and A. Peshattiwar,
  "Design and Implementation of Floating Point Divide-Add Fused Architecture",
  2015 Fifth International Conference on Communication Systems and Network
  Technologies, 2015, pp. 797-800, doi: 10.1109/CSNT.2015.179.
  However, this article does not provide a specification of special values
  (but the same rules as FMA/FMS could be chosen, based on 1/±0 = ±Inf and
  1/±Inf = ±0).


##############################################################################
5. Efficiency
##############################################################################

- Fredrik Johansson wrote a new preprint about computing the gamma function,
  where he claims a speedup of 10 over MPFR at machine precision, and a
  speedup of 1000 for a first call to gamma at 10000 digits:
  https://hal.inria.fr/hal-03346642
- Fredrik Johansson reports that mpfr_ai is slow for large arguments: an
  asymptotic expansion should be used (once done, remove REDUCE_EMAX from
  tests/tai.c and update the description in mpfr.texi).
- for exp(x), Fredrik Johansson reports a 20% speed improvement starting from
  4000 bits, and up to a 75% memory improvement in his Arb implementation, by
  using recursive instead of iterative binary splitting:
  https://github.com/fredrik-johansson/arb/blob/master/elefun/exp_sum_bs_powtab.c
- improve mpfr_grandom using the algorithm in https://arxiv.org/abs/1303.6257
- implement a mpfr_sqrthigh algorithm based on Mulders' algorithm, with a
  basecase variant
- use mpn_div_q to speed up mpfr_div. However, mpn_div_q, which is new in
  GMP 5, is not documented in the GMP manual, thus we are not sure it
  guarantees to return the same quotient as mpn_tdiv_qr.
  Also mpfr_div uses the remainder computed by mpn_divrem. A workaround would
  be to first try with mpn_div_q, and if we cannot (easily) compute the
  rounding, then use the current code with mpn_divrem.
- improve atanh(x) for small x by using atanh(x) = log1p(2x/(1-x)),
  and log1p should also be improved for small arguments.
- compute exp by using the series for cosh or sinh, which has half the terms
  (see Exercise 4.11 from Modern Computer Arithmetic, version 0.3)
  The same method can be used for log, using the series for atanh, i.e.,
  atanh(x) = 1/2*log((1+x)/(1-x)).
- improve mpfr_gamma (see https://code.google.com/p/fastfunlib/). A possible
  idea is to implement a fast algorithm for the argument reconstruction
  gamma(x+k): instead of performing k products by x+i, we could precompute
  x^2, ..., x^m for m ~ sqrt(k), and perform only sqrt(k) products.
  One could also use the series for 1/gamma(x), see for example
  https://dlmf.nist.gov/5/7/ or formula (36) from
  https://mathworld.wolfram.com/GammaFunction.html
- improve the computation of Bernoulli numbers: instead of computing just one
  B[2n] at a time in mpfr_bernoulli_internal, we could compute several at a
  time, sharing the expensive computation of the 1/p^(2n) series.
- fix regression with mpfr_mpz_root (from Keith Briggs, 5 July 2006), for
   example on 3Ghz P4 with gmp-4.2, x=12.345:
   prec=50000    k=2   k=3   k=10  k=100
   mpz_root      0.036 0.072 0.476 7.628
   mpfr_mpz_root 0.004 0.004 0.036 12.20
   See also mail from Carl Witty on mpfr list, 09 Oct 2007.
- for sparse input (say x=1 with 2 bits), mpfr_exp is not faster than for
        full precision when precision <= MPFR_EXP_THRESHOLD. The reason is
        that argument reduction kills sparsity. Maybe avoid argument reduction
        for sparse input?
- speed up mpfr_atan for large arguments (to speed up mpc_log) see FR #6198
- improve mpfr_sin on values like ~pi (do not compute sin from cos, because
  of the cancellation). For instance, reduce the input modulo pi/2 in
  [-pi/4,pi/4], and define auxiliary functions for which the argument is
  assumed to be already reduced (so that the sin function can avoid
  unnecessary computations by calling the auxiliary cos function instead of
  the full cos function). This will require a native code for sin, for
  example using the reduction sin(3x)=3sin(x)-4sin(x)^3.
  See https://sympa.inria.fr/sympa/arc/mpfr/2007-08/msg00001.html and
  the following messages.
- improve generic.c to work for number of terms <> 2^k
- rewrite mpfr_greater_p... as native code.

- mpf_t uses a scheme where the number of limbs actually present can
  be less than the selected precision, thereby allowing low precision
  values (for instance small integers) to be stored and manipulated in
  an mpf_t efficiently.

  Perhaps mpfr should get something similar, especially if looking to
  replace mpf with mpfr, though it'd be a major change.  Alternately
  perhaps those mpfr routines like mpfr_mul where optimizations are
  possible through stripping low zero bits or limbs could check for
  that (this would be less efficient but easier).

- try the idea of the paper "Reduced Cancellation in the Evaluation of Entire
  Functions and Applications to the Error Function" by W. Gawronski, J. Mueller
  and M. Reinhard, to be published in SIAM Journal on Numerical Analysis: to
  avoid cancellation in say erfc(x) for x large, they compute the Taylor
  expansion of erfc(x)*exp(x^2/2) instead (which has less cancellation),
  and then divide by exp(x^2/2) (which is simpler to compute).

- replace the *_THRESHOLD macros by global (TLS) variables that can be
  changed at run time (via a function, like other variables)? One benefit
  is that users could use a single MPFR binary on several machines (e.g.,
  a library provided by binary packages or shared via NFS) with different
  thresholds. On the default values, this would be a bit less efficient
  than the current code, but this isn't probably noticeable (this should
  be tested). Something like:
    long *mpfr_tune_get(void) to get the current values (the first value
      is the size of the array).
    int mpfr_tune_set(long *array) to set the tune values.
    int mpfr_tune_run(long level) to find the best values (the support
      for this feature is optional, this can also be done with an
      external function).

- better distinguish different processors (for example Opteron and Core 2)
  and use corresponding default tuning parameters (as in GMP). This could be
  done in configure.ac to avoid hacking config.guess, for example define
  MPFR_HAVE_CORE2.
  Note (VL): the effect on cross-compilation (that can be a processor
  with the same architecture, e.g. compilation on a Core 2 for an
  Opteron) is not clear. The choice should be consistent with the
  build target (e.g. -march or -mtune value with gcc).
  Also choose better default values. For instance, the default value of
  MPFR_MUL_THRESHOLD is 40, while the best values that have been found
  are between 11 and 19 for 32 bits and between 4 and 10 for 64 bits!

- during the Many Digits competition, we noticed that (our implantation of)
  Mulders short product was slower than a full product for large sizes.
  This should be precisely analyzed and fixed if needed.

- for various functions, check the timings as a function of the magnitude
  of the input (and the input and/or output precisions?), and use better
  thresholds for asymptotic expansions.

- improve the special case of mpfr_{add,sub} (x, x, y, ...) when |x| > |y|
  to do the addition in-place and have a complexity of O(prec(y)) in most
  cases. The mpfr_{add,sub}_{d,ui} functions should automatically benefit
  from this change.

- in gmp_op.c, for functions with mpz_srcptr, check whether mpz_fits_slong_p
  is really useful in all cases (see TODO in this file).

- optimize code that uses a test based on the fact that x >> s is
  undefined in C for s == width of x but the result is expected to
  be 0. ARM and PowerPC could benefit from such an optimization,
  but not x86. This needs support from the compiler.
  For PowerPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79233

- deal with MPFR_RNDF in mpfr_round_near_x (replaced by MPFR_RNDZ).

- instead of a fixed mparam.h, optionally use function multiversioning
  (FMV), currently only available with the GNU C++ front end:
    https://gcc.gnu.org/wiki/FunctionMultiVersioning
  According to https://lwn.net/Articles/691932/ the dispatch resolution
  is now done by the dynamic loader, so that this should be fast enough
  (the cost would be the reading of a static variable, initialized at
  load time, instead of a constant).
  In particular, binary package distributions would benefit from FMV as
  only one binary is generated for different processor families.

- use intrinsics such as _addcarry_u64 when available
  (needs a configure test).
  References:
    https://software.intel.com/sites/landingpage/IntrinsicsGuide/
    https://stackoverflow.com/q/11228855/3782797
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97387
  (would be useful for add1sp.c and sub1sp.c).


##############################################################################
6. Miscellaneous
##############################################################################

- [suggested by Tobias Burnus <burnus(at)net-b.de> and
   Asher Langton <langton(at)gcc.gnu.org>, Wed, 01 Aug 2007]
  support quiet and signaling NaNs in mpfr:
  * functions to set/test a quiet/signaling NaN: mpfr_set_snan, mpfr_snan_p,
    mpfr_set_qnan, mpfr_qnan_p
  * correctly convert to/from double (if encoding of s/qNaN is fixed in 754R)
  Note: Signaling NaNs are not specified by the ISO C standard and may
  not be supported by the implementation. GCC needs the -fsignaling-nans
  option (but this does not affect the C library, which may or may not
  accept signaling NaNs).

- check the constants mpfr_set_emin (-16382-63) and mpfr_set_emax (16383) in
  get_ld.c and the other constants, and provide a testcase for large and
  small numbers.

- from Kevin Ryde <user42@zip.com.au>:
   Also for pi.c, a pre-calculated compiled-in pi to a few thousand
   digits would be good value I think.  After all, say 10000 bits using
   1250 bytes would still be small compared to the code size!
   Store pi in round to zero mode (to recover other modes).

- add other prototypes for round to nearest-away (mpfr_round_nearest_away
  only deals with the prototypes of say mpfr_sin) or implement it as a native
  rounding mode
- add a new roundind mode: round to odd. If the result is not exactly
        representable, then round to the odd mantissa. This rounding
        has the nice property that for k > 1, if:
        y = round(x, p+k, TO_ODD)
        z = round(y, p, TO_NEAREST_EVEN), then
        z = round(x, p, TO_NEAREST_EVEN)
  so it avoids the double-rounding problem.
  VL: I prefer the (original?) term "sticky rounding", as used in
    J Strother Moore, Tom Lynch, Matt Kaufmann. A Mechanically Checked
    Proof of the Correctness of the Kernel of the AMD5K86 Floating-Point
    Division Algorithm. IEEE Transactions on Computers, 1996.
  and
    http://www.russinoff.com/libman/text/node26.html

- new rounding mode MPFR_RNDE when the result is known to be exact?
  * In normal mode, this would allow MPFR to optimize using
    this information.
  * In debug mode, MPFR would check that the result is exact
    (i.e. that the ternary value is 0).

- add tests of the ternary value for constants

- When doing Extensive Check (--enable-assert=full), since all the
  functions use a similar use of MACROS (ZivLoop, ROUND_P), it should
  be possible to do such a scheme:
    For the first call to ROUND_P when we can round.
    Mark it as such and save the approximated rounding value in
    a temporary variable.
    Then after, if the mark is set, check if:
      - we still can round.
      - The rounded value is the same.
  It should be a complement to tgeneric tests.

- in div.c, try to find a case for which cy != 0 after the line
        cy = mpn_sub_1 (sp + k, sp + k, qsize, cy);
  (which should be added to the tests), e.g. by having {vp, k} = 0, or
  prove that this cannot happen.

- add a configure test for --enable-logging to ignore the option if
  it cannot be supported. Modify the "configure --help" description
  to say "on systems that support it".

- add generic bad cases for functions that don't have an inverse
  function that is implemented (use a single Newton iteration).

- add bad cases for the internal error bound (by using a dichotomy
  between a bad case for the correct rounding and some input value
  with fewer Ziv iterations?).

- add an option to use a 32-bit exponent type (int) on LP64 machines,
  mainly for developers, in order to be able to test the case where the
  extended exponent range is the same as the default exponent range, on
  such platforms.
  Tests can be done with the exp-int branch (added on 2010-12-17, and
  many tests fail at this time).

- test underflow/overflow detection of various functions (in particular
  mpfr_exp) in reduced exponent ranges, including ranges that do not
  contain 0.

- add an internal macro that does the equivalent of the following?
    MPFR_IS_ZERO(x) || MPFR_GET_EXP(x) <= value

- check whether __gmpfr_emin and __gmpfr_emax could be replaced by
  a constant (see README.dev). Also check the use of MPFR_EMIN_MIN
  and MPFR_EMAX_MAX.

- add a test checking that no mpfr.h macros depend on mpfr-impl.h
  (the current tests cannot check that since mpfr-impl.h is always
  included).

- move some macro definitions from acinclude.m4 to the m4 directory
  as suggested by the Automake manual? The reason is that the
  acinclude.m4 file is big and a bit difficult to read.

- use symbol versioning.

- check whether mpz_t caching (pool) is necessary. Timings with -static
  with details about the C / C library implementation should be put
  somewhere as a comment in the source or in the doc. Using -static
  is important because otherwise the cache saves the dynamic call to
  mpz_init and mpz_clear; so, what we're measuring is not clear.
  See thread:
    https://gmplib.org/list-archives/gmp-devel/2015-September/004147.html
  Summary: It will not be integrated in GMP because 1) This yields
  problems with threading (in MPFR, we have TLS variables, but this is
  not the case of GMP). 2) The gain (if confirmed with -static) would
  be due to a poor malloc implementation (timings would depend on the
  platform). 3) Applications would use more RAM.
  Additional notes [VL]: the major differences in the timings given
  by Patrick in 2014-01 under Linux were:
    Before:
    arccos(x)  took 0.054689 ms (32767 eval in 1792 ms)
    arctan(x)  took 0.042116 ms (32767 eval in 1380 ms)
    After:
    arccos(x)  took 0.043580 ms (32767 eval in 1428 ms)
    arctan(x)  took 0.035401 ms (32767 eval in 1160 ms)
  mpfr_acos doesn't use mpz, but calls mpfr_atan, so that the issue comes
  from mpfr_atan, which uses mpz a lot. The problem mainly comes from the
  reallocations in GMP because mpz_init is used instead of mpz_init2 with
  the estimated maximum size. Other places in the code that uses mpz_init
  may be concerned.
  Issues with mpz_t caching:
    * The pool can take much memory, which may no longer be useful.
      For instance:
        mpfr_init2 (x, 10000000);
        mpfr_log_ui (x, 17, MPFR_RNDN);
        /* ... */
        mpfr_clear (x);
        /* followed by code using only small precision */
      while contrary to real caches, they contain no data. This is not
      valuable memory: freeing/allocating a large block of memory is
      much faster than the actual computations, so that mpz_t caching
      has no impact on the performance in such cases. A pool with large
      blocks also potentially destroys the data locality.
    * It assumes that the real GMP functions are __gmpz_init and
      __gmpz_clear, which are not part of the official GMP API, thus
      is based on GMP internals, which may change in the future or
      may be different in forks / compatible libraries / etc. This
      can be solved if MPFR code calls mpfr_mpz_init / mpfr_mpz_clear
      directly, avoiding the #define's.
  Questions that need to be answered:
    * What about the comparisons with other memory allocators?
    * Shouldn't the pool be part of the memory allocator?
      For the default memory allocator (malloc): RFE?
  If it is decided to keep some form of mpz_t caching, a possible solution
  for both issues: define mpfr_mpz_init2 and mpfr_mpz_clear2, which both
  take 2 arguments like mpz_init2, where mpfr_mpz_init2 behaves in a way
  similar to mpz_init2, and mpfr_mpz_clear2 behaves in a way similar to
  mpz_clear but where the size argument is a hint for the pool; if it is
  too large, then the mpz_t should not be pushed back to the pool. The
  size argument of mpfr_mpz_init2 could also be a hint to decide which
  element to pull from the pool.

- in tsum, add testcases for mpfr_sum triggering the bug fixed in r9722,
  that is, with a large error during the computation of the secondary term
  (when the TMD occurs).

- use the keyword "static" in array indices of parameter declarations with
  C99 compilers (6.7.5.3p7) when the pointer is expected not to be null?
  For instance, if mpfr.h is changed to have:
    __MPFR_DECLSPEC void mpfr_dump (const __mpfr_struct [static 1]);
  and one calls
    mpfr_dump (NULL);
  one gets a warning with Clang. This is just an example; this needs to be
  done in a clean way.
  See:
    https://stackoverflow.com/a/3430353/3782797
    https://hamberg.no/erlend/posts/2013-02-18-static-array-indices.html

- change most mpfr_urandomb occurrences to mpfr_urandom in the tests?
  (The one done in r10573 allowed us to find a bug even without
  assertion checking.)

- tzeta has been much slower since r9848 (which increases the precision
  of the input for the low output precisions), at least with the x86
  32-bit ABI. This seems to come from the fact that the working precision
  in the mpfr_zeta implementation depends on the precision of the input.
  Once mpfr_zeta has improved, change the last argument of test_generic
  in tzeta.c back to 5 (as it was before r10667).

- check the small-precision tables in the tests?
  This may require to export some pointer to the tables, but this could
  be done only if some debug macro is defined.

- optionally use malloc() for the caches? See mpfr_mp_memory_cleanup.
  Note: This can be implemented by adding a TLS flag saying whether we
  are under cache generation or not, and by making the MPFR allocation
  functions consider this flag. Moreover, this can only work for mpfr_t
  caching (floating-point constants), not for mpz_t caching (Bernoulli
  constants) because we do not have the control of memory allocation for
  mpz_init.

- use GCC's nonnull attribute (available since GCC 4.0) where applicable.

- avoid the use of MPFR_MANT(x) as an lvalue; use other (more high level)
  internal macros if possible, such as MPFR_TMP_INIT1, MPFR_TMP_INIT and
  MPFR_ALIAS.

- add some option to use pkg-config to get the compiler flags associated
  with GMP ("pkg-config --cflags gmp" / "pkg-config --libs gmp"). It
  could be one of
    --with-pkg-config[=<PATH>]
    --with-pkg-config-path[=<PATH>]
  See also the autoconf macros.
  The non-standard CPATH and C_INCLUDE_PATH environment variables need to
  be unset (only when invoking pkg-config) as a workaround to
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984504


##############################################################################
7. Portability
##############################################################################

- add a web page with results of builds on different architectures

- [Kevin about texp.c long strings]
  For strings longer than c99 guarantees, it might be cleaner to
  introduce a "tests_strdupcat" or something to concatenate literal
  strings into newly allocated memory.  I thought I'd done that in a
  couple of places already.  Arrays of chars are not much fun.

- use https://gcc.gnu.org/viewcvs/gcc/trunk/config/stdint.m4 for mpfr-gmp.h

- By default, GNU Automake adds -I options to local directories, with
  the side effect that these directories have the precedence to search
  for system headers (#include <...>). This may make the build fail if
  a C implementation includes a file that has the same name as one used
  in such a directory.
  For instance, if one adds an empty file "src/bits/types.h", then the
  MPFR build fails under Linux because /usr/include/stdio.h has
    #include <bits/types.h>
  Possible workaround:
    * disable the default -I options with nostdinc as documented in
      the Automake manual;
    * have a rule that copies the needed files ("mpfr.h" or they should
      be prefixed with "mpfr-") to $(top_builddir)/include;
    * use "-I$(top_builddir)/include".

- _Noreturn (from ISO C11) may be replaced by [[noreturn]] in the future:
  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2764.pdf
  Once this is implemented in compilers, consider it as an alternative
  (concerned files: acinclude.m4, doc/README.dev and src/mpfr-impl.h).