summaryrefslogtreecommitdiff
path: root/projects/projects.html
blob: bc48149984406986a29ad7dab9d5f084450cde12 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
  <title>
    GMP Development Projects
  </title>
</head>
<body bgcolor=pink>

<center>
  <h1>
    GMP Development Projects
  </h1>
</center>

<comment>
  An up-to-date version of this file is available at
  <a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
</comment>

<p> This file lists projects suitable for volunteers.  Please see the file
    tasks, or <a href="tasks.html">
    http://www.swox.com/gmp/tasks.html</a> for smaller tasks.

<p> If you want to work on any of the projects below, please let tege@swox.com
    know.  If you want to help with a project that already somebody else is
    working on, please talk to that person too, tege@swox.com can put you in
    touch.  (There are no email addresses of volunteers below, due to spamming
    problems.)

<ul>
<li> <strong>C++ wrapper</strong>

  <p> A C++ wrapper for GMP is highly desirable.  Several people have started
  writing one, but nobody has so far finished it.  For a wrapper to be useful,
  one needs to pay attention to efficiency.

  <ol>
    <li> Write arithmetic functions for direct applications of most
         elementary C++ types.  This might be necessary to avoid
         ambiguities, but it is also desirable from an efficiency
         standpoint.
    <li> Avoid running constructors/destructors when not necessary.
         For example, implement <code>a&nbsp+=&nbsp;b</code> by directly applying mpz_add.
  </ol>

<p> <li> <strong>Faster multiplication</strong>

  <p> The current multiplication code is 3-way Toom-Cook based and
  runs at O(n^1.465) when the operands are both around n digits large.

  Several new developments are desirable:

  <ol>
    <li> Handle multiplication of operands with different digit count better than
	 today.  We now split the operands in a very inefficient way, see
	 mpn/generic/mul.c.

    <li> Consider N-way Toom-Cook.  See Knuth's Seminumerical
	 Algorithms for details on the method.  Code implementing it
	 exists.  This is asyptotically inferior to FFTs, but is finer
	 grained since it can split into any number of pieces, not
	 just powers of 2 (or of 2 and 3 or whatever).

    <li> Implement some variant of FFT-based multiplication.  I believe the most
	 promising variant is the original Schönhage-Strassen mod 2^n+1 algorithm.
	 Paul Zimmermann and Robert Harley are working on this, Paul Zimmerman
	 has working code.

    <li> It's possible CPU dependent effects like cache locality will
         have a greater impact on speed than algorithmic improvements.

  </ol>

<p> <li> <strong>Division and modular multiplication</strong>

  <p> GMP's current division routines are O(n^2), or to be accurate, O(n(m-n))
  where n is the digit count of the divisor, and m is the digit count of the
  dividend.  Modular reduction code rely on division, and is thus also slow.

  <p> Desirable new developments:

  <ol>
  <li> Implement O(n^1.58) division.  There is an algorithm published recently
       that accomplish this.
  <li> Implement division based on first computing an approximation to the
       reciprocal of the divisor, using Newton's algorithm, and then multiplying
       the dividend by the approximate reciprocal.
  <li> Implement modular multiplication based on Peter Montgomery's REDC
       algorithm.  This algorithm is O(n^2) and should therefore be used only for
       operands under a certain limit.  Operands greater than that limit need to
       reply on computing the divisor reciprocal, see above.
  <li> Make <code>mpz_powm</code>, <code>mpz_powm_ui</code>, etc, use modular
       multiplication for the operand ranges where that helps.  Actually, these
       functions should probably be implemented at the mpn level as part of
       this development.
  <li> Tune the aging plain division functions.  (Mostly done by Torbjörn.)
       Perhaps write new functions with better interfaces,
       (<code>mpn_tdiv_q</code>, <code>mpn_tdiv_r</code>,
       <code>mpn_tdiv_qr</code>) replacing <code>mpn_divmod</code> and (the
       slow!) <code>mpn_divrem</code>.
  </ol>

<p> <li> <strong>Assembly routines</strong>

  <p> Write new and tune existing assembly routines.  The functions in
  mpn/tests are useful for testing and timing the routines you write.  See the
  README file in that directory for instructions on how to use the various test
  files.

  <p> Please make sure your new routines are fast for these three situations:
  <ol>
    <li> Operands that fit into the cache.
    <li> Small operands of less than, say, 10 limbs.
    <li> Huge operands that does not fit into the cache.
  </ol>

  <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
  mpn_sqr_basecase.  The latter two don't exist for all machines, while
  mpn_addmul_1 exists for almost all machines.

  <p> Standard techniques for these routines are unrolling, software
  pipelining, and specialization for common operand values.  For machines with
  poor integer multiplication, it is often possible to improve the performance
  using floating-point operations, or SIMD operations such as MMX or Sun's VIS.

  <p> Using floating-point operations is interesting but somewhat tricky.
  Since IEEE double has 53 bit of mantissa, one has to split the operands in
  small prices, so that no result is greater than 2^53.  For 32-bit computers,
  splitting one operand in 16-bit pieces works.  For 64-bit machines, one
  operand can be split in 21-bit pieces and the other in 32-bit pieces.  (A
  64-bit operand can be split in just three 21-bit pieces if one allows the
  split operands to be negative!)

<p> <li> <strong>Configuration</strong>

  <p> We want to recognize the processor very accurately, more accurately than
  other GNU packages.  config.guess does not currently make the distinctions we
  would like it to do and a --target often needs to be set explicitly.

  <p> For example, "sparc" is not very useful as a machine architecture
  denotation.  We want to distinguish old 32-bit SPARC without multiply support
  from newer 32-bit SPARC with such support.  We want to recognize a
  SuperSPARC, since its implementation of the UDIV instruction is not complete,
  and will trap to the OS kernel for certain operands.  And we want to
  recognize 64-bit capable SPARC processors as such.  While the assembly
  routines can use 64-bit operations on all 64-bit SPARC processors, one can
  not use 64-bit limbs under all operating system.  E.g., Solaris 2.5 and 2.6
  doesn't preserve the upper 32 bits of most processor registers.  For SPARC we
  therefore sometimes need to choose GMP configuration depending both on
  processor and operating system.

  <p> Other things autoconf should care about:
  <ol>

   <li> Find out whether there's an alloca available and how to use
        it.  AC_FUNC_ALLOCA has various system dependencies covered,
        but we don't want its alloca.c replacement.  (One thing
        current cpp tests don't cover: HPUX 10 C compiler supports
        alloca, but cannot find any symbol to test in order to know if
        we're on HPUX 10.  Damn.)

   <li> Identify Mips processor under Irix: `hinv -c processor'
   <li> Identify Alpha processor under OSF: "/usr/sbin/sizer -c".
        Unfortunately, sizer is not available before some revision of Dec Unix
        4.0, and it also returns some rather cryptic names for processors.
        Perhaps the <code>implver</code> and <code>amask</code> assembly
        instructions are better, but that doesn't differentiate between ev5 and
        ev56.
   <li> Get lots of information about a Solaris system: prtconf -vp
   <li> For specific target machines, figure out assembly language variant
        (like for x86 <code>shldl</code>).
   <li> For some target machines and some compilers, specific options are needed
        (sparcv8/gcc needs -mv8, sparcv8/cc needs -cg92, Irix64/cc needs -64,
        Irix32/cc might need -n32, etc).  Some are set already, add more, see
	configure.in.
   <li> Options to be passed to the assembler (via the compiler, using whatever
        syntax the compiler uses for passing options to the assembler).
   <li> On Solaris 7, check if gcc supports native v9 64-bit arithmetic.
        If not compile using "cc -fast -xarch=v9".  (Problem: -fast requires
	that we link with -fast too, which might not be very good.  Pass
	"-xO4 -xtarget=native" instead?)
  </ol>

  <p> In general, getting the exact right configuration, passing the exact right
  options to the compiler, etc, might mean that the GMP performance more than
  doubles.

  <p> When testing, make sure to test at least the
  following for all out target machines: (1) Both gcc and cc (and c89).
  (2) Both 32-bit mode and 64-bit mode (such as -n32 vs -64 under
  Irix). (3) Both the system `make' and GNU `make'. (4) With and without
  GNU binutils.

  <p> We'd ultimately like to build multiple variants of the library
  under certain systems.  An example is -n32, -o32, and -64 on Irix.
  
  <p> Improve config.guess.  It should say mips2, mips3, and mips4.
  It should say supersparc, microsparc, ultrasparc1, ultrasparc2.
  Etc.  Likewise for HPPA.  (Alpha and x86 work already.  Ensure
  config.sub accepts the guesses.)

<p> <li> <strong>Faster extended GCD</strong>

  <p> We currently have extended GCD based on Lehmer's method.
  But the binary method can quite easily be improved for bignums
  in a similar way Lehmer improved Euclid's algorithm.  The result should
  beat Lehmer's method by about a factor of 2.  Furthermore, the multiprecision
  step of Lehmer's method and the binary method will be identical, so the
  current code can mostly be reused.  It should be possible to share code
  between GCD and GCDEXT, and probably with Jacobi symbols too.

<p> <li> <strong>Math functions for the mpf layer</strong>

  <p> Implement the functions of math.h for the GMP mpf layer!  Check the paper
  "Pi and the AGM" for ideas how to do this.  These functions are desirable:
  acos, acosh, asin, asinh, atan, atanh, atan2, cos, cosh, exp, log, log10,
  pow, sin, sinh, tan, tanh.

<p> <li> <strong>Faster sqrt</strong>

  <p> The current code for computing square roots use a Newton iteration that
  rely on division.  It is possible to avoid using division by computing
  1/sqrt(A) using this formula:
  <pre>
                                    2
                   x   = x  (3 - A x )/2.
                    i+1   i         i  </pre>
  The final result is then computed without division using,
  <pre>
                     sqrt(A) = A x .
                                  n  </pre>
  The resulting code should be substantially faster than the current code.

<p> <li> <strong>Nth root</strong>

  <p> Implement, at the mpn level, a routine for computing the nth root of a
  number.  The routine should use Newton's method, preferably without using
  division.

  <p> If the routine becomes fast enough, perhaps square roots could be computed
  using this function.

<p> <li> <strong>Better random number generators</strong>

  <p> Write random number routines that accept bit counts instead of limb
  counts.  A state should be passed to each random number generator; no global
  state should be used, since that would be non-reentrant.

  <p> Partially done by Linus Nordberg.  Ask for the current code.

</ul>

<hr>

<table width="100%">
  <tr>
    <td>
      <font size=2>
      Please send comments about this page to
      <a href="mailto:tege@swox.com">tege@swox.com</a>.<br>
      Copyright (C) 1999 Torbjörn Granlund.
      </font>
    </td>
    <td align=right>
    </td>
  </tr>
</table>

</body>
</html>