author | tege <tege@gmplib.org> | 2000-06-30 10:35:05 +0200 |
---|---|---|
committer | tege <tege@gmplib.org> | 2000-06-30 10:35:05 +0200 |
commit | 8d20777e6911a1c027ec9f3e507ef374573c8c2b (patch) | |
tree | 7ae533339df0deb57cfb1281a52854a5b41a0c82 /doc/assembly_code | |
parent | a782c8d5ecfeca95b05c8c24040a2ac24283d531 (diff) | |
*** empty log message ***
Diffstat (limited to 'doc/assembly_code')
-rw-r--r-- | doc/assembly_code | 57 |
1 files changed, 57 insertions, 0 deletions
diff --git a/doc/assembly_code b/doc/assembly_code
new file mode 100644
index 000000000..f2b8abb37
--- /dev/null
+++ b/doc/assembly_code
@@ -0,0 +1,57 @@
+Most mpn subdirectories contain machine-dependent code, written in
+assembly or C. The `generic' subdirectory contains default code, used
+when there is no machine-dependent replacement for a particular
+machine.
+
+There is one subdirectory for each ISA family. Note that, e.g., 32-bit SPARC
+and 64-bit SPARC are very different ISAs, and thus cannot share any code.
+
+A particular compile will only use code from one subdirectory, and the
+`generic' subdirectory. The ISA-specific subdirectories contain hierarchies of
+directories for various architecture variants and implementations; the
+top-most level contains code that runs correctly on all variants.
+
+HOW TO WRITE FAST ASSEMBLY CODE FOR GMP
+
+[This should ultimately be made into a chapter of the GMP manual.]
+
+The most basic techniques are software pipelining and loop unrolling.
+
+Software pipelining is the technique of scheduling instructions around
+the branch point in a loop, so that consecutive iterations overlap.
+It is very much like juggling.
+
+Unrolling is useful when software pipelining does not get us close
+enough to the peak performance of a processor's pipeline. Unrolling
+decreases the loop overhead, but also often allows a more even load on
+a processor's functional units.
+
+For processors with very few registers, software pipelining is not
+feasible as it increases register pressure.
+
+For superscalar machines, it is often the case that not all available
+execution capabilities are used. Scheduling some instructions
+for these otherwise unused resources will never cost us anything.
+
+Try to determine the alternative instructions that can be used for a
+particular processor. For GMP, the problem that presents the most
+challenges is propagating carry from one iteration to the next.
+Explore the different possibilities for doing that with the available
+instructions!
+
+For wide superscalar processors, the performance might be completely
+determined by the number of dependent instructions required from
+accepting carry-in from the previous iteration until producing
+carry-out for the next iteration. This is particularly true for
+simple operations like mpn_add_n and mpn_sub_n. Some carry
+propagation schemes require 4 instructions, translating to at least
+four cycles per iteration. Other schemes can propagate carry in two
+cycles or even just one cycle.
+
+Therefore, for wide superscalar processors, finding methods with
+"shallow" carry propagation given an instruction set is often the
+central problem we need to address. The rest is just hard coding
+work.
+
+[Describe: First find issue maps with desired performance
+ Then schedule for latency]
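
The carry-propagation point made in doc/assembly_code can be illustrated with
a minimal C sketch. This is not part of the commit and not GMP's actual
mpn_add_n; the names limb_t, add_limb, sketch_add_n and sketch_add_n_unroll2
are hypothetical, and a 64-bit limb is assumed. It shows one possible scheme in
which the path from carry-in to carry-out goes through three dependent
operations, plus a 2-way unrolled loop that roughly halves the loop overhead
without shortening that dependent chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical limb type, not GMP's mp_limb_t */

/* Add one limb pair plus carry-in; store the sum and return carry-out (0 or 1).
   Only the last three operations depend on cy, so they form the chain that
   links one iteration to the next. */
static limb_t add_limb(limb_t *r, limb_t a, limb_t b, limb_t cy)
{
    limb_t s  = a + b;     /* independent of cy: can be computed early */
    limb_t c1 = s < a;     /* independent of cy: carry out of a + b */
    limb_t t  = s + cy;    /* dependent operation 1 */
    limb_t c2 = t < s;     /* dependent operation 2: carry out of s + cy */
    *r = t;
    return c1 | c2;        /* dependent operation 3; c1 and c2 are never both 1 */
}

/* Straightforward loop: one limb per iteration. */
limb_t sketch_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++)
        cy = add_limb(&rp[i], ap[i], bp[i], cy);
    return cy;
}

/* 2-way unrolled loop: two limbs per iteration, so the loop counter update
   and branch are paid once per two limbs. Assumes n is even (a simplification
   made for this sketch). */
limb_t sketch_add_n_unroll2(limb_t *rp, const limb_t *ap, const limb_t *bp,
                            size_t n)
{
    assert(n % 2 == 0);
    limb_t cy = 0;
    for (size_t i = 0; i < n; i += 2) {
        cy = add_limb(&rp[i],     ap[i],     bp[i],     cy);
        cy = add_limb(&rp[i + 1], ap[i + 1], bp[i + 1], cy);
    }
    return cy;
}
```

How many cycles per limb this yields depends on the compiler and the
processor; on machines with an add-with-carry instruction, hand-written
assembly can typically make the dependent chain much shorter than the three
operations shown here.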