author: tege <tege@gmplib.org> 2000-06-30 10:35:05 +0200
committer: tege <tege@gmplib.org> 2000-06-30 10:35:05 +0200
commit: 8d20777e6911a1c027ec9f3e507ef374573c8c2b
tree: 7ae533339df0deb57cfb1281a52854a5b41a0c82 /doc/assembly_code
parent: a782c8d5ecfeca95b05c8c24040a2ac24283d531
*** empty log message ***
Diffstat (limited to 'doc/assembly_code')
-rw-r--r-- doc/assembly_code 57
1 files changed, 57 insertions, 0 deletions
diff --git a/doc/assembly_code b/doc/assembly_code
new file mode 100644
index 000000000..f2b8abb37
--- /dev/null
+++ b/doc/assembly_code
@@ -0,0 +1,57 @@
+Most mpn subdirectories contain machine-dependent code, written in
+assembly or C. The `generic' subdirectory contains default code, used
+when there is no machine-dependent replacement for a particular
+machine.
+
+There is one subdirectory for each ISA family. Note that, for example, 32-bit
+SPARC and 64-bit SPARC are very different ISAs, and thus cannot share any code.
+
+A particular build will only use code from one ISA-specific subdirectory, plus
+the `generic' subdirectory. The ISA-specific subdirectories contain hierarchies of
+directories for various architecture variants and implementations; the
+top-most level contains code that runs correctly on all variants.
+
+HOW TO WRITE FAST ASSEMBLY CODE FOR GMP
+
+[This should ultimately be made into a chapter of the GMP manual.]
+
+The most basic techniques are software pipelining and loop unrolling.
+
+Software pipelining is the technique of scheduling instructions around
+the branch point in a loop, so that consecutive iterations overlap.
+It is very much like juggling.
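+
+As a rough illustration only, the following C sketch shows the shape of a
+software-pipelined loop. The function names and the scaling operation are
+hypothetical and not taken from GMP; a real implementation would do this
+scheduling in assembly, where the instruction order is under our control.
+The load for iteration i+1 is issued before the multiply and store of
+iteration i, with a prologue and an epilogue to fill and drain the pipeline.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: load, multiply and store all belong to the same iteration. */
void scale_simple(uint32_t *dst, const uint32_t *src, size_t n, uint32_t k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Software-pipelined variant (hypothetical sketch): the load for the
   next element overlaps with the multiply and store of the current one. */
void scale_pipelined(uint32_t *dst, const uint32_t *src, size_t n, uint32_t k)
{
    if (n == 0)
        return;

    uint32_t cur = src[0];              /* prologue: first load */
    for (size_t i = 0; i + 1 < n; i++) {
        uint32_t next = src[i + 1];     /* load for iteration i+1 */
        dst[i] = cur * k;               /* finish iteration i */
        cur = next;
    }
    dst[n - 1] = cur * k;               /* epilogue: drain the pipeline */
}
```

+Note that the pipelined variant keeps two values live at once (cur and
+next), where the plain loop needs only one.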
+
+Unrolling is useful when software pipelining does not get us close
+enough to the peak performance of a processor's pipeline. Unrolling
+decreases the loop overhead, but also often allows a more even load on
+a processor's functional units.
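+
+A minimal C sketch of 4-way unrolling follows, assuming a hypothetical
+limb type limb_t (standing in for GMP's mp_limb_t) and a hypothetical
+function and_n_unrolled. The main loop pays for one counter update and
+one branch per four limbs; a short tail loop handles the remaining limbs.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */

/* Limb-wise logical AND of {up,n} and {vp,n}, written to {rp,n}. */
void and_n_unrolled(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    size_t i = 0;

    /* Main loop: four limbs per iteration, one branch per four limbs. */
    for (; i + 4 <= n; i += 4) {
        limb_t r0 = up[i + 0] & vp[i + 0];
        limb_t r1 = up[i + 1] & vp[i + 1];
        limb_t r2 = up[i + 2] & vp[i + 2];
        limb_t r3 = up[i + 3] & vp[i + 3];
        rp[i + 0] = r0;
        rp[i + 1] = r1;
        rp[i + 2] = r2;
        rp[i + 3] = r3;
    }

    /* Tail loop: the remaining 0 to 3 limbs. */
    for (; i < n; i++)
        rp[i] = up[i] & vp[i];
}
```

+The four operations in the unrolled body are independent of each other,
+which gives the scheduler freedom to spread them over the functional units.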
+
+For processors with very few registers, software pipelining is not
+feasible as it increases register pressure.
+
+For superscalar machines, it is often the case that not all available
+execution capabilities are used. Scheduling some instructions
+for these otherwise unused resources typically costs us nothing.
+
+Try to determine the alternative instructions that can be used for a
+particular processor. For GMP, the problem that presents the most
+challenges is propagating carry from one iteration to the next.
+Explore the different possibilities for doing that with the available
+instructions!
+
+For wide superscalar processors, the performance might be completely
+determined by the number of dependent instructions required from
+accepting carry-in from the previous iteration until producing
+carry-out for the next iteration. This is particularly true for
+simple operations like mpn_add_n and mpn_sub_n. Some carry
+propagation schemes require four dependent instructions, translating to at least
+four cycles per iteration. Other schemes can propagate carry in two
+cycles or even just one cycle.
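+
+To make the dependency chain concrete, here is a portable C sketch of
+multi-limb addition. The names limb_t and add_n_sketch are hypothetical;
+the function only mimics the mpn_add_n convention of returning the
+carry-out, and is not GMP's implementation. The comments separate the
+operations that do not depend on the incoming carry from those that do.
+How many instructions the dependent part becomes depends on the
+instruction set and the compiler.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */

/* Add {up,n} and {vp,n}, write the sum to {rp,n}, return the carry-out. */
limb_t add_n_sketch(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;

    for (size_t i = 0; i < n; i++) {
        /* Independent of the incoming carry: can be scheduled early. */
        limb_t s  = up[i] + vp[i];
        limb_t c1 = s < up[i];      /* carry out of up[i] + vp[i] */

        /* Dependent on the incoming carry: this chain limits the speed. */
        limb_t r  = s + cy;
        limb_t c2 = r < s;          /* carry out of adding the carry-in */
        cy = c1 | c2;               /* at most one of c1, c2 is set */

        rp[i] = r;
    }
    return cy;
}
```

+In this form, three dependent operations (add, compare, or) separate
+carry-in from carry-out. An add-with-carry instruction, where the
+instruction set has one, folds that chain into a single instruction.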
+
+Therefore, for wide superscalar processors, finding methods with
+"shallow" carry propagation given an instruction set is often the
+central problem we need to address. The rest is just hard coding
+work.
+
+[Describe: First find issue maps with desired performance
+ Then schedule for latency]