author | Michael Weiser <michael.weiser@gmx.de> | 2018-02-13 22:13:14 +0100 |
---|---|---|
committer | Niels Möller <nisse@lysator.liu.se> | 2018-03-25 11:27:44 +0200 |
commit | 70135c70863eedfd9b300614f4a5535b8b93066c (patch) | |
tree | e8230747d2d551418141fce35c12d4939b813ad6 /arm | |
parent | 2644d1ed132f7dad05e165d6c96a68ee66547d32 (diff) | |
download | nettle-70135c70863eedfd9b300614f4a5535b8b93066c.tar.gz |
Document arm endianness considerations
Extend arm/README with background on what has to be taken into account when
writing assembly routines that are meant to work with both big- and
little-endian memory.
Diffstat (limited to 'arm')
-rw-r--r-- | arm/README | 69 |
1 files changed, 68 insertions, 1 deletions
@@ -44,4 +44,71 @@ q12 (d24, d25) Y
 q13 (d26, d27) Y
 q14 (d28, d29) Y
 q15 (d30, d31) Y
-
+
+Endianness
+
+ARM supports big- and little-endian memory access modes. Representation in
+registers stays the same but loads and stores switch bytes. This has to be
+taken into account in various cases.
+
+Two m4 macros are provided to handle these special cases in assembly source:
+IF_LE(<if-true>,<if-false>)
+IF_BE(<if-true>,<if-false>)
+respectively expand to <if-true> if the target system's endianness is
+little-endian or big-endian. Otherwise they expand to <if-false>.
+
+1. ldr/str
+
+Loading and storing 32-bit words will reverse the words' bytes in little-endian
+mode. If the handled data is actually a byte sequence or data in network byte
+order (big-endian), the loaded word needs to be reversed after load to get it
+back into correct sequence. See v6/sha1-compress.asm LOAD macro for example.
+
+2. shifts
+
+If data is to be processed with bit operations only, endianness can be ignored
+because byte-swapping on load and store will cancel each other out. Shifts
+however have to be inverted. See arm/memxor.asm for an example.
+
+3. vld1.8
+
+NEON's vld instruction can be used to produce endianness-neutral code. vld1.8
+will load a byte sequence into a register regardless of memory endianness. This
+can be used to process byte sequences. See arm/neon/umac-nh.asm for example.
+
+4. vldm/vstm
+
+Care has to be taken when using vldm/vstm because they have two non-obvious
+characteristics:
+
+a. vldm/vstm do normal byte-swapping on each value they load. When loading into
+   d (doubleword) registers, this means that bytes, halfwords and words of the
+   doubleword get swapped. When the data loaded actually represents e.g.
+   vectors of 32-bit words this will swap columns.
+b. vldm/vstm on q (quadword) registers get translated into vldm/vstm on the
+   equivalent number of d (doubleword) registers. Instead of a 128-bit load it
+   does two 64-bit loads. When again handling vectors of 32-bit words this will
+   still swap adjacent columns but will not reverse all four columns.
+
+memory adr0: w0 w1 w2 w3
+register q0: w1 w0 w3 w2
+
+See arm/neon/chacha-core-internal.asm for an example.
+
+5. simple byte store
+
+Sometimes it is necessary to store remaining single bytes to memory. A simple
+logic will store the lowest byte from a register, then do a right shift and
+start over until all bytes are stored. Since this constitutes a
+least-significant-byte-first store, the data to be stored needs to be reversed
+first on a big-endian system. See arm/memxor.asm Lmemxor_leftover for an
+example.
+
+6. Function parameters/return values
+
+AAPCS requires 64-bit parameters to be passed to and returned from functions
+"in two consecutive registers [...] as if the value had been loaded from memory
+representation with a single LDM instruction." Since loading a big-endian
+doubleword using ldm transposes its words, the same has to be done when e.g.
+returning a 64-bit value from an assembler routine. See arm/neon/umac-nh.asm
+for an example.
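
The effects described in items 1, 4 and 5 of the README text can be modelled in
portable C. The following is only a rough sketch for illustration: all function
names are hypothetical and are not part of nettle, and the loads are simulated
with explicit shifts so the result does not depend on the host's endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a 32-bit ldr in little-endian mode: the first byte in memory
   ends up in the least significant byte of the register. */
static uint32_t load_le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Model of the same ldr in big-endian mode: the first byte in memory
   ends up in the most significant byte of the register. */
static uint32_t load_be32(const uint8_t *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
       | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Byte reversal of a 32-bit word, the operation the ARM rev
   instruction performs. */
static uint32_t rev32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
       | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Item 5: store n leftover bytes, lowest byte first, shifting right
   after each store. */
static void store_low_bytes(uint8_t *dst, uint32_t w, size_t n)
{
  while (n-- > 0)
    {
      *dst++ = (uint8_t)w;
      w >>= 8;
    }
}

/* Item 4: model of vldm on one q register in big-endian mode. The load
   is split into two 64-bit loads; within each doubleword the word that
   comes first in memory lands in the upper half. mem[] holds the four
   32-bit words in memory order, lanes[] the resulting 32-bit lanes. */
static void vldm_q_be_model(const uint32_t mem[4], uint32_t lanes[4])
{
  for (int d = 0; d < 2; d++)
    {
      uint64_t dw = ((uint64_t)mem[2 * d] << 32) | mem[2 * d + 1];
      lanes[2 * d] = (uint32_t)dw;             /* lower lane */
      lanes[2 * d + 1] = (uint32_t)(dw >> 32); /* upper lane */
    }
}
```

In this model, reversing a word loaded in little-endian mode gives the same
value as a big-endian load, a word loaded in big-endian mode has to go through
rev32 before store_low_bytes writes its bytes back in memory order, and
vldm_q_be_model reproduces the "w1 w0 w3 w2" lane order shown in item 4.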