diff options
Diffstat (limited to 'doc/latex/src/64bit.tex')
-rw-r--r-- | doc/latex/src/64bit.tex | 204 |
1 files changed, 204 insertions, 0 deletions
diff --git a/doc/latex/src/64bit.tex b/doc/latex/src/64bit.tex new file mode 100644 index 00000000..5aaf20f6 --- /dev/null +++ b/doc/latex/src/64bit.tex @@ -0,0 +1,204 @@ +% +% vim: ts=4 sw=4 et +% +\xchapter{64bit}{Writing 64-bit Code (Unix, Win64)} + +This chapter attempts to cover some of the common issues involved when +writing 64-bit code, to run under \textindex{Win64} or Unix. It covers +how to write assembly code to interface with 64-bit C routines, and +how to write position-independent code for shared libraries. + +All 64-bit code uses a flat memory model, since segmentation is not +available in 64-bit mode. The one exception is the \code{FS} and +\code{GS} registers, which still add their bases. + +Position independence in 64-bit mode is significantly simpler, since +the processor supports \code{RIP}-relative addressing directly; see the +\code{REL} keyword (\nref{effaddr}). On most 64-bit platforms, +it is probably desirable to make that the default, using the directive +\code{DEFAULT REL} (\nref{default}). + +64-bit programming is relatively similar to 32-bit programming, but +of course pointers are 64 bits long; additionally, all existing +platforms pass arguments in registers rather than on the stack. +Furthermore, 64-bit platforms use SSE2 by default for floating point. +Please see the ABI documentation for your platform. + +64-bit platforms differ in the sizes of the C/C++ fundamental +datatypes, not just from 32-bit platforms but from each other. If a +specific size data type is desired, it is probably best to use the +types defined in the standard C header \code{<inttypes.h>}. + +All known 64-bit platforms except some embedded platforms require that +the stack is 16-byte aligned at the entry to a function. In order to +enforce that, the stack pointer (\code{RSP}) needs to be aligned on an +\code{odd} multiple of 8 bytes before the \code{CALL} instruction. + +In 64-bit mode, the default instruction size is still 32 bits. When +loading a value into a 32-bit register (but not an 8- or 16-bit +register), the upper 32 bits of the corresponding 64-bit register are +set to zero. + +\xsection{reg64}{Register Names in 64-bit Mode} + +NASM uses the following names for general-purpose registers in 64-bit +mode, for 8-, 16-, 32- and 64-bit references, respectively: + +\begin{lstlisting} +AL/AH, CL/CH, DL/DH, BL/BH, SPL, BPL, SIL, DIL, R8B-R15B +AX, CX, DX, BX, SP, BP, SI, DI, R8W-R15W +EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, R8D-R15D +RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8-R15 +\end{lstlisting} + +This is consistent with the AMD documentation and most other +assemblers. The Intel documentation, however, uses the names +\code{R8L-R15L} for 8-bit references to the higher registers. It is +possible to use those names by definiting them as macros; similarly, +if one wants to use numeric names for the low 8 registers, define them +as macros. The standard macro package \code{altreg} (see +\nref{pkgaltreg}) can be used for this purpose. + +\xsection{id64}{Immediates and Displacements in 64-bit Mode} + +In 64-bit mode, immediates and displacements are generally only 32 +bits wide. NASM will therefore truncate most displacements and +immediates to 32 bits. + +The only instruction which takes a full \textindex{64-bit immediate} is: + +\begin{lstlisting} +mov reg64,imm64 +\end{lstlisting} + +NASM will produce this instruction whenever the programmer uses +\code{MOV} with an immediate into a 64-bit register. If this is not +desirable, simply specify the equivalent 32-bit register, which will +be automatically zero-extended by the processor, or specify the +immediate as \code{DWORD}: + +\begin{lstlisting} +mov rax,foo ; 64-bit immediate +mov rax,qword foo ; (identical) +mov eax,foo ; 32-bit immediate, zero-extended +mov rax,dword foo ; 32-bit immediate, sign-extended +\end{lstlisting} + +The length of these instructions are 10, 5 and 7 bytes, respectively. + +If optimization is enabled and NASM can determine at assembly time +that a shorter instruction will suffice, the shorter instruction will +be emitted unless of course \code{STRICT QWORD} or \code{STRICT DWORD} +is specified (see \nref{strict}): + +\begin{lstlisting} +mov rax,1 ; Assembles as "mov eax,1" (5 bytes) +mov rax,strict qword 1 ; Full 10-byte instruction +mov rax,strict dword 1 ; 7-byte instruction +mov rax,symbol ; 10 bytes, not known at assembly time +lea rax,[rel symbol] ; 7 bytes, usually preferred by the ABI +\end{lstlisting} + +Note that \code{lea rax,[rel symbol]} is position-independent, whereas +\code{mov rax,symbol} is not. Most ABIs prefer or even require +position-independent code in 64-bit mode. However, the \code{MOV} +instruction is able to reference a symbol anywhere in the 64-bit +address space, whereas \code{LEA} is only able to access a symbol within +within 2 GB of the instruction itself (see below.) + +The only instructions which take a full \textindex{64-bit displacement} +is loading or storing, using \code{MOV}, \code{AL}, \code{AX}, \code{EAX} +or \code{RAX} (but no other registers) to an absolute 64-bit address. +Since this is a relatively rarely used instruction (64-bit code +generally uses relative addressing), the programmer has to explicitly +declare the displacement size as \code{ABS QWORD}: + +\begin{lstlisting} +default abs + +mov eax,[foo] ; 32-bit absolute disp, sign-extended +mov eax,[a32 foo] ; 32-bit absolute disp, zero-extended +mov eax,[qword foo] ; 64-bit absolute disp + +default rel + +mov eax,[foo] ; 32-bit relative disp +mov eax,[a32 foo] ; d:o, address truncated to 32 bits(!) +mov eax,[qword foo] ; error +mov eax,[abs qword foo] ; 64-bit absolute disp +\end{lstlisting} + +A sign-extended absolute displacement can access from -2 GB to +2 GB; +a zero-extended absolute displacement can access from 0 to 4 GB. + +\xsection{unix64}{Interfacing to 64-bit C Programs (Unix)} + +On Unix, the 64-bit ABI as well as the x32 ABI (32-bit ABI with the +CPU in 64-bit mode) is defined by the documents at +\href{http://www.nasm.us/abi/unix64}{http://www.nasm.us/abi/unix64} + +Although written for AT\&T-syntax assembly, the concepts apply equally +well for NASM-style assembly. What follows is a simplified summary. + +The first six integer arguments (from the left) are passed in \code{RDI}, +\code{RSI}, \code{RDX}, \code{RCX}, \code{R8}, and \code{R9}, in that +order. Additional integer arguments are passed on the stack. These +registers, plus \code{RAX}, \code{R10} and \code{R11} are destroyed +by function calls, and thus are available for use by the function +without saving. + +Integer return values are passed in \code{RAX} and \code{RDX}, +in that order. + +Floating point is done using SSE registers, except for \code{long double}, +which is 80 bits (\code{TWORD}) on most platforms (Android is +one exception; there \code{long double} is 64 bits and treated the same +as \code{double}.) Floating-point arguments are passed in \code{XMM0} to +\code{XMM7}; return is \code{XMM0} and \code{XMM1}. \code{long double} +are passed on the stack, and returned in \code{ST0} and \code{ST1}. + +All SSE and x87 registers are destroyed by function calls. + +On 64-bit Unix, \code{long} is 64 bits. + +Integer and SSE register arguments are counted separately, so +for the case of + +\begin{lstlisting} +void foo(long a, double b, int c) +\end{lstlisting} + +\code{a} is passed in \code{RDI}, \code{b} in \code{XMM0}, +and \code{c} in \code{ESI}. + +\xsection{win64}{Interfacing to 64-bit C Programs (Win64)} + +The Win64 ABI is described by the document at +\href{http://www.nasm.us/abi/win64}{http://www.nasm.us/abi/win64} + +What follows is a simplified summary. + +The first four integer arguments are passed in \code{RCX}, \code{RDX}, +\code{R8} and \code{R9}, in that order. Additional integer arguments are +passed on the stack. These registers, plus \code{RAX}, \code{R10} and +\code{R11} are destroyed by function calls, and thus are available for +use by the function without saving. + +Integer return values are passed in \code{RAX} only. + +Floating point is done using SSE registers, except for \code{long +double}. Floating-point arguments are passed in \code{XMM0} +to \code{XMM3}; return is \code{XMM0} only. + +On Win64, \code{long} is 32 bits; \code{long long} or \code{\_int64} +is 64 bits. + +Integer and SSE register arguments are counted together, so +for the case of + +\begin{lstlisting} +void foo(long long a, double b, int c) +\end{lstlisting} + +\code{a} is passed in \code{RCX}, \code{b} in \code{XMM1}, +and \code{c} in \code{R8D}. |