1 files changed, 539 insertions, 0 deletions
diff --git a/doc/latex/src/32bit.tex b/doc/latex/src/32bit.tex
new file mode 100644
index 00000000..47c27466
--- /dev/null
+++ b/doc/latex/src/32bit.tex
@@ -0,0 +1,539 @@
+%
+% vim: ts=4 sw=4 et
+%
+\xchapter{32bit}{Writing 32-bit Code (Unix, Win32, DJGPP)}
+
+This chapter attempts to cover some of the common issues involved
+when writing 32-bit code, to run under \textindex{Win32} or Unix,
+or to be linked with C code generated by a Unix-style C compiler such as
+\textindex{DJGPP}. It covers how to write assembly code to interface with
+32-bit C routines, and how to write position-independent code for
+shared libraries.
+
+Almost all 32-bit code, and in particular all code running under
+\code{Win32}, \code{DJGPP} or any of the PC Unix variants, runs in the
+\index{flat memory model}\emph{flat} memory model. This means that
+the segment registers and paging have already been set up to give
+you the same 32-bit 4GB address space no matter what segment you
+work relative to, and that you should ignore all segment registers
+completely. When writing flat-model application code, you never
+need to use a segment override or modify any segment register,
+and the code-section addresses you pass to \code{CALL} and
+\code{JMP} live in the same address space as the data-section addresses
+you access your variables by and the stack-section addresses you access
+local variables and procedure parameters by. Every address is 32 bits
+long and contains only an offset part.
+
+\xsection{32c}{Interfacing to 32-bit C Programs}
+
+A lot of the discussion in \nref{16c}, about interfacing to
+16-bit C programs, still applies when working in 32 bits. The absence of
+memory models or segmentation worries simplifies things a lot.
+
+\xsubsection{32cunder}{External Symbol Names}
+
+Most 32-bit C compilers share the convention used by 16-bit
+compilers, that the names of all global symbols (functions or data)
+they define are formed by prefixing an underscore to the name as it
+appears in the C program. However, not all of them do: the \code{ELF}
+specification states that C symbols do \emph{not} have a leading
+underscore on their assembly-language names.
+
+The older Linux \code{a.out} C compiler, all \code{Win32} compilers,
+\code{DJGPP}, and \code{NetBSD} and \code{FreeBSD}, all use the leading
+underscore; for these compilers, the macros \code{cextern} and
+\code{cglobal}, as given in \nref{16cunder}, will still work.
+For \code{ELF}, though, the leading underscore should not be used.
+
+See also \nref{opt-pfix}.
+
+\xsubsection{32cfunc}{Function Definitions and Function Calls}
+
+\index{functions!C calling convention}The \textindex{C calling convention}
+in 32-bit programs is as follows. In the following description,
+the words \emph{caller} and \emph{callee} are used to denote
+the function doing the calling and the function which gets called.
+
+\begin{itemize}
+    \item{The caller pushes the function's parameters on the stack, one
+        after another, in reverse order (right to left, so that the first
+        argument specified to the function is pushed last).}
+
+    \item{The caller then executes a near \code{CALL} instruction to pass
+        control to the callee.}
+
+    \item{The callee receives control, and typically (although this
+        is not actually necessary, in functions which do not need to
+        access their parameters) starts by saving the value of \code{ESP}
+        in \code{EBP} so as to be able to use \code{EBP} as a base pointer
+        to find its parameters on the stack. However, the caller was
+        probably doing this too, so part of the calling convention states
+        that \code{EBP} must be preserved by any C function. Hence the
+        callee, if it is going to set up \code{EBP} as a \textindex{frame
+        pointer}, must push the previous value first.}
+
+    \item{The callee may then access its parameters relative to \code{EBP}.
+        The doubleword at \code{[EBP]} holds the previous value of
+        \code{EBP} as it was pushed; the next doubleword, at \code{[EBP+4]},
+        holds the return address, pushed implicitly by \code{CALL}.
+        The parameters start after that, at \code{[EBP+8]}. The leftmost
+        parameter of the function, since it was pushed last, is accessible
+        at this offset from \code{EBP}; the others follow, at successively
+        greater offsets. Thus, in a function such as \code{printf} which
+        takes a variable number of parameters, the pushing of the
+        parameters in reverse order means that the function knows where
+        to find its first parameter, which tells it the number and type
+        of the remaining ones.}
+
+    \item{The callee may also wish to decrease \code{ESP} further, so as
+        to allocate space on the stack for local variables, which will
+        then be accessible at negative offsets from \code{EBP}.}
+
+    \item{The callee, if it wishes to return a value to the caller,
+        should leave the value in \code{AL}, \code{AX} or \code{EAX}
+        depending on the size of the value. Floating-point results
+        are typically returned in \code{ST0}.}
+
+    \item{Once the callee has finished processing, it restores 
+        \code{ESP} from \code{EBP} if it had allocated local stack space,
+        then pops the previous value of \code{EBP}, and returns via
+        \code{RET} (equivalently, \code{RETN}).}
+
+    \item{When the caller regains control from the callee, the function
+        parameters are still on the stack, so it typically adds an
+        immediate constant to \code{ESP} to remove them (instead of
+        executing a number of slow \code{POP} instructions). Thus,
+        if a function is accidentally called with the wrong number
+        of parameters due to a prototype mismatch, the stack will
+        still be returned to a sensible state since the caller, which
+        \emph{knows} how many parameters it pushed, does the
+        removing.}
+\end{itemize}
+
+There is an alternative calling convention used by Win32 programs
+for Windows API calls, and also for functions called \emph{by} the
+Windows API such as window procedures: they follow what Microsoft
+calls the \code{\_\_stdcall} convention. This is slightly closer to the
+Pascal convention, in that the callee clears the stack by passing a
+parameter to the \code{RET} instruction. However, the parameters are
+still pushed in right-to-left order.
+
+Thus, you would define a function in C style in the following way:
+
+\begin{lstlisting}
+global  _myfunc
+
+_myfunc:
+    push    ebp
+    mov     ebp,esp
+    sub     esp,0x40        ; 64 bytes of local stack space
+    mov     eax,[ebp+8]     ; first parameter (EAX need not be preserved)
+
+    ; some more code
+
+    leave                   ; mov esp,ebp / pop ebp
+    ret
+\end{lstlisting}
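+
+For comparison, a \code{\_\_stdcall} version of such a function might look
+like the sketch below. The name \code{\_myfunc2@8} and its two doubleword
+parameters are hypothetical, and the \code{@8} suffix assumes the usual
+Microsoft name decoration, in which the number of parameter bytes is
+appended to the symbol name.
+
+\begin{lstlisting}
+global  _myfunc2@8          ; hypothetical name; @8 = 8 bytes of parameters
+
+_myfunc2@8:
+    push    ebp
+    mov     ebp,esp
+    mov     eax,[ebp+8]     ; first parameter
+    add     eax,[ebp+12]    ; second parameter; the sum is returned in EAX
+    pop     ebp
+    ret     8               ; callee removes both parameters
+\end{lstlisting}
+
+A caller of this function would not adjust \code{ESP} after the call,
+since \code{RET 8} has already removed the parameters.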
+
+At the other end of the process, to call a C function from your
+assembly code, you would do something like this:
+
+\begin{lstlisting}
+extern  _printf
+
+    ; and then, further down...
+
+    push    dword [myint]   ; one of my integer variables
+    push    dword mystring  ; pointer into my data segment
+    call    _printf
+    add     esp,byte 8      ; `byte' saves space
+
+    ; then those data items...
+
+segment _DATA
+
+myint       dd  1234
+mystring    db  'This number -> %d <- should be 1234',10,0
+\end{lstlisting}
+
+This piece of code is the assembly equivalent of the C code
+
+\begin{lstlisting}
+    int myint = 1234;
+    printf("This number -> %d <- should be 1234\n", myint);
+\end{lstlisting}
+
+\xsubsection{32cdata}{Accessing Data Items}
+
+To get at the contents of C variables, or to declare variables which
+C can access, you need only declare the names as \code{GLOBAL} or
+\code{EXTERN}. (Again, the names require leading underscores except under
+\code{ELF}, as stated in \nref{32cunder}.) Thus, a C variable declared as \code{int i}
+can be accessed from assembler as
+
+\begin{lstlisting}
+    extern _i
+    mov eax,[_i]
+\end{lstlisting}
+
+And to declare your own integer variable which C programs can access
+as \code{extern int j}, you do this (making sure you are assembling in
+the \code{\_DATA} segment, if necessary):
+
+\begin{lstlisting}
+    global _j
+_j  dd 0
+\end{lstlisting}
+
+To access a C array, you need to know the size of the components of
+the array. For example, \code{int} variables are four bytes long, so if
+a C program declares an array as \code{int a[10]}, you can access
+\code{a[3]} by coding \code{mov eax,[\_a+12]}. (The byte offset 12 is
+obtained by multiplying the desired array index, 3, by the size of
+the array element, 4.) The sizes of the C base types in 32-bit compilers
+are: 1 for \code{char}, 2 for \code{short}, 4 for \code{int}, \code{long}
+and \code{float}, and 8 for \code{double}. Pointers, being 32-bit
+addresses, are also 4 bytes long.
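+
+As a sketch of the arithmetic above, the following fragment reads
+\code{a[3]} with a constant offset and \code{a[5]} with a scaled index
+register. The choice of \code{ECX} and \code{EDX} is arbitrary, and the
+leading underscore assumes one of the platforms that use it.
+
+\begin{lstlisting}
+extern  _a                      ; int a[10]; defined in the C program
+
+    mov     eax,[_a+12]         ; a[3]: index 3 times element size 4
+    mov     ecx,5               ; array index
+    mov     edx,[_a+ecx*4]      ; a[5]: the scale factor is the element size
+\end{lstlisting}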
+
+To access a C \textindex{data structure}, you need to know the offset from
+the base of the structure to the field you are interested in. You
+can either do this by converting the C structure definition into a
+NASM structure definition (using \code{STRUC}), or by calculating the
+one offset and using just that.
+
+To do either of these, you should read your C compiler's manual to
+find out how it organizes data structures. NASM gives no special
+alignment to structure members in its own \codeindex{STRUC} macro,
+so you have to specify alignment yourself if the C compiler generates it.
+Typically, you might find that a structure like
+
+\begin{lstlisting}
+struct {
+    char c;
+    int i;
+} foo;
+\end{lstlisting}
+
+might be eight bytes long rather than five, since the \code{int} field
+would be aligned to a four-byte boundary. However, this sort of
+feature is sometimes a configurable option in the C compiler, either
+using command-line options or \code{\#pragma} lines, so you have to find
+out how your own compiler does it.
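+
+Assuming the compiler does pad \code{foo} to eight bytes as described,
+a matching NASM definition might be sketched as follows. The name
+\code{foo\_t} is hypothetical, and the \code{alignb} line must be adjusted
+or removed if your compiler packs the structure differently.
+
+\begin{lstlisting}
+struc   foo_t                   ; hypothetical name for the C struct layout
+    .c:     resb    1           ; char c, at offset 0
+            alignb  4           ; assumed padding before the int field
+    .i:     resd    1           ; int i, at offset 4
+endstruc                        ; foo_t_size is 8 with this layout
+
+extern  _foo
+
+    mov     eax,[_foo+foo_t.i]  ; load foo.i
+\end{lstlisting}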
+
+\xsubsection{32cmacro}{\codeindex{c32.mac}: Helper Macros for the 32-bit C Interface}
+
+Included in the NASM archives, in the \index{misc directory}\code{misc}
+directory, is a file \code{c32.mac} of macros. It defines three macros:
+\codeindex{proc}, \codeindex{arg} and \codeindex{endproc}. These are
+intended to be used for C-style procedure definitions, and they automate
+a lot of the work involved in keeping track of the calling convention.
+
+An example of an assembly function using the macro set is given
+here:
+
+\begin{lstlisting}
+proc    _proc32
+%$i         arg
+%$j         arg
+    mov     eax,[ebp + %$i]
+    mov     ecx,[ebp + %$j]     ; ECX, unlike EBX, need not be preserved
+    add     eax,[ecx]
+endproc
+\end{lstlisting}
+
+This defines \code{\_proc32} to be a procedure taking two arguments, the
+first (\code{i}) an integer and the second (\code{j}) a pointer to an
+integer. It returns \code{i + *j}.
+
+Note that the \code{arg} macro has an \code{EQU} as the first line of its
+expansion, and since the label before the macro call gets prepended
+to the first line of the expanded macro, the \code{EQU} works, defining
+\code{\%\$i} to be an offset from \code{EBP}. A context-local variable is
+used, local to the context pushed by the \code{proc} macro and popped
+by the \code{endproc} macro, so that the same argument name can be used
+in later procedures. Of course, you don't \emph{have} to do that.
+
+\code{arg} can take an optional parameter, giving the size of the
+argument. If no size is given, 4 is assumed, since it is likely that
+many function parameters will be of type \code{int} or pointers.
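+
+For instance, a procedure taking a \code{double} followed by an \code{int}
+might be sketched as below. The name \code{\_scale} is hypothetical, and the
+explicit size of 8 assumes that the \code{double} is passed by value in
+two doublewords on the stack.
+
+\begin{lstlisting}
+proc    _scale
+%$x         arg     8               ; double x, 8 bytes
+%$n         arg                     ; int n, default size 4
+    fld     qword [ebp + %$x]       ; ST0 = x
+    fimul   dword [ebp + %$n]       ; ST0 = x * n, returned in ST0
+endproc
+\end{lstlisting}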
+
+\xsection{picdll}{Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries}
+\index{Shared Libraries}
+
+\code{ELF} replaced the older \code{a.out} object file format under Linux
+because it contains support for \textindex{position-independent code}
+(\textindex{PIC}), which makes writing shared libraries much easier. NASM
+supports the \code{ELF} position-independent code features, so you can
+write Linux \code{ELF} shared libraries in NASM.
+
+\textindex{NetBSD}, and its close cousins \textindex{FreeBSD} and
+\textindex{OpenBSD}, take a different approach by hacking PIC support
+into the \code{a.out} format. NASM supports this as the \codeindex{aoutb}
+output format, so you can write \textindex{BSD} shared libraries in
+NASM too.
+
+The operating system loads a PIC shared library by memory-mapping
+the library file at an arbitrarily chosen point in the address space
+of the running process. The contents of the library's code section
+must therefore not depend on where it is loaded in memory.
+
+Therefore, you cannot get at your variables by writing code like
+this:
+
+\begin{lstlisting}
+    mov     eax,[myvar]             ; WRONG
+\end{lstlisting}
+
+Instead, the linker provides an area of memory called the
+\textindex{global offset table}, or \textindex{GOT}; the GOT is situated
+at a constant distance from your library's code, so if you can find out
+where your library is loaded (which is typically done using a \code{CALL}
+and \code{POP} combination), you can obtain the address of the GOT, and
+you can then load the addresses of your variables out of linker-generated
+entries in the GOT.
+
+The \emph{data} section of a PIC shared library does not have these
+restrictions: since the data section is writable, it has to be
+copied into memory anyway rather than just paged in from the library
+file, so as long as it's being copied it can be relocated too. So
+you can put ordinary types of relocation in the data section without
+too much worry (but see \nref{picglobal} for a caveat).
+
+\xsubsection{picgot}{Obtaining the Address of the GOT}
+
+Each code module in your shared library should define the GOT as an
+external symbol:
+
+\begin{lstlisting}
+extern  _GLOBAL_OFFSET_TABLE_   ; in ELF
+extern  __GLOBAL_OFFSET_TABLE_  ; in BSD a.out
+\end{lstlisting}
+
+At the beginning of any function in your shared library which plans
+to access your data or BSS sections, you must first calculate the
+address of the GOT. This is typically done by writing the function
+in this form:
+
+\begin{lstlisting}
+func:
+    push    ebp
+    mov     ebp,esp
+    push    ebx
+    call    .get_GOT
+.get_GOT:
+    pop     ebx
+    add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
+
+    ; the function body comes here
+
+    mov     ebx,[ebp-4]
+    mov     esp,ebp
+    pop     ebp
+    ret
+\end{lstlisting}
+
+(For BSD, again, the symbol \code{\_GLOBAL\_OFFSET\_TABLE\_} requires a
+second leading underscore.)
+
+The first two lines of this function are simply the standard C
+prologue to set up a stack frame, and the last three lines are the
+standard C function epilogue. The third line, and the fourth to last
+line, save and restore the \code{EBX} register, because PIC shared
+libraries use this register to store the address of the GOT.
+
+The interesting bit is the \code{CALL} instruction and the two
+instructions that follow it. The \code{CALL} and \code{POP} combination obtains the address
+of the label \code{.get\_GOT}, without having to know in advance where
+the program was loaded (since the \code{CALL} instruction is encoded
+relative to the current position). The \code{ADD} instruction makes use
+of one of the special PIC relocation types: \textindex{GOTPC relocation}.
+With the \codeindex{WRT ..gotpc} qualifier specified, the symbol
+referenced (here \code{\_GLOBAL\_OFFSET\_TABLE\_}, the special symbol
+assigned to the GOT) is given as an offset from the beginning of the
+section. (Actually, \code{ELF} encodes it as the offset from the operand
+field of the \code{ADD} instruction, but NASM simplifies this
+deliberately, so you do things the same way for both \code{ELF} and
+\code{BSD}.) So the instruction then \emph{adds} the beginning of the
+section, to get the real address of the GOT, and subtracts the value of
+\code{.get\_GOT} which it knows is in \code{EBX}. Therefore, by the time
+that instruction has finished, \code{EBX} contains the address of the GOT.
+
+If you didn't follow that, don't worry: it's never necessary to
+obtain the address of the GOT by any other means, so you can put
+those three instructions into a macro and safely ignore them:
+
+\begin{lstlisting}
+%macro  get_GOT 0
+    call    %%getgot
+%%getgot:
+    pop     ebx
+    add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
+%endmacro
+\end{lstlisting}
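+
+With the macro in place, the function skeleton shown earlier might be
+written like this (a sketch; \code{func} is the same placeholder name
+as before):
+
+\begin{lstlisting}
+func:
+    push    ebp
+    mov     ebp,esp
+    push    ebx                 ; save the caller's EBX
+    get_GOT                     ; EBX now holds the address of the GOT
+
+    ; the function body comes here
+
+    mov     ebx,[ebp-4]         ; restore the caller's EBX
+    mov     esp,ebp
+    pop     ebp
+    ret
+\end{lstlisting}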
+
+\xsubsection{piclocal}{Finding Your Local Data Items}
+
+Having got the GOT, you can then use it to obtain the addresses of
+your data items. Most variables will reside in the sections you have
+declared; they can be accessed using the \index{GOTOFF relocation}
+\code{..gotoff} special \indexcode{WRT ..gotoff}\code{WRT} type. It
+works like this:
+
+\begin{lstlisting}
+    lea     eax,[ebx+myvar wrt ..gotoff]
+\end{lstlisting}
+
+The expression \code{myvar wrt ..gotoff} is calculated, when the shared
+library is linked, to be the offset to the local variable \code{myvar}
+from the beginning of the GOT. Therefore, adding it to \code{EBX} as
+above will place the real address of \code{myvar} in \code{EAX}.
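+
+The same expression can be used directly as a memory operand when you
+want the contents of the variable and not its address. A minimal
+sketch, assuming \code{myvar} is a doubleword and \code{EBX} already
+holds the address of the GOT:
+
+\begin{lstlisting}
+    mov     eax,[ebx+myvar wrt ..gotoff]    ; load the value of myvar
+    inc     dword [ebx+myvar wrt ..gotoff]  ; update myvar in place
+\end{lstlisting}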
+
+If you declare variables as \code{GLOBAL} without specifying a size for
+them, they are shared between code modules in the library, but do
+not get exported from the library to the program that loaded it.
+They will still be in your ordinary data and BSS sections, so you
+can access them in the same way as local variables, using the above
+\code{..gotoff} mechanism.
+
+Note that due to a peculiarity of the way BSD \code{a.out} format
+handles this relocation type, there must be at least one non-local
+symbol in the same section as the address you're trying to access.
+
+\xsubsection{picextern}{Finding External and Common Data Items}
+
+If your library needs to get at an external variable (external to
+the \emph{library}, not just to one of the modules within it), you must
+use the \index{GOT relocations}\indexcode{WRT ..got}\code{..got} type
+to get at it. The \code{..got} type, instead of giving you the offset from
+the GOT base to the variable, gives you the offset from the GOT base to
+a GOT \emph{entry} containing the address of the variable. The linker
+will set up this GOT entry when it builds the library, and the
+dynamic linker will place the correct address in it at load time. So
+to obtain the address of an external variable \code{extvar} in \code{EAX},
+you would code
+
+\begin{lstlisting}
+    mov     eax,[ebx+extvar wrt ..got]
+\end{lstlisting}
+
+This loads the address of \code{extvar} out of an entry in the GOT. The
+linker, when it builds the shared library, collects together every
+relocation of type \code{..got}, and builds the GOT so as to ensure it
+has every necessary entry present.
+
+Common variables must also be accessed in this way.
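+
+Since the \code{..got} load yields an address, a second access is needed
+to reach the variable itself. A sketch, assuming \code{extvar} is a
+doubleword and \code{EBX} holds the address of the GOT:
+
+\begin{lstlisting}
+extern  extvar
+
+    mov     eax,[ebx+extvar wrt ..got]  ; EAX = address of extvar
+    mov     ecx,[eax]                   ; ECX = current value of extvar
+    mov     dword [eax],0               ; store a new value through the pointer
+\end{lstlisting}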
+
+\xsubsection{picglobal}{Exporting Symbols to the Library User}
+
+If you want to export symbols to the user of the library, you have
+to declare whether they are functions or data, and if they are data,
+you have to give the size of the data item. This is because the
+dynamic linker has to build \index{PLT}\textindex{procedure linkage table}
+entries for any exported functions, and also moves exported data
+items away from the library's data section in which they were
+declared.
+
+So to export a function to users of the library, you must use
+
+\begin{lstlisting}
+global  func:function               ; declare it as a function
+func:
+    push    ebp
+    ; etc.
+\end{lstlisting}
+
+And to export a data item such as an array, you would have to code
+
+\begin{lstlisting}
+global  array:data array.end-array  ; give the size too
+array:  resd    128
+.end:
+\end{lstlisting}
+
+Be careful: If you export a variable to the library user, by
+declaring it as \code{GLOBAL} and supplying a size, the variable will
+end up living in the data section of the main program, rather than
+in your library's data section, where you declared it. So you will
+have to access your own global variable with the \code{..got} mechanism
+rather than \code{..gotoff}, as if it were external (which,
+effectively, it has become).
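+
+As a sketch, a library that exports a hypothetical doubleword named
+\code{counter} would update it through the GOT like this, assuming
+\code{EBX} holds the address of the GOT:
+
+\begin{lstlisting}
+global  counter:data 4                  ; exported, so a size is required
+
+section .data
+counter:    dd  0
+
+section .text
+    ; inside a function, after EBX has been loaded
+    mov     eax,[ebx+counter wrt ..got] ; address where counter really lives
+    inc     dword [eax]
+\end{lstlisting}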
+
+Equally, if you need to store the address of an exported global in
+one of your data sections, you can't do it by means of the standard
+sort of code:
+
+\begin{lstlisting}
+dataptr:    dd  global_data_item    ; WRONG
+\end{lstlisting}
+
+NASM will interpret this code as an ordinary relocation, in which
+\code{global\_data\_item} is merely an offset from the beginning of the
+\code{.data} section (or whatever); so this reference will end up
+pointing at your data section instead of at the exported global
+which resides elsewhere.
+
+Instead of the above code, then, you must write
+
+\begin{lstlisting}
+dataptr:    dd  global_data_item wrt ..sym
+\end{lstlisting}
+
+which makes use of the special \code{WRT} type \indexcode{WRT ..sym}
+\code{..sym} to instruct NASM to search the symbol table for a particular
+symbol at that address, rather than just relocating by section base.
+
+Either method will work for functions: referring to one of your
+functions by means of
+
+\begin{lstlisting}
+funcptr:    dd  my_function
+\end{lstlisting}
+
+will give the user the address of the code you wrote, whereas
+
+\begin{lstlisting}
+funcptr:    dd  my_function wrt ..sym
+\end{lstlisting}
+
+will give the address of the procedure linkage table for the
+function, which is where the calling program will \emph{believe} the
+function lives. Either address is a valid way to call the function.
+
+\xsubsection{picproc}{Calling Procedures Outside the Library}
+
+Calling procedures outside your shared library has to be done by
+means of a \textindex{procedure linkage table}, or \textindex{PLT}.
+The PLT is placed at a known offset from where the library is loaded,
+so the library code can make calls to the PLT in a position-independent
+way. Within the PLT there is code to jump to offsets contained in
+the GOT, so function calls to other shared libraries or to routines
+in the main program can be transparently passed off to their real
+destinations.
+
+To call an external routine, you must use another special PIC
+relocation type, \index{PLT relocations}\codeindex{WRT ..plt}. This is
+much easier than the GOT-based ones: you simply replace calls such as
+\code{CALL printf} with the PLT-relative version \code{CALL printf WRT
+..plt}.
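+
+A call to \code{printf} from library code might therefore be sketched as
+below. The label \code{msg} is hypothetical, and the fragment assumes
+\code{EBX} already holds the address of the GOT, which the \code{ELF}
+PLT entries themselves rely on.
+
+\begin{lstlisting}
+extern  printf
+
+    lea     eax,[ebx+msg wrt ..gotoff]  ; address of the format string
+    push    eax
+    call    printf wrt ..plt            ; routed through the PLT
+    add     esp,4                       ; caller removes the parameter
+
+section .data
+msg:    db  'hello from the library',10,0
+\end{lstlisting}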
+
+\xsubsection{link}{Generating the Library File}
+
+Having written some code modules and assembled them to \code{.o} files,
+you then generate your shared library with a command such as
+
+\begin{lstlisting}
+ld -shared -o library.so module1.o module2.o        # for ELF
+ld -Bshareable -o library.so module1.o module2.o    # for BSD
+\end{lstlisting}
+
+For ELF, if your shared library is going to reside in system
+directories such as \code{/usr/lib} or \code{/lib}, it is usually worth
+using the \codeindex{-soname} flag to the linker, to store the final
+library file name, with a version number, into the library:
+
+\begin{lstlisting}
+ld -shared -soname library.so.1 -o library.so.1.2 *.o
+\end{lstlisting}
+
+You would then copy \code{library.so.1.2} into the library directory,
+and create \code{library.so.1} as a symbolic link to it.