%
% vim: ts=4 sw=4 et
%
\xchapter{32bit}{Writing 32-bit Code (Unix, Win32, DJGPP)}

This chapter attempts to cover some of the common issues involved
when writing 32-bit code, to run under \textindex{Win32} or Unix,
or to be linked with C code generated by a Unix-style C compiler such as
\textindex{DJGPP}. It covers how to write assembly code to interface with
32-bit C routines, and how to write position-independent code for
shared libraries.

Almost all 32-bit code, and in particular all code running under
\code{Win32}, \code{DJGPP} or any of the PC Unix variants, runs in the
\index{flat memory model}\emph{flat} memory model. This means that
the segment registers and paging have already been set up to give
you the same 32-bit 4~GB address space no matter what segment you
work relative to, and that you should ignore all segment registers
completely. When writing flat-model application code, you never
need to use a segment override or modify any segment register,
and the code-section addresses you pass to \code{CALL} and
\code{JMP} live in the same address space as the data-section addresses
you access your variables by and the stack-section addresses you access
local variables and procedure parameters by. Every address is 32 bits
long and contains only an offset part.

\xsection{32c}{Interfacing to 32-bit C Programs}

A lot of the discussion in \nref{16c}, about interfacing to
16-bit C programs, still applies when working in 32 bits. The absence of
memory models or segmentation worries simplifies things a lot.

\xsubsection{32cunder}{External Symbol Names}

Most 32-bit C compilers share the convention used by 16-bit
compilers, that the names of all global symbols (functions or data)
they define are formed by prefixing an underscore to the name as it
appears in the C program. However, not all of them do: the \code{ELF}
specification states that C symbols do \emph{not} have a leading
underscore on their assembly-language names.

The older Linux \code{a.out} C compiler, all \code{Win32} compilers,
\code{DJGPP}, and \code{NetBSD} and \code{FreeBSD}, all use the leading
underscore; for these compilers, the macros \code{cextern} and
\code{cglobal}, as given in \nref{16cunder}, will still work.
For \code{ELF}, though, the leading underscore should not be used.

See also \nref{opt-pfix}.
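
If the same source has to be assembled for both kinds of target, one
possible approach is to make the prefix conditional. The following is
a sketch only: the symbol \code{UNDERSCORE} is a hypothetical name,
assumed to be defined on the command line (for example with
\code{-DUNDERSCORE}) when building for an underscore-prefixing target,
and the macros are variations on the ones in \nref{16cunder}.

\begin{lstlisting}
; UNDERSCORE is a hypothetical symbol, not something NASM defines itself
%macro  cglobal 1
  %ifdef UNDERSCORE
    global  _%1
    %define %1 _%1          ; later uses of the plain name get the prefix
  %else
    global  %1
  %endif
%endmacro

%macro  cextern 1
  %ifdef UNDERSCORE
    extern  _%1
    %define %1 _%1
  %else
    extern  %1
  %endif
%endmacro

cextern printf              ; refers to _printf or printf as appropriate
\end{lstlisting}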

\xsubsection{32cfunc}{Function Definitions and Function Calls}

\index{functions!C calling convention}The \textindex{C calling convention}
in 32-bit programs is as follows. In the following description,
the words \emph{caller} and \emph{callee} are used to denote
the function doing the calling and the function which gets called.

\begin{itemize}
    \item{The caller pushes the function's parameters on the stack, one
        after another, in reverse order (right to left, so that the first
        argument specified to the function is pushed last).}

    \item{The caller then executes a near \code{CALL} instruction to pass
        control to the callee.}

    \item{The callee receives control, and typically (although this
        is not actually necessary, in functions which do not need to
        access their parameters) starts by saving the value of \code{ESP}
        in \code{EBP} so as to be able to use \code{EBP} as a base pointer
        to find its parameters on the stack. However, the caller was
        probably doing this too, so part of the calling convention states
        that \code{EBP} must be preserved by any C function. Hence the
        callee, if it is going to set up \code{EBP} as a \textindex{frame
        pointer}, must push the previous value first.}

    \item{The callee may then access its parameters relative to \code{EBP}.
        The doubleword at \code{[EBP]} holds the previous value of
        \code{EBP} as it was pushed; the next doubleword, at \code{[EBP+4]},
        holds the return address, pushed implicitly by \code{CALL}.
        The parameters start after that, at \code{[EBP+8]}. The leftmost
        parameter of the function, since it was pushed last, is accessible
        at this offset from \code{EBP}; the others follow, at successively
        greater offsets. Thus, in a function such as \code{printf} which
        takes a variable number of parameters, the pushing of the
        parameters in reverse order means that the function knows where
        to find its first parameter, which tells it the number and type
        of the remaining ones.}

    \item{The callee may also wish to decrease \code{ESP} further, so as
        to allocate space on the stack for local variables, which will
        then be accessible at negative offsets from \code{EBP}.}

    \item{The callee, if it wishes to return a value to the caller,
        should leave the value in \code{AL}, \code{AX} or \code{EAX}
        depending on the size of the value. Floating-point results
        are typically returned in \code{ST0}.}

    \item{Once the callee has finished processing, it restores 
        \code{ESP} from \code{EBP} if it had allocated local stack space,
        then pops the previous value of \code{EBP}, and returns via
        \code{RET} (equivalently, \code{RETN}).}

    \item{When the caller regains control from the callee, the function
        parameters are still on the stack, so it typically adds an
        immediate constant to \code{ESP} to remove them (instead of
        executing a number of slow \code{POP} instructions). Thus,
        if a function is accidentally called with the wrong number
        of parameters due to a prototype mismatch, the stack will
        still be returned to a sensible state since the caller, which
        \emph{knows} how many parameters it pushed, does the
        removing.}
\end{itemize}

There is an alternative calling convention used by Win32 programs
for Windows API calls, and also for functions called \emph{by} the
Windows API such as window procedures: they follow what Microsoft
calls the \code{\_\_stdcall} convention. This is slightly closer to the
Pascal convention, in that the callee clears the stack by passing a
parameter to the \code{RET} instruction. However, the parameters are
still pushed in right-to-left order.

Thus, you would define a function in C style in the following way:

\begin{lstlisting}
global  _myfunc

_myfunc:
    push    ebp
    mov     ebp,esp
    sub     esp,0x40        ; 64 bytes of local stack space
    mov     ebx,[ebp+8]     ; first parameter to function

    ; some more code

    leave                   ; mov esp,ebp / pop ebp
    ret
\end{lstlisting}
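
For comparison, a function following the \code{\_\_stdcall} convention
described above might be sketched as follows. The name \code{mystdfunc}
and the \code{@8} suffix (the number of parameter bytes, as decorated by
Microsoft-style compilers) are assumptions made for illustration; check
what your own toolchain expects.

\begin{lstlisting}
global  _mystdfunc@8        ; hypothetical name, two dword parameters

_mystdfunc@8:
    push    ebp
    mov     ebp,esp
    mov     eax,[ebp+8]     ; first parameter
    add     eax,[ebp+12]    ; second parameter; sum is returned in EAX
    pop     ebp
    ret     8               ; callee removes the 8 bytes of parameters
\end{lstlisting}

The caller of such a function pushes the parameters right to left as
usual, but does not adjust \code{ESP} after the call.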

At the other end of the process, to call a C function from your
assembly code, you would do something like this:

\begin{lstlisting}
extern  _printf

    ; and then, further down...

    push    dword [myint]   ; one of my integer variables
    push    dword mystring  ; pointer into my data segment
    call    _printf
    add     esp,byte 8      ; `byte' saves space

    ; then those data items...

segment _DATA

myint       dd  1234
mystring    db  'This number -> %d <- should be 1234',10,0
\end{lstlisting}

This piece of code is the assembly equivalent of the C code

\begin{lstlisting}
    int myint = 1234;
    printf("This number -> %d <- should be 1234\n", myint);
\end{lstlisting}

\xsubsection{32cdata}{Accessing Data Items}

To get at the contents of C variables, or to declare variables which
C can access, you need only declare the names as \code{GLOBAL} or
\code{EXTERN}. (Again, the names require leading underscores except under
\code{ELF}, as stated in \nref{32cunder}.) Thus, a C variable declared as
\code{int i}
can be accessed from assembler as

\begin{lstlisting}
    extern _i
    mov eax,[_i]
\end{lstlisting}

And to declare your own integer variable which C programs can access
as \code{extern int j}, you do this (making sure you are assembling in
the \code{\_DATA} segment, if necessary):

\begin{lstlisting}
    global _j
_j  dd 0
\end{lstlisting}

To access a C array, you need to know the size of the components of
the array. For example, \code{int} variables are four bytes long, so if
a C program declares an array as \code{int a[10]}, you can access
\code{a[3]} by coding \code{mov eax,[\_a+12]}. (The byte offset 12 is
obtained by multiplying the desired array index, 3, by the size of
the array element, 4.) The sizes of the C base types in 32-bit compilers
are: 1 for \code{char}, 2 for \code{short}, 4 for \code{int}, \code{long}
and \code{float}, and 8 for \code{double}. Pointers, being 32-bit
addresses, are also 4 bytes long.
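
When the index is only known at run time, the scaling can be left to
the addressing mode. A minimal sketch, assuming the array \code{a}
above and an index held in \code{ECX}:

\begin{lstlisting}
extern  _a                  ; int a[10], defined in the C program

    mov     ecx,3           ; index; any value from 0 to 9
    mov     eax,[_a+ecx*4]  ; eax = a[ecx], 4 = size of an int
\end{lstlisting}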

To access a C \textindex{data structure}, you need to know the offset from
the base of the structure to the field you are interested in. You
can either do this by converting the C structure definition into a
NASM structure definition (using \code{STRUC}), or by calculating the
one offset and using just that.

To do either of these, you should read your C compiler's manual to
find out how it organizes data structures. NASM gives no special
alignment to structure members in its own \codeindex{STRUC} macro,
so you have to specify alignment yourself if the C compiler generates it.
Typically, you might find that a structure like

\begin{lstlisting}
struct {
    char c;
    int i;
} foo;
\end{lstlisting}

might be eight bytes long rather than five, since the \code{int} field
would be aligned to a four-byte boundary. However, this sort of
feature is sometimes a configurable option in the C compiler, either
using command-line options or \code{\#pragma} lines, so you have to find
out how your own compiler does it.
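
As a sketch, and assuming the compiler does pad the structure above
to eight bytes, a matching NASM definition could insert the padding
with \code{ALIGNB}. The structure name \code{foo\_t} and the field
labels are chosen here for illustration.

\begin{lstlisting}
struc   foo_t               ; hypothetical name for the C structure
    .c:     resb    1       ; char c, at offset 0
            alignb  4       ; three bytes of assumed compiler padding
    .i:     resd    1       ; int i, at offset 4
endstruc                    ; foo_t_size is now 8

extern  _foo

    mov     eax,[_foo+foo_t.i]  ; eax = foo.i
\end{lstlisting}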

\xsubsection{32cmacro}{\codeindex{c32.mac}: Helper Macros for the 32-bit C Interface}

Included in the NASM archives, in the \index{misc directory}\code{misc}
directory, is a file \code{c32.mac} of macros. It defines three macros:
\codeindex{proc}, \codeindex{arg} and \codeindex{endproc}. These are
intended to be used for C-style procedure definitions, and they automate
a lot of the work involved in keeping track of the calling convention.

An example of an assembly function using the macro set is given
here:

\begin{lstlisting}
proc    _proc32
%$i         arg
%$j         arg
    mov     eax,[ebp + %$i]
    mov     ebx,[ebp + %$j]
    add     eax,[ebx]
endproc
\end{lstlisting}

This defines \code{\_proc32} to be a procedure taking two arguments, the
first (\code{i}) an integer and the second (\code{j}) a pointer to an
integer. It returns \code{i + *j}.

Note that the \code{arg} macro has an \code{EQU} as the first line of its
expansion, and since the label before the macro call gets prepended
to the first line of the expanded macro, the \code{EQU} works, defining
\code{\%\$i} to be an offset from \code{EBP}. A context-local variable is
used, local to the context pushed by the \code{proc} macro and popped
by the \code{endproc} macro, so that the same argument name can be used
in later procedures. Of course, you don't \emph{have} to do that.

\code{arg} can take an optional parameter, giving the size of the
argument. If no size is given, 4 is assumed, since it is likely that
many function parameters will be of type \code{int} or pointers.
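
For instance, a parameter that occupies eight bytes on the stack can be
declared with an explicit size so that later offsets come out right.
This sketch assumes that \code{c32.mac} has been included, and that the
C compiler passes a \code{long long} as two doublewords with the low
half at the lower address; \code{\_addlow} is a hypothetical function.

\begin{lstlisting}
%include "c32.mac"          ; assumes the file is on the include path

proc    _addlow             ; int addlow(long long x, int *p)
%$x         arg     8       ; eight-byte parameter
%$p         arg             ; default size of 4
    mov     eax,[ebp + %$x] ; low doubleword of x
    mov     ecx,[ebp + %$p] ; ECX need not be preserved
    add     eax,[ecx]       ; returns (int)x + *p
endproc
\end{lstlisting}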

\xsection{picdll}{Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries}
\index{Shared Libraries}

\code{ELF} replaced the older \code{a.out} object file format under Linux
because it contains support for \textindex{position-independent code}
(\textindex{PIC}), which makes writing shared libraries much easier. NASM
supports the \code{ELF} position-independent code features, so you can
write Linux \code{ELF} shared libraries in NASM.

\textindex{NetBSD}, and its close cousins \textindex{FreeBSD} and
\textindex{OpenBSD}, take a different approach by hacking PIC support
into the \code{a.out} format. NASM supports this as the \codeindex{aoutb}
output format, so you can write \textindex{BSD} shared libraries in
NASM too.

The operating system loads a PIC shared library by memory-mapping
the library file at an arbitrarily chosen point in the address space
of the running process. The contents of the library's code section
must therefore not depend on where it is loaded in memory.

Therefore, you cannot get at your variables by writing code like
this:

\begin{lstlisting}
    mov     eax,[myvar]             ; WRONG
\end{lstlisting}

Instead, the linker provides an area of memory called the
\textindex{global offset table}, or \textindex{GOT}; the GOT is situated
at a constant distance from your library's code, so if you can find out
where your library is loaded (which is typically done using a \code{CALL}
and \code{POP} combination), you can obtain the address of the GOT, and
you can then load the addresses of your variables out of linker-generated
entries in the GOT.

The \emph{data} section of a PIC shared library does not have these
restrictions: since the data section is writable, it has to be
copied into memory anyway rather than just paged in from the library
file, so as long as it's being copied it can be relocated too. So
you can put ordinary types of relocation in the data section without
too much worry (but see \nref{picglobal} for a caveat).

\xsubsection{picgot}{Obtaining the Address of the GOT}

Each code module in your shared library should define the GOT as an
external symbol:

\begin{lstlisting}
extern  _GLOBAL_OFFSET_TABLE_   ; in ELF
extern  __GLOBAL_OFFSET_TABLE_  ; in BSD a.out
\end{lstlisting}

At the beginning of any function in your shared library which plans
to access your data or BSS sections, you must first calculate the
address of the GOT. This is typically done by writing the function
in this form:

\begin{lstlisting}
func:
    push    ebp
    mov     ebp,esp
    push    ebx
    call    .get_GOT
.get_GOT:
    pop     ebx
    add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc

    ; the function body comes here

    mov     ebx,[ebp-4]
    mov     esp,ebp
    pop     ebp
    ret
\end{lstlisting}

(For BSD, again, the symbol \code{\_GLOBAL\_OFFSET\_TABLE\_} requires a
second leading underscore.)

The first two lines of this function are simply the standard C
prologue to set up a stack frame, and the last three lines are
standard C function epilogue. The third line, and the fourth to last
line, save and restore the \code{EBX} register, because PIC shared
libraries use this register to store the address of the GOT.

The interesting bit is the \code{CALL} instruction and the following
two lines. The \code{CALL} and \code{POP} combination obtains the address
of the label \code{.get\_GOT}, without having to know in advance where
the program was loaded (since the \code{CALL} instruction is encoded
relative to the current position). The \code{ADD} instruction makes use
of one of the special PIC relocation types: \textindex{GOTPC relocation}.
With the \codeindex{WRT ..gotpc} qualifier specified, the symbol
referenced (here \code{\_GLOBAL\_OFFSET\_TABLE\_}, the special symbol
assigned to the GOT) is given as an offset from the beginning of the
section. (Actually, \code{ELF} encodes it as the offset from the operand
field of the \code{ADD} instruction, but NASM simplifies this
deliberately, so you do things the same way for both \code{ELF} and
\code{BSD}.) So the instruction then \emph{adds} the beginning of the
section, to get the real address of the GOT, and subtracts the value of
\code{.get\_GOT} which it knows is in \code{EBX}. Therefore, by the time
that instruction has finished, \code{EBX} contains the address of the GOT.
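
Written out as arithmetic, with $S$ standing for the run-time address
of the start of the section (the value of \code{\$\$}), $L$ for the
run-time address of \code{.get\_GOT} and $G$ for the run-time address
of the GOT (these three letters are notation introduced here, not NASM
symbols), the description above amounts to:

\[
\begin{array}{rcl}
\mathtt{EBX}_{\mathrm{after\ POP}} & = & L \\
\mathrm{operand\ of\ ADD} & = & (G - S) + (S - L) \;=\; G - L \\
\mathtt{EBX}_{\mathrm{after\ ADD}} & = & L + (G - L) \;=\; G
\end{array}
\]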

If you didn't follow that, don't worry: it's never necessary to
obtain the address of the GOT by any other means, so you can put
those three instructions into a macro and safely ignore them:

\begin{lstlisting}
%macro  get_GOT 0
    call    %%getgot
%%getgot:
    pop     ebx
    add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
%endmacro
\end{lstlisting}

\xsubsection{piclocal}{Finding Your Local Data Items}

Having got the GOT, you can then use it to obtain the addresses of
your data items. Most variables will reside in the sections you have
declared; they can be accessed using the \index{GOTOFF relocation}
\code{..gotoff} special \indexcode{WRT ..gotoff}\code{WRT} type,
which works like this:

\begin{lstlisting}
    lea     eax,[ebx+myvar wrt ..gotoff]
\end{lstlisting}

The expression \code{myvar wrt ..gotoff} is calculated, when the shared
library is linked, to be the offset to the local variable \code{myvar}
from the beginning of the GOT. Therefore, adding it to \code{EBX} as
above will place the real address of \code{myvar} in \code{EAX}.

If you declare variables as \code{GLOBAL} without specifying a size for
them, they are shared between code modules in the library, but do
not get exported from the library to the program that loaded it.
They will still be in your ordinary data and BSS sections, so you
can access them in the same way as local variables, using the above
\code{..gotoff} mechanism.

Note that due to a peculiarity of the way BSD \code{a.out} format
handles this relocation type, there must be at least one non-local
symbol in the same section as the address you're trying to access.
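
Putting the pieces together, a complete function that updates a local
variable might look like the following sketch. It assumes the
\code{get\_GOT} macro from \nref{picgot} has been defined, and the names
\code{counter} and \code{bump} are hypothetical.

\begin{lstlisting}
extern  _GLOBAL_OFFSET_TABLE_   ; ELF spelling

section .data
counter:    dd  0               ; hypothetical local variable

section .text
bump:
    push    ebx                 ; EBX must be preserved
    get_GOT                     ; EBX = address of the GOT
    lea     eax,[ebx+counter wrt ..gotoff]  ; eax = address of counter
    inc     dword [eax]         ; counter = counter + 1
    mov     eax,[eax]           ; return the new value
    pop     ebx
    ret
\end{lstlisting}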

\xsubsection{picextern}{Finding External and Common Data Items}

If your library needs to get at an external variable (external to
the \emph{library}, not just to one of the modules within it), you must
use the \index{GOT relocations}\indexcode{WRT ..got}\code{..got} type
to get at it. The \code{..got} type, instead of giving you the offset from
the GOT base to the variable, gives you the offset from the GOT base to
a GOT \emph{entry} containing the address of the variable. The linker
will set up this GOT entry when it builds the library, and the
dynamic linker will place the correct address in it at load time. So
to obtain the address of an external variable \code{extvar} in \code{EAX},
you would code

\begin{lstlisting}
    mov     eax,[ebx+extvar wrt ..got]
\end{lstlisting}

This loads the address of \code{extvar} out of an entry in the GOT. The
linker, when it builds the shared library, collects together every
relocation of type \code{..got}, and builds the GOT so as to ensure it
has every necessary entry present.

Common variables must also be accessed in this way.
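
Note that the instruction above yields the \emph{address} of the
variable, so a second memory access is needed to read or write its
value. A short sketch, assuming \code{EBX} already holds the GOT
address and using the hypothetical names \code{extvar} and
\code{commvar}:

\begin{lstlisting}
extern  extvar                  ; defined outside the library
common  commvar 4               ; hypothetical common variable

    mov     eax,[ebx+extvar wrt ..got]  ; eax = address of extvar
    mov     eax,[eax]                   ; eax = value of extvar

    mov     ecx,[ebx+commvar wrt ..got] ; ecx = address of commvar
    mov     dword [ecx],1               ; commvar = 1
\end{lstlisting}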

\xsubsection{picglobal}{Exporting Symbols to the Library User}

If you want to export symbols to the user of the library, you have
to declare whether they are functions or data, and if they are data,
you have to give the size of the data item. This is because the
dynamic linker has to build \index{PLT}\textindex{procedure linkage table}
entries for any exported functions, and also moves exported data
items away from the library's data section in which they were
declared.

So to export a function to users of the library, you must use

\begin{lstlisting}
global  func:function               ; declare it as a function
func:
    push    ebp
    ; etc.
\end{lstlisting}

And to export a data item such as an array, you would have to code

\begin{lstlisting}
global  array:data array.end-array  ; give the size too
array:  resd    128
.end:
\end{lstlisting}

Be careful: If you export a variable to the library user, by
declaring it as \code{GLOBAL} and supplying a size, the variable will
end up living in the data section of the main program, rather than
in your library's data section, where you declared it. So you will
have to access your own global variable with the \code{..got} mechanism
rather than \code{..gotoff}, as if it were external (which,
effectively, it has become).
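
A minimal sketch of that situation, assuming \code{EBX} holds the GOT
address and using a hypothetical exported variable \code{counter}:

\begin{lstlisting}
global  counter:data 4          ; exported, so a size is given

section .data
counter:    dd  0

section .text
    ; inside some function, after obtaining the GOT address in EBX:
    mov     eax,[ebx+counter wrt ..got] ; address the dynamic linker chose
    inc     dword [eax]                 ; update the copy the program sees
\end{lstlisting}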

Equally, if you need to store the address of an exported global in
one of your data sections, you can't do it by means of the standard
sort of code:

\begin{lstlisting}
dataptr:    dd  global_data_item    ; WRONG
\end{lstlisting}

NASM will interpret this code as an ordinary relocation, in which
\code{global\_data\_item} is merely an offset from the beginning of the
\code{.data} section (or whatever); so this reference will end up
pointing at your data section instead of at the exported global
which resides elsewhere.

Instead of the above code, then, you must write

\begin{lstlisting}
dataptr:    dd  global_data_item wrt ..sym
\end{lstlisting}

which makes use of the special \code{WRT} type \indexcode{WRT ..sym}
\code{..sym} to instruct NASM to search the symbol table for a particular
symbol at that address, rather than just relocating by section base.

Either method will work for functions: referring to one of your
functions by means of

\begin{lstlisting}
funcptr:    dd  my_function
\end{lstlisting}

will give the user the address of the code you wrote, whereas

\begin{lstlisting}
funcptr:    dd  my_function wrt ..sym
\end{lstlisting}

will give the address of the procedure linkage table entry for the
function, which is where the calling program will \emph{believe} the
function lives. Either address is a valid way to call the function.

\xsubsection{picproc}{Calling Procedures Outside the Library}

Calling procedures outside your shared library has to be done by
means of a \textindex{procedure linkage table}, or \textindex{PLT}.
The PLT is placed at a known offset from where the library is loaded,
so the library code can make calls to the PLT in a position-independent
way. Within the PLT there is code to jump to addresses held in
the GOT, so function calls to other shared libraries or to routines
in the main program can be transparently passed off to their real
destinations.

To call an external routine, you must use another special PIC
relocation type, \index{PLT relocations}\codeindex{WRT ..plt}. This is
much easier than the GOT-based ones: you simply replace calls such as
\code{CALL printf} with the PLT-relative version \code{CALL printf WRT
..plt}.
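
A fuller sketch follows. It assumes the \code{get\_GOT} macro from
\nref{picgot}, and relies on the fact that on 32-bit x86 ELF systems the
PLT code expects \code{EBX} to hold the GOT address at the time of the
call. The names \code{greet} and \code{msg} are hypothetical.

\begin{lstlisting}
extern  _GLOBAL_OFFSET_TABLE_
extern  puts                    ; ELF naming, no leading underscore

section .data
msg:        db  'hello from the library',0

section .text
greet:
    push    ebx
    get_GOT                     ; EBX = GOT address, needed by the PLT
    lea     eax,[ebx+msg wrt ..gotoff]
    push    eax                 ; argument: pointer to msg
    call    puts wrt ..plt      ; call through the PLT
    add     esp,4               ; caller removes the parameter
    pop     ebx
    ret
\end{lstlisting}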

\xsubsection{link}{Generating the Library File}

Having written some code modules and assembled them to \code{.o} files,
you then generate your shared library with a command such as

\begin{lstlisting}
ld -shared -o library.so module1.o module2.o        # for ELF
ld -Bshareable -o library.so module1.o module2.o    # for BSD
\end{lstlisting}

For ELF, if your shared library is going to reside in system
directories such as \code{/usr/lib} or \code{/lib}, it is usually worth
using the \codeindex{-soname} flag to the linker, to store the final
library file name, with a version number, into the library:

\begin{lstlisting}
ld -shared -soname library.so.1 -o library.so.1.2 *.o
\end{lstlisting}

You would then copy \code{library.so.1.2} into the library directory,
and create \code{library.so.1} as a symbolic link to it.