Diffstat (limited to 'doc/latex/src/outfmt.tex')
-rw-r--r-- | doc/latex/src/outfmt.tex | 1606 |
1 files changed, 1606 insertions, 0 deletions
diff --git a/doc/latex/src/outfmt.tex b/doc/latex/src/outfmt.tex new file mode 100644 index 00000000..7f4cb976 --- /dev/null +++ b/doc/latex/src/outfmt.tex @@ -0,0 +1,1606 @@ +% +% vim: ts=4 sw=4 et +% +\xchapter{outfmt}{\textindexlc{Output Formats}} + +NASM is a portable assembler, designed to be able to compile on any +ANSI C-supporting platform and produce output to run on a variety of +Intel x86 operating systems. For this reason, it has a large number +of available output formats, selected using the \codeindex{-f} option +on the NASM \textindex{command line}. Each of these formats, along with +its extensions to the base NASM syntax, is detailed in this chapter. + +\xsection{binfmt}{\codeindex{bin}: \textindexlc{Flat-Form Binary}\index{pure binary} Output} +\index{file extension!bin} + +The \code{bin} format does not produce object files: it generates +nothing in the output file except the code you wrote. Such ``pure +binary'' files are used by \textindex{MS-DOS}: \codeindex{.COM} +executables and \codeindex{.SYS} device drivers are pure binary +files. Pure binary output is also useful for \textindex{operating system} +and \textindex{boot loader} development. + +The \code{bin} format supports \textindex{multiple section names}. +For details of how NASM handles sections in the \code{bin} format, +see \nref{multisec}. + +Using the \code{bin} format puts NASM by default into 16-bit mode +(see \nref{bits}). In order to use \code{bin} to write 32-bit +or 64-bit code, such as an OS kernel, you need to explicitly issue +the \indexcode{BITS}\code{BITS 32} or \indexcode{BITS}\code{BITS 64} +directive. + +\code{bin} has no default output file name extension: instead, it +leaves your file name as it is once the original extension has been +removed. Thus, the default is for NASM to assemble \code{binprog.asm} +into a binary file called \code{binprog}. 
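As a hedged illustration of the point about \code{BITS}, the following
minimal sketch shows a flat binary written in 32-bit mode. The file name
\code{kernel.asm} and the labels are hypothetical, chosen only for this
example.

\begin{lstlisting}
; kernel.asm (hypothetical file name, for illustration only)
bits 32                 ; bin defaults to 16-bit mode, so ask for 32-bit code

start:
        mov eax,msg     ; load the offset of msg within the image
        jmp $           ; loop forever
msg:    db 'hello',0
\end{lstlisting}

Assembling this with \code{nasm -f bin kernel.asm} should, following the
default naming rule described above, write the output to a file called
\code{kernel}.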
+ +\xsubsection{binorg}{\codeindex{ORG}: Binary File \textindexlc{Program Origin}} + +The \code{bin} format provides an additional directive to the list +given in \nref{directive}: \code{ORG}. The function of the +\code{ORG} directive is to specify the origin address which NASM +will assume the program begins at when it is loaded into memory. + +For example, the following code will generate the longword +\code{0x00000104}: + +\begin{lstlisting} +org 0x100 +dd label +label: +\end{lstlisting} + +Unlike the \code{ORG} directive provided by MASM-compatible assemblers, +which allows you to jump around in the object file and overwrite +code you have already generated, NASM's \code{ORG} does exactly what +the directive says: \emph{origin}. Its sole function is to specify one +offset which is added to all internal address references within the +section; it does not permit any of the trickery that MASM's version +does. See \nref{proborg} for further comments. + +\xsubsection{binseg}{\code{bin} Extensions to the \code{SECTION} Directive} +\index{section!bin extensions to} + +The \code{bin} output format extends the \code{SECTION} (or \code{SEGMENT}) +directive to allow you to specify the alignment requirements of segments. +This is done by appending the \codeindex{ALIGN} qualifier to the end of +the section-definition line. For example, + +\begin{lstlisting} +section .data align=16 +\end{lstlisting} + +switches to the section \code{.data} and also specifies that it must be +aligned on a 16-byte boundary. + +The parameter to \code{ALIGN} gives the alignment boundary in bytes: +the section start address is forced to a multiple of this value. The +alignment value given may be any power of two. 
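To make the effect of \code{ALIGN} concrete, here is a small sketch. It
assumes that no \code{ORG} is given (so the origin is zero) and that NASM is
in its default 16-bit mode for \code{bin}; the label \code{value} is
hypothetical.

\begin{lstlisting}
section .text
        ret                     ; a single byte of code

section .data align=16
value:  dw 0x1234               ; .data is padded up to a 16-byte boundary
\end{lstlisting}

With a one-byte \code{.text} section, \code{.data} would be expected to
begin at offset 16 in the output file rather than at offset 1.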
+\index{section alignment!in bin} +\index{segment alignment!in bin} +\index{alignment!in bin sections} + +\xsubsection{multisec}{\textindexlc{Multisection} Support for the \code{bin} Format} +\index{bin!multisection} + +The \code{bin} format allows the use of multiple sections, of arbitrary names, +besides the ``known'' \code{.text}, \code{.data}, and \code{.bss} names. + +\begin{itemize} + \item{Sections may be designated \codeindex{progbits} or \codeindex{nobits}. + Default is \code{progbits} (except \code{.bss}, which defaults to + \code{nobits}, of course).} + + \item{Sections can be aligned at a specified boundary following the previous + section with \code{align=}, or at an arbitrary byte-granular position with + \codeindex{start=}.} + + \item{Sections can be given a virtual start address, which will be used + for the calculation of all memory references within that section + with \codeindex{vstart=}.} + + \item{Sections can be ordered using \codeindex{follows=}\code{<section>} or + \codeindex{vfollows=}\code{<section>} as an alternative to specifying + an explicit start address.} + + \item{Arguments to \code{org}, \code{start}, \code{vstart}, and \code{align=} + are critical expressions. See \nref{crit}. E.g. 
+ \code{align=(1 << ALIGN\_SHIFT)} - \code{ALIGN\_SHIFT} must be defined + before it is used here.} + + \item{Any code which comes before an explicit \code{SECTION} directive + is directed by default into the \code{.text} section.} + + \item{If an \code{ORG} statement is not given, \code{ORG 0} is used by default.} + + \item{The \code{.bss} section will be placed after the last \code{progbits} + section, unless \code{start=}, \code{vstart=}, \code{follows=}, or + \code{vfollows=} has been specified.} + + \item{All sections are aligned on dword boundaries, unless a different + alignment has been specified.} + + \item{Sections may not overlap.} + + \item{NASM creates the symbol \code{section.<secname>.start} for each section, + which may be used in your code.} +\end{itemize} + +\xsubsection{map}{\textindexlc{Map Files}} +\index{file extension!map} + +Map files can be generated in \code{-f bin} format by means of the \code{[map]} +option. Map types of \code{all} (default), \code{brief}, \code{sections}, +\code{segments}, or \code{symbols} may be specified. Output may be directed +to \code{stdout} (default), \code{stderr}, or a specified file. E.g. +\code{[map symbols myfile.map]}. No ``user form'' exists; the square +brackets must be used. + +\xsection{ithfmt}{\codeindex{ith}: \textindexlc{Intel Hex} Output} +\index{file extension!ith} + +The \code{ith} file format produces Intel hex-format files. Like the +\code{bin} format, this is a flat memory image format with no support for +relocation or linking. It is usually used with ROM programmers and +similar utilities. + +All extensions supported by the \code{bin} file format are also supported by +the \code{ith} file format. + +\code{ith} provides a default output file-name extension of \code{.ith}. + +\xsection{srecfmt}{\codeindex{srec}: \textindexlc{Motorola S-Records} Output} +\index{file extension!srec} + +The \code{srec} file format produces Motorola S-record files. 
Like the +\code{bin} format, this is a flat memory image format with no support for +relocation or linking. It is usually used with ROM programmers and similar +utilities. + +All extensions supported by the \code{bin} file format are also supported by +the \code{srec} file format. + +\code{srec} provides a default output file-name extension of \code{.srec}. + +\xsection{objfmt}{\codeindex{obj}: \textindexlc{Microsoft OMF}\index{OMF} Object Files} +\index{file extension!obj} + +The \code{obj} file format (NASM calls it \code{obj} rather than +\code{omf} for historical reasons) is the one produced by \textindex{MASM} +and \textindex{TASM}, which is typically fed to 16-bit DOS linkers +to produce \codeindex{.EXE} files. It is also the format used by +\textindex{OS/2}. + +\code{obj} provides a default output file-name extension of \code{.obj}. + +\code{obj} is not exclusively a 16-bit format, though: NASM has full +support for the 32-bit extensions to the format. In particular, +32-bit \code{obj} format files are used by \textindex{Borland's Win32 +compilers}, instead of using Microsoft's newer \codeindex{win32} object +file format. + +The \code{obj} format does not define any special segment names: you +can call your segments anything you like. Typical names for segments +in \code{obj} format files are \code{CODE}, \code{DATA} and \code{BSS}. + +If your source file contains code before an explicit +\code{SEGMENT} directive, then NASM will invent its own segment called +\codeindex{\_\_NASMDEFSEG} for you. + +When you define a segment in an \code{obj} file, NASM defines the +segment name as a symbol as well, so that you can access the segment +address of the segment. 
So, for example: + +\begin{lstlisting} +segment data + +dvar: dw 1234 + +segment code + +function: + mov ax,data ; get segment address of data + mov ds,ax ; and move it into DS + inc word [dvar] ; now this reference will work + ret +\end{lstlisting} + +The \code{obj} format also enables the use of the \codeindex{SEG} +and \codeindex{WRT} operators, so that you can write code which +does things like + +\begin{lstlisting} +extern foo + + mov ax,seg foo ; get preferred segment of foo + mov ds,ax + mov ax,data ; a different segment + mov es,ax + mov ax,[ds:foo] ; this accesses `foo' + mov [es:foo wrt data],bx ; so does this +\end{lstlisting} + +\xsubsection{objseg}{\code{obj} Extensions to the \code{SEGMENT} Directive} +\index{SEGMENT!obj extensions to} + +The \code{obj} output format extends the \code{SEGMENT} (or \code{SECTION}) +directive to allow you to specify various properties of the segment +you are defining. This is done by appending extra qualifiers to the +end of the segment-definition line. For example, + +\begin{lstlisting} +segment code private align=16 +\end{lstlisting} + +defines the segment \code{code}, but also declares it to be a private +segment, and requires that the portion of it described in this code +module must be aligned on a 16-byte boundary. + +The available qualifiers are: + +%\begin{tabular}{ l l } +%\codeindex{CLASS} & +%\begin{minipage}[t]{0.8\columnwidth} +%can be used to specify the segment class; this feature indicates to +%the linker that segments of the same class should be placed near each +%other in the output file. The class name can be any word, e.g. +%\code{CLASS=CODE}. +%\end{minipage} \\ +% +%\codeindex{OVERLAY} & +%\begin{minipage}[t]{0.8\columnwidth} +%like \code{CLASS}, is specified with an arbitrary word as an argument, +%and provides overlay information to an overlay-capable linker. 
+%\end{minipage} +%\end{tabular} + +\begin{itemize} + \item{\codeindex{PRIVATE}, \codeindex{PUBLIC}, \codeindex{COMMON} + and \codeindex{STACK} specify the combination characteristics + of the segment. \code{PRIVATE} segments do not get combined + with any others by the linker; \code{PUBLIC} and \code{STACK} + segments get concatenated together at link time; and \code{COMMON} + segments all get overlaid on top of each other rather than stuck + end-to-end.} + + \item{\codeindex{ALIGN} is used, as shown above, to specify the + boundary, in bytes, on which the segment start address must be aligned. + The alignment value given may be any power of two from 1 to 4096; + in reality, the only values supported are 1, 2, 4, 16, 256 and 4096, + so if 8 is specified it will be rounded up to 16, and 32, 64 and 128 + will all be rounded up to 256, and so on. Note that alignment to + 4096-byte boundaries is a \textindex{PharLap} extension to the + format and may not be supported by all linkers. + \index{section alignment!in OBJ} + \index{segment alignment!in OBJ} + \index{alignment!in OBJ sections}} + + \item{\codeindex{CLASS} can be used to specify the segment class; + this feature indicates to the linker that segments of the same + class should be placed near each other in the output file. + The class name can be any word, e.g. 
\code{CLASS=CODE}.} + + \item{\codeindex{OVERLAY}, like \code{CLASS}, is specified with + an arbitrary word as an argument, and provides overlay information + to an overlay-capable linker.} + + \item{Segments can be declared as \codeindex{USE16} or \codeindex{USE32}, + which has the effect of recording the choice in the object file + and also ensuring that NASM's default assembly mode when assembling + in that segment is 16-bit or 32-bit respectively.} + + \item{When writing \textindex{OS/2} object files, you should declare + 32-bit segments as \codeindex{FLAT}, which causes the default + segment base for anything in the segment to be the special group + \code{FLAT}, and also defines the group if it is not already defined.} + + \item{The \code{obj} file format also allows segments to be declared as + having a pre-defined absolute segment address, although no linkers + are currently known to make sensible use of this feature; + nevertheless, NASM allows you to declare a segment such as + \code{SEGMENT SCREEN ABSOLUTE=0xB800} if you need to. The + \codeindex{ABSOLUTE} and \code{ALIGN} keywords are mutually + exclusive.} +\end{itemize} + +NASM's default segment attributes are \code{PUBLIC}, \code{ALIGN=1}, no +class, no overlay, and \code{USE16}. + +\xsubsection{group}{\codeindex{GROUP}: Defining Groups of Segments} +\index{segments!groups of} + +The \code{obj} format also allows segments to be grouped, so that a +single segment register can be used to refer to all the segments in +a group. NASM therefore supplies the \code{GROUP} directive, whereby +you can code + +\begin{lstlisting} +segment data + ; some data +segment bss + ; some uninitialized data +group dgroup data bss +\end{lstlisting} + +which will define a group called \code{dgroup} to contain the segments +\code{data} and \code{bss}. 
Like \code{SEGMENT}, \code{GROUP} causes +the group name to be defined as a symbol, so that you can refer to +a variable \code{var} in the \code{data} segment as \code{var wrt data} +or as \code{var wrt dgroup}, depending on which segment value is +currently in your segment register. + +If you just refer to \code{var}, however, and \code{var} is declared +in a segment which is part of a group, then NASM will default to giving +you the offset of \code{var} from the beginning of the \emph{group}, +not the \emph{segment}. Therefore \code{SEG var}, also, will return +the group base rather than the segment base. + +NASM will allow a segment to be part of more than one group, but +will generate a warning if you do this. Variables declared in a +segment which is part of more than one group will default to being +relative to the first group that was defined to contain the segment. + +A group does not have to contain any segments; you can still make +\code{WRT} references to a group which does not contain the variable +you are referring to. OS/2, for example, defines the special group +\code{FLAT} with no segments in it. + +\xsubsection{uppercase}{\codeindex{UPPERCASE}: Disabling Case Sensitivity in Output} + +Although NASM itself is \textindex{case sensitive}, some OMF linkers are +not; therefore it can be useful for NASM to output single-case +object files. The \code{UPPERCASE} format-specific directive causes all +segment, group and symbol names that are written to the object file +to be forced to upper case just before being written. Within a +source file, NASM is still case-sensitive; but the object file can +be written entirely in upper case if desired. + +\code{UPPERCASE} is used alone on a line; it requires no parameters. 
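A minimal sketch of how the directive might be used follows; the segment
and symbol names are hypothetical and serve only as an illustration.

\begin{lstlisting}
uppercase               ; force names to upper case in the object file

segment code
global  myfunc
myfunc: ret             ; still spelled `myfunc' within this source file
\end{lstlisting}

Under the behaviour described above, the object file would carry the names
\code{CODE} and \code{MYFUNC}, while references inside the source file must
still use the original spelling.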
+ +\xsubsection{import}{\codeindex{IMPORT}: Importing DLL Symbols} +\index{DLL symbols!importing} +\index{symbols!importing from DLLs} + +The \code{IMPORT} format-specific directive defines a symbol to be +imported from a DLL, for use if you are writing a DLL's +\textindex{import library} in NASM. You still need to declare the +symbol as \code{EXTERN} as well as using the \code{IMPORT} +directive. + +The \code{IMPORT} directive takes two required parameters, separated +by white space, which are (respectively) the name of the symbol you +wish to import and the name of the library you wish to import it +from. For example: + +\begin{lstlisting} +import WSAStartup wsock32.dll +\end{lstlisting} + +A third optional parameter gives the name by which the symbol is +known in the library you are importing it from, in case this is not +the same as the name you wish the symbol to be known by to your code +once you have imported it. For example: + +\begin{lstlisting} +import asyncsel wsock32.dll WSAAsyncSelect +\end{lstlisting} + +\xsubsection{export}{\codeindex{EXPORT}: Exporting DLL Symbols} +\index{DLL symbols!exporting} +\index{symbols!exporting from DLLs} + +The \code{EXPORT} format-specific directive defines a global +symbol to be exported as a DLL symbol, for use if you are +writing a DLL in NASM. You still need to declare the symbol +as \code{GLOBAL} as well as using the \code{EXPORT} directive. + +\code{EXPORT} takes one required parameter, which is the name of the +symbol you wish to export, as it was defined in your source file. An +optional second parameter (separated by white space from the first) +gives the \emph{external} name of the symbol: the name by which you +wish the symbol to be known to programs using the DLL. If this name +is the same as the internal name, you may leave the second parameter +off. + +Further parameters can be given to define attributes of the exported +symbol. These parameters, like the second, are separated by white +space. 
If further parameters are given, the external name must also +be specified, even if it is the same as the internal name. The +available attributes are: + +\begin{itemize} + \item{\code{resident} indicates that the exported name is + to be kept resident by the system loader. This is + an optimisation for frequently used symbols imported + by name.} + + \item{\code{nodata} indicates that the exported symbol + is a function which does not make use of any initialized + data.} + + \item{\code{parm=NNN}, where \code{NNN} is an integer, sets + the number of parameter words for the case in which + the symbol is a call gate between 32-bit and 16-bit + segments.} + + \item{An attribute which is just a number indicates that + the symbol should be exported with an identifying + number (ordinal), and gives the desired number.} +\end{itemize} + +For example: + +\begin{lstlisting} +export myfunc +export myfunc TheRealMoreFormalLookingFunctionName +export myfunc myfunc 1234 ; export by ordinal +export myfunc myfunc resident parm=23 nodata +\end{lstlisting} + +\xsubsection{dotdotstart}{\codeindex{..start}: Defining the \textindexlc{Program Entry Point}} + +\code{OMF} linkers require exactly one of the object files being linked to +define the program entry point, where execution will begin when the +program is run. If the object file that defines the entry point is +assembled using NASM, you specify the entry point by declaring the +special symbol \code{..start} at the point where you wish execution to +begin. + +\xsubsection{objextern}{\code{obj} Extensions to the \code{EXTERN} Directive} +\index{EXTERN!obj extensions to} + +If you declare an external symbol with the directive + +\begin{lstlisting} +extern foo +\end{lstlisting} + +then references such as \code{mov ax,foo} will give you the offset of +\code{foo} from its preferred segment base (as specified in whichever +module \code{foo} is actually defined in). 
So to access the contents of +\code{foo} you will usually need to do something like + +\begin{lstlisting} +mov ax,seg foo ; get preferred segment base +mov es,ax ; move it into ES +mov ax,[es:foo] ; and use offset `foo' from it +\end{lstlisting} + +This is a little unwieldy, particularly if you know that an external +is going to be accessible from a given segment or group, say +\code{dgroup}. So if \code{DS} already contained \code{dgroup}, +you could simply code + +\begin{lstlisting} +mov ax,[foo wrt dgroup] +\end{lstlisting} + +However, having to type this every time you want to access \code{foo} +can be a pain; so NASM allows you to declare \code{foo} in the +alternative form + +\begin{lstlisting} +extern foo:wrt dgroup +\end{lstlisting} + +This form causes NASM to pretend that the preferred segment base of +\code{foo} is in fact \code{dgroup}; so the expression \code{seg foo} +will now return \code{dgroup}, and the expression \code{foo} is +equivalent to \code{foo wrt dgroup}. + +This \index{default-WRT mechanism}default-\code{WRT} mechanism can be used +to make externals appear to be relative to any group or segment in +your program. It can also be applied to common variables: see +\nref{objcommon}. + +\xsubsection{objcommon}{\code{obj} Extensions to the \code{COMMON} Directive} +\index{COMMON!obj extensions to} + +The \code{obj} format allows common variables to be either near +\index{common variables!near} or far\index{common variables!far}; +NASM allows you to specify which your variables should be by the +use of the syntax + +\begin{lstlisting} +common nearvar 2:near ; nearvar is a near common +common farvar 10:far ; and farvar is far +\end{lstlisting} + +Far common variables may be greater in size than 64Kb, and so the +OMF specification says that they are declared as a number of +\emph{elements} of a given size. 
So a 10-byte far common variable could +be declared as ten one-byte elements, five two-byte elements, two +five-byte elements or one ten-byte element. + +Some \code{OMF} linkers require the \index{element size!in common +variables}\index{common variables!element size}element size, as well as +the variable size, to match when resolving common variables declared +in more than one module. Therefore NASM must allow you to specify +the element size on your far common variables. This is done by the +following syntax: + +\begin{lstlisting} +common c_5by2 10:far 5 ; two five-byte elements +common c_2by5 10:far 2 ; five two-byte elements +\end{lstlisting} + +If no element size is specified, the default is 1. Also, the \code{FAR} +keyword is not required when an element size is specified, since +only far commons may have element sizes at all. So the above +declarations could equivalently be + +\begin{lstlisting} +common c_5by2 10:5 ; two five-byte elements +common c_2by5 10:2 ; five two-byte elements +\end{lstlisting} + +In addition to these extensions, the \code{COMMON} directive +in \code{obj} also supports default-\code{WRT} specification +like \code{EXTERN} does (explained in \nref{objextern}). +So you can also declare things like + +\begin{lstlisting} +common foo 10:wrt dgroup +common bar 16:far 2:wrt data +common baz 24:wrt data:6 +\end{lstlisting} + +\xsubsection{objdepend}{Embedded File Dependency Information} + +Since NASM 2.13.02, \code{obj} files contain embedded dependency file +information. To suppress the generation of dependencies, use + +\begin{lstlisting} +%pragma obj nodepend +\end{lstlisting} + +\xsection{win32fmt}{\codeindex{win32}: Microsoft Win32 Object Files} + +The \code{win32} output format generates Microsoft Win32 object files, +suitable for passing to Microsoft linkers such as \emph{Visual C++}. +Note that Borland Win32 compilers do not use this format, but use +\code{obj} instead (see \nref{objfmt}). 
+ +\code{win32} provides a default output file-name extension of \code{.obj}. + +Note that although Microsoft say that Win32 object files follow the +COFF (Common Object File Format) standard, the object files produced +by Microsoft Win32 compilers are not compatible with COFF linkers such +as DJGPP's, and vice versa. This is due to a difference of opinion over +the precise semantics of PC-relative relocations. To produce COFF files +suitable for DJGPP, use NASM's \code{coff} output format; conversely, +the \code{coff} format does not produce object files that Win32 linkers +can generate correct output from. + +\xsubsection{win32sect}{\code{win32} Extensions to the \code{SECTION} Directive} +\index{SECTION!win32 extensions to} + +Like the \code{obj} format, \code{win32} allows you to specify additional +information on the \code{SECTION} directive line, to control the type +and properties of sections you declare. Section types and properties +are generated automatically by NASM for the \textindex{standard section names} +\code{.text}, \code{.data} and \code{.bss}, but may still be overridden by +these qualifiers. + +The available qualifiers are: + +\begin{itemize} + \item{\code{code}, or equivalently \code{text}, defines the section + to be a code section. This marks the section as readable and + executable, but not writable, and also indicates to the linker + that the type of the section is code.} + + \item{\code{data} and \code{bss} define the section to be a data + section, analogously to \code{code}. Data sections are marked + as readable and writable, but not executable. \code{data} + declares an initialized data section, whereas \code{bss} declares + an uninitialized data section.} + + \item{\code{rdata} declares an initialized data section that is + readable but not writable. 
Microsoft compilers place constants + in this section.} + + \item{\code{info} defines the section to be an \textindex{informational section}, + which is not included in the executable file by the linker, but may + (for example) pass information \emph{to} the linker. For example, + declaring an \code{info}-type section called \codeindex{.drectve} causes + the linker to interpret the contents of the section as command-line + options.} + + \item{\code{align=}, used with a trailing number as in \code{obj}, gives the + \index{section alignment!in win32} \index{alignment!in win32 sections} + alignment requirements of the section. The maximum you may + specify is 64: the Win32 object file format contains no means to + request a greater section alignment than this. If alignment is not + explicitly specified, the defaults are 16-byte alignment for code + sections, 8-byte alignment for rdata sections and 4-byte alignment + for data (and BSS) sections. + Informational sections get a default alignment of 1 byte (no + alignment), though the value does not matter.} +\end{itemize} + +The defaults assumed by NASM if you do not specify the above +qualifiers are: + +\begin{lstlisting} +section .text code align=16 +section .data data align=4 +section .rdata rdata align=8 +section .bss bss align=4 +\end{lstlisting} + +Any other section name is treated by default like \code{.text}. + +\xsubsection{win32safeseh}{\code{win32} Safe Structured Exception Handling} + +Among other improvements in Windows XP SP2 and Windows Server 2003, +Microsoft introduced the concept of ``safe structured exception +handling.'' The general idea is to collect handlers' entry points in +a designated read-only table and to have each alleged entry point verified +against this table before exception control is passed to the handler. In +order for an executable module to be equipped with such a ``safe exception +handler table,'' all object modules on the linker command line have to comply +with certain criteria. 
If even a single module among them does not, then +the table in question is omitted and the above-mentioned run-time checks +will not be performed for the application in question. Table omission is by +default silent and can therefore be easily overlooked. One can instruct the +linker to refuse to produce a binary without such a table by passing the +\code{/safeseh} command-line option. + +Regardless of the merits of this run-time check, it is natural to expect +NASM to be capable of generating modules suitable for \code{/safeseh} +linking. From the developer's viewpoint the problem is two-fold: + +\begin{itemize} + \item{how to adapt modules not deploying exception handlers of their own;} + \item{how to adapt/develop modules utilizing custom exception handling.} +\end{itemize} + +The former can be easily achieved with any NASM version by adding the following +line to the source code: + +\begin{lstlisting} +$@feat.00 equ 1 +\end{lstlisting} + +As of version 2.03, NASM adds this absolute symbol automatically, +provided it is not already present. That is, if for whatever reason the +developer chooses to assign another value in the source file, that +remains perfectly possible. + +Registering a custom exception handler, on the other hand, requires certain +``magic.'' As of version 2.03 an additional directive is implemented, +\code{safeseh}, which instructs the assembler to produce appropriately +formatted input data for the above-mentioned ``safe exception handler +table.''
Its typical use would be: + +\begin{lstlisting} +section .text +extern _MessageBoxA@16 +%if __NASM_VERSION_ID__ >= 0x02030000 +safeseh handler ; register handler as "safe handler" +%endif +handler: + push DWORD 1 ; MB_OKCANCEL + push DWORD caption + push DWORD text + push DWORD 0 + call _MessageBoxA@16 + sub eax,1 ; incidentally suits as return value + ; for exception handler + ret +global _main +_main: + push DWORD handler + push DWORD [fs:0] + mov DWORD [fs:0],esp ; engage exception handler + xor eax,eax + mov eax,DWORD[eax] ; cause exception + pop DWORD [fs:0] ; disengage exception handler + add esp,4 + ret +text: db 'OK to rethrow, CANCEL to generate core dump',0 +caption:db 'SEGV',0 + +section .drectve info + db '/defaultlib:user32.lib /defaultlib:msvcrt.lib ' +\end{lstlisting} + +As you might imagine, it is perfectly possible to produce an .exe binary +with a ``safe exception handler table'' and yet engage an unregistered +exception handler. Indeed, a handler is engaged simply by manipulating the +\code{[fs:0]} location at run-time, something the linker has no power over. +It should be explicitly mentioned that such a failure +to register a handler's entry point with the \code{safeseh} directive has +an undesired side effect at run-time. If an exception is raised and an +unregistered handler is to be executed, the application is abruptly +terminated without any notification whatsoever. One can argue that the +system could at least have logged some kind of ``non-safe exception +handler in x.exe at address n'' message in the event log, but no, literally +no notification is provided and the user is left with no clue as to what +caused the application failure. + +Finally, all mentions of the linker in this section refer to Microsoft +linker version 7.x and later. The presence of the \code{@feat.00} symbol and input +data for the ``safe exception handler table'' causes no backward +incompatibilities, and ``safeseh'' modules generated by NASM 2.03 and +later can still be linked by earlier versions or non-Microsoft linkers. 
+ +\xsubsection{codeview}{Debugging formats for Windows} +\index{Windows debugging formats} + +The \code{win32} and \code{win64} formats support the Microsoft CodeView +debugging format. Currently CodeView version 8 format is supported +(\codeindex{cv8}), but newer versions of the CodeView debugger should be +able to handle this format as well. + +\xsection{win64fmt}{\codeindex{win64}: Microsoft Win64 Object Files} + +The \code{win64} output format generates Microsoft Win64 object files. +The format is nearly identical to the \code{win32} object format +(\nref{win32fmt}), except that it targets +64-bit code and the x86-64 platform. Apart from this difference, +it is used in NASM in exactly the same way as the \code{win32} format. + +\xsubsection{win64pic}{\code{win64}: Writing Position-Independent Code} + +While \code{REL} takes good care of RIP-relative addressing, there is one +aspect that is easy to overlook for a Win64 programmer: indirect +references. Consider a switch dispatch table: + +\begin{lstlisting} + jmp qword [dsptch+rax*8] + ... +dsptch: dq case0 + dq case1 + ... +\end{lstlisting} + +Even a novice Win64 assembler programmer will soon realize that the code +is not 64-bit savvy. Most notably, the linker will refuse to link it, reporting + +\begin{lstlisting} +'ADDR32' relocation to '.text' invalid without /LARGEADDRESSAWARE:NO +\end{lstlisting} + +So the programmer will have to split the \code{jmp} instruction as follows: + +\begin{lstlisting} + lea rbx,[rel dsptch] + jmp qword [rbx+rax*8] +\end{lstlisting} + +What happens behind the scenes is that the effective address in \code{lea} is +encoded relative to the instruction pointer, that is, in a perfectly position-independent +manner. But this is only part of the problem! The trouble is that in a .dll context +the \code{caseN} relocations will make their way to the final module and might +have to be adjusted at .dll load time, specifically when the .dll cannot be loaded +at its preferred address. 
When this occurs, the pages with such relocations will +be rendered private to the current process, which undermines the idea +of sharing the .dll. But no worry, it's trivial to fix: + +\begin{lstlisting} + lea rbx,[rel dsptch] + add rbx,[rbx+rax*8] + jmp rbx + ... +dsptch: dq case0-dsptch + dq case1-dsptch + ... +\end{lstlisting} + +NASM version 2.03 and later provides another alternative, the \code{wrt +..imagebase} operator, which returns the offset from the base address of the +current image, be it an .exe or a .dll module, hence the name. For those +acquainted with the PE-COFF format, the base address denotes the start of the +\code{IMAGE\_DOS\_HEADER} structure. Here is how to implement a switch with +these image-relative references: + +\begin{lstlisting} + lea rbx,[rel dsptch] + mov eax,[rbx+rax*4] + sub rbx,dsptch wrt ..imagebase + add rbx,rax + jmp rbx + ... +dsptch: dd case0 wrt ..imagebase + dd case1 wrt ..imagebase +\end{lstlisting} + +One can argue that the operator is redundant. Indeed, the snippet before +last works just fine with any NASM version and is not even Windows +specific... The real reason for implementing \code{wrt ..imagebase} will +become apparent in the next section. + +It should be noted that \code{wrt ..imagebase} is defined as a 32-bit +operand only: + +\begin{lstlisting} +dd label wrt ..imagebase ; ok +dq label wrt ..imagebase ; bad +mov eax,label wrt ..imagebase ; ok +mov rax,label wrt ..imagebase ; bad +\end{lstlisting} + +\xsubsection{win64seh}{\code{win64}: Structured Exception Handling} + +Structured exception handling in Win64 is a completely different matter +from Win32. Upon an exception the program counter value is noted, and a +linker-generated table comprising the start and end addresses of all the +functions [in the given executable module] is traversed and compared to the +saved program counter. Thus the so-called \code{UNWIND\_INFO} structure is +identified. 
If it's not found, then offending subroutine is assumed to +be "leaf" and just mentioned lookup procedure is attempted for its +caller. In Win64 leaf function is such function that does not call any +other function \emph{nor} modifies any Win64 non-volatile registers, +including stack pointer. The latter ensures that it's possible to +identify leaf function's caller by simply pulling the value from the +top of the stack. + +While majority of subroutines written in assembler are not calling any +other function, requirement for non-volatile registers' immutability +leaves developer with not more than 7 registers and no stack frame, +which is not necessarily what [s]he counted with. Customarily one would +meet the requirement by saving non-volatile registers on stack and +restoring them upon return, so what can go wrong? If [and only if] an +exception is raised at run-time and no \code{UNWIND\_INFO} structure is +associated with such "leaf" function, the stack unwind procedure will +expect to find caller's return address on the top of stack immediately +followed by its frame. Given that developer pushed caller's +non-volatile registers on stack, would the value on top point at some +code segment or even addressable space? Well, developer can attempt +copying caller's return address to the top of stack and this would +actually work in some very specific circumstances. But unless developer +can guarantee that these circumstances are always met, it's more +appropriate to assume worst case scenario, i.e. stack unwind procedure +going berserk. Relevant question is what happens then? Application is +abruptly terminated without any notification whatsoever. Just like in +Win32 case, one can argue that system could at least have logged +"unwind procedure went berserk in x.exe at address n" in event log, but +no, no trace of failure is left. + +Now, when we understand significance of the \code{UNWIND\_INFO} structure, +let's discuss what's in it and/or how it's processed. 
First of all it +is checked for presence of reference to custom language-specific +exception handler. If there is one, then it's invoked. Depending on the +return value, execution flow is resumed (exception is said to be +"handled"), \emph{or} rest of \code{UNWIND\_INFO} structure is processed as +following. Beside optional reference to custom handler, it carries +information about current callee's stack frame and where non-volatile +registers are saved. Information is detailed enough to be able to +reconstruct contents of caller's non-volatile registers upon call to +current callee. And so caller's context is reconstructed, and then +unwind procedure is repeated, i.e. another \code{UNWIND\_INFO} structure is +associated, this time, with caller's instruction pointer, which is then +checked for presence of reference to language-specific handler, etc. +The procedure is recursively repeated till exception is handled. As +last resort system "handles" it by generating memory core dump and +terminating the application. + +As for the moment of this writing NASM unfortunately does not +facilitate generation of above mentioned detailed information about +stack frame layout. But as of version 2.03 it implements building +blocks for generating structures involved in stack unwinding. 
As +simplest example, here is how to deploy custom exception handler for +leaf function: + +\begin{lstlisting} +default rel +section .text +extern MessageBoxA +handler: + sub rsp,40 + mov rcx,0 + lea rdx,[text] + lea r8,[caption] + mov r9,1 ; MB_OKCANCEL + call MessageBoxA + sub eax,1 ; incidentally suits as return value + ; for exception handler + add rsp,40 + ret +global main +main: + xor rax,rax + mov rax,QWORD[rax] ; cause exception + ret +main_end: +text: db 'OK to rethrow, CANCEL to generate core dump',0 +caption:db 'SEGV',0 + +section .pdata rdata align=4 + dd main wrt ..imagebase + dd main_end wrt ..imagebase + dd xmain wrt ..imagebase +section .xdata rdata align=8 +xmain: db 9,0,0,0 + dd handler wrt ..imagebase +section .drectve info + db '/defaultlib:user32.lib /defaultlib:msvcrt.lib ' +\end{lstlisting} + +What you see in \code{.pdata} section is element of the "table comprising +start and end addresses of function" along with reference to associated +\code{UNWIND\_INFO} structure. And what you see in \code{.xdata} section is +\code{UNWIND\_INFO} structure describing function with no frame, but with +designated exception handler. References are \emph{required} to be +image-relative (which is the real reason for implementing \code{wrt +..imagebase} operator). It should be noted that \code{rdata align=n}, as +well as \code{wrt ..imagebase}, are optional in these two segments' +contexts, i.e. can be omitted. Latter means that \emph{all} 32-bit +references, not only above listed required ones, placed into these two +segments turn out image-relative. Why is it important to understand? +Developer is allowed to append handler-specific data to \code{UNWIND\_INFO} +structure, and if [s]he adds a 32-bit reference, then [s]he will have +to remember to adjust its value to obtain the real pointer. + +As already mentioned, in Win64 terms leaf function is one that does not +call any other function \emph{nor} modifies any non-volatile register, +including stack pointer. 
But it's not uncommon that assembler +programmer plans to utilize every single register and sometimes even +have variable stack frame. Is there anything one can do with bare +building blocks? I.e. besides manually composing fully-fledged +\code{UNWIND\_INFO} structure, which would surely be considered +error-prone? Yes, there is. Recall that exception handler is called +first, before stack layout is analyzed. As it turned out, it's +perfectly possible to manipulate current callee's context in custom +handler in manner that permits further stack unwinding. General idea is +that handler would not actually "handle" the exception, but instead +restore callee's context, as it was at its entry point and thus mimic +leaf function. In other words, handler would simply undertake part of +unwinding procedure. Consider following example: + +\begin{lstlisting} +function: + mov rax,rsp ; copy rsp to volatile register + push r15 ; save non-volatile registers + push rbx + push rbp + mov r11,rsp ; prepare variable stack frame + sub r11,rcx + and r11,-64 + mov QWORD[r11],rax ; check for exceptions + mov rsp,r11 ; allocate stack frame + mov QWORD[rsp],rax ; save original rsp value +magic_point: + ... + mov r11,QWORD[rsp] ; pull original rsp value + mov rbp,QWORD[r11-24] + mov rbx,QWORD[r11-16] + mov r15,QWORD[r11-8] + mov rsp,r11 ; destroy frame + ret +\end{lstlisting} + +The keyword is that up to \code{magic\_point} original \code{rsp} value +remains in chosen volatile register and no non-volatile register, +except for \code{rsp}, is modified. While past \code{magic\_point} +\code{rsp} remains constant till the very end of the \code{function}. 
+In this case the custom language-specific exception handler would look like this: + +\begin{lstlisting} +EXCEPTION_DISPOSITION +handler(EXCEPTION_RECORD *rec, ULONG64 frame, + CONTEXT *context, DISPATCHER_CONTEXT *disp) +{ + ULONG64 *rsp; + + if (context->Rip < (ULONG64)magic_point) + rsp = (ULONG64 *)context->Rax; + else { + rsp = ((ULONG64 **)context->Rsp)[0]; + context->Rbp = rsp[-3]; + context->Rbx = rsp[-2]; + context->R15 = rsp[-1]; + } + context->Rsp = (ULONG64)rsp; + + memcpy(disp->ContextRecord, context, sizeof(CONTEXT)); + RtlVirtualUnwind(UNW_FLAG_NHANDLER, disp->ImageBase, + disp->ControlPc, disp->FunctionEntry, + disp->ContextRecord, + &disp->HandlerData, + &disp->EstablisherFrame, + NULL); + + return ExceptionContinueSearch; +} +\end{lstlisting} + +As the custom handler mimics a leaf function, the corresponding \code{UNWIND\_INFO} +structure does not have to contain any information about the stack frame +and its layout. + +\xsection{cofffmt}{\codeindex{coff}: \textindexlc{Common Object File Format}} + +The \code{coff} output type produces \code{COFF} object files suitable for +linking with the \textindex{DJGPP} linker. + +\code{coff} provides a default output file-name extension of \code{.o}. + +The \code{coff} format supports the same extensions to the \code{SECTION} +directive as \code{win32} does, except that the \code{align} qualifier and +the \code{info} section type are not supported. + +\xsection{machofmt}{\codeindex{macho32} and \codeindex{macho64}: +\textindexlc{Mach Object File Format}} +\index{Mach-O} + +The \code{macho32} and \code{macho64} output formats produce Mach-O +object files suitable for linking with the \textindex{MacOS X} linker. +\codeindex{macho} is a synonym for \code{macho32}. + +\code{macho} provides a default output file-name extension of \code{.o}. 
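+
+As a quick illustration of the default naming (the file name
+\code{machoprog.asm} is only a placeholder), a command such as the
+following would be expected to write its output to \code{machoprog.o}:
+
+\begin{lstlisting}
+nasm -f macho64 machoprog.asm
+\end{lstlisting}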
+ +\xsubsection{machosect}{\code{macho} extensions to the \code{SECTION} Directive} +\index{SECTION!macho extensions to} + +The \code{macho} output format specifies section names in the format +"\emph{segment}\code{,}\emph{section}". No spaces are allowed around the +comma. The following flags can also be specified: + +\begin{itemize} + \item{\code{data} - this section contains initialized data items} + \item{\code{code} - this section contains code exclusively} + \item{\code{mixed} - this section contains both code and data} + \item{\code{bss} - this section is uninitialized and filled with zero} + \item{\code{zerofill} - same as \code{bss}} + \item{\code{no\_dead\_strip} - inhibit dead code stripping for this section} + \item{\code{live\_support} - set the live support flag for this section} + \item{\code{strip\_static\_syms} - strip static symbols for this section} + \item{\code{debug} - this section contains debugging information} + \item{\code{align=}\emph{alignment} - specify section alignment} +\end{itemize} + +The default is \code{data}, unless the section name is \code{\_\_text} or +\code{\_\_bss} in which case the default is \code{text} or \code{bss}, +respectively. + +For compatibility with other Unix platforms, the following standard +names are also supported: + +\begin{lstlisting} +.text = __TEXT,__text text +.rodata = __DATA,__const data +.data = __DATA,__data data +.bss = __DATA,__bss bss +\end{lstlisting} + +If the \code{.rodata} section contains no relocations, it is instead put +into the \code{\_\_TEXT,\_\_const} section unless this section has already +been specified explicitly. However, it is probably better to specify +\code{\_\_TEXT,\_\_const} and \code{\_\_DATA,\_\_const} explicitly as appropriate. 
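+
+The naming rules and flags above can be sketched as follows. This is an
+illustrative fragment only: the section name \code{\_\_DATA,\_\_mytable}
+and all labels are hypothetical, and the fragment has not been checked
+against any particular NASM version.
+
+\begin{lstlisting}
+section __TEXT,__text code align=16     ; explicit segment,section name
+entry_point:                            ; hypothetical label
+        ret
+
+section __DATA,__mytable data align=8   ; hypothetical user-defined section
+mytable:
+        dq 1, 2, 3
+
+section .rodata                         ; standard name, mapped as in the table above
+banner: db 'hello', 0
+\end{lstlisting}
+
+Note that there is no space on either side of the comma in the section
+names, as required above.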
+ +\xsubsection{machotls}{\textindexlc{Thread Local Storage in Mach-O}\index{TLS}: +\code{macho} special symbols and \codeindex{WRT}} + +Mach-O defines the following special symbols that can be used on the +right-hand side of the \code{WRT} operator: + +\begin{itemize} + \item{\code{..tlvp} is used to specify access to thread-local storage.} + \item{\code{..gotpcrel} is used to specify references to the Global Offset Table. + The GOT is supported in the \code{macho64} format only.} +\end{itemize} + +\xsubsection{macho-ssvs}{\code{macho} specific directive +\codeindex{subsections\_via\_symbols}} + +The directive \code{subsections\_via\_symbols} sets the +\code{MH\_SUBSECTIONS\_VIA\_SYMBOLS} flag in the Mach-O header, +which effectively splits a section into blocks (subsections) based on its symbols. +Linkers often use this to eliminate dead code. + +This directive takes no arguments. + +This is a macro implemented as a \code{\%pragma}. It can also be +specified in its \code{\%pragma} form, in which case it will not affect +non-Mach-O builds of the same source code: + +\begin{lstlisting} +%pragma macho subsections_via_symbols +\end{lstlisting} + +\xsubsection{macho-snds}{\code{macho} specific directive \codeindex{no\_dead\_strip}} + +The directive \code{no\_dead\_strip} sets the Mach-O \code{SH\_NO\_DEAD\_STRIP} +section flag on the section containing a specific symbol. This directive takes +a list of symbols as its arguments. + +This is a macro implemented as a \code{\%pragma}. It can also be +specified in its \code{\%pragma} form, in which case it will not affect +non-Mach-O builds of the same source code: + +\begin{lstlisting} +%pragma macho no_dead_strip symbol... +\end{lstlisting} + +\xsubsection{macho-pext}{\code{macho} specific extensions to the +\code{GLOBAL} Directive: \codeindex{private\_extern}} + +The directive extension to \code{GLOBAL} marks the symbol with limited +global scope. 
For example, you can declare a global symbol with +this extension: + +\begin{lstlisting} +global foo:private_extern +foo: + ; code +\end{lstlisting} + +Linking with the static linker clears the private extern attribute, +unless a linker option such as \code{-keep\_private\_externs} is used to preserve it. + +\xsection{elffmt}{\codeindex{elf32}, \codeindex{elf64}, \codeindex{elfx32}: +\textindexlc{Executable and Linkable Format} Object Files} +\index{ELF}\index{linux!elf} + +The \code{elf32}, \code{elf64} and \code{elfx32} output formats generate +\code{ELF32} and \code{ELF64} (Executable and Linkable Format) object files, +as used by Linux as well as \textindex{Unix System V}, including +\textindex{Solaris x86}, \textindex{UnixWare} and \textindex{SCO Unix}. +\code{elf} provides a default output file-name extension of \code{.o}. +\code{elf} is a synonym for \code{elf32}. + +The \code{elfx32} format is used for the \textindex{x32} ABI, which is +a 32-bit ABI with the CPU in 64-bit mode. + +\xsubsection{abisect}{ELF specific directive \codeindex{osabi}} + +The ELF header specifies the application binary interface for the +target operating system (OSABI). This field can be set by using the +\code{osabi} directive with the numeric value (0-255) of the target +system. If this directive is not used, the default value will be "UNIX +System V ABI" (0) which will work on most systems which support ELF. + +\xsubsection{elfsect}{\code{elf} extensions to the \code{SECTION} Directive} +\index{SECTION!elf extensions to} + +Like the \code{obj} format, \code{elf} allows you to specify additional +information on the \code{SECTION} directive line, to control the type +and properties of sections you declare. Section types and properties +are generated automatically by NASM for the \textindexlc{standard section +names}, but may still be overridden by these qualifiers. 
+ +The available qualifiers are: + +\begin{itemize} + \item{\codeindex{alloc} defines the section to be one which is loaded into + memory when the program is run. \codeindex{noalloc} defines it to be one + which is not, such as an informational or comment section.} + + \item{\codeindex{exec} defines the section to be one which should have execute + permission when the program is run. \codeindex{noexec} defines it as one + which should not.} + + \item{\codeindex{write} defines the section to be one which should be writable + when the program is run. \codeindex{nowrite} defines it as one which should + not.} + + \item{\codeindex{progbits} defines the section to be one with explicit contents + stored in the object file: an ordinary code or data section, for + example, \codeindex{nobits} defines the section to be one with no explicit + contents given, such as a BSS section.} + + \item{\code{align=}, used with a trailing number as in \code{obj}, gives the + \index{section alignment!in elf}\index{alignment!in elf sections}alignment + requirements of the section.} + + \item{\codeindex{tls} defines the section to be one which contains + thread local variables.} +\end{itemize} + +The defaults assumed by NASM if you do not specify the above +qualifiers are: +\indexcode{.text} \indexcode{.rodata} \indexcode{.lrodata} +\indexcode{.data} \indexcode{.ldata} \indexcode{.bss} +\indexcode{.lbss} \indexcode{.tdata} \indexcode{.tbss} +\indexcode{.comment} + +\begin{lstlisting} +section .text progbits alloc exec nowrite align=16 +section .rodata progbits alloc noexec nowrite align=4 +section .lrodata progbits alloc noexec nowrite align=4 +section .data progbits alloc noexec write align=4 +section .ldata progbits alloc noexec write align=4 +section .bss nobits alloc noexec write align=4 +section .lbss nobits alloc noexec write align=4 +section .tdata progbits alloc noexec write align=4 tls +section .tbss nobits alloc noexec write align=4 tls +section .comment progbits noalloc noexec 
nowrite align=1 +section other progbits alloc noexec nowrite align=1 +\end{lstlisting} + +(Any section name other than those in the above table is treated by +default like \code{other} in the above. Please note that section +names are case sensitive.) + +\xsubsection{elfwrt}{\textindexlc{Position-Independent Code}: \code{elf} +Special Symbols and \codeindex{WRT}} +\index{PIC} + +Since \code{ELF} does not support segment-base references, the \code{WRT} +operator is not used for its normal purpose; therefore NASM's \code{elf} +output format makes use of \code{WRT} for a different purpose, namely the +PIC-specific \index{relocations!PIC-specific}relocation types. + +\code{elf} defines five special symbols which you can use as the +right-hand side of the \code{WRT} operator to obtain PIC relocation +types. They are \codeindex{..gotpc}, \codeindex{..gotoff}, \codeindex{..got}, +\codeindex{..plt} and \codeindex{..sym}. Their functions are summarized here: + +\begin{itemize} + \item{Referring to the symbol marking the global offset table base + using \code{wrt ..gotpc} will end up giving the distance from the + beginning of the current section to the global offset table. + (\codeindex{\_GLOBAL\_OFFSET\_TABLE\_} is the standard symbol name + used to refer to the \textindex{GOT}.) 
So you would then need to add + \codeindex{\$\$} to the result to get the real address of the GOT.} + + \item{Referring to a location in one of your own sections using + \code{wrt ..gotoff} will give the distance from the beginning of + the GOT to the specified location, so that adding on the address + of the GOT would give the real address of the location you wanted.} + + \item{Referring to an external or global symbol using \code{wrt ..got} + causes the linker to build an entry \emph{in} the GOT containing the + address of the symbol, and the reference gives the distance from the + beginning of the GOT to the entry; so you can add on the address of + the GOT, load from the resulting address, and end up with the + address of the symbol.} + + \item{Referring to a procedure name using \code{wrt ..plt} causes the + linker to build a \textindex{procedure linkage table} entry for the symbol, + and the reference gives the address of the \textindex{PLT} entry. You can + only use this in contexts which would generate a PC-relative + relocation normally (i.e. as the destination for \code{CALL} or + \code{JMP}), since ELF contains no relocation type to refer to PLT + entries absolutely.} + + \item{Referring to a symbol name using \code{wrt ..sym} causes NASM to + write an ordinary relocation, but instead of making the relocation + relative to the start of the section and then adding on the offset + to the symbol, it will write a relocation record aimed directly at + the symbol in question. The distinction is a necessary one due to a + peculiarity of the dynamic linker.} +\end{itemize} + +A fuller explanation of how to use these relocation types to write +shared libraries entirely in NASM is given in \nref{picdll}. 
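+
+As a condensed sketch of how these operators can combine in \code{elf32}
+code, consider the fragment below. The names \code{myfunc},
+\code{localvar}, \code{extvar} and \code{extfunc} are hypothetical, and
+\code{ebx} is assumed to hold the GOT address once it has been computed:
+
+\begin{lstlisting}
+extern _GLOBAL_OFFSET_TABLE_
+extern extvar                   ; hypothetical external variable
+extern extfunc                  ; hypothetical external function
+
+section .data
+localvar: dd 0                  ; hypothetical data in our own section
+
+section .text
+global myfunc:function
+myfunc:
+        push    ebx
+        call    .get_got
+.get_got:
+        pop     ebx             ; ebx = run-time address of .get_got
+        ; ..gotpc yields the section-relative distance to the GOT;
+        ; adding $$ and subtracting .get_got turns ebx into the GOT address
+        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_got wrt ..gotpc
+        lea     eax,[ebx+localvar wrt ..gotoff] ; address of our own data
+        mov     eax,[ebx+extvar wrt ..got]      ; address of the external symbol
+        call    extfunc wrt ..plt               ; call through the PLT
+        pop     ebx
+        ret
+\end{lstlisting}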
+ +\xsubsection{elftls}{\textindexlc{Thread Local Storage in ELF}: +\code{elf} Special Symbols and \codeindex{WRT}} +\index{TLS} + +In ELF32 mode, referring to an external or global symbol using +\code{wrt ..tlsie}\indexcode{..tlsie} causes the linker to build +an entry \emph{in} the GOT containing the +offset of the symbol within the TLS block, so you can access the value +of the symbol with code such as: + +\begin{lstlisting} +mov eax,[tid wrt ..tlsie] +mov [gs:eax],ebx +\end{lstlisting} + +In ELF64 or ELFx32 mode, referring to an external or global symbol using +\code{wrt ..gottpoff}\indexcode{..gottpoff} causes the linker to build an +entry \emph{in} the GOT containing the offset of the symbol within the TLS +block, so you can access the value of the symbol with code such as: + +\begin{lstlisting} +mov rax,[rel tid wrt ..gottpoff] +mov rcx,[fs:rax] +\end{lstlisting} + +\xsubsection{elfglob}{\code{elf} Extensions to the \code{GLOBAL} Directive} +\index{GLOBAL!elf extensions to} + +\code{ELF} object files can contain more information about a global symbol +than just its address: they can contain the \index{symbol sizes!specifying} +\index{size!of symbols}size of the symbol and its \index{symbol types!specifying} +\index{type!of symbols}type as well. These are not merely debugger conveniences, +but are actually necessary when the program being written is a +\textindexlc{shared library}. NASM therefore supports some extensions to the +\code{GLOBAL} directive, allowing you to specify these features. + +You can specify whether a global variable is a function or a data +object by suffixing the name with a colon and the word +\codeindex{function} or \codeindex{data}. (\codeindex{object} is +a synonym for \code{data}.) For example: + +\begin{lstlisting} +global hashlookup:function, hashtable:data +\end{lstlisting} + +exports the global symbol \code{hashlookup} as a function and +\code{hashtable} as a data object. 
+ +Optionally, you can control the ELF visibility of the symbol. Just +add one of the visibility keywords: \codeindex{default}, +\codeindex{internal}, \codeindex{hidden}, or \codeindex{protected}. +The default is \code{default} of course. For example, to make +\code{hashlookup} hidden: + +\begin{lstlisting} +global hashlookup:function hidden +\end{lstlisting} + +You can also specify the size of the data associated with the +symbol, as a numeric expression (which may involve labels, and even +forward references) after the type specifier. Like this: + +\begin{lstlisting} +global hashtable:data (hashtable.end - hashtable) + +hashtable: + db this,that,theother ; some data here +.end: +\end{lstlisting} + +This makes NASM automatically calculate the length of the table and +place that information into the \code{ELF} symbol table. + +Declaring the type and size of global symbols is necessary when +writing shared library code. For more information, see +\nref{picglobal}. + +\xsubsection{elfcomm}{\code{elf} Extensions to the \code{COMMON} Directive} +\index{COMMON!elf extensions to} + +\code{ELF} also allows you to specify alignment requirements +\index{common variables!alignment in elf} +\index{alignment!of elf common variables} on common variables. +This is done by putting a number (which must be a power of two) +after the name and size of the common variable, separated (as usual) +by a colon. For example, an array of doublewords would benefit from +4-byte alignment: + +\begin{lstlisting} +common dwordarray 128:4 +\end{lstlisting} + +This declares the total size of the array to be 128 bytes, and +requires that it be aligned on a 4-byte boundary. + +\xsubsection{elf16}{16-bit code and ELF} +\index{ELF!16-bit code and} + +The \code{ELF32} specification doesn't provide relocations for 8- and +16-bit values, but the GNU \code{ld} linker adds these as an extension. +NASM can generate GNU-compatible relocations, to allow 16-bit code to +be linked as ELF using GNU \code{ld}. 
If NASM is used with the +\code{-w+gnu-elf-extensions} option, a warning is issued when one of +these relocations is generated. + +\xsubsection{elfdbg}{Debug formats and ELF} +\index{ELF!Debug formats} + +ELF provides debug information in \code{STABS} and \code{DWARF} formats. +Line number information is generated for all executable sections, but please +note that only the ".text" section is executable by default. + +\xsection{aoutfmt}{\codeindex{aout}: Linux \code{a.out} Object Files} +\index{a.out!Linux version} +\index{linux!a.out} + +The \code{aout} format generates \code{a.out} object files, in the +form used by early Linux systems (current Linux systems use ELF, see +\nref{elffmt}.) These differ from other \code{a.out} object +files in that the magic number in the first four bytes of the file is +different; also, some implementations of \code{a.out}, for example +NetBSD's, support position-independent code, which Linux's +implementation does not. + +\code{a.out} provides a default output file-name extension of \code{.o}. + +\code{a.out} is a very simple object format. It supports no special +directives, no special symbols, no use of \code{SEG} or \code{WRT}, and no +extensions to any standard directives. It supports only the three +\textindexlc{standard section names} \codeindex{.text}, \codeindex{.data} +and \codeindex{.bss}. + +\xsection{aoutbfmt}{\codeindex{aoutb}: \textindex{NetBSD}/\textindex{FreeBSD}/\textindex{OpenBSD} +\code{a.out} Object Files} +\index{a.out!BSD version} + +The \code{aoutb} format generates \code{a.out} object files, in the form +used by the various free \code{BSD Unix} clones, \code{NetBSD}, \code{FreeBSD} +and \code{OpenBSD}. For simple object files, this object format is exactly +the same as \code{aout} except for the magic number in the first four bytes +of the file. 
However, the \code{aoutb} format supports +\index{PIC}\textindexlc{position-independent code} in the same way as the +\code{elf} format, so you can use it to write \code{BSD} +\textindexlc{shared libraries}. + +\code{aoutb} provides a default output file-name extension of \code{.o}. + +\code{aoutb} supports no special directives, no special symbols, and +only the three \textindexlc{standard section names} \codeindex{.text}, +\codeindex{.data} and \codeindex{.bss}. However, it also supports the same +use of \codeindex{WRT} as \code{elf} does, to provide position-independent +code relocation types. See \nref{elfwrt} for full documentation +of this feature. + +\code{aoutb} also supports the same extensions to the \code{GLOBAL} +directive as \code{elf} does: see \nref{elfglob} for +documentation of this. + +\xsection{as86fmt}{\code{as86}: \textindex{Minix}/Linux \codeindex{as86} Object Files} +\index{linux!as86} + +The Minix/Linux 16-bit assembler \code{as86} has its own non-standard +object file format. Although its companion linker \codeindex{ld86} +produces something close to ordinary \code{a.out} binaries as output, +the object file format used to communicate between \code{as86} and +\code{ld86} is not itself \code{a.out}. + +NASM supports this format, just in case it is useful, as \code{as86}. +\code{as86} provides a default output file-name extension of \code{.o}. + +\code{as86} is a very simple object format (from the NASM user's point +of view). It supports no special directives, no use of \code{SEG} or +\code{WRT}, and no extensions to any standard directives. It supports +only the three \textindexlc{standard section names} \codeindex{.text}, +\codeindex{.data} and \codeindex{.bss}. The only special symbol supported +is \code{..start}. + +\xsection{rdffmt}{\index{RDOFF}\codeindex{rdf}: \textindexlc{Relocatable Dynamic +Object File Format}} + +The \code{rdf} output format produces \code{RDOFF} object files. 
+\code{RDOFF} (Relocatable Dynamic Object File Format) is a home-grown +object-file format, designed alongside NASM itself and reflecting in +its file format the internal structure of the assembler. + +\code{RDOFF} is not used by any well-known operating systems. Those +writing their own systems, however, may well wish to use \code{RDOFF} +as their object format, on the grounds that it is designed primarily +for simplicity and contains very little file-header bureaucracy. + +The Unix NASM archive, and the DOS archive which includes sources, +both contain an \index{rdoff subdirectory}\code{rdoff} subdirectory +holding a set of RDOFF utilities: an RDF linker, an \code{RDF} +static-library manager, an RDF file dump utility, and a program +which will load and execute an RDF executable under Linux. + +\code{rdf} supports only the three \textindexlc{standard section names} +\codeindex{.text}, \codeindex{.data} and \codeindex{.bss}. + +\xsubsection{rdflib}{Requiring a Library: The \codeindex{LIBRARY} Directive} + +\code{RDOFF} contains a mechanism for an object file to demand that a given +library be linked to the module, either at load time or run time. +This is done by the \code{LIBRARY} directive, which takes one argument, +the name of the module: + +\begin{lstlisting} +library mylib.rdl +\end{lstlisting} + +\xsubsection{rdfmod}{Specifying a Module Name: The \codeindex{MODULE} Directive} + +A special \code{RDOFF} header record is used to store the name of the module. +It can be used, for example, by a run-time loader to perform dynamic +linking. The \code{MODULE} directive takes one argument, which is the name +of the current module: + +\begin{lstlisting} +module mymodname +\end{lstlisting} + +Note that when you statically link modules and tell the linker to strip +the symbols from the output file, all module names will be stripped too. 
+To avoid this, you should start module names with \index{\$!prefix}\code{\$}, +like: + +\begin{lstlisting} +module $kernel.core +\end{lstlisting} + +\xsubsection{rdfglob}{\code{rdf} Extensions to the \code{GLOBAL} Directive} +\index{GLOBAL!rdf extensions to} + +\code{RDOFF} global symbols can contain additional information needed by +the static linker. You can mark a global symbol as exported, thus +telling the linker not to strip it from the target executable or library +file. As in \code{ELF}, you can also specify whether an exported symbol +is a procedure (function) or data object. + +Suffixing the name with a colon and the word \codeindex{export} makes the +symbol exported: + +\begin{lstlisting} +global sys_open:export +\end{lstlisting} + +To specify that an exported symbol is a procedure (function), add the +word \codeindex{proc} or \codeindex{function} after the declaration: + +\begin{lstlisting} +global sys_open:export proc +\end{lstlisting} + +Similarly, to specify an exported data object, add the word \codeindex{data} +or \codeindex{object} to the directive: + +\begin{lstlisting} +global kernel_ticks:export data +\end{lstlisting} + +\xsubsection{rdfimpt}{\code{rdf} Extensions to the \code{EXTERN} Directive} +\index{EXTERN!rdf extensions to} + +By default the \code{EXTERN} directive in \code{RDOFF} declares a "pure external" +symbol (i.e. the static linker will complain if such a symbol is not resolved). +To declare an "imported" symbol, which must be resolved later during a dynamic +linking phase, \code{RDOFF} offers an additional \code{import} modifier. As in +\code{GLOBAL}, you can also specify whether an imported symbol is a procedure +(function) or data object. For example: + +\begin{lstlisting} +library $libc +extern _open:import +extern _printf:import proc +extern _errno:import data +\end{lstlisting} + +Here the directive \code{LIBRARY} is also included, which gives the dynamic linker +a hint as to where to find the requested symbols. 
+
+\xsection{dbgfmt}{\codeindex{dbg}: Debugging Format}
+
+The \code{dbg} format does not output an object file as such; instead,
+it outputs a text file which contains a complete list of all the
+transactions between the main body of NASM and the output-format
+back-end module. It is primarily intended to aid people who want to
+write their own output drivers, so that they can get a clearer idea
+of the various requests the main program makes of the output driver,
+and in what order they happen.
+
+For simple files, one can easily use the \code{dbg} format like this:
+
+\begin{lstlisting}
+nasm -f dbg filename.asm
+\end{lstlisting}
+
+which will generate a diagnostic file called \code{filename.dbg}.
+However, this will not work well on files which were designed for a
+different object format, because each object format defines its own
+macros (usually user-level forms of directives), and those macros
+will not be defined in the \code{dbg} format. Therefore it can be
+useful to run NASM twice, in order to do the preprocessing with the
+native object format selected:
+
+\begin{lstlisting}
+nasm -e -f rdf -o rdfprog.i rdfprog.asm
+nasm -a -f dbg rdfprog.i
+\end{lstlisting}
+
+This preprocesses \code{rdfprog.asm} into \code{rdfprog.i}, keeping the
+\code{rdf} object format selected in order to make sure RDF-specific
+directives are converted into primitive form correctly. Then the
+preprocessed source is fed through the \code{dbg} format to generate
+the final diagnostic output.
+
+This workaround will still typically not work for programs intended
+for the \code{obj} format, because the \code{obj} \code{SEGMENT} and \code{GROUP}
+directives have the side effect of defining the segment and group names
+as symbols; \code{dbg} will not do this, so the program will not
+assemble. You will have to work around that by defining the symbols
+yourself (using \code{EXTERN}, for example) if you really need to get a
+\code{dbg} trace of an \code{obj}-specific source file.
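+
+One possible shape for the \code{EXTERN} workaround just described is
+sketched below. It is an untested illustration, not a documented recipe:
+the macro name \code{DBGTRACE}, the file name \code{objprog.asm}, and the
+segment and group names \code{data} and \code{dgroup} are all assumptions
+made for the example. The idea is to declare the missing names only when
+a trace is being produced, so that a normal \code{obj} build is unaffected.
+
+\begin{lstlisting}
+; In objprog.asm (hypothetical file).
+; DBGTRACE is a made-up macro, defined only for dbg runs.
+%ifdef DBGTRACE
+; Stand-ins for the symbols that SEGMENT and GROUP
+; would define under the obj format.
+extern data
+extern dgroup
+%endif
+\end{lstlisting}
+
+The two-step invocation shown earlier would then define the macro during
+the preprocessing step:
+
+\begin{lstlisting}
+nasm -e -f obj -DDBGTRACE -o objprog.i objprog.asm
+nasm -a -f dbg objprog.i
+\end{lstlisting}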
+
+\code{dbg} accepts any section name and any directives at all, and logs
+them all to its output file.
+
+\code{dbg} accepts and logs any \code{\%pragma}, but the specific \code{\%pragma}:
+
+\begin{lstlisting}
+%pragma dbg maxdump <size>
+\end{lstlisting}
+
+where \code{<size>} is either a number or \code{unlimited}, can be
+used to control the maximum size for dumping the full contents of a
+\code{rawdata} output object.
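+
+For instance, a source file that wants every \code{rawdata} object dumped
+in full might begin as in the following sketch. The label and data are
+placeholders, and the exact layout of the resulting dump is not shown
+here because it depends on the NASM version in use.
+
+\begin{lstlisting}
+; Ask dbg to dump rawdata objects without a size cap.
+%pragma dbg maxdump unlimited
+
+section .data
+; Placeholder data, long enough to be worth dumping.
+message:
+        db      "a fairly long string constant", 0
+\end{lstlisting}
+
+Replacing \code{unlimited} with a number would instead cap the dump at
+that size.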