summaryrefslogtreecommitdiff
path: root/doc/latex/src/directive.tex
diff options
context:
space:
mode:
Diffstat (limited to 'doc/latex/src/directive.tex')
-rw-r--r--doc/latex/src/directive.tex541
1 files changed, 541 insertions, 0 deletions
diff --git a/doc/latex/src/directive.tex b/doc/latex/src/directive.tex
new file mode 100644
index 00000000..964315cd
--- /dev/null
+++ b/doc/latex/src/directive.tex
@@ -0,0 +1,541 @@
+%
+% vim: ts=4 sw=4 et
+%
+\xchapter{directive}{\textindexlc{Assembler Directives}}
+
+NASM, though it attempts to avoid the bureaucracy of assemblers like
+MASM and TASM, is nevertheless forced to support a \emph{few}
+directives. These are described in this chapter.
+
+NASM's directives come in two types: \index{directives!user-level}
+\emph{user-level} directives and \index{directives!primitive}
+\emph{primitive} directives. Typically, each directive has a
+user-level form and a primitive form. In almost all cases, we
+recommend that users use the user-level forms of the directives,
+which are implemented as macros which call the primitive forms.
+
+Primitive directives are enclosed in square brackets; user-level
+directives are not.
+
+In addition to the universal directives described in this chapter,
+each object file format can optionally supply extra directives in
+order to control particular features of that file format. These
+\index{directives!format-specific}\emph{format-specific} directives are
+documented along with the formats that implement them, in
+\nref{outfmt}.
+
+\xsection{bits}{\codeindex{BITS}: Specifying Target \textindexlc{Processor Mode}}
+
+The \code{BITS} directive specifies whether NASM should generate code
+\index{16-bit mode, versus 32-bit mode}designed to run on a processor
+operating in 16-bit mode, 32-bit mode or 64-bit mode. The syntax is
+\code{BITS XX}, where XX is 16, 32 or 64.
+
+In most cases, you should not need to use \code{BITS} explicitly. The
+\code{aout}, \code{coff}, \code{elf32}, \code{elf64}, \code{macho32},
+\code{macho64}, \code{win32} and \code{win64} object formats, which
+are designed for use in 32-bit or 64-bit operating systems, all cause
+NASM to select 32-bit or 64-bit mode, respectively, by default.
+The \code{obj} object format allows you to specify each segment
+you define as either \code{USE16} or \code{USE32}, and NASM will
+set its operating mode accordingly, so the use of the \code{BITS}
+directive is once again unnecessary.
+
+The most likely reason for using the \code{BITS} directive is to write
+32-bit or 64-bit code in a flat binary file; this is because the \code{bin}
+output format defaults to 16-bit mode in anticipation of it being
+used most frequently to write DOS \code{.COM} programs, DOS \code{.SYS}
+device drivers and boot loader software.
+
+The \code{BITS} directive can also be used to generate code for
+a different mode than the standard one for the output format.
+
+You do \emph{not} need to specify \code{BITS 32} merely in order
+to use 32-bit instructions in a 16-bit DOS program; if you do, the
+assembler will generate incorrect code because it will be writing
+code targeted at a 32-bit platform, to be run on a 16-bit one.
+
+When NASM is in \code{BITS 16} mode, instructions which use 32-bit
+data are prefixed with an 0x66 byte, and those referring to 32-bit
+addresses have an 0x67 prefix. In \code{BITS 32} mode, the reverse is
+true: 32-bit instructions require no prefixes, whereas instructions
+using 16-bit data need an 0x66 and those working on 16-bit
+addresses need an 0x67.
+
+When NASM is in \code{BITS 64} mode, most instructions operate the same
+as they do for \code{BITS 32} mode. However, there are 8 more general and
+SSE registers, and 16-bit addressing is no longer supported.
+
+The default address size is 64 bits; 32-bit addressing can be selected
+with the 0x67 prefix. The default operand size is still 32 bits,
+however, and the 0x66 prefix selects 16-bit operand size.
+The \code{REX} prefix is used both to select 64-bit operand size, and
+to access the new registers. NASM automatically inserts REX prefixes
+when necessary.
+
+When the \code{REX} prefix is used, the processor does not know how to
+address the AH, BH, CH or DH (high 8-bit legacy) registers. Instead,
+it is possible to access the the low 8-bits of the SP, BP SI and DI
+registers as SPL, BPL, SIL and DIL, respectively; but only when the
+REX prefix is used.
+
+The \code{BITS} directive has an exactly equivalent primitive form,
+\code{[BITS 16]}, \code{[BITS 32]} and \code{[BITS 64]}. The user-level
+form is a macro which has no function other than to call the primitive form.
+
+Note that the space is neccessary, e.g. \code{BITS32} will \emph{not} work!
+
+\xsubsection{use163264}{\codeindex{USE16}, \codeindex{USE32}
+and \codeindex{USE64}: Aliases for BITS}
+
+The \code{USE16}, \code{USE32} and \code{USE64} directives can be used
+in place of \code{BITS 16}, \code{BITS 32} and \code{BITS 64}, for
+compatibility with other assemblers.
+
+\xsection{default}{\codeindex{DEFAULT}: Change the assembler defaults}
+
+The \code{DEFAULT} directive changes the assembler defaults. Normally,
+NASM defaults to a mode where the programmer is expected to explicitly
+specify most features directly. However, this is occasionally obnoxious,
+as the explicit form is pretty much the only one one wishes to use.
+
+Currently, \code{DEFAULT} can be set to \code{REL}, \code{ABS}, \code{BND}
+and \code{NOBND}.
+
+\xsubsection{relabs}{\codeindex{REL} and \codeindex{ABS}: RIP-relative addressing}
+
+This sets whether registerless instructions in 64-bit mode are
+\code{RIP}-relative or not. By default, they are absolute unless
+overridden with the \codeindex{REL} specifier (see \nref{effaddr}).
+However, if \code{DEFAULT REL} is specified, \code{REL} is default, unless
+overridden with the \code{ABS} specifier, \emph{except when used with an
+FS or GS segment override}.
+
+The special handling of \code{FS} and \code{GS} overrides are due to the
+fact that these registers are generally used as thread pointers or
+other special functions in 64-bit mode, and generating
+\code{RIP}-relative addresses would be extremely confusing.
+
+\code{DEFAULT REL} is disabled with \code{DEFAULT ABS}.
+
+\xsubsection{bndnobnd}{\codeindex{BND} and \codeindex{NOBND}: \code{BND} prefix}
+
+If \code{DEFAULT BND} is set, all bnd-prefix available instructions
+following this directive are prefixed with bnd. To override it,
+\code{NOBND} prefix can be used.
+
+\begin{lstlisting}
+DEFAULT BND
+ call foo ; BND will be prefixed
+ nobnd call foo ; BND will NOT be prefixed
+\end{lstlisting}
+
+\code{DEFAULT NOBND} can disable \code{DEFAULT BND} and then
+\code{BND} prefix will be added only when explicitly specified
+in code.
+
+\code{DEFAULT BND} is expected to be the normal configuration
+for writing MPX-enabled code.
+
+\xsection{section}{\codeindex{SECTION} or \codeindex{SEGMENT}: Changing and
+\textindexlc{Defining Sections}}
+
+\index{sections!changing}\index{sections!switching between}
+The \code{SECTION} directive (\code{SEGMENT} is an exactly equivalent
+synonym) changes which section of the output file the code you write
+will be assembled into. In some object file formats, the number and
+names of sections are fixed; in others, the user may make up as many
+as they wish. Hence \code{SECTION} may sometimes give an error message,
+or may define a new section, if you try to switch to a section that does
+not (yet) exist.
+
+The Unix object formats, and the \code{bin} object format (but see
+\nref{multisec}), all support the \index{sections!standardized names}
+standardized names \code{.text}, \code{.data} and \code{.bss} for the code,
+data and uninitialized-data sections. The \code{obj} format, by contrast,
+does not recognize these section names as being special, and indeed will
+strip off the leading period of any section name that has one.
+
+\xsubsection{sectmac}{The \codeindex{\_\_SECT\_\_} Macro}
+
+The \code{SECTION} directive is unusual in that its user-level form
+functions differently from its primitive form. The primitive form,
+\code{[SECTION xyz]}, simply switches the current target section to the
+one given. The user-level form, \code{SECTION xyz}, however, first
+defines the single-line macro \code{\_\_SECT\_\_} to be the primitive
+\code{[SECTION]} directive which it is about to issue, and then issues
+it. So the user-level directive
+
+\begin{lstlisting}
+ SECTION .text
+\end{lstlisting}
+
+expands to the two lines
+
+\begin{lstlisting}
+%define __SECT__ [SECTION .text]
+ [SECTION .text]
+\end{lstlisting}
+
+Users may find it useful to make use of this in their own macros.
+For example, the \code{writefile} macro defined in \nref{mlmacgre}
+can be usefully rewritten in the following more sophisticated form:
+
+\begin{lstlisting}
+%macro writefile 2+
+ [section .data]
+
+ %%str: db %2
+ %%endstr:
+
+ __SECT__
+
+ mov dx, %%str
+ mov cx, %%endstr-%%str
+ mov bx, %1
+ mov ah, 0x40
+ int 0x21
+%endmacro
+\end{lstlisting}
+
+This form of the macro, once passed a string to output, first
+switches temporarily to the data section of the file, using the
+primitive form of the \code{SECTION} directive so as not to modify
+\code{\_\_SECT\_\_}. It then declares its string in the data section,
+and then invokes \code{\_\_SECT\_\_} to switch back to \emph{whichever}
+section the user was previously working in. It thus avoids the need,
+in the previous version of the macro, to include a \code{JMP} instruction
+to jump over the data, and also does not fail if, in a complicated
+\code{OBJ} format module, the user could potentially be assembling the
+code in any of several separate code sections.
+
+\xsection{absolute}{\codeindex{ABSOLUTE}: Defining Absolute Labels}
+
+The \code{ABSOLUTE} directive can be thought of as an alternative form
+of \code{SECTION}: it causes the subsequent code to be directed at no
+physical section, but at the hypothetical section starting at the
+given absolute address. The only instructions you can use in this
+mode are the \code{RESB} family.
+
+\code{ABSOLUTE} is used as follows:
+
+\begin{lstlisting}
+absolute 0x1A
+
+ kbuf_chr resw 1
+ kbuf_free resw 1
+ kbuf resw 16
+\end{lstlisting}
+
+This example describes a section of the PC BIOS data area, at
+segment address 0x40: the above code defines \code{kbuf\_chr} to be
+0x1A, \code{kbuf\_free} to be 0x1C, and \code{kbuf} to be 0x1E.
+
+The user-level form of \code{ABSOLUTE}, like that of \code{SECTION},
+redefines the \codeindex{\_\_SECT\_\_} macro when it is invoked.
+
+\codeindex{STRUC} and \codeindex{ENDSTRUC} are defined as macros
+which use \code{ABSOLUTE} (and also \code{\_\_SECT\_\_}).
+
+\code{ABSOLUTE} doesn't have to take an absolute constant as an
+argument: it can take an expression (actually, a \textindex{critical
+expression}: see \nref{crit}) and it can be a value in a segment.
+For example, a TSR can re-use its setup code as run-time BSS like this:
+
+\begin{lstlisting}
+ org 100h ; it's a .COM program
+ jmp setup ; setup code comes last
+ ; the resident part of the TSR goes here
+ ; ...
+setup:
+ ; now write the code that installs the TSR here
+ ; ...
+absolute setup
+
+runtimevar1 resw 1
+runtimevar2 resd 20
+
+tsr_end:
+\end{lstlisting}
+
+This defines some variables ``on top of'' the setup code, so that
+after the setup has finished running, the space it took up can be
+re-used as data storage for the running TSR. The symbol
+\code{tsr\_end} can be used to calculate the total size of
+the part of the TSR that needs to be made resident.
+
+\xsection{extern}{\codeindex{EXTERN}: \textindexlc{Importing Symbols} from Other Modules}
+
+\code{EXTERN} is similar to the MASM directive \code{EXTRN} and
+the C keyword \code{extern}: it is used to declare a symbol which
+is not defined anywhere in the module being assembled, but is assumed
+to be defined in some other module and needs to be referred to by this
+one. Not every object-file format can support external variables:
+the \code{bin} format cannot.
+
+The \code{EXTERN} directive takes as many arguments as you like.
+Each argument is the name of a symbol:
+
+\begin{lstlisting}
+extern _printf
+extern _sscanf,_fscanf
+\end{lstlisting}
+
+Some object-file formats provide extra features to the \code{EXTERN}
+directive. In all cases, the extra features are used by suffixing a
+colon to the symbol name followed by object-format specific text.
+For example, the \code{obj} format allows you to declare that the
+default segment base of an external should be the group \code{dgroup}
+by means of the directive
+
+\begin{lstlisting}
+extern _variable:wrt dgroup
+\end{lstlisting}
+
+The primitive form of \code{EXTERN} differs from the user-level form
+only in that it can take only one argument at a time: the support
+for multiple arguments is implemented at the preprocessor level.
+
+You can declare the same variable as \code{EXTERN} more than once: NASM
+will quietly ignore the second and later redeclarations.
+
+If a variable is declared both \code{GLOBAL} and \code{EXTERN}, or
+if it is declared as \code{EXTERN} and then defined, it will be
+treated as \code{GLOBAL}. If a variable is declared both as
+\code{COMMON} and \code{EXTERN}, it will be treated as \code{COMMON}.
+
+\xsection{global}{\codeindex{GLOBAL}: \textindexlc{Exporting Symbols} to Other Modules}
+
+\code{GLOBAL} is the other end of \code{EXTERN}: if one module declares a
+symbol as \code{EXTERN} and refers to it, then in order to prevent
+linker errors, some other module must actually \emph{define} the
+symbol and declare it as \code{GLOBAL}. Some assemblers use the name
+\codeindex{PUBLIC} for this purpose.
+
+\code{GLOBAL} uses the same syntax as \code{EXTERN}, except that it must
+refer to symbols which \emph{are} defined in the same module as the
+\code{GLOBAL} directive. For example:
+
+\begin{lstlisting}
+global _main
+_main:
+ ; some code
+\end{lstlisting}
+
+\code{GLOBAL}, like \code{EXTERN}, allows object formats to define private
+extensions by means of a colon. The \code{elf} object format, for
+example, lets you specify whether global data items are functions or
+data:
+
+\begin{lstlisting}
+global hashlookup:function, hashtable:data
+\end{lstlisting}
+
+Like \code{EXTERN}, the primitive form of \code{GLOBAL} differs
+from the user-level form only in that it can take only one argument
+at a time.
+
+\xsection{common}{\codeindex{COMMON}: Defining Common Data Areas}
+
+The \code{COMMON} directive is used to declare \textindex{\emph{common
+variables}}. A common variable is much like a global variable declared
+in the uninitialized data section, so that
+
+\begin{lstlisting}
+common intvar 4
+\end{lstlisting}
+
+is similar in function to
+
+\begin{lstlisting}
+global intvar
+section .bss
+
+intvar resd 1
+\end{lstlisting}
+
+The difference is that if more than one module defines the same
+common variable, then at link time those variables will be
+\emph{merged}, and references to \code{intvar} in all modules
+will point at the same piece of memory.
+
+Like \code{GLOBAL} and \code{EXTERN}, \code{COMMON} supports
+object-format specific extensions. For example, the \code{obj}
+format allows common variables to be NEAR or FAR, and the \code{elf}
+format allows you to specify the alignment requirements of
+a common variable:
+
+\begin{lstlisting}
+common commvar 4:near ; works in OBJ
+common intarray 100:4 ; works in ELF: 4 byte aligned
+\end{lstlisting}
+
+Once again, like \code{EXTERN} and \code{GLOBAL}, the primitive form of
+\code{COMMON} differs from the user-level form only in that it can take
+only one argument at a time.
+
+\xsection{static}{\codeindex{STATIC}: Local Symbols within Modules}
+
+Opposite to \code{EXTERN} and \code{GLOBAL}, \code{STATIC} is local
+symbol, but should be named according to the global mangling rules
+(named by analogy with the C keyword \code{static} as applied to
+functions or global variables).
+
+\begin{lstlisting}
+static foo
+foo:
+ ; codes
+\end{lstlisting}
+
+Unlike \code{GLOBAL}, \code{STATIC} does not allow object formats
+to accept private extensions mentioned in \nref{global}.
+
+\xsection{mangling}{\codeindex{(G|L)PREFIX}, \codeindex{(G|L)POSTFIX}:
+Mangling Symbols}
+
+\code{PREFIX}, \code{GPREFIX}, \code{LPREFIX}, \code{POSTFIX},
+\code{GPOSTFIX}, and \code{LPOSTFIX} directives can prepend or
+append the given argument to a certain type of symbols. The directive
+should be as a preprocess statement. Each usage is:
+
+\begin{itemize}
+ \item{\code{PREFIX}|\code{GPREFIX}: Prepend the argument to all
+ \code{EXTERN} \code{COMMON}, \code{STATIC}, and
+ \code{GLOBAL} symbols}
+
+ \item{\code{LPREFIX}: Prepend the argument to all other symbols
+ such as Local Labels, and backend defined symbols}
+
+ \item{\code{POSTFIX}|\code{GPOSTFIX}: Append the argument to
+ all \code{EXTERN} \code{COMMON}, \code{STATIC}, and
+ \code{GLOBAL} symbols}
+
+ \item{\code{LPOSTFIX}: Append the argument to all other symbols
+ such as Local Labels, and backend defined symbols}
+\end{itemize}
+
+This is a macro implemented as a \code{\%pragma}:
+
+\begin{lstlisting}
+%pragma macho lprefix L_
+\end{lstlisting}
+
+Commandline option is also possible. See also \nref{opt-pfix}.
+
+Some toolchains is aware of a particular prefix for its own optimization
+options, such as code elimination. For instance, Mach-O backend has a
+linker that uses a simplistic naming scheme to chunk up sections into a
+meta section. When the \code{subsections\_via\_symbols} directive
+(\nref{macho-ssvs}) is declared, each symbol is the start of a
+separate block. The meta section is, then, defined to include sections
+before the one that starts with a 'L'. \code{LPREFIX} is useful here to
+mark all local symbols with the 'L' prefix to be excluded to the meta
+section. It converts local symbols compatible with the particular
+toolchain. Note that local symbols declared with \code{STATIC}
+(\nref{static}) are excluded from the symbol mangling and also
+not marked as global.
+
+\xsection{gen-namespace}{\codeindex{OUTPUT}, \codeindex{DEBUG}:
+Generic Namespaces}
+
+\code{OUTPUT} and \code{DEBUG} are generic \code{\%pragma} namespaces
+that are supposed to redirect to the current output and debug formats.
+For example, when mangling local symbols via the generic namespace:
+
+\begin{lstlisting}
+%pragma output gprefix _
+\end{lstlisting}
+
+This is useful when the directive is needed to be output format
+agnostic.
+
+The example is also euquivalent to this, when the output format is
+\code{elf}:
+
+\begin{lstlisting}
+%pragma elf gprefix _
+\end{lstlisting}
+
+
+\xsection{cpu}{\codeindex{CPU}: Defining CPU Dependencies}
+
+The \code{CPU} directive restricts assembly to those instructions which
+are available on the specified CPU.
+
+Options are:
+
+\begin{tabular}{ l l }
+ \code{CPU 8086} & Assemble only 8086 instruction set \\
+ \code{CPU 186} & Assemble instructions up to the 80186 instruction set \\
+ \code{CPU 286} & Assemble instructions up to the 286 instruction set \\
+ \code{CPU 386} & Assemble instructions up to the 386 instruction set \\
+ \code{CPU 486} & 486 instruction set \\
+ \code{CPU 586} & Pentium instruction set \\
+ \code{CPU PENTIUM} & Same as 586 \\
+ \code{CPU 686} & P6 instruction set \\
+ \code{CPU PPRO} & Same as 686 \\
+ \code{CPU P2} & Same as 686 \\
+ \code{CPU P3} & Pentium III (Katmai) instruction sets \\
+ \code{CPU KATMAI} & Same as P3 \\
+ \code{CPU P4} & Pentium 4 (Willamette) instruction set \\
+ \code{CPU WILLAMETTE} & Same as P4 \\
+ \code{CPU PRESCOTT} & Prescott instruction set \\
+ \code{CPU X64} & x86-64 (x64/AMD64/Intel 64) instruction set \\
+ \code{CPU IA64} & IA64 CPU (in x86 mode) instruction set \\
+\end{tabular}
+
+All options are case insensitive. All instructions will be selected
+only if they apply to the selected CPU or lower. By default, all
+instructions are available.
+
+\xsection{float}{\codeindex{FLOAT}: Handling of \index{constants!floating-point}
+floating-point constants}
+
+By default, floating-point constants are rounded to nearest, and IEEE
+denormals are supported. The following options can be set to alter
+this behaviour:
+
+\begin{tabular}{ l l }
+ \code{FLOAT DAZ} & Flush denormals to zero \\
+ \code{FLOAT NODAZ} & Do not flush denormals to zero (default) \\
+ \code{FLOAT NEAR} & Round to nearest (default) \\
+ \code{FLOAT UP} & Round up (toward +Infinity) \\
+ \code{FLOAT DOWN} & Round down (toward -Infinity) \\
+ \code{FLOAT ZERO} & Round toward zero \\
+ \code{FLOAT DEFAULT} & Restore default settings \\
+\end{tabular}
+
+The standard macros \codeindex{\_\_FLOAT\_DAZ\_\_},
+\codeindex{\_\_FLOAT\_ROUND\_\_}, and \codeindex{\_\_FLOAT\_\_} contain
+the current state, as long as the programmer has avoided the use
+of the brackeded primitive form, (\code{[FLOAT]}).
+
+\code{\_\_FLOAT\_\_} contains the full set of floating-point settings;
+this value can be saved away and invoked later to restore the setting.
+
+\xsection{asmdir-warning}{\codeindex{[WARNING]}: Enable or disable warnings}
+
+The \code{[WARNING]} directive can be used to enable or disable classes
+of warnings in the same way as the \code{-w} option, see \nref{opt-w}
+for more details about warning classes.
+
+\begin{itemize}
+ \item{\code{[warning +\emph{warning-class}]} enables warnings for
+ \emph{warning-class}}.
+
+ \item{\code{[warning -\emph{warning-class}]} disables warnings for
+ \emph{warning-class}}.
+
+ \item{\code{[warning *\emph{warning-class}]} restores \emph{warning-class} to
+ the original value, either the default value or as specified on the
+ command line.}
+
+ \item{\code{[warning push]} saves the current warning state on a stack.}
+
+ \item{\code{[warning pop]} restores the current warning state from the stack.}
+\end{itemize}
+
+The \code{[WARNING]} directive also accepts the \code{all}, \code{error} and
+\code{error=}\emph{warning-class} specifiers.
+
+No ``user form'' (without the brackets) currently exists.