%
% vim: ts=4 sw=4 et
%
\xchapter{lang}{The NASM Language}

\xsection{syntax}{Layout of a NASM Source Line}

As in most assemblers, each NASM source line contains (unless it
is a macro, a preprocessor directive or an assembler directive: see
\nref{preproc} and \nref{directive}) some combination
of the four fields

\begin{lstlisting}
label:  instruction operands    ; comment
\end{lstlisting}

As usual, most of these fields are optional; the presence or absence
of any combination of a label, an instruction and a comment is allowed.
Of course, the operand field is either required or forbidden by the
presence and nature of the instruction field.

NASM uses backslash (\code{\textbackslash}) as the line continuation character;
if a line ends with a backslash, the next line is considered to be
part of the backslash-ended line.

NASM places no restrictions on white space within a line: labels may
have white space before them, instructions may have no space
before them, and so on. The \textindex{colon} after a label is also
optional. (Note that this means that if you intend to code \code{lodsb}
alone on a line, and type \code{lodab} by accident, then that's still a
valid source line which does nothing but define a label. Running
NASM with the command-line option \index{orphan-labels}\code{-w+orphan-labels}
will cause it to warn you if you define a label alone on a line without
a \textindex{trailing colon}.)

\textindex{Valid characters} in labels are letters, numbers, \code{\_},
\code{\$}, \code{\#}, \code{@}, \code{\textasciitilde}, \code{.}, and \code{?}.
The only characters which may be used as the \emph{first} character of
an identifier are letters, \code{.} (with special meaning: see
\nref{locallab}), \code{\_} and \code{?}.
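
As an illustrative sketch only (the labels \code{start}, \code{next},
\code{my\_var\$2} and \code{total} are hypothetical), the following fragment
exercises the rules above: fields may be omitted, the colon after a label is
optional, a mistyped mnemonic silently becomes a label, and a trailing
backslash joins two physical lines into one source line:

\begin{lstlisting}
start:  mov ax,1            ; label, instruction, operands, comment
        lodsb               ; instruction only: no label, no comment
lodab                       ; typo for lodsb: just defines a label
next    inc ax              ; the colon after a label is optional
my_var$2:                   ; '_' and '$' are valid inside a label
total:  mov ax, \
            2               ; continues the line above
\end{lstlisting}

Assembled with \code{-w+orphan-labels}, this fragment should draw a warning
for the \code{lodab} line, but not for \code{my\_var\$2:}, which has its colon.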

An identifier may also be prefixed with a \codeindex{\$} to indicate
that it is intended to be read as an identifier and not a reserved word;
thus, if some other module you are linking with defines a symbol called
\code{eax}, you can refer to \code{\$eax} in NASM code to distinguish
the symbol from the register. The maximum length of an identifier is
4095 characters.

The instruction field may contain any machine instruction: Pentium
and P6 instructions, FPU instructions, MMX instructions and even
undocumented instructions are all supported. The instruction may be
prefixed by \code{LOCK}, \code{REP}, \code{REPE}/\code{REPZ},
\code{REPNE}/\code{REPNZ}, \code{XACQUIRE}/\code{XRELEASE} or
\code{BND}/\code{NOBND}, in the usual way. Explicit
\index{address-size!prefixes}address-size and
\index{operand-size!prefixes}operand-size prefixes
\codeindex{A16}, \codeindex{A32}, \codeindex{A64}, \codeindex{O16},
\codeindex{O32} and \codeindex{O64} are provided; one example of their
use is given in \nref{mixsize}. You can also use the name of a
\index{segment override}segment register as an instruction prefix: coding
\code{es mov [bx],ax} is equivalent to coding \code{mov [es:bx],ax}.
We recommend the latter syntax, since it is consistent with other syntactic
features of the language, but for instructions such as \code{LODSB}, which
have no operands and yet can require a segment override, there is no clean
syntactic way to proceed apart from \code{es lodsb}.

A prefix does not have to be attached to an instruction: prefixes such as
\code{CS}, \code{A32}, \code{LOCK} or \code{REPE} can appear on
a line by themselves, and NASM will just generate the prefix bytes.

In addition to actual machine instructions, NASM also supports a
number of pseudo-instructions, described in \nref{pseudop}.

Instruction \textindex{operands} may take a number of forms: they can be
registers, described simply by the register name (e.g.
\code{ax},
\code{bp}, \code{ebx}, \code{cr0}: NASM does not use the \code{gas}-style
syntax in which register names must be prefixed by a \code{\%} sign),
or they can be \textindex{effective addresses} (see \nref{effaddr}),
constants (\nref{const}) or expressions (\nref{expr}).

For x87 \textindex{floating-point} instructions, NASM accepts a wide
range of syntaxes: you can use two-operand forms like MASM supports,
or you can use NASM's native single-operand forms in most cases.
% Details of all forms of each supported instruction are given in
% \nref{iref}.
For example, you can code:

\begin{lstlisting}
fadd st1                ; this sets st0 := st0 + st1
fadd st0,st1            ; so does this

fadd st1,st0            ; this sets st1 := st1 + st0
fadd to st1             ; so does this
\end{lstlisting}

Almost any x87 floating-point instruction that references memory must
use one of the prefixes \codeindex{DWORD}, \codeindex{QWORD} or
\codeindex{TWORD} to indicate what size of \textindex{memory operand}
it refers to.

\xsection{pseudop}{\textindexlc{Pseudo-Instructions}}

Pseudo-instructions are things which, though not real x86 machine
instructions, are used in the instruction field anyway because that's
the most convenient place to put them. The current pseudo-instructions
are \codeindex{DB}, \codeindex{DW}, \codeindex{DD}, \codeindex{DQ},
\codeindex{DT}, \codeindex{DO}, \codeindex{DY} and \codeindex{DZ};
their \textindex{uninitialized} counterparts \codeindex{RESB},
\codeindex{RESW}, \codeindex{RESD}, \codeindex{RESQ},
\codeindex{REST}, \codeindex{RESO}, \codeindex{RESY} and
\codeindex{RESZ}; the \codeindex{INCBIN} command, the \codeindex{EQU}
command, and the \codeindex{TIMES} prefix.

\xsubsection{db}{DB and Friends: Declaring Initialized Data}

\codeindex{DB}, \codeindex{DW}, \codeindex{DD}, \codeindex{DQ},
\codeindex{DT}, \codeindex{DO}, \codeindex{DY} and \codeindex{DZ}
are used, much as in MASM, to declare initialized data in
the output file.
They can be invoked in a wide range of ways:
\index{constants!floating-point}
\index{constants!character}
\index{constants!string}

\begin{lstlisting}
db 0x55                 ; just the byte 0x55
db 0x55,0x56,0x57       ; three bytes in succession
db 'a',0x55             ; character constants are OK
db 'hello',13,10,'$'    ; so are string constants
dw 0x1234               ; 0x34 0x12
dw 'a'                  ; 0x61 0x00 (it's just a number)
dw 'ab'                 ; 0x61 0x62 (character constant)
dw 'abc'                ; 0x61 0x62 0x63 0x00 (string)
dd 0x12345678           ; 0x78 0x56 0x34 0x12
dd 1.234567e20          ; floating-point constant
dq 0x123456789abcdef0   ; eight-byte constant
dq 1.234567e20          ; double-precision float
dt 1.234567e20          ; extended-precision float
\end{lstlisting}

\code{DT}, \code{DO}, \code{DY} and \code{DZ} do not accept
numeric (integer) constants as operands; \code{DT} and \code{DO} do,
however, accept floating-point constants (\nref{fltconst}).
\index{constants!numeric}

\xsubsection{resb}{RESB and Friends: Declaring \textindexlc{Uninitialized} Data}

\codeindex{RESB}, \codeindex{RESW}, \codeindex{RESD}, \codeindex{RESQ},
\codeindex{REST}, \codeindex{RESO}, \codeindex{RESY} and \codeindex{RESZ}
are designed to be used in the BSS section of a module: they declare
\emph{uninitialized} storage space. Each takes a single operand, which is
the number of bytes, words, doublewords or whatever to reserve. As stated
in \nref{qsother}, NASM does not support the MASM/TASM syntax of
reserving uninitialized space by writing \index{?}\code{DW ?} or similar
things: this is what it does instead. The operand to a \code{RESB}-type
pseudo-instruction is a \textindex{critical expression}:
see \nref{crit}.

For example:

\begin{lstlisting}
buffer:     resb 64     ; reserve 64 bytes
wordvar:    resw 1      ; reserve a word
realarray   resq 10     ; array of ten reals
ymmval:     resy 1      ; one YMM register
zmmvals:    resz 32     ; 32 ZMM registers
\end{lstlisting}

\xsubsection{incbin}{\codeindex{INCBIN}: Including External \textindexlc{Binary Files}}

\code{INCBIN} is borrowed from the old Amiga assembler \textindex{DevPac}:
it includes a binary file verbatim into the output file. This can be handy,
for example, for including \textindex{graphics} and \textindex{sound} data
directly into a game executable file. It can be called in one of these
three ways:

\begin{lstlisting}
incbin "file.dat"               ; include the whole file
incbin "file.dat",1024          ; skip the first 1024 bytes
incbin "file.dat",1024,512      ; skip the first 1024, and
                                ; actually include at most 512
\end{lstlisting}

\code{INCBIN} is both a directive and a standard macro; the standard
macro version searches for the file in the include file search path
and adds the file to the dependency lists. This macro can be
overridden if desired.

\xsubsection{equ}{\codeindex{EQU}: Defining Constants}

\code{EQU} defines a symbol to a given constant value: when \code{EQU} is
used, the source line must contain a label. The action of \code{EQU} is
to define the given label name to the value of its (only) operand.
This definition is absolute, and cannot change later. So, for
example,

\begin{lstlisting}
message db 'hello, world'
msglen  equ $-message
\end{lstlisting}

defines \code{msglen} to be the constant 12. \code{msglen} may
not then be redefined later. This is not a \textindex{preprocessor}
definition either: the value of \code{msglen} is evaluated \emph{once},
using the value of \code{\$} (see \nref{expr} for an explanation
of \code{\$}) at the point of definition, rather than being evaluated
wherever it is referenced and using the value of \code{\$} at
the point of reference.
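
The difference between \code{EQU} and a preprocessor definition can be
sketched as follows. The macro name \code{curlen} is hypothetical and
introduced only for this illustration; the comments restate the behaviour
described above rather than quoting assembler output:

\begin{lstlisting}
message db 'hello, world'
msglen  equ $-message           ; evaluated once, here: 12
        db 'more data'
        mov cx,msglen           ; still 12: $ is not re-evaluated

%define curlen ($-message)      ; textual macro, re-expanded at each use
        mov cx,curlen           ; $ is now this line, so the value exceeds 12
\end{lstlisting}

Adding a further line such as \code{msglen equ 13} should be rejected,
since a symbol defined with \code{EQU} cannot be redefined.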

\xsubsection{times}{\codeindex{TIMES}: \textindexlc{Repeating} Instructions or Data}

The \code{TIMES} prefix causes the instruction to be assembled multiple
times. This is partly present as NASM's equivalent of the \codeindex{DUP}
syntax supported by \textindex{MASM}-compatible assemblers, in that you can
code

\begin{lstlisting}
zerobuf:        times 64 db 0
\end{lstlisting}

or similar things; but \code{TIMES} is more versatile than that. The
argument to \code{TIMES} is not just a numeric constant, but a numeric
\emph{expression}, so you can do things like

\begin{lstlisting}
buffer: db 'hello, world'
        times 64-$+buffer db ' '
\end{lstlisting}

which will store exactly enough spaces to make the total length of
\code{buffer} up to 64. Finally, \code{TIMES} can be applied to ordinary
instructions, so you can code trivial \textindex{unrolled loops} in it:

\begin{lstlisting}
times 100 movsb
\end{lstlisting}

Note that there is no effective difference between \code{times 100 resb 1}
and \code{resb 100}, except that the latter will be assembled about
100 times faster due to the internal structure of the assembler.

The operand to \code{TIMES} is a critical expression (\nref{crit}).

Note also that \code{TIMES} can't be applied to \textindex{macros}: the reason
for this is that \code{TIMES} is processed after the macro phase, which
allows the argument to \code{TIMES} to contain expressions such as
\code{64-\$+buffer} as above. To repeat more than one line of code,
or a complex macro, use the preprocessor \codeindex{\%rep} directive.

\xsection{effaddr}{Effective Addresses}

An \textindex{effective address} is any operand to an instruction which
\index{memory reference}references memory. Effective addresses, in NASM,
have a very simple syntax: they consist of an expression evaluating
to the desired address, enclosed in \textindex{square brackets}.
For example:

\begin{lstlisting}
wordvar dw      123
        mov     ax,[wordvar]
        mov     ax,[wordvar+1]
        mov     ax,[es:wordvar+bx]
\end{lstlisting}

Anything not conforming to this simple system is not a valid memory
reference in NASM, for example \code{es:wordvar[bx]}.

More complicated effective addresses, such as those involving more
than one register, work in exactly the same way:

\begin{lstlisting}
mov eax,[ebx*2+ecx+offset]
mov ax,[bp+di+8]
\end{lstlisting}

NASM is capable of doing \textindex{algebra} on these effective addresses,
so that things which don't necessarily \emph{look} legal are perfectly
all right:

\begin{lstlisting}
mov eax,[ebx*5]             ; assembles as [ebx*4+ebx]
mov eax,[label1*2-label2]   ; i.e. [label1+(label1-label2)]
\end{lstlisting}

Some forms of effective address have more than one assembled form;
in most such cases NASM will generate the smallest form it can. For
example, there are distinct assembled forms for the 32-bit effective
addresses \code{[eax*2+0]} and \code{[eax+eax]}, and NASM will
generally generate the latter on the grounds that the former requires
four bytes to store a zero offset.

NASM has a hinting mechanism which will cause \code{[eax+ebx]} and
\code{[ebx+eax]} to generate different opcodes; this is occasionally
useful because \code{[esi+ebp]} and \code{[ebp+esi]} have different
default segment registers.

You can, however, force NASM to generate an effective address in a
particular form by the use of the keywords \code{BYTE}, \code{WORD},
\code{DWORD} and \code{NOSPLIT}. If you need \code{[eax+3]} to be
assembled using a double-word offset field instead of the one byte NASM
will normally generate, you can code \code{[dword eax+3]}. Similarly, you
can force NASM to use a byte offset for a small value which it hasn't seen
on the first pass (see \nref{crit} for an example of such a code
fragment) by using \code{[byte eax+offset]}.
As special cases, \code{[byte eax]}
will code \code{[eax+0]} with a byte offset of zero, and \code{[dword eax]}
will code it with a double-word offset of zero. The normal form, \code{[eax]},
will be coded with no offset field.

The form described in the previous paragraph is also useful if you
are trying to access data in a 32-bit segment from within 16-bit code.
For more information on this see the section on mixed-size addressing
(\nref{mixaddr}). In particular, if you need to access data at
a known offset that is too large to fit in a 16-bit value and you don't
specify that it is a dword offset, NASM will cause the high word of
the offset to be lost.

Similarly, NASM will split \code{[eax*2]} into \code{[eax+eax]} because
that allows the offset field to be absent and space to be saved; in fact,
it will also split \code{[eax*2+offset]} into \code{[eax+eax+offset]}.
You can combat this behaviour by the use of the \code{NOSPLIT} keyword:
\code{[nosplit eax*2]} will force \code{[eax*2+0]} to be generated literally.
\code{[nosplit eax*1]} also has the same effect. Alternatively, the split EA
form \code{[0, eax*2]} can be used. However, \code{NOSPLIT} in
\code{[nosplit eax+eax]} will be ignored because the user's intention here
is taken to be \code{[eax+eax]}.

In 64-bit mode, NASM will by default generate absolute addresses. The
\codeindex{REL} keyword makes it produce \code{RIP}-relative addresses.
Since this is frequently the desired behaviour, it can be made the default
with the \code{DEFAULT} directive (\nref{default}). The keyword
\codeindex{ABS} overrides \codeindex{REL}.

A new form of split effective address syntax is also supported. This is
mainly intended for mib operands as used by MPX instructions, but can
be used for any memory reference. The basic concept of this form is
to write the base and the index as separate, comma-separated terms.

\begin{lstlisting}
mov eax,[ebx+8,ecx*4]   ; ebx=base, ecx=index, 4=scale, 8=disp
\end{lstlisting}

For mib operands, there are several ways of writing the effective address,
depending on the tool. NASM supports all of the currently known mib
syntaxes:

\begin{lstlisting}
; bndstx
; the next 5 lines are parsed identically:
; base=rax, index=rbx, scale=1, displacement=3
bndstx [rax+0x3,rbx], bnd0      ; NASM - split EA
bndstx [rbx*1+rax+0x3], bnd0    ; GAS - '*1' indicates an index reg
bndstx [rax+rbx+3], bnd0        ; GAS - without hints
bndstx [rax+0x3], bnd0, rbx     ; ICC-1
bndstx [rax+0x3], rbx, bnd0     ; ICC-2
\end{lstlisting}

When a broadcasting decorator is used, the operand-size keyword should
match the size of each element.

\begin{lstlisting}
vdivps zmm4, zmm5, dword [rbx]{1to16}   ; single-precision float
vdivps zmm4, zmm5, zword [rbx]          ; packed 512-bit memory
\end{lstlisting}

\xsection{const}{\textindexlc{Constants}}

NASM understands four different types of constant: numeric,
character, string and floating-point.

\xsubsection{numconst}{Numeric Constants}
\index{constants!numeric}
\index{constants!hexadecimal}
\index{constants!decimal}
\index{constants!octal}
\index{constants!binary}

A numeric constant is simply a number. NASM allows you to specify
numbers in a variety of number bases, in a variety of ways: you can
suffix \code{H} or \code{X}, \code{D} or \code{T}, \code{Q} or
\code{O}, and \code{B} or \code{Y} for hexadecimal, decimal, octal and
binary respectively, or you can prefix \code{0x}, for hexadecimal in
the style of C, or you can prefix \code{\$} for hexadecimal in the style
of Borland Pascal or Motorola assemblers. Note, though, that the \index{prefix}
\codeindex{\$} prefix does double duty as a prefix on identifiers (see \nref{syntax}),
so a hex number prefixed with a \code{\$} sign must have a digit after the
\code{\$} rather than a letter.
In addition, current versions of NASM accept
the prefix \code{0h} for hexadecimal, \code{0d} or \code{0t} for decimal,
\code{0o} or \code{0q} for octal, and \code{0b} or \code{0y} for binary.
Please note that unlike C, a \code{0} prefix by itself does \emph{not} imply
an octal constant!

Numeric constants can have underscores (\code{\_}) interspersed to break
up long strings of digits.

Some examples (all producing exactly the same code):

\begin{lstlisting}
mov ax,200              ; decimal
mov ax,0200             ; still decimal
mov ax,0200d            ; explicitly decimal
mov ax,0d200            ; also decimal
mov ax,0c8h             ; hex
mov ax,$0c8             ; hex again: the 0 is required
mov ax,0xc8             ; hex yet again
mov ax,0hc8             ; still hex
mov ax,310q             ; octal
mov ax,310o             ; octal again
mov ax,0o310            ; octal yet again
mov ax,0q310            ; octal yet again
mov ax,11001000b        ; binary
mov ax,1100_1000b       ; same binary constant
mov ax,1100_1000y       ; same binary constant once more
mov ax,0b1100_1000      ; same binary constant yet again
mov ax,0y1100_1000      ; same binary constant yet again
\end{lstlisting}

\xsubsection{strings}{\index{strings}Character Strings}

A character string consists of a sequence of characters enclosed in
either single quotes (\code{'...'}), double quotes (\code{"..."}) or
backquotes (\code{`...`}). Single or double quotes are equivalent to
NASM (except of course that surrounding the constant with single
quotes allows double quotes to appear within it and vice versa); the
contents of those are represented verbatim. Strings enclosed in
backquotes support C-style \code{\textbackslash}-escapes for
special characters.

The following \textindex{escape sequences} are recognized by
backquoted strings:

\begin{lstlisting}
\'          single quote (')
\"          double quote (")
\`          backquote (`)
\\          backslash (\)
\?          question mark (?)
\a          BEL (ASCII 7)
\b          BS  (ASCII 8)
\t          TAB (ASCII 9)
\n          LF  (ASCII 10)
\v          VT  (ASCII 11)
\f          FF  (ASCII 12)
\r          CR  (ASCII 13)
\e          ESC (ASCII 27)
\377        Up to 3 octal digits - literal byte
\xFF        Up to 2 hexadecimal digits - literal byte
\u1234      4 hexadecimal digits - Unicode character
\U12345678  8 hexadecimal digits - Unicode character
\end{lstlisting}

All other escape sequences are reserved. Note that \code{\textbackslash 0},
meaning a \code{NUL} character (ASCII 0), is a special case of
the octal escape sequence.

\textindex{Unicode} characters specified with \code{\textbackslash u}
or \code{\textbackslash U} are converted to \textindex{UTF-8}.
For example, the following lines are all equivalent:

\begin{lstlisting}
db `\u263a`             ; UTF-8 smiley face
db `\xe2\x98\xba`       ; UTF-8 smiley face
db 0E2h, 098h, 0BAh     ; UTF-8 smiley face
\end{lstlisting}

\xsubsection{chrconst}{Character Constants}
\index{constants!character}

A character constant consists of a string up to eight bytes long, used
in an expression context. It is treated as if it were an integer.

A character constant with more than one byte will be arranged
with \textindex{little-endian} order in mind: if you code

\begin{lstlisting}
mov eax,'abcd'
\end{lstlisting}

then the constant generated is not \code{0x61626364}, but \code{0x64636261},
so that if you were then to store the value into memory, it would read
\code{abcd} rather than \code{dcba}. This is also the sense of character
constants understood by the Pentium's \codeindex{CPUID} instruction.

\xsubsection{strconst}{String Constants}
\index{constants!string}

String constants are character strings used in the context of some
pseudo-instructions, namely the \indexcode{DW}\indexcode{DD}\indexcode{DQ}
\indexcode{DT}\indexcode{DO}\indexcode{DY}\codeindex{DB} family and
\codeindex{INCBIN} (where it represents a filename). They are also used in
certain preprocessor directives.

A string constant looks like a character constant, only longer. It
is treated as a concatenation of maximum-size character constants
for the context in which it appears (bytes for \code{DB}, doublewords
for \code{DD}, and so on). So the following are equivalent:

\begin{lstlisting}
db 'hello'              ; string constant
db 'h','e','l','l','o'  ; equivalent character constants
\end{lstlisting}

And the following are also equivalent:

\begin{lstlisting}
dd 'ninechars'          ; doubleword string constant
dd 'nine','char','s'    ; becomes three doublewords
db 'ninechars',0,0,0    ; and really looks like this
\end{lstlisting}

Note that when used in a string-supporting context, quoted strings are
treated as string constants even if they are short enough to be a
character constant, because otherwise \code{db 'ab'} would have the same
effect as \code{db 'a'}, which would be silly. Similarly, three-character
or four-character constants are treated as strings when they are
operands to \code{DW}, and so forth.

\xsubsection{unicode}{Unicode Constants}
\index{constants!unicode}
\index{UTF-16}
\index{UTF-32}

The special operators \codeindex{\_\_utf16\_\_}, \codeindex{\_\_utf16le\_\_},
\codeindex{\_\_utf16be\_\_}, \codeindex{\_\_utf32\_\_}, \codeindex{\_\_utf32le\_\_}
and \codeindex{\_\_utf32be\_\_} allow definition of Unicode strings.
They take a string in UTF-8 format and convert it to UTF-16 or UTF-32,
respectively. Unless the \code{be} forms are specified, the output is
little-endian.

For example:

\begin{lstlisting}
%define u(x) __utf16__(x)
%define w(x) __utf32__(x)

        dw u('C:\WINDOWS'), 0           ; Pathname in UTF-16
        dd w(`A + B = \u206a`), 0       ; String in UTF-32
\end{lstlisting}

The UTF operators can be applied either to strings passed to the
\code{DB} family instructions, or to character constants in an expression
context.
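
The byte order produced by the UTF operators can be checked with a small
sketch like the following. The byte sequences in the comments are simply the
standard UTF-16 and UTF-32 encodings of the ASCII characters involved, laid
out in the byte order each operator is documented to use:

\begin{lstlisting}
        dw __utf16__('Hi')      ; 48 00 69 00 (little-endian)
        dw __utf16be__('Hi')    ; 00 48 00 69 (big-endian)
        dd __utf32__('A')       ; 41 00 00 00
        mov eax,__utf16__('AB') ; character constant: eax = 0x00420041
\end{lstlisting}

The last line relies on the little-endian arrangement of multi-byte
character constants described in \nref{chrconst}.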

\xsubsection{fltconst}{Floating-Point Constants}
\index{constants!floating-point}

\textindexlc{Floating-point} constants are acceptable only as arguments to
\codeindex{DB}, \codeindex{DW}, \codeindex{DD}, \codeindex{DQ}, \codeindex{DT},
and \codeindex{DO}, or as arguments to the special operators \codeindex{\_\_float8\_\_},
\codeindex{\_\_float16\_\_}, \codeindex{\_\_float32\_\_}, \codeindex{\_\_float64\_\_},
\codeindex{\_\_float80m\_\_}, \codeindex{\_\_float80e\_\_}, \codeindex{\_\_float128l\_\_},
and \codeindex{\_\_float128h\_\_}.

Floating-point constants are expressed in the traditional form:
digits, then a period, then optionally more digits, then optionally an
\code{E} followed by an exponent. The period is mandatory, so that NASM
can distinguish between \code{dd 1}, which declares an integer constant,
and \code{dd 1.0}, which declares a floating-point constant.

NASM also supports C99-style hexadecimal floating-point: \code{0x},
hexadecimal digits, period, optionally more hexadecimal digits, then
optionally a \code{P} followed by a \emph{binary} (not hexadecimal)
exponent in decimal notation. As an extension, NASM additionally
supports the \code{0h} and \code{\$} prefixes for hexadecimal,
as well as binary and octal floating-point, using the \code{0b} or
\code{0y} and \code{0o} or \code{0q} prefixes, respectively.

Underscores to break up groups of digits are permitted in
floating-point constants as well.

Some examples:

\begin{lstlisting}
db -0.2                 ; "Quarter precision"
dw -0.5                 ; IEEE 754r/SSE5 half precision
dd 1.2                  ; an easy one
dd 1.222_222_222        ; underscores are permitted
dd 0x1p+2               ; 1.0x2^2 = 4.0
dq 0x1p+32              ; 1.0x2^32 = 4 294 967 296.0
dq 1.e10                ; 10 000 000 000.0
dq 1.e+10               ; synonymous with 1.e10
dq 1.e-10               ; 0.000 000 000 1
dt 3.141592653589793238462 ; pi
do 1.e+4000             ; IEEE 754r quad precision
\end{lstlisting}

The 8-bit ``quarter-precision'' floating-point format is
sign:exponent:mantissa = 1:4:3 with an exponent bias of 7. This
appears to be the most frequently used 8-bit floating-point format,
although it is not covered by any formal standard. It is sometimes
called a ``\textindex{minifloat}''.

The special operators are used to produce floating-point numbers in
other contexts. They produce the binary representation of a specific
floating-point number as an integer, and can be used anywhere integer
constants are used in an expression. \code{\_\_float80m\_\_} and
\code{\_\_float80e\_\_} produce the 64-bit mantissa and 16-bit
exponent of an 80-bit floating-point number, and \code{\_\_float128l\_\_}
and \code{\_\_float128h\_\_} produce the lower and upper 64-bit halves
of a 128-bit floating-point number, respectively.

For example:

\begin{lstlisting}
mov rax,__float64__(3.141592653589793238462)
\end{lstlisting}

would assign the binary representation of pi as a 64-bit floating-point
number into \code{RAX}. This is exactly equivalent to:

\begin{lstlisting}
mov rax,0x400921fb54442d18
\end{lstlisting}

NASM cannot do compile-time arithmetic on floating-point constants.
This is because NASM is designed to be portable: although it always
generates code to run on x86 processors, the assembler itself can
run on any system with an ANSI C compiler.
Therefore, the assembler
cannot guarantee the presence of a floating-point unit capable of
handling the \textindexlc{Intel number formats}, and so for NASM
to be able to do floating-point arithmetic it would have to include its
own complete set of floating-point routines, which would significantly
increase the size of the assembler for very little benefit.

The special tokens \codeindex{\_\_Infinity\_\_}, \codeindex{\_\_QNaN\_\_} (or
\codeindex{\_\_NaN\_\_}) and \codeindex{\_\_SNaN\_\_} can be used to generate
\index{infinity}infinities, quiet \textindex{NaN}s, and signalling NaNs,
respectively. These are normally used as macros:

\begin{lstlisting}
%define Inf __Infinity__
%define NaN __QNaN__

        dq +1.5, -Inf, NaN      ; Double-precision constants
\end{lstlisting}

The \code{\%use fp} standard macro package contains a set of convenience
macros. See \nref{pkgfp}.

\xsubsection{bcdconst}{Packed BCD Constants}
\index{constants!packed BCD}

x87-style packed BCD constants can be used in the same contexts as
80-bit floating-point numbers. They are suffixed with \code{p} or
prefixed with \code{0p}, and can include up to 18 decimal digits.

As with other numeric constants, underscores can be used
to separate digits.

For example:

\begin{lstlisting}
dt 12_345_678_901_245_678p
dt -12_345_678_901_245_678p
dt +0p33
dt 33p
\end{lstlisting}

\xsection{expr}{\textindex{Expressions}}

Expressions in NASM are similar in syntax to those in C. Expressions
are evaluated as 64-bit integers which are then adjusted to the
appropriate size.

NASM supports two special tokens in expressions, allowing
calculations to involve the current assembly position: the
\index{\$}\index{here}\code{\$} and \codeindex{\$\$} tokens.
\code{\$} evaluates to the assembly position at the beginning
of the line containing the expression; so you can code an
\textindex{infinite loop} using \code{JMP \$}.
\code{\$\$}
evaluates to the beginning of the current section; so you can
tell how far into the section you are by using \code{(\$-\$\$)}.

The arithmetic \textindex{operators} provided by NASM are listed here,
in increasing order of \textindex{precedence}.

\xsubsection{expor}{\codeindex{|}: Bitwise OR Operator}
\index{bitwise!OR}

The \code{|} operator gives a bitwise OR, exactly as performed by the
\code{OR} machine instruction. Bitwise OR is the lowest-priority
arithmetic operator supported by NASM.

\xsubsection{expxor}{\codeindex{\textasciicircum}: Bitwise XOR Operator}
\index{bitwise!XOR}

The \code{\textasciicircum} operator provides the bitwise XOR operation.

\xsubsection{expand}{\codeindex{\&}: Bitwise AND Operator}
\index{bitwise!AND}

The \code{\&} operator provides the bitwise AND operation.

\xsubsection{expshift}{\codeindex{<<} and \codeindex{>>}: \textindexlc{Bit Shift} Operators}

\code{<<} gives a bit-shift to the left, just as it does in C.
So \code{5<<3} evaluates to 5 times 8, or 40. \code{>>} gives
a bit-shift to the right; in NASM, such a shift is \emph{always}
unsigned, so that the bits shifted in from the left-hand end
are filled with zero rather than a sign-extension of the
previous highest bit.

\xsubsection{expplmi}{\codeindex{+} and \codeindex{-}:
\textindexlc{Addition} and \textindexlc{Subtraction} Operators}

The \code{+} and \code{-} operators do perfectly ordinary addition
and subtraction.

\xsubsection{expmul}{\codeindex{*}, \codeindex{/},
\codeindex{//}, \codeindex{\%} and \codeindex{\%\%}:
\textindexlc{Multiplication} and \textindexlc{Division}}

\code{*} is the multiplication operator. \code{/} and \code{//} are both
division operators: \code{/} is \textindex{unsigned division} and
\code{//} is \textindex{signed division}. Similarly, \code{\%} and
\code{\%\%} provide \index{unsigned modulo}\index{modulo operators}unsigned
and \textindex{signed modulo} operators respectively.
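
Because expressions are evaluated as 64-bit integers, the difference between
the unsigned and signed operators appears as soon as an operand is negative.
A minimal sketch, with operands chosen so that the division is exact and no
rounding question arises:

\begin{lstlisting}
        dq -8 / 2       ; unsigned: 0xFFFFFFFFFFFFFFF8 / 2 = 0x7FFFFFFFFFFFFFFC
        dq -8 // 2      ; signed: -4
        dq 17 % 5       ; unsigned modulo: 2
        dq 17 %% 5      ; signed modulo: 2 (both operands positive)
\end{lstlisting}

Negative operands to \code{\%\%} are deliberately avoided here, since the
result of signed modulo on negative values is not guaranteed to be sensible.
Each modulo operator is followed by white space, so the preprocessor does not
mistake it for one of its own constructs.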

NASM, like ANSI C, provides no guarantees about the sensible
operation of the signed modulo operator.

Since the \code{\%} character is used extensively by the macro
\textindex{preprocessor}, you should ensure that both the signed
and unsigned modulo operators are followed by white space wherever
they appear.

\xsubsection{expunary}{\textindex{Unary Operators}}
\index{unary!+}
\index{unary!-}
\index{unary!\textasciitilde}
\index{unary!seg}

The highest-priority operators in NASM's expression grammar are those
which only apply to one argument. These are \codeindex{+},
\codeindex{-}, \codeindex{\textasciitilde}, \codeindex{!},
\codeindex{SEG}, and the \textindex{integer functions} operators.

\code{-} negates its operand, \code{+} does nothing (it's provided for
symmetry with \code{-}), \code{\textasciitilde} computes the
\textindex{one's complement} of its operand, and \code{!} is the
\textindex{logical negation} operator.

\code{SEG} provides the \textindex{segment address}
of its operand (explained in more detail in \nref{segwrt}).

A set of additional operators with leading and trailing double
underscores is used to implement the integer functions of the
\code{ifunc} macro package; see \nref{pkgifunc}.

\xsection{segwrt}{\codeindex{SEG} and \codeindex{WRT}}

When writing large 16-bit programs, which must be split into
multiple \textindex{segments}, it is often necessary to be able
to refer to the \index{segment address}segment part of the address
of a symbol. NASM supports the \code{SEG} operator to perform
this function.

The \code{SEG} operator returns the \emph{\textindex{preferred}}
segment base of a symbol, defined as the segment base relative
to which the offset of the symbol makes sense. So the code

\begin{lstlisting}
mov ax,seg symbol
mov es,ax
mov bx,symbol
\end{lstlisting}

will load \code{ES:BX} with a valid pointer to the symbol
\code{symbol}.

Things can be more complex than this: since 16-bit segments and
\textindex{groups} may \index{overlapping segments}overlap,
you might occasionally want to refer to some symbol using
a different segment base from the preferred one. NASM lets you
do this by means of the \code{WRT} (With Reference To) keyword.
So you can do things like

\begin{lstlisting}
mov ax,weird_seg        ; weird_seg is a segment base
mov es,ax
mov bx,symbol wrt weird_seg
\end{lstlisting}

to load \code{ES:BX} with a different, but functionally equivalent,
pointer to the symbol \code{symbol}.

NASM supports far (inter-segment) calls and jumps by means of the
syntax \code{call segment:offset}, where \code{segment}
and \code{offset} both represent immediate values. So to call
a far procedure, you could code either of

\begin{lstlisting}
call (seg procedure):procedure
call weird_seg:(procedure wrt weird_seg)
\end{lstlisting}

(The parentheses are included for clarity, to show the intended
parsing of the above instructions. They are not necessary in
practice.)

NASM supports the syntax \indexcode{CALL FAR}\code{call far procedure}
as a synonym for the first of the above usages. \code{JMP} works
identically to \code{CALL} in these examples.

To declare a \textindex{far pointer} to a data item in a data
segment, you must code

\begin{lstlisting}
dw symbol, seg symbol
\end{lstlisting}

NASM supports no convenient synonym for this, though you can always
invent one using the macro processor.

\xsection{strict}{\codeindex{STRICT}: Inhibiting Optimization}

When assembling with the optimizer set to level 2 or higher (see
\nref{opt-O}), NASM will use size specifiers (\code{BYTE},
\code{WORD}, \code{DWORD}, \code{QWORD}, \code{TWORD}, \code{OWORD},
\code{YWORD} or \code{ZWORD}), but will give them the smallest possible
size. The keyword \code{STRICT} can be used to inhibit optimization
and force a particular operand to be emitted in the specified size.
+For example, with the optimizer on and in \code{BITS 16} mode,
+
+\begin{lstlisting}
+push dword 33
+\end{lstlisting}
+
+is encoded in the three bytes \code{66 6A 21}, whereas
+
+\begin{lstlisting}
+push strict dword 33
+\end{lstlisting}
+
+is encoded in six bytes, with a full dword immediate operand:
+\code{66 68 21 00 00 00}.
+
+With the optimizer off, the same code (six bytes) is generated whether
+the \code{STRICT} keyword was used or not.
+
+\xsection{crit}{\textindexlc{Critical Expressions}}
+
+Although NASM has an optional multi-pass optimizer, there are some
+expressions which must be resolvable on the first pass. These are
+called \emph{Critical Expressions}.
+
+The first pass is used to determine the size of all the assembled
+code and data, so that the second pass, when generating all the
+code, knows all the symbol addresses the code refers to. So one
+thing NASM can't handle is code whose size depends on the value
+of a symbol declared after the code in question. For example,
+
+\begin{lstlisting}
+times (label-$) db 0
+label: db 'Where am I?'
+\end{lstlisting}
+
+The argument to \codeindex{TIMES} in this case could equally legally
+evaluate to anything at all (whatever count $n$ it yields, \code{label}
+ends up exactly $n$ bytes after \code{\$}, so every value is
+self-consistent); NASM will reject this example because it cannot
+tell the size of the \code{TIMES} line when it first sees it.
+It will just as firmly reject the slightly \index{paradox}paradoxical
+code
+
+\begin{lstlisting}
+times (label-$+1) db 0
+label: db 'NOW where am I?'
+\end{lstlisting}
+
+in which \emph{any} value for the \code{TIMES} argument
+is by definition wrong: if the line emits $n$ bytes, then
+\code{label-\$} is $n$, so the argument would have to equal $n+1$.
+
+NASM rejects these examples by means of a concept called a
+\emph{critical expression}, which is defined to be an
+expression whose value is required to be computable in
+the first pass, and which must therefore depend only
+on symbols defined before it. The argument to the \code{TIMES}
+prefix is a critical expression.
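+
+By contrast, a \code{TIMES} argument that refers only to labels
+defined earlier is accepted, because its value is already known on
+the first pass. A minimal sketch (the labels \code{msg} and
+\code{msg\_end} are hypothetical) that pads a string to a fixed
+16-byte field:
+
+\begin{lstlisting}
+msg:     db 'Hello'
+msg_end:
+         times 16-(msg_end-msg) db 0   ; 11 bytes of zero padding
+\end{lstlisting}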
+
+\xsection{locallab}{\textindexlc{Local Labels}}
+
+NASM gives special treatment to symbols beginning with a \textindex{period}.
+A label beginning with a single period is treated as a \emph{local}
+label, which means that it is associated with the previous non-local
+label. So, for example:
+
+\begin{lstlisting}
+label1 ; some code
+
+.loop
+ ; some more code
+
+ jne .loop
+ ret
+
+label2 ; some code
+
+.loop
+ ; some more code
+
+ jne .loop
+ ret
+\end{lstlisting}
+
+In the above code fragment, each \code{JNE} instruction jumps back to
+the \code{.loop} label in its own block, because the two definitions of
+\code{.loop} are kept separate by virtue of each being associated
+with the previous non-local label.
+
+This form of local label handling is borrowed from the old Amiga
+assembler \textindex{DevPac}; however, NASM goes one step further,
+in allowing access to local labels from other parts of the code. This
+is achieved by means of \emph{defining} a local label in terms of the
+previous non-local label: the first definition of \code{.loop} above is
+really defining a symbol called \code{label1.loop}, and the second
+defines a symbol called \code{label2.loop}. So, if you really needed
+to, you could write
+
+\begin{lstlisting}
+label3 ; some more code
+ ; and some more
+
+ jmp label1.loop
+\end{lstlisting}
+
+Sometimes it is useful, in a macro for instance, to be able to
+define a label which can be referenced from anywhere but which
+doesn't interfere with the normal local-label mechanism. Such a
+label can't be non-local because it would interfere with subsequent
+definitions of, and references to, local labels; and it can't be
+local because the macro that defined it wouldn't know the label's
+full name. NASM therefore introduces a third type of label, which is
+probably only useful in macro definitions: if a label begins with
+the \index{label prefix}special prefix \codeindex{..@}, then it
+does nothing to the local label mechanism.
So you could code + +\begin{lstlisting} +label1: ; a non-local label +.local: ; this is really label1.local +..@foo: ; this is a special symbol +label2: ; another non-local label +.local: ; this is really label2.local + + jmp ..@foo ; this will jump three lines up +\end{lstlisting} + +NASM has the capacity to define other special symbols beginning with +a double period: for example, \code{..start} is used to specify the +entry point in the \code{obj} output format (see \nref{dotdotstart}), +\code{..imagebase} is used to find out the offset from a base address +of the current image in the \code{win64} output format +(see \nref{win64pic}). So just keep in mind that symbols +beginning with a double period are special. |