Diffstat (limited to 'nasm.doc')
-rw-r--r-- nasm.doc 1769
1 files changed, 0 insertions, 1769 deletions
diff --git a/nasm.doc b/nasm.doc
deleted file mode 100644
index 264d5ba7..00000000
--- a/nasm.doc
+++ /dev/null
@@ -1,1769 +0,0 @@
- The Netwide Assembler, NASM
- ===========================
-
-Introduction
-============
-
-The Netwide Assembler grew out of an idea on comp.lang.asm.x86 (or
-possibly alt.lang.asm, I forget which), which was essentially that
-there didn't seem to be a good free x86-series assembler around, and
-that maybe someone ought to write one.
-
-- A86 is good, but not free, and in particular you don't get any
- 32-bit capability until you pay. It's DOS only, too.
-
-- GAS is free, and runs under both DOS and Unix, but it's not very good,
- since it's designed to be a back end to gcc, which always feeds it
- correct code. So its error checking is minimal. Also its syntax is
- horrible, from the point of view of anyone trying to actually
- _write_ anything in it. Plus you can't write 16-bit code in it
- (properly).
-
-- AS86 is Linux-specific, and (my version at least) doesn't seem to
- have much (or any) documentation.
-
-- MASM isn't very good. And it's expensive. And it runs only under
- DOS.
-
-- TASM is better, but still strives for MASM compatibility, which
- means millions of directives and tons of red tape. And its syntax
- is essentially MASM's, with the contradictions and quirks that
- entails (although it sorts out some of those by means of Ideal
- mode). It's expensive too. And it's DOS only.
-
-So here, for your coding pleasure, is NASM. At present it's still in
-prototype stage - we don't promise that it can outperform any of
-these assemblers. But please, _please_ send us bug reports, fixes,
-helpful information, and anything else you can get your hands on
-(and thanks to the many people who've done this already! You all
-know who you are), and we'll improve it out of all recognition.
-Again.
-
-Please see the file `Licence' for the legalese.
-
-Getting Started: Installation
-=============================
-
-NASM is distributed in source form, in what we hope is totally
-ANSI-compliant C. It uses no non-portable code at all, that we know
-of. It ought to compile without change on any system you care to try
-it on. We also supply a pre-compiled 16-bit DOS binary.
-
-To install it, edit the Makefile to describe your C compiler, and
-type `make'. Then copy the binary to somewhere on your path. That's
-all - NASM relies on no files other than its own executable.
-Although if you're on a Unix system, you may also want to install
-the NASM manpage (`nasm.1'). You may also want to install the binary
-and manpage for the Netwide Disassembler, NDISASM (also see
-`ndisasm.doc').
-
-Running NASM
-============
-
-To assemble a file, you issue a command of the form
-
- nasm -f <format> <filename> [-o <output>]
-
-For example,
-
- nasm -f elf myfile.asm
-
-will assemble `myfile.asm' into an ELF object file `myfile.o'. And
-
- nasm -f bin myfile.asm -o myfile.com
-
-will assemble `myfile.asm' into a raw binary program `myfile.com'.
-
-To produce a listing file, with the hex codes output from NASM
-displayed on the left of the original sources, use `-l' to give a
-listing file name, for example:
-
- nasm -f coff myfile.asm -l myfile.lst
-
-To get further usage instructions from NASM, try typing `nasm -h'.
-This will also list the available output file formats, and what they
-are.
-
-If you use Linux but aren't sure whether your system is a.out or
-ELF, type `file /usr/bin/nasm' or wherever you put the NASM binary.
-If it says something like
-
- /usr/bin/nasm: ELF 32-bit LSB executable i386 (386 and up) Version 1
-
-then your system is ELF, and you should use `-f elf' when you want
-NASM to produce Linux object files. If it says
-
- /usr/bin/nasm: Linux/i386 demand-paged executable (QMAGIC)
-
-or something similar, your system is a.out, and you should use `-f
-aout' instead.
-
-Like Unix compilers and assemblers, NASM is silent unless it goes
-wrong: you won't see any output at all, unless it gives error
-messages.
-
-If you define an environment variable called NASM, the program will
-interpret it as a list of extra command-line options, processed
-before the real command line. This is probably most useful for
-defining an include-file search path by putting a lot of `-i'
-options in the NASM variable.
-
-The variable's value will be considered to be a space-separated list
-of options unless it begins with something other than a minus sign,
-in which case the first character will be taken as the separator.
-For example, if you want to define a macro whose value has a space
-in it, then setting the NASM variable to `-dNAME="my name"' won't
-work because the string will be split at the space into `-dNAME="my'
-and `name"', but setting it to `|-dNAME="my name"' will be fine
-because all further options will be considered to be separated by
-vertical bars and so the space has no special meaning.
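-
-The splitting rule can be modelled in a few lines of Python. This is
-a sketch of the behaviour described above, not code from NASM. The
-function name `split_nasm_env' is hypothetical. The sketch also
-assumes that quotes get no special treatment and that empty fields
-(from doubled separators) are simply ignored.
```python
def split_nasm_env(value: str) -> list[str]:
    """Split the NASM environment variable into separate options."""
    if not value:
        return []
    if value.startswith("-"):
        # Ordinary case: options are separated by spaces.
        separator, body = " ", value
    else:
        # Any other first character is taken as the separator itself.
        separator, body = value[0], value[1:]
    # Quotes are not special; empty fields (assumption) are dropped.
    return [option for option in body.split(separator) if option]


print(split_nasm_env('-dNAME="my name"'))   # split at the space
print(split_nasm_env('|-dNAME="my name"'))  # kept whole
```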
-
-Quick Start for MASM Users
-==========================
-
-If you're used to writing programs with MASM, or with TASM in
-MASM-compatible (non-Ideal) mode, or with A86, this section attempts
-to outline the major differences between MASM's syntax and NASM's.
-If you're not already used to MASM, it's probably worth skipping
-this section.
-
-One simple difference is that NASM is case-sensitive. It makes a
-difference whether you call your label `foo', `Foo' or `FOO'. If
-you're assembling to the `obj' MS-DOS output format (or `os2'), you
-can invoke the `UPPERCASE' directive (documented below, in the
-Output Formats section) and ensure that all symbols exported to
-other code modules are forced to uppercase; but even then, _within_
-a single module, NASM will distinguish between labels differing only
-in case.
-
-There are also differences in some of the instructions and register
-names: for example, NASM calls the floating-point stack registers
-`st0', `st1' and so on, rather than MASM's `ST(0)' notation or A86's
-simple numeric `0'. And NASM doesn't support LODS, MOVS, STOS, SCAS,
-CMPS, INS, or OUTS, but only supports the size-specified versions
-LODSB, MOVSW, SCASD and so on.
-
-The _major_ difference, though, is the absence in NASM of variable
-typing. MASM will notice when you declare a variable as `var dw 0',
-and will remember that `var' is a WORD-type variable, so that
-instructions such as `mov var,2' can be unambiguously given the WORD
-size rather than BYTE or DWORD. NASM doesn't and won't do this. The
-statement `var dw 0' merely defines `var' to be a label marking a
-point in memory: no more and no less. It so happens that there are
-two bytes of data following that point in memory before the next
-line of code, but NASM doesn't remember or care. If you want to
-store the number 2 in such a variable, you must specify the size of
-the operation _always_: `mov word [var],2'. This is a deliberate
-design decision, _not_ a bug, so please could people not send us
-mail asking us to `fix' it...
-
-The above example also illustrates another important difference
-between MASM and NASM syntax: the use of OFFSET and of square
-brackets. In MASM, declaring `var dw 0' entitles you to code `mov
-ax,var' to get at the _contents_ of the variable, and you must write
-`mov ax,offset var' to get the _address_ of the variable. In NASM,
-`mov ax,var' gives you the address, and to get at the contents you
-must code `mov ax,[var]'. Again, this is a deliberate design
-decision, since it brings consistency to the syntax: `mov ax,[var]'
-and `mov ax,[bx]' both refer to the contents of memory and both have
-square brackets, whereas neither `mov ax,bx' nor `mov ax,var' refers
-to memory contents and so neither one has square brackets.
-
-This is even more confusing in A86, where declaring a label with a
-trailing colon defines it to be a `label' as opposed to a `variable'
-and causes A86 to adopt NASM-style semantics; so in A86, `mov
-ax,var' has different behaviour depending on whether `var' was
-declared as `var: dw 0' or `var dw 0'. NASM is very simple by
-comparison: _everything_ is a label. The OFFSET keyword is not
-required, and in fact constitutes a syntax error (though you can
-code `%define offset' to suppress the error messages if you want),
-and `var' always refers to the _address_ of the label whereas
-`[var]' refers to the _contents_.
-
-As an addendum to this point of syntax, it's also worth noting that
-the hybrid-style syntaxes supported by MASM and its clones, such as
-`mov ax,table[bx]', where a memory reference is denoted by one
-portion outside square brackets and another portion inside, are also
-not supported by NASM. The correct syntax for the above is `mov
-ax,[table+bx]'. Likewise, `mov ax,es:[di]' is wrong and `mov
-ax,[es:di]' is right.
-
-Writing Programs with NASM
-==========================
-
-Each line of a NASM source file should contain some combination of
-the four fields
-
-LABEL: INSTRUCTION OPERANDS ; COMMENT
-
-`LABEL' defines a label pointing to that point in the source. There
-are no restrictions on white space: labels may have white space
-before them, or not, as you please. The colon after the label is
-also optional. (Note that NASM can be made to give a warning when it
-sees a label which is the only thing on a line with no trailing
-colon, on the grounds that such a label might easily be a mistyped
-instruction name. The command line option `-w+orphan-labels' will
-enable this feature.)
-
-Valid characters in labels are letters, numbers, `_', `$', `#', `@',
-`~', `?', and `.'. The only characters which may be used as the
-_first_ character of an identifier are letters, `_' and `?', and
-(with special meaning: see `Local Labels') `.'. An identifier may
-also be prefixed with a $ sign to indicate that it is intended to be
-read as an identifier and not a reserved word; thus, if some other
-module you are linking with defines a symbol `eax', you can refer to
-`$eax' in NASM code to distinguish it from the register name.
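-
-As a rough check of these rules, the following Python sketch tests a
-candidate name one character at a time. The function `is_valid_label'
-is hypothetical. It treats a single leading `$' as the escape prefix
-described above, and it does not check for clashes with reserved
-words.
```python
import string

# Characters allowed anywhere in a label.
LABEL_CHARS = set(string.ascii_letters + string.digits + "_$#@~?.")
# Characters allowed as the first character ('.' marks a local label).
FIRST_CHARS = set(string.ascii_letters + "_?.")


def is_valid_label(name: str) -> bool:
    """Check a label name against the character rules described above."""
    # A leading '$' only marks the name as an identifier, so strip it
    # before applying the first-character rule.
    if name.startswith("$"):
        name = name[1:]
    if not name:
        return False
    return name[0] in FIRST_CHARS and all(c in LABEL_CHARS for c in name)


print(is_valid_label("$eax"), is_valid_label("1abc"))
```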
-
-`INSTRUCTION' can be any machine opcode (Pentium and P6 opcodes, FPU
-opcodes, MMX opcodes and even undocumented opcodes are all
-supported). The instruction may be prefixed by LOCK, REP, REPE/REPZ
-or REPNE/REPNZ, in the usual way. Explicit address-size and operand-
-size prefixes A16, A32, O16 and O32 are provided - one example of
-their use is given in the `Unusual Instruction Sizes' section below.
-You can also use a segment register as a prefix: coding `es mov
-[bx],ax' is equivalent to coding `mov [es:bx],ax'. We recommend the
-latter syntax, since it is consistent with other syntactic features
-of the language, but for instructions such as `lodsb' there isn't
-anywhere to put a segment override except as a prefix. This is why
-we support it.
-
-The `INSTRUCTION' field may also contain some pseudo-opcodes: see
-the section on pseudo-opcodes for details.
-
-`OPERANDS' can be nonexistent, or huge, depending on the
-instruction, of course. When operands are registers, they are given
-simply as register names: `eax', `ss', `di' for example. NASM does
-_not_ use the GAS syntax, in which register names are prefixed by a
-`%' sign. Operands may also be effective addresses, or they may be
-constants or expressions. See the separate sections on these for
-details.
-
-`COMMENT' is anything after the first semicolon on the line,
-excluding semicolons inside quoted strings.
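-
-The comment rule can be sketched in Python as a scan that tracks
-whether it is inside a quoted string. The helper `strip_comment' is
-hypothetical. It assumes that either quote character opens a string
-that ends at the next matching quote, since NASM defines no escape
-character (see `Constants').
```python
def strip_comment(line: str) -> str:
    """Return `line` with everything from the first unquoted ';' removed."""
    quote = None  # quote character of the string we are inside, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:      # matching quote closes the string
                quote = None
        elif ch in "'\"":        # either quote character opens a string
            quote = ch
        elif ch == ";":          # first semicolon outside a string
            return line[:i]
    return line


print(repr(strip_comment("db 'a;b' ; data")))
```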
-
-Of course, all these fields are optional: the presence or absence of
-the OPERANDS field is dictated by the nature of the INSTRUCTION
-field, but any line may contain a LABEL or not, may contain an
-INSTRUCTION or not, and may contain a COMMENT or not, independently
-of each other.
-
-Lines may also contain nothing but a directive: see `Assembler
-Directives' below for details.
-
-NASM currently cannot handle any line longer than 1024 characters.
-This may be fixed in a future release.
-
-Floating Point Instructions
-===========================
-
-NASM has support for assembling FPU opcodes. However, its syntax is
-not necessarily the same as anyone else's.
-
-NASM uses the notation `st0', `st1', etc. to denote the FPU stack
-registers. NASM also accepts a wide range of single-operand and
-two-operand forms of the instructions. For people who wish to use
-the single-operand form exclusively (this is in fact the `canonical'
-form from NASM's point of view, in that it is the form produced by
-the Netwide Disassembler), there is a TO keyword which makes
-available the opcodes which cannot be so easily accessed by one
-operand. Hence:
-
- fadd st1 ; this sets st0 := st0 + st1
- fadd st0,st1 ; so does this
- fadd st1,st0 ; this sets st1 := st1 + st0
- fadd to st1 ; so does this
-
-It's also worth noting that the FPU instructions that reference
-memory must use the prefixes DWORD, QWORD or TWORD to indicate what
-size of memory operand they refer to.
-
-NASM, in keeping with our policy of not trying to second-guess the
-programmer, will _never_ automatically insert WAIT instructions into
-your code stream. You must code WAIT yourself before _any_
-instruction that needs it. (Of course, on 286 processors or above,
-it isn't needed anyway...)
-
-NASM supports specification of floating point constants by means of
-`dd' (single precision), `dq' (double precision) and `dt' (extended
-precision). Floating-point _arithmetic_ is not done, due to
-portability constraints (not all platforms on which NASM can be run
-support the same floating point types), but simple constants can be
-specified. For example:
-
-gamma dq 0.5772156649 ; Euler's constant
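-
-For reference, `dd' and `dq' floating-point constants end up in IEEE
-754 single and double precision respectively, stored little-endian.
-The Python sketch below uses the standard `struct' module only to
-show those byte layouts. It is not how NASM performs the conversion.
-`dt' is left out because `struct' has no 80-bit format, and the
-helper names are hypothetical.
```python
import struct


def dd_float(x: float) -> bytes:
    """Layout of a `dd` float constant: IEEE 754 single, little-endian."""
    return struct.pack("<f", x)


def dq_float(x: float) -> bytes:
    """Layout of a `dq` float constant: IEEE 754 double, little-endian."""
    return struct.pack("<d", x)


print(dd_float(1.0).hex(), dq_float(0.5772156649).hex())
```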
-
-Pseudo-Opcodes
-==============
-
-Pseudo-opcodes are not real x86 machine opcodes, but are used in the
-instruction field anyway because that's the most convenient place to
-put them. The current pseudo-opcodes are DB, DW, DD, DQ and DT,
-their uninitialised counterparts RESB, RESW, RESD, RESQ and REST,
-the INCBIN command, the EQU command, and the TIMES prefix.
-
-DB, DW, DD, DQ and DT work as you would expect: they can each take
-an arbitrary number of operands, and when assembled, they generate
-nothing but those operands. DB, DW and DD can also take string
-constants as operands. See the `Constants' section for details about
-string constants.
-
-RESB, RESW, RESD, RESQ and REST are designed to be used in the BSS
-section of a module: they declare _uninitialised_ storage space.
-Each takes a single operand, which is the number of bytes, words,
-doublewords, quadwords or ten-byte units (respectively) to reserve.
-We do not support the MASM/TASM syntax of
-reserving uninitialised space by writing `DW ?' or similar: this is
-what we do instead. (But see `Critical Expressions' for a caveat on
-the nature of the operand.)
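-
-Labels in a BSS section merely mark points in memory, so their
-offsets follow directly from the sizes reserved. Below is a small
-Python model of this. The names are hypothetical, and it assumes the
-section starts at offset 0.
```python
# Bytes per unit for each reservation directive.
UNIT_SIZE = {"resb": 1, "resw": 2, "resd": 4, "resq": 8, "rest": 10}


def reserved_bytes(directive: str, count: int) -> int:
    """Bytes reserved by `directive count`, e.g. resw 10 -> 20."""
    return UNIT_SIZE[directive.lower()] * count


def bss_layout(declarations):
    """Offsets of labels declared as (label, directive, count), in order.

    Returns (offsets, total_size). Each label just marks the point where
    its reservation begins; no type information is kept.
    """
    offsets, here = {}, 0
    for label, directive, count in declarations:
        offsets[label] = here
        here += reserved_bytes(directive, count)
    return offsets, here


print(bss_layout([("buf", "resb", 64), ("count", "resd", 1)]))
```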
-
-(An aside: if you want to be able to write `DW ?' and have something
-vaguely useful happen, you can always code `? EQU 0'...)
-
-INCBIN is borrowed from the old Amiga assembler Devpac: it includes
-a binary file verbatim into the output file. This can be handy for
-(for example) including graphics and sound data directly into a game
-executable file. It can be called in one of these three ways:
-
- INCBIN "file.dat" ; include the whole file
- INCBIN "file.dat",1024 ; skip the first 1024 bytes
- INCBIN "file.dat",1024,512 ; skip the first 1024, and
- ; actually include at most 512
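-
-In terms of the file's contents, the three forms amount to taking a
-slice of the bytes read from the file. Below is a Python sketch of
-this. The name `incbin_slice' is hypothetical, and the data would
-come from `open(path, "rb").read()'.
```python
from typing import Optional


def incbin_slice(data: bytes, skip: int = 0,
                 limit: Optional[int] = None) -> bytes:
    """Bytes INCBIN would include from a file whose contents are `data`.

    skip:  number of leading bytes to leave out
    limit: maximum number of bytes to include (None means all the rest)
    """
    end = len(data) if limit is None else skip + limit
    return data[skip:end]  # slicing never runs past the end of the data


print(len(incbin_slice(bytes(2048), 1024, 512)))
```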
-
-EQU defines a symbol to a specified value: when EQU is used, the
-LABEL field must be present. The action of EQU is to define the
-given label name to the value of its (only) operand. This definition
-is absolute, and cannot change later. So, for example,
-
-message db 'hello, world'
-msglen equ $-message
-
-defines `msglen' to be the constant 12. `msglen' may not then be
-redefined later. This is not a preprocessor definition either: the
-value of `msglen' is evaluated _once_, using the value of `$' (see
-the section `Expressions' for details of `$') at the point of
-definition, rather than being evaluated wherever it is referenced
-and using the value of `$' at the point of reference. Note that the
-caveat in `Critical Expressions' applies to EQU too, at the moment.
-
-Finally, the TIMES prefix causes the instruction to be assembled
-multiple times. This is partly NASM's equivalent of the DUP syntax
-supported by MASM-compatible assemblers, in that one can do
-
-zerobuf: times 64 db 0
-
-or similar, but TIMES is more versatile than that. TIMES takes not
-just a numeric constant, but a numeric _expression_, so one can do
-things like
-
-buffer: db 'hello, world'
- times 64-$+buffer db ' '
-
-which will store exactly enough spaces to make the total length of
-`buffer' up to 64. (See the section `Critical Expressions' for a
-caveat on the use of TIMES.) TIMES can also be applied to
-ordinary opcodes, so you can code trivial unrolled loops with it:
-
- times 100 movsb
-
-Note that there is no effective difference between `times 100 resb
-1' and `resb 100', except that the latter will be assembled about
-100 times faster due to the internal structure of the assembler.
-
-Note also that TIMES can't be applied to macros: the reason for this
-is that TIMES is processed after the macro phase, which allows the
-argument to TIMES to contain expressions such as `64-$+buffer' as
-above.
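-
-The padding arithmetic in the `buffer' example can be checked with a
-short Python model. The names are hypothetical and the origin address
-is arbitrary. Raising an error for a negative count is the sketch's
-own choice, not documented NASM behaviour.
```python
def times_count(length: int, label: int, here: int) -> int:
    """Value of the TIMES argument `length-$+label`, with $ == here."""
    count = length - here + label
    if count < 0:
        # Sketch's choice: the text already overruns the target length.
        raise ValueError("buffer contents already exceed the target length")
    return count


def padded_buffer(text: bytes, length: int, origin: int = 0x100) -> bytes:
    """Model `buffer: db text` followed by `times length-$+buffer db ' '`."""
    buffer = origin              # address given to the label `buffer`
    here = buffer + len(text)    # value of $ at the TIMES line
    return text + b" " * times_count(length, buffer, here)


print(len(padded_buffer(b"hello, world", 64)))
```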
-
-Effective Addresses
-===================
-
-NASM's addressing scheme is very simple, although it can involve
-more typing than other assemblers. Where other assemblers
-distinguish between a _variable_ (label declared without a colon)
-and a _label_ (declared with a colon), and use different means of
-addressing the two, NASM is totally consistent.
-
-To refer to the contents of a memory location, square brackets are
-required. This applies to simple variables, computed offsets,
-segment overrides, effective addresses - _everything_. E.g.:
-
-wordvar dw 123
- mov ax,[wordvar]
- mov ax,[wordvar+1]
- mov ax,[es:wordvar+bx]
-
-NASM does _not_ support the various strange syntaxes used by MASM
-and others, such as
-
- mov ax,wordvar ; this is legal, but means something else
- mov ax,es:wordvar[bx] ; not even slightly legal
- es mov ax,wordvar[1] ; the prefix is OK, but not the rest
-
-If no square brackets are used, NASM interprets label references to
-mean the address of the label. Hence there is no need for MASM's
-OFFSET keyword, but
-
- mov ax,wordvar
-
-loads AX with the _address_ of the variable `wordvar'.
-
-More complicated effective addresses are handled by enclosing them
-within square brackets as before:
-
- mov eax,[ebp+2*edi+offset]
- mov ax,[bx+di+8]
-
-NASM will cope with some fairly strange effective addresses, if you
-try it: provided your effective address expression evaluates
-_algebraically_ to something that the instruction set supports, it
-will be able to assemble it. For example,
-
- mov eax,[ebx*5] ; actually assembles to [ebx+ebx*4]
- mov ax,[bx-si+2*si] ; actually assembles to [bx+si]
-
-will both work.
-
-There is an ambiguity in the instruction set, which allows two forms
-of 32-bit effective address with equivalent meaning:
-
- mov eax,[2*eax+0]
- mov eax,[eax+eax]
-
-These two expressions clearly refer to the same address. The
-difference is that the first one, if assembled `as is', requires a
-four-byte offset to be stored as part of the instruction, so it
-takes up more space. NASM will generate the second (smaller) form
-for both of the above instructions, in an effort to save space.
-There is not, currently, any means for forcing NASM to generate the
-larger form of the instruction.
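-
-The algebra described in this section can be sketched in Python: add
-up the coefficient of each register, then look for a split of the
-form base + index*scale. The function `canonical_ea' is hypothetical
-and deliberately simplified. It ignores which register combinations
-are actually legal: 16-bit addressing allows only a few base/index
-pairs and no scaling, and ESP can never be an index.
```python
from collections import Counter


def canonical_ea(terms):
    """Reduce a sum of coeff*register terms to (base, index, scale).

    terms: (register, coefficient) pairs; bx-si+2*si is written as
    [("bx", 1), ("si", -1), ("si", 2)]. An unused slot is None.
    """
    coeffs = Counter()
    for reg, c in terms:
        coeffs[reg] += c
    regs = [(r, c) for r, c in coeffs.items() if c != 0]

    if len(regs) == 1:
        [(reg, c)] = regs
        if c == 1:
            return (reg, None, 1)
        if c in (2, 3, 5, 9):
            # reg*c == reg + reg*(c-1). For c == 2 this yields [reg+reg],
            # which needs no 32-bit displacement, unlike [reg*2+0].
            return (reg, reg, c - 1)
        if c in (4, 8):
            # Index-only form; the encoding then needs a disp32.
            return (None, reg, c)
    elif len(regs) == 2:
        (r1, c1), (r2, c2) = regs
        if c1 == 1 and c2 in (1, 2, 4, 8):
            return (r1, r2, c2)
        if c2 == 1 and c1 in (1, 2, 4, 8):
            return (r2, r1, c1)
    raise ValueError("not expressible as base + index*scale")


print(canonical_ea([("ebx", 5)]))
```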
-
-An alternative syntax is supported, in which prefixing an operand
-with `&' is synonymous with enclosing it in square brackets. The
-square bracket syntax is the recommended one, however, and is the
-syntax generated by NDISASM. But, for example, `mov eax,&ebx+ecx' is
-equivalent to `mov eax,[ebx+ecx]'.
-
-Mixing 16 and 32 Bit Code: Unusual Instruction Sizes
-====================================================
-
-A number of assemblers seem to have trouble assembling instructions
-that use a different operand or address size from the one they are
-expecting; as86 is a good example, even though the Linux kernel boot
-process (which is assembled using as86) needs several such
-instructions and as86 can't do them.
-
-Instructions such as `mov eax,2' in 16-bit mode are easy, of course,
-and NASM can do them just as well as any other assembler. The
-difficult instructions are things like far jumps.
-
-Suppose you are in a 16-bit segment, in protected mode, and you want
-to execute a far jump to a point in a 32-bit segment. You need to
-code a 32-bit far jump in a 16-bit segment; not all assemblers will
-easily support this. NASM can, by means of the `word' and `dword'
-specifiers. So you can code
-
- jmp 1234h:5678h ; this uses the default segment size
- jmp word 1234h:5678h ; this is guaranteed to be 16-bit
- jmp dword 1234h:56789ABCh ; and this is guaranteed 32-bit
-
-and NASM will generate correct code for them.
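-
-The difference between the three forms shows up in the bytes
-generated. Below is a Python sketch of the direct far JMP encoding:
-opcode 0xEA, then the offset, then the segment. It assumes the code
-is being assembled in a 16-bit segment unless stated otherwise, and
-`far_jmp' is a hypothetical name.
```python
import struct


def far_jmp(segment: int, offset: int, op_size: int,
            code_bits: int = 16) -> bytes:
    """Encode a direct far JMP to segment:offset.

    op_size is 16 for `jmp word ...` or 32 for `jmp dword ...`;
    code_bits is the default size of the segment being assembled.
    A 0x66 operand-size prefix is emitted when the two differ.
    """
    prefix = b"" if op_size == code_bits else b"\x66"
    fmt = "<BHH" if op_size == 16 else "<BIH"   # opcode, offset, segment
    return prefix + struct.pack(fmt, 0xEA, offset, segment)


print(far_jmp(0x1234, 0x56789ABC, 32).hex())
```
-In a 16-bit segment, then, the `dword' form is eight bytes long: a
-0x66 prefix, the opcode, a four-byte offset and a two-byte segment.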
-
-Similarly, if you are coding in a 16-bit code segment, but trying to
-access memory in a 32-bit data segment, your effective addresses
-will want to be 32-bit. Of course as soon as you specify an
-effective address containing a 32-bit register, like `[eax]', the
-addressing is forced to be 32-bit anyway. But if you try to specify
-a simple offset, such as `[label]' or `[0x10000]', you will get the
-default address size, which in this case will be wrong. However,
-NASM allows you to code `[dword 0x10000]' to force a 32-bit address
-size, or conversely `[word wlabel]' to force 16 bits.
-
-Be careful not to confuse `word' and `dword' _inside_ the square
-brackets with _outside_: consider the instruction
-
- mov word [dword 0x123456],0x7890
-
-which moves 16 bits of data to an address specified by a 32-bit
-offset. There is no contradiction between the `word' and `dword' in
-this instruction, since they modify different aspects of the
-functionality. Or, even more confusingly,
-
- call dword far [fs:word 0x4321]
-
-which takes an address specified by a 16-bit offset, and extracts a
-48-bit DWORD FAR pointer from it to call.
-
-Using this effective-address syntax, the `dword' or `word' override
-may come before or after the segment override if any: NASM isn't
-fussy. Hence:
-
- mov ax,[fs:dword 0x123456]
- mov ax,[dword fs:0x123456]
-
-are equivalent forms, and generate the same code.
-
-The LOOP instruction comes in strange sizes, too: in a 16-bit
-segment it uses CX as its count register by default, and in a 32-bit
-segment it uses ECX. But it's possible to do either one in the other
-segment, and NASM will cope by letting you specify the count
-register as a second operand:
-
- loop label ; uses CX or ECX depending on mode
- loop label,cx ; always uses CX
- loop label,ecx ; always uses ECX
-
-Finally, the string instructions LODSB, STOSB, MOVSB, CMPSB, SCASB,
-INSB, and OUTSB can all have strange address sizes: typically, in a
-16-bit segment they read from [DS:SI] and write to [ES:DI], and in a
-32-bit segment they read from [DS:ESI] and write to [ES:EDI].
-However, this can be changed by the use of the explicit address-size
-prefixes `a16' and `a32'. These prefixes generate null code if used
-in the same size segment as they specify, but generate an 0x67
-prefix otherwise. Hence `a16' generates no code in a 16-bit segment,
-but 0x67 in a 32-bit one, and vice versa. So `a16 lodsb' will always
-generate code to read a byte from [DS:SI], no matter what the size
-of the segment. There are also explicit operand-size override
-prefixes, `o16' and `o32', which likewise generate a 0x66 byte only
-when needed, but these are provided for completeness and should never have
-to be used. (Note that NASM does not support the LODS, STOS, MOVS
-etc. forms of the string instructions.)
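-
-The prefix rule reduces to a comparison between the requested size
-and the segment's default size. Below is a Python sketch of this. The
-names are hypothetical, and it uses 0xAC, the opcode byte of LODSB.
```python
LODSB = 0xAC  # opcode byte of LODSB


def size_prefix(kind: str, requested: int, segment_bits: int) -> bytes:
    """Prefix bytes for a16/a32 (kind 'a') or o16/o32 (kind 'o').

    Nothing is emitted when the requested size matches the segment's
    default; otherwise 0x67 (address size) or 0x66 (operand size).
    """
    if requested == segment_bits:
        return b""
    return b"\x67" if kind == "a" else b"\x66"


def a16_lodsb(segment_bits: int) -> bytes:
    """Bytes for `a16 lodsb`, which always reads from [DS:SI]."""
    return size_prefix("a", 16, segment_bits) + bytes([LODSB])


print(a16_lodsb(16).hex(), a16_lodsb(32).hex())
```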
-
-Constants
-=========
-
-NASM can accept three kinds of constant: _numeric_, _character_ and
-_string_ constants.
-
-Numeric constants are simply numbers. NASM supports a variety of
-syntaxes for expressing numbers in strange bases: you can do any of
-
- 100 ; this is decimal
- 0x100 ; hex
- 100h ; hex as well
- $100 ; hex again
- 100q ; octal
- 100b ; binary
-
-NASM does not support A86's syntax of treating anything with a
-leading zero as hex, nor does it support the C syntax of treating
-anything with a leading zero as octal. Leading zeros make no
-difference to NASM. (Except that, as usual, if you have a hex
-constant beginning with a letter, and you want to use the trailing-H
-syntax to represent it, you have to use a leading zero so that NASM
-will recognise it as a number instead of a label.)
-
-The `x' in `0x100', and the trailing `h', `q' and `b', may all be
-upper case if you want.
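-
-A Python sketch of these rules follows. It is a model, not NASM's
-parser, and `parse_number' is a hypothetical name. The rule that a
-`$' prefix must be followed by a digit is an assumption: it keeps
-`$' usable as the identifier prefix described in `Writing Programs
-with NASM'.
```python
import re
from typing import Optional

# (pattern, base, has_suffix), applied to the lower-cased token.
SUFFIX_FORMS = [
    (r"[0-9][0-9a-f]*h", 16, True),   # trailing h: must start with a digit
    (r"[0-7]+q", 8, True),            # octal
    (r"[01]+b", 2, True),             # binary
    (r"[0-9]+", 10, False),           # decimal; leading zeros are ignored
]


def parse_number(token: str) -> Optional[int]:
    """Value of a numeric constant, or None if `token` is not one
    (for instance `ffh`, which would be read as a label)."""
    t = token.lower()
    if t.startswith("0x") and re.fullmatch(r"[0-9a-f]+", t[2:]):
        return int(t[2:], 16)
    if re.fullmatch(r"\$[0-9][0-9a-f]*", t):
        return int(t[1:], 16)
    for pattern, base, has_suffix in SUFFIX_FORMS:
        if re.fullmatch(pattern, t):
            return int(t[:-1] if has_suffix else t, base)
    return None


print([parse_number(s) for s in ("100", "100h", "100q", "100b")])
```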
-
-Character constants consist of up to four characters enclosed in
-single or double quotes. No escape character is defined for
-including the quote character itself: if you want to declare a
-character constant containing a double quote, enclose it in single
-quotes, and vice versa.
-
-Character constants' values are worked out in terms of a
-little-endian computer: if you code
-
- mov eax,'abcd'
-
-then if you were to examine the binary output from NASM, it would
-contain the visible string `abcd', which of course means that the
-actual value loaded into EAX would be 0x64636261, not 0x61626364.
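-
-In other words, the first character lands in the lowest byte. Below
-is a Python sketch of this (`char_const' is a hypothetical name).
```python
def char_const(text: str) -> int:
    """Numeric value of a character constant of up to four characters.

    The first character goes in the lowest byte, so storing the value
    little-endian reproduces the characters in source order.
    """
    raw = text.encode("ascii")
    if len(raw) > 4:
        raise ValueError("character constants hold at most four characters")
    return int.from_bytes(raw, "little")


print(hex(char_const("abcd")))
```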
-
-String constants are like character constants, only more so: if a
-character constant appearing as operand to a DB, DW or DD is longer
-than the word size involved (1, 2 or 4 respectively), it will be
-treated as a string constant instead, which is to say the
-concatenation of separate character constants.
-
-For example,
-
- db 'hello, world'
-
-declares a twelve-character string constant. And
-
- dd 'dontpanic'
-
-(a string constant) is equivalent to writing
-
- dd 'dont','pani','c'
-
-(three character constants), so that what actually gets assembled is
-equivalent to
-
- db 'dontpanic',0,0,0
-
-(It's worth noting that one of the reasons for the reversal of
-character constants is so that the instruction `dw "ab"' has the
-same meaning whether "ab" is treated as a character constant or a
-string constant. Hence there is less confusion.)
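-
-Put another way, a string operand is zero-padded to a multiple of the
-unit size. Below is a Python sketch of this, using the hypothetical
-name `string_const'.
```python
def string_const(text: str, unit: int) -> bytes:
    """Bytes produced by a string operand to DB (unit 1), DW (2) or DD (4).

    The string is split into unit-sized character constants and the
    last one is zero-padded, which amounts to padding the whole string
    with zero bytes up to a multiple of the unit size.
    """
    raw = text.encode("ascii")
    padding = -len(raw) % unit   # bytes needed to reach the next multiple
    return raw + bytes(padding)


print(string_const("dontpanic", 4))
```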
-
-Expressions
-===========
-
-Expressions in NASM can be formed of the following operators: `|'
-(bitwise OR), `^' (bitwise XOR), `&' (bitwise AND), `<<' and `>>'
-(logical bit shifts), `+', `-', `*' (ordinary addition, subtraction
-and multiplication), `/', `%' (unsigned division and modulo), `//',
-`%%' (signed division and modulo), `~' (bitwise NOT), and the
-operators SEG and WRT (see `SEG and WRT' below).
-
-The order of precedence is:
-
- |                          (lowest)
- ^
- &
- << >>
- binary + and -
- * / % // %%
- unary + and -, ~, SEG     (highest)
-
-As usual, operators within a precedence level associate to the left
-(i.e. `2-3-4' evaluates the same way as `(2-3)-4').
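-
-The following small precedence-climbing evaluator in Python shows how
-the precedence table and left associativity interact. It is a sketch,
-not NASM's evaluator. It handles decimal numbers only, and leaves out
-SEG, WRT and the signed operators `//' and `%%'. It also uses
-Python's unbounded integers instead of fixed-width unsigned
-arithmetic, so `/' and `%' only match for non-negative operands.
```python
import re

# Binary operator levels, lowest precedence first.
LEVELS = [["|"], ["^"], ["&"], ["<<", ">>"], ["+", "-"], ["*", "/", "%"]]

BINARY = {
    "|": lambda a, b: a | b,   "^": lambda a, b: a ^ b,
    "&": lambda a, b: a & b,   "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b, "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,   "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,  "%": lambda a, b: a % b,
}

TOKEN = re.compile(r"\s*(\d+|<<|>>|[|^&+\-*/%~()])")


def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(text: str) -> int:
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def level(i):
        if i == len(LEVELS):
            return unary()
        left = level(i + 1)
        while peek() in LEVELS[i]:   # looping here gives left associativity
            op = take()
            left = BINARY[op](left, level(i + 1))
        return left

    def unary():                     # unary operators bind tightest
        tok = take()
        if tok == "-":
            return -unary()
        if tok == "+":
            return unary()
        if tok == "~":
            return ~unary()
        if tok == "(":
            value = level(0)
            if take() != ")":
                raise SyntaxError("expected )")
            return value
        return int(tok)

    result = level(0)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return result


print(evaluate("1<<2+1"))
```
-For example, `1<<2+1' evaluates to 8, because binary `+' binds more
-tightly than `<<'.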
-
-Note that since the `%' character is used by the preprocessor, it's
-worth making sure that the `%' and `%%' operators are followed by a
-space, to prevent the preprocessor trying to interpret them as
-macro-related things.
-
-A form of algebra is done by NASM when evaluating expressions: I
-have already stated that an effective address expression such as
-`[EAX*6-EAX]' will be recognised by NASM as algebraically equivalent
-to `[EAX*4+EAX]', and assembled as such. In addition, algebra can be
-done on labels as well: `label2*2-label1' is an acceptable way to
-define an address as far beyond `label2' as `label1' is before it.
-(In less algebraically capable assemblers, one might have to write
-that as `label2 + (label2-label1)', where the value of every
-sub-expression is either a valid address or a constant. NASM can of
-course cope with that version as well.)
-
-Expressions may also contain the special token `$', known as a Here
-token, which always evaluates to the address of the current assembly
-point. (That is, the address of the assembly point _before_ the
-current instruction gets assembled.) The special token `$$'
-evaluates to the address of the beginning of the current section;
-this can be used for alignment, as shown below:
-
- times ($$-$) & 3 nop ; pad with NOPs to 4-byte boundary
-
-Note that this technique aligns to a four-byte boundary with respect
-to the beginning of the _segment_; if you can't guarantee that the
-segment itself begins on a four-byte boundary, this alignment is
-useless or worse. Be sure you know what kind of alignment you can
-guarantee to get out of your linker before you start trying to use
-TIMES to align to page boundaries. (Of course, the `obj' and `os2'
-file formats can happily cope with page alignment, provided you
-specify that segment attribute.)
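-
-The padding count can be modelled in Python to confirm that it always
-lands on a boundary relative to the section start. `align_padding' is
-a hypothetical name. Python's `&' on a negative integer behaves like
-two's complement, so the low bits match the assembler's fixed-width
-arithmetic.
```python
def align_padding(section_start: int, here: int, boundary: int = 4) -> int:
    """Count produced by `times ($$-$) & (boundary-1) nop`.

    boundary must be a power of two.
    """
    return (section_start - here) & (boundary - 1)


print(align_padding(0x102, 0x105))
```
-With a section starting at 0x102, `align_padding(0x102, 0x105)' is 1,
-which moves the assembly point to 0x106. That is aligned relative to
-the section but not absolutely, which is exactly the caveat above.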
-
-SEG and WRT
-===========
-
-NASM contains the capability for its object file formats (currently,
-only `obj' and its variant `os2' make use of this) to permit
-programs to directly refer to the segment-base values of their
-segments. This is achieved either by the object format defining the
-segment names as symbols (`obj' and `os2' do this), or by the use of
-the SEG operator.
-
-SEG is a unary prefix operator which, when applied to a symbol
-defined in a segment, will yield the segment base value of that
-segment. (In `obj' and `os2' format, symbols defined in segments
-which are grouped are considered to be primarily a member of the
-_group_, not the segment, and the return value of SEG reflects
-this.)
-
-SEG may be used for far pointers: it is guaranteed that for any
-symbol `sym', using the offset `sym' from the segment base `SEG sym'
-yields a correct pointer to the symbol. Hence you can code a far
-call by means of
-
- CALL SEG routine:routine
-
-or store a far pointer in a data segment by
-
- DW routine, SEG routine
-
-For convenience, NASM supports the forms
-
- CALL FAR routine
- JMP FAR routine
-
-as direct synonyms for the canonical syntax
-
- CALL SEG routine:routine
- JMP SEG routine:routine
-
-No alternative syntax for
-
- DW routine, SEG routine
-
-is supported.
-
-Simply referring to `sym', for some symbol, will return the offset
-of `sym' from its _preferred_ segment base (as returned from `SEG
-sym'); sometimes, you may want to obtain the offset of `sym' from
-some _other_ segment base. (E.g. the offset of `sym' from the base
-of the segment it's in, where normally you'd get the offset from a
-group base). This is accomplished using the WRT (With Reference To)
-keyword: if `sym' is defined in segment `seg' but you want its
-offset relative to the beginning of segment `seg2', you can do
-
- mov ax,sym WRT seg2
-
-The right-hand operand to WRT must be a segment-base value. You can
-also do `sym WRT SEG sym2' if you need to.
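-
-The arithmetic behind SEG and WRT is easiest to see with real-mode
-addresses, where a segment base is a paragraph number and the linear
-address is segment*16 + offset. The Python model below assumes
-exactly that; in `obj' output the actual values are filled in by the
-linker. Its names and example addresses are hypothetical.
```python
import struct


def linear(segment: int, offset: int) -> int:
    """Real-mode linear address of segment:offset."""
    return segment * 16 + offset


def offset_wrt(sym_linear: int, base_segment: int) -> int:
    """Value of `sym WRT base`: the symbol's offset from another base."""
    off = sym_linear - base_segment * 16
    if not 0 <= off <= 0xFFFF:
        raise ValueError("symbol is not addressable from that base")
    return off


def far_pointer(offset: int, segment: int) -> bytes:
    """Bytes of `DW routine, SEG routine`: offset word, then segment word."""
    return struct.pack("<HH", offset, segment)


GROUP, SEGMENT = 0x1200, 0x1230   # hypothetical group and segment bases
routine = 0x12345                 # hypothetical linear address of `routine`
print(hex(offset_wrt(routine, GROUP)), hex(offset_wrt(routine, SEGMENT)))
```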
-
-Critical Expressions
-====================
-
-NASM is a two-pass assembler: it goes over the input once to
-determine the location of all the symbols, then once more to
-actually generate the output code. Most expressions are
-non-critical, in that if they contain a forward reference and hence
-their correct value is unknown during the first pass, it doesn't
-matter. However, arguments to RESB, RESW, RESD, RESQ and REST, and
-the argument to the TIMES prefix, can actually affect the _size_ of the generated
-code, and so it is critical that the expression can be evaluated
-correctly on the first pass. So in these situations, expressions may
-not contain forward references. This prevents NASM from having to
-sort out a mess such as
-
- times (label-$) db 0
-label: db 'where am I?'
-
-in which the TIMES argument could equally legally evaluate to
-_anything_, or perhaps even worse,
-
- times (label-$+1) db 0
-label: db 'NOW where am I?'
-
-in which any value for the TIMES argument is by definition invalid.
-
-Since NASM is a two-pass assembler, this criticality condition also
-applies to the argument to EQU. Suppose this were not the case, and
-we had the setup
-
- mov ax,a
-a equ b
-b:
-
-On pass one, `a' cannot be defined properly, since `b' is not known
-yet. On pass two, `b' is known, so line 2 can define `a' properly.
-Unfortunately, line 1 already needed the value of `a' on pass one,
-so this code will not assemble using only two passes.
-
-There's a related issue: in an effective address such as
-`[eax+offset]', the value of `offset' can be stored as either 1 or 4
-bytes. NASM will use the one-byte form if it knows it can, to save
-space, but will therefore be fooled by the following:
-
- mov eax,[ebx+offset]
-offset equ 10
-
-In this case, although `offset' is a small value and could easily
-fit into the one-byte form of the instruction, when NASM sees the
-instruction in the first pass it doesn't know what `offset' is, and
-for all it knows `offset' could be a symbol requiring relocation. So
-it will allocate the full four bytes for the value of `offset'. This
-can be solved by defining `offset' before it's used.
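The size decision described above can be modelled as follows. This is a simplified sketch. `disp_size' is a hypothetical helper, and it ignores special cases such as the zero-displacement form and the EBP base register.

```python
def disp_size(value):
    """Bytes reserved for the displacement in an address like [ebx+offset].

    value is the displacement if it is already known on pass one, or None
    if it is still a forward reference (which might even need relocation)."""
    if value is None:
        return 4                 # unknown on pass one: reserve the long form
    if -128 <= value <= 127:
        return 1                 # fits in a signed byte
    return 4

print(disp_size(None))   # 4: `offset equ 10` comes after the instruction
print(disp_size(10))     # 1: `offset equ 10` comes before the instruction
```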
-
-Local Labels
-============
-
-NASM takes its local label scheme mainly from the old Amiga
-assembler Devpac: a local label is one that begins with a period.
-The `localness' comes from the fact that local labels are associated
-with the previous non-local label, so that you may declare the same
-local label twice if a non-local one intervenes. Hence:
-
-label1 ; some code
-.loop ; some more code
- jne .loop
- ret
-label2 ; some code
-.loop ; some more code
- jne .loop
- ret
-
-In the above code, each `jne' instruction jumps back to the `.loop'
-label just above it, since the two `.loop' labels are distinct.
-
-NASM, however, introduces an extra capability not present in Devpac,
-which is that the local labels are actually _defined_ in terms of
-their associated non-local label. So if you really have to, you can
-write
-
-label3 ; some more code
- ; and some more
- jmp label1.loop
-
-So although local labels are _usually_ local, they can be referenced
-by their full name from anywhere in your program.
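Here is a minimal sketch of the naming rule. It assumes the input is simply the list of label names in source order. `qualify_labels' is a hypothetical helper, and it ignores labels beginning with `..', which NASM treats specially.

```python
def qualify_labels(labels):
    """Give each label its full name: a label starting with '.' is
    attached to the most recent non-local label (Devpac/NASM style)."""
    current, names = None, []
    for label in labels:
        if label.startswith("."):
            if current is None:
                # this sketch simply treats an unattached local label as an error
                raise ValueError(f"local label {label} has no non-local label before it")
            names.append(current + label)
        else:
            current = label
            names.append(label)
    return names

print(qualify_labels(["label1", ".loop", "label2", ".loop", "label3"]))
# ['label1', 'label1.loop', 'label2', 'label2.loop', 'label3']
```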
-
-Assembler Directives
-====================
-
-Assembler directives appear on a line by themselves (apart from a
-comment). They come in two forms: user-level directives and
-primitive directives. Primitive directives are enclosed in square
-brackets (no white space may appear before the opening square
-bracket, although white space and a comment may come after the
-closing bracket), and were the only form of directive supported by
-earlier versions of NASM. User-level directives look the same, only
-without the square brackets, and are the more modern form. (They are
-implemented as macros expanding to primitive directives.) There is a
-distinction in functionality, which is explained below in the
-section on structures.
-
-Some directives are universal: they may be used in any situation,
-and do not change their syntax. The universal directives are listed
-below.
-
-`BITS 16' or `BITS 32' switches NASM into 16-bit or 32-bit mode.
-(This is equivalent to USE16 and USE32 segments, in TASM or MASM.)
-In 32-bit mode, instructions are prefixed with 0x66 or 0x67 prefixes
-when they use 16-bit data or addresses; in 16-bit mode, the reverse
-happens. NASM's default depends on the object format; the defaults
-are documented with the formats. (See `obj' and `os2', in
-particular, for some unusual behaviour.)
-
-`SECTION name' or `SEGMENT name' changes which section the code you
-write will be assembled into. Acceptable section names vary between
-output formats, but most formats (all except `obj' and `os2')
-support the names `.text', `.data' and `.bss'. Note that `.bss' is
-an uninitialised data section, and so you will receive a warning
-from NASM if you try to assemble any code or data in it. The only
-thing you can do in `.bss' without triggering a warning is to use
-RESB, RESW and RESD. That's what they're for.
-
-`ABSOLUTE address' can be considered a different form of `SECTION',
-in that it must be overridden using a SECTION directive once you
-have finished using it. It is used to assemble notional code at an
-absolute offset address; of course, you can't actually assemble
-_code_ there, since no object file format is capable of putting the
-code in place, but you can use RESB, RESW and RESD, and you can
-define labels. Hence you could, for example, define a C-like data
-structure by means of
-
- absolute 0
- stLong resd 1
- stWord resw 1
- stByte1 resb 1
- stByte2 resb 1
- st_size:
- segment .text
-
-and then carry on coding. This defines `stLong' to be zero, `stWord'
-to be 4, `stByte1' to be 6, `stByte2' to be 7 and `st_size' to be 8.
-So this has defined a data structure. The STRUC directive provides a
-nicer way to do this: see below.
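The offsets quoted above follow from a simple running counter. The sketch below is a model, not NASM's implementation. It takes (label, directive, count) triples, which is an assumed input format.

```python
RES_SIZES = {"resb": 1, "resw": 2, "resd": 4}

def layout(fields, origin=0):
    """Offsets assigned by `absolute origin` followed by RESx lines.

    fields: (label, directive, count) triples in source order."""
    offsets, pos = {}, origin
    for label, directive, count in fields:
        offsets[label] = pos                   # the label takes the current $
        pos += RES_SIZES[directive] * count    # RESx advances $
    return offsets, pos                        # pos is where st_size lands

offsets, st_size = layout([("stLong", "resd", 1), ("stWord", "resw", 1),
                           ("stByte1", "resb", 1), ("stByte2", "resb", 1)])
print(offsets, st_size)
# {'stLong': 0, 'stWord': 4, 'stByte1': 6, 'stByte2': 7} 8
```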
-
-`EXTERN symbol' defines a symbol as being `external', in the C
-sense: `EXTERN' states that the symbol is _not_ defined in this
-module, but is defined elsewhere, and that you wish to _reference_
-it in this module.
-
-`GLOBAL symbol' defines a symbol as being global, in the sense that
-it is exported from this module and other modules may reference it.
-All symbols are local, unless declared as global. Note that the
-`GLOBAL' directive must appear before the definition of the symbol
-it refers to.
-
-`COMMON symbol size' defines a symbol as being common: it is
-declared to have the given size, and it is merged at link time with
-any declarations of the same symbol in other modules. This is not
-_fully_ supported in the `obj' or `os2' file format: see the section
-on `obj' for details.
-
-`STRUC structure' begins the definition of a data structure, and
-`ENDSTRUC' ends it. A structure similar to the one shown above, with
-the two byte fields replaced by a single byte field and a 32-byte
-string field, may be defined using STRUC as follows:
-
- struc st
- stLong resd 1
- stWord resw 1
- stByte resb 1
- stStr resb 32
- endstruc
-
-Notice that this code still defines the symbol `st_size' to be the
-size of the structure (here 4+2+1+32 = 39 bytes). The `_size' suffix
-is automatically appended to the structure name. Notice also that the
-assembler takes care of
-remembering which section you were assembling in (whereas in the
-version using `ABSOLUTE' it was up to the programmer to sort that
-out).
-
-`ISTRUC structure' begins the declaration of an initialised instance
-of a data structure. You can then use the `AT' macro to assign
-values to the structure members, and `IEND' to finish. So, for
-example, given the structure `st' above:
-
- istruc st
- at stLong, dd 0x1234
- at stWord, dw 23
- at stByte, db 'q'
- at stStr, db 'hello, world', 13, 10, 0
- iend
-
-Note that there's nothing stopping the instruction after `at' from
-overflowing on to the next line if you want. So the above example
-could just as well have contained
-
- at stStr, db 'hello, world'
- db 13, 10, 0
-
-or even (if you prefer this style)
-
- at stStr
- db 'hello, world'
- db 13, 10, 0
-
-Note also that the `ISTRUC' mechanism is implemented as a set of
-macros, and uses TIMES internally to achieve its effect; so the
-structure fields must be initialised in the same order in which they
-were defined.
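Because AT works by TIMES padding, an instance can be modelled as "pad up to the field's offset, then emit its data". This also shows why out-of-order fields fail: the padding count goes negative. The sketch assumes IEND pads the instance out to `st_size'. `istruc' here is a hypothetical Python helper, not the macro itself.

```python
def istruc(field_offsets, size, inits):
    """Build an initialised structure instance.

    field_offsets -- {field: offset} as produced by the STRUC definition
    size          -- the structure's _size value
    inits         -- (field, bytes) pairs in source order
    """
    out = bytearray()
    for field, data in inits:
        # AT pads up to the field, like `times field-($-start) db 0`
        pad = field_offsets[field] - len(out)
        if pad < 0:
            raise ValueError(f"field {field!r} initialised out of order")
        out += bytes(pad) + data
    if len(out) > size:
        raise ValueError("initialisers overflow the structure")
    out += bytes(size - len(out))   # assumed: IEND pads out to st_size
    return bytes(out)

# Offsets of the `st` structure defined with STRUC above.
ST = {"stLong": 0, "stWord": 4, "stByte": 6, "stStr": 7}
ST_SIZE = 39

inst = istruc(ST, ST_SIZE, [
    ("stLong", (0x1234).to_bytes(4, "little")),   # dd 0x1234
    ("stWord", (23).to_bytes(2, "little")),       # dw 23
    ("stByte", b"q"),                             # db 'q'
    ("stStr", b"hello, world\r\n\x00"),           # db 'hello, world', 13, 10, 0
])
print(len(inst))   # 39
```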
-
-This is where user-level directives differ from primitives: the
-`SECTION' (and `SEGMENT') user-level directives don't just call the
-primitive versions, but they also `%define' the special preprocessor
-symbol `__SECT__' to be the primitive directive that specifies the
-current section. So the `ENDSTRUC' directive can remember what
-section the assembly was directed to before the structure definition
-began. For this reason, there is no primitive version of STRUC or
-ENDSTRUC - they are implemented in terms of ABSOLUTE and SECTION.
-This also means that if you use STRUC before explicitly announcing a
-target section, you should explicitly announce one after ENDSTRUC.
-
-Directives may also be specific to the output file format. At
-present, the `bin', `obj' and `os2' formats define extra directives,
-which are specified below.
-
-The Preprocessor
-================
-
-NASM contains a full-featured macro preprocessor, which supports
-conditional assembly, multi-level file inclusion, two forms of macro
-(single-line and multi-line), and a `context stack' mechanism for
-extra macro power. Preprocessor directives all begin with a `%'
-sign.
-
-Single-line macros
-------------------
-
-Single-line macros are defined in a similar way to C, using the
-`%define' command. Hence you can do:
-
- %define ctrl 0x1F &
- %define param(a,b) ((a)+(a)*(b))
- mov byte [param(2,ebx)], ctrl 'D'
-
-which will expand to
-
- mov byte [((2)+(2)*(ebx))], 0x1F & 'D'
-
-When the expansion of a single-line macro contains tokens which
-invoke another macro, the expansion is performed at invocation time,
-not at definition time. Thus the code
-
- %define a(x) 1+b(x)
- %define b(x) 2*x
- mov ax,a(8)
-
-will evaluate in the expected way to `mov ax,1+2*8', even though the
-macro `b' wasn't defined at the time of definition of `a'.
-
-Macros defined with `%define' are case sensitive: after `%define foo
-bar', only `foo' will expand to bar: `Foo' or `FOO' will not. By
-using `%idefine' instead of `%define' (the `i' stands for
-`insensitive') you can define all the case variants of a macro at
-once, so that `%idefine foo bar' would cause `foo', `Foo' and `FOO'
-all to expand to `bar'.
-
-There is a mechanism which detects when a macro call has occurred as
-a result of a previous expansion of the same macro, to guard against
-circular references and infinite loops. If this happens, the
-preprocessor will only expand the first occurrence of the macro.
-Hence:
-
- %define a(x) 1+a(x)
- mov ax,a(3) ; becomes 1+a(3) and expands no further
-
-This can be useful for doing things like this:
-
- %macro extrn 1 ; see next section for explanation of `%macro'
- extern _%1
- %define %1 _%1
- %endmacro
-
-which would avoid having to put leading underscores on external
-variables, because you could just code
-
- extrn foo
- mov ax,foo
-
-and it would expand as
-
- extern _foo
- %define foo _foo
- mov ax,foo ; becomes mov ax,_foo as required
-
-Single-line macros with parameters can be overloaded: it is possible
-to define two or more single-line macros with the same name, each
-taking a different number of parameters, and the macro processor
-will be able to distinguish between them. However, a parameterless
-single-line macro excludes the possibility of any macro of the same
-name _with_ parameters, and vice versa (though single-line macros
-may be redefined, keeping the same number of parameters, without
-error).
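The single-line macro rules above can be sketched as a small Python class. It models three rules: expansion at invocation time, the guard against re-expanding a macro inside its own expansion, and overloading by parameter count. It is a toy under stated assumptions: no nested parentheses in arguments, no expansion inside the arguments of a recursive call, and case-sensitive names only. `SingleLineMacros' is a hypothetical name.

```python
import re

# An identifier, optionally followed by a simple argument list (no nesting).
CALL = re.compile(r"\b([A-Za-z_]\w*)(?:\(([^()]*)\))?")
WORD = re.compile(r"\w+")

class SingleLineMacros:
    """Toy model of %define: overloading, late expansion, recursion guard."""

    def __init__(self):
        self.table = {}                    # (name, nparams) -> (params, body)

    def define(self, name, params, body):
        params = tuple(params)
        for other, n in self.table:
            # a parameterless form excludes parameterised forms, and vice versa
            if other == name and (n == 0) != (len(params) == 0):
                raise ValueError(f"{name}: parameterless and parameterised forms clash")
        self.table[(name, len(params))] = (params, body)   # redefinition is allowed

    def expand(self, text, active=frozenset()):
        def repl(m):
            name, args = m.group(1), m.group(2)
            actual = [] if args is None else [a.strip() for a in args.split(",")]
            key = (name, len(actual))
            if key not in self.table or name in active:
                return m.group(0)          # not a macro call, or a recursive one
            params, body = self.table[key]
            mapping = dict(zip(params, actual))
            body = WORD.sub(lambda t: mapping.get(t.group(0), t.group(0)), body)
            # rescan now, at invocation time, with this macro marked as active
            return self.expand(body, active | {name})
        return CALL.sub(repl, text)

m = SingleLineMacros()
m.define("ctrl", [], "0x1F &")
m.define("param", ["a", "b"], "((a)+(a)*(b))")
print(m.expand("mov byte [param(2,ebx)], ctrl 'D'"))
# mov byte [((2)+(2)*(ebx))], 0x1F & 'D'

late = SingleLineMacros()
late.define("a", ["x"], "1+b(x)")      # b is not defined yet
late.define("b", ["x"], "2*x")
print(late.expand("mov ax,a(8)"))      # mov ax,1+2*8

loop = SingleLineMacros()
loop.define("a", ["x"], "1+a(x)")
print(loop.expand("mov ax,a(3)"))      # mov ax,1+a(3)
```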
-
-You can pre-define single-line macros using the `-d' option on the
-NASM command line, such as
-
- nasm filename -dDEBUG
-
-(and then you might have various conditional-assembly bits under
-`%ifdef DEBUG'), or possibly
-
- nasm filename -dTYPE=4
-
-(which might allow you to re-assemble your code to do several
-different things depending on the value of TYPE).
-
-Multiple-line macros
---------------------
-
-These are defined using `%macro' and `%endmacro', so that simple things
-like this can be done:
-
- %macro prologue 0
- push ebp
- mov ebp,esp
- %endmacro
-
-This defines `prologue' to be a multi-line macro, taking no
-parameters, which expands to the two lines of code given.
-
-Similarly to single-line macros, multi-line macros are case-
-sensitive, unless you define them using `%imacro' instead of
-`%macro'.
-
-The `0' on the `%macro' line indicates that the macro `prologue'
-expects no parameters. Macros can be overloaded: if two macros are
-defined with the same name but different numbers of parameters, they
-will be treated as separate. Multi-line macros may not be redefined.
-
-The assembler will usually generate a warning if you code a line
-which looks like a macro call but involves a number of parameters
-which the macro in question isn't ready to support. (For example, if
-you code a macro `%macro foo 1' and also `%macro foo 3', then you
-write `foo a,b', a warning will be generated.) This feature can be
-disabled by the use of the command line option `-w-macro-params',
-since sometimes it's intentional (for example, you might define
-`%macro push 2' to allow you to push two registers at once; but
-`push ax' shouldn't then generate a warning).
-
-Macros taking parameters can be written using `%1', `%2' and so on
-to reference the parameters. So this code
-
- %macro movs 2
- push %2
- pop %1
- %endmacro
- movs ds,cs
-
-will define a macro `movs' to perform an effective MOV operation
-from segment to segment register. The macro call given would of
-course expand to `push cs' followed by `pop ds'.
-
-You can define a label inside a macro in such a way as to make it
-unique to that macro call (so that repeated calls to the same macro
-won't produce multiple labels with the same name), by prefixing it
-with `%%'. So:
-
- %macro retz 0
- jnz %%skip
- ret
- %%skip:
- %endmacro
-
-This defines a different label in place of `%%skip' every time it's
-called. (Of course the above code could have easily been coded using
-`jnz $+3', but not in more complex cases...) The actual label
-defined would be `..@2345.skip', where 2345 is replaced by some
-number that changes with each macro call. Users are warned to avoid
-defining labels of this shape themselves.
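The renaming of `%%' labels can be modelled as a per-call counter. The `..@N.' prefix follows the shape described above, but the exact numbers NASM uses will differ. `MacroExpander' is a hypothetical helper.

```python
class MacroExpander:
    """Hypothetical model: rename %% labels uniquely for each macro call."""

    def __init__(self):
        self.calls = 0

    def expand(self, body_lines):
        self.calls += 1                        # a fresh number per invocation
        prefix = f"..@{self.calls}."
        return [line.replace("%%", prefix) for line in body_lines]

RETZ = ["jnz %%skip", "ret", "%%skip:"]        # body of the retz macro
ex = MacroExpander()
print(ex.expand(RETZ))   # ['jnz ..@1.skip', 'ret', '..@1.skip:']
print(ex.expand(RETZ))   # ['jnz ..@2.skip', 'ret', '..@2.skip:']
```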
-
-Sometimes you want a macro to be able to accept arbitrarily many
-parameters and lump them into one. This can be done using the `+'
-modifier on the `%macro' line:
-
- %macro fputs 2+
- [section .data] ; this is done as a primitive to avoid
- ; disturbing the __SECT__ define
- %%str db %2
- %%end:
- __SECT__ ; this expands to a whole [section xxx] primitive
- mov dx,%%str
- mov cx,%%end-%%str
- mov bx,%1
- call writefile
- %endmacro
- fputs [filehandle], "hi there", 13, 10
-
-This declares `fputs' to be a macro that accepts _at least two_
-parameters, and all parameters after the first one are lumped
-together as part of the last specified one (in this case %2). So in
-the macro call, `%1' expands to `[filehandle]' while `%2' expands to
-the whole remainder of the line: `"hi there", 13, 10'. Note also the
-switching of sections in the middle of this macro expansion, to
-ensure separation of data and code.
-
-There is an alternative mechanism for putting commas in macro
-parameters: instead of specifying the large-parameter-ness at macro
-definition time, you can specify it at macro call time, by the use
-of braces to surround a parameter which you want to contain commas.
-So:
-
- %macro table_entry 2
- %%start:
- db %1
- times 32-($-%%start) db 0
- db %2
- times 64-($-%%start) db 0
- %endmacro
- table_entry 'foo','bar'
- table_entry 'megafoo', { 27,'[1mBAR!',27,'[m' }
-
-will expand to, effectively (actually, there will be labels present,
-but these have been omitted for clarity), the following:
-
- db 'foo'
- times 32-3 db 0
- db 'bar'
- times 64-35 db 0
- db 'megafoo'
- times 32-7 db 0
- db 27,'[1mBAR!',27,'[m'
- times 64-43 db 0
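The padding counts in the expansion can be checked by building the bytes directly. This sketch assumes the value 27 is a single byte (0x1B). `table_entry' is a hypothetical Python stand-in for the macro. Python's `bytes' raises an error on a negative count, so a string that overflows its 32-byte slot is caught.

```python
def table_entry(first, second):
    """Bytes produced by the table_entry macro: `first` padded to 32 bytes,
    then `second` padded out to 64 bytes in total."""
    out = bytearray(first)
    out += bytes(32 - len(out))     # times 32-($-%%start) db 0
    out += second
    out += bytes(64 - len(out))     # times 64-($-%%start) db 0
    return bytes(out)

second = b"\x1b[1mBAR!\x1b[m"        # 27,'[1mBAR!',27,'[m' is 11 bytes
entry = table_entry(b"megafoo", second)
print(len(second), 64 - (32 + len(second)))   # 11 21
```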
-
-Macro parameter expansions can be concatenated on to other tokens,
-so that you can do this:
-
- %macro keytab_entry 2
- keypos%1 equ $-keytab
- db %2
- %endmacro
- keytab:
- keytab_entry F1,128+1
- keytab_entry F2,128+2
- keytab_entry Return,13
-
-which will define labels called `keyposF1', `keyposF2' and
-`keyposReturn'. You can similarly do concatenations on the other
-end, such as `%1foo'. If you need to concatenate a digit on to the
-end of a macro parameter expansion, you can do this by enclosing the
-parameter number in braces: `%{1}' is always a valid synonym for
-`%1', and has the advantage that it can be legitimately prepended to
-a digit, as in `%{1}2', and cause no confusion with `%{12}'.
-Macro-specific labels and defines can be concatenated similarly:
-`%{%foo}bar' will succeed where `%%foobar' would cause confusion.
-(As it happens, `%%foobar' would work anyway, due to the format of
-macro-specific labels, but for clarity, `%{%foo}bar' is recommended
-if you _really_ want to do anything this perverse...)
-
-The parameter handling has a special case: it can treat a macro
-parameter specially if it's thought to contain a condition code. The
-reference `%+1' is identical to `%1' except that it will perform an
-initial sanity check to see if the parameter in question is a
-condition code; more usefully, the reference `%-1' will produce the
-_opposite_ condition code to the one specified in the parameter.
-This allows for things such as a conditional-MOV macro to be
-defined:
-
- %macro movc 3
- j%-1 %%skip
- mov %2,%3
- %%skip:
- %endmacro
- movc ae,ax,bx
-
-which will expand to something like
-
- jnae ..@1234.skip
- mov ax,bx
- ..@1234.skip:
-
-Note that `%+1' will allow CXZ or ECXZ to be passed as condition
-codes, but `%-1' will of course be unable to invert them.
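Here is a sketch of the inversion performed by `%-1', assuming the usual x86 condition-code spellings. Negated codes start with `n', except for the PE/PO pair. `invert_cc' is a hypothetical helper, and NASM may pick a different but equivalent spelling (for example `b' instead of `nae').

```python
CONDITION_CODES = {
    "o", "no", "b", "c", "nae", "ae", "nb", "nc", "e", "z", "ne", "nz",
    "be", "na", "a", "nbe", "s", "ns", "p", "pe", "np", "po",
    "l", "nge", "ge", "nl", "le", "ng", "g", "nle",
}

def invert_cc(cc):
    """Return the opposite condition code, in the spirit of %-1."""
    cc = cc.lower()
    if cc in ("cxz", "ecxz"):
        raise ValueError(f"{cc} is accepted by %+1 but cannot be inverted")
    if cc not in CONDITION_CODES:
        raise ValueError(f"{cc!r} is not a condition code")
    if cc in ("pe", "po"):
        return "po" if cc == "pe" else "pe"
    return cc[1:] if cc.startswith("n") else "n" + cc

print(invert_cc("ae"))   # nae, as in the movc example
```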
-
-Parameters can also be given default values. For example, you can
-define a macro like this:
-
- %macro strange 1-3 bx,3
- < some expansion text >
- %endmacro
-
-This macro takes between 1 and 3 parameters (inclusive); if
-parameter 2 is not specified it defaults to BX, and if parameter 3
-is not specified it defaults to 3. So the calls
-
- strange dx,si,di
- strange dx,si
- strange dx
-
-would be equivalent to
-
- strange dx,si,di
- strange dx,si,3
- strange dx,bx,3
-
-Defaults may be omitted, in which case they are taken to be blank.
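The defaulting rule can be expressed as a short function. `fill_params' is hypothetical. The defaults list supplies values for the optional parameters in order, and any that are missing are blank, as stated above.

```python
def fill_params(args, min_params, max_params, defaults):
    """Apply `%macro name min-max defaults...` defaulting to a call's arguments.

    defaults supplies values for parameters min_params+1 .. max_params in
    order; any parameter without a default is blank."""
    if not min_params <= len(args) <= max_params:
        raise ValueError(f"expected {min_params}-{max_params} parameters, got {len(args)}")
    filled = list(args)
    for i in range(len(args), max_params):   # 0-based index of each missing parameter
        k = i - min_params                   # its position in the defaults list
        filled.append(defaults[k] if k < len(defaults) else "")
    return filled

# %macro strange 1-3 bx,3
for call in (["dx", "si", "di"], ["dx", "si"], ["dx"]):
    print(fill_params(call, 1, 3, ["bx", "3"]))
# ['dx', 'si', 'di']
# ['dx', 'si', '3']
# ['dx', 'bx', '3']
```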
-
-`%endm' is a valid synonym for `%endmacro'.
-
-The specification for the number of macro parameters can be suffixed
-with `.nolist' if you don't want the macro to be explicitly expanded
-in listing files:
-
- %macro ping 1-2+.nolist
- ; some stuff
- %endmacro
-
-Standard Macros and `%clear'
-----------------------------
-
-NASM defines a set of standard macros, before the input file gets
-processed; these are primarily there in order to provide standard
-language features (such as structure support). However, it's
-conceivable that a user might want to write code that doesn't have
-the standard macros defined; you can achieve this by using the
-preprocessor directive `%clear' at the top of your program, which
-will undefine _everything_ that's defined by the preprocessor.
-
-Among the standard macros, NASM defines the symbols `__NASM_MAJOR__' and
-`__NASM_MINOR__' to be the major and minor version numbers of NASM.
-
-Conditional Assembly
---------------------
-
-Similarly to the C preprocessor, the commands `%ifdef' and `%endif'
-may be used to bracket a section of code, which will then only be
-assembled if at least one of the identifiers following `%ifdef' is
-defined as a single-line macro. The command `%ifndef' has the opposite
-sense to `%ifdef', and `%else' can be placed between the `%ifdef' and
-the `%endif' and works as expected. Since there is no analogue to C's
-`#if', there is no plain `%elif' directive, but `%elifdef' and
-`%elifndef' work as expected.
-
-There is another family of `%if' constructs: `%ifctx', `%ifnctx',
-`%elifctx' and `%elifnctx', which operate on the context stack
-(described below).
-
-File Inclusion
---------------
-
-You can include a file using the `%include' directive. Included
-files are searched for in the current directory, and then in all
-directories specified on the command line with the `-i' option.
-(Note that the directories specified on the command line are
-directly prepended to the filename, so they must include the
-necessary trailing slash under DOS or Unix, or the equivalent on
-other systems.)
-
-As in C, you will probably want to wrap the contents of an include
-file in the usual guard:
-
- %ifndef MY_MACROS_FILE
- %define MY_MACROS_FILE
- < go and define some macros >
- %endif
-
-and then elsewhere
-
- %include "my-macros-file"
- < some code making use of the macros >
-
-so that it doesn't matter if the file accidentally gets included
-more than once.
-
-You can force an include file to be included without using a
-`%include' command, by specifying it as a pre-include file on the
-command line using the `-p' option.
-
-The Context Stack
------------------
-
-This is a feature which adds a whole extra level of power to NASM's
-macro capability. The context stack is an internal object within the
-preprocessor, which holds a stack of `contexts'. Each context has a
-name - just an identifier-type token - and can also have labels and
-`%define' macros associated with it. Other macros can manipulate the
-context stack: this is where the power comes in.
-
-To start with: the preprocessor command `%push' will create a new
-context with the given name, and push it on to the top of the stack.
-`%pop', taking no arguments, pops the top context off the stack and
-destroys it. `%repl' renames the top context without destroying any
-associated labels or macros, so it's distinct from doing `%pop'
-followed by `%push'. Finally, `%ifctx' and `%ifnctx' invoke
-conditional assembly based on the name of the top context. (The
-alternative forms `%elifctx' and `%elifnctx' are also available.)
-
-As well as the `%%foo' syntax to define labels specific to a macro
-call, there is also the syntax `%$foo' to define a label specific to
-the context currently on top of the stack. `%$$foo' can be used to
-refer to the context below that, or `%$$$foo' below that, and so on.
-
-This lot allows the definition of macro combinations that enclose
-other code, such as the following big example:
-
- %macro if 1
- %push if
- j%-1 %$ifnot
- %endmacro
- %macro else 0
- %ifctx if
- %repl else
- jmp %$ifend
- %$ifnot:
- %else
- %error "expected `if' before `else'"
- %endif
- %endmacro
- %macro endif 0
- %ifctx if
- %$ifnot:
- %pop
- %elifctx else
- %$ifend:
- %pop
- %else
- %error "expected `if' or `else' before `endif'"
- %endif
- %endmacro
-
-This will cope with a plain `if/endif' construct _or_ an
-`if/else/endif', without flinching. So you can code:
-
- cmp ax,bx
- if ae
- cmp bx,cx
- if ae
- mov ax,cx
- else
- mov ax,bx
- endif
- else
- cmp ax,cx
- if ae
- mov ax,cx
- endif
- endif
-
-which will place the smallest out of AX, BX and CX into AX. Note the
-use of `%repl' to change the current context from `if' to `else'
-without disturbing the associated labels `%$ifend' and `%$ifnot';
-also note that the stack mechanism allows handling of nested IF
-statements without a hitch, and that conditional assembly is used in
-the `endif' macro in order to cope with the two possible forms with
-and without an `else'. Note also the directive `%error', which
-allows the user to report errors on improper invocation of a macro
-and so can catch unmatched `endif's at preprocess time.
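The bookkeeping behind `%push', `%repl', `%pop' and `%$' labels can be modelled like this. The `..@ctxN.' label format is invented for illustration. The point is that `%repl' renames a context without changing its identity, so labels created under `if' still resolve the same way under `else'.

```python
class ContextStack:
    """Toy model of the preprocessor's context stack (hypothetical names)."""

    def __init__(self):
        self.stack = []      # each entry is [name, unique_id]
        self.next_id = 0

    def push(self, name):                 # %push name
        self.next_id += 1
        self.stack.append([name, self.next_id])

    def repl(self, name):                 # %repl name: rename, keep identity
        self.stack[-1][0] = name

    def pop(self):                        # %pop
        self.stack.pop()

    def top_is(self, name):               # %ifctx name
        return bool(self.stack) and self.stack[-1][0] == name

    def label(self, ref):
        """Resolve '%$foo' (top context), '%$$foo' (the one below), ..."""
        if not ref.startswith("%$"):
            raise ValueError(f"{ref!r} is not a context-local reference")
        rest = ref[1:]
        depth = len(rest) - len(rest.lstrip("$"))
        if depth > len(self.stack):
            raise ValueError(f"context stack too shallow for {ref!r}")
        uid = self.stack[-depth][1]
        return f"..@ctx{uid}.{rest[depth:]}"

ctx = ContextStack()
ctx.push("if")                        # outer: if ae
outer = ctx.label("%$ifnot")          # '..@ctx1.ifnot'
ctx.push("if")                        # inner: if ae
inner = ctx.label("%$ifnot")          # '..@ctx2.ifnot'
ctx.repl("else")                      # inner: else
assert ctx.label("%$ifnot") == inner  # same label after %repl
ctx.pop()                             # inner: endif
print(ctx.top_is("if"), ctx.label("%$ifnot") == outer)   # True True
```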
-
-Output Formats
-==============
-
-The current output formats supported are `bin', `aout', `coff',
-`elf', `as86', `obj', `os2', `win32', `rdf', and the debug
-pseudo-format `dbg'.
-
-`bin': flat-form binary
------------------------
-
-This is at present the only output format that generates instantly
-runnable code: all the others produce object files that need linking
-before they become executable.
-
-`bin' output files contain no red tape at all: they simply contain
-the binary representation of the exact code you wrote.
-
-The `bin' format supports a format-specific directive, which is ORG.
-`ORG addr' declares that your code should be assembled as if it were
-to be loaded into memory at the address `addr'. So a DOS .COM file
-should state `ORG 0x100', and a DOS .SYS file should state `ORG 0'.
-There should be _one_ ORG directive, at most, in an assembly file:
-NASM does not support the use of ORG to jump around inside an object
-file, like MASM does (see the `Bugs' section for a demonstration of
-the use of MASM's form of ORG to do something that NASM's won't do.)
-
-Like almost all formats (but not `obj' or `os2'), the `bin' format
-defines the section names `.text', `.data' and `.bss'. The layout is
-that `.text' comes first in the output file, followed by `.data',
-and notionally followed by `.bss'. So if you declare a BSS section
-in a flat binary file, references to the BSS section will refer to
-space past the end of the actual file. The `.data' and `.bss'
-sections are considered to be aligned on four-byte boundaries: this
-is achieved by inserting padding zero bytes between the end of the
-text section and the start of the data, if there is data present. Of
-course if no SECTION directives are present, everything will go into
-`.text', and you will get nothing in the output except the code you
-wrote.
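The layout rule can be sketched as follows. It assumes that ORG simply adds its address to every offset and that the notional `.bss' start is also rounded up to a four-byte boundary. `bin_layout' is a hypothetical helper.

```python
def bin_layout(text_len, data_len, origin=0):
    """Addresses of .text, .data and .bss in a flat binary, plus file size.

    Assumes .data and .bss start on four-byte boundaries and that ORG
    simply adds `origin` to every offset."""
    def align4(n):
        return (n + 3) & ~3
    data_off = align4(text_len) if data_len else text_len  # pad only if data exists
    file_size = data_off + data_len
    bss_off = align4(file_size)        # .bss is notional: it lies past the file's end
    return {"text": origin, "data": origin + data_off,
            "bss": origin + bss_off, "file_size": file_size}

# A .COM file (ORG 0x100) with 0x13 bytes of code and 5 bytes of data
print(bin_layout(0x13, 5, origin=0x100))
# {'text': 256, 'data': 276, 'bss': 284, 'file_size': 25}
```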
-
-`bin' silently ignores GLOBAL directives, and will also not complain
-at EXTERN ones. You only get an error if you actually _reference_ an
-external symbol.
-
-Using the `bin' format, the default output filename is `filename'
-for inputs of `filename.asm'. If there is no extension to be
-removed, output will be placed in `nasm.out' and a warning will be
-generated.
-
-`bin' defaults to 16-bit assembly mode.
-
-`aout' and `elf': Linux object files
-------------------------------------
-
-These two object formats are the ones used under Linux. They have no
-format-specific directives, and their default output filename is
-`filename.o'.
-
-`aout' defines the three standard sections `.text', `.data' and
-`.bss'. `elf' also defines these three, but in addition it can
-support user-defined section names, which can be declared along with
-section attributes like this:
-
- section foo align=32 exec
- section bar write nobits
-
-The available options are:
-
-- A section can be `progbits' (the default) or `nobits'. `nobits'
- sections are BSS: their contents are not stored in the object
- file, and the only thing you can sensibly do in one is RESB.
- `progbits' are normal sections.
-
-- A section can be `exec' (indicating that it contains executable
- code), or `noexec' (the default).
-
-- A section can be `write' (indicating that it should be writable
- when linked), or `nowrite' (the default).
-
-- A section can be `alloc' (indicating that its contents should be
- loaded into program VM at load time; the default) or `noalloc'
- (for storing comments and things that don't form part of the
- loaded program).
-
-- You can specify a power of two for the section alignment by
- writing `align=64' or similar.
-
-The attributes of the default sections `.text', `.data' and `.bss'
-can also be redefined from their defaults. The NASM defaults are:
-
- section .text align=16 alloc exec nowrite progbits
- section .data align=4 alloc write noexec progbits
- section .bss align=4 alloc write noexec nobits
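Here is a sketch of how the attribute words above combine with the defaults listed for user-defined sections. `parse_section' is hypothetical. Leaving the alignment unset when none is given is an assumption, not a statement of NASM's default.

```python
DEFAULTS = {"bits": "progbits", "exec": False, "write": False,
            "alloc": True, "align": None}   # align=None: assumed "unspecified"

def parse_section(line):
    """Split `section name attr...` into the name and an attribute dict."""
    words = line.split()
    if len(words) < 2 or words[0].lower() not in ("section", "segment"):
        raise ValueError("not a section directive")
    name, attrs = words[1], dict(DEFAULTS)
    for word in (w.lower() for w in words[2:]):
        if word in ("progbits", "nobits"):
            attrs["bits"] = word
        elif word in ("exec", "noexec"):
            attrs["exec"] = word == "exec"
        elif word in ("write", "nowrite"):
            attrs["write"] = word == "write"
        elif word in ("alloc", "noalloc"):
            attrs["alloc"] = word == "alloc"
        elif word.startswith("align="):
            n = int(word[len("align="):], 0)
            if n <= 0 or n & (n - 1):
                raise ValueError(f"align={n} is not a power of two")
            attrs["align"] = n
        else:
            raise ValueError(f"unknown section attribute {word!r}")
    return name, attrs

print(parse_section("section foo align=32 exec"))
# ('foo', {'bits': 'progbits', 'exec': True, 'write': False, 'alloc': True, 'align': 32})
```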
-
-ELF is a much more featureful object-file format than a.out: in
-particular it has enough features to support the writing of position
-independent code by means of a global offset table, and position
-independent shared libraries by means of a procedure linkage table.
-Unfortunately NASM, as yet, does not support these extensions, and
-so NASM cannot be used to write shared library code under ELF. NASM
-also does not support the capability, in ELF, for specifying precise
-alignment constraints on common variables.
-
-Both `aout' and `elf' default to 32-bit assembly mode.
-
-`coff' and `win32': Common Object File Format
----------------------------------------------
-
-The `coff' format generates standard Unix COFF object files, which
-can be fed to (for example) the DJGPP linker. Its default output
-filename, like the other Unix formats, is `filename.o'.
-
-The `win32' format generates Microsoft Win32 (Windows 95 or
-Intel-platform Windows NT) object files, which nominally use the
-COFF standard, but in fact are not compatible. Its default output
-filename is `filename.obj'.
-
-`coff' and `win32' are not quite compatible formats, due to the fact
-that Microsoft's interpretation of the term `relative relocation'
-does not seem to be the same as the interpretation used by anyone
-else. It is therefore more correct to state that Win32 uses a
-_variant_ of COFF. The object files will not therefore produce
-correct output when fed to each other's linkers. (I've tried it!)
-
-In addition to this subtle incompatibility, Win32 also defines
-extensions to basic COFF, such as a mechanism for importing symbols
-from dynamic-link libraries at load time. NASM may eventually
-support this extension in the form of a format-specific directive.
-However, as yet, it does not. Neither the `coff' nor `win32' output
-formats have any specific directives.
-
-The Microsoft linker also has a small blind spot: it cannot
-correctly relocate a relative CALL or JMP to an absolute address.
-Hence all PC-relative CALLs or JMPs, when using the `win32' format,
-must have targets which are relative to sections, or to external
-symbols. You can't do
-
- call 0x123456
-
-_even_ if you happen to know that there is executable code at that
-address. The linker simply won't get the reference right; so in the
-interests of not generating incorrect code, NASM will not allow this
-form of reference to be written to a Win32 object file. (Standard
-COFF, or at least the DJGPP linker, seems to be able to cope with
-this contingency, although that may be due to the executable having
-a zero load address.)
-
-Note also that Borland Win32 compilers reportedly do not use this
-object file format: while Borland linkers will output Win32-COFF
-type executables, their object format is the same as the old DOS OBJ
-format. So if you are using a Borland compiler, don't use the
-`win32' object format, just use `obj' and declare all your segments
-as `USE32'.
-
-Both `coff' and `win32' support, in addition to the three standard
-section names `.text', `.data' and `.bss', the ability to define
-your own sections. Currently (this may change in the future) you can
-provide the options `text' (or `code'), `data' or `bss' to determine
-the type of section. Win32 also allows `info', which is an
-informational section type used by Microsoft C compilers to store
-linker directives. So you can do:
-
- section .mysect code ; defines an extra code section
-
-or maybe, in Win32,
-
- section .drectve info ; defines an MS-compatible directive section
- db '-defaultlib:LIBC -defaultlib:OLDNAMES '
-
-to pass directives to the MS linker.
-
-Both `coff' and `win32' default to 32-bit assembly mode.
-
-`obj' and `os2': Microsoft 16-bit Object Module Format
-------------------------------------------------------
-
-The `obj' format generates 16-bit Microsoft object files, suitable
-for feeding to 16-bit versions of Microsoft C, and probably
-TLINK as well (although that hasn't been tested). The Use32
-extensions are supported.
-
-`obj' defines no special segment names: you can call segments what
-you like. Unlike the other formats, too, segment names are actually
-defined as symbols, so you can write
-
- segment CODE
- mov ax,CODE
-
-and get the _segment_ address of the segment, suitable for loading
-into a segment register.
-
-Segments can be declared with attributes:
-
- SEGMENT CODE PRIVATE ALIGN=16 CLASS=CODE OVERLAY=OVL2 USE16
-
-You can specify segments to be PRIVATE, PUBLIC, COMMON or STACK;
-their alignment may be any power of two from 1 to 256 (although only
-1, 2, 4, 16 and 256 are really supported, so anything else gets
-rounded up to the next highest one of those); their class and
-overlay names may be specified. You may also specify segments to be
-USE16 or USE32. The defaults are PUBLIC ALIGN=1, no class, no
-overlay, USE16.
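The rounding rule for ALIGN= can be written down directly. `obj_align' is a hypothetical helper that simply encodes the list of supported values given above.

```python
SUPPORTED_ALIGNMENTS = (1, 2, 4, 16, 256)

def obj_align(requested):
    """Round an ALIGN= value up to one the obj format really supports."""
    if not 1 <= requested <= 256 or requested & (requested - 1):
        raise ValueError("alignment must be a power of two from 1 to 256")
    return next(a for a in SUPPORTED_ALIGNMENTS if a >= requested)

print([obj_align(n) for n in (1, 2, 4, 8, 16, 32, 64, 128, 256)])
# [1, 2, 4, 16, 16, 256, 256, 256, 256]
```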
-
-You can also specify that a segment is _absolute_ at a certain
-segment address:
-
- SEGMENT SCREEN ABSOLUTE=0xB800
-
-The ABSOLUTE and ALIGN keywords are mutually exclusive.
-
-The format-specific directive GROUP allows segment grouping: `GROUP
-DGROUP DATA BSS' defines the group DGROUP to contain segments DATA
-and BSS.
-
-Symbols in a grouped segment are resolved relative to the group by
-default: if variable `var' is declared in segment `DATA', which is
-part of group `DGROUP', then the expression `SEG var' is equivalent
-to the expression `DGROUP', and the expression `var' evaluates to the
-offset of the variable `var' relative to the beginning of the group
-`DGROUP'. You must use the expression `var WRT DATA' to get the
-offset of the variable `var' relative to the beginning of its
-_segment_.
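To see what WRT changes, here is a model of group-relative offsets. It assumes the linker places the group's segments one after another in link order, each rounded up to its own alignment. The segment sizes and the offset of `var' are made up for the example, and `group_offsets' is a hypothetical helper.

```python
def group_offsets(segments):
    """Start offset of each segment from the group base.

    segments: (name, size, align) triples in link order (assumed layout:
    consecutive, each segment rounded up to its own alignment)."""
    starts, pos = {}, 0
    for name, size, align in segments:
        pos = -(-pos // align) * align     # round up to a multiple of align
        starts[name] = pos
        pos += size
    return starts

# GROUP DGROUP DATA BSS, with made-up sizes; var lives in BSS
starts = group_offsets([("DATA", 0x23, 1), ("BSS", 0x40, 16)])
var_wrt_bss = 0x10                          # `var WRT BSS`: offset within BSS
var_plain = starts["BSS"] + var_wrt_bss     # plain `var`: offset from DGROUP
print(hex(starts["BSS"]), hex(var_plain))   # 0x30 0x40
```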
-
-NASM allows a segment to be part of more than one group (like A86,
-and unlike TASM), but will generate a warning (unlike A86!).
-References to the symbols in that segment will be resolved relative
-to the _first_ group it is defined in.
-
-The directive `UPPERCASE' causes all symbol, segment and group names
-output to the object file to be uppercased. The actual _assembly_ is
-still case sensitive.
-
-To avoid getting tangled up in NASM's local label mechanism, segment
-and group names have leading periods stripped when they are defined.
-Thus, the directive `SEGMENT .text' will define a segment called
-`text', which will clash with any other symbol called `text', and
-you will _not_ be able to reference the segment base as `.text', but
-only as `text'.
-
-Common variables in OBJ files can be `near' or `far': currently,
-NASM has a horribly grotty way to support that, which is that if you
-specify the common variable's size as negative, it will be near, and
-otherwise it will be far. The support isn't perfect: if you declare
-a far common variable both in a NASM assembly module and in a C
-program, you may well find the linker reports "mismatch in
-array-size" or some such. The reason for this is that far common
-variables are defined by means of _two_ size constants, which are
-multiplied to give the real size. Apparently the Microsoft linker
-(at least) likes both constants, not merely their product, to match
-up. This may be fixed in a future release.
-
-If the module you're writing is intended to contain the program
-entry point, you can declare this by defining the special label
-`..start' at the start point, either as a label or by EQU (although
-of course the normal caveats about EQU dependency still apply).
-
-`obj' has an unusual handling of assembly modes: instead of having a
-global default for the whole file, there is a separate default for
-each segment. Thus, each SEGMENT directive carries an implicit BITS
-directive with it, which switches to 16-bit or 32-bit mode depending
-on whether the segment is a Use16 or Use32 segment. If you want to
-place 32-bit code in a Use16 segment, you can use an explicit `BITS
-32' override, but if you switch temporarily away from that segment,
-you will have to repeat the override after coming back to it.
-
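-If you're trying to build a .COM application by linking several .OBJ
-files together, you need to put `resb 0x100' at the front of the
-code segment in the first object file. DOS loads a .COM program at
-offset 0x100 in its segment, after the 256-byte Program Segment
-Prefix, so without this padding every address the linker computes
-will be 0x100 bytes too low.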
-
-OS/2 uses almost the same object file format as DOS, with a
-couple of differences, principally that OS/2 defines a pseudo-group
-called FLAT, containing no segments, and every relocation is made
-relative to that (so it would be equivalent to writing `label WRT
-FLAT' in place of `label' _throughout_ your code). Since writing code
-that way would be inconvenient, NASM implements the `os2' variant
-on `obj', which provides this FLAT group itself and automatically
-makes the default relocation format relative to FLAT.
-
-NOTE TO OS/2 USERS: The OS/2 output format is new in NASM version
-0.95. It hasn't been tested on any actual OS/2 systems, and I don't
-know for sure that it'll work properly. Any OS/2 users are
-encouraged to give it a thorough testing and report the results to
-me. Thanks!
-
-`as86': Linux as86 (bin86-0.3)
-------------------------------
-
-This output format attempts to replicate the format used to pass
-data between the Linux x86 assembler and linker, as86 and ld86. Its
-default file name, yet again, is `filename.o'. Its default
-segment-size attribute is 16 bits.
-
-`rdf': Relocatable Dynamic Object File Format
----------------------------------------------
-
-RDOFF was designed initially to test the object-file production
-interface to NASM. It soon became apparent that, because of its
-simplicity, it could be enhanced for use in serious applications:
-code to load and execute an RDOFF object module is very simple. It
-also contains enhancements to allow it to be linked with a dynamic
-link library at either load time or run time, depending on how
-complex you wish to make your loader.
-
-The `rdoff' directory in the NASM distribution archive contains
-source for an RDF linker and loader to run under Linux.
-
-`rdf' has a default segment-size attribute of 32 bits.
-
-Debugging format: `dbg'
------------------------
-
-This output format is not built into NASM by default: it's for
-debugging purposes. It produces a debug dump of everything that the
-NASM assembly module feeds to the output driver, for the benefit of
-people trying to write their own output drivers.
-
-Common Problems
-===============
-
-A few problems that people repeatedly ask me about are documented
-here.
-
-NASM's design philosophy of generating exactly the code the
-programmer asks for, without second-guessing or re-interpreting, has
-been known to cause confusion in a couple of areas.
-
-Firstly, several people have complained that instructions such as
-`add esp,4' are assembled in a form that allocates a full four-byte
-immediate field to store the `4' in, even though the instruction has
-a shorter form with a sign-extended single-byte immediate field
-which would work in this case. The answer is that NASM by design
-doesn't try to guess which one of these forms you want: if you want
-one, you code one, and if you want the other, you code the other.
-The short form is `add esp, byte 4'.
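-
-To make the size difference concrete, here is a hedged Python sketch
-that hand-assembles both forms for 32-bit code (`BITS 32'), using the
-standard x86 encodings: opcode 0x81 with a four-byte immediate for
-`add esp,4', and opcode 0x83 with a sign-extended one-byte immediate
-for `add esp, byte 4'. The function `encode_add_esp' is hypothetical
-and is not part of NASM.
-
```python
import struct

# ModRM byte for "ADD with ESP as a register operand":
# mod=11 (register), reg=000 (the /0 extension selecting ADD), r/m=100 (ESP).
MODRM_ADD_ESP = 0b11_000_100  # 0xC4


def encode_add_esp(imm: int, byte_form: bool) -> bytes:
    """Hand-assemble `add esp,imm' for 32-bit code (BITS 32)."""
    if byte_form:
        # 83 /0 ib: the byte is sign-extended to 32 bits by the CPU,
        # so only values from -128 to 127 can be encoded this way.
        if not -128 <= imm <= 127:
            raise ValueError("immediate does not fit in a signed byte")
        return bytes([0x83, MODRM_ADD_ESP]) + struct.pack("<b", imm)
    # 81 /0 id: a full four-byte little-endian immediate.
    return bytes([0x81, MODRM_ADD_ESP]) + struct.pack("<I", imm & 0xFFFFFFFF)


print(encode_add_esp(4, byte_form=False).hex(" "))  # add esp,4
print(encode_add_esp(4, byte_form=True).hex(" "))   # add esp, byte 4
```
-
-The byte form is three bytes shorter, but it can only express values
-from -128 to 127.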
-
-Secondly, and similarly, I've had repeated questions about
-conditional jumps. The simple `jne label', in NASM, translates
-directly to the old 8086 form of the conditional jump, whose signed
-8-bit offset is counted from the end of the two-byte instruction, so
-the target must lie between 128 bytes before and 127 bytes after
-the end of the jump. NASM won't automatically generate `je $+5 / jmp
-label' (a short `je' over a near `jmp', in 16-bit code) for labels
-that are further away, and neither will it generate the 386 long-
-offset form of the instruction. If you want the 386-specific
-conditional jump that's capable of reaching anywhere in the same
-segment as the jump instruction, you want `jne near label'. If you
-want an 8086-compatible `je' over another `jmp', code one
-explicitly, or define a macro to do so. NASM doesn't do either of
-these things for you, again by design.
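-
-The reach of the short form follows from its encoding. This hedged
-Python sketch models only `jne' (opcode 0x75 followed by a signed
-displacement byte); `short_displacement' and `encode_jne_short' are
-hypothetical helpers, not NASM code. Because the displacement is
-counted from the end of the two-byte instruction, a jump at 0x100 can
-reach targets from 0x82 to 0x181.
-
```python
JNE_SHORT = 0x75  # 8086 short form of jne: opcode byte followed by rel8
SHORT_JCC_LENGTH = 2  # opcode byte + one displacement byte


def short_displacement(jump_address: int, target: int) -> int:
    """rel8 value for a short jump: counted from the end of the instruction."""
    return target - (jump_address + SHORT_JCC_LENGTH)


def encode_jne_short(jump_address: int, target: int) -> bytes:
    """Encode `jne target' at jump_address in the 2-byte short form."""
    displacement = short_displacement(jump_address, target)
    if not -128 <= displacement <= 127:
        raise ValueError(
            f"target {target:#x} is out of short range "
            f"(displacement {displacement}); use `jne near' instead"
        )
    # Store the signed displacement as a two's-complement byte.
    return bytes([JNE_SHORT, displacement & 0xFF])


print(encode_jne_short(0x100, 0x181).hex(" "))  # furthest forward target
print(encode_jne_short(0x100, 0x82).hex(" "))   # furthest backward target
```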
-
-Bugs
-====
-
-Apart from the missing features (correct OBJ COMMON support, ELF
-alignment, ELF PIC support, etc.), there are no _known_ bugs.
-However, any you find, with patches if possible, should be sent to
-<jules@earthcorp.com> or <anakin@pobox.com>, and we'll try to fix
-them.
-
-Beware of Pentium-specific instructions: Intel have provided a macro
-file for MASM, to implement the eight or nine new Pentium opcodes as
-MASM macros. NASM does not generate the same code for the CMPXCHG8B
-instruction as these macros do: this is due to a bug in the _macro_,
-not in NASM. The macro works by generating an SIDT instruction (if I
-remember rightly), which has almost exactly the right form, then
-using ORG to back up a bit and do a DB over the top of one of the
-opcode bytes. The trouble is that Intel overlooked (or MASM syntax
-didn't let them allow for) the possibility that the SIDT instruction
-may carry a 0x66 operand-size or 0x67 address-size prefix. If this
-happens, the ORG will back up by the wrong amount, and the macro
-will generate incorrect code. NASM gets it right. This, also, is not
-a bug in NASM, so please don't report it as one. (Also please note
-that the ORG directive in NASM doesn't work this way, and so you
-can't do equivalent tricks with it...)
-
-That's All Folks!
-=================
-
-Enjoy using NASM! Please feel free to send me comments, or
-constructive criticism, or bug fixes, or requests, or general chat.
-
-Contributions are also welcome: if anyone knows anything about any
-other object file formats I should support, please feel free to send
-me documentation and some short example files (in my experience,
-documentation is useless without at _least_ one example), or even to
-write me an output module. Reports on the OS/2 output format, in
-particular, would be welcome, since I don't have OS/2 to test it.
-
-Please keep flames to a minimum: I have had some very angry e-mails
-in the past, condemning me for writing a useless assembler that
-produced output in no useful format (at the time, that was true), generated
-incorrect code (several typos in the instruction table, since fixed)
-and took up too much memory and disk space (the price you pay for
-total portability, it seems). All these were criticisms I was happy
-to hear, but I didn't appreciate the flames that went with them.
-NASM _is_ still a prototype, and you use it at your own risk. I
-_think_ it works, and if it doesn't then I want to know about it,
-but I don't guarantee anything. So don't flame me, please. Blame,
-but don't flame.
-
-- Simon Tatham <anakin@pobox.com>, 21-Nov-96