Diffstat (limited to 'nasm.doc')
-rw-r--r-- | nasm.doc | 1769 |
1 files changed, 0 insertions, 1769 deletions
diff --git a/nasm.doc b/nasm.doc deleted file mode 100644 index 264d5ba7..00000000 --- a/nasm.doc +++ /dev/null @@ -1,1769 +0,0 @@ - The Netwide Assembler, NASM - =========================== - -Introduction -============ - -The Netwide Assembler grew out of an idea on comp.lang.asm.x86 (or -possibly alt.lang.asm, I forget which), which was essentially that -there didn't seem to be a good free x86-series assembler around, and -that maybe someone ought to write one. - -- A86 is good, but not free, and in particular you don't get any - 32-bit capability until you pay. It's DOS only, too. - -- GAS is free, and ports over DOS/Unix, but it's not very good, - since it's designed to be a back end to gcc, which always feeds it - correct code. So its error checking is minimal. Also its syntax is - horrible, from the point of view of anyone trying to actually - _write_ anything in it. Plus you can't write 16-bit code in it - (properly). - -- AS86 is Linux specific, and (my version at least) doesn't seem to - have much (or any) documentation. - -- MASM isn't very good. And it's expensive. And it runs only under - DOS. - -- TASM is better, but still strives for MASM compatibility, which - means millions of directives and tons of red tape. And its syntax - is essentially MASM's, with the contradictions and quirks that - entails (although it sorts out some of those by means of Ideal - mode). It's expensive too. And it's DOS only. - -So here, for your coding pleasure, is NASM. At present it's still in -prototype stage - we don't promise that it can outperform any of -these assemblers. But please, _please_ send us bug reports, fixes, -helpful information, and anything else you can get your hands on -(and thanks to the many people who've done this already! You all -know who you are), and we'll improve it out of all recognition. -Again. - -Please see the file `Licence' for the legalese. 
- -Getting Started: Installation -============================= - -NASM is distributed in source form, in what we hope is totally -ANSI-compliant C. It uses no non-portable code at all, that we know -of. It ought to compile without change on any system you care to try -it on. We also supply a pre-compiled 16-bit DOS binary. - -To install it, edit the Makefile to describe your C compiler, and -type `make'. Then copy the binary to somewhere on your path. That's -all - NASM relies on no files other than its own executable. -Although if you're on a Unix system, you may also want to install -the NASM manpage (`nasm.1'). You may also want to install the binary -and manpage for the Netwide Disassembler, NDISASM (also see -`ndisasm.doc'). - -Running NASM -============ - -To assemble a file, you issue a command of the form - - nasm -f <format> <filename> [-o <output>] - -For example, - - nasm -f elf myfile.asm - -will assemble `myfile.asm' into an ELF object file `myfile.o'. And - - nasm -f bin myfile.asm -o myfile.com - -will assemble `myfile.asm' into a raw binary program `myfile.com'. - -To produce a listing file, with the hex codes output from NASM -displayed on the left of the original sources, use `-l' to give a -listing file name, for example: - - nasm -f coff myfile.asm -l myfile.lst - -To get further usage instructions from NASM, try typing `nasm -h'. -This will also list the available output file formats, and what they -are. - -If you use Linux but aren't sure whether your system is a.out or -ELF, type `file /usr/bin/nasm' or wherever you put the NASM binary. -If it says something like - -/usr/bin/nasm: ELF 32-bit LSB executable i386 (386 and up) Version 1 - -then your system is ELF, and you should use `-f elf' when you want -NASM to produce Linux object files. If it says - -/usr/bin/nasm: Linux/i386 demand-paged executable (QMAGIC) - -or something similar, your system is a.out, and you should use `-f -aout' instead. 
- -Like Unix compilers and assemblers, NASM is silent unless it goes -wrong: you won't see any output at all, unless it gives error -messages. - -If you define an environment variable called NASM, the program will -interpret it as a list of extra command-line options, processed -before the real command line. This is probably most useful for -defining an include-file search path by putting a lot of `-i' -options in the NASM variable. - -The variable's value will be considered to be a space-separated list -of options unless it begins with something other than a minus sign, -in which case the first character will be taken as the separator. -For example, if you want to define a macro whose value has a space -in it, then setting the NASM variable to `-dNAME="my name"' won't -work because the string will be split at the space into `-dNAME="my' -and `name"', but setting it to `|-dNAME="my name"' will be fine -because all further operands will be considered to be separated by -vertical bars and so the space has no special meaning. - -Quick Start for MASM Users -========================== - -If you're used to writing programs with MASM, or with TASM in -MASM-compatible (non-Ideal) mode, or with A86, this section attempts -to outline the major differences between MASM's syntax and NASM's. -If you're not already used to MASM, it's probably worth skipping -this section. - -One simple difference is that NASM is case-sensitive. It makes a -difference whether you call your label `foo', `Foo' or `FOO'. If -you're assembling to the `obj' MS-DOS output format (or `os2'), you -can invoke the `UPPERCASE' directive (documented below, in the -Output Formats section) and ensure that all symbols exported to -other code modules are forced to uppercase; but even then, _within_ -a single module, NASM will distinguish between labels differing only -in case. 
- -There are also differences in some of the instructions and register -names: for example, NASM calls the floating-point stack registers -`st0', `st1' and so on, rather than MASM's `ST(0)' notation or A86's -simple numeric `0'. And NASM doesn't support LODS, MOVS, STOS, SCAS, -CMPS, INS, or OUTS, but only supports the size-specified versions -LODSB, MOVSW, SCASD and so on. - -The _major_ difference, though, is the absence in NASM of variable -typing. MASM will notice when you declare a variable as `var dw 0', -and will remember that `var' is a WORD-type variable, so that -instructions such as `mov var,2' can be unambiguously given the WORD -size rather than BYTE or DWORD. NASM doesn't and won't do this. The -statement `var dw 0' merely defines `var' to be a label marking a -point in memory: no more and no less. It so happens that there are -two bytes of data following that point in memory before the next -line of code, but NASM doesn't remember or care. If you want to -store the number 2 in such a variable, you must specify the size of -the operation _always_: `mov word [var],2'. This is a deliberate -design decision, _not_ a bug, so please could people not send us -mail asking us to `fix' it... - -The above example also illustrates another important difference -between MASM and NASM syntax: the use of OFFSET and of square -brackets. In MASM, declaring `var dw 0' entitles you to code `mov -ax,var' to get at the _contents_ of the variable, and you must write -`mov ax,offset var' to get the _address_ of the variable. In NASM, -`mov ax,var' gives you the address, and to get at the contents you -must code `mov ax,[var]'. Again, this is a deliberate design -decision, since it brings consistency to the syntax: `mov ax,[var]' -and `mov ax,[bx]' both refer to the contents of memory and both have -square brackets, whereas neither `mov ax,bx' nor `mov ax,var' refers -to memory contents and so neither one has square brackets. 
- -This is even more confusing in A86, where declaring a label with a -trailing colon defines it to be a `label' as opposed to a `variable' -and causes A86 to adopt NASM-style semantics; so in A86, `mov -ax,var' has different behaviour depending on whether `var' was -declared as `var: dw 0' or `var dw 0'. NASM is very simple by -comparison: _everything_ is a label. The OFFSET keyword is not -required, and in fact constitutes a syntax error (though you can -code `%define offset' to suppress the error messages if you want), -and `var' always refers to the _address_ of the label whereas -`[var]' refers to the _contents_. - -As an addendum to this point of syntax, it's also worth noting that -the hybrid-style syntaxes supported by MASM and its clones, such as -`mov ax,table[bx]', where a memory reference is denoted by one -portion outside square brackets and another portion inside, are also -not supported by NASM. The correct syntax for the above is `mov -ax,[table+bx]'. Likewise, `mov ax,es:[di]' is wrong and `mov -ax,[es:di]' is right. - -Writing Programs with NASM -========================== - -Each line of a NASM source file should contain some combination of -the four fields - -LABEL: INSTRUCTION OPERANDS ; COMMENT - -`LABEL' defines a label pointing to that point in the source. There -are no restrictions on white space: labels may have white space -before them, or not, as you please. The colon after the label is -also optional. (Note that NASM can be made to give a warning when it -sees a label which is the only thing on a line with no trailing -colon, on the grounds that such a label might easily be a mistyped -instruction name. The command line option `-w+orphan-labels' will -enable this feature.) - -Valid characters in labels are letters, numbers, `_', `$', `#', `@', -`~', `?', and `.'. The only characters which may be used as the -_first_ character of an identifier are letters, `_' and `?', and -(with special meaning: see `Local Labels') `.'. 
An identifier may -also be prefixed with a $ sign to indicate that it is intended to be -read as an identifier and not a reserved word; thus, if some other -module you are linking with defines a symbol `eax', you can refer to -`$eax' in NASM code to distinguish it from the register name. - -`INSTRUCTION' can be any machine opcode (Pentium and P6 opcodes, FPU -opcodes, MMX opcodes and even undocumented opcodes are all -supported). The instruction may be prefixed by LOCK, REP, REPE/REPZ -or REPNE/REPNZ, in the usual way. Explicit address-size and operand- -size prefixes A16, A32, O16 and O32 are provided - one example of -their use is given in the `Unusual Instruction Sizes' section below. -You can also use a segment register as a prefix: coding `es mov -[bx],ax' is equivalent to coding `mov [es:bx],ax'. We recommend the -latter syntax, since it is consistent with other syntactic features -of the language, but for instructions such as `lodsb' there isn't -anywhere to put a segment override except as a prefix. This is why -we support it. - -The `INSTRUCTION' field may also contain some pseudo-opcodes: see -the section on pseudo-opcodes for details. - -`OPERANDS' can be nonexistent, or huge, depending on the -instruction, of course. When operands are registers, they are given -simply as register names: `eax', `ss', `di' for example. NASM does -_not_ use the GAS syntax, in which register names are prefixed by a -`%' sign. Operands may also be effective addresses, or they may be -constants or expressions. See the separate sections on these for -details. - -`COMMENT' is anything after the first semicolon on the line, -excluding semicolons inside quoted strings. - -Of course, all these fields are optional: the presence or absence of -the OPERANDS field is required by the nature of the INSTRUCTION -field, but any line may contain a LABEL or not, may contain an -INSTRUCTION or not, and may contain a COMMENT or not, independently -of each other. 
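A minimal sketch of the four fields in practice, assuming the flat binary format (`nasm -f bin`) and its 16-bit default; the label names `start' and `done' are illustrative and not taken from the text above:

```nasm
; Each line combines some of LABEL, INSTRUCTION, OPERANDS and COMMENT.
start:  mov     ax,[es:bx]      ; all four fields present
        es lodsb                ; segment register used as a prefix
        rep movsb               ; REP prefix, no OPERANDS field
done:                           ; a LABEL with nothing else on the line
        ret                     ; an INSTRUCTION with no LABEL
; a line holding nothing but a COMMENT
```

The `es lodsb' line shows the one case where the prefix form is needed, since `lodsb' has no operand in which to write a segment override.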
- -Lines may also contain nothing but a directive: see `Assembler -Directives' below for details. - -NASM can currently not handle any line longer than 1024 characters. -This may be fixed in a future release. - -Floating Point Instructions -=========================== - -NASM has support for assembling FPU opcodes. However, its syntax is -not necessarily the same as anyone else's. - -NASM uses the notation `st0', `st1', etc. to denote the FPU stack -registers. NASM also accepts a wide range of single-operand and -two-operand forms of the instructions. For people who wish to use -the single-operand form exclusively (this is in fact the `canonical' -form from NASM's point of view, in that it is the form produced by -the Netwide Disassembler), there is a TO keyword which makes -available the opcodes which cannot be so easily accessed by one -operand. Hence: - - fadd st1 ; this sets st0 := st0 + st1 - fadd st0,st1 ; so does this - fadd st1,st0 ; this sets st1 := st1 + st0 - fadd to st1 ; so does this - -It's also worth noting that the FPU instructions that reference -memory must use the prefixes DWORD, QWORD or TWORD to indicate what -size of memory operand they refer to. - -NASM, in keeping with our policy of not trying to second-guess the -programmer, will _never_ automatically insert WAIT instructions into -your code stream. You must code WAIT yourself before _any_ -instruction that needs it. (Of course, on 286 processors or above, -it isn't needed anyway...) - -NASM supports specification of floating point constants by means of -`dd' (single precision), `dq' (double precision) and `dt' (extended -precision). Floating-point _arithmetic_ is not done, due to -portability constraints (not all platforms on which NASM can be run -support the same floating point types), but simple constants can be -specified. 
For example: - -gamma dq 0.5772156649 ; Euler's constant - -Pseudo-Opcodes -============== - -Pseudo-opcodes are not real x86 machine opcodes, but are used in the -instruction field anyway because that's the most convenient place to -put them. The current pseudo-opcodes are DB, DW, DD, DQ and DT, -their uninitialised counterparts RESB, RESW, RESD, RESQ and REST, -the INCBIN command, the EQU command, and the TIMES prefix. - -DB, DW, DD, DQ and DT work as you would expect: they can each take -an arbitrary number of operands, and when assembled, they generate -nothing but those operands. Of these, DB, DW and DD can take string -constants as operands. See the `Constants' section for details about -string constants. - -RESB, RESW, RESD, RESQ and REST are designed to be used in the BSS -section of a module: they declare _uninitialised_ storage space. -Each takes a single operand, which is the number of bytes, words, -doublewords, quadwords or ten-byte units to reserve. We do not -support the MASM/TASM syntax of -reserving uninitialised space by writing `DW ?' or similar: this is -what we do instead. (But see `Critical Expressions' for a caveat on -the nature of the operand.) - -(An aside: if you want to be able to write `DW ?' and have something -vaguely useful happen, you can always code `? EQU 0'...) - -INCBIN is borrowed from the old Amiga assembler Devpac: it includes -a binary file verbatim into the output file. This can be handy for -(for example) including graphics and sound data directly into a game -executable file. It can be called in one of these three ways: - - INCBIN "file.dat" ; include the whole file - INCBIN "file.dat",1024 ; skip the first 1024 bytes - INCBIN "file.dat",1024,512 ; skip the first 1024, and - ; actually include at most 512 - -EQU defines a symbol to a specified value: when EQU is used, the -LABEL field must be present. The action of EQU is to define the -given label name to the value of its (only) operand. This definition -is absolute, and cannot change later. 
So, for example, - -message db 'hello, world' -msglen equ $-message - -defines `msglen' to be the constant 12. `msglen' may not then be -redefined later. This is not a preprocessor definition either: the -value of `msglen' is evaluated _once_, using the value of `$' (see -the section `Expressions' for details of `$') at the point of -definition, rather than being evaluated wherever it is referenced -and using the value of `$' at the point of reference. Note that the -caveat in `Critical Expressions' applies to EQU too, at the moment. - -Finally, the TIMES prefix causes the instruction to be assembled -multiple times. This is partly NASM's equivalent of the DUP syntax -supported by MASM-compatible assemblers, in that one can do - -zerobuf: times 64 db 0 - -or similar, but TIMES is more versatile than that. TIMES takes not -just a numeric constant, but a numeric _expression_, so one can do -things like - -buffer: db 'hello, world' - times 64-$+buffer db ' ' - -which will store exactly enough spaces to make the total length of -`buffer' up to 64. (See the section `Critical Expressions' for a -caveat on the use of TIMES.) Finally, TIMES can be applied to -ordinary opcodes, so you can code trivial unrolled loops in it: - - times 100 movsb - -Note that there is no effective difference between `times 100 resb -1' and `resb 100', except that the latter will be assembled about -100 times faster due to the internal structure of the assembler. - -Note also that TIMES can't be applied to macros: the reason for this -is that TIMES is processed after the macro phase, which allows the -argument to TIMES to contain expressions such as `64-$+buffer' as -above. - -Effective Addresses -=================== - -NASM's addressing scheme is very simple, although it can involve -more typing than other assemblers. 
Where other assemblers -distinguish between a _variable_ (label declared without a colon) -and a _label_ (declared with a colon), and use different means of -addressing the two, NASM is totally consistent. - -To refer to the contents of a memory location, square brackets are -required. This applies to simple variables, computed offsets, -segment overrides, effective addresses - _everything_. E.g.: - -wordvar dw 123 - mov ax,[wordvar] - mov ax,[wordvar+1] - mov ax,[es:wordvar+bx] - -NASM does _not_ support the various strange syntaxes used by MASM -and others, such as - - mov ax,wordvar ; this is legal, but means something else - mov ax,es:wordvar[bx] ; not even slightly legal - es mov ax,wordvar[1] ; the prefix is OK, but not the rest - -If no square brackets are used, NASM interprets label references to -mean the address of the label. Hence there is no need for MASM's -OFFSET keyword, but - - mov ax,wordvar - -loads AX with the _address_ of the variable `wordvar'. - -More complicated effective addresses are handled by enclosing them -within square brackets as before: - - mov eax,[ebp+2*edi+offset] - mov ax,[bx+di+8] - -NASM will cope with some fairly strange effective addresses, if you -try it: provided your effective address expression evaluates -_algebraically_ to something that the instruction set supports, it -will be able to assemble it. For example, - - mov eax,[ebx*5] ; actually assembles to [ebx+ebx*4] - mov ax,[bx-si+2*si] ; actually assembles to [bx+si] - -will both work. - -There is an ambiguity in the instruction set, which allows two forms -of 32-bit effective address with equivalent meaning: - - mov eax,[2*eax+0] - mov eax,[eax+eax] - -These two expressions clearly refer to the same address. The -difference is that the first one, if assembled `as is', requires a -four-byte offset to be stored as part of the instruction, so it -takes up more space. 
NASM will generate the second (smaller) form -for both of the above instructions, in an effort to save space. -There is not, currently, any means for forcing NASM to generate the -larger form of the instruction. - -An alternative syntax is supported, in which prefixing an operand -with `&' is synonymous with enclosing it in square brackets. The -square bracket syntax is the recommended one, however, and is the -syntax generated by NDISASM. But, for example, `mov eax,&ebx+ecx' is -equivalent to `mov eax,[ebx+ecx]'. - -Mixing 16 and 32 Bit Code: Unusual Instruction Sizes -==================================================== - -A number of assemblers seem to have trouble assembling instructions -that use a different operand or address size from the one they are -expecting; as86 is a good example, even though the Linux kernel boot -process (which is assembled using as86) needs several such -instructions and as86 can't do them. - -Instructions such as `mov eax,2' in 16-bit mode are easy, of course, -and NASM can do them just as well as any other assembler. The -difficult instructions are things like far jumps. - -Suppose you are in a 16-bit segment, in protected mode, and you want -to execute a far jump to a point in a 32-bit segment. You need to -code a 32-bit far jump in a 16-bit segment; not all assemblers will -easily support this. NASM can, by means of the `word' and `dword' -specifiers. So you can code - - jmp 1234h:5678h ; this uses the default segment size - jmp word 1234h:5678h ; this is guaranteed to be 16-bit - jmp dword 1234h:56789ABCh ; and this is guaranteed 32-bit - -and NASM will generate correct code for them. - -Similarly, if you are coding in a 16-bit code segment, but trying to -access memory in a 32-bit data segment, your effective addresses -will want to be 32-bit. Of course as soon as you specify an -effective address containing a 32-bit register, like `[eax]', the -addressing is forced to be 32-bit anyway. 
But if you try to specify -a simple offset, such as `[label]' or `[0x10000]', you will get the -default address size, which in this case will be wrong. However, -NASM allows you to code `[dword 0x10000]' to force a 32-bit address -size, or conversely `[word wlabel]' to force 16 bits. - -Be careful not to confuse `word' and `dword' _inside_ the square -brackets with _outside_: consider the instruction - - mov word [dword 0x123456],0x7890 - -which moves 16 bits of data to an address specified by a 32-bit -offset. There is no contradiction between the `word' and `dword' in -this instruction, since they modify different aspects of the -functionality. Or, even more confusingly, - - call dword far [fs:word 0x4321] - -which takes an address specified by a 16-bit offset, and extracts a -48-bit DWORD FAR pointer from it to call. - -Using this effective-address syntax, the `dword' or `word' override -may come before or after the segment override if any: NASM isn't -fussy. Hence: - - mov ax,[fs:dword 0x123456] - mov ax,[dword fs:0x123456] - -are equivalent forms, and generate the same code. - -The LOOP instruction comes in strange sizes, too: in a 16-bit -segment it uses CX as its count register by default, and in a 32-bit -segment it uses ECX. But it's possible to do either one in the other -segment, and NASM will cope by letting you specify the count -register as a second operand: - - loop label ; uses CX or ECX depending on mode - loop label,cx ; always uses CX - loop label,ecx ; always uses ECX - -Finally, the string instructions LODSB, STOSB, MOVSB, CMPSB, SCASB, -INSB, and OUTSB can all have strange address sizes: typically, in a -16-bit segment they read from [DS:SI] and write to [ES:DI], and in a -32-bit segment they read from [DS:ESI] and write to [ES:EDI]. -However, this can be changed by the use of the explicit address-size -prefixes `a16' and `a32'. 
These prefixes generate null code if used -in the same size segment as they specify, but generate an 0x67 -prefix otherwise. Hence `a16' generates no code in a 16-bit segment, -but 0x67 in a 32-bit one, and vice versa. So `a16 lodsb' will always -generate code to read a byte from [DS:SI], no matter what the size -of the segment. There are also explicit operand-size override -prefixes, `o16' and `o32', which will optionally generate 0x66 -bytes, but these are provided for completeness and should never have -to be used. (Note that NASM does not support the LODS, STOS, MOVS -etc. forms of the string instructions.) - -Constants -========= - -NASM can accept three kinds of constant: _numeric_, _character_ and -_string_ constants. - -Numeric constants are simply numbers. NASM supports a variety of -syntaxes for expressing numbers in strange bases: you can do any of - - 100 ; this is decimal - 0x100 ; hex - 100h ; hex as well - $100 ; hex again - 100q ; octal - 100b ; binary - -NASM does not support A86's syntax of treating anything with a -leading zero as hex, nor does it support the C syntax of treating -anything with a leading zero as octal. Leading zeros make no -difference to NASM. (Except that, as usual, if you have a hex -constant beginning with a letter, and you want to use the trailing-H -syntax to represent it, you have to use a leading zero so that NASM -will recognise it as a number instead of a label.) - -The `x' in `0x100', and the trailing `h', `q' and `b', may all be -upper case if you want. - -Character constants consist of up to four characters enclosed in -single or double quotes. No escape character is defined for -including the quote character itself: if you want to declare a -character constant containing a double quote, enclose it in single -quotes, and vice versa. 
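As a hedged illustration of the numeric syntaxes and quoting rules just described, the sketch below writes the same byte value several ways. It assumes the syntax exactly as documented in this version of the manual:

```nasm
; The first six lines each assemble to the single byte 0x41.
        db      65              ; decimal
        db      0x41            ; hex, 0x prefix
        db      41h             ; hex, trailing H
        db      $41             ; hex, leading $
        db      101q            ; octal
        db      01000001b       ; binary; the leading zero changes nothing
        db      0FFh            ; hex starting with a letter needs a leading 0
        db      "'"             ; a single quote, enclosed in double quotes
        db      '"'             ; a double quote, enclosed in single quotes
```

Without the leading zero, `FFh' would be read as a label rather than a number.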
- -Character constants' values are worked out in terms of a -little-endian computer: if you code - - mov eax,'abcd' - -then if you were to examine the binary output from NASM, it would -contain the visible string `abcd', which of course means that the -actual value loaded into EAX would be 0x64636261, not 0x61626364. - -String constants are like character constants, only more so: if a -character constant appearing as operand to a DB, DW or DD is longer -than the word size involved (1, 2 or 4 respectively), it will be -treated as a string constant instead, which is to say the -concatenation of separate character constants. - -For example, - - db 'hello, world' - -declares a twelve-character string constant. And - - dd 'dontpanic' - -(a string constant) is equivalent to writing - - dd 'dont','pani','c' - -(three character constants), so that what actually gets assembled is -equivalent to - - db 'dontpanic',0,0,0 - -(It's worth noting that one of the reasons for the reversal of -character constants is so that the instruction `dw "ab"' has the -same meaning whether "ab" is treated as a character constant or a -string constant. Hence there is less confusion.) - -Expressions -=========== - -Expressions in NASM can be formed of the following operators: `|' -(bitwise OR), `^' (bitwise XOR), `&' (bitwise AND), `<<' and `>>' -(logical bit shifts), `+', `-', `*' (ordinary addition, subtraction -and multiplication), `/', `%' (unsigned division and modulo), `//', -`%%' (signed division and modulo), `~' (bitwise NOT), and the -operators SEG and WRT (see `SEG and WRT' below). - -The order of precedence is: - -| lowest -^ -& -<< >> -binary + and - -* / % // %% -unary + and -, ~, SEG highest - -As usual, operators within a precedence level associate to the left -(i.e. `2-3-4' evaluates the same way as `(2-3)-4'). 
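The precedence table can be checked with a few constant expressions. This is a small sketch; the values in the comments follow from the table above and are not output copied from NASM:

```nasm
        db      2+3*4               ; * binds tighter than +: 2+(3*4) = 14
        db      1 | 2 ^ 3 & 4 << 1  ; 1 | (2 ^ (3 & (4 << 1))) = 3
        db      10-3-2              ; left-associative: (10-3)-2 = 5
        db      ~0 & 0x0F           ; unary ~ binds tightest: 0x0F
```

Parentheses override the table in the usual way, so `(2+3)*4' evaluates to 20.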
- -Note that since the `%' character is used by the preprocessor, it's -worth making sure that the `%' and `%%' operators are followed by a -space, to prevent the preprocessor trying to interpret them as -macro-related things. - -A form of algebra is done by NASM when evaluating expressions: I -have already stated that an effective address expression such as -`[EAX*6-EAX]' will be recognised by NASM as algebraically equivalent -to `[EAX*4+EAX]', and assembled as such. In addition, algebra can be -done on labels as well: `label2*2-label1' is an acceptable way to -define an address as far beyond `label2' as `label1' is before it. -(In less algebraically capable assemblers, one might have to write -that as `label2 + (label2-label1)', where the value of every -sub-expression is either a valid address or a constant. NASM can of -course cope with that version as well.) - -Expressions may also contain the special token `$', known as a Here -token, which always evaluates to the address of the current assembly -point. (That is, the address of the assembly point _before_ the -current instruction gets assembled.) The special token `$$' -evaluates to the address of the beginning of the current section; -this can be used for alignment, as shown below: - - times ($$-$) & 3 nop ; pad with NOPs to 4-byte boundary - -Note that this technique aligns to a four-byte boundary with respect -to the beginning of the _segment_; if you can't guarantee that the -segment itself begins on a four-byte boundary, this alignment is -useless or worse. Be sure you know what kind of alignment you can -guarantee to get out of your linker before you start trying to use -TIMES to align to page boundaries. (Of course, the `obj' and `os2' -file formats can happily cope with page alignment, provided you -specify that segment attribute.) 
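To make the arithmetic of the alignment idiom concrete, here is a minimal sketch. The labels `entry' and `aligned' are illustrative, and the section itself is assumed to start on a four-byte boundary:

```nasm
        section .text
entry:  nop                     ; one byte, so $ is now $$+1
        times ($$-$) & 3 nop    ; $$-$ is -1, and -1 & 3 = 3: three NOPs
aligned:                        ; offset 4 from the start of the section
        ret
```

If the assembly point is already on a four-byte boundary, `($$-$) & 3' evaluates to zero and no padding is emitted.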
- -SEG and WRT -=========== - -NASM contains the capability for its object file formats (currently, -only `obj' and its variant `os2' make use of this) to permit -programs to directly refer to the segment-base values of their -segments. This is achieved either by the object format defining the -segment names as symbols (`obj' and `os2' do this), or by the use of -the SEG operator. - -SEG is a unary prefix operator which, when applied to a symbol -defined in a segment, will yield the segment base value of that -segment. (In `obj' and `os2' format, symbols defined in segments -which are grouped are considered to be primarily a member of the -_group_, not the segment, and the return value of SEG reflects -this.) - -SEG may be used for far pointers: it is guaranteed that for any -symbol `sym', using the offset `sym' from the segment base `SEG sym' -yields a correct pointer to the symbol. Hence you can code a far -call by means of - - CALL SEG routine:routine - -or store a far pointer in a data segment by - - DW routine, SEG routine - -For convenience, NASM supports the forms - - CALL FAR routine - JMP FAR routine - -as direct synonyms for the canonical syntax - - CALL SEG routine:routine - JMP SEG routine:routine - -No alternative syntax for - - DW routine, SEG routine - -is supported. - -Simply referring to `sym', for some symbol, will return the offset -of `sym' from its _preferred_ segment base (as returned from `SEG -sym'); sometimes, you may want to obtain the offset of `sym' from -some _other_ segment base. (E.g. the offset of `sym' from the base -of the segment it's in, where normally you'd get the offset from a -group base). This is accomplished using the WRT (With Reference To) -keyword: if `sym' is defined in segment `seg' but you want its -offset relative to the beginning of segment `seg2', you can do - - mov ax,sym WRT seg2 - -The right-hand operand to WRT must be a segment-base value. You can -also do `sym WRT SEG sym2' if you need to. 
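The following sketch pulls the SEG and WRT forms together. It assumes the `obj' output format (`nasm -f obj'); the segment names `data' and `code' and the labels are illustrative choices, not names required by NASM:

```nasm
        segment data
message: db     'hello'
farptr:  dw     routine, seg routine    ; far pointer: offset, then segment

        segment code
main:   mov     ax,seg message          ; segment base for `message'
        mov     ds,ax
        mov     bx,message              ; offset from its preferred base
        mov     si,message wrt data     ; offset relative to segment `data'
        call    far routine             ; same as: call seg routine:routine
        ret
routine: retf
```

Because no groups are defined here, `message' and `message wrt data' yield the same offset; the two differ only when the symbol's preferred base is a group.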
- -Critical Expressions -==================== - -NASM is a two-pass assembler: it goes over the input once to -determine the location of all the symbols, then once more to -actually generate the output code. Most expressions are -non-critical, in that if they contain a forward reference and hence -their correct value is unknown during the first pass, it doesn't -matter. However, arguments to RESB, RESW and RESD, and the argument -to the TIMES prefix, can actually affect the _size_ of the generated -code, and so it is critical that the expression can be evaluated -correctly on the first pass. So in these situations, expressions may -not contain forward references. This prevents NASM from having to -sort out a mess such as - - times (label-$) db 0 -label: db 'where am I?' - -in which the TIMES argument could equally legally evaluate to -_anything_, or perhaps even worse, - - times (label-$+1) db 0 -label: db 'NOW where am I?' - -in which any value for the TIMES argument is by definition invalid. - -Since NASM is a two-pass assembler, this criticality condition also -applies to the argument to EQU. Suppose, if this were not the case, -we were to have the setup - - mov ax,a -a equ b -b: - -On pass one, `a' cannot be defined properly, since `b' is not known -yet. On pass two, `b' is known, so line two can define `a' properly. -Unfortunately, line 1 needed `a' to be defined properly, so this -code will not assemble using only two passes. - -There's a related issue: in an effective address such as -`[eax+offset]', the value of `offset' can be stored as either 1 or 4 -bytes. 
NASM will use the one-byte form if it knows it can, to save -space, but will therefore be fooled by the following: - - mov eax,[ebx+offset] -offset equ 10 - -In this case, although `offset' is a small value and could easily -fit into the one-byte form of the instruction, when NASM sees the -instruction in the first pass it doesn't know what `offset' is, and -for all it knows `offset' could be a symbol requiring relocation. So -it will allocate the full four bytes for the value of `offset'. This -can be solved by defining `offset' before it's used. - -Local Labels -============ - -NASM takes its local label scheme mainly from the old Amiga -assembler Devpac: a local label is one that begins with a period. -The `localness' comes from the fact that local labels are associated -with the previous non-local label, so that you may declare the same -local label twice if a non-local one intervenes. Hence: - -label1 ; some code -.loop ; some more code - jne .loop - ret -label2 ; some code -.loop ; some more code - jne .loop - ret - -In the above code, each `jne' instruction jumps to the line of code -before it, since the `.loop' labels are distinct from each other. - -NASM, however, introduces an extra capability not present in Devpac, -which is that the local labels are actually _defined_ in terms of -their associated non-local label. So if you really have to, you can -write - -label3 ; some more code - ; and some more - jmp label1.loop - -So although local labels are _usually_ local, it is possible to -reference them from anywhere in your program, if you really have to. - -Assembler Directives -==================== - -Assembler directives appear on a line by themselves (apart from a -comment). They come in two forms: user-level directives and -primitive directives. 
Primitive directives are enclosed in square -brackets (no white space may appear before the opening square -bracket, although white space and a comment may come after the -closing bracket), and were the only form of directive supported by -earlier versions of NASM. User-level directives look the same, only -without the square brackets, and are the more modern form. (They are -implemented as macros expanding to primitive directives.) There is a -distinction in functionality, which is explained below in the -section on structures. - -Some directives are universal: they may be used in any situation, -and do not change their syntax. The universal directives are listed -below. - -`BITS 16' or `BITS 32' switches NASM into 16-bit or 32-bit mode. -(This is equivalent to USE16 and USE32 segments, in TASM or MASM.) -In 32-bit mode, instructions are prefixed with 0x66 or 0x67 prefixes -when they use 16-bit data or addresses; in 16-bit mode, the reverse -happens. NASM's default depends on the object format; the defaults -are documented with the formats. (See `obj' and `os2', in -particular, for some unusual behaviour.) - -`SECTION name' or `SEGMENT name' changes which section the code you -write will be assembled into. Acceptable section names vary between -output formats, but most formats (indeed, all formats at the moment) -support the names `.text', `.data' and `.bss'. Note that `.bss' is -an uninitialised data section, and so you will receive a warning -from NASM if you try to assemble any code or data in it. The only -thing you can do in `.bss' without triggering a warning is to use -RESB, RESW and RESD. That's what they're for. - -`ABSOLUTE address' can be considered a different form of `SECTION', -in that it must be overridden using a SECTION directive once you -have finished using it. 
It is used to assemble notional code at an -absolute offset address; of course, you can't actually assemble -_code_ there, since no object file format is capable of putting the -code in place, but you can use RESB, RESW and RESD, and you can -define labels. Hence you could, for example, define a C-like data -structure by means of - - absolute 0 - stLong resd 1 - stWord resw 1 - stByte1 resb 1 - stByte2 resb 1 - st_size: - segment .text - -and then carry on coding. This defines `stLong' to be zero, `stWord' -to be 4, `stByte1' to be 6, `stByte2' to be 7 and `st_size' to be 8. -So this has defined a data structure. The STRUC directive provides a -nicer way to do this: see below. - -`EXTERN symbol' defines a symbol as being `external', in the C -sense: `EXTERN' states that the symbol is _not_ declared in this -module, but is declared elsewhere, and that you wish to _reference_ -it in this module. - -`GLOBAL symbol' defines a symbol as being global, in the sense that -it is exported from this module and other modules may reference it. -All symbols are local, unless declared as global. Note that the -`GLOBAL' directive must appear before the definition of the symbol -it refers to. - -`COMMON symbol size' defines a symbol as being common: it is -declared to have the given size, and it is merged at link time with -any declarations of the same symbol in other modules. This is not -_fully_ supported in the `obj' or `os2' file format: see the section -on `obj' for details. - -`STRUC structure' begins the definition of a data structure, and -`ENDSTRUC' ends it. A structure similar to the one shown above (with -one byte field and a 32-byte string field in place of the two byte -fields) may be defined using STRUC as follows: - - struc st - stLong resd 1 - stWord resw 1 - stByte resb 1 - stStr resb 32 - endstruc - -Notice that this code still defines the symbol `st_size' to be the -size of the structure. The `_size' suffix is automatically appended -to the structure name. 
Notice also that the assembler takes care of -remembering which section you were assembling in (whereas in the -version using `ABSOLUTE' it was up to the programmer to sort that -out). - -`ISTRUC structure' begins the declaration of an initialised instance -of a data structure. You can then use the `AT' macro to assign -values to the structure members, and `IEND' to finish. So, for -example, given the structure `st' above: - - istruc st - at stLong, dd 0x1234 - at stWord, dw 23 - at stByte, db 'q' - at stStr, db 'hello, world', 13, 10, 0 - iend - -Note that there's nothing stopping the instruction after `at' from -overflowing on to the next line if you want. So the above example -could just as well have contained - - at stStr, db 'hello, world' - db 13, 10, 0 - -or even (if you prefer this style) - - at stStr - db 'hello, world' - db 13, 10, 0 - -Note also that the `ISTRUC' mechanism is implemented as a set of -macros, and uses TIMES internally to achieve its effect; so the -structure fields must be initialised in the same order as they were -defined in. - -This is where user-level directives differ from primitives: the -`SECTION' (and `SEGMENT') user-level directives don't just call the -primitive versions, but they also `%define' the special preprocessor -symbol `__SECT__' to be the primitive directive that specifies the -current section. So the `ENDSTRUC' directive can remember what -section the assembly was directed to before the structure definition -began. For this reason, there is no primitive version of STRUC or -ENDSTRUC - they are implemented in terms of ABSOLUTE and SECTION. -This also means that if you use STRUC before explicitly announcing a -target section, you should explicitly announce one after ENDSTRUC. - -Directives may also be specific to the output file format. At -present, the `bin', `obj' and `os2' formats define extra directives, -which are specified below. 
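As a minimal sketch of how the structure directives above fit together, the fragment below defines a structure, declares one initialised instance, and then uses the field offsets and the `_size' symbol in code. The names point, ptX, ptY, ptTag and origin are hypothetical, and a 32-bit output format is assumed.

```nasm
; Hypothetical structure and label names, for illustration only.
        section .data           ; announce a section before STRUC

        struc   point
ptX     resd    1               ; offset 0
ptY     resd    1               ; offset 4
ptTag   resb    1               ; offset 8
        endstruc                ; defines point_size as 9, returns to .data

origin:
        istruc  point           ; fields initialised in definition order
        at ptX,   dd 0
        at ptY,   dd 0
        at ptTag, db 'o'
        iend

        section .text
        mov     ebx,origin
        mov     eax,[ebx+ptY]   ; load the ptY field of origin
        add     ebx,point_size  ; step past one whole record
```

Because a section was announced before `struc', ENDSTRUC has somewhere to return to, and no extra SECTION directive is needed between the structure definition and the instance.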
- -The Preprocessor -================ - -NASM contains a full-featured macro preprocessor, which supports -conditional assembly, multi-level file inclusion, two forms of macro -(single-line and multi-line), and a `context stack' mechanism for -extra macro power. Preprocessor directives all begin with a `%' -sign. - -Single-line macros ------------------- - -Single-line macros are defined in a similar way to C, using the -`%define' command. Hence you can do: - - %define ctrl 0x1F & - %define param(a,b) ((a)+(a)*(b)) - mov byte [param(2,ebx)], ctrl 'D' - -which will expand to - - mov byte [((2)+(2)*(ebx))], 0x1F & 'D' - -When the expansion of a single-line macro contains tokens which -invoke another macro, the expansion is performed at invocation time, -not at definition time. Thus the code - - %define a(x) 1+b(x) - %define b(x) 2*x - mov ax,a(8) - -will evaluate in the expected way to `mov ax,1+2*8', even though the -macro `b' wasn't defined at the time of definition of `a'. - -Macros defined with `%define' are case sensitive: after `%define foo -bar', only `foo' will expand to `bar': `Foo' or `FOO' will not. By -using `%idefine' instead of `%define' (the `i' stands for -`insensitive') you can define all the case variants of a macro at -once, so that `%idefine foo bar' would cause `foo', `Foo' and `FOO' -all to expand to `bar'. - -There is a mechanism which detects when a macro call has occurred as -a result of a previous expansion of the same macro, to guard against -circular references and infinite loops. If this happens, the -preprocessor will only expand the first occurrence of the macro. 
-Hence: - - %define a(x) 1+a(x) - mov ax,a(3) ; becomes 1+a(3) and expands no further - -This can be useful for doing things like this: - - %macro extrn 1 ; see next section for explanation of `%macro' - extern _%1 - %define %1 _%1 - %endmacro - -which would avoid having to put leading underscores on external -variables, because you could just code - - extrn foo - mov ax,foo - -and it would expand as - - extern _foo - %define foo _foo - mov ax,foo ; becomes mov ax,_foo as required - -Single-line macros with parameters can be overloaded: it is possible -to define two or more single-line macros with the same name, each -taking a different number of parameters, and the macro processor -will be able to distinguish between them. However, a parameterless -single-line macro excludes the possibility of any macro of the same -name _with_ parameters, and vice versa (though single-line macros -may be redefined, keeping the same number of parameters, without -error). - -You can pre-define single-line macros using the `-d' option on the -NASM command line, such as - - nasm filename -dDEBUG - -(and then you might have various conditional-assembly bits under -`%ifdef DEBUG'), or possibly - - nasm filename -dTYPE=4 - -(which might allow you to re-assemble your code to do several -different things depending on the value of TYPE). - -Multiple-line macros --------------------- - -These are defined using `%macro' and `%endmacro', so that simple things -like this can be done: - - %macro prologue 0 - push ebp - mov ebp,esp - %endmacro - -This defines `prologue' to be a multi-line macro, taking no -parameters, which expands to the two lines of code given. - -Similarly to single-line macros, multi-line macros are case- -sensitive, unless you define them using `%imacro' instead of -`%macro'. - -The `0' on the `%macro' line indicates that the macro `prologue' -expects no parameters. 
Macros can be overloaded: if two macros are -defined with the same name but different numbers of parameters, they -will be treated as separate. Multi-line macros may not be redefined. - -The assembler will usually generate a warning if you code a line -which looks like a macro call but involves a number of parameters -which the macro in question isn't ready to support. (For example, if -you code a macro `%macro foo 1' and also `%macro foo 3', then you -write `foo a,b', a warning will be generated.) This feature can be -disabled by the use of the command line option `-w-macro-params', -since sometimes it's intentional (for example, you might define -`%macro push 2' to allow you to push two registers at once; but -`push ax' shouldn't then generate a warning). - -Macros taking parameters can be written using `%1', `%2' and so on -to reference the parameters. So this code - - %macro movs 2 - push %2 - pop %1 - %endmacro - movs ds,cs - -will define a macro `movs' to perform an effective MOV operation -from segment to segment register. The macro call given would of -course expand to `push cs' followed by `pop ds'. - -You can define a label inside a macro in such a way as to make it -unique to that macro call (so that repeated calls to the same macro -won't produce multiple labels with the same name), by prefixing it -with `%%'. So: - - %macro retz 0 - jnz %%skip - ret - %%skip: - %endmacro - -This defines a different label in place of `%%skip' every time it's -called. (Of course the above code could have easily been coded using -`jnz $+3', but not in more complex cases...) The actual label -defined would be `..@2345.skip', where 2345 is replaced by some -number that changes with each macro call. Users are warned to avoid -defining labels of this shape themselves. - -Sometimes you want a macro to be able to accept arbitrarily many -parameters and lump them into one. 
This can be done using the `+' -modifier on the `%macro' line: - - %macro fputs 2+ - [section .data] ; this is done as a primitive to avoid - ; disturbing the __SECT__ define - %%str db %2 - %%end: - __SECT__ ; this expands to a whole [section xxx] primitive - mov dx,%%str - mov cx,%%end-%%str - mov bx,%1 - call writefile - %endmacro - fputs [filehandle], "hi there", 13, 10 - -This declares `fputs' to be a macro that accepts _at least two_ -parameters, and all parameters after the first one are lumped -together as part of the last specified one (in this case %2). So in -the macro call, `%1' expands to `[filehandle]' while `%2' expands to -the whole remainder of the line: `"hi there", 13, 10'. Note also the -switching of sections in the middle of this macro expansion, to -ensure separation of data and code. - -There is an alternative mechanism for putting commas in macro -parameters: instead of specifying the large-parameter-ness at macro -definition time, you can specify it at macro call time, by the use -of braces to surround a parameter which you want to contain commas. -So: - - %macro table_entry 2 - %%start: - db %1 - times 32-($-%%start) db 0 - db %2 - times 64-($-%%start) db 0 - %endmacro - table_entry 'foo','bar' - table_entry 'megafoo', { 27,'[1mBAR!',27,'[m' } - -will expand to, effectively (actually, there will be labels present, -but these have been omitted for clarity), the following: - - db 'foo' - times 32-3 db 0 - db 'bar' - times 64-35 db 0 - db 'megafoo' - times 32-7 db 0 - db 27,'[1mBAR!',27,'[m' - times 64-43 db 0 - -Macro parameter expansions can be concatenated on to other tokens, -so that you can do this: - - %macro keytab_entry 2 - keypos%1 equ $-keytab - db %2 - %endmacro - keytab: - keytab_entry F1,128+1 - keytab_entry F2,128+2 - keytab_entry Return,13 - -which will define labels called `keyposF1', `keyposF2' and -`keyposReturn'. You can similarly do concatenations on the other -end, such as `%1foo'. 
If you need to concatenate a digit on to the -end of a macro parameter expansion, you can do this by enclosing the -parameter number in braces: `%{1}' is always a valid synonym for -`%1', and has the advantage that it can be legitimately prepended to -a digit, as in `%{1}2', and cause no confusion with `%{12}'. -Macro-specific labels and defines can be concatenated similarly: -`%{%foo}bar' will succeed where `%%foobar' would cause confusion. -(As it happens, `%%foobar' would work anyway, due to the format of -macro-specific labels, but for clarity, `%{%foo}bar' is recommended -if you _really_ want to do anything this perverse...) - -The parameter handling has a special case: it can treat a macro -parameter specially if it's thought to contain a condition code. The -reference `%+1' is identical to `%1' except that it will perform an -initial sanity check to see if the parameter in question is a -condition code; more usefully, the reference `%-1' will produce the -_opposite_ condition code to the one specified in the parameter. -This allows for things such as a conditional-MOV macro to be -defined: - - %macro movc 3 - j%-1 %%skip - mov %2,%3 - %%skip: - %endmacro - movc ae,ax,bx - -which will expand to something like - - jnae ..@1234.skip - mov ax,bx - ..@1234.skip: - -Note that `%+1' will allow CXZ or ECXZ to be passed as condition -codes, but `%-1' will of course be unable to invert them. - -Parameters can also be defaulted: you can define a macro which, for -example, said - - %macro strange 1-3 bx,3 - < some expansion text > - %endmacro - -This macro takes between 1 and 3 parameters (inclusive); if -parameter 2 is not specified it defaults to BX, and if parameter 3 -is not specified it defaults to 3. So the calls - - strange dx,si,di - strange dx,si - strange dx - -would be equivalent to - - strange dx,si,di - strange dx,si,3 - strange dx,bx,3 - -Defaults may be omitted, in which case they are taken to be blank. - -`%endm' is a valid synonym for `%endmacro'. 
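The `strange' example above leaves its expansion text unspecified, so here is a minimal sketch with a concrete body. The macro name `fill' and its body are hypothetical; only the `1-3' range and the default list follow the syntax just described.

```nasm
; `fill' is a hypothetical macro taking 1 to 3 parameters, with
; defaults supplied for the second and third.
%macro fill 1-3 0,1
        mov     %1,%2           ; %2 defaults to 0
        add     %1,%3           ; %3 defaults to 1
%endm

        fill    ax              ; mov ax,0 then add ax,1
        fill    bx,5            ; mov bx,5 then add bx,1
        fill    cx,5,2          ; mov cx,5 then add cx,2
```

The defaults are listed in parameter order, starting from the first optional parameter, which is why `0' applies to `%2' and `1' to `%3'.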
- -The specification for the number of macro parameters can be suffixed -with `.nolist' if you don't want the macro to be explicitly expanded -in listing files: - - %macro ping 1-2+.nolist - ; some stuff - %endmacro - -Standard Macros and `%clear' ----------------------------- - -NASM defines a set of standard macros, before the input file gets -processed; these are primarily there in order to provide standard -language features (such as structure support). However, it's -conceivable that a user might want to write code that doesn't have -the standard macros defined; you can achieve this by using the -preprocessor directive `%clear' at the top of your program, which -will undefine _everything_ that's defined by the preprocessor. - -In particular, NASM defines the symbols `__NASM_MAJOR__' and -`__NASM_MINOR__' to be the major and minor version numbers of NASM. - -Conditional Assembly --------------------- - -Similarly to the C preprocessor, the commands `%ifdef' and `%endif' -may be used to bracket a section of code, which will then only be -assembled if at least one of the identifiers following `%ifdef' is -defined as a single-line macro. The command `%ifndef' has opposite -sense to `%ifdef', and `%else' can be placed between the `%if' and -the `%endif' to work as expected. Since there is no analogue to C's -`#if', there is no precise `elif' directive, but `%elifdef' and -`%elifndef' work as expected. - -There is another family of `%if' constructs: `%ifctx', `%ifnctx', -`%elifctx' and `%elifnctx', which operate on the context stack -(described below). - -File Inclusion --------------- - -You can include a file using the `%include' directive. Included -files are searched for in the current directory, and then in all -directories specified on the command line with the `-i' option. 
-(Note that the directories specified on the command line are -directly prepended to the filename, so they must include the -necessary trailing slash under DOS or Unix, or the equivalent on -other systems.) - -This, again, works like C: `%include' is used to include a file. Of -course it's quite likely you'd want to do the normal sort of thing -inside the file: - - %ifndef MY_MACROS_FILE - %define MY_MACROS_FILE - < go and define some macros > - %endif - -and then elsewhere - - %include "my-macros-file" - < some code making use of the macros > - -so that it doesn't matter if the file accidentally gets included -more than once. - -You can force an include file to be included without using a -`%include' command, by specifying it as a pre-include file on the -command line using the `-p' option. - -The Context Stack ------------------ - -This is a feature which adds a whole extra level of power to NASM's -macro capability. The context stack is an internal object within the -preprocessor, which holds a stack of `contexts'. Each context has a -name - just an identifier-type token - and can also have labels and -`%define' macros associated with it. Other macros can manipulate the -context stack: this is where the power comes in. - -To start with: the preprocessor command `%push' will create a new -context with the given name, and push it on to the top of the stack. -`%pop', taking no arguments, pops the top context off the stack and -destroys it. `%repl' renames the top context without destroying any -associated labels or macros, so it's distinct from doing `%pop' -followed by `%push'. Finally, `%ifctx' and `%ifnctx' invoke -conditional assembly based on the name of the top context. (The -alternative forms `%elifctx' and `%elifnctx' are also available.) - -As well as the `%%foo' syntax to define labels specific to a macro -call, there is also the syntax `%$foo' to define a label specific to -the context currently on top of the stack. 
`%$$foo' can be used to -refer to the context below that, or `%$$$foo' below that, and so on. - -This lot allows the definition of macro combinations that enclose -other code, such as the following big example: - - %macro if 1 - %push if - j%-1 %$ifnot - %endmacro - %macro else 0 - %ifctx if - %repl else - jmp %$ifend - %$ifnot: - %else - %error "expected `if' before `else'" - %endif - %endmacro - %macro endif 0 - %ifctx if - %$ifnot: - %pop - %elifctx else - %$ifend: - %pop - %else - %error "expected `if' or `else' before `endif'" - %endif - %endmacro - -This will cope with a large `if/endif' construct _or_ an -`if/else/endif', without flinching. So you can code: - - cmp ax,bx - if ae - cmp bx,cx - if ae - mov ax,cx - else - mov ax,bx - endif - else - cmp ax,cx - if ae - mov ax,cx - endif - endif - -which will place the smallest out of AX, BX and CX into AX. Note the -use of `%repl' to change the current context from `if' to `else' -without disturbing the associated labels `%$ifend' and `%$ifnot'; -also note that the stack mechanism allows handling of nested IF -statements without a hitch, and that conditional assembly is used in -the `endif' macro in order to cope with the two possible forms with -and without an `else'. Note also the directive `%error', which -allows the user to report errors on improper invocation of a macro -and so can catch unmatched `endif's at preprocess time. - -Output Formats -============== - -The current output formats supported are `bin', `aout', `coff', -`elf', `as86', `obj', `os2', `win32', `rdf', and the debug -pseudo-format `dbg'. - -`bin': flat-form binary ------------------------ - -This is at present the only output format that generates instantly -runnable code: all the others produce object files that need linking -before they become executable. - -`bin' output files contain no red tape at all: they simply contain -the binary representation of the exact code you wrote. 
- -The `bin' format supports a format-specific directive, which is ORG. -`ORG addr' declares that your code should be assembled as if it were -to be loaded into memory at the address `addr'. So a DOS .COM file -should state `ORG 0x100', and a DOS .SYS file should state `ORG 0'. -There should be _one_ ORG directive, at most, in an assembly file: -NASM does not support the use of ORG to jump around inside an object -file, like MASM does (see the `Bugs' section for a demonstration of -the use of MASM's form of ORG to do something that NASM's won't do.) - -Like almost all formats (but not `obj' or `os2'), the `bin' format -defines the section names `.text', `.data' and `.bss'. The layout is -that `.text' comes first in the output file, followed by `.data', -and notionally followed by `.bss'. So if you declare a BSS section -in a flat binary file, references to the BSS section will refer to -space past the end of the actual file. The `.data' and `.bss' -sections are considered to be aligned on four-byte boundaries: this -is achieved by inserting padding zero bytes between the end of the -text section and the start of the data, if there is data present. Of -course if no SECTION directives are present, everything will go into -`.text', and you will get nothing in the output except the code you -wrote. - -`bin' silently ignores GLOBAL directives, and will also not complain -at EXTERN ones. You only get an error if you actually _reference_ an -external symbol. - -Using the `bin' format, the default output filename is `filename' -for inputs of `filename.asm'. If there is no extension to be -removed, output will be placed in `nasm.out' and a warning will be -generated. - -`bin' defaults to 16-bit assembly mode. - -`aout' and `elf': Linux object files ------------------------------------- - -These two object formats are the ones used under Linux. They have no -format-specific directives, and their default output filename is -`filename.o'. 
- -`aout' defines the three standard sections `.text', `.data' and -`.bss'. `elf' also defines these three, but in addition it can -support user-defined section names, which can be declared along with -section attributes like this: - - section foo align=32 exec - section bar write nobits - -The available options are: - -- A section can be `progbits' (the default) or `nobits'. `nobits' - sections are BSS: their contents are not stored in the object - file, and the only thing you can sensibly do in one is RESB. - `progbits' are normal sections. - -- A section can be `exec' (indicating that it contains executable - code), or `noexec' (the default). - -- A section can be `write' (indicating that it should be writable - when linked), or `nowrite' (the default). - -- A section can be `alloc' (indicating that its contents should be - loaded into program VM at load time; the default) or `noalloc' - (for storing comments and things that don't form part of the - loaded program). - -- You can specify a power of two for the section alignment by - writing `align=64' or similar. - -The attributes of the default sections `.text', `.data' and `.bss' -can also be redefined from their defaults. The NASM defaults are: - -section .text align=16 alloc exec nowrite progbits -section .data align=4 alloc write noexec progbits -section .bss align=4 alloc write noexec nobits - -ELF is a much more featureful object-file format than a.out: in -particular it has enough features to support the writing of position -independent code by means of a global offset table, and position -independent shared libraries by means of a procedure linkage table. -Unfortunately NASM, as yet, does not support these extensions, and -so NASM cannot be used to write shared library code under ELF. NASM -also does not support the capability, in ELF, for specifying precise -alignment constraints on common variables. - -Both `aout' and `elf' default to 32-bit assembly mode. 
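To show the `elf' section attributes in use, here is a short sketch of a module with one user-defined `nobits' section and a routine that clears it. The section name `scratch' and the symbols `buffer' and `clear' are hypothetical, and the routine makes no attempt to follow any particular calling convention.

```nasm
; Section and symbol names below are hypothetical.
        section scratch write nobits align=16
buffer  resb    64              ; reserved only; nothing is stored in the file

        section .text
        global  clear           ; GLOBAL must precede the definition
clear:
        mov     edi,buffer      ; destination for STOSB
        mov     ecx,64          ; byte count
        xor     eax,eax         ; AL = 0, the fill value
        rep stosb               ; assumes the direction flag is clear
        ret
```

Only RESB appears inside `scratch', in keeping with the restriction on `nobits' sections noted in the option list.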
- -`coff' and `win32': Common Object File Format ---------------------------------------------- - -The `coff' format generates standard Unix COFF object files, which -can be fed to (for example) the DJGPP linker. Its default output -filename, like the other Unix formats, is `filename.o'. - -The `win32' format generates Microsoft Win32 (Windows 95 or -Intel-platform Windows NT) object files, which nominally use the -COFF standard, but in fact are not compatible. Its default output -filename is `filename.obj'. - -`coff' and `win32' are not quite compatible formats, due to the fact -that Microsoft's interpretation of the term `relative relocation' -does not seem to be the same as the interpretation used by anyone -else. It is therefore more correct to state that Win32 uses a -_variant_ of COFF. The object files will not therefore produce -correct output when fed to each other's linkers. (I've tried it!) - -In addition to this subtle incompatibility, Win32 also defines -extensions to basic COFF, such as a mechanism for importing symbols -from dynamic-link libraries at load time. NASM may eventually -support this extension in the form of a format-specific directive. -However, as yet, it does not. Neither the `coff' nor `win32' output -formats have any specific directives. - -The Microsoft linker also has a small blind spot: it cannot -correctly relocate a relative CALL or JMP to an absolute address. -Hence all PC-relative CALLs or JMPs, when using the `win32' format, -must have targets which are relative to sections, or to external -symbols. You can't do - call 0x123456 -_even_ if you happen to know that there is executable code at that -address. The linker simply won't get the reference right; so in the -interests of not generating incorrect code, NASM will not allow this -form of reference to be written to a Win32 object file. (Standard -COFF, or at least the DJGPP linker, seems to be able to cope with -this contingency. 
Although that may be due to the executable having -a zero load address...) - -Note also that Borland Win32 compilers reportedly do not use this -object file format: while Borland linkers will output Win32-COFF -type executables, their object format is the same as the old DOS OBJ -format. So if you are using a Borland compiler, don't use the -`win32' object format, just use `obj' and declare all your segments -as `USE32'. - -Both `coff' and `win32' support, in addition to the three standard -section names `.text', `.data' and `.bss', the ability to define -your own sections. Currently (this may change in the future) you can -provide the options `text' (or `code'), `data' or `bss' to determine -the type of section. Win32 also allows `info', which is an -informational section type used by Microsoft C compilers to store -linker directives. So you can do: - - section .mysect code ; defines an extra code section - -or maybe, in Win32, - - section .drectve info ; defines an MS-compatible directive section - db '-defaultlib:LIBC -defaultlib:OLDNAMES ' - -to pass directives to the MS linker. - -Both `coff' and `win32' default to 32-bit assembly mode. - -`obj' and `os2': Microsoft 16-bit Object Module Format ------------------------------------------------------- - -The `obj' format generates 16-bit Microsoft object files, suitable -for feeding to 16-bit versions of Microsoft C, and probably -TLINK as well (although that hasn't been tested). The Use32 -extensions are supported. - -`obj' defines no special segment names: you can call segments what -you like. Unlike the other formats, too, segment names are actually -defined as symbols, so you can write - - segment CODE - mov ax,CODE - -and get the _segment_ address of the segment, suitable for loading -into a segment register. 
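As a slightly fuller sketch of the same idea, the fragment below uses a segment name as a symbol to set up DS before addressing data. The names DATA, CODE, greeting and setup are hypothetical.

```nasm
; Segment and label names here are hypothetical.
        segment DATA
greeting db     'hello', 13, 10, '$'

        segment CODE
setup:
        mov     ax,DATA         ; segment address of DATA
        mov     ds,ax           ; segment registers cannot take immediates
        mov     dx,greeting     ; offset of greeting within DATA
```

The detour through AX is needed because the x86 has no instruction that loads an immediate value straight into a segment register.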
- -Segments can be declared with attributes: - - SEGMENT CODE PRIVATE ALIGN=16 CLASS=CODE OVERLAY=OVL2 USE16 - -You can specify segments to be PRIVATE, PUBLIC, COMMON or STACK; -their alignment may be any power of two from 1 to 256 (although only -1, 2, 4, 16 and 256 are really supported, so anything else gets -rounded up to the next highest one of those); their class and -overlay names may be specified. You may also specify segments to be -USE16 or USE32. The defaults are PUBLIC ALIGN=1, no class, no -overlay, USE16. - -You can also specify that a segment is _absolute_ at a certain -segment address: - - SEGMENT SCREEN ABSOLUTE=0xB800 - -The ABSOLUTE and ALIGN keywords are mutually exclusive. - -The format-specific directive GROUP allows segment grouping: `GROUP -DGROUP DATA BSS' defines the group DGROUP to contain segments DATA -and BSS. - -Segments are defined as part of their group by default: if variable -`var' is declared in segment `data', which is part of group -`dgroup', then the expression `SEG var' is equivalent to the -expression `dgroup', and the expression `var' evaluates to the -offset of the variable `var' relative to the beginning of the group -`dgroup'. You must use the expression `var WRT data' to get the -offset of the variable `var' relative to the beginning of its -_segment_. - -NASM allows a segment to be part of more than one group (like A86, -and unlike TASM), but will generate a warning (unlike A86!). -References to the symbols in that segment will be resolved relative -to the _first_ group it is defined in. - -The directive `UPPERCASE' causes all symbol, segment and group names -output to the object file to be uppercased. The actual _assembly_ is -still case sensitive. - -To avoid getting tangled up in NASM's local label mechanism, segment -and group names have leading periods stripped when they are defined. 
-Thus, the directive `SEGMENT .text' will define a segment called -`text', which will clash with any other symbol called `text', and -you will _not_ be able to reference the segment base as `.text', but -only as `text'. - -Common variables in OBJ files can be `near' or `far': currently, -NASM has a horribly grotty way to support that, which is that if you -specify the common variable's size as negative, it will be near, and -otherwise it will be far. The support isn't perfect: if you declare -a far common variable both in a NASM assembly module and in a C -program, you may well find the linker reports "mismatch in -array-size" or some such. The reason for this is that far common -variables are defined by means of _two_ size constants, which are -multiplied to give the real size. Apparently the Microsoft linker -(at least) likes both constants, not merely their product, to match -up. This may be fixed in a future release. - -If the module you're writing is intended to contain the program -entry point, you can declare this by defining the special label -`..start' at the start point, either as a label or by EQU (although -of course the normal caveats about EQU dependency still apply). - -`obj' has an unusual handling of assembly modes: instead of having a -global default for the whole file, there is a separate default for -each segment. Thus, each SEGMENT directive carries an implicit BITS -directive with it, which switches to 16-bit or 32-bit mode depending -on whether the segment is a Use16 or Use32 segment. If you want to -place 32-bit code in a Use16 segment, you can use an explicit `BITS -32' override, but if you switch temporarily away from that segment, -you will have to repeat the override after coming back to it. - -If you're trying to build a .COM application by linking several .OBJ -files together, you need to put `resb 0x100' at the front of the -code segment in the first object file, since otherwise the linker -will get the linking wrong. 
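A minimal sketch of what the first object file of such a .COM program might look like follows. The segment name `CODE' and the label `start' are hypothetical, and the body just exits to DOS; treat it as an illustration of where the `resb' goes, not as a tested recipe for any particular linker:

```nasm
; Illustrative sketch only: first module of a multi-module .COM program,
; assuming `-f obj' output. CODE and start are hypothetical names.

        segment CODE
        resb 0x100              ; leave room for the 256-byte area DOS puts
                                ; in front of a .COM image, so that the
                                ; labels below get offsets from 0x100 up

start:  mov ax,0x4C00           ; DOS function 4Ch: exit with return code 0
        int 0x21
```

The point of the design is that a .COM image is loaded at offset 0x100 of its segment, so the offsets the linker computes only match the run-time addresses if the code segment really does begin with 0x100 bytes of reserved space.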
- -OS/2 uses a file format almost identical to that of DOS, with a -couple of differences, principally that OS/2 defines a pseudo-group -called FLAT, containing no segments, and every relocation is made -relative to that (so it would be equivalent to writing `label WRT -FLAT' in place of `label' _throughout_ your code). Since this would -be inconvenient to write code for, NASM implements the `os2' variant -on `obj', which provides this FLAT group itself and automatically -makes the default relocation format relative to FLAT. - -NOTE TO OS/2 USERS: The OS/2 output format is new in NASM version -0.95. It hasn't been tested on any actual OS/2 systems, and I don't -know for sure that it'll work properly. Any OS/2 users are -encouraged to give it thorough testing and report the results to -me. Thanks! - -`as86': Linux as86 (bin86-0.3) ------------------------------- - -This output format attempts to replicate the format used to pass -data between the Linux x86 assembler and linker, as86 and ld86. Its -default file name, yet again, is `filename.o'. Its default -segment-size attribute is 16 bits. - -`rdf': Relocatable Dynamic Object File Format ---------------------------------------------- - -RDOFF was designed initially to test the object-file production -interface to NASM. It soon became apparent that it could be enhanced -for use in serious applications due to its simplicity; code to load -and execute an RDOFF object module is very simple. It also contains -enhancements to allow it to be linked with a dynamic link library at -either run time or load time, depending on how complex you wish to make -your loader. - -The `rdoff' directory in the NASM distribution archive contains -source for an RDF linker and loader to run under Linux. - -`rdf' has a default segment-size attribute of 32 bits. - -Debugging format: `dbg' ------------------------ - -This output format is not built into NASM by default: it's for -debugging purposes.
It produces a debug dump of everything that the -NASM assembly module feeds to the output driver, for the benefit of -people trying to write their own output drivers. - -Common Problems -=============== - -A few problems that people repeatedly ask me about are documented -here. - -NASM's design philosophy of generating exactly the code the -programmer asks for, without second-guessing or re-interpreting, has -been known to cause confusion in a couple of areas. - -Firstly, several people have complained that instructions such as -`add esp,4' are assembled in a form that allocates a full four-byte -offset field to store the `4' in, even though the instruction has a -shorter form with a single-byte offset field which would work in -this case. The answer is that NASM by design doesn't try to guess -which one of these forms you want: if you want one, you code one, -and if you want the other, you code the other. The other form is -`add esp, byte 4'. - -Secondly, and similarly, I've had repeated questions about -conditional jumps. The simple `jne label', in NASM, translates -directly to the old 8086 form of the conditional jump, in which the -offset can be up to 128 bytes (or thereabouts) in either direction. -NASM won't automatically generate `je $+3 / jmp label' for labels -that are further away, and neither will it generate the 386 long- -offset form of the instruction. If you want the 386-specific -conditional jump that's capable of reaching anywhere in the same -segment as the jump instruction, you want `jne near label'. If you -want an 8086-compatible `je' over another `jmp', code one -explicitly, or define a macro to do so. NASM doesn't do either of -these things for you, again by design. - -Bugs -==== - -Apart from the missing features (correct OBJ COMMON support, ELF -alignment, ELF PIC support, etc.), there are no _known_ bugs. 
-However, any you find, with patches if possible, should be sent to -<jules@earthcorp.com> or <anakin@pobox.com>, and we'll try to fix -them. - -Beware of Pentium-specific instructions: Intel have provided a macro -file for MASM, to implement the eight or nine new Pentium opcodes as -MASM macros. NASM does not generate the same code for the CMPXCHG8B -instruction as these macros do: this is due to a bug in the _macro_, -not in NASM. The macro works by generating an SIDT instruction (if I -remember rightly), which has almost exactly the right form, then -using ORG to back up a bit and do a DB over the top of one of the -opcode bytes. The trouble is that Intel overlooked (or MASM syntax -didn't let them allow for) the possibility that the SIDT instruction -may contain an 0x66 or 0x67 operand or address size prefix. If this -happens, the ORG will back up by the wrong amount, and the macro -will generate incorrect code. NASM gets it right. This, also, is not -a bug in NASM, so please don't report it as one. (Also please note -that the ORG directive in NASM doesn't work this way, and so you -can't do equivalent tricks with it...) - -That's All Folks! -================= - -Enjoy using NASM! Please feel free to send me comments, or -constructive criticism, or bug fixes, or requests, or general chat. - -Contributions are also welcome: if anyone knows anything about any -other object file formats I should support, please feel free to send -me documentation and some short example files (in my experience, -documentation is useless without at _least_ one example), or even to -write me an output module. OS/2 object files, in particular, spring -to mind. I don't have OS/2, though. 
- -Please keep flames to a minimum: I have had some very angry e-mails -in the past, condemning me for writing a useless assembler, that -output in no useful format (at the time, that was true), generated -incorrect code (several typos in the instruction table, since fixed) -and took up too much memory and disk space (the price you pay for -total portability, it seems). All these were criticisms I was happy -to hear, but I didn't appreciate the flames that went with them. -NASM _is_ still a prototype, and you use it at your own risk. I -_think_ it works, and if it doesn't then I want to know about it, -but I don't guarantee anything. So don't flame me, please. Blame, -but don't flame. - -- Simon Tatham <anakin@pobox.com>, 21-Nov-96