Diffstat (limited to 'lispref/syntax.texi')
-rw-r--r-- | lispref/syntax.texi | 723 |
1 files changed, 0 insertions, 723 deletions
diff --git a/lispref/syntax.texi b/lispref/syntax.texi
deleted file mode 100644
index 585df47580a..00000000000
--- a/lispref/syntax.texi
+++ /dev/null
@@ -1,723 +0,0 @@
-@c -*-texinfo-*-
-@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
-@c See the file elisp.texi for copying conditions.
-@setfilename ../info/syntax
-@node Syntax Tables, Abbrevs, Searching and Matching, Top
-@chapter Syntax Tables
-@cindex parsing
-@cindex syntax table
-@cindex text parsing
-
-  A @dfn{syntax table} specifies the syntactic textual function of each
-character. This information is used by the parsing commands, the
-complex movement commands, and others to determine where words, symbols,
-and other syntactic constructs begin and end. The current syntax table
-controls the meaning of the word motion functions (@pxref{Word Motion})
-and the list motion functions (@pxref{List Motion}) as well as the
-functions in this chapter.
-
-@menu
-* Basics: Syntax Basics.     Basic concepts of syntax tables.
-* Desc: Syntax Descriptors.  How characters are classified.
-* Syntax Table Functions::   How to create, examine and alter syntax tables.
-* Motion and Syntax::        Moving over characters with certain syntaxes.
-* Parsing Expressions::      Parsing balanced expressions
-                                using the syntax table.
-* Standard Syntax Tables::   Syntax tables used by various major modes.
-* Syntax Table Internals::   How syntax table information is stored.
-@end menu
-
-@node Syntax Basics
-@section Syntax Table Concepts
-
-@ifinfo
-  A @dfn{syntax table} provides Emacs with the information that
-determines the syntactic use of each character in a buffer. This
-information is used by the parsing commands, the complex movement
-commands, and others to determine where words, symbols, and other
-syntactic constructs begin and end. The current syntax table controls
-the meaning of the word motion functions (@pxref{Word Motion}) and the
-list motion functions (@pxref{List Motion}) as well as the functions in
-this chapter.
-@end ifinfo
-
-  A syntax table is a vector of 256 elements; it contains one entry for
-each of the 256 possible characters in an 8-bit byte. Each element is
-an integer that encodes the syntax of the character in question.
-
-  Syntax tables are used only for moving across text, not for the Emacs
-Lisp reader. Emacs Lisp uses built-in syntactic rules when reading Lisp
-expressions, and these rules cannot be changed.
-
-  Each buffer has its own major mode, and each major mode has its own
-idea of the syntactic class of various characters. For example, in Lisp
-mode, the character @samp{;} begins a comment, but in C mode, it
-terminates a statement. To support these variations, Emacs makes the
-choice of syntax table local to each buffer. Typically, each major
-mode has its own syntax table and installs that table in each buffer
-that uses that mode. Changing this table alters the syntax in all
-those buffers as well as in any buffers subsequently put in that mode.
-Occasionally several similar modes share one syntax table.
-@xref{Example Major Modes}, for an example of how to set up a syntax
-table.
-
-A syntax table can inherit the data for some characters from the
-standard syntax table, while specifying other characters itself. The
-``inherit'' syntax class means ``inherit this character's syntax from
-the standard syntax table.'' Most major modes' syntax tables inherit
-the syntax of character codes 0 through 31 and 128 through 255. This is
-useful with character sets such as ISO Latin-1 that have additional
-alphabetic characters in the range 128 to 255. Changing the standard
-syntax of these characters therefore affects all major modes.
-
-@defun syntax-table-p object
-This function returns @code{t} if @var{object} is a vector of 256
-elements. This means that the vector may be a syntax table. However,
-according to this test, any vector of length 256 is considered to be a
-syntax table, no matter what its contents.
-@end defun
-
-@node Syntax Descriptors
-@section Syntax Descriptors
-@cindex syntax classes
-
-  This section describes the syntax classes and flags that denote the
-syntax of a character, and how they are represented as a @dfn{syntax
-descriptor}, which is a Lisp string that you pass to
-@code{modify-syntax-entry} to specify the desired syntax.
-
-  Emacs defines a number of @dfn{syntax classes}. Each syntax table
-puts each character into one class. There is no necessary relationship
-between the class of a character in one syntax table and its class in
-any other table.
-
-  Each class is designated by a mnemonic character, which serves as the
-name of the class when you need to specify a class. Usually the
-designator character is one that is frequently in that class; however,
-its meaning as a designator is unvarying and independent of what syntax
-that character currently has.
-
-@cindex syntax descriptor
-  A syntax descriptor is a Lisp string that specifies a syntax class, a
-matching character (used only for the parenthesis classes) and flags.
-The first character is the designator for a syntax class. The second
-character is the character to match; if it is unused, put a space there.
-Then come the characters for any desired flags. If no matching
-character or flags are needed, one character is sufficient.
-
-  For example, the descriptor for the character @samp{*} in C mode is
-@samp{@w{. 23}} (i.e., punctuation, matching character slot unused,
-second character of a comment-starter, first character of a
-comment-ender), and the entry for @samp{/} is @samp{@w{. 14}} (i.e.,
-punctuation, matching character slot unused, first character of a
-comment-starter, second character of a comment-ender).
-
-@menu
-* Syntax Class Table::      Table of syntax classes.
-* Syntax Flags::            Additional flags each character can have.
-@end menu
-
-@node Syntax Class Table
-@subsection Table of Syntax Classes
-
-  Here is a table of syntax classes, the characters that stand for them,
-their meanings, and examples of their use.
-
-@deffn {Syntax class} @w{whitespace character}
-@dfn{Whitespace characters} (designated with @w{@samp{@ }} or @samp{-})
-separate symbols and words from each other. Typically, whitespace
-characters have no other syntactic significance, and multiple whitespace
-characters are syntactically equivalent to a single one. Space, tab,
-newline and formfeed are almost always classified as whitespace.
-@end deffn
-
-@deffn {Syntax class} @w{word constituent}
-@dfn{Word constituents} (designated with @samp{w}) are parts of normal
-English words and are typically used in variable and command names in
-programs. All upper- and lower-case letters, and the digits, are typically
-word constituents.
-@end deffn
-
-@deffn {Syntax class} @w{symbol constituent}
-@dfn{Symbol constituents} (designated with @samp{_}) are the extra
-characters that are used in variable and command names along with word
-constituents. For example, the symbol constituents class is used in
-Lisp mode to indicate that certain characters may be part of symbol
-names even though they are not part of English words. These characters
-are @samp{$&*+-_<>}. In standard C, the only non-word-constituent
-character that is valid in symbols is underscore (@samp{_}).
-@end deffn
-
-@deffn {Syntax class} @w{punctuation character}
-@dfn{Punctuation characters} (designated with @samp{.}) are those
-characters that are used as punctuation in English, or are used in some
-way in a programming language to separate symbols from one another.
-Most programming language modes, including Emacs Lisp mode, have no
-characters in this class since the few characters that are not symbol or
-word constituents all have other uses.
-@end deffn
-
-@deffn {Syntax class} @w{open parenthesis character}
-@deffnx {Syntax class} @w{close parenthesis character}
-@cindex parenthesis syntax
-Open and close @dfn{parenthesis characters} are characters used in
-dissimilar pairs to surround sentences or expressions. Such a grouping
-is begun with an open parenthesis character and terminated with a close.
-Each open parenthesis character matches a particular close parenthesis
-character, and vice versa. Normally, Emacs indicates momentarily the
-matching open parenthesis when you insert a close parenthesis.
-@xref{Blinking}.
-
-The class of open parentheses is designated with @samp{(}, and that of
-close parentheses with @samp{)}.
-
-In English text, and in C code, the parenthesis pairs are @samp{()},
-@samp{[]}, and @samp{@{@}}. In Emacs Lisp, the delimiters for lists and
-vectors (@samp{()} and @samp{[]}) are classified as parenthesis
-characters.
-@end deffn
-
-@deffn {Syntax class} @w{string quote}
-@dfn{String quote characters} (designated with @samp{"}) are used in
-many languages, including Lisp and C, to delimit string constants. The
-same string quote character appears at the beginning and the end of a
-string. Such quoted strings do not nest.
-
-The parsing facilities of Emacs consider a string as a single token.
-The usual syntactic meanings of the characters in the string are
-suppressed.
-
-The Lisp modes have two string quote characters: double-quote (@samp{"})
-and vertical bar (@samp{|}). @samp{|} is not used in Emacs Lisp, but it
-is used in Common Lisp. C also has two string quote characters:
-double-quote for strings, and single-quote (@samp{'}) for character
-constants.
-
-English text has no string quote characters because English is not a
-programming language. Although quotation marks are used in English,
-we do not want them to turn off the usual syntactic properties of
-other characters in the quotation.
-@end deffn
-
-@deffn {Syntax class} @w{escape}
-An @dfn{escape character} (designated with @samp{\}) starts an escape
-sequence such as is used in C string and character constants. The
-character @samp{\} belongs to this class in both C and Lisp. (In C, it
-is used thus only inside strings, but it turns out to cause no trouble
-to treat it this way throughout C code.)
-
-Characters in this class count as part of words if
-@code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}.
-@end deffn
-
-@deffn {Syntax class} @w{character quote}
-A @dfn{character quote character} (designated with @samp{/}) quotes the
-following character so that it loses its normal syntactic meaning. This
-differs from an escape character in that only the character immediately
-following is ever affected.
-
-Characters in this class count as part of words if
-@code{words-include-escapes} is non-@code{nil}. @xref{Word Motion}.
-
-This class is used for backslash in @TeX{} mode.
-@end deffn
-
-@deffn {Syntax class} @w{paired delimiter}
-@dfn{Paired delimiter characters} (designated with @samp{$}) are like
-string quote characters except that the syntactic properties of the
-characters between the delimiters are not suppressed. Only @TeX{} mode
-uses a paired delimiter presently---the @samp{$} that both enters and
-leaves math mode.
-@end deffn
-
-@deffn {Syntax class} @w{expression prefix}
-An @dfn{expression prefix operator} (designated with @samp{'}) is used
-for syntactic operators that are part of an expression if they appear
-next to one. These characters in Lisp include the apostrophe, @samp{'}
-(used for quoting), the comma, @samp{,} (used in macros), and @samp{#}
-(used in the read syntax for certain data types).
-@end deffn
-
-@deffn {Syntax class} @w{comment starter}
-@deffnx {Syntax class} @w{comment ender}
-@cindex comment syntax
-The @dfn{comment starter} and @dfn{comment ender} characters are used in
-various languages to delimit comments. These classes are designated
-with @samp{<} and @samp{>}, respectively.
-
-English text has no comment characters. In Lisp, the semicolon
-(@samp{;}) starts a comment and a newline or formfeed ends one.
-@end deffn
-
-@deffn {Syntax class} @w{inherit}
-This syntax class does not specify a syntax. It says to look in the
-standard syntax table to find the syntax of this character. The
-designator for this syntax code is @samp{@@}.
-@end deffn
-
-@node Syntax Flags
-@subsection Syntax Flags
-@cindex syntax flags
-
-  In addition to the classes, entries for characters in a syntax table
-can include flags. There are six possible flags, represented by the
-characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b} and
-@samp{p}.
-
-  All the flags except @samp{p} are used to describe multi-character
-comment delimiters. The digit flags indicate that a character can
-@emph{also} be part of a comment sequence, in addition to the syntactic
-properties associated with its character class. The flags are
-independent of the class and each other for the sake of characters such
-as @samp{*} in C mode, which is a punctuation character, @emph{and} the
-second character of a start-of-comment sequence (@samp{/*}), @emph{and}
-the first character of an end-of-comment sequence (@samp{*/}).
-
-The flags for a character @var{c} are:
-
-@itemize @bullet
-@item
-@samp{1} means @var{c} is the start of a two-character comment-start
-sequence.
-
-@item
-@samp{2} means @var{c} is the second character of such a sequence.
-
-@item
-@samp{3} means @var{c} is the start of a two-character comment-end
-sequence.
-
-@item
-@samp{4} means @var{c} is the second character of such a sequence.
-
-@item
-@c Emacs 19 feature
-@samp{b} means that @var{c} as a comment delimiter belongs to the
-alternative ``b'' comment style.
-
-Emacs supports two comment styles simultaneously in any one syntax
-table. This is for the sake of C++. Each style of comment syntax has
-its own comment-start sequence and its own comment-end sequence. Each
-comment must stick to one style or the other; thus, if it starts with
-the comment-start sequence of style ``b'', it must also end with the
-comment-end sequence of style ``b''.
-
-The two comment-start sequences must begin with the same character; only
-the second character may differ. Mark the second character of the
-``b''-style comment-start sequence with the @samp{b} flag.
-
-A comment-end sequence (one or two characters) applies to the ``b''
-style if its first character has the @samp{b} flag set; otherwise, it
-applies to the ``a'' style.
-
-The appropriate syntax descriptors for C++ comments are as follows:
-
-@table @asis
-@item @samp{/}
-@samp{@w{. 124b}}
-@item @samp{*}
-@samp{@w{. 23}}
-@item newline
-@samp{@w{> b}}
-@end table
-
-This defines four comment-delimiting sequences:
-
-@table @asis
-@item @samp{/*}
-This is a comment-start sequence for ``a'' style because the
-second character, @samp{*}, does not have the @samp{b} flag.
-
-@item @samp{//}
-This is a comment-start sequence for ``b'' style because the second
-character, @samp{/}, does have the @samp{b} flag.
-
-@item @samp{*/}
-This is a comment-end sequence for ``a'' style because the first
-character, @samp{*}, does not have the @samp{b} flag.
-
-@item newline
-This is a comment-end sequence for ``b'' style, because the newline
-character has the @samp{b} flag.
-@end table
-
-@item
-@c Emacs 19 feature
-@samp{p} identifies an additional ``prefix character'' for Lisp syntax.
-These characters are treated as whitespace when they appear between
-expressions. When they appear within an expression, they are handled
-according to their usual syntax codes.
-
-The function @code{backward-prefix-chars} moves back over these
-characters, as well as over characters whose primary syntax class is
-prefix (@samp{'}). @xref{Motion and Syntax}.
-@end itemize
-
-@node Syntax Table Functions
-@section Syntax Table Functions
-
-  In this section we describe functions for creating, accessing and
-altering syntax tables.
-
-@defun make-syntax-table
-This function creates a new syntax table. Character codes 0 through
-31 and 128 through 255 are set up to inherit from the standard syntax
-table. The other character codes are set up by copying what the
-standard syntax table says about them.
-
-Most major mode syntax tables are created in this way.
-@end defun
-
-@defun copy-syntax-table &optional table
-This function constructs a copy of @var{table} and returns it. If
-@var{table} is not supplied (or is @code{nil}), it returns a copy of the
-standard syntax table. Otherwise, an error is signaled if @var{table} is
-not a syntax table.
-@end defun
-
-@deffn Command modify-syntax-entry char syntax-descriptor &optional table
-This function sets the syntax entry for @var{char} according to
-@var{syntax-descriptor}. The syntax is changed only for @var{table},
-which defaults to the current buffer's syntax table, and not in any
-other syntax table. The argument @var{syntax-descriptor} specifies the
-desired syntax; this is a string beginning with a class designator
-character, and optionally containing a matching character and flags as
-well. @xref{Syntax Descriptors}.
-
-This function always returns @code{nil}. The old syntax information in
-the table for this character is discarded.
-
-An error is signaled if the first character of the syntax descriptor is
-not one of the syntax class designator characters. An error is also
-signaled if @var{char} is not a character.
-
-@example
-@group
-@exdent @r{Examples:}
-
-;; @r{Put the space character in class whitespace.}
-(modify-syntax-entry ?\  " ")
-     @result{} nil
-@end group
-
-@group
-;; @r{Make @samp{$} an open parenthesis character,}
-;; @r{with @samp{^} as its matching close.}
-(modify-syntax-entry ?$ "(^")
-     @result{} nil
-@end group
-
-@group
-;; @r{Make @samp{^} a close parenthesis character,}
-;; @r{with @samp{$} as its matching open.}
-(modify-syntax-entry ?^ ")$")
-     @result{} nil
-@end group
-
-@group
-;; @r{Make @samp{/} a punctuation character,}
-;; @r{the first character of a start-comment sequence,}
-;; @r{and the second character of an end-comment sequence.}
-;; @r{This is used in C mode.}
-(modify-syntax-entry ?/ ". 14")
-     @result{} nil
-@end group
-@end example
-@end deffn
-
-@defun char-syntax character
-This function returns the syntax class of @var{character}, represented
-by its mnemonic designator character. This @emph{only} returns the
-class, not any matching parenthesis or flags.
-
-An error is signaled if @var{character} is not a character.
-
-The following examples apply to C mode. The first example shows that
-the syntax class of space is whitespace (represented by a space). The
-second example shows that the syntax of @samp{/} is punctuation. This
-does not show the fact that it is also part of comment-start and -end
-sequences. The third example shows that open parenthesis is in the class
-of open parentheses. This does not show the fact that it has a matching
-character, @samp{)}.
-
-@example
-@group
-(char-to-string (char-syntax ?\ ))
-     @result{} " "
-@end group
-
-@group
-(char-to-string (char-syntax ?/))
-     @result{} "."
-@end group
-
-@group
-(char-to-string (char-syntax ?\())
-     @result{} "("
-@end group
-@end example
-@end defun
-
-@defun set-syntax-table table
-This function makes @var{table} the syntax table for the current buffer.
-It returns @var{table}.
-@end defun
-
-@defun syntax-table
-This function returns the current syntax table, which is the table for
-the current buffer.
-@end defun
-
-@node Motion and Syntax
-@section Motion and Syntax
-
-  This section describes functions for moving across characters in
-certain syntax classes. None of these functions exists in Emacs
-version 18 or earlier.
-
-@defun skip-syntax-forward syntaxes &optional limit
-This function moves point forward across characters having syntax classes
-mentioned in @var{syntaxes}. It stops when it encounters the end of
-the buffer, or position @var{limit} (if specified), or a character it is
-not supposed to skip.
-@ignore @c may want to change this.
-The return value is the distance traveled, which is a nonnegative
-integer.
-@end ignore
-@end defun
-
-@defun skip-syntax-backward syntaxes &optional limit
-This function moves point backward across characters whose syntax
-classes are mentioned in @var{syntaxes}. It stops when it encounters
-the beginning of the buffer, or position @var{limit} (if specified), or a
-character it is not supposed to skip.
-@ignore @c may want to change this.
-The return value indicates the distance traveled. It is an integer that
-is zero or less.
-@end ignore
-@end defun
-
-@defun backward-prefix-chars
-This function moves point backward over any number of characters with
-expression prefix syntax. This includes both characters in the
-expression prefix syntax class, and characters with the @samp{p} flag.
-@end defun
-
-@node Parsing Expressions
-@section Parsing Balanced Expressions
-
-  Here are several functions for parsing and scanning balanced
-expressions, also known as @dfn{sexps}, in which parentheses match in
-pairs. The syntax table controls the interpretation of characters, so
-these functions can be used for Lisp expressions when in Lisp mode and
-for C expressions when in C mode. @xref{List Motion}, for convenient
-higher-level functions for moving over balanced expressions.
-
-@defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment
-This function parses a sexp in the current buffer starting at
-@var{start}, not scanning past @var{limit}. It stops at position
-@var{limit} or when certain criteria described below are met, and sets
-point to the location where parsing stops. It returns a value
-describing the status of the parse at the point where it stops.
-
-If @var{state} is @code{nil}, @var{start} is assumed to be at the top
-level of parenthesis structure, such as the beginning of a function
-definition. Alternatively, you might wish to resume parsing in the
-middle of the structure. To do this, you must provide a @var{state}
-argument that describes the initial status of parsing.
-
-@cindex parenthesis depth
-If the third argument @var{target-depth} is non-@code{nil}, parsing
-stops if the depth in parentheses becomes equal to @var{target-depth}.
-The depth starts at 0, or at whatever is given in @var{state}.
-
-If the fourth argument @var{stop-before} is non-@code{nil}, parsing
-stops when it comes to any character that starts a sexp. If the sixth
-argument @var{stop-comment} is non-@code{nil}, parsing stops when it
-comes to the start of a comment.
-
-@cindex parse state
-The fifth argument @var{state} is an eight-element list of the same
-form as the value of this function, described below. The return value
-of one call may be used to initialize the state of the parse on another
-call to @code{parse-partial-sexp}.
-
-The result is a list of eight elements describing the final state of
-the parse:
-
-@enumerate 0
-@item
-The depth in parentheses, counting from 0.
-
-@item
-@cindex innermost containing parentheses
-The character position of the start of the innermost parenthetical
-grouping containing the stopping point; @code{nil} if none.
-
-@item
-@cindex previous complete subexpression
-The character position of the start of the last complete subexpression
-terminated; @code{nil} if none.
-
-@item
-@cindex inside string
-Non-@code{nil} if inside a string. More precisely, this is the
-character that will terminate the string.
-
-@item
-@cindex inside comment
-@code{t} if inside a comment (of either style).
-
-@item
-@cindex quote character
-@code{t} if point is just after a quote character.
-
-@item
-The minimum parenthesis depth encountered during this scan.
-
-@item
-@code{t} if inside a comment of style ``b''.
-@end enumerate
-
-Elements 0, 3, 4, 5 and 7 are significant in the argument @var{state}.
-
-@cindex indenting with parentheses
-This function is most often used to compute indentation for languages
-that have nested parentheses.
-@end defun
-
-@defun scan-lists from count depth
-This function scans forward @var{count} balanced parenthetical groupings
-from character position @var{from}. It returns the character position
-where the scan stops.
-
-If @var{depth} is nonzero, parenthesis depth counting begins from that
-value. The only candidates for stopping are places where the depth in
-parentheses becomes zero; @code{scan-lists} counts @var{count} such
-places and then stops. Thus, a positive value for @var{depth} means go
-out @var{depth} levels of parentheses.
-
-Scanning ignores comments if @code{parse-sexp-ignore-comments} is
-non-@code{nil}.
-
-If the scan reaches the beginning or end of the buffer (or its
-accessible portion), and the depth is not zero, an error is signaled.
-If the depth is zero but the count is not used up, @code{nil} is
-returned.
-@end defun
-
-@defun scan-sexps from count
-This function scans forward @var{count} sexps from character position
-@var{from}. It returns the character position where the scan stops.
-
-Scanning ignores comments if @code{parse-sexp-ignore-comments} is
-non-@code{nil}.
-
-If the scan reaches the beginning or end of (the accessible part of) the
-buffer in the middle of a parenthetical grouping, an error is signaled.
-If it reaches the beginning or end between groupings but before
-@var{count} is used up, @code{nil} is returned.
-@end defun
-
-@defvar parse-sexp-ignore-comments
-@cindex skipping comments
-If the value is non-@code{nil}, then comments are treated as
-whitespace by the functions in this section and by @code{forward-sexp}.
-
-In older Emacs versions, this feature worked only when the comment
-terminator was something like @samp{*/} and appeared only to end a
-comment. In languages where newlines terminate comments, it was
-necessary to make this variable @code{nil}, since not every newline is
-the end of a comment. This limitation no longer exists.
-@end defvar
-
-You can use @code{forward-comment} to move forward or backward over
-one comment or several comments.
-
-@defun forward-comment count
-This function moves point forward across @var{count} comments (backward,
-if @var{count} is negative). If it finds anything other than a comment
-or whitespace, it stops, leaving point at the place where it stopped.
-It also stops after satisfying @var{count}.
-@end defun
-
-To move forward over all comments and whitespace following point, use
-@code{(forward-comment (buffer-size))}. @code{(buffer-size)} is a good
-argument to use, because the number of comments in the buffer cannot
-exceed that many.
-
-@node Standard Syntax Tables
-@section Some Standard Syntax Tables
-
-  Most of the major modes in Emacs have their own syntax tables. Here
-are several of them:
-
-@defun standard-syntax-table
-This function returns the standard syntax table, which is the syntax
-table used in Fundamental mode.
-@end defun
-
-@defvar text-mode-syntax-table
-The value of this variable is the syntax table used in Text mode.
-@end defvar
-
-@defvar c-mode-syntax-table
-The value of this variable is the syntax table for C-mode buffers.
-@end defvar
-
-@defvar emacs-lisp-mode-syntax-table
-The value of this variable is the syntax table used in Emacs Lisp mode
-by editing commands. (It has no effect on the Lisp @code{read}
-function.)
-@end defvar
-
-@node Syntax Table Internals
-@section Syntax Table Internals
-@cindex syntax table internals
-
-  Each element of a syntax table is an integer that encodes the syntax
-of one character: the syntax class, possible matching character, and
-flags. Lisp programs don't usually work with the elements directly; the
-Lisp-level syntax table functions usually work with syntax descriptors
-(@pxref{Syntax Descriptors}).
-
-  The low 8 bits of each element of a syntax table indicate the
-syntax class.
-
-@table @asis
-@item @i{Integer}
-@i{Class}
-@item 0
-whitespace
-@item 1
-punctuation
-@item 2
-word
-@item 3
-symbol
-@item 4
-open parenthesis
-@item 5
-close parenthesis
-@item 6
-expression prefix
-@item 7
-string quote
-@item 8
-paired delimiter
-@item 9
-escape
-@item 10
-character quote
-@item 11
-comment-start
-@item 12
-comment-end
-@item 13
-inherit
-@end table
-
-  The next 8 bits are the matching opposite parenthesis (if the
-character has parenthesis syntax); otherwise, they are not meaningful.
-The next 6 bits are the flags.
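The C++ comment settings from the Syntax Flags section can be exercised in a scratch buffer. The block below is a minimal sketch, assuming a current GNU Emacs rather than the Emacs 19 documented in the deleted file; my-c++-comment-table is a hypothetical helper, not part of Emacs. It builds a table with make-syntax-table and modify-syntax-entry, then uses forward-comment to skip one style "a" comment and one style "b" comment.

```elisp
;; Sketch, assuming a current GNU Emacs.  `my-c++-comment-table' is a
;; hypothetical helper, not part of Emacs.
(defun my-c++-comment-table ()
  "Return a new syntax table with C++-style comment syntax."
  (let ((table (make-syntax-table)))
    ;; `/': punctuation; first char of `/*' and `//' (flag 1), second
    ;; char of `//' (flag 2, marked style b) and of `*/' (flag 4).
    (modify-syntax-entry ?/ ". 124b" table)
    ;; `*': punctuation; second char of `/*' (2), first char of `*/' (3).
    (modify-syntax-entry ?* ". 23" table)
    ;; Newline: comment ender for style b, i.e. for `//' comments.
    (modify-syntax-entry ?\n "> b" table)
    table))

;; Start just after `a' and skip whitespace plus two comments.
(with-temp-buffer
  (set-syntax-table (my-c++-comment-table))
  (insert "a /* one */ // two\nb")
  (goto-char 2)
  (forward-comment 2)            ; returns t: both comments were found
  (list (char-to-string (char-syntax ?/))
        (char-to-string (char-syntax ?\n))
        (point)))
;; => ("." ">" 20)
```

Because the newline carries the b flag, the // comment ends at the newline, and point lands on the b that follows it.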
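The functions in Motion and Syntax take a string of class designators rather than a set of characters. As a minimal sketch, assuming a current GNU Emacs, the hypothetical helper my-symbol-bounds (not part of Emacs) combines skip-syntax-backward and skip-syntax-forward with the designators w and _ to find the run of word and symbol constituents around point.

```elisp
;; Sketch, assuming a current GNU Emacs.  `my-symbol-bounds' is a
;; hypothetical helper, not part of Emacs.
(defun my-symbol-bounds ()
  "Return (START . END) of the word and symbol constituents around point.
Point itself is left unchanged."
  (save-excursion
    ;; Back up over word (w) and symbol (_) constituents ...
    (skip-syntax-backward "w_")
    (let ((start (point)))
      ;; ... then move forward over the same two classes.
      (skip-syntax-forward "w_")
      (cons start (point)))))

;; The standard syntax table gives `-' symbol syntax and `(' open
;; parenthesis syntax, so the run found here is "foo-bar".
(with-temp-buffer
  (set-syntax-table (standard-syntax-table))
  (insert "(foo-bar baz)")
  (goto-char 5)                  ; between "foo" and "-bar"
  (my-symbol-bounds))
;; => (2 . 9)
```

Passing "w" alone would stop at the hyphen; adding _ is what lets the scan treat foo-bar as a single symbol.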
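The parse state returned by parse-partial-sexp and the depth argument of scan-lists are easiest to see on a small nested list. This is a minimal sketch, assuming a current GNU Emacs and the standard syntax table, in which ( and ) form a matching pair; my-paren-report is a hypothetical helper, not part of Emacs. Current Emacs returns a longer state list than the eight elements described in the deleted text, but elements 0 and 1 keep the meanings given there.

```elisp
;; Sketch, assuming a current GNU Emacs.  `my-paren-report' is a
;; hypothetical helper, not part of Emacs.
(defun my-paren-report (text pos)
  "Insert TEXT into a temporary buffer and describe the parens at POS.
Return (DEPTH INNERMOST-START LIST-END UP-END)."
  (with-temp-buffer
    (set-syntax-table (standard-syntax-table))
    (insert text)
    (let ((state (parse-partial-sexp (point-min) pos)))
      (list (nth 0 state)                 ; depth in parentheses at POS
            (nth 1 state)                 ; start of the innermost list
            (scan-lists (point-min) 1 0)  ; end of the first list
            ;; DEPTH 1 means "go out one level": stop just after the
            ;; close paren of the list that contains POS.
            (scan-lists pos 1 1)))))

(my-paren-report "(a (b c) d)" 6)
;; => (2 4 12 9)
```

With a depth of 1, scan-lists stops at the first place where the depth returns to zero, which is just after the ) that closes (b c); this is what the text means by going out one level.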
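Current GNU Emacs no longer stores a syntax table as a 256-element vector of packed integers, but the class numbers in the internals table are still used. As a minimal sketch, assuming a current Emacs: string-to-syntax converts a syntax descriptor into a raw entry of the form (CODE . MATCHING-CHAR), and syntax-class strips the flag bits from CODE. Neither built-in belongs to the Emacs 19 interface documented here, and my-descriptor-class-and-match is a hypothetical helper.

```elisp
;; Sketch, assuming a current GNU Emacs; `string-to-syntax' and
;; `syntax-class' postdate the interface documented above.
;; `my-descriptor-class-and-match' is a hypothetical helper.
(defun my-descriptor-class-and-match (descriptor)
  "Return (CLASS-NUMBER . MATCHING-CHAR) for syntax DESCRIPTOR."
  (let ((raw (string-to-syntax descriptor)))
    ;; `syntax-class' drops the flag bits and keeps the class number.
    (cons (syntax-class raw) (cdr raw))))

(mapcar #'my-descriptor-class-and-match '("()" "w" ". 14" "\""))
;; => ((4 . 41) (2) (1) (7))
```

The flags in ". 14" are folded into the higher bits of CODE, which is why syntax-class rather than a bare car is used to recover the class number 1 (punctuation).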