summaryrefslogtreecommitdiff
path: root/stdlib/scanf.mli
diff options
context:
space:
mode:
Diffstat (limited to 'stdlib/scanf.mli')
-rw-r--r--stdlib/scanf.mli281
1 files changed, 154 insertions, 127 deletions
diff --git a/stdlib/scanf.mli b/stdlib/scanf.mli
index 38cbad8656..1e8a744840 100644
--- a/stdlib/scanf.mli
+++ b/stdlib/scanf.mli
@@ -232,14 +232,21 @@ val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner;;
(** {6 Format string description} *)
-(** The format is a character string which contains three types of
+(** The format string is a character string which contains three types of
objects:
- plain characters, which are simply matched with the characters of the
input (with a special case for space and line feed, see {!Scanf.space}),
- conversion specifications, each of which causes reading and conversion of
one argument for the function [f] (see {!Scanf.conversion}),
- scanning indications to specify boundaries of tokens
- (see scanning {!Scanf.indication}). *)
+ (see scanning {!Scanf.indication}).
+
+ As a special convention for format strings, the [\@] character introduces
+ an escape for both characters [\@] and [%]: in a format string,
+ [\@\@] and [\@%] are respectively equivalent to the plain characters [\@]
+ and [%].
+ @since 3.13
+*)
(** {7:space The space character in format strings} *)
@@ -262,139 +269,153 @@ val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner;;
(** {7:conversion Conversion specifications in format strings} *)
-(** Conversion specifications consist in the [%] character, followed by
- an optional flag, an optional field width, and followed by one or
- two conversion characters. The conversion characters and their
- meanings are:
-
- - [d]: reads an optionally signed decimal integer.
- - [i]: reads an optionally signed integer
- (usual input conventions for decimal ([0-9]+), hexadecimal
- ([0x[0-9a-f]+] and [0X[0-9A-F]+]), octal ([0o[0-7]+]), and binary
- ([0b[0-1]+]) notations are understood).
- - [u]: reads an unsigned decimal integer.
- - [x] or [X]: reads an unsigned hexadecimal integer ([[0-9a-f]+] or [[0-9A-F]+]).
- - [o]: reads an unsigned octal integer ([[0-7]+]).
- - [s]: reads a string argument that spreads as much as possible, until the
- following bounding condition holds: {ul
- {- a whitespace has been found (see {!Scanf.space}),}
- {- a scanning indication (see scanning {!Scanf.indication}) has been
- encountered,}
- {- the end-of-input has been reached.}}
- Hence, this conversion always succeeds: it returns an empty
- string, if the bounding condition holds when the scan begins.
- - [S]: reads a delimited string argument (delimiters and special
- escaped characters follow the lexical conventions of Caml).
- - [c]: reads a single character. To test the current input character
- without reading it, specify a null field width, i.e. use
- specification [%0c]. Raise [Invalid_argument], if the field width
- specification is greater than 1.
- - [C]: reads a single delimited character (delimiters and special
- escaped characters follow the lexical conventions of Caml).
- - [f], [e], [E], [g], [G]: reads an optionally signed
- floating-point number in decimal notation, in the style [dddd.ddd
- e/E+-dd].
- - [F]: reads a floating point number according to the lexical
- conventions of Caml (hence the decimal point is mandatory if the
- exponent part is not mentioned).
- - [B]: reads a boolean argument ([true] or [false]).
- - [b]: reads a boolean argument (for backward compatibility; do not use
- in new programs).
- - [ld], [li], [lu], [lx], [lX], [lo]: reads an [int32] argument to
- the format specified by the second letter for regular integers.
- - [nd], [ni], [nu], [nx], [nX], [no]: reads a [nativeint] argument to
- the format specified by the second letter for regular integers.
- - [Ld], [Li], [Lu], [Lx], [LX], [Lo]: reads an [int64] argument to
- the format specified by the second letter for regular integers.
- - [\[ range \]]: reads characters that matches one of the characters
- mentioned in the range of characters [range] (or not mentioned in
- it, if the range starts with [^]). Reads a [string] that can be
- empty, if the next input character does not match the range. The set of
- characters from [c1] to [c2] (inclusively) is denoted by [c1-c2].
- Hence, [%\[0-9\]] returns a string representing a decimal number
- or an empty string if no decimal digit is found; similarly,
- [%\[\\048-\\057\\065-\\070\]] returns a string of hexadecimal digits.
- If a closing bracket appears in a range, it must occur as the
- first character of the range (or just after the [^] in case of
- range negation); hence [\[\]\]] matches a [\]] character and
- [\[^\]\]] matches any character that is not [\]].
- - [r]: user-defined reader. Takes the next [ri] formatted input function and
- applies it to the scanning buffer [ib] to read the next argument. The
- input function [ri] must therefore have type [Scanning.in_channel -> 'a] and
- the argument read has type ['a].
- - [\{ fmt %\}]: reads a format string argument.
- The format string read must have the same type as the format string
- specification [fmt].
- For instance, ["%{ %i %}"] reads any format string that can read a value of
- type [int]; hence, if [s] is the string ["fmt:\"number is %u\""], then
- [Scanf.sscanf s "fmt: %{%i%}"] succeeds and returns the format string
- ["number is %u"].
- - [\( fmt %\)]: scanning format substitution.
- Reads a format string and then goes on scanning with the format string
- read, instead of using [fmt].
- The format string read must have the same type as the format string
- specification [fmt] that it replaces.
- For instance, ["%( %i %)"] reads any format string that can read a value
- of type [int].
- Returns the format string read, and the value read using the format
- string read.
- Hence, if [s] is the string ["\"%4d\"1234.00"], then
- [Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i)] evaluates to
- [("%4d", 1234)].
- If the special flag [_] is used, the conversion discards the
- format string read and only returns the value read with the format
- string read.
- Hence, if [s] is the string ["\"%4d\"1234.00"], then
- [Scanf.sscanf s "%_(%i%)"] is simply equivalent to
- [Scanf.sscanf "1234.00" "%4d"].
- - [l]: returns the number of lines read so far.
- - [n]: returns the number of characters read so far.
- - [N] or [L]: returns the number of tokens read so far.
- - [!]: matches the end of input condition.
- - [%]: matches one [%] character in the input.
- - [,]: the no-op delimiter for conversion specifications.
-
- Following the [%] character that introduces a conversion, there may be
- the special flag [_]: the conversion that follows occurs as usual,
- but the resulting value is discarded.
- For instance, if [f] is the function [fun i -> i + 1], and [s] is the
- string ["x = 1"], then [Scanf.sscanf s "%_s = %i" f] returns [2].
-
- The field width is composed of an optional integer literal
- indicating the maximal width of the token to read.
- For instance, [%6d] reads an integer, having at most 6 decimal digits;
- [%4f] reads a float with at most 4 characters; and [%8\[\\000-\\255\]]
- returns the next 8 characters (or all the characters still available,
- if fewer than 8 characters are available in the input).
-
- Notes:
-
- - as mentioned above, a [%s] conversion always succeeds, even if there is
- nothing to read in the input: in this case, it simply returns [""].
-
- - in addition to the relevant digits, ['_'] characters may appear
- inside numbers (this is reminiscent to the usual Caml lexical
- conventions). If stricter scanning is desired, use the range
- conversion facility instead of the number conversions.
-
- - the [scanf] facility is not intended for heavy duty lexical
- analysis and parsing. If it appears not expressive enough for your
- needs, several alternative exists: regular expressions (module
- [Str]), stream parsers, [ocamllex]-generated lexers,
- [ocamlyacc]-generated parsers. *)
+(** Conversion specifications have the following form:
+
+ [% \[flags\] \[width\] \[.precision\] type]
+
+ In short, a conversion specification consists in the [%] character,
+ followed by optional modifiers, and a type which is made of one or
+ several characters.
+
+ The types and their meanings are:
+
+ - [d]: reads an optionally signed decimal integer.
+ - [i]: reads an optionally signed integer
+ (usual input conventions for decimal ([0-9]+), hexadecimal
+ ([0x[0-9a-f]+] and [0X[0-9A-F]+]), octal ([0o[0-7]+]), and binary
+ ([0b[0-1]+]) notations are understood).
+ - [u]: reads an unsigned decimal integer.
+ - [x] or [X]: reads an unsigned hexadecimal integer ([[0-9a-f]+] or [[0-9A-F]+]).
+ - [o]: reads an unsigned octal integer ([[0-7]+]).
+ - [s]: reads a string argument that spreads as much as possible, until
+ the following bounding conditions holds:
+ {ul
+ {- a whitespace has been found (see {!Scanf.space}),}
+ {- a scanning indication has been encountered
+ (see scanning {!Scanf.indication}),}
+ {- the end-of-input has been reached.}
+ }
+ Hence, the [%s] conversion always succeeds: it returns an empty
+ string, if the bounding condition holds when the scan begins.
+ - [S]: reads a delimited string argument (delimiters and special
+ escaped characters follow the lexical conventions of Caml).
+ - [c]: reads a single character. To test the current input character
+ without reading it, specify a null field width, i.e. use
+ specification [%0c]. Raise [Invalid_argument], if the field width
+ specification is greater than 1.
+ - [C]: reads a single delimited character (delimiters and special
+ escaped characters follow the lexical conventions of Caml).
+ - [f], [e], [E], [g], [G]: reads an optionally signed
+ floating-point number in decimal notation, in the style [dddd.ddd
+ e/E+-dd].
+ - [F]: reads a floating point number according to the lexical
+ conventions of Caml (hence the decimal point is mandatory if the
+ exponent part is not mentioned).
+ - [B]: reads a boolean argument ([true] or [false]).
+ - [b]: reads a boolean argument (for backward compatibility; do not use
+ in new programs).
+ - [ld], [li], [lu], [lx], [lX], [lo]: reads an [int32] argument to
+ the format specified by the second letter for regular integers.
+ - [nd], [ni], [nu], [nx], [nX], [no]: reads a [nativeint] argument to
+ the format specified by the second letter for regular integers.
+ - [Ld], [Li], [Lu], [Lx], [LX], [Lo]: reads an [int64] argument to
+ the format specified by the second letter for regular integers.
+ - [\[ range \]]: reads characters that matches one of the characters
+ mentioned in the range of characters [range] (or not mentioned in
+ it, if the range starts with [^]). Reads a [string] that can be
+ empty, if the next input character does not match the range. The set of
+ characters from [c1] to [c2] (inclusively) is denoted by [c1-c2].
+ Hence, [%\[0-9\]] returns a string representing a decimal number
+ or an empty string if no decimal digit is found; similarly,
+ [%\[\\048-\\057\\065-\\070\]] returns a string of hexadecimal digits.
+ If a closing bracket appears in a range, it must occur as the
+ first character of the range (or just after the [^] in case of
+ range negation); hence [\[\]\]] matches a [\]] character and
+ [\[^\]\]] matches any character that is not [\]].
+ - [r]: user-defined reader. Takes the next [ri] formatted input function and
+ applies it to the scanning buffer [ib] to read the next argument. The
+ input function [ri] must therefore have type [Scanning.in_channel -> 'a] and
+ the argument read has type ['a].
+ - [\{ fmt %\}]: reads a format string argument.
+ The format string read must have the same type as the format string
+ specification [fmt].
+ For instance, ["%{ %i %}"] reads any format string that can read a value of
+ type [int]; hence, if [s] is the string ["fmt:\"number is %u\""], then
+ [Scanf.sscanf s "fmt: %{%i%}"] succeeds and returns the format string
+ ["number is %u"].
+ - [\( fmt %\)]: scanning format substitution.
+ Reads a format string and then goes on scanning with the format string
+ read, instead of using [fmt].
+ The format string read must have the same type as the format string
+ specification [fmt] that it replaces.
+ For instance, ["%( %i %)"] reads any format string that can read a value
+ of type [int].
+ Returns the format string read, and the value read using the format
+ string read.
+ Hence, if [s] is the string ["\"%4d\"1234.00"], then
+ [Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i)] evaluates to
+ [("%4d", 1234)].
+ If the special flag [_] is used, the conversion discards the
+ format string read and only returns the value read with the format
+ string read.
+ Hence, if [s] is the string ["\"%4d\"1234.00"], then
+ [Scanf.sscanf s "%_(%i%)"] is simply equivalent to
+ [Scanf.sscanf "1234.00" "%4d"].
+ - [l]: returns the number of lines read so far.
+ - [n]: returns the number of characters read so far.
+ - [N] or [L]: returns the number of tokens read so far.
+ - [!]: matches the end of input condition.
+ - [%]: matches one [%] character in the input.
+ - [,]: the no-op delimiter for conversion specifications.
+
+ Following the [%] character that introduces a conversion, there may be
+ the special flag [_]: the conversion that follows occurs as usual,
+ but the resulting value is discarded.
+ For instance, if [f] is the function [fun i -> i + 1], and [s] is the
+ string ["x = 1"], then [Scanf.sscanf s "%_s = %i" f] returns [2].
+
+ The optional [width] is an integer literal indicating the maximal width
+ of the token to read.
+ For instance, [%6d] reads an integer, having at most 6 decimal digits;
+ [%4f] reads a float with at most 4 characters; and [%8\[\\000-\\255\]]
+ returns the next 8 characters (or all the characters still available,
+ if fewer than 8 characters are available in the input).
+
+ The optional [precision] is a dot [.] followed by an integer literal
+ indicating the maximum number of digits that follow the decimal point in
+ the [%f], [%e], and [%E] conversions. For instance, [%.4f] reads a
+ [float] with at most 4 fractional digits.
+
+ Notes:
+
+ - as mentioned above, the [%s] conversion always succeeds, even if there is
+ nothing to read in the input: in this case, it simply returns [""].
+
+ - in addition to the relevant digits, ['_'] characters may appear
+ inside numbers (this is reminiscent to the usual Caml lexical
+ conventions). If stricter scanning is desired, use the range
+ conversion facility instead of the number conversions.
+
+ - the [scanf] facility is not intended for heavy duty lexical
+ analysis and parsing. If it appears not expressive enough for your
+ needs, several alternative exists: regular expressions (module
+ [Str]), stream parsers, [ocamllex]-generated lexers,
+ [ocamlyacc]-generated parsers. *)
(** {7:indication Scanning indications in format strings} *)
(** Scanning indications appear just after the string conversions [%s]
and [%\[ range \]] to delimit the end of the token. A scanning
- indication is introduced by a [@] character, followed by some
- constant character [c]. It means that the string token should end
+ indication is introduced by a [\@] character, followed by some
+ literal character [c]. It means that the string token should end
just before the next matching [c] (which is skipped). If no [c]
character is encountered, the string token spreads as much as
possible. For instance, ["%s@\t"] reads a string up to the next
- tab character or to the end of input. If a scanning
- indication [\@c] does not follow a string conversion, it is treated
- as a plain [c] character.
+ tab character or up to the end of input.
+
+ When it does not introduce a scanning indication, the [\@] character
+ introduces an escape for the next character: [\@c] is treated as a plain
+ [c] character.
Note:
@@ -487,3 +508,9 @@ val format_from_string :
have the same type as [fmt].
@since 3.10.0
*)
+
+(*
+ Local Variables:
+ compile-command: "cd ..; make world"
+ End:
+*)