1 files changed, 142 insertions, 156 deletions
diff --git a/stdlib/scanf.mli b/stdlib/scanf.mli
index 1e8a744840..53317d66d8 100644
--- a/stdlib/scanf.mli
+++ b/stdlib/scanf.mli
@@ -65,16 +65,16 @@
 
 (** {7 Formatted input as a functional feature} *)
 
-(** The Caml scanning facility is reminiscent of the corresponding C feature.
+(** The OCaml scanning facility is reminiscent of the corresponding C feature.
     However, it is also largely different, simpler, and yet more powerful:
     the formatted input functions are higher-order functionals and the
     parameter passing mechanism is just the regular function application not
     the variable assignment based mechanism which is typical for formatted
-    input in imperative languages; the Caml format strings also feature
+    input in imperative languages; the OCaml format strings also feature
     useful additions to easily define complex tokens; as expected within a
     functional programming language, the formatted input functions also
     support polymorphism, in particular arbitrary interaction with
-    polymorphic user-defined scanners.  Furthermore, the Caml formatted input
+    polymorphic user-defined scanners.  Furthermore, the OCaml formatted input
     facility is fully type-checked at compile time. *)
 
 (** {6 Formatted input channel} *)
@@ -232,21 +232,14 @@ val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner;;
 
 (** {6 Format string description} *)
 
-(** The format string is a character string which contains three types of
+(** The format is a character string which contains three types of
     objects:
     - plain characters, which are simply matched with the characters of the
       input (with a special case for space and line feed, see {!Scanf.space}),
     - conversion specifications, each of which causes reading and conversion of
       one argument for the function [f] (see {!Scanf.conversion}),
     - scanning indications to specify boundaries of tokens
-      (see scanning {!Scanf.indication}).
-
-   As a special convention for format strings, the [\@] character introduces
-   an escape for both characters [\@] and [%]: in a format string,
-   [\@\@] and [\@%] are respectively equivalent to the plain characters [\@]
-   and [%].
-   @since 3.13
-*)
+      (see scanning {!Scanf.indication}). *)
 
 (** {7:space The space character in format strings} *)
 
@@ -269,157 +262,148 @@ val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner;;
 
 (** {7:conversion Conversion specifications in format strings} *)
 
-(** Conversion specifications have the following form:
-
-   [% \[flags\] \[width\] \[.precision\] type]
-
-   In short, a conversion specification consists in the [%] character,
-   followed by optional modifiers, and a type which is made of one or
-   several characters.
-
-   The types and their meanings are:
-
-   - [d]: reads an optionally signed decimal integer.
-   - [i]: reads an optionally signed integer
-     (usual input conventions for decimal ([0-9]+), hexadecimal
-      ([0x[0-9a-f]+] and [0X[0-9A-F]+]), octal ([0o[0-7]+]), and binary
-      ([0b[0-1]+]) notations are understood).
-   - [u]: reads an unsigned decimal integer.
-   - [x] or [X]: reads an unsigned hexadecimal integer ([[0-9a-f]+] or [[0-9A-F]+]).
-   - [o]: reads an unsigned octal integer ([[0-7]+]).
-   - [s]: reads a string argument that spreads as much as possible, until
-     the following bounding conditions holds:
-     {ul
-       {- a whitespace has been found (see {!Scanf.space}),}
-       {- a scanning indication has been encountered
-          (see scanning {!Scanf.indication}),}
-       {- the end-of-input has been reached.}
-     }
-     Hence, the [%s] conversion always succeeds: it returns an empty
-     string, if the bounding condition holds when the scan begins.
-   - [S]: reads a delimited string argument (delimiters and special
-     escaped characters follow the lexical conventions of Caml).
-   - [c]: reads a single character. To test the current input character
-     without reading it, specify a null field width, i.e. use
-     specification [%0c]. Raise [Invalid_argument], if the field width
-     specification is greater than 1.
-   - [C]: reads a single delimited character (delimiters and special
-     escaped characters follow the lexical conventions of Caml).
-   - [f], [e], [E], [g], [G]: reads an optionally signed
-     floating-point number in decimal notation, in the style [dddd.ddd
-     e/E+-dd].
-   - [F]: reads a floating point number according to the lexical
-     conventions of Caml (hence the decimal point is mandatory if the
-     exponent part is not mentioned).
-   - [B]: reads a boolean argument ([true] or [false]).
-   - [b]: reads a boolean argument (for backward compatibility; do not use
-     in new programs).
-   - [ld], [li], [lu], [lx], [lX], [lo]: reads an [int32] argument to
-     the format specified by the second letter for regular integers.
-   - [nd], [ni], [nu], [nx], [nX], [no]: reads a [nativeint] argument to
-     the format specified by the second letter for regular integers.
-   - [Ld], [Li], [Lu], [Lx], [LX], [Lo]: reads an [int64] argument to
-     the format specified by the second letter for regular integers.
-   - [\[ range \]]: reads characters that matches one of the characters
-     mentioned in the range of characters [range] (or not mentioned in
-     it, if the range starts with [^]). Reads a [string] that can be
-     empty, if the next input character does not match the range. The set of
-     characters from [c1] to [c2] (inclusively) is denoted by [c1-c2].
-     Hence, [%\[0-9\]] returns a string representing a decimal number
-     or an empty string if no decimal digit is found; similarly,
-     [%\[\\048-\\057\\065-\\070\]] returns a string of hexadecimal digits.
-     If a closing bracket appears in a range, it must occur as the
-     first character of the range (or just after the [^] in case of
-     range negation); hence [\[\]\]] matches a [\]] character and
-     [\[^\]\]] matches any character that is not [\]].
-   - [r]: user-defined reader. Takes the next [ri] formatted input function and
-     applies it to the scanning buffer [ib] to read the next argument. The
-     input function [ri] must therefore have type [Scanning.in_channel -> 'a] and
-     the argument read has type ['a].
-   - [\{ fmt %\}]: reads a format string argument.
-     The format string read must have the same type as the format string
-     specification [fmt].
-     For instance, ["%{ %i %}"] reads any format string that can read a value of
-     type [int]; hence, if [s] is the string ["fmt:\"number is %u\""], then
-     [Scanf.sscanf s "fmt: %{%i%}"] succeeds and returns the format string
-     ["number is %u"].
-   - [\( fmt %\)]: scanning format substitution.
-     Reads a format string and then goes on scanning with the format string
-     read, instead of using [fmt].
-     The format string read must have the same type as the format string
-     specification [fmt] that it replaces.
-     For instance, ["%( %i %)"] reads any format string that can read a value
-     of type [int].
-     Returns the format string read, and the value read using the format
-     string read.
-     Hence, if [s] is the string ["\"%4d\"1234.00"], then
-     [Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i)] evaluates to
-     [("%4d", 1234)].
-     If the special flag [_] is used, the conversion discards the
-     format string read and only returns the value read with the format
-     string read.
-     Hence, if [s] is the string ["\"%4d\"1234.00"], then
-     [Scanf.sscanf s "%_(%i%)"] is simply equivalent to
-     [Scanf.sscanf "1234.00" "%4d"].
-   - [l]: returns the number of lines read so far.
-   - [n]: returns the number of characters read so far.
-   - [N] or [L]: returns the number of tokens read so far.
-   - [!]: matches the end of input condition.
-   - [%]: matches one [%] character in the input.
-   - [,]: the no-op delimiter for conversion specifications.
-
-   Following the [%] character that introduces a conversion, there may be
-   the special flag [_]: the conversion that follows occurs as usual,
-   but the resulting value is discarded.
-   For instance, if [f] is the function [fun i -> i + 1], and [s] is the
-   string ["x = 1"], then [Scanf.sscanf s "%_s = %i" f] returns [2].
-
-   The optional [width] is an integer literal indicating the maximal width
-   of the token to read.
-   For instance, [%6d] reads an integer, having at most 6 decimal digits;
-   [%4f] reads a float with at most 4 characters; and [%8\[\\000-\\255\]]
-   returns the next 8 characters (or all the characters still available,
-   if fewer than 8 characters are available in the input).
-
-   The optional [precision] is a dot [.] followed by an integer literal
-   indicating the maximum number of digits that follow the decimal point in
-   the [%f], [%e], and [%E] conversions. For instance, [%.4f] reads a
-   [float] with at most 4 fractional digits.
-
-   Notes:
-
-   - as mentioned above, the [%s] conversion always succeeds, even if there is
-     nothing to read in the input: in this case, it simply returns [""].
-
-   - in addition to the relevant digits, ['_'] characters may appear
-   inside numbers (this is reminiscent to the usual Caml lexical
-   conventions). If stricter scanning is desired, use the range
-   conversion facility instead of the number conversions.
-
-   - the [scanf] facility is not intended for heavy duty lexical
-   analysis and parsing. If it appears not expressive enough for your
-   needs, several alternative exists: regular expressions (module
-   [Str]), stream parsers, [ocamllex]-generated lexers,
-   [ocamlyacc]-generated parsers. *)
+(** Conversion specifications consist in the [%] character, followed by
+    an optional flag, an optional field width, and followed by one or
+    two conversion characters. The conversion characters and their
+    meanings are:
+
+    - [d]: reads an optionally signed decimal integer.
+    - [i]: reads an optionally signed integer
+      (usual input conventions for decimal ([0-9]+), hexadecimal
+       ([0x[0-9a-f]+] and [0X[0-9A-F]+]), octal ([0o[0-7]+]), and binary
+       ([0b[0-1]+]) notations are understood).
+    - [u]: reads an unsigned decimal integer.
+    - [x] or [X]: reads an unsigned hexadecimal integer ([[0-9a-fA-F]+]).
+    - [o]: reads an unsigned octal integer ([[0-7]+]).
+    - [s]: reads a string argument that spreads as much as possible, until the
+      following bounding condition holds: {ul
+      {- a whitespace has been found (see {!Scanf.space}),}
+      {- a scanning indication (see scanning {!Scanf.indication}) has been
+         encountered,}
+      {- the end-of-input has been reached.}}
+      Hence, this conversion always succeeds: it returns an empty
+      string, if the bounding condition holds when the scan begins.
+    - [S]: reads a delimited string argument (delimiters and special
+      escaped characters follow the lexical conventions of Caml).
+    - [c]: reads a single character. To test the current input character
+      without reading it, specify a null field width, i.e. use
+      specification [%0c]. Raise [Invalid_argument], if the field width
+      specification is greater than 1.
+    - [C]: reads a single delimited character (delimiters and special
+      escaped characters follow the lexical conventions of Caml).
+    - [f], [e], [E], [g], [G]: reads an optionally signed
+      floating-point number in decimal notation, in the style [dddd.ddd
+      e/E+-dd].
+    - [F]: reads a floating point number according to the lexical
+      conventions of Caml (hence the decimal point is mandatory if the
+      exponent part is not mentioned).
+    - [B]: reads a boolean argument ([true] or [false]).
+    - [b]: reads a boolean argument (for backward compatibility; do not use
+      in new programs).
+    - [ld], [li], [lu], [lx], [lX], [lo]: reads an [int32] argument to
+      the format specified by the second letter for regular integers.
+    - [nd], [ni], [nu], [nx], [nX], [no]: reads a [nativeint] argument to
+      the format specified by the second letter for regular integers.
+    - [Ld], [Li], [Lu], [Lx], [LX], [Lo]: reads an [int64] argument to
+      the format specified by the second letter for regular integers.
+    - [\[ range \]]: reads characters that matches one of the characters
+      mentioned in the range of characters [range] (or not mentioned in
+      it, if the range starts with [^]). Reads a [string] that can be
+      empty, if the next input character does not match the range. The set of
+      characters from [c1] to [c2] (inclusively) is denoted by [c1-c2].
+      Hence, [%\[0-9\]] returns a string representing a decimal number
+      or an empty string if no decimal digit is found; similarly,
+      [%\[\\048-\\057\\065-\\070\]] returns a string of hexadecimal digits.
+      If a closing bracket appears in a range, it must occur as the
+      first character of the range (or just after the [^] in case of
+      range negation); hence [\[\]\]] matches a [\]] character and
+      [\[^\]\]] matches any character that is not [\]].
+      Use [%%] and [%\@] to include a [%] or a [\@] in a range.
+    - [r]: user-defined reader. Takes the next [ri] formatted input function and
+      applies it to the scanning buffer [ib] to read the next argument. The
+      input function [ri] must therefore have type [Scanning.in_channel -> 'a] and
+      the argument read has type ['a].
+    - [\{ fmt %\}]: reads a format string argument.
+      The format string read must have the same type as the format string
+      specification [fmt].
+      For instance, ["%{ %i %}"] reads any format string that can read a value of
+      type [int]; hence, if [s] is the string ["fmt:\"number is %u\""], then
+      [Scanf.sscanf s "fmt: %{%i%}"] succeeds and returns the format string
+      ["number is %u"].
+    - [\( fmt %\)]: scanning format substitution.
+      Reads a format string and then goes on scanning with the format string
+      read, instead of using [fmt].
+      The format string read must have the same type as the format string
+      specification [fmt] that it replaces.
+      For instance, ["%( %i %)"] reads any format string that can read a value
+      of type [int].
+      Returns the format string read, and the value read using the format
+      string read.
+      Hence, if [s] is the string ["\"%4d\"1234.00"], then
+      [Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i)] evaluates to
+      [("%4d", 1234)].
+      If the special flag [_] is used, the conversion discards the
+      format string read and only returns the value read with the format
+      string read.
+      Hence, if [s] is the string ["\"%4d\"1234.00"], then
+      [Scanf.sscanf s "%_(%i%)"] is simply equivalent to
+      [Scanf.sscanf "1234.00" "%4d"].
+    - [l]: returns the number of lines read so far.
+    - [n]: returns the number of characters read so far.
+    - [N] or [L]: returns the number of tokens read so far.
+    - [!]: matches the end of input condition.
+    - [%]: matches one [%] character in the input.
+    - [\@]: matches one [\@] character in the input.
+    - [,]: does nothing.
+
+    Following the [%] character that introduces a conversion, there may be
+    the special flag [_]: the conversion that follows occurs as usual,
+    but the resulting value is discarded.
+    For instance, if [f] is the function [fun i -> i + 1], and [s] is the
+    string ["x = 1"], then [Scanf.sscanf s "%_s = %i" f] returns [2].
+
+    The field width is composed of an optional integer literal
+    indicating the maximal width of the token to read.
+    For instance, [%6d] reads an integer, having at most 6 decimal digits;
+    [%4f] reads a float with at most 4 characters; and [%8[\\000-\\255]]
+    returns the next 8 characters (or all the characters still available,
+    if fewer than 8 characters are available in the input).
+
+    Notes:
+
+    - as mentioned above, a [%s] conversion always succeeds, even if there is
+      nothing to read in the input: in this case, it simply returns [""].
+
+    - in addition to the relevant digits, ['_'] characters may appear
+    inside numbers (this is reminiscent to the usual Caml lexical
+    conventions). If stricter scanning is desired, use the range
+    conversion facility instead of the number conversions.
+
+    - the [scanf] facility is not intended for heavy duty lexical
+    analysis and parsing. If it appears not expressive enough for your
+    needs, several alternative exists: regular expressions (module
+    [Str]), stream parsers, [ocamllex]-generated lexers,
+    [ocamlyacc]-generated parsers. *)
 
 (** {7:indication Scanning indications in format strings} *)
 
 (** Scanning indications appear just after the string conversions [%s]
-    and [%\[ range \]] to delimit the end of the token. A scanning
+    and [%[ range ]] to delimit the end of the token. A scanning
     indication is introduced by a [\@] character, followed by some
-    literal character [c]. It means that the string token should end
+    plain character [c]. It means that the string token should end
     just before the next matching [c] (which is skipped). If no [c]
     character is encountered, the string token spreads as much as
     possible. For instance, ["%s@\t"] reads a string up to the next
-    tab character or up to the end of input.
-
-    When it does not introduce a scanning indication, the [\@] character
-    introduces an escape for the next character: [\@c] is treated as a plain
-    [c] character.
+    tab character or to the end of input. If a [\@] character appears
+    anywhere else in the format string, it is treated as a plain character.
 
     Note:
 
-    - the scanning indications introduce slight differences in the syntax of
+    - As usual in format strings, [%] characters must be escaped using [%%]
+      and [%\@] is equivalent to [\@]; this rule still holds within range
+      specifications and scanning indications.
+      For instance, ["%s@%%"] reads a string up to the next [%] character.
+    - The scanning indications introduce slight differences in the syntax of
     [Scanf] format strings, compared to those used for the [Printf]
     module. However, the scanning indications are similar to those used in
     the [Format] module; hence, when producing formatted text to be scanned
@@ -509,8 +493,10 @@ val format_from_string :
     @since 3.10.0
 *)
 
-(*
- Local Variables:
-  compile-command: "cd ..; make world"
-  End:
+val unescaped : string -> string
+(** Return a copy of the argument with escape sequences, following the
+    lexical conventions of OCaml, replaced by their corresponding
+    special characters.  If there is no escape sequence in the
+    argument, still return a copy, contrary to String.escaped.
+    @since 3.13.0
 *)