author     ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2020-03-20 18:09:59 +0000
committer  ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2020-03-20 18:09:59 +0000
commit     2ec85e0009fc1808ed79b4697e8502795b46564b
tree       8b7b8eb19fe4feecbd1f0fb9fed718d5c523259d /doc/html
parent     9273b7d54f872ede1a3c77d628495065a4bfa206
Renamed dftables as pcre2_dftables and enable it to write the tables in binary.
Update documentation about character tables.

git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1237 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html')
-rw-r--r--  doc/html/NON-AUTOTOOLS-BUILD.txt          16
-rw-r--r--  doc/html/README.txt                       68
-rw-r--r--  doc/html/pcre2_set_character_tables.html   9
-rw-r--r--  doc/html/pcre2api.html                    45
-rw-r--r--  doc/html/pcre2build.html                  48
-rw-r--r--  doc/html/pcre2test.html                   24
6 files changed, 134 insertions, 76 deletions
diff --git a/doc/html/NON-AUTOTOOLS-BUILD.txt b/doc/html/NON-AUTOTOOLS-BUILD.txt
index 39e7620..a73c058 100644
--- a/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/doc/html/NON-AUTOTOOLS-BUILD.txt
@@ -74,14 +74,14 @@ can skip ahead to the CMake section.
src/pcre2_chartables.c.
OR:
- Compile src/dftables.c as a stand-alone program (using -DHAVE_CONFIG_H
- if you have set up src/config.h), and then run it with the single
- argument "src/pcre2_chartables.c". This generates a set of standard
- character tables and writes them to that file. The tables are generated
- using the default C locale for your system. If you want to use a locale
- that is specified by LC_xxx environment variables, add the -L option to
- the dftables command. You must use this method if you are building on a
- system that uses EBCDIC code.
+ Compile src/pcre2_dftables.c as a stand-alone program (using
+ -DHAVE_CONFIG_H if you have set up src/config.h), and then run it with
+ the single argument "src/pcre2_chartables.c". This generates a set of
+ standard character tables and writes them to that file. The tables are
+ generated using the default C locale for your system. If you want to use
+ a locale that is specified by LC_xxx environment variables, add the -L
+ option to the pcre2_dftables command. You must use this method if you
+ are building on a system that uses EBCDIC code.
The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can
specify alternative tables at run time.
diff --git a/doc/html/README.txt b/doc/html/README.txt
index 8ce6f96..187dc8b 100644
--- a/doc/html/README.txt
+++ b/doc/html/README.txt
@@ -269,9 +269,9 @@ library. They are also documented in the pcre2build man page.
--enable-rebuild-chartables
- a program called dftables is compiled and run in the default C locale when
- you obey "make". It builds a source file called pcre2_chartables.c. If you do
- not specify this option, pcre2_chartables.c is created as a copy of
+ a program called pcre2_dftables is compiled and run in the default C locale
+ when you obey "make". It builds a source file called pcre2_chartables.c. If
+ you do not specify this option, pcre2_chartables.c is created as a copy of
pcre2_chartables.c.dist. See "Character tables" below for further
information.
@@ -548,11 +548,11 @@ Cross-compiling using autotools
You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host. However, you should NOT
-specify --enable-rebuild-chartables, because if you do, the dftables.c source
-file is compiled and run on the local host, in order to generate the inbuilt
-character tables (the pcre2_chartables.c file). This will probably not work,
-because dftables.c needs to be compiled with the local compiler, not the cross
-compiler.
+specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
+source file is compiled and run on the local host, in order to generate the
+inbuilt character tables (the pcre2_chartables.c file). This will probably not
+work, because pcre2_dftables.c needs to be compiled with the local compiler,
+not the cross compiler.
When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
@@ -560,9 +560,10 @@ tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.
If you need to modify the character tables when cross-compiling, you should
-move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
-and run it on the local host to make a new version of pcre2_chartables.c.dist.
-Then when you cross-compile PCRE2 this new version of the tables will be used.
+move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
+hand and run it on the local host to make a new version of
+pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
+at build time" for more details.
Making new tarballs
@@ -721,8 +722,8 @@ compile context.
The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
-specified for ./configure, a different version of pcre2_chartables.c is built
-by the program dftables (compiled from dftables.c), which uses the ANSI C
+specified for ./configure, a new version of pcre2_chartables.c is built by the
+program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale that is set for your system will control the contents of these default
@@ -732,32 +733,31 @@ file does not get automatically re-generated. The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.
-When the dftables program is run as a result of --enable-rebuild-chartables,
-it uses the default C locale that is set on your system. It does not pay
-attention to the LC_xxx environment variables. In other words, it uses the
-system's default locale rather than whatever the compiling user happens to have
-set. If you really do want to build a source set of character tables in a
-locale that is specified by the LC_xxx variables, you can run the dftables
-program by hand with the -L option. For example:
+When the pcre2_dftables program is run as a result of specifying
+--enable-rebuild-chartables, it uses the default C locale that is set on your
+system. It does not pay attention to the LC_xxx environment variables. In other
+words, it uses the system's default locale rather than whatever the compiling
+user happens to have set. If you really do want to build a source set of
+character tables in a locale that is specified by the LC_xxx variables, you can
+run the pcre2_dftables program by hand with the -L option. For example:
- ./dftables -L pcre2_chartables.c.special
+ ./pcre2_dftables -L pcre2_chartables.c.special
-The first two 256-byte tables provide lower casing and case flipping functions,
-respectively. The next table consists of three 32-byte bit maps which identify
-digits, "word" characters, and white space, respectively. These are used when
-building 32-byte bit maps that represent character classes for code points less
-than 256. The final 256-byte table has bits indicating various character types,
-as follows:
+The second argument names the file where the source code for the tables is
+written. The first two 256-byte tables provide lower casing and case flipping
+functions, respectively. The next table consists of a number of 32-byte bit
+maps which identify certain character classes such as digits, "word"
+characters, white space, etc. These are used when building 32-byte bit maps
+that represent character classes for code points less than 256. The final
+256-byte table has bits indicating various character types, as follows:
1 white space character
2 letter
- 4 decimal digit
- 8 hexadecimal digit
+ 4 lower case letter
+ 8 decimal digit
16 alphanumeric or '_'
- 128 regular expression metacharacter or binary zero
-You should not alter the set of characters that contain the 128 bit, as that
-will cause PCRE2 to malfunction.
+See also the pcre2build section "Creating character tables at build time".
File manifest
@@ -768,7 +768,7 @@ The distribution should contain the files listed below.
(A) Source files for the PCRE2 library functions and their headers are found in
the src directory:
- src/dftables.c auxiliary program for building pcre2_chartables.c
+ src/pcre2_dftables.c auxiliary program for building pcre2_chartables.c
when --enable-rebuild-chartables is specified
src/pcre2_chartables.c.dist a default set of character tables that assume
@@ -894,4 +894,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 16 April 2019
+Last updated: 20 March 2020
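
Reviewer note on the README.txt hunk above: the new text lists the bit values held in the final 256-byte table and says the tables are built with the C character handling functions. The following is a minimal sketch of how such a table could be filled, assuming only the five bit values listed in the hunk. The SKETCH_* macros and build_ctypes_sketch() are hypothetical names chosen for illustration; this is not the code in pcre2_dftables.c.

```c
#include <ctype.h>

/* Bit values as listed in the README text. The macro names are
   hypothetical and exist only for this sketch. */
#define SKETCH_SPACE   1   /* white space character */
#define SKETCH_LETTER  2   /* letter */
#define SKETCH_LOWER   4   /* lower case letter */
#define SKETCH_DIGIT   8   /* decimal digit */
#define SKETCH_WORD   16   /* alphanumeric or '_' */

/* Fill a 256-entry table with type bits for each code point, using the
   <ctype.h> functions in whatever locale is currently in effect. */
static void build_ctypes_sketch(unsigned char table[256])
{
    for (int c = 0; c < 256; c++) {
        unsigned char bits = 0;
        if (isspace(c)) bits |= SKETCH_SPACE;
        if (isalpha(c)) bits |= SKETCH_LETTER;
        if (islower(c)) bits |= SKETCH_LOWER;
        if (isdigit(c)) bits |= SKETCH_DIGIT;
        if (isalnum(c) || c == '_') bits |= SKETCH_WORD;
        table[c] = bits;
    }
}
```

In the default "C" locale the entry for 'a' combines the letter, lower case and word bits (2 + 4 + 16 = 22), and the entry for '5' combines the digit and word bits (8 + 16 = 24). Calling setlocale(LC_CTYPE, ...) before building the table would change the entries for code points above 127, which is the effect the -L option is described as providing.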
diff --git a/doc/html/pcre2_set_character_tables.html b/doc/html/pcre2_set_character_tables.html
index 43c02ff..1ce9a4f 100644
--- a/doc/html/pcre2_set_character_tables.html
+++ b/doc/html/pcre2_set_character_tables.html
@@ -27,9 +27,12 @@ DESCRIPTION
</b><br>
<P>
This function sets a pointer to custom character tables within a compile
-context. The second argument must be the result of a call to
-<b>pcre2_maketables()</b> or NULL to request the default tables. The result is
-always zero.
+context. The second argument must point to a set of PCRE2 character tables or
+be NULL to request the default tables. The result is always zero. Character
+tables can be created by calling <b>pcre2_maketables()</b> or by running the
+<b>pcre2_dftables</b> maintenance command in binary mode (see the
+<a href="pcre2build.html"><b>pcre2build</b></a>
+documentation).
</P>
<P>
There is a complete description of the PCRE2 native API in the
diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html
index ee056ad..673911b 100644
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@@ -1105,10 +1105,11 @@ less than the limit set by the caller of <b>pcre2_match()</b> or
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
</P>
<P>
-The function <b>pcre2_config()</b> makes it possible for a PCRE2 client to
-discover which optional features have been compiled into the PCRE2 library. The
+The function <b>pcre2_config()</b> makes it possible for a PCRE2 client to find
+the value of certain configuration parameters and to discover which optional
+features have been compiled into the PCRE2 library. The
<a href="pcre2build.html"><b>pcre2build</b></a>
-documentation has more details about these optional features.
+documentation has more details about these features.
</P>
<P>
The first argument for <b>pcre2_config()</b> specifies which information is
@@ -1225,6 +1226,13 @@ over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
This parameter is obsolete and should not be used in new code. The output is a
uint32_t integer that is always set to zero.
<pre>
+ PCRE2_CONFIG_TABLES_LENGTH
+</pre>
+The output is a uint32_t integer that gives the length of PCRE2's character
+processing tables in bytes. For details of these tables see the
+<a href="#localesupport">section on locale support</a>
+below.
+<pre>
PCRE2_CONFIG_UNICODE_VERSION
</pre>
The <i>where</i> argument should point to a buffer that is at least 24 code
@@ -2043,7 +2051,7 @@ calling <b>pcre2_set_character_tables()</b> to set the tables pointer therein.
</P>
<P>
For example, to build and use tables that are appropriate for the French locale
-(where accented characters with values greater than 128 are treated as
+(where accented characters with values greater than 127 are treated as
letters), the following code could be used:
<pre>
setlocale(LC_CTYPE, "fr_FR");
@@ -2057,10 +2065,10 @@ are using Windows, the name for the French locale is "french".
</P>
<P>
The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
-is saved with the compiled pattern, and the same tables are used by
-<b>pcre2_match()</b> and <b>pcre_dfa_match()</b>. Thus, for any single pattern,
-compilation and matching both happen in the same locale, but different patterns
-can be processed in different locales.
+is saved with the compiled pattern, and the same tables are used by the
+matching functions. Thus, for any single pattern, compilation and matching both
+happen in the same locale, but different patterns can be processed in different
+locales.
</P>
<P>
It is the caller's responsibility to ensure that the memory containing the
@@ -2068,6 +2076,23 @@ tables remains available while they are still in use. When they are no longer
needed, you can discard them using <b>pcre2_maketables_free()</b>, which should
pass as its first parameter the same global context that was used to create the
tables.
+</P>
+<br><b>
+Saving locale tables
+</b><br>
+<P>
+The tables described above are just a sequence of binary bytes, which makes
+them independent of hardware characteristics such as endianness or whether the
+processor is 32-bit or 64-bit. A copy of the result of <b>pcre2_maketables()</b>
+can therefore be saved in a file or elsewhere and re-used later, even in a
+different program or on another computer. The size of the tables (number of
+bytes) must be obtained by calling <b>pcre2_config()</b> with the
+PCRE2_CONFIG_TABLES_LENGTH option because <b>pcre2_maketables()</b> does not
+return this value. Note that the <b>pcre2_dftables</b> program, which is part of
+the PCRE2 build system, can be used stand-alone to create a file that contains
+a set of binary tables. See the
+<a href="pcre2build.html#createtables"><b>pcre2build</b></a>
+documentation for details.
<a name="infoaboutpattern"></a></P>
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
<P>
@@ -2076,7 +2101,7 @@ tables.
<P>
The <b>pcre2_pattern_info()</b> function returns general information about a
compiled pattern. For information about callouts, see the
-<a href="pcre2pattern.html#infoaboutcallouts">next section.</a>
+<a href="#infoaboutcallouts">next section.</a>
The first argument for <b>pcre2_pattern_info()</b> is a pointer to the compiled
pattern. The second argument specifies which piece of information is required,
and the third argument is a pointer to a variable to receive the data. If the
@@ -3931,7 +3956,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 24 February 2020
+Last updated: 19 March 2020
<br>
Copyright &copy; 1997-2020 University of Cambridge.
<br>
diff --git a/doc/html/pcre2build.html b/doc/html/pcre2build.html
index 13d9da2..38c2f1c 100644
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@@ -128,7 +128,7 @@ To build it without Unicode support, add
--disable-unicode
</pre>
to the <b>configure</b> command. This setting applies to all three libraries. It
-is not possible to build one library with Unicode support, and another without,
+is not possible to build one library with Unicode support and another without
in the same configuration.
</P>
<P>
@@ -188,11 +188,11 @@ which enables the use of an execmem allocator in JIT that is compatible with
SELinux. This has no effect if JIT is not enabled. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for a discussion of JIT usage. When JIT support is enabled,
-pcre2grep automatically makes use of it, unless you add
+<b>pcre2grep</b> automatically makes use of it, unless you add
<pre>
--disable-pcre2grep-jit
</pre>
-to the "configure" command.
+to the <b>configure</b> command.
</P>
<br><a name="SEC8" href="#TOC1">NEWLINE RECOGNITION</a><br>
<P>
@@ -321,7 +321,7 @@ As well as applying to <b>pcre2_match()</b>, the depth limit also controls
the depth of recursive function calls in <b>pcre2_dfa_match()</b>. These are
used for lookaround assertions, atomic groups, and recursion within patterns.
The limit does not apply to JIT matching.
-</P>
+<a name="createtables"></a></P>
<br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
<P>
PCRE2 uses fixed tables for processing characters whose code points are less
@@ -332,12 +332,34 @@ only. If you add
--enable-rebuild-chartables
</pre>
to the <b>configure</b> command, the distributed tables are no longer used.
-Instead, a program called <b>dftables</b> is compiled and run. This outputs the
-source for new set of tables, created in the default locale of your C run-time
-system. This method of replacing the tables does not work if you are cross
-compiling, because <b>dftables</b> is run on the local host. If you need to
-create alternative tables when cross compiling, you will have to do so "by
-hand".
+Instead, a program called <b>pcre2_dftables</b> is compiled and run. This
+outputs the source for new set of tables, created in the default locale of your
+C run-time system. This method of replacing the tables does not work if you are
+cross compiling, because <b>pcre2_dftables</b> needs to be run on the local
+host and therefore not compiled with the cross compiler.
+</P>
+<P>
+If you need to create alternative tables when cross compiling, you will have to
+do so "by hand". There may also be other reasons for creating tables manually.
+To cause <b>pcre2_dftables</b> to be built on the local host, run a normal
+compiling command, and then run the program with the output file as its
+argument, for example:
+<pre>
+ cc src/pcre2_dftables.c -o pcre2_dftables
+ ./pcre2_dftables src/pcre2_chartables.c
+</pre>
+This builds the tables in the default locale of the local host. If you want to
+specify a locale, you must use the -L option:
+<pre>
+ LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c
+</pre>
+You can also specify -b (with or without -L). This causes the tables to be
+written in binary instead of as source code. A set of binary tables can be
+loaded into memory by an application and passed to <b>pcre2_compile()</b> in the
+same way as tables created by calling <b>pcre2_maketables()</b>. The tables are
+just a string of bytes, independent of hardware characteristics such as
+endianness. This means they can be bundled with an application that runs in
+different environments, to ensure consistent behaviour.
</P>
<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
@@ -538,7 +560,7 @@ support these modifiers. If
<pre>
--disable-percent-zt
</pre>
-is specified, no use is made of the z or t modifiers. Instead or %td or %zu,
+is specified, no use is made of the z or t modifiers. Instead of %td or %zu,
%lu is used, with a cast for size_t values.
</P>
<br><a name="SEC22" href="#TOC1">SUPPORT FOR FUZZERS</a><br>
@@ -592,9 +614,9 @@ Cambridge, England.
</P>
<br><a name="SEC26" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 03 March 2019
+Last updated: 20 March 2020
<br>
-Copyright &copy; 1997-2019 University of Cambridge.
+Copyright &copy; 1997-2020 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
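
Reviewer note on the pcre2build.html hunk above: the new text says that binary tables written with -b can be loaded into memory by an application. The following is a minimal sketch of that loading step, assuming the file holds just the table bytes and nothing else (the text describes the tables as "just a string of bytes"). The function name load_tables_sketch() is hypothetical, and the sketch deliberately uses only the C standard library; the length would in practice come from pcre2_config() with PCRE2_CONFIG_TABLES_LENGTH.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read exactly `length` bytes of binary tables from `path`.
   Returns a malloc'd buffer that the caller must free, or NULL on failure.
   In real use `length` would be the value reported by pcre2_config() with
   PCRE2_CONFIG_TABLES_LENGTH. The function name is hypothetical. */
static unsigned char *load_tables_sketch(const char *path, size_t length)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    unsigned char *buf = malloc(length);
    if (buf != NULL && fread(buf, 1, length, f) != length) {
        free(buf);          /* short read: not a full set of tables */
        buf = NULL;
    }
    fclose(f);
    return buf;
}
```

The returned pointer is the kind of value that could then be handed to pcre2_set_character_tables() on a compile context. As the pcre2api text notes, the memory must remain available for as long as patterns compiled with those tables are in use, so the buffer should be freed only after that.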
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html
index bfd22e9..63fa461 100644
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@@ -376,6 +376,12 @@ This command is used to load a set of precompiled patterns from a file, as
described in the section entitled "Saving and restoring compiled patterns"
<a href="#saverestore">below.</a>
<pre>
+ #loadtables &#60;filename&#62;
+</pre>
+This command is used to load a set of binary character tables that can be
+accessed by the tables=3 qualifier. Such tables can be created by the
+<b>pcre2_dftables</b> program with the -b option.
+<pre>
#newline_default [&#60;newline-list&#62;]
</pre>
When PCRE2 is built, a default newline convention can be specified. This
@@ -679,7 +685,7 @@ heavily used in the test files.
pushcopy push a copy onto the stack
stackguard=&#60;number&#62; test the stackguard feature
subject_literal treat all subject lines as literal
- tables=[0|1|2] select internal tables
+ tables=[0|1|2|3] select internal tables
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
</pre>
@@ -1027,18 +1033,20 @@ Using alternative character tables
</b><br>
<P>
The value specified for the <b>tables</b> modifier must be one of the digits 0,
-1, or 2. It causes a specific set of built-in character tables to be passed to
-<b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour with
-different character tables. The digit specifies the tables as follows:
+1, 2, or 3. It causes a specific set of built-in character tables to be passed
+to <b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour
+with different character tables. The digit specifies the tables as follows:
<pre>
0 do not pass any special character tables
1 the default ASCII tables, as distributed in
pcre2_chartables.c.dist
2 a set of tables defining ISO 8859 characters
+ 3 a set of tables loaded by the #loadtables command
</pre>
-In table 2, some characters whose codes are greater than 128 are identified as
-letters, digits, spaces, etc. Setting alternate character tables and a locale
-are mutually exclusive.
+In tables 2, some characters whose codes are greater than 128 are identified as
+letters, digits, spaces, etc. Tables 3 can be used only after a
+<b>#loadtables</b> command has loaded them from a binary file. Setting alternate
+character tables and a locale are mutually exclusive.
</P>
<br><b>
Setting certain match controls
@@ -2105,7 +2113,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 22 January 2020
+Last updated: 20 March 2020
<br>
Copyright &copy; 1997-2020 University of Cambridge.
<br>