| Field | Value | Date |
|---|---|---|
| author | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2020-03-20 18:09:59 +0000 |
| committer | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2020-03-20 18:09:59 +0000 |
| commit | 2ec85e0009fc1808ed79b4697e8502795b46564b | |
| tree | 8b7b8eb19fe4feecbd1f0fb9fed718d5c523259d /doc/html | |
| parent | 9273b7d54f872ede1a3c77d628495065a4bfa206 | |
Renamed dftables as pcre2_dftables and enable it to write the tables in binary.
Update documentation about character tables.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1237 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | doc/html/NON-AUTOTOOLS-BUILD.txt | 16 |
| -rw-r--r-- | doc/html/README.txt | 68 |
| -rw-r--r-- | doc/html/pcre2_set_character_tables.html | 9 |
| -rw-r--r-- | doc/html/pcre2api.html | 45 |
| -rw-r--r-- | doc/html/pcre2build.html | 48 |
| -rw-r--r-- | doc/html/pcre2test.html | 24 |
6 files changed, 134 insertions, 76 deletions
diff --git a/doc/html/NON-AUTOTOOLS-BUILD.txt b/doc/html/NON-AUTOTOOLS-BUILD.txt index 39e7620..a73c058 100644 --- a/doc/html/NON-AUTOTOOLS-BUILD.txt +++ b/doc/html/NON-AUTOTOOLS-BUILD.txt @@ -74,14 +74,14 @@ can skip ahead to the CMake section. src/pcre2_chartables.c. OR: - Compile src/dftables.c as a stand-alone program (using -DHAVE_CONFIG_H - if you have set up src/config.h), and then run it with the single - argument "src/pcre2_chartables.c". This generates a set of standard - character tables and writes them to that file. The tables are generated - using the default C locale for your system. If you want to use a locale - that is specified by LC_xxx environment variables, add the -L option to - the dftables command. You must use this method if you are building on a - system that uses EBCDIC code. + Compile src/pcre2_dftables.c as a stand-alone program (using + -DHAVE_CONFIG_H if you have set up src/config.h), and then run it with + the single argument "src/pcre2_chartables.c". This generates a set of + standard character tables and writes them to that file. The tables are + generated using the default C locale for your system. If you want to use + a locale that is specified by LC_xxx environment variables, add the -L + option to the pcre2_dftables command. You must use this method if you + are building on a system that uses EBCDIC code. The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can specify alternative tables at run time. diff --git a/doc/html/README.txt b/doc/html/README.txt index 8ce6f96..187dc8b 100644 --- a/doc/html/README.txt +++ b/doc/html/README.txt @@ -269,9 +269,9 @@ library. They are also documented in the pcre2build man page. --enable-rebuild-chartables - a program called dftables is compiled and run in the default C locale when - you obey "make". It builds a source file called pcre2_chartables.c. 
If you do - not specify this option, pcre2_chartables.c is created as a copy of + a program called pcre2_dftables is compiled and run in the default C locale + when you obey "make". It builds a source file called pcre2_chartables.c. If + you do not specify this option, pcre2_chartables.c is created as a copy of pcre2_chartables.c.dist. See "Character tables" below for further information. @@ -548,11 +548,11 @@ Cross-compiling using autotools You can specify CC and CFLAGS in the normal way to the "configure" command, in order to cross-compile PCRE2 for some other host. However, you should NOT -specify --enable-rebuild-chartables, because if you do, the dftables.c source -file is compiled and run on the local host, in order to generate the inbuilt -character tables (the pcre2_chartables.c file). This will probably not work, -because dftables.c needs to be compiled with the local compiler, not the cross -compiler. +specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c +source file is compiled and run on the local host, in order to generate the +inbuilt character tables (the pcre2_chartables.c file). This will probably not +work, because pcre2_dftables.c needs to be compiled with the local compiler, +not the cross compiler. When --enable-rebuild-chartables is not specified, pcre2_chartables.c is created by making a copy of pcre2_chartables.c.dist, which is a default set of @@ -560,9 +560,10 @@ tables that assumes ASCII code. Cross-compiling with the default tables should not be a problem. If you need to modify the character tables when cross-compiling, you should -move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand -and run it on the local host to make a new version of pcre2_chartables.c.dist. -Then when you cross-compile PCRE2 this new version of the tables will be used. 
+move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by +hand and run it on the local host to make a new version of +pcre2_chartables.c.dist. See the pcre2build section "Creating character tables +at build time" for more details. Making new tarballs @@ -721,8 +722,8 @@ compile context. The source file called pcre2_chartables.c contains the default set of tables. By default, this is created as a copy of pcre2_chartables.c.dist, which contains tables for ASCII coding. However, if --enable-rebuild-chartables is -specified for ./configure, a different version of pcre2_chartables.c is built -by the program dftables (compiled from dftables.c), which uses the ANSI C +specified for ./configure, a new version of pcre2_chartables.c is built by the +program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C character handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table sources. This means that the default C locale that is set for your system will control the contents of these default @@ -732,32 +733,31 @@ file does not get automatically re-generated. The best way to do this is to move pcre2_chartables.c.dist out of the way and replace it with your customized tables. -When the dftables program is run as a result of --enable-rebuild-chartables, -it uses the default C locale that is set on your system. It does not pay -attention to the LC_xxx environment variables. In other words, it uses the -system's default locale rather than whatever the compiling user happens to have -set. If you really do want to build a source set of character tables in a -locale that is specified by the LC_xxx variables, you can run the dftables -program by hand with the -L option. For example: +When the pcre2_dftables program is run as a result of specifying +--enable-rebuild-chartables, it uses the default C locale that is set on your +system. It does not pay attention to the LC_xxx environment variables. 
In other +words, it uses the system's default locale rather than whatever the compiling +user happens to have set. If you really do want to build a source set of +character tables in a locale that is specified by the LC_xxx variables, you can +run the pcre2_dftables program by hand with the -L option. For example: - ./dftables -L pcre2_chartables.c.special + ./pcre2_dftables -L pcre2_chartables.c.special -The first two 256-byte tables provide lower casing and case flipping functions, -respectively. The next table consists of three 32-byte bit maps which identify -digits, "word" characters, and white space, respectively. These are used when -building 32-byte bit maps that represent character classes for code points less -than 256. The final 256-byte table has bits indicating various character types, -as follows: +The second argument names the file where the source code for the tables is +written. The first two 256-byte tables provide lower casing and case flipping +functions, respectively. The next table consists of a number of 32-byte bit +maps which identify certain character classes such as digits, "word" +characters, white space, etc. These are used when building 32-byte bit maps +that represent character classes for code points less than 256. The final +256-byte table has bits indicating various character types, as follows: 1 white space character 2 letter - 4 decimal digit - 8 hexadecimal digit + 4 lower case letter + 8 decimal digit 16 alphanumeric or '_' - 128 regular expression metacharacter or binary zero -You should not alter the set of characters that contain the 128 bit, as that -will cause PCRE2 to malfunction. +See also the pcre2build section "Creating character tables at build time". File manifest @@ -768,7 +768,7 @@ The distribution should contain the files listed below. 
(A) Source files for the PCRE2 library functions and their headers are found in the src directory: - src/dftables.c auxiliary program for building pcre2_chartables.c + src/pcre2_dftables.c auxiliary program for building pcre2_chartables.c when --enable-rebuild-chartables is specified src/pcre2_chartables.c.dist a default set of character tables that assume @@ -894,4 +894,4 @@ The distribution should contain the files listed below. Philip Hazel Email local part: ph10 Email domain: cam.ac.uk -Last updated: 16 April 2019 +Last updated: 20 March 2020 diff --git a/doc/html/pcre2_set_character_tables.html b/doc/html/pcre2_set_character_tables.html index 43c02ff..1ce9a4f 100644 --- a/doc/html/pcre2_set_character_tables.html +++ b/doc/html/pcre2_set_character_tables.html @@ -27,9 +27,12 @@ DESCRIPTION </b><br> <P> This function sets a pointer to custom character tables within a compile -context. The second argument must be the result of a call to -<b>pcre2_maketables()</b> or NULL to request the default tables. The result is -always zero. +context. The second argument must point to a set of PCRE2 character tables or +be NULL to request the default tables. The result is always zero. Character +tables can be created by calling <b>pcre2_maketables()</b> or by running the +<b>pcre2_dftables</b> maintenance command in binary mode (see the +<a href="pcre2build.html"><b>pcre2build</b></a> +documentation). </P> <P> There is a complete description of the PCRE2 native API in the diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html index ee056ad..673911b 100644 --- a/doc/html/pcre2api.html +++ b/doc/html/pcre2api.html @@ -1105,10 +1105,11 @@ less than the limit set by the caller of <b>pcre2_match()</b> or <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b> </P> <P> -The function <b>pcre2_config()</b> makes it possible for a PCRE2 client to -discover which optional features have been compiled into the PCRE2 library. 
The +The function <b>pcre2_config()</b> makes it possible for a PCRE2 client to find +the value of certain configuration parameters and to discover which optional +features have been compiled into the PCRE2 library. The <a href="pcre2build.html"><b>pcre2build</b></a> -documentation has more details about these optional features. +documentation has more details about these features. </P> <P> The first argument for <b>pcre2_config()</b> specifies which information is @@ -1225,6 +1226,13 @@ over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>. This parameter is obsolete and should not be used in new code. The output is a uint32_t integer that is always set to zero. <pre> + PCRE2_CONFIG_TABLES_LENGTH +</pre> +The output is a uint32_t integer that gives the length of PCRE2's character +processing tables in bytes. For details of these tables see the +<a href="#localesupport">section on locale support</a> +below. +<pre> PCRE2_CONFIG_UNICODE_VERSION </pre> The <i>where</i> argument should point to a buffer that is at least 24 code @@ -2043,7 +2051,7 @@ calling <b>pcre2_set_character_tables()</b> to set the tables pointer therein. </P> <P> For example, to build and use tables that are appropriate for the French locale -(where accented characters with values greater than 128 are treated as +(where accented characters with values greater than 127 are treated as letters), the following code could be used: <pre> setlocale(LC_CTYPE, "fr_FR"); @@ -2057,10 +2065,10 @@ are using Windows, the name for the French locale is "french". </P> <P> The pointer that is passed (via the compile context) to <b>pcre2_compile()</b> -is saved with the compiled pattern, and the same tables are used by -<b>pcre2_match()</b> and <b>pcre_dfa_match()</b>. Thus, for any single pattern, -compilation and matching both happen in the same locale, but different patterns -can be processed in different locales. 
+is saved with the compiled pattern, and the same tables are used by the +matching functions. Thus, for any single pattern, compilation and matching both +happen in the same locale, but different patterns can be processed in different +locales. </P> <P> It is the caller's responsibility to ensure that the memory containing the @@ -2068,6 +2076,23 @@ tables remains available while they are still in use. When they are no longer needed, you can discard them using <b>pcre2_maketables_free()</b>, which should pass as its first parameter the same global context that was used to create the tables. +</P> +<br><b> +Saving locale tables +</b><br> +<P> +The tables described above are just a sequence of binary bytes, which makes +them independent of hardware characteristics such as endianness or whether the +processor is 32-bit or 64-bit. A copy of the result of <b>pcre2_maketables()</b> +can therefore be saved in a file or elsewhere and re-used later, even in a +different program or on another computer. The size of the tables (number of +bytes) must be obtained by calling <b>pcre2_config()</b> with the +PCRE2_CONFIG_TABLES_LENGTH option because <b>pcre2_maketables()</b> does not +return this value. Note that the <b>pcre2_dftables</b> program, which is part of +the PCRE2 build system, can be used stand-alone to create a file that contains +a set of binary tables. See the +<a href="pcre2build.html#createtables"><b>pcre2build</b></a> +documentation for details. <a name="infoaboutpattern"></a></P> <br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br> <P> @@ -2076,7 +2101,7 @@ tables. <P> The <b>pcre2_pattern_info()</b> function returns general information about a compiled pattern. For information about callouts, see the -<a href="pcre2pattern.html#infoaboutcallouts">next section.</a> +<a href="#infoaboutcallouts">next section.</a> The first argument for <b>pcre2_pattern_info()</b> is a pointer to the compiled pattern. 
The second argument specifies which piece of information is required, and the third argument is a pointer to a variable to receive the data. If the @@ -3931,7 +3956,7 @@ Cambridge, England. </P> <br><a name="SEC42" href="#TOC1">REVISION</a><br> <P> -Last updated: 24 February 2020 +Last updated: 19 March 2020 <br> Copyright © 1997-2020 University of Cambridge. <br> diff --git a/doc/html/pcre2build.html b/doc/html/pcre2build.html index 13d9da2..38c2f1c 100644 --- a/doc/html/pcre2build.html +++ b/doc/html/pcre2build.html @@ -128,7 +128,7 @@ To build it without Unicode support, add --disable-unicode </pre> to the <b>configure</b> command. This setting applies to all three libraries. It -is not possible to build one library with Unicode support, and another without, +is not possible to build one library with Unicode support and another without in the same configuration. </P> <P> @@ -188,11 +188,11 @@ which enables the use of an execmem allocator in JIT that is compatible with SELinux. This has no effect if JIT is not enabled. See the <a href="pcre2jit.html"><b>pcre2jit</b></a> documentation for a discussion of JIT usage. When JIT support is enabled, -pcre2grep automatically makes use of it, unless you add +<b>pcre2grep</b> automatically makes use of it, unless you add <pre> --disable-pcre2grep-jit </pre> -to the "configure" command. +to the <b>configure</b> command. </P> <br><a name="SEC8" href="#TOC1">NEWLINE RECOGNITION</a><br> <P> @@ -321,7 +321,7 @@ As well as applying to <b>pcre2_match()</b>, the depth limit also controls the depth of recursive function calls in <b>pcre2_dfa_match()</b>. These are used for lookaround assertions, atomic groups, and recursion within patterns. The limit does not apply to JIT matching. -</P> +<a name="createtables"></a></P> <br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br> <P> PCRE2 uses fixed tables for processing characters whose code points are less @@ -332,12 +332,34 @@ only. 
If you add --enable-rebuild-chartables </pre> to the <b>configure</b> command, the distributed tables are no longer used. -Instead, a program called <b>dftables</b> is compiled and run. This outputs the -source for new set of tables, created in the default locale of your C run-time -system. This method of replacing the tables does not work if you are cross -compiling, because <b>dftables</b> is run on the local host. If you need to -create alternative tables when cross compiling, you will have to do so "by -hand". +Instead, a program called <b>pcre2_dftables</b> is compiled and run. This +outputs the source for new set of tables, created in the default locale of your +C run-time system. This method of replacing the tables does not work if you are +cross compiling, because <b>pcre2_dftables</b> needs to be run on the local +host and therefore not compiled with the cross compiler. +</P> +<P> +If you need to create alternative tables when cross compiling, you will have to +do so "by hand". There may also be other reasons for creating tables manually. +To cause <b>pcre2_dftables</b> to be built on the local host, run a normal +compiling command, and then run the program with the output file as its +argument, for example: +<pre> + cc src/pcre2_dftables.c -o pcre2_dftables + ./pcre2_dftables src/pcre2_chartables.c +</pre> +This builds the tables in the default locale of the local host. If you want to +specify a locale, you must use the -L option: +<pre> + LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c +</pre> +You can also specify -b (with or without -L). This causes the tables to be +written in binary instead of as source code. A set of binary tables can be +loaded into memory by an application and passed to <b>pcre2_compile()</b> in the +same way as tables created by calling <b>pcre2_maketables()</b>. The tables are +just a string of bytes, independent of hardware characteristics such as +endianness. 
This means they can be bundled with an application that runs in +different environments, to ensure consistent behaviour. </P> <br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br> <P> @@ -538,7 +560,7 @@ support these modifiers. If <pre> --disable-percent-zt </pre> -is specified, no use is made of the z or t modifiers. Instead or %td or %zu, +is specified, no use is made of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for size_t values. </P> <br><a name="SEC22" href="#TOC1">SUPPORT FOR FUZZERS</a><br> @@ -592,9 +614,9 @@ Cambridge, England. </P> <br><a name="SEC26" href="#TOC1">REVISION</a><br> <P> -Last updated: 03 March 2019 +Last updated: 20 March 2020 <br> -Copyright © 1997-2019 University of Cambridge. +Copyright © 1997-2020 University of Cambridge. <br> <p> Return to the <a href="index.html">PCRE2 index page</a>. diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html index bfd22e9..63fa461 100644 --- a/doc/html/pcre2test.html +++ b/doc/html/pcre2test.html @@ -376,6 +376,12 @@ This command is used to load a set of precompiled patterns from a file, as described in the section entitled "Saving and restoring compiled patterns" <a href="#saverestore">below.</a> <pre> + #loadtables <filename> +</pre> +This command is used to load a set of binary character tables that can be +accessed by the tables=3 qualifier. Such tables can be created by the +<b>pcre2_dftables</b> program with the -b option. +<pre> #newline_default [<newline-list>] </pre> When PCRE2 is built, a default newline convention can be specified. This @@ -679,7 +685,7 @@ heavily used in the test files. 
pushcopy push a copy onto the stack stackguard=<number> test the stackguard feature subject_literal treat all subject lines as literal - tables=[0|1|2] select internal tables + tables=[0|1|2|3] select internal tables use_length do not zero-terminate the pattern utf8_input treat input as UTF-8 </pre> @@ -1027,18 +1033,20 @@ Using alternative character tables </b><br> <P> The value specified for the <b>tables</b> modifier must be one of the digits 0, -1, or 2. It causes a specific set of built-in character tables to be passed to -<b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour with -different character tables. The digit specifies the tables as follows: +1, 2, or 3. It causes a specific set of built-in character tables to be passed +to <b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour +with different character tables. The digit specifies the tables as follows: <pre> 0 do not pass any special character tables 1 the default ASCII tables, as distributed in pcre2_chartables.c.dist 2 a set of tables defining ISO 8859 characters + 3 a set of tables loaded by the #loadtables command </pre> -In table 2, some characters whose codes are greater than 128 are identified as -letters, digits, spaces, etc. Setting alternate character tables and a locale -are mutually exclusive. +In tables 2, some characters whose codes are greater than 128 are identified as +letters, digits, spaces, etc. Tables 3 can be used only after a +<b>#loadtables</b> command has loaded them from a binary file. Setting alternate +character tables and a locale are mutually exclusive. </P> <br><b> Setting certain match controls @@ -2105,7 +2113,7 @@ Cambridge, England. </P> <br><a name="SEC21" href="#TOC1">REVISION</a><br> <P> -Last updated: 22 January 2020 +Last updated: 20 March 2020 <br> Copyright © 1997-2020 University of Cambridge. <br> |
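The "Saving locale tables" text added to pcre2api in this commit says that a set of character tables is just a sequence of bytes, that its length must be obtained from pcre2_config() with PCRE2_CONFIG_TABLES_LENGTH, and (in pcre2build) that pcre2_dftables -b can write such a set to a file. The following is a minimal sketch, in plain C with no PCRE2 calls, of saving and reloading that byte sequence with a strict length check. The functions save_tables() and load_tables() are hypothetical helpers, not part of the PCRE2 API, and the expected length is assumed to be supplied by the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not PCRE2 API): write `len` bytes of binary
   character tables to `path`. Returns 0 on success, -1 on failure. */
int save_tables(const char *path, const unsigned char *tables, size_t len)
{
  FILE *f = fopen(path, "wb");
  if (f == NULL) return -1;
  size_t written = fwrite(tables, 1, len, f);
  int rc = (written == len) ? 0 : -1;
  if (fclose(f) != 0) rc = -1;
  return rc;
}

/* Hypothetical helper (not PCRE2 API): read exactly `expected_len` bytes
   from `path` into a malloc'd buffer that the caller must free. Returns
   NULL if the file cannot be opened or is shorter or longer than expected,
   since a size mismatch suggests tables from an incompatible build. */
unsigned char *load_tables(const char *path, size_t expected_len)
{
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;
  unsigned char *buf = malloc(expected_len);
  if (buf == NULL) { fclose(f); return NULL; }
  size_t got = fread(buf, 1, expected_len, f);
  int extra = fgetc(f);            /* EOF here means no trailing bytes */
  fclose(f);
  if (got != expected_len || extra != EOF) { free(buf); return NULL; }
  return buf;
}
```

In a real program the expected length would presumably come from `pcre2_config(PCRE2_CONFIG_TABLES_LENGTH, &length)` with a `uint32_t length`, and the buffer returned by load_tables() would be passed to pcre2_set_character_tables() on a compile context. As the pcre2api text notes, the memory must remain available for as long as any pattern compiled with those tables is in use.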
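The README text in this commit describes the layout of a table set: a 256-byte lower-casing table, a 256-byte case-flipping table, a group of 32-byte class bitmaps, and a final 256-byte table of character-type bits (1 white space, 2 letter, 4 lower case letter, 8 decimal digit, 16 alphanumeric or '_'). The sketch below reads those parts from a raw buffer, such as one loaded from a file written by pcre2_dftables -b. It relies only on the layout stated in the README. The CT_* names and the accessor functions are hypothetical. The offset of the final table is computed from the total length rather than from a hard-coded bitmap count, because the README says only "a number of" bitmaps.

```c
#include <stddef.h>

/* Bit values in the final 256-byte table, as listed in the README.
   These names are illustrative, not PCRE2's own identifiers. */
enum {
  CT_SPACE  = 1,   /* white space character */
  CT_LETTER = 2,   /* letter */
  CT_LOWER  = 4,   /* lower case letter */
  CT_DIGIT  = 8,   /* decimal digit */
  CT_WORD   = 16   /* alphanumeric or '_' */
};

/* First 256 bytes: lower-casing table. */
unsigned char table_lower(const unsigned char *tables, unsigned char c)
{
  return tables[c];
}

/* Second 256 bytes: case-flipping table. */
unsigned char table_flip(const unsigned char *tables, unsigned char c)
{
  return tables[256 + c];
}

/* Last 256 bytes: character-type bits. `total_len` is the full table
   length (for example, the PCRE2_CONFIG_TABLES_LENGTH value). */
int table_has_type(const unsigned char *tables, size_t total_len,
                   unsigned char c, unsigned int bit)
{
  const unsigned char *ctypes = tables + total_len - 256;
  return (ctypes[c] & bit) != 0;
}
```

Deriving the type-table offset from the reported length keeps the sketch correct if the number of intermediate bitmaps differs between PCRE2 versions, provided that the final table stays last, as the README describes.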
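The README also states that the 32-byte bitmaps are used when building bitmaps for character classes with code points below 256. It also says that pcre2_dftables derives the tables from ANSI C functions such as isalnum() and isdigit() in the current locale. A 32-byte map holds one bit for each of the 256 byte values. The following sketch builds such a map from a ctype predicate and tests membership. The bit order (bit c%8 of byte c/8) and the helper names are assumptions made for illustration, not taken from the PCRE2 sources.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: set bit c in a 32-byte (256-bit) map whenever
   pred(c) is non-zero. The result depends on the active C locale. */
void build_class_map(unsigned char map[32], int (*pred)(int))
{
  memset(map, 0, 32);
  for (int c = 0; c < 256; c++)
    if (pred(c)) map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Hypothetical helper: is code point c a member of the class?
   Code points of 256 and above are outside the map and never match. */
int class_map_has(const unsigned char map[32], unsigned int c)
{
  return c < 256 && (map[c / 8] & (1u << (c % 8))) != 0;
}
```

Because the predicate is evaluated in whichever C locale is active, calling setlocale() first may change which bytes above 127 are set. This is the same reason the pcre2_dftables output depends on the locale it runs in, and why the -L option exists.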