summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorBrian Pane <brianp@apache.org>2002-03-20 05:54:26 +0000
committerBrian Pane <brianp@apache.org>2002-03-20 05:54:26 +0000
commit5422fe57879e6867ad06b52ba861ded9b7dd0b91 (patch)
tree78f532bb767365969e2bba9e680bcc897c1157b1
parent0422abcb411682021eba8960ae701100a38fed43 (diff)
downloadhttpd-avendor.tar.gz
Import of PCRE 3.9avendor
git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/branches/avendor@94036 13f79535-47bb-0310-9956-ffa450edef68
-rw-r--r--srclib/pcre/AUTHORS2
-rw-r--r--srclib/pcre/COPYING26
-rw-r--r--srclib/pcre/ChangeLog175
-rw-r--r--srclib/pcre/INSTALL2
-rw-r--r--srclib/pcre/LICENCE26
-rw-r--r--srclib/pcre/Makefile.in220
-rw-r--r--srclib/pcre/NEWS39
-rw-r--r--srclib/pcre/NON-UNIX-USE5
-rw-r--r--srclib/pcre/README174
-rw-r--r--srclib/pcre/config.guess764
-rw-r--r--srclib/pcre/config.in33
-rw-r--r--srclib/pcre/config.sub280
-rw-r--r--srclib/pcre/configure.in52
-rw-r--r--srclib/pcre/dftables.c4
-rw-r--r--srclib/pcre/doc/Tech.Notes27
-rw-r--r--srclib/pcre/doc/pcre.3429
-rw-r--r--srclib/pcre/doc/pcre.html544
-rw-r--r--srclib/pcre/doc/pcre.txt511
-rw-r--r--srclib/pcre/doc/pcreposix.310
-rw-r--r--srclib/pcre/doc/pcreposix.html11
-rw-r--r--srclib/pcre/doc/pcreposix.txt11
-rw-r--r--srclib/pcre/doc/pcretest.txt531
-rw-r--r--srclib/pcre/doc/perltest.txt10
-rw-r--r--srclib/pcre/get.c40
-rw-r--r--srclib/pcre/internal.h78
-rw-r--r--srclib/pcre/ltmain.sh2452
-rw-r--r--srclib/pcre/maketables.c4
-rw-r--r--srclib/pcre/pcre.c725
-rw-r--r--srclib/pcre/pcre.in40
-rw-r--r--srclib/pcre/pcreposix.c12
-rw-r--r--srclib/pcre/pcreposix.h2
-rw-r--r--srclib/pcre/pcretest.c337
-rwxr-xr-xsrclib/pcre/perltest2
-rw-r--r--srclib/pcre/study.c38
-rw-r--r--srclib/pcre/testdata/testinput159
-rw-r--r--srclib/pcre/testdata/testinput219
-rw-r--r--srclib/pcre/testdata/testinput334
-rw-r--r--srclib/pcre/testdata/testinput41
-rw-r--r--srclib/pcre/testdata/testoutput1116
-rw-r--r--srclib/pcre/testdata/testoutput2318
-rw-r--r--srclib/pcre/testdata/testoutput366
-rw-r--r--srclib/pcre/testdata/testoutput43
42 files changed, 6167 insertions, 2065 deletions
diff --git a/srclib/pcre/AUTHORS b/srclib/pcre/AUTHORS
index bfe1b5d8a4..832dddca45 100644
--- a/srclib/pcre/AUTHORS
+++ b/srclib/pcre/AUTHORS
@@ -3,4 +3,4 @@ Written by: Philip Hazel <ph10@cam.ac.uk>
University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.
-Copyright (c) 1997-2000 University of Cambridge
+Copyright (c) 1997-2001 University of Cambridge
diff --git a/srclib/pcre/COPYING b/srclib/pcre/COPYING
index f305033c16..8effa66492 100644
--- a/srclib/pcre/COPYING
+++ b/srclib/pcre/COPYING
@@ -9,7 +9,7 @@ Written by: Philip Hazel <ph10@cam.ac.uk>
University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.
-Copyright (c) 1997-2000 University of Cambridge
+Copyright (c) 1997-2001 University of Cambridge
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
@@ -20,13 +20,31 @@ restrictions:
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
- explicit claim or by omission.
+ explicit claim or by omission. In practice, this means that if you use
+ PCRE in software which you distribute to others, commercially or
+ otherwise, you must put a sentence like this
+
+ Regular expression support is provided by the PCRE library package,
+ which is open source software, written by Philip Hazel, and copyright
+ by the University of Cambridge, England.
+
+ somewhere reasonably visible in your documentation and in any relevant
+ files or online help data or similar. A reference to the ftp site for
+ the source, that is, to
+
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
+
+ should also be given in the documentation.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
- General Purpose Licence (GPL), then the terms of that licence shall
- supersede any condition above with which it is incompatible.
+ General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL),
+ then the terms of that licence shall supersede any condition above with
+ which it is incompatible.
+
+The documentation for PCRE, supplied in the "doc" directory, is distributed
+under the same terms as the software itself.
End
diff --git a/srclib/pcre/ChangeLog b/srclib/pcre/ChangeLog
index c594ab50d8..6a71290f78 100644
--- a/srclib/pcre/ChangeLog
+++ b/srclib/pcre/ChangeLog
@@ -1,6 +1,181 @@
ChangeLog for PCRE
------------------
+Version 3.0 02-Jan-02
+---------------------
+
+1. A bit of extraneous text had somehow crept into the pcregrep documentation.
+
+2. If --disable-static was given, the building process failed when trying to
+build pcretest and pcregrep. (For some reason it was using libtool to compile
+them, which is not right, as they aren't part of the library.)
+
+
+Version 3.8 18-Dec-01
+---------------------
+
+1. The experimental UTF-8 code was completely screwed up. It was packing the
+bytes in the wrong order. How dumb can you get?
+
+
+Version 3.7 29-Oct-01
+---------------------
+
+1. In updating pcretest to check change 1 of version 3.6, I screwed up.
+This caused pcretest, when used on the test data, to segfault. Unfortunately,
+this didn't happen under Solaris 8, where I normally test things.
+
+2. The Makefile had to be changed to make it work on BSD systems, where 'make'
+doesn't seem to recognize that ./xxx and xxx are the same file. (This entry
+isn't in ChangeLog distributed with 3.7 because I forgot when I hastily made
+this fix an hour or so after the initial 3.7 release.)
+
+
+Version 3.6 23-Oct-01
+---------------------
+
+1. Crashed with /(sens|respons)e and \1ibility/ and "sense and sensibility" if
+offsets passed as NULL with zero offset count.
+
+2. The config.guess and config.sub files had not been updated when I moved to
+the latest autoconf.
+
+
+Version 3.5 15-Aug-01
+---------------------
+
+1. Added some missing #if !defined NOPOSIX conditionals in pcretest.c that
+had been forgotten.
+
+2. By using declared but undefined structures, we can avoid using "void"
+definitions in pcre.h while keeping the internal definitions of the structures
+private.
+
+3. The distribution is now built using autoconf 2.50 and libtool 1.4. From a
+user point of view, this means that both static and shared libraries are built
+by default, but this can be individually controlled. More of the work of
+handling this static/shared cases is now inside libtool instead of PCRE's make
+file.
+
+4. The pcretest utility is now installed along with pcregrep because it is
+useful for users (to test regexs) and by doing this, it automatically gets
+relinked by libtool. The documentation has been turned into a man page, so
+there are now .1, .txt, and .html versions in /doc.
+
+5. Upgrades to pcregrep:
+ (i) Added long-form option names like gnu grep.
+ (ii) Added --help to list all options with an explanatory phrase.
+ (iii) Added -r, --recursive to recurse into sub-directories.
+ (iv) Added -f, --file to read patterns from a file.
+
+6. pcre_exec() was referring to its "code" argument before testing that
+argument for NULL (and giving an error if it was NULL).
+
+7. Upgraded Makefile.in to allow for compiling in a different directory from
+the source directory.
+
+8. Tiny buglet in pcretest: when pcre_fullinfo() was called to retrieve the
+options bits, the pointer it was passed was to an int instead of to an unsigned
+long int. This mattered only on 64-bit systems.
+
+9. Fixed typo (3.4/1) in pcre.h again. Sigh. I had changed pcre.h (which is
+generated) instead of pcre.in, which it its source. Also made the same change
+in several of the .c files.
+
+10. A new release of gcc defines printf() as a macro, which broke pcretest
+because it had an ifdef in the middle of a string argument for printf(). Fixed
+by using separate calls to printf().
+
+11. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
+script, to force use of CR or LF instead of \n in the source. On non-Unix
+systems, the value can be set in config.h.
+
+12. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
+absolute limit. Changed the text of the error message to make this clear, and
+likewise updated the man page.
+
+13. The limit of 99 on the number of capturing subpatterns has been removed.
+The new limit is 65535, which I hope will not be a "real" limit.
+
+
+Version 3.4 22-Aug-00
+---------------------
+
+1. Fixed typo in pcre.h: unsigned const char * changed to const unsigned char *.
+
+2. Diagnose condition (?(0) as an error instead of crashing on matching.
+
+
+Version 3.3 01-Aug-00
+---------------------
+
+1. If an octal character was given, but the value was greater than \377, it
+was not getting masked to the least significant bits, as documented. This could
+lead to crashes in some systems.
+
+2. Perl 5.6 (if not earlier versions) accepts classes like [a-\d] and treats
+the hyphen as a literal. PCRE used to give an error; it now behaves like Perl.
+
+3. Added the functions pcre_free_substring() and pcre_free_substring_list().
+These just pass their arguments on to (pcre_free)(), but they are provided
+because some uses of PCRE bind it to non-C systems that can call its functions,
+but cannot call free() or pcre_free() directly.
+
+4. Add "make test" as a synonym for "make check". Corrected some comments in
+the Makefile.
+
+5. Add $(DESTDIR)/ in front of all the paths in the "install" target in the
+Makefile.
+
+6. Changed the name of pgrep to pcregrep, because Solaris has introduced a
+command called pgrep for grepping around the active processes.
+
+7. Added the beginnings of support for UTF-8 character strings.
+
+8. Arranged for the Makefile to pass over the settings of CC, CFLAGS, and
+RANLIB to ./ltconfig so that they are used by libtool. I think these are all
+the relevant ones. (AR is not passed because ./ltconfig does its own figuring
+out for the ar command.)
+
+
+Version 3.2 12-May-00
+---------------------
+
+This is purely a bug fixing release.
+
+1. If the pattern /((Z)+|A)*/ was matched agained ZABCDEFG it matched Z instead
+of ZA. This was just one example of several cases that could provoke this bug,
+which was introduced by change 9 of version 2.00. The code for breaking
+infinite loops after an iteration that matches an empty string was't working
+correctly.
+
+2. The pcretest program was not imitating Perl correctly for the pattern /a*/g
+when matched against abbab (for example). After matching an empty string, it
+wasn't forcing anchoring when setting PCRE_NOTEMPTY for the next attempt; this
+caused it to match further down the string than it should.
+
+3. The code contained an inclusion of sys/types.h. It isn't clear why this
+was there because it doesn't seem to be needed, and it causes trouble on some
+systems, as it is not a Standard C header. It has been removed.
+
+4. Made 4 silly changes to the source to avoid stupid compiler warnings that
+were reported on the Macintosh. The changes were from
+
+ while ((c = *(++ptr)) != 0 && c != '\n');
+to
+ while ((c = *(++ptr)) != 0 && c != '\n') ;
+
+Totally extraordinary, but if that's what it takes...
+
+5. PCRE is being used in one environment where neither memmove() nor bcopy() is
+available. Added HAVE_BCOPY and an autoconf test for it; if neither
+HAVE_MEMMOVE nor HAVE_BCOPY is set, use a built-in emulation function which
+assumes the way PCRE uses memmove() (always moving upwards).
+
+6. PCRE is being used in one environment where strchr() is not available. There
+was only one use in pcre.c, and writing it out to avoid strchr() probably gives
+faster code anyway.
+
Version 3.1 09-Feb-00
---------------------
diff --git a/srclib/pcre/INSTALL b/srclib/pcre/INSTALL
index d63a78fef9..08802812de 100644
--- a/srclib/pcre/INSTALL
+++ b/srclib/pcre/INSTALL
@@ -4,7 +4,7 @@ Basic Installation
These are generic installation instructions that apply to systems that
can run the `configure' shell script - Unix systems and any that imitate
it. They are not specific to PCRE. There are PCRE-specific instructions
-for non-Unix systems in the file NON-UNIX.
+for non-Unix systems in the file NON-UNIX-USE.
The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
diff --git a/srclib/pcre/LICENCE b/srclib/pcre/LICENCE
index f305033c16..8effa66492 100644
--- a/srclib/pcre/LICENCE
+++ b/srclib/pcre/LICENCE
@@ -9,7 +9,7 @@ Written by: Philip Hazel <ph10@cam.ac.uk>
University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.
-Copyright (c) 1997-2000 University of Cambridge
+Copyright (c) 1997-2001 University of Cambridge
Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
@@ -20,13 +20,31 @@ restrictions:
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. The origin of this software must not be misrepresented, either by
- explicit claim or by omission.
+ explicit claim or by omission. In practice, this means that if you use
+ PCRE in software which you distribute to others, commercially or
+ otherwise, you must put a sentence like this
+
+ Regular expression support is provided by the PCRE library package,
+ which is open source software, written by Philip Hazel, and copyright
+ by the University of Cambridge, England.
+
+ somewhere reasonably visible in your documentation and in any relevant
+ files or online help data or similar. A reference to the ftp site for
+ the source, that is, to
+
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
+
+ should also be given in the documentation.
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software.
4. If PCRE is embedded in any software that is released under the GNU
- General Purpose Licence (GPL), then the terms of that licence shall
- supersede any condition above with which it is incompatible.
+ General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL),
+ then the terms of that licence shall supersede any condition above with
+ which it is incompatible.
+
+The documentation for PCRE, supplied in the "doc" directory, is distributed
+under the same terms as the software itself.
End
diff --git a/srclib/pcre/Makefile.in b/srclib/pcre/Makefile.in
index b837424777..b7e862b071 100644
--- a/srclib/pcre/Makefile.in
+++ b/srclib/pcre/Makefile.in
@@ -12,20 +12,44 @@
# DLL_LDFLAGS=-s
+#############################################################################
+
+# PCRE is developed on a Unix system. I do not use Windows or Macs, and know
+# nothing about building software on them. Although the code of PCRE should
+# be very portable, the building system in this Makefile is designed for Unix
+# systems, with the exception of the mingw32 stuff just mentioned.
+
+# This setting enables Unix-style directory scanning in pcregrep, triggered
+# by the -f option. Maybe one day someone will add code for other systems.
+
+PCREGREP_OSTYPE=-DIS_UNIX
+
+#############################################################################
+
+
#---------------------------------------------------------------------------#
-# The next few lines are modified by "configure" to insert data that it is #
+# The following lines are modified by "configure" to insert data that it is #
# given in its arguments, or which it finds out for itself. #
#---------------------------------------------------------------------------#
-# BINDIR is the directory in which the pgrep command is installed.
-# INCDIR is the directory in which the public header file pcre.h is installed.
-# LIBDIR is the directory in which the libraries are installed.
-# MANDIR is the directory in which the man pages are installed.
-# The pcretest program, as it is a test program, does not get installed
-# anywhere.
-
+SHELL = @SHELL@
prefix = @prefix@
exec_prefix = @exec_prefix@
+top_srcdir = @top_srcdir@
+
+mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs
+
+# NB: top_builddir is not referred to directly below, but it is used in the
+# setting of $(LIBTOOL), so don't remove it!
+
+top_builddir = .
+
+# BINDIR is the directory in which the pcregrep, pcretest, and pcre-config
+# commands are installed.
+# INCDIR is the directory in which the public header files pcre.h and
+# pcreposix.h are installed.
+# LIBDIR is the directory in which the libraries are installed.
+# MANDIR is the directory in which the man pages are installed.
BINDIR = @bindir@
LIBDIR = @libdir@
@@ -35,152 +59,110 @@ MANDIR = @mandir@
CC = @CC@
CFLAGS = @CFLAGS@
RANLIB = @RANLIB@
+UTF8 = @UTF8@
+NEWLINE = @NEWLINE@
+
+INSTALL = @INSTALL@
+INSTALL_DATA = @INSTALL_DATA@
-# LIBTOOL defaults to "", which cuts out the building of shared libraries.
-# If "configure" is called with --enable-shared-libraries, then LIBTOOL is
-# set to "libtool", which causes shared libraries to be built, and LIBSUFFIX
-# is set to "la" instead of "a", which causes the shared libraries to be
-# installed.
+# LIBTOOL enables the building of shared and static libraries. It is set up
+# to do one or the other or both by ./configure.
LIBTOOL = @LIBTOOL@
-LIBSUFFIX = @LIBSUFFIX@
+LTCOMPILE = $(LIBTOOL) --mode=compile $(CC) -c $(CFLAGS) -I. $(NEWLINE)
+LINK = $(LIBTOOL) --mode=link $(CC)
# These are the version numbers for the shared libraries
PCRELIBVERSION = @PCRE_LIB_VERSION@
PCREPOSIXLIBVERSION = @PCRE_POSIXLIB_VERSION@
-
-#---------------------------------------------------------------------------#
-# A copy of install-sh is in this distribution and is used by default. #
-#---------------------------------------------------------------------------#
-
-INSTALL = ./install-sh -c
-INSTALL_DATA = ${INSTALL} -m 644
-
-
-#---------------------------------------------------------------------------#
-# For almost all systems, the command to create a library is "ar cq", but #
-# there is at least one where it is different, to make this configurable. #
-# However, I haven't got round to learning how to make "configure" find #
-# this out for itself. It is necessary to use a command such as #
-# "make AR='ar -rc'" if you need to vary this. #
-#---------------------------------------------------------------------------#
-
-AR = ar cq
-
-
##############################################################################
OBJ = maketables.o get.o study.o pcre.o
LOBJ = maketables.lo get.lo study.lo pcre.lo
-all: libtool libpcre.$(LIBSUFFIX) libpcreposix.$(LIBSUFFIX) pcretest pgrep
-
-libtool: config.guess config.sub ltconfig ltmain.sh
- @if test "$(LIBTOOL)" = "libtool"; then \
- echo '--- Building libtool ---'; \
- ./ltconfig ./ltmain.sh; \
- echo '--- Built libtool ---'; fi
-
-pgrep: libpcre.$(LIBSUFFIX) pgrep.o
- @echo ' '
- @echo '--- Building pgrep utility'
- @echo ' '
- $(LIBTOOL) $(CC) $(CFLAGS) -o pgrep pgrep.o libpcre.$(LIBSUFFIX)
-
-pcretest: libpcre.$(LIBSUFFIX) libpcreposix.$(LIBSUFFIX) pcretest.o
- @echo ' '
- @echo '--- Building pcretest testing program'
- @echo ' '
- $(LIBTOOL) $(PURIFY) $(CC) $(CFLAGS) -o pcretest pcretest.o \
- libpcre.$(LIBSUFFIX) libpcreposix.$(LIBSUFFIX)
-
-libpcre.a: $(OBJ)
- @echo ' '
- @echo '--- Building static library: libpcre'
- @echo ' '
- -rm -f libpcre.a
- $(AR) libpcre.a $(OBJ)
- $(RANLIB) libpcre.a
+all: libpcre.la libpcreposix.la pcretest pcregrep
+
+pcregrep: libpcre.la pcregrep.o
+ $(LINK) $(CFLAGS) -o pcregrep pcregrep.o libpcre.la
+
+pcretest: libpcre.la libpcreposix.la pcretest.o
+ $(LINK) $(PURIFY) $(CFLAGS) -o pcretest pcretest.o \
+ libpcre.la libpcreposix.la
libpcre.la: $(OBJ)
- @echo ' '
- @echo '--- Building shared library: libpcre'
- @echo ' '
-rm -f libpcre.la
- libtool $(CC) -version-info '$(PCRELIBVERSION)' -o libpcre.la -rpath $(LIBDIR) $(LOBJ)
-
-libpcreposix.a: pcreposix.o
- @echo ' '
- @echo '--- Building static library: libpcreposix'
- @echo ' '
- -rm -f libpcreposix.a
- $(AR) libpcreposix.a pcreposix.o
- $(RANLIB) libpcreposix.a
+ $(LINK) -rpath $(LIBDIR) -version-info \
+ '$(PCRELIBVERSION)' -o libpcre.la $(LOBJ)
libpcreposix.la: pcreposix.o
- @echo ' '
- @echo '--- Building shared library: libpcreposix'
- @echo ' '
-rm -f libpcreposix.la
- libtool $(CC) -version-info '$(PCREPOSIXLIBVERSION)' -o libpcreposix.la -rpath $(LIBDIR) pcreposix.lo
+ $(LINK) -rpath $(LIBDIR) -version-info \
+ '$(PCREPOSIXLIBVERSION)' -o libpcreposix.la pcreposix.lo
-pcre.o: chartables.c pcre.c pcre.h internal.h config.h Makefile
- $(LIBTOOL) $(CC) -c $(CFLAGS) pcre.c
+pcre.o: $(top_srcdir)/chartables.c $(top_srcdir)/pcre.c \
+ $(top_srcdir)/internal.h pcre.h config.h Makefile
+ $(LTCOMPILE) $(UTF8) $(top_srcdir)/pcre.c
-pcreposix.o: pcreposix.c pcreposix.h internal.h pcre.h config.h Makefile
- $(LIBTOOL) $(CC) -c $(CFLAGS) pcreposix.c
+pcreposix.o: $(top_srcdir)/pcreposix.c $(top_srcdir)/pcreposix.h \
+ $(top_srcdir)/internal.h pcre.h config.h Makefile
+ $(LTCOMPILE) $(top_srcdir)/pcreposix.c
-maketables.o: maketables.c pcre.h internal.h config.h Makefile
- $(LIBTOOL) $(CC) -c $(CFLAGS) maketables.c
+maketables.o: $(top_srcdir)/maketables.c $(top_srcdir)/internal.h \
+ pcre.h config.h Makefile
+ $(LTCOMPILE) $(top_srcdir)/maketables.c
-get.o: get.c pcre.h internal.h config.h Makefile
- $(LIBTOOL) $(CC) -c $(CFLAGS) get.c
+get.o: $(top_srcdir)/get.c $(top_srcdir)/internal.h \
+ pcre.h config.h Makefile
+ $(LTCOMPILE) $(top_srcdir)/get.c
-study.o: study.c pcre.h internal.h config.h Makefile
- $(LIBTOOL) $(CC) -c $(CFLAGS) study.c
+study.o: $(top_srcdir)/study.c $(top_srcdir)/internal.h \
+ pcre.h config.h Makefile
+ $(LTCOMPILE) $(UTF8) $(top_srcdir)/study.c
-pcretest.o: pcretest.c pcre.h config.h Makefile
- $(CC) -c $(CFLAGS) pcretest.c
+pcretest.o: $(top_srcdir)/pcretest.c $(top_srcdir)/internal.h pcre.h config.h Makefile
+ $(CC) -c $(CFLAGS) -I. $(UTF8) $(top_srcdir)/pcretest.c
-pgrep.o: pgrep.c pcre.h Makefile config.h
- $(CC) -c $(CFLAGS) pgrep.c
+pcregrep.o: $(top_srcdir)/pcregrep.c pcre.h Makefile config.h
+ $(CC) -c $(CFLAGS) -I. $(UTF8) $(PCREGREP_OSTYPE) $(top_srcdir)/pcregrep.c
# An auxiliary program makes the default character table source
-chartables.c: dftables
- ./dftables >chartables.c
+$(top_srcdir)/chartables.c: dftables
+ ./dftables >$(top_srcdir)/chartables.c
-dftables: dftables.c maketables.c pcre.h internal.h config.h Makefile
- $(CC) -o dftables $(CFLAGS) dftables.c
+dftables: $(top_srcdir)/dftables.c $(top_srcdir)/maketables.c \
+ $(top_srcdir)/internal.h pcre.h config.h Makefile
+ $(LINK) -o dftables $(CFLAGS) $(top_srcdir)/dftables.c
install: all
- $(LIBTOOL) $(INSTALL_DATA) libpcre.$(LIBSUFFIX) $(LIBDIR)/libpcre.$(LIBSUFFIX)
- $(LIBTOOL) $(INSTALL_DATA) libpcreposix.$(LIBSUFFIX) $(LIBDIR)/libpcreposix.$(LIBSUFFIX)
- $(INSTALL_DATA) pcre.h $(INCDIR)/pcre.h
- $(INSTALL_DATA) pcreposix.h $(INCDIR)/pcreposix.h
- $(INSTALL_DATA) doc/pcre.3 $(MANDIR)/man3/pcre.3
- $(INSTALL_DATA) doc/pcreposix.3 $(MANDIR)/man3/pcreposix.3
- $(INSTALL_DATA) doc/pgrep.1 $(MANDIR)/man1/pgrep.1
- @if test "$(LIBTOOL)" = "libtool"; then \
- echo ' '; \
- echo '--- Rebuilding pgrep to use installed shared library ---'; \
- echo $(CC) $(CFLAGS) -o pgrep pgrep.o -L$(LIBDIR) -lpcre; \
- $(CC) $(CFLAGS) -o pgrep pgrep.o -L$(LIBDIR) -lpcre; \
- echo '--- Rebuilding pcretest to use installed shared library ---'; \
- echo $(CC) $(CFLAGS) -o pcretest pcretest.o -L$(LIBDIR) -lpcre -lpcreposix; \
- $(CC) $(CFLAGS) -o pcretest pcretest.o -L$(LIBDIR) -lpcre -lpcreposix; \
- fi
- $(INSTALL) pgrep $(BINDIR)/pgrep
- $(INSTALL) pcre-config $(BINDIR)/pcre-config
+ $(mkinstalldirs) $(DESTDIR)/$(LIBDIR)
+ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)/$(LIBDIR)/libpcre.la"
+ $(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)/$(LIBDIR)/libpcre.la
+ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)/$(LIBDIR)/libpcreposix.la"
+ $(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)/$(LIBDIR)/libpcreposix.la
+ $(LIBTOOL) --finish $(DESTDIR)/$(LIBDIR)
+ $(mkinstalldirs) $(DESTDIR)/$(INCDIR)
+ $(INSTALL_DATA) pcre.h $(DESTDIR)/$(INCDIR)/pcre.h
+ $(INSTALL_DATA) $(top_srcdir)/pcreposix.h $(DESTDIR)/$(INCDIR)/pcreposix.h
+ $(mkinstalldirs) $(DESTDIR)/$(MANDIR)/man3
+ $(INSTALL_DATA) $(top_srcdir)/doc/pcre.3 $(DESTDIR)/$(MANDIR)/man3/pcre.3
+ $(INSTALL_DATA) $(top_srcdir)/doc/pcreposix.3 $(DESTDIR)/$(MANDIR)/man3/pcreposix.3
+ $(mkinstalldirs) $(DESTDIR)/$(MANDIR)/man1
+ $(INSTALL_DATA) $(top_srcdir)/doc/pcregrep.1 $(DESTDIR)/$(MANDIR)/man1/pcregrep.1
+ $(INSTALL_DATA) $(top_srcdir)/doc/pcretest.1 $(DESTDIR)/$(MANDIR)/man1/pcretest.1
+ $(mkinstalldirs) $(DESTDIR)/$(BINDIR)
+ $(LIBTOOL) --mode=install $(INSTALL) pcregrep $(DESTDIR)/$(BINDIR)/pcregrep
+ $(LIBTOOL) --mode=install $(INSTALL) pcretest $(DESTDIR)/$(BINDIR)/pcretest
+ $(INSTALL) pcre-config $(DESTDIR)/$(BINDIR)/pcre-config
# We deliberately omit dftables and chartables.c from 'make clean'; once made
# chartables.c shouldn't change, and if people have edited the tables by hand,
# you don't want to throw them away.
-clean:; -rm -rf *.o *.lo *.a *.la .libs pcretest pgrep testtry
+clean:; -rm -rf *.o *.lo *.a *.la .libs pcretest pcregrep testtry
# But "make distclean" should get back to a virgin distribution
@@ -190,6 +172,8 @@ distclean: clean
check: runtest
+test: runtest
+
runtest: all
./RunTest
@@ -198,7 +182,7 @@ runtest: all
# This addition for mingw32 was contributed by Paul Sokolovsky
# <Paul.Sokolovsky@technologist.com>. I (PH) don't know anything about it!
-dll: _dll libpcre.dll.a pgrep_d pcretest_d
+dll: _dll libpcre.dll.a pcregrep_d pcretest_d
_dll:
$(MAKE) CFLAGS=-DSTATIC pcre.dll
@@ -206,8 +190,8 @@ _dll:
pcre.dll: $(OBJ) pcreposix.o pcre.def
libpcre.dll.a: pcre.def
-pgrep_d: libpcre.dll.a pgrep.o
- $(CC) $(CFLAGS) -L. -o pgrep pgrep.o -lpcre.dll
+pcregrep_d: libpcre.dll.a pcregrep.o
+ $(CC) $(CFLAGS) -L. -o pcregrep pcregrep.o -lpcre.dll
pcretest_d: libpcre.dll.a pcretest.o
$(PURIFY) $(CC) $(CFLAGS) -L. -o pcretest pcretest.o -lpcre.dll
diff --git a/srclib/pcre/NEWS b/srclib/pcre/NEWS
index 4c80bd6833..27866b68a2 100644
--- a/srclib/pcre/NEWS
+++ b/srclib/pcre/NEWS
@@ -1,6 +1,45 @@
News about PCRE releases
------------------------
+Release 3.5 15-Aug-01
+---------------------
+
+1. The configuring system has been upgraded to use later versions of autoconf
+and libtool. By default it builds both a shared and a static library if the OS
+supports it. You can use --disable-shared or --disable-static on the configure
+command if you want only one of them.
+
+2. The pcretest utility is now installed along with pcregrep because it is
+useful for users (to test regexs) and by doing this, it automatically gets
+relinked by libtool. The documentation has been turned into a man page, so
+there are now .1, .txt, and .html versions in /doc.
+
+3. Upgrades to pcregrep:
+ (i) Added long-form option names like gnu grep.
+ (ii) Added --help to list all options with an explanatory phrase.
+ (iii) Added -r, --recursive to recurse into sub-directories.
+ (iv) Added -f, --file to read patterns from a file.
+
+4. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
+script, to force use of CR or LF instead of \n in the source. On non-Unix
+systems, the value can be set in config.h.
+
+5. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
+absolute limit. Changed the text of the error message to make this clear, and
+likewise updated the man page.
+
+6. The limit of 99 on the number of capturing subpatterns has been removed.
+The new limit is 65535, which I hope will not be a "real" limit.
+
+
+Release 3.3 01-Aug-00
+---------------------
+
+There is some support for UTF-8 character strings. This is incomplete and
+experimental. The documentation describes what is and what is not implemented.
+Otherwise, this is just a bug-fixing release.
+
+
Release 3.0 01-Feb-00
---------------------
diff --git a/srclib/pcre/NON-UNIX-USE b/srclib/pcre/NON-UNIX-USE
index 09a743245b..14b1cc0d05 100644
--- a/srclib/pcre/NON-UNIX-USE
+++ b/srclib/pcre/NON-UNIX-USE
@@ -9,7 +9,10 @@ commands to do the following:
(1) Copy or rename the file config.in as config.h, and change the macros that
define HAVE_STRERROR and HAVE_MEMMOVE to define them as 1 rather than 0.
Unfortunately, because of the way Unix autoconf works, the default setting has
-to be 0.
+to be 0. You may also want to make changes to other macros in config.h. In
+particular, if you want to force a specific value for newline, you can define
+the NEWLINE macro. The default is to use '\n', thereby using whatever value
+your compiler gives to '\n'.
(2) Copy or rename the file pcre.in as pcre.h, and change the macro definitions
for PCRE_MAJOR, PCRE_MINOR, and PCRE_DATE near its start to the values set in
diff --git a/srclib/pcre/README b/srclib/pcre/README
index 90aaf4d6a6..7557374791 100644
--- a/srclib/pcre/README
+++ b/srclib/pcre/README
@@ -7,30 +7,73 @@ The latest release of PCRE is always available from
Please read the NEWS file if you are upgrading from a previous release.
+PCRE has its own native API, but a set of "wrapper" functions that are based on
+the POSIX API are also supplied in the library libpcreposix. Note that this
+just provides a POSIX calling interface to PCRE: the regular expressions
+themselves still follow Perl syntax and semantics. The header file
+for the POSIX-style functions is called pcreposix.h. The official POSIX name is
+regex.h, but I didn't want to risk possible problems with existing files of
+that name by distributing it that way. To use it with an existing program that
+uses the POSIX API, it will have to be renamed or pointed at by a link.
+
+
+Contributions by users of PCRE
+------------------------------
+
+You can find contributions from PCRE users in the directory
+
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
+
+where there is also a README file giving brief descriptions of what they are.
+Several of them provide support for compiling PCRE on various flavours of
+Windows systems (I myself do not use Windows). Some are complete in themselves;
+others are pointers to URLs containing relevant files.
+
Building PCRE on a Unix system
------------------------------
-To build PCRE on a Unix system, run the "configure" command in the PCRE
-distribution directory. This is a standard GNU "autoconf" configuration script,
-for which generic instructions are supplied in INSTALL. On many systems just
-running "./configure" is sufficient, but the usual methods of changing standard
-defaults are available. For example
+To build PCRE on a Unix system, first run the "configure" command from the PCRE
+distribution directory, with your current directory set to the directory where
+you want the files to be created. This command is a standard GNU "autoconf"
+configuration script, for which generic instructions are supplied in INSTALL.
+
+Most commonly, people build PCRE within its own distribution directory, and in
+this case, on many systems, just running "./configure" is sufficient, but the
+usual methods of changing standard defaults are available. For example,
CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
-instead of the default /usr/local. The "configure" script builds thre files:
+instead of the default /usr/local.
+
+If you want to build in a different directory, just run "configure" with that
+directory as current. For example, suppose you have unpacked the PCRE source
+into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
+
+cd /build/pcre/pcre-xxx
+/source/pcre/pcre-xxx/configure
+
+If you want to make use of the experimential, incomplete support for UTF-8
+character strings in PCRE, you must add --enable-utf8 to the "configure"
+command. Without it, the code for handling UTF-8 is not included in the
+library. (Even when included, it still has to be enabled by an option at run
+time.)
+The "configure" script builds five files:
+
+. libtool is a script that builds shared and/or static libraries
. Makefile is built by copying Makefile.in and making substitutions.
. config.h is built by copying config.in and making substitutions.
. pcre-config is built by copying pcre-config.in and making substitutions.
+. RunTest is a script for running tests
Once "configure" has run, you can run "make". It builds two libraries called
-libpcre and libpcreposix, a test program called pcretest, and the pgrep
-command. You can use "make install" to copy these, and the public header file
-pcre.h, to appropriate live directories on your system, in the normal way.
+libpcre and libpcreposix, a test program called pcretest, and the pcregrep
+command. You can use "make install" to copy these, the public header files
+pcre.h and pcreposix.h, and the man pages to appropriate live directories on
+your system, in the normal way.
Running "make install" also installs the command pcre-config, which can be used
to recall information about the PCRE configuration and installation. For
@@ -46,26 +89,38 @@ outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.
+There is one esoteric feature that is controlled by "configure". It concerns
+the character value used for "newline", and is something that you probably do
+not want to change on a Unix system. The default is to use whatever value your
+compiler gives to '\n'. By using --enable-newline-is-cr or
+--enable-newline-is-lf you can force the value to be CR (13) or LF (10) if you
+really want to.
+
Shared libraries on Unix systems
--------------------------------
-The default distribution builds PCRE as two shared libraries. This support is
-new and experimental and may not work on all systems. It relies on the
-"libtool" scripts - these are distributed with PCRE. It should build a
-"libtool" script and use this to compile and link shared libraries, which are
-placed in a subdirectory called .libs. The programs pcretest and pgrep are
-built to use these uninstalled libraries by means of wrapper scripts. When you
-use "make install" to install shared libraries, pgrep and pcretest are
-automatically re-built to use the newly installed libraries. However, only
-pgrep is installed, as pcretest is really just a test program.
-
-To build PCRE using static libraries you must use --disable-shared when
+The default distribution builds PCRE as two shared libraries and two static
+libraries, as long as the operating system supports shared libraries. Shared
+library support relies on the "libtool" script which is built as part of the
+"configure" process.
+
+The libtool script is used to compile and link both shared and static
+libraries. They are placed in a subdirectory called .libs when they are newly
+built. The programs pcretest and pcregrep are built to use these uninstalled
+libraries (by means of wrapper scripts in the case of shared libraries). When
+you use "make install" to install shared libraries, pcregrep and pcretest are
+automatically re-built to use the newly installed shared libraries before being
+installed themselves. However, the versions left in the source directory still
+use the uninstalled libraries.
+
+To build PCRE using static libraries only you must use --disable-shared when
configuring it. For example
./configure --prefix=/usr/gnu --disable-shared
-Then run "make" in the usual way.
+Then run "make" in the usual way. Similarly, you can use --disable-static to
+build only shared libraries.
Building on non-Unix systems
@@ -81,28 +136,40 @@ Standard C functions.
Testing PCRE
------------
-To test PCRE on a Unix system, run the RunTest script in the pcre directory.
-(This can also be run by "make runtest" or "make check".) For other systems,
-see the instruction in NON-UNIX-USE.
+To test PCRE on a Unix system, run the RunTest script that is created by the
+configuring process. (This can also be run by "make runtest", "make check", or
+"make test".) For other systems, see the instruction in NON-UNIX-USE.
-The script runs the pcretest test program (which is documented in
-doc/pcretest.txt) on each of the testinput files (in the testdata directory) in
-turn, and compares the output with the contents of the corresponding testoutput
-file. A file called testtry is used to hold the output from pcretest. To run
-pcretest on just one of the test files, give its number as an argument to
-RunTest, for example:
+The script runs the pcretest test program (which is documented in the doc
+directory) on each of the testinput files (in the testdata directory) in turn,
+and compares the output with the contents of the corresponding testoutput file.
+A file called testtry is used to hold the output from pcretest. To run pcretest
+on just one of the test files, give its number as an argument to RunTest, for
+example:
RunTest 3
The first and third test files can also be fed directly into the perltest
script to check that Perl gives the same results. The third file requires the
additional features of release 5.005, which is why it is kept separate from the
-main test input, which needs only Perl 5.004. In the long run, when 5.005 is
-widespread, these two test files may get amalgamated.
-
-The second set of tests check pcre_info(), pcre_study(), pcre_copy_substring(),
-pcre_get_substring(), pcre_get_substring_list(), error detection and run-time
-flags that are specific to PCRE, as well as the POSIX wrapper API.
+main test input, which needs only Perl 5.004. In the long run, when 5.005 (or
+higher) is widespread, these two test files may get amalgamated.
+
+The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
+pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
+detection, and run-time flags that are specific to PCRE, as well as the POSIX
+wrapper API. It also uses the debugging flag to check some of the internals of
+pcre_compile().
+
+If you build PCRE with a locale setting that is not the standard C locale, the
+character tables may be different (see next paragraph). In some cases, this may
+cause failures in the second set of tests. For example, in a locale where the
+isprint() function yields TRUE for characters in the range 128-255, the use of
+[:isascii:] inside a character class defines a different set of characters, and
+this shows up in this test as a difference in the compiled code, which is being
+listed for checking. Where the comparison test output contains [\x00-\x7f] the
+test will contain [\x00-\xff], and similarly in some other cases. This is not a
+bug in PCRE.
The fourth set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
@@ -117,14 +184,10 @@ output to say why. If running this test produces instances of the error
in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.
-PCRE has its own native API, but a set of "wrapper" functions that are based on
-the POSIX API are also supplied in the library libpcreposix.a. Note that this
-just provides a POSIX calling interface to PCRE: the regular expressions
-themselves still follow Perl syntax and semantics. The header file
-for the POSIX-style functions is called pcreposix.h. The official POSIX name is
-regex.h, but I didn't want to risk possible problems with existing files of
-that name by distributing it that way. To use it with an existing program that
-uses the POSIX API, it will have to be renamed or pointed at by a link.
+The fifth test checks the experimental, incomplete UTF-8 support. It is not run
+automatically unless PCRE is built with UTF-8 support. This file can be fed
+directly to the perltest8 script, which requires Perl 5.6 or higher. The sixth
+file tests internal UTF-8 features of PCRE that are not relevant to Perl.
Character tables
@@ -197,7 +260,7 @@ The distribution should contain the following files:
NEWS important changes in this release
NON-UNIX-USE notes on building PCRE on non-Unix systems
README this file
- RunTest a Unix shell script for running tests
+ RunTest.in template for a Unix shell script for running tests
config.guess ) files used by libtool,
config.sub ) used only when building a shared library
configure a configuring shell script (built by autoconf)
@@ -211,24 +274,29 @@ The distribution should contain the following files:
doc/pcreposix.txt plain text version
doc/pcretest.txt documentation of test program
doc/perltest.txt documentation of Perl test program
- doc/pgrep.1 man page source for the pgrep utility
- doc/pgrep.html HTML version
- doc/pgrep.txt plain text version
+ doc/pcregrep.1 man page source for the pcregrep utility
+ doc/pcregrep.html HTML version
+ doc/pcregrep.txt plain text version
install-sh a shell script for installing files
- ltconfig ) files used to build "libtool",
- ltmain.sh ) used only when building a shared library
- pcretest.c test program
+ ltmain.sh file used to build a libtool script
+ pcretest.c comprehensive test program
+ pcredemo.c simple demonstration of coding calls to PCRE
perltest Perl test program
- pgrep.c source of a grep utility that uses PCRE
+ perltest8 Perl test program for UTF-8 tests
+ pcregrep.c source of a grep utility that uses PCRE
pcre-config.in source of script which retains PCRE information
testdata/testinput1 test data, compatible with Perl 5.004 and 5.005
testdata/testinput2 test data for error messages and non-Perl things
testdata/testinput3 test data, compatible with Perl 5.005
testdata/testinput4 test data for locale-specific tests
+ testdata/testinput5 test data for UTF-8 tests compatible with Perl 5.6
+ testdata/testinput6 test data for other UTF-8 tests
testdata/testoutput1 test results corresponding to testinput1
testdata/testoutput2 test results corresponding to testinput2
testdata/testoutput3 test results corresponding to testinput3
testdata/testoutput4 test results corresponding to testinput4
+ testdata/testoutput5 test results corresponding to testinput5
+ testdata/testoutput6 test results corresponding to testinput6
(C) Auxiliary files for Win32 DLL
@@ -236,4 +304,4 @@ The distribution should contain the following files:
pcre.def
Philip Hazel <ph10@cam.ac.uk>
-February 2000
+August 2001
diff --git a/srclib/pcre/config.guess b/srclib/pcre/config.guess
index e1b5871708..ba66165161 100644
--- a/srclib/pcre/config.guess
+++ b/srclib/pcre/config.guess
@@ -1,8 +1,10 @@
#! /bin/sh
# Attempt to guess a canonical system name.
-# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999
+# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001
# Free Software Foundation, Inc.
-#
+
+timestamp='2001-04-20'
+
# This file is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
@@ -23,34 +25,93 @@
# the same distribution terms that you use for the rest of that program.
# Written by Per Bothner <bothner@cygnus.com>.
-# The master version of this file is at the FSF in /home/gd/gnu/lib.
-# Please send patches to <autoconf-patches@gnu.org>.
+# Please send patches to <config-patches@gnu.org>.
#
# This script attempts to guess a canonical system name similar to
# config.sub. If it succeeds, it prints the system name on stdout, and
# exits with 0. Otherwise, it exits with 1.
#
# The plan is that this can be called by configure scripts if you
-# don't specify an explicit system type (host/target name).
-#
-# Only a few systems have been added to this list; please add others
-# (but try to keep the structure clean).
-#
+# don't specify an explicit build system type.
-# Use $HOST_CC if defined. $CC may point to a cross-compiler
-if test x"$CC_FOR_BUILD" = x; then
- if test x"$HOST_CC" != x; then
- CC_FOR_BUILD="$HOST_CC"
- else
- if test x"$CC" != x; then
- CC_FOR_BUILD="$CC"
- else
- CC_FOR_BUILD=cc
- fi
- fi
+me=`echo "$0" | sed -e 's,.*/,,'`
+
+usage="\
+Usage: $0 [OPTION]
+
+Output the configuration name of the system \`$me' is run on.
+
+Operation modes:
+ -h, --help print this help, then exit
+ -t, --time-stamp print date of last modification, then exit
+ -v, --version print version number, then exit
+
+Report bugs and patches to <config-patches@gnu.org>."
+
+version="\
+GNU config.guess ($timestamp)
+
+Originally written by Per Bothner.
+Copyright (C) 1992, 93, 94, 95, 96, 97, 98, 99, 2000
+Free Software Foundation, Inc.
+
+This is free software; see the source for copying conditions. There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
+
+help="
+Try \`$me --help' for more information."
+
+# Parse command line
+while test $# -gt 0 ; do
+ case $1 in
+ --time-stamp | --time* | -t )
+ echo "$timestamp" ; exit 0 ;;
+ --version | -v )
+ echo "$version" ; exit 0 ;;
+ --help | --h* | -h )
+ echo "$usage"; exit 0 ;;
+ -- ) # Stop option processing
+ shift; break ;;
+ - ) # Use stdin as input.
+ break ;;
+ -* )
+ echo "$me: invalid option $1$help" >&2
+ exit 1 ;;
+ * )
+ break ;;
+ esac
+done
+
+if test $# != 0; then
+ echo "$me: too many arguments$help" >&2
+ exit 1
fi
+dummy=dummy-$$
+trap 'rm -f $dummy.c $dummy.o $dummy.rel $dummy; exit 1' 1 2 15
+
+# CC_FOR_BUILD -- compiler used by this script.
+# Historically, `CC_FOR_BUILD' used to be named `HOST_CC'. We still
+# use `HOST_CC' if defined, but it is deprecated.
+
+case $CC_FOR_BUILD,$HOST_CC,$CC in
+ ,,) echo "int dummy(){}" > $dummy.c
+ for c in cc gcc c89 ; do
+ ($c $dummy.c -c -o $dummy.o) >/dev/null 2>&1
+ if test $? = 0 ; then
+ CC_FOR_BUILD="$c"; break
+ fi
+ done
+ rm -f $dummy.c $dummy.o $dummy.rel
+ if test x"$CC_FOR_BUILD" = x ; then
+ CC_FOR_BUILD=no_compiler_found
+ fi
+ ;;
+ ,,*) CC_FOR_BUILD=$CC ;;
+ ,*,*) CC_FOR_BUILD=$HOST_CC ;;
+esac
+
# This is needed to find uname on a Pyramid OSx when run in the BSD universe.
# (ghazi@noc.rutgers.edu 8/24/94.)
if (test -f /.attbin/uname) >/dev/null 2>&1 ; then
@@ -59,15 +120,57 @@ fi
UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown
UNAME_RELEASE=`(uname -r) 2>/dev/null` || UNAME_RELEASE=unknown
-UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown
+UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown
UNAME_VERSION=`(uname -v) 2>/dev/null` || UNAME_VERSION=unknown
-dummy=dummy-$$
-trap 'rm -f $dummy.c $dummy.o $dummy; exit 1' 1 2 15
-
# Note: order is significant - the case branches are not exclusive.
case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
+ *:NetBSD:*:*)
+ # Netbsd (nbsd) targets should (where applicable) match one or
+ # more of the tupples: *-*-netbsdelf*, *-*-netbsdaout*,
+ # *-*-netbsdecoff* and *-*-netbsd*. For targets that recently
+ # switched to ELF, *-*-netbsd* would select the old
+ # object file format. This provides both forward
+ # compatibility and a consistent mechanism for selecting the
+ # object file format.
+ # Determine the machine/vendor (is the vendor relevant).
+ case "${UNAME_MACHINE}" in
+ amiga) machine=m68k-unknown ;;
+ arm32) machine=arm-unknown ;;
+ atari*) machine=m68k-atari ;;
+ sun3*) machine=m68k-sun ;;
+ mac68k) machine=m68k-apple ;;
+ macppc) machine=powerpc-apple ;;
+ hp3[0-9][05]) machine=m68k-hp ;;
+ ibmrt|romp-ibm) machine=romp-ibm ;;
+ *) machine=${UNAME_MACHINE}-unknown ;;
+ esac
+ # The Operating System including object format, if it has switched
+ # to ELF recently, or will in the future.
+ case "${UNAME_MACHINE}" in
+ i386|sparc|amiga|arm*|hp300|mvme68k|vax|atari|luna68k|mac68k|news68k|next68k|pc532|sun3*|x68k)
+ if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \
+ | grep __ELF__ >/dev/null
+ then
+ # Once all utilities can be ECOFF (netbsdecoff) or a.out (netbsdaout).
+ # Return netbsd for either. FIX?
+ os=netbsd
+ else
+ os=netbsdelf
+ fi
+ ;;
+ *)
+ os=netbsd
+ ;;
+ esac
+ # The OS release
+ release=`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'`
+ # Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM:
+ # contains redundant information, the shorter form:
+ # CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used.
+ echo "${machine}-${os}${release}"
+ exit 0 ;;
alpha:OSF1:*:*)
if test $UNAME_RELEASE = "V4.0"; then
UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $3}'`
@@ -77,41 +180,51 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
# A Xn.n version is an unreleased experimental baselevel.
# 1.2 uses "1.2" for uname -r.
cat <<EOF >$dummy.s
+ .data
+\$Lformat:
+ .byte 37,100,45,37,120,10,0 # "%d-%x\n"
+
+ .text
.globl main
+ .align 4
.ent main
main:
- .frame \$30,0,\$26,0
- .prologue 0
- .long 0x47e03d80 # implver $0
- lda \$2,259
- .long 0x47e20c21 # amask $2,$1
- srl \$1,8,\$2
- sll \$2,2,\$2
- sll \$0,3,\$0
- addl \$1,\$0,\$0
- addl \$2,\$0,\$0
- ret \$31,(\$26),1
+ .frame \$30,16,\$26,0
+ ldgp \$29,0(\$27)
+ .prologue 1
+ .long 0x47e03d80 # implver \$0
+ lda \$2,-1
+ .long 0x47e20c21 # amask \$2,\$1
+ lda \$16,\$Lformat
+ mov \$0,\$17
+ not \$1,\$18
+ jsr \$26,printf
+ ldgp \$29,0(\$26)
+ mov 0,\$16
+ jsr \$26,exit
.end main
EOF
$CC_FOR_BUILD $dummy.s -o $dummy 2>/dev/null
if test "$?" = 0 ; then
- ./$dummy
- case "$?" in
- 7)
+ case `./$dummy` in
+ 0-0)
UNAME_MACHINE="alpha"
;;
- 15)
+ 1-0)
UNAME_MACHINE="alphaev5"
;;
- 14)
+ 1-1)
UNAME_MACHINE="alphaev56"
;;
- 10)
+ 1-101)
UNAME_MACHINE="alphapca56"
;;
- 16)
+ 2-303)
UNAME_MACHINE="alphaev6"
;;
+ 2-307)
+ UNAME_MACHINE="alphaev67"
+ ;;
esac
fi
rm -f $dummy.s $dummy
@@ -127,11 +240,8 @@ EOF
echo alpha-dec-winnt3.5
exit 0 ;;
Amiga*:UNIX_System_V:4.0:*)
- echo m68k-cbm-sysv4
+ echo m68k-unknown-sysv4
exit 0;;
- amiga:NetBSD:*:*)
- echo m68k-cbm-netbsd${UNAME_RELEASE}
- exit 0 ;;
amiga:OpenBSD:*:*)
echo m68k-unknown-openbsd${UNAME_RELEASE}
exit 0 ;;
@@ -162,10 +272,7 @@ EOF
arm:RISC*:1.[012]*:*|arm:riscix:1.[012]*:*)
echo arm-acorn-riscix${UNAME_RELEASE}
exit 0;;
- arm32:NetBSD:*:*)
- echo arm-unknown-netbsd`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'`
- exit 0 ;;
- SR2?01:HI-UX/MPP:*:*)
+ SR2?01:HI-UX/MPP:*:* | SR8000:HI-UX/MPP:*:*)
echo hppa1.1-hitachi-hiuxmpp
exit 0;;
Pyramid*:OSx*:*:* | MIS*:OSx*:*:* | MIS*:SMP_DC-OSx*:*:*)
@@ -221,15 +328,12 @@ EOF
aushp:SunOS:*:*)
echo sparc-auspex-sunos${UNAME_RELEASE}
exit 0 ;;
- atari*:NetBSD:*:*)
- echo m68k-atari-netbsd${UNAME_RELEASE}
- exit 0 ;;
atari*:OpenBSD:*:*)
echo m68k-unknown-openbsd${UNAME_RELEASE}
exit 0 ;;
# The situation for MiNT is a little confusing. The machine name
# can be virtually everything (everything which is not
- # "atarist" or "atariste" at least should have a processor
+ # "atarist" or "atariste" at least should have a processor
# > m68000). The system name ranges from "MiNT" over "FreeMiNT"
# to the lowercase version "mint" (or "freemint"). Finally
# the system name "TOS" denotes a system which is actually not
@@ -253,15 +357,9 @@ EOF
*:*MiNT:*:* | *:*mint:*:* | *:*TOS:*:*)
echo m68k-unknown-mint${UNAME_RELEASE}
exit 0 ;;
- sun3*:NetBSD:*:*)
- echo m68k-sun-netbsd${UNAME_RELEASE}
- exit 0 ;;
sun3*:OpenBSD:*:*)
echo m68k-unknown-openbsd${UNAME_RELEASE}
exit 0 ;;
- mac68k:NetBSD:*:*)
- echo m68k-apple-netbsd${UNAME_RELEASE}
- exit 0 ;;
mac68k:OpenBSD:*:*)
echo m68k-unknown-openbsd${UNAME_RELEASE}
exit 0 ;;
@@ -274,9 +372,6 @@ EOF
powerpc:machten:*:*)
echo powerpc-apple-machten${UNAME_RELEASE}
exit 0 ;;
- macppc:NetBSD:*:*)
- echo powerpc-apple-netbsd${UNAME_RELEASE}
- exit 0 ;;
RISC*:Mach:*:*)
echo mips-dec-mach_bsd4.3
exit 0 ;;
@@ -292,6 +387,7 @@ EOF
mips:*:*:UMIPS | mips:*:*:RISCos)
sed 's/^ //' << EOF >$dummy.c
#ifdef __cplusplus
+#include <stdio.h> /* for printf() prototype */
int main (int argc, char *argv[]) {
#else
int main (argc, argv) int argc; char *argv[]; {
@@ -312,10 +408,13 @@ EOF
EOF
$CC_FOR_BUILD $dummy.c -o $dummy \
&& ./$dummy `echo "${UNAME_RELEASE}" | sed -n 's/\([0-9]*\).*/\1/p'` \
- && rm $dummy.c $dummy && exit 0
+ && rm -f $dummy.c $dummy && exit 0
rm -f $dummy.c $dummy
echo mips-mips-riscos${UNAME_RELEASE}
exit 0 ;;
+ Motorola:PowerMAX_OS:*:*)
+ echo powerpc-motorola-powermax
+ exit 0 ;;
Night_Hawk:Power_UNIX:*:*)
echo powerpc-harris-powerunix
exit 0 ;;
@@ -331,7 +430,7 @@ EOF
AViiON:dgux:*:*)
# DG/UX returns AViiON for all architectures
UNAME_PROCESSOR=`/usr/bin/uname -p`
- if [ $UNAME_PROCESSOR = mc88100 ] || [ $UNAME_PROCESSOR = mc88110]
+ if [ $UNAME_PROCESSOR = mc88100 ] || [ $UNAME_PROCESSOR = mc88110 ]
then
if [ ${TARGET_BINARY_INTERFACE}x = m88kdguxelfx ] || \
[ ${TARGET_BINARY_INTERFACE}x = x ]
@@ -363,9 +462,17 @@ EOF
????????:AIX?:[12].1:2) # AIX 2.2.1 or AIX 2.1.1 is RT/PC AIX.
echo romp-ibm-aix # uname -m gives an 8 hex-code CPU id
exit 0 ;; # Note that: echo "'`uname -s`'" gives 'AIX '
- i?86:AIX:*:*)
+ i*86:AIX:*:*)
echo i386-ibm-aix
exit 0 ;;
+ ia64:AIX:*:*)
+ if [ -x /usr/bin/oslevel ] ; then
+ IBM_REV=`/usr/bin/oslevel`
+ else
+ IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE}
+ fi
+ echo ${UNAME_MACHINE}-ibm-aix${IBM_REV}
+ exit 0 ;;
*:AIX:2:3)
if grep bos325 /usr/include/stdio.h >/dev/null 2>&1; then
sed 's/^ //' << EOF >$dummy.c
@@ -379,7 +486,7 @@ EOF
exit(0);
}
EOF
- $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm $dummy.c $dummy && exit 0
+ $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm -f $dummy.c $dummy && exit 0
rm -f $dummy.c $dummy
echo rs6000-ibm-aix3.2.5
elif grep bos324 /usr/include/stdio.h >/dev/null 2>&1; then
@@ -388,9 +495,9 @@ EOF
echo rs6000-ibm-aix3.2
fi
exit 0 ;;
- *:AIX:*:4)
+ *:AIX:*:[45])
IBM_CPU_ID=`/usr/sbin/lsdev -C -c processor -S available | head -1 | awk '{ print $1 }'`
- if /usr/sbin/lsattr -EHl ${IBM_CPU_ID} | grep POWER >/dev/null 2>&1; then
+ if /usr/sbin/lsattr -El ${IBM_CPU_ID} | grep ' POWER' >/dev/null 2>&1; then
IBM_ARCH=rs6000
else
IBM_ARCH=powerpc
@@ -398,7 +505,7 @@ EOF
if [ -x /usr/bin/oslevel ] ; then
IBM_REV=`/usr/bin/oslevel`
else
- IBM_REV=4.${UNAME_RELEASE}
+ IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE}
fi
echo ${IBM_ARCH}-ibm-aix${IBM_REV}
exit 0 ;;
@@ -408,7 +515,7 @@ EOF
ibmrt:4.4BSD:*|romp-ibm:BSD:*)
echo romp-ibm-bsd4.4
exit 0 ;;
- ibmrt:*BSD:*|romp-ibm:BSD:*) # covers RT/PC NetBSD and
+ ibmrt:*BSD:*|romp-ibm:BSD:*) # covers RT/PC BSD and
echo romp-ibm-bsd${UNAME_RELEASE} # 4.3 with uname added to
exit 0 ;; # report: romp-ibm BSD 4.3
*:BOSX:*:*)
@@ -424,11 +531,31 @@ EOF
echo m68k-hp-bsd4.4
exit 0 ;;
9000/[34678]??:HP-UX:*:*)
+ HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'`
case "${UNAME_MACHINE}" in
9000/31? ) HP_ARCH=m68000 ;;
9000/[34]?? ) HP_ARCH=m68k ;;
9000/[678][0-9][0-9])
+ case "${HPUX_REV}" in
+ 11.[0-9][0-9])
+ if [ -x /usr/bin/getconf ]; then
+ sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null`
+ sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null`
+ case "${sc_cpu_version}" in
+ 523) HP_ARCH="hppa1.0" ;; # CPU_PA_RISC1_0
+ 528) HP_ARCH="hppa1.1" ;; # CPU_PA_RISC1_1
+ 532) # CPU_PA_RISC2_0
+ case "${sc_kernel_bits}" in
+ 32) HP_ARCH="hppa2.0n" ;;
+ 64) HP_ARCH="hppa2.0w" ;;
+ esac ;;
+ esac
+ fi ;;
+ esac
+ if [ "${HP_ARCH}" = "" ]; then
sed 's/^ //' << EOF >$dummy.c
+
+ #define _HPUX_SOURCE
#include <stdlib.h>
#include <unistd.h>
@@ -460,11 +587,16 @@ EOF
}
EOF
(CCOPTS= $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null ) && HP_ARCH=`./$dummy`
+ if test -z "$HP_ARCH"; then HP_ARCH=hppa; fi
rm -f $dummy.c $dummy
+ fi ;;
esac
- HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'`
echo ${HP_ARCH}-hp-hpux${HPUX_REV}
exit 0 ;;
+ ia64:HP-UX:*:*)
+ HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'`
+ echo ia64-hp-hpux${HPUX_REV}
+ exit 0 ;;
3050*:HI-UX:*:*)
sed 's/^ //' << EOF >$dummy.c
#include <unistd.h>
@@ -491,7 +623,7 @@ EOF
exit (0);
}
EOF
- $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm $dummy.c $dummy && exit 0
+ $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm -f $dummy.c $dummy && exit 0
rm -f $dummy.c $dummy
echo unknown-hitachi-hiuxwe2
exit 0 ;;
@@ -510,7 +642,7 @@ EOF
hp8??:OSF1:*:*)
echo hppa1.0-hp-osf
exit 0 ;;
- i?86:OSF1:*:*)
+ i*86:OSF1:*:*)
if [ -x /usr/sbin/sysversion ] ; then
echo ${UNAME_MACHINE}-unknown-osf1mk
else
@@ -553,29 +685,30 @@ EOF
-e y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
exit 0 ;;
CRAY*TS:*:*:*)
- echo t90-cray-unicos${UNAME_RELEASE}
+ echo t90-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
+ exit 0 ;;
+ CRAY*T3D:*:*:*)
+ echo alpha-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
exit 0 ;;
CRAY*T3E:*:*:*)
- echo alpha-cray-unicosmk${UNAME_RELEASE}
+ echo alphaev5-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
+ exit 0 ;;
+ CRAY*SV1:*:*:*)
+ echo sv1-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
exit 0 ;;
CRAY-2:*:*:*)
echo cray2-cray-unicos
exit 0 ;;
- F300:UNIX_System_V:*:*)
+ F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*)
+ FUJITSU_PROC=`uname -m | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'`
FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'`
FUJITSU_REL=`echo ${UNAME_RELEASE} | sed -e 's/ /_/'`
- echo "f300-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}"
+ echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}"
exit 0 ;;
- F301:UNIX_System_V:*:*)
- echo f301-fujitsu-uxpv`echo $UNAME_RELEASE | sed 's/ .*//'`
- exit 0 ;;
- hp3[0-9][05]:NetBSD:*:*)
- echo m68k-hp-netbsd${UNAME_RELEASE}
- exit 0 ;;
hp300:OpenBSD:*:*)
echo m68k-unknown-openbsd${UNAME_RELEASE}
exit 0 ;;
- i?86:BSD/386:*:* | i?86:BSD/OS:*:*)
+ i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*)
echo ${UNAME_MACHINE}-pc-bsdi${UNAME_RELEASE}
exit 0 ;;
sparc*:BSD/OS:*:*)
@@ -585,17 +718,8 @@ EOF
echo ${UNAME_MACHINE}-unknown-bsdi${UNAME_RELEASE}
exit 0 ;;
*:FreeBSD:*:*)
- if test -x /usr/bin/objformat; then
- if test "elf" = "`/usr/bin/objformat`"; then
- echo ${UNAME_MACHINE}-unknown-freebsdelf`echo ${UNAME_RELEASE}|sed -e 's/[-_].*//'`
- exit 0
- fi
- fi
echo ${UNAME_MACHINE}-unknown-freebsd`echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`
exit 0 ;;
- *:NetBSD:*:*)
- echo ${UNAME_MACHINE}-unknown-netbsd`echo ${UNAME_RELEASE}|sed -e 's/[-_].*//'`
- exit 0 ;;
*:OpenBSD:*:*)
echo ${UNAME_MACHINE}-unknown-openbsd`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'`
exit 0 ;;
@@ -605,6 +729,9 @@ EOF
i*:MINGW*:*)
echo ${UNAME_MACHINE}-pc-mingw32
exit 0 ;;
+ i*:PW*:*)
+ echo ${UNAME_MACHINE}-pc-pw32
+ exit 0 ;;
i*:Windows_NT*:* | Pentium*:Windows_NT*:*)
# How do we know it's Interix rather than the generic POSIX subsystem?
# It also conflicts with pre-2.0 versions of AT&T UWIN. Should we
@@ -623,54 +750,41 @@ EOF
*:GNU:*:*)
echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-gnu`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'`
exit 0 ;;
- *:Linux:*:*)
-
- # The BFD linker knows what the default object file format is, so
- # first see if it will tell us. cd to the root directory to prevent
- # problems with other programs or directories called `ld' in the path.
- ld_help_string=`cd /; ld --help 2>&1`
- ld_supported_emulations=`echo $ld_help_string \
- | sed -ne '/supported emulations:/!d
- s/[ ][ ]*/ /g
- s/.*supported emulations: *//
- s/ .*//
- p'`
- case "$ld_supported_emulations" in
- *ia64)
- echo "${UNAME_MACHINE}-unknown-linux"
- exit 0
- ;;
- i?86linux)
- echo "${UNAME_MACHINE}-pc-linux-gnuaout"
- exit 0
- ;;
- i?86coff)
- echo "${UNAME_MACHINE}-pc-linux-gnucoff"
- exit 0
- ;;
- sparclinux)
- echo "${UNAME_MACHINE}-unknown-linux-gnuaout"
- exit 0
- ;;
- armlinux)
- echo "${UNAME_MACHINE}-unknown-linux-gnuaout"
- exit 0
- ;;
- elf32arm*)
- echo "${UNAME_MACHINE}-unknown-linux-gnu"
- exit 0
- ;;
- armelf_linux*)
- echo "${UNAME_MACHINE}-unknown-linux-gnu"
- exit 0
- ;;
- m68klinux)
- echo "${UNAME_MACHINE}-unknown-linux-gnuaout"
- exit 0
- ;;
- elf32ppc)
- # Determine Lib Version
- cat >$dummy.c <<EOF
+ i*86:Minix:*:*)
+ echo ${UNAME_MACHINE}-pc-minix
+ exit 0 ;;
+ arm*:Linux:*:*)
+ echo ${UNAME_MACHINE}-unknown-linux-gnu
+ exit 0 ;;
+ ia64:Linux:*:*)
+ echo ${UNAME_MACHINE}-unknown-linux
+ exit 0 ;;
+ m68*:Linux:*:*)
+ echo ${UNAME_MACHINE}-unknown-linux-gnu
+ exit 0 ;;
+ mips:Linux:*:*)
+ cat >$dummy.c <<EOF
+#ifdef __cplusplus
+#include <stdio.h> /* for printf() prototype */
+int main (int argc, char *argv[]) {
+#else
+int main (argc, argv) int argc; char *argv[]; {
+#endif
+#ifdef __MIPSEB__
+ printf ("%s-unknown-linux-gnu\n", argv[1]);
+#endif
+#ifdef __MIPSEL__
+ printf ("%sel-unknown-linux-gnu\n", argv[1]);
+#endif
+ return 0;
+}
+EOF
+ $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy "${UNAME_MACHINE}" && rm -f $dummy.c $dummy && exit 0
+ rm -f $dummy.c $dummy
+ ;;
+ ppc:Linux:*:*)
+ # Determine Lib Version
+ cat >$dummy.c <<EOF
#include <features.h>
#if defined(__GLIBC__)
extern char __libc_version[];
@@ -683,112 +797,130 @@ main(argc, argv)
#if defined(__GLIBC__)
printf("%s %s\n", __libc_version, __libc_release);
#else
- printf("unkown\n");
+ printf("unknown\n");
#endif
return 0;
}
EOF
- LIBC=""
- $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null
- if test "$?" = 0 ; then
- ./$dummy | grep 1\.99 > /dev/null
- if test "$?" = 0 ; then
- LIBC="libc1"
- fi
- fi
- rm -f $dummy.c $dummy
- echo powerpc-unknown-linux-gnu${LIBC}
- exit 0
- ;;
- esac
-
- if test "${UNAME_MACHINE}" = "alpha" ; then
- sed 's/^ //' <<EOF >$dummy.s
+ LIBC=""
+ $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null
+ if test "$?" = 0 ; then
+ ./$dummy | grep 1\.99 > /dev/null
+ if test "$?" = 0 ; then LIBC="libc1" ; fi
+ fi
+ rm -f $dummy.c $dummy
+ echo powerpc-unknown-linux-gnu${LIBC}
+ exit 0 ;;
+ alpha:Linux:*:*)
+ cat <<EOF >$dummy.s
+ .data
+ \$Lformat:
+ .byte 37,100,45,37,120,10,0 # "%d-%x\n"
+ .text
.globl main
+ .align 4
.ent main
- main:
- .frame \$30,0,\$26,0
- .prologue 0
- .long 0x47e03d80 # implver $0
- lda \$2,259
- .long 0x47e20c21 # amask $2,$1
- srl \$1,8,\$2
- sll \$2,2,\$2
- sll \$0,3,\$0
- addl \$1,\$0,\$0
- addl \$2,\$0,\$0
- ret \$31,(\$26),1
+ main:
+ .frame \$30,16,\$26,0
+ ldgp \$29,0(\$27)
+ .prologue 1
+ .long 0x47e03d80 # implver \$0
+ lda \$2,-1
+ .long 0x47e20c21 # amask \$2,\$1
+ lda \$16,\$Lformat
+ mov \$0,\$17
+ not \$1,\$18
+ jsr \$26,printf
+ ldgp \$29,0(\$26)
+ mov 0,\$16
+ jsr \$26,exit
.end main
EOF
- LIBC=""
- $CC_FOR_BUILD $dummy.s -o $dummy 2>/dev/null
+ LIBC=""
+ $CC_FOR_BUILD $dummy.s -o $dummy 2>/dev/null
+ if test "$?" = 0 ; then
+ case `./$dummy` in
+ 0-0) UNAME_MACHINE="alpha" ;;
+ 1-0) UNAME_MACHINE="alphaev5" ;;
+ 1-1) UNAME_MACHINE="alphaev56" ;;
+ 1-101) UNAME_MACHINE="alphapca56" ;;
+ 2-303) UNAME_MACHINE="alphaev6" ;;
+ 2-307) UNAME_MACHINE="alphaev67" ;;
+ esac
+ objdump --private-headers $dummy | \
+ grep ld.so.1 > /dev/null
if test "$?" = 0 ; then
- ./$dummy
- case "$?" in
- 7)
- UNAME_MACHINE="alpha"
- ;;
- 15)
- UNAME_MACHINE="alphaev5"
- ;;
- 14)
- UNAME_MACHINE="alphaev56"
- ;;
- 10)
- UNAME_MACHINE="alphapca56"
- ;;
- 16)
- UNAME_MACHINE="alphaev6"
- ;;
- esac
-
- objdump --private-headers $dummy | \
- grep ld.so.1 > /dev/null
- if test "$?" = 0 ; then
- LIBC="libc1"
- fi
+ LIBC="libc1"
fi
- rm -f $dummy.s $dummy
- echo ${UNAME_MACHINE}-unknown-linux-gnu${LIBC} ; exit 0
- elif test "${UNAME_MACHINE}" = "mips" ; then
- cat >$dummy.c <<EOF
-#ifdef __cplusplus
- int main (int argc, char *argv[]) {
-#else
- int main (argc, argv) int argc; char *argv[]; {
-#endif
-#ifdef __MIPSEB__
- printf ("%s-unknown-linux-gnu\n", argv[1]);
-#endif
-#ifdef __MIPSEL__
- printf ("%sel-unknown-linux-gnu\n", argv[1]);
-#endif
- return 0;
-}
-EOF
- $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy "${UNAME_MACHINE}" && rm $dummy.c $dummy && exit 0
- rm -f $dummy.c $dummy
- else
- # Either a pre-BFD a.out linker (linux-gnuoldld)
- # or one that does not give us useful --help.
- # GCC wants to distinguish between linux-gnuoldld and linux-gnuaout.
- # If ld does not provide *any* "supported emulations:"
- # that means it is gnuoldld.
- echo "$ld_help_string" | grep >/dev/null 2>&1 "supported emulations:"
- test $? != 0 && echo "${UNAME_MACHINE}-pc-linux-gnuoldld" && exit 0
-
- case "${UNAME_MACHINE}" in
- i?86)
- VENDOR=pc;
- ;;
- *)
- VENDOR=unknown;
- ;;
- esac
- # Determine whether the default compiler is a.out or elf
- cat >$dummy.c <<EOF
+ fi
+ rm -f $dummy.s $dummy
+ echo ${UNAME_MACHINE}-unknown-linux-gnu${LIBC}
+ exit 0 ;;
+ parisc:Linux:*:* | hppa:Linux:*:*)
+ # Look for CPU level
+ case `grep '^cpu[^a-z]*:' /proc/cpuinfo 2>/dev/null | cut -d' ' -f2` in
+ PA7*) echo hppa1.1-unknown-linux-gnu ;;
+ PA8*) echo hppa2.0-unknown-linux-gnu ;;
+ *) echo hppa-unknown-linux-gnu ;;
+ esac
+ exit 0 ;;
+ parisc64:Linux:*:* | hppa64:Linux:*:*)
+ echo hppa64-unknown-linux-gnu
+ exit 0 ;;
+ s390:Linux:*:* | s390x:Linux:*:*)
+ echo ${UNAME_MACHINE}-ibm-linux
+ exit 0 ;;
+ sh*:Linux:*:*)
+ echo ${UNAME_MACHINE}-unknown-linux-gnu
+ exit 0 ;;
+ sparc:Linux:*:* | sparc64:Linux:*:*)
+ echo ${UNAME_MACHINE}-unknown-linux-gnu
+ exit 0 ;;
+ x86_64:Linux:*:*)
+ echo x86_64-unknown-linux-gnu
+ exit 0 ;;
+ i*86:Linux:*:*)
+ # The BFD linker knows what the default object file format is, so
+ # first see if it will tell us. cd to the root directory to prevent
+ # problems with other programs or directories called `ld' in the path.
+ ld_supported_emulations=`cd /; ld --help 2>&1 \
+ | sed -ne '/supported emulations:/!d
+ s/[ ][ ]*/ /g
+ s/.*supported emulations: *//
+ s/ .*//
+ p'`
+ case "$ld_supported_emulations" in
+ i*86linux)
+ echo "${UNAME_MACHINE}-pc-linux-gnuaout"
+ exit 0
+ ;;
+ elf_i*86)
+ TENTATIVE="${UNAME_MACHINE}-pc-linux-gnu"
+ ;;
+ i*86coff)
+ echo "${UNAME_MACHINE}-pc-linux-gnucoff"
+ exit 0
+ ;;
+ esac
+ # Either a pre-BFD a.out linker (linux-gnuoldld)
+ # or one that does not give us useful --help.
+ # GCC wants to distinguish between linux-gnuoldld and linux-gnuaout.
+ # If ld does not provide *any* "supported emulations:"
+ # that means it is gnuoldld.
+ test -z "$ld_supported_emulations" && echo "${UNAME_MACHINE}-pc-linux-gnuoldld" && exit 0
+ case "${UNAME_MACHINE}" in
+ i*86)
+ VENDOR=pc;
+ ;;
+ *)
+ VENDOR=unknown;
+ ;;
+ esac
+ # Determine whether the default compiler is a.out or elf
+ cat >$dummy.c <<EOF
#include <features.h>
#ifdef __cplusplus
+#include <stdio.h> /* for printf() prototype */
int main (int argc, char *argv[]) {
#else
int main (argc, argv) int argc; char *argv[]; {
@@ -809,15 +941,16 @@ EOF
return 0;
}
EOF
- $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy "${UNAME_MACHINE}" && rm $dummy.c $dummy && exit 0
- rm -f $dummy.c $dummy
- fi ;;
+ $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy "${UNAME_MACHINE}" && rm -f $dummy.c $dummy && exit 0
+ rm -f $dummy.c $dummy
+ test x"${TENTATIVE}" != x && echo "${TENTATIVE}" && exit 0
+ ;;
# ptx 4.0 does uname -s correctly, with DYNIX/ptx in there. earlier versions
# are messed up and put the nodename in both sysname and nodename.
- i?86:DYNIX/ptx:4*:*)
+ i*86:DYNIX/ptx:4*:*)
echo i386-sequent-sysv4
exit 0 ;;
- i?86:UNIX_SV:4.2MP:2.*)
+ i*86:UNIX_SV:4.2MP:2.*)
# Unixware is an offshoot of SVR4, but it has its own version
# number series starting with 2...
# I am not positive that other SVR4 systems won't match this,
@@ -825,7 +958,7 @@ EOF
# Use sysv4.2uw... so that sysv4* matches it.
echo ${UNAME_MACHINE}-pc-sysv4.2uw${UNAME_VERSION}
exit 0 ;;
- i?86:*:4.*:* | i?86:SYSTEM_V:4.*:*)
+ i*86:*:4.*:* | i*86:SYSTEM_V:4.*:*)
UNAME_REL=`echo ${UNAME_RELEASE} | sed 's/\/MP$//'`
if grep Novell /usr/include/link.h >/dev/null 2>/dev/null; then
echo ${UNAME_MACHINE}-univel-sysv${UNAME_REL}
@@ -833,7 +966,7 @@ EOF
echo ${UNAME_MACHINE}-pc-sysv${UNAME_REL}
fi
exit 0 ;;
- i?86:*:5:7*)
+ i*86:*:5:7*)
# Fixed at (any) Pentium or better
UNAME_MACHINE=i586
if [ ${UNAME_SYSTEM} = "UnixWare" ] ; then
@@ -842,7 +975,7 @@ EOF
echo ${UNAME_MACHINE}-pc-sysv${UNAME_RELEASE}
fi
exit 0 ;;
- i?86:*:3.2:*)
+ i*86:*:3.2:*)
if test -f /usr/options/cb.name; then
UNAME_REL=`sed -n 's/.*Version //p' </usr/options/cb.name`
echo ${UNAME_MACHINE}-pc-isc$UNAME_REL
@@ -860,7 +993,11 @@ EOF
echo ${UNAME_MACHINE}-pc-sysv32
fi
exit 0 ;;
+ i*86:*DOS:*:*)
+ echo ${UNAME_MACHINE}-pc-msdosdjgpp
+ exit 0 ;;
pc:*:*:*)
+ # Left here for compatibility:
# uname -m prints for DJGPP always 'pc', but it prints nothing about
# the processor, so we play safe by assuming i386.
echo i386-pc-msdosdjgpp
@@ -884,7 +1021,7 @@ EOF
exit 0 ;;
M68*:*:R3V[567]*:*)
test -r /sysV68 && echo 'm68k-motorola-sysv' && exit 0 ;;
- 3[34]??:*:4.0:3.0 | 3[34]??,*:*:4.0:3.0 | 4850:*:4.0:3.0)
+ 3[34]??:*:4.0:3.0 | 3[34]??A:*:4.0:3.0 | 3[34]??,*:*:4.0:3.0 | 4850:*:4.0:3.0)
OS_REL=''
test -r /etc/.relid \
&& OS_REL=.`sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid`
@@ -895,21 +1032,24 @@ EOF
3[34]??:*:4.0:* | 3[34]??,*:*:4.0:*)
/bin/uname -p 2>/dev/null | grep 86 >/dev/null \
&& echo i486-ncr-sysv4 && exit 0 ;;
- m68*:LynxOS:2.*:*)
+ m68*:LynxOS:2.*:* | m68*:LynxOS:3.0*:*)
echo m68k-unknown-lynxos${UNAME_RELEASE}
exit 0 ;;
mc68030:UNIX_System_V:4.*:*)
echo m68k-atari-sysv4
exit 0 ;;
- i?86:LynxOS:2.*:* | i?86:LynxOS:3.[01]*:*)
+ i*86:LynxOS:2.*:* | i*86:LynxOS:3.[01]*:* | i*86:LynxOS:4.0*:*)
echo i386-unknown-lynxos${UNAME_RELEASE}
exit 0 ;;
TSUNAMI:LynxOS:2.*:*)
echo sparc-unknown-lynxos${UNAME_RELEASE}
exit 0 ;;
- rs6000:LynxOS:2.*:* | PowerPC:LynxOS:2.*:*)
+ rs6000:LynxOS:2.*:*)
echo rs6000-unknown-lynxos${UNAME_RELEASE}
exit 0 ;;
+ PowerPC:LynxOS:2.*:* | PowerPC:LynxOS:3.[01]*:* | PowerPC:LynxOS:4.0*:*)
+ echo powerpc-unknown-lynxos${UNAME_RELEASE}
+ exit 0 ;;
SM[BE]S:UNIX_SV:*:*)
echo mips-dde-sysv${UNAME_RELEASE}
exit 0 ;;
@@ -943,7 +1083,7 @@ EOF
mc68*:A/UX:*:*)
echo m68k-apple-aux${UNAME_RELEASE}
exit 0 ;;
- news*:NEWS-OS:*:6*)
+ news*:NEWS-OS:6*:*)
echo mips-sony-newsos6
exit 0 ;;
R[34]000:*System_V*:*:* | R4000:UNIX_SYSV:*:* | R*000:UNIX_SV:*:*)
@@ -974,8 +1114,63 @@ EOF
*:Rhapsody:*:*)
echo ${UNAME_MACHINE}-apple-rhapsody${UNAME_RELEASE}
exit 0 ;;
+ *:Darwin:*:*)
+ echo `uname -p`-apple-darwin${UNAME_RELEASE}
+ exit 0 ;;
+ *:procnto*:*:* | *:QNX:[0123456789]*:*)
+ if test "${UNAME_MACHINE}" = "x86pc"; then
+ UNAME_MACHINE=pc
+ fi
+ echo `uname -p`-${UNAME_MACHINE}-nto-qnx
+ exit 0 ;;
*:QNX:*:4*)
- echo i386-qnx-qnx${UNAME_VERSION}
+ echo i386-pc-qnx
+ exit 0 ;;
+ NSR-[KW]:NONSTOP_KERNEL:*:*)
+ echo nsr-tandem-nsk${UNAME_RELEASE}
+ exit 0 ;;
+ *:NonStop-UX:*:*)
+ echo mips-compaq-nonstopux
+ exit 0 ;;
+ BS2000:POSIX*:*:*)
+ echo bs2000-siemens-sysv
+ exit 0 ;;
+ DS/*:UNIX_System_V:*:*)
+ echo ${UNAME_MACHINE}-${UNAME_SYSTEM}-${UNAME_RELEASE}
+ exit 0 ;;
+ *:Plan9:*:*)
+ # "uname -m" is not consistent, so use $cputype instead. 386
+ # is converted to i386 for consistency with other x86
+ # operating systems.
+ if test "$cputype" = "386"; then
+ UNAME_MACHINE=i386
+ else
+ UNAME_MACHINE="$cputype"
+ fi
+ echo ${UNAME_MACHINE}-unknown-plan9
+ exit 0 ;;
+ i*86:OS/2:*:*)
+ # If we were able to find `uname', then EMX Unix compatibility
+ # is probably installed.
+ echo ${UNAME_MACHINE}-pc-os2-emx
+ exit 0 ;;
+ *:TOPS-10:*:*)
+ echo pdp10-unknown-tops10
+ exit 0 ;;
+ *:TENEX:*:*)
+ echo pdp10-unknown-tenex
+ exit 0 ;;
+ KS10:TOPS-20:*:* | KL10:TOPS-20:*:* | TYPE4:TOPS-20:*:*)
+ echo pdp10-dec-tops20
+ exit 0 ;;
+ XKL-1:TOPS-20:*:* | TYPE5:TOPS-20:*:*)
+ echo pdp10-xkl-tops20
+ exit 0 ;;
+ *:TOPS-20:*:*)
+ echo pdp10-unknown-tops20
+ exit 0 ;;
+ *:ITS:*:*)
+ echo pdp10-unknown-its
exit 0 ;;
esac
@@ -1068,11 +1263,24 @@ main ()
#endif
#if defined (vax)
-#if !defined (ultrix)
- printf ("vax-dec-bsd\n"); exit (0);
-#else
- printf ("vax-dec-ultrix\n"); exit (0);
-#endif
+# if !defined (ultrix)
+# include <sys/param.h>
+# if defined (BSD)
+# if BSD == 43
+ printf ("vax-dec-bsd4.3\n"); exit (0);
+# else
+# if BSD == 199006
+ printf ("vax-dec-bsd4.3reno\n"); exit (0);
+# else
+ printf ("vax-dec-bsd\n"); exit (0);
+# endif
+# endif
+# else
+ printf ("vax-dec-bsd\n"); exit (0);
+# endif
+# else
+ printf ("vax-dec-ultrix\n"); exit (0);
+# endif
#endif
#if defined (alliant) && defined (i860)
@@ -1083,7 +1291,7 @@ main ()
}
EOF
-$CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy && rm $dummy.c $dummy && exit 0
+$CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy && rm -f $dummy.c $dummy && exit 0
rm -f $dummy.c $dummy
# Apollos put the system type in the environment.
@@ -1116,6 +1324,48 @@ then
esac
fi
-#echo '(Unable to guess system type)' 1>&2
+cat >&2 <<EOF
+$0: unable to guess system type
+
+This script, last modified $timestamp, has failed to recognize
+the operating system you are using. It is advised that you
+download the most up to date version of the config scripts from
+
+ ftp://ftp.gnu.org/pub/gnu/config/
+
+If the version you run ($0) is already up to date, please
+send the following data and any information you think might be
+pertinent to <config-patches@gnu.org> in order to provide the needed
+information to handle your system.
+
+config.guess timestamp = $timestamp
+
+uname -m = `(uname -m) 2>/dev/null || echo unknown`
+uname -r = `(uname -r) 2>/dev/null || echo unknown`
+uname -s = `(uname -s) 2>/dev/null || echo unknown`
+uname -v = `(uname -v) 2>/dev/null || echo unknown`
+
+/usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null`
+/bin/uname -X = `(/bin/uname -X) 2>/dev/null`
+
+hostinfo = `(hostinfo) 2>/dev/null`
+/bin/universe = `(/bin/universe) 2>/dev/null`
+/usr/bin/arch -k = `(/usr/bin/arch -k) 2>/dev/null`
+/bin/arch = `(/bin/arch) 2>/dev/null`
+/usr/bin/oslevel = `(/usr/bin/oslevel) 2>/dev/null`
+/usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null`
+
+UNAME_MACHINE = ${UNAME_MACHINE}
+UNAME_RELEASE = ${UNAME_RELEASE}
+UNAME_SYSTEM = ${UNAME_SYSTEM}
+UNAME_VERSION = ${UNAME_VERSION}
+EOF
exit 1
+
+# Local variables:
+# eval: (add-hook 'write-file-hooks 'time-stamp)
+# time-stamp-start: "timestamp='"
+# time-stamp-format: "%:y-%02m-%02d"
+# time-stamp-end: "'"
+# End:
diff --git a/srclib/pcre/config.in b/srclib/pcre/config.in
index 7631d468cd..767cbd055c 100644
--- a/srclib/pcre/config.in
+++ b/srclib/pcre/config.in
@@ -3,9 +3,13 @@
written in Standard C, but there are a few non-standard things it can cope
with, allowing it to run on SunOS4 and other "close to standard" systems.
-On a non-Unix system you should just copy this file into config.h and change
-the definitions of HAVE_STRERROR and HAVE_MEMMOVE to 1. Unfortunately, because
-of the way autoconf works, these cannot be made the defaults. */
+On a non-Unix system you should just copy this file into config.h, and set up
+the macros the way you need them. You should normally change the definitions of
+HAVE_STRERROR and HAVE_MEMMOVE to 1. Unfortunately, because of the way autoconf
+works, these cannot be made the defaults. If your system has bcopy() and not
+memmove(), change the definition of HAVE_BCOPY instead of HAVE_MEMMOVE. If your
+system has neither bcopy() nor memmove(), leave them both as 0; an emulation
+function will be used. */
/* Define to empty if the keyword does not work. */
@@ -17,12 +21,27 @@ of the way autoconf works, these cannot be made the defaults. */
/* The following two definitions are mainly for the benefit of SunOS4, which
doesn't have the strerror() or memmove() functions that should be present in
-all Standard C libraries. The macros should normally be defined with the value
-1 for other systems, but unfortunately we can't make this the default because
-"configure" files generated by autoconf will only change 0 to 1; they won't
-change 1 to 0 if the functions are not found. */
+all Standard C libraries. The macros HAVE_STRERROR and HAVE_MEMMOVE should
+normally be defined with the value 1 for other systems, but unfortunately we
+can't make this the default because "configure" files generated by autoconf
+will only change 0 to 1; they won't change 1 to 0 if the functions are not
+found. */
#define HAVE_STRERROR 0
#define HAVE_MEMMOVE 0
+/* There are some non-Unix systems that don't even have bcopy(). If this macro
+is false, an emulation is used. If HAVE_MEMMOVE is set to 1, the value of
+HAVE_BCOPY is not relevant. */
+
+#define HAVE_BCOPY 0
+
+/* The value of NEWLINE determines the newline character. The default is to
+leave it up to the compiler, but some sites want to force a particular value.
+On Unix systems, "configure" can be used to override this default. */
+
+#ifndef NEWLINE
+#define NEWLINE '\n'
+#endif
+
/* End */
diff --git a/srclib/pcre/config.sub b/srclib/pcre/config.sub
index 28426bb8fa..93a3a14643 100644
--- a/srclib/pcre/config.sub
+++ b/srclib/pcre/config.sub
@@ -1,6 +1,10 @@
#! /bin/sh
-# Configuration validation subroutine script, version 1.1.
-# Copyright (C) 1991, 92-97, 1998, 1999 Free Software Foundation, Inc.
+# Configuration validation subroutine script.
+# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001
+# Free Software Foundation, Inc.
+
+timestamp='2001-05-11'
+
# This file is (in principle) common to ALL GNU software.
# The presence of a machine in this file suggests that SOME GNU software
# can handle that machine. It does not imply ALL GNU software can.
@@ -25,6 +29,8 @@
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.
+# Please send patches to <config-patches@gnu.org>.
+#
# Configuration subroutine to validate and canonicalize a configuration type.
# Supply the specified configuration type as an argument.
# If it is invalid, we print an error message on stderr and exit with code 1.
@@ -45,30 +51,73 @@
# CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM
# It is wrong to echo any other type of specification.
-if [ x$1 = x ]
-then
- echo Configuration name missing. 1>&2
- echo "Usage: $0 CPU-MFR-OPSYS" 1>&2
- echo "or $0 ALIAS" 1>&2
- echo where ALIAS is a recognized configuration type. 1>&2
- exit 1
-fi
+me=`echo "$0" | sed -e 's,.*/,,'`
-# First pass through any local machine types.
-case $1 in
- *local*)
- echo $1
- exit 0
- ;;
- *)
- ;;
+usage="\
+Usage: $0 [OPTION] CPU-MFR-OPSYS
+ $0 [OPTION] ALIAS
+
+Canonicalize a configuration name.
+
+Operation modes:
+ -h, --help print this help, then exit
+ -t, --time-stamp print date of last modification, then exit
+ -v, --version print version number, then exit
+
+Report bugs and patches to <config-patches@gnu.org>."
+
+version="\
+GNU config.sub ($timestamp)
+
+Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001
+Free Software Foundation, Inc.
+
+This is free software; see the source for copying conditions. There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
+
+help="
+Try \`$me --help' for more information."
+
+# Parse command line
+while test $# -gt 0 ; do
+ case $1 in
+ --time-stamp | --time* | -t )
+ echo "$timestamp" ; exit 0 ;;
+ --version | -v )
+ echo "$version" ; exit 0 ;;
+ --help | --h* | -h )
+ echo "$usage"; exit 0 ;;
+ -- ) # Stop option processing
+ shift; break ;;
+ - ) # Use stdin as input.
+ break ;;
+ -* )
+ echo "$me: invalid option $1$help"
+ exit 1 ;;
+
+ *local*)
+ # First pass through any local machine types.
+ echo $1
+ exit 0;;
+
+ * )
+ break ;;
+ esac
+done
+
+case $# in
+ 0) echo "$me: missing argument$help" >&2
+ exit 1;;
+ 1) ;;
+ *) echo "$me: too many arguments$help" >&2
+ exit 1;;
esac
# Separate what the user gave into CPU-COMPANY and OS or KERNEL-OS (if any).
# Here we must recognize all the valid KERNEL-OS combinations.
maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'`
case $maybe_os in
- linux-gnu*)
+ nto-qnx* | linux-gnu* | storm-chaos* | os2-emx*)
os=-$maybe_os
basic_machine=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\1/'`
;;
@@ -94,7 +143,7 @@ case $os in
-convergent* | -ncr* | -news | -32* | -3600* | -3100* | -hitachi* |\
-c[123]* | -convex* | -sun | -crds | -omron* | -dg | -ultra | -tti* | \
-harris | -dolphin | -highlevel | -gould | -cbm | -ns | -masscomp | \
- -apple)
+ -apple | -axis)
os=
basic_machine=$1
;;
@@ -166,27 +215,40 @@ esac
case $basic_machine in
# Recognize the basic CPU types without company name.
# Some are omitted here because they have special meanings below.
- tahoe | i860 | ia64 | m32r | m68k | m68000 | m88k | ns32k | arc | arm \
- | arme[lb] | pyramid | mn10200 | mn10300 | tron | a29k \
+ tahoe | i860 | ia64 | m32r | m68k | m68000 | m88k | ns32k | arc \
+ | arm | arme[lb] | arm[bl]e | armv[2345] | armv[345][lb] | strongarm | xscale \
+ | pyramid | mn10200 | mn10300 | tron | a29k \
| 580 | i960 | h8300 \
+ | x86 | ppcbe | mipsbe | mipsle | shbe | shle \
| hppa | hppa1.0 | hppa1.1 | hppa2.0 | hppa2.0w | hppa2.0n \
- | alpha | alphaev[4-7] | alphaev56 | alphapca5[67] \
- | we32k | ns16k | clipper | i370 | sh | powerpc | powerpcle \
- | 1750a | dsp16xx | pdp11 | mips16 | mips64 | mipsel | mips64el \
+ | hppa64 \
+ | alpha | alphaev[4-8] | alphaev56 | alphapca5[67] \
+ | alphaev6[78] \
+ | we32k | ns16k | clipper | i370 | sh | sh[34] \
+ | powerpc | powerpcle \
+ | 1750a | dsp16xx | pdp10 | pdp11 \
+ | mips16 | mips64 | mipsel | mips64el \
| mips64orion | mips64orionel | mipstx39 | mipstx39el \
| mips64vr4300 | mips64vr4300el | mips64vr4100 | mips64vr4100el \
- | mips64vr5000 | miprs64vr5000el | mcore \
- | sparc | sparclet | sparclite | sparc64 | sparcv9 | v850 | c4x \
- | thumb | d10v | fr30)
+ | mips64vr5000 | miprs64vr5000el | mcore | s390 | s390x \
+ | sparc | sparclet | sparclite | sparc64 | sparcv9 | sparcv9b \
+ | v850 | c4x \
+ | thumb | d10v | d30v | fr30 | avr | openrisc | tic80 \
+ | pj | pjl | h8500 | z8k)
+ basic_machine=$basic_machine-unknown
+ ;;
+ m6811 | m68hc11 | m6812 | m68hc12)
+ # Motorola 68HC11/12.
basic_machine=$basic_machine-unknown
+ os=-none
;;
- m88110 | m680[12346]0 | m683?2 | m68360 | m5200 | z8k | v70 | h8500 | w65 | pj | pjl)
+ m88110 | m680[12346]0 | m683?2 | m68360 | m5200 | z8k | v70 | w65 | z8k)
;;
# We use `pc' rather than `unknown'
# because (1) that's what they normally are, and
# (2) the word "unknown" tends to confuse beginning users.
- i[34567]86)
+ i*86 | x86_64)
basic_machine=$basic_machine-pc
;;
# Object if more than one company name word.
@@ -196,23 +258,30 @@ case $basic_machine in
;;
# Recognize the basic CPU types with company name.
# FIXME: clean up the formatting here.
- vax-* | tahoe-* | i[34567]86-* | i860-* | ia64-* | m32r-* | m68k-* | m68000-* \
- | m88k-* | sparc-* | ns32k-* | fx80-* | arc-* | arm-* | c[123]* \
+ vax-* | tahoe-* | i*86-* | i860-* | ia64-* | m32r-* | m68k-* | m68000-* \
+ | m88k-* | sparc-* | ns32k-* | fx80-* | arc-* | c[123]* \
+ | arm-* | armbe-* | armle-* | armv*-* | strongarm-* | xscale-* \
| mips-* | pyramid-* | tron-* | a29k-* | romp-* | rs6000-* \
| power-* | none-* | 580-* | cray2-* | h8300-* | h8500-* | i960-* \
| xmp-* | ymp-* \
- | hppa-* | hppa1.0-* | hppa1.1-* | hppa2.0-* | hppa2.0w-* | hppa2.0n-* \
- | alpha-* | alphaev[4-7]-* | alphaev56-* | alphapca5[67]-* \
+ | x86-* | ppcbe-* | mipsbe-* | mipsle-* | shbe-* | shle-* \
+ | hppa-* | hppa1.0-* | hppa1.1-* | hppa2.0-* | hppa2.0w-* \
+ | hppa2.0n-* | hppa64-* \
+ | alpha-* | alphaev[4-8]-* | alphaev56-* | alphapca5[67]-* \
+ | alphaev6[78]-* \
| we32k-* | cydra-* | ns16k-* | pn-* | np1-* | xps100-* \
| clipper-* | orion-* \
- | sparclite-* | pdp11-* | sh-* | powerpc-* | powerpcle-* \
- | sparc64-* | sparcv9-* | sparc86x-* | mips16-* | mips64-* | mipsel-* \
+ | sparclite-* | pdp10-* | pdp11-* | sh-* | sh[34]-* | sh[34]eb-* \
+ | powerpc-* | powerpcle-* | sparc64-* | sparcv9-* | sparcv9b-* | sparc86x-* \
+ | mips16-* | mips64-* | mipsel-* \
| mips64el-* | mips64orion-* | mips64orionel-* \
| mips64vr4100-* | mips64vr4100el-* | mips64vr4300-* | mips64vr4300el-* \
| mipstx39-* | mipstx39el-* | mcore-* \
- | f301-* | armv*-* | t3e-* \
+ | f30[01]-* | f700-* | s390-* | s390x-* | sv1-* | t3e-* \
+ | [cjt]90-* \
| m88110-* | m680[01234]0-* | m683?2-* | m68360-* | z8k-* | d10v-* \
- | thumb-* | v850-* | d30v-* | tic30-* | c30-* | fr30-* )
+ | thumb-* | v850-* | d30v-* | tic30-* | tic80-* | c30-* | fr30-* \
+ | bs2000-* | tic54x-* | c54x-* | x86_64-* | pj-* | pjl-*)
;;
# Recognize the various machine names and aliases which stand
# for a CPU type and a company and sometimes even an OS.
@@ -249,14 +318,14 @@ case $basic_machine in
os=-sysv
;;
amiga | amiga-*)
- basic_machine=m68k-cbm
+ basic_machine=m68k-unknown
;;
amigaos | amigados)
- basic_machine=m68k-cbm
+ basic_machine=m68k-unknown
os=-amigaos
;;
amigaunix | amix)
- basic_machine=m68k-cbm
+ basic_machine=m68k-unknown
os=-sysv4
;;
apollo68)
@@ -303,13 +372,16 @@ case $basic_machine in
basic_machine=cray2-cray
os=-unicos
;;
- [ctj]90-cray)
- basic_machine=c90-cray
+ [cjt]90)
+ basic_machine=${basic_machine}-cray
os=-unicos
;;
crds | unos)
basic_machine=m68k-crds
;;
+ cris | cris-* | etrax*)
+ basic_machine=cris-axis
+ ;;
da30 | da30-*)
basic_machine=m68k-da30
;;
@@ -357,6 +429,10 @@ case $basic_machine in
basic_machine=tron-gmicro
os=-sysv
;;
+ go32)
+ basic_machine=i386-pc
+ os=-go32
+ ;;
h3050r* | hiux*)
basic_machine=hppa1.1-hitachi
os=-hiuxwe2
@@ -432,19 +508,19 @@ case $basic_machine in
basic_machine=i370-ibm
;;
# I'm not sure what "Sysv32" means. Should this be sysv3.2?
- i[34567]86v32)
+ i*86v32)
basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'`
os=-sysv32
;;
- i[34567]86v4*)
+ i*86v4*)
basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'`
os=-sysv4
;;
- i[34567]86v)
+ i*86v)
basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'`
os=-sysv
;;
- i[34567]86sol2)
+ i*86sol2)
basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'`
os=-solaris2
;;
@@ -456,17 +532,6 @@ case $basic_machine in
basic_machine=i386-unknown
os=-vsta
;;
- i386-go32 | go32)
- basic_machine=i386-unknown
- os=-go32
- ;;
- i386-mingw32 | mingw32)
- basic_machine=i386-unknown
- os=-mingw32
- ;;
- i386-qnx | qnx)
- basic_machine=i386-qnx
- ;;
iris | iris4d)
basic_machine=mips-sgi
case $os in
@@ -492,6 +557,10 @@ case $basic_machine in
basic_machine=ns32k-utek
os=-sysv
;;
+ mingw32)
+ basic_machine=i386-pc
+ os=-mingw32
+ ;;
miniframe)
basic_machine=m68000-convergent
;;
@@ -513,12 +582,16 @@ case $basic_machine in
mips3*)
basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'`-unknown
;;
+ mmix*)
+ basic_machine=mmix-knuth
+ os=-mmixware
+ ;;
monitor)
basic_machine=m68k-rom68k
os=-coff
;;
msdos)
- basic_machine=i386-unknown
+ basic_machine=i386-pc
os=-msdos
;;
mvs)
@@ -582,9 +655,16 @@ case $basic_machine in
basic_machine=i960-intel
os=-mon960
;;
+ nonstopux)
+ basic_machine=mips-compaq
+ os=-nonstopux
+ ;;
np1)
basic_machine=np1-gould
;;
+ nsr-tandem)
+ basic_machine=nsr-tandem
+ ;;
op50n-* | op60c-*)
basic_machine=hppa1.1-oki
os=-proelf
@@ -614,28 +694,28 @@ case $basic_machine in
pc532 | pc532-*)
basic_machine=ns32k-pc532
;;
- pentium | p5 | k5 | k6 | nexen)
+ pentium | p5 | k5 | k6 | nexgen)
basic_machine=i586-pc
;;
- pentiumpro | p6 | 6x86)
+ pentiumpro | p6 | 6x86 | athlon)
basic_machine=i686-pc
;;
pentiumii | pentium2)
- basic_machine=i786-pc
+ basic_machine=i686-pc
;;
- pentium-* | p5-* | k5-* | k6-* | nexen-*)
+ pentium-* | p5-* | k5-* | k6-* | nexgen-*)
basic_machine=i586-`echo $basic_machine | sed 's/^[^-]*-//'`
;;
- pentiumpro-* | p6-* | 6x86-*)
+ pentiumpro-* | p6-* | 6x86-* | athlon-*)
basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'`
;;
pentiumii-* | pentium2-*)
- basic_machine=i786-`echo $basic_machine | sed 's/^[^-]*-//'`
+ basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'`
;;
pn)
basic_machine=pn-gould
;;
- power) basic_machine=rs6000-ibm
+ power) basic_machine=power-ibm
;;
ppc) basic_machine=powerpc-unknown
;;
@@ -650,6 +730,10 @@ case $basic_machine in
ps2)
basic_machine=i386-ibm
;;
+ pw32)
+ basic_machine=i586-unknown
+ os=-pw32
+ ;;
rom68k)
basic_machine=m68k-rom68k
os=-coff
@@ -729,6 +813,10 @@ case $basic_machine in
sun386 | sun386i | roadrunner)
basic_machine=i386-sun
;;
+ sv1)
+ basic_machine=sv1-cray
+ os=-unicos
+ ;;
symmetry)
basic_machine=i386-sequent
os=-dynix
@@ -737,6 +825,10 @@ case $basic_machine in
basic_machine=t3e-cray
os=-unicos
;;
+ tic54x | c54x*)
+ basic_machine=tic54x-unknown
+ os=-coff
+ ;;
tx39)
basic_machine=mipstx39-unknown
;;
@@ -832,13 +924,20 @@ case $basic_machine in
vax)
basic_machine=vax-dec
;;
+ pdp10)
+ # there are many clones, so DEC is not a safe bet
+ basic_machine=pdp10-unknown
+ ;;
pdp11)
basic_machine=pdp11-dec
;;
we32k)
basic_machine=we32k-att
;;
- sparc | sparcv9)
+ sh3 | sh4)
+ basic_machine=sh-unknown
+ ;;
+ sparc | sparcv9 | sparcv9b)
basic_machine=sparc-sun
;;
cydra)
@@ -860,6 +959,9 @@ case $basic_machine in
basic_machine=c4x-none
os=-coff
;;
+ *-unknown)
+ # Make sure to match an already-canonicalized machine name.
+ ;;
*)
echo Invalid configuration \`$1\': machine \`$basic_machine\' not recognized 1>&2
exit 1
@@ -918,12 +1020,26 @@ case $os in
| -udi* | -eabi* | -lites* | -ieee* | -go32* | -aux* \
| -cygwin* | -pe* | -psos* | -moss* | -proelf* | -rtems* \
| -mingw32* | -linux-gnu* | -uxpv* | -beos* | -mpeix* | -udk* \
- | -interix* | -uwin* | -rhapsody* | -opened* | -openstep* | -oskit*)
+ | -interix* | -uwin* | -rhapsody* | -darwin* | -opened* \
+ | -openstep* | -oskit* | -conix* | -pw32* | -nonstopux* \
+ | -storm-chaos* | -tops10* | -tenex* | -tops20* | -its* | -os2*)
# Remember, each alternative MUST END IN *, to match a version number.
;;
+ -qnx*)
+ case $basic_machine in
+ x86-* | i*86-*)
+ ;;
+ *)
+ os=-nto$os
+ ;;
+ esac
+ ;;
+ -nto*)
+ os=-nto-qnx
+ ;;
-sim | -es1800* | -hms* | -xray | -os68k* | -none* | -v88r* \
| -windows* | -osx | -abug | -netware* | -os9* | -beos* \
- | -macos* | -mpw* | -magic* | -mon960* | -lnews*)
+ | -macos* | -mpw* | -magic* | -mmixware* | -mon960* | -lnews*)
;;
-mac*)
os=`echo $os | sed -e 's|mac|macos|'`
@@ -940,6 +1056,9 @@ case $os in
-opened*)
os=-openedition
;;
+ -wince*)
+ os=-wince
+ ;;
-osfrose*)
os=-osfrose
;;
@@ -964,6 +1083,9 @@ case $os in
-ns2 )
os=-nextstep2
;;
+ -nsk*)
+ os=-nsk
+ ;;
# Preserve the version number of sinix5.
-sinix5.*)
os=`echo $os | sed -e 's|sinix|sysv|'`
@@ -977,9 +1099,6 @@ case $os in
-oss*)
os=-sysv3
;;
- -qnx)
- os=-qnx4
- ;;
-svr4)
os=-sysv4
;;
@@ -1001,7 +1120,7 @@ case $os in
-xenix)
os=-xenix
;;
- -*mint | -*MiNT)
+ -*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*)
os=-mint
;;
-none)
@@ -1035,6 +1154,9 @@ case $basic_machine in
arm*-semi)
os=-aout
;;
+ pdp10-*)
+ os=-tops20
+ ;;
pdp11-*)
os=-none
;;
@@ -1143,7 +1265,7 @@ case $basic_machine in
*-masscomp)
os=-rtu
;;
- f301-fujitsu)
+ f30[01]-fujitsu | f700-fujitsu)
os=-uxpv
;;
*-rom68k)
@@ -1221,7 +1343,7 @@ case $basic_machine in
-mpw* | -macos*)
vendor=apple
;;
- -*mint | -*MiNT)
+ -*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*)
vendor=atari
;;
esac
@@ -1230,3 +1352,11 @@ case $basic_machine in
esac
echo $basic_machine$os
+exit 0
+
+# Local variables:
+# eval: (add-hook 'write-file-hooks 'time-stamp)
+# time-stamp-start: "timestamp='"
+# time-stamp-format: "%:y-%02m-%02d"
+# time-stamp-end: "'"
+# End:
diff --git a/srclib/pcre/configure.in b/srclib/pcre/configure.in
index a947f19d2d..e158d467e0 100644
--- a/srclib/pcre/configure.in
+++ b/srclib/pcre/configure.in
@@ -17,19 +17,20 @@ dnl digits for minor numbers less than 10. There are unlikely to be
dnl that many releases anyway.
PCRE_MAJOR=3
-PCRE_MINOR=1
-PCRE_DATE=09-Feb-2000
+PCRE_MINOR=9
+PCRE_DATE=02-Jan-2002
PCRE_VERSION=${PCRE_MAJOR}.${PCRE_MINOR}
dnl Provide versioning information for libtool shared libraries that
dnl are built by default on Unix systems.
-PCRE_LIB_VERSION=0:0:0
+PCRE_LIB_VERSION=0:1:0
PCRE_POSIXLIB_VERSION=0:0:0
dnl Checks for programs.
AC_PROG_CC
+AC_PROG_INSTALL
AC_PROG_RANLIB
dnl Checks for header files.
@@ -44,26 +45,45 @@ AC_TYPE_SIZE_T
dnl Checks for library functions.
-AC_CHECK_FUNCS(memmove strerror)
+AC_CHECK_FUNCS(bcopy memmove strerror)
-dnl Handle --enable-shared-libraries
+dnl Handle --enable-utf8
-LIBTOOL=libtool
-LIBSUFFIX=la
-AC_ARG_ENABLE(shared,
-[ --disable-shared build PCRE as a static library],
-if test "$enableval" = "no"; then
- LIBTOOL=
- LIBSUFFIX=a
+AC_ARG_ENABLE(utf8,
+[ --enable-utf8 enable UTF8 support (incomplete)],
+if test "$enableval" = "yes"; then
+ UTF8=-DSUPPORT_UTF8
fi
)
+dnl Handle --enable-newline-is-cr
+
+AC_ARG_ENABLE(newline-is-cr,
+[ --enable-newline-is-cr use CR as the newline character],
+if test "$enableval" = "yes"; then
+ NEWLINE=-DNEWLINE=13
+fi
+)
+
+dnl Handle --enable-newline-is-lf
+
+AC_ARG_ENABLE(newline-is-lf,
+[ --enable-newline-is-lf use LF as the newline character],
+if test "$enableval" = "yes"; then
+ NEWLINE=-DNEWLINE=10
+fi
+)
+
+dnl Now arrange to build libtool
+
+AC_PROG_LIBTOOL
+
dnl "Export" these variables
AC_SUBST(HAVE_MEMMOVE)
AC_SUBST(HAVE_STRERROR)
-AC_SUBST(LIBTOOL)
-AC_SUBST(LIBSUFFIX)
+AC_SUBST(NEWLINE)
+AC_SUBST(UTF8)
AC_SUBST(PCRE_MAJOR)
AC_SUBST(PCRE_MINOR)
AC_SUBST(PCRE_DATE)
@@ -71,5 +91,5 @@ AC_SUBST(PCRE_VERSION)
AC_SUBST(PCRE_LIB_VERSION)
AC_SUBST(PCRE_POSIXLIB_VERSION)
-dnl This must be last; it determines what files are written
-AC_OUTPUT(Makefile pcre.h:pcre.in pcre-config,[chmod a+x pcre-config])
+dnl This must be last; it determines what files are written as well as config.h
+AC_OUTPUT(Makefile pcre.h:pcre.in pcre-config:pcre-config.in RunTest:RunTest.in,[chmod a+x RunTest pcre-config])
diff --git a/srclib/pcre/dftables.c b/srclib/pcre/dftables.c
index d572dfd3e6..fe4ffcdb7a 100644
--- a/srclib/pcre/dftables.c
+++ b/srclib/pcre/dftables.c
@@ -8,7 +8,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-2000 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
@@ -53,7 +53,7 @@ order to be consistent. */
int main(void)
{
int i;
-unsigned const char *tables = pcre_maketables();
+const unsigned char *tables = pcre_maketables();
printf(
"/*************************************************\n"
diff --git a/srclib/pcre/doc/Tech.Notes b/srclib/pcre/doc/Tech.Notes
index 03904db3cc..f5ca280115 100644
--- a/srclib/pcre/doc/Tech.Notes
+++ b/srclib/pcre/doc/Tech.Notes
@@ -135,7 +135,7 @@ end of each byte.
Back references
---------------
-OP_REF is followed by a single byte containing the reference number.
+OP_REF is followed by two bytes containing the reference number.
Repeating character classes and back references
@@ -163,11 +163,21 @@ Brackets and alternation
A pair of non-capturing (round) brackets is wrapped round each expression at
compile time, so alternation always happens in the context of brackets.
+
Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
speakers, including myself, can be round, square, curly, or pointy. Hence this
usage.]
+Originally PCRE was limited to 99 capturing brackets (so as not to use up all
+the opcodes). From release 3.5, there is no limit. What happens is that the
+first ones, up to EXTRACT_BASIC_MAX are handled with separate opcodes, as
+above. If there are more, the opcode is set to EXTRACT_BASIC_MAX+1, and the
+first operation in the bracket is OP_BRANUMBER, followed by a 2-byte bracket
+number. This opcode is ignored while matching, but is fished out when handling
+the bracket itself. (They could have all been done like this, but I was making
+minimal changes.)
+
A bracket opcode is followed by two bytes which give the offset to the next
alternative OP_ALT or, if there aren't any branches, to the matching KET
opcode. Each OP_ALT is followed by two bytes giving the offset to the next one,
@@ -191,8 +201,8 @@ appropriate.
A subpattern with a bounded maximum repetition is replicated in a nested
fashion up to the maximum number of times, with BRAZERO or BRAMINZERO before
each replication after the minimum, so that, for example, (abc){2,5} is
-compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 200-bracket limit does not
-apply to these internally generated brackets.
+compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 99 and 200 bracket limits do
+not apply to these internally generated brackets.
Assertions
@@ -202,9 +212,10 @@ Forward assertions are just like other subpatterns, but starting with one of
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
is OP_REVERSE, followed by a two byte count of the number of characters to move
-back the pointer in the subject string. A separate count is present in each
-alternative of a lookbehind assertion, allowing them to have different fixed
-lengths.
+back the pointer in the subject string. When operating in UTF-8 mode, the count
+is a character count rather than a byte count. A separate count is present in
+each alternative of a lookbehind assertion, allowing them to have different
+fixed lengths.
Once-only subpatterns
@@ -219,7 +230,7 @@ Conditional subpatterns
These are like other subpatterns, but they start with the opcode OP_COND. If
the condition is a back reference, this is stored at the start of the
-subpattern using the opcode OP_CREF followed by one byte containing the
+subpattern using the opcode OP_CREF followed by two bytes containing the
reference number. Otherwise, a conditional subpattern will always start with
one of the assertions.
@@ -239,4 +250,4 @@ the compiled data.
Philip Hazel
-February 2000
+August 2001
diff --git a/srclib/pcre/doc/pcre.3 b/srclib/pcre/doc/pcre.3
index bd435e9ecd..738f76b4a9 100644
--- a/srclib/pcre/doc/pcre.3
+++ b/srclib/pcre/doc/pcre.3
@@ -44,6 +44,12 @@ pcre - Perl-compatible regular expressions.
.B int *\fIovector\fR, int \fIstringcount\fR, "const char ***\fIlistptr\fR);"
.PP
.br
+.B void pcre_free_substring(const char *\fIstringptr\fR);
+.PP
+.br
+.B void pcre_free_substring_list(const char **\fIstringptr\fR);
+.PP
+.br
.B const unsigned char *pcre_maketables(void);
.PP
.br
@@ -70,7 +76,9 @@ pcre - Perl-compatible regular expressions.
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl 5, with just a few
differences (see below). The current implementation corresponds to Perl 5.005,
-with some additional features from the Perl development release.
+with some additional features from later versions. This includes some
+experimental, incomplete support for UTF-8 encoded strings. Details of exactly
+what is and what is not supported are given below.
PCRE has its own native API, which is described in this document. There is also
a set of wrapper functions that correspond to the POSIX regular expression API.
@@ -84,12 +92,18 @@ contain the major and minor release numbers for the library. Applications can
use these to include support for different releases.
The functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and \fBpcre_exec()\fR
-are used for compiling and matching regular expressions, while
-\fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
+are used for compiling and matching regular expressions. A sample program that
+demonstrates the simplest way of using them is given in the file
+\fIpcredemo.c\fR. The last section of this man page describes how to run it.
+
+The functions \fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
\fBpcre_get_substring_list()\fR are convenience functions for extracting
-captured substrings from a matched subject string. The function
-\fBpcre_maketables()\fR is used (optionally) to build a set of character tables
-in the current locale for passing to \fBpcre_compile()\fR.
+captured substrings from a matched subject string; \fBpcre_free_substring()\fR
+and \fBpcre_free_substring_list()\fR are also provided, to free the memory used
+for extracted strings.
+
+The function \fBpcre_maketables()\fR is used (optionally) to build a set of
+character tables in the current locale for passing to \fBpcre_compile()\fR.
The function \fBpcre_fullinfo()\fR is used to find out information about a
compiled pattern; \fBpcre_info()\fR is an obsolete version which returns only
@@ -117,18 +131,22 @@ the same compiled pattern can safely be used by several threads at once.
The function \fBpcre_compile()\fR is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument \fIpattern\fR. A pointer to a single block of memory
-that is obtained via \fBpcre_malloc\fR is returned. This contains the
-compiled code and related data. The \fBpcre\fR type is defined for this for
-convenience, but in fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the
-contents of the block are not externally defined. It is up to the caller to
-free the memory when it is no longer required.
-.PP
+that is obtained via \fBpcre_malloc\fR is returned. This contains the compiled
+code and related data. The \fBpcre\fR type is defined for the returned block;
+this is a typedef for a structure whose contents are not externally defined. It
+is up to the caller to free the memory when it is no longer required.
+
+Although the compiled code of a PCRE regex is relocatable, that is, it does not
+depend on memory location, the complete \fBpcre\fR data block is not
+fully relocatable, because it contains a copy of the \fItableptr\fR argument,
+which is an address (see below).
+
The size of a compiled pattern is roughly proportional to the length of the
pattern string, except that each character class (other than those containing
just a single character, negated or not) requires 33 bytes, and repeat
quantifiers with a minimum greater than one or a bounded maximum cause the
relevant portions of the compiled pattern to be replicated.
-.PP
+
The \fIoptions\fR argument contains independent bits that affect the
compilation. It should be zero if no options are required. Some of the options,
in particular, those that are compatible with Perl, can also be set and unset
@@ -137,19 +155,31 @@ below). For these options, the contents of the \fIoptions\fR argument specifies
their initial settings at the start of compilation and execution. The
PCRE_ANCHORED option can be set at the time of matching as well as at compile
time.
-.PP
+
If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately.
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns
NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual
error message. The offset from the start of the pattern to the character where
the error was discovered is placed in the variable pointed to by
\fIerroffset\fR, which must not be NULL. If it is, an immediate error is given.
-.PP
+
If the final argument, \fItableptr\fR, is NULL, PCRE uses a default set of
character tables which are built when it is compiled, using the default C
locale. Otherwise, \fItableptr\fR must be the result of a call to
\fBpcre_maketables()\fR. See the section on locale support below.
-.PP
+
+This code fragment shows a typical straightforward call to \fBpcre_compile()\fR:
+
+ pcre *re;
+ const char *error;
+ int erroffset;
+ re = pcre_compile(
+ "^A.*Z", /* the pattern */
+ 0, /* default options */
+ &error, /* for error message */
+ &erroffset, /* for error offset */
+ NULL); /* use default character tables */
+
The following option bits are defined in the header file:
PCRE_ANCHORED
@@ -223,15 +253,23 @@ This option inverts the "greediness" of the quantifiers so that they are not
greedy by default, but become greedy if followed by "?". It is not compatible
with Perl. It can also be set by a (?U) option setting within the pattern.
+ PCRE_UTF8
+
+This option causes PCRE to regard both the pattern and the subject as strings
+of UTF-8 characters instead of just byte strings. However, it is available only
+if PCRE has been built to include UTF-8 support. If not, the use of this option
+provokes an error. Support for UTF-8 is new, experimental, and incomplete.
+Details of exactly what it entails are given below.
+
.SH STUDYING A PATTERN
When a pattern is going to be used several times, it is worth spending more
time analyzing it in order to speed up the time taken for matching. The
function \fBpcre_study()\fR takes a pointer to a compiled pattern as its first
-argument, and returns a pointer to a \fBpcre_extra\fR block (another \fBvoid\fR
-typedef) containing additional information about the pattern; this can be
-passed to \fBpcre_exec()\fR. If no additional information is available, NULL
-is returned.
+argument, and returns a pointer to a \fBpcre_extra\fR block (another typedef
+for a structure with hidden contents) containing additional information about
+the pattern; this can be passed to \fBpcre_exec()\fR. If no additional
+information is available, NULL is returned.
The second argument contains option bits. At present, no options are defined
for \fBpcre_study()\fR, and this argument should always be zero.
@@ -240,6 +278,14 @@ The third argument for \fBpcre_study()\fR is a pointer to an error message. If
studying succeeds (even if no data is returned), the variable it points to is
set to NULL. Otherwise it points to a textual error message.
+This is a typical call to \fBpcre_study\fR():
+
+ pcre_extra *pe;
+ pe = pcre_study(
+ re, /* result of pcre_compile() */
+ 0, /* no options exist */
+ &error); /* set to NULL or points to a message */
+
At present, studying a pattern is useful only for non-anchored patterns that do
not have a single fixed starting character. A bitmap of possible starting
characters is created.
@@ -289,13 +335,24 @@ the following negative numbers:
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of \fIwhat\fR was invalid
+Here is a typical call of \fBpcre_fullinfo()\fR, to obtain the length of the
+compiled pattern:
+
+ int rc;
+ unsigned long int length;
+ rc = pcre_fullinfo(
+ re, /* result of pcre_compile() */
+ pe, /* result of pcre_study(), or NULL */
+ PCRE_INFO_SIZE, /* what is required */
+ &length); /* where to put the data */
+
The possible values for the third argument are defined in \fBpcre.h\fR, and are
as follows:
PCRE_INFO_OPTIONS
Return a copy of the options with which the pattern was compiled. The fourth
-argument should point to au \fBunsigned long int\fR variable. These option bits
+argument should point to an \fBunsigned long int\fR variable. These option bits
are those specified in the call to \fBpcre_compile()\fR, modified by any
top-level option settings within the pattern itself, and with the PCRE_ANCHORED
bit forcibly set if the form of the pattern implies that it can match only at
@@ -323,7 +380,7 @@ no back references.
Return information about the first character of any matched string, for a
non-anchored pattern. If there is a fixed first character, e.g. from a pattern
-such as (cat|cow|coyote), then it is returned in the integer pointed to by
+such as (cat|cow|coyote), it is returned in the integer pointed to by
\fIwhere\fR. Otherwise, if either
(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
@@ -332,9 +389,9 @@ starts with "^", or
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
(if it were set, the pattern would be anchored),
-then -1 is returned, indicating that the pattern matches only at the
-start of a subject string or after any "\\n" within the string. Otherwise -2 is
-returned. For anchored patterns, -2 is returned.
+-1 is returned, indicating that the pattern matches only at the start of a
+subject string or after any "\\n" within the string. Otherwise -2 is returned.
+For anchored patterns, -2 is returned.
PCRE_INFO_FIRSTTABLE
@@ -376,6 +433,20 @@ pre-compiled pattern, which is passed in the \fIcode\fR argument. If the
pattern has been studied, the result of the study should be passed in the
\fIextra\fR argument. Otherwise this must be NULL.
+Here is an example of a simple call to \fBpcre_exec()\fR:
+
+ int rc;
+ int ovector[30];
+ rc = pcre_exec(
+ re, /* result of pcre_compile() */
+ NULL, /* we didn't study the pattern */
+ "some string", /* the subject string */
+ 11, /* the length of the subject string */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* vector for substring information */
+ 30); /* number of elements in the vector */
+
The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose
unused bits must be zero. However, if a pattern was compiled with
PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it
@@ -417,9 +488,9 @@ below) and trying an ordinary match again.
The subject string is passed as a pointer in \fIsubject\fR, a length in
\fIlength\fR, and a starting offset in \fIstartoffset\fR. Unlike the pattern
-string, it may contain binary zero characters. When the starting offset is
-zero, the search for a match starts at the beginning of the subject, and this
-is by far the most common case.
+string, the subject may contain binary zero characters. When the starting
+offset is zero, the search for a match starts at the beginning of the subject,
+and this is by far the most common case.
A non-zero starting offset is useful when searching for another match in the
same subject by calling \fBpcre_exec()\fR again after a previous success.
@@ -550,15 +621,15 @@ is a pointer to the vector of integer offsets that was passed to
were captured by the match, including the substring that matched the entire
regular expression. This is the value returned by \fBpcre_exec\fR if it
is greater than zero. If \fBpcre_exec()\fR returned zero, indicating that it
-ran out of space in \fIovector\fR, then the value passed as
-\fIstringcount\fR should be the size of the vector divided by three.
+ran out of space in \fIovector\fR, the value passed as \fIstringcount\fR should
+be the size of the vector divided by three.
The functions \fBpcre_copy_substring()\fR and \fBpcre_get_substring()\fR
extract a single substring, whose number is given as \fIstringnumber\fR. A
value of zero extracts the substring that matched the entire pattern, while
higher values extract the captured substrings. For \fBpcre_copy_substring()\fR,
the string is placed in \fIbuffer\fR, whose length is given by
-\fIbuffersize\fR, while for \fBpcre_get_substring()\fR a new block of store is
+\fIbuffersize\fR, while for \fBpcre_get_substring()\fR a new block of memory is
obtained via \fBpcre_malloc\fR, and its address is returned via
\fIstringptr\fR. The yield of the function is the length of the string, not
including the terminating zero, or one of
@@ -590,6 +661,15 @@ string. This can be distinguished from a genuine zero-length substring by
inspecting the appropriate offset in \fIovector\fR, which is negative for unset
substrings.
+The two convenience functions \fBpcre_free_substring()\fR and
+\fBpcre_free_substring_list()\fR can be used to free the memory returned by
+a previous call of \fBpcre_get_substring()\fR or
+\fBpcre_get_substring_list()\fR, respectively. They do nothing more than call
+the function pointed to by \fBpcre_free\fR, which of course could be called
+directly from a C program. However, PCRE is used in some situations where it is
+linked via a special interface to another programming language which cannot use
+\fBpcre_free\fR directly; it is for these cases that the functions are
+provided.
.SH LIMITATIONS
@@ -597,8 +677,9 @@ There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
The maximum length of a compiled pattern is 65539 (sic) bytes.
All values in repeating quantifiers must be less than 65536.
-The maximum number of capturing subpatterns is 99.
-The maximum number of all parenthesized subpatterns, including capturing
+There maximum number of capturing subpatterns is 65535.
+There is no limit to the number of non-capturing subpatterns, but the maximum
+depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.
The maximum length of a subject string is the largest positive number that an
@@ -650,7 +731,7 @@ patterns using the non-Perl item (?R).
with the settings of captured strings when part of a pattern is repeated. For
example, matching "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
"b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if
-the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) get set.
+the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) are set.
In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the
future Perl changes to a consistent state that is different, PCRE may change to
@@ -691,8 +772,14 @@ The syntax and semantics of the regular expressions supported by PCRE are
described below. Regular expressions are also described in the Perl
documentation and in a number of other books, some of which have copious
examples. Jeffrey Friedl's "Mastering Regular Expressions", published by
-O'Reilly (ISBN 1-56592-257), covers them in great detail. The description
-here is intended as reference documentation.
+O'Reilly (ISBN 1-56592-257), covers them in great detail.
+
+The description here is intended as reference documentation. The basic
+operation of PCRE is on strings of bytes. However, there is the beginnings of
+some support for UTF-8 character strings. To use this support you must
+configure PCRE to include it, and then call \fBpcre_compile()\fR with the
+PCRE_UTF8 option. How this affects the pattern matching is described in the
+final section of this document.
A regular expression is a pattern that is matched against a subject string from
left to right. Most characters stand for themselves in a pattern, and match the
@@ -914,16 +1001,16 @@ PCRE_MULTILINE is set.
Note that the sequences \\A, \\Z, and \\z can be used to match the start and
end of the subject in both modes, and if all branches of a pattern start with
-\\A is it always anchored, whether PCRE_MULTILINE is set or not.
+\\A it is always anchored, whether PCRE_MULTILINE is set or not.
.SH FULL STOP (PERIOD, DOT)
Outside a character class, a dot in the pattern matches any one character in
the subject, including a non-printing character, but not (by default) newline.
-If the PCRE_DOTALL option is set, then dots match newlines as well. The
-handling of dot is entirely independent of the handling of circumflex and
-dollar, the only relationship being that they both involve newline characters.
-Dot has no special meaning in a character class.
+If the PCRE_DOTALL option is set, dots match newlines as well. The handling of
+dot is entirely independent of the handling of circumflex and dollar, the only
+relationship being that they both involve newline characters. Dot has no
+special meaning in a character class.
.SH SQUARE BRACKETS
@@ -1018,7 +1105,7 @@ negation, which is indicated by a ^ character after the colon. For example,
[12[:^digit:]]
-matches "1", "2", or any non-digit. PCRE (and Perl) also recogize the POSIX
+matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
supported, and an error is given if they are encountered.
@@ -1116,7 +1203,7 @@ For example, if the string "the red king" is matched against the pattern
the ((red|white) (king|queen))
the captured substrings are "red king", "red", and "king", and are numbered 1,
-2, and 3.
+2, and 3, respectively.
The fact that plain parentheses fulfil two functions is not always helpful.
There are often times when a grouping subpattern is required without a
@@ -1210,10 +1297,10 @@ to the string
/* first command */ not comment /* second comment */
-fails, because it matches the entire string due to the greediness of the .*
+fails, because it matches the entire string owing to the greediness of the .*
item.
-However, if a quantifier is followed by a question mark, then it ceases to be
+However, if a quantifier is followed by a question mark, it ceases to be
greedy, and instead matches the minimum number of times possible, so the
pattern
@@ -1229,8 +1316,8 @@ own right. Because it has two uses, it can sometimes appear doubled, as in
which matches one digit by preference, but can match two if that is the only
way the rest of the pattern matches.
-If the PCRE_UNGREEDY option is set (an option which is not available in Perl)
-then the quantifiers are not greedy by default, but individual ones can be made
+If the PCRE_UNGREEDY option is set (an option which is not available in Perl),
+the quantifiers are not greedy by default, but individual ones can be made
greedy by following them with a question mark. In other words, it inverts the
default behaviour.
@@ -1239,8 +1326,8 @@ is greater than 1 or with a limited maximum, more store is required for the
compiled pattern, in proportion to the size of the minimum or maximum.
If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent
-to Perl's /s) is set, thus allowing the . to match newlines, then the pattern
-is implicitly anchored, because whatever follows will be tried against every
+to Perl's /s) is set, thus allowing the . to match newlines, the pattern is
+implicitly anchored, because whatever follows will be tried against every
character position in the subject string, so there is no point in retrying the
overall match at any position after the first. PCRE treats such a pattern as
though it were preceded by \\A. In cases where it is known that the subject
@@ -1284,7 +1371,7 @@ itself. So the pattern
matches "sense and sensibility" and "response and responsibility", but not
"sense and responsibility". If caseful matching is in force at the time of the
-back reference, then the case of letters is relevant. For example,
+back reference, the case of letters is relevant. For example,
((?i)rah)\\s+\\1
@@ -1292,7 +1379,7 @@ matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
capturing subpattern is matched caselessly.
There may be more than one back reference to the same subpattern. If a
-subpattern has not actually been used in a particular match, then any back
+subpattern has not actually been used in a particular match, any back
references to it always fail. For example, the pattern
(a|(bc))\\2
@@ -1300,9 +1387,9 @@ references to it always fail. For example, the pattern
always fails if it starts to match "a" rather than "bc". Because there may be
up to 99 back references, all digits following the backslash are taken
as part of a potential back reference number. If the pattern continues with a
-digit character, then some delimiter must be used to terminate the back
-reference. If the PCRE_EXTENDED option is set, this can be whitespace.
-Otherwise an empty comment can be used.
+digit character, some delimiter must be used to terminate the back reference.
+If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty
+comment can be used.
A back reference that occurs inside the parentheses to which it refers fails
when the subpattern is first used, so, for example, (a\\1) never matches.
@@ -1311,7 +1398,7 @@ example, the pattern
(a|b\\1)+
-matches any number of "a"s and also "aba", "ababaa" etc. At each iteration of
+matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
the subpattern, the back reference matches the character string corresponding
to the previous iteration. In order for this to work, the pattern must be such
that the first iteration does not need to match the back reference. This can be
@@ -1390,7 +1477,7 @@ Several assertions (of any sort) may occur in succession. For example,
matches "foo" preceded by three digits that are not "999". Notice that each of
the assertions is applied independently at the same point in the subject
string. First there is a check that the previous three characters are all
-digits, then there is a check that the same three characters are not "999".
+digits, and then there is a check that the same three characters are not "999".
This pattern does \fInot\fR match "foo" preceded by six characters, the first
of which are digits and the last three of which are not "999". For example, it
doesn't match "123abcfoo". A pattern to do that is
@@ -1475,15 +1562,15 @@ what follows matches the rest of the pattern. If the pattern is specified as
^.*abcd$
-then the initial .* matches the entire string at first, but when this fails
-(because there is no following "a"), it backtracks to match all but the last
-character, then all but the last two characters, and so on. Once again the
-search for "a" covers the entire string, from right to left, so we are no
-better off. However, if the pattern is written as
+the initial .* matches the entire string at first, but when this fails (because
+there is no following "a"), it backtracks to match all but the last character,
+then all but the last two characters, and so on. Once again the search for "a"
+covers the entire string, from right to left, so we are no better off. However,
+if the pattern is written as
^(?>.*)(?<=abcd)
-then there can be no backtracking for the .* item; it can match only the entire
+there can be no backtracking for the .* item; it can match only the entire
string. The subsequent lookbehind assertion does a single test on the last four
characters. If it fails, the match fails immediately. For long strings, this
approach makes a significant difference to the processing time.
@@ -1528,11 +1615,11 @@ no-pattern (if present) is used. If there are more than two alternatives in the
subpattern, a compile-time error occurs.
There are two kinds of condition. If the text between the parentheses consists
-of a sequence of digits, then the condition is satisfied if the capturing
-subpattern of that number has previously matched. Consider the following
-pattern, which contains non-significant white space to make it more readable
-(assume the PCRE_EXTENDED option) and to divide it into three parts for ease
-of discussion:
+of a sequence of digits, the condition is satisfied if the capturing subpattern
+of that number has previously matched. The number must be greater than zero.
+Consider the following pattern, which contains non-significant white space to
+make it more readable (assume the PCRE_EXTENDED option) and to divide it into
+three parts for ease of discussion:
( \\( )? [^()]+ (?(1) \\) )
@@ -1622,7 +1709,7 @@ on at the top level. If additional parentheses are added, giving
\\( ( ( (?>[^()]+) | (?R) )* ) \\)
^ ^
^ ^
-then the string they capture is "ab(cd)ef", the contents of the top level
+the string they capture is "ab(cd)ef", the contents of the top level
parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
has to obtain extra memory to store data during a recursion, which it does by
using \fBpcre_malloc\fR, freeing it via \fBpcre_free\fR afterwards. If no
@@ -1686,6 +1773,208 @@ with the pattern above. The former gives a failure almost instantly when
applied to a whole line of "a" characters, whereas the latter takes an
appreciable time with strings longer than about 20 characters.
+
+.SH UTF-8 SUPPORT
+Starting at release 3.3, PCRE has some support for character strings encoded
+in the UTF-8 format. This is incomplete, and is regarded as experimental. In
+order to use it, you must configure PCRE to include UTF-8 support in the code,
+and, in addition, you must call \fBpcre_compile()\fR with the PCRE_UTF8 option
+flag. When you do this, both the pattern and any subject strings that are
+matched against it are treated as UTF-8 strings instead of just strings of
+bytes, but only in the cases that are mentioned below.
+
+If you compile PCRE with UTF-8 support, but do not use it at run time, the
+library will be a bit bigger, but the additional run time overhead is limited
+to testing the PCRE_UTF8 flag in several places, so should not be very large.
+
+PCRE assumes that the strings it is given contain valid UTF-8 codes. It does
+not diagnose invalid UTF-8 strings. If you pass invalid UTF-8 strings to PCRE,
+the results are undefined.
+
+Running with PCRE_UTF8 set causes these changes in the way PCRE works:
+
+1. In a pattern, the escape sequence \\x{...}, where the contents of the braces
+is a string of hexadecimal digits, is interpreted as a UTF-8 character whose
+code number is the given hexadecimal number, for example: \\x{1234}. This
+inserts from one to six literal bytes into the pattern, using the UTF-8
+encoding. If a non-hexadecimal digit appears between the braces, the item is
+not recognized.
+
+2. The original hexadecimal escape sequence, \\xhh, generates a two-byte UTF-8
+character if its value is greater than 127.
+
+3. Repeat quantifiers are NOT correctly handled if they follow a multibyte
+character. For example, \\x{100}* and \\xc3+ do not work. If you want to
+repeat such characters, you must enclose them in non-capturing parentheses,
+for example (?:\\x{100}), at present.
+
+4. The dot metacharacter matches one UTF-8 character instead of a single byte.
+
+5. Unlike literal UTF-8 characters, the dot metacharacter followed by a
+repeat quantifier does operate correctly on UTF-8 characters instead of
+single bytes.
+
+4. Although the \\x{...} escape is permitted in a character class, characters
+whose values are greater than 255 cannot be included in a class.
+
+5. A class is matched against a UTF-8 character instead of just a single byte,
+but it can match only characters whose values are less than 256. Characters
+with greater values always fail to match a class.
+
+6. Repeated classes work correctly on multiple characters.
+
+7. Classes containing just a single character whose value is greater than 127
+(but less than 256), for example, [\\x80] or [^\\x{93}], do not work because
+these are optimized into single byte matches. In the first case, of course,
+the class brackets are just redundant.
+
+8. Lookbehind assertions move backwards in the subject by a fixed number of
+characters instead of a fixed number of bytes. Simple cases have been tested
+to work correctly, but there may be hidden gotchas herein.
+
+9. The character types such as \\d and \\w do not work correctly with UTF-8
+characters. They continue to test a single byte.
+
+10. Anything not explicitly mentioned here continues to work in bytes rather
+than in characters.
+
+The following UTF-8 features of Perl 5.6 are not implemented:
+
+1. The escape sequence \\C to match a single byte.
+
+2. The use of Unicode tables and properties and escapes \\p, \\P, and \\X.
+
+
+.SH SAMPLE PROGRAM
+The code below is a simple, complete demonstration program, to get you started
+with using PCRE. This code is also supplied in the file \fIpcredemo.c\fR in the
+PCRE distribution.
+
+The program compiles the regular expression that is its first argument, and
+matches it against the subject string in its second argument. No options are
+set, and default character tables are used. If matching succeeds, the program
+outputs the portion of the subject that matched, together with the contents of
+any captured substrings.
+
+On a Unix system that has PCRE installed in \fI/usr/local\fR, you can compile
+the demonstration program using a command like this:
+
+ gcc -o pcredemo pcredemo.c -I/usr/local/include -L/usr/local/lib -lpcre
+
+Then you can run simple tests like this:
+
+ ./pcredemo 'cat|dog' 'the cat sat on the mat'
+
+Note that there is a much more comprehensive test program, called
+\fBpcretest\fR, which supports many more facilities for testing regular
+expressions. The \fBpcredemo\fR program is provided as a simple coding example.
+
+On some operating systems (e.g. Solaris) you may get an error like this when
+you try to run \fBpcredemo\fR:
+
+ ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
+
+This is caused by the way shared library support works on those systems. You
+need to add
+
+ -R/usr/local/lib
+
+to the compile command to get round this problem. Here's the code:
+
+ #include <stdio.h>
+ #include <string.h>
+ #include <pcre.h>
+
+ #define OVECCOUNT 30 /* should be a multiple of 3 */
+
+ int main(int argc, char **argv)
+ {
+ pcre *re;
+ const char *error;
+ int erroffset;
+ int ovector[OVECCOUNT];
+ int rc, i;
+
+ if (argc != 3)
+ {
+ printf("Two arguments required: a regex and a "
+ "subject string\\n");
+ return 1;
+ }
+
+ /* Compile the regular expression in the first argument */
+
+ re = pcre_compile(
+ argv[1], /* the pattern */
+ 0, /* default options */
+ &error, /* for error message */
+ &erroffset, /* for error offset */
+ NULL); /* use default character tables */
+
+ /* Compilation failed: print the error message and exit */
+
+ if (re == NULL)
+ {
+ printf("PCRE compilation failed at offset %d: %s\\n",
+ erroffset, error);
+ return 1;
+ }
+
+ /* Compilation succeeded: match the subject in the second
+ argument */
+
+ rc = pcre_exec(
+ re, /* the compiled pattern */
+ NULL, /* we didn't study the pattern */
+ argv[2], /* the subject string */
+ (int)strlen(argv[2]), /* the length of the subject */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* vector for substring information */
+ OVECCOUNT); /* number of elements in the vector */
+
+ /* Matching failed: handle error cases */
+
+ if (rc < 0)
+ {
+ switch(rc)
+ {
+ case PCRE_ERROR_NOMATCH: printf("No match\\n"); break;
+ /*
+ Handle other special cases if you like
+ */
+ default: printf("Matching error %d\\n", rc); break;
+ }
+ return 1;
+ }
+
+ /* Match succeded */
+
+ printf("Match succeeded\\n");
+
+ /* The output vector wasn't big enough */
+
+ if (rc == 0)
+ {
+ rc = OVECCOUNT/3;
+ printf("ovector only has room for %d captured "
+ substrings\\n", rc - 1);
+ }
+
+ /* Show substrings stored in the output vector */
+
+ for (i = 0; i < rc; i++)
+ {
+ char *substring_start = argv[2] + ovector[2*i];
+ int substring_length = ovector[2*i+1] - ovector[2*i];
+ printf("%2d: %.*s\\n", i, substring_length,
+ substring_start);
+ }
+
+ return 0;
+ }
+
+
.SH AUTHOR
Philip Hazel <ph10@cam.ac.uk>
.br
@@ -1697,6 +1986,6 @@ Cambridge CB2 3QG, England.
.br
Phone: +44 1223 334714
-Last updated: 27 January 2000
+Last updated: 15 August 2001
.br
-Copyright (c) 1997-2000 University of Cambridge.
+Copyright (c) 1997-2001 University of Cambridge.
diff --git a/srclib/pcre/doc/pcre.html b/srclib/pcre/doc/pcre.html
index 2ce289007d..3e9eb36b68 100644
--- a/srclib/pcre/doc/pcre.html
+++ b/srclib/pcre/doc/pcre.html
@@ -37,7 +37,9 @@ conversion went wrong.
<LI><A NAME="TOC27" HREF="#SEC27">COMMENTS</A>
<LI><A NAME="TOC28" HREF="#SEC28">RECURSIVE PATTERNS</A>
<LI><A NAME="TOC29" HREF="#SEC29">PERFORMANCE</A>
-<LI><A NAME="TOC30" HREF="#SEC30">AUTHOR</A>
+<LI><A NAME="TOC30" HREF="#SEC30">UTF-8 SUPPORT</A>
+<LI><A NAME="TOC31" HREF="#SEC31">SAMPLE PROGRAM</A>
+<LI><A NAME="TOC32" HREF="#SEC32">AUTHOR</A>
</UL>
<LI><A NAME="SEC1" HREF="#TOC1">NAME</A>
<P>
@@ -76,6 +78,12 @@ pcre - Perl-compatible regular expressions.
<B>int *<I>ovector</I>, int <I>stringcount</I>, const char ***<I>listptr</I>);</B>
</P>
<P>
+<B>void pcre_free_substring(const char *<I>stringptr</I>);</B>
+</P>
+<P>
+<B>void pcre_free_substring_list(const char **<I>stringptr</I>);</B>
+</P>
+<P>
<B>const unsigned char *pcre_maketables(void);</B>
</P>
<P>
@@ -100,7 +108,9 @@ pcre - Perl-compatible regular expressions.
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl 5, with just a few
differences (see below). The current implementation corresponds to Perl 5.005,
-with some additional features from the Perl development release.
+with some additional features from later versions. This includes some
+experimental, incomplete support for UTF-8 encoded strings. Details of exactly
+what is and what is not supported are given below.
</P>
<P>
PCRE has its own native API, which is described in this document. There is also
@@ -117,12 +127,20 @@ use these to include support for different releases.
</P>
<P>
The functions <B>pcre_compile()</B>, <B>pcre_study()</B>, and <B>pcre_exec()</B>
-are used for compiling and matching regular expressions, while
-<B>pcre_copy_substring()</B>, <B>pcre_get_substring()</B>, and
+are used for compiling and matching regular expressions. A sample program that
+demonstrates the simplest way of using them is given in the file
+<I>pcredemo.c</I>. The last section of this man page describes how to run it.
+</P>
+<P>
+The functions <B>pcre_copy_substring()</B>, <B>pcre_get_substring()</B>, and
<B>pcre_get_substring_list()</B> are convenience functions for extracting
-captured substrings from a matched subject string. The function
-<B>pcre_maketables()</B> is used (optionally) to build a set of character tables
-in the current locale for passing to <B>pcre_compile()</B>.
+captured substrings from a matched subject string; <B>pcre_free_substring()</B>
+and <B>pcre_free_substring_list()</B> are also provided, to free the memory used
+for extracted strings.
+</P>
+<P>
+The function <B>pcre_maketables()</B> is used (optionally) to build a set of
+character tables in the current locale for passing to <B>pcre_compile()</B>.
</P>
<P>
The function <B>pcre_fullinfo()</B> is used to find out information about a
@@ -153,11 +171,16 @@ the same compiled pattern can safely be used by several threads at once.
The function <B>pcre_compile()</B> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <I>pattern</I>. A pointer to a single block of memory
-that is obtained via <B>pcre_malloc</B> is returned. This contains the
-compiled code and related data. The <B>pcre</B> type is defined for this for
-convenience, but in fact <B>pcre</B> is just a typedef for <B>void</B>, since the
-contents of the block are not externally defined. It is up to the caller to
-free the memory when it is no longer required.
+that is obtained via <B>pcre_malloc</B> is returned. This contains the compiled
+code and related data. The <B>pcre</B> type is defined for the returned block;
+this is a typedef for a structure whose contents are not externally defined. It
+is up to the caller to free the memory when it is no longer required.
+</P>
+<P>
+Although the compiled code of a PCRE regex is relocatable, that is, it does not
+depend on memory location, the complete <B>pcre</B> data block is not
+fully relocatable, because it contains a copy of the <I>tableptr</I> argument,
+which is an address (see below).
</P>
<P>
The size of a compiled pattern is roughly proportional to the length of the
@@ -191,6 +214,22 @@ locale. Otherwise, <I>tableptr</I> must be the result of a call to
<B>pcre_maketables()</B>. See the section on locale support below.
</P>
<P>
+This code fragment shows a typical straightforward call to <B>pcre_compile()</B>:
+</P>
+<P>
+<PRE>
+ pcre *re;
+ const char *error;
+ int erroffset;
+ re = pcre_compile(
+ "^A.*Z", /* the pattern */
+ 0, /* default options */
+ &error, /* for error message */
+ &erroffset, /* for error offset */
+ NULL); /* use default character tables */
+</PRE>
+</P>
+<P>
The following option bits are defined in the header file:
</P>
<P>
@@ -297,15 +336,27 @@ This option inverts the "greediness" of the quantifiers so that they are not
greedy by default, but become greedy if followed by "?". It is not compatible
with Perl. It can also be set by a (?U) option setting within the pattern.
</P>
+<P>
+<PRE>
+ PCRE_UTF8
+</PRE>
+</P>
+<P>
+This option causes PCRE to regard both the pattern and the subject as strings
+of UTF-8 characters instead of just byte strings. However, it is available only
+if PCRE has been built to include UTF-8 support. If not, the use of this option
+provokes an error. Support for UTF-8 is new, experimental, and incomplete.
+Details of exactly what it entails are given below.
+</P>
<LI><A NAME="SEC6" HREF="#TOC1">STUDYING A PATTERN</A>
<P>
When a pattern is going to be used several times, it is worth spending more
time analyzing it in order to speed up the time taken for matching. The
function <B>pcre_study()</B> takes a pointer to a compiled pattern as its first
-argument, and returns a pointer to a <B>pcre_extra</B> block (another <B>void</B>
-typedef) containing additional information about the pattern; this can be
-passed to <B>pcre_exec()</B>. If no additional information is available, NULL
-is returned.
+argument, and returns a pointer to a <B>pcre_extra</B> block (another typedef
+for a structure with hidden contents) containing additional information about
+the pattern; this can be passed to <B>pcre_exec()</B>. If no additional
+information is available, NULL is returned.
</P>
<P>
The second argument contains option bits. At present, no options are defined
@@ -317,6 +368,18 @@ studying succeeds (even if no data is returned), the variable it points to is
set to NULL. Otherwise it points to a textual error message.
</P>
<P>
+This is a typical call to <B>pcre_study</B>():
+</P>
+<P>
+<PRE>
+ pcre_extra *pe;
+ pe = pcre_study(
+ re, /* result of pcre_compile() */
+ 0, /* no options exist */
+ &error); /* set to NULL or points to a message */
+</PRE>
+</P>
+<P>
At present, studying a pattern is useful only for non-anchored patterns that do
not have a single fixed starting character. A bitmap of possible starting
characters is created.
@@ -376,6 +439,21 @@ the following negative numbers:
</PRE>
</P>
<P>
+Here is a typical call of <B>pcre_fullinfo()</B>, to obtain the length of the
+compiled pattern:
+</P>
+<P>
+<PRE>
+ int rc;
+ unsigned long int length;
+ rc = pcre_fullinfo(
+ re, /* result of pcre_compile() */
+ pe, /* result of pcre_study(), or NULL */
+ PCRE_INFO_SIZE, /* what is required */
+ &length); /* where to put the data */
+</PRE>
+</P>
+<P>
The possible values for the third argument are defined in <B>pcre.h</B>, and are
as follows:
</P>
@@ -386,7 +464,7 @@ as follows:
</P>
<P>
Return a copy of the options with which the pattern was compiled. The fourth
-argument should point to au <B>unsigned long int</B> variable. These option bits
+argument should point to an <B>unsigned long int</B> variable. These option bits
are those specified in the call to <B>pcre_compile()</B>, modified by any
top-level option settings within the pattern itself, and with the PCRE_ANCHORED
bit forcibly set if the form of the pattern implies that it can match only at
@@ -430,7 +508,7 @@ no back references.
<P>
Return information about the first character of any matched string, for a
non-anchored pattern. If there is a fixed first character, e.g. from a pattern
-such as (cat|cow|coyote), then it is returned in the integer pointed to by
+such as (cat|cow|coyote), it is returned in the integer pointed to by
<I>where</I>. Otherwise, if either
</P>
<P>
@@ -442,9 +520,9 @@ starts with "^", or
(if it were set, the pattern would be anchored),
</P>
<P>
-then -1 is returned, indicating that the pattern matches only at the
-start of a subject string or after any "\n" within the string. Otherwise -2 is
-returned. For anchored patterns, -2 is returned.
+-1 is returned, indicating that the pattern matches only at the start of a
+subject string or after any "\n" within the string. Otherwise -2 is returned.
+For anchored patterns, -2 is returned.
</P>
<P>
<PRE>
@@ -501,6 +579,24 @@ pattern has been studied, the result of the study should be passed in the
<I>extra</I> argument. Otherwise this must be NULL.
</P>
<P>
+Here is an example of a simple call to <B>pcre_exec()</B>:
+</P>
+<P>
+<PRE>
+ int rc;
+ int ovector[30];
+ rc = pcre_exec(
+ re, /* result of pcre_compile() */
+ NULL, /* we didn't study the pattern */
+ "some string", /* the subject string */
+ 11, /* the length of the subject string */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* vector for substring information */
+ 30); /* number of elements in the vector */
+</PRE>
+</P>
+<P>
The PCRE_ANCHORED option can be passed in the <I>options</I> argument, whose
unused bits must be zero. However, if a pattern was compiled with
PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it
@@ -561,9 +657,9 @@ below) and trying an ordinary match again.
<P>
The subject string is passed as a pointer in <I>subject</I>, a length in
<I>length</I>, and a starting offset in <I>startoffset</I>. Unlike the pattern
-string, it may contain binary zero characters. When the starting offset is
-zero, the search for a match starts at the beginning of the subject, and this
-is by far the most common case.
+string, the subject may contain binary zero characters. When the starting
+offset is zero, the search for a match starts at the beginning of the subject,
+and this is by far the most common case.
</P>
<P>
A non-zero starting offset is useful when searching for another match in the
@@ -734,8 +830,8 @@ is a pointer to the vector of integer offsets that was passed to
were captured by the match, including the substring that matched the entire
regular expression. This is the value returned by <B>pcre_exec</B> if it
is greater than zero. If <B>pcre_exec()</B> returned zero, indicating that it
-ran out of space in <I>ovector</I>, then the value passed as
-<I>stringcount</I> should be the size of the vector divided by three.
+ran out of space in <I>ovector</I>, the value passed as <I>stringcount</I> should
+be the size of the vector divided by three.
</P>
<P>
The functions <B>pcre_copy_substring()</B> and <B>pcre_get_substring()</B>
@@ -743,7 +839,7 @@ extract a single substring, whose number is given as <I>stringnumber</I>. A
value of zero extracts the substring that matched the entire pattern, while
higher values extract the captured substrings. For <B>pcre_copy_substring()</B>,
the string is placed in <I>buffer</I>, whose length is given by
-<I>buffersize</I>, while for <B>pcre_get_substring()</B> a new block of store is
+<I>buffersize</I>, while for <B>pcre_get_substring()</B> a new block of memory is
obtained via <B>pcre_malloc</B>, and its address is returned via
<I>stringptr</I>. The yield of the function is the length of the string, not
including the terminating zero, or one of
@@ -789,14 +885,26 @@ string. This can be distinguished from a genuine zero-length substring by
inspecting the appropriate offset in <I>ovector</I>, which is negative for unset
substrings.
</P>
+<P>
+The two convenience functions <B>pcre_free_substring()</B> and
+<B>pcre_free_substring_list()</B> can be used to free the memory returned by
+a previous call of <B>pcre_get_substring()</B> or
+<B>pcre_get_substring_list()</B>, respectively. They do nothing more than call
+the function pointed to by <B>pcre_free</B>, which of course could be called
+directly from a C program. However, PCRE is used in some situations where it is
+linked via a special interface to another programming language which cannot use
+<B>pcre_free</B> directly; it is for these cases that the functions are
+provided.
+</P>
<LI><A NAME="SEC11" HREF="#TOC1">LIMITATIONS</A>
<P>
There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
The maximum length of a compiled pattern is 65539 (sic) bytes.
All values in repeating quantifiers must be less than 65536.
-The maximum number of capturing subpatterns is 99.
-The maximum number of all parenthesized subpatterns, including capturing
+There maximum number of capturing subpatterns is 65535.
+There is no limit to the number of non-capturing subpatterns, but the maximum
+depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.
</P>
<P>
@@ -857,7 +965,7 @@ patterns using the non-Perl item (?R).
with the settings of captured strings when part of a pattern is repeated. For
example, matching "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
"b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if
-the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) get set.
+the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) are set.
</P>
<P>
In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the
@@ -908,8 +1016,15 @@ The syntax and semantics of the regular expressions supported by PCRE are
described below. Regular expressions are also described in the Perl
documentation and in a number of other books, some of which have copious
examples. Jeffrey Friedl's "Mastering Regular Expressions", published by
-O'Reilly (ISBN 1-56592-257), covers them in great detail. The description
-here is intended as reference documentation.
+O'Reilly (ISBN 1-56592-257), covers them in great detail.
+</P>
+<P>
+The description here is intended as reference documentation. The basic
+operation of PCRE is on strings of bytes. However, there is the beginnings of
+some support for UTF-8 character strings. To use this support you must
+configure PCRE to include it, and then call <B>pcre_compile()</B> with the
+PCRE_UTF8 option. How this affects the pattern matching is described in the
+final section of this document.
</P>
<P>
A regular expression is a pattern that is matched against a subject string from
@@ -1180,16 +1295,16 @@ PCRE_MULTILINE is set.
<P>
Note that the sequences \A, \Z, and \z can be used to match the start and
end of the subject in both modes, and if all branches of a pattern start with
-\A is it always anchored, whether PCRE_MULTILINE is set or not.
+\A it is always anchored, whether PCRE_MULTILINE is set or not.
</P>
<LI><A NAME="SEC16" HREF="#TOC1">FULL STOP (PERIOD, DOT)</A>
<P>
Outside a character class, a dot in the pattern matches any one character in
the subject, including a non-printing character, but not (by default) newline.
-If the PCRE_DOTALL option is set, then dots match newlines as well. The
-handling of dot is entirely independent of the handling of circumflex and
-dollar, the only relationship being that they both involve newline characters.
-Dot has no special meaning in a character class.
+If the PCRE_DOTALL option is set, dots match newlines as well. The handling of
+dot is entirely independent of the handling of circumflex and dollar, the only
+relationship being that they both involve newline characters. Dot has no
+special meaning in a character class.
</P>
<LI><A NAME="SEC17" HREF="#TOC1">SQUARE BRACKETS</A>
<P>
@@ -1305,7 +1420,7 @@ negation, which is indicated by a ^ character after the colon. For example,
</PRE>
</P>
<P>
-matches "1", "2", or any non-digit. PCRE (and Perl) also recogize the POSIX
+matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
supported, and an error is given if they are encountered.
</P>
@@ -1437,7 +1552,7 @@ For example, if the string "the red king" is matched against the pattern
</P>
<P>
the captured substrings are "red king", "red", and "king", and are numbered 1,
-2, and 3.
+2, and 3, respectively.
</P>
<P>
The fact that plain parentheses fulfil two functions is not always helpful.
@@ -1576,11 +1691,11 @@ to the string
</PRE>
</P>
<P>
-fails, because it matches the entire string due to the greediness of the .*
+fails, because it matches the entire string owing to the greediness of the .*
item.
</P>
<P>
-However, if a quantifier is followed by a question mark, then it ceases to be
+However, if a quantifier is followed by a question mark, it ceases to be
greedy, and instead matches the minimum number of times possible, so the
pattern
</P>
@@ -1605,8 +1720,8 @@ which matches one digit by preference, but can match two if that is the only
way the rest of the pattern matches.
</P>
<P>
-If the PCRE_UNGREEDY option is set (an option which is not available in Perl)
-then the quantifiers are not greedy by default, but individual ones can be made
+If the PCRE_UNGREEDY option is set (an option which is not available in Perl),
+the quantifiers are not greedy by default, but individual ones can be made
greedy by following them with a question mark. In other words, it inverts the
default behaviour.
</P>
@@ -1617,8 +1732,8 @@ compiled pattern, in proportion to the size of the minimum or maximum.
</P>
<P>
If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent
-to Perl's /s) is set, thus allowing the . to match newlines, then the pattern
-is implicitly anchored, because whatever follows will be tried against every
+to Perl's /s) is set, thus allowing the . to match newlines, the pattern is
+implicitly anchored, because whatever follows will be tried against every
character position in the subject string, so there is no point in retrying the
overall match at any position after the first. PCRE treats such a pattern as
though it were preceded by \A. In cases where it is known that the subject
@@ -1677,7 +1792,7 @@ itself. So the pattern
<P>
matches "sense and sensibility" and "response and responsibility", but not
"sense and responsibility". If caseful matching is in force at the time of the
-back reference, then the case of letters is relevant. For example,
+back reference, the case of letters is relevant. For example,
</P>
<P>
<PRE>
@@ -1690,7 +1805,7 @@ capturing subpattern is matched caselessly.
</P>
<P>
There may be more than one back reference to the same subpattern. If a
-subpattern has not actually been used in a particular match, then any back
+subpattern has not actually been used in a particular match, any back
references to it always fail. For example, the pattern
</P>
<P>
@@ -1702,9 +1817,9 @@ references to it always fail. For example, the pattern
always fails if it starts to match "a" rather than "bc". Because there may be
up to 99 back references, all digits following the backslash are taken
as part of a potential back reference number. If the pattern continues with a
-digit character, then some delimiter must be used to terminate the back
-reference. If the PCRE_EXTENDED option is set, this can be whitespace.
-Otherwise an empty comment can be used.
+digit character, some delimiter must be used to terminate the back reference.
+If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty
+comment can be used.
</P>
<P>
A back reference that occurs inside the parentheses to which it refers fails
@@ -1718,7 +1833,7 @@ example, the pattern
</PRE>
</P>
<P>
-matches any number of "a"s and also "aba", "ababaa" etc. At each iteration of
+matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
the subpattern, the back reference matches the character string corresponding
to the previous iteration. In order for this to work, the pattern must be such
that the first iteration does not need to match the back reference. This can be
@@ -1836,7 +1951,7 @@ Several assertions (of any sort) may occur in succession. For example,
matches "foo" preceded by three digits that are not "999". Notice that each of
the assertions is applied independently at the same point in the subject
string. First there is a check that the previous three characters are all
-digits, then there is a check that the same three characters are not "999".
+digits, and then there is a check that the same three characters are not "999".
This pattern does <I>not</I> match "foo" preceded by six characters, the first
of which are digits and the last three of which are not "999". For example, it
doesn't match "123abcfoo". A pattern to do that is
@@ -1957,11 +2072,11 @@ what follows matches the rest of the pattern. If the pattern is specified as
</PRE>
</P>
<P>
-then the initial .* matches the entire string at first, but when this fails
-(because there is no following "a"), it backtracks to match all but the last
-character, then all but the last two characters, and so on. Once again the
-search for "a" covers the entire string, from right to left, so we are no
-better off. However, if the pattern is written as
+the initial .* matches the entire string at first, but when this fails (because
+there is no following "a"), it backtracks to match all but the last character,
+then all but the last two characters, and so on. Once again the search for "a"
+covers the entire string, from right to left, so we are no better off. However,
+if the pattern is written as
</P>
<P>
<PRE>
@@ -1969,7 +2084,7 @@ better off. However, if the pattern is written as
</PRE>
</P>
<P>
-then there can be no backtracking for the .* item; it can match only the entire
+there can be no backtracking for the .* item; it can match only the entire
string. The subsequent lookbehind assertion does a single test on the last four
characters. If it fails, the match fails immediately. For long strings, this
approach makes a significant difference to the processing time.
@@ -2032,11 +2147,11 @@ subpattern, a compile-time error occurs.
</P>
<P>
There are two kinds of condition. If the text between the parentheses consists
-of a sequence of digits, then the condition is satisfied if the capturing
-subpattern of that number has previously matched. Consider the following
-pattern, which contains non-significant white space to make it more readable
-(assume the PCRE_EXTENDED option) and to divide it into three parts for ease
-of discussion:
+of a sequence of digits, the condition is satisfied if the capturing subpattern
+of that number has previously matched. The number must be greater than zero.
+Consider the following pattern, which contains non-significant white space to
+make it more readable (assume the PCRE_EXTENDED option) and to divide it into
+three parts for ease of discussion:
</P>
<P>
<PRE>
@@ -2157,7 +2272,7 @@ on at the top level. If additional parentheses are added, giving
^ ^
^ ^
</PRE>
-then the string they capture is "ab(cd)ef", the contents of the top level
+the string they capture is "ab(cd)ef", the contents of the top level
parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
has to obtain extra memory to store data during a recursion, which it does by
using <B>pcre_malloc</B>, freeing it via <B>pcre_free</B> afterwards. If no
@@ -2241,7 +2356,302 @@ with the pattern above. The former gives a failure almost instantly when
applied to a whole line of "a" characters, whereas the latter takes an
appreciable time with strings longer than about 20 characters.
</P>
-<LI><A NAME="SEC30" HREF="#TOC1">AUTHOR</A>
+<LI><A NAME="SEC30" HREF="#TOC1">UTF-8 SUPPORT</A>
+<P>
+Starting at release 3.3, PCRE has some support for character strings encoded
+in the UTF-8 format. This is incomplete, and is regarded as experimental. In
+order to use it, you must configure PCRE to include UTF-8 support in the code,
+and, in addition, you must call <B>pcre_compile()</B> with the PCRE_UTF8 option
+flag. When you do this, both the pattern and any subject strings that are
+matched against it are treated as UTF-8 strings instead of just strings of
+bytes, but only in the cases that are mentioned below.
+</P>
+<P>
+If you compile PCRE with UTF-8 support, but do not use it at run time, the
+library will be a bit bigger, but the additional run time overhead is limited
+to testing the PCRE_UTF8 flag in several places, so should not be very large.
+</P>
+<P>
+PCRE assumes that the strings it is given contain valid UTF-8 codes. It does
+not diagnose invalid UTF-8 strings. If you pass invalid UTF-8 strings to PCRE,
+the results are undefined.
+</P>
+<P>
+Running with PCRE_UTF8 set causes these changes in the way PCRE works:
+</P>
+<P>
+1. In a pattern, the escape sequence \x{...}, where the contents of the braces
+is a string of hexadecimal digits, is interpreted as a UTF-8 character whose
+code number is the given hexadecimal number, for example: \x{1234}. This
+inserts from one to six literal bytes into the pattern, using the UTF-8
+encoding. If a non-hexadecimal digit appears between the braces, the item is
+not recognized.
+</P>
+<P>
+2. The original hexadecimal escape sequence, \xhh, generates a two-byte UTF-8
+character if its value is greater than 127.
+</P>
+<P>
+3. Repeat quantifiers are NOT correctly handled if they follow a multibyte
+character. For example, \x{100}* and \xc3+ do not work. If you want to
+repeat such characters, you must enclose them in non-capturing parentheses,
+for example (?:\x{100}), at present.
+</P>
+<P>
+4. The dot metacharacter matches one UTF-8 character instead of a single byte.
+</P>
+<P>
+5. Unlike literal UTF-8 characters, the dot metacharacter followed by a
+repeat quantifier does operate correctly on UTF-8 characters instead of
+single bytes.
+</P>
+<P>
+4. Although the \x{...} escape is permitted in a character class, characters
+whose values are greater than 255 cannot be included in a class.
+</P>
+<P>
+5. A class is matched against a UTF-8 character instead of just a single byte,
+but it can match only characters whose values are less than 256. Characters
+with greater values always fail to match a class.
+</P>
+<P>
+6. Repeated classes work correctly on multiple characters.
+</P>
+<P>
+7. Classes containing just a single character whose value is greater than 127
+(but less than 256), for example, [\x80] or [^\x{93}], do not work because
+these are optimized into single byte matches. In the first case, of course,
+the class brackets are just redundant.
+</P>
+<P>
+8. Lookbehind assertions move backwards in the subject by a fixed number of
+characters instead of a fixed number of bytes. Simple cases have been tested
+to work correctly, but there may be hidden gotchas herein.
+</P>
+<P>
+9. The character types such as \d and \w do not work correctly with UTF-8
+characters. They continue to test a single byte.
+</P>
+<P>
+10. Anything not explicitly mentioned here continues to work in bytes rather
+than in characters.
+</P>
+<P>
+The following UTF-8 features of Perl 5.6 are not implemented:
+</P>
+<P>
+1. The escape sequence \C to match a single byte.
+</P>
+<P>
+2. The use of Unicode tables and properties and escapes \p, \P, and \X.
+</P>
+<LI><A NAME="SEC31" HREF="#TOC1">SAMPLE PROGRAM</A>
+<P>
+The code below is a simple, complete demonstration program, to get you started
+with using PCRE. This code is also supplied in the file <I>pcredemo.c</I> in the
+PCRE distribution.
+</P>
+<P>
+The program compiles the regular expression that is its first argument, and
+matches it against the subject string in its second argument. No options are
+set, and default character tables are used. If matching succeeds, the program
+outputs the portion of the subject that matched, together with the contents of
+any captured substrings.
+</P>
+<P>
+On a Unix system that has PCRE installed in <I>/usr/local</I>, you can compile
+the demonstration program using a command like this:
+</P>
+<P>
+<PRE>
+ gcc -o pcredemo pcredemo.c -I/usr/local/include -L/usr/local/lib -lpcre
+</PRE>
+</P>
+<P>
+Then you can run simple tests like this:
+</P>
+<P>
+<PRE>
+ ./pcredemo 'cat|dog' 'the cat sat on the mat'
+</PRE>
+</P>
+<P>
+Note that there is a much more comprehensive test program, called
+<B>pcretest</B>, which supports many more facilities for testing regular
+expressions. The <B>pcredemo</B> program is provided as a simple coding example.
+</P>
+<P>
+On some operating systems (e.g. Solaris) you may get an error like this when
+you try to run <B>pcredemo</B>:
+</P>
+<P>
+<PRE>
+ ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
+</PRE>
+</P>
+<P>
+This is caused by the way shared library support works on those systems. You
+need to add
+</P>
+<P>
+<PRE>
+ -R/usr/local/lib
+</PRE>
+</P>
+<P>
+to the compile command to get round this problem. Here's the code:
+</P>
+<P>
+<PRE>
+ #include &#60;stdio.h&#62;
+ #include &#60;string.h&#62;
+ #include &#60;pcre.h&#62;
+</PRE>
+</P>
+<P>
+<PRE>
+ #define OVECCOUNT 30 /* should be a multiple of 3 */
+</PRE>
+</P>
+<P>
+<PRE>
+ int main(int argc, char **argv)
+ {
+ pcre *re;
+ const char *error;
+ int erroffset;
+ int ovector[OVECCOUNT];
+ int rc, i;
+</PRE>
+</P>
+<P>
+<PRE>
+ if (argc != 3)
+ {
+ printf("Two arguments required: a regex and a "
+ "subject string\n");
+ return 1;
+ }
+</PRE>
+</P>
+<P>
+<PRE>
+ /* Compile the regular expression in the first argument */
+</PRE>
+</P>
+<P>
+<PRE>
+ re = pcre_compile(
+ argv[1], /* the pattern */
+ 0, /* default options */
+ &error, /* for error message */
+ &erroffset, /* for error offset */
+ NULL); /* use default character tables */
+</PRE>
+</P>
+<P>
+<PRE>
+ /* Compilation failed: print the error message and exit */
+</PRE>
+</P>
+<P>
+<PRE>
+ if (re == NULL)
+ {
+ printf("PCRE compilation failed at offset %d: %s\n",
+ erroffset, error);
+ return 1;
+ }
+</PRE>
+</P>
+<P>
+<PRE>
+ /* Compilation succeeded: match the subject in the second
+ argument */
+</PRE>
+</P>
+<P>
+<PRE>
+ rc = pcre_exec(
+ re, /* the compiled pattern */
+ NULL, /* we didn't study the pattern */
+ argv[2], /* the subject string */
+ (int)strlen(argv[2]), /* the length of the subject */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* vector for substring information */
+ OVECCOUNT); /* number of elements in the vector */
+</PRE>
+</P>
+<P>
+<PRE>
+ /* Matching failed: handle error cases */
+</PRE>
+</P>
+<P>
+<PRE>
+ if (rc &#60; 0)
+ {
+ switch(rc)
+ {
+ case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
+ /*
+ Handle other special cases if you like
+ */
+ default: printf("Matching error %d\n", rc); break;
+ }
+ return 1;
+ }
+</PRE>
+</P>
+<P>
+<PRE>
+ /* Match succeded */
+</PRE>
+</P>
+<P>
+<PRE>
+ printf("Match succeeded\n");
+</PRE>
+</P>
+<P>
+<PRE>
+ /* The output vector wasn't big enough */
+</PRE>
+</P>
+<P>
+<PRE>
+ if (rc == 0)
+ {
+ rc = OVECCOUNT/3;
+ printf("ovector only has room for %d captured "
+ substrings\n", rc - 1);
+ }
+</PRE>
+</P>
+<P>
+<PRE>
+ /* Show substrings stored in the output vector */
+</PRE>
+</P>
+<P>
+<PRE>
+ for (i = 0; i &#60; rc; i++)
+ {
+ char *substring_start = argv[2] + ovector[2*i];
+ int substring_length = ovector[2*i+1] - ovector[2*i];
+ printf("%2d: %.*s\n", i, substring_length,
+ substring_start);
+ }
+</PRE>
+</P>
+<P>
+<PRE>
+ return 0;
+ }
+</PRE>
+</P>
+<LI><A NAME="SEC32" HREF="#TOC1">AUTHOR</A>
<P>
Philip Hazel &#60;ph10@cam.ac.uk&#62;
<BR>
@@ -2254,6 +2664,6 @@ Cambridge CB2 3QG, England.
Phone: +44 1223 334714
</P>
<P>
-Last updated: 27 January 2000
+Last updated: 15 August 2001
<BR>
-Copyright (c) 1997-2000 University of Cambridge.
+Copyright (c) 1997-2001 University of Cambridge.
diff --git a/srclib/pcre/doc/pcre.txt b/srclib/pcre/doc/pcre.txt
index f28ee99e8b..95f148f3de 100644
--- a/srclib/pcre/doc/pcre.txt
+++ b/srclib/pcre/doc/pcre.txt
@@ -28,6 +28,10 @@ SYNOPSIS
int pcre_get_substring_list(const char *subject,
int *ovector, int stringcount, const char ***listptr);
+ void pcre_free_substring(const char *stringptr);
+
+ void pcre_free_substring_list(const char **stringptr);
+
const unsigned char *pcre_maketables(void);
int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
@@ -48,9 +52,12 @@ DESCRIPTION
The PCRE library is a set of functions that implement regu-
lar expression pattern matching using the same syntax and
semantics as Perl 5, with just a few differences (see
+
below). The current implementation corresponds to Perl
- 5.005, with some additional features from the Perl develop-
- ment release.
+ 5.005, with some additional features from later versions.
+ This includes some experimental, incomplete support for
+ UTF-8 encoded strings. Details of exactly what is and what
+ is not supported are given below.
PCRE has its own native API, which is described in this
document. There is also a set of wrapper functions that
@@ -67,13 +74,21 @@ DESCRIPTION
releases.
The functions pcre_compile(), pcre_study(), and pcre_exec()
- are used for compiling and matching regular expressions,
- while pcre_copy_substring(), pcre_get_substring(), and
- pcre_get_substring_list() are convenience functions for
+ are used for compiling and matching regular expressions. A
+ sample program that demonstrates the simplest way of using
+ them is given in the file pcredemo.c. The last section of
+ this man page describes how to run it.
+
+ The functions pcre_copy_substring(), pcre_get_substring(),
+ and pcre_get_substring_list() are convenience functions for
extracting captured substrings from a matched subject
- string. The function pcre_maketables() is used (optionally)
- to build a set of character tables in the current locale for
- passing to pcre_compile().
+ string; pcre_free_substring() and pcre_free_substring_list()
+ are also provided, to free the memory used for extracted
+ strings.
+
+ The function pcre_maketables() is used (optionally) to build
+ a set of character tables in the current locale for passing
+ to pcre_compile().
The function pcre_fullinfo() is used to find out information
about a compiled pattern; pcre_info() is an obsolete version
@@ -103,18 +118,22 @@ MULTI-THREADING
-
COMPILING A PATTERN
The function pcre_compile() is called to compile a pattern
into an internal form. The pattern is a C string terminated
by a binary zero, and is passed in the argument pattern. A
pointer to a single block of memory that is obtained via
pcre_malloc is returned. This contains the compiled code and
- related data. The pcre type is defined for this for conveni-
- ence, but in fact pcre is just a typedef for void, since the
- contents of the block are not externally defined. It is up
- to the caller to free the memory when it is no longer
- required.
+ related data. The pcre type is defined for the returned
+ block; this is a typedef for a structure whose contents are
+ not externally defined. It is up to the caller to free the
+ memory when it is no longer required.
+
+ Although the compiled code of a PCRE regex is relocatable,
+ that is, it does not depend on memory location, the complete
+ pcre data block is not fully relocatable, because it con-
+ tains a copy of the tableptr argument, which is an address
+ (see below).
The size of a compiled pattern is roughly proportional to
the length of the pattern string, except that each character
@@ -149,6 +168,19 @@ COMPILING A PATTERN
must be the result of a call to pcre_maketables(). See the
section on locale support below.
+ This code fragment shows a typical straightforward call to
+ pcre_compile():
+
+ pcre *re;
+ const char *error;
+ int erroffset;
+ re = pcre_compile(
+ "^A.*Z", /* the pattern */
+ 0, /* default options */
+ &error, /* for error message */
+ &erroffset, /* for error offset */
+ NULL); /* use default character tables */
+
The following option bits are defined in the header file:
PCRE_ANCHORED
@@ -235,6 +267,16 @@ COMPILING A PATTERN
followed by "?". It is not compatible with Perl. It can also
be set by a (?U) option setting within the pattern.
+ PCRE_UTF8
+
+ This option causes PCRE to regard both the pattern and the
+ subject as strings of UTF-8 characters instead of just byte
+ strings. However, it is available only if PCRE has been
+ built to include UTF-8 support. If not, the use of this
+ option provokes an error. Support for UTF-8 is new, experi-
+ mental, and incomplete. Details of exactly what it entails
+ are given below.
+
STUDYING A PATTERN
@@ -242,10 +284,11 @@ STUDYING A PATTERN
worth spending more time analyzing it in order to speed up
the time taken for matching. The function pcre_study() takes
a pointer to a compiled pattern as its first argument, and
- returns a pointer to a pcre_extra block (another void
- typedef) containing additional information about the pat-
- tern; this can be passed to pcre_exec(). If no additional
- information is available, NULL is returned.
+ returns a pointer to a pcre_extra block (another typedef for
+ a structure with hidden contents) containing additional
+ information about the pattern; this can be passed to
+ pcre_exec(). If no additional information is available, NULL
+ is returned.
The second argument contains option bits. At present, no
options are defined for pcre_study(), and this argument
@@ -256,6 +299,14 @@ STUDYING A PATTERN
the variable it points to is set to NULL. Otherwise it
points to a textual error message.
+ This is a typical call to pcre_study():
+
+ pcre_extra *pe;
+ pe = pcre_study(
+ re, /* result of pcre_compile() */
+ 0, /* no options exist */
+ &error); /* set to NULL or points to a message */
+
At present, studying a pattern is useful only for non-
anchored patterns that do not have a single fixed starting
character. A bitmap of possible starting characters is
@@ -316,13 +367,24 @@ INFORMATION ABOUT A PATTERN
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of what was invalid
+ Here is a typical call of pcre_fullinfo(), to obtain the
+ length of the compiled pattern:
+
+ int rc;
+ unsigned long int length;
+ rc = pcre_fullinfo(
+ re, /* result of pcre_compile() */
+ pe, /* result of pcre_study(), or NULL */
+ PCRE_INFO_SIZE, /* what is required */
+ &length); /* where to put the data */
+
The possible values for the third argument are defined in
pcre.h, and are as follows:
PCRE_INFO_OPTIONS
Return a copy of the options with which the pattern was com-
- piled. The fourth argument should point to au unsigned long
+ piled. The fourth argument should point to an unsigned long
int variable. These option bits are those specified in the
call to pcre_compile(), modified by any top-level option
settings within the pattern itself, and with the
@@ -353,8 +415,8 @@ INFORMATION ABOUT A PATTERN
Return information about the first character of any matched
string, for a non-anchored pattern. If there is a fixed
first character, e.g. from a pattern such as
- (cat|cow|coyote), then it is returned in the integer pointed
- to by where. Otherwise, if either
+ (cat|cow|coyote), it is returned in the integer pointed to
+ by where. Otherwise, if either
(a) the pattern was compiled with the PCRE_MULTILINE option,
and every branch starts with "^", or
@@ -363,10 +425,10 @@ INFORMATION ABOUT A PATTERN
PCRE_DOTALL is not set (if it were set, the pattern would be
anchored),
- then -1 is returned, indicating that the pattern matches
- only at the start of a subject string or after any "\n"
- within the string. Otherwise -2 is returned. For anchored
- patterns, -2 is returned.
+ -1 is returned, indicating that the pattern matches only at
+ the start of a subject string or after any "\n" within the
+ string. Otherwise -2 is returned. For anchored patterns, -2
+ is returned.
PCRE_INFO_FIRSTTABLE
@@ -409,11 +471,34 @@ INFORMATION ABOUT A PATTERN
MATCHING A PATTERN
The function pcre_exec() is called to match a subject string
+
+
+
+
+
+SunOS 5.8 Last change: 9
+
+
+
against a pre-compiled pattern, which is passed in the code
argument. If the pattern has been studied, the result of the
study should be passed in the extra argument. Otherwise this
must be NULL.
+ Here is an example of a simple call to pcre_exec():
+
+ int rc;
+ int ovector[30];
+ rc = pcre_exec(
+ re, /* result of pcre_compile() */
+ NULL, /* we didn't study the pattern */
+ "some string", /* the subject string */
+ 11, /* the length of the subject string */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* vector for substring information */
+ 30); /* number of elements in the vector */
+
The PCRE_ANCHORED option can be passed in the options argu-
ment, whose unused bits must be zero. However, if a pattern
was compiled with PCRE_ANCHORED, or turned out to be
@@ -464,10 +549,10 @@ MATCHING A PATTERN
The subject string is passed as a pointer in subject, a
length in length, and a starting offset in startoffset.
- Unlike the pattern string, it may contain binary zero char-
- acters. When the starting offset is zero, the search for a
- match starts at the beginning of the subject, and this is by
- far the most common case.
+ Unlike the pattern string, the subject may contain binary
+ zero characters. When the starting offset is zero, the
+ search for a match starts at the beginning of the subject,
+ and this is by far the most common case.
A non-zero starting offset is useful when searching for
another match in the same subject by calling pcre_exec()
@@ -603,6 +688,7 @@ MATCHING A PATTERN
+
EXTRACTING CAPTURED SUBSTRINGS
Captured substrings can be accessed directly by using the
offsets returned by pcre_exec() in ovector. For convenience,
@@ -622,8 +708,8 @@ EXTRACTING CAPTURED SUBSTRINGS
entire regular expression. This is the value returned by
pcre_exec if it is greater than zero. If pcre_exec()
returned zero, indicating that it ran out of space in ovec-
- tor, then the value passed as stringcount should be the size
- of the vector divided by three.
+ tor, the value passed as stringcount should be the size of
+ the vector divided by three.
The functions pcre_copy_substring() and pcre_get_substring()
extract a single substring, whose number is given as string-
@@ -631,7 +717,7 @@ EXTRACTING CAPTURED SUBSTRINGS
the entire pattern, while higher values extract the captured
substrings. For pcre_copy_substring(), the string is placed
in buffer, whose length is given by buffersize, while for
- pcre_get_substring() a new block of store is obtained via
+ pcre_get_substring() a new block of memory is obtained via
pcre_malloc, and its address is returned via stringptr. The
yield of the function is the length of the string, not
including the terminating zero, or one of
@@ -665,6 +751,16 @@ EXTRACTING CAPTURED SUBSTRINGS
inspecting the appropriate offset in ovector, which is nega-
tive for unset substrings.
+ The two convenience functions pcre_free_substring() and
+ pcre_free_substring_list() can be used to free the memory
+ returned by a previous call of pcre_get_substring() or
+ pcre_get_substring_list(), respectively. They do nothing
+ more than call the function pointed to by pcre_free, which
+ of course could be called directly from a C program. How-
+ ever, PCRE is used in some situations where it is linked via
+ a special interface to another programming language which
+ cannot use pcre_free directly; it is for these cases that
+ the functions are provided.
@@ -672,10 +768,12 @@ LIMITATIONS
There are some size limitations in PCRE but it is hoped that
they will never in practice be relevant. The maximum length
of a compiled pattern is 65539 (sic) bytes. All values in
- repeating quantifiers must be less than 65536. The maximum
- number of capturing subpatterns is 99. The maximum number
- of all parenthesized subpatterns, including capturing sub-
- patterns, assertions, and other types of subpattern, is 200.
+ repeating quantifiers must be less than 65536. There max-
+ imum number of capturing subpatterns is 65535. There is no
+ limit to the number of non-capturing subpatterns, but the
+ maximum depth of nesting of all kinds of parenthesized sub-
+ pattern, including capturing subpatterns, assertions, and
+ other types of subpattern, is 200.
The maximum length of a subject string is the largest posi-
tive number that an integer variable can hold. However, PCRE
@@ -733,13 +831,14 @@ DIFFERENCES FROM PERL
(?p{code}) constructions. However, there is some experimen-
tal support for recursive patterns using the non-Perl item
(?R).
+
8. There are at the time of writing some oddities in Perl
5.005_02 concerned with the settings of captured strings
when part of a pattern is repeated. For example, matching
"aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
"b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
unset. However, if the pattern is changed to
- /^(aa(b(b))?)+$/ then $2 (and $3) get set.
+ /^(aa(b(b))?)+$/ then $2 (and $3) are set.
In Perl 5.004 $2 is set in both cases, and that is also true
of PCRE. If in the future Perl changes to a consistent state
@@ -785,11 +884,17 @@ REGULAR EXPRESSION DETAILS
The syntax and semantics of the regular expressions sup-
ported by PCRE are described below. Regular expressions are
also described in the Perl documentation and in a number of
-
other books, some of which have copious examples. Jeffrey
Friedl's "Mastering Regular Expressions", published by
- O'Reilly (ISBN 1-56592-257), covers them in great detail.
+ O'Reilly (ISBN 1-56592-257), covers them in great detail.
+
The description here is intended as reference documentation.
+ The basic operation of PCRE is on strings of bytes. However,
+ there is the beginnings of some support for UTF-8 character
+ strings. To use this support you must configure PCRE to
+ include it, and then call pcre_compile() with the PCRE_UTF8
+ option. How this affects the pattern matching is described
+ in the final section of this document.
A regular expression is a pattern that is matched against a
subject string from left to right. Most characters stand for
@@ -844,6 +949,7 @@ BACKSLASH
The backslash character has several uses. Firstly, if it is
followed by a non-alphameric character, it takes away any
special meaning that character may have. This use of
+
backslash as an escape character applies both inside and
outside character classes.
@@ -1047,7 +1153,7 @@ CIRCUMFLEX AND DOLLAR
Note that the sequences \A, \Z, and \z can be used to match
the start and end of the subject in both modes, and if all
- branches of a pattern start with \A is it always anchored,
+ branches of a pattern start with \A it is always anchored,
whether PCRE_MULTILINE is set or not.
@@ -1056,11 +1162,11 @@ FULL STOP (PERIOD, DOT)
Outside a character class, a dot in the pattern matches any
one character in the subject, including a non-printing char-
acter, but not (by default) newline. If the PCRE_DOTALL
- option is set, then dots match newlines as well. The han-
- dling of dot is entirely independent of the handling of cir-
- cumflex and dollar, the only relationship being that they
- both involve newline characters. Dot has no special meaning
- in a character class.
+ option is set, dots match newlines as well. The handling of
+ dot is entirely independent of the handling of circumflex
+ and dollar, the only relationship being that they both
+ involve newline characters. Dot has no special meaning in a
+ character class.
@@ -1174,7 +1280,7 @@ POSIX CHARACTER CLASSES
[12[:^digit:]]
matches "1", "2", or any non-digit. PCRE (and Perl) also
- recogize the POSIX syntax [.ch.] and [=ch=] where "ch" is a
+ recognize the POSIX syntax [.ch.] and [=ch=] where "ch" is a
"collating element", but these are not supported, and an
error is given if they are encountered.
@@ -1293,7 +1399,7 @@ SUBPATTERNS
the ((red|white) (king|queen))
the captured substrings are "red king", "red", and "king",
- and are numbered 1, 2, and 3.
+ and are numbered 1, 2, and 3, respectively.
The fact that plain parentheses fulfil two functions is not
always helpful. There are often times when a grouping sub-
@@ -1364,7 +1470,6 @@ REPETITION
one that does not match the syntax of a quantifier, is taken
as a literal character. For example, {,6} is not a quantif-
ier, but a literal string of four characters.
-
The quantifier {0} is permitted, causing the expression to
behave as if the previous item and the quantifier were not
present.
@@ -1403,12 +1508,12 @@ REPETITION
/* first command */ not comment /* second comment */
- fails, because it matches the entire string due to the
+ fails, because it matches the entire string owing to the
greediness of the .* item.
- However, if a quantifier is followed by a question mark,
- then it ceases to be greedy, and instead matches the minimum
- number of times possible, so the pattern
+ However, if a quantifier is followed by a question mark, it
+ ceases to be greedy, and instead matches the minimum number
+ of times possible, so the pattern
/\*.*?\*/
@@ -1425,7 +1530,7 @@ REPETITION
that is the only way the rest of the pattern matches.
If the PCRE_UNGREEDY option is set (an option which is not
- available in Perl) then the quantifiers are not greedy by
+ available in Perl), the quantifiers are not greedy by
default, but individual ones can be made greedy by following
them with a question mark. In other words, it inverts the
default behaviour.
@@ -1437,7 +1542,7 @@ REPETITION
If a pattern starts with .* or .{0,} and the PCRE_DOTALL
option (equivalent to Perl's /s) is set, thus allowing the .
- to match newlines, then the pattern is implicitly anchored,
+ to match newlines, the pattern is implicitly anchored,
because whatever follows will be tried against every charac-
ter position in the subject string, so there is no point in
retrying the overall match at any position after the first.
@@ -1469,6 +1574,14 @@ REPETITION
BACK REFERENCES
Outside a character class, a backslash followed by a digit
greater than 0 (and possibly further digits) is a back
+
+
+
+
+SunOS 5.8 Last change: 30
+
+
+
reference to a capturing subpattern earlier (i.e. to its
left) in the pattern, provided there have been that many
previous capturing left parentheses.
@@ -1490,8 +1603,8 @@ BACK REFERENCES
matches "sense and sensibility" and "response and responsi-
bility", but not "sense and responsibility". If caseful
- matching is in force at the time of the back reference, then
- the case of letters is relevant. For example,
+ matching is in force at the time of the back reference, the
+ case of letters is relevant. For example,
((?i)rah)\s+\1
@@ -1501,8 +1614,8 @@ BACK REFERENCES
There may be more than one back reference to the same sub-
pattern. If a subpattern has not actually been used in a
- particular match, then any back references to it always
- fail. For example, the pattern
+ particular match, any back references to it always fail. For
+ example, the pattern
(a|(bc))\2
@@ -1510,19 +1623,19 @@ BACK REFERENCES
Because there may be up to 99 back references, all digits
following the backslash are taken as part of a potential
back reference number. If the pattern continues with a digit
- character, then some delimiter must be used to terminate the
- back reference. If the PCRE_EXTENDED option is set, this can
- be whitespace. Otherwise an empty comment can be used.
+ character, some delimiter must be used to terminate the back
+ reference. If the PCRE_EXTENDED option is set, this can be
+ whitespace. Otherwise an empty comment can be used.
A back reference that occurs inside the parentheses to which
it refers fails when the subpattern is first used, so, for
example, (a\1) never matches. However, such references can
- be useful inside repeated subpatterns. For example, the
- pattern
+ be useful inside repeated subpatterns. For example, the pat-
+ tern
(a|b\1)+
- matches any number of "a"s and also "aba", "ababaa" etc. At
+ matches any number of "a"s and also "aba", "ababbaa" etc. At
each iteration of the subpattern, the back reference matches
the character string corresponding to the previous itera-
tion. In order for this to work, the pattern must be such
@@ -1612,7 +1725,7 @@ ASSERTIONS
matches "foo" preceded by three digits that are not "999".
Notice that each of the assertions is applied independently
at the same point in the subject string. First there is a
- check that the previous three characters are all digits,
+ check that the previous three characters are all digits, and
then there is a check that the same three characters are not
"999". This pattern does not match "foo" preceded by six
characters, the first of which are digits and the last three
@@ -1713,21 +1826,20 @@ ONCE-ONLY SUBPATTERNS
^.*abcd$
- then the initial .* matches the entire string at first, but
- when this fails (because there is no following "a"), it
- backtracks to match all but the last character, then all but
- the last two characters, and so on. Once again the search
- for "a" covers the entire string, from right to left, so we
- are no better off. However, if the pattern is written as
+ the initial .* matches the entire string at first, but when
+ this fails (because there is no following "a"), it back-
+ tracks to match all but the last character, then all but the
+ last two characters, and so on. Once again the search for
+ "a" covers the entire string, from right to left, so we are
+ no better off. However, if the pattern is written as
^(?>.*)(?<=abcd)
- then there can be no backtracking for the .* item; it can
- match only the entire string. The subsequent lookbehind
- assertion does a single test on the last four characters. If
- it fails, the match fails immediately. For long strings,
- this approach makes a significant difference to the process-
- ing time.
+ there can be no backtracking for the .* item; it can match
+ only the entire string. The subsequent lookbehind assertion
+ does a single test on the last four characters. If it fails,
+ the match fails immediately. For long strings, this approach
+ makes a significant difference to the processing time.
When a pattern contains an unlimited repeat inside a subpat-
tern that can itself be repeated an unlimited number of
@@ -1777,12 +1889,13 @@ CONDITIONAL SUBPATTERNS
error occurs.
There are two kinds of condition. If the text between the
- parentheses consists of a sequence of digits, then the
- condition is satisfied if the capturing subpattern of that
- number has previously matched. Consider the following pat-
- tern, which contains non-significant white space to make it
- more readable (assume the PCRE_EXTENDED option) and to
- divide it into three parts for ease of discussion:
+ parentheses consists of a sequence of digits, the condition
+ is satisfied if the capturing subpattern of that number has
+ previously matched. The number must be greater than zero.
+ Consider the following pattern, which contains non-
+ significant white space to make it more readable (assume the
+ PCRE_EXTENDED option) and to divide it into three parts for
+ ease of discussion:
( \( )? [^()]+ (?(1) \) )
@@ -1888,8 +2001,8 @@ RECURSIVE PATTERNS
\( ( ( (?>[^()]+) | (?R) )* ) \)
^ ^
- ^ ^ then the string they capture
- is "ab(cd)ef", the contents of the top level parentheses. If
+ ^ ^ the string they capture is
+ "ab(cd)ef", the contents of the top level parentheses. If
there are more than 15 capturing parentheses in a pattern,
PCRE has to obtain extra memory to store data during a
recursion, which it does by using pcre_malloc, freeing it
@@ -1967,6 +2080,230 @@ PERFORMANCE
+UTF-8 SUPPORT
+ Starting at release 3.3, PCRE has some support for character
+ strings encoded in the UTF-8 format. This is incomplete, and
+ is regarded as experimental. In order to use it, you must
+ configure PCRE to include UTF-8 support in the code, and, in
+ addition, you must call pcre_compile() with the PCRE_UTF8
+ option flag. When you do this, both the pattern and any sub-
+ ject strings that are matched against it are treated as
+ UTF-8 strings instead of just strings of bytes, but only in
+ the cases that are mentioned below.
+
+ If you compile PCRE with UTF-8 support, but do not use it at
+ run time, the library will be a bit bigger, but the addi-
+ tional run time overhead is limited to testing the PCRE_UTF8
+ flag in several places, so should not be very large.
+
+ PCRE assumes that the strings it is given contain valid
+ UTF-8 codes. It does not diagnose invalid UTF-8 strings. If
+ you pass invalid UTF-8 strings to PCRE, the results are
+ undefined.
+
+ Running with PCRE_UTF8 set causes these changes in the way
+ PCRE works:
+
+ 1. In a pattern, the escape sequence \x{...}, where the
+ contents of the braces is a string of hexadecimal digits, is
+ interpreted as a UTF-8 character whose code number is the
+ given hexadecimal number, for example: \x{1234}. This
+ inserts from one to six literal bytes into the pattern,
+ using the UTF-8 encoding. If a non-hexadecimal digit appears
+ between the braces, the item is not recognized.
+
+ 2. The original hexadecimal escape sequence, \xhh, generates
+ a two-byte UTF-8 character if its value is greater than 127.
+
+ 3. Repeat quantifiers are NOT correctly handled if they fol-
+ low a multibyte character. For example, \x{100}* and \xc3+
+ do not work. If you want to repeat such characters, you must
+ enclose them in non-capturing parentheses, for example
+ (?:\x{100}), at present.
+
+ 4. The dot metacharacter matches one UTF-8 character instead
+ of a single byte.
+
+ 5. Unlike literal UTF-8 characters, the dot metacharacter
+ followed by a repeat quantifier does operate correctly on
+ UTF-8 characters instead of single bytes.
+
+ 4. Although the \x{...} escape is permitted in a character
+ class, characters whose values are greater than 255 cannot
+ be included in a class.
+
+ 5. A class is matched against a UTF-8 character instead of
+ just a single byte, but it can match only characters whose
+ values are less than 256. Characters with greater values
+ always fail to match a class.
+
+ 6. Repeated classes work correctly on multiple characters.
+
+ 7. Classes containing just a single character whose value is
+ greater than 127 (but less than 256), for example, [\x80] or
+ [^\x{93}], do not work because these are optimized into sin-
+ gle byte matches. In the first case, of course, the class
+ brackets are just redundant.
+
+ 8. Lookbehind assertions move backwards in the subject by a
+ fixed number of characters instead of a fixed number of
+ bytes. Simple cases have been tested to work correctly, but
+ there may be hidden gotchas herein.
+
+ 9. The character types such as \d and \w do not work
+ correctly with UTF-8 characters. They continue to test a
+ single byte.
+
+ 10. Anything not explicitly mentioned here continues to work
+ in bytes rather than in characters.
+
+ The following UTF-8 features of Perl 5.6 are not imple-
+ mented:
+
+ 1. The escape sequence \C to match a single byte.
+
+ 2. The use of Unicode tables and properties and escapes \p,
+ \P, and \X.
+
+
+
+SAMPLE PROGRAM
+ The code below is a simple, complete demonstration program,
+ to get you started with using PCRE. This code is also sup-
+ plied in the file pcredemo.c in the PCRE distribution.
+
+ The program compiles the regular expression that is its
+ first argument, and matches it against the subject string in
+ its second argument. No options are set, and default charac-
+ ter tables are used. If matching succeeds, the program out-
+ puts the portion of the subject that matched, together with
+ the contents of any captured substrings.
+
+ On a Unix system that has PCRE installed in /usr/local, you
+ can compile the demonstration program using a command like
+ this:
+
+ gcc -o pcredemo pcredemo.c -I/usr/local/include
+ -L/usr/local/lib -lpcre
+
+ Then you can run simple tests like this:
+
+ ./pcredemo 'cat|dog' 'the cat sat on the mat'
+
+ Note that there is a much more comprehensive test program,
+ called pcretest, which supports many more facilities for
+ testing regular expressions. The pcredemo program is pro-
+ vided as a simple coding example.
+
+ On some operating systems (e.g. Solaris) you may get an
+ error like this when you try to run pcredemo:
+
+ ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such
+ file or directory
+
+ This is caused by the way shared library support works on
+ those systems. You need to add
+
+ -R/usr/local/lib
+
+ to the compile command to get round this problem. Here's the
+ code:
+
+ #include <stdio.h>
+ #include <string.h>
+ #include <pcre.h>
+
+ #define OVECCOUNT 30 /* should be a multiple of 3 */
+
+ int main(int argc, char **argv)
+ {
+ pcre *re;
+ const char *error;
+ int erroffset;
+ int ovector[OVECCOUNT];
+ int rc, i;
+
+ if (argc != 3)
+ {
+ printf("Two arguments required: a regex and a "
+ "subject string\n");
+ return 1;
+ }
+
+ /* Compile the regular expression in the first argument */
+
+ re = pcre_compile(
+ argv[1], /* the pattern */
+ 0, /* default options */
+ &error, /* for error message */
+ &erroffset, /* for error offset */
+ NULL); /* use default character tables */
+
+ /* Compilation failed: print the error message and exit */
+
+ if (re == NULL)
+ {
+ printf("PCRE compilation failed at offset %d: %s\n",
+ erroffset, error);
+ return 1;
+ }
+
+ /* Compilation succeeded: match the subject in the second
+ argument */
+
+ rc = pcre_exec(
+ re, /* the compiled pattern */
+ NULL, /* we didn't study the pattern */
+ argv[2], /* the subject string */
+ (int)strlen(argv[2]), /* the length of the subject */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* vector for substring information */
+ OVECCOUNT); /* number of elements in the vector */
+
+ /* Matching failed: handle error cases */
+
+ if (rc < 0)
+ {
+ switch(rc)
+ {
+ case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
+ /*
+ Handle other special cases if you like
+ */
+ default: printf("Matching error %d\n", rc); break;
+ }
+ return 1;
+ }
+
+ /* Match succeded */
+
+ printf("Match succeeded\n");
+
+ /* The output vector wasn't big enough */
+
+ if (rc == 0)
+ {
+ rc = OVECCOUNT/3;
+ printf("ovector only has room for %d captured "
+ substrings\n", rc - 1);
+ }
+
+ /* Show substrings stored in the output vector */
+
+ for (i = 0; i < rc; i++)
+ {
+ char *substring_start = argv[2] + ovector[2*i];
+ int substring_length = ovector[2*i+1] - ovector[2*i];
+ printf("%2d: %.*s\n", i, substring_length,
+ substring_start);
+ }
+
+ return 0;
+ }
+
+
+
AUTHOR
Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
@@ -1974,5 +2311,5 @@ AUTHOR
Cambridge CB2 3QG, England.
Phone: +44 1223 334714
- Last updated: 27 January 2000
- Copyright (c) 1997-2000 University of Cambridge.
+ Last updated: 15 August 2001
+ Copyright (c) 1997-2001 University of Cambridge.
diff --git a/srclib/pcre/doc/pcreposix.3 b/srclib/pcre/doc/pcreposix.3
index 1be5d9ac6f..41716ead91 100644
--- a/srclib/pcre/doc/pcreposix.3
+++ b/srclib/pcre/doc/pcreposix.3
@@ -77,6 +77,14 @@ to the native function.
The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function.
+In the absence of these flags, no options are passed to the native function.
+This means the the regex is compiled with PCRE default semantics. In
+particular, the way it handles newline characters in the subject string is the
+Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
+\fIsome\fR of the effects specified for REG_NEWLINE. It does not affect the way
+newlines are matched by . (they aren't) or a negative class such as [^a] (they
+are).
+
The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The
\fIpreg\fR structure is filled in on success, and one member of the structure
is publicized: \fIre_nsub\fR contains the number of capturing subpatterns in
@@ -138,4 +146,4 @@ Cambridge CB2 3QG, England.
.br
Phone: +44 1223 334714
-Copyright (c) 1997-1999 University of Cambridge.
+Copyright (c) 1997-2000 University of Cambridge.
diff --git a/srclib/pcre/doc/pcreposix.html b/srclib/pcre/doc/pcreposix.html
index 121d90f867..9c89478420 100644
--- a/srclib/pcre/doc/pcreposix.html
+++ b/srclib/pcre/doc/pcreposix.html
@@ -107,6 +107,15 @@ The PCRE_MULTILINE option is set when the expression is passed for compilation
to the native function.
</P>
<P>
+In the absence of these flags, no options are passed to the native function.
+This means the the regex is compiled with PCRE default semantics. In
+particular, the way it handles newline characters in the subject string is the
+Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
+<I>some</I> of the effects specified for REG_NEWLINE. It does not affect the way
+newlines are matched by . (they aren't) or a negative class such as [^a] (they
+are).
+</P>
+<P>
The yield of <B>regcomp()</B> is zero on success, and non-zero otherwise. The
<I>preg</I> structure is filled in on success, and one member of the structure
is publicized: <I>re_nsub</I> contains the number of capturing subpatterns in
@@ -179,4 +188,4 @@ Cambridge CB2 3QG, England.
Phone: +44 1223 334714
</P>
<P>
-Copyright (c) 1997-1999 University of Cambridge.
+Copyright (c) 1997-2000 University of Cambridge.
diff --git a/srclib/pcre/doc/pcreposix.txt b/srclib/pcre/doc/pcreposix.txt
index 4a7036f340..2d76f7cdcc 100644
--- a/srclib/pcre/doc/pcreposix.txt
+++ b/srclib/pcre/doc/pcreposix.txt
@@ -80,6 +80,15 @@ COMPILING A PATTERN
The PCRE_MULTILINE option is set when the expression is
passed for compilation to the native function.
+ In the absence of these flags, no options are passed to the
+ native function. This means the the regex is compiled with
+ PCRE default semantics. In particular, the way it handles
+ newline characters in the subject string is the Perl way,
+ not the POSIX way. Note that setting PCRE_MULTILINE has only
+ some of the effects specified for REG_NEWLINE. It does not
+ affect the way newlines are matched by . (they aren't) or a
+ negative class such as [^a] (they are).
+
The yield of regcomp() is zero on success, and non-zero oth-
erwise. The preg structure is filled in on success, and one
member of the structure is publicized: re_nsub contains the
@@ -147,4 +156,4 @@ AUTHOR
Cambridge CB2 3QG, England.
Phone: +44 1223 334714
- Copyright (c) 1997-1999 University of Cambridge.
+ Copyright (c) 1997-2000 University of Cambridge.
diff --git a/srclib/pcre/doc/pcretest.txt b/srclib/pcre/doc/pcretest.txt
index 831fdac987..0e13b6c6c5 100644
--- a/srclib/pcre/doc/pcretest.txt
+++ b/srclib/pcre/doc/pcretest.txt
@@ -1,216 +1,319 @@
-The pcretest program
---------------------
+NAME
+ pcretest - a program for testing Perl-compatible regular
+ expressions.
-This program is intended for testing PCRE, but it can also be used for
-experimenting with regular expressions.
-If it is given two filename arguments, it reads from the first and writes to
-the second. If it is given only one filename argument, it reads from that file
-and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and
-prompts for each line of input, using "re>" to prompt for regular expressions,
-and "data>" to prompt for data lines.
-
-The program handles any number of sets of input on a single input file. Each
-set starts with a regular expression, and continues with any number of data
-lines to be matched against the pattern. An empty line signals the end of the
-data lines, at which point a new regular expression is read. The regular
-expressions are given enclosed in any non-alphameric delimiters other than
-backslash, for example
-
- /(a|bc)x+yz/
-
-White space before the initial delimiter is ignored. A regular expression may
-be continued over several input lines, in which case the newline characters are
-included within it. See the test input files in the testdata directory for many
-examples. It is possible to include the delimiter within the pattern by
-escaping it, for example
-
- /abc\/def/
-
-If you do so, the escape and the delimiter form part of the pattern, but since
-delimiters are always non-alphameric, this does not affect its interpretation.
-If the terminating delimiter is immediately followed by a backslash, for
-example,
-
- /abc/\
-
-then a backslash is added to the end of the pattern. This is done to provide a
-way of testing the error condition that arises if a pattern finishes with a
-backslash, because
-
- /abc\/
-
-is interpreted as the first line of a pattern that starts with "abc/", causing
-pcretest to read the next line as a continuation of the regular expression.
-
-The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
-PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
-example:
-
- /caseless/i
-
-These modifier letters have the same effect as they do in Perl. There are
-others which set PCRE options that do not correspond to anything in Perl: /A,
-/E, and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
-
-Searching for all possible matches within each subject string can be requested
-by the /g or /G modifier. After finding a match, PCRE is called again to search
-the remainder of the subject string. The difference between /g and /G is that
-the former uses the startoffset argument to pcre_exec() to start searching at
-a new point within the entire string (which is in effect what Perl does),
-whereas the latter passes over a shortened substring. This makes a difference
-to the matching process if the pattern begins with a lookbehind assertion
-(including \b or \B).
-
-If any call to pcre_exec() in a /g or /G sequence matches an empty string, the
-next call is done with the PCRE_NOTEMPTY flag set so that it cannot match an
-empty string again at the same point. If however, this second match fails, the
-start offset is advanced by one, and the match is retried. This imitates the
-way Perl handles such cases when using the /g modifier or the split() function.
-
-There are a number of other modifiers for controlling the way pcretest
-operates.
-
-The /+ modifier requests that as well as outputting the substring that matched
-the entire pattern, pcretest should in addition output the remainder of the
-subject string. This is useful for tests where the subject contains multiple
-copies of the same substring.
-
-The /L modifier must be followed directly by the name of a locale, for example,
-
- /pattern/Lfr
-
-For this reason, it must be the last modifier letter. The given locale is set,
-pcre_maketables() is called to build a set of character tables for the locale,
-and this is then passed to pcre_compile() when compiling the regular
-expression. Without an /L modifier, NULL is passed as the tables pointer; that
-is, /L applies only to the expression on which it appears.
-
-The /I modifier requests that pcretest output information about the compiled
-expression (whether it is anchored, has a fixed first character, and so on). It
-does this by calling pcre_fullinfo() after compiling an expression, and
-outputting the information it gets back. If the pattern is studied, the results
-of that are also output.
-
-The /D modifier is a PCRE debugging feature, which also assumes /I. It causes
-the internal form of compiled regular expressions to be output after
-compilation.
-
-The /S modifier causes pcre_study() to be called after the expression has been
-compiled, and the results used when the expression is matched.
-
-The /M modifier causes the size of memory block used to hold the compiled
-pattern to be output.
-
-Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API
-rather than its native API. When this is done, all other modifiers except /i,
-/m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is
-set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always,
-and PCRE_DOTALL unless REG_NEWLINE is set.
-
-Before each data line is passed to pcre_exec(), leading and trailing whitespace
-is removed, and it is then scanned for \ escapes. The following are recognized:
-
- \a alarm (= BEL)
- \b backspace
- \e escape
- \f formfeed
- \n newline
- \r carriage return
- \t tab
- \v vertical tab
- \nnn octal character (up to 3 octal digits)
- \xhh hexadecimal character (up to 2 hex digits)
-
- \A pass the PCRE_ANCHORED option to pcre_exec()
- \B pass the PCRE_NOTBOL option to pcre_exec()
- \Cdd call pcre_copy_substring() for substring dd after a successful match
- (any decimal number less than 32)
- \Gdd call pcre_get_substring() for substring dd after a successful match
- (any decimal number less than 32)
- \L call pcre_get_substringlist() after a successful match
- \N pass the PCRE_NOTEMPTY option to pcre_exec()
- \Odd set the size of the output vector passed to pcre_exec() to dd
- (any number of decimal digits)
- \Z pass the PCRE_NOTEOL option to pcre_exec()
-
-A backslash followed by anything else just escapes the anything else. If the
-very last character is a backslash, it is ignored. This gives a way of passing
-an empty line as data, since a real empty line terminates the data input.
-
-If /P was present on the regex, causing the POSIX wrapper API to be used, only
-\B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
-regexec() respectively.
-
-When a match succeeds, pcretest outputs the list of captured substrings that
-pcre_exec() returns, starting with number 0 for the string that matched the
-whole pattern. Here is an example of an interactive pcretest run.
-
- $ pcretest
- PCRE version 2.06 08-Jun-1999
-
- re> /^abc(\d+)/
- data> abc123
- 0: abc123
- 1: 123
- data> xyz
- No match
-
-If the strings contain any non-printing characters, they are output as \0x
-escapes. If the pattern has the /+ modifier, then the output for substring 0 is
-followed by the the rest of the subject string, identified by "0+" like this:
-
- re> /cat/+
- data> cataract
- 0: cat
- 0+ aract
-
-If the pattern has the /g or /G modifier, the results of successive matching
-attempts are output in sequence, like this:
-
- re> /\Bi(\w\w)/g
- data> Mississippi
- 0: iss
- 1: ss
- 0: iss
- 1: ss
- 0: ipp
- 1: pp
-
-"No match" is output only if the first match attempt fails.
-
-If any of \C, \G, or \L are present in a data line that is successfully
-matched, the substrings extracted by the convenience functions are output with
-C, G, or L after the string number instead of a colon. This is in addition to
-the normal full list. The string length (that is, the return from the
-extraction function) is given in parentheses after each string for \C and \G.
-
-Note that while patterns can be continued over several lines (a plain ">"
-prompt is used for continuations), data lines may not. However newlines can be
-included in data by means of the \n escape.
-
-If the -p option is given to pcretest, it is equivalent to adding /P to each
-regular expression: the POSIX wrapper API is used to call PCRE. None of the
-following flags has any effect in this case.
-
-If the option -d is given to pcretest, it is equivalent to adding /D to each
-regular expression: the internal form is output after compilation.
-
-If the option -i is given to pcretest, it is equivalent to adding /I to each
-regular expression: information about the compiled pattern is given after
-compilation.
-
-If the option -m is given to pcretest, it outputs the size of each compiled
-pattern after it has been compiled. It is equivalent to adding /M to each
-regular expression. For compatibility with earlier versions of pcretest, -s is
-a synonym for -m.
-
-If the -t option is given, each compile, study, and match is run 20000 times
-while being timed, and the resulting time per compile or match is output in
-milliseconds. Do not set -t with -s, because you will then get the size output
-20000 times and the timing will be distorted. If you want to change the number
-of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
-pcretest.c
-
-Philip Hazel <ph10@cam.ac.uk>
-January 2000
+
+SYNOPSIS
+ pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [des-
+ tination]
+
+ pcretest was written as a test program for the PCRE regular
+ expression library itself, but it can also be used for
+ experimenting with regular expressions. This man page
+ describes the features of the test program; for details of
+ the regular expressions themselves, see the pcre man page.
+
+
+
+OPTIONS
+ -d Behave as if each regex had the /D modifier (see
+ below); the internal form is output after compila-
+ tion.
+
+ -i Behave as if each regex had the /I modifier;
+ information about the compiled pattern is given
+ after compilation.
+
+ -m Output the size of each compiled pattern after it
+ has been compiled. This is equivalent to adding /M
+ to each regular expression. For compatibility with
+ earlier versions of pcretest, -s is a synonym for
+ -m.
+
+ -o osize Set the number of elements in the output vector
+ that is used when calling PCRE to be osize. The
+ default value is 45, which is enough for 14 cap-
+ turing subexpressions. The vector size can be
+ changed for individual matching calls by including
+ \O in the data line (see below).
+
+ -p Behave as if each regex has /P modifier; the POSIX
+ wrapper API is used to call PCRE. None of the
+ other options has any effect when -p is set.
+
+ -t Run each compile, study, and match 20000 times
+ with a timer, and output resulting time per com-
+ pile or match (in milliseconds). Do not set -t
+ with -m, because you will then get the size output
+ 20000 times and the timing will be distorted.
+
+
+
+DESCRIPTION
+ If pcretest is given two filename arguments, it reads from
+ the first and writes to the second. If it is given only one
+
+
+
+
+SunOS 5.8 Last change: 1
+
+
+
+ filename argument, it reads from that file and writes to
+ stdout. Otherwise, it reads from stdin and writes to stdout,
+ and prompts for each line of input, using "re>" to prompt
+ for regular expressions, and "data>" to prompt for data
+ lines.
+
+ The program handles any number of sets of input on a single
+ input file. Each set starts with a regular expression, and
+ continues with any number of data lines to be matched
+ against the pattern. An empty line signals the end of the
+ data lines, at which point a new regular expression is read.
+ The regular expressions are given enclosed in any non-
+ alphameric delimiters other than backslash, for example
+
+ /(a|bc)x+yz/
+
+ White space before the initial delimiter is ignored. A regu-
+ lar expression may be continued over several input lines, in
+ which case the newline characters are included within it. It
+ is possible to include the delimiter within the pattern by
+ escaping it, for example
+
+ /abc\/def/
+
+ If you do so, the escape and the delimiter form part of the
+ pattern, but since delimiters are always non-alphameric,
+ this does not affect its interpretation. If the terminating
+ delimiter is immediately followed by a backslash, for exam-
+ ple,
+
+ /abc/\
+
+ then a backslash is added to the end of the pattern. This is
+ done to provide a way of testing the error condition that
+ arises if a pattern finishes with a backslash, because
+
+ /abc\/
+
+ is interpreted as the first line of a pattern that starts
+ with "abc/", causing pcretest to read the next line as a
+ continuation of the regular expression.
+
+
+
+PATTERN MODIFIERS
+ The pattern may be followed by i, m, s, or x to set the
+ PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED
+ options, respectively. For example:
+
+ /caseless/i
+
+ These modifier letters have the same effect as they do in
+ Perl. There are others which set PCRE options that do not
+ correspond to anything in Perl: /A, /E, and /X set
+ PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respec-
+ tively.
+
+ Searching for all possible matches within each subject
+ string can be requested by the /g or /G modifier. After
+ finding a match, PCRE is called again to search the
+ remainder of the subject string. The difference between /g
+ and /G is that the former uses the startoffset argument to
+ pcre_exec() to start searching at a new point within the
+ entire string (which is in effect what Perl does), whereas
+ the latter passes over a shortened substring. This makes a
+ difference to the matching process if the pattern begins
+ with a lookbehind assertion (including \b or \B).
+
+ If any call to pcre_exec() in a /g or /G sequence matches an
+ empty string, the next call is done with the PCRE_NOTEMPTY
+ and PCRE_ANCHORED flags set in order to search for another,
+ non-empty, match at the same point. If this second match
+ fails, the start offset is advanced by one, and the normal
+ match is retried. This imitates the way Perl handles such
+ cases when using the /g modifier or the split() function.
+
+ There are a number of other modifiers for controlling the
+ way pcretest operates.
+
+ The /+ modifier requests that as well as outputting the sub-
+ string that matched the entire pattern, pcretest should in
+ addition output the remainder of the subject string. This is
+ useful for tests where the subject contains multiple copies
+ of the same substring.
+
+ The /L modifier must be followed directly by the name of a
+ locale, for example,
+
+ /pattern/Lfr
+
+ For this reason, it must be the last modifier letter. The
+ given locale is set, pcre_maketables() is called to build a
+ set of character tables for the locale, and this is then
+ passed to pcre_compile() when compiling the regular expres-
+ sion. Without an /L modifier, NULL is passed as the tables
+ pointer; that is, /L applies only to the expression on which
+ it appears.
+
+ The /I modifier requests that pcretest output information
+ about the compiled expression (whether it is anchored, has a
+ fixed first character, and so on). It does this by calling
+ pcre_fullinfo() after compiling an expression, and output-
+ ting the information it gets back. If the pattern is stu-
+ died, the results of that are also output.
+ The /D modifier is a PCRE debugging feature, which also
+ assumes /I. It causes the internal form of compiled regular
+ expressions to be output after compilation.
+
+ The /S modifier causes pcre_study() to be called after the
+ expression has been compiled, and the results used when the
+ expression is matched.
+
+ The /M modifier causes the size of memory block used to hold
+ the compiled pattern to be output.
+
+ The /P modifier causes pcretest to call PCRE via the POSIX
+ wrapper API rather than its native API. When this is done,
+ all other modifiers except /i, /m, and /+ are ignored.
+ REG_ICASE is set if /i is present, and REG_NEWLINE is set if
+ /m is present. The wrapper functions force
+ PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless
+ REG_NEWLINE is set.
+
+ The /8 modifier causes pcretest to call PCRE with the
+ PCRE_UTF8 option set. This turns on the (currently incom-
+ plete) support for UTF-8 character handling in PCRE, pro-
+ vided that it was compiled with this support enabled. This
+ modifier also causes any non-printing characters in output
+ strings to be printed using the \x{hh...} notation if they
+ are valid UTF-8 sequences.
+
+
+
+DATA LINES
+ Before each data line is passed to pcre_exec(), leading and
+ trailing whitespace is removed, and it is then scanned for \
+ escapes. The following are recognized:
+
+ \a alarm (= BEL)
+ \b backspace
+ \e escape
+ \f formfeed
+ \n newline
+ \r carriage return
+ \t tab
+ \v vertical tab
+ \nnn octal character (up to 3 octal digits)
+ \xhh hexadecimal character (up to 2 hex digits)
+ \x{hh...} hexadecimal UTF-8 character
+
+ \A pass the PCRE_ANCHORED option to pcre_exec()
+ \B pass the PCRE_NOTBOL option to pcre_exec()
+ \Cdd call pcre_copy_substring() for substring dd
+ after a successful match (any decimal number
+ less than 32)
+ \Gdd call pcre_get_substring() for substring dd
+
+ after a successful match (any decimal number
+ less than 32)
+ \L call pcre_get_substringlist() after a
+ successful match
+ \N pass the PCRE_NOTEMPTY option to pcre_exec()
+ \Odd set the size of the output vector passed to
+ pcre_exec() to dd (any number of decimal
+ digits)
+ \Z pass the PCRE_NOTEOL option to pcre_exec()
+
+ When \O is used, it may be higher or lower than the size set
+ by the -O option (or defaulted to 45); \O applies only to
+ the call of pcre_exec() for the line in which it appears.
+
+ A backslash followed by anything else just escapes the any-
+ thing else. If the very last character is a backslash, it is
+ ignored. This gives a way of passing an empty line as data,
+ since a real empty line terminates the data input.
+
+ If /P was present on the regex, causing the POSIX wrapper
+ API to be used, only B, and Z have any effect, causing
+ REG_NOTBOL and REG_NOTEOL to be passed to regexec() respec-
+ tively.
+
+ The use of \x{hh...} to represent UTF-8 characters is not
+ dependent on the use of the /8 modifier on the pattern. It
+ is recognized always. There may be any number of hexadecimal
+ digits inside the braces. The result is from one to six
+ bytes, encoded according to the UTF-8 rules.
+
+
+
+OUTPUT FROM PCRETEST
+ When a match succeeds, pcretest outputs the list of captured
+ substrings that pcre_exec() returns, starting with number 0
+ for the string that matched the whole pattern. Here is an
+ example of an interactive pcretest run.
+
+ $ pcretest
+ PCRE version 2.06 08-Jun-1999
+
+ re> /^abc(\d+)/
+ data> abc123
+ 0: abc123
+ 1: 123
+ data> xyz
+ No match
+
+ If the strings contain any non-printing characters, they are
+ output as \0x escapes, or as \x{...} escapes if the /8
+ modifier was present on the pattern. If the pattern has the
+ /+ modifier, then the output for substring 0 is followed by
+ the the rest of the subject string, identified by "0+" like
+ this:
+
+ re> /cat/+
+ data> cataract
+ 0: cat
+ 0+ aract
+
+ If the pattern has the /g or /G modifier, the results of
+ successive matching attempts are output in sequence, like
+ this:
+
+ re> /\Bi(\w\w)/g
+ data> Mississippi
+ 0: iss
+ 1: ss
+ 0: iss
+ 1: ss
+ 0: ipp
+ 1: pp
+
+ "No match" is output only if the first match attempt fails.
+
+ If any of the sequences \C, \G, or \L are present in a data
+ line that is successfully matched, the substrings extracted
+ by the convenience functions are output with C, G, or L
+ after the string number instead of a colon. This is in addi-
+ tion to the normal full list. The string length (that is,
+ the return from the extraction function) is given in
+ parentheses after each string for \C and \G.
+
+ Note that while patterns can be continued over several lines
+ (a plain ">" prompt is used for continuations), data lines
+ may not. However newlines can be included in data by means
+ of the \n escape.
+
+
+
+AUTHOR
+ Philip Hazel <ph10@cam.ac.uk>
+ University Computing Service,
+ New Museums Site,
+ Cambridge CB2 3QG, England.
+ Phone: +44 1223 334714
+
+ Last updated: 15 August 2001
+ Copyright (c) 1997-2001 University of Cambridge.
diff --git a/srclib/pcre/doc/perltest.txt b/srclib/pcre/doc/perltest.txt
index 6c38ebe19f..5a404016b5 100644
--- a/srclib/pcre/doc/perltest.txt
+++ b/srclib/pcre/doc/perltest.txt
@@ -13,11 +13,17 @@ for perltest as well as for pcretest, and the special upper case modifiers such
as /A that pcretest recognizes are not used in these files. The output should
be identical, apart from the initial identifying banner.
+For testing UTF-8 features, an alternative form of perltest, called perltest8,
+is supplied. This requires Perl 5.6 or higher. It recognizes the special
+modifier /8 that pcretest uses to invoke UTF-8 functionality. The testinput5
+file can be fed to perltest8.
+
The testinput2 and testinput4 files are not suitable for feeding to perltest,
since they do make use of the special upper case modifiers and escapes that
pcretest uses to test some features of PCRE. The first of these files also
contains malformed regular expressions, in order to check that PCRE diagnoses
-them correctly.
+them correctly. Similarly, testinput6 tests UTF-8 features that do not relate
+to Perl.
Philip Hazel <ph10@cam.ac.uk>
-January 2000
+August 2000
diff --git a/srclib/pcre/get.c b/srclib/pcre/get.c
index 035668e301..55e736dc24 100644
--- a/srclib/pcre/get.c
+++ b/srclib/pcre/get.c
@@ -9,7 +9,7 @@ the file Tech.Notes for some information on the internals.
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-1999 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
@@ -144,6 +144,25 @@ return 0;
/*************************************************
+* Free store obtained by get_substring_list *
+*************************************************/
+
+/* This function exists for the benefit of people calling PCRE from non-C
+programs that can call its functions, but not free() or (pcre_free)() directly.
+
+Argument: the result of a previous pcre_get_substring_list()
+Returns: nothing
+*/
+
+void
+pcre_free_substring_list(const char **pointer)
+{
+(pcre_free)((void *)pointer);
+}
+
+
+
+/*************************************************
* Copy captured string to new store *
*************************************************/
@@ -186,4 +205,23 @@ substring[yield] = 0;
return yield;
}
+
+
+/*************************************************
+* Free store obtained by get_substring *
+*************************************************/
+
+/* This function exists for the benefit of people calling PCRE from non-C
+programs that can call its functions, but not free() or (pcre_free)() directly.
+
+Argument: the result of a previous pcre_get_substring()
+Returns: nothing
+*/
+
+void
+pcre_free_substring(const char *pointer)
+{
+(pcre_free)((void *)pointer);
+}
+
/* End of get.c */
diff --git a/srclib/pcre/internal.h b/srclib/pcre/internal.h
index 91ff3018f1..0c8c1c9df6 100644
--- a/srclib/pcre/internal.h
+++ b/srclib/pcre/internal.h
@@ -9,7 +9,7 @@ the file Tech.Notes for some information on the internals.
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-2000 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
@@ -40,11 +40,27 @@ modules, but which are not relevant to the outside. */
#include "config.h"
/* To cope with SunOS4 and other systems that lack memmove() but have bcopy(),
-define a macro for memmove() if HAVE_MEMMOVE is false. */
+define a macro for memmove() if HAVE_MEMMOVE is false, provided that HAVE_BCOPY
+is set. Otherwise, include an emulating function for those systems that have
+neither (there some non-Unix environments where this is the case). This assumes
+that all calls to memmove are moving strings upwards in store, which is the
+case in PCRE. */
#if ! HAVE_MEMMOVE
#undef memmove /* some systems may have a macro */
+#if HAVE_BCOPY
#define memmove(a, b, c) bcopy(b, a, c)
+#else
+void *
+pcre_memmove(unsigned char *dest, const unsigned char *src, size_t n)
+{
+int i;
+dest += n;
+src += n;
+for (i = 0; i < n; ++i) *(--dest) = *(--src);
+}
+#define memmove(a, b, c) pcre_memmove(a, b, c)
+#endif
#endif
/* Standard C headers plus the external interface definition */
@@ -89,7 +105,7 @@ time, run time or study time, respectively. */
#define PUBLIC_OPTIONS \
(PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
- PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY)
+ PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8)
#define PUBLIC_EXEC_OPTIONS \
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY)
@@ -107,12 +123,36 @@ typedef int BOOL;
#define FALSE 0
#define TRUE 1
+/* Escape items that are just an encoding of a particular data value. Note that
+ESC_N is defined as yet another macro, which is set in config.h to either \n
+(the default) or \r (which some people want). */
+
+#ifndef ESC_E
+#define ESC_E 27
+#endif
+
+#ifndef ESC_F
+#define ESC_F '\f'
+#endif
+
+#ifndef ESC_N
+#define ESC_N NEWLINE
+#endif
+
+#ifndef ESC_R
+#define ESC_R '\r'
+#endif
+
+#ifndef ESC_T
+#define ESC_T '\t'
+#endif
+
/* These are escaped items that aren't just an encoding of a particular data
value such as \n. They must have non-zero values, as check_escape() returns
their negation. Also, they must appear in the same order as in the opcode
definitions below, up to ESC_z. The final one must be ESC_REF as subsequent
values are used for \1, \2, \3, etc. There is a test in the code for an escape
-greater than ESC_b and less than ESC_X to detect the types that may be
+greater than ESC_b and less than ESC_Z to detect the types that may be
repeated. If any new escapes are put in-between that don't consume a character,
that code will have to change. */
@@ -208,19 +248,26 @@ enum {
OP_ONCE, /* Once matched, don't back up into the subpattern */
OP_COND, /* Conditional group */
- OP_CREF, /* Used to hold an extraction string number */
+ OP_CREF, /* Used to hold an extraction string number (cond ref) */
OP_BRAZERO, /* These two must remain together and in this */
OP_BRAMINZERO, /* order. */
+ OP_BRANUMBER, /* Used for extracting brackets whose number is greater
+ than can fit into an opcode. */
+
OP_BRA /* This and greater values are used for brackets that
- extract substrings. */
+ extract substrings up to a basic limit. After that,
+ use is made of OP_BRANUMBER. */
};
-/* The highest extraction number. This is limited by the number of opcodes
-left after OP_BRA, i.e. 255 - OP_BRA. We actually set it somewhat lower. */
+/* The highest extraction number before we have to start using additional
+bytes. (Originally PCRE didn't have support for extraction counts highter than
+this number.) The value is limited by the number of opcodes left after OP_BRA,
+i.e. 255 - OP_BRA. We actually set it a bit lower to leave room for additional
+opcodes. */
-#define EXTRACT_MAX 99
+#define EXTRACT_BASIC_MAX 150
/* The texts of compile-time error messages are defined as macros here so that
they can be accessed by the POSIX wrapper and converted into error codes. Yes,
@@ -239,13 +286,13 @@ just to accommodate the POSIX wrapper. */
#define ERR10 "operand of unlimited repeat could match the empty string"
#define ERR11 "internal error: unexpected repeat"
#define ERR12 "unrecognized character after (?"
-#define ERR13 "too many capturing parenthesized sub-patterns"
+#define ERR13 "unused error"
#define ERR14 "missing )"
#define ERR15 "back reference to non-existent subpattern"
#define ERR16 "erroffset passed as NULL"
#define ERR17 "unknown option bit(s) set"
#define ERR18 "missing ) after comment"
-#define ERR19 "too many sets of parentheses"
+#define ERR19 "parentheses nested too deeply"
#define ERR20 "regular expression too large"
#define ERR21 "failed to get memory"
#define ERR22 "unmatched parentheses"
@@ -258,6 +305,10 @@ just to accommodate the POSIX wrapper. */
#define ERR29 "(?p must be followed by )"
#define ERR30 "unknown POSIX class name"
#define ERR31 "POSIX collating elements are not supported"
+#define ERR32 "this version of PCRE is not compiled with PCRE_UTF8 support"
+#define ERR33 "characters with values > 255 are not yet supported in classes"
+#define ERR34 "character value in \\x{...} sequence is too large"
+#define ERR35 "invalid condition (?(0)"
/* All character handling must be done as unsigned characters. Otherwise there
are problems with top-bit-set characters and functions such as isspace().
@@ -276,8 +327,8 @@ typedef struct real_pcre {
size_t size;
const unsigned char *tables;
unsigned long int options;
- uschar top_bracket;
- uschar top_backref;
+ unsigned short int top_bracket;
+ unsigned short int top_backref;
uschar first_char;
uschar req_char;
uschar code[1];
@@ -314,6 +365,7 @@ typedef struct match_data {
BOOL offset_overflow; /* Set if too many extractions */
BOOL notbol; /* NOTBOL flag */
BOOL noteol; /* NOTEOL flag */
+ BOOL utf8; /* UTF8 flag */
BOOL endonly; /* Dollar not before final \n */
BOOL notempty; /* Empty string match not wanted */
const uschar *start_pattern; /* For use when recursing */
diff --git a/srclib/pcre/ltmain.sh b/srclib/pcre/ltmain.sh
index 50515ad0b9..5959c479b0 100644
--- a/srclib/pcre/ltmain.sh
+++ b/srclib/pcre/ltmain.sh
@@ -1,7 +1,8 @@
# ltmain.sh - Provide generalized library-building support services.
-# NOTE: Changing this file will not affect anything until you rerun ltconfig.
+# NOTE: Changing this file will not affect anything until you rerun configure.
#
-# Copyright (C) 1996-1999 Free Software Foundation, Inc.
+# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001
+# Free Software Foundation, Inc.
# Originally by Gordon Matzigkeit <gord@gnu.ai.mit.edu>, 1996
#
# This program is free software; you can redistribute it and/or modify
@@ -54,8 +55,8 @@ modename="$progname"
# Constants.
PROGRAM=ltmain.sh
PACKAGE=libtool
-VERSION=1.3.4
-TIMESTAMP=" (1.385.2.196 1999/12/07 21:47:57)"
+VERSION=1.4
+TIMESTAMP=" (1.920 2001/04/24 23:26:18)"
default_mode=
help="Try \`$progname --help' for more information."
@@ -83,12 +84,6 @@ if test "${LANG+set}" = set; then
save_LANG="$LANG"; LANG=C; export LANG
fi
-if test "$LTCONFIG_VERSION" != "$VERSION"; then
- echo "$modename: ltconfig version \`$LTCONFIG_VERSION' does not match $PROGRAM version \`$VERSION'" 1>&2
- echo "Fatal configuration error. See the $PACKAGE docs for more information." 1>&2
- exit 1
-fi
-
if test "$build_libtool_libs" != yes && test "$build_old_libs" != yes; then
echo "$modename: not configured to build any kind of library" 1>&2
echo "Fatal configuration error. See the $PACKAGE docs for more information." 1>&2
@@ -113,16 +108,16 @@ do
arg="$1"
shift
- case "$arg" in
+ case $arg in
-*=*) optarg=`$echo "X$arg" | $Xsed -e 's/[-_a-zA-Z0-9]*=//'` ;;
*) optarg= ;;
esac
# If the previous option needs an argument, assign it.
if test -n "$prev"; then
- case "$prev" in
+ case $prev in
execute_dlfiles)
- eval "$prev=\"\$$prev \$arg\""
+ execute_dlfiles="$execute_dlfiles $arg"
;;
*)
eval "$prev=\$arg"
@@ -135,7 +130,7 @@ do
fi
# Have we seen a non-optional argument yet?
- case "$arg" in
+ case $arg in
--help)
show_help=yes
;;
@@ -146,7 +141,7 @@ do
;;
--config)
- sed -e '1,/^### BEGIN LIBTOOL CONFIG/d' -e '/^### END LIBTOOL CONFIG/,$d' $0
+ sed -e '1,/^# ### BEGIN LIBTOOL CONFIG/d' -e '/^# ### END LIBTOOL CONFIG/,$d' $0
exit 0
;;
@@ -211,12 +206,12 @@ if test -z "$show_help"; then
# Infer the operation mode.
if test -z "$mode"; then
- case "$nonopt" in
+ case $nonopt in
*cc | *++ | gcc* | *-gcc*)
mode=link
for arg
do
- case "$arg" in
+ case $arg in
-c)
mode=compile
break
@@ -261,12 +256,13 @@ if test -z "$show_help"; then
help="Try \`$modename --help --mode=$mode' for more information."
# These modes are in order of execution frequency so that they run quickly.
- case "$mode" in
+ case $mode in
# libtool compile mode
compile)
modename="$modename: compile"
# Get the compilation command and the source file.
base_compile=
+ prev=
lastarg=
srcfile="$nonopt"
suppress_output=
@@ -274,8 +270,34 @@ if test -z "$show_help"; then
user_target=no
for arg
do
+ case $prev in
+ "") ;;
+ xcompiler)
+ # Aesthetically quote the previous argument.
+ prev=
+ lastarg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
+
+ case $arg in
+ # Double-quote args containing other shell metacharacters.
+ # Many Bourne shells cannot handle close brackets correctly
+ # in scan sets, so we specify it separately.
+ *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*|"")
+ arg="\"$arg\""
+ ;;
+ esac
+
+ # Add the previous argument to base_compile.
+ if test -z "$base_compile"; then
+ base_compile="$lastarg"
+ else
+ base_compile="$base_compile $lastarg"
+ fi
+ continue
+ ;;
+ esac
+
# Accept any command-line options.
- case "$arg" in
+ case $arg in
-o)
if test "$user_target" != "no"; then
$echo "$modename: you cannot specify \`-o' more than once" 1>&2
@@ -288,9 +310,53 @@ if test -z "$show_help"; then
build_old_libs=yes
continue
;;
+
+ -prefer-pic)
+ pic_mode=yes
+ continue
+ ;;
+
+ -prefer-non-pic)
+ pic_mode=no
+ continue
+ ;;
+
+ -Xcompiler)
+ prev=xcompiler
+ continue
+ ;;
+
+ -Wc,*)
+ args=`$echo "X$arg" | $Xsed -e "s/^-Wc,//"`
+ lastarg=
+ IFS="${IFS= }"; save_ifs="$IFS"; IFS=','
+ for arg in $args; do
+ IFS="$save_ifs"
+
+ # Double-quote args containing other shell metacharacters.
+ # Many Bourne shells cannot handle close brackets correctly
+ # in scan sets, so we specify it separately.
+ case $arg in
+ *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*|"")
+ arg="\"$arg\""
+ ;;
+ esac
+ lastarg="$lastarg $arg"
+ done
+ IFS="$save_ifs"
+ lastarg=`$echo "X$lastarg" | $Xsed -e "s/^ //"`
+
+ # Add the arguments to base_compile.
+ if test -z "$base_compile"; then
+ base_compile="$lastarg"
+ else
+ base_compile="$base_compile $lastarg"
+ fi
+ continue
+ ;;
esac
- case "$user_target" in
+ case $user_target in
next)
# The next one is the -o target name
user_target=yes
@@ -316,10 +382,10 @@ if test -z "$show_help"; then
lastarg=`$echo "X$lastarg" | $Xsed -e "$sed_quote_subst"`
# Double-quote args containing other shell metacharacters.
- # Many Bourne shells cannot handle close brackets correctly in scan
- # sets, so we specify it separately.
- case "$lastarg" in
- *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*)
+ # Many Bourne shells cannot handle close brackets correctly
+ # in scan sets, so we specify it separately.
+ case $lastarg in
+ *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*|"")
lastarg="\"$lastarg\""
;;
esac
@@ -332,7 +398,7 @@ if test -z "$show_help"; then
fi
done
- case "$user_target" in
+ case $user_target in
set)
;;
no)
@@ -348,7 +414,7 @@ if test -z "$show_help"; then
# Recognize several different file suffixes.
# If the user specifies -o file.o, it is replaced with file.lo
xform='[cCFSfmso]'
- case "$libobj" in
+ case $libobj in
*.ada) xform=ada ;;
*.adb) xform=adb ;;
*.ads) xform=ads ;;
@@ -363,7 +429,7 @@ if test -z "$show_help"; then
libobj=`$echo "X$libobj" | $Xsed -e "s/\.$xform$/.lo/"`
- case "$libobj" in
+ case $libobj in
*.lo) obj=`$echo "X$libobj" | $Xsed -e "$lo2o"` ;;
*)
$echo "$modename: cannot determine name of library object from \`$libobj'" 1>&2
@@ -387,10 +453,21 @@ if test -z "$show_help"; then
$run $rm $removelist
trap "$run $rm $removelist; exit 1" 1 2 15
+ # On Cygwin there's no "real" PIC flag so we must build both object types
+ case $host_os in
+ cygwin* | mingw* | pw32* | os2*)
+ pic_mode=default
+ ;;
+ esac
+ if test $pic_mode = no && test "$deplibs_check_method" != pass_all; then
+ # non-PIC code in shared libraries is not supported
+ pic_mode=default
+ fi
+
# Calculate the filename of the output object if compiler does
# not support -o with -c
if test "$compiler_c_o" = no; then
- output_obj=`$echo "X$srcfile" | $Xsed -e 's%^.*/%%' -e 's%\..*$%%'`.${objext}
+ output_obj=`$echo "X$srcfile" | $Xsed -e 's%^.*/%%' -e 's%\.[^.]*$%%'`.${objext}
lockfile="$output_obj.lock"
removelist="$removelist $output_obj $lockfile"
trap "$run $rm $removelist; exit 1" 1 2 15
@@ -402,7 +479,7 @@ if test -z "$show_help"; then
# Lock this critical section if it is needed
# We use this script file to make the link, it avoids creating a new file
if test "$need_locks" = yes; then
- until ln "$0" "$lockfile" 2>/dev/null; do
+ until $run ln "$0" "$lockfile" 2>/dev/null; do
$show "Waiting for $lockfile to be removed"
sleep 2
done
@@ -434,8 +511,13 @@ compiler."
# Without this assignment, base_compile gets emptied.
fbsd_hideous_sh_bug=$base_compile
- # All platforms use -DPIC, to notify preprocessed assembler code.
- command="$base_compile $srcfile $pic_flag -DPIC"
+ if test "$pic_mode" != no; then
+ # All platforms use -DPIC, to notify preprocessed assembler code.
+ command="$base_compile $srcfile $pic_flag -DPIC"
+ else
+ # Don't build PIC code
+ command="$base_compile $srcfile"
+ fi
if test "$build_old_libs" = yes; then
lo_libobj="$libobj"
dir=`$echo "X$libobj" | $Xsed -e 's%/[^/]*$%%'`
@@ -506,7 +588,8 @@ compiler."
fi
# If we have no pic_flag, then copy the object into place and finish.
- if test -z "$pic_flag" && test "$build_old_libs" = yes; then
+ if (test -z "$pic_flag" || test "$pic_mode" != default) &&
+ test "$build_old_libs" = yes; then
# Rename the .lo from within objdir to obj
if test -f $obj; then
$show $rm $obj
@@ -546,7 +629,13 @@ compiler."
# Only build a position-dependent object if we build old libraries.
if test "$build_old_libs" = yes; then
- command="$base_compile $srcfile"
+ if test "$pic_mode" != yes; then
+ # Don't build PIC code
+ command="$base_compile $srcfile"
+ else
+ # All platforms use -DPIC, to notify preprocessed assembler code.
+ command="$base_compile $srcfile $pic_flag -DPIC"
+ fi
if test "$compiler_c_o" = yes; then
command="$command -o $obj"
output_obj="$obj"
@@ -612,17 +701,17 @@ compiler."
# Unlock the critical section if it was locked
if test "$need_locks" != no; then
- $rm "$lockfile"
+ $run $rm "$lockfile"
fi
exit 0
;;
# libtool link mode
- link)
+ link | relink)
modename="$modename: link"
- case "$host" in
- *-*-cygwin* | *-*-mingw* | *-*-os2*)
+ case $host in
+ *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2*)
# It is impossible to link a dll without this setting, and
# we shouldn't force the makefile maintainer to figure out
# which system we are compiling for in order to pass an extra
@@ -635,179 +724,12 @@ compiler."
# -no-undefined on the libtool link line when we can be certain
# that all symbols are satisfied, otherwise we get a static library.
allow_undefined=yes
-
- # This is a source program that is used to create dlls on Windows
- # Don't remove nor modify the starting and closing comments
-# /* ltdll.c starts here */
-# #define WIN32_LEAN_AND_MEAN
-# #include <windows.h>
-# #undef WIN32_LEAN_AND_MEAN
-# #include <stdio.h>
-#
-# #ifndef __CYGWIN__
-# # ifdef __CYGWIN32__
-# # define __CYGWIN__ __CYGWIN32__
-# # endif
-# #endif
-#
-# #ifdef __cplusplus
-# extern "C" {
-# #endif
-# BOOL APIENTRY DllMain (HINSTANCE hInst, DWORD reason, LPVOID reserved);
-# #ifdef __cplusplus
-# }
-# #endif
-#
-# #ifdef __CYGWIN__
-# #include <cygwin/cygwin_dll.h>
-# DECLARE_CYGWIN_DLL( DllMain );
-# #endif
-# HINSTANCE __hDllInstance_base;
-#
-# BOOL APIENTRY
-# DllMain (HINSTANCE hInst, DWORD reason, LPVOID reserved)
-# {
-# __hDllInstance_base = hInst;
-# return TRUE;
-# }
-# /* ltdll.c ends here */
- # This is a source program that is used to create import libraries
- # on Windows for dlls which lack them. Don't remove nor modify the
- # starting and closing comments
-# /* impgen.c starts here */
-# /* Copyright (C) 1999 Free Software Foundation, Inc.
-#
-# This file is part of GNU libtool.
-#
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, write to the Free Software
-# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
-# */
-#
-# #include <stdio.h> /* for printf() */
-# #include <unistd.h> /* for open(), lseek(), read() */
-# #include <fcntl.h> /* for O_RDONLY, O_BINARY */
-# #include <string.h> /* for strdup() */
-#
-# static unsigned int
-# pe_get16 (fd, offset)
-# int fd;
-# int offset;
-# {
-# unsigned char b[2];
-# lseek (fd, offset, SEEK_SET);
-# read (fd, b, 2);
-# return b[0] + (b[1]<<8);
-# }
-#
-# static unsigned int
-# pe_get32 (fd, offset)
-# int fd;
-# int offset;
-# {
-# unsigned char b[4];
-# lseek (fd, offset, SEEK_SET);
-# read (fd, b, 4);
-# return b[0] + (b[1]<<8) + (b[2]<<16) + (b[3]<<24);
-# }
-#
-# static unsigned int
-# pe_as32 (ptr)
-# void *ptr;
-# {
-# unsigned char *b = ptr;
-# return b[0] + (b[1]<<8) + (b[2]<<16) + (b[3]<<24);
-# }
-#
-# int
-# main (argc, argv)
-# int argc;
-# char *argv[];
-# {
-# int dll;
-# unsigned long pe_header_offset, opthdr_ofs, num_entries, i;
-# unsigned long export_rva, export_size, nsections, secptr, expptr;
-# unsigned long name_rvas, nexp;
-# unsigned char *expdata, *erva;
-# char *filename, *dll_name;
-#
-# filename = argv[1];
-#
-# dll = open(filename, O_RDONLY|O_BINARY);
-# if (!dll)
-# return 1;
-#
-# dll_name = filename;
-#
-# for (i=0; filename[i]; i++)
-# if (filename[i] == '/' || filename[i] == '\\' || filename[i] == ':')
-# dll_name = filename + i +1;
-#
-# pe_header_offset = pe_get32 (dll, 0x3c);
-# opthdr_ofs = pe_header_offset + 4 + 20;
-# num_entries = pe_get32 (dll, opthdr_ofs + 92);
-#
-# if (num_entries < 1) /* no exports */
-# return 1;
-#
-# export_rva = pe_get32 (dll, opthdr_ofs + 96);
-# export_size = pe_get32 (dll, opthdr_ofs + 100);
-# nsections = pe_get16 (dll, pe_header_offset + 4 +2);
-# secptr = (pe_header_offset + 4 + 20 +
-# pe_get16 (dll, pe_header_offset + 4 + 16));
-#
-# expptr = 0;
-# for (i = 0; i < nsections; i++)
-# {
-# char sname[8];
-# unsigned long secptr1 = secptr + 40 * i;
-# unsigned long vaddr = pe_get32 (dll, secptr1 + 12);
-# unsigned long vsize = pe_get32 (dll, secptr1 + 16);
-# unsigned long fptr = pe_get32 (dll, secptr1 + 20);
-# lseek(dll, secptr1, SEEK_SET);
-# read(dll, sname, 8);
-# if (vaddr <= export_rva && vaddr+vsize > export_rva)
-# {
-# expptr = fptr + (export_rva - vaddr);
-# if (export_rva + export_size > vaddr + vsize)
-# export_size = vsize - (export_rva - vaddr);
-# break;
-# }
-# }
-#
-# expdata = (unsigned char*)malloc(export_size);
-# lseek (dll, expptr, SEEK_SET);
-# read (dll, expdata, export_size);
-# erva = expdata - export_rva;
-#
-# nexp = pe_as32 (expdata+24);
-# name_rvas = pe_as32 (expdata+32);
-#
-# printf ("EXPORTS\n");
-# for (i = 0; i<nexp; i++)
-# {
-# unsigned long name_rva = pe_as32 (erva+name_rvas+i*4);
-# printf ("\t%s @ %ld ;\n", erva+name_rva, 1+ i);
-# }
-#
-# return 0;
-# }
-# /* impgen.c ends here */
;;
*)
allow_undefined=yes
;;
esac
+ libtool_args="$nonopt"
compile_command="$nonopt"
finalize_command="$nonopt"
@@ -818,18 +740,12 @@ compiler."
convenience=
old_convenience=
deplibs=
- linkopts=
+ old_deplibs=
+ compiler_flags=
+ linker_flags=
+ dllsearchpath=
+ lib_search_path=`pwd`
- if test -n "$shlibpath_var"; then
- # get the directories listed in $shlibpath_var
- eval lib_search_path=\`\$echo \"X \${$shlibpath_var}\" \| \$Xsed -e \'s/:/ /g\'\`
- else
- lib_search_path=
- fi
- # now prepend the system-specific ones
- eval lib_search_path=\"$sys_lib_search_path_spec\$lib_search_path\"
- eval sys_lib_dlsearch_path=\"$sys_lib_dlsearch_path_spec\"
-
avoid_version=no
dlfiles=
dlprefiles=
@@ -839,9 +755,9 @@ compiler."
export_symbols_regex=
generated=
libobjs=
- link_against_libtool_libs=
ltlibs=
module=no
+ no_install=no
objs=
prefer_static_libs=no
preload=no
@@ -858,7 +774,7 @@ compiler."
# We need to know -static, to get the right output filenames.
for arg
do
- case "$arg" in
+ case $arg in
-all-static | -static)
if test "X$arg" = "X-all-static"; then
if test "$build_libtool_libs" = yes && test -z "$link_static_flag"; then
@@ -887,17 +803,24 @@ compiler."
while test $# -gt 0; do
arg="$1"
shift
+ case $arg in
+ *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*|"")
+ qarg=\"`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`\" ### testsuite: skip nested quoting test
+ ;;
+ *) qarg=$arg ;;
+ esac
+ libtool_args="$libtool_args $qarg"
# If the previous option needs an argument, assign it.
if test -n "$prev"; then
- case "$prev" in
+ case $prev in
output)
compile_command="$compile_command @OUTPUT@"
finalize_command="$finalize_command @OUTPUT@"
;;
esac
- case "$prev" in
+ case $prev in
dlfiles|dlprefiles)
if test "$preload" = no; then
# Add the symbol object into the linking commands.
@@ -905,7 +828,7 @@ compiler."
finalize_command="$finalize_command @SYMFILE@"
preload=yes
fi
- case "$arg" in
+ case $arg in
*.la | *.lo) ;; # We handle these cases below.
force)
if test "$dlself" = no; then
@@ -934,6 +857,7 @@ compiler."
dlprefiles="$dlprefiles $arg"
fi
prev=
+ continue
;;
esac
;;
@@ -958,7 +882,7 @@ compiler."
;;
rpath | xrpath)
# We need an absolute path.
- case "$arg" in
+ case $arg in
[\\/]* | [A-Za-z]:[\\/]*) ;;
*)
$echo "$modename: only absolute run-paths are allowed" 1>&2
@@ -979,17 +903,32 @@ compiler."
prev=
continue
;;
+ xcompiler)
+ compiler_flags="$compiler_flags $qarg"
+ prev=
+ compile_command="$compile_command $qarg"
+ finalize_command="$finalize_command $qarg"
+ continue
+ ;;
+ xlinker)
+ linker_flags="$linker_flags $qarg"
+ compiler_flags="$compiler_flags $wl$qarg"
+ prev=
+ compile_command="$compile_command $wl$qarg"
+ finalize_command="$finalize_command $wl$qarg"
+ continue
+ ;;
*)
eval "$prev=\"\$arg\""
prev=
continue
;;
esac
- fi
+ fi # test -n $prev
prevarg="$arg"
- case "$arg" in
+ case $arg in
-all-static)
if test -n "$link_static_flag"; then
compile_command="$compile_command $link_static_flag"
@@ -1026,7 +965,7 @@ compiler."
-export-symbols | -export-symbols-regex)
if test -n "$export_symbols" || test -n "$export_symbols_regex"; then
- $echo "$modename: not more than one -exported-symbols argument allowed"
+ $echo "$modename: more than one -exported-symbols argument is not allowed"
exit 1
fi
if test "X$arg" = "X-export-symbols"; then
@@ -1037,58 +976,65 @@ compiler."
continue
;;
+ # The native IRIX linker understands -LANG:*, -LIST:* and -LNO:*
+ # so, if we see these flags be careful not to treat them like -L
+ -L[A-Z][A-Z]*:*)
+ case $with_gcc/$host in
+ no/*-*-irix*)
+ compile_command="$compile_command $arg"
+ finalize_command="$finalize_command $arg"
+ ;;
+ esac
+ continue
+ ;;
+
-L*)
dir=`$echo "X$arg" | $Xsed -e 's/^-L//'`
# We need an absolute path.
- case "$dir" in
+ case $dir in
[\\/]* | [A-Za-z]:[\\/]*) ;;
*)
absdir=`cd "$dir" && pwd`
if test -z "$absdir"; then
- $echo "$modename: warning: cannot determine absolute directory name of \`$dir'" 1>&2
- $echo "$modename: passing it literally to the linker, although it might fail" 1>&2
- absdir="$dir"
+ $echo "$modename: cannot determine absolute directory name of \`$dir'" 1>&2
+ exit 1
fi
dir="$absdir"
;;
esac
- case " $deplibs " in
- *" $arg "*) ;;
- *) deplibs="$deplibs $arg";;
- esac
- case " $lib_search_path " in
- *" $dir "*) ;;
- *) lib_search_path="$lib_search_path $dir";;
+ case "$deplibs " in
+ *" -L$dir "*) ;;
+ *)
+ deplibs="$deplibs -L$dir"
+ lib_search_path="$lib_search_path $dir"
+ ;;
esac
- case "$host" in
- *-*-cygwin* | *-*-mingw* | *-*-os2*)
- dllsearchdir=`cd "$dir" && pwd || echo "$dir"`
- case ":$dllsearchpath:" in
- ::) dllsearchpath="$dllsearchdir";;
- *":$dllsearchdir:"*) ;;
- *) dllsearchpath="$dllsearchpath:$dllsearchdir";;
+ case $host in
+ *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2*)
+ case :$dllsearchpath: in
+ *":$dir:"*) ;;
+ *) dllsearchpath="$dllsearchpath:$dir";;
esac
;;
esac
+ continue
;;
-l*)
- if test "$arg" = "-lc"; then
- case "$host" in
- *-*-cygwin* | *-*-mingw* | *-*-os2* | *-*-beos*)
- # These systems don't actually have c library (as such)
+ if test "X$arg" = "X-lc" || test "X$arg" = "X-lm"; then
+ case $host in
+ *-*-cygwin* | *-*-pw32* | *-*-beos*)
+ # These systems don't actually have a C or math library (as such)
continue
;;
- esac
- elif test "$arg" = "-lm"; then
- case "$host" in
- *-*-cygwin* | *-*-beos*)
- # These systems don't actually have math library (as such)
- continue
+ *-*-mingw* | *-*-os2*)
+ # These systems don't actually have a C library (as such)
+ test "X$arg" = "X-lc" && continue
;;
esac
fi
deplibs="$deplibs $arg"
+ continue
;;
-module)
@@ -1096,6 +1042,25 @@ compiler."
continue
;;
+ -no-fast-install)
+ fast_install=no
+ continue
+ ;;
+
+ -no-install)
+ case $host in
+ *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2*)
+ # The PATH hackery in wrapper scripts is required on Windows
+ # in order for the loader to find any dlls it needs.
+ $echo "$modename: warning: \`-no-install' is ignored for $host" 1>&2
+ $echo "$modename: warning: assuming \`-no-fast-install' instead" 1>&2
+ fast_install=no
+ ;;
+ *) no_install=yes ;;
+ esac
+ continue
+ ;;
+
-no-undefined)
allow_undefined=no
continue
@@ -1121,7 +1086,7 @@ compiler."
-R*)
dir=`$echo "X$arg" | $Xsed -e 's/^-R//'`
# We need an absolute path.
- case "$dir" in
+ case $dir in
[\\/]* | [A-Za-z]:[\\/]*) ;;
*)
$echo "$modename: only absolute run-paths are allowed" 1>&2
@@ -1136,11 +1101,11 @@ compiler."
;;
-static)
- # If we have no pic_flag, then this is the same as -all-static.
- if test -z "$pic_flag" && test -n "$link_static_flag"; then
- compile_command="$compile_command $link_static_flag"
- finalize_command="$finalize_command $link_static_flag"
- fi
+ # The effects of -static are defined in a previous loop.
+ # We used to do the same as -all-static on platforms that
+ # didn't have a PIC flag, but the assumption that the effects
+ # would be equivalent was wrong. It would break on at least
+ # Digital Unix and AIX.
continue
;;
@@ -1154,28 +1119,71 @@ compiler."
continue
;;
+ -Wc,*)
+ args=`$echo "X$arg" | $Xsed -e "$sed_quote_subst" -e 's/^-Wc,//'`
+ arg=
+ IFS="${IFS= }"; save_ifs="$IFS"; IFS=','
+ for flag in $args; do
+ IFS="$save_ifs"
+ case $flag in
+ *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*|"")
+ flag="\"$flag\""
+ ;;
+ esac
+ arg="$arg $wl$flag"
+ compiler_flags="$compiler_flags $flag"
+ done
+ IFS="$save_ifs"
+ arg=`$echo "X$arg" | $Xsed -e "s/^ //"`
+ ;;
+
+ -Wl,*)
+ args=`$echo "X$arg" | $Xsed -e "$sed_quote_subst" -e 's/^-Wl,//'`
+ arg=
+ IFS="${IFS= }"; save_ifs="$IFS"; IFS=','
+ for flag in $args; do
+ IFS="$save_ifs"
+ case $flag in
+ *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*|"")
+ flag="\"$flag\""
+ ;;
+ esac
+ arg="$arg $wl$flag"
+ compiler_flags="$compiler_flags $wl$flag"
+ linker_flags="$linker_flags $flag"
+ done
+ IFS="$save_ifs"
+ arg=`$echo "X$arg" | $Xsed -e "s/^ //"`
+ ;;
+
+ -Xcompiler)
+ prev=xcompiler
+ continue
+ ;;
+
+ -Xlinker)
+ prev=xlinker
+ continue
+ ;;
+
# Some other compiler flag.
-* | +*)
# Unknown arguments in both finalize_command and compile_command need
# to be aesthetically quoted because they are evaled later.
arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
- case "$arg" in
- *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*)
+ case $arg in
+ *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*|"")
arg="\"$arg\""
;;
esac
;;
- *.o | *.obj | *.a | *.lib)
- # A standard object.
- objs="$objs $arg"
- ;;
-
- *.lo)
- # A library object.
+ *.lo | *.$objext)
+ # A library or standard object.
if test "$prev" = dlfiles; then
- dlfiles="$dlfiles $arg"
- if test "$build_libtool_libs" = yes && test "$dlopen" = yes; then
+ # This file was specified with -dlopen.
+ if test "$build_libtool_libs" = yes && test "$dlopen_support" = yes; then
+ dlfiles="$dlfiles $arg"
prev=
continue
else
@@ -1188,357 +1196,890 @@ compiler."
# Preload the old-style object.
dlprefiles="$dlprefiles "`$echo "X$arg" | $Xsed -e "$lo2o"`
prev=
+ else
+ case $arg in
+ *.lo) libobjs="$libobjs $arg" ;;
+ *) objs="$objs $arg" ;;
+ esac
fi
- libobjs="$libobjs $arg"
+ ;;
+
+ *.$libext)
+ # An archive.
+ deplibs="$deplibs $arg"
+ old_deplibs="$old_deplibs $arg"
+ continue
;;
*.la)
# A libtool-controlled library.
- dlname=
- libdir=
- library_names=
- old_library=
+ if test "$prev" = dlfiles; then
+ # This library was specified with -dlopen.
+ dlfiles="$dlfiles $arg"
+ prev=
+ elif test "$prev" = dlprefiles; then
+ # The library was specified with -dlpreopen.
+ dlprefiles="$dlprefiles $arg"
+ prev=
+ else
+ deplibs="$deplibs $arg"
+ fi
+ continue
+ ;;
+
+ # Some other compiler argument.
+ *)
+ # Unknown arguments in both finalize_command and compile_command need
+ # to be aesthetically quoted because they are evaled later.
+ arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
+ case $arg in
+ *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*|"")
+ arg="\"$arg\""
+ ;;
+ esac
+ ;;
+ esac # arg
+
+ # Now actually substitute the argument into the commands.
+ if test -n "$arg"; then
+ compile_command="$compile_command $arg"
+ finalize_command="$finalize_command $arg"
+ fi
+ done # argument parsing loop
+
+ if test -n "$prev"; then
+ $echo "$modename: the \`$prevarg' option requires an argument" 1>&2
+ $echo "$help" 1>&2
+ exit 1
+ fi
+
+ if test "$export_dynamic" = yes && test -n "$export_dynamic_flag_spec"; then
+ eval arg=\"$export_dynamic_flag_spec\"
+ compile_command="$compile_command $arg"
+ finalize_command="$finalize_command $arg"
+ fi
+
+ # calculate the name of the file, without its directory
+ outputname=`$echo "X$output" | $Xsed -e 's%^.*/%%'`
+ libobjs_save="$libobjs"
+
+ if test -n "$shlibpath_var"; then
+ # get the directories listed in $shlibpath_var
+ eval shlib_search_path=\`\$echo \"X\${$shlibpath_var}\" \| \$Xsed -e \'s/:/ /g\'\`
+ else
+ shlib_search_path=
+ fi
+ eval sys_lib_search_path=\"$sys_lib_search_path_spec\"
+ eval sys_lib_dlsearch_path=\"$sys_lib_dlsearch_path_spec\"
+
+ output_objdir=`$echo "X$output" | $Xsed -e 's%/[^/]*$%%'`
+ if test "X$output_objdir" = "X$output"; then
+ output_objdir="$objdir"
+ else
+ output_objdir="$output_objdir/$objdir"
+ fi
+ # Create the object directory.
+ if test ! -d $output_objdir; then
+ $show "$mkdir $output_objdir"
+ $run $mkdir $output_objdir
+ status=$?
+ if test $status -ne 0 && test ! -d $output_objdir; then
+ exit $status
+ fi
+ fi
+
+ # Determine the type of output
+ case $output in
+ "")
+ $echo "$modename: you must specify an output file" 1>&2
+ $echo "$help" 1>&2
+ exit 1
+ ;;
+ *.$libext) linkmode=oldlib ;;
+ *.lo | *.$objext) linkmode=obj ;;
+ *.la) linkmode=lib ;;
+ *) linkmode=prog ;; # Anything else should be a program.
+ esac
+
+ specialdeplibs=
+ libs=
+ # Find all interdependent deplibs by searching for libraries
+ # that are linked more than once (e.g. -la -lb -la)
+ for deplib in $deplibs; do
+ case "$libs " in
+ *" $deplib "*) specialdeplibs="$specialdeplibs $deplib" ;;
+ esac
+ libs="$libs $deplib"
+ done
+ deplibs=
+ newdependency_libs=
+ newlib_search_path=
+ need_relink=no # whether we're linking any uninstalled libtool libraries
+ notinst_deplibs= # not-installed libtool libraries
+ notinst_path= # paths that contain not-installed libtool libraries
+ case $linkmode in
+ lib)
+ passes="conv link"
+ for file in $dlfiles $dlprefiles; do
+ case $file in
+ *.la) ;;
+ *)
+ $echo "$modename: libraries can \`-dlopen' only libtool libraries: $file" 1>&2
+ exit 1
+ ;;
+ esac
+ done
+ ;;
+ prog)
+ compile_deplibs=
+ finalize_deplibs=
+ alldeplibs=no
+ newdlfiles=
+ newdlprefiles=
+ passes="conv scan dlopen dlpreopen link"
+ ;;
+ *) passes="conv"
+ ;;
+ esac
+ for pass in $passes; do
+ if test $linkmode = prog; then
+ # Determine which files to process
+ case $pass in
+ dlopen)
+ libs="$dlfiles"
+ save_deplibs="$deplibs" # Collect dlpreopened libraries
+ deplibs=
+ ;;
+ dlpreopen) libs="$dlprefiles" ;;
+ link) libs="$deplibs %DEPLIBS% $dependency_libs" ;;
+ esac
+ fi
+ for deplib in $libs; do
+ lib=
+ found=no
+ case $deplib in
+ -l*)
+ if test $linkmode = oldlib && test $linkmode = obj; then
+ $echo "$modename: warning: \`-l' is ignored for archives/objects: $deplib" 1>&2
+ continue
+ fi
+ if test $pass = conv; then
+ deplibs="$deplib $deplibs"
+ continue
+ fi
+ name=`$echo "X$deplib" | $Xsed -e 's/^-l//'`
+ for searchdir in $newlib_search_path $lib_search_path $sys_lib_search_path $shlib_search_path; do
+ # Search the libtool library
+ lib="$searchdir/lib${name}.la"
+ if test -f "$lib"; then
+ found=yes
+ break
+ fi
+ done
+ if test "$found" != yes; then
+ # deplib doesn't seem to be a libtool library
+ if test "$linkmode,$pass" = "prog,link"; then
+ compile_deplibs="$deplib $compile_deplibs"
+ finalize_deplibs="$deplib $finalize_deplibs"
+ else
+ deplibs="$deplib $deplibs"
+ test $linkmode = lib && newdependency_libs="$deplib $newdependency_libs"
+ fi
+ continue
+ fi
+ ;; # -l
+ -L*)
+ case $linkmode in
+ lib)
+ deplibs="$deplib $deplibs"
+ test $pass = conv && continue
+ newdependency_libs="$deplib $newdependency_libs"
+ newlib_search_path="$newlib_search_path "`$echo "X$deplib" | $Xsed -e 's/^-L//'`
+ ;;
+ prog)
+ if test $pass = conv; then
+ deplibs="$deplib $deplibs"
+ continue
+ fi
+ if test $pass = scan; then
+ deplibs="$deplib $deplibs"
+ newlib_search_path="$newlib_search_path "`$echo "X$deplib" | $Xsed -e 's/^-L//'`
+ else
+ compile_deplibs="$deplib $compile_deplibs"
+ finalize_deplibs="$deplib $finalize_deplibs"
+ fi
+ ;;
+ *)
+ $echo "$modename: warning: \`-L' is ignored for archives/objects: $deplib" 1>&2
+ ;;
+ esac # linkmode
+ continue
+ ;; # -L
+ -R*)
+ if test $pass = link; then
+ dir=`$echo "X$deplib" | $Xsed -e 's/^-R//'`
+ # Make sure the xrpath contains only unique directories.
+ case "$xrpath " in
+ *" $dir "*) ;;
+ *) xrpath="$xrpath $dir" ;;
+ esac
+ fi
+ deplibs="$deplib $deplibs"
+ continue
+ ;;
+ *.la) lib="$deplib" ;;
+ *.$libext)
+ if test $pass = conv; then
+ deplibs="$deplib $deplibs"
+ continue
+ fi
+ case $linkmode in
+ lib)
+ if test "$deplibs_check_method" != pass_all; then
+ echo
+ echo "*** Warning: This library needs some functionality provided by $deplib."
+ echo "*** I have the capability to make that library automatically link in when"
+ echo "*** you link to this library. But I can only do this if you have a"
+ echo "*** shared version of the library, which you do not appear to have."
+ else
+ echo
+ echo "*** Warning: Linking the shared library $output against the"
+ echo "*** static library $deplib is not portable!"
+ deplibs="$deplib $deplibs"
+ fi
+ continue
+ ;;
+ prog)
+ if test $pass != link; then
+ deplibs="$deplib $deplibs"
+ else
+ compile_deplibs="$deplib $compile_deplibs"
+ finalize_deplibs="$deplib $finalize_deplibs"
+ fi
+ continue
+ ;;
+ esac # linkmode
+ ;; # *.$libext
+ *.lo | *.$objext)
+ if test $pass = dlpreopen || test "$dlopen_support" != yes || test "$build_libtool_libs" = no; then
+ # If there is no dlopen support or we're linking statically,
+ # we need to preload.
+ newdlprefiles="$newdlprefiles $deplib"
+ compile_deplibs="$deplib $compile_deplibs"
+ finalize_deplibs="$deplib $finalize_deplibs"
+ else
+ newdlfiles="$newdlfiles $deplib"
+ fi
+ continue
+ ;;
+ %DEPLIBS%)
+ alldeplibs=yes
+ continue
+ ;;
+ esac # case $deplib
+ if test $found = yes || test -f "$lib"; then :
+ else
+ $echo "$modename: cannot find the library \`$lib'" 1>&2
+ exit 1
+ fi
# Check to see that this really is a libtool archive.
- if (sed -e '2q' $arg | egrep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then :
+ if (sed -e '2q' $lib | egrep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then :
else
- $echo "$modename: \`$arg' is not a valid libtool archive" 1>&2
+ $echo "$modename: \`$lib' is not a valid libtool archive" 1>&2
exit 1
fi
+ ladir=`$echo "X$lib" | $Xsed -e 's%/[^/]*$%%'`
+ test "X$ladir" = "X$lib" && ladir="."
+
+ dlname=
+ dlopen=
+ dlpreopen=
+ libdir=
+ library_names=
+ old_library=
# If the library was installed with an old release of libtool,
# it will not redefine variable installed.
installed=yes
# Read the .la file
- # If there is no directory component, then add one.
- case "$arg" in
- */* | *\\*) . $arg ;;
- *) . ./$arg ;;
+ case $lib in
+ */* | *\\*) . $lib ;;
+ *) . ./$lib ;;
esac
+ if test "$linkmode,$pass" = "lib,link" ||
+ test "$linkmode,$pass" = "prog,scan" ||
+ { test $linkmode = oldlib && test $linkmode = obj; }; then
+ # Add dl[pre]opened files of deplib
+ test -n "$dlopen" && dlfiles="$dlfiles $dlopen"
+ test -n "$dlpreopen" && dlprefiles="$dlprefiles $dlpreopen"
+ fi
+
+ if test $pass = conv; then
+ # Only check for convenience libraries
+ deplibs="$lib $deplibs"
+ if test -z "$libdir"; then
+ if test -z "$old_library"; then
+ $echo "$modename: cannot find name of link library for \`$lib'" 1>&2
+ exit 1
+ fi
+ # It is a libtool convenience library, so add in its objects.
+ convenience="$convenience $ladir/$objdir/$old_library"
+ old_convenience="$old_convenience $ladir/$objdir/$old_library"
+ tmp_libs=
+ for deplib in $dependency_libs; do
+ deplibs="$deplib $deplibs"
+ case "$tmp_libs " in
+ *" $deplib "*) specialdeplibs="$specialdeplibs $deplib" ;;
+ esac
+ tmp_libs="$tmp_libs $deplib"
+ done
+ elif test $linkmode != prog && test $linkmode != lib; then
+ $echo "$modename: \`$lib' is not a convenience library" 1>&2
+ exit 1
+ fi
+ continue
+ fi # $pass = conv
+
# Get the name of the library we link against.
linklib=
for l in $old_library $library_names; do
linklib="$l"
done
-
if test -z "$linklib"; then
- $echo "$modename: cannot find name of link library for \`$arg'" 1>&2
+ $echo "$modename: cannot find name of link library for \`$lib'" 1>&2
exit 1
fi
- # Find the relevant object directory and library name.
- name=`$echo "X$arg" | $Xsed -e 's%^.*/%%' -e 's/\.la$//' -e 's/^lib//'`
-
- if test "X$installed" = Xyes; then
- dir="$libdir"
- else
- dir=`$echo "X$arg" | $Xsed -e 's%/[^/]*$%%'`
- if test "X$dir" = "X$arg"; then
- dir="$objdir"
+ # This library was specified with -dlopen.
+ if test $pass = dlopen; then
+ if test -z "$libdir"; then
+ $echo "$modename: cannot -dlopen a convenience library: \`$lib'" 1>&2
+ exit 1
+ fi
+ if test -z "$dlname" || test "$dlopen_support" != yes || test "$build_libtool_libs" = no; then
+ # If there is no dlname, no dlopen support or we're linking
+ # statically, we need to preload.
+ dlprefiles="$dlprefiles $lib"
else
- dir="$dir/$objdir"
+ newdlfiles="$newdlfiles $lib"
fi
- fi
-
- if test -n "$dependency_libs"; then
- # Extract -R and -L from dependency_libs
- temp_deplibs=
- for deplib in $dependency_libs; do
- case "$deplib" in
- -R*) temp_xrpath=`$echo "X$deplib" | $Xsed -e 's/^-R//'`
- case " $rpath $xrpath " in
- *" $temp_xrpath "*) ;;
- *) xrpath="$xrpath $temp_xrpath";;
- esac;;
- -L*) case "$compile_command $temp_deplibs " in
- *" $deplib "*) ;;
- *) temp_deplibs="$temp_deplibs $deplib";;
- esac
- temp_dir=`$echo "X$deplib" | $Xsed -e 's/^-L//'`
- case " $lib_search_path " in
- *" $temp_dir "*) ;;
- *) lib_search_path="$lib_search_path $temp_dir";;
- esac
- ;;
- *) temp_deplibs="$temp_deplibs $deplib";;
- esac
- done
- dependency_libs="$temp_deplibs"
- fi
-
- if test -z "$libdir"; then
- # It is a libtool convenience library, so add in its objects.
- convenience="$convenience $dir/$old_library"
- old_convenience="$old_convenience $dir/$old_library"
- deplibs="$deplibs$dependency_libs"
- compile_command="$compile_command $dir/$old_library$dependency_libs"
- finalize_command="$finalize_command $dir/$old_library$dependency_libs"
continue
- fi
+ fi # $pass = dlopen
- # This library was specified with -dlopen.
- if test "$prev" = dlfiles; then
- dlfiles="$dlfiles $arg"
- if test -z "$dlname" || test "$dlopen" != yes || test "$build_libtool_libs" = no; then
- # If there is no dlname, no dlopen support or we're linking statically,
- # we need to preload.
- prev=dlprefiles
- else
- # We should not create a dependency on this library, but we
- # may need any libraries it requires.
- compile_command="$compile_command$dependency_libs"
- finalize_command="$finalize_command$dependency_libs"
- prev=
- continue
+ # We need an absolute path.
+ case $ladir in
+ [\\/]* | [A-Za-z]:[\\/]*) abs_ladir="$ladir" ;;
+ *)
+ abs_ladir=`cd "$ladir" && pwd`
+ if test -z "$abs_ladir"; then
+ $echo "$modename: warning: cannot determine absolute directory name of \`$ladir'" 1>&2
+ $echo "$modename: passing it literally to the linker, although it might fail" 1>&2
+ abs_ladir="$ladir"
fi
- fi
+ ;;
+ esac
+ laname=`$echo "X$lib" | $Xsed -e 's%^.*/%%'`
- # The library was specified with -dlpreopen.
- if test "$prev" = dlprefiles; then
+ # Find the relevant object directory and library name.
+ if test "X$installed" = Xyes; then
+ if test ! -f "$libdir/$linklib" && test -f "$abs_ladir/$linklib"; then
+ $echo "$modename: warning: library \`$lib' was moved." 1>&2
+ dir="$ladir"
+ absdir="$abs_ladir"
+ libdir="$abs_ladir"
+ else
+ dir="$libdir"
+ absdir="$libdir"
+ fi
+ else
+ dir="$ladir/$objdir"
+ absdir="$abs_ladir/$objdir"
+ # Remove this search path later
+ notinst_path="$notinst_path $abs_ladir"
+ fi # $installed = yes
+ name=`$echo "X$laname" | $Xsed -e 's/\.la$//' -e 's/^lib//'`
+
+ # This library was specified with -dlpreopen.
+ if test $pass = dlpreopen; then
+ if test -z "$libdir"; then
+ $echo "$modename: cannot -dlpreopen a convenience library: \`$lib'" 1>&2
+ exit 1
+ fi
# Prefer using a static library (so that no silly _DYNAMIC symbols
# are required to link).
if test -n "$old_library"; then
- dlprefiles="$dlprefiles $dir/$old_library"
+ newdlprefiles="$newdlprefiles $dir/$old_library"
+ # Otherwise, use the dlname, so that lt_dlopen finds it.
+ elif test -n "$dlname"; then
+ newdlprefiles="$newdlprefiles $dir/$dlname"
else
- dlprefiles="$dlprefiles $dir/$linklib"
+ newdlprefiles="$newdlprefiles $dir/$linklib"
fi
- prev=
+ fi # $pass = dlpreopen
+
+ if test -z "$libdir"; then
+ # Link the convenience library
+ if test $linkmode = lib; then
+ deplibs="$dir/$old_library $deplibs"
+ elif test "$linkmode,$pass" = "prog,link"; then
+ compile_deplibs="$dir/$old_library $compile_deplibs"
+ finalize_deplibs="$dir/$old_library $finalize_deplibs"
+ else
+ deplibs="$lib $deplibs"
+ fi
+ continue
fi
- if test -n "$library_names" &&
- { test "$prefer_static_libs" = no || test -z "$old_library"; }; then
- link_against_libtool_libs="$link_against_libtool_libs $arg"
- if test -n "$shlibpath_var"; then
- # Make sure the rpath contains only unique directories.
- case "$temp_rpath " in
- *" $dir "*) ;;
- *) temp_rpath="$temp_rpath $dir" ;;
- esac
+ if test $linkmode = prog && test $pass != link; then
+ newlib_search_path="$newlib_search_path $ladir"
+ deplibs="$lib $deplibs"
+
+ linkalldeplibs=no
+ if test "$link_all_deplibs" != no || test -z "$library_names" ||
+ test "$build_libtool_libs" = no; then
+ linkalldeplibs=yes
fi
- # We need an absolute path.
- case "$dir" in
- [\\/] | [A-Za-z]:[\\/]*) absdir="$dir" ;;
- *)
- absdir=`cd "$dir" && pwd`
- if test -z "$absdir"; then
- $echo "$modename: warning: cannot determine absolute directory name of \`$dir'" 1>&2
- $echo "$modename: passing it literally to the linker, although it might fail" 1>&2
- absdir="$dir"
+ tmp_libs=
+ for deplib in $dependency_libs; do
+ case $deplib in
+ -L*) newlib_search_path="$newlib_search_path "`$echo "X$deplib" | $Xsed -e 's/^-L//'`;; ### testsuite: skip nested quoting test
+ esac
+ # Need to link against all dependency_libs?
+ if test $linkalldeplibs = yes; then
+ deplibs="$deplib $deplibs"
+ else
+ # Need to hardcode shared library paths
+ # or/and link against static libraries
+ newdependency_libs="$deplib $newdependency_libs"
fi
- ;;
- esac
-
- # This is the magic to use -rpath.
- # Skip directories that are in the system default run-time
- # search path, unless they have been requested with -R.
- case " $sys_lib_dlsearch_path " in
- *" $absdir "*) ;;
- *)
- case "$compile_rpath " in
- *" $absdir "*) ;;
- *) compile_rpath="$compile_rpath $absdir"
+ case "$tmp_libs " in
+ *" $deplib "*) specialdeplibs="$specialdeplibs $deplib" ;;
esac
- ;;
- esac
+ tmp_libs="$tmp_libs $deplib"
+ done # for deplib
+ continue
+ fi # $linkmode = prog...
- case " $sys_lib_dlsearch_path " in
- *" $libdir "*) ;;
- *)
- case "$finalize_rpath " in
+ link_static=no # Whether the deplib will be linked statically
+ if test -n "$library_names" &&
+ { test "$prefer_static_libs" = no || test -z "$old_library"; }; then
+ # Link against this shared library
+
+ if test "$linkmode,$pass" = "prog,link" ||
+ { test $linkmode = lib && test $hardcode_into_libs = yes; }; then
+ # Hardcode the library path.
+ # Skip directories that are in the system default run-time
+ # search path.
+ case " $sys_lib_dlsearch_path " in
+ *" $absdir "*) ;;
+ *)
+ case "$compile_rpath " in
+ *" $absdir "*) ;;
+ *) compile_rpath="$compile_rpath $absdir"
+ esac
+ ;;
+ esac
+ case " $sys_lib_dlsearch_path " in
*" $libdir "*) ;;
- *) finalize_rpath="$finalize_rpath $libdir"
+ *)
+ case "$finalize_rpath " in
+ *" $libdir "*) ;;
+ *) finalize_rpath="$finalize_rpath $libdir"
+ esac
+ ;;
esac
- ;;
- esac
+ if test $linkmode = prog; then
+ # We need to hardcode the library path
+ if test -n "$shlibpath_var"; then
+ # Make sure the rpath contains only unique directories.
+ case "$temp_rpath " in
+ *" $dir "*) ;;
+ *" $absdir "*) ;;
+ *) temp_rpath="$temp_rpath $dir" ;;
+ esac
+ fi
+ fi
+ fi # $linkmode,$pass = prog,link...
- lib_linked=yes
- case "$hardcode_action" in
- immediate | unsupported)
- if test "$hardcode_direct" = no; then
- compile_command="$compile_command $dir/$linklib"
- deplibs="$deplibs $dir/$linklib"
- case "$host" in
- *-*-cygwin* | *-*-mingw* | *-*-os2*)
- dllsearchdir=`cd "$dir" && pwd || echo "$dir"`
- if test -n "$dllsearchpath"; then
- dllsearchpath="$dllsearchpath:$dllsearchdir"
- else
- dllsearchpath="$dllsearchdir"
- fi
- ;;
- esac
- elif test "$hardcode_minus_L" = no; then
- case "$host" in
- *-*-sunos*)
- compile_shlibpath="$compile_shlibpath$dir:"
+ if test "$alldeplibs" = yes &&
+ { test "$deplibs_check_method" = pass_all ||
+ { test "$build_libtool_libs" = yes &&
+ test -n "$library_names"; }; }; then
+ # We only need to search for static libraries
+ continue
+ fi
+
+ if test "$installed" = no; then
+ notinst_deplibs="$notinst_deplibs $lib"
+ need_relink=yes
+ fi
+
+ if test -n "$old_archive_from_expsyms_cmds"; then
+ # figure out the soname
+ set dummy $library_names
+ realname="$2"
+ shift; shift
+ libname=`eval \\$echo \"$libname_spec\"`
+ # use dlname if we got it. it's perfectly good, no?
+ if test -n "$dlname"; then
+ soname="$dlname"
+ elif test -n "$soname_spec"; then
+ # bleh windows
+ case $host in
+ *cygwin*)
+ major=`expr $current - $age`
+ versuffix="-$major"
;;
esac
- case "$compile_command " in
- *" -L$dir "*) ;;
- *) compile_command="$compile_command -L$dir";;
- esac
- compile_command="$compile_command -l$name"
- deplibs="$deplibs -L$dir -l$name"
- elif test "$hardcode_shlibpath_var" = no; then
- case ":$compile_shlibpath:" in
- *":$dir:"*) ;;
- *) compile_shlibpath="$compile_shlibpath$dir:";;
+ eval soname=\"$soname_spec\"
+ else
+ soname="$realname"
+ fi
+
+ # Make a new name for the extract_expsyms_cmds to use
+ soroot="$soname"
+ soname=`echo $soroot | sed -e 's/^.*\///'`
+ newlib="libimp-`echo $soname | sed 's/^lib//;s/\.dll$//'`.a"
+
+ # If the library has no export list, then create one now
+ if test -f "$output_objdir/$soname-def"; then :
+ else
+ $show "extracting exported symbol list from \`$soname'"
+ IFS="${IFS= }"; save_ifs="$IFS"; IFS='~'
+ eval cmds=\"$extract_expsyms_cmds\"
+ for cmd in $cmds; do
+ IFS="$save_ifs"
+ $show "$cmd"
+ $run eval "$cmd" || exit $?
+ done
+ IFS="$save_ifs"
+ fi
+
+ # Create $newlib
+ if test -f "$output_objdir/$newlib"; then :; else
+ $show "generating import library for \`$soname'"
+ IFS="${IFS= }"; save_ifs="$IFS"; IFS='~'
+ eval cmds=\"$old_archive_from_expsyms_cmds\"
+ for cmd in $cmds; do
+ IFS="$save_ifs"
+ $show "$cmd"
+ $run eval "$cmd" || exit $?
+ done
+ IFS="$save_ifs"
+ fi
+ # make sure the library variables are pointing to the new library
+ dir=$output_objdir
+ linklib=$newlib
+ fi # test -n $old_archive_from_expsyms_cmds
+
+ if test $linkmode = prog || test "$mode" != relink; then
+ add_shlibpath=
+ add_dir=
+ add=
+ lib_linked=yes
+ case $hardcode_action in
+ immediate | unsupported)
+ if test "$hardcode_direct" = no; then
+ add="$dir/$linklib"
+ elif test "$hardcode_minus_L" = no; then
+ case $host in
+ *-*-sunos*) add_shlibpath="$dir" ;;
+ esac
+ add_dir="-L$dir"
+ add="-l$name"
+ elif test "$hardcode_shlibpath_var" = no; then
+ add_shlibpath="$dir"
+ add="-l$name"
+ else
+ lib_linked=no
+ fi
+ ;;
+ relink)
+ if test "$hardcode_direct" = yes; then
+ add="$dir/$linklib"
+ elif test "$hardcode_minus_L" = yes; then
+ add_dir="-L$dir"
+ add="-l$name"
+ elif test "$hardcode_shlibpath_var" = yes; then
+ add_shlibpath="$dir"
+ add="-l$name"
+ else
+ lib_linked=no
+ fi
+ ;;
+ *) lib_linked=no ;;
+ esac
+
+ if test "$lib_linked" != yes; then
+ $echo "$modename: configuration error: unsupported hardcode properties"
+ exit 1
+ fi
+
+ if test -n "$add_shlibpath"; then
+ case :$compile_shlibpath: in
+ *":$add_shlibpath:"*) ;;
+ *) compile_shlibpath="$compile_shlibpath$add_shlibpath:" ;;
esac
- compile_command="$compile_command -l$name"
- deplibs="$deplibs -l$name"
+ fi
+ if test $linkmode = prog; then
+ test -n "$add_dir" && compile_deplibs="$add_dir $compile_deplibs"
+ test -n "$add" && compile_deplibs="$add $compile_deplibs"
else
- lib_linked=no
+ test -n "$add_dir" && deplibs="$add_dir $deplibs"
+ test -n "$add" && deplibs="$add $deplibs"
+ if test "$hardcode_direct" != yes && \
+ test "$hardcode_minus_L" != yes && \
+ test "$hardcode_shlibpath_var" = yes; then
+ case :$finalize_shlibpath: in
+ *":$libdir:"*) ;;
+ *) finalize_shlibpath="$finalize_shlibpath$libdir:" ;;
+ esac
+ fi
fi
- ;;
+ fi
- relink)
+ if test $linkmode = prog || test "$mode" = relink; then
+ add_shlibpath=
+ add_dir=
+ add=
+ # Finalize command for both is simple: just hardcode it.
if test "$hardcode_direct" = yes; then
- compile_command="$compile_command $absdir/$linklib"
- deplibs="$deplibs $absdir/$linklib"
+ add="$libdir/$linklib"
elif test "$hardcode_minus_L" = yes; then
- case "$compile_command " in
- *" -L$absdir "*) ;;
- *) compile_command="$compile_command -L$absdir";;
- esac
- compile_command="$compile_command -l$name"
- deplibs="$deplibs -L$absdir -l$name"
+ add_dir="-L$libdir"
+ add="-l$name"
elif test "$hardcode_shlibpath_var" = yes; then
- case ":$compile_shlibpath:" in
- *":$absdir:"*) ;;
- *) compile_shlibpath="$compile_shlibpath$absdir:";;
+ case :$finalize_shlibpath: in
+ *":$libdir:"*) ;;
+ *) finalize_shlibpath="$finalize_shlibpath$libdir:" ;;
esac
- compile_command="$compile_command -l$name"
- deplibs="$deplibs -l$name"
+ add="-l$name"
else
- lib_linked=no
+ # We cannot seem to hardcode it, guess we'll fake it.
+ add_dir="-L$libdir"
+ add="-l$name"
fi
- ;;
-
- *)
- lib_linked=no
- ;;
- esac
-
- if test "$lib_linked" != yes; then
- $echo "$modename: configuration error: unsupported hardcode properties"
- exit 1
- fi
- # Finalize command for both is simple: just hardcode it.
- if test "$hardcode_direct" = yes; then
- finalize_command="$finalize_command $libdir/$linklib"
- elif test "$hardcode_minus_L" = yes; then
- case "$finalize_command " in
- *" -L$libdir "*) ;;
- *) finalize_command="$finalize_command -L$libdir";;
- esac
- finalize_command="$finalize_command -l$name"
- elif test "$hardcode_shlibpath_var" = yes; then
- case ":$finalize_shlibpath:" in
- *":$libdir:"*) ;;
- *) finalize_shlibpath="$finalize_shlibpath$libdir:";;
- esac
- finalize_command="$finalize_command -l$name"
- else
- # We cannot seem to hardcode it, guess we'll fake it.
- case "$finalize_command " in
- *" -L$dir "*) ;;
- *) finalize_command="$finalize_command -L$libdir";;
- esac
- finalize_command="$finalize_command -l$name"
+ if test $linkmode = prog; then
+ test -n "$add_dir" && finalize_deplibs="$add_dir $finalize_deplibs"
+ test -n "$add" && finalize_deplibs="$add $finalize_deplibs"
+ else
+ test -n "$add_dir" && deplibs="$add_dir $deplibs"
+ test -n "$add" && deplibs="$add $deplibs"
+ fi
fi
- else
- # Transform directly to old archives if we don't build new libraries.
- if test -n "$pic_flag" && test -z "$old_library"; then
- $echo "$modename: cannot find static library for \`$arg'" 1>&2
- exit 1
+ elif test $linkmode = prog; then
+ if test "$alldeplibs" = yes &&
+ { test "$deplibs_check_method" = pass_all ||
+ { test "$build_libtool_libs" = yes &&
+ test -n "$library_names"; }; }; then
+ # We only need to search for static libraries
+ continue
fi
+ # Try to link the static library
# Here we assume that one of hardcode_direct or hardcode_minus_L
# is not unsupported. This is valid on all known static and
# shared platforms.
if test "$hardcode_direct" != unsupported; then
test -n "$old_library" && linklib="$old_library"
- compile_command="$compile_command $dir/$linklib"
- finalize_command="$finalize_command $dir/$linklib"
+ compile_deplibs="$dir/$linklib $compile_deplibs"
+ finalize_deplibs="$dir/$linklib $finalize_deplibs"
else
- case "$compile_command " in
- *" -L$dir "*) ;;
- *) compile_command="$compile_command -L$dir";;
- esac
- compile_command="$compile_command -l$name"
- case "$finalize_command " in
- *" -L$dir "*) ;;
- *) finalize_command="$finalize_command -L$dir";;
- esac
- finalize_command="$finalize_command -l$name"
+ compile_deplibs="-l$name -L$dir $compile_deplibs"
+ finalize_deplibs="-l$name -L$dir $finalize_deplibs"
+ fi
+ elif test "$build_libtool_libs" = yes; then
+ # Not a shared library
+ if test "$deplibs_check_method" != pass_all; then
+ # We're trying link a shared library against a static one
+ # but the system doesn't support it.
+
+ # Just print a warning and add the library to dependency_libs so
+ # that the program can be linked against the static library.
+ echo
+ echo "*** Warning: This library needs some functionality provided by $lib."
+ echo "*** I have the capability to make that library automatically link in when"
+ echo "*** you link to this library. But I can only do this if you have a"
+ echo "*** shared version of the library, which you do not appear to have."
+ if test "$module" = yes; then
+ echo "*** Therefore, libtool will create a static module, that should work "
+ echo "*** as long as the dlopening application is linked with the -dlopen flag."
+ if test -z "$global_symbol_pipe"; then
+ echo
+ echo "*** However, this would only work if libtool was able to extract symbol"
+ echo "*** lists from a program, using \`nm' or equivalent, but libtool could"
+ echo "*** not find such a program. So, this module is probably useless."
+ echo "*** \`nm' from GNU binutils and a full rebuild may help."
+ fi
+ if test "$build_old_libs" = no; then
+ build_libtool_libs=module
+ build_old_libs=yes
+ else
+ build_libtool_libs=no
+ fi
+ fi
+ else
+ convenience="$convenience $dir/$old_library"
+ old_convenience="$old_convenience $dir/$old_library"
+ deplibs="$dir/$old_library $deplibs"
+ link_static=yes
+ fi
+ fi # link shared/static library?
+
+ if test $linkmode = lib; then
+ if test -n "$dependency_libs" &&
+ { test $hardcode_into_libs != yes || test $build_old_libs = yes ||
+ test $link_static = yes; }; then
+ # Extract -R from dependency_libs
+ temp_deplibs=
+ for libdir in $dependency_libs; do
+ case $libdir in
+ -R*) temp_xrpath=`$echo "X$libdir" | $Xsed -e 's/^-R//'`
+ case " $xrpath " in
+ *" $temp_xrpath "*) ;;
+ *) xrpath="$xrpath $temp_xrpath";;
+ esac;;
+ *) temp_deplibs="$temp_deplibs $libdir";;
+ esac
+ done
+ dependency_libs="$temp_deplibs"
fi
- fi
-
- # Add in any libraries that this one depends upon.
- compile_command="$compile_command$dependency_libs"
- finalize_command="$finalize_command$dependency_libs"
- continue
- ;;
- # Some other compiler argument.
- *)
- # Unknown arguments in both finalize_command and compile_command need
- # to be aesthetically quoted because they are evaled later.
- arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
- case "$arg" in
- *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*)
- arg="\"$arg\""
- ;;
- esac
- ;;
- esac
+ newlib_search_path="$newlib_search_path $absdir"
+ # Link against this library
+ test "$link_static" = no && newdependency_libs="$abs_ladir/$laname $newdependency_libs"
+ # ... and its dependency_libs
+ tmp_libs=
+ for deplib in $dependency_libs; do
+ newdependency_libs="$deplib $newdependency_libs"
+ case "$tmp_libs " in
+ *" $deplib "*) specialdeplibs="$specialdeplibs $deplib" ;;
+ esac
+ tmp_libs="$tmp_libs $deplib"
+ done
- # Now actually substitute the argument into the commands.
- if test -n "$arg"; then
- compile_command="$compile_command $arg"
- finalize_command="$finalize_command $arg"
+ if test $link_all_deplibs != no; then
+ # Add the search paths of all dependency libraries
+ for deplib in $dependency_libs; do
+ case $deplib in
+ -L*) path="$deplib" ;;
+ *.la)
+ dir=`$echo "X$deplib" | $Xsed -e 's%/[^/]*$%%'`
+ test "X$dir" = "X$deplib" && dir="."
+ # We need an absolute path.
+ case $dir in
+ [\\/]* | [A-Za-z]:[\\/]*) absdir="$dir" ;;
+ *)
+ absdir=`cd "$dir" && pwd`
+ if test -z "$absdir"; then
+ $echo "$modename: warning: cannot determine absolute directory name of \`$dir'" 1>&2
+ absdir="$dir"
+ fi
+ ;;
+ esac
+ if grep "^installed=no" $deplib > /dev/null; then
+ path="-L$absdir/$objdir"
+ else
+ eval libdir=`sed -n -e 's/^libdir=\(.*\)$/\1/p' $deplib`
+ if test -z "$libdir"; then
+ $echo "$modename: \`$deplib' is not a valid libtool archive" 1>&2
+ exit 1
+ fi
+ if test "$absdir" != "$libdir"; then
+ $echo "$modename: warning: \`$deplib' seems to be moved" 1>&2
+ fi
+ path="-L$absdir"
+ fi
+ ;;
+ *) continue ;;
+ esac
+ case " $deplibs " in
+ *" $path "*) ;;
+ *) deplibs="$deplibs $path" ;;
+ esac
+ done
+ fi # link_all_deplibs != no
+ fi # linkmode = lib
+ done # for deplib in $libs
+ if test $pass = dlpreopen; then
+ # Link the dlpreopened libraries before other libraries
+ for deplib in $save_deplibs; do
+ deplibs="$deplib $deplibs"
+ done
fi
- done
-
- if test -n "$prev"; then
- $echo "$modename: the \`$prevarg' option requires an argument" 1>&2
- $echo "$help" 1>&2
- exit 1
- fi
-
- if test "$export_dynamic" = yes && test -n "$export_dynamic_flag_spec"; then
- eval arg=\"$export_dynamic_flag_spec\"
- compile_command="$compile_command $arg"
- finalize_command="$finalize_command $arg"
- fi
-
- oldlibs=
- # calculate the name of the file, without its directory
- outputname=`$echo "X$output" | $Xsed -e 's%^.*/%%'`
- libobjs_save="$libobjs"
-
- case "$output" in
- "")
- $echo "$modename: you must specify an output file" 1>&2
- $echo "$help" 1>&2
- exit 1
- ;;
+ if test $pass != dlopen; then
+ test $pass != scan && dependency_libs="$newdependency_libs"
+ if test $pass != conv; then
+ # Make sure lib_search_path contains only unique directories.
+ lib_search_path=
+ for dir in $newlib_search_path; do
+ case "$lib_search_path " in
+ *" $dir "*) ;;
+ *) lib_search_path="$lib_search_path $dir" ;;
+ esac
+ done
+ newlib_search_path=
+ fi
- *.a | *.lib)
- if test -n "$link_against_libtool_libs"; then
- $echo "$modename: error: cannot link libtool libraries into archives" 1>&2
- exit 1
+ if test "$linkmode,$pass" != "prog,link"; then
+ vars="deplibs"
+ else
+ vars="compile_deplibs finalize_deplibs"
+ fi
+ for var in $vars dependency_libs; do
+ # Add libraries to $var in reverse order
+ eval tmp_libs=\"\$$var\"
+ new_libs=
+ for deplib in $tmp_libs; do
+ case $deplib in
+ -L*) new_libs="$deplib $new_libs" ;;
+ *)
+ case " $specialdeplibs " in
+ *" $deplib "*) new_libs="$deplib $new_libs" ;;
+ *)
+ case " $new_libs " in
+ *" $deplib "*) ;;
+ *) new_libs="$deplib $new_libs" ;;
+ esac
+ ;;
+ esac
+ ;;
+ esac
+ done
+ tmp_libs=
+ for deplib in $new_libs; do
+ case $deplib in
+ -L*)
+ case " $tmp_libs " in
+ *" $deplib "*) ;;
+ *) tmp_libs="$tmp_libs $deplib" ;;
+ esac
+ ;;
+ *) tmp_libs="$tmp_libs $deplib" ;;
+ esac
+ done
+ eval $var=\"$tmp_libs\"
+ done # for var
fi
-
- if test -n "$deplibs"; then
- $echo "$modename: warning: \`-l' and \`-L' are ignored for archives" 1>&2
+ if test "$pass" = "conv" &&
+ { test "$linkmode" = "lib" || test "$linkmode" = "prog"; }; then
+ libs="$deplibs" # reset libs
+ deplibs=
fi
+ done # for pass
+ if test $linkmode = prog; then
+ dlfiles="$newdlfiles"
+ dlprefiles="$newdlprefiles"
+ fi
+ case $linkmode in
+ oldlib)
if test -n "$dlfiles$dlprefiles" || test "$dlself" != no; then
$echo "$modename: warning: \`-dlopen' is ignored for archives" 1>&2
fi
@@ -1566,11 +2107,12 @@ compiler."
# Now set the variables for building old libraries.
build_libtool_libs=no
oldlibs="$output"
+ objs="$objs$old_deplibs"
;;
- *.la)
+ lib)
# Make sure we only generate libraries of the form `libNAME.la'.
- case "$outputname" in
+ case $outputname in
lib*)
name=`$echo "X$outputname" | $Xsed -e 's/\.la$//' -e 's/^lib//'`
eval libname=\"$libname_spec\"
@@ -1591,26 +2133,20 @@ compiler."
;;
esac
- output_objdir=`$echo "X$output" | $Xsed -e 's%/[^/]*$%%'`
- if test "X$output_objdir" = "X$output"; then
- output_objdir="$objdir"
- else
- output_objdir="$output_objdir/$objdir"
- fi
-
if test -n "$objs"; then
- $echo "$modename: cannot build libtool library \`$output' from non-libtool objects:$objs" 2>&1
- exit 1
- fi
-
- # How the heck are we supposed to write a wrapper for a shared library?
- if test -n "$link_against_libtool_libs"; then
- $echo "$modename: error: cannot link shared libraries into libtool libraries" 1>&2
- exit 1
+ if test "$deplibs_check_method" != pass_all; then
+ $echo "$modename: cannot build libtool library \`$output' from non-libtool objects on this host:$objs" 2>&1
+ exit 1
+ else
+ echo
+ echo "*** Warning: Linking the shared library $output against the non-libtool"
+ echo "*** objects $objs is not portable!"
+ libobjs="$libobjs $objs"
+ fi
fi
- if test -n "$dlfiles$dlprefiles" || test "$dlself" != no; then
- $echo "$modename: warning: \`-dlopen' is ignored for libtool libraries" 1>&2
+ if test "$dlself" != no; then
+ $echo "$modename: warning: \`-dlopen self' is ignored for libtool libraries" 1>&2
fi
set dummy $rpath
@@ -1628,7 +2164,6 @@ compiler."
build_libtool_libs=convenience
build_old_libs=yes
fi
- dependency_libs="$deplibs"
if test -n "$vinfo"; then
$echo "$modename: warning: \`-version-info' is ignored for convenience libraries" 1>&2
@@ -1655,8 +2190,8 @@ compiler."
age="$4"
# Check that each of the things are valid numbers.
- case "$current" in
- 0 | [1-9] | [1-9][0-9]*) ;;
+ case $current in
+ 0 | [1-9] | [1-9][0-9] | [1-9][0-9][0-9]) ;;
*)
$echo "$modename: CURRENT \`$current' is not a nonnegative integer" 1>&2
$echo "$modename: \`$vinfo' is not valid version information" 1>&2
@@ -1664,8 +2199,8 @@ compiler."
;;
esac
- case "$revision" in
- 0 | [1-9] | [1-9][0-9]*) ;;
+ case $revision in
+ 0 | [1-9] | [1-9][0-9] | [1-9][0-9][0-9]) ;;
*)
$echo "$modename: REVISION \`$revision' is not a nonnegative integer" 1>&2
$echo "$modename: \`$vinfo' is not valid version information" 1>&2
@@ -1673,8 +2208,8 @@ compiler."
;;
esac
- case "$age" in
- 0 | [1-9] | [1-9][0-9]*) ;;
+ case $age in
+ 0 | [1-9] | [1-9][0-9] | [1-9][0-9][0-9]) ;;
*)
$echo "$modename: AGE \`$age' is not a nonnegative integer" 1>&2
$echo "$modename: \`$vinfo' is not valid version information" 1>&2
@@ -1692,12 +2227,31 @@ compiler."
major=
versuffix=
verstring=
- case "$version_type" in
+ case $version_type in
none) ;;
+ darwin)
+ # Like Linux, but with the current version available in
+ # verstring for coding it into the library header
+ major=.`expr $current - $age`
+ versuffix="$major.$age.$revision"
+ # Darwin ld doesn't like 0 for these options...
+ minor_current=`expr $current + 1`
+ verstring="-compatibility_version $minor_current -current_version $minor_current.$revision"
+ ;;
+
+ freebsd-aout)
+ major=".$current"
+ versuffix=".$current.$revision";
+ ;;
+
+ freebsd-elf)
+ major=".$current"
+ versuffix=".$current";
+ ;;
+
irix)
major=`expr $current - $age + 1`
- versuffix="$major.$revision"
verstring="sgi$major.$revision"
# Add in all the interfaces that we are compatible with.
@@ -1707,6 +2261,10 @@ compiler."
loop=`expr $loop - 1`
verstring="sgi$major.$iface:$verstring"
done
+
+ # Before this point, $major must not contain `.'.
+ major=.$major
+ versuffix="$major.$revision"
;;
linux)
@@ -1736,21 +2294,11 @@ compiler."
versuffix=".$current.$revision"
;;
- freebsd-aout)
- major=".$current"
- versuffix=".$current.$revision";
- ;;
-
- freebsd-elf)
- major=".$current"
- versuffix=".$current";
- ;;
-
windows)
- # Like Linux, but with '-' rather than '.', since we only
- # want one extension on Windows 95.
+ # Use '-' rather than '.', since we only want one
+ # extension on DOS 8.3 filesystems.
major=`expr $current - $age`
- versuffix="-$major-$age-$revision"
+ versuffix="-$major"
;;
*)
@@ -1777,7 +2325,7 @@ compiler."
versuffix=
verstring=""
fi
-
+
# Check to see if the archive will have undefined symbols.
if test "$allow_undefined" = yes; then
if test "$allow_undefined_flag" = unsupported; then
@@ -1789,30 +2337,12 @@ compiler."
# Don't allow undefined symbols.
allow_undefined_flag="$no_undefined_flag"
fi
-
- dependency_libs="$deplibs"
- case "$host" in
- *-*-cygwin* | *-*-mingw* | *-*-os2* | *-*-beos*)
- # these systems don't actually have a c library (as such)!
- ;;
- *)
- # Add libc to deplibs on all other systems.
- deplibs="$deplibs -lc"
- ;;
- esac
fi
- # Create the output directory, or remove our outputs if we need to.
- if test -d $output_objdir; then
+ if test "$mode" != relink; then
+ # Remove our outputs.
$show "${rm}r $output_objdir/$outputname $output_objdir/$libname.* $output_objdir/${libname}${release}.*"
$run ${rm}r $output_objdir/$outputname $output_objdir/$libname.* $output_objdir/${libname}${release}.*
- else
- $show "$mkdir $output_objdir"
- $run $mkdir $output_objdir
- status=$?
- if test $status -ne 0 && test ! -d $output_objdir; then
- exit $status
- fi
fi
# Now set the variables for building old libraries.
@@ -1823,7 +2353,70 @@ compiler."
oldobjs="$objs "`$echo "X$libobjs" | $SP2NL | $Xsed -e '/\.'${libext}'$/d' -e "$lo2o" | $NL2SP`
fi
+ # Eliminate all temporary directories.
+ for path in $notinst_path; do
+ lib_search_path=`echo "$lib_search_path " | sed -e 's% $path % %g'`
+ deplibs=`echo "$deplibs " | sed -e 's% -L$path % %g'`
+ dependency_libs=`echo "$dependency_libs " | sed -e 's% -L$path % %g'`
+ done
+
+ if test -n "$xrpath"; then
+ # If the user specified any rpath flags, then add them.
+ temp_xrpath=
+ for libdir in $xrpath; do
+ temp_xrpath="$temp_xrpath -R$libdir"
+ case "$finalize_rpath " in
+ *" $libdir "*) ;;
+ *) finalize_rpath="$finalize_rpath $libdir" ;;
+ esac
+ done
+ if test $hardcode_into_libs != yes || test $build_old_libs = yes; then
+ dependency_libs="$temp_xrpath $dependency_libs"
+ fi
+ fi
+
+ # Make sure dlfiles contains only unique files that won't be dlpreopened
+ old_dlfiles="$dlfiles"
+ dlfiles=
+ for lib in $old_dlfiles; do
+ case " $dlprefiles $dlfiles " in
+ *" $lib "*) ;;
+ *) dlfiles="$dlfiles $lib" ;;
+ esac
+ done
+
+ # Make sure dlprefiles contains only unique files
+ old_dlprefiles="$dlprefiles"
+ dlprefiles=
+ for lib in $old_dlprefiles; do
+ case "$dlprefiles " in
+ *" $lib "*) ;;
+ *) dlprefiles="$dlprefiles $lib" ;;
+ esac
+ done
+
if test "$build_libtool_libs" = yes; then
+ if test -n "$rpath"; then
+ case $host in
+ *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2* | *-*-beos*)
+ # these systems don't actually have a c library (as such)!
+ ;;
+ *-*-rhapsody* | *-*-darwin1.[012])
+ # Rhapsody C library is in the System framework
+ deplibs="$deplibs -framework System"
+ ;;
+ *-*-netbsd*)
+ # Don't link with libc until the a.out ld.so is fixed.
+ ;;
+ *)
+ # Add libc to deplibs on all other systems if necessary.
+ if test $build_libtool_need_lc = "yes"; then
+ deplibs="$deplibs -lc"
+ fi
+ ;;
+ esac
+ fi
+
# Transform deplibs into only deplibs that can be linked in shared.
name_save=$name
libname_save=$libname
@@ -1838,7 +2431,7 @@ compiler."
major=""
newdeplibs=
droppeddeps=no
- case "$deplibs_check_method" in
+ case $deplibs_check_method in
pass_all)
# Don't check for shared/static. Everything works.
# This might be a little naive. We might want to check
@@ -1863,7 +2456,7 @@ EOF
for i in $deplibs; do
name="`expr $i : '-l\(.*\)'`"
# If $name is empty we are operating on a -L argument.
- if test "$name" != "" ; then
+ if test -n "$name" && test "$name" != "0"; then
libname=`eval \\$echo \"$libname_spec\"`
deplib_matches=`eval \\$echo \"$library_names_spec\"`
set dummy $deplib_matches
@@ -1888,7 +2481,7 @@ EOF
for i in $deplibs; do
name="`expr $i : '-l\(.*\)'`"
# If $name is empty we are operating on a -L argument.
- if test "$name" != "" ; then
+ if test -n "$name" && test "$name" != "0"; then
$rm conftest
$CC -o conftest conftest.c $i
# Did it work?
@@ -1924,19 +2517,19 @@ EOF
;;
file_magic*)
set dummy $deplibs_check_method
- file_magic_regex="`expr \"$deplibs_check_method\" : \"$2 \(.*\)\"`"
+ file_magic_regex=`expr "$deplibs_check_method" : "$2 \(.*\)"`
for a_deplib in $deplibs; do
name="`expr $a_deplib : '-l\(.*\)'`"
# If $name is empty we are operating on a -L argument.
- if test "$name" != "" ; then
+ if test -n "$name" && test "$name" != "0"; then
libname=`eval \\$echo \"$libname_spec\"`
- for i in $lib_search_path; do
+ for i in $lib_search_path $sys_lib_search_path $shlib_search_path; do
potential_libs=`ls $i/$libname[.-]* 2>/dev/null`
for potent_lib in $potential_libs; do
# Follow soft links.
if ls -lLd "$potent_lib" 2>/dev/null \
| grep " -> " >/dev/null; then
- continue
+ continue
fi
# The statement above tries to avoid entering an
# endless loop below, in case of cyclic links.
@@ -1946,7 +2539,7 @@ EOF
potlib="$potent_lib"
while test -h "$potlib" 2>/dev/null; do
potliblink=`ls -ld $potlib | sed 's/.* -> //'`
- case "$potliblink" in
+ case $potliblink in
[\\/]* | [A-Za-z]:[\\/]*) potlib="$potliblink";;
*) potlib=`$echo "X$potlib" | $Xsed -e 's,[^/]*$,,'`"$potliblink";;
esac
@@ -1974,6 +2567,40 @@ EOF
fi
done # Gone through all deplibs.
;;
+ match_pattern*)
+ set dummy $deplibs_check_method
+ match_pattern_regex=`expr "$deplibs_check_method" : "$2 \(.*\)"`
+ for a_deplib in $deplibs; do
+ name="`expr $a_deplib : '-l\(.*\)'`"
+ # If $name is empty we are operating on a -L argument.
+ if test -n "$name" && test "$name" != "0"; then
+ libname=`eval \\$echo \"$libname_spec\"`
+ for i in $lib_search_path $sys_lib_search_path $shlib_search_path; do
+ potential_libs=`ls $i/$libname[.-]* 2>/dev/null`
+ for potent_lib in $potential_libs; do
+ if eval echo \"$potent_lib\" 2>/dev/null \
+ | sed 10q \
+ | egrep "$match_pattern_regex" > /dev/null; then
+ newdeplibs="$newdeplibs $a_deplib"
+ a_deplib=""
+ break 2
+ fi
+ done
+ done
+ if test -n "$a_deplib" ; then
+ droppeddeps=yes
+ echo
+ echo "*** Warning: This library needs some functionality provided by $a_deplib."
+ echo "*** I have the capability to make that library automatically link in when"
+ echo "*** you link to this library. But I can only do this if you have a"
+ echo "*** shared version of the library, which you do not appear to have."
+ fi
+ else
+ # Add a -L argument.
+ newdeplibs="$newdeplibs $a_deplib"
+ fi
+ done # Gone through all deplibs.
+ ;;
none | unknown | *)
newdeplibs=""
if $echo "X $deplibs" | $Xsed -e 's/ -lc$//' \
@@ -1996,6 +2623,13 @@ EOF
libname=$libname_save
name=$name_save
+ case $host in
+ *-*-rhapsody* | *-*-darwin1.[012])
+ # On Rhapsody replace the C library is the System framework
+ newdeplibs=`$echo "X $newdeplibs" | $Xsed -e 's/ -lc / -framework System /'`
+ ;;
+ esac
+
if test "$droppeddeps" = yes; then
if test "$module" = yes; then
echo
@@ -2021,6 +2655,21 @@ EOF
echo "*** The inter-library dependencies that have been dropped here will be"
echo "*** automatically added whenever a program is linked with this library"
echo "*** or is declared to -dlopen it."
+
+ if test $allow_undefined = no; then
+ echo
+ echo "*** Since this library must not contain undefined symbols,"
+ echo "*** because either the platform does not support them or"
+ echo "*** it was explicitly requested with -no-undefined,"
+ echo "*** libtool will only create a static version of it."
+ if test "$build_old_libs" = no; then
+ oldlibs="$output_objdir/$libname.$libext"
+ build_libtool_libs=module
+ build_old_libs=yes
+ else
+ build_libtool_libs=no
+ fi
+ fi
fi
fi
# Done checking deplibs!
@@ -2031,9 +2680,64 @@ EOF
library_names=
old_library=
dlname=
-
+
# Test again, we may have decided not to build it any more
if test "$build_libtool_libs" = yes; then
+ if test $hardcode_into_libs = yes; then
+ # Hardcode the library paths
+ hardcode_libdirs=
+ dep_rpath=
+ rpath="$finalize_rpath"
+ test "$mode" != relink && rpath="$compile_rpath$rpath"
+ for libdir in $rpath; do
+ if test -n "$hardcode_libdir_flag_spec"; then
+ if test -n "$hardcode_libdir_separator"; then
+ if test -z "$hardcode_libdirs"; then
+ hardcode_libdirs="$libdir"
+ else
+ # Just accumulate the unique libdirs.
+ case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in
+ *"$hardcode_libdir_separator$libdir$hardcode_libdir_separator"*)
+ ;;
+ *)
+ hardcode_libdirs="$hardcode_libdirs$hardcode_libdir_separator$libdir"
+ ;;
+ esac
+ fi
+ else
+ eval flag=\"$hardcode_libdir_flag_spec\"
+ dep_rpath="$dep_rpath $flag"
+ fi
+ elif test -n "$runpath_var"; then
+ case "$perm_rpath " in
+ *" $libdir "*) ;;
+ *) perm_rpath="$perm_rpath $libdir" ;;
+ esac
+ fi
+ done
+ # Substitute the hardcoded libdirs into the rpath.
+ if test -n "$hardcode_libdir_separator" &&
+ test -n "$hardcode_libdirs"; then
+ libdir="$hardcode_libdirs"
+ eval dep_rpath=\"$hardcode_libdir_flag_spec\"
+ fi
+ if test -n "$runpath_var" && test -n "$perm_rpath"; then
+ # We should set the runpath_var.
+ rpath=
+ for dir in $perm_rpath; do
+ rpath="$rpath$dir:"
+ done
+ eval "$runpath_var='$rpath\$$runpath_var'; export $runpath_var"
+ fi
+ test -n "$dep_rpath" && deplibs="$dep_rpath $deplibs"
+ fi
+
+ shlibpath="$finalize_shlibpath"
+ test "$mode" != relink && shlibpath="$compile_shlibpath$shlibpath"
+ if test -n "$shlibpath"; then
+ eval "$shlibpath_var='$shlibpath\$$shlibpath_var'; export $shlibpath_var"
+ fi
+
# Get the real and link names of the library.
eval library_names=\"$library_names_spec\"
set dummy $library_names
@@ -2045,6 +2749,7 @@ EOF
else
soname="$realname"
fi
+ test -z "$dlname" && dlname=$soname
lib="$output_objdir/$realname"
for link
@@ -2116,7 +2821,7 @@ EOF
for xlib in $convenience; do
# Extract the objects.
- case "$xlib" in
+ case $xlib in
[\\/]* | [A-Za-z]:[\\/]*) xabs="$xlib" ;;
*) xabs=`pwd`"/$xlib" ;;
esac
@@ -2141,7 +2846,12 @@ EOF
if test "$thread_safe" = yes && test -n "$thread_safe_flag_spec"; then
eval flag=\"$thread_safe_flag_spec\"
- linkopts="$linkopts $flag"
+ linker_flags="$linker_flags $flag"
+ fi
+
+ # Make a backup of the uninstalled library when relinking
+ if test "$mode" = relink; then
+ $run eval '(cd $output_objdir && $rm ${realname}U && $mv $realname ${realname}U)' || exit $?
fi
# Do each of the archive commands.
@@ -2158,6 +2868,12 @@ EOF
done
IFS="$save_ifs"
+ # Restore the uninstalled library and exit
+ if test "$mode" = relink; then
+ $run eval '(cd $output_objdir && $rm ${realname}T && $mv $realname ${realname}T && $mv "$realname"U $realname)' || exit $?
+ exit 0
+ fi
+
# Create links to the real library.
for linkname in $linknames; do
if test "$realname" != "$linkname"; then
@@ -2174,12 +2890,7 @@ EOF
fi
;;
- *.lo | *.o | *.obj)
- if test -n "$link_against_libtool_libs"; then
- $echo "$modename: error: cannot link libtool libraries into objects" 1>&2
- exit 1
- fi
-
+ obj)
if test -n "$deplibs"; then
$echo "$modename: warning: \`-l' and \`-L' are ignored for objects" 1>&2
fi
@@ -2204,9 +2915,9 @@ EOF
$echo "$modename: warning: \`-release' is ignored for objects" 1>&2
fi
- case "$output" in
+ case $output in
*.lo)
- if test -n "$objs"; then
+ if test -n "$objs$old_deplibs"; then
$echo "$modename: cannot build library object \`$output' from non-libtool objects" 1>&2
exit 1
fi
@@ -2230,7 +2941,7 @@ EOF
gentop=
# reload_cmds runs $LD directly, so let us get rid of
# -Wl from whole_archive_flag_spec
- wl=
+ wl=
if test -n "$convenience"; then
if test -n "$whole_archive_flag_spec"; then
@@ -2249,7 +2960,7 @@ EOF
for xlib in $convenience; do
# Extract the objects.
- case "$xlib" in
+ case $xlib in
[\\/]* | [A-Za-z]:[\\/]*) xabs="$xlib" ;;
*) xabs=`pwd`"/$xlib" ;;
esac
@@ -2273,7 +2984,7 @@ EOF
fi
# Create the old-style object.
- reload_objs="$objs "`$echo "X$libobjs" | $SP2NL | $Xsed -e '/\.'${libext}$'/d' -e '/\.lib$/d' -e "$lo2o" | $NL2SP`" $reload_conv_objs"
+ reload_objs="$objs$old_deplibs "`$echo "X$libobjs" | $SP2NL | $Xsed -e '/\.'${libext}$'/d' -e '/\.lib$/d' -e "$lo2o" | $NL2SP`" $reload_conv_objs" ### testsuite: skip nested quoting test
output="$obj"
eval cmds=\"$reload_cmds\"
@@ -2308,7 +3019,7 @@ EOF
exit 0
fi
- if test -n "$pic_flag"; then
+ if test -n "$pic_flag" || test "$pic_mode" != default; then
# Only do commands if we really have different PIC objects.
reload_objs="$libobjs $reload_conv_objs"
output="$libobj"
@@ -2344,8 +3055,10 @@ EOF
exit 0
;;
- # Anything else should be a program.
- *)
+ prog)
+ case $host in
+ *cygwin*) output=`echo $output | sed -e 's,.exe$,,;s,$,.exe,'` ;;
+ esac
if test -n "$vinfo"; then
$echo "$modename: warning: \`-version-info' is ignored for programs" 1>&2
fi
@@ -2355,20 +3068,27 @@ EOF
fi
if test "$preload" = yes; then
- if test "$dlopen" = unknown && test "$dlopen_self" = unknown &&
+ if test "$dlopen_support" = unknown && test "$dlopen_self" = unknown &&
test "$dlopen_self_static" = unknown; then
$echo "$modename: warning: \`AC_LIBTOOL_DLOPEN' not used. Assuming no dlopen support."
- fi
+ fi
fi
-
+
+ case $host in
+ *-*-rhapsody* | *-*-darwin1.[012])
+ # On Rhapsody replace the C library is the System framework
+ compile_deplibs=`$echo "X $compile_deplibs" | $Xsed -e 's/ -lc / -framework System /'`
+ finalize_deplibs=`$echo "X $finalize_deplibs" | $Xsed -e 's/ -lc / -framework System /'`
+ ;;
+ esac
+
+ compile_command="$compile_command $compile_deplibs"
+ finalize_command="$finalize_command $finalize_deplibs"
+
if test -n "$rpath$xrpath"; then
# If the user specified any rpath flags, then add them.
for libdir in $rpath $xrpath; do
# This is the magic to use -rpath.
- case "$compile_rpath " in
- *" $libdir "*) ;;
- *) compile_rpath="$compile_rpath $libdir" ;;
- esac
case "$finalize_rpath " in
*" $libdir "*) ;;
*) finalize_rpath="$finalize_rpath $libdir" ;;
@@ -2386,7 +3106,7 @@ EOF
hardcode_libdirs="$libdir"
else
# Just accumulate the unique libdirs.
- case "$hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator" in
+ case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in
*"$hardcode_libdir_separator$libdir$hardcode_libdir_separator"*)
;;
*)
@@ -2404,6 +3124,14 @@ EOF
*) perm_rpath="$perm_rpath $libdir" ;;
esac
fi
+ case $host in
+ *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2*)
+ case :$dllsearchpath: in
+ *":$libdir:"*) ;;
+ *) dllsearchpath="$dllsearchpath:$libdir";;
+ esac
+ ;;
+ esac
done
# Substitute the hardcoded libdirs into the rpath.
if test -n "$hardcode_libdir_separator" &&
@@ -2422,7 +3150,7 @@ EOF
hardcode_libdirs="$libdir"
else
# Just accumulate the unique libdirs.
- case "$hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator" in
+ case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in
*"$hardcode_libdir_separator$libdir$hardcode_libdir_separator"*)
;;
*)
@@ -2449,23 +3177,6 @@ EOF
fi
finalize_rpath="$rpath"
- output_objdir=`$echo "X$output" | $Xsed -e 's%/[^/]*$%%'`
- if test "X$output_objdir" = "X$output"; then
- output_objdir="$objdir"
- else
- output_objdir="$output_objdir/$objdir"
- fi
-
- # Create the binary in the object directory, then wrap it.
- if test ! -d $output_objdir; then
- $show "$mkdir $output_objdir"
- $run $mkdir $output_objdir
- status=$?
- if test $status -ne 0 && test ! -d $output_objdir; then
- exit $status
- fi
- fi
-
if test -n "$libobjs" && test "$build_old_libs" = yes; then
# Transform all the library objects into standard objects.
compile_command=`$echo "X$compile_command" | $SP2NL | $Xsed -e "$lo2o" | $NL2SP`
@@ -2482,7 +3193,7 @@ EOF
fi
if test -n "$dlsyms"; then
- case "$dlsyms" in
+ case $dlsyms in
"") ;;
*.c)
# Discover the nlist of each of the dlfiles.
@@ -2514,7 +3225,7 @@ extern \"C\" {
test -z "$run" && $echo ': @PROGRAM@ ' > "$nlist"
# Add our own program objects to the symbol list.
- progfiles=`$echo "X$objs" | $SP2NL | $Xsed -e "$lo2o" | $NL2SP`
+ progfiles=`$echo "X$objs$old_deplibs" | $SP2NL | $Xsed -e "$lo2o" | $NL2SP`
for arg in $progfiles; do
$show "extracting global C symbols from \`$arg'"
$run eval "$NM $arg | $global_symbol_pipe >> '$nlist'"
@@ -2524,7 +3235,7 @@ extern \"C\" {
$run eval 'egrep -v " ($exclude_expsyms)$" "$nlist" > "$nlist"T'
$run eval '$mv "$nlist"T "$nlist"'
fi
-
+
if test -n "$export_symbols_regex"; then
$run eval 'egrep -e "$export_symbols_regex" "$nlist" > "$nlist"T'
$run eval '$mv "$nlist"T "$nlist"'
@@ -2613,13 +3324,13 @@ static const void *lt_preloaded_setup() {
fi
pic_flag_for_symtable=
- case "$host" in
+ case $host in
# compiling the symbol table file with pic_flag works around
# a FreeBSD bug that causes programs to crash when -lm is
# linked before any other PIC object. But we must not use
# pic_flag when linking with -static. The problem exists in
# FreeBSD 2.2.6 and is fixed in FreeBSD 3.1.
- *-*-freebsd2*|*-*-freebsd3.0*|*-*-freebsdelf3.0*)
+ *-*-freebsd2*|*-*-freebsd3.0*|*-*-freebsdelf3.0*)
case "$compile_command " in
*" -static "*) ;;
*) pic_flag_for_symtable=" $pic_flag -DPIC -DFREEBSD_WORKAROUND";;
@@ -2658,7 +3369,7 @@ static const void *lt_preloaded_setup() {
finalize_command=`$echo "X$finalize_command" | $Xsed -e "s% @SYMFILE@%%"`
fi
- if test -z "$link_against_libtool_libs" || test "$build_libtool_libs" != yes; then
+ if test $need_relink = no || test "$build_libtool_libs" != yes; then
# Replace the output file specification.
compile_command=`$echo "X$compile_command" | $Xsed -e 's%@OUTPUT@%'"$output"'%g'`
link_command="$compile_command$compile_rpath"
@@ -2667,7 +3378,7 @@ static const void *lt_preloaded_setup() {
$show "$link_command"
$run eval "$link_command"
status=$?
-
+
# Delete the generated files.
if test -n "$dlsyms"; then
$show "$rm $output_objdir/${outputname}S.${objext}"
@@ -2681,7 +3392,7 @@ static const void *lt_preloaded_setup() {
# We should set the shlibpath_var
rpath=
for dir in $temp_rpath; do
- case "$dir" in
+ case $dir in
[\\/]* | [A-Za-z]:[\\/]*)
# Absolute path.
rpath="$rpath$dir:"
@@ -2723,11 +3434,24 @@ static const void *lt_preloaded_setup() {
fi
fi
+ if test "$no_install" = yes; then
+ # We don't need to create a wrapper script.
+ link_command="$compile_var$compile_command$compile_rpath"
+ # Replace the output file specification.
+ link_command=`$echo "X$link_command" | $Xsed -e 's%@OUTPUT@%'"$output"'%g'`
+ # Delete the old output file.
+ $run $rm $output
+ # Link the executable and exit
+ $show "$link_command"
+ $run eval "$link_command" || exit $?
+ exit 0
+ fi
+
if test "$hardcode_action" = relink; then
# Fast installation is not supported
link_command="$compile_var$compile_command$compile_rpath"
relink_command="$finalize_var$finalize_command$finalize_rpath"
-
+
$echo "$modename: warning: this platform does not like uninstalled shared libraries" 1>&2
$echo "$modename: \`$output' will be relinked during installation" 1>&2
else
@@ -2747,7 +3471,7 @@ static const void *lt_preloaded_setup() {
# Replace the output file specification.
link_command=`$echo "X$link_command" | $Xsed -e 's%@OUTPUT@%'"$output_objdir/$outputname"'%g'`
-
+
# Delete the old output files.
$run $rm $output $output_objdir/$outputname $output_objdir/lt-$outputname
@@ -2759,12 +3483,24 @@ static const void *lt_preloaded_setup() {
# Quote the relink command for shipping.
if test -n "$relink_command"; then
+ # Preserve any variables that may affect compiler behavior
+ for var in $variables_saved_for_relink; do
+ if eval test -z \"\${$var+set}\"; then
+ relink_command="{ test -z \"\${$var+set}\" || unset $var || { $var=; export $var; }; }; $relink_command"
+ elif eval var_value=\$$var; test -z "$var_value"; then
+ relink_command="$var=; export $var; $relink_command"
+ else
+ var_value=`$echo "X$var_value" | $Xsed -e "$sed_quote_subst"`
+ relink_command="$var=\"$var_value\"; export $var; $relink_command"
+ fi
+ done
+ relink_command="cd `pwd`; $relink_command"
relink_command=`$echo "X$relink_command" | $Xsed -e "$sed_quote_subst"`
fi
# Quote $echo for shipping.
if test "X$echo" = "X$SHELL $0 --fallback-echo"; then
- case "$0" in
+ case $0 in
[\\/]* | [A-Za-z]:[\\/]*) qecho="$SHELL $0 --fallback-echo";;
*) qecho="$SHELL `pwd`/$0 --fallback-echo";;
esac
@@ -2780,6 +3516,11 @@ static const void *lt_preloaded_setup() {
case $output in
*.exe) output=`echo $output|sed 's,.exe$,,'` ;;
esac
+ # test for cygwin because mv fails w/o .exe extensions
+ case $host in
+ *cygwin*) exeext=.exe ;;
+ *) exeext= ;;
+ esac
$rm $output
trap "$rm $output; exit 1" 1 2 15
@@ -2809,7 +3550,7 @@ relink_command=\"$relink_command\"
# This environment variable determines our operation mode.
if test \"\$libtool_install_magic\" = \"$magic\"; then
# install mode needs the following variable:
- link_against_libtool_libs='$link_against_libtool_libs'
+ notinst_deplibs='$notinst_deplibs'
else
# When we are sourced in execute mode, \$file and \$echo are already set.
if test \"\$libtool_execute_magic\" != \"$magic\"; then
@@ -2842,7 +3583,7 @@ else
# If there was a directory component, then change thisdir.
if test \"x\$destdir\" != \"x\$file\"; then
case \"\$destdir\" in
- [\\/]* | [A-Za-z]:[\\/]*) thisdir=\"\$destdir\" ;;
+ [\\\\/]* | [A-Za-z]:[\\\\/]*) thisdir=\"\$destdir\" ;;
*) thisdir=\"\$thisdir/\$destdir\" ;;
esac
fi
@@ -2858,9 +3599,9 @@ else
if test "$fast_install" = yes; then
echo >> $output "\
- program=lt-'$outputname'
+ program=lt-'$outputname'$exeext
progdir=\"\$thisdir/$objdir\"
-
+
if test ! -f \"\$progdir/\$program\" || \\
{ file=\`ls -1dt \"\$progdir/\$program\" \"\$progdir/../\$program\" 2>/dev/null | sed 1q\`; \\
test \"X\$file\" != \"X\$progdir/\$program\"; }; then
@@ -2877,7 +3618,7 @@ else
# relink executable if necessary
if test -n \"\$relink_command\"; then
- if (cd \"\$thisdir\" && eval \$relink_command); then :
+ if (eval \$relink_command); then :
else
$rm \"\$progdir/\$file\"
exit 1
@@ -2927,13 +3668,21 @@ else
# Run the actual program with our arguments.
"
case $host in
- *-*-cygwin* | *-*-mingw | *-*-os2*)
- # win32 systems need to use the prog path for dll
- # lookup to work
+ # win32 systems need to use the prog path for dll
+ # lookup to work
+ *-*-cygwin* | *-*-pw32*)
+ $echo >> $output "\
+ exec \$progdir/\$program \${1+\"\$@\"}
+"
+ ;;
+
+ # Backslashes separate directories on plain windows
+ *-*-mingw | *-*-os2*)
$echo >> $output "\
exec \$progdir\\\\\$program \${1+\"\$@\"}
"
;;
+
*)
$echo >> $output "\
# Export the path to the program.
@@ -2975,7 +3724,7 @@ fi\
oldobjs="$libobjs_save"
build_libtool_libs=no
else
- oldobjs="$objs "`$echo "X$libobjs_save" | $SP2NL | $Xsed -e '/\.'${libext}'$/d' -e '/\.lib$/d' -e "$lo2o" | $NL2SP`
+ oldobjs="$objs$old_deplibs "`$echo "X$libobjs_save" | $SP2NL | $Xsed -e '/\.'${libext}'$/d' -e '/\.lib$/d' -e "$lo2o" | $NL2SP`
fi
addlibs="$old_convenience"
fi
@@ -2991,11 +3740,11 @@ fi\
exit $status
fi
generated="$generated $gentop"
-
+
# Add in members from convenience archives.
for xlib in $addlibs; do
# Extract the objects.
- case "$xlib" in
+ case $xlib in
[\\/]* | [A-Za-z]:[\\/]*) xabs="$xlib" ;;
*) xabs=`pwd`"/$xlib" ;;
esac
@@ -3056,19 +3805,26 @@ fi\
fi
# Now create the libtool archive.
- case "$output" in
+ case $output in
*.la)
old_library=
test "$build_old_libs" = yes && old_library="$libname.$libext"
$show "creating $output"
- if test -n "$xrpath"; then
- temp_xrpath=
- for libdir in $xrpath; do
- temp_xrpath="$temp_xrpath -R$libdir"
- done
- dependency_libs="$temp_xrpath $dependency_libs"
- fi
+ # Preserve any variables that may affect compiler behavior
+ for var in $variables_saved_for_relink; do
+ if eval test -z \"\${$var+set}\"; then
+ relink_command="{ test -z \"\${$var+set}\" || unset $var || { $var=; export $var; }; }; $relink_command"
+ elif eval var_value=\$$var; test -z "$var_value"; then
+ relink_command="$var=; export $var; $relink_command"
+ else
+ var_value=`$echo "X$var_value" | $Xsed -e "$sed_quote_subst"`
+ relink_command="$var=\"$var_value\"; export $var; $relink_command"
+ fi
+ done
+ # Quote the link command for shipping.
+ relink_command="cd `pwd`; $SHELL $0 --mode=relink $libtool_args"
+ relink_command=`$echo "X$relink_command" | $Xsed -e "$sed_quote_subst"`
# Only create the output if not a dry run.
if test -z "$run"; then
@@ -3078,8 +3834,52 @@ fi\
break
fi
output="$output_objdir/$outputname"i
+ # Replace all uninstalled libtool libraries with the installed ones
+ newdependency_libs=
+ for deplib in $dependency_libs; do
+ case $deplib in
+ *.la)
+ name=`$echo "X$deplib" | $Xsed -e 's%^.*/%%'`
+ eval libdir=`sed -n -e 's/^libdir=\(.*\)$/\1/p' $deplib`
+ if test -z "$libdir"; then
+ $echo "$modename: \`$deplib' is not a valid libtool archive" 1>&2
+ exit 1
+ fi
+ newdependency_libs="$newdependency_libs $libdir/$name"
+ ;;
+ *) newdependency_libs="$newdependency_libs $deplib" ;;
+ esac
+ done
+ dependency_libs="$newdependency_libs"
+ newdlfiles=
+ for lib in $dlfiles; do
+ name=`$echo "X$lib" | $Xsed -e 's%^.*/%%'`
+ eval libdir=`sed -n -e 's/^libdir=\(.*\)$/\1/p' $lib`
+ if test -z "$libdir"; then
+ $echo "$modename: \`$lib' is not a valid libtool archive" 1>&2
+ exit 1
+ fi
+ newdlfiles="$newdlfiles $libdir/$name"
+ done
+ dlfiles="$newdlfiles"
+ newdlprefiles=
+ for lib in $dlprefiles; do
+ name=`$echo "X$lib" | $Xsed -e 's%^.*/%%'`
+ eval libdir=`sed -n -e 's/^libdir=\(.*\)$/\1/p' $lib`
+ if test -z "$libdir"; then
+ $echo "$modename: \`$lib' is not a valid libtool archive" 1>&2
+ exit 1
+ fi
+ newdlprefiles="$newdlprefiles $libdir/$name"
+ done
+ dlprefiles="$newdlprefiles"
fi
$rm $output
+ # place dlname in correct position for cygwin
+ tdlname=$dlname
+ case $host,$output,$installed,$module,$dlname in
+ *cygwin*,*lai,yes,no,*.dll) tdlname=../bin/$dlname ;;
+ esac
$echo > $output "\
# $outputname - a libtool library file
# Generated by $PROGRAM - GNU $PACKAGE $VERSION$TIMESTAMP
@@ -3088,7 +3888,7 @@ fi\
# It is necessary for linking the library.
# The name that we can dlopen(3).
-dlname='$dlname'
+dlname='$tdlname'
# Names of this library.
library_names='$library_names'
@@ -3107,16 +3907,23 @@ revision=$revision
# Is this an already installed library?
installed=$installed
+# Files to dlopen/dlpreopen
+dlopen='$dlfiles'
+dlpreopen='$dlprefiles'
+
# Directory that this library needs to be installed in:
-libdir='$install_libdir'\
-"
+libdir='$install_libdir'"
+ if test "$installed" = no && test $need_relink = yes; then
+ $echo >> $output "\
+relink_command=\"$relink_command\""
+ fi
done
fi
# Do a symbolic link so that the libtool archive can be found in
# LD_LIBRARY_PATH before the program is installed.
$show "(cd $output_objdir && $rm $outputname && $LN_S ../$outputname $outputname)"
- $run eval "(cd $output_objdir && $rm $outputname && $LN_S ../$outputname $outputname)" || exit $?
+ $run eval '(cd $output_objdir && $rm $outputname && $LN_S ../$outputname $outputname)' || exit $?
;;
esac
exit 0
@@ -3128,10 +3935,12 @@ libdir='$install_libdir'\
# There may be an optional sh(1) argument at the beginning of
# install_prog (especially on Windows NT).
- if test "$nonopt" = "$SHELL" || test "$nonopt" = /bin/sh; then
+ if test "$nonopt" = "$SHELL" || test "$nonopt" = /bin/sh ||
+ # Allow the use of GNU shtool's install command.
+ $echo "X$nonopt" | $Xsed | grep shtool > /dev/null; then
# Aesthetically quote it.
arg=`$echo "X$nonopt" | $Xsed -e "$sed_quote_subst"`
- case "$arg" in
+ case $arg in
*[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*)
arg="\"$arg\""
;;
@@ -3147,7 +3956,7 @@ libdir='$install_libdir'\
# The real first argument should be the name of the installation program.
# Aesthetically quote it.
arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
- case "$arg" in
+ case $arg in
*[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*)
arg="\"$arg\""
;;
@@ -3170,7 +3979,7 @@ libdir='$install_libdir'\
continue
fi
- case "$arg" in
+ case $arg in
-d) isdir=yes ;;
-f) prev="-f" ;;
-g) prev="-g" ;;
@@ -3195,7 +4004,7 @@ libdir='$install_libdir'\
# Aesthetically quote the argument.
arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
- case "$arg" in
+ case $arg in
*[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \ ]*|*]*)
arg="\"$arg\""
;;
@@ -3246,11 +4055,11 @@ libdir='$install_libdir'\
exit 1
fi
fi
- case "$destdir" in
+ case $destdir in
[\\/]* | [A-Za-z]:[\\/]*) ;;
*)
for file in $files; do
- case "$file" in
+ case $file in
*.lo) ;;
*)
$echo "$modename: \`$destdir' must be an absolute directory name" 1>&2
@@ -3272,8 +4081,8 @@ libdir='$install_libdir'\
for file in $files; do
# Do each installation.
- case "$file" in
- *.a | *.lib)
+ case $file in
+ *.$libext)
# Do the static libraries later.
staticlibs="$staticlibs $file"
;;
@@ -3289,8 +4098,9 @@ libdir='$install_libdir'\
library_names=
old_library=
+ relink_command=
# If there is no directory component, then add one.
- case "$file" in
+ case $file in
*/* | *\\*) . $file ;;
*) . ./$file ;;
esac
@@ -3309,10 +4119,20 @@ libdir='$install_libdir'\
esac
fi
- dir="`$echo "X$file" | $Xsed -e 's%/[^/]*$%%'`/"
+ dir=`$echo "X$file" | $Xsed -e 's%/[^/]*$%%'`/
test "X$dir" = "X$file/" && dir=
dir="$dir$objdir"
+ if test -n "$relink_command"; then
+ $echo "$modename: warning: relinking \`$file'" 1>&2
+ $show "$relink_command"
+ if $run eval "$relink_command"; then :
+ else
+ $echo "$modename: error: relink \`$file' with the above command before installing it" 1>&2
+ continue
+ fi
+ fi
+
# See the names of the shared library.
set dummy $library_names
if test -n "$2"; then
@@ -3320,9 +4140,16 @@ libdir='$install_libdir'\
shift
shift
+ srcname="$realname"
+ test -n "$relink_command" && srcname="$realname"T
+
# Install the shared library and build the symlinks.
- $show "$install_prog $dir/$realname $destdir/$realname"
- $run eval "$install_prog $dir/$realname $destdir/$realname" || exit $?
+ $show "$install_prog $dir/$srcname $destdir/$realname"
+ $run eval "$install_prog $dir/$srcname $destdir/$realname" || exit $?
+ if test -n "$stripme" && test -n "$striplib"; then
+ $show "$striplib $destdir/$realname"
+ $run eval "$striplib $destdir/$realname" || exit $?
+ fi
if test $# -gt 0; then
# Delete the old symlinks, and create new ones.
@@ -3369,11 +4196,11 @@ libdir='$install_libdir'\
fi
# Deduce the name of the destination old-style object file.
- case "$destfile" in
+ case $destfile in
*.lo)
staticdest=`$echo "X$destfile" | $Xsed -e "$lo2o"`
;;
- *.o | *.obj)
+ *.$objext)
staticdest="$destfile"
destfile=
;;
@@ -3412,39 +4239,46 @@ libdir='$install_libdir'\
# Do a test to see if this is really a libtool program.
if (sed -e '4q' $file | egrep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
- link_against_libtool_libs=
+ notinst_deplibs=
relink_command=
# If there is no directory component, then add one.
- case "$file" in
+ case $file in
*/* | *\\*) . $file ;;
*) . ./$file ;;
esac
# Check the variables that should have been set.
- if test -z "$link_against_libtool_libs"; then
+ if test -z "$notinst_deplibs"; then
$echo "$modename: invalid libtool wrapper script \`$file'" 1>&2
exit 1
fi
finalize=yes
- for lib in $link_against_libtool_libs; do
+ for lib in $notinst_deplibs; do
# Check to see that each library is installed.
libdir=
if test -f "$lib"; then
# If there is no directory component, then add one.
- case "$lib" in
+ case $lib in
*/* | *\\*) . $lib ;;
*) . ./$lib ;;
esac
fi
- libfile="$libdir/`$echo "X$lib" | $Xsed -e 's%^.*/%%g'`"
+ libfile="$libdir/"`$echo "X$lib" | $Xsed -e 's%^.*/%%g'` ### testsuite: skip nested quoting test
if test -n "$libdir" && test ! -f "$libfile"; then
$echo "$modename: warning: \`$lib' has not been installed in \`$libdir'" 1>&2
finalize=no
fi
done
+ relink_command=
+ # If there is no directory component, then add one.
+ case $file in
+ */* | *\\*) . $file ;;
+ *) . ./$file ;;
+ esac
+
outputname=
if test "$fast_install" = no && test -n "$relink_command"; then
if test "$finalize" = yes && test -z "$run"; then
@@ -3456,6 +4290,7 @@ libdir='$install_libdir'\
$echo "$modename: error: cannot create temporary directory \`$tmpdir'" 1>&2
continue
fi
+ file=`$echo "X$file" | $Xsed -e 's%^.*/%%'`
outputname="$tmpdir/$file"
# Replace the output file specification.
relink_command=`$echo "X$relink_command" | $Xsed -e 's%@OUTPUT@%'"$outputname"'%g'`
@@ -3477,6 +4312,23 @@ libdir='$install_libdir'\
fi
fi
+ # remove .exe since cygwin /usr/bin/install will append another
+ # one anyways
+ case $install_prog,$host in
+ /usr/bin/install*,*cygwin*)
+ case $file:$destfile in
+ *.exe:*.exe)
+ # this is ok
+ ;;
+ *.exe:*)
+ destfile=$destfile.exe
+ ;;
+ *:*.exe)
+ destfile=`echo $destfile | sed -e 's,.exe$,,'`
+ ;;
+ esac
+ ;;
+ esac
$show "$install_prog$stripme $file $destfile"
$run eval "$install_prog\$stripme \$file \$destfile" || exit $?
test -n "$outputname" && ${rm}r "$tmpdir"
@@ -3493,6 +4345,11 @@ libdir='$install_libdir'\
$show "$install_prog $file $oldlib"
$run eval "$install_prog \$file \$oldlib" || exit $?
+ if test -n "$stripme" && test -n "$striplib"; then
+ $show "$old_striplib $oldlib"
+ $run eval "$old_striplib $oldlib" || exit $?
+ fi
+
# Do each command in the postinstall commands.
eval cmds=\"$old_postinstall_cmds\"
IFS="${IFS= }"; save_ifs="$IFS"; IFS='~'
@@ -3553,7 +4410,7 @@ libdir='$install_libdir'\
fi
# Exit here if they wanted silent mode.
- test "$show" = : && exit 0
+ test "$show" = ":" && exit 0
echo "----------------------------------------------------------------------"
echo "Libraries have been installed in:"
@@ -3563,7 +4420,7 @@ libdir='$install_libdir'\
echo
echo "If you ever happen to want to link against installed libraries"
echo "in a given directory, LIBDIR, you must either use libtool, and"
- echo "specify the full pathname of the library, or use \`-LLIBDIR'"
+ echo "specify the full pathname of the library, or use the \`-LLIBDIR'"
echo "flag during linking and do at least one of the following:"
if test -n "$shlibpath_var"; then
echo " - add LIBDIR to the \`$shlibpath_var' environment variable"
@@ -3613,7 +4470,7 @@ libdir='$install_libdir'\
fi
dir=
- case "$file" in
+ case $file in
*.la)
# Check to see that this really is a libtool archive.
if (sed -e '2q' $file | egrep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then :
@@ -3628,7 +4485,7 @@ libdir='$install_libdir'\
library_names=
# If there is no directory component, then add one.
- case "$file" in
+ case $file in
*/* | *\\*) . $file ;;
*) . ./$file ;;
esac
@@ -3683,13 +4540,13 @@ libdir='$install_libdir'\
args=
for file
do
- case "$file" in
+ case $file in
-*) ;;
*)
# Do a test to see if this is really a libtool program.
if (sed -e '4q' $file | egrep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
# If there is no directory component, then add one.
- case "$file" in
+ case $file in
*/* | *\\*) . $file ;;
*) . ./$file ;;
esac
@@ -3706,8 +4563,8 @@ libdir='$install_libdir'\
if test -z "$run"; then
if test -n "$shlibpath_var"; then
- # Export the shlibpath_var.
- eval "export $shlibpath_var"
+ # Export the shlibpath_var.
+ eval "export $shlibpath_var"
fi
# Restore saved enviroment variables
@@ -3726,23 +4583,30 @@ libdir='$install_libdir'\
else
# Display what would be done.
if test -n "$shlibpath_var"; then
- eval "\$echo \"\$shlibpath_var=\$$shlibpath_var\""
- $echo "export $shlibpath_var"
+ eval "\$echo \"\$shlibpath_var=\$$shlibpath_var\""
+ $echo "export $shlibpath_var"
fi
$echo "$cmd$args"
exit 0
fi
;;
- # libtool uninstall mode
- uninstall)
- modename="$modename: uninstall"
+ # libtool clean and uninstall mode
+ clean | uninstall)
+ modename="$modename: $mode"
rm="$nonopt"
files=
+ rmforce=
+ exit_status=0
+
+ # This variable tells wrapper scripts just to set variables rather
+ # than running their programs.
+ libtool_install_magic="$magic"
for arg
do
- case "$arg" in
+ case $arg in
+ -f) rm="$rm $arg"; rmforce=yes ;;
-*) rm="$rm $arg" ;;
*) files="$files $arg" ;;
esac
@@ -3754,14 +4618,42 @@ libdir='$install_libdir'\
exit 1
fi
+ rmdirs=
+
for file in $files; do
dir=`$echo "X$file" | $Xsed -e 's%/[^/]*$%%'`
- test "X$dir" = "X$file" && dir=.
+ if test "X$dir" = "X$file"; then
+ dir=.
+ objdir="$objdir"
+ else
+ objdir="$dir/$objdir"
+ fi
name=`$echo "X$file" | $Xsed -e 's%^.*/%%'`
+ test $mode = uninstall && objdir="$dir"
+
+ # Remember objdir for removal later, being careful to avoid duplicates
+ if test $mode = clean; then
+ case " $rmdirs " in
+ *" $objdir "*) ;;
+ *) rmdirs="$rmdirs $objdir" ;;
+ esac
+ fi
+
+ # Don't error if the file doesn't exist and rm -f was used.
+ if (test -L "$file") >/dev/null 2>&1 \
+ || (test -h "$file") >/dev/null 2>&1 \
+ || test -f "$file"; then
+ :
+ elif test -d "$file"; then
+ exit_status=1
+ continue
+ elif test "$rmforce" = yes; then
+ continue
+ fi
rmfiles="$file"
- case "$name" in
+ case $name in
*.la)
# Possibly a libtool archive, so verify it.
if (sed -e '2q' $file | egrep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
@@ -3769,38 +4661,43 @@ libdir='$install_libdir'\
# Delete the libtool libraries and symlinks.
for n in $library_names; do
- rmfiles="$rmfiles $dir/$n"
+ rmfiles="$rmfiles $objdir/$n"
done
- test -n "$old_library" && rmfiles="$rmfiles $dir/$old_library"
-
- $show "$rm $rmfiles"
- $run $rm $rmfiles
-
- if test -n "$library_names"; then
- # Do each command in the postuninstall commands.
- eval cmds=\"$postuninstall_cmds\"
- IFS="${IFS= }"; save_ifs="$IFS"; IFS='~'
- for cmd in $cmds; do
+ test -n "$old_library" && rmfiles="$rmfiles $objdir/$old_library"
+ test $mode = clean && rmfiles="$rmfiles $objdir/$name $objdir/${name}i"
+
+ if test $mode = uninstall; then
+ if test -n "$library_names"; then
+ # Do each command in the postuninstall commands.
+ eval cmds=\"$postuninstall_cmds\"
+ IFS="${IFS= }"; save_ifs="$IFS"; IFS='~'
+ for cmd in $cmds; do
+ IFS="$save_ifs"
+ $show "$cmd"
+ $run eval "$cmd"
+ if test $? != 0 && test "$rmforce" != yes; then
+ exit_status=1
+ fi
+ done
IFS="$save_ifs"
- $show "$cmd"
- $run eval "$cmd"
- done
- IFS="$save_ifs"
- fi
+ fi
- if test -n "$old_library"; then
- # Do each command in the old_postuninstall commands.
- eval cmds=\"$old_postuninstall_cmds\"
- IFS="${IFS= }"; save_ifs="$IFS"; IFS='~'
- for cmd in $cmds; do
+ if test -n "$old_library"; then
+ # Do each command in the old_postuninstall commands.
+ eval cmds=\"$old_postuninstall_cmds\"
+ IFS="${IFS= }"; save_ifs="$IFS"; IFS='~'
+ for cmd in $cmds; do
+ IFS="$save_ifs"
+ $show "$cmd"
+ $run eval "$cmd"
+ if test $? != 0 && test "$rmforce" != yes; then
+ exit_status=1
+ fi
+ done
IFS="$save_ifs"
- $show "$cmd"
- $run eval "$cmd"
- done
- IFS="$save_ifs"
+ fi
+ # FIXME: should reinstall the best remaining shared library.
fi
-
- # FIXME: should reinstall the best remaining shared library.
fi
;;
@@ -3809,17 +4706,35 @@ libdir='$install_libdir'\
oldobj=`$echo "X$name" | $Xsed -e "$lo2o"`
rmfiles="$rmfiles $dir/$oldobj"
fi
- $show "$rm $rmfiles"
- $run $rm $rmfiles
;;
*)
- $show "$rm $rmfiles"
- $run $rm $rmfiles
+ # Do a test to see if this is a libtool program.
+ if test $mode = clean &&
+ (sed -e '4q' $file | egrep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
+ relink_command=
+ . $dir/$file
+
+ rmfiles="$rmfiles $objdir/$name $objdir/${name}S.${objext}"
+ if test "$fast_install" = yes && test -n "$relink_command"; then
+ rmfiles="$rmfiles $objdir/lt-$name"
+ fi
+ fi
;;
esac
+ $show "$rm $rmfiles"
+ $run $rm $rmfiles || exit_status=1
done
- exit 0
+
+ # Try to remove the ${objdir}s in the directories where we deleted files
+ for dir in $rmdirs; do
+ if test -d "$dir"; then
+ $show "rmdir $dir"
+ $run rmdir $dir >/dev/null 2>&1
+ fi
+ done
+
+ exit $exit_status
;;
"")
@@ -3835,7 +4750,7 @@ libdir='$install_libdir'\
fi # test -z "$show_help"
# We need to display help for each of the modes.
-case "$mode" in
+case $mode in
"") $echo \
"Usage: $modename [OPTION]... [MODE-ARG]...
@@ -3854,6 +4769,7 @@ Provide generalized library-building support services.
MODE must be one of the following:
+ clean remove files from the build directory
compile compile a source file into a libtool object
execute automatically set library path, then run a program
finish complete the installation of libtool libraries
@@ -3866,6 +4782,20 @@ a more detailed description of MODE."
exit 0
;;
+clean)
+ $echo \
+"Usage: $modename [OPTION]... --mode=clean RM [RM-OPTION]... FILE...
+
+Remove files from the build directory.
+
+RM is the name of the program to use to delete files associated with each FILE
+(typically \`/bin/rm'). RM-OPTIONS are options (such as \`-f') to be passed
+to RM.
+
+If FILE is a libtool library, object or program, all the files associated
+with it are deleted. Otherwise, only FILE itself is deleted using RM."
+ ;;
+
compile)
$echo \
"Usage: $modename [OPTION]... --mode=compile COMPILE-COMMAND... SOURCEFILE
@@ -3875,6 +4805,8 @@ Compile a source file into a libtool library object.
This mode accepts the following additional options:
-o OUTPUT-FILE set the output file name to OUTPUT-FILE
+ -prefer-pic try to building PIC objects only
+ -prefer-non-pic try to building non-PIC objects only
-static always build a \`.o' file suitable for static linking
COMPILE-COMMAND is a command to be used in creating a \`standard' object file
@@ -3954,6 +4886,8 @@ The following components of LINK-COMMAND are treated specially:
-LLIBDIR search LIBDIR for required installed libraries
-lNAME OUTPUT-FILE requires the installed library libNAME
-module build a library that can dlopened
+ -no-fast-install disable the fast-install mode
+ -no-install link a not-installable executable
-no-undefined declare that a library does not refer to external symbols
-o OUTPUT-FILE create OUTPUT-FILE from the specified objects
-release RELEASE specify package release information
diff --git a/srclib/pcre/maketables.c b/srclib/pcre/maketables.c
index c0f06c0375..01078f19e6 100644
--- a/srclib/pcre/maketables.c
+++ b/srclib/pcre/maketables.c
@@ -8,7 +8,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-2000 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
@@ -58,7 +58,7 @@ Arguments: none
Returns: pointer to the contiguous block of data
*/
-unsigned const char *
+const unsigned char *
pcre_maketables(void)
{
unsigned char *yield, *p;
diff --git a/srclib/pcre/pcre.c b/srclib/pcre/pcre.c
index e45dee8d96..ad3ddc7c57 100644
--- a/srclib/pcre/pcre.c
+++ b/srclib/pcre/pcre.c
@@ -9,7 +9,7 @@ the file Tech.Notes for some information on the internals.
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-2000 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
@@ -60,12 +60,25 @@ the external pcre header. */
#endif
-/* Number of items on the nested bracket stacks at compile time. This should
-not be set greater than 200. */
+/* Maximum number of items on the nested bracket stacks at compile time. This
+applies to the nesting of all kinds of parentheses. It does not limit
+un-nested, non-capturing parentheses. This number can be made bigger if
+necessary - it is used to dimension one int and one unsigned char vector at
+compile time. */
#define BRASTACK_SIZE 200
+/* The number of bytes in a literal character string above which we can't add
+any more is different when UTF-8 characters may be encountered. */
+
+#ifdef SUPPORT_UTF8
+#define MAXLIT 250
+#else
+#define MAXLIT 255
+#endif
+
+
/* Min and max values for the common repeats; for the maxima, 0 => infinity */
static const char rep_min[] = { 0, 0, 1, 1, 0, 0 };
@@ -85,7 +98,7 @@ static const char *OP_names[] = {
"class", "Ref", "Recurse",
"Alt", "Ket", "KetRmax", "KetRmin", "Assert", "Assert not",
"AssertB", "AssertB not", "Reverse", "Once", "Cond", "Cref",
- "Brazero", "Braminzero", "Bra"
+ "Brazero", "Braminzero", "Branumber", "Bra"
};
#endif
@@ -101,9 +114,9 @@ static const short int escapes[] = {
0, 0, 0, 0, 0, 0, 0, 0, /* H - O */
0, 0, 0, -ESC_S, 0, 0, 0, -ESC_W, /* P - W */
0, 0, -ESC_Z, '[', '\\', ']', '^', '_', /* X - _ */
- '`', 7, -ESC_b, 0, -ESC_d, 27, '\f', 0, /* ` - g */
- 0, 0, 0, 0, 0, 0, '\n', 0, /* h - o */
- 0, 0, '\r', -ESC_s, '\t', 0, 0, -ESC_w, /* p - w */
+ '`', 7, -ESC_b, 0, -ESC_d, ESC_E, ESC_F, 0, /* ` - g */
+ 0, 0, 0, 0, 0, 0, ESC_N, 0, /* h - o */
+ 0, 0, ESC_R, -ESC_s, ESC_T, 0, 0, -ESC_w, /* p - w */
0, 0, -ESC_z /* x - z */
};
@@ -145,6 +158,21 @@ static BOOL
compile_regex(int, int, int *, uschar **, const uschar **, const char **,
BOOL, int, int *, int *, compile_data *);
+/* Structure for building a chain of data that actually lives on the
+stack, for holding the values of the subject pointer at the start of each
+subpattern, so as to detect when an empty string has been matched by a
+subpattern - to break infinite loops. */
+
+typedef struct eptrblock {
+ struct eptrblock *prev;
+ const uschar *saved_eptr;
+} eptrblock;
+
+/* Flag bits for the match() function */
+
+#define match_condassert 0x01 /* Called to check a condition assertion */
+#define match_isgroup 0x02 /* Set if start of bracketed group */
+
/*************************************************
@@ -161,6 +189,64 @@ void (*pcre_free)(void *) = free;
+/*************************************************
+* Macros and tables for character handling *
+*************************************************/
+
+/* When UTF-8 encoding is being used, a character is no longer just a single
+byte. The macros for character handling generate simple sequences when used in
+byte-mode, and more complicated ones for UTF-8 characters. */
+
+#ifndef SUPPORT_UTF8
+#define GETCHARINC(c, eptr) c = *eptr++;
+#define GETCHARLEN(c, eptr, len) c = *eptr;
+#define BACKCHAR(eptr)
+
+#else /* SUPPORT_UTF8 */
+
+/* Get the next UTF-8 character, advancing the pointer */
+
+#define GETCHARINC(c, eptr) \
+ c = *eptr++; \
+ if (md->utf8 && (c & 0xc0) == 0xc0) \
+ { \
+ int a = utf8_table4[c & 0x3f]; /* Number of additional bytes */ \
+ int s = 6*a; \
+ c = (c & utf8_table3[a]) << s; \
+ while (a-- > 0) \
+ { \
+ s -= 6; \
+ c |= (*eptr++ & 0x3f) << s; \
+ } \
+ }
+
+/* Get the next UTF-8 character, not advancing the pointer, setting length */
+
+#define GETCHARLEN(c, eptr, len) \
+ c = *eptr; \
+ len = 1; \
+ if (md->utf8 && (c & 0xc0) == 0xc0) \
+ { \
+ int i; \
+ int a = utf8_table4[c & 0x3f]; /* Number of additional bytes */ \
+ int s = 6*a; \
+ c = (c & utf8_table3[a]) << s; \
+ for (i = 1; i <= a; i++) \
+ { \
+ s -= 6; \
+ c |= (eptr[i] & 0x3f) << s; \
+ } \
+ len += a; \
+ }
+
+/* If the pointer is not at the start of a character, move it back until
+it is. */
+
+#define BACKCHAR(eptr) while((*eptr & 0xc0) == 0x80) eptr--;
+
+#endif
+
+
/*************************************************
* Default character tables *
@@ -176,6 +262,66 @@ tables. */
+#ifdef SUPPORT_UTF8
+/*************************************************
+* Tables for UTF-8 support *
+*************************************************/
+
+/* These are the breakpoints for different numbers of bytes in a UTF-8
+character. */
+
+static int utf8_table1[] = { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff};
+
+/* These are the indicator bits and the mask for the data bits to set in the
+first byte of a character, indexed by the number of additional bytes. */
+
+static int utf8_table2[] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
+static int utf8_table3[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
+
+/* Table of the number of extra characters, indexed by the first character
+masked with 0x3f. The highest number for a valid UTF-8 character is in fact
+0x3d. */
+
+static uschar utf8_table4[] = {
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
+ 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 };
+
+
+/*************************************************
+* Convert character value to UTF-8 *
+*************************************************/
+
+/* This function takes an integer value in the range 0 - 0x7fffffff
+and encodes it as a UTF-8 character in 0 to 6 bytes.
+
+Arguments:
+ cvalue the character value
+ buffer pointer to buffer for result - at least 6 bytes long
+
+Returns: number of characters placed in the buffer
+*/
+
+static int
+ord2utf8(int cvalue, uschar *buffer)
+{
+register int i, j;
+for (i = 0; i < sizeof(utf8_table1)/sizeof(int); i++)
+ if (cvalue <= utf8_table1[i]) break;
+buffer += i;
+for (j = i; j > 0; j--)
+ {
+ *buffer-- = 0x80 | (cvalue & 0x3f);
+ cvalue >>= 6;
+ }
+*buffer = utf8_table2[i] | cvalue;
+return i + 1;
+}
+#endif
+
+
+
/*************************************************
* Return version string *
*************************************************/
@@ -334,9 +480,9 @@ while (length-- > 0)
/* This function is called when a \ has been encountered. It either returns a
positive value for a simple escape such as \n, or a negative value which
-encodes one of the more complicated things such as \d. On entry, ptr is
-pointing at the \. On exit, it is on the final character of the escape
-sequence.
+encodes one of the more complicated things such as \d. When UTF-8 is enabled,
+a positive value greater than 255 may be returned. On entry, ptr is pointing at
+the \. On exit, it is on the final character of the escape sequence.
Arguments:
ptrptr points to the pattern position pointer
@@ -358,7 +504,9 @@ check_escape(const uschar **ptrptr, const char **errorptr, int bracount,
const uschar *ptr = *ptrptr;
int c, i;
-c = *(++ptr) & 255; /* Ensure > 0 on signed-char systems */
+/* If backslash is at the end of the pattern, it's an error. */
+
+c = *(++ptr);
if (c == 0) *errorptr = ERR1;
/* Digits or letters may have special meaning; all others are literals. */
@@ -418,18 +566,46 @@ else
}
/* \0 always starts an octal number, but we may drop through to here with a
- larger first octal digit */
+ larger first octal digit. */
case '0':
c -= '0';
while(i++ < 2 && (cd->ctypes[ptr[1]] & ctype_digit) != 0 &&
ptr[1] != '8' && ptr[1] != '9')
c = c * 8 + *(++ptr) - '0';
+ c &= 255; /* Take least significant 8 bits */
break;
- /* Special escapes not starting with a digit are straightforward */
+ /* \x is complicated when UTF-8 is enabled. \x{ddd} is a character number
+ which can be greater than 0xff, but only if the ddd are hex digits. */
case 'x':
+#ifdef SUPPORT_UTF8
+ if (ptr[1] == '{' && (options & PCRE_UTF8) != 0)
+ {
+ const uschar *pt = ptr + 2;
+ register int count = 0;
+ c = 0;
+ while ((cd->ctypes[*pt] & ctype_xdigit) != 0)
+ {
+ count++;
+ c = c * 16 + cd->lcc[*pt] -
+ (((cd->ctypes[*pt] & ctype_digit) != 0)? '0' : 'W');
+ pt++;
+ }
+ if (*pt == '}')
+ {
+ if (c < 0 || count > 8) *errorptr = ERR34;
+ ptr = pt;
+ break;
+ }
+ /* If the sequence of hex digits does not end with '}', then we don't
+ recognize this construct; fall through to the normal \x handling. */
+ }
+#endif
+
+ /* Read just a single hex char */
+
c = 0;
while (i++ < 2 && (cd->ctypes[ptr[1]] & ctype_xdigit) != 0)
{
@@ -439,6 +615,8 @@ else
}
break;
+ /* Other special escapes not starting with a digit are straightforward */
+
case 'c':
c = *(++ptr);
if (c == 0)
@@ -576,12 +754,13 @@ if the length is fixed. This is needed for dealing with backward assertions.
Arguments:
code points to the start of the pattern (the bracket)
+ options the compiling options
Returns: the fixed length, or -1 if there is no fixed length
*/
static int
-find_fixedlength(uschar *code)
+find_fixedlength(uschar *code, int options)
{
int length = -1;
@@ -602,7 +781,7 @@ for (;;)
case OP_BRA:
case OP_ONCE:
case OP_COND:
- d = find_fixedlength(cc);
+ d = find_fixedlength(cc, options);
if (d < 0) return -1;
branchlength += d;
do cc += (cc[1] << 8) + cc[2]; while (*cc == OP_ALT);
@@ -638,10 +817,11 @@ for (;;)
/* Skip over things that don't match chars */
case OP_REVERSE:
+ case OP_BRANUMBER:
+ case OP_CREF:
cc++;
/* Fall through */
- case OP_CREF:
case OP_OPT:
cc++;
/* Fall through */
@@ -656,10 +836,17 @@ for (;;)
cc++;
break;
- /* Handle char strings */
+ /* Handle char strings. In UTF-8 mode we must count characters, not bytes.
+ This requires a scan of the string, unfortunately. We assume valid UTF-8
+ strings, so all we do is reduce the length by one for byte whose bits are
+ 10xxxxxx. */
case OP_CHARS:
branchlength += *(++cc);
+#ifdef SUPPORT_UTF8
+ for (d = 1; d <= *cc; d++)
+ if ((cc[d] & 0xc0) == 0x80) branchlength--;
+#endif
cc += *cc + 1;
break;
@@ -688,7 +875,7 @@ for (;;)
/* Check a class for variable quantification */
case OP_CLASS:
- cc += (*cc == OP_REF)? 2 : 33;
+ cc += 33;
switch (*cc)
{
@@ -795,7 +982,7 @@ return -1;
Arguments:
options the option bits
- brackets points to number of brackets used
+ brackets points to number of extracting brackets used
code points to the pointer to the current code point
ptrptr points to the current pattern pointer
errorptr points to pointer to error message
@@ -846,7 +1033,7 @@ for (;; ptr++)
int class_charcount;
int class_lastchar;
int newoptions;
- int condref;
+ int skipbytes;
int subreqchar;
c = *ptr;
@@ -855,7 +1042,9 @@ for (;; ptr++)
if ((cd->ctypes[c] & ctype_space) != 0) continue;
if (c == '#')
{
- while ((c = *(++ptr)) != 0 && c != '\n');
+ /* The space before the ; is to avoid a warning on a silly compiler
+ on the Macintosh. */
+ while ((c = *(++ptr)) != 0 && c != NEWLINE) ;
continue;
}
}
@@ -1037,7 +1226,17 @@ for (;; ptr++)
goto FAILED;
}
}
- /* Fall through if single character */
+
+ /* Fall through if single character, but don't at present allow
+ chars > 255 in UTF-8 mode. */
+
+#ifdef SUPPORT_UTF8
+ if (c > 255)
+ {
+ *errorptr = ERR33;
+ goto FAILED;
+ }
+#endif
}
/* A single character may be followed by '-' to form a range. However,
@@ -1057,17 +1256,29 @@ for (;; ptr++)
}
/* The second part of a range can be a single-character escape, but
- not any of the other escapes. */
+ not any of the other escapes. Perl 5.6 treats a hyphen as a literal
+ in such circumstances. */
if (d == '\\')
{
+ const uschar *oldptr = ptr;
d = check_escape(&ptr, errorptr, *brackets, options, TRUE, cd);
+
+#ifdef SUPPORT_UTF8
+ if (d > 255)
+ {
+ *errorptr = ERR33;
+ goto FAILED;
+ }
+#endif
+ /* \b is backslash; any other special means the '-' was literal */
+
if (d < 0)
{
if (d == -ESC_b) d = '\b'; else
{
- *errorptr = ERR7;
- goto FAILED;
+ ptr = oldptr - 2;
+ goto SINGLE_CHARACTER; /* A few lines below */
}
}
}
@@ -1095,6 +1306,8 @@ for (;; ptr++)
/* Handle a lone single character - we can get here for a normal
non-escape char, or after \ that introduces a single character. */
+ SINGLE_CHARACTER:
+
class [c/8] |= (1 << (c&7));
if ((options & PCRE_CASELESS) != 0)
{
@@ -1369,7 +1582,7 @@ for (;; ptr++)
OP_BRAZERO in front of it, and because the group appears once in the
data, whereas in other cases it appears the minimum number of times. For
this reason, it is simplest to treat this case separately, as otherwise
- the code gets far too mess. There are several special subcases when the
+ the code gets far too messy. There are several special subcases when the
minimum is zero. */
if (repeat_min == 0)
@@ -1520,7 +1733,7 @@ for (;; ptr++)
case '(':
newoptions = options;
- condref = -1;
+ skipbytes = 0;
if (*(++ptr) == '?')
{
@@ -1543,9 +1756,18 @@ for (;; ptr++)
bravalue = OP_COND; /* Conditional group */
if ((cd->ctypes[*(++ptr)] & ctype_digit) != 0)
{
- condref = *ptr - '0';
+ int condref = *ptr - '0';
while (*(++ptr) != ')') condref = condref*10 + *ptr - '0';
+ if (condref == 0)
+ {
+ *errorptr = ERR35;
+ goto FAILED;
+ }
ptr++;
+ code[3] = OP_CREF;
+ code[4] = condref >> 8;
+ code[5] = condref & 255;
+ skipbytes = 3;
}
else ptr--;
break;
@@ -1648,16 +1870,21 @@ for (;; ptr++)
}
}
- /* Else we have a referencing group; adjust the opcode. */
+ /* Else we have a referencing group; adjust the opcode. If the bracket
+ number is greater than EXTRACT_BASIC_MAX, we set the opcode one higher, and
+ arrange for the true number to follow later, in an OP_BRANUMBER item. */
else
{
- if (++(*brackets) > EXTRACT_MAX)
+ if (++(*brackets) > EXTRACT_BASIC_MAX)
{
- *errorptr = ERR13;
- goto FAILED;
+ bravalue = OP_BRA + EXTRACT_BASIC_MAX + 1;
+ code[3] = OP_BRANUMBER;
+ code[4] = *brackets >> 8;
+ code[5] = *brackets & 255;
+ skipbytes = 3;
}
- bravalue = OP_BRA + *brackets;
+ else bravalue = OP_BRA + *brackets;
}
/* Process nested bracketed re. Assertions may not be repeated, but other
@@ -1673,13 +1900,13 @@ for (;; ptr++)
options | PCRE_INGROUP, /* Set for all nested groups */
((options & PCRE_IMS) != (newoptions & PCRE_IMS))?
newoptions & PCRE_IMS : -1, /* Pass ims options if changed */
- brackets, /* Bracket level */
+ brackets, /* Extracting bracket count */
&tempcode, /* Where to put code (updated) */
&ptr, /* Input pointer (updated) */
errorptr, /* Where to put an error message */
(bravalue == OP_ASSERTBACK ||
bravalue == OP_ASSERTBACK_NOT), /* TRUE if back assert */
- condref, /* Condition reference number */
+ skipbytes, /* Skip over OP_COND/OP_BRANUMBER */
&subreqchar, /* For possible last char */
&subcountlits, /* For literal count */
cd)) /* Tables block */
@@ -1693,7 +1920,7 @@ for (;; ptr++)
/* If this is a conditional bracket, check that there are no more than
two branches in the group. */
- if (bravalue == OP_COND)
+ else if (bravalue == OP_COND)
{
uschar *tc = code;
condcount = 0;
@@ -1760,9 +1987,11 @@ for (;; ptr++)
{
if (-c >= ESC_REF)
{
+ int number = -c - ESC_REF;
previous = code;
*code++ = OP_REF;
- *code++ = -c - ESC_REF;
+ *code++ = number >> 8;
+ *code++ = number & 255;
}
else
{
@@ -1795,7 +2024,9 @@ for (;; ptr++)
if ((cd->ctypes[c] & ctype_space) != 0) continue;
if (c == '#')
{
- while ((c = *(++ptr)) != 0 && c != '\n');
+ /* The space before the ; is to avoid a warning on a silly compiler
+ on the Macintosh. */
+ while ((c = *(++ptr)) != 0 && c != NEWLINE) ;
if (c == 0) break;
continue;
}
@@ -1810,6 +2041,20 @@ for (;; ptr++)
tempptr = ptr;
c = check_escape(&ptr, errorptr, *brackets, options, FALSE, cd);
if (c < 0) { ptr = tempptr; break; }
+
+ /* If a character is > 127 in UTF-8 mode, we have to turn it into
+ two or more characters in the UTF-8 encoding. */
+
+#ifdef SUPPORT_UTF8
+ if (c > 127 && (options & PCRE_UTF8) != 0)
+ {
+ uschar buffer[8];
+ int len = ord2utf8(c, buffer);
+ for (c = 0; c < len; c++) *code++ = buffer[c];
+ length += len;
+ continue;
+ }
+#endif
}
/* Ordinary character or single-char escape */
@@ -1820,7 +2065,7 @@ for (;; ptr++)
/* This "while" is the end of the "do" above. */
- while (length < 255 && (cd->ctypes[c = *(++ptr)] & ctype_meta) == 0);
+ while (length < MAXLIT && (cd->ctypes[c = *(++ptr)] & ctype_meta) == 0);
/* Update the last character and the count of literals */
@@ -1832,7 +2077,7 @@ for (;; ptr++)
the next state. */
previous[1] = length;
- if (length < 255) ptr--;
+ if (length < MAXLIT) ptr--;
break;
}
} /* end of big loop */
@@ -1870,7 +2115,7 @@ Argument:
ptrptr -> the address of the current pattern pointer
errorptr -> pointer to error message
lookbehind TRUE if this is a lookbehind assertion
- condref > 0 for OPT_CREF setting at start of conditional group
+ skipbytes skip this many bytes at start (for OP_COND, OP_BRANUMBER)
reqchar -> place to put the last required character, or a negative number
countlits -> place to put the shortest literal count of any branch
cd points to the data block with tables pointers
@@ -1880,7 +2125,7 @@ Returns: TRUE on success
static BOOL
compile_regex(int options, int optchanged, int *brackets, uschar **codeptr,
- const uschar **ptrptr, const char **errorptr, BOOL lookbehind, int condref,
+ const uschar **ptrptr, const char **errorptr, BOOL lookbehind, int skipbytes,
int *reqchar, int *countlits, compile_data *cd)
{
const uschar *ptr = *ptrptr;
@@ -1893,16 +2138,7 @@ int branchreqchar, branchcountlits;
*reqchar = -1;
*countlits = INT_MAX;
-code += 3;
-
-/* At the start of a reference-based conditional group, insert the reference
-number as an OP_CREF item. */
-
-if (condref > 0)
- {
- *code++ = OP_CREF;
- *code++ = condref;
- }
+code += 3 + skipbytes;
/* Loop for each alternative branch */
@@ -1970,7 +2206,7 @@ for (;;)
if (lookbehind)
{
*code = OP_END;
- length = find_fixedlength(last_branch);
+ length = find_fixedlength(last_branch, options);
DPRINTF(("fixed length = %d\n", length));
if (length < 0)
{
@@ -2054,7 +2290,8 @@ for (;;)
break;
case OP_CREF:
- code += 2;
+ case OP_BRANUMBER:
+ code += 3;
break;
case OP_WORD_BOUNDARY:
@@ -2261,6 +2498,16 @@ uschar bralenstack[BRASTACK_SIZE];
uschar *code_base, *code_end;
#endif
+/* Can't support UTF8 unless PCRE has been compiled to include the code. */
+
+#ifndef SUPPORT_UTF8
+if ((options & PCRE_UTF8) != 0)
+ {
+ *errorptr = ERR32;
+ return NULL;
+ }
+#endif
+
/* We can't pass back an error message if errorptr is NULL; I guess the best we
can do is just return NULL. */
@@ -2307,13 +2554,16 @@ while ((c = *(++ptr)) != 0)
{
int min, max;
int class_charcount;
+ int bracket_length;
if ((options & PCRE_EXTENDED) != 0)
{
if ((compile_block.ctypes[c] & ctype_space) != 0) continue;
if (c == '#')
{
- while ((c = *(++ptr)) != 0 && c != '\n');
+ /* The space before the ; is to avoid a warning on a silly compiler
+ on the Macintosh. */
+ while ((c = *(++ptr)) != 0 && c != NEWLINE) ;
continue;
}
}
@@ -2339,7 +2589,7 @@ while ((c = *(++ptr)) != 0)
}
length++;
- /* A back reference needs an additional char, plus either one or 5
+ /* A back reference needs an additional 2 bytes, plus either one or 5
bytes for a repeat. We also need to keep the value of the highest
back reference. */
@@ -2347,7 +2597,7 @@ while ((c = *(++ptr)) != 0)
{
int refnum = -c - ESC_REF;
if (refnum > top_backref) top_backref = refnum;
- length++; /* For single back reference */
+ length += 2; /* For single back reference */
if (ptr[1] == '{' && is_counted_repeat(ptr+2, &compile_block))
{
ptr = read_repeat_counts(ptr+2, &min, &max, errorptr, &compile_block);
@@ -2445,6 +2695,7 @@ while ((c = *(++ptr)) != 0)
case '(':
branch_newextra = 0;
+ bracket_length = 3;
/* Handle special forms of bracket, which all start (? */
@@ -2512,7 +2763,7 @@ while ((c = *(++ptr)) != 0)
if ((compile_block.ctypes[ptr[3]] & ctype_digit) != 0)
{
ptr += 4;
- length += 2;
+ length += 3;
while ((compile_block.ctypes[*ptr] & ctype_digit) != 0) ptr++;
if (*ptr != ')')
{
@@ -2523,8 +2774,8 @@ while ((c = *(++ptr)) != 0)
else /* An assertion must follow */
{
ptr++; /* Can treat like ':' as far as spacing is concerned */
-
- if (ptr[2] != '?' || strchr("=!<", ptr[3]) == NULL)
+ if (ptr[2] != '?' ||
+ (ptr[3] != '=' && ptr[3] != '!' && ptr[3] != '<') )
{
ptr += 2; /* To get right offset in message */
*errorptr = ERR28;
@@ -2639,15 +2890,19 @@ while ((c = *(++ptr)) != 0)
}
/* Extracting brackets must be counted so we can process escapes in a
- Perlish way. */
+ Perlish way. If the number exceeds EXTRACT_BASIC_MAX we are going to
+ need an additional 3 bytes of store per extracting bracket. */
- else bracount++;
+ else
+ {
+ bracount++;
+ if (bracount > EXTRACT_BASIC_MAX) bracket_length += 3;
+ }
- /* Non-special forms of bracket. Save length for computing whole length
- at end if there's a repeat that requires duplication of the group. Also
- save the current value of branch_extra, and start the new group with
- the new value. If non-zero, this will either be 2 for a (?imsx: group, or 3
- for a lookbehind assertion. */
+ /* Save length for computing whole length at end if there's a repeat that
+ requires duplication of the group. Also save the current value of
+ branch_extra, and start the new group with the new value. If non-zero, this
+ will either be 2 for a (?imsx: group, or 3 for a lookbehind assertion. */
if (brastackptr >= sizeof(brastack)/sizeof(int))
{
@@ -2659,7 +2914,7 @@ while ((c = *(++ptr)) != 0)
branch_extra = branch_newextra;
brastack[brastackptr++] = length;
- length += 3;
+ length += bracket_length;
continue;
/* Handle ket. Look for subsequent max/min; for certain sets of values we
@@ -2737,7 +2992,9 @@ while ((c = *(++ptr)) != 0)
if ((compile_block.ctypes[c] & ctype_space) != 0) continue;
if (c == '#')
{
- while ((c = *(++ptr)) != 0 && c != '\n');
+ /* The space before the ; is to avoid a warning on a silly compiler
+ on the Macintosh. */
+ while ((c = *(++ptr)) != 0 && c != NEWLINE) ;
continue;
}
}
@@ -2752,6 +3009,16 @@ while ((c = *(++ptr)) != 0)
&compile_block);
if (*errorptr != NULL) goto PCRE_ERROR_RETURN;
if (c < 0) { ptr = saveptr; break; }
+
+#ifdef SUPPORT_UTF8
+ if (c > 127 && (options & PCRE_UTF8) != 0)
+ {
+ int i;
+ for (i = 0; i < sizeof(utf8_table1)/sizeof(int); i++)
+ if (c <= utf8_table1[i]) break;
+ runlength += i;
+ }
+#endif
}
/* Ordinary character or single-char escape */
@@ -2761,7 +3028,7 @@ while ((c = *(++ptr)) != 0)
/* This "while" is the end of the "do" above. */
- while (runlength < 255 &&
+ while (runlength < MAXLIT &&
(compile_block.ctypes[c = *(++ptr)] & ctype_meta) == 0);
ptr--;
@@ -2808,7 +3075,7 @@ ptr = (const uschar *)pattern;
code = re->code;
*code = OP_BRA;
bracount = 0;
-(void)compile_regex(options, -1, &bracount, &code, &ptr, errorptr, FALSE, -1,
+(void)compile_regex(options, -1, &bracount, &code, &ptr, errorptr, FALSE, 0,
&reqchar, &countlits, &compile_block);
re->top_bracket = bracount;
re->top_backref = top_backref;
@@ -2922,7 +3189,10 @@ while (code < code_end)
if (*code >= OP_BRA)
{
- printf("%3d Bra %d", (code[1] << 8) + code[2], *code - OP_BRA);
+ if (*code - OP_BRA > EXTRACT_BASIC_MAX)
+ printf("%3d Bra extra", (code[1] << 8) + code[2]);
+ else
+ printf("%3d Bra %d", (code[1] << 8) + code[2], *code - OP_BRA);
code += 2;
}
@@ -2933,16 +3203,6 @@ while (code < code_end)
code++;
break;
- case OP_COND:
- printf("%3d Cond", (code[1] << 8) + code[2]);
- code += 2;
- break;
-
- case OP_CREF:
- printf(" %.2d %s", code[1], OP_names[*code]);
- code++;
- break;
-
case OP_CHARS:
charlength = *(++code);
printf("%3d ", charlength);
@@ -2959,11 +3219,10 @@ while (code < code_end)
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
case OP_ONCE:
- printf("%3d %s", (code[1] << 8) + code[2], OP_names[*code]);
- code += 2;
- break;
-
case OP_REVERSE:
+ case OP_BRANUMBER:
+ case OP_COND:
+ case OP_CREF:
printf("%3d %s", (code[1] << 8) + code[2], OP_names[*code]);
code += 2;
break;
@@ -3036,8 +3295,8 @@ while (code < code_end)
break;
case OP_REF:
- printf(" \\%d", *(++code));
- code ++;
+ printf(" \\%d", (code[1] << 8) | code[2]);
+ code += 3;
goto CLASS_REF_REPEAT;
case OP_CLASS:
@@ -3195,18 +3454,36 @@ Arguments:
offset_top current top pointer
md pointer to "static" info for the match
ims current /i, /m, and /s options
- condassert TRUE if called to check a condition assertion
- eptrb eptr at start of last bracket
+ eptrb pointer to chain of blocks containing eptr at start of
+ brackets - for testing for empty matches
+ flags can contain
+ match_condassert - this is an assertion condition
+ match_isgroup - this is the start of a bracketed group
Returns: TRUE if matched
*/
static BOOL
match(register const uschar *eptr, register const uschar *ecode,
- int offset_top, match_data *md, unsigned long int ims, BOOL condassert,
- const uschar *eptrb)
+ int offset_top, match_data *md, unsigned long int ims, eptrblock *eptrb,
+ int flags)
{
unsigned long int original_ims = ims; /* Save for resetting on ')' */
+eptrblock newptrb;
+
+/* At the start of a bracketed group, add the current subject pointer to the
+stack of such pointers, to be re-instated at the end of the group when we hit
+the closing ket. When match() is called in other circumstances, we don't add to
+the stack. */
+
+if ((flags & match_isgroup) != 0)
+ {
+ newptrb.prev = eptrb;
+ newptrb.saved_eptr = eptr;
+ eptrb = &newptrb;
+ }
+
+/* Now start processing the operations. */
for (;;)
{
@@ -3232,8 +3509,14 @@ for (;;)
if (op > OP_BRA)
{
+ int offset;
int number = op - OP_BRA;
- int offset = number << 1;
+
+ /* For extended extraction brackets (large number), we have to fish out the
+ number from a dummy opcode at the start. */
+
+ if (number > EXTRACT_BASIC_MAX) number = (ecode[4] << 8) | ecode[5];
+ offset = number << 1;
#ifdef DEBUG
printf("start bracket %d subject=", number);
@@ -3252,7 +3535,8 @@ for (;;)
do
{
- if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) return TRUE;
+ if (match(eptr, ecode+3, offset_top, md, ims, eptrb, match_isgroup))
+ return TRUE;
ecode += (ecode[1] << 8) + ecode[2];
}
while (*ecode == OP_ALT);
@@ -3262,6 +3546,7 @@ for (;;)
md->offset_vector[offset] = save_offset1;
md->offset_vector[offset+1] = save_offset2;
md->offset_vector[md->offset_end - number] = save_offset3;
+
return FALSE;
}
@@ -3278,7 +3563,8 @@ for (;;)
DPRINTF(("start bracket 0\n"));
do
{
- if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) return TRUE;
+ if (match(eptr, ecode+3, offset_top, md, ims, eptrb, match_isgroup))
+ return TRUE;
ecode += (ecode[1] << 8) + ecode[2];
}
while (*ecode == OP_ALT);
@@ -3293,11 +3579,11 @@ for (;;)
case OP_COND:
if (ecode[3] == OP_CREF) /* Condition is extraction test */
{
- int offset = ecode[4] << 1; /* Doubled reference number */
+ int offset = (ecode[4] << 9) | (ecode[5] << 1); /* Doubled ref number */
return match(eptr,
ecode + ((offset < offset_top && md->offset_vector[offset] >= 0)?
- 5 : 3 + (ecode[1] << 8) + ecode[2]),
- offset_top, md, ims, FALSE, eptr);
+ 6 : 3 + (ecode[1] << 8) + ecode[2]),
+ offset_top, md, ims, eptrb, match_isgroup);
}
/* The condition is an assertion. Call match() to evaluate it - setting
@@ -3305,20 +3591,23 @@ for (;;)
else
{
- if (match(eptr, ecode+3, offset_top, md, ims, TRUE, NULL))
+ if (match(eptr, ecode+3, offset_top, md, ims, NULL,
+ match_condassert | match_isgroup))
{
ecode += 3 + (ecode[4] << 8) + ecode[5];
while (*ecode == OP_ALT) ecode += (ecode[1] << 8) + ecode[2];
}
else ecode += (ecode[1] << 8) + ecode[2];
- return match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr);
+ return match(eptr, ecode+3, offset_top, md, ims, eptrb, match_isgroup);
}
/* Control never reaches here */
- /* Skip over conditional reference data if encountered (should not be) */
+ /* Skip over conditional reference or large extraction number data if
+ encountered. */
case OP_CREF:
- ecode += 2;
+ case OP_BRANUMBER:
+ ecode += 3;
break;
/* End of the pattern. If PCRE_NOTEMPTY is set, fail if we have matched
@@ -3348,7 +3637,7 @@ for (;;)
case OP_ASSERTBACK:
do
{
- if (match(eptr, ecode+3, offset_top, md, ims, FALSE, NULL)) break;
+ if (match(eptr, ecode+3, offset_top, md, ims, NULL, match_isgroup)) break;
ecode += (ecode[1] << 8) + ecode[2];
}
while (*ecode == OP_ALT);
@@ -3356,7 +3645,7 @@ for (;;)
/* If checking an assertion for a condition, return TRUE. */
- if (condassert) return TRUE;
+ if ((flags & match_condassert) != 0) return TRUE;
/* Continue from after the assertion, updating the offsets high water
mark, since extracts may have been taken during the assertion. */
@@ -3372,21 +3661,34 @@ for (;;)
case OP_ASSERTBACK_NOT:
do
{
- if (match(eptr, ecode+3, offset_top, md, ims, FALSE, NULL)) return FALSE;
+ if (match(eptr, ecode+3, offset_top, md, ims, NULL, match_isgroup))
+ return FALSE;
ecode += (ecode[1] << 8) + ecode[2];
}
while (*ecode == OP_ALT);
- if (condassert) return TRUE;
+ if ((flags & match_condassert) != 0) return TRUE;
+
ecode += 3;
continue;
/* Move the subject pointer back. This occurs only at the start of
each branch of a lookbehind assertion. If we are too close to the start to
- move back, this match function fails. */
+ move back, this match function fails. When working with UTF-8 we move
+ back a number of characters, not bytes. */
case OP_REVERSE:
+#ifdef SUPPORT_UTF8
+ c = (ecode[1] << 8) + ecode[2];
+ for (i = 0; i < c; i++)
+ {
+ eptr--;
+ BACKCHAR(eptr)
+ }
+#else
eptr -= (ecode[1] << 8) + ecode[2];
+#endif
+
if (eptr < md->start_subject) return FALSE;
ecode += 3;
break;
@@ -3423,7 +3725,8 @@ for (;;)
for (i = 1; i <= c; i++)
save[i] = md->offset_vector[md->offset_end - i];
- rc = match(eptr, md->start_pattern, offset_top, md, ims, FALSE, eptrb);
+ rc = match(eptr, md->start_pattern, offset_top, md, ims, eptrb,
+ match_isgroup);
for (i = 1; i <= c; i++)
md->offset_vector[md->offset_end - i] = save[i];
if (save != stacksave) (pcre_free)(save);
@@ -3449,10 +3752,12 @@ for (;;)
case OP_ONCE:
{
const uschar *prev = ecode;
+ const uschar *saved_eptr = eptr;
do
{
- if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) break;
+ if (match(eptr, ecode+3, offset_top, md, ims, eptrb, match_isgroup))
+ break;
ecode += (ecode[1] << 8) + ecode[2];
}
while (*ecode == OP_ALT);
@@ -3475,7 +3780,7 @@ for (;;)
5.005. If there is an options reset, it will get obeyed in the normal
course of events. */
- if (*ecode == OP_KET || eptr == eptrb)
+ if (*ecode == OP_KET || eptr == saved_eptr)
{
ecode += 3;
break;
@@ -3494,13 +3799,14 @@ for (;;)
if (*ecode == OP_KETRMIN)
{
- if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr) ||
- match(eptr, prev, offset_top, md, ims, FALSE, eptr)) return TRUE;
+ if (match(eptr, ecode+3, offset_top, md, ims, eptrb, 0) ||
+ match(eptr, prev, offset_top, md, ims, eptrb, match_isgroup))
+ return TRUE;
}
else /* OP_KETRMAX */
{
- if (match(eptr, prev, offset_top, md, ims, FALSE, eptr) ||
- match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) return TRUE;
+ if (match(eptr, prev, offset_top, md, ims, eptrb, match_isgroup) ||
+ match(eptr, ecode+3, offset_top, md, ims, eptrb, 0)) return TRUE;
}
}
return FALSE;
@@ -3521,7 +3827,8 @@ for (;;)
case OP_BRAZERO:
{
const uschar *next = ecode+1;
- if (match(eptr, next, offset_top, md, ims, FALSE, eptr)) return TRUE;
+ if (match(eptr, next, offset_top, md, ims, eptrb, match_isgroup))
+ return TRUE;
do next += (next[1] << 8) + next[2]; while (*next == OP_ALT);
ecode = next + 3;
}
@@ -3531,7 +3838,8 @@ for (;;)
{
const uschar *next = ecode+1;
do next += (next[1] << 8) + next[2]; while (*next == OP_ALT);
- if (match(eptr, next+3, offset_top, md, ims, FALSE, eptr)) return TRUE;
+ if (match(eptr, next+3, offset_top, md, ims, eptrb, match_isgroup))
+ return TRUE;
ecode++;
}
break;
@@ -3546,6 +3854,9 @@ for (;;)
case OP_KETRMAX:
{
const uschar *prev = ecode - (ecode[1] << 8) - ecode[2];
+ const uschar *saved_eptr = eptrb->saved_eptr;
+
+ eptrb = eptrb->prev; /* Back up the stack of bracket start pointers */
if (*prev == OP_ASSERT || *prev == OP_ASSERT_NOT ||
*prev == OP_ASSERTBACK || *prev == OP_ASSERTBACK_NOT ||
@@ -3562,10 +3873,19 @@ for (;;)
if (*prev != OP_COND)
{
+ int offset;
int number = *prev - OP_BRA;
- int offset = number << 1;
- DPRINTF(("end bracket %d\n", number));
+ /* For extended extraction brackets (large number), we have to fish out
+ the number from a dummy opcode at the start. */
+
+ if (number > EXTRACT_BASIC_MAX) number = (prev[4] << 8) | prev[5];
+ offset = number << 1;
+
+#ifdef DEBUG
+ printf("end bracket %d", number);
+ printf("\n");
+#endif
if (number > 0)
{
@@ -3591,7 +3911,7 @@ for (;;)
5.005. If there is an options reset, it will get obeyed in the normal
course of events. */
- if (*ecode == OP_KET || eptr == eptrb)
+ if (*ecode == OP_KET || eptr == saved_eptr)
{
ecode += 3;
break;
@@ -3602,13 +3922,14 @@ for (;;)
if (*ecode == OP_KETRMIN)
{
- if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr) ||
- match(eptr, prev, offset_top, md, ims, FALSE, eptr)) return TRUE;
+ if (match(eptr, ecode+3, offset_top, md, ims, eptrb, 0) ||
+ match(eptr, prev, offset_top, md, ims, eptrb, match_isgroup))
+ return TRUE;
}
else /* OP_KETRMAX */
{
- if (match(eptr, prev, offset_top, md, ims, FALSE, eptr) ||
- match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) return TRUE;
+ if (match(eptr, prev, offset_top, md, ims, eptrb, match_isgroup) ||
+ match(eptr, ecode+3, offset_top, md, ims, eptrb, 0)) return TRUE;
}
}
return FALSE;
@@ -3619,7 +3940,7 @@ for (;;)
if (md->notbol && eptr == md->start_subject) return FALSE;
if ((ims & PCRE_MULTILINE) != 0)
{
- if (eptr != md->start_subject && eptr[-1] != '\n') return FALSE;
+ if (eptr != md->start_subject && eptr[-1] != NEWLINE) return FALSE;
ecode++;
break;
}
@@ -3638,7 +3959,7 @@ for (;;)
case OP_DOLL:
if ((ims & PCRE_MULTILINE) != 0)
{
- if (eptr < md->end_subject) { if (*eptr != '\n') return FALSE; }
+ if (eptr < md->end_subject) { if (*eptr != NEWLINE) return FALSE; }
else { if (md->noteol) return FALSE; }
ecode++;
break;
@@ -3649,7 +3970,7 @@ for (;;)
if (!md->endonly)
{
if (eptr < md->end_subject - 1 ||
- (eptr == md->end_subject - 1 && *eptr != '\n')) return FALSE;
+ (eptr == md->end_subject - 1 && *eptr != NEWLINE)) return FALSE;
ecode++;
break;
@@ -3668,7 +3989,7 @@ for (;;)
case OP_EODN:
if (eptr < md->end_subject - 1 ||
- (eptr == md->end_subject - 1 && *eptr != '\n')) return FALSE;
+ (eptr == md->end_subject - 1 && *eptr != NEWLINE)) return FALSE;
ecode++;
break;
@@ -3690,9 +4011,13 @@ for (;;)
/* Match a single character type; inline for speed */
case OP_ANY:
- if ((ims & PCRE_DOTALL) == 0 && eptr < md->end_subject && *eptr == '\n')
+ if ((ims & PCRE_DOTALL) == 0 && eptr < md->end_subject && *eptr == NEWLINE)
return FALSE;
if (eptr++ >= md->end_subject) return FALSE;
+#ifdef SUPPORT_UTF8
+ if (md->utf8)
+ while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+#endif
ecode++;
break;
@@ -3749,8 +4074,8 @@ for (;;)
case OP_REF:
{
int length;
- int offset = ecode[1] << 1; /* Doubled reference number */
- ecode += 2; /* Advance past the item */
+ int offset = (ecode[1] << 9) | (ecode[2] << 1); /* Doubled ref number */
+ ecode += 3; /* Advance past item */
/* If the reference is unset, set the length to be longer than the amount
of subject left; this ensures that every attempt at a match fails. We
@@ -3819,7 +4144,7 @@ for (;;)
{
for (i = min;; i++)
{
- if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
if (i >= max || !match_ref(offset, eptr, length, md, ims))
return FALSE;
@@ -3840,7 +4165,7 @@ for (;;)
}
while (eptr >= pp)
{
- if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
eptr -= length;
}
@@ -3894,7 +4219,13 @@ for (;;)
for (i = 1; i <= min; i++)
{
if (eptr >= md->end_subject) return FALSE;
- c = *eptr++;
+ GETCHARINC(c, eptr) /* Get character; increment eptr */
+
+#ifdef SUPPORT_UTF8
+ /* We do not yet support class members > 255 */
+ if (c > 255) return FALSE;
+#endif
+
if ((data[c/8] & (1 << (c&7))) != 0) continue;
return FALSE;
}
@@ -3911,10 +4242,15 @@ for (;;)
{
for (i = min;; i++)
{
- if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
if (i >= max || eptr >= md->end_subject) return FALSE;
- c = *eptr++;
+ GETCHARINC(c, eptr) /* Get character; increment eptr */
+
+#ifdef SUPPORT_UTF8
+ /* We do not yet support class members > 255 */
+ if (c > 255) return FALSE;
+#endif
if ((data[c/8] & (1 << (c&7))) != 0) continue;
return FALSE;
}
@@ -3926,17 +4262,29 @@ for (;;)
else
{
const uschar *pp = eptr;
- for (i = min; i < max; eptr++, i++)
+ int len = 1;
+ for (i = min; i < max; i++)
{
if (eptr >= md->end_subject) break;
- c = *eptr;
- if ((data[c/8] & (1 << (c&7))) != 0) continue;
- break;
+ GETCHARLEN(c, eptr, len) /* Get character, set length if UTF-8 */
+
+#ifdef SUPPORT_UTF8
+ /* We do not yet support class members > 255 */
+ if (c > 255) break;
+#endif
+ if ((data[c/8] & (1 << (c&7))) == 0) break;
+ eptr += len;
}
while (eptr >= pp)
- if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb))
+ {
+ if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
+
+#ifdef SUPPORT_UTF8
+ BACKCHAR(eptr)
+#endif
+ }
return FALSE;
}
}
@@ -4032,7 +4380,7 @@ for (;;)
{
for (i = min;; i++)
{
- if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
if (i >= max || eptr >= md->end_subject ||
c != md->lcc[*eptr++])
@@ -4049,7 +4397,7 @@ for (;;)
eptr++;
}
while (eptr >= pp)
- if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
return FALSE;
}
@@ -4066,7 +4414,7 @@ for (;;)
{
for (i = min;; i++)
{
- if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
if (i >= max || eptr >= md->end_subject || c != *eptr++) return FALSE;
}
@@ -4081,7 +4429,7 @@ for (;;)
eptr++;
}
while (eptr >= pp)
- if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
return FALSE;
}
@@ -4163,7 +4511,7 @@ for (;;)
{
for (i = min;; i++)
{
- if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
if (i >= max || eptr >= md->end_subject ||
c == md->lcc[*eptr++])
@@ -4180,7 +4528,7 @@ for (;;)
eptr++;
}
while (eptr >= pp)
- if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
return FALSE;
}
@@ -4197,7 +4545,7 @@ for (;;)
{
for (i = min;; i++)
{
- if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
if (i >= max || eptr >= md->end_subject || c == *eptr++) return FALSE;
}
@@ -4212,7 +4560,7 @@ for (;;)
eptr++;
}
while (eptr >= pp)
- if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb))
+ if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
return FALSE;
}
@@ -4256,15 +4604,31 @@ for (;;)
/* First, ensure the minimum number of matches are present. Use inline
code for maximizing the speed, and do the type test once at the start
- (i.e. keep it out of the loop). Also test that there are at least the
- minimum number of characters before we start. */
+ (i.e. keep it out of the loop). Also we can test that there are at least
+ the minimum number of bytes before we start, except when doing '.' in
+ UTF8 mode. Leave the test in in all cases; in the special case we have
+ to test after each character. */
if (min > md->end_subject - eptr) return FALSE;
if (min > 0) switch(ctype)
{
case OP_ANY:
+#ifdef SUPPORT_UTF8
+ if (md->utf8)
+ {
+ for (i = 1; i <= min; i++)
+ {
+ if (eptr >= md->end_subject ||
+ (*eptr++ == NEWLINE && (ims & PCRE_DOTALL) == 0))
+ return FALSE;
+ while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+ }
+ break;
+ }
+#endif
+ /* Non-UTF8 can be faster */
if ((ims & PCRE_DOTALL) == 0)
- { for (i = 1; i <= min; i++) if (*eptr++ == '\n') return FALSE; }
+ { for (i = 1; i <= min; i++) if (*eptr++ == NEWLINE) return FALSE; }
else eptr += min;
break;
@@ -4312,14 +4676,18 @@ for (;;)
{
for (i = min;; i++)
{
- if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) return TRUE;
+ if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE;
if (i >= max || eptr >= md->end_subject) return FALSE;
c = *eptr++;
switch(ctype)
{
case OP_ANY:
- if ((ims & PCRE_DOTALL) == 0 && c == '\n') return FALSE;
+ if ((ims & PCRE_DOTALL) == 0 && c == NEWLINE) return FALSE;
+#ifdef SUPPORT_UTF8
+ if (md->utf8)
+ while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+#endif
break;
case OP_NOT_DIGIT:
@@ -4359,11 +4727,38 @@ for (;;)
switch(ctype)
{
case OP_ANY:
+
+ /* Special code is required for UTF8, but when the maximum is unlimited
+ we don't need it. */
+
+#ifdef SUPPORT_UTF8
+ if (md->utf8 && max < INT_MAX)
+ {
+ if ((ims & PCRE_DOTALL) == 0)
+ {
+ for (i = min; i < max; i++)
+ {
+ if (eptr >= md->end_subject || *eptr++ == NEWLINE) break;
+ while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+ }
+ }
+ else
+ {
+ for (i = min; i < max; i++)
+ {
+ eptr++;
+ while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+ }
+ }
+ break;
+ }
+#endif
+ /* Non-UTF8 can be faster */
if ((ims & PCRE_DOTALL) == 0)
{
for (i = min; i < max; i++)
{
- if (eptr >= md->end_subject || *eptr == '\n') break;
+ if (eptr >= md->end_subject || *eptr == NEWLINE) break;
eptr++;
}
}
@@ -4431,8 +4826,14 @@ for (;;)
}
while (eptr >= pp)
- if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb))
+ {
+ if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0))
return TRUE;
+#ifdef SUPPORT_UTF8
+ if (md->utf8)
+ while (eptr > pp && (*eptr & 0xc0) == 0x80) eptr--;
+#endif
+ }
return FALSE;
}
/* Control never gets here */
@@ -4498,8 +4899,8 @@ const uschar *req_char_ptr = start_match - 1;
const real_pcre *re = (const real_pcre *)external_re;
const real_pcre_extra *extra = (const real_pcre_extra *)external_extra;
BOOL using_temporary_offsets = FALSE;
-BOOL anchored = ((re->options | options) & PCRE_ANCHORED) != 0;
-BOOL startline = (re->options & PCRE_STARTLINE) != 0;
+BOOL anchored;
+BOOL startline;
if ((options & ~PUBLIC_EXEC_OPTIONS) != 0) return PCRE_ERROR_BADOPTION;
@@ -4507,12 +4908,16 @@ if (re == NULL || subject == NULL ||
(offsets == NULL && offsetcount > 0)) return PCRE_ERROR_NULL;
if (re->magic_number != MAGIC_NUMBER) return PCRE_ERROR_BADMAGIC;
+anchored = ((re->options | options) & PCRE_ANCHORED) != 0;
+startline = (re->options & PCRE_STARTLINE) != 0;
+
match_block.start_pattern = re->code;
match_block.start_subject = (const uschar *)subject;
match_block.end_subject = match_block.start_subject + length;
end_subject = match_block.end_subject;
match_block.endonly = (re->options & PCRE_DOLLAR_ENDONLY) != 0;
+match_block.utf8 = (re->options & PCRE_UTF8) != 0;
match_block.notbol = (options & PCRE_NOTBOL) != 0;
match_block.noteol = (options & PCRE_NOTEOL) != 0;
@@ -4634,7 +5039,7 @@ do
{
if (start_match > match_block.start_subject + start_offset)
{
- while (start_match < end_subject && start_match[-1] != '\n')
+ while (start_match < end_subject && start_match[-1] != NEWLINE)
start_match++;
}
}
@@ -4717,7 +5122,7 @@ do
if certain parts of the pattern were not used. */
match_block.start_match = start_match;
- if (!match(start_match, re->code, 2, &match_block, ims, FALSE, start_match))
+ if (!match(start_match, re->code, 2, &match_block, ims, NULL, match_isgroup))
continue;
/* Copy the offset information from temporary store if necessary */
@@ -4739,7 +5144,7 @@ do
rc = match_block.offset_overflow? 0 : match_block.end_offset_top/2;
- if (match_block.offset_end < 2) rc = 0; else
+ if (offsetcount < 2) rc = 0; else
{
offsets[0] = start_match - match_block.start_subject;
offsets[1] = match_block.end_match_ptr - match_block.start_subject;
diff --git a/srclib/pcre/pcre.in b/srclib/pcre/pcre.in
index 74b0cfc579..ef3756905e 100644
--- a/srclib/pcre/pcre.in
+++ b/srclib/pcre/pcre.in
@@ -2,14 +2,17 @@
* Perl-Compatible Regular Expressions *
*************************************************/
-/* Copyright (c) 1997-2000 University of Cambridge */
+/* Copyright (c) 1997-2001 University of Cambridge */
#ifndef _PCRE_H
#define _PCRE_H
-#define PCRE_MAJOR @PCRE_MAJOR@
-#define PCRE_MINOR @PCRE_MINOR@
-#define PCRE_DATE @PCRE_DATE@
+/* The file pcre.h is build by "configure". Do not edit it; instead
+make changes to pcre.in. */
+
+#define PCRE_MAJOR @PCRE_MAJOR@
+#define PCRE_MINOR @PCRE_MINOR@
+#define PCRE_DATE @PCRE_DATE@
/* Win32 uses DLL by default */
@@ -26,7 +29,6 @@
/* Have to include stdlib.h in order to ensure that size_t is defined;
it is needed here for malloc. */
-#include <sys/types.h>
#include <stdlib.h>
/* Allow for C++ users */
@@ -48,6 +50,7 @@ extern "C" {
#define PCRE_NOTEOL 0x0100
#define PCRE_UNGREEDY 0x0200
#define PCRE_NOTEMPTY 0x0400
+#define PCRE_UTF8 0x0800
/* Exec-time and get-time error codes */
@@ -71,8 +74,11 @@ extern "C" {
/* Types */
-typedef void pcre;
-typedef void pcre_extra;
+struct real_pcre; /* declaration; the definition is private */
+struct real_pcre_extra; /* declaration; the definition is private */
+
+typedef struct real_pcre pcre;
+typedef struct real_pcre_extra pcre_extra;
/* Store get and free functions. These can be set to alternative malloc/free
functions if required. Some magic is required for Win32 DLL; it is null on
@@ -86,15 +92,17 @@ PCRE_DL_IMPORT extern void (*pcre_free)(void *);
/* Functions */
extern pcre *pcre_compile(const char *, int, const char **, int *,
- const unsigned char *);
-extern int pcre_copy_substring(const char *, int *, int, int, char *, int);
-extern int pcre_exec(const pcre *, const pcre_extra *, const char *,
- int, int, int, int *, int);
-extern int pcre_get_substring(const char *, int *, int, int, const char **);
-extern int pcre_get_substring_list(const char *, int *, int, const char ***);
-extern int pcre_info(const pcre *, int *, int *);
-extern int pcre_fullinfo(const pcre *, const pcre_extra *, int, void *);
-extern unsigned const char *pcre_maketables(void);
+ const unsigned char *);
+extern int pcre_copy_substring(const char *, int *, int, int, char *, int);
+extern int pcre_exec(const pcre *, const pcre_extra *, const char *,
+ int, int, int, int *, int);
+extern void pcre_free_substring(const char *);
+extern void pcre_free_substring_list(const char **);
+extern int pcre_get_substring(const char *, int *, int, int, const char **);
+extern int pcre_get_substring_list(const char *, int *, int, const char ***);
+extern int pcre_info(const pcre *, int *, int *);
+extern int pcre_fullinfo(const pcre *, const pcre_extra *, int, void *);
+extern const unsigned char *pcre_maketables(void);
extern pcre_extra *pcre_study(const pcre *, int, const char **);
extern const char *pcre_version(void);
diff --git a/srclib/pcre/pcreposix.c b/srclib/pcre/pcreposix.c
index 7c66ccec89..ba48d55a00 100644
--- a/srclib/pcre/pcreposix.c
+++ b/srclib/pcre/pcreposix.c
@@ -12,7 +12,7 @@ functions.
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-2000 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
@@ -62,13 +62,13 @@ static int eint[] = {
REG_BADRPT, /* "operand of unlimited repeat could match the empty string" */
REG_ASSERT, /* "internal error: unexpected repeat" */
REG_BADPAT, /* "unrecognized character after (?" */
- REG_ESIZE, /* "too many capturing parenthesized sub-patterns" */
+ REG_ASSERT, /* "unused error" */
REG_EPAREN, /* "missing )" */
REG_ESUBREG, /* "back reference to non-existent subpattern" */
REG_INVARG, /* "erroffset passed as NULL" */
REG_INVARG, /* "unknown option bit(s) set" */
REG_EPAREN, /* "missing ) after comment" */
- REG_ESIZE, /* "too many sets of parentheses" */
+ REG_ESIZE, /* "parentheses nested too deeply" */
REG_ESIZE, /* "regular expression too large" */
REG_ESPACE, /* "failed to get memory" */
REG_EPAREN, /* "unmatched brackets" */
@@ -80,7 +80,11 @@ static int eint[] = {
REG_BADPAT, /* "assertion expected after (?(" */
REG_BADPAT, /* "(?p must be followed by )" */
REG_ECTYPE, /* "unknown POSIX class name" */
- REG_BADPAT /* "POSIX collating elements are not supported" */
+ REG_BADPAT, /* "POSIX collating elements are not supported" */
+ REG_INVARG, /* "this version of PCRE is not compiled with PCRE_UTF8 support" */
+ REG_BADPAT, /* "characters with values > 255 are not yet supported in classes" */
+ REG_BADPAT, /* "character value in \x{...} sequence is too large" */
+ REG_BADPAT /* "invalid condition (?(0)" */
};
/* Table of texts corresponding to POSIX error codes */
diff --git a/srclib/pcre/pcreposix.h b/srclib/pcre/pcreposix.h
index 7660acbd55..e70af2de84 100644
--- a/srclib/pcre/pcreposix.h
+++ b/srclib/pcre/pcreposix.h
@@ -2,7 +2,7 @@
* Perl-Compatible Regular Expressions *
*************************************************/
-/* Copyright (c) 1997-2000 University of Cambridge */
+/* Copyright (c) 1997-2001 University of Cambridge */
#ifndef _PCREPOSIX_H
#define _PCREPOSIX_H
diff --git a/srclib/pcre/pcretest.c b/srclib/pcre/pcretest.c
index 85569fbeb7..f04443ab30 100644
--- a/srclib/pcre/pcretest.c
+++ b/srclib/pcre/pcretest.c
@@ -38,6 +38,114 @@ static size_t gotten_store;
+static int utf8_table1[] = {
+ 0x0000007f, 0x000007ff, 0x0000ffff, 0x001fffff, 0x03ffffff, 0x7fffffff};
+
+static int utf8_table2[] = {
+ 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
+
+static int utf8_table3[] = {
+ 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
+
+
+/*************************************************
+* Convert character value to UTF-8 *
+*************************************************/
+
+/* This function takes an integer value in the range 0 - 0x7fffffff
+and encodes it as a UTF-8 character in 0 to 6 bytes.
+
+Arguments:
+ cvalue the character value
+ buffer pointer to buffer for result - at least 6 bytes long
+
+Returns: number of characters placed in the buffer
+ -1 if input character is negative
+ 0 if input character is positive but too big (only when
+ int is longer than 32 bits)
+*/
+
+static int
+ord2utf8(int cvalue, unsigned char *buffer)
+{
+register int i, j;
+for (i = 0; i < sizeof(utf8_table1)/sizeof(int); i++)
+ if (cvalue <= utf8_table1[i]) break;
+if (i >= sizeof(utf8_table1)/sizeof(int)) return 0;
+if (cvalue < 0) return -1;
+
+buffer += i;
+for (j = i; j > 0; j--)
+ {
+ *buffer-- = 0x80 | (cvalue & 0x3f);
+ cvalue >>= 6;
+ }
+*buffer = utf8_table2[i] | cvalue;
+return i + 1;
+}
+
+
+/*************************************************
+* Convert UTF-8 string to value *
+*************************************************/
+
+/* This function takes one or more bytes that represents a UTF-8 character,
+and returns the value of the character.
+
+Argument:
+ buffer a pointer to the byte vector
+ vptr a pointer to an int to receive the value
+
+Returns: > 0 => the number of bytes consumed
+ -6 to 0 => malformed UTF-8 character at offset = (-return)
+*/
+
+int
+utf82ord(unsigned char *buffer, int *vptr)
+{
+int c = *buffer++;
+int d = c;
+int i, j, s;
+
+for (i = -1; i < 6; i++) /* i is number of additional bytes */
+ {
+ if ((d & 0x80) == 0) break;
+ d <<= 1;
+ }
+
+if (i == -1) { *vptr = c; return 1; } /* ascii character */
+if (i == 0 || i == 6) return 0; /* invalid UTF-8 */
+
+/* i now has a value in the range 1-5 */
+
+s = 6*i;
+d = (c & utf8_table3[i]) << s;
+
+for (j = 0; j < i; j++)
+ {
+ c = *buffer++;
+ if ((c & 0xc0) != 0x80) return -(j+1);
+ s -= 6;
+ d |= (c & 0x3f) << s;
+ }
+
+/* Check that encoding was the correct unique one */
+
+for (j = 0; j < sizeof(utf8_table1)/sizeof(int); j++)
+ if (d <= utf8_table1[j]) break;
+if (j != i) return -(i+1);
+
+/* Valid value */
+
+*vptr = d;
+return i+1;
+}
+
+
+
+
+
+
/* Debugging function to print the internal form of the regex. This is the same
code as contained in pcre.c under the DEBUG macro. */
@@ -52,7 +160,7 @@ static const char *OP_names[] = {
"class", "Ref", "Recurse",
"Alt", "Ket", "KetRmax", "KetRmin", "Assert", "Assert not",
"AssertB", "AssertB not", "Reverse", "Once", "Cond", "Cref",
- "Brazero", "Braminzero", "Bra"
+ "Brazero", "Braminzero", "Branumber", "Bra"
};
@@ -71,7 +179,10 @@ for(;;)
if (*code >= OP_BRA)
{
- fprintf(outfile, "%3d Bra %d", (code[1] << 8) + code[2], *code - OP_BRA);
+ if (*code - OP_BRA > EXTRACT_BASIC_MAX)
+ fprintf(outfile, "%3d Bra extra", (code[1] << 8) + code[2]);
+ else
+ fprintf(outfile, "%3d Bra %d", (code[1] << 8) + code[2], *code - OP_BRA);
code += 2;
}
@@ -87,16 +198,6 @@ for(;;)
code++;
break;
- case OP_COND:
- fprintf(outfile, "%3d Cond", (code[1] << 8) + code[2]);
- code += 2;
- break;
-
- case OP_CREF:
- fprintf(outfile, " %.2d %s", code[1], OP_names[*code]);
- code++;
- break;
-
case OP_CHARS:
charlength = *(++code);
fprintf(outfile, "%3d ", charlength);
@@ -114,11 +215,10 @@ for(;;)
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
case OP_ONCE:
- fprintf(outfile, "%3d %s", (code[1] << 8) + code[2], OP_names[*code]);
- code += 2;
- break;
-
+ case OP_COND:
+ case OP_BRANUMBER:
case OP_REVERSE:
+ case OP_CREF:
fprintf(outfile, "%3d %s", (code[1] << 8) + code[2], OP_names[*code]);
code += 2;
break;
@@ -191,8 +291,8 @@ for(;;)
break;
case OP_REF:
- fprintf(outfile, " \\%d", *(++code));
- code++;
+ fprintf(outfile, " \\%d", (code[1] << 8) | code[2]);
+ code += 3;
goto CLASS_REF_REPEAT;
case OP_CLASS:
@@ -265,14 +365,31 @@ for(;;)
-/* Character string printing function. */
+/* Character string printing function. A "normal" and a UTF-8 version. */
-static void pchars(unsigned char *p, int length)
+static void pchars(unsigned char *p, int length, int utf8)
{
int c;
while (length-- > 0)
+ {
+ if (utf8)
+ {
+ int rc = utf82ord(p, &c);
+ if (rc > 0)
+ {
+ length -= rc - 1;
+ p += rc;
+ if (c < 256 && isprint(c)) fprintf(outfile, "%c", c);
+ else fprintf(outfile, "\\x{%02x}", c);
+ continue;
+ }
+ }
+
+ /* Not UTF-8, or malformed UTF-8 */
+
if (isprint(c = *(p++))) fprintf(outfile, "%c", c);
else fprintf(outfile, "\\x%02x", c);
+ }
}
@@ -317,7 +434,12 @@ int op = 1;
int timeit = 0;
int showinfo = 0;
int showstore = 0;
+int size_offsets = 45;
+int size_offsets_max;
+int *offsets;
+#if !defined NOPOSIX
int posix = 0;
+#endif
int debug = 0;
int done = 0;
unsigned char buffer[30000];
@@ -331,27 +453,51 @@ outfile = stdout;
while (argc > 1 && argv[op][0] == '-')
{
+ char *endptr;
+
if (strcmp(argv[op], "-s") == 0 || strcmp(argv[op], "-m") == 0)
showstore = 1;
else if (strcmp(argv[op], "-t") == 0) timeit = 1;
else if (strcmp(argv[op], "-i") == 0) showinfo = 1;
else if (strcmp(argv[op], "-d") == 0) showinfo = debug = 1;
+ else if (strcmp(argv[op], "-o") == 0 && argc > 2 &&
+ ((size_offsets = (int)strtoul(argv[op+1], &endptr, 10)), *endptr == 0))
+ {
+ op++;
+ argc--;
+ }
+#if !defined NOPOSIX
else if (strcmp(argv[op], "-p") == 0) posix = 1;
+#endif
else
{
- printf("*** Unknown option %s\n", argv[op]);
- printf("Usage: pcretest [-d] [-i] [-p] [-s] [-t] [<input> [<output>]]\n");
- printf(" -d debug: show compiled code; implies -i\n"
- " -i show information about compiled pattern\n"
- " -p use POSIX interface\n"
- " -s output store information\n"
- " -t time compilation and execution\n");
+ printf("** Unknown or malformed option %s\n", argv[op]);
+ printf("Usage: pcretest [-d] [-i] [-o <n>] [-p] [-s] [-t] [<input> [<output>]]\n");
+ printf(" -d debug: show compiled code; implies -i\n"
+ " -i show information about compiled pattern\n"
+ " -o <n> set size of offsets vector to <n>\n");
+#if !defined NOPOSIX
+ printf(" -p use POSIX interface\n");
+#endif
+ printf(" -s output store information\n"
+ " -t time compilation and execution\n");
return 1;
}
op++;
argc--;
}
+/* Get the store for the offsets vector, and remember what it was */
+
+size_offsets_max = size_offsets;
+offsets = malloc(size_offsets_max * sizeof(int));
+if (offsets == NULL)
+ {
+ printf("** Failed to get %d bytes of memory for offsets vector\n",
+ size_offsets_max * sizeof(int));
+ return 1;
+ }
+
/* Sort out the input and output files */
if (argc > 1)
@@ -396,13 +542,14 @@ while (!done)
const char *error;
unsigned char *p, *pp, *ppp;
- unsigned const char *tables = NULL;
+ const unsigned char *tables = NULL;
int do_study = 0;
int do_debug = debug;
int do_G = 0;
int do_g = 0;
int do_showinfo = showinfo;
int do_showrest = 0;
+ int utf8 = 0;
int erroroffset, len, delimiter;
if (infile == stdin) printf(" re> ");
@@ -494,6 +641,7 @@ while (!done)
case 'S': do_study = 1; break;
case 'U': options |= PCRE_UNGREEDY; break;
case 'X': options |= PCRE_EXTRA; break;
+ case '8': options |= PCRE_UTF8; utf8 = 1; break;
case 'L':
ppp = pp;
@@ -594,13 +742,14 @@ while (!done)
if (do_showinfo)
{
+ unsigned long int get_options;
int old_first_char, old_options, old_count;
int count, backrefmax, first_char, need_char;
size_t size;
if (do_debug) print_internals(re);
- new_info(re, NULL, PCRE_INFO_OPTIONS, &options);
+ new_info(re, NULL, PCRE_INFO_OPTIONS, &get_options);
new_info(re, NULL, PCRE_INFO_SIZE, &size);
new_info(re, NULL, PCRE_INFO_CAPTURECOUNT, &count);
new_info(re, NULL, PCRE_INFO_BACKREFMAX, &backrefmax);
@@ -620,9 +769,9 @@ while (!done)
"First char disagreement: pcre_fullinfo=%d pcre_info=%d\n",
first_char, old_first_char);
- if (old_options != options) fprintf(outfile,
- "Options disagreement: pcre_fullinfo=%d pcre_info=%d\n", options,
- old_options);
+ if (old_options != (int)get_options) fprintf(outfile,
+ "Options disagreement: pcre_fullinfo=%ld pcre_info=%d\n",
+ get_options, old_options);
}
if (size != gotten_store) fprintf(outfile,
@@ -632,16 +781,17 @@ while (!done)
fprintf(outfile, "Capturing subpattern count = %d\n", count);
if (backrefmax > 0)
fprintf(outfile, "Max back reference = %d\n", backrefmax);
- if (options == 0) fprintf(outfile, "No options\n");
- else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s\n",
- ((options & PCRE_ANCHORED) != 0)? " anchored" : "",
- ((options & PCRE_CASELESS) != 0)? " caseless" : "",
- ((options & PCRE_EXTENDED) != 0)? " extended" : "",
- ((options & PCRE_MULTILINE) != 0)? " multiline" : "",
- ((options & PCRE_DOTALL) != 0)? " dotall" : "",
- ((options & PCRE_DOLLAR_ENDONLY) != 0)? " dollar_endonly" : "",
- ((options & PCRE_EXTRA) != 0)? " extra" : "",
- ((options & PCRE_UNGREEDY) != 0)? " ungreedy" : "");
+ if (get_options == 0) fprintf(outfile, "No options\n");
+ else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s\n",
+ ((get_options & PCRE_ANCHORED) != 0)? " anchored" : "",
+ ((get_options & PCRE_CASELESS) != 0)? " caseless" : "",
+ ((get_options & PCRE_EXTENDED) != 0)? " extended" : "",
+ ((get_options & PCRE_MULTILINE) != 0)? " multiline" : "",
+ ((get_options & PCRE_DOTALL) != 0)? " dotall" : "",
+ ((get_options & PCRE_DOLLAR_ENDONLY) != 0)? " dollar_endonly" : "",
+ ((get_options & PCRE_EXTRA) != 0)? " extra" : "",
+ ((get_options & PCRE_UNGREEDY) != 0)? " ungreedy" : "",
+ ((get_options & PCRE_UTF8) != 0)? " utf8" : "");
if (((((real_pcre *)re)->options) & PCRE_ICHANGED) != 0)
fprintf(outfile, "Case state changes\n");
@@ -744,6 +894,8 @@ while (!done)
{
unsigned char *q;
unsigned char *bptr = dbuffer;
+ int *use_offsets = offsets;
+ int use_size_offsets = size_offsets;
int count, c;
int copystrings = 0;
int getstrings = 0;
@@ -751,8 +903,6 @@ while (!done)
int gmatched = 0;
int start_offset = 0;
int g_notempty = 0;
- int offsets[45];
- int size_offsets = sizeof(offsets)/sizeof(int);
options = 0;
@@ -796,6 +946,30 @@ while (!done)
break;
case 'x':
+
+ /* Handle \x{..} specially - new Perl thing for utf8 */
+
+ if (*p == '{')
+ {
+ unsigned char *pt = p;
+ c = 0;
+ while (isxdigit(*(++pt)))
+ c = c * 16 + tolower(*pt) - ((isdigit(*pt))? '0' : 'W');
+ if (*pt == '}')
+ {
+ unsigned char buffer[8];
+ int ii, utn;
+ utn = ord2utf8(c, buffer);
+ for (ii = 0; ii < utn - 1; ii++) *q++ = buffer[ii];
+ c = buffer[ii]; /* Last byte */
+ p = pt + 1;
+ break;
+ }
+ /* Not correct form; fall through */
+ }
+
+ /* Ordinary \x */
+
c = 0;
while (i++ < 2 && isxdigit(*p))
{
@@ -836,7 +1010,20 @@ while (!done)
case 'O':
while(isdigit(*p)) n = n * 10 + *p++ - '0';
- if (n <= (int)(sizeof(offsets)/sizeof(int))) size_offsets = n;
+ if (n > size_offsets_max)
+ {
+ size_offsets_max = n;
+ free(offsets);
+ use_offsets = offsets = malloc(size_offsets_max * sizeof(int));
+ if (offsets == NULL)
+ {
+ printf("** Failed to get %d bytes of memory for offsets vector\n",
+ size_offsets_max * sizeof(int));
+ return 1;
+ }
+ }
+ use_size_offsets = n;
+ if (n == 0) use_offsets = NULL;
continue;
case 'Z':
@@ -856,11 +1043,11 @@ while (!done)
{
int rc;
int eflags = 0;
- regmatch_t pmatch[sizeof(offsets)/sizeof(int)];
+ regmatch_t *pmatch = malloc(sizeof(regmatch_t) * use_size_offsets);
if ((options & PCRE_NOTBOL) != 0) eflags |= REG_NOTBOL;
if ((options & PCRE_NOTEOL) != 0) eflags |= REG_NOTEOL;
- rc = regexec(&preg, (const char *)bptr, size_offsets, pmatch, eflags);
+ rc = regexec(&preg, (const char *)bptr, use_size_offsets, pmatch, eflags);
if (rc != 0)
{
@@ -870,23 +1057,24 @@ while (!done)
else
{
size_t i;
- for (i = 0; i < size_offsets; i++)
+ for (i = 0; i < use_size_offsets; i++)
{
if (pmatch[i].rm_so >= 0)
{
fprintf(outfile, "%2d: ", (int)i);
pchars(dbuffer + pmatch[i].rm_so,
- pmatch[i].rm_eo - pmatch[i].rm_so);
+ pmatch[i].rm_eo - pmatch[i].rm_so, utf8);
fprintf(outfile, "\n");
if (i == 0 && do_showrest)
{
fprintf(outfile, " 0+ ");
- pchars(dbuffer + pmatch[i].rm_eo, len - pmatch[i].rm_eo);
+ pchars(dbuffer + pmatch[i].rm_eo, len - pmatch[i].rm_eo, utf8);
fprintf(outfile, "\n");
}
}
}
}
+ free(pmatch);
}
/* Handle matching via the native interface - repeats for /g and /G */
@@ -903,7 +1091,7 @@ while (!done)
clock_t start_time = clock();
for (i = 0; i < LOOPREPEAT; i++)
count = pcre_exec(re, extra, (char *)bptr, len,
- start_offset, options | g_notempty, offsets, size_offsets);
+ start_offset, options | g_notempty, use_offsets, use_size_offsets);
time_taken = clock() - start_time;
fprintf(outfile, "Execute time %.3f milliseconds\n",
((double)time_taken * 1000.0)/
@@ -911,12 +1099,12 @@ while (!done)
}
count = pcre_exec(re, extra, (char *)bptr, len,
- start_offset, options | g_notempty, offsets, size_offsets);
+ start_offset, options | g_notempty, use_offsets, use_size_offsets);
if (count == 0)
{
fprintf(outfile, "Matched, but too many substrings\n");
- count = size_offsets/3;
+ count = use_size_offsets/3;
}
/* Matched */
@@ -926,19 +1114,19 @@ while (!done)
int i;
for (i = 0; i < count * 2; i += 2)
{
- if (offsets[i] < 0)
+ if (use_offsets[i] < 0)
fprintf(outfile, "%2d: <unset>\n", i/2);
else
{
fprintf(outfile, "%2d: ", i/2);
- pchars(bptr + offsets[i], offsets[i+1] - offsets[i]);
+ pchars(bptr + use_offsets[i], use_offsets[i+1] - use_offsets[i], utf8);
fprintf(outfile, "\n");
if (i == 0)
{
if (do_showrest)
{
fprintf(outfile, " 0+ ");
- pchars(bptr + offsets[i+1], len - offsets[i+1]);
+ pchars(bptr + use_offsets[i+1], len - use_offsets[i+1], utf8);
fprintf(outfile, "\n");
}
}
@@ -950,7 +1138,7 @@ while (!done)
if ((copystrings & (1 << i)) != 0)
{
char copybuffer[16];
- int rc = pcre_copy_substring((char *)bptr, offsets, count,
+ int rc = pcre_copy_substring((char *)bptr, use_offsets, count,
i, copybuffer, sizeof(copybuffer));
if (rc < 0)
fprintf(outfile, "copy substring %d failed %d\n", i, rc);
@@ -964,14 +1152,15 @@ while (!done)
if ((getstrings & (1 << i)) != 0)
{
const char *substring;
- int rc = pcre_get_substring((char *)bptr, offsets, count,
+ int rc = pcre_get_substring((char *)bptr, use_offsets, count,
i, &substring);
if (rc < 0)
fprintf(outfile, "get substring %d failed %d\n", i, rc);
else
{
fprintf(outfile, "%2dG %s (%d)\n", i, substring, rc);
- free((void *)substring);
+ /* free((void *)substring); */
+ pcre_free_substring(substring);
}
}
}
@@ -979,7 +1168,7 @@ while (!done)
if (getlist)
{
const char **stringlist;
- int rc = pcre_get_substring_list((char *)bptr, offsets, count,
+ int rc = pcre_get_substring_list((char *)bptr, use_offsets, count,
&stringlist);
if (rc < 0)
fprintf(outfile, "get substring list failed %d\n", rc);
@@ -989,23 +1178,24 @@ while (!done)
fprintf(outfile, "%2dL %s\n", i, stringlist[i]);
if (stringlist[i] != NULL)
fprintf(outfile, "string list not terminated by NULL\n");
- free((void *)stringlist);
+ /* free((void *)stringlist); */
+ pcre_free_substring_list(stringlist);
}
}
}
/* Failed to match. If this is a /g or /G loop and we previously set
- PCRE_NOTEMPTY after a null match, this is not necessarily the end.
+ g_notempty after a null match, this is not necessarily the end.
We want to advance the start offset, and continue. Fudge the offset
values to achieve this. We won't be at the end of the string - that
- was checked before setting PCRE_NOTEMPTY. */
+ was checked before setting g_notempty. */
else
{
if (g_notempty != 0)
{
- offsets[0] = start_offset;
- offsets[1] = start_offset + 1;
+ use_offsets[0] = start_offset;
+ use_offsets[1] = start_offset + 1;
}
else
{
@@ -1025,26 +1215,27 @@ while (!done)
/* If we have matched an empty string, first check to see if we are at
the end of the subject. If so, the /g loop is over. Otherwise, mimic
what Perl's /g options does. This turns out to be rather cunning. First
- we set PCRE_NOTEMPTY and try the match again at the same point. If this
- fails (picked up above) we advance to the next character. */
+ we set PCRE_NOTEMPTY and PCRE_ANCHORED and try the match again at the
+ same point. If this fails (picked up above) we advance to the next
+ character. */
g_notempty = 0;
- if (offsets[0] == offsets[1])
+ if (use_offsets[0] == use_offsets[1])
{
- if (offsets[0] == len) break;
- g_notempty = PCRE_NOTEMPTY;
+ if (use_offsets[0] == len) break;
+ g_notempty = PCRE_NOTEMPTY | PCRE_ANCHORED;
}
/* For /g, update the start offset, leaving the rest alone */
- if (do_g) start_offset = offsets[1];
+ if (do_g) start_offset = use_offsets[1];
/* For /G, update the pointer and length */
else
{
- bptr += offsets[1];
- len -= offsets[1];
+ bptr += use_offsets[1];
+ len -= use_offsets[1];
}
} /* End of loop for /g and /G */
} /* End of loop for data lines */
diff --git a/srclib/pcre/perltest b/srclib/pcre/perltest
index 1e96c792b9..e6f797498c 100755
--- a/srclib/pcre/perltest
+++ b/srclib/pcre/perltest
@@ -9,7 +9,7 @@
sub pchars {
my($t) = "";
-foreach $c (split(//, @_[0]))
+foreach $c (split(//, $_[0]))
{
if (ord $c >= 32 && ord $c < 127) { $t .= $c; }
else { $t .= sprintf("\\x%02x", ord $c); }
diff --git a/srclib/pcre/study.c b/srclib/pcre/study.c
index 676db94665..f924543d21 100644
--- a/srclib/pcre/study.c
+++ b/srclib/pcre/study.c
@@ -9,7 +9,7 @@ the file Tech.Notes for some information on the internals.
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-2000 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
@@ -104,8 +104,6 @@ do
while (try_next)
{
- try_next = FALSE;
-
/* If a branch starts with a bracket or a positive lookahead assertion,
recurse to set bits from within them. That's all for this branch. */
@@ -113,6 +111,7 @@ do
{
if (!set_start_bits(tcode, start_bits, caseless, cd))
return FALSE;
+ try_next = FALSE;
}
else switch(*tcode)
@@ -120,12 +119,17 @@ do
default:
return FALSE;
+ /* Skip over extended extraction bracket number */
+
+ case OP_BRANUMBER:
+ tcode += 3;
+ break;
+
/* Skip over lookbehind and negative lookahead assertions */
case OP_ASSERT_NOT:
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
- try_next = TRUE;
do tcode += (tcode[1] << 8) + tcode[2]; while (*tcode == OP_ALT);
tcode += 3;
break;
@@ -135,7 +139,6 @@ do
case OP_OPT:
caseless = (tcode[1] & PCRE_CASELESS) != 0;
tcode += 2;
- try_next = TRUE;
break;
/* BRAZERO does the bracket, but carries on. */
@@ -147,7 +150,6 @@ do
dummy = 1;
do tcode += (tcode[1] << 8) + tcode[2]; while (*tcode == OP_ALT);
tcode += 3;
- try_next = TRUE;
break;
/* Single-char * or ? sets the bit and tries the next item */
@@ -158,7 +160,6 @@ do
case OP_MINQUERY:
set_bit(start_bits, tcode[1], caseless, cd);
tcode += 2;
- try_next = TRUE;
break;
/* Single-char upto sets the bit and tries the next */
@@ -167,7 +168,6 @@ do
case OP_MINUPTO:
set_bit(start_bits, tcode[3], caseless, cd);
tcode += 4;
- try_next = TRUE;
break;
/* At least one single char sets the bit and stops */
@@ -181,6 +181,7 @@ do
case OP_PLUS:
case OP_MINPLUS:
set_bit(start_bits, tcode[1], caseless, cd);
+ try_next = FALSE;
break;
/* Single character type sets the bits and stops */
@@ -188,31 +189,37 @@ do
case OP_NOT_DIGIT:
for (c = 0; c < 32; c++)
start_bits[c] |= ~cd->cbits[c+cbit_digit];
+ try_next = FALSE;
break;
case OP_DIGIT:
for (c = 0; c < 32; c++)
start_bits[c] |= cd->cbits[c+cbit_digit];
+ try_next = FALSE;
break;
case OP_NOT_WHITESPACE:
for (c = 0; c < 32; c++)
start_bits[c] |= ~cd->cbits[c+cbit_space];
+ try_next = FALSE;
break;
case OP_WHITESPACE:
for (c = 0; c < 32; c++)
start_bits[c] |= cd->cbits[c+cbit_space];
+ try_next = FALSE;
break;
case OP_NOT_WORDCHAR:
for (c = 0; c < 32; c++)
start_bits[c] |= ~cd->cbits[c+cbit_word];
+ try_next = FALSE;
break;
case OP_WORDCHAR:
for (c = 0; c < 32; c++)
start_bits[c] |= cd->cbits[c+cbit_word];
+ try_next = FALSE;
break;
/* One or more character type fudges the pointer and restarts, knowing
@@ -221,12 +228,10 @@ do
case OP_TYPEPLUS:
case OP_TYPEMINPLUS:
tcode++;
- try_next = TRUE;
break;
case OP_TYPEEXACT:
tcode += 3;
- try_next = TRUE;
break;
/* Zero or more repeats of character types set the bits and then
@@ -274,7 +279,6 @@ do
}
tcode += 2;
- try_next = TRUE;
break;
/* Character class: set the bits and either carry on or not,
@@ -292,16 +296,16 @@ do
case OP_CRQUERY:
case OP_CRMINQUERY:
tcode++;
- try_next = TRUE;
break;
case OP_CRRANGE:
case OP_CRMINRANGE:
- if (((tcode[1] << 8) + tcode[2]) == 0)
- {
- tcode += 5;
- try_next = TRUE;
- }
+ if (((tcode[1] << 8) + tcode[2]) == 0) tcode += 5;
+ else try_next = FALSE;
+ break;
+
+ default:
+ try_next = FALSE;
break;
}
}
diff --git a/srclib/pcre/testdata/testinput1 b/srclib/pcre/testdata/testinput1
index d72a2c580b..66df9b3d79 100644
--- a/srclib/pcre/testdata/testinput1
+++ b/srclib/pcre/testdata/testinput1
@@ -1441,10 +1441,6 @@
ABCabc
abcABC
-/(main(O)?)+/
- mainmain
- mainOmain
-
/ab{3cd/
ab{3cd
@@ -1898,4 +1894,57 @@
//g
abc
-/ End of test input /
+/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
+ <TR BGCOLOR='#DBE9E9'><TD align=left valign=top>43.<a href='joblist.cfm?JobID=94 6735&Keyword='>Word Processor<BR>(N-1286)</a></TD><TD align=left valign=top>Lega lstaff.com</TD><TD align=left valign=top>CA - Statewide</TD></TR>
+
+/a[^a]b/
+ acb
+ a\nb
+
+/a.b/
+ acb
+ *** Failers
+ a\nb
+
+/a[^a]b/s
+ acb
+ a\nb
+
+/a.b/s
+ acb
+ a\nb
+
+/^(b+?|a){1,2}?c/
+ bac
+ bbac
+ bbbac
+ bbbbac
+ bbbbbac
+
+/^(b+|a){1,2}?c/
+ bac
+ bbac
+ bbbac
+ bbbbac
+ bbbbbac
+
+/(?!\A)x/m
+ x\nb\n
+ a\bx\n
+
+/\x0{ab}/
+ \0{ab}
+
+/(A|B)*?CD/
+ CD
+
+/(A|B)*CD/
+ CD
+
+/(AB)*?\1/
+ ABABAB
+
+/(AB)*\1/
+ ABABAB
+
+/ End of testinput1 /
diff --git a/srclib/pcre/testdata/testinput2 b/srclib/pcre/testdata/testinput2
index 1d9504cec2..f41478e104 100644
--- a/srclib/pcre/testdata/testinput2
+++ b/srclib/pcre/testdata/testinput2
@@ -40,8 +40,6 @@
/[\B]/
-/[a-\w]/
-
/[z-a]/
/^*/
@@ -707,4 +705,19 @@
Ab
AB
-/ End of test input /
+/[\200-\410]/
+
+/^(?(0)f|b)oo/
+
+/This one's here because of the large output vector needed/
+
+/(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\w+)\s+(\270)/
+ \O900 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC
+
+/This one's here because Perl does this differently and PCRE can't at present/
+
+/(main(O)?)+/
+ mainmain
+ mainOmain
+
+/ End of testinput2 /
diff --git a/srclib/pcre/testdata/testinput3 b/srclib/pcre/testdata/testinput3
index 0c884d3b7c..d3bd74fdd3 100644
--- a/srclib/pcre/testdata/testinput3
+++ b/srclib/pcre/testdata/testinput3
@@ -1689,4 +1689,36 @@
<a href=\"abcd xyz pqr\" cats
<a href = \'abcd xyz pqr\' cats
-/ End of test input /
+/((Z)+|A)*/
+ ZABCDEFG
+
+/(Z()|A)*/
+ ZABCDEFG
+
+/(Z(())|A)*/
+ ZABCDEFG
+
+/((?>Z)+|A)*/
+ ZABCDEFG
+
+/((?>)+|A)*/
+ ZABCDEFG
+
+/a*/g
+ abbab
+
+/^[a-\d]/
+ abcde
+ -things
+ 0digit
+ *** Failers
+ bcdef
+
+/^[\d-a]/
+ abcde
+ -things
+ 0digit
+ *** Failers
+ bcdef
+
+/ End of testinput3 /
diff --git a/srclib/pcre/testdata/testinput4 b/srclib/pcre/testdata/testinput4
index c23b52aceb..f2878965f6 100644
--- a/srclib/pcre/testdata/testinput4
+++ b/srclib/pcre/testdata/testinput4
@@ -62,3 +62,4 @@
*** Failers
école
+/ End of testinput4 /
diff --git a/srclib/pcre/testdata/testoutput1 b/srclib/pcre/testdata/testoutput1
index affb475aa5..f0047ffc40 100644
--- a/srclib/pcre/testdata/testoutput1
+++ b/srclib/pcre/testdata/testoutput1
@@ -1,4 +1,4 @@
-PCRE version 3.1 09-Feb-2000
+PCRE version 3.9 02-Jan-2002
/the quick brown fox/
the quick brown fox
@@ -2079,15 +2079,6 @@ No match
0: abcABC
1: abc
-/(main(O)?)+/
- mainmain
- 0: mainmain
- 1: main
- mainOmain
- 0: mainOmain
- 1: main
- 2: O
-
/ab{3cd/
ab{3cd
0: ab{3cd
@@ -2920,5 +2911,108 @@ No match
0:
0:
-/ End of test input /
+/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
+ <TR BGCOLOR='#DBE9E9'><TD align=left valign=top>43.<a href='joblist.cfm?JobID=94 6735&Keyword='>Word Processor<BR>(N-1286)</a></TD><TD align=left valign=top>Lega lstaff.com</TD><TD align=left valign=top>CA - Statewide</TD></TR>
+ 0: <TR BGCOLOR='#DBE9E9'><TD align=left valign=top>43.<a href='joblist.cfm?JobID=94 6735&Keyword='>Word Processor<BR>(N-1286)</a></TD><TD align=left valign=top>Lega lstaff.com</TD><TD align=left valign=top>CA - Statewide</TD></TR>
+ 1: BGCOLOR='#DBE9E9'
+ 2: align=left valign=top
+ 3: 43.
+ 4: <a href='joblist.cfm?JobID=94 6735&Keyword='>Word Processor<BR>(N-1286)
+ 5:
+ 6:
+ 7: <unset>
+ 8: align=left valign=top
+ 9: Lega lstaff.com
+10: align=left valign=top
+11: CA - Statewide
+
+/a[^a]b/
+ acb
+ 0: acb
+ a\nb
+ 0: a\x0ab
+
+/a.b/
+ acb
+ 0: acb
+ *** Failers
+No match
+ a\nb
+No match
+
+/a[^a]b/s
+ acb
+ 0: acb
+ a\nb
+ 0: a\x0ab
+
+/a.b/s
+ acb
+ 0: acb
+ a\nb
+ 0: a\x0ab
+
+/^(b+?|a){1,2}?c/
+ bac
+ 0: bac
+ 1: a
+ bbac
+ 0: bbac
+ 1: a
+ bbbac
+ 0: bbbac
+ 1: a
+ bbbbac
+ 0: bbbbac
+ 1: a
+ bbbbbac
+ 0: bbbbbac
+ 1: a
+
+/^(b+|a){1,2}?c/
+ bac
+ 0: bac
+ 1: a
+ bbac
+ 0: bbac
+ 1: a
+ bbbac
+ 0: bbbac
+ 1: a
+ bbbbac
+ 0: bbbbac
+ 1: a
+ bbbbbac
+ 0: bbbbbac
+ 1: a
+
+/(?!\A)x/m
+ x\nb\n
+No match
+ a\bx\n
+ 0: x
+
+/\x0{ab}/
+ \0{ab}
+ 0: \x00{ab}
+
+/(A|B)*?CD/
+ CD
+ 0: CD
+
+/(A|B)*CD/
+ CD
+ 0: CD
+
+/(AB)*?\1/
+ ABABAB
+ 0: ABAB
+ 1: AB
+
+/(AB)*\1/
+ ABABAB
+ 0: ABABAB
+ 1: AB
+
+/ End of testinput1 /
diff --git a/srclib/pcre/testdata/testoutput2 b/srclib/pcre/testdata/testoutput2
index d0fc036f0f..e8844d2aeb 100644
--- a/srclib/pcre/testdata/testoutput2
+++ b/srclib/pcre/testdata/testoutput2
@@ -1,4 +1,4 @@
-PCRE version 3.1 09-Feb-2000
+PCRE version 3.9 02-Jan-2002
/(a)b|/
Capturing subpattern count = 1
@@ -94,9 +94,6 @@ Failed: missing terminating ] for character class at offset 5
/[\B]/
Failed: invalid escape sequence in character class at offset 2
-/[a-\w]/
-Failed: invalid escape sequence in character class at offset 4
-
/[z-a]/
Failed: range out of order in character class at offset 3
@@ -2064,7 +2061,318 @@ No match
AB
No match
-/ End of test input /
+/[\200-\410]/
+Failed: range out of order in character class at offset 9
+
+/^(?(0)f|b)oo/
+Failed: invalid condition (?(0) at offset 5
+
+/This one's here because of the large output vector needed/
+Capturing subpattern count = 0
+No options
+First char = 'T'
+Need char = 'd'
+
+/(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\w+)\s+(\270)/
+Capturing subpattern count = 271
+Max back reference = 270
+No options
+No first char
+No need char
+ \O900 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC
+ 0: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC
+ 1: 1
+ 2: 2
+ 3: 3
+ 4: 4
+ 5: 5
+ 6: 6
+ 7: 7
+ 8: 8
+ 9: 9
+10: 10
+11: 11
+12: 12
+13: 13
+14: 14
+15: 15
+16: 16
+17: 17
+18: 18
+19: 19
+20: 20
+21: 21
+22: 22
+23: 23
+24: 24
+25: 25
+26: 26
+27: 27
+28: 28
+29: 29
+30: 30
+31: 31
+32: 32
+33: 33
+34: 34
+35: 35
+36: 36
+37: 37
+38: 38
+39: 39
+40: 40
+41: 41
+42: 42
+43: 43
+44: 44
+45: 45
+46: 46
+47: 47
+48: 48
+49: 49
+50: 50
+51: 51
+52: 52
+53: 53
+54: 54
+55: 55
+56: 56
+57: 57
+58: 58
+59: 59
+60: 60
+61: 61
+62: 62
+63: 63
+64: 64
+65: 65
+66: 66
+67: 67
+68: 68
+69: 69
+70: 70
+71: 71
+72: 72
+73: 73
+74: 74
+75: 75
+76: 76
+77: 77
+78: 78
+79: 79
+80: 80
+81: 81
+82: 82
+83: 83
+84: 84
+85: 85
+86: 86
+87: 87
+88: 88
+89: 89
+90: 90
+91: 91
+92: 92
+93: 93
+94: 94
+95: 95
+96: 96
+97: 97
+98: 98
+99: 99
+100: 100
+101: 101
+102: 102
+103: 103
+104: 104
+105: 105
+106: 106
+107: 107
+108: 108
+109: 109
+110: 110
+111: 111
+112: 112
+113: 113
+114: 114
+115: 115
+116: 116
+117: 117
+118: 118
+119: 119
+120: 120
+121: 121
+122: 122
+123: 123
+124: 124
+125: 125
+126: 126
+127: 127
+128: 128
+129: 129
+130: 130
+131: 131
+132: 132
+133: 133
+134: 134
+135: 135
+136: 136
+137: 137
+138: 138
+139: 139
+140: 140
+141: 141
+142: 142
+143: 143
+144: 144
+145: 145
+146: 146
+147: 147
+148: 148
+149: 149
+150: 150
+151: 151
+152: 152
+153: 153
+154: 154
+155: 155
+156: 156
+157: 157
+158: 158
+159: 159
+160: 160
+161: 161
+162: 162
+163: 163
+164: 164
+165: 165
+166: 166
+167: 167
+168: 168
+169: 169
+170: 170
+171: 171
+172: 172
+173: 173
+174: 174
+175: 175
+176: 176
+177: 177
+178: 178
+179: 179
+180: 180
+181: 181
+182: 182
+183: 183
+184: 184
+185: 185
+186: 186
+187: 187
+188: 188
+189: 189
+190: 190
+191: 191
+192: 192
+193: 193
+194: 194
+195: 195
+196: 196
+197: 197
+198: 198
+199: 199
+200: 200
+201: 201
+202: 202
+203: 203
+204: 204
+205: 205
+206: 206
+207: 207
+208: 208
+209: 209
+210: 210
+211: 211
+212: 212
+213: 213
+214: 214
+215: 215
+216: 216
+217: 217
+218: 218
+219: 219
+220: 220
+221: 221
+222: 222
+223: 223
+224: 224
+225: 225
+226: 226
+227: 227
+228: 228
+229: 229
+230: 230
+231: 231
+232: 232
+233: 233
+234: 234
+235: 235
+236: 236
+237: 237
+238: 238
+239: 239
+240: 240
+241: 241
+242: 242
+243: 243
+244: 244
+245: 245
+246: 246
+247: 247
+248: 248
+249: 249
+250: 250
+251: 251
+252: 252
+253: 253
+254: 254
+255: 255
+256: 256
+257: 257
+258: 258
+259: 259
+260: 260
+261: 261
+262: 262
+263: 263
+264: 264
+265: 265
+266: 266
+267: 267
+268: 268
+269: 269
+270: ABC
+271: ABC
+
+/This one's here because Perl does this differently and PCRE can't at present/
+Capturing subpattern count = 0
+No options
+First char = 'T'
+Need char = 't'
+
+/(main(O)?)+/
+Capturing subpattern count = 2
+No options
+First char = 'm'
+Need char = 'n'
+ mainmain
+ 0: mainmain
+ 1: main
+ mainOmain
+ 0: mainOmain
+ 1: main
+ 2: O
+
+/ End of testinput2 /
Capturing subpattern count = 0
No options
First char = ' '
diff --git a/srclib/pcre/testdata/testoutput3 b/srclib/pcre/testdata/testoutput3
index 8eb215eb59..cbe9aaa755 100644
--- a/srclib/pcre/testdata/testoutput3
+++ b/srclib/pcre/testdata/testoutput3
@@ -1,4 +1,4 @@
-PCRE version 3.1 09-Feb-2000
+PCRE version 3.9 02-Jan-2002
/(?<!bar)foo/
foo
@@ -2925,5 +2925,67 @@ No match
1: '
2: abcd xyz pqr
-/ End of test input /
+/((Z)+|A)*/
+ ZABCDEFG
+ 0: ZA
+ 1: A
+ 2: Z
+
+/(Z()|A)*/
+ ZABCDEFG
+ 0: ZA
+ 1: A
+ 2:
+
+/(Z(())|A)*/
+ ZABCDEFG
+ 0: ZA
+ 1: A
+ 2:
+ 3:
+
+/((?>Z)+|A)*/
+ ZABCDEFG
+ 0: ZA
+ 1: A
+
+/((?>)+|A)*/
+ ZABCDEFG
+ 0:
+ 1:
+
+/a*/g
+ abbab
+ 0: a
+ 0:
+ 0:
+ 0: a
+ 0:
+ 0:
+
+/^[a-\d]/
+ abcde
+ 0: a
+ -things
+ 0: -
+ 0digit
+ 0: 0
+ *** Failers
+No match
+ bcdef
+No match
+
+/^[\d-a]/
+ abcde
+ 0: a
+ -things
+ 0: -
+ 0digit
+ 0: 0
+ *** Failers
+No match
+ bcdef
+No match
+
+/ End of testinput3 /
diff --git a/srclib/pcre/testdata/testoutput4 b/srclib/pcre/testdata/testoutput4
index d951b48861..df81a0f548 100644
--- a/srclib/pcre/testdata/testoutput4
+++ b/srclib/pcre/testdata/testoutput4
@@ -1,4 +1,4 @@
-PCRE version 3.1 09-Feb-2000
+PCRE version 3.9 02-Jan-2002
/^[\w]+/
*** Failers
@@ -112,4 +112,5 @@ No match
école
No match
+/ End of testinput4 /