diff options
Diffstat (limited to 'gcc/ada/g-spipat.ads')
-rw-r--r-- | gcc/ada/g-spipat.ads | 1204 |
1 files changed, 1204 insertions, 0 deletions
diff --git a/gcc/ada/g-spipat.ads b/gcc/ada/g-spipat.ads new file mode 100644 index 00000000000..9b66d9e3e72 --- /dev/null +++ b/gcc/ada/g-spipat.ads @@ -0,0 +1,1204 @@ +------------------------------------------------------------------------------ +-- -- +-- GNAT LIBRARY COMPONENTS -- +-- -- +-- G N A T . S P I T B O L . P A T T E R N S -- +-- -- +-- S p e c -- +-- -- +-- $Revision: 1.17 $ +-- -- +-- Copyright (C) 1997-1999 Ada Core Technologies, Inc. -- +-- -- +-- GNAT is free software; you can redistribute it and/or modify it under -- +-- terms of the GNU General Public License as published by the Free Soft- -- +-- ware Foundation; either version 2, or (at your option) any later ver- -- +-- sion. GNAT is distributed in the hope that it will be useful, but WITH- -- +-- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY -- +-- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License -- +-- for more details. You should have received a copy of the GNU General -- +-- Public License distributed with GNAT; see file COPYING. If not, write -- +-- to the Free Software Foundation, 59 Temple Place - Suite 330, Boston, -- +-- MA 02111-1307, USA. -- +-- -- +-- As a special exception, if other files instantiate generics from this -- +-- unit, or you link this unit with other files to produce an executable, -- +-- this unit does not by itself cause the resulting executable to be -- +-- covered by the GNU General Public License. This exception does not -- +-- however invalidate any other reasons why the executable file might be -- +-- covered by the GNU Public License. -- +-- -- +-- GNAT is maintained by Ada Core Technologies Inc (http://www.gnat.com). -- +-- -- +------------------------------------------------------------------------------ + +-- SPITBOL-like pattern construction and matching + +-- This child package of GNAT.SPITBOL provides a complete implementation +-- of the SPITBOL-like pattern construction and matching operations. This +-- package is based on Macro-SPITBOL created by Robert Dewar. + +------------------------------------------------------------ +-- Summary of Pattern Matching Packages in GNAT Hierarchy -- +------------------------------------------------------------ + +-- There are three related packages that perform pattern maching functions. +-- the following is an outline of these packages, to help you determine +-- which is best for your needs. + +-- GNAT.Regexp (files g-regexp.ads/g-regexp.adb) +-- This is a simple package providing Unix-style regular expression +-- matching with the restriction that it matches entire strings. It +-- is particularly useful for file name matching, and in particular +-- it provides "globbing patterns" that are useful in implementing +-- unix or DOS style wild card matching for file names. + +-- GNAT.Regpat (files g-regpat.ads/g-regpat.adb) +-- This is a more complete implementation of Unix-style regular +-- expressions, copied from the original V7 style regular expression +-- library written in C by Henry Spencer. It is functionally the +-- same as this library, and uses the same internal data structures +-- stored in a binary compatible manner. + +-- GNAT.Spitbol.Patterns (files g-spipat.ads/g-spipat.adb) +-- This is a completely general patterm matching package based on the +-- pattern language of SNOBOL4, as implemented in SPITBOL. The pattern +-- language is modeled on context free grammars, with context sensitive +-- extensions that provide full (type 0) computational capabilities. + +with Ada.Finalization; use Ada.Finalization; +with Ada.Strings.Maps; use Ada.Strings.Maps; +with Ada.Text_IO; use Ada.Text_IO; + +package GNAT.Spitbol.Patterns is +pragma Elaborate_Body (Patterns); + + ------------------------------- + -- Pattern Matching Tutorial -- + ------------------------------- + + -- A pattern matching operation (a call to one of the Match subprograms) + -- takes a subject string and a pattern, and optionally a replacement + -- string. The replacement string option is only allowed if the subject + -- is a variable. + + -- The pattern is matched against the subject string, and either the + -- match fails, or it succeeds matching a contiguous substring. If a + -- replacement string is specified, then the subject string is modified + -- by replacing the matched substring with the given replacement. + + + -- Concatenation and Alternation + -- ============================= + + -- A pattern consists of a series of pattern elements. The pattern is + -- built up using either the concatenation operator: + + -- A & B + + -- which means match A followed immediately by matching B, or the + -- alternation operator: + + -- A or B + + -- which means first attempt to match A, and then if that does not + -- succeed, match B. + + -- There is full backtracking, which means that if a given pattern + -- element fails to match, then previous alternatives are matched. + -- For example if we have the pattern: + + -- (A or B) & (C or D) & (E or F) + + -- First we attempt to match A, if that succeeds, then we go on to try + -- to match C, and if that succeeds, we go on to try to match E. If E + -- fails, then we try F. If F fails, then we go back and try matching + -- D instead of C. Let's make this explicit using a specific example, + -- and introducing the simplest kind of pattern element, which is a + -- literal string. The meaning of this pattern element is simply to + -- match the characters that correspond to the string characters. Now + -- let's rewrite the above pattern form with specific string literals + -- as the pattern elements: + + -- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ") + + -- The following strings will be attempted in sequence: + + -- ABC . DEF . GH + -- ABC . DEF . IJ + -- ABC . CDE . GH + -- ABC . CDE . IJ + -- AB . DEF . GH + -- AB . DEF . IJ + -- AB . CDE . GH + -- AB . CDE . IJ + + -- Here we use the dot simply to separate the pieces of the string + -- matched by the three separate elements. + + + -- Moving the Start Point + -- ====================== + + -- A pattern is not required to match starting at the first character + -- of the string, and is not required to match to the end of the string. + -- The first attempt does indeed attempt to match starting at the first + -- character of the string, trying all the possible alternatives. But + -- if all alternatives fail, then the starting point of the match is + -- moved one character, and all possible alternatives are attempted at + -- the new anchor point. + + -- The entire match fails only when every possible starting point has + -- been attempted. As an example, suppose that we had the subject + -- string + + -- "ABABCDEIJKL" + + -- matched using the pattern in the previous example: + + -- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ") + + -- would succeed, afer two anchor point moves: + + -- "ABABCDEIJKL" + -- ^^^^^^^ + -- matched + -- section + + -- This mode of pattern matching is called the unanchored mode. It is + -- also possible to put the pattern matcher into anchored mode by + -- setting the global variable Anchored_Mode to True. This will cause + -- all subsequent matches to be performed in anchored mode, where the + -- match is required to start at the first character. + + -- We will also see later how the effect of an anchored match can be + -- obtained for a single specified anchor point if this is desired. + + + -- Other Pattern Elements + -- ====================== + + -- In addition to strings (or single characters), there are many special + -- pattern elements that correspond to special predefined alternations: + + -- Arb Matches any string. First it matches the null string, and + -- then on a subsequent failure, matches one character, and + -- then two characters, and so on. It only fails if the + -- entire remaining string is matched. + + -- Bal Matches a non-empty string that is parentheses balanced + -- with respect to ordinary () characters. Examples of + -- balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E". + -- Bal matches the shortest possible balanced string on the + -- first attempt, and if there is a subsequent failure, + -- attempts to extend the string. + + -- Cancel Immediately aborts the entire pattern match, signalling + -- failure. This is a specialized pattern element, which is + -- useful in conjunction with some of the special pattern + -- elements that have side effects. + + -- Fail The null alternation. Matches no possible strings, so it + -- always signals failure. This is a specialized pattern + -- element, which is useful in conjunction with some of the + -- special pattern elements that have side effects. + + -- Fence Matches the null string at first, and then if a failure + -- causes alternatives to be sought, aborts the match (like + -- a Cancel). Note that using Fence at the start of a pattern + -- has the same effect as matching in anchored mode. + + -- Rest Matches from the current point to the last character in + -- the string. This is a specialized pattern element, which + -- is useful in conjunction with some of the special pattern + -- elements that have side effects. + + -- Succeed Repeatedly matches the null string (it is equivalent to + -- the alternation ("" or "" or "" ....). This is a special + -- pattern element, which is useful in conjunction with some + -- of the special pattern elements that have side effects. + + + -- Pattern Construction Functions + -- ============================== + + -- The following functions construct additional pattern elements + + -- Any(S) Where S is a string, matches a single character that is + -- any one of the characters in S. Fails if the current + -- character is not one of the given set of characters. + + -- Arbno(P) Where P is any pattern, matches any number of instances + -- of the pattern, starting with zero occurrences. It is + -- thus equivalent to ("" or (P & ("" or (P & ("" ....)))). + -- The pattern P may contain any number of pattern elements + -- including the use of alternatiion and concatenation. + + -- Break(S) Where S is a string, matches a string of zero or more + -- characters up to but not including a break character + -- that is one of the characters given in the string S. + -- Can match the null string, but cannot match the last + -- character in the string, since a break character is + -- required to be present. + + -- BreakX(S) Where S is a string, behaves exactly like Break(S) when + -- it first matches, but if a string is successfully matched, + -- then a susequent failure causes an attempt to extend the + -- matched string. + + -- Fence(P) Where P is a pattern, attempts to match the pattern P + -- including trying all possible alternatives of P. If none + -- of these alternatives succeeds, then the Fence pattern + -- fails. If one alternative succeeds, then the pattern + -- match proceeds, but on a subsequent failure, no attempt + -- is made to search for alternative matches of P. The + -- pattern P may contain any number of pattern elements + -- including the use of alternatiion and concatenation. + + -- Len(N) Where N is a natural number, matches the given number of + -- characters. For example, Len(10) matches any string that + -- is exactly ten characters long. + + -- NotAny(S) Where S is a string, matches a single character that is + -- not one of the characters of S. Fails if the current + -- characer is one of the given set of characters. + + -- NSpan(S) Where S is a string, matches a string of zero or more + -- characters that is among the characters given in the + -- string. Always matches the longest possible such string. + -- Always succeeds, since it can match the null string. + + -- Pos(N) Where N is a natural number, matches the null string + -- if exactly N characters have been matched so far, and + -- otherwise fails. + + -- Rpos(N) Where N is a natural number, matches the null string + -- if exactly N characters remain to be matched, and + -- otherwise fails. + + -- Rtab(N) Where N is a natural number, matches characters from + -- the current position until exactly N characters remain + -- to be matched in the string. Fails if fewer than N + -- unmatched characters remain in the string. + + -- Tab(N) Where N is a natural number, matches characters from + -- the current position until exactly N characters have + -- been matched in all. Fails if more than N characters + -- have already been matched. + + -- Span(S) Where S is a string, matches a string of one or more + -- characters that is among the characters given in the + -- string. Always matches the longest possible such string. + -- Fails if the current character is not one of the given + -- set of characters. + + -- Recursive Pattern Matching + -- ========================== + + -- The plus operator (+P) where P is a pattern variable, creates + -- a recursive pattern that will, at pattern matching time, follow + -- the pointer to obtain the referenced pattern, and then match this + -- pattern. This may be used to construct recursive patterns. Consider + -- for example: + + -- P := ("A" or ("B" & (+P))) + + -- On the first attempt, this pattern attempts to match the string "A". + -- If this fails, then the alternative matches a "B", followed by an + -- attempt to match P again. This second attempt first attempts to + -- match "A", and so on. The result is a pattern that will match a + -- string of B's followed by a single A. + + -- This particular example could simply be written as NSpan('B') & 'A', + -- but the use of recursive patterns in the general case can construct + -- complex patterns which could not otherwise be built. + + + -- Pattern Assignment Operations + -- ============================= + + -- In addition to the overall result of a pattern match, which indicates + -- success or failure, it is often useful to be able to keep track of + -- the pieces of the subject string that are matched by individual + -- pattern elements, or subsections of the pattern. + + -- The pattern assignment operators allow this capability. The first + -- form is the immediate assignment: + + -- P * S + + -- Here P is an arbitrary pattern, and S is a variable of type VString + -- that will be set to the substring matched by P. This assignment + -- happens during pattern matching, so if P matches more than once, + -- then the assignment happens more than once. + + -- The deferred assignment operation: + + -- P ** S + + -- avoids these multiple assignments by deferring the assignment to the + -- end of the match. If the entire match is successful, and if the + -- pattern P was part of the successful match, then at the end of the + -- matching operation the assignment to S of the string matching P is + -- performed. + + -- The cursor assignment operation: + + -- Setcur(N'Access) + + -- assigns the current cursor position to the natural variable N. The + -- cursor position is defined as the count of characters that have been + -- matched so far (including any start point moves). + + -- Finally the operations * and ** may be used with values of type + -- Text_IO.File_Access. The effect is to do a Put_Line operation of + -- the matched substring. These are particularly useful in debugging + -- pattern matches. + + + -- Deferred Matching + -- ================= + + -- The pattern construction functions (such as Len and Any) all permit + -- the use of pointers to natural or string values, or functions that + -- return natural or string values. These forms cause the actual value + -- to be obtained at pattern matching time. This allows interesting + -- possibilities for constructing dynamic patterns as illustrated in + -- the examples section. + + -- In addition the (+S) operator may be used where S is a pointer to + -- string or function returning string, with a similar deferred effect. + + -- A special use of deferred matching is the construction of predicate + -- functions. The element (+P) where P is an access to a function that + -- returns a Boolean value, causes the function to be called at the + -- time the element is matched. If the function returns True, then the + -- null string is matched, if the function returns False, then failure + -- is signalled and previous alternatives are sought. + + -- Deferred Replacement + -- ==================== + + -- The simple model given for pattern replacement (where the matched + -- substring is replaced by the string given as the third argument to + -- Match) works fine in simple cases, but this approach does not work + -- in the case where the expression used as the replacement string is + -- dependent on values set by the match. + + -- For example, suppose we want to find an instance of a parenthesized + -- character, and replace the parentheses with square brackets. At first + -- glance it would seem that: + + -- Match (Subject, '(' & Len (1) * Char & ')', '[' & Char & ']'); + + -- would do the trick, but that does not work, because the third + -- argument to Match gets evaluated too early, before the call to + -- Match, and before the pattern match has had a chance to set Char. + + -- To solve this problem we provide the deferred replacement capability. + -- With this approach, which of course is only needed if the pattern + -- involved has side effects, is to do the match in two stages. The + -- call to Match sets a pattern result in a variable of the private + -- type Match_Result, and then a subsequent Replace operation uses + -- this Match_Result object to perform the required replacement. + + -- Using this approach, we can now write the above operation properly + -- in a manner that will work: + + -- M : Match_Result; + -- ... + -- Match (Subject, '(' & Len (1) * Char & ')', M); + -- Replace (M, '[' & Char & ']'); + + -- As with other Match cases, there is a function and procedure form + -- of this match call. A call to Replace after a failed match has no + -- effect. Note that Subject should not be modified between the calls. + + -- Examples of Pattern Matching + -- ============================ + + -- First a simple example of the use of pattern replacement to remove + -- a line number from the start of a string. We assume that the line + -- number has the form of a string of decimal digits followed by a + -- period, followed by one or more spaces. + + -- Digs : constant Pattern := Span("0123456789"); + + -- Lnum : constant Pattern := Pos(0) & Digs & '.' & Span(' '); + + -- Now to use this pattern we simply do a match with a replacement: + + -- Match (Line, Lnum, ""); + + -- which replaces the line number by the null string. Note that it is + -- also possible to use an Ada.Strings.Maps.Character_Set value as an + -- argument to Span and similar functions, and in particular all the + -- useful constants 'in Ada.Strings.Maps.Constants are available. This + -- means that we could define Digs as: + + -- Digs : constant Pattern := Span(Decimal_Digit_Set); + + -- The style we use here, of defining constant patterns and then using + -- them is typical. It is possible to build up patterns dynamically, + -- but it is usually more efficient to build them in pieces in advance + -- using constant declarations. Note in particular that although it is + -- possible to construct a pattern directly as an argument for the + -- Match routine, it is much more efficient to preconstruct the pattern + -- as we did in this example. + + -- Now let's look at the use of pattern assignment to break a + -- string into sections. Suppose that the input string has two + -- unsigned decimal integers, separated by spaces or a comma, + -- with spaces allowed anywhere. Then we can isolate the two + -- numbers with the following pattern: + + -- Num1, Num2 : aliased VString; + + -- B : constant Pattern := NSpan(' '); + + -- N : constant Pattern := Span("0123456789"); + + -- T : constant Pattern := + -- NSpan(' ') & N * Num1 & Span(" ,") & N * Num2; + + -- The match operation Match (" 124, 257 ", T) would assign the + -- string 124 to Num1 and the string 257 to Num2. + + -- Now let's see how more complex elements can be built from the + -- set of primitive elements. The following pattern matches strings + -- that have the syntax of Ada 95 based literals: + + -- Digs : constant Pattern := Span(Decimal_Digit_Set); + -- UDigs : constant Pattern := Digs & Arbno('_' & Digs); + + -- Edig : constant Pattern := Span(Hexadecimal_Digit_Set); + -- UEdig : constant Pattern := Edig & Arbno('_' & Edig); + + -- Bnum : constant Pattern := Udigs & '#' & UEdig & '#'; + + -- A match against Bnum will now match the desired strings, e.g. + -- it will match 16#123_abc#, but not a#b#. However, this pattern + -- is not quite complete, since it does not allow colons to replace + -- the pound signs. The following is more complete: + + -- Bchar : constant Pattern := Any("#:"); + -- Bnum : constant Pattern := Udigs & Bchar & UEdig & Bchar; + + -- but that is still not quite right, since it allows # and : to be + -- mixed, and they are supposed to be used consistently. We solve + -- this by using a deferred match. + + -- Temp : aliased VString; + + -- Bnum : constant Pattern := + -- Udigs & Bchar * Temp & UEdig & (+Temp) + + -- Here the first instance of the base character is stored in Temp, and + -- then later in the pattern we rematch the value that was assigned. + + -- For an example of a recursive pattern, let's define a pattern + -- that is like the built in Bal, but the string matched is balanced + -- with respect to square brackets or curly brackets. + + -- The language for such strings might be defined in extended BNF as + + -- ELEMENT ::= <any character other than [] or {}> + -- | '[' BALANCED_STRING ']' + -- | '{' BALANCED_STRING '}' + + -- BALANCED_STRING ::= ELEMENT {ELEMENT} + + -- Here we use {} to indicate zero or more occurrences of a term, as + -- is common practice in extended BNF. Now we can translate the above + -- BNF into recursive patterns as follows: + + -- Element, Balanced_String : aliased Pattern; + -- . + -- . + -- . + -- Element := NotAny ("[]{}") + -- or + -- ('[' & (+Balanced_String) & ']') + -- or + -- ('{' & (+Balanced_String) & '}'); + + -- Balanced_String := Element & Arbno (Element); + + -- Note the important use of + here to refer to a pattern not yet + -- defined. Note also that we use assignments precisely because we + -- cannot refer to as yet undeclared variables in initializations. + + -- Now that this pattern is constructed, we can use it as though it + -- were a new primitive pattern element, and for example, the match: + + -- Match ("xy[ab{cd}]", Balanced_String * Current_Output & Fail); + + -- will generate the output: + + -- x + -- xy + -- xy[ab{cd}] + -- y + -- y[ab{cd}] + -- [ab{cd}] + -- a + -- ab + -- ab{cd} + -- b + -- b{cd} + -- {cd} + -- c + -- cd + -- d + + -- Note that the function of the fail here is simply to force the + -- pattern Balanced_String to match all possible alternatives. Studying + -- the operation of this pattern in detail is highly instructive. + + -- Finally we give a rather elaborate example of the use of deferred + -- matching. The following declarations build up a pattern which will + -- find the longest string of decimal digits in the subject string. + + -- Max, Cur : VString; + -- Loc : Natural; + + -- function GtS return Boolean is + -- begin + -- return Length (Cur) > Length (Max); + -- end GtS; + + -- Digit : constant Character_Set := Decimal_Digit_Set; + + -- Digs : constant Pattern := Span(Digit); + + -- Find : constant Pattern := + -- "" * Max & Fence & -- initialize Max to null + -- BreakX (Digit) & -- scan looking for digits + -- ((Span(Digit) * Cur & -- assign next string to Cur + -- (+GtS'Unrestricted_Access) & -- check size(Cur) > Size(Max) + -- Setcur(Loc'Access)) -- if so, save location + -- * Max) & -- and assign to Max + -- Fail; -- seek all alternatives + + -- As we see from the comments here, complex patterns like this take + -- on aspects of sequential programs. In fact they are sequential + -- programs with general backtracking. In this pattern, we first use + -- a pattern assignment that matches null and assigns it to Max, so + -- that it is initialized for the new match. Now BreakX scans to the + -- next digit. Arb would do here, but BreakX will be more efficient. + -- Once we have found a digit, we scan out the longest string of + -- digits with Span, and assign it to Cur. The deferred call to GtS + -- tests if the string we assigned to Cur is the longest so far. If + -- not, then failure is signalled, and we seek alternatives (this + -- means that BreakX will extend and look for the next digit string). + -- If the call to GtS succeeds then the matched string is assigned + -- as the largest string so far into Max and its location is saved + -- in Loc. Finally Fail forces the match to fail and seek alternatives, + -- so that the entire string is searched. + + -- If the pattern Find is matched against a string, the variable Max + -- at the end of the pattern will have the longest string of digits, + -- and Loc will be the starting character location of the string. For + -- example, Match("ab123cd4657ef23", Find) will assign "4657" to Max + -- and 11 to Loc (indicating that the string ends with the eleventh + -- character of the string). + + -- Note: the use of Unrestricted_Access to reference GtS will not + -- be needed if GtS is defined at the outer level, but definitely + -- will be necessary if GtS is a nested function (in which case of + -- course the scope of the pattern Find will be restricted to this + -- nested scope, and this cannot be checked, i.e. use of the pattern + -- outside this scope is erroneous). Generally it is a good idea to + -- define patterns and the functions they call at the outer level + -- where possible, to avoid such problems. + + + -- Correspondence with Pattern Matching in SPITBOL + -- =============================================== + + -- Generally the Ada syntax and names correspond closely to SPITBOL + -- syntax for pattern matching construction. + + -- The basic pattern construction operators are renamed as follows: + + -- Spitbol Ada + + -- (space) & + -- | or + -- $ * + -- . ** + + -- The Ada operators were chosen so that the relative precedences of + -- these operators corresponds to that of the Spitbol operators, but + -- as always, the use of parentheses is advisable to clarify. + + -- The pattern construction operators all have similar names except for + + -- Spitbol Ada + + -- Abort Cancel + -- Rem Rest + + -- where we have clashes with Ada reserved names. + + -- Ada requires the use of 'Access to refer to functions used in the + -- pattern match, and often the use of 'Unrestricted_Access may be + -- necessary to get around the scope restrictions if the functions + -- are not declared at the outer level. + + -- The actual pattern matching syntax is modified in Ada as follows: + + -- Spitbol Ada + + -- X Y Match (X, Y); + -- X Y = Z Match (X, Y, Z); + + -- and pattern failure is indicated by returning a Boolean result from + -- the Match function (True for success, False for failure). + + ----------------------- + -- Type Declarations -- + ----------------------- + + type Pattern is private; + -- Type representing a pattern. This package provides a complete set of + -- operations for constructing patterns that can be used in the pattern + -- matching operations provided. + + type Boolean_Func is access function return Boolean; + -- General Boolean function type. When this type is used as a formal + -- parameter type in this package, it indicates a deferred predicate + -- pattern. The function will be called when the pattern element is + -- matched and failure signalled if False is returned. + + type Natural_Func is access function return Natural; + -- General Natural function type. When this type is used as a formal + -- parameter type in this package, it indicates a deferred pattern. + -- The function will be called when the pattern element is matched + -- to obtain the currently referenced Natural value. + + type VString_Func is access function return VString; + -- General VString function type. When this type is used as a formal + -- parameter type in this package, it indicates a deferred pattern. + -- The function will be called when the pattern element is matched + -- to obtain the currently referenced string value. + + subtype PString is String; + -- This subtype is used in the remainder of the package to indicate a + -- formal parameter that is converted to its corresponding pattern, + -- i.e. a pattern that matches the characters of the string. + + subtype PChar is Character; + -- Similarly, this subtype is used in the remainder of the package to + -- indicate a formal parameter that is converted to its corresponding + -- pattern, i.e. a pattern that matches this one character. + + subtype VString_Var is VString; + subtype Pattern_Var is Pattern; + -- These synonyms are used as formal parameter types to a function where, + -- if the language allowed, we would use in out parameters, but we are + -- not allowed to have in out parameters for functions. Instead we pass + -- actuals which must be variables, and with a bit of trickery in the + -- body, manage to interprete them properly as though they were indeed + -- in out parameters. + + -------------------------------- + -- Basic Pattern Construction -- + -------------------------------- + + function "&" (L : Pattern; R : Pattern) return Pattern; + function "&" (L : PString; R : Pattern) return Pattern; + function "&" (L : Pattern; R : PString) return Pattern; + function "&" (L : PChar; R : Pattern) return Pattern; + function "&" (L : Pattern; R : PChar) return Pattern; + + -- Pattern concatenation. Matches L followed by R. + + function "or" (L : Pattern; R : Pattern) return Pattern; + function "or" (L : PString; R : Pattern) return Pattern; + function "or" (L : Pattern; R : PString) return Pattern; + function "or" (L : PString; R : PString) return Pattern; + function "or" (L : PChar; R : Pattern) return Pattern; + function "or" (L : Pattern; R : PChar) return Pattern; + function "or" (L : PChar; R : PChar) return Pattern; + function "or" (L : PString; R : PChar) return Pattern; + function "or" (L : PChar; R : PString) return Pattern; + -- Pattern alternation. Creates a pattern that will first try to match + -- L and then on a subsequent failure, attempts to match R instead. + + ---------------------------------- + -- Pattern Assignment Functions -- + ---------------------------------- + + function "*" (P : Pattern; Var : VString_Var) return Pattern; + function "*" (P : PString; Var : VString_Var) return Pattern; + function "*" (P : PChar; Var : VString_Var) return Pattern; + -- Matches P, and if the match succeeds, assigns the matched substring + -- to the given VString variable S. This assignment happens as soon as + -- the substring is matched, and if the pattern P1 is matched more than + -- once during the course of the match, then the assignment will occur + -- more than once. + + function "**" (P : Pattern; Var : VString_Var) return Pattern; + function "**" (P : PString; Var : VString_Var) return Pattern; + function "**" (P : PChar; Var : VString_Var) return Pattern; + -- Like "*" above, except that the assignment happens at most once + -- after the entire match is completed successfully. If the match + -- fails, then no assignment takes place. + + ---------------------------------- + -- Deferred Matching Operations -- + ---------------------------------- + + function "+" (Str : VString_Var) return Pattern; + -- Here Str must be a VString variable. This function constructs a + -- pattern which at pattern matching time will access the current + -- value of this variable, and match against these characters. + + function "+" (Str : VString_Func) return Pattern; + -- Constructs a pattern which at pattern matching time calls the given + -- function, and then matches against the string or character value + -- that is returned by the call. + + function "+" (P : Pattern_Var) return Pattern; + -- Here P must be a Pattern variable. This function constructs a + -- pattern which at pattern matching time will access the current + -- value of this variable, and match against the pattern value. + + function "+" (P : Boolean_Func) return Pattern; + -- Constructs a predicate pattern function that at pattern matching time + -- calls the given function. If True is returned, then the pattern matches. + -- If False is returned, then failure is signalled. + + -------------------------------- + -- Pattern Building Functions -- + -------------------------------- + + function Arb return Pattern; + -- Constructs a pattern that will match any string. On the first attempt, + -- the pattern matches a null string, then on each successive failure, it + -- matches one more character, and only fails if matching the entire rest + -- of the string. + + function Arbno (P : Pattern) return Pattern; + function Arbno (P : PString) return Pattern; + function Arbno (P : PChar) return Pattern; + -- Pattern repetition. First matches null, then on a subsequent failure + -- attempts to match an additional instance of the given pattern. + -- Equivalent to (but more efficient than) P & ("" or (P & ("" or ... + + function Any (Str : String) return Pattern; + function Any (Str : VString) return Pattern; + function Any (Str : Character) return Pattern; + function Any (Str : Character_Set) return Pattern; + function Any (Str : access VString) return Pattern; + function Any (Str : VString_Func) return Pattern; + -- Constructs a pattern that matches a single character that is one of + -- the characters in the given argument. The pattern fails if the current + -- character is not in Str. + + function Bal return Pattern; + -- Constructs a pattern that will match any non-empty string that is + -- parentheses balanced with respect to the normal parentheses characters. + -- Attempts to extend the string if a subsequent failure occurs. + + function Break (Str : String) return Pattern; + function Break (Str : VString) return Pattern; + function Break (Str : Character) return Pattern; + function Break (Str : Character_Set) return Pattern; + function Break (Str : access VString) return Pattern; + function Break (Str : VString_Func) return Pattern; + -- Constructs a pattern that matches a (possibly null) string which + -- is immediately followed by a character in the given argument. This + -- character is not part of the matched string. The pattern fails if + -- the remaining characters to be matched do not include any of the + -- characters in Str. + + function BreakX (Str : String) return Pattern; + function BreakX (Str : VString) return Pattern; + function BreakX (Str : Character) return Pattern; + function BreakX (Str : Character_Set) return Pattern; + function BreakX (Str : access VString) return Pattern; + function BreakX (Str : VString_Func) return Pattern; + -- Like Break, but the pattern attempts to extend on a failure to find + -- the next occurrence of a character in Str, and only fails when the + -- last such instance causes a failure. + + function Cancel return Pattern; + -- Constructs a pattern that immediately aborts the entire match + + function Fail return Pattern; + -- Constructs a pattern that always fails. + + function Fence return Pattern; + -- Constructs a pattern that matches null on the first attempt, and then + -- causes the entire match to be aborted if a subsequent failure occurs. + + function Fence (P : Pattern) return Pattern; + -- Constructs a pattern that first matches P. if P fails, then the + -- constructed pattern fails. If P succeeds, then the match proceeds, + -- but if subsequent failure occurs, alternatives in P are not sought. + -- The idea of Fence is that each time the pattern is matched, just + -- one attempt is made to match P, without trying alternatives. + + function Len (Count : Natural) return Pattern; + function Len (Count : access Natural) return Pattern; + function Len (Count : Natural_Func) return Pattern; + -- Constructs a pattern that matches exactly the given number of + -- characters. The pattern fails if fewer than this number of characters + -- remain to be matched in the string. + + function NotAny (Str : String) return Pattern; + function NotAny (Str : VString) return Pattern; + function NotAny (Str : Character) return Pattern; + function NotAny (Str : Character_Set) return Pattern; + function NotAny (Str : access VString) return Pattern; + function NotAny (Str : VString_Func) return Pattern; + -- Constructs a pattern that matches a single character that is not + -- one of the characters in the given argument. The pattern Fails if + -- the current character is in Str. + + function NSpan (Str : String) return Pattern; + function NSpan (Str : VString) return Pattern; + function NSpan (Str : Character) return Pattern; + function NSpan (Str : Character_Set) return Pattern; + function NSpan (Str : access VString) return Pattern; + function NSpan (Str : VString_Func) return Pattern; + -- Constructs a pattern that matches the longest possible string + -- consisting entirely of characters from the given argument. The + -- string may be empty, so this pattern always succeeds. + + function Pos (Count : Natural) return Pattern; + function Pos (Count : access Natural) return Pattern; + function Pos (Count : Natural_Func) return Pattern; + -- Constructs a pattern that matches the null string if exactly Count + -- characters have already been matched, and otherwise fails. + + function Rest return Pattern; + -- Constructs a pattern that always succeeds, matching the remaining + -- unmatched characters in the pattern. + + function Rpos (Count : Natural) return Pattern; + function Rpos (Count : access Natural) return Pattern; + function Rpos (Count : Natural_Func) return Pattern; + -- Constructs a pattern that matches the null string if exactly Count + -- characters remain to be matched in the string, and otherwise fails. + + function Rtab (Count : Natural) return Pattern; + function Rtab (Count : access Natural) return Pattern; + function Rtab (Count : Natural_Func) return Pattern; + -- Constructs a pattern that matches from the current location until + -- exactly Count characters remain to be matched in the string. The + -- pattern fails if fewer than Count characters remain to be matched. + + function Setcur (Var : access Natural) return Pattern; + -- Constructs a pattern that matches the null string, and assigns the + -- current cursor position in the string. This value is the number of + -- characters matched so far. So it is zero at the start of the match. + + function Span (Str : String) return Pattern; + function Span (Str : VString) return Pattern; + function Span (Str : Character) return Pattern; + function Span (Str : Character_Set) return Pattern; + function Span (Str : access VString) return Pattern; + function Span (Str : VString_Func) return Pattern; + -- Constructs a pattern that matches the longest possible string + -- consisting entirely of characters from the given argument. The + -- string cannot be empty , so the pattern fails if the current + -- character is not one of the characters in Str. + + function Succeed return Pattern; + -- Constructs a pattern that succeeds matching null, both on the first + -- attempt, and on any rematch attempt, i.e. it is equivalent to an + -- infinite alternation of null strings. + + function Tab (Count : Natural) return Pattern; + function Tab (Count : access Natural) return Pattern; + function Tab (Count : Natural_Func) return Pattern; + -- Constructs a pattern that from the current location until Count + -- characters have been matched. The pattern fails if more than Count + -- characters have already been matched. + + --------------------------------- + -- Pattern Matching Operations -- + --------------------------------- + + -- The Match function performs an actual pattern matching operation. + -- The versions with three parameters perform a match without modifying + -- the subject string and return a Boolean result indicating if the + -- match is successful or not. The Anchor parameter is set to True to + -- obtain an anchored match in which the pattern is required to match + -- the first character of the string. In an unanchored match, which is + + -- the default, successive attempts are made to match the given pattern + -- at each character of the subject string until a match succeeds, or + -- until all possibilities have failed. + + -- Note that pattern assignment functions in the pattern may generate + -- side effects, so these functions are not necessarily pure. + + Anchored_Mode : Boolean := False; + -- This global variable can be set True to cause all subsequent pattern + -- matches to operate in anchored mode. In anchored mode, no attempt is + -- made to move the anchor point, so that if the match succeeds it must + -- succeed starting at the first character. Note that the effect of + -- anchored mode may be achieved in individual pattern matches by using + -- Fence or Pos(0) at the start of the pattern. + + Pattern_Stack_Overflow : exception; + -- Exception raised if internal pattern matching stack overflows. This + -- is typically the result of runaway pattern recursion. If there is a + -- genuine case of stack overflow, then either the match must be broken + -- down into simpler steps, or the stack limit must be reset. + + Stack_Size : constant Positive := 2000; + -- Size used for internal pattern matching stack. Increase this size if + -- complex patterns cause Pattern_Stack_Overflow to be raised. + + -- Simple match functions. The subject is matched against the pattern. + -- Any immediate or deferred assignments or writes are executed, and + -- the returned value indicates whether or not the match succeeded. + + function Match + (Subject : VString; + Pat : Pattern) + return Boolean; + + function Match + (Subject : VString; + Pat : PString) + return Boolean; + + function Match + (Subject : String; + Pat : Pattern) + return Boolean; + + function Match + (Subject : String; + Pat : PString) + return Boolean; + + -- Replacement functions. The subject is matched against the pattern. + -- Any immediate or deferred assignments or writes are executed, and + -- the returned value indicates whether or not the match succeeded. + -- If the match succeeds, then the matched part of the subject string + -- is replaced by the given Replace string. + + function Match + (Subject : VString_Var; + Pat : Pattern; + Replace : VString) + return Boolean; + + function Match + (Subject : VString_Var; + Pat : PString; + Replace : VString) + return Boolean; + + function Match + (Subject : VString_Var; + Pat : Pattern; + Replace : String) + return Boolean; + + function Match + (Subject : VString_Var; + Pat : PString; + Replace : String) + return Boolean; + + -- Simple match procedures. The subject is matched against the pattern. + -- Any immediate or deferred assignments or writes are executed. No + -- indication of success or failure is returned. + + procedure Match + (Subject : VString; + Pat : Pattern); + + procedure Match + (Subject : VString; + Pat : PString); + + procedure Match + (Subject : String; + Pat : Pattern); + + procedure Match + (Subject : String; + Pat : PString); + + -- Replacement procedures. The subject is matched against the pattern. + -- Any immediate or deferred assignments or writes are executed. No + -- indication of success or failure is returned. If the match succeeds, + -- then the matched part of the subject string is replaced by the given + -- Replace string. + + procedure Match + (Subject : in out VString; + Pat : Pattern; + Replace : VString); + + procedure Match + (Subject : in out VString; + Pat : PString; + Replace : VString); + + procedure Match + (Subject : in out VString; + Pat : Pattern; + Replace : String); + + procedure Match + (Subject : in out VString; + Pat : PString; + Replace : String); + + -- Deferred Replacement + + type Match_Result is private; + -- Type used to record result of pattern match + + subtype Match_Result_Var is Match_Result; + -- This synonyms is used as a formal parameter type to a function where, + -- if the language allowed, we would use an in out parameter, but we are + -- not allowed to have in out parameters for functions. Instead we pass + -- actuals which must be variables, and with a bit of trickery in the + -- body, manage to interprete them properly as though they were indeed + -- in out parameters. + + function Match + (Subject : VString_Var; + Pat : Pattern; + Result : Match_Result_Var) + return Boolean; + + procedure Match + (Subject : in out VString; + Pat : Pattern; + Result : out Match_Result); + + procedure Replace + (Result : in out Match_Result; + Replace : VString); + -- Given a previous call to Match which set Result, performs a pattern + -- replacement if the match was successful. Has no effect if the match + -- failed. This call should immediately follow the Match call. + + ------------------------ + -- Debugging Routines -- + ------------------------ + + -- Debugging pattern matching operations can often be quite complex, + -- since there is no obvious way to trace the progress of the match. + -- The declarations in this section provide some debugging assistance. + + Debug_Mode : Boolean := False; + -- This global variable can be set True to generate debugging on all + -- subsequent calls to Match. The debugging output is a full trace of + -- the actions of the pattern matcher, written to Standard_Output. The + -- level of this information is intended to be comprehensible at the + -- abstract level of this package declaration. However, note that the + -- use of this switch often generates large amounts of output. + + function "*" (P : Pattern; Fil : File_Access) return Pattern; + function "*" (P : PString; Fil : File_Access) return Pattern; + function "*" (P : PChar; Fil : File_Access) return Pattern; + function "**" (P : Pattern; Fil : File_Access) return Pattern; + function "**" (P : PString; Fil : File_Access) return Pattern; + function "**" (P : PChar; Fil : File_Access) return Pattern; + -- These are similar to the corresponding pattern assignment operations + -- except that instead of setting the value of a variable, the matched + -- substring is written to the appropriate file. This can be useful in + -- following the progress of a match without generating the full amount + + -- of information obtained by setting Debug_Mode to True. + + Terminal : constant File_Access := Standard_Error; + Output : constant File_Access := Standard_Output; + -- Two handy synonyms for use with the above pattern write operations. + + -- Finally we have some routines that are useful for determining what + -- patterns are in use, particularly if they are constructed dynamically. + + function Image (P : Pattern) return String; + function Image (P : Pattern) return VString; + -- This procedures yield strings that corresponds to the syntax needed + -- to create the given pattern using the functions in this package. The + -- form of this string is such that it could actually be compiled and + -- evaluated to yield the required pattern except for references to + -- variables and functions, which are output using one of the following + -- forms: + -- + -- access Natural NP(16#...#) + -- access Pattern PP(16#...#) + -- access VString VP(16#...#) + -- + -- Natural_Func NF(16#...#) + -- VString_Func VF(16#...#) + -- + -- where 16#...# is the hex representation of the integer address that + -- corresponds to the given access value + + procedure Dump (P : Pattern); + -- This procedure writes information about the pattern to Standard_Out. + -- The format of this information is keyed to the internal data structures + -- used to implement patterns. The information provided by Dump is thus + -- more precise than that yielded by Image, but is also a bit more obscure + -- (i.e. it cannot be interpreted solely in terms of this spec, you have + -- to know something about the data structures). + + ------------------ + -- Private Part -- + ------------------ + +private + type PE; + -- Pattern element, a pattern is a plex structure of PE's. This type + -- is defined and sdescribed in the body of this package. + + type PE_Ptr is access all PE; + -- Pattern reference. PE's use PE_Ptr values to reference other PE's + + type Pattern is new Controlled with record + + Stk : Natural; + -- Maximum number of stack entries required for matching this + -- pattern. See description of pattern history stack in body. + + P : PE_Ptr; + -- Pointer to initial pattern element for pattern + + end record; + + pragma Finalize_Storage_Only (Pattern); + + procedure Adjust (Object : in out Pattern); + -- Adjust routine used to copy pattern objects + + procedure Finalize (Object : in out Pattern); + -- Finalization routine used to release storage allocated for a pattern. + + type VString_Ptr is access all VString; + + type Match_Result is record + Var : VString_Ptr; + -- Pointer to subject string. Set to null if match failed. + + Start : Natural; + -- Starting index position (1's origin) of matched section of + -- subject string. Only valid if Var is non-null. + + Stop : Natural; + -- Ending index position (1's origin) of matched section of + -- subject string. Only valid if Var is non-null. + + end record; + + pragma Volatile (Match_Result); + -- This ensures that the Result parameter is passed by reference, so + -- that we can play our games with the bogus Match_Result_Var parameter + -- in the function case to treat it as though it were an in out parameter. + +end GNAT.Spitbol.Patterns; |