author Eric Blake <ebb9@byu.net> 2009-01-06 22:03:27 -0700
committer Eric Blake <ebb9@byu.net> 2009-01-07 14:11:08 -0700
commit ae9dfa87a514d290fe349710a9f643d52856f4ba (patch)
tree 56561d37876dd6b774b9ce07e608c3df436a9093
parent e6819ca240b76700f07c31ba157f7795caada02e (diff)
download m4-ae9dfa87a514d290fe349710a9f643d52856f4ba.tar.gz
Enhance substr to support negative values.
* doc/m4.texinfo (Substr): Document new semantics, and how to simulate old.
* modules/m4.c (substr): Support negative values.
* NEWS: Document this.

Signed-off-by: Eric Blake <ebb9@byu.net>
(cherry picked from commit e9e4abba45f7e9f368cf497e14bc2ce64b867a02)
-rw-r--r-- ChangeLog 8
-rw-r--r-- NEWS 9
-rw-r--r-- doc/m4.texinfo 157
-rw-r--r-- modules/m4.c 46
4 files changed, 193 insertions, 27 deletions
diff --git a/ChangeLog b/ChangeLog
index aa41fcc3..47f1897b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2009-01-07 Eric Blake <ebb9@byu.net>
+
+ Enhance substr to support negative values.
+ * doc/m4.texinfo (Substr): Document new semantics, and how to
+ simulate old.
+ * modules/m4.c (substr): Support negative values.
+ * NEWS: Document this.
+
2009-01-05 Eric Blake <ebb9@byu.net>
Maintainer cleanups.
diff --git a/NEWS b/NEWS
index ca4e0b0b..b3396ff7 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,6 @@
GNU m4 NEWS - History of user-visible changes. -*- outline -*-
-Copyright (C) 1992, 1993, 1994, 1998, 2000, 2001, 2006, 2007, 2008 Free
-Software Foundation, Inc.
+Copyright (C) 1992, 1993, 1994, 1998, 2000, 2001, 2006, 2007, 2008, 2009
+Free Software Foundation, Inc.
* Noteworthy changes in Version 1.9b (200x-??-??) [beta]
Released by ????, based on git version 1.9a-*
@@ -242,6 +242,11 @@ promoted to 2.0.
the current expansion is nested within argument collection of another
macro. It has also been optimized for faster performance.
+** The `substr' builtin now treats negative arguments as indices relative
+ to the end of the string. The manual gives an
+ example of how to recover M4 1.4.x behavior, as well as an example of
+ simulating the new negative argument semantics with older M4.
+
** The `-d'/`--debug' command-line option now understands `-' and `+'
modifiers, the way the builtin `debugmode' has always done; this allows
`-d-V' to disable prior debug settings from the command line, similar to
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index c5e36dd5..6b515cfd 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -46,7 +46,7 @@ This manual (@value{UPDATED}) is for @acronym{GNU} M4 (version
language.
Copyright @copyright{} 1989, 1990, 1991, 1992, 1993, 1994, 1998, 1999,
-2000, 2001, 2004, 2005, 2006, 2007, 2008 Free Software Foundation, Inc.
+2000, 2001, 2004, 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
@@ -7012,12 +7012,27 @@ regexp(`GNUs not Unix', `\w\(\w+\)$', `POSIX_EXTENDED', `')
Substrings are extracted with @code{substr}:
@deffn {Builtin (m4)} substr (@var{string}, @var{from}, @ovar{length})
-Expands to the substring of @var{string}, which starts at index
-@var{from}, and extends for @var{length} characters, or to the end of
-@var{string}, if @var{length} is omitted. The starting index of a string
-is always 0. The expansion is empty if there is an error parsing
-@var{from} or @var{length}, if @var{from} is beyond the end of
-@var{string}, or if @var{length} is negative.
+Performs a substring operation on @var{string}. If @var{from} is
+non-negative, it represents the 0-based index where the substring begins.
+If @var{length} is omitted, the substring ends at the end of
+@var{string}; if it is non-negative, @var{length} is added to the starting
+index to determine the ending index.
+
+@cindex @acronym{GNU} extensions
+As a @acronym{GNU} extension, if @var{from} is negative, it is added to
+the length of @var{string} to determine the starting index; if it is
+empty, the start of the string is used. Likewise, if @var{length} is
+negative, it is added to the length of @var{string} to determine the
+ending index, and an empty @var{length} behaves like an omitted
+@var{length}. It is not an error if either of the resulting indices lies
+outside the string, but the selected substring only contains the bytes
+of @var{string} that overlap the selected indices. If the end point
+lies before the beginning point, the substring chosen is the empty
+string located at the starting index.
+
+The expansion is the selected substring, which may be empty. The
+expansion is empty and a warning is issued if @var{from} or @var{length}
+cannot be parsed.
The macro @code{substr} is recognized only with parameters.
@end deffn
@@ -7029,15 +7044,137 @@ substr(`gnus, gnats, and armadillos', `6', `5')
@result{}gnats
@end example
-Omitting @var{from} evokes a warning, but still produces output.
+Omitting @var{from} evokes a warning, but still produces output. On the
+other hand, selecting a @var{from} or @var{length} that lies beyond
+@var{string} is not a problem.
@example
substr(`abc')
@error{}m4:stdin:1: Warning: substr: too few arguments: 1 < 2
@result{}abc
-substr(`abc',)
-@error{}m4:stdin:2: Warning: substr: empty string treated as 0
+substr(`abc', `')
@result{}abc
+substr(`abc', `4')
+@result{}
+substr(`abc', `1', `4')
+@result{}bc
+@end example
+
+Using negative values for @var{from} or @var{length} is a @acronym{GNU}
+extension, useful for accessing a fixed-size tail of an
+arbitrary-length string. Prior to M4 1.6, using these values would
+silently result in the empty string. Some other implementations crash
+on negative values, and many treat an explicitly empty @var{length} as
+0, which is different from the omitted @var{length} implying the rest of
+the original @var{string}.
+
+@example
+substr(`abcde', `2', `')
+@result{}cde
+substr(`abcde', `-3')
+@result{}cde
+substr(`abcde', `', `-3')
+@result{}ab
+substr(`abcde', `-6')
+@result{}abcde
+substr(`abcde', `-6', `5')
+@result{}abcd
+substr(`abcde', `-7', `1')
+@result{}
+substr(`abcde', `1', `-2')
+@result{}bc
+substr(`abcde', `-4', `-1')
+@result{}bcd
+substr(`abcde', `4', `-3')
+@result{}
+substr(`abcdefghij', `-09', `08')
+@result{}bcdefghi
+@end example
+
+If backwards compatibility to M4 1.4.x behavior is necessary, the
+following macro is sufficient to do the job (mimicking warnings about
+empty @var{from} or @var{length} or an ignored fourth argument is left
+as an exercise to the reader).
+
+@example
+define(`substr', `ifelse(`$#', `0', ``$0'',
+ eval(`2 < $#')`$3', `1', `',
+ index(`$2$3', `-'), `-1', `builtin(`$0', `$1', `$2', `$3')')')
+@result{}
+substr(`abcde', `3')
+@result{}de
+substr(`abcde', `3', `')
+@result{}
+substr(`abcde', `-1')
+@result{}
+substr(`abcde', `1', `-1')
+@result{}
+substr(`abcde', `2', `1', `C')
+@result{}c
+@end example
+
+On the other hand, it is possible to portably emulate the @acronym{GNU}
+extension of negative @var{from} and @var{length} arguments across all
+@code{m4} implementations, albeit with a lot more overhead. This
+example uses @code{incr} and @code{decr} to normalize @samp{-08} to
+something that a later @code{eval} will treat as a decimal value, rather
+than looking like an invalid octal number, while avoiding using these
+macros on an empty string. The helper macro @code{_substr_normalize} is
+recursive, since it is easier to fix @var{length} after @var{from} has
+been normalized, with the final iteration supplying two non-negative
+arguments to the original builtin, now named @code{_substr}.
+
+@comment options: -daq -t_substr
+@example
+$ @kbd{m4 -daq -t _substr}
+define(`_substr', defn(`substr'))dnl
+define(`substr', `ifelse(`$#', `0', ``$0'',
+ `_$0(`$1', _$0_normalize(len(`$1'),
+ ifelse(`$2', `', `0', `incr(decr(`$2'))'),
+ ifelse(`$3', `', `', `incr(decr(`$3'))')))')')dnl
+define(`_substr_normalize', `ifelse(
+ eval(`$2 < 0 && $1 + $2 >= 0'), `1',
+ `$0(`$1', eval(`$1 + $2'), `$3')',
+ eval(`$2 < 0')`$3', `1', ``0', `$1'',
+ eval(`$2 < 0 && $3 - 0 >= 0 && $1 + $2 + $3 - 0 >= 0'), `1',
+ `$0(`$1', `0', eval(`$1 + $2 + $3 - 0'))',
+ eval(`$2 < 0 && $3 - 0 >= 0'), `1', ``0', `0'',
+ eval(`$2 < 0'), `1', `$0(`$1', `0', `$3')',
+ `$3', `', ``$2', `$1'',
+ eval(`$3 - 0 < 0 && $1 - $2 + $3 - 0 >= 0'), `1',
+ ``$2', eval(`$1 - $2 + $3')',
+ eval(`$3 - 0 < 0'), `1', ``$2', `0'',
+ ``$2', `$3'')')dnl
+substr(`abcde', `2', `')
+@error{}m4trace: -1- _substr(`abcde', `2', `5')
+@result{}cde
+substr(`abcde', `-3')
+@error{}m4trace: -1- _substr(`abcde', `2', `5')
+@result{}cde
+substr(`abcde', `', `-3')
+@error{}m4trace: -1- _substr(`abcde', `0', `2')
+@result{}ab
+substr(`abcde', `-6')
+@error{}m4trace: -1- _substr(`abcde', `0', `5')
+@result{}abcde
+substr(`abcde', `-6', `5')
+@error{}m4trace: -1- _substr(`abcde', `0', `4')
+@result{}abcd
+substr(`abcde', `-7', `1')
+@error{}m4trace: -1- _substr(`abcde', `0', `0')
+@result{}
+substr(`abcde', `1', `-2')
+@error{}m4trace: -1- _substr(`abcde', `1', `2')
+@result{}bc
+substr(`abcde', `-4', `-1')
+@error{}m4trace: -1- _substr(`abcde', `1', `3')
+@result{}bcd
+substr(`abcde', `4', `-3')
+@error{}m4trace: -1- _substr(`abcde', `4', `0')
+@result{}
+substr(`abcdefghij', `-09', `08')
+@error{}m4trace: -1- _substr(`abcdefghij', `1', `8')
+@result{}bcdefghi
@end example
@node Translit
diff --git a/modules/m4.c b/modules/m4.c
index f578261a..b09510c8 100644
--- a/modules/m4.c
+++ b/modules/m4.c
@@ -1,6 +1,6 @@
/* GNU m4 -- A simple macro processor
- Copyright (C) 2000, 2002, 2003, 2004, 2006, 2007, 2008 Free Software
- Foundation, Inc.
+ Copyright (C) 2000, 2002, 2003, 2004, 2006, 2007, 2008, 2009 Free
+ Software Foundation, Inc.
This file is part of GNU M4.
@@ -924,17 +924,19 @@ M4BUILTIN_HANDLER (index)
m4_shipout_int (obs, retval);
}
-/* The macro "substr" extracts substrings from the first argument, starting
- from the index given by the second argument, extending for a length
- given by the third argument. If the third argument is missing, the
- substring extends to the end of the first argument. */
+/* The macro "substr" extracts substrings from the first argument,
+ starting from the index given by the second argument, extending for
+ a length given by the third argument. If the third argument is
+ missing or empty, the substring extends to the end of the first
+ argument. As an extension, negative arguments are treated as
+ indices relative to the string length. */
M4BUILTIN_HANDLER (substr)
{
const m4_call_info *me = m4_arg_info (argv);
const char *str = M4ARG (1);
int start = 0;
+ int end;
int length;
- int avail;
if (argc <= 2)
{
@@ -942,19 +944,33 @@ M4BUILTIN_HANDLER (substr)
return;
}
- length = avail = M4ARGLEN (1);
- if (!m4_numeric_arg (context, me, M4ARG (2), &start))
+ length = M4ARGLEN (1);
+ if (!m4_arg_empty (argv, 2)
+ && !m4_numeric_arg (context, me, M4ARG (2), &start))
return;
+ if (start < 0)
+ start += length;
- if (argc >= 4 && !m4_numeric_arg (context, me, M4ARG (3), &length))
- return;
+ if (m4_arg_empty (argv, 3))
+ end = length;
+ else
+ {
+ if (!m4_numeric_arg (context, me, M4ARG (3), &end))
+ return;
+ if (end < 0)
+ end += length;
+ else
+ end += start;
+ }
- if (start < 0 || length <= 0 || start >= avail)
+ if (start < 0)
+ start = 0;
+ if (length < end)
+ end = length;
+ if (end <= start)
return;
- if (start + length > avail)
- length = avail - start;
- obstack_grow (obs, str + start, length);
+ obstack_grow (obs, str + start, end - start);
}
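
The index arithmetic introduced by this patch can be exercised outside of M4 with a small standalone sketch. The helper below is hypothetical and not part of GNU M4: the name `substr_sketch`, the `from_empty`/`len_empty` flags (standing in for an empty or omitted argument) and the caller-supplied output buffer are all assumptions made for illustration. It mirrors the start/end computation and clamping shown in the hunk above, and it ignores argument parsing, warnings and integer overflow for extreme values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of the new substr index rules; not GNU M4 code.
   Copies the selected bytes of STR into OUT (capacity OUTSIZE, including the
   terminating NUL) and returns the number of bytes selected.  FROM_EMPTY and
   LEN_EMPTY model an empty or omitted argument.  */
static size_t
substr_sketch (const char *str, bool from_empty, int from,
               bool len_empty, int len, char *out, size_t outsize)
{
  int length = (int) strlen (str);
  int start = from_empty ? 0 : from;
  int end;

  /* The caller must supply a buffer able to hold all of STR.  */
  assert (out != NULL && outsize > (size_t) length);

  /* A negative FROM counts back from the end of the string.  */
  if (start < 0)
    start += length;

  if (len_empty)
    end = length;               /* Extend to the end of the string.  */
  else if (len < 0)
    end = len + length;         /* Negative LEN: index relative to the end.  */
  else
    end = len + start;          /* Non-negative LEN: offset from the start.  */

  /* Clamp both indices to the string; an inverted range selects nothing.  */
  if (start < 0)
    start = 0;
  if (length < end)
    end = length;
  if (end <= start)
    {
      out[0] = '\0';
      return 0;
    }

  memcpy (out, str + start, (size_t) (end - start));
  out[end - start] = '\0';
  return (size_t) (end - start);
}
```

Under these assumptions, `substr_sketch ("abcde", false, -6, false, 5, buf, sizeof buf)` selects `abcd`, matching the documented `substr(`abcde', `-6', `5')` example: the start index is computed as -1 before clamping, so the end index lands at 4 rather than 5.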