summaryrefslogtreecommitdiff
path: root/reference/C/CONTRIB/SNIP/match.doc
diff options
context:
space:
mode:
Diffstat (limited to 'reference/C/CONTRIB/SNIP/match.doc')
-rwxr-xr-xreference/C/CONTRIB/SNIP/match.doc126
1 files changed, 126 insertions, 0 deletions
diff --git a/reference/C/CONTRIB/SNIP/match.doc b/reference/C/CONTRIB/SNIP/match.doc
new file mode 100755
index 0000000..96d783b
--- /dev/null
+++ b/reference/C/CONTRIB/SNIP/match.doc
@@ -0,0 +1,126 @@
+
+ REGEX Globber (Wild Card Matching)
+
+ A *IX SH style pattern matcher written in C
+ V1.10 Dedicated to the Public Domain
+
+ March 12, 1991
+ J. Kercheval
+ [72450,3702] -- johnk@wrq.com
+
+
+
+
+*IX SH style Regular Expressions
+================================
+
+The *IX command SH is a working shell similar in feel to the MSDOS
+shell COMMAND.COM. In point of fact much of what we see in our
+familiar DOS PROMPT was gleaned from the early UNIX shells available
+for many of machines the people involved in the computing arena had
+at the time of the development of DOS and it's much maligned
+precursor CP/M (although the UNIX shells were and are much more
+flexible and powerful then those on the current flock of micro
+machines). The designers of DOS and CP/M did some fairly strange
+things with their command processor and OS. One of those things was
+to only selectively adopt the regular expressions allowed within the
+*IX shells. Only '?' and '*' were allowed in filenames and even with
+these the '*' was allowed only at the end of a pattern and in fact
+when used to specify the filename the '*' did not apply to extension.
+This gave rise to the all too common expression "*.*".
+
+REGEX Globber is a SH pattern matcher. This allows such
+specifications as *75.zip or * (equivelant to *.* in DOS lingo).
+Expressions such as [a-e]*t would fit the name "apple.crt" or
+"catspaw.bat" or "elegant". This allows considerably wider
+flexibility in file specification, general parsing or any other
+circumstance in which this type of pattern matching is wanted.
+
+A match would mean that the entire string TEXT is used up in matching
+the PATTERN and conversely the matched TEXT uses up the entire
+PATTERN.
+
+In the specified pattern string:
+ `*' matches any sequence of characters (zero or more)
+ `?' matches any character
+ `\' suppresses syntactic significance of a special character
+ [SET] matches any character in the specified set,
+ [!SET] or [^SET] matches any character not in the specified set.
+
+A set is composed of characters or ranges; a range looks like
+'character hyphen character' (as in 0-9 or A-Z). [0-9a-zA-Z_] is the
+minimal set of characters allowed in the [..] pattern construct.
+Other characters are allowed (ie. 8 bit characters) if your system
+will support them (it almost certainly will).
+
+To suppress the special syntactic significance of any of `[]*?!^-\',
+and match the character exactly, precede it with a `\'.
+
+To view several examples of good and bad patterns and text see the
+output of MATCHTST.BAT
+
+
+
+MATCH() and MATCHE()
+====================
+
+The match module as written has two parsing routines, one is matche()
+and the other is match(). Since match() is a call to matche() which
+simply has its output mapped to a BOOLEAN value (ie TRUE if pattern
+matches or FALSE otherwise), I will concentrate my explanations here
+on matche().
+
+The purpose of matche() is to match a pattern against a string of
+text (usually a file name or specification). The match routine has
+extensive pattern validity checking built into it as part of the
+parser and allows for a robust pattern match.
+
+The parser gives an error code on return of type int. The error code
+will be one of the the following defined values (defined in match.h):
+
+ MATCH_PATTERN - bad pattern or misformed pattern
+ MATCH_LITERAL - match failed on character match (standard
+ character)
+ MATCH_RANGE - match failure on character range ([..] construct)
+ MATCH_ABORT - premature end of text string (pattern longer
+ than text string)
+ MATCH_END - premature end of pattern string (text longer
+ than pattern called for)
+ MATCH_VALID - valid match using pattern
+
+The functions are declared as follows:
+
+ BOOLEAN match (char *pattern, char *text);
+
+ int matche(register char *pattern, register char *text);
+
+
+
+IS_VALID_PATTERN() and IS_PATTERN()
+===================================
+
+There are two routines for determining properties of a pattern
+string. The first, is_pattern(), is designed simply to determine if
+some character exists within the text which is consistent with a SH
+regular expression (this function returns TRUE if so and FALSE if
+not). The second, is_valid_pattern() is designed to check the
+validity of a given pattern string (TRUE return if valid, FALSE if
+not). By 'validity', I mean well formed or syntactically correct.
+
+In addition, is_valid_pattern() has as one of it's parameters a
+return code for determining the type of error found in the pattern if
+one exists. The error codes are as follows and defined in match.h:
+
+ PATTERN_VALID - pattern is well formed
+ PATTERN_ESC - pattern has invalid literal escape ('\' at end of
+ pattern)
+ PATTERN_RANGE - [..] construct has a no end range in a '-' pair
+ (ie [a-])
+ PATTERN_CLOSE - [..] construct has no end bracket (ie [abc-g )
+ PATTERN_EMPTY - [..] construct is empty (ie [])
+
+The functions are declared as follows:
+
+ BOOLEAN is_valid_pattern (char *pattern, int *error_type);
+
+ BOOLEAN is_pattern (char *pattern);