CFX_PCRegEx is a Perl-compatible regular expression (regex) parsing extension tag for the Allaire ColdFusion server software, originally written by Rick Osborne in September of 2000. It is designed to be a replacement for (or supplement to) the existing ColdFusion regex capabilities. Both Find() and Replace() capabilities are available, including backreferences, POSIX expressions, and just about anything else you can do with Perl regexes.
The tag uses the PCRE (Perl Compatible Regular Expression) engine, which was written by Philip Hazel and is copyright by the University of Cambridge, England. For more information on the PCRE engine, see <ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/>.
<CFX_PCREGEX SUBJECT="#Subject#" PATTERN="#Pattern#" RESULTS="#ResultVar#" OFFSET="1" COUNT="1" MAXSUBS="ALL" DEBUG="True">
<CFX_PCREGEX SUBJECT="#Subject#" PATTERN="#Pattern#" RESULTS="#ResultVar#" REPLACE="Replacement" COUNT="ALL">
|Subject||Yes||The string to be matched against.|
|Pattern||Yes||The regular expression pattern to use on Subject.|
|Results||Yes||The name of the variable to store the results in.|
|Offset||No||1||The offset inside the Subject string at which to start the regex match.|
|Replace||No||The replacement string that will be substituted into the Subject string at the matched locations.|
|Count||No||1||The maximum number of match attempts made for the Pattern.|
|MaxSubs||No||ALL||The maximum number of subexpression matches to return.|
|Debug||No||Display informative debugging information. (Only valid with the Debug version of the tag. The Release version will not output any debugging information.)|
|PCRegEx.Time||The amount of time taken to process the regex. (This does not include tag load/unload time.)|
|PCRegEx.Message||Error message (if any) for the regex.|
|PCRegEx.Offset||The character offset in the Subject where the error (if any) occurred. This is 0 for no error, and -1 for No Match.|
|PCRegEx.Version||Version of the tag.|
|PCRegEx.PCREVersion||Version of the PCRE engine used by the tag.|
|PCRegEx.PCRELicense||License information for the PCRE engine.|
|PCRegEx.PCREURL||URL for more information about the PCRE engine.|
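The status variables above can be checked right after the tag runs. The following is a minimal sketch, not taken from the distribution: the subject string, the pattern, and the variable names subject and hits are made up for illustration, and it assumes PCRegEx.Offset is 0 whenever a match was found, as the table suggests.

```cfml
<!--- Hypothetical subject and pattern, for illustration only --->
<CFSET subject = "Order 1234 shipped">
<CFX_PCREGEX SUBJECT="#subject#" PATTERN="\d+" RESULTS="hits">
<CFIF PCRegEx.Offset EQ 0>
  <!--- 0 means no error, so the result query is assumed to hold the match --->
  <CFOUTPUT>Found #Mid(subject, hits.Pos, hits.Len)# (time: #PCRegEx.Time#)<BR></CFOUTPUT>
<CFELSEIF PCRegEx.Offset EQ -1>
  <!--- -1 means the pattern was valid but did not match --->
  <CFOUTPUT>No match.<BR></CFOUTPUT>
<CFELSE>
  <!--- Any other value is the offset at which the error occurred --->
  <CFOUTPUT>Error at offset #PCRegEx.Offset#: #PCRegEx.Message#<BR></CFOUTPUT>
</CFIF>
<CFOUTPUT>Tag version #PCRegEx.Version#, PCRE version #PCRegEx.PCREVersion#</CFOUTPUT>
```

Checking the offset first avoids touching the result variable when nothing was matched, since this help file does not say what the result variable holds in that case.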
The variable specified in the Results attribute will be set differently depending on the mode of the tag. If a Replace attribute is found, the tag will go into Replace mode; otherwise it stays in Find mode, which is the default.
In Find mode, the result variable will be set to a query with the following columns: Match, Sub, Pos, Len. The Match and Sub columns are only useful when the Count attribute is set to something greater than 1. The Match column contains the number of the matched expression (starting at 1). The Sub column contains the number of the matched subexpression for the current Match. The Pos column is the position in the Subject string where the matching subexpression starts, and the Len column holds the length. When Pos equals 0, the subexpression was not matched. (This is different from the way CF handles subexpressions, as it simply collapses the result set to eliminate unmatched subexpressions.) When Sub equals 0, the Pos and Len represent the entire matched expression. The RecordCount for a given result set should equal the number of matches multiplied by one more than the number of subexpressions per match. (RecordCount = Matches * (Subexpressions + 1))
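To make the row layout concrete, here is a small sketch that walks a Find-mode result set. The subject string and the names subject and pairs are hypothetical. By the formula above, two matches with two subexpressions each should give 2 * (2 + 1) = 6 rows.

```cfml
<!--- Hypothetical subject: two name=value pairs --->
<CFSET subject = "width=640 height=480">
<CFX_PCREGEX SUBJECT="#subject#" PATTERN="(\w+)=(\d+)" RESULTS="pairs" COUNT="ALL">
<CFOUTPUT>Rows returned: #pairs.RecordCount#<BR></CFOUTPUT>
<CFOUTPUT QUERY="pairs">
  <CFIF Pos GT 0>
    <!--- Sub 0 is the whole match; Sub 1 and 2 are the captured groups --->
    Match #Match#, Sub #Sub#: #Mid(subject, Pos, Len)#<BR>
  <CFELSE>
    <!--- Pos of 0 marks a subexpression that did not take part in the match --->
    Match #Match#, Sub #Sub#: (not matched)<BR>
  </CFIF>
</CFOUTPUT>
```

The Pos check matters for patterns with optional groups, because the tag keeps a row for an unmatched subexpression instead of dropping it.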
If you are stuck using a CF-style regex that captures subexpressions that you aren't going to use, you will speed up the execution considerably by setting the MaxSubs attribute to 0.
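A rough sketch of that advice follows. It assumes that rows with Sub equal to 0 (the whole match) are still returned when MaxSubs is 0, which the text implies but does not state outright. The subject string and variable names are made up.

```cfml
<!--- Hypothetical subject; the pattern captures two groups we do not need --->
<CFSET subject = "width=640 height=480">
<CFX_PCREGEX SUBJECT="#subject#" PATTERN="(\w+)=(\d+)" RESULTS="pairs" COUNT="ALL" MAXSUBS="0">
<!--- Assumption: only the whole-match rows (Sub = 0) come back --->
<CFOUTPUT QUERY="pairs">#Mid(subject, Pos, Len)#<BR></CFOUTPUT>
```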
For any positive result, a few shortcuts can be used. Result.Pos and Result.Len signify the first matched expression. Result.Match[RecordCount] is the total number of matches. Result.Sub[RecordCount] is the number of subexpressions returned for each match. (Remember that this does not include the 0th subexpression, which is the entire match.)
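The shortcuts might be used like this. It is only a sketch: the names data and words are illustrative, and the empty-message test for success is borrowed from the examples in this help file.

```cfml
<CFSET data = "Some BIG string">
<!--- One subexpression per match: the first letter of each word --->
<CFX_PCREGEX SUBJECT="#data#" PATTERN="(\w)\w*" RESULTS="words" COUNT="ALL">
<CFIF PCREGEX.MESSAGE IS "">
  <CFOUTPUT>
    <!--- Outside a query loop, words.Pos and words.Len refer to the first row --->
    First match: #Mid(data, words.Pos, words.Len)#<BR>
    <!--- The last row carries the highest Match and Sub numbers --->
    Total matches: #words.Match[words.RecordCount]#<BR>
    Subexpressions per match: #words.Sub[words.RecordCount]#<BR>
  </CFOUTPUT>
</CFIF>
```

Going by the description above, this should report Some as the first match, 3 matches, and 1 subexpression per match.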
In Replace mode the result variable will always be set to the resultant string. No information about the number of matches or anything like that is set. All you get is the resultant string.
Backreferences to subexpressions can be used in the Replace string, just as with the REReplace() and REReplaceNoCase() standard CFML functions, with one addition: the backreference \0 will return the entire matching expression. (Like $& in Perl.) The backref parser tries to be smart and outguess clumsy coders (and be efficient). A pass is made over the Replace string to see if there are any actual backrefs being made. It looks for something akin to "\\[[:digit:]][[:digit:]]?"; that is, a backslash followed by one or two digits. If such a backref is found, then the engine will try to interpolate the Replace string for each matching expression. In such a case, you must escape any backslashes that you want to use as actual backslashes. If no valid backrefs are found, then you do not need to escape your backslashes. For example, a Replace string of "\1\\\2" would be interpolated, while "\a\\b" would not. In the first case the double backslash is an escaped backslash and yields a single backslash; in the second case nothing is unescaped and the string is used as written.
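A short sketch of the backreference rules, with hypothetical subject strings and result names. The expected results in the comments are derived from the rules described above, not from a test run.

```cfml
<!--- \0 stands for the whole match, so each word is wrapped in brackets --->
<CFX_PCREGEX SUBJECT="Some BIG string" PATTERN="\w+" REPLACE="[\0]" RESULTS="bracketed" COUNT="ALL">
<CFOUTPUT>#bracketed#<BR></CFOUTPUT>
<!--- Should see: [Some] [BIG] [string] --->

<!--- Backrefs are present, so a literal backslash must be written as \\ --->
<CFX_PCREGEX SUBJECT="docs/readme" PATTERN="(\w+)/(\w+)" REPLACE="\1\\\2" RESULTS="winpath">
<CFOUTPUT>#winpath#<BR></CFOUTPUT>
<!--- Should see: docs\readme --->

<!--- No backrefs here, so a single backslash is used as written --->
<CFX_PCREGEX SUBJECT="docs/readme" PATTERN="/" REPLACE="\" RESULTS="slashed">
<CFOUTPUT>#slashed#<BR></CFOUTPUT>
<!--- Should see: docs\readme --->
```

The second and third calls give the same result by different routes, which shows why the escaping rule depends on whether the Replace string contains a backref at all.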
<!--- From the Allaire book --->
<CFSET data="Some BIG string">
<CFX_PCREGEX SUBJECT="#data#" PATTERN=" [A-Z]+ " RESULTS="bigstring">
<CFIF PCREGEX.MESSAGE IS "">
  <CFOUTPUT>Match found at #bigstring.pos# : #Mid(data,bigstring.pos,bigstring.len)#</CFOUTPUT>
<CFELSE>
  <CFOUTPUT>There was an error: #PCREGEX.MESSAGE#</CFOUTPUT>
</CFIF>
<!--- Should see: Match found at 5 : BIG --->

<!--- Find all of the words, like split() in Perl --->
<CFX_PCREGEX SUBJECT="#data#" PATTERN="\w+" RESULTS="words" COUNT="ALL">
<CFOUTPUT QUERY="words">#Mid(data,Pos,Len)#<BR></CFOUTPUT>
<!--- Should see: Some<BR>BIG<BR>string --->

<!--- From the Allaire book --->
<CFX_PCREGEX SUBJECT="Allaire's Web Site" PATTERN="[[:space:]]" REPLACE="*" RESULTS="starred" COUNT="ALL">
<CFOUTPUT>#starred#</CFOUTPUT>
<!--- Should see: Allaire's*Web*Site --->

<!--- From the Allaire book --->
<CFX_PCREGEX SUBJECT="There is is coffee in the the kitchen" PATTERN="([A-Za-z]+)[ ]+\1" REPLACE="*" RESULTS="starred" COUNT="ALL">
<CFOUTPUT>#starred#</CFOUTPUT>
<!--- Should see: There * coffee in * kitchen --->

<!--- From the Allaire book --->
<CFX_PCREGEX SUBJECT="There is is a cat in in the kitchen" PATTERN="([A-Za-z]+)[ ]+\1" REPLACE="\1" RESULTS="onedupe">
<CFOUTPUT>#onedupe#</CFOUTPUT>
<!--- Should see: There is a cat in in the kitchen --->

<!--- From the Allaire book --->
<CFX_PCREGEX SUBJECT="There is is a cat in in the kitchen" PATTERN="([A-Za-z]+)[ ]+\1" REPLACE="\1" RESULTS="nodupes" COUNT="ALL">
<CFOUTPUT>#nodupes#</CFOUTPUT>
<!--- Should see: There is a cat in the kitchen --->
Tag installation is just like any other tag installation. See the Allaire reference material for details. The distribution for this program should have come with two DLLs: a Debug and a Release version. Both DLLs have the same functionality, with the exception that the Debug version is compiled with debugging information, while the Release version has none of this and is optimized for speed.
This program was originally written by Rick Osborne. All questions or comments should be directed to him at <firstname.lastname@example.org>.
The primary distribution URL for this program is <http://www.rixsoft.com/ColdFusion/CFX/PCRegEx/>. Latest versions will be kept at that URL, so if you did not obtain this program from that URL, please check for a newer version. This help file should be included with every distribution, along with the executables (DLL), source (C++), and test file (CFM). If any parts of this distribution are missing, please visit the preceding URL for a full distribution.
Note: The PCRE source code is not distributed with the source code for this DLL. You must obtain the PCRE source code separately if you want to manually compile the code for this DLL.
This program is being released under the same license as the PCRE engine. Please see the next section for details.
PCRE is a library of functions to support regular expressions whose syntax and semantics are as close as possible to those of the Perl 5 language.
Written by: Philip Hazel <email@example.com>
University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.
Copyright (c) 1997-2000 University of Cambridge
Permission is granted to anyone to use this software for any purpose on any computer system, and to redistribute it freely, subject to the following restrictions:
Regular expression support is provided by the PCRE library package, which is open source software, written by Philip Hazel, and copyright by the University of Cambridge, England.
somewhere reasonably visible in your documentation and in any relevant files or online help data or similar. A reference to the ftp site for the source, that is, to
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
should also be given in the documentation.
Last Updated 2000-10-11 by Rick Osborne