REGEXREPLACE(regex, text, replacement [, NOCASE])
regex | A standard Perl regular expression. |
text | The text to parse. |
replacement | The replacement text. In this string, $0 refers to the substring that matched the regex pattern, and $1, $2, $3... refer to the substrings matched by the first, second, third... groups in the pattern. |
NOCASE | Optional. Specifies a case-insensitive search. |
Return: | REGEXREPLACE returns a single value. |
The REGEXREPLACE function searches the text for matches of the regex and replaces each match with the replacement string. The regex must be a standard Perl regular expression. We use third-party libraries to support this. For non-Unicode text, see the Boost docs at http://www.boost.org/doc/libs/1_58_0/libs/regex/doc/html/index.html. Note that the version of the Boost library may vary depending on your distro. For Unicode text, see the ICU docs, specifically the sections 'Regular Expression Metacharacters' and 'Regular Expression Operators' at http://userguide.icu-project.org/strings/regexp and the links from there, in particular the section 'UnicodeSet patterns' at http://userguide.icu-project.org/strings/unicodeset. We use version 2.6, which should support all listed features.
Example:
REGEXREPLACE('(.a)t', 'the cat sat on the mat', '$1p');    //ASCII
REGEXREPLACE(u'(.a)t', u'the cat sat on the mat', u'$1p'); //UNICODE
// both of these examples return 'the cap sap on the map'

inrec := {STRING10 str, UNICODE10 ustr};
inset := DATASET([{'She', u'Eins'}, {'Sells', u'Zwei'},
                  {'Sea', u'Drei'}, {'Shells', u'Vier'}], inrec);
outrec := {STRING10 orig, STRING10 withcase, STRING10 wocase,
           UNICODE10 uorig, UNICODE10 uwithcase, UNICODE10 uwocase};

outrec trans(inrec l) := TRANSFORM
  SELF.orig := l.str;
  SELF.withcase := REGEXREPLACE('s', l.str, 'f');
  SELF.wocase := REGEXREPLACE('s', l.str, 'f', NOCASE);
  SELF.uorig := l.ustr;
  SELF.uwithcase := REGEXREPLACE(u'e', l.ustr, u'\u00EB');
  SELF.uwocase := REGEXREPLACE(u'e', l.ustr, u'\u00EB', NOCASE);
END;

OUTPUT(PROJECT(inset, trans(LEFT)));

/* the result set is:
orig    withcase  wocase  uorig  uwithcase    uwocase
She     She       fhe     Eins   Eins         \xc3\xabins
Sells   Sellf     fellf   Zwei   Zw\xc3\xabi  Zw\xc3\xabi
Sea     Sea       fea     Drei   Dr\xc3\xabi  Dr\xc3\xabi
Shells  Shellf    fhellf  Vier   Vi\xc3\xabr  Vi\xc3\xabr
*/
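
The $0 and numbered group references described for the replacement parameter can be sketched with two further calls. This is a minimal illustration only: the input strings and the names wrapped and swapped are made up for this example, and the expected values in the comments assume the usual Perl-style replacement behaviour described above.

```ecl
// Hypothetical inputs chosen for illustration; not part of the original example.

// $0 stands for the whole match: put brackets around every run of digits.
wrapped := REGEXREPLACE('[0-9]+', 'room 42, floor 7', '[$0]');

// $1 and $2 stand for the first and second groups: swap the two words.
swapped := REGEXREPLACE('([a-z]+) ([a-z]+)', 'hello world', '$2 $1');

OUTPUT(wrapped); // expected: 'room [42], floor [7]'
OUTPUT(swapped); // expected: 'world hello'
```

As in the 'cat sat ... mat' example, every match in the text is replaced, not just the first one.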