
REGEXREPLACE

REGEXREPLACE(regex, text, replacement [, NOCASE])

regex         A standard Perl regular expression.
text          The text to parse.
replacement   The replacement text. In this string, $0 refers to the substring that matched the regex pattern, and $1, $2, $3, ... refer to the first, second, third, ... capture groups in the pattern.
NOCASE        Optional. Specifies a case-insensitive search.
Return:       REGEXREPLACE returns a single value.
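The short sketch below exercises the replacement-string references described above: '$2 $1' swaps two capture groups, and '$0' re-emits the whole match. The attribute names swapped and tagged are illustrative only, and the pattern assumes that backslashes must be doubled inside ECL string literals.

```ecl
// Hypothetical attribute names; each call replaces every match in the text.
// Swap two words by reordering capture groups 1 and 2.
swapped := REGEXREPLACE('(\\w+) (\\w+)', 'hello world', '$2 $1');
// Wrap each run of digits in angle brackets; $0 is the entire match.
tagged := REGEXREPLACE('[0-9]+', 'order 42 of 7', '<$0>');
OUTPUT(swapped);  // 'world hello'
OUTPUT(tagged);   // 'order <42> of <7>'
```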

The REGEXREPLACE function uses the regex to find every match in the text and replaces each match with the replacement string. The regex must be a standard Perl regular expression.

Pattern matching is provided by third-party libraries. For non-Unicode text, see the Boost.Regex documentation at http://www.boost.org/doc/libs/1_58_0/libs/regex/doc/html/index.html; note that the version of the Boost library may vary depending on your distro. For Unicode text, see the ICU documentation, specifically the sections 'Regular Expression Metacharacters' and 'Regular Expression Operators' at http://userguide.icu-project.org/strings/regexp and the pages linked from there, in particular the section 'UnicodeSet patterns' at http://userguide.icu-project.org/strings/unicodeset. We use ICU version 2.6, which should support all of the listed features.
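Because Unicode patterns are handled by ICU, ICU property classes such as \p{Lu} (any uppercase letter) should be usable in UNICODE patterns. A minimal sketch, assuming ICU's \p{...} syntax is passed through unchanged and that backslashes are doubled in the ECL literal; the attribute name masked is illustrative only.

```ecl
// Hypothetical attribute name. The input is u'\u00C9t\u00E9 Paris' ("Été Paris").
// \p{Lu} matches any uppercase letter, including the non-ASCII 'É'.
masked := REGEXREPLACE(u'\\p{Lu}', u'\u00C9t\u00E9 Paris', u'*');
OUTPUT(masked);  // '*té *aris'
```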

Example:

REGEXREPLACE('(.a)t', 'the cat sat on the mat', '$1p');     //ASCII
REGEXREPLACE(u'(.a)t', u'the cat sat on the mat', u'$1p');  //UNICODE
// both of these examples return 'the cap sap on the map'

inrec := {STRING10 str, UNICODE10 ustr};
inset := DATASET([{'She', u'Eins'}, {'Sells', u'Zwei'},
                  {'Sea', u'Drei'}, {'Shells', u'Vier'}], inrec);
outrec := {STRING10 orig, STRING10 withcase, STRING10 wocase,
           UNICODE10 uorig, UNICODE10 uwithcase, UNICODE10 uwocase};

outrec trans(inrec l) := TRANSFORM
  SELF.orig := l.str;
  // case-sensitive: only lowercase 's' is replaced
  SELF.withcase := REGEXREPLACE('s', l.str, 'f');
  // NOCASE: both 'S' and 's' are replaced
  SELF.wocase := REGEXREPLACE('s', l.str, 'f', NOCASE);
  SELF.uorig := l.ustr;
  // replace lowercase 'e' with U+00EB (e with diaeresis)
  SELF.uwithcase := REGEXREPLACE(u'e', l.ustr, u'\u00EB');
  SELF.uwocase := REGEXREPLACE(u'e', l.ustr, u'\u00EB', NOCASE);
END;
OUTPUT(PROJECT(inset, trans(LEFT)));

/* the result set is:
orig     withcase  wocase   uorig  uwithcase  uwocase
She      She       fhe      Eins   Eins       ëins
Sells    Sellf     fellf    Zwei   Zwëi       Zwëi
Sea      Sea       fea      Drei   Drëi       Drëi
Shells   Shellf    fhellf   Vier   Viër       Viër   */

See Also: PARSE, REGEXFIND