There are three value types specifically designed and required to define parsing pattern attributes:
PATTERN patternid := parsepattern;
patternid | The attribute name of the pattern. |
parsepattern | The pattern, very similar to regular expressions. This may contain other previously defined PATTERN attributes. See ParsePattern Definitions below. |
The PATTERN value type defines a parsing expression very similar to regular expression patterns.
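The way PATTERN definitions build on one another can be sketched as follows. This is a minimal illustration only: the attribute names (Digit, Digits, Decimal, inRec, inDs, outRec) and the sample text are hypothetical and do not come from this reference.

```ecl
inRec := RECORD
  STRING50 line;
END;
inDs := DATASET([{'pi is 3.14 and e is 2.71'}], inRec);

PATTERN Digit   := PATTERN('[0-9]');      // regular-expression style character class
PATTERN Digits  := Digit+;                // one or more repetitions of Digit
PATTERN Decimal := Digits '.' Digits;     // reuses the previously defined Digits pattern

outRec := RECORD
  found := MATCHTEXT(Decimal);            // the text matched by Decimal
END;

OUTPUT(PARSE(inDs, line, Decimal, outRec, FIRST));
```

Because these are PATTERN attributes, the parser is free to backtrack and try other lengths for Digits while searching for a match.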
TOKEN tokenid := parsepattern;
tokenid | The attribute name of the token. |
parsepattern | The token pattern, very similar to regular expressions. This may contain PATTERN attributes but no TOKEN or RULE attributes. See ParsePattern Definitions below. |
The TOKEN value type defines a parsing expression very similar to a PATTERN, but once matched, the parser doesn't backtrack to find alternative matches as it would with PATTERN.
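A TOKEN is declared in the same way as a PATTERN. The following is a minimal sketch, assuming hypothetical names (Digit, Number, Reading, inRec, inDs, outRec) and sample text; the RULE wrapper is included only to give PARSE a top-level pattern.

```ecl
inRec := RECORD
  STRING50 line;
END;
inDs := DATASET([{'reading 1024'}], inRec);

PATTERN Digit  := PATTERN('[0-9]');
// A TOKEN may be built from PATTERN attributes, but not from other TOKENs or RULEs.
// Once Number has matched, the parser does not go back to try a different match for it.
TOKEN Number   := Digit+;

RULE Reading   := Number;

outRec := RECORD
  found := MATCHTEXT(Number);
END;

OUTPUT(PARSE(inDs, line, Reading, outRec, FIRST));
```

Choosing TOKEN over PATTERN is a trade-off: it limits the alternatives the parser explores, at the cost of missing matches that would only be found by backtracking.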
RULE [ ( recstruct ) ] ruleid := rulePattern;
recstruct | Optional. The attribute name of a RECORD structure attribute (valid only when the PARSE option is used on the PARSE function). |
ruleid | The attribute name of the rule. |
rulePattern | The rule pattern, very similar to regular expressions. This may contain PATTERN attributes, TOKEN attributes, or RULE attributes. See ParsePattern Definitions below. |
The RULE value type defines a parsing expression containing combinations of TOKENs. If a RULE definition contains a PATTERN, that PATTERN is implicitly converted to a TOKEN. As with PATTERN, the parser backtracks after a match to find alternative RULE matches.
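A RULE that combines several TOKENs might look like the sketch below. The names (Word, Number, Setting, settingName, settingValue, inRec, inDs) and the input text are hypothetical, chosen only for illustration.

```ecl
inRec := RECORD
  STRING50 line;
END;
inDs := DATASET([{'width 42'}], inRec);

PATTERN ws   := PATTERN('[ \t\r\n]');
TOKEN Word   := PATTERN('[A-Za-z]+');
TOKEN Number := PATTERN('[0-9]+');

// ws is a PATTERN, so inside this RULE it is implicitly treated as a TOKEN.
RULE Setting := Word ws Number;

outRec := RECORD
  settingName  := MATCHTEXT(Word);
  settingValue := MATCHTEXT(Number);
END;

OUTPUT(PARSE(inDs, line, Setting, outRec, FIRST));
```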
If the PARSE option is present on the PARSE function (thereby implementing Tomita parsing for the operation), each alternative RULE rulePattern may have an associated TRANSFORM function. The different input patterns can be referred to using $1, $2, etc. If the pattern has an associated recstruct then $1 is a row; otherwise it is a string. Default TRANSFORM functions are created in two circumstances:
1. If there are no patterns, the default transform clears the row. For example:
RULE(myRecord) := ; //empty expression = cleared row
2. If there is only a single pattern with an associated record, and that record matches the type of the rule being defined. For example:
RULE(myRecord) e0 := '(' USE(myRecord, 'expression') ')';
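The interaction between explicit and default TRANSFORM functions can be sketched with a small additive grammar. This is an unverified illustration under stated assumptions: every name here (attrRec, number, plus, term, expr, infile, resultsRec, extractResults) is hypothetical, and the flags passed to PARSE are one plausible combination rather than a requirement.

```ecl
PATTERN ws   := [' ', '\t'];
TOKEN number := PATTERN('[0-9]+');
TOKEN plus   := '+';

attrRec := RECORD
  INTEGER val;
END;

// number has no associated record, so $1 is a string and must be cast.
RULE(attrRec) term := number TRANSFORM(attrRec, SELF.val := (INTEGER)$1;);

// The first alternative is a single pattern whose record matches attrRec,
// so it has no explicit TRANSFORM and relies on the default one.
// In the second alternative, $1 and $3 are rows of type attrRec.
RULE(attrRec) expr := term
                    | SELF plus term TRANSFORM(attrRec, SELF.val := $1.val + $3.val;);

infile := DATASET([{'1+2+3'}], {STRING line});

resultsRec := RECORD
  RECORDOF(infile);
  attrRec;
END;

resultsRec extractResults(infile l, attrRec attr) := TRANSFORM
  SELF := l;
  SELF := attr;
END;

// The PARSE option selects Tomita parsing; SKIP(ws) allows whitespace between tokens.
OUTPUT(PARSE(infile, line, expr, extractResults(LEFT, $1), FIRST, WHOLE, PARSE, SKIP(ws)));
```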
A parsepattern may contain any combination of the following elements:
Examples:
rs := RECORD
  STRING100 line;
END;
ds := DATASET([{'the fox; and the hen'}], rs);
PATTERN ws := PATTERN('[ \t\r\n]');
PATTERN Alpha := PATTERN('[A-Za-z]');
PATTERN Word := Alpha+;
PATTERN Article := ['the', 'A'];
// PENALTY(1) attaches a cost to this match, which the BEST option takes into account
PATTERN JustAWord := Word PENALTY(1);
// VALIDATE rejects any Word match whose text is 'hen'
PATTERN notHen := VALIDATE(Word, MATCHTEXT != 'hen');
PATTERN NoHenWord := notHen PENALTY(1);
// Each rule accepts either a penalized single word or an article followed by a word
RULE NounPhraseComponent1 := JustAWord | Article ws Word;
RULE NounPhraseComponent2 := NoHenWord | Article ws Word;
ps1 := RECORD
  out1 := MATCHTEXT(NounPhraseComponent1);
END;
ps2 := RECORD
  out2 := MATCHTEXT(NounPhraseComponent2);
END;
p1 := PARSE(ds, line, NounPhraseComponent1, ps1, BEST, MANY, NOCASE);
p2 := PARSE(ds, line, NounPhraseComponent2, ps2, BEST, MANY, NOCASE);
OUTPUT(p1);
OUTPUT(p2);
See Also: PARSE, RECORD Structure, TRANSFORM Structure, DATASET