
NLP RECORD and TRANSFORM Functions

The following functions are used in field definition expressions within the RECORD structure or TRANSFORM function that defines the result set from the PARSE function:

MATCHED( [ patternreference ] )

MATCHED returns TRUE if the patternreference found a match, and FALSE otherwise. If the patternreference is omitted, it indicates whether the entire pattern matched (for use with the NOT MATCHED option).
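As a minimal sketch of MATCHED used together with the NOT MATCHED option, the following assumes a hypothetical inline dataset and a hypothetical digits pattern, neither of which comes from the reference text. With NOT MATCHED, a row should also be generated for input that contains no match, and MATCHED then reports FALSE for it.

```ecl
// Hypothetical input: one free-text line per record (illustration only)
ds := DATASET([{'call 555 now'}, {'no digits here'}], {STRING line});

// Hypothetical pattern: one or more decimal digits
PATTERN digits := PATTERN('[0-9]')+;

outRec := RECORD
  BOOLEAN  found := MATCHED(digits);   // whether digits found a match
  STRING10 txt   := MATCHTEXT(digits); // blank when nothing matched
END;

// NOT MATCHED asks PARSE to also emit a row when no match is found
OUTPUT(PARSE(ds, line, digits, outRec, FIRST, NOT MATCHED));
```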

MATCHTEXT [ (patternreference) ]

MATCHTEXT returns the matching ASCII text the patternreference found, or blank if not found. If the patternreference is omitted, MATCHTEXT returns all matching text.

MATCHUNICODE(patternreference)

MATCHUNICODE returns the matching Unicode text the patternreference found, or blank if not found.

MATCHLENGTH(patternreference)

MATCHLENGTH returns the number of characters in the matching text the patternreference found, or 0 if not found.

MATCHPOSITION(patternreference)

MATCHPOSITION returns the position within the text of the first character in the matching text the patternreference found, or 0 if not found.
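The text, length, and position functions are often used side by side. The sketch below assumes a hypothetical inline dataset and a hypothetical digits pattern (both invented for illustration) and reports all three values for each match found while scanning the line.

```ecl
// Hypothetical input (illustration only)
ds := DATASET([{'Room 12 and room 345'}], {STRING line});

// Hypothetical pattern: one or more decimal digits
PATTERN digits := PATTERN('[0-9]')+;

outRec := RECORD
  STRING10  txt := MATCHTEXT(digits);     // the matching text
  UNSIGNED4 len := MATCHLENGTH(digits);   // number of characters matched
  UNSIGNED4 pos := MATCHPOSITION(digits); // position of the first matched character
END;

// SCAN continues searching from the end of each match
OUTPUT(PARSE(ds, line, digits, outRec, SCAN));
```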

MATCHROW(patternreference)

MATCHROW returns the entire row of the matching text the patternreference found for a RULE (valid only when the PARSE option is used on the PARSE function). This may be used to fully qualify a field in the RECORD structure of the row.

Pattern References

The patternreference parameter to these functions is a slash-delimited (/) list of previously defined PATTERN, TOKEN, or RULE attributes with or without an instance number appended in square brackets.

If an instance number is supplied, the patternreference matches that particular occurrence; otherwise, it matches any occurrence. The patternreference provides a path through the regular expression grammar to a particular result. The path to a particular attribute can be either fully or partially specified.
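The difference between a path and an instance number can be sketched as follows. The dataset and the rankAndAge pattern are hypothetical and exist only for this illustration. The number pattern occurs twice in the grammar, so it can be reached either through its enclosing pattern or by its occurrence number.

```ecl
// Hypothetical input (illustration only)
ds := DATASET([{'<3> (1999)'}], {STRING line});

PATTERN number     := PATTERN('[0-9]')+;
PATTERN m_rank     := '<' number '>';
PATTERN age        := '(' number ')';
PATTERN rankAndAge := m_rank ' ' age;   // hypothetical top-level pattern

outRec := RECORD
  STRING10 rank_by_path := MATCHTEXT(m_rank/number); // number within m_rank
  STRING10 age_by_path  := MATCHTEXT(age/number);    // number within age
  STRING10 first_number := MATCHTEXT(number[1]);     // 1st instance of number
  STRING10 second_number:= MATCHTEXT(number[2]);     // 2nd instance of number
END;

OUTPUT(PARSE(ds, line, rankAndAge, outRec, FIRST));
```

Paths tend to be more robust than instance numbers when the grammar is later reordered, since they do not depend on the order in which the occurrences appear.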

Example:

PATTERN ws := PATTERN('[ \t\r\n]');
PATTERN arb := PATTERN('[-!.,\t a-zA-Z0-9]')+;
PATTERN number := PATTERN('[0-9]')+;
PATTERN age := '(' number OPT('/I') ')';
PATTERN role := '[' arb ']';
PATTERN m_rank := '<' number '>';
PATTERN actor := arb OPT(ws '(I)' ws);
          
NLP_layout_actor_movie := RECORD
  STRING30 actor_name := MATCHTEXT(actor);
  STRING50 movie_name := MATCHTEXT(arb[2]); //2nd instance of arb
  UNSIGNED2 movie_year := (UNSIGNED)MATCHTEXT(age/number);
                         //number within age
  STRING20 movie_role := MATCHTEXT(role/arb); //arb within role
  UNSIGNED1 cast_rank := (UNSIGNED)MATCHTEXT(m_rank/number);
END;
          
// A second, separate example: this demonstrates the use of productions
// in PARSE code (only supported in the Tomita version of PARSE).
PATTERN ws := [' ','\t'];
TOKEN number := PATTERN('[0-9]+');
TOKEN plus := '+';
TOKEN minus := '-';

attrRec := RECORD
  INTEGER val;
END;

RULE(attrRec) e0 :=
          '(' USE(attrRec,expr)? ')' |
          number TRANSFORM(attrRec, SELF.val := (INTEGER)$1;) |
          '-' SELF TRANSFORM(attrRec, SELF.val := -$2.val;);
RULE(attrRec) e1 :=
          e0 |
          SELF '*' e0 TRANSFORM(attrRec, SELF.val := $1.val * $3.val;) |
          USE(attrRec, e1) '/' e0
               TRANSFORM(attrRec, SELF.val := $1.val / $3.val;);
RULE(attrRec) e2 :=
          e1 |
          SELF plus e1 TRANSFORM(attrRec, SELF.val := $1.val + $3.val;) |
          SELF minus e1 TRANSFORM(attrRec, SELF.val := $1.val - $3.val;);
RULE(attrRec) expr := e2;
 
infile := DATASET([{'1+2*3'},{'1+2*z'},{'1+2+(3+4)*4/2'}],
          { STRING line });
resultsRec := RECORD
  RECORDOF(infile);
  attrRec;
  STRING exprText;
  INTEGER value3;
END;

resultsRec extractResults(infile l, attrRec attr) := TRANSFORM
  SELF := l;
  SELF := attr;
  SELF.exprText := MATCHTEXT;
  SELF.value3 := MATCHROW(e0[3]).val;
END;

OUTPUT(PARSE(infile,line,expr,extractResults(LEFT, $1),
            FIRST,WHOLE,PARSE,SKIP(ws)));

See Also: PARSE, RECORD Structure, TRANSFORM Structure