Using a RunTime variable in a PATTERN
Hi,
With help from Richard I put together a RULE to parse CSV-style record input where the field separator can also appear in the data (hence the use of quotes):
```ecl
PATTERN AlphaNumeric := PATTERN('[[:alnum:]]');
PATTERN Space := ' ';
PATTERN Sep := ',';
PATTERN Punct := PATTERN('[-_+.]');
PATTERN AnyTxt := ANY*?;
PATTERN OperandText := (AlphaNumeric | Punct | Space)+;
PATTERN OperandQuotedText := '\'' AnyTxt '\'';
PATTERN OperandDQuotedText := '"' AnyTxt '"';
PATTERN Operand := OperandText | OperandQuotedText | OperandDQuotedText;
RULE cmds := (Operand Sep) | ('' Sep) | (Operand LAST);
d := DATASET([{'"It Don\'t mean a thing, if it ain\'t got that swing",\'a comma, in an operand\',1, 455445 ,,, ,,, Allan and Anna , Nina ,'}],{STRING txt});
PARSE(d,txt,cmds,
TRANSFORM({STRING txt
;UNSIGNED2 Len},
SELF.txt := MATCHTEXT(Operand);
SELF.Len :=MATCHLENGTH(Operand)),MANY MIN);
```
This works just fine, but the character used as the field separator can vary (it could be a '|', for example), so I pass the field separator as a parameter to this FUNCTION. However, the compiler barfs if I attempt to use the parameter in defining a PATTERN, e.g.:
```ecl
PATTERN Sep := pFieldDelimiter;
```
The compiler errors with:
```
Error: This expression cannot be included in a pattern (21, 30), 2285,
```
Err, any ideas?
Allan
- Allan
- Posts: 444
- Joined: Sat Oct 01, 2011 7:26 pm
Allan,
I'd suggest you try pre-defining all the most common CSV delimiters as a set of strings, then pass the one to use as its position in the set to the function, something like this:
```ecl
SetSeps := [',', '|', '\t', ':'];
PATTERN Sep := SetSeps[pFieldDelimiter];
```

HTH,
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm
Thanks Richard,
Yes, that was my first thought, but it's really sub-optimal.
This could well be rolled out across the UK and Ireland, and Sod's law says some bright spark will want to separate on something not in the list.

It's actually more complex than just having a list of separators, as you have to ensure the separator is not in the list of allowed punctuation.
Also, it offends whatever sense of beauty I might have for a language.
Cheers
Allan
Allan,
Yeah, using a FUNCTIONMACRO may just work.
HTH,
Richard
Richard,
Unfortunately FUNCTIONMACRO still complains in the same way.
I even tried doing the work in embedded C++ but hit problems there (see other tickets).
But with the grit between my teeth, I was not going to be defeated.
I got the above functionality working in standard ECL. You can split strings by a field separator defined at runtime, and it copes with quoted strings.
Is there a competition for the most impenetrable, unmaintainable ECL ever written? If there is, I would like to submit the following.
It splits a string into a DATASET of chars (a byte stream) and runs it through a finite state machine!!
```ecl
// One row per input character: Itm = position, State = FSM state after that character
RIn := {UNSIGNED4 Itm;
UNSIGNED2 State;
STRING Text};
Ind := 'Allan,and anna,,,a,,"it Don\'t mean, a thing, if it ain\'t got that swing.",\'an Inner "Double,Quote", with "another" hello, and "Another with, embedded comma"\',"a quoted, string" , trailing , stuff.';
d := DATASET(LENGTH(Ind),TRANSFORM(Rin;SELF.Itm := COUNTER;SELF.State := 0;SELF.Text := Ind[COUNTER]));
sd := SORT(d,Itm);
OUTPUT(sd,NAMED('INPUT_AS_A_BYTE_STREAM'),ALL);
// Token codes fed to the FSM
DQ := 1;   // double quote
SQ := 2;   // single quote
SEP := 3;  // field separator
REST := 4; // any other character
// States: 0 = inside an unquoted field, 2 = first character of a field after a separator,
// 1 = separator that ends a field, 3 = separator that ends an empty field,
// 4 = inside '...', 6 = opening ' directly after a separator, 5/7 = the same for "..."
FiniteStateMachine := DICTIONARY(DATASET([{0,DQ,5},{0,SQ,4},{0,SEP,1},{0,REST,0}
,{1,DQ,7},{1,SQ,6},{1,SEP,3},{1,REST,2}
,{2,DQ,5},{2,SQ,4},{2,SEP,1},{2,REST,0}
,{3,DQ,7},{3,SQ,6},{3,SEP,3},{3,REST,2}
,{4,DQ,4},{4,SQ,0},{4,SEP,4},{4,REST,4}
,{5,DQ,0},{5,SQ,5},{5,SEP,5},{5,REST,5}
,{6,DQ,4},{6,SQ,0},{6,SEP,4},{6,REST,4}
,{7,DQ,0},{7,SQ,5},{7,SEP,5},{7,REST,5}
],{UNSIGNED2 CurrentState;UNSIGNED1 Tokn;UNSIGNED2 NextState})
,{CurrentState,Tokn=> NextState});
RIn QuoteIt(RIn L,RIn R) := TRANSFORM
SELF.State:= FiniteStateMachine[L.State,CASE(R.Text[1],'"' => DQ,'\'' => SQ, ',' => SEP, REST)].NextState;
// 0,4,5: keep accumulating; 1: emit the field collected so far;
// 2,6,7: start a new field; 3: emit an empty field
SELF.Text := MAP(SELF.State IN [0,4,5] => L.Text + R.Text
,SELF.State IN [1] => L.Text
,SELF.State IN [2,6,7] => R.Text
, /* [3] */ '');
SELF := R;
END;
AllStr := ITERATE(sd,QuoteIt(LEFT,RIGHT));
OUTPUT(AllStr,NAMED('OUTPUT_FROM_FSM'),ALL);
// Rows in state 1 or 3 close a field; the last row is appended to capture the final field
Filtered := AllStr(State IN [1,3]) & AllStr[COUNT(AllStr)];
OUTPUT(Filtered,NAMED('RECORD_CUT_INTO_FIELDS'));
```
The slightly messy bit at the end (apart from all of it):
```ecl
& AllStr[COUNT(AllStr)];
```
This could be made cleaner by appending an <end of text> character to the byte stream, but again I did not want to second-guess which character to use.
Ah, I never want to do that again!
Yours
Allan
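One detail worth noting in the code above: the separator is still the literal ',' inside the CASE. A minimal sketch (not from the thread, and untested on a cluster) of classifying a character against a separator supplied at run time uses MAP with ordinary comparisons instead. `Classify` and `pSep` are hypothetical names; inside `QuoteIt` the CASE would become something like `Classify(R.Text[1], pSep)`, assuming `pSep` is a parameter of an enclosing FUNCTION.

```ecl
// Token codes, as in the FSM above
DQ   := 1;  // double quote
SQ   := 2;  // single quote
SEP  := 3;  // field separator
REST := 4;  // anything else

// Hypothetical helper: classify one character, comparing against a
// separator passed in at run time rather than a literal ','
UNSIGNED1 Classify(STRING1 c, STRING1 pSep) :=
    MAP(c = '"'  => DQ,
        c = '\'' => SQ,
        c = pSep => SEP,
        REST);

OUTPUT(Classify('|', '|'), NAMED('PIPE_WITH_PIPE_SEP'));   // 3 (SEP)
OUTPUT(Classify(',', '|'), NAMED('COMMA_WITH_PIPE_SEP'));  // 4 (REST)
```

Because MAP tests its conditions in order, a quote character passed as the separator would still be classified as a quote.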
Allan,
Here's my version:

```ecl
Ind := 'Allan,and anna,,,a,,"it Don\'t mean, a thing, if it ain\'t got that swing.",\'an Inner "Double,Quote", with "another" hello, and "Another with, embedded comma"\',"a quoted, string" , trailing , stuff.';
RIn := {UNSIGNED4 Itm;
UNSIGNED1 State;
STRING Text};
sd := DATASET(LENGTH(Ind),
TRANSFORM(Rin;
SELF.Itm := COUNTER;
SELF.State := 0;
SELF.Text := Ind[COUNTER]));
OUTPUT(sd,NAMED('INPUT_AS_A_BYTE_STREAM'),ALL);
DQ := 1;
SQ := 2;
SEP := 3;
REST := 4;
FSM := DICTIONARY(DATASET([{0,DQ,5},{0,SQ,4},{0,SEP,1},{0,REST,0}
,{1,DQ,7},{1,SQ,6},{1,SEP,3},{1,REST,2}
,{2,DQ,5},{2,SQ,4},{2,SEP,1},{2,REST,0}
,{3,DQ,7},{3,SQ,6},{3,SEP,3},{3,REST,2}
,{4,DQ,4},{4,SQ,0},{4,SEP,4},{4,REST,4}
,{5,DQ,0},{5,SQ,5},{5,SEP,5},{5,REST,5}
,{6,DQ,4},{6,SQ,0},{6,SEP,4},{6,REST,4}
,{7,DQ,0},{7,SQ,5},{7,SEP,5},{7,REST,5}
],{UNSIGNED1 CurrentState,
UNSIGNED1 Tokn,
UNSIGNED1 NextState})
,{CurrentState, Tokn => NextState});
RIn QuoteIt(RIn L,RIn R) := TRANSFORM
// TRUE for the final character of the input
LastItem := R.itm = COUNT(sd);
// The final character is forced into state 3 and appended to the text,
// so the last field is emitted even though no separator follows it
SELF.State:= IF(LastItem,
3,
FSM[L.State,CASE(R.Text[1],
'"' => DQ,
'\'' => SQ,
',' => SEP,
REST)].NextState);
SELF.Text := MAP(SELF.State IN [0,4,5] OR LastItem => L.Text + R.Text
,SELF.State = 1 => L.Text
,SELF.State IN [2,6,7] => R.Text
, /* [3] */ '');
SELF := R;
END;
AllStr := ITERATE(sd,QuoteIt(LEFT,RIGHT));
OUTPUT(AllStr,NAMED('OUTPUT_FROM_FSM'),ALL);
Filtered := AllStr(State IN [1,3]);
OUTPUT(Filtered,NAMED('RECORD_CUT_INTO_FIELDS'));
```
Of course, my next step would be to take all this code and turn it into a FUNCTION that takes a single STRING parameter.

HTH,
Richard
Thanks Richard,
Yes, I realised I did not need the SORT, but it was not the meat of the program, so it just got left in.
I like your approach to 'last item'; it's tidier than mine.
I must admit, I did not think you would take it any further; I thought you would just leave it as a curiosity.
Anyway, thanks.
Allan
Ah Richard,
I just found out your version does not quite work for trailing field separators in a record, e.g.:
```
trailing , stuff.,,,
```
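This is because you have unconditionally forced 'LastItem' to perform the action 'L.Text + R.Text', which is wrong when the record ends in a separator: with the input above, the last field comes out as ',' and one empty field is lost.
Tricky things, FSMs.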

The 'proper' solution is to have an <end of text> token actually in the byte stream.
I'll post an amendment to your solution.
I just thought I'd better mention it in case someone actually uses this mad stuff as an example, since it's published on the forum!
Yours
Allan
ok,
This copes with empty fields at the end of records: it recognises and retains them.
I append an 'X' to the input stream to make sure the ITERATE runs one more iteration; the 'X' itself is never used.
Then the test:
```ecl
LastItem := R.itm = COUNT(sd);
```
is still valid, but this time it feeds the new EOT token through the finite state machine, so the end of text is handled naturally by the state table.
```ecl
Ind := 'Allan,and anna,,,a,,"it Don\'t mean, a thing, if it ain\'t got that swing.",\'an Inner "Double,Quote",'
+' with "another" hello, and "Another with, embedded comma"\',"a quoted, string" , trailing , stuff.,,,';
RIn := {UNSIGNED4 Itm;
UNSIGNED1 State;
STRING Text};
// One extra placeholder row ('X'); it only exists to carry the EOT token
sd := DATASET(LENGTH(Ind)+1,
TRANSFORM(Rin;
SELF.Itm := COUNTER;
SELF.State := 0;
SELF.Text := IF(COUNTER > LENGTH(Ind),'X',Ind[COUNTER])));
OUTPUT(sd,NAMED('INPUT_AS_A_BYTE_STREAM'),ALL);
DQ := 1;
SQ := 2;
SEP := 3;
REST := 4;
EOT := 5; // end of text, fed to the FSM for the placeholder row
// On EOT, states 1 and 3 (just after a separator) go to 3 to emit an empty final field;
// every other state goes to 1 to emit the field collected so far
FSM := DICTIONARY(DATASET([{0,DQ,5},{0,SQ,4},{0,SEP,1},{0,REST,0},{0,EOT,1}
,{1,DQ,7},{1,SQ,6},{1,SEP,3},{1,REST,2},{1,EOT,3}
,{2,DQ,5},{2,SQ,4},{2,SEP,1},{2,REST,0},{2,EOT,1}
,{3,DQ,7},{3,SQ,6},{3,SEP,3},{3,REST,2},{3,EOT,3}
,{4,DQ,4},{4,SQ,0},{4,SEP,4},{4,REST,4},{4,EOT,1}
,{5,DQ,0},{5,SQ,5},{5,SEP,5},{5,REST,5},{5,EOT,1}
,{6,DQ,4},{6,SQ,0},{6,SEP,4},{6,REST,4},{6,EOT,1}
,{7,DQ,0},{7,SQ,5},{7,SEP,5},{7,REST,5},{7,EOT,1}
],{UNSIGNED1 CurrentState,
UNSIGNED1 Tokn,
UNSIGNED1 NextState})
,{CurrentState, Tokn => NextState});
RIn SplitRecordsIntoFields(RIn L,RIn R) := TRANSFORM
LastItem := R.itm = COUNT(sd);
SELF.State:= FSM[L.State,IF(LastItem
,EOT
,CASE(R.Text[1],
'"' => DQ,
'\'' => SQ,
',' => SEP,
REST))].NextState;
SELF.Text := MAP(SELF.State IN [0,4,5] => L.Text + R.Text
,SELF.State = 1 => L.Text
,SELF.State IN [2,6,7] => R.Text
, /* [3] */ '');
SELF := R;
END;
AllStr := ITERATE(sd,SplitRecordsIntoFields(LEFT,RIGHT));
OUTPUT(AllStr,NAMED('OUTPUT_FROM_FSM'),ALL);
Filtered := AllStr(State IN [1,3]);
OUTPUT(Filtered,NAMED('RECORD_CUT_INTO_FIELDS'));
```
Yours
Allan
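Richard suggested wrapping all of this in a FUNCTION as the next step. What follows is a sketch of that step, built on Allan's EOT version; it was not posted in the thread and has not been run on a cluster. The separator becomes a STRING1 parameter, and MAP replaces the CASE so the comparison is against that run-time value rather than a literal ','. `SplitFields`, `FieldRec`, `pInput` and `pSep` are hypothetical names. The transition table is unchanged; only the token classification depends on the separator.

```ecl
// Sketch only: Allan's EOT version wrapped in a FUNCTION with a run-time separator.
// SplitFields, FieldRec, pInput and pSep are illustrative names.
FieldRec := {UNSIGNED4 Itm; UNSIGNED1 State; STRING Text};

SplitFields(STRING pInput, STRING1 pSep) := FUNCTION
  DQ := 1; SQ := 2; SEP := 3; REST := 4; EOT := 5;
  InLen := LENGTH(pInput);

  // One row per character, plus a placeholder row that carries the EOT token
  Bytes := DATASET(InLen + 1,
                   TRANSFORM(FieldRec,
                             SELF.Itm   := COUNTER;
                             SELF.State := 0;
                             SELF.Text  := IF(COUNTER > InLen, 'X', pInput[COUNTER])));

  // Same transition table as the EOT version above
  FSM := DICTIONARY(DATASET([{0,DQ,5},{0,SQ,4},{0,SEP,1},{0,REST,0},{0,EOT,1}
                            ,{1,DQ,7},{1,SQ,6},{1,SEP,3},{1,REST,2},{1,EOT,3}
                            ,{2,DQ,5},{2,SQ,4},{2,SEP,1},{2,REST,0},{2,EOT,1}
                            ,{3,DQ,7},{3,SQ,6},{3,SEP,3},{3,REST,2},{3,EOT,3}
                            ,{4,DQ,4},{4,SQ,0},{4,SEP,4},{4,REST,4},{4,EOT,1}
                            ,{5,DQ,0},{5,SQ,5},{5,SEP,5},{5,REST,5},{5,EOT,1}
                            ,{6,DQ,4},{6,SQ,0},{6,SEP,4},{6,REST,4},{6,EOT,1}
                            ,{7,DQ,0},{7,SQ,5},{7,SEP,5},{7,REST,5},{7,EOT,1}
                            ],{UNSIGNED1 CurrentState, UNSIGNED1 Tokn, UNSIGNED1 NextState})
                   ,{CurrentState, Tokn => NextState});

  FieldRec Step(FieldRec L, FieldRec R) := TRANSFORM
    // Placeholder row -> EOT; otherwise compare with the run-time separator
    Tok := IF(R.Itm = InLen + 1, EOT,
              MAP(R.Text[1] = '"'  => DQ,
                  R.Text[1] = '\'' => SQ,
                  R.Text[1] = pSep => SEP,
                  REST));
    SELF.State := FSM[L.State, Tok].NextState;
    SELF.Text  := MAP(SELF.State IN [0,4,5] => L.Text + R.Text,
                      SELF.State = 1        => L.Text,
                      SELF.State IN [2,6,7] => R.Text,
                      '');   // state 3: empty field
    SELF := R;
  END;

  AllStr := ITERATE(Bytes, Step(LEFT, RIGHT));
  RETURN AllStr(State IN [1,3]);   // one row per field
END;

OUTPUT(SplitFields('a|"b|c"||', '|'), NAMED('PIPE_SEPARATED_FIELDS'));
```

Tracing the sample call through the table gives four rows: `a`, `"b|c"` (the quotes are kept, as in the versions above), and two empty fields for the trailing `||`.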