Tue Feb 25, 2020 2:58 am


Using a RunTime variable in a PATTERN


Sat Jan 25, 2020 5:00 pm

Hi,

With help from Richard, I put together a RULE to parse CSV-style records where the field separator can also appear inside the data (hence the quoting):
Code:
PATTERN AlphaNumeric  := PATTERN('[[:alnum:]]');
PATTERN Space         := ' ';
PATTERN Sep           := ',';
PATTERN Punct         := PATTERN('[-_+.]');
PATTERN AnyTxt        := ANY*?;

PATTERN OperandText   := (AlphaNumeric | Punct | Space)+;
PATTERN OperandQuotedText := '\'' AnyTxt '\'';
PATTERN OperandDQuotedText := '"' AnyTxt '"';

PATTERN Operand := OperandText | OperandQuotedText | OperandDQuotedText;

RULE cmds := (Operand Sep) | ('' Sep) | (Operand LAST);

d := DATASET([{'"It Don\'t mean a thing, if it ain\'t got that swing",\'a comma, in an operand\',1, 455445 ,,, ,,,   Allan and Anna  ,   Nina   ,'}],{STRING txt});

PARSE(d,txt,cmds,
      TRANSFORM({STRING txt
                ;UNSIGNED2 Len},
          SELF.txt := MATCHTEXT(Operand);
          SELF.Len :=MATCHLENGTH(Operand)),MANY MIN);

This works fine, but the field separator can vary (it might be a '|', for example), so I pass the separator as a parameter to the FUNCTION wrapping this code. However, the compiler rejects any attempt to use that parameter when defining a PATTERN, e.g.:
Code:
    PATTERN Sep           := pFieldDelimiter;

The compiler errors with:
Code:
Error:    This expression cannot be included in a pattern (21, 30), 2285,


Err, any ideas?

Allan
Posts: 419
Joined: Sat Oct 01, 2011 7:26 pm
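For comparison, when the data never contains quoted separators, the separator can stay an ordinary runtime value and no PATTERN is needed at all. This is a minimal sketch, assuming the standard library's `Std.Str.SplitWords(src, separator, allow_blank)` function; the STORED name `FieldSep` and the sample line are hypothetical. Note that `SplitWords` knows nothing about quotes, so on the example record above it would also split inside the quoted fields.

```ecl
IMPORT Std;

// Hypothetical runtime input: STORED lets the caller override the separator
// when the workunit is submitted, so it is not fixed at compile time.
STRING1 pSep := '|' : STORED('FieldSep');

// Hypothetical sample record with an empty third field
Line := 'Allan|and anna||a';

// allow_blank = TRUE keeps the empty field between the two adjacent separators
Fields := Std.Str.SplitWords(Line, pSep, TRUE);

OUTPUT(Fields);         // with the default '|': 'Allan', 'and anna', '', 'a'
OUTPUT(COUNT(Fields));  // 4
```

This sidesteps the compile-time restriction rather than solving it, so it only fits inputs without quoting.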

Mon Jan 27, 2020 2:26 pm

Allan,

I'd suggest pre-defining the most common CSV delimiters as a set of strings, then passing the function the position of the one you want in that set, something like this:
Code:
SetSeps     := [',', '|', '\t', ':'];
PATTERN Sep := SetSeps[pFieldDelimiter];
Let me know if that works. :)

HTH,

Richard
rtaylor
Community Advisory Board Member
Posts: 1519
Joined: Wed Oct 26, 2011 7:40 pm

Tue Jan 28, 2020 9:11 am

Thanks Richard,

Yes, that was my first thought, but it's really sub-optimal.
This could well be rolled out across the UK and Ireland, and Sod's law says some bright spark will want to separate on something that isn't in the list. :-(
It's actually more complex than just having a list of separators, as you also have to ensure the separator is not in the list of allowed punctuation (the Punct PATTERN).
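The punctuation point can be checked mechanically. Below is a minimal sketch with a hypothetical `IsUsableSep` helper (not part of any library): it rejects any separator that the PATTERNs in the first post would already treat as field content or quoting, namely alphanumerics, the Punct characters `- _ + .`, the space, and the two quote characters.

```ecl
// Hypothetical helper: TRUE only if pSep cannot be confused with field content.
// The excluded characters mirror the PATTERNs in the first post:
// AlphaNumeric ([[:alnum:]]), Punct ([-_+.]), Space, and the two quote characters.
BOOLEAN IsUsableSep(STRING1 pSep) :=
    pSep NOT IN ['-', '_', '+', '.', ' ', '\'', '"'] AND
    NOT REGEXFIND('[[:alnum:]]', pSep);

OUTPUT(IsUsableSep('|'));  // TRUE
OUTPUT(IsUsableSep('.'));  // FALSE, '.' is in Punct
```

Something like this would still need to be kept in step with the Punct PATTERN by hand if that list ever changes.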

It also offends whatever sense of beauty I have for a language.

Cheers

Allan

Tue Jan 28, 2020 10:08 am

Thinking of trying the FUNCTIONMACRO route?
Allan

Tue Jan 28, 2020 2:05 pm

Allan,

Yeah, using a FUNCTIONMACRO may just work.

HTH,

Richard

Wed Feb 12, 2020 7:57 pm

Richard,
Unfortunately, the FUNCTIONMACRO still complains in the same way.
I even tried doing the work in embedded C++, but hit problems there too (see my other tickets).
But with grit between my teeth, I was not going to be defeated.
I got the above functionality working in standard ECL. You can split strings on a field separator defined at runtime, plus it copes with quoted strings (the demo below still hard-codes ',' in the CASE).
Is there a competition for the most impenetrable, unmaintainable ECL ever written? If there is, I would like to submit the following.
It splits a string into a DATASET of chars (a byte stream) and runs it through a finite state machine!!
Code:
RIn := {UNSIGNED4 Itm;
        UNSIGNED2 State;
        STRING Text};

Ind := 'Allan,and anna,,,a,,"it Don\'t mean, a thing, if it ain\'t got that swing.",\'an Inner "Double,Quote", with "another" hello, and "Another with, embedded comma"\',"a quoted, string" , trailing , stuff.';

d := DATASET(LENGTH(Ind),TRANSFORM(Rin;SELF.Itm := COUNTER;SELF.State := 0;SELF.Text := Ind[COUNTER]));
sd := SORT(d,Itm);
OUTPUT(sd,NAMED('INPUT_AS_A_BYTE_STREAM'),ALL);

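// Token classes for the character being read
DQ   := 1;
SQ   := 2;
SEP  := 3;
REST := 4;

// Rows are {CurrentState, Token, NextState}. States: 0 = accumulating a field,
// 1 = separator closing a field, 2 = first character of a new field,
// 3 = separator straight after a separator (empty field),
// 4/5 = inside single/double quotes, 6/7 = opening single/double quote of a new field
FiniteStateMachine := DICTIONARY(DATASET([{0,DQ,5},{0,SQ,4},{0,SEP,1},{0,REST,0}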
                                         ,{1,DQ,7},{1,SQ,6},{1,SEP,3},{1,REST,2}
                                         ,{2,DQ,5},{2,SQ,4},{2,SEP,1},{2,REST,0}
                                         ,{3,DQ,7},{3,SQ,6},{3,SEP,3},{3,REST,2}
                                         ,{4,DQ,4},{4,SQ,0},{4,SEP,4},{4,REST,4}
                                         ,{5,DQ,0},{5,SQ,5},{5,SEP,5},{5,REST,5}
                                         ,{6,DQ,4},{6,SQ,0},{6,SEP,4},{6,REST,4}
                                         ,{7,DQ,0},{7,SQ,5},{7,SEP,5},{7,REST,5}
                                         ],{UNSIGNED2 CurrentState;UNSIGNED1 Tokn;UNSIGNED2 NextState})
                                ,{CurrentState,Tokn=> NextState});

RIn QuoteIt(RIn L,RIn R) := TRANSFORM

    SELF.State:= FiniteStateMachine[L.State,CASE(R.Text[1],'"' => DQ,'\'' => SQ, ',' => SEP, REST)].NextState;
    SELF.Text := MAP(SELF.State IN [0,4,5] => L.Text + R.Text
                    ,SELF.State IN [1]     => L.Text
                    ,SELF.State IN [2,6,7] => R.Text
                    , /* [3] */               '');
    SELF := R;
END;

AllStr := ITERATE(sd,QuoteIt(LEFT,RIGHT));
OUTPUT(AllStr,NAMED('OUTPUT_FROM_FSM'),ALL);

Filtered := AllStr(State IN [1,3]) & AllStr[COUNT(AllStr)];
OUTPUT(Filtered,NAMED('RECORD_CUT_INTO_FIELDS'));

The slightly messy bit at the end (apart from all of it):
Code:
& AllStr[COUNT(AllStr)];

Could be made cleaner by appending an <end of text> character to the byte stream, but again I did not want to second-guess which character to use.
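To see why that patch is needed, here is a hand trace of the first FSM on the short input `a,,"b,c"`, written out as an inline DATASET so it can be run. Each row is what ITERATE should produce for one character; it is worked out by hand from the transition table and the MAP in QuoteIt, not captured from a real run. Only rows in states 1 and 3 survive the filter, so the final quoted field, still in state 0 when the input ends, would be lost without the extra record.

```ecl
// Hand trace (illustrative) of QuoteIt over the 8 characters of 'a,,"b,c"'.
// ITERATE's first LEFT is an empty record, so it starts in state 0 with Text ''.
Trace := DATASET([{1, 'a', 0, 'a'},       // REST: 0 -> 0, append
                  {2, ',', 1, 'a'},       // SEP : 0 -> 1, field 'a' is complete
                  {3, ',', 3, ''},        // SEP : 1 -> 3, empty field
                  {4, '"', 7, '"'},       // DQ  : 3 -> 7, start a quoted field
                  {5, 'b', 5, '"b'},      // REST: 7 -> 5, inside double quotes
                  {6, ',', 5, '"b,'},     // SEP : 5 -> 5, quoted comma is data
                  {7, 'c', 5, '"b,c'},    // REST: 5 -> 5
                  {8, '"', 0, '"b,c"'}],  // DQ  : 5 -> 0, quote closed, field still open
                 {UNSIGNED1 Itm, STRING1 Ch, UNSIGNED1 State, STRING Text});

// State IN [1,3] keeps only 'a' and ''; '"b,c"' never reaches state 1 or 3
OUTPUT(Trace(State IN [1,3]));
```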

Ah, I never want to do that again!

Yours

Allan

Wed Feb 12, 2020 11:42 pm

Allan,

Here's my version :) :
Code:
Ind := 'Allan,and anna,,,a,,"it Don\'t mean, a thing, if it ain\'t got that swing.",\'an Inner "Double,Quote", with "another" hello, and "Another with, embedded comma"\',"a quoted, string" , trailing , stuff.';

RIn := {UNSIGNED4 Itm;
        UNSIGNED1 State;
        STRING Text};
sd := DATASET(LENGTH(Ind),
              TRANSFORM(Rin;
                        SELF.Itm := COUNTER;
                        SELF.State := 0;
                        SELF.Text := Ind[COUNTER]));
OUTPUT(sd,NAMED('INPUT_AS_A_BYTE_STREAM'),ALL);

DQ   := 1;
SQ   := 2;
SEP  := 3;
REST := 4;

FSM := DICTIONARY(DATASET([{0,DQ,5},{0,SQ,4},{0,SEP,1},{0,REST,0}
                          ,{1,DQ,7},{1,SQ,6},{1,SEP,3},{1,REST,2}
                          ,{2,DQ,5},{2,SQ,4},{2,SEP,1},{2,REST,0}
                          ,{3,DQ,7},{3,SQ,6},{3,SEP,3},{3,REST,2}
                          ,{4,DQ,4},{4,SQ,0},{4,SEP,4},{4,REST,4}
                          ,{5,DQ,0},{5,SQ,5},{5,SEP,5},{5,REST,5}
                          ,{6,DQ,4},{6,SQ,0},{6,SEP,4},{6,REST,4}
                          ,{7,DQ,0},{7,SQ,5},{7,SEP,5},{7,REST,5}
                          ],{UNSIGNED1 CurrentState,
                             UNSIGNED1 Tokn,
                             UNSIGNED1 NextState})
                  ,{CurrentState, Tokn => NextState});

RIn QuoteIt(RIn L,RIn R) := TRANSFORM
    LastItem := R.itm = COUNT(sd);
    SELF.State:= IF(LastItem,
                    3,
                    FSM[L.State,CASE(R.Text[1],
                                     '"' => DQ,
                                     '\'' => SQ,
                                      ',' => SEP,
                                      REST)].NextState);
    SELF.Text := MAP(SELF.State IN [0,4,5] OR LastItem => L.Text + R.Text
                    ,SELF.State = 1        => L.Text
                    ,SELF.State IN [2,6,7] => R.Text
                    , /* [3] */               '');
    SELF := R;
END;

AllStr := ITERATE(sd,QuoteIt(LEFT,RIGHT));
OUTPUT(AllStr,NAMED('OUTPUT_FROM_FSM'),ALL);

Filtered := AllStr(State IN [1,3]);
OUTPUT(Filtered,NAMED('RECORD_CUT_INTO_FIELDS'));

You'll note that the problem you mentioned goes away with the addition of the LastItem definition. I also removed the SORT, since your DATASET(cnt,TRANSFORM()) builds the records already sorted. And I changed your State fields to UNSIGNED1, since the possible values only range from 0 to 7.

Of course, my next step would be to take all this code and turn it into a FUNCTION that takes a single STRING parameter. :)

HTH,

Richard

Thu Feb 13, 2020 8:27 am

Thanks Richard,

Yes, I realised I did not need the SORT, but it wasn't the meat of the program, so it just got left in.

I like your approach to 'last item'; it's tidier than mine.

I must admit, I didn't think you would take it any further; I expected you'd just leave it as a curiosity.

Anyway, thanks

Allan

Thu Feb 13, 2020 3:38 pm

Ah Richard,

I've just found out that your version does not quite work for records with trailing field separators, e.g.:
Code:
trailing , stuff.,,,

That's because you unconditionally force the last item to perform the action 'L.Text + R.Text', which is wrong when the last character is a separator.
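A hand trace makes this concrete. For the two-character record `a,` the expected fields are `a` and an empty trailing field; with the LastItem override, the separator is appended to the text instead and only one field comes out. This is illustrative only, worked out by hand from Richard's QuoteIt rather than captured from a run.

```ecl
// Hand trace (illustrative) of Richard's QuoteIt on the 2-character input 'a,'.
TraceLastItem := DATASET([{1, 'a', 0, 'a'},    // REST: 0 -> 0, append
                          {2, ',', 3, 'a,'}],  // last item: State forced to 3 and
                                               // Text = L.Text + R.Text keeps the ','
                         {UNSIGNED1 Itm, STRING1 Ch, UNSIGNED1 State, STRING Text});

// One field, 'a,', instead of the expected 'a' and ''
OUTPUT(TraceLastItem(State IN [1,3]));
```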

Tricky things, FSMs :-)

The 'proper' solution is to have an <end of text> token actually in the byte stream.
I'll post an amendment to your solution.

I thought I'd better mention it, since this is published on the forum and someone might actually use this mad stuff as an example!

Yours

Allan

Thu Feb 13, 2020 6:22 pm

OK,
This version recognises and retains empty fields at the end of records.
It appends an 'X' to the input stream to make sure the ITERATE runs one more iteration; the 'X' itself is never used.
Then the test:
Code:
LastItem := R.itm = COUNT(sd);

is still valid, but this time it feeds the new EOT token through the finite state machine, so the end of text is handled naturally by the transition table.
Code:
Ind := 'Allan,and anna,,,a,,"it Don\'t mean, a thing, if it ain\'t got that swing.",\'an Inner "Double,Quote",'
      +' with "another" hello, and "Another with, embedded comma"\',"a quoted, string" , trailing , stuff.,,,';

RIn := {UNSIGNED4 Itm;
        UNSIGNED1 State;
        STRING Text};
sd := DATASET(LENGTH(Ind)+1,
              TRANSFORM(Rin;
                        SELF.Itm := COUNTER;
                        SELF.State := 0;
                        SELF.Text := IF(COUNTER > LENGTH(Ind),'X',Ind[COUNTER])));
OUTPUT(sd,NAMED('INPUT_AS_A_BYTE_STREAM'),ALL);

DQ   := 1;
SQ   := 2;
SEP  := 3;
REST := 4;
EOT  := 5;

FSM := DICTIONARY(DATASET([{0,DQ,5},{0,SQ,4},{0,SEP,1},{0,REST,0},{0,EOT,1}
                          ,{1,DQ,7},{1,SQ,6},{1,SEP,3},{1,REST,2},{1,EOT,3}
                          ,{2,DQ,5},{2,SQ,4},{2,SEP,1},{2,REST,0},{2,EOT,1}
                          ,{3,DQ,7},{3,SQ,6},{3,SEP,3},{3,REST,2},{3,EOT,3}
                          ,{4,DQ,4},{4,SQ,0},{4,SEP,4},{4,REST,4},{4,EOT,1}
                          ,{5,DQ,0},{5,SQ,5},{5,SEP,5},{5,REST,5},{5,EOT,1}
                          ,{6,DQ,4},{6,SQ,0},{6,SEP,4},{6,REST,4},{6,EOT,1}
                          ,{7,DQ,0},{7,SQ,5},{7,SEP,5},{7,REST,5},{7,EOT,1}
                          ],{UNSIGNED1 CurrentState,
                             UNSIGNED1 Tokn,
                             UNSIGNED1 NextState})
                  ,{CurrentState, Tokn => NextState});

RIn SplitRecordsIntoFields(RIn L,RIn R) := TRANSFORM
    LastItem := R.itm = COUNT(sd);
    SELF.State:= FSM[L.State,IF(LastItem
                                 ,EOT
                                 ,CASE(R.Text[1],
                                       '"'  => DQ,
                                       '\'' => SQ,
                                       ','  => SEP,
                                        REST))].NextState;
    SELF.Text := MAP(SELF.State IN [0,4,5] => L.Text + R.Text
                    ,SELF.State = 1        => L.Text
                    ,SELF.State IN [2,6,7] => R.Text
                    , /* [3] */               '');
    SELF := R;
END;

AllStr := ITERATE(sd,SplitRecordsIntoFields(LEFT,RIGHT));
OUTPUT(AllStr,NAMED('OUTPUT_FROM_FSM'),ALL);

Filtered := AllStr(State IN [1,3]);
OUTPUT(Filtered,NAMED('RECORD_CUT_INTO_FIELDS'));
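As a check on the trailing-separator case, here is a hand trace of this version on the two-character record `a,`, whose fields should be `a` and an empty string. The appended 'X' is the third record, and the `{1,EOT,3}` transition in the table above turns it into the trailing empty field. Illustrative only, worked out by hand rather than captured from a run.

```ecl
// Hand trace (illustrative) of SplitRecordsIntoFields on 'a,' plus the appended 'X'.
TraceEOT := DATASET([{1, 'a', 0, 'a'},   // REST: 0 -> 0, append
                     {2, ',', 1, 'a'},   // SEP : 0 -> 1, field 'a' is complete
                     {3, 'X', 3, ''}],   // EOT : 1 -> 3, trailing empty field
                    {UNSIGNED1 Itm, STRING1 Ch, UNSIGNED1 State, STRING Text});

// Two fields, 'a' and '', as expected
OUTPUT(TraceEOT(State IN [1,3]));
```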


Yours

Allan
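Richard's suggestion of turning this into a FUNCTION also closes the loop on the original question: in the code above the comma is still hard-coded, but because the separator is only ever compared with ordinary ECL expressions and never used in a PATTERN, it can be a parameter. This is a minimal sketch, assuming a single-character separator; `SplitFields`, `pText` and `pSep` are hypothetical names, the token test is rewritten with MAP (my choice, not the thread's), and LastItem compares against `LENGTH(pText) + 1` rather than `COUNT(sd)`.

```ecl
// Hypothetical wrapper: SplitFields, pText and pSep are illustrative names.
// The separator is only compared with '=' inside MAP, never used in a PATTERN,
// so it can be an ordinary runtime parameter.
SplitFields(STRING pText, STRING1 pSep) := FUNCTION
    RIn := {UNSIGNED4 Itm; UNSIGNED1 State; STRING Text};

    // One record per character, plus a dummy 'X' whose only job is to become the EOT token
    sd := DATASET(LENGTH(pText) + 1,
                  TRANSFORM(RIn;
                            SELF.Itm   := COUNTER;
                            SELF.State := 0;
                            SELF.Text  := IF(COUNTER > LENGTH(pText), 'X', pText[COUNTER])));

    DQ := 1; SQ := 2; SEP := 3; REST := 4; EOT := 5;

    // Same transition table as the EOT version above: {CurrentState, Token, NextState}
    FSM := DICTIONARY(DATASET([{0,DQ,5},{0,SQ,4},{0,SEP,1},{0,REST,0},{0,EOT,1}
                              ,{1,DQ,7},{1,SQ,6},{1,SEP,3},{1,REST,2},{1,EOT,3}
                              ,{2,DQ,5},{2,SQ,4},{2,SEP,1},{2,REST,0},{2,EOT,1}
                              ,{3,DQ,7},{3,SQ,6},{3,SEP,3},{3,REST,2},{3,EOT,3}
                              ,{4,DQ,4},{4,SQ,0},{4,SEP,4},{4,REST,4},{4,EOT,1}
                              ,{5,DQ,0},{5,SQ,5},{5,SEP,5},{5,REST,5},{5,EOT,1}
                              ,{6,DQ,4},{6,SQ,0},{6,SEP,4},{6,REST,4},{6,EOT,1}
                              ,{7,DQ,0},{7,SQ,5},{7,SEP,5},{7,REST,5},{7,EOT,1}
                              ],{UNSIGNED1 CurrentState, UNSIGNED1 Tokn, UNSIGNED1 NextState})
                      ,{CurrentState, Tokn => NextState});

    RIn Step(RIn L, RIn R) := TRANSFORM
        LastItem := R.Itm = LENGTH(pText) + 1;
        // Classify the character; the extra last record is always EOT
        Tok := MAP(LastItem         => EOT,
                   R.Text[1] = '"'  => DQ,
                   R.Text[1] = '\'' => SQ,
                   R.Text[1] = pSep => SEP,
                   REST);
        SELF.State := FSM[L.State, Tok].NextState;
        // Same actions as before: append, emit L.Text, start a new field, or empty field
        SELF.Text  := MAP(SELF.State IN [0,4,5] => L.Text + R.Text,
                          SELF.State = 1        => L.Text,
                          SELF.State IN [2,6,7] => R.Text,
                          '');
        SELF := R;
    END;

    AllStr := ITERATE(sd, Step(LEFT, RIGHT));
    // States 1 and 3 are the records that complete a field
    RETURN PROJECT(AllStr(State IN [1,3]),
                   TRANSFORM({STRING Field}, SELF.Field := LEFT.Text));
END;

OUTPUT(SplitFields('Allan|"a | quoted"|stuff||', '|'));
```

Worked through the transition table by hand, the usage line should give five fields: `Allan`, `"a | quoted"`, `stuff`, and two empty strings. MAP is used because each branch is an arbitrary boolean expression, so comparing against a parameter needs nothing special; note that the quote tests come first, so passing a quote character as `pSep` would not work.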
