Finding the number of elements matched by PARSE

Comments and questions related to the Enterprise Control Language

Thu Dec 12, 2019 11:01 am

Hi,

I have textual input of the form:
Code:
[setelement1,setelement2,setelement3]

I can supply patterns to PARSE to extract individual elements from the set, no problem.
And obviously there can be a variable number of set elements.
I need to load these elements into separate rows in a DATASET, but I can't see a way of processing a variable number of elements. There does not seem to be an equivalent of MATCHED or MATCHTEXT that gives you the number of elements (at that level) matched.
I know that if you use an 'out of range' index in a MATCHTEXT(pattern path) you get an empty string back, so I've tried:

Code:
RETURN LOOP(DATASET([],Layouts.Boundary)
     ,MATCHTEXT(OneSet/OperandAlpha[COUNTER]) <> ''
     ,PROJECT(ROWS(LEFT),TRANSFORM(Layouts.Boundary;
               SELF.BoundaryID := 0;
               SELF.TypeBoundary := Constants.BoundaryTypes.SETELEMENT;
               SELF.Value := MATCHTEXT(OneSet/OperandAlpha[COUNTER])
               )
        )
     );


This passes the syntax check, but at runtime I get the error:
Code:
Expression is not constant: COUNTER

I've had this problem before and raised a ticket with the core team.
https://track.hpccsystems.com/browse/HPCC-22160
This has been 'accepted' as an issue, but I need a workaround now.

I've attached the PATTERNs I'm using in case it helps (the set PATTERN is called 'oneset').
Any ideas?

Yours
Allan
Attachments
Patterns.ecl
The PATTERN attributes
(1.17 KiB) Downloaded 13 times
Last edited by Allan on Fri Dec 13, 2019 8:57 am, edited 1 time in total.
Allan
 
Posts: 419
Joined: Sat Oct 01, 2011 7:26 pm

Thu Dec 12, 2019 2:45 pm

I have actually 'fixed' it myself, by MACROising 500 constant references to the instance of the pattern match:
Code:
DATASET(Layouts.Boundary) GatherSetElements := FUNCTION
MAC_accessOperandAlpha (num) := MACRO
    #DECLARE(eclfragment)
    #SET(eclfragment,'')
    #DECLARE(cnt)
    #SET(cnt,0)
    #DECLARE(sep)
    #SET(sep,'')

    #LOOP
      #IF (%cnt% = num)
        #BREAK
      #ELSE
        #SET(cnt,%cnt% +1)
        #APPEND(eclfragment,%'sep'%+'{0,Constants.BoundaryTypes.SETELEMENT,MATCHTEXT(OneSet/OperandAlpha['+%'cnt'%+'])}')
        #SET(sep,',')
      #END
    #END
    %'eclfragment'%
ENDMACRO;

d := DATASET([ #EXPAND(MAC_accessOperandAlpha(500)) ],Layouts.Boundary);
RETURN d(Value != '');
END;


Not pretty at all, so if anyone has a better idea please share it.

Thanks

Allan

Mon Dec 23, 2019 8:24 am

Allan,

I would probably approach this by adding a UID to each record before doing the PARSE.

Then I would let PARSE extract each set element (and keep the record number it came from in each result rec).

Then I would do a simple crosstab on that result to determine how many set elements were in each unique input record.

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1519
Joined: Wed Oct 26, 2011 7:40 pm

Fri Jan 03, 2020 11:50 am

Hi Richard,

I don't quite follow your post.
Could you give an example when you have time?

Thanks

Allan

Sat Jan 04, 2020 8:46 am

Allan,

I meant something like this:
Code:
ds := DATASET()[
  {'setelement1,setelement2,setelement3'},
  {'setelement1,setelement2,setelement3'},
  {'setelement1,setelement2,setelement3'},
  {'setelement1,setelement2,setelement3'}],{STRING s});

//add a UID to each rec
UIDrec := {UNSIGNED UID,STRING s};
ds_UID := PROJECT(ds,TRANSFORM(UIDrec,
                               SELF.UID := COUNTER,
                               SELF.s := LEFT.s));
//parsing patterns:             
PATTERN nbr := PATTERN('[0-9]');             
PATTERN sep := ',';
PATTERN element  := 'setelement' nbr;     
PATTERN elements := element OPT(sep);

UIDrec XF(ds L) := TRANSFORM
  SELF.UID := L.UID;
  SELF.s := MATCHTEXT(element);
END; 
PARSE(ds_UID,s,elements,XF(LEFT));

HTH,

Richard

Fri Jan 17, 2020 10:27 am

Thanks Richard,

This implementation is a definite improvement on mine.
Just correcting two typos in your ECL:
Code:
ds := DATASET([
  {'setelement1,setelement2,setelement3'},
  {'setelement1,setelement2,setelement3'},
  {'setelement1,setelement2,setelement3'},
  {'setelement1,setelement2,setelement3'}],{STRING s});

//add a UID to each rec
UIDrec := {UNSIGNED UID,STRING s};
ds_UID := PROJECT(ds,TRANSFORM(UIDrec,
                               SELF.UID := COUNTER,
                               SELF.s := LEFT.s));
//parsing patterns:             
PATTERN nbr := PATTERN('[0-9]');             
PATTERN sep := ',';
PATTERN element  := 'setelement' nbr;     
PATTERN elements := element OPT(sep);

UIDrec XF(ds_UID L) := TRANSFORM
  SELF.UID := L.UID;
  SELF.s := MATCHTEXT(element);
END;
PARSE(ds_UID,s,elements,XF(LEFT));
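
The counting step Richard described earlier (a crosstab on the PARSE result) is not shown in the code above. A minimal sketch of it might look like the following, assuming the PARSE result is assigned to a name rather than run as an action; the names parsed, CountRec, ElementCount and counts are hypothetical and not from the original posts:

```ecl
// Assumes ds_UID, elements and XF are defined as in the corrected code above.
// 'parsed', 'CountRec', 'ElementCount' and 'counts' are illustrative names only.
parsed := PARSE(ds_UID,s,elements,XF(LEFT));

// Crosstab: one output row per input UID, holding the number of
// element rows that PARSE produced for that input record.
CountRec := RECORD
  parsed.UID;
  UNSIGNED4 ElementCount := COUNT(GROUP);
END;
counts := TABLE(parsed,CountRec,UID);

OUTPUT(parsed);  // one row per matched set element
OUTPUT(counts);  // number of matched elements per input record
```

Because each PARSE result row carries the UID of the record it came from, grouping on UID should give the per-record element count without needing an index into MATCHTEXT.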

Yours
Allan