Matching the 'longest' string preferentially using PARSE.

Comments and questions related to the Enterprise Control Language

Fri Jan 03, 2020 12:00 pm

Hi,

I expect this is a very simple question to answer, but the following is bugging me.
I have a comma-separated list of integers and reals, interspersed in any order, for example:

Code:
12,13,-5.8,+22.732,6,1234567,-128.0


If a number is a real, i.e. it matches PATTERN(int dot int), I want that match selected in preference to a possible match on just an 'int'.
So I would like to extract from the above:
    12
    13
    -5.8
    +22.732
    6
    1234567
    -128.0
My problem is that my pattern for integers matches in preference to my pattern for reals and returns, say:
    12
    13
    -5

This is a 2 min question for the likes of Richard.

Cheers
Allan
Allan
 
Posts: 430
Joined: Sat Oct 01, 2011 7:26 pm

Sat Jan 04, 2020 8:49 am

Allan,

Once you've extracted the numeric values, just cast them all to STRING then use the LENGTH function to get the longest one. :D

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1556
Joined: Wed Oct 26, 2011 7:40 pm

Mon Jan 06, 2020 11:24 am

Thanks Richard,

Yes, obviously I could just match ,(.*), which extracts the items so that they can be post-processed, but that does not answer my question.

There may be many levels of PATTERNs that match on a particular stretch of input. The question is how to inhibit matching on shorter patterns if a longer pattern matches, but still match the shorter ones if none of the longer patterns match.

Cheers
Allan
Allan
 

Mon Jan 06, 2020 3:26 pm

Allan,

Sorry, I obviously read too quickly and replied too glibly. :)

Here's my real solution (two ways):
Code:
rec := {STRING n};
ds := DATASET([{'12,13,-5.8,+22.732,6,1234567,-128.0'},
               {'11,10,-9.6,+34.999,9,7654321,-459.0'}],rec);

//SplitWords solution:                     
IMPORT Std;
ResRec := {DATASET(rec) Nbrs};
P := PROJECT(ds,TRANSFORM(ResRec,
                          SELF.Nbrs := DATASET(Std.Str.SplitWords(LEFT.n,','),rec)));
P.Nbrs;

//PARSE solution:                     
PATTERN nbr := PATTERN('[-+.0-9]')+;
PATTERN sep := ',';
RULE    num := nbr OPT(sep);

Prec := {STRING n := MATCHTEXT(nbr)};
PARSE(ds,n,num,Prec,FIRST);

The first solution just uses the SplitWords function from the Standard Library and a nested child dataset.

The second is the PARSE answer I think you're looking for. Notice that I'm using a single pattern for the numbers and not building from smaller patterns -- for this problem, that makes more sense to me.

I think the general rule would be to try to create parsing patterns that encompass all the possible variants of a single entity type (in this case, matching both positive and negative ints and reals as just generic numeric entities). Doing that should sidestep the "shorter match vs longer match" issue you're asking about.

HTH,

Richard
rtaylor

Tue Jan 07, 2020 9:12 am

Thanks Richard,

Yes, I am concentrating on PARSE. Still getting my head round it.
OK, your solution is just a clever, pretty version of my ,(.*), solution. As you say, it 'sidesteps' the issue.

I am aiming to get my head round the general way to preferentially match longer patterns using PARSE. Confining myself to this specific example, say I wanted additional information returned by PARSE: it has to recognise whether each element is an integer or a real, and so return a dataset like:
Code:
Real     itm
FALSE    12
FALSE    13
TRUE     -5.8
TRUE     +22.732
etc

Then the issue could not be sidestepped. (Well, it could be, by analysing the element within the TRANSFORM, but I'm looking to the pattern matcher to do the work.)
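
For comparison, the "analyse the element afterwards" route mentioned in the parenthesis might look like the sketch below. It assumes the single catch-all nbr pattern from the earlier reply and derives the flag by looking for a '.' in the matched text with Std.Str.Find. The names FlagRec and isReal are made up for illustration (isReal is used because REAL is an ECL type name), and this is a sketch rather than verified output.

```ecl
IMPORT Std;

rec := {STRING n};
ds  := DATASET([{'12,13,-5.8,+22.732,6,1234567,-128.0'}], rec);

// One generic pattern for any numeric item, as in the earlier reply.
PATTERN nbr := PATTERN('[-+.0-9]')+;
PATTERN sep := ',';
RULE    num := nbr OPT(sep);

// Hypothetical result layout: the item plus a flag computed from its text,
// not from which pattern matched.
FlagRec := RECORD
  STRING  itm    := MATCHTEXT(nbr);
  BOOLEAN isReal := Std.Str.Find(MATCHTEXT(nbr), '.', 1) > 0;
END;

OUTPUT(PARSE(ds, n, num, FlagRec, FIRST));
```

Here the pattern matcher does none of the classification, which is exactly the limitation being described.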

Yours
Allan
Allan
 

Tue Jan 07, 2020 9:41 am

Hmm,

Thinking more about this, perhaps the only way is to do the work in the TRANSFORM.
By that I mean using:

Code:
WHICH(MATCHED(<long pattern reference>),MATCHED(<shorter pattern reference>),MATCHED(<shortest pattern reference>));


?
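
A minimal sketch of that idea applied to the int/real example, assuming separate real_nbr and int_nbr patterns combined into one alternation. WHICH returns the 1-based position of the first TRUE argument (0 if none are TRUE), so the result is a small "kind" code. The names digits, KindRec and kind are hypothetical, and the code is a sketch rather than a verified solution.

```ecl
rec := {STRING n};
ds  := DATASET([{'12,13,-5.8,+22.732,6,1234567,-128.0'}], rec);

PATTERN digits   := PATTERN('[0-9]')+;
PATTERN dot      := '.';
PATTERN sign     := ['+','-'];
PATTERN sep      := ',';
PATTERN real_nbr := digits dot digits;
PATTERN int_nbr  := digits;
PATTERN val      := real_nbr | int_nbr;
RULE    the_val  := OPT(sign) val OPT(sep);

KindRec := RECORD
  // 1 = real_nbr matched, 2 = int_nbr matched, 0 = neither.
  UNSIGNED1 kind := WHICH(MATCHED(real_nbr), MATCHED(int_nbr));
  STRING    itm  := MATCHTEXT(sign) + MATCHTEXT(val);
END;

OUTPUT(PARSE(ds, n, the_val, KindRec, FIRST));
```

Note that this only reports which alternative matched. It does not by itself decide which alternative the parser prefers.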

Allan
Allan
 

Wed Jan 08, 2020 9:00 pm

Allan,

OK, here are THREE ways to do it (the first two you've already seen). You'll note that all three examples create exactly the same result:
Code:
rec := {STRING n};
ds := DATASET([{'12,13,-5.8,+22.732,6,1234567,-128.0'},
               {'11,10,-9.6,+34.999,9,7654321,-459.0'}],rec);

//SplitWords solution:                     
IMPORT Std;
ResRec := {DATASET(rec) Nbrs};
P := PROJECT(ds,TRANSFORM(ResRec,
                          SELF.Nbrs := DATASET(Std.Str.SplitWords(LEFT.n,','),rec)));
P.Nbrs;

//PARSE solution:                     
PATTERN nbr := PATTERN('[-+.0-9]')+;
PATTERN sep := ',';
RULE num := nbr OPT(sep);

Prec := {STRING n := MATCHTEXT(nbr)};
PARSE(ds,n,num,Prec,FIRST);

//Second PARSE solution:                     
PATTERN int := PATTERN('[0-9]')+;
PATTERN dot := '.';
PATTERN sign := ['+','-'];
PATTERN real_nbr := int dot int;
PATTERN int_nbr  := int;
PATTERN val := real_nbr | int_nbr;

RULE the_val := OPT(sign) val OPT(sep);

Vrec := {STRING n := MATCHTEXT(sign) + MATCHTEXT(val)};
PARSE(ds,n,the_val,Vrec,FIRST);

Your pattern precedence issue is handled by the order of the alternative patterns in the val PATTERN definition: real_nbr is listed before int_nbr, so the real alternative takes precedence.
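
Ordering the alternatives is one lever. The PARSE documentation in the Language Reference also lists MAX and MIN options for keeping the longest or shortest match, which looks like a more direct fit for "prefer the longest". The sketch below assumes MAX behaves as documented and deliberately lists int_nbr first, so that any preference for the real would have to come from MAX rather than from ordering. The names digits, OutRec and isReal are hypothetical, and it is worth checking on your own platform that the rows agree with the ordered-alternative version.

```ecl
rec := {STRING n};
ds  := DATASET([{'12,13,-5.8,+22.732,6,1234567,-128.0'}], rec);

PATTERN digits   := PATTERN('[0-9]')+;
PATTERN dot      := '.';
PATTERN sign     := ['+','-'];
PATTERN sep      := ',';
PATTERN real_nbr := digits dot digits;
PATTERN int_nbr  := digits;
// Shorter alternative listed first on purpose (see the note above).
PATTERN val      := int_nbr | real_nbr;
RULE    the_val  := OPT(sign) val OPT(sep);

OutRec := RECORD
  // TRUE when the real_nbr alternative took part in the match.
  BOOLEAN isReal := MATCHED(real_nbr);
  STRING  itm    := MATCHTEXT(sign) + MATCHTEXT(val);
END;

// MAX (assumed per the Language Reference): keep the longest match.
OUTPUT(PARSE(ds, n, the_val, OutRec, MAX));
```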

HTH,

Richard
rtaylor

Fri Jan 17, 2020 10:25 am

Great Richard,

So I have the key now to controlling preference.
Perhaps this could be made clear in the REF manual?

Thanks very much

Allan
Allan
 

