Tips and Tricks for ECL — Part 2 — PARSE

PARSE Function

I receive a lot of interesting questions from members in our community about how to do this and that in ECL. Recently, someone sent me some code and asked how to PARSE dates.

I am always happy to share my knowledge of ECL, so if you have a question you want me to answer (in the blog), please send an email to richard.taylor@lexisnexisrisk.com and I will do my best to come up with an ECL example demonstrating the solution.

How PARSE Patterns are Built

PARSE is usually associated with free form text parsing, but it can also be used to parse and standardize any data in any string field in any dataset. So that is what this article discusses — parsing and standardizing dates and times from a single field of a multi-field dataset.

Parsing is all about pattern matching, so here are the types of data patterns we are looking for and the dataset the code will actually parse:

// Here are the top 20 patterns encountered:
 // AAA, 99 AAA 9999 99:99:99 -9999 
 // 99-99-9999 
 // 99.99.9999 
 // --99/99/99 
 // 99 AAA 9999 
 // 99/99/99 
 // AAA 99, 9999 
 // 99/99/9999 
 // -99/99/99 
 // 99 AAA. 9999 
 // AAAAAAAA 99, 9999 
 // AAAAA 99, 9999 
 // AAAAAAA 99, 9999 
 // AAAA 99, 9999 
 // AAA, 99 AAA 9999 99:99:99 AAA 
 // 99-AAA-9999 
 // 99 AAAAAAAA 9999 
 // 99/99/9999 99:99:99 
 // 99-AAAAAAAA, 9999 99:99 AAA 
 // 99 AAAAA 9999
 
 //Here's the test data, based on the above pattern examples:
  ds := DATASET([{1,'25 MAY. 2005'},  {2,'30/08/2009'},  {3,'THURSDAY, MARCH 7, 2013 - 1:30 PM'},
                 {4,'7 SEP, 2006'},   {5,'18-09-2008'},  {6,'SAT, 04/27/2013 - 1:30:24 AM'},
                 {7,'25 MARCH 2013'}, {8,'22.01.2013'},  {9,'15-MAR-2004'}, 
                 {10,'3-14-13'},      {11,'13/03/20'},   {12,'27/08/85 17:45:30'}],
                {UNSIGNED1 UID,STRING40 s});

Just as in all ECL coding, you start parse patterns by creating “atomic” bits and then growing them into more complex patterns as you go. So the first thing we need to do is define the bottom level of patterns.

//the "atomic" bits:
PATTERN Alpha := PATTERN('[A-Z]')+;    //one or more alpha characters together
PATTERN Nbr   := PATTERN('[0-9]');     //a single numeric digit
PATTERN Sep   := PATTERN('[-, /.]');   //separators -- note the presence of "space" character

The PATTERN function in a PATTERN definition uses Perl standard regular expressions to define a matching pattern. The Alpha definition finds any number of capital letters A to Z strung together (the + sign indicates 1 or more instances). Nbr defines a number as a single numeric character, zero (0) through nine (9). The Sep definition specifies the set of valid separators between patterns — in this case the dash, comma, space, slash, and period characters.

Next, I grow my patterns by building on previously defined patterns:

//more complex building blocks:
PATTERN Ws    := Sep OPT(Sep);         //"white space" = 1 or 2 separators
PATTERN Num12 := OPT(Nbr) Nbr;         //a 1 or 2-digit number

Ws (white space) is a Sep optionally followed by a second Sep (exactly one or two Sep instances); it specifies the white space between the text patterns we want to find. Num12 is defined similarly as a one- or two-digit number (for use as month and day values).

Three Ways to Define Repeating Patterns in ECL

The Year pattern is a repeating type of pattern, because a year may be represented as a 2-digit or 4-digit number. So the following code demonstrates three ways to define the same thing.

The first way is explicit — two single digits optionally followed by an additional two single digits:

PATTERN Year  := Nbr Nbr OPT(Nbr Nbr);	   //a 2 or 4-digit number, explicit

Or we can use the REPEAT function to define exactly the same pattern.

// PATTERN Year  := REPEAT(Nbr,2) OPT(REPEAT(Nbr,2));	 //a 2 or 4-digit number, using REPEAT syntax

Or we can use shorthand (regular expression syntax) instead of REPEAT to achieve the same pattern:

// PATTERN Year  := Nbr*2 OPT(Nbr*2);	    //a 2 or 4-digit number using regular expression syntax
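For comparison only, here is a minimal sketch of the same 2-or-4-digit rule written as a single Perl-style regular expression and checked with the built-in REGEXFIND function, outside of PARSE. The YearRegex name and the sample strings are illustrative assumptions, not part of the parsing code in this article.

```ecl
// Hypothetical name, for illustration only:
// two digits, optionally followed by two more digits
YearRegex := '^[0-9]{2}([0-9]{2})?$';

OUTPUT(REGEXFIND(YearRegex, '13'));    // TRUE:  2 digits
OUTPUT(REGEXFIND(YearRegex, '2013'));  // TRUE:  4 digits
OUTPUT(REGEXFIND(YearRegex, '201'));   // FALSE: 3 digits
```

The PATTERN versions above have the advantage that Year can be reused as a named building block inside larger patterns.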

What if the Data Includes Time?

Some of the example data patterns include 12-hour time (with AM and PM), and those patterns show us several different ways of representing time. So there are two different types of time patterns, AMPM and Zulu:

PATTERN AMPM  := Num12 ':' Num12 OPT(':' Num12) ' ' Alpha;
PATTERN Zulu  := Num12 ':' Num12 ':' Num12;
PATTERN Time  := (AMPM | Zulu);

The AMPM pattern is a numeric pattern ending with a non-optional Alpha pattern. So that’s a one or two digit number followed by a colon, followed by another one or two digit number, optionally followed by another colon and another one or two digit number, but always followed by Alpha characters (in our example data, that’s always going to be AM or PM).

Zulu is a purely numeric pattern — three instances of Num12 delimited by colons, giving us hours, minutes and seconds.

The Time pattern matches either the AMPM pattern or the Zulu pattern — the vertical bar (|) is shorthand for a logical OR operator in pattern expressions.

Using the VALIDATE Function

Because our input data is not yet standardized, we’re getting month data that may be either numeric or alphabetic characters, so to parse the month data I’m using a technique that is less commonly used — the VALIDATE function (which is only valid for use in PARSE pattern expressions).

//a pattern using VALIDATE:
SetMonths := ['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC'];
isValidMonth(STRING txt) := txt[1..3] IN SetMonths;
PATTERN Month := VALIDATE(Alpha,isValidMonth(MATCHTEXT)) ;

The VALIDATE function passes the matching text from the Alpha pattern to the isValidMonth function, which returns a Boolean True/False: are the first three characters of the passed text in SetMonths? So Month matches only when the Alpha text passes that test; if the test fails, the Month pattern does not match at all. This rejects non-month text (like “THURSDAY” or “SAT”) as not being months.
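The test that VALIDATE relies on can be tried on its own, outside of PARSE. This small sketch repeats the two definitions from above and calls isValidMonth with a few sample strings (the sample strings are assumptions chosen for illustration):

```ecl
SetMonths := ['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC'];
isValidMonth(STRING txt) := txt[1..3] IN SetMonths;

OUTPUT(isValidMonth('MARCH'));     // TRUE:  'MAR' is in SetMonths
OUTPUT(isValidMonth('SEP'));       // TRUE
OUTPUT(isValidMonth('THURSDAY'));  // FALSE: 'THU' is not a month abbreviation
OUTPUT(isValidMonth('SAT'));       // FALSE
```

Note that only the first three characters are checked, so any word that starts with a month abbreviation (such as 'MARKET') would also pass. That is acceptable for the test data here, but a stricter check may be needed for messier input.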

Putting it All Together

Finally, all the date patterns that we’re looking for fall into three possible categories of dates:

//the final parsing patterns:
PATTERN NumDate    := Num12 Ws Num12 Ws Year OPT(ws+ Time);
PATTERN AlphaDate1 := Month Ws Num12 Ws Year OPT(ws+ Time);
PATTERN AlphaDate2 := Num12 Ws Month Ws Year OPT(ws+ Time);

NumDate defines a fully numeric pattern, AlphaDate1 defines a “US-style” Month-Day-Year pattern with valid alphabetic month data, and AlphaDate2 defines a “British-style” Day-Month-Year pattern with valid alphabetic month data. All three may or may not have time data.

And because all the example patterns resolve to just three possibilities, our final DateRule RULE only has to determine which of the three patterns the data matches: NumDate, AlphaDate1, or AlphaDate2.

//and the RULE to actually do the pattern matching:
RULE DateRule := (NumDate | AlphaDate1 | AlphaDate2);

Since DateRule is a RULE, the three patterns are treated as TOKENs.

Formatting the PARSE Output

This RECORD structure defines the layout of the result from the PARSE function.

OutRec := RECORD
  UNSIGNED1 UID;
  UNSIGNED1 PatternType;
  STRING40  InputStr;
  STRING2   Day;
  STRING2   Month;
  STRING4   Year;
  STRING8   YYYYMMDD;
  STRING12  Time;
END;

The PatternType field is there just to show us exactly which pattern the DateRule matched. The InputStr just repeats the input data. Then the Day, Month, and Year fields show the granular detail of how those values were parsed.

Last, the YYYYMMDD and Time fields are the real purpose of the exercise — the standardized formats that we want to carry forward to build our data product from the raw input data.

PARSE Using a TRANSFORM

Most PARSE code just uses a RECORD structure as the output, so the PARSE operates the same way the TABLE function does. But you can also use a TRANSFORM function to make PARSE operate the same way the PROJECT function does. The advantage of using TRANSFORM is that you can write more complex code than you could in a RECORD structure, allowing you to format your result any way you need to.

I have some fairly complex code to actually determine what the dates are and standardize them into a single standard output format no matter what input format they are in. So the TRANSFORM starts simply:

OutRec XF(ds L) := TRANSFORM

The first thing I want to do is determine which pattern matched. The WHICH function returns the ordinal position of the first Boolean expression parameter that is true. The MATCHED function is specific to PARSE use, and it returns a BOOLEAN indicating “Did this pattern match or not?” So the WhichPtn definition (scope limited to operate only within this TRANSFORM) tells which pattern made the match.

  //determine which pattern matched
  WhichPtn := WHICH(MATCHED(NumDate),MATCHED(AlphaDate1),MATCHED(AlphaDate2));
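As a quick standalone illustration of WHICH (with literal Boolean values standing in for the MATCHED calls, purely as an assumption for the demo):

```ecl
// WHICH returns the ordinal position of the first TRUE parameter
OUTPUT(WHICH(FALSE, TRUE, TRUE));    // 2: the second parameter is the first TRUE
OUTPUT(WHICH(TRUE, FALSE, FALSE));   // 1
OUTPUT(WHICH(FALSE, FALSE, FALSE));  // 0: none of the parameters is TRUE
```

So inside the TRANSFORM, WhichPtn will be 1 for NumDate, 2 for AlphaDate1, and 3 for AlphaDate2.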

Then I can simply define the data for the first three fields:

  //output the pattern type and matching input string
  SELF.PatternType := WhichPtn;
  SELF.InputStr := L.s;
  SELF.UID := L.UID;

Now I need to know whether I’m dealing with an American or British style date format:

  //determine if numeric date is in "dd mm" format instead of "mm dd": 
  P1 := IF(WhichPtn = 1,MATCHTEXT(NumDate/Num12[1]),'');
  P2 := IF(WhichPtn = 1,MATCHTEXT(NumDate/Num12[2]),'');
  //if first pair of digits can't be a month, flag as "B"ritish, else "A"merican format
  P3 := IF((UNSIGNED1)P1 > 12,'B','A');

These three local definitions (P1, P2, and P3) work together to determine whether the style is British or American, which in turn tells us exactly which bit of the pattern specifies the day of the month. The MATCHTEXT function is specific to PARSE use, and it returns the text that matches the parameter it’s given. For P1, that’s the first instance of Num12 within NumDate. For P2, that’s the second instance of Num12 within NumDate. P3 checks whether the P1 value is greater than 12 and so cannot be a month; if so, it’s a British-style date. Otherwise the date is assumed to be American style.
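The P3 test can be sketched as a standalone function to see how it behaves on a few values. DateStyle is a hypothetical helper name used only for this illustration:

```ecl
// Hypothetical helper: applies the same test as P3 to the first pair of digits
DateStyle(STRING firstPair) := IF((UNSIGNED1)firstPair > 12, 'B', 'A');

OUTPUT(DateStyle('30'));  // 'B': 30 cannot be a month, so the day comes first
OUTPUT(DateStyle('3'));   // 'A': 3 could be a month
OUTPUT(DateStyle('04'));  // 'A'
```

This is a heuristic: a date such as 03/04/2013 is ambiguous, and this logic will always treat it as American style (March 4th). If you know your source uses British style throughout, you would want to change the default.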

Now to determine the actual Day value:

  DayNum := (UNSIGNED1)CHOOSE(WhichPtn,
                              IF(P3 = 'B',P1,P2),           //pattern 1
                              MATCHTEXT(AlphaDate1/Num12),  //pattern 2
                              MATCHTEXT(AlphaDate2/Num12)); //pattern 3
  SELF.Day := IF(DayNum < 10,INTFORMAT(DayNum,2,1),(STRING2)DayNum);

The CHOOSE function in the DayNum local definition uses the WhichPtn integer to control where to extract that information from (P1 or P2 if it was a pattern 1 match). Then the IF function in the SELF.Day definition ensures that the result is a 2-digit string.
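Here is a small standalone sketch of the two built-in functions doing the work, using literal values chosen only for illustration:

```ecl
// CHOOSE returns the parameter selected by the integer in its first parameter
OUTPUT(CHOOSE(2, 'first', 'second', 'third'));  // 'second'

// INTFORMAT(value, width, 1) produces a right-justified, zero-padded string
OUTPUT(INTFORMAT(7, 2, 1));    // '07'
OUTPUT(INTFORMAT(14, 2, 1));   // '14'
```

Since INTFORMAT with a width of 2 also returns '14' for a 2-digit value, INTFORMAT(DayNum,2,1) on its own would appear to be enough for SELF.Day; the IF simply makes the two cases explicit.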

The month number is handled similarly, making use of the already defined P1/P2 values to determine whichever one was not the day number:

  //determine how the month is represented
  M1 := CHOOSE(WhichPtn,
               '',                           //pattern 1
               MATCHTEXT(AlphaDate1/Alpha),  //pattern 2
               MATCHTEXT(AlphaDate2/Alpha)); //pattern 3
  M2 := CASE(M1[1..3],'JAN' => '01','FEB' => '02','MAR' => '03','APR' => '04','MAY' => '05','JUN' => '06',
                      'JUL' => '07','AUG' => '08','SEP' => '09','OCT' => '10','NOV' => '11','DEC' => '12','');
  FmtMnth(STRING2 m) := IF(LENGTH(TRIM(m)) = 1, '0' + m, m);
  SELF.Month := IF(WhichPtn = 1,
                   IF(P3 = 'B',FmtMnth(P2),FmtMnth(P1)), //pattern 1
                   M2);                                  //pattern 2 & 3

The M1 and M2 definitions translate month names to just the standard 2-digit number for the month. FmtMnth ensures a 2-digit month number string to go into the SELF.Month result field.

The Year value can be either 2 or 4 digits, so any 2-digit years are translated to the appropriate 4-digit representation. This code assumes that any 2-digit year value greater than or equal to 80 must be a 20th century date, and all others are assumed to be 21st century:

  //handle 2 vs 4-digit years
  PYear := CHOOSE(WhichPtn,
                  MATCHTEXT(NumDate/Year),      //pattern 1
                  MATCHTEXT(AlphaDate1/Year),   //pattern 2
                  MATCHTEXT(AlphaDate2/Year));  //pattern 3
  SELF.Year := IF(LENGTH(PYear) = 4,
                  PYear,
                  IF(PYear >= '80','19'+PYear,'20'+PYear));

  //and put it all together in a standard format
  SELF.YYYYMMDD := SELF.Year + SELF.Month + SELF.Day;

So at this point, standardizing all the date elements into a YYYYMMDD string is a simple concatenation of the three date elements into the SELF.YYYYMMDD result field.
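The 2-digit year rule described above can also be tried in isolation. ExpandYear is a hypothetical helper name that simply wraps the same expression used for SELF.Year:

```ecl
// Hypothetical helper: same rule as SELF.Year, with a pivot of 80
ExpandYear(STRING yr) := IF(LENGTH(yr) = 4,
                            yr,
                            IF(yr >= '80', '19' + yr, '20' + yr));

OUTPUT(ExpandYear('85'));    // '1985': 80 and above is treated as 20th century
OUTPUT(ExpandYear('13'));    // '2013': below 80 is treated as 21st century
OUTPUT(ExpandYear('2009'));  // '2009': 4-digit years pass through unchanged
```

The pivot value of 80 is an assumption about the data, so adjust it to suit the range of dates you expect to see.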

Standardizing the time is a little simpler. The format we’re going for here is the Zulu format (24 hour clock — AKA “military” time), as in HH:MM:SS.

  //and standardize the time
  isAMPMtime  := MATCHED(Time/AMPM);
  isAMPMSecs  := MATCHED(Time/AMPM/Num12[3]);
  AMPMhour    := (UNSIGNED1)MATCHTEXT(Time/AMPM/Num12[1]);
  //hour % 12 turns 12 AM into 00 and keeps 12 PM at 12 once 12 is added for PM
  AMPMhourStr := INTFORMAT(AMPMhour % 12 + IF(MATCHTEXT(Time/AMPM/Alpha)='PM',12,0),2,1);
  SELF.Time  := MAP(isAMPMtime AND isAMPMSecs =>
                      AMPMhourStr + ':' + MATCHTEXT(Time/AMPM/Num12[2])
                                  + ':' + MATCHTEXT(Time/AMPM/Num12[3]) ,
                    isAMPMtime AND NOT isAMPMSecs =>
                      AMPMhourStr + ':' + MATCHTEXT(Time/AMPM/Num12[2]) + ':00',
                    MATCHTEXT(Time/Zulu));
END;

First you need to determine whether there is an “AM” or “PM” in the time format (with the isAMPMtime definition) and whether seconds are present (with the isAMPMSecs definition). Then you get the hour value (with the AMPMhour definition) and work out whether to add 12 to it for a PM hour (with the AMPMhourStr definition). Creating the final standard format is then accomplished with a simple MAP function into the SELF.Time result field.
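The 12-hour to 24-hour conversion is easy to get wrong around noon and midnight, so here is a minimal standalone sketch of one way to handle it. To24Hour is a hypothetical helper name, and it assumes the hour is in the range 1 to 12 and the suffix is exactly 'AM' or 'PM':

```ecl
// Hypothetical helper: hr is the 12-hour clock hour, ampm is 'AM' or 'PM'
// hr % 12 maps 12 to 0, then 12 is added back for PM hours
To24Hour(UNSIGNED1 hr, STRING2 ampm) :=
    INTFORMAT(hr % 12 + IF(ampm = 'PM', 12, 0), 2, 1);

OUTPUT(To24Hour(1, 'PM'));   // '13'
OUTPUT(To24Hour(1, 'AM'));   // '01'
OUTPUT(To24Hour(12, 'PM'));  // '12': noon stays at 12
OUTPUT(To24Hour(12, 'AM'));  // '00': midnight becomes 00
```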

Finally, the PARSE itself:

p := PARSE(ds,s,DateRule,XF(LEFT), BEST);

p;

This parses the data in the “s” field from the “ds” dataset, using the DateRule RULE to look for matches. The output is processed by the XF TRANSFORM function, passing it the LEFT record (just like a PROJECT does). The BEST option indicates that only the best match for the RULE will end up in the result. Running the code produces this result:

1   3  25 MAY. 2005                              25  05  2005  20050525              
2   1  30/08/2009                                30  08  2009  20090830              
3   2  THURSDAY, MARCH 7, 2013 - 1:30 PM         07  03  2013  20130307  13:30:00    
4   3  7 SEP, 2006                               07  09  2006  20060907              
5   1  18-09-2008                                18  09  2008  20080918              
6   1  SAT, 04/27/2013 - 1:30:24 AM              27  04  2013  20130427  01:30:24    
7   3  25 MARCH 2013                             25  03  2013  20130325              
8   1  22.01.2013                                22  01  2013  20130122              
9   3  15-MAR-2004                               15  03  2004  20040315              
10  1  3-14-13                                   14  03  2013  20130314              
11  1  13/03/20                                  13  03  2020  20200313              
12  1  27/08/85 17:45:30                         27  08  1985  19850827  17:45:30

Record 1 matched pattern type 3 (“25 MAY. 2005”) producing the standard date “20050525” and no time.

Record 3 matched pattern type 2 (“THURSDAY, MARCH 7, 2013 - 1:30 PM”) producing the standard date “20130307” and standard time “13:30:00”.

Record 6 matched pattern type 1 (“SAT, 04/27/2013 - 1:30:24 AM”) producing the standard date “20130427” and standard time “01:30:24”.

Other ways to Parse Dates

There are different ways in ECL to parse dates other than the PARSE function. One way is simply using the Date Standard Library STD.Date.MatchDateString function, which requires you to create a set of formats to use for its matching, like this:

//And using the Date Library parsing function you can get almost as far: 
IMPORT STD;
SetFormats := [ '%m/%d/%Y',   '%d/%m/%Y',   '%m/%d/%y',    '%d/%m/%y',    
                '%m.%d.%Y',   '%d.%m.%Y',   '%m.%d.%y',    '%d.%m.%y',
                '%m-%d-%Y',   '%d-%m-%Y',   '%m-%d-%y',    '%d-%m-%y',
                '%d%t%B%t%y', '%d%t%b%t%y', '%d%t%B.%t%y', '%d%t%b.%t%y',                 
                '%d-%B-%y',   '%d-%B-%Y',   '%d-%b-%y']; 
OutRec2 := RECORD
  UNSIGNED1 UID;
  STRING40  InputStr;
  UNSIGNED4 Date;
END;
pstd := PROJECT(ds,TRANSFORM(OutRec2,SELF.UID:=LEFT.UID,SELF.InputStr:=LEFT.s,
                             SELF.Date:= STD.Date.MatchDateString(LEFT.s,SetFormats) ));
pstd;

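When you run this code, the result looks like the listing below, and you’ll see that the function’s standardized dates don’t necessarily match what was just PARSEd. For example, records 1, 7, and 9 come back with the year 2020, which looks like the result of a %y (2-digit year) format matching only the first two digits of the 4-digit year.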

1   25 MAY. 2005                              20200525
2   30/08/2009                                20090830
3   THURSDAY, MARCH 7, 2013 - 1:30 PM         0
4   7 SEP, 2006                               0
5   18-09-2008                                20080918
6   SAT, 04/27/2013 - 1:30:24 AM              0
7   25 MARCH 2013                             20200325
8   22.01.2013                                20130122
9   15-MAR-2004                               20200315
10  3-14-13                                   130314
11  13/03/20                                  200313
12  27/08/85 17:45:30                         850827

So how can I tell which one is better to use? Here’s a simple JOIN between the two results with a TRANSFORM to see which ones did not match:

JOIN(p,pstd, LEFT.UID = RIGHT.UID,
     TRANSFORM({UNSIGNED1 UID,STRING40 InputStr,STRING8 YYYYMMDD, UNSIGNED4 Date, STRING3 Match},
               SELF.Match := IF(LEFT.YYYYMMDD = (STRING8)RIGHT.Date,'yes','NO'); 
               SELF := LEFT; SELF := RIGHT));

Which produces this result:

1   25 MAY. 2005                              20050525  20200525  NO 
2   30/08/2009                                20090830  20090830  yes
3   THURSDAY, MARCH 7, 2013 - 1:30 PM         20130307  0         NO 
4   7 SEP, 2006                               20060907  0         NO 
5   18-09-2008                                20080918  20080918  yes
6   SAT, 04/27/2013 - 1:30:24 AM              20130427  0         NO 
7   25 MARCH 2013                             20130325  20200325  NO 
8   22.01.2013                                20130122  20130122  yes
9   15-MAR-2004                               20040315  20200315  NO 
10  3-14-13                                   20130314  130314    NO 
11  13/03/20                                  20200313  200313    NO 
12  27/08/85 17:45:30                         19850827  850827    NO

So that’s the power and the possibilities of some of the things that we can do in PARSE. Here’s the whole code file in one go:

 // Here are the top 20 patterns encountered:
 // AAA, 99 AAA 9999 99:99:99 -9999 
 // 99-99-9999 
 // 99.99.9999 
 // --99/99/99 
 // 99 AAA 9999 
 // 99/99/99 
 // AAA 99, 9999 
 // 99/99/9999 
 // -99/99/99 
 // 99 AAA. 9999 
 // AAAAAAAA 99, 9999 
 // AAAAA 99, 9999 
 // AAAAAAA 99, 9999 
 // AAAA 99, 9999 
 // AAA, 99 AAA 9999 99:99:99 AAA 
 // 99-AAA-9999 
 // 99 AAAAAAAA 9999 
 // 99/99/9999 99:99:99 
 // 99-AAAAAAAA, 9999 99:99 AAA 
 // 99 AAAAA 9999 

  ds := DATASET([{1,'25 MAY. 2005'},  {2,'30/08/2009'},  {3,'THURSDAY, MARCH 7, 2013 - 1:30 PM'},
                 {4,'7 SEP, 2006'},   {5,'18-09-2008'},  {6,'SAT, 04/27/2013 - 1:30:24 AM'},
                 {7,'25 MARCH 2013'}, {8,'22.01.2013'},  {9,'15-MAR-2004'}, 
                 {10,'3-14-13'},      {11,'13/03/20'},   {12,'27/08/85 17:45:30'}],
                {UNSIGNED1 UID,STRING40 s});

//Here's the PARSE code:

//the "atomic" bits:
PATTERN Alpha := PATTERN('[A-Z]')+;    //one or more alpha characters together
PATTERN Nbr   := PATTERN('[0-9]');     //a single numeric digit
PATTERN Sep   := PATTERN('[-, /.]');   //separators -- note the presence of "space" character

//more complex building blocks:
PATTERN Ws    := Sep OPT(Sep);         //"white space" = 1 or 2 separators
PATTERN Num12 := OPT(Nbr) Nbr;         //a 1 or 2-digit number

PATTERN Year  := Nbr Nbr OPT(Nbr Nbr);	   //a 2 or 4-digit number, explicit
// PATTERN Year  := REPEAT(Nbr,2) OPT(REPEAT(Nbr,2));	 //a 2 or 4-digit number, using REPEAT syntax 
// PATTERN Year  := Nbr*2 OPT(Nbr*2);	    //a 2 or 4-digit number using regular expression syntax 

PATTERN AMPM  := Num12 ':' Num12 OPT(':' Num12) ' ' Alpha;
PATTERN Zulu  := Num12 ':' Num12 ':' Num12;
PATTERN Time  := (AMPM | Zulu);

//a pattern using VALIDATE:
SetMonths := ['JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC'];
isValidMonth(STRING txt) := txt[1..3] IN SetMonths;
PATTERN Month := VALIDATE(Alpha,isValidMonth(MATCHTEXT)) ;

//the final parsing patterns:
PATTERN NumDate    := Num12 Ws Num12 Ws Year OPT(ws+ Time);
PATTERN AlphaDate1 := Month Ws Num12 Ws Year OPT(ws+ Time);
PATTERN AlphaDate2 := Num12 Ws Month Ws Year OPT(ws+ Time);

//and the RULE to actually do the pattern matching:
RULE DateRule := (NumDate | AlphaDate1 | AlphaDate2);

OutRec := RECORD
  UNSIGNED1 UID;
  UNSIGNED1 PatternType;
  STRING40  InputStr;
  STRING2   Day;
  STRING2   Month;
  STRING4   Year;
  STRING8   YYYYMMDD;
  STRING12  Time;
END;						 

OutRec XF(ds L) := TRANSFORM

  //determine which pattern matched
  WhichPtn := WHICH(MATCHED(NumDate),MATCHED(AlphaDate1),MATCHED(AlphaDate2));

  //output the pattern type and matching input string
  SELF.PatternType := WhichPtn;  
  SELF.UID := L.UID;
  SELF.InputStr := L.s;
		
  //determine if numeric date is in "dd mm" format instead of "mm dd": 
  P1 := IF(WhichPtn = 1,MATCHTEXT(NumDate/Num12[1]),'');
  P2 := IF(WhichPtn = 1,MATCHTEXT(NumDate/Num12[2]),'');
  //if first pair of digits can't be a month, flag as "B"ritish, else "A"merican format
  P3 := IF((UNSIGNED1)P1 > 12,'B','A');  
		
  DayNum := (UNSIGNED1)CHOOSE(WhichPtn,
                              IF(P3 = 'B',P1,P2),           //pattern 1
                              MATCHTEXT(AlphaDate1/Num12),  //pattern 2
                              MATCHTEXT(AlphaDate2/Num12)); //pattern 3
  SELF.Day := IF(DayNum < 10,INTFORMAT(DayNum,2,1),(STRING2)DayNum);

  //determine how the month is represented
  M1 := CHOOSE(WhichPtn,
               '',                           //pattern 1
               MATCHTEXT(AlphaDate1/Alpha),  //pattern 2
               MATCHTEXT(AlphaDate2/Alpha)); //pattern 3
  M2 := CASE(M1[1..3],'JAN' => '01','FEB' => '02','MAR' => '03','APR' => '04','MAY' => '05','JUN' => '06',
                      'JUL' => '07','AUG' => '08','SEP' => '09','OCT' => '10','NOV' => '11','DEC' => '12','');
  FmtMnth(STRING2 m) := IF(LENGTH(TRIM(m)) = 1, '0' + m, m);									
  SELF.Month := IF(WhichPtn = 1,
                   IF(P3 = 'B',FmtMnth(P2),FmtMnth(P1)), //pattern 1
                   M2);                                  //pattern 2 & 3   

  //handle 2 vs 4-digit years
  PYear := CHOOSE(WhichPtn,
                  MATCHTEXT(NumDate/Year),      //pattern 1
                  MATCHTEXT(AlphaDate1/Year),   //pattern 2
                  MATCHTEXT(AlphaDate2/Year));  //pattern 3
  SELF.Year := IF(LENGTH(PYear) = 4,
                  PYear,
                  IF(PYear >= '80','19'+Pyear,'20'+Pyear));

  //and put it all together in a standard format																
  SELF.YYYYMMDD := SELF.Year + SELF.Month + SELF.Day;

  //and standardize the time
  isAMPMtime  := MATCHED(Time/AMPM);
  isAMPMSecs  := MATCHED(Time/AMPM/Num12[3]);
  AMPMhour    := (UNSIGNED1)MATCHTEXT(Time/AMPM/Num12[1]);
  //hour % 12 turns 12 AM into 00 and keeps 12 PM at 12 once 12 is added for PM
  AMPMhourStr := INTFORMAT(AMPMhour % 12 + IF(MATCHTEXT(Time/AMPM/Alpha)='PM',12,0),2,1);
  SELF.Time  := MAP(isAMPMtime AND isAMPMSecs => 
                      AMPMhourStr + ':' + MATCHTEXT(Time/AMPM/Num12[2]) 
                                  + ':' + MATCHTEXT(Time/AMPM/Num12[3]) ,
                    isAMPMtime AND NOT isAMPMSecs => 
                      AMPMhourStr + ':' + MATCHTEXT(Time/AMPM/Num12[2]) + ':00',
                    MATCHTEXT(Time/Zulu));
END;

p := PARSE(ds,s,DateRule,XF(LEFT), BEST);

p;
//************************************************************************************
//And using the Date Library parsing function you can get almost as far: 
IMPORT STD;
SetFormats := [ '%m/%d/%Y',   '%d/%m/%Y',   '%m/%d/%y',    '%d/%m/%y',    
                '%m.%d.%Y',   '%d.%m.%Y',   '%m.%d.%y',    '%d.%m.%y',
                '%m-%d-%Y',   '%d-%m-%Y',   '%m-%d-%y',    '%d-%m-%y',
                '%d%t%B%t%y', '%d%t%b%t%y', '%d%t%B.%t%y', '%d%t%b.%t%y',                 
                '%d-%B-%y',   '%d-%B-%Y',   '%d-%b-%y']; 
OutRec2 := RECORD
  UNSIGNED1 UID;
  STRING40  InputStr;
  UNSIGNED4 Date;
END;
pstd := PROJECT(ds,TRANSFORM(OutRec2,SELF.UID:=LEFT.UID,SELF.InputStr:=LEFT.s,
                             SELF.Date:= STD.Date.MatchDateString(LEFT.s,SetFormats) ));
pstd;

//From the Date.ecl file:										 
// /**
 // * Matches a string against a set of date string formats and returns a valid
 // * Date_t object from the first format that successfully parses the string.
 // *
 // * @param date_text     The string to be converted.
 // * @param formats       A set of formats to check against the string.
 // *                      (See documentation for strftime)
 // * @return              The date that was matched in the string.
 // *                      Returns 0 if failed to match.
 // */

// EXPORT Date_t MatchDateString(STRING date_text, SET OF VARSTRING formats) :=
    // StringLib.MatchDate(date_text, formats);

// strftime docs here:		http://www.cplusplus.com/reference/ctime/strftime/
// and here:              http://en.cppreference.com/w/c/chrono/strftime

JOIN(p,pstd, LEFT.UID = RIGHT.UID,
     TRANSFORM({UNSIGNED1 UID,STRING40 InputStr,STRING8 YYYYMMDD, UNSIGNED4 Date, STRING3 Match},
               SELF.Match := IF(LEFT.YYYYMMDD = (STRING8)RIGHT.Date,'yes','NO'); 
               SELF := LEFT; SELF := RIGHT));
