ECL Tips – All about the ECL SET

Data PictureBob Foreman, a technical trainer at LexisNexis Risk Solutions, spotlighted ECL SETS, and other functions that use ECL SET during HPCC Systems Tech Talk 21, as part of his monthly “ECL Tips.” Bob is a frequent contributor to our Tech Talks, and provides valuable information on programming with Enterprise Control Language (ECL). 

In this blog, we:

  • Discuss ECL SET basics, ECL SET types, and other features of ECL SET.
  • Examine ECL functions and expressions that incorporate ECL SET.
  • Provide code examples and demonstrate best practices for ECL SET.
  • Discuss how ECL SETs and DATASETs interact.

ECL SET Basics

A SET is as an abstract data type that can store unique values, without any particular order. 

In ECL SET:

  • Values are generally basic data types, but can also be a SET of Datasets.
  • SETS are defined within square brackets.
  • Each element in the SET must have the same data type; these elements may be constant values or expressions.
  • The naming convention for SET definitions is that the name should begin with “SET.”
  • The SET structure has a definition name, definition operator, and square brackets containing values and expressions. Each individual element in the SET is separated by commas.
  • A SET can have duplicate values, and STRINGS do not have to be a constant length.

Now that we’ve described some of the basic characteristics of ECL SET, let’s discuss ECL SET in more detail. 

ECL Definition – Static SET 

Static SET – A Static SET is created as a SET of “explicitly declared” constant values or expressions that must be declared within square brackets, whether that SET is defined as a separate definition or simply included inline in another expression. Static SETS only allow query operations on their elements, such as checking whether a given value is in the SET, or listing the values in some arbitrary order. SET Definitions are typically used in coding for a Static SET.

SET Definition – A SET Definition is defined as any definition whose expression is a SET of values, defined within square brackets. SET definitions can also be defined using the SET function. SETs defined this way may be used like any other SET.

Example of Static SETs: 

SetTrueFalseValues     := [0, 1];
SetSoutheastStates     := ['FL','GA','AL','SC'];
SetStatusCodes           := ['1','X','9','W'];
SetInts               := [1,2+3,45,def1,7*3,def2];

Note:  The SetInts (integer sets) definition must evaluate to an integer, or the compiler will     
return an error.

Next, let’s look at the “workhorse” of the expressions used with ECL SET – the “IN Operator.”

Expressions: “IN” Operator

IN Operator
value IN value_set
value – The value definition or constant to search for in the value_set.
value_set – The set to search.

The IN operator is shorthand for a collection of “OR” conditions. It searches the “value_set” to find a match for the value and returns a Boolean TRUE or FALSE. This eliminates the use of multiple “OR” conditions in a filter. 

Example for the “IN” Operator: 

//SetSoutheastStates := ['FL','GA','AL','SC'];
//BOOLEAN IsSoutheastState(STRING2 state) :=  state IN SetSoutheastStates;
  • In this example, we have an inline Dataset, “SetSoutheastStates := [‘FL’,’GA’,’AL’,’SC’];” of Southeast states. 
  • The states are in the form of a two character string, “(STRING2 state).”
  • The Boolean expression, “BOOLEAN IsSoutheastState(STRING2 state) :=  state IN SetSoutheastStates;”  asks if a specific state is in the SET of Southeast states.  If the state is FL, GA, AL, or SC it will return a TRUE. Otherwise, it will return a FALSE. 

Another type of ECL SET is the Dynamic SET.  Let’s take a closer look at how to create a Dynamic SET.

SET Function – Creating a Dynamic SET

A Dynamic SET is a SET of implicitly declared values or expressions that must be declared within square brackets, whether that SET is defined as a separate definition or simply included in-line in another expression. All values must be of the same type. Unlike the Static SET, the elements of Dynamic SETs are not fixed, but may be added or removed during execution of the program. The SET Function is typically used in a Dynamic SET.

The ECL SET function returns a SET for use in any SET operation (such as the “IN” operator).

SET(recordset, field )
recordset – The SET of records from which to derive the SET of values.
field – The field in the recordset from which to obtain the values.

Example for Dynamic SET:

r := {STRING1 Letter};
SomeFile := DATASET([{'A'},{'B'},{'C'},{'D'},{'E'},{'F'},{'G'},{'H'},{'I'},{'J'}],r);
SetLettersD_J := SET(SomeFile(Letter > 'C'), Letter);    
y := 'A' IN SetLettersD_J;   //results in FALSE         
z := 'D' IN SetLettersD_J;    //results in TRUE
  • In this example, we have an inline Dataset, “SomeFile := DATASET([{‘A’},{‘B’},{‘C’},{‘D’},{‘E’},{‘F’},{‘G’},{‘H’},{‘I’},{‘J’}],r).”
  • The SET function applies a filter designating letters greater than ‘C,’ “SetLettersD_J := SET(SomeFile(Letter > ‘C’), Letter.”
  • The Boolean expression, “y := ‘A’ IN SetLettersD_J”  asks if the letter ‘A’ is in the SET of letters “D” thru “J.” This results in FALSE. 
  • The Boolean expression, “z := ‘D’ IN SetLettersD_J”  asks if the letter ‘D’ is in the SET of letters “D” thru “J.” This results in TRUE.

ECL SET is also a Value Type. The value type that we will examine is “SET OF.”

Value Types – ECL SET 

SET [OF type]
type – The value type of the SET (INTEGER, REAL, BOOLEAN, STRING, UNICODE, DATA, or DATASET(recstruct)). Defaults to INTEGER, if omitted.

Value type, SET OF, defines the name or passed parameter as a SET of values. The keyword “ALL” may be used as the passed parameter default value to indicate all possible values for the SET. Empty square brackets indicate that there is no possible value.

Example for SET OF:

 SET OF INTEGER1 SetCloseCodes := [65,66,90,114,115,123];  

  HasCode(INTEGER code, SET s) := code IN s;
  ClosedAccounts := Accounts(HasCode(acct_status,SetCloseCodes));

  • In this example, there is an inline Dataset, “SET OF INTEGER1 SetCloseCodes := [65,66,90,114,115,123].”
  • For this Dataset, the SET of numbers has been defined as integers. If the “type” of number is not specified, it defaults to an 8 byte integer. 
  • In this example, a SET of 1 byte integers is specified.
  • The function, “HasCode(INTEGER code, SET s) := code IN s,” passes the Dataset, “SET OF INTEGER1 SetCloseCodes := [65,66,90,114,115,123].” So, the word “SET” can be used as one of the function parameters.
  • The function, “HasCode(INTEGER code, SET s) := code IN s,” asks if the code being passed is in the SET that is being passed.
  • When evaluating the “ClosedAccounts” expression, the function is used as a filter. This filter is used to determine if the field, “acct_status” is valid, as each account record is evaluated. 
  • If the “INTEGER code” is any one of the numbers defined in the Dataset, the record will be received, otherwise, the record will be rejected. 

So far, we have discussed the various types of ECL SET. Now let’s look at a feature of ECL SET. ECL SET is implicitly ordered and can be indexed to access specific elements.

SET Ordering and Indexing

In ECL, Declared SETS must be indexed to access individual elements within the SET.

Example for SET Ordering and Indexing:

 SetNums := [5,4,3,2,1];  

LastNumInSet := SetNums [5];    

// LastNumInSet contains the value 1

  • In this example we have Declared SET, “SetNums := [5,4,3,2,1].”
  • The expression, “LastNumInSet := SetNums [5];” asks for the 5th element of that SET.
  • The 5th element in the Declared SET is the number 1.

Another feature of ECL SET is that all “STRINGS can be thought of as a SET of characters.”

All STRINGS are SETs too!

In ECL, every STRING that you encounter might be an implicit SET

Strings (character sets) may also be indexed to access individual characters within the STRING.

Example for indexing STRINGs:

MyString   := 'ABCDE';
MiddleChar := MyString[3];    //MiddleChar contains “C”
  • In this example, there is a STRING of letters, “ MyString     := ‘ABCDE’.”
  • The expression, “MiddleChar := MyString[3]” indexes into the STRING and asks for the third character of that STRING.
  • The answer is “C”.

Now, let’s illustrate how Substrings work. 

Substrings – may be extracted by using 2 periods to separate the beginning and ending element numbers to specify the Substring to extract.

Example for Substrings:

MyString     := 'ABCDE';
MySubString1 := MyString[2..4];   //’BCD’
MySubString2 := MyString[..4];    //’ABCD’
MySubString3 := MyString[2..];    //’BCDE’
  • In Substring expression, “ MySubString1 := MyString[2..4],” “[2..4]” is an inclusive range, and returns Substring ’BCD.’
  • Substring expression, “MySubString2 := MyString[..4],” indicates that the desired Substring starts at the beginning of the STRING and extracts characters up to, and including the 4th character.  This returns Substring ’ABCD.’
  • For Substring expression, “MySubString3 := MyString[2..],” the Substring starts at the 2nd character and go to the end of the STRING. This returns Substring ’BCDE.’

So, those are the basics of ECL SET. 

Now, let’s talk about how a SET and DATASET interact with one another.

SETs and DATASETs

The elements of any SET can be extracted by applying that SET as the first parameter of any inline DATASET: 

IMPORT $;
SetAllStates     := SET($.Persons,State);
SetDS            := DATASET(SetAllStates,{STRING2 State});
SortedSet        := SORT(SetDS,State);
DedupedSet       := DEDUP(SortedSet,State);
CountSet         := COUNT(DedupedSet);
OUTPUT(CountSet);
  • In the expression, “SetAllStates:=SET($.Persons,State),” a Dynamic SET is used to create a SET of states from a recordset or a Dataset. This expression is dropped into a Dataset statement, where it is given an inline record structure. In this example, “STRING 2” is the field that reads from the Dataset. All the individual elements of the SET can be treated as a Dataset. 
  • Everything that can be done with a Dataset (sorting, projecting, parsing, joining, etc.), can done with SET by dropping it into an inline Dataset and refining it. The SET function can then be used to drop them back into a SET.
  • In this example, SORT, DEDUP, and COUNT functions are applied to the Dataset to determine the number of unique states in the recordset.
  • The COUNT function counts the SETS directly. It is possible do a count of a recordset, comma delimited list, and so, because a SET is a comma delimited list that’s enclosed in square brackets, a count of a SET can be done.  The COUNT will tell you how many elements there are in a specific SET.

Other functions that incorporate ECL SET are “RANK” and “RANKED.” 

RANK

A SET in its native form cannot be sorted directly, because a SORT is looking for a recordset. So, unless the data is dropped into a Dataset, a SORT cannot be done. However, SETs can be sorted directly by using the functions “RANK” and “RANKED.” 

RANK(position, set [,DESCEND])
position – An integer indicating the element in the sorted SET to return.
set – The set of values.
DESCEND – Indicates descending order sort.

The RANK function sorts the SET in ascending (or descending) order, then returns the ordinal position (index value) of the unsorted SET’s position element after the SET has been sorted. This is the opposite of RANKED.

Example for the RANK function:

Ranking := RANK(1, [20, 30,10,40]); 
//Returns 2 – element 1 (20), after sorting, is element 2
Ranking := RANK(1, [20, 30,10,40], DESCEND);  
//Returns 3 – element 1 (20), after descending sort, is element 3
  • The RANK function takes the first element in the SET and determines what position it will be after the SET is sorted.
  • In the expression, “Ranking := RANK(1, [20, 30,10,40]),” using the default ASCEND attribute, the first element, 20, is the 2nd element after sorting [10, 20, 30, 40]
  • In the expression, “Ranking := RANK(1, [20, 30,10,40], DESCEND),” using the DESCEND attribute, the first element, 20, is the 3rd element after sorting [40, 30, 20, 10].

RANKED

RANKED(position, set [,DESCEND])
position – An integer indicating the element in the unsorted SET to return.
set – The set of values.
DESCEND – Indicates descending order sort.

The RANKED function sorts the SET in ascending (or descending) order, then returns the ordinal position (index value) of the sorted SET’s position element in the unsorted SET. This is the opposite of RANK.

Example for the RANKED function:

Ranking := RANKED(1, [20, 30,10,40]); 
// Returns 3 – element 1 after sorting (10), was element 3
Ranking := RANKED(1, [20, 30,10,40], DESCEND);
//Returns 4 – element 1 after descending sort (40), was element 4
  • In the expression, “Ranking := RANKED(1, [20, 30,10,40]),” using the default ASCEND attribute, the 1st element after sorting is 10 [10, 20, 30, 40]. This element’s position after the RANKED command is for the unsorted SET [20, 30, 10, 40]. Therefore, the result is the 3rd position.  
  • In the expression, “Ranking := RANKED(1, [20, 30,10,40], DESCEND),” the 1st element in the SET after sorting in DESCENDING order is 40 [40, 30, 20, 10]. This element’s position after the RANKED command is for the unsorted SET [20, 30, 10, 40].  Therefore, the result is the 4th position.  

Another function that uses ECL SET is “MERGEJOIN.”

MERGEJOIN

The MERGEJOIN function is a variation of the SET OF DATASETs forms of the MERGE and JOIN functions. Like MERGE, it merges records from the SET into a single result SET, but like JOIN, it uses the condition and type to determine which records to include in the result SET. It includes all records, unchanged, from the SET that match the condition.

MERGEJOIN(set,condition,SORTED(fields)[,type])
set – The set of Datasets to process. 
condition – The expression that specifies how to match records between the Datasets.
SORTED – Specifies the sort order of records in the input SET and also the output sort order of the result SET.
fields –  A comma-delimited list of fields in the setofdatasets, which must be a subset of the input sort order. These fields must all be used in the joincondition as they define the order in which the fields are STEPPED.
type – The type of join to perform (default is an inner join).

Example for MERGEJOIN:

Rec := RECORD
STRING1 Letter;
UNSIGNED1 DS;
END;    
ds1 := DATASET([{'A',1},{'B',1},{'C',1},{'D',1},{'E',1}],Rec);
ds2 := DATASET([{'A',2},{'B',2},{'H',2},{'I',2},{'J',2}],Rec);
ds3 := DATASET([{'B',3},{'C',3},{'M',3},{'N',3},{'O',3}],Rec);
SetDS := [ds1,ds2,ds3];
j1 := MERGEJOIN(SetDS, SORTED(Letter));      //e.g., ds1.field = ds2.field AND ds2.field = ds3.field
j2 := MERGEJOIN(SetDS, SORTED(Letter),MOFN(3,4));
OUTPUT(j1);
OUTPUT (j2);
//MOFN: One record for every record with matching records in min number of adjacent Datasets //within the setofdatasets. If max is specified, the record is not included if max number of dataset //matches are exceeded.
  • In this example there are three Datasets, ds1, ds2, and ds3. SetDS, defined by “SetDS := [ds1,ds2,ds3],” is applied to the MERGEJOIN. The MERGEJOIN merges the three Datasets into a single SET, based on the join condition.
  • In this example, we’re looking for matching left letters and right letters, per expression, “j1 := MERGEJOIN(SetDS, SORTED(Letter)).” MERGEJOIN looks for common letters in all three Datasets. The only records that have a matching letter in all three Datasets is the letter ‘B.’  
  • In the expression, j2 := MERGEJOIN(SetDS, SORTED(Letter),MOFN(3,4)),” attribute ‘MOFN’ returns the matches of the value N. The first parameter is the minimum matches that it will return, and the second parameter is the maximum matches that it will return. 

NOTE: MERGEJOIN is best used in ROXI and HTHOR. If used in THOR, it defaults to local
              operations.

Finally, the last function that we will discuss is the “RANGE” function. 

RANGE

The RANGE function extracts a subset of the “setofdatasets” as a SET. The “setofintegers” specifies which elements of the “setofdatasets” comprise the resulting SET of Datasets. This is typically used in the GRAPH function.

RANGE(setofdatasets, setofintegers )
setofdatasets – A set of Datasets.
setofintegers – A set of integers.

Example of the RANGE function:

r          := {STRING1 Letter};

ds1     := DATASET([{'A'},{'B'},{'C'},{'D'},{'E'}],r);

ds2     := DATASET([{'F'},{'G'},{'H'},{'I'},{'J'}],r);

ds3     := DATASET([{'K'},{'L'},{'M'},{'N'},{'O'}],r);

ds4     := DATASET([{'P'},{'Q'},{'R'},{'S'},{'T'}],r);

SetDS := [ds1,ds2,ds3,ds4];

outDS := RANGE(setDS,[1,3]);    //use only 1st and 3rd elements

OUTPUT(outDS[1]);        //results in A,B,C,D,E

OUTPUT(outDS[2]);         //results in K,L,M,N,O
  • In this example, the RANGE function in expression, “outDS := RANGE(setDS,[1,3]),” extracts a subset of the setofdatasets, “SetDS := [ds1,ds2,ds3,ds4].” The elements specified are the 1st and 3rd elements. 
  • The resulting subset for, “ OUTPUT(outDS[1])” is ‘A,B,C,D,E.’
  • The resulting subset for, “ OUTPUT(outDS[2])” is ‘K,L,M,N,O.’

Summary 

ECL SET is a great tool to use when creating queries.  It streamlines code by using various expressions and functions to “refine” data.  

More information about ECL SET and related functions can be found in the ECL Language Reference Manual. Please use the link below to access the document. 

About Bob Foreman

Since 2011, Bob Foreman has worked with the HPCC Systems technology platform and the ECL programming language, and has been a technical trainer for over 25 years. He is the developer and designer of the HPCC Systems Online Training Courses, and is the Senior Instructor for all classroom and WebEx/Lync based training.

If you would like to watch Bob Foreman’s Tech Talk video, “All About the ECL SET,” please use the following link:

Acknowledgments

A special thank you goes to Bob Foreman and Richard Taylor for their guidance and valuable contributions to this blog post.