
DATASET

attr := DATASET( file, struct, filetype [,LOOKUP]);

attr := DATASET( dataset, file, filetype [,LOOKUP]);

attr := DATASET( WORKUNIT( [ wuid , ] namedoutput ), struct );

[ attr := ] DATASET( recordset [, recstruct ] );

DATASET( row )

DATASET( childstruct [, COUNT( count ) | LENGTH( size ) ] [, CHOOSEN( maxrecs ) ] )

[GROUPED] [LINKCOUNTED] [STREAMED] DATASET( struct )

DATASET( dict )

DATASET( count, transform [, DISTRIBUTED | LOCAL ] )

attr
The name of the DATASET for later use in other definitions.

file
A string constant containing the logical file name. See the Scope & Logical Filenames section for more on logical filenames.

struct
The RECORD structure defining the layout of the fields. This may use RECORDOF.

filetype
One of the following keywords, optionally followed by relevant options for that specific type of file: THOR/FLAT, CSV, XML, JSON, PIPE. Each of these is discussed in its own section, below.

dataset
A previously-defined DATASET or recordset from which the record layout is derived. This form is primarily used by the BUILD action and is equivalent to:
      ds := DATASET('filename',RECORDOF(anotherdataset), ... )

LOOKUP
Optional. Specifies that the file layout should be looked up at compile time. See File Layout Resolution at Compile Time in the Programmer's Guide for more details.

WORKUNIT
Specifies the DATASET is the result of an OUTPUT with the NAMED option within the same or another workunit.

wuid
Optional. A string expression that specifies the workunit identifier of the job that produced the NAMED OUTPUT.

namedoutput
A string expression that specifies the name given in the NAMED option.
recordset

A set of in-line data records. This can simply name a previously-defined set definition or explicitly use square brackets to indicate an in-line set definition. Within the square brackets records are separated by commas. The records are specified by either:

1) Using curly braces ({}) to surround the field values for each record. The field values within each record are comma-delimited.

2) A comma-delimited list of in-line transform functions that produce the data rows. All the transform functions in the list must produce records in the same result format.

recstruct
Optional. The RECORD structure of the recordset. Omittable only if the recordset parameter is just one record or a list of in-line transform functions.

row
A single data record. This may be a single-record passed parameter, or the ROW or PROJECT function that defines a 1-row dataset.

childstruct
The RECORD structure of the child records being defined. This may use the RECORDOF function.

COUNT
Optional. Specifies the number of child records attached to the parent (for use when interfacing to external file formats).

count
An expression defining the number of child records. This may be a constant or a field in the enclosing RECORD structure (addressed as SELF.fieldname).

LENGTH
Optional. Specifies the size of the child records attached to the parent (for use when interfacing to external file formats).

size
An expression defining the size of child records. This may be a constant or a field in the enclosing RECORD structure (addressed as SELF.fieldname).

CHOOSEN
Optional. Limits the number of child records attached to the parent. This implicitly uses the CHOOSEN function wherever the child dataset is read.

maxrecs
An expression defining the maximum number of child records for a single parent.

GROUPED
Specifies the DATASET being passed has been grouped using the GROUP function.

LINKCOUNTED
Specifies the DATASET being passed or returned uses the link counted format (each row is stored as a separate memory allocation) instead of the default (embedded) format, where the rows of a dataset are all stored in a single block of memory. This is primarily for use in BEGINC++ functions or external C++ library functions.

STREAMED
Specifies the DATASET being returned is returned as a pointer to an IRowStream interface (see the eclhelper.hpp include file for the definition). Valid only as a return type. This is primarily for use in BEGINC++ functions or external C++ library functions.

struct
The RECORD structure of the dataset field or parameter. This may use the RECORDOF function.

dict
The name of a DICTIONARY definition.

count
An integer expression specifying the number of records to create.

transform
The TRANSFORM function that will create the records. This may take an integer COUNTER parameter.

DISTRIBUTED
Optional. Specifies distributing the created records across all nodes of the cluster. If omitted, all records are created on node 1.

LOCAL
Optional. Specifies records are created on every node.

The DATASET declaration defines a file of records, on disk or in memory. The layout of the records is specified by a RECORD structure (the struct or recstruct parameters described above). The distribution of records across execution nodes is undefined in general, as it depends on how the DATASET came to be (sprayed in from a landing zone or written to disk by an OUTPUT action), the size of the cluster on which it resides, and the size of the cluster on which it is used (to specify distribution requirements for a particular operation, see the DISTRIBUTE function).

The first two forms are alternatives to each other and either may be used with any of the filetypes described below (THOR/FLAT, CSV, XML, JSON, PIPE).
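As a sketch of the first form with the optional LOOKUP flag (the logical file name and layout below are hypothetical):

```ecl
// Hypothetical layout and logical file name, for illustration only
MyLayout := {STRING20 name, UNSIGNED4 id};

// LOOKUP asks the compiler to resolve the file's layout at compile time
ds := DATASET('~thor::in::somefile', MyLayout, THOR, LOOKUP);
```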

The third form defines the result of an OUTPUT with the NAMED option within the same workunit or the workunit specified by the wuid (see Named Output DATASETs below).

The fourth form defines an in-line dataset (see In-line DATASETs below).

The fifth form is only used in an expression context to allow you to in-line a single record dataset (see Single-row DATASET Expressions below).

The sixth form is only used as a value type in a RECORD structure to define a child dataset (see Child DATASETs below).

The seventh form is only used as a value type to pass DATASET parameters (see DATASET as a Parameter Type below).

The eighth form is used to define a DICTIONARY as a DATASET (see DATASET from DICTIONARY below).

The ninth form is used to create a DATASET using a TRANSFORM function (see DATASET from TRANSFORM below).

THOR/FLAT Files

attr := DATASET( file, struct, THOR [,__COMPRESSED__][,OPT ] [,UNSORTED][,PRELOAD([nbr])] [,ENCRYPT(key) ]);

attr := DATASET( file, struct, FLAT [,__COMPRESSED__] [,OPT] [,UNSORTED] [,PRELOAD([nbr])] [,ENCRYPT(key) ]);

THOR
Specifies the file is in the Data Refinery (may optionally be specified as FLAT, which is synonymous with THOR in this context).

__COMPRESSED__
Optional. Specifies that the THOR file is compressed because it is the result of the PERSIST Workflow Service or was OUTPUT with the COMPRESSED option.

__GROUPED__
Specifies the DATASET has been grouped using the GROUP function.

OPT
Optional. Specifies that using the dataset when the THOR file doesn't exist results in an empty recordset instead of an error condition.

UNSORTED
Optional. Specifies the THOR file is not sorted, as a hint to the optimizer.

PRELOAD
Optional. Specifies the file is left in memory after loading (valid only for Rapid Data Delivery Engine use).

nbr
Optional. An integer constant specifying how many indexes to create "on the fly" for speedier access to the dataset. If > 1000, specifies the amount of memory set aside for these indexes.

ENCRYPT
Optional. Specifies the file was created by OUTPUT with the ENCRYPT option.

key
A string constant containing the encryption key used to create the file.

This form defines a THOR file that exists in the Data Refinery. This could contain either fixed-length or variable-length records, depending on the layout specified in the RECORD struct.

The struct may contain an UNSIGNED8 field with either {virtual(fileposition)} or {virtual(localfileposition)} appended to the field name. This indicates the field contains the record's position within the file (or part), and is used for those instances where a usable pointer to the record is needed, such as the BUILD function.

Example:

PtblRec := RECORD
  STRING2 State := Person.per_st;
  STRING20 City := Person.per_full_city;
  STRING25 Lname := Person.per_last_name;
  STRING15 Fname := Person.per_first_name;
END;
          
Tbl := TABLE(Person,PtblRec);
         
PtblOut := OUTPUT(Tbl,,'RTTEMP::TestFile');
          //write a THOR file
         
Ptbl := DATASET('~Thor400::RTTEMP::TestFile',
                {PtblRec,UNSIGNED8 __fpos {virtual(fileposition)}},
                THOR,OPT);
             // __fpos contains the "pointer" to each record
             // Thor400 is the scope name and RTTEMP is the
             // directory in which TestFile is located
             //using ENCRYPT
OUTPUT(Tbl,,'~Thor400::RTTEMP::TestFileEncrypted',ENCRYPT('mykey'));
PtblE := DATASET('~Thor400::RTTEMP::TestFileEncrypted',
                 PtblRec,
                 THOR,OPT,ENCRYPT('mykey'));

CSV Files

attr := DATASET( file, struct, CSV [ ( [ HEADING( n ) ] [, SEPARATOR( f_delimiters ) ]

[, TERMINATOR( r_delimiters ) ] [, QUOTE( characters ) ] [, ESCAPE( esc ) ] [, MAXLENGTH( size ) ]

[ ASCII | EBCDIC | UNICODE ] [, NOTRIM ]) ] [,ENCRYPT(key) ] [, __COMPRESSED__]);

CSV
Specifies the file is a "comma separated values" ASCII file.

HEADING(n)
Optional. The number of header records in the file. If omitted, the default is zero (0).

SEPARATOR
Optional. The field delimiter. If omitted, the default is a comma (',') or the delimiter specified in the spray operation that put the file on disk.

f_delimiters
A single string constant, or set of string constants, that define the character(s) used as the field delimiter. If Unicode constants are used, then the UTF8 representation of the character(s) will be used.
TERMINATOR

Optional. The record delimiter. If omitted, the default is a line feed ('\n') or the delimiter specified in the spray operation that put the file on disk.

r_delimiters

A single string constant, or set of string constants, that define the character(s) used as the record delimiter.

QUOTE

Optional. The string quote character used. If omitted, the default is a single quote ('\'') or the delimiter specified in the spray operation that put the file on disk.

characters

A single string constant, or set of string constants, that define the character(s) used as the string value delimiter.

ESCAPE

Optional. The string escape character used to indicate the next character (usually a control character) is part of the data and not to be interpreted as a field or row delimiter. If omitted, the default is the escape character specified in the spray operation that put the file on disk (if any).

esc

A single string constant, or set of string constants, that define the character(s) used to escape control characters.

MAXLENGTH(size)

Optional. Maximum record length in the file in bytes. If omitted, the default is 4096. There is a hard limit of 10MB but that can be overridden using #OPTION(maxCSVRowSizeMb,nn) where nn is the maximum size in MB. The maximum record size should be set as conservatively as possible.

ASCII

Specifies all input is in ASCII format, including any EBCDIC or UNICODE fields.

EBCDIC

Specifies all input is in EBCDIC format except the SEPARATOR and TERMINATOR (which are expressed as ASCII values).

UNICODE

Specifies all input is in Unicode UTF8 format.

NOTRIM

Specifies preserving all whitespace in the input data (the default is to trim leading blanks).

ENCRYPT

Optional. Specifies the file was created by OUTPUT with the ENCRYPT option.

key

A string constant containing the encryption key used to create the file.

__COMPRESSED__

Optional. Specifies that the file is compressed because it was OUTPUT with the COMPRESSED option.

This form is used to read an ASCII CSV file. This can also be used to read any variable-length record file that has a defined record delimiter. If none of the ASCII, EBCDIC, or UNICODE options are specified, the default input is in ASCII format with any UNICODE fields in UTF8 format.
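The MAXLENGTH interplay described above can be sketched as follows; the file name, layout, and the exact #OPTION spelling here are assumptions based on the description, not tested values:

```ecl
// Hypothetical: raise the 10MB hard limit on CSV rows to 20MB,
// then read a file whose rows may approach 15MB each
#OPTION('maxCSVRowSizeMb', 20);

WideRec := {STRING line};
wide := DATASET('~thor::in::widefile.csv',
                WideRec,
                CSV(MAXLENGTH(15000000), HEADING(1)));
```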

Example:

CSVRecord := RECORD
  UNSIGNED4 person_id;
  STRING20 per_surname;
  STRING20 per_forename;
END;

file1 := DATASET('MyFile.CSV',CSVrecord,CSV);            //all defaults
file2 := DATASET('MyFile.CSV',CSVrecord,CSV(HEADING(1))); //1 header
file3 := DATASET('MyFile.CSV',
                 CSVrecord,
                 CSV(HEADING(1),
                     SEPARATOR([',','\t']),
                     TERMINATOR(['\n','\r\n','\n\r'])));
          //1 header record, either comma or tab field delimiters,
          // either LF or CR/LF or LF/CR record delimiters

XML Files

attr := DATASET( file, struct, XML( xpath [, NOROOT ] ) [,ENCRYPT(key) ]);

XML
Specifies the file is an XML file.

xpath
A string constant containing the full XPATH to the tag that delimits the records in the file.

NOROOT
Specifies the file is an XML file with no file tags, only row tags.

ENCRYPT
Optional. Specifies the file was created by OUTPUT with the ENCRYPT option.

key
A string constant containing the encryption key used to create the file.

This form is used to read an XML file into the Data Refinery. The xpath parameter defines the record delimiter tag using a subset of standard XPATH (www.w3.org/TR/xpath) syntax (see the XPATH Support section under the RECORD structure discussion for a description of the supported subset).

The key to getting individual field values from the XML lies in the RECORD structure field definitions. If the field name exactly matches a lower case XML tag containing the data, then nothing special is required. Otherwise, {xpath(xpathtag)} appended to the field name (where the xpathtag is a string constant containing standard XPATH syntax) is required to extract the data. An XPATH consisting of empty angle brackets (<>) indicates the field receives the entire record. An absolute XPATH is used to access properties of parent elements. Because XML is case sensitive, and ECL identifiers are case insensitive, xpaths need to be specified if the tag contains any upper case characters.

NOTE: XML reading and parsing can consume a large amount of memory, depending on the usage. In particular, if the specified xpath matches a very large amount of data, then a large data structure will be provided to the transform. Therefore, the more you match, the more resources you consume per match. For example, if you have a very large document and you match an element near the root that virtually encompasses the whole thing, then the whole thing will be constructed as a referenceable structure that the ECL can get at.

Example:

/* an XML file called "MyFile" contains this XML data:
<library>
  <book isbn="123456789X">
    <author>Bayliss</author>
    <title>A Way Too Far</title>
  </book>
  <book isbn="1234567801">
    <author>Smith</author>
    <title>A Way Too Short</title>
  </book>
</library>
*/

rform := RECORD
  STRING author; //data from author tag -- tag name is lowercase and matches field name
  STRING name {XPATH('title')}; //data from title tag, renaming the field
  STRING isbn {XPATH('@isbn')}; //isbn data from the book tag's attribute
END;
books := DATASET('MyFile',rform,XML('library/book'));
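For a file containing only bare row tags (no enclosing file tag), a sketch using the NOROOT option might look like this, reusing the rform layout above; the file name is hypothetical:

```ecl
// Hypothetical: 'MyRowsFile' holds a stream of <book>...</book>
// elements with no <library> wrapper around them
rowsOnly := DATASET('MyRowsFile', rform, XML('book', NOROOT));
```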

JSON Files

attr := DATASET( file, struct, JSON( xpath [, NOROOT ] ) [,ENCRYPT(key) ]);

JSON
Specifies the file is a JSON file.

xpath
A string constant containing the full XPATH to the tag that delimits the records in the file.

NOROOT
Specifies the file is a JSON file with no root level markup, only a collection of objects.

ENCRYPT
Optional. Specifies the file was created by OUTPUT with the ENCRYPT option.

key
A string constant containing the encryption key used to create the file.

This form is used to read a JSON file. The xpath parameter defines the path used to locate records within the JSON content using a subset of standard XPATH (www.w3.org/TR/xpath) syntax (see the XPATH Support section under the RECORD structure discussion for a description of the supported subset).

The key to getting individual field values from the JSON lies in the RECORD structure field definitions. If the field name exactly matches a lower case JSON tag containing the data, then nothing special is required. Otherwise, {xpath(xpathtag)} appended to the field name (where the xpathtag is a string constant containing standard XPATH syntax) is required to extract the data. An XPATH consisting of empty quotes ('') indicates the field receives the entire record. An absolute XPATH is used to access properties of child elements. Because JSON is case sensitive, and ECL identifiers are case insensitive, xpaths need to be specified if the tag contains any upper case characters.

NOTE: JSON reading and parsing can consume a large amount of memory, depending on the usage. In particular, if the specified xpath matches a very large amount of data, then a large data structure will be provided to the transform. Therefore, the more you match, the more resources you consume per match. For example, if you have a very large document and you match an element near the root that virtually encompasses the whole thing, then the whole thing will be constructed as a referenceable structure that the ECL can get at.

Example:

/* a JSON  file called "MyBooks.json" contains this data:
[
  {
    "id" : "978-0641723445",
    "name" : "The Lightning Thief",
    "author" : "Rick Riordan"
  }
,
  {
    "id" : "978-1423103349",
    "name" : "The Sea of Monsters",
    "author" : "Rick Riordan"
  }
]
*/

BookRec := RECORD
  STRING ID {XPATH('id')}; //data from id tag -- renames field to uppercase
  STRING title {XPATH('name')}; //data from name tag, renaming the field
  STRING author; //data from author tag -- tag name is lowercase and matches field name  
END;

books := DATASET('~jd::mybooks.json',BookRec,JSON('/'));
OUTPUT(books);

PIPE Files

attr := DATASET( file, struct, PIPE( command [, CSV | XML ]) );

PIPE
Specifies the file comes from the command program. This is a "read" pipe.

command
The name of the program to execute, which must output records in the struct format to standard output.

CSV
Optional. Specifies the output data format is CSV. If omitted, the format is raw.

XML
Optional. Specifies the output data format is XML. If omitted, the format is raw.

This form sends the file to the command program, which must then write the resulting records to standard output in the struct format. This is also known as an input PIPE (analogous to the PIPE function and the PIPE option on OUTPUT).

Example:

PtblRec := RECORD
  STRING2 State;
  STRING20 City;
  STRING25 Lname;
  STRING15 Fname;
END;
         
Ptbl := DATASET('~Thor50::RTTEMP::TestFile',
                PtblRec,
                PIPE('ProcessFile'));
          // ProcessFile is the input pipe
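If the piped program emits CSV or XML rather than raw records, the optional format keyword tells ECL how to parse its output. A sketch reusing the PtblRec layout above; the program name is hypothetical:

```ecl
// Hypothetical: 'ProcessFile2' writes comma-separated values to
// standard output, so the CSV option is added to PIPE
PtblCSV := DATASET('~Thor50::RTTEMP::TestFile',
                   PtblRec,
                   PIPE('ProcessFile2', CSV));
```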

Named Output DATASETs

attr := DATASET( WORKUNIT( [ wuid , ] namedoutput ), struct );

This form allows you to use as a DATASET the result of an OUTPUT with the NAMED option within the same workunit, or the workunit specified by the wuid (workunit ID). This is a feature most useful in the Rapid Data Delivery Engine.

Example:

//Named Output DATASET in the same workunit:
a := OUTPUT(Person(per_st='FL') ,NAMED('FloridaFolk'));
x := DATASET(WORKUNIT('FloridaFolk'),
             RECORDOF(Person));
b := OUTPUT(x(per_first_name[1..4]='RICH'));
          
SEQUENTIAL(a,b);

//Named Output DATASET in separate workunits:
//First Workunit (wuid=W20051202-155102) contains this code:
MyRec := {STRING1 Value1,STRING1 Value2, INTEGER1 Value3};
SomeFile := DATASET([{'C','G',1},{'C','C',2},{'A','X',3},
                     {'B','G',4},{'A','B',5}],MyRec);
OUTPUT(SomeFile,NAMED('Fred'));

// Second workunit contains this code, producing the same result:
ds := DATASET(WORKUNIT('W20051202-155102','Fred'), MyRec);
OUTPUT(ds);

In-line DATASETs

[ attr := ] DATASET( recordset , recstruct );

This form allows you to in-line a set of data and have it treated as a file. This is useful in situations where file operations are needed on dynamically generated data (such as the runtime values of a set of pre-defined expressions). It is also useful to test any boundary conditions for definitions by creating a small well-defined set of records with constant values that specifically exercise those boundaries. This form may be used in an expression context.

Nested RECORD structures may be represented by nesting records within records. Nested child datasets may also be initialized inside TRANSFORM functions using inline datasets (see the Child DATASETs discussion).

Example:

//Inline DATASET using definition values
myrec := {REAL diff, INTEGER1 reason};
rms5008 := 10.0;
rms5009 := 11.0;
rms5010 := 12.0;
btable := DATASET([{rms5008,72},{rms5009,7},{rms5010,65}], myrec);
          
//Inline DATASET with nested RECORD structures
nameRecord := {STRING20 lname,STRING10 fname,STRING1 initial := ''};
personRecord := RECORD
  nameRecord primary;
  nameRecord mother;
  nameRecord father;
END;
personDataset := DATASET([{{'James','Walters','C'},
                           {'Jessie','Blenger'},
                           {'Horatio','Walters'}},
                          {{'Anne','Winston'},
                           {'Sant','Aclause'},
                           {'Elfin','And'}}], personRecord);
        
        
// Inline DATASET containing a Child DATASET
childPersonRecord := {STRING fname,UNSIGNED1 age};
personRecord := RECORD
  STRING20 fname;
  STRING20 lname;
  UNSIGNED2 numChildren;
  DATASET(childPersonRecord) children;
END;

personDataset := DATASET([{'Kevin','Hall',2,[{'Abby',2},{'Nat',2}]},
                          {'Jon','Simms',3,[{'Jen',18},{'Ali',16},{'Andy',13}]}],
                         personRecord);
         
         
// Inline DATASET derived from a dynamic SET function
SetIDs(STRING fname) := SET(People(firstname=fname),id);
ds := DATASET(SetIDs('RICHARD'),{People.id});
         
// Inline DATASET derived from a list of transforms
IDtype := UNSIGNED8;
FMtype := STRING15;
Ltype := STRING25;

resultRec := RECORD
  IDtype id;
  FMtype firstname;
  Ltype lastname;
  FMtype middlename;
END;
          
T1(IDtype idval,FMtype fname,Ltype lname ) :=
  TRANSFORM(resultRec,
            SELF.id := idval,
            SELF.firstname := fname,
            SELF.lastname := lname,
            SELF := []);
          
T2(IDtype idval,FMtype fname,FMtype mname, Ltype lname ) :=
  TRANSFORM(resultRec,
            SELF.id := idval,
            SELF.firstname := fname,
            SELF.middlename := mname,
            SELF.lastname := lname);
ds := DATASET([T1(123,'Fred','Jones'),
               T2(456,'John','Q','Public'),
               T1(789,'Susie','Smith')]);

// You can construct a DATASET from a SET.
SET OF STRING s := ['Jim','Bob','Richard','Tom'];
DATASET(s,{STRING txt});

Single-row DATASET Expressions

DATASET( row )

This form is only used in an expression context. It allows you to in-line a single record dataset.

Example:

//the following examples demonstrate 4 ways to do the same thing:
personRecord := RECORD
  STRING20 surname;
  STRING10 forename;
  INTEGER2 age := 25;
END;
         
namesRecord := RECORD
  UNSIGNED     id;
  personRecord;
END;
          
namesTable := DATASET('RTTEST::TestRow',namesRecord,THOR);
//simple dataset file declaration form
         
addressRecord := RECORD
  UNSIGNED         id;
  DATASET(personRecord) people;   //child dataset form
  STRING40       street;
  STRING40       town;
  STRING2        st;
END;
         
personRecord tc0(namesRecord L) := TRANSFORM
  SELF := L;
END;
 
//** 1st way - using in-line dataset form in an expression  context
addressRecord t0(namesRecord L) := TRANSFORM
  SELF.people := PROJECT(DATASET([{L.id,L.surname,L.forename,L.age}],
                                 namesRecord),
                         tc0(LEFT));
  SELF.id := L.id;
  SELF := [];
END;
 
p0 := PROJECT(namesTable, t0(LEFT));
OUTPUT(p0);
 
//** 2nd way - using single-row dataset form
addressRecord t1(namesRecord L) := TRANSFORM
  SELF.people := PROJECT(DATASET(L), tc0(LEFT));
  SELF.id := L.id;
  SELF := [];
END;

p1 := PROJECT(namesTable, t1(LEFT));
OUTPUT(p1);

//** 3rd way - using single-row dataset form and ROW function
addressRecord t2(namesRecord L) := TRANSFORM
  SELF.people := DATASET(ROW(L,personRecord));
  SELF.id := L.id;
  SELF := [];
END;

p2 := PROJECT(namesTable, t2(LEFT));
OUTPUT(p2);

//** 4th way - using in-line dataset form in an expression context
addressRecord t4(namesRecord l) := TRANSFORM
  SELF.people := PROJECT(DATASET([L], namesRecord), tc0(LEFT));
  SELF.id := L.id;
  SELF := [];
END;
p3 := PROJECT(namesTable, t4(LEFT));
OUTPUT(p3);

Child DATASETs

DATASET( childstruct [, COUNT( count ) | LENGTH( size ) ] [, CHOOSEN( maxrecs ) ] )

This form is used as a value type inside a RECORD structure to define child dataset records in a non-normalized flat file. The form without COUNT or LENGTH is the simplest to use, and simply means that both the record count and the data are stored within the field itself. The COUNT form limits the number of elements to the count expression. The LENGTH form specifies the size in another field instead of the count. This can only be used for dataset input.

The following alternative syntaxes are also supported:

childstruct fieldname [ SELF.count ]

DATASET newname := fieldname

DATASET fieldname (deprecated form -- will go away post-SR9)

Any operation may be performed on child datasets in hthor and the Rapid Data Delivery Engine (Roxie), but only the following operations are supported in the Data Refinery (Thor):

1) PROJECT, CHOOSEN, TABLE (non-grouped), and filters on child tables.

2) Aggregate operations are allowed on any of the above

3) Several aggregates can be calculated at once by using

          summary := TABLE(x.children,{ f1 := COUNT(GROUP),
                                        f2 := SUM(GROUP,x),
                                        f3 := MAX(GROUP,y)});
          summary.f1;

4) DATASET[n] is supported to index the child elements

5) SORT(dataset, a, b)[1] is also supported to retrieve the best match.

6) Concatenation of datasets is supported.

7) Temporary TABLEs can be used in conjunction.

8) Initialization of child datasets in temp TABLE definitions allows [ ] to be used to initialize 0 elements.

Note that,

TABLE(ds, { ds.id, ds.children(age != 10) });

is not supported, because a dataset in a record definition means "expand all the fields from the dataset in the output." However, adding an identifier creates a form that is supported:

TABLE(ds, { ds.id, newChildren := ds.children(age != 10); });
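The COUNT form described above can be sketched as follows; all of the names here are hypothetical, chosen only to show a count field in the parent driving the number of child records:

```ecl
// Hypothetical sketch of the COUNT form: numKids in the enclosing
// RECORD holds the number of child records stored in the kids field
KidRec := {STRING10 kname};

ExtRec := RECORD
  UNSIGNED4 numKids;
  DATASET(KidRec, COUNT(SELF.numKids)) kids;
END;
```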

Example:

ParentRec := {INTEGER1 NameID, STRING20 Name};
ParentTable := DATASET([{1,'Kevin'},{2,'Liz'},
                        {3,'Mr Nobody'},{4,'Anywhere'}], ParentRec);
ChildRec := {INTEGER1 NameID, STRING20 Addr};
ChildTable := DATASET([ {1,'10 Malt Lane'},{2,'10 Malt Lane'},
                        {2,'3 The cottages'},{4,'Here'},{4,'There'},
                        {4,'Near'},{4,'Far'}],ChildRec);
DenormedRec := RECORD
  INTEGER1 NameID;
  STRING20 Name;
  UNSIGNED1 NumRows;
  DATASET(ChildRec) Children;
//  ChildRec Children;   //alternative syntax
END;
 
DenormedRec ParentMove(ParentRec L) := TRANSFORM
  SELF.NumRows := 0;
  SELF.Children := [];
  SELF := L;
END;

ParentOnly := PROJECT(ParentTable, ParentMove(LEFT));
DenormedRec ChildMove(DenormedRec L,ChildRec R,INTEGER C):=TRANSFORM
  SELF.NumRows := C;
  SELF.Children := L.Children + R;
  SELF := L;
END;
DeNormedRecs := DENORMALIZE(ParentOnly, ChildTable,
                            LEFT.NameID = RIGHT.NameID,
                            ChildMove(LEFT,RIGHT,COUNTER));
OUTPUT(DeNormedRecs,,'RTTEMP::TestChildDatasets');

// Using inline DATASET in a TRANSFORM to initialize child records
AkaRec := {STRING20 forename,STRING20 surname};
outputRec := RECORD
  UNSIGNED id;
  DATASET(AkaRec) children;
END;
 
inputRec := RECORD
  UNSIGNED id;
  STRING20 forename;
  STRING20 surname;
END;
 
inPeople := DATASET([
         {1,'Kevin','Halliday'},{1,'Kevin','Hall'},{1,'Gawain',''},
         {2,'Liz','Halliday'},{2,'Elizabeth','Halliday'},
         {2,'Elizabeth','MaidenName'},{3,'Lorraine','Chapman'},
         {4,'Richard','Chapman'},{4,'John','Doe'}], inputRec);
outputRec makeFatRecord(inputRec l) := TRANSFORM
  SELF.id := l.id;
  SELF.children := DATASET([{ l.forename, l.surname }], AkaRec);
END;

fatIn := PROJECT(inPeople, makeFatRecord(LEFT));
outputRec makeChildren(outputRec l, outputRec r) := TRANSFORM
  SELF.id := l.id;
  SELF.children := l.children + ROW({r.children[1].forename,
                                     r.children[1].surname},
                                     AkaRec);
END;

r := ROLLUP(fatIn, id, makeChildren(LEFT, RIGHT));

DATASET as a Parameter Type

[GROUPED] [LINKCOUNTED] [STREAMED] DATASET( struct )

This form is only used as a Value Type for passing parameters, specifying function return types, or defining a SET OF datasets. If GROUPED is present, the passed parameter must have been grouped using the GROUP function. The LINKCOUNTED and STREAMED keywords are primarily for use in BEGINC++ functions or external C++ library functions.

Example:

MyRec := {STRING1 Letter};
SomeFile := DATASET([{'A'},{'B'},{'C'},{'D'},{'E'}],MyRec);
         
//Passing a DATASET parameter
FilteredDS(DATASET(MyRec) ds) := ds(Letter NOT IN ['A','C','E']);
                  //passed dataset referenced as "ds" in expression
          
OUTPUT(FilteredDS(SomeFile));

//*****************************************************************
// The following example demonstrates using DATASET as both a
// parameter type and a return type
rec_Person := RECORD
  STRING20 FirstName;
  STRING20 LastName;
END;

rec_Person_exp := RECORD(rec_Person)
  STRING20 NameOption;
END;

rec_Person_exp xfm_DisplayNames(rec_Person l, INTEGER w) :=
    TRANSFORM
  SELF.NameOption :=
           CHOOSE(w,
                  TRIM(l.FirstName) + ' ' + l.LastName,
                  TRIM(l.LastName) + ', ' + l.FirstName,
                  l.FirstName[1] + l.LastName[1],
                  l.LastName);
  SELF := l;
END;

DATASET(rec_Person_exp) prototype(DATASET(rec_Person) ds) :=
     DATASET( [], rec_Person_exp );

DATASET(rec_Person_exp) DisplayFullName(DATASET(rec_Person) ds) :=
     PROJECT(ds, xfm_DisplayNames(LEFT,1));

DATASET(rec_Person_exp) DisplayRevName(DATASET(rec_Person) ds) :=
     PROJECT(ds, xfm_DisplayNames(LEFT,2));

DATASET(rec_Person_exp) DisplayFirstName(DATASET(rec_Person) ds) :=
     PROJECT(ds, xfm_DisplayNames(LEFT,3));

DATASET(rec_Person_exp) DisplayLastName(DATASET(rec_Person) ds) :=
     PROJECT(ds, xfm_DisplayNames(LEFT,4));

DATASET(rec_Person_exp) PlayWithName(DATASET(rec_Person) ds_in,
                                     prototype PassedFunc,
                                     STRING1 SortOrder='A',
                                     UNSIGNED1 FieldToSort=1,
                                     UNSIGNED1 PrePostFlag=1) := FUNCTION
  FieldPre := CHOOSE(FieldToSort,ds_in.FirstName,ds_in.LastName);
  SortedDSPre(DATASET(rec_Person) ds) :=
      IF(SortOrder='A',
         SORT(ds,FieldPre),
         SORT(ds,-FieldPre));
  InDS := IF(PrePostFlag=1,SortedDSPre(ds_in),ds_in);
  
  PDS := PassedFunc(InDS); //call the passed function parameter
         
  FieldPost := CHOOSE(FieldToSort,
                      PDS.FirstName, 
                      PDS.LastName,
                      PDS.NameOption);
  SortedDSPost(DATASET(rec_Person_exp) ds) :=
        IF(SortOrder = 'A',
          SORT(ds,FieldPost),
          SORT(ds,-FieldPost));
      
  OutDS := IF(PrePostFlag=1,PDS,SortedDSPost(PDS));
  RETURN OutDS;
END;

    //define inline datasets to use.
ds_names1 := DATASET( [{'John','Smith'},{'Henry','Jackson'},
                       {'Harry','Potter'}], rec_Person );
ds_names2 := DATASET( [ {'George','Foreman'},
                        {'Sugar Ray','Robinson'},
                        {'Joe','Louis'}], rec_Person );
          

//get name you want by passing the appropriate function parameter:
s_Name1 := PlayWithName(ds_names1, DisplayFullName, 'A',1,1);
s_Name2 := PlayWithName(ds_names2, DisplayRevName, 'D',3,2);
a_Name := PlayWithName(ds_names1, DisplayFirstName,'A',1,1);
b_Name := PlayWithName(ds_names2, DisplayLastName, 'D',1,1);
OUTPUT(s_Name1);
OUTPUT(s_Name2);
OUTPUT(a_Name);
OUTPUT(b_Name);

DATASET from DICTIONARY

DATASET( dict )

This form re-defines the dict as a DATASET.

Example:

rec := {STRING color,UNSIGNED1 code, STRING name};
ColorCodes := DATASET([{'Black' ,0 , 'Fred'},
                       {'Brown' ,1 , 'Sam'},
                       {'Red'   ,2 , 'Sue'},
                       {'White' ,3 , 'Jo'}], rec);

ColorCodesDCT := DICTIONARY(ColorCodes,{Color,Code});

ds := DATASET(ColorCodesDCT);
OUTPUT(ds);         

See Also: OUTPUT, RECORD Structure, TABLE, ROW, RECORDOF, TRANSFORM Structure, DICTIONARY

DATASET from TRANSFORM

DATASET( count, transform [, DISTRIBUTED | LOCAL ] )

This form uses the transform to create the records. The result type of the transform function determines the structure. The integer COUNTER can be used to number each iteration of the transform function.

The LOCAL option executes separately and independently on each node, so the specified count of records is created on every node.

Example:

IMPORT STD;
msg(UNSIGNED c) := 'Rec ' + (STRING)c + ' on node ' + (STRING)(STD.system.Thorlib.Node()+1);

// DISTRIBUTED example
DS := DATASET(CLUSTERSIZE * 2,
              TRANSFORM({STRING line}, 
                        SELF.line := msg(COUNTER)), 
              DISTRIBUTED);
DS;
/* creates a result like this:
   Rec 1 on node 1
   Rec 2 on node 1
   Rec 3 on node 2
   Rec 4 on node 2
   Rec 5 on node 3
   Rec 6 on node 3 
*/

// LOCAL example

DS2 := DATASET(2,
              TRANSFORM({STRING line},
                        SELF.line := msg(COUNTER)),
              LOCAL);
DS2;

/* the LOCAL alternative creates a result like this:
   Rec 1 on node 1
   Rec 2 on node 1
   Rec 1 on node 2
   Rec 2 on node 2
   Rec 1 on node 3
   Rec 2 on node 3
*/

See Also: RECORD Structure, TRANSFORM Structure