attr := DATASET( file, struct, filetype [,LOOKUP]);

attr := DATASET( dataset, file, filetype [,LOOKUP]);

attr := DATASET( WORKUNIT( [ wuid , ] namedoutput ), struct );

[ attr := ] DATASET( recordset [, recstruct ] );

DATASET( row )

DATASET( childstruct [, COUNT( count ) | LENGTH( size ) ] [, CHOOSEN( maxrecs ) ] )

[GROUPED] [LINKCOUNTED] [STREAMED] DATASET( struct )

DATASET( dict )

DATASET( count, transform [, DISTRIBUTED | LOCAL ] )

attr: The name of the DATASET for later use in other definitions.
file: A string constant containing the logical file name. See the Scope & Logical Filenames section for more on logical filenames.
struct: The RECORD structure defining the layout of the fields. This may use RECORDOF.
filetype: One of the following keywords, optionally followed by relevant options for that specific type of file: THOR/FLAT, CSV, XML, JSON, PIPE. Each of these is discussed in its own section, below.
dataset: A previously-defined DATASET or recordset from which the record layout is derived. This form is primarily used by the BUILD action and is equivalent to:
      ds := DATASET('filename',RECORDOF(anotherdataset), ... )
LOOKUP: Optional. Specifies that the file layout should be looked up at compile time. See File Layout Resolution at Compile Time in the Programmer's Guide for more details.
WORKUNIT: Specifies the DATASET is the result of an OUTPUT with the NAMED option within the same or another workunit.
wuid: Optional. A string expression that specifies the workunit identifier of the job that produced the NAMED OUTPUT.
namedoutput: A string expression that specifies the name given in the NAMED option.

recordset: A set of in-line data records. This can simply name a previously-defined set definition or explicitly use square brackets to indicate an in-line set definition. Within the square brackets, records are separated by commas. The records are specified by either:

1) Using curly braces ({}) to surround the field values for each record. The field values within each record are comma-delimited.

2) A comma-delimited list of in-line transform functions that produce the data rows. All the transform functions in the list must produce records in the same result format.

recstruct: Optional. The RECORD structure of the recordset. May be omitted only if the recordset parameter is just one record or a list of in-line transform functions.
row: A single data record. This may be a single-record passed parameter, or the ROW or PROJECT function that defines a 1-row dataset.
childstruct: The RECORD structure of the child records being defined. This may use the RECORDOF function.
COUNT: Optional. Specifies the number of child records attached to the parent (for use when interfacing to external file formats).
count: An expression defining the number of child records. This may be a constant or a field in the enclosing RECORD structure (addressed as SELF.fieldname).
LENGTH: Optional. Specifies the size of the child records attached to the parent (for use when interfacing to external file formats).
size: An expression defining the size of child records. This may be a constant or a field in the enclosing RECORD structure (addressed as SELF.fieldname).
CHOOSEN: Optional. Limits the number of child records attached to the parent. This implicitly uses the CHOOSEN function wherever the child dataset is read.
maxrecs: An expression defining the maximum number of child records for a single parent.
GROUPED: Specifies the DATASET being passed has been grouped using the GROUP function.
LINKCOUNTED: Specifies the DATASET being passed or returned uses the link counted format (each row is stored as a separate memory allocation) instead of the default (embedded) format where the rows of a dataset are all stored in a single block of memory. This is primarily for use in BEGINC++ functions or external C++ library functions.
STREAMED: Specifies the DATASET being returned is returned as a pointer to an IRowStream interface (see the eclhelper.hpp include file for the definition). Valid only as a return type. This is primarily for use in BEGINC++ functions or external C++ library functions.
struct: The RECORD structure of the dataset field or parameter. This may use the RECORDOF function.
dict: The name of a DICTIONARY definition.
count: An integer expression specifying the number of records to create.
transform: The TRANSFORM function that will create the records. This may take an integer COUNTER parameter.
DISTRIBUTED: Optional. Specifies distributing the created records across all nodes of the cluster. If omitted, all records are created on node 1.
LOCAL: Optional. Specifies records are created on every node.

The DATASET declaration defines a file of records, on disk or in memory. The layout of the records is specified by a RECORD structure (the struct or recstruct parameters described above). The distribution of records across execution nodes is undefined in general. It depends on how the DATASET came to be (sprayed in from a landing zone or written to disk by an OUTPUT action), the size of the cluster on which it resides, and the size of the cluster on which it is used. To specify distribution requirements for a particular operation, see the DISTRIBUTE function.
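
As a minimal sketch of requesting a particular distribution rather than relying on whatever the file happens to have, the following redistributes a small dataset by a hash of one field. The layout and the names Rec, ds, and byId are hypothetical and used only for illustration.

```ecl
// Hypothetical layout and data, for illustration only
Rec := RECORD
  UNSIGNED4 id;
  STRING20  name;
END;

ds := DATASET([{1, 'Ann'}, {2, 'Bob'}, {3, 'Cy'}], Rec);

// Records with the same id hash land on the same node
byId := DISTRIBUTE(ds, HASH32(ds.id));

OUTPUT(byId);
```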

The first two forms are alternatives to each other and either may be used with any of the filetypes described below (THOR/FLAT, CSV, XML, JSON, PIPE).

The third form defines the result of an OUTPUT with the NAMED option within the same workunit or the workunit specified by the wuid (see Named Output DATASETs below).

The fourth form defines an in-line dataset (see In-line DATASETs below).

The fifth form is only used in an expression context to allow you to in-line a single record dataset (see Single-row DATASET Expressions below).

The sixth form is only used as a value type in a RECORD structure to define a child dataset (see Child DATASETs below).

The seventh form is only used as a value type to pass DATASET parameters (see DATASET as a Parameter Type below).

The eighth form is used to define a DICTIONARY as a DATASET (see DATASET from DICTIONARY below).

The ninth form is used to create a DATASET using a TRANSFORM function (see DATASET from TRANSFORM below).
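
The in-memory forms can be sketched together as follows. This is an illustrative example only: the layout NameRec, the TRANSFORM MakeRec, and all definition names are hypothetical and do not come from this reference.

```ecl
// Hypothetical layout used only for this illustration
NameRec := RECORD
  UNSIGNED4 id;
  STRING20  name;
END;

// Fourth form: in-line dataset, one {} group per record
People := DATASET([{1, 'Ann'}, {2, 'Bob'}], NameRec);

// Fifth form: a single row treated as a one-record dataset
OneRow := ROW({3, 'Cy'}, NameRec);
Single := DATASET(OneRow);

// Eighth form: turn a DICTIONARY back into a DATASET
ById     := DICTIONARY(People, {id => name});
FromDict := DATASET(ById);

// Ninth form: create 5 records with a TRANSFORM that receives COUNTER
NameRec MakeRec(UNSIGNED4 c) := TRANSFORM
  SELF.id   := c;
  SELF.name := 'Person ' + (STRING)c;
END;
Generated := DATASET(5, MakeRec(COUNTER));

OUTPUT(People);
OUTPUT(Single);
OUTPUT(FromDict);
OUTPUT(Generated);
```

Because DISTRIBUTED is omitted in the last definition, all five generated records are created on node 1, as described for that option above.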


attr := DATASET( file, struct, THOR [,__COMPRESSED__][,OPT ] [,UNSORTED][,PRELOAD([nbr])] [,ENCRYPT(key) ]);

attr := DATASET( file, struct, FLAT [,__COMPRESSED__] [,OPT] [,UNSORTED] [,PRELOAD([nbr])] [,ENCRYPT(key) ]);

THOR: Specifies the file is in the Data Refinery (may optionally be specified as FLAT, which is synonymous with THOR in this context).
__COMPRESSED__: Optional. Specifies that the THOR file is compressed because it is a result of the PERSIST Workflow Service or was OUTPUT with the COMPRESSED option.
__GROUPED__: Specifies the DATASET has been grouped using the GROUP function.
OPT: Optional. Specifies that using the dataset when the THOR file doesn't exist results in an empty recordset instead of an error condition.
UNSORTED: Optional. Specifies the THOR file is not sorted, as a hint to the optimizer.
PRELOAD: Optional. Specifies the file is left in memory after loading (valid only for Rapid Data Delivery Engine use).
nbr: Optional. An integer constant specifying how many indexes to create "on the fly" for speedier access to the dataset. If > 1000, specifies the amount of memory set aside for these indexes.
ENCRYPT: Optional. Specifies the file was created by OUTPUT with the ENCRYPT option.
key: A string constant containing the encryption key used to create the file.

This form defines a THOR file that exists in the Data Refinery. This could contain either fixed-length or variable-length records, depending on the layout specified in the RECORD struct.
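
A minimal sketch of this form with the OPT option, assuming a hypothetical layout (PersonRec) and a hypothetical logical file name ('~example::people'), neither of which comes from this reference:

```ecl
// Hypothetical layout and logical file name, for illustration only
PersonRec := RECORD
  STRING25 lname;
  STRING15 fname;
END;

// OPT: if the file does not exist, People is an empty recordset
// rather than an error condition
People := DATASET('~example::people', PersonRec, THOR, OPT);

OUTPUT(COUNT(People));
```

Since FLAT is synonymous with THOR in this context, the same definition could be written with FLAT in place of THOR.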

The struct may contain an UNSIGNED8 field with either {VIRTUAL(fileposition)} or {VIRTUAL(localfileposition)} appended to the field name. This indicates the field contains the record's position within the file (or part), and is used for those instances where a usable pointer to the record is needed, such as the BUILD action.


PtblRec := RECORD
  STRING2 State := Person.per_st;
  STRING20 City := Person.per_full_city;
  STRING25 Lname := Person.per_last_name;
  STRING15 Fname := Person.per_first_name;
END;
Tbl := TABLE(Person,PtblRec);
PtblOut := OUTPUT(Tbl,,'RTTEMP::TestFile');
          //write a THOR file
Ptbl := DATASET('~Thor400::RTTEMP::TestFile',
                {PtblRec,UNSIGNED8 __fpos {VIRTUAL(fileposition)}},
                THOR);
             // __fpos contains the "pointer" to each record
             // Thor400 is the scope name and RTTEMP is the
             // directory in which TestFile is located
             //using ENCRYPT
PtblE := DATASET('~LR::TestFileEncrypted',