DATASET

attr := DATASET( file, struct, filetype [,LOOKUP]);

attr := DATASET( dataset, file, filetype [,LOOKUP]);

attr := DATASET( WORKUNIT( [ wuid , ] namedoutput ), struct );

[ attr := ] DATASET( recordset [, recstruct ] );

DATASET( row )

DATASET( childstruct [, COUNT( count ) | LENGTH( size ) ] [, CHOOSEN( maxrecs ) ] )

[GROUPED] [LINKCOUNTED] [STREAMED] DATASET( struct )

DATASET( dict )

DATASET( count, transform [, DISTRIBUTED | LOCAL ] )

attr	The name of the DATASET for later use in other definitions.
file	A string constant containing the logical file name. See the Scope & Logical Filenames section for more on logical filenames.
struct	The RECORD structure defining the layout of the fields. This may use RECORDOF.
filetype	One of the following keywords, optionally followed by relevant options for that specific type of file: THOR/FLAT, CSV, XML, JSON, PIPE. Each of these is discussed in its own section, below.
dataset	A previously-defined DATASET or recordset from which the record layout is derived. This form is primarily used by the BUILD action and is equivalent to: ds := DATASET('filename',RECORDOF(anotherdataset), ... )
LOOKUP	Optional. Specifies that the file layout should be looked up at compile time. See File Layout Resolution at Compile Time in the Programmer's Guide for more details.
WORKUNIT	Specifies the DATASET is the result of an OUTPUT with the NAMED option within the same or another workunit.
wuid	Optional. A string expression that specifies the workunit identifier of the job that produced the NAMED OUTPUT.
namedoutput	A string expression that specifies the name given in the NAMED option.
recordset	A set of in-line data records. This can simply name a previously-defined set definition or explicitly use square brackets to indicate an in-line set definition. Within the square brackets records are separated by commas. The records are specified by either: 1) Using curly braces ({}) to surround the field values for each record. The field values within each record are comma-delimited. 2) A comma-delimited list of in-line transform functions that produce the data rows. All the transform functions in the list must produce records in the same result format.
recstruct	Optional. The RECORD structure of the recordset. Omittable only if the recordset parameter is just one record or a list of in-line transform functions.
row	A single data record. This may be a single-record passed parameter, or the ROW or PROJECT function that defines a 1-row dataset.
childstruct	The RECORD structure of the child records being defined. This may use the RECORDOF function.
COUNT	Optional. Specifies the number of child records attached to the parent (for use when interfacing to external file formats).
count	An expression defining the number of child records. This may be a constant or a field in the enclosing RECORD structure (addressed as SELF.fieldname).
LENGTH	Optional. Specifies the size of the child records attached to the parent (for use when interfacing to external file formats).
size	An expression defining the size of child records. This may be a constant or a field in the enclosing RECORD structure (addressed as SELF.fieldname).
CHOOSEN	Optional. Limits the number of child records attached to the parent. This implicitly uses the CHOOSEN function wherever the child dataset is read.
maxrecs	An expression defining the maximum number of child records for a single parent.
GROUPED	Specifies the DATASET being passed has been grouped using the GROUP function.
LINKCOUNTED	Specifies the DATASET being passed or returned uses the link counted format (each row is stored as a separate memory allocation) instead of the default (embedded) format where the rows of a dataset are all stored in a single block of memory. This is primarily for use in BEGINC++ functions or external C++ library functions.
STREAMED	Specifies the DATASET being returned is returned as a pointer to an IRowStream interface (see the eclhelper.hpp include file for the definition). Valid only as a return type. This is primarily for use in BEGINC++ functions or external C++ library functions.
struct	The RECORD structure of the dataset field or parameter. This may use the RECORDOF function.
dict	The name of a DICTIONARY definition.
count	An integer expression specifying the number of records to create.
transform	The TRANSFORM function that will create the records. This may take an integer COUNTER parameter.
DISTRIBUTED	Optional. Specifies distributing the created records across all nodes of the cluster. If omitted, all records are created on node 1.
LOCAL	Optional. Specifies records are created on every node.

The DATASET declaration defines a file of records, on disk or in memory. The layout of the records is specified by a RECORD structure (the struct or recstruct parameters described above). The distribution of records across execution nodes is undefined in general, as it depends on how the DATASET came to be (sprayed in from a landing zone or written to disk by an OUTPUT action), the size of the cluster on which it resides, and the size of the cluster on which it is used (to specify distribution requirements for a particular operation, see the DISTRIBUTE function).

The first two forms are alternatives to each other and either may be used with any of the filetypes described below (THOR/FLAT, CSV, XML, JSON, PIPE).
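A minimal sketch of the first two forms side by side. The record layout PersonRec and the logical filenames '~demo::people' and '~demo::people_copy' are hypothetical; the second form takes its layout from an existing dataset, which is the same as spelling it out with RECORDOF.

```ecl
// Hypothetical record layout and logical filenames, for illustration only
PersonRec := RECORD
  STRING20  Lname;
  STRING15  Fname;
  UNSIGNED1 Age;
END;

// Form 1: logical file name, RECORD structure, file type
People := DATASET('~demo::people', PersonRec, THOR);

// Form 2: the layout is derived from a previously-defined dataset
PeopleCopy := DATASET(People, '~demo::people_copy', THOR);

// Equivalent to form 2, with the layout given explicitly via RECORDOF
PeopleCopy2 := DATASET('~demo::people_copy', RECORDOF(People), THOR);
```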

The third form defines the result of an OUTPUT with the NAMED option within the same workunit or the workunit specified by the wuid (see Named Output DATASETs below).
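As a rough illustration of the third form, the sketch below writes an in-line dataset to the workunit under a NAMED result and reads it back with DATASET(WORKUNIT(...)). The layout CountRec, the result name 'StateCounts', and the workunit identifier in the final comment are all hypothetical; SEQUENTIAL is used only to make the write-then-read order explicit.

```ecl
// Hypothetical layout and result name, for illustration only
CountRec := RECORD
  STRING2   State;
  UNSIGNED4 Cnt;
END;

Counts := DATASET([{'FL', 10}, {'GA', 7}], CountRec);

// Read the NAMED result back from the same workunit (no wuid given)
Reread := DATASET(WORKUNIT('StateCounts'), CountRec);

SEQUENTIAL(
  OUTPUT(Counts, NAMED('StateCounts')),   // write the named result
  OUTPUT(Reread, NAMED('Reread'))         // then read it back
);

// Reading from another workunit passes its (hypothetical) wuid first:
// DATASET(WORKUNIT('W20240101-000000', 'StateCounts'), CountRec);
```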

The fourth form defines an in-line dataset (see In-line DATASETs below).
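A short sketch of both ways of writing in-line records described under recordset: curly-brace field lists with an explicit recstruct, and a list of in-line TRANSFORMs where the recstruct can be omitted. The layout NameRec and its values are hypothetical.

```ecl
// Hypothetical layout and values, for illustration only
NameRec := RECORD
  STRING10  Name;
  UNSIGNED1 Age;
END;

// 1) Curly-brace field lists, with the recstruct supplied
Names := DATASET([{'Alice', 30}, {'Bob', 25}, {'Carol', 41}], NameRec);

// 2) In-line TRANSFORMs producing the rows; recstruct omitted
Names2 := DATASET([TRANSFORM(NameRec, SELF.Name := 'Dave'; SELF.Age := 52),
                   TRANSFORM(NameRec, SELF.Name := 'Erin'; SELF.Age := 19)]);

OUTPUT(Names);
OUTPUT(Names2);
```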

The fifth form is only used in an expression context to allow you to in-line a single-record dataset (see Single-row DATASET Expressions below).
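A minimal sketch of the fifth form, assuming a hypothetical layout PairRec: ROW builds one record, and DATASET wraps it so it can be used wherever a dataset is expected.

```ecl
// Hypothetical layout, for illustration only
PairRec := RECORD
  STRING5  Key;
  INTEGER4 Val;
END;

// ROW builds a single record; DATASET turns it into a 1-row dataset
OneRow := DATASET(ROW({'k1', 42}, PairRec));

OUTPUT(OneRow);
```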

The sixth form is only used as a value type in a RECORD structure to define a child dataset (see Child DATASETs below).
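A sketch of the sixth form in its plainest shape (without the COUNT, LENGTH, or CHOOSEN options). The layouts PhoneRec and ContactRec and the sample data are hypothetical; each parent row carries its own nested list of child records, written as a bracketed list inside the parent's field values.

```ecl
// Hypothetical parent/child layouts and data, for illustration only
PhoneRec := RECORD
  STRING10 Number;
END;

ContactRec := RECORD
  STRING20 Name;
  DATASET(PhoneRec) Phones;   // child dataset field
END;

Contacts := DATASET([{'Ann', [{'5550100'}, {'5550101'}]},
                     {'Ben', [{'5550199'}]}],
                    ContactRec);

// Child records are reached through their parent row
OUTPUT(Contacts[1].Phones);
OUTPUT(COUNT(Contacts[2].Phones));
```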

The seventh form is only used as a value type to pass DATASET parameters (see DATASET as a Parameter Type below).
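A minimal sketch of DATASET as a parameter and return type. The layout ValRec and the function OnlyPositive are hypothetical; the function accepts any DATASET(ValRec) and returns a filtered one.

```ecl
// Hypothetical layout and function, for illustration only
ValRec := RECORD
  INTEGER4 Val;
END;

// Both the parameter and the return type are DATASET(ValRec)
DATASET(ValRec) OnlyPositive(DATASET(ValRec) ds) := ds(Val > 0);

Vals := DATASET([{-2}, {5}, {0}, {9}], ValRec);
OUTPUT(OnlyPositive(Vals));
```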

The eighth form is used to define a DICTIONARY as a DATASET (see DATASET from DICTIONARY below).
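A short sketch of the eighth form, assuming a hypothetical layout CodeRec: a DICTIONARY is built from an in-line dataset and then turned back into an ordinary DATASET.

```ecl
// Hypothetical layout and data, for illustration only
CodeRec := RECORD
  STRING2  Code;
  STRING20 Name;
END;

Codes := DATASET([{'FL', 'Florida'}, {'GA', 'Georgia'}], CodeRec);

// DICTIONARY keyed on Code, with Name as the payload
CodeDict := DICTIONARY(Codes, {Code => Name});

// The DICTIONARY used as a regular DATASET
CodeDs := DATASET(CodeDict);
OUTPUT(CodeDs);
```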

The ninth form is used to create a DATASET using a TRANSFORM function (see DATASET from TRANSFORM below).
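A minimal sketch of the ninth form. The layout SeqRec and the transform MakeSeq are hypothetical; the TRANSFORM takes the COUNTER (starting at 1) to fill each generated record. Without an option all records are created on node 1, while DISTRIBUTED spreads them across the cluster.

```ecl
// Hypothetical layout and transform, for illustration only
SeqRec := RECORD
  UNSIGNED4 Id;
  UNSIGNED4 Square;
END;

SeqRec MakeSeq(UNSIGNED4 c) := TRANSFORM
  SELF.Id     := c;
  SELF.Square := c * c;
END;

// Ten records, all created on node 1
Seq := DATASET(10, MakeSeq(COUNTER));

// The same ten records, spread across all nodes
SeqDist := DATASET(10, MakeSeq(COUNTER), DISTRIBUTED);

OUTPUT(Seq);
```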

THOR/FLAT Files

attr := DATASET( file, struct, THOR [,__COMPRESSED__] [,__GROUPED__] [,OPT] [,UNSORTED] [,PRELOAD([nbr])] [,ENCRYPT(key) ]);

attr := DATASET( file, struct, FLAT [,__COMPRESSED__] [,__GROUPED__] [,OPT] [,UNSORTED] [,PRELOAD([nbr])] [,ENCRYPT(key) ]);

THOR	Specifies the file is in the Data Refinery (may optionally be specified as FLAT, which is synonymous with THOR in this context).
__COMPRESSED__	Optional. Specifies that the THOR file is compressed because it is a result of the PERSIST Workflow Service or was OUTPUT with the COMPRESSED option.
__GROUPED__	Optional. Specifies the DATASET has been grouped using the GROUP function.
OPT	Optional. Specifies that using the dataset when the THOR file does not exist results in an empty recordset instead of an error condition.
UNSORTED	Optional. Specifies the THOR file is not sorted, as a hint to the optimizer.
PRELOAD	Optional. Specifies the file is left in memory after loading (valid only for Rapid Data Delivery Engine use).
nbr	Optional. An integer constant specifying how many indexes to create "on the fly" for speedier access to the dataset. If > 1000, specifies the amount of memory set aside for these indexes.
ENCRYPT	Optional. Specifies the file was created by OUTPUT with the ENCRYPT option.
key	A string constant containing the encryption key used to create the file.

This form defines a THOR file that exists in the Data Refinery. This could contain either fixed-length or variable-length records, depending on the layout specified in the RECORD struct.

The struct may contain an UNSIGNED8 field with either {VIRTUAL(fileposition)} or {VIRTUAL(localfileposition)} appended to the field name. This indicates the field contains the record's position within the file (or part), and is used for those instances where a usable pointer to the record is needed, such as the BUILD action.

Example:

PtblRec := RECORD
  STRING2  State := Person.per_st;
  STRING20 City  := Person.per_full_city;
  STRING25 Lname := Person.per_last_name;
  STRING15 Fname := Person.per_first_name;
END;

Tbl := TABLE(Person,PtblRec);

PtblOut := OUTPUT(Tbl,,'RTTEMP::TestFile');
// write a THOR file; with no leading ~ the default scope is prepended

Ptbl := DATASET('~Thor400::RTTEMP::TestFile',
                {PtblRec,UNSIGNED8 __fpos {VIRTUAL(fileposition)}},
                THOR,OPT);
// __fpos contains the "pointer" to each record
// Thor400 is the scope name and RTTEMP is the
// directory in which TestFile is located

// using ENCRYPT: read back the same file that was written
OUTPUT(Tbl,,'~Thor400::RTTEMP::TestFileEncrypted',ENCRYPT('mykey'));
PtblE := DATASET('~Thor400::RTTEMP::TestFileEncrypted',
                 PtblRec,
                 THOR,OPT,ENCRYPT('mykey'));
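The OPT option can be seen in isolation with a sketch like the one below. The layout LogRec and the logical filename '~demo::no_such_file' are hypothetical, and the file is assumed not to exist; with OPT the definition yields an empty recordset instead of failing.

```ecl
// Hypothetical layout and a logical filename assumed not to exist
LogRec := RECORD
  STRING30 Msg;
END;

// With OPT, a missing file yields an empty recordset, not an error
MaybeLogs := DATASET('~demo::no_such_file', LogRec, THOR, OPT);

OUTPUT(COUNT(MaybeLogs));
```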