Skip to main content

CSV Files

attr := DATASET( file, struct, CSV [ ( [ HEADING( n ) ] [, SEPARATOR( f_delimiters ) ]

[, TERMINATOR( r_delimiters ) ] [, QUOTE( characters ) ] [, ESCAPE( esc ) ] [, MAXLENGTH( size ) ]

[ ASCII | EBCDIC | UNICODE ] [, NOTRIM ]) ] [,ENCRYPT(key) ] [, __COMPRESSED__]);

CSVSpecifies the file is a "comma separated values" ASCII file.
HEADING(n)Optional. The number of header records in the file. If omitted, the default is zero (0).
SEPARATOROptional. The field delimiter. If omitted, the default is a comma (',') or the delimiter specified in the spray operation that put the file on disk.
f_delimitersA single string constant, or set of string constants, that define the character(s) used as the field delimiter. If Unicode constants are used, then the UTF8 representation of the character(s) will be used.
TERMINATOR

Optional. The record delimiter. If omitted, the default is a line feed ('\n') or the delimiter specified in the spray operation that put the file on disk.

r_delimiters

A single string constant, or set of string constants, that define the character(s) used as the record delimiter.

QUOTE

Optional. The string quote character used. If omitted, the default is a single quote ('\'') or the delimiter specified in the spray operation that put the file on disk.

characters

A single string constant, or set of string constants, that define the character(s) used as the string value delimiter.

ESCAPE

Optional. The string escape character used to indicate the next character (usually a control character) is part of the data and not to be interpreted as a field or row delimiter. If omitted, the default is the escape character specified in the spray operation that put the file on disk (if any).

esc

A single string constant, or set of string constants, that define the character(s) used to escape control characters.

MAXLENGTH(size)

Optional. Maximum record length in the file in bytes. If omitted, the default is 4096. There is a hard limit of 10MB but that can be overridden using #OPTION(maxCSVRowSizeMb,nn) where nn is the maximum size in MB. The maximum record size should be set as conservatively as possible.

ASCII

Specifies all input is in ASCII format, including any EBCDIC or UNICODE fields.

EBCDIC

Specifies all input is in EBCDIC format except the SEPARATOR and TERMINATOR (which are expressed as ASCII values).

UNICODE

Specifies all input is in Unicode UTF8 format.

NOTRIM

Specifies preserving all whitespace in the input data (the default is to trim leading blanks).

ENCRYPT

Optional. Specifies the file was created by OUTPUT with the ENCRYPT option.

key

A string constant containing the encryption key used to create the file.

__COMPRESSED__Optional. Specifies that the file is compressed because it was OUTPUT with the COMPRESSED option.

This form is used to read an ASCII CSV file. This can also be used to read any variable-length record file that has a defined record delimiter. If none of the ASCII, EBCDIC, or UNICODE options are specified, the default input is in ASCII format with any UNICODE fields in UTF8 format.

Example:

CSVRecord := RECORD
  UNSIGNED4 person_id;
  STRING20 per_surname;
  STRING20 per_forename;
END;

file1 := DATASET('MyFile.CSV',CSVrecord,CSV);            //all defaults
file2 := DATASET('MyFile.CSV',CSVrecord,CSV(HEADING(1)); //1 header
file3 := DATASET('MyFile.CSV',
                 CSVrecord,
                 CSV(HEADING(1),
                     SEPARATOR([',','\t']),
                     TERMINATOR(['\n','\r\n','\n\r'])));
          //1 header record, either comma or tab field delimiters,
          // either LF or CR/LF or LF/CR record delimiters