
BUILD

[attrname := ] BUILD(baserecset, [ indexrec ] , indexfile [, options ] );

[attrname := ] BUILD(baserecset, keys, payload, indexfile [, options ] );

[attrname := ] BUILD( indexdef [, options ] );

[attrname := ] BUILD( indexdef, dataset [, options ] );

BUILD( library );

attrname      Optional. The action name, which turns the action into an attribute definition, so it is not executed until the attrname is used as an action.
baserecset    The set of data records for which the index file will be created. This may be a record set derived from the base data with the key fields and file position.
indexrec      Optional. The RECORD structure of the fields in the indexfile that contains key and file position information for referencing into the baserecset. Field names and types must match the baserecset fields (REAL and DECIMAL value type fields are not supported). This may also contain additional fields not present in the baserecset (computed fields). If omitted, all fields in the baserecset are used. The last field must be the name of an UNSIGNED8 field defined with the {virtual(fileposition)} modifier in the DATASET declaration of the baserecset.
keys          The RECORD structure of fields in the indexfile that contains key and file position information for referencing into the baserecset. Field names and types must match the baserecset fields (REAL and DECIMAL value type fields are not supported). This may also contain additional fields not present in the baserecset. If omitted, all fields in the baserecset are used.
payload       The RECORD structure of the indexfile that contains additional fields not used as keys. If the name of the baserecset is in the structure, it specifies "all other fields not already named in the keys parameter." This may contain fields not present in the baserecset (computed fields). These fields do not take up space in the non-leaf nodes of the index and cannot be referenced in a KEYED() filter clause.
indexfile     A string constant containing the logical filename of the index to produce. See the Scope & Logical Filenames article for more on logical filenames.
options       Optional. One or more of the options listed below.
indexdef      The name of the INDEX attribute to build.
library       The name of a MODULE attribute with the LIBRARY option.

The first four forms of the BUILD action create index files. Indexes are automatically compressed, minimizing overhead associated with using indexed record access. The keyword BUILDINDEX may be used in place of BUILD in these forms.

The fifth form creates an external query library--a workunit that implements the specified library. This is similar to creating a .DLL in Windows programming, or a .SO in Linux.

Index BUILD Options

The following options are available only on the four INDEX forms of BUILD:

[, CLUSTER( target ) ] [, SORTED ] [, DISTRIBUTE( key ) [, MERGE ] ] [, DATASET( basedataset ) ] [, OVERWRITE ] [, UPDATE ] [, EXPIRE( [ days ] ) ] [, FEW ] [, FILEPOSITION(false) ] [, LOCAL ] [, NOROOT ] [, DISTRIBUTED ] [, COMPRESSED( LZW | ROW | FIRST ) ] [, WIDTH( nodes ) ] [, DEDUP ] [, SKEW( limit [, target ] ) [, THRESHOLD( size ) ] ] [, MAXLENGTH [ ( value ) ] ] [, UNORDERED | ORDERED( bool ) ] [, STABLE | UNSTABLE ] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name ) ] [, SET( option, value ) ]

CLUSTER       Specifies writing the indexfile to the specified list of target clusters. If omitted, the indexfile is written to the cluster on which the workunit executes. The number of physical file parts written to disk is always determined by the number of nodes in the cluster on which the workunit executes, regardless of the number of nodes on the target cluster(s), unless the WIDTH option is also specified.
target        A comma-delimited list of string constants containing the names of the clusters to write the indexfile to. The names must be listed as they appear on the ECL Watch Activity page or as returned by the Std.System.Thorlib.Group() function, optionally with square brackets containing a comma-delimited list of node numbers (1-based) and/or ranges (specified with a dash, as in n-m) to indicate the specific set of nodes to write to.
SORTED        Specifies that the baserecset is already sorted, implying that the automatic sort based on all the indexrec fields is not required before the index is created.
DISTRIBUTE    Specifies building the indexfile based on the distribution of the key.
key           The name of an existing INDEX attribute definition.
MERGE         Optional. Specifies merging the resulting index into the specified key.
DATASET       This is only needed when the baserecset is the result of an operation (such as a JOIN) that makes it ambiguous which physical dataset is being indexed (in other words, use this option only when you receive an error that it cannot be deduced). Naming the basedataset ensures that the proper record links are used in the index.
basedataset   The name of the DATASET attribute from which the baserecset is derived.
OVERWRITE     Specifies overwriting the indexfile if it already exists.
UPDATE        Specifies that the file should be rewritten only if the code or input data has changed.
EXPIRE        Optional. Specifies the file is a temporary file that may be automatically deleted after the specified number of days since the file was last read.
days          Optional. The number of days from last file read after which the file may be automatically deleted. If omitted, the default is seven (7).
FILEPOSITION  Optional. If flag is FALSE, prevents the implicit fileposition field from being created and will not treat a trailing integer field any differently from the rest of the payload.
flag          Optional. TRUE or FALSE, indicating whether or not to create the implicit fileposition field.
FEW           Specifies the indexfile is created as a single one-part file. Used only for small datasets (typically lookup-type files, such as 2-character state codes). This option is now deprecated in favor of using WIDTH(1).
indexdef      The name of an existing INDEX attribute definition that provides the baserecset, indexrec, and indexfile parameters to use.
LOCAL         Specifies the operation is performed on each supercomputer node independently, without requiring interaction with all other nodes to acquire data; the operation maintains the distribution of any previous DISTRIBUTE function.
NOROOT        Specifies that the index is not globally sorted, and there is no root index to indicate which part of the index will contain a particular entry. This may be useful in Roxie queries in conjunction with ALLNODES use.
DISTRIBUTED   Specifies both the LOCAL and NOROOT options (congruent with the DISTRIBUTED option on an INDEX declaration, which specifies the index was built with the LOCAL and NOROOT options).
COMPRESSED    Specifies the type of compression used. If omitted, the default is LZW, a variant of the Lempel-Ziv-Welch algorithm. Specifying ROW compresses index entries based on differences between contiguous rows (for use with fixed-length records only), and is recommended where faster decompression is more important than the amount of compression achieved. FIRST compresses common leading elements of the key (recommended only for timing comparison use).
WIDTH         Specifies writing the indexfile to a different number of physical file parts than the number of nodes in the cluster on which the workunit executes. If omitted, the default is the number of nodes in the cluster on which the workunit executes. This option is primarily used to create indexes on a large Thor that are destined to be deployed to a smaller Roxie (making the Roxie queries more efficient).
nodes         The number of physical file parts to write. If set to one (1), this operates exactly the same as the FEW option, above.
DEDUP         Specifies that duplicate entries are eliminated from the INDEX.
SKEW          Indicates that you know the data will not be spread evenly across nodes (will be skewed) and you choose to override the default by specifying your own limit value to allow the job to continue despite the skewing.
limit         A value between zero (0) and one (1.0 = 100%) indicating the maximum percentage of skew to allow before the job fails (the default skew is 1.0 / <number of slaves on cluster>).
target        Optional. A value between zero (0) and one (1.0 = 100%) indicating the desired maximum percentage of skew to allow (the default skew is 1.0 / <number of slaves on cluster>).
THRESHOLD     Indicates the minimum size for a single part before the SKEW limit is enforced.
size          An integer value indicating the minimum number of bytes for a single part. Default is 1GB.
MAXLENGTH     Optional. This option is used to create indexes that are backward compatible with platform versions prior to 3.0. Specifies the maximum length of a variable-length index record. Fixed-length records always use the minimum size required. If the default maximum length causes inefficiency problems, it can be explicitly overridden.
value         Optional. An integer value indicating the maximum length. If omitted, the maximum size is calculated from the record structure. Variable-length records that do not specify MAXLENGTH may be slightly inefficient.
UNORDERED     Optional. Specifies the output record order is not significant.
ORDERED       Specifies the significance of the output record order.
bool          When False, specifies the output record order is not significant. When True, specifies the default output record order.
STABLE        Optional. Specifies the input record order is significant.
UNSTABLE      Optional. Specifies the input record order is not significant.
PARALLEL      Optional. Try to evaluate this activity in parallel.
numthreads    Optional. Try to evaluate this activity using numthreads threads.
ALGORITHM     Optional. Override the algorithm used for this activity.
name          The algorithm to use for this activity. Must be from the list of supported algorithms for the SORT function's STABLE and UNSTABLE options.
SET           Optional. SET assigns a value to a named metadata option. This allows you to set user metadata whose use and purpose is up to the developer. Currently _nodeSize is the only system-defined metadata, though other names starting with an underscore (_) should be considered reserved for system use. You may want to use SET('_nodeSize', '32768') if your hardware and usage pattern work better with larger page sizes. The default (8192) may not be optimal for all scenarios on modern hardware. We recommend using a power of 2 no smaller than 8K.
option        A case-sensitive string constant containing the name of the option to set.
value         The value to set the option to. This may be any type of value, depending on what the option expects.
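
As an illustration of how several of these options can be combined, here is a minimal sketch that reuses the Vehicles dataset from the examples on this page. The second index filename ('vkey::lname::bignode') is hypothetical, and the option values are only illustrative.

```ecl
Vehicles := DATASET('vehicles',
     {STRING2 st,
      STRING20 city,
      STRING20 lname,
      UNSIGNED8 filepos{virtual(fileposition)}},
     FLAT);

// Replace any existing key, write it as a single file part, and
// allow automatic deletion 30 days after the file was last read
BUILD(Vehicles,{lname,filepos},'vkey::lname',
      OVERWRITE,WIDTH(1),EXPIRE(30));

// Same key written to a hypothetical second file with a larger node size,
// using the _nodeSize metadata option described above
BUILD(Vehicles,{lname,filepos},'vkey::lname::bignode',
      OVERWRITE,SET('_nodeSize','32768'));
```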

BUILD an Access Index

[attrname := ] BUILD(baserecset, [ indexrec ] , indexfile [, options ] );

Form 1 creates an index file to allow keyed access to the baserecset. The index is used primarily by the FETCH and JOIN (with the KEYED option) operations.

Example:

Vehicles := DATASET('vehicles',
     {STRING2 st,
      STRING20 city,
      STRING20 lname,
      UNSIGNED8 filepos{virtual(fileposition)}},
     FLAT);
BUILD(Vehicles,{lname,filepos},'vkey::lname');
 //build key into Vehicles dataset on last name
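
Once the key exists, a typical use is a FETCH that filters the index and then retrieves the matching base records by file position. The following is a sketch under the assumption that the key was built exactly as in the example above; the attribute names LnameKey and Smiths and the filter value are illustrative only.

```ecl
Vehicles := DATASET('vehicles',
     {STRING2 st,
      STRING20 city,
      STRING20 lname,
      UNSIGNED8 filepos{virtual(fileposition)}},
     FLAT);

// INDEX declaration matching the BUILD parameters used above
LnameKey := INDEX(Vehicles,{lname,filepos},'vkey::lname');

// Filter the key on last name, then fetch the full Vehicles records
// using the file position stored in each matching key entry
Smiths := FETCH(Vehicles,LnameKey(lname = 'SMITH'),RIGHT.filepos);
OUTPUT(Smiths);
```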

BUILD a Payload Index

[attrname := ] BUILD(baserecset, keys, payload, indexfile [, options ] );

Form 2 creates an index file containing extra payload fields in addition to the keys. This form is used primarily to create indexes used by "half-key" JOIN operations to eliminate the need to directly access the baserecset, thus increasing performance over the "full-keyed" version of the same operation (done with the KEYED option on the JOIN).

By default, the payload fields are sorted during the BUILDINDEX operation to minimize space on the leaf nodes of the key. This sorting can be controlled by using sortIndexPayload in a #OPTION statement.
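
A minimal sketch of turning that sorting off, assuming sortIndexPayload takes a Boolean value (the text above names the option but does not state its value type):

```ecl
// Assumption: FALSE disables the default sorting of payload fields
#OPTION('sortIndexPayload', FALSE);

Vehicles := DATASET('vehicles',
     {STRING2 st,
      STRING20 city,
      STRING20 lname,
      UNSIGNED8 filepos{virtual(fileposition)}},
     FLAT);
BUILD(Vehicles,{st,city},{lname},'vkey::st.city');
```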

Example:

Vehicles := DATASET('vehicles',
     {STRING2 st,
      STRING20 city,
      STRING20 lname,
      UNSIGNED8 filepos{virtual(fileposition)}},
      FLAT);
BUILD(Vehicles,{st,city},{lname},'vkey::st.city');
 //build key into Vehicles dataset on state and city
 //payload the last name
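
The "half-key" JOIN described above might look like the following sketch, which reads lname straight from the index payload without touching the Vehicles file. The Wanted dataset, its values, and the attribute names CityKey, OutRec, and Found are hypothetical.

```ecl
Vehicles := DATASET('vehicles',
     {STRING2 st,
      STRING20 city,
      STRING20 lname,
      UNSIGNED8 filepos{virtual(fileposition)}},
     FLAT);

// INDEX declaration matching the payload BUILD above
CityKey := INDEX(Vehicles,{st,city},{lname},'vkey::st.city');

// Hypothetical inline list of state/city pairs to look up
Wanted := DATASET([{'FL','MIAMI'},{'NY','ALBANY'}],
                  {STRING2 st, STRING20 city});

OutRec := {STRING2 st, STRING20 city, STRING20 lname};

// The right-hand side is the index itself, so the payload field
// is returned directly from the index entries
Found := JOIN(Wanted,CityKey,
              LEFT.st = RIGHT.st AND LEFT.city = RIGHT.city,
              TRANSFORM(OutRec,
                        SELF.lname := RIGHT.lname,
                        SELF := LEFT));
OUTPUT(Found);
```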

BUILD from an INDEX Definition

[attrname := ] BUILD( indexdef [, options ] );

Form 3 creates an index file by using a previously defined INDEX definition.

Example:

nameKey := INDEX(mainTable,{surname,forename,filepos},'name.idx');
BUILD(nameKey); //gets all info from the INDEX definition
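
The optional attrname turns the BUILD into a definition that runs only when it is used as an action. Here is a small sketch of that behavior; the mainTable declaration is an assumption added so the fragment is self-contained, and BuildNameKey is a hypothetical name.

```ecl
// Assumed declaration of mainTable (not given in the example above)
mainTable := DATASET('mainTable',
     {STRING20 surname,
      STRING20 forename,
      UNSIGNED8 filepos{virtual(fileposition)}},
     FLAT);
nameKey := INDEX(mainTable,{surname,forename,filepos},'name.idx');

BuildNameKey := BUILD(nameKey,OVERWRITE); //definition only, nothing runs yet
BuildNameKey;                             //using the name as an action runs the build
```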

[attrname := ] BUILD( indexdef, dataset [, options ] );

Form 4 creates an index file on a dataset using a previously defined INDEX definition.

This form is used to build an index when the dataset definition is complex. It allows the index to be logically separated from the dataset from which it is created. This is especially useful when the dataset definition is very complicated (megabytes of source), because otherwise, when the index is subsequently used in a query, all the code to create it is also parsed.

Example:

ds := DATASET(100, TRANSFORM({ unsigned id }, SELF.id := COUNTER));
i := INDEX({ unsigned id }, 'myIndex');
BUILD(i, ds);
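
Because the INDEX definition in this form carries no dataset code, a query can read the index without parsing the logic that produced the data. The sketch below builds and then reads such an index in one workunit; the record layout, the payload field sq, and the filename 'myIndex::squares' are hypothetical.

```ecl
// Generated data standing in for a complex dataset definition
ds := DATASET(100, TRANSFORM({UNSIGNED id, UNSIGNED sq},
                             SELF.id := COUNTER,
                             SELF.sq := COUNTER * COUNTER));

// The index definition names only its layout and logical file
i := INDEX({UNSIGNED id}, {UNSIGNED sq}, 'myIndex::squares');

SEQUENTIAL(
  BUILD(i, ds, OVERWRITE),  //write the index from ds
  OUTPUT(i(id <= 10))       //then read it back through the definition alone
);
```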

BUILD a Query Library

BUILD( library );

Form 5 creates an external query library for use in hthor or Roxie only.

A query library allows a set of related attributes to be packaged as a self contained unit so the code can be shared between different workunits. This reduces the time required to deploy a set of attributes, and also reduces the memory footprint for the set of queries within Roxie that use the library. Also, functionality in the library can be updated without having to re-deploy all the queries that use that functionality.

Query libraries are suitable for packaging together sets of functions that are closely related. They aren't suited for including attributes defined as MACROs--the meaning of a macro isn't known until its parameters are substituted.

The name form of #WORKUNIT names the workunit that BUILD creates as the external library. That name is the external library name used by the LIBRARY function (which provides access to the library from within the query that uses the library). Since the workunit itself is the external query library, BUILD(library) must be the only action in the workunit.

Example:

NamesRec := RECORD
  INTEGER1  NameID;
  STRING20  FName;
  STRING20  LName;
END;
FilterLibIface1(DATASET(namesRec) ds, STRING search) := INTERFACE
  EXPORT DATASET(namesRec) matches;
  EXPORT DATASET(namesRec) others;
END;

FilterDsLib1(DATASET(namesRec) ds, STRING search) :=
      MODULE,LIBRARY(FilterLibIface1)
  EXPORT matches := ds(Lname  = search);
  EXPORT others  := ds(Lname != search);
END;
#WORKUNIT('name','Ppass.FilterDsLib')
BUILD(FilterDsLib1);
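
A query in a separate workunit would then reach the library through the LIBRARY function, using the name set by #WORKUNIT. The following is a sketch of such a consumer; the NamesTable rows and the search value are made up for illustration.

```ecl
NamesRec := RECORD
  INTEGER1  NameID;
  STRING20  FName;
  STRING20  LName;
END;
FilterLibIface1(DATASET(NamesRec) ds, STRING search) := INTERFACE
  EXPORT DATASET(NamesRec) matches;
  EXPORT DATASET(NamesRec) others;
END;

// Hypothetical sample rows
NamesTable := DATASET([{1,'Doc','Holliday'},
                       {2,'Wyatt','Earp'}],NamesRec);

// 'Ppass.FilterDsLib' is the workunit name given by #WORKUNIT above
lib := LIBRARY('Ppass.FilterDsLib',FilterLibIface1(NamesTable,'Holliday'));
OUTPUT(lib.matches);
OUTPUT(lib.others);
```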

See Also: INDEX, JOIN, FETCH, MODULE, INTERFACE, LIBRARY, DISTRIBUTE, #WORKUNIT