PERSIST

attribute := expression : PERSIST( filename [, cluster ] [, CLUSTER(target)] [, EXPIRE(days)] [, REFRESH(flag)] [, SINGLE | MULTIPLE[(count)]] ) ;

attribute   The name of the Attribute.
expression  The definition of the attribute. This typically defines a recordset (but it may be any expression).
filename    A string constant specifying the storage name of the expression result. See Scope and Logical Filenames.
cluster     Optional. A string constant specifying the name of the Thor cluster on which to re-build the attribute if/when necessary. This makes it possible to use persisted attributes on smaller clusters but have them rebuilt on larger, making for more efficient resource utilization. If omitted, the attribute is re-built on the currently executing cluster.
CLUSTER     Optional. Specifies writing the filename to the specified list of target clusters. If omitted, the filename is written to the cluster on which the PERSIST executes (as specified by the cluster parameter). The number of physical file parts written to disk is always determined by the number of nodes in the cluster on which the PERSIST executes, regardless of the number of nodes on the target(s).
target      A comma-delimited list of string constants containing the names of the clusters to write the filename to. The names must be listed as they appear on the ECL Watch Activity page or returned by the Std.System.Thorlib.Group() function, optionally with square brackets containing a comma-delimited list of node-numbers (1-based) and/or ranges (specified with a dash, as in n-m) to indicate the specific set of nodes to write to.
EXPIRE      Optional. Specifies the filename is a temporary file that may be automatically deleted after the specified number of days.
days        Optional. The number of days after which the file may be automatically deleted. If omitted, it defaults to use the PersistExpiryDefault setting in Sasha.
REFRESH     Optional. Option to control when the PERSIST rebuilds. If omitted, the PERSIST rebuilds if 1) the underlying file does not exist, or 2) the data has changed, or 3) the code has changed.
flag        A boolean value indicating whether to rebuild the PERSIST. When set to FALSE, the PERSIST rebuilds ONLY if the underlying file does not exist. If your PERSIST layout has changed and you specify REFRESH(FALSE) the mismatch could cause your job to fail.
SINGLE      Optional. Specifies to keep a single PERSIST. The name of the persist file is the same as the name of the persist. The default is MULTIPLE(-1) which retains all.
MULTIPLE    Optional. Specifies to keep different versions of the PERSIST. The name of the persist file generated is a combination of the name supplied suffixed with a 32-bit value derived from the ECL.
count       Optional. The number of versions of a PERSIST to keep. If omitted, the system default is used. If set to -1, then an unlimited number are kept.

The PERSIST service stores the result of the expression globally so it remains permanently available for use (including the result of any DISTRIBUTE or GROUP operation in the expression). This is particularly useful for attributes based on large, expensive data manipulation sequences. The attribute is re-calculated only when the ECL code or the underlying data used to create it has changed; otherwise, the attribute data is simply read from the named file stored on disk when referenced. This service implicitly causes the attribute to be evaluated at global scope instead of the enclosing scope.

PERSIST may be combined with the WHEN clause so that even though the attribute may be used more than once, its execution is based upon the WHEN clause (or the first use of the attribute) and not upon the number of times the attribute is used in the computation. This gives a kind of "compute in anticipation" capability.

You can use #OPTION to override the default settings, as shown in the commented-out lines at the top of the example.

Example:

// #OPTION ('multiplePersistInstances', true|false); // if true retains MULTIPLE, if false SINGLE
// #OPTION ('defaultNumPersistInstances', <n>);      // the number to retain if MULTIPLE allowed. 
                                                     // Defaults to -1 (retain all)

  CountPeople := COUNT(Person) : PERSIST('PeopleCount');
  //Makes CountPeople available for use in all subsequent workunits
  
  sPeople := SORT(Person,Person.per_first_name) :
          PERSIST('SortPerson'),WHEN(Daily);
  //Makes sPeople available for use in all subsequent workunits
  //(Daily is assumed to be an event defined elsewhere)
  
  s1 := SORT(Person,Person.per_first_name) :
          PERSIST('SortPerson1','OtherThor');
  //rebuild the attribute on the OtherThor cluster when necessary
  
  s2 := SORT(Person,Person.per_first_name) :
          PERSIST('SortPerson2',
                  'OtherThor',
                  CLUSTER('AnotherThor'));
  //rebuild the attribute on the OtherThor cluster
  // and write the file to the AnotherThor cluster

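The example above does not exercise the EXPIRE, REFRESH, SINGLE, or MULTIPLE options. The following is a minimal sketch of how they might be written, based only on the syntax diagram and parameter descriptions on this page. The inline Person dataset, its record layout, and the persist names are hypothetical and exist only to make the sketch self-contained.

```ecl
// Hypothetical inline dataset (not part of the original example)
PersonRec := RECORD
  STRING20 per_first_name;
  STRING20 per_last_name;
END;
Person := DATASET([{'Ann','Lee'},{'Bob','Kay'}], PersonRec);

// The persist file may be automatically deleted after 7 days
sExpire := SORT(Person,Person.per_first_name) :
        PERSIST('SortPersonExpire', EXPIRE(7));

// Rebuild ONLY if the underlying persist file does not exist
sNoRefresh := SORT(Person,Person.per_first_name) :
        PERSIST('SortPersonNoRefresh', REFRESH(FALSE));

// Keep a single persist file whose name matches the persist name
sSingle := SORT(Person,Person.per_first_name) :
        PERSIST('SortPersonSingle', SINGLE);

// Keep up to 3 versions of the persist
sMulti := SORT(Person,Person.per_first_name) :
        PERSIST('SortPersonMulti', MULTIPLE(3));

OUTPUT(sExpire);
OUTPUT(sNoRefresh);
OUTPUT(sSingle);
OUTPUT(sMulti);
```

As noted in the flag description, REFRESH(FALSE) skips the code and data checks, so it is best reserved for cases where the persist layout is known to be stable.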
See Also: STORED, WHEN, GLOBAL, CHECKPOINT, #OPTION