Skip to main content

AGGREGATE

AGGREGATE( recordset, resultrec,maintransform [ , mergetransform (RIGHT1,RIGHT2) ] [, groupingfields ] [, LOCAL | FEW | MANY] [, UNORDERED | ORDERED( bool ) ] [, STABLE | UNSTABLE ] [, PARALLEL [ ( numthreads ) ] ] [, ALGORITHM( name ) ] )

recordsetThe set of records to process.
resultrecThe RECORD structure of the result record set.
maintransformThe TRANSFORM function to call for each matching pair of records in the recordset. This is implicitly a local operation on each node.
mergetransformOptional. The TRANSFORM function to call to globally merge the result records from the maintransform. If omitted, the compiler will attempt to deduce the merge from the maintransform.
groupingfieldsOptional. A comma-delimited list of fields in the recordset to group by. Each field must be prefaced with the keyword LEFT. If omitted, then all records match.
LOCALOptional. Specifies the operation is performed on each supercomputer node independently, without requiring interaction with all other nodes to acquire data; the operation maintains the distribution of any previous DISTRIBUTE. Valid only if the mergetransform is omitted.
FEWOptional. Indicates that the expression will result in fewer than 10,000 records. This allows optimization to produce a significantly faster result.
MANYOptional. Indicates that the expression will result in more than 10,000 records.
UNORDEREDOptional. Specifies the output record order is not significant.
ORDEREDSpecifies the significance of the output record order.
boolWhen False, specifies the output record order is not significant. When True, specifies the default output record order.
STABLEOptional. Specifies the input record order is significant.
UNSTABLEOptional. Specifies the input record order is not significant.
PARALLELOptional. Try to evaluate this activity in parallel.
numthreadsOptional. Try to evaluate this activity using numthreads threads.
ALGORITHMOptional. Override the algorithm used for this activity.
nameThe algorithm to use for this activity. Must be from the list of supported algorithms for the SORT function's STABLE and UNSTABLE options.
Return:AGGREGATE returns a record set.

The AGGREGATE function is similar to ROLLUP except its output format does not need to match the input format. It also has similarity to TABLE in that the groupingfields (if present) determine the matching records such that you will get one result for each unique value of the groupingfields. The input recordset does not need to have been sorted by the groupingfields.

The operation is implicitly local, in that the maintransform is called to process records locally on each node, and the result records on each node are then merged to produce the global result.

TRANSFORM Function Requirements - AGGREGATE

The maintransform must take at least two parameters: a LEFT record of the same format as the input recordset and a RIGHT record of the same format as the resultrec. The format of the resulting record set must be the resultrec. LEFT refers to the next input record and RIGHT the result of the previous transform.

The mergetransform must take at least two parameters: RIGHT1 and RIGHT2 records of the same format as the resultrec. The format of the resulting record set must be the resultrec. RIGHT1 refers to the result of the maintransform on one node and RIGHT2 the result of the maintransform on another.

The mergetransform is generated for expressions of the form:

  SELF.x := <RIGHT.x <op> f(LEFT)
  SELF.x := f(LEFT)  <op> RIGHT.x

where the <op> is: MAX, MIN, SUM, +, &, |, ^, *

How AGGREGATE Works

In the maintransform, LEFT refers to the next input record and RIGHT the result of the previous transform.

There are 4 interesting cases:

(a) If no records match (and the operation isn't grouped), the output is a single record with all the fields set to blank values.

(b) If a single record matches, the first record that matches calls the maintransform as you would expect.

(c) If multiple records match on a single node, subsequent records that match call the maintransform but any field expression in the maintransform that does not reference the RIGHT record is not processed. Therefore the value for that field is set by the first matching record matched instead of the last.

(d) If multiple records match on multiple nodes, then step (c) performs on each node, and then the summary records are merged. This requires a mergetransform that takes two records of type RIGHT. Whenever possible the code generator tries to deduce the mergetransform from the maintransform. If it can't, then the user will need to specify one.

inRecord := RECORD 
  UNSIGNED box; 
  STRING text{MAXLENGTH(10)}; 
END; 
inTable := DATASET([{1,'Fred'},{1,'Freddy'},
                    {2,'Freddi'},{3,'Fredrik'},{1,'FredJon'}], inRecord);
      
//Example 1: Produce a list of box contents by concatenating a string:
      
outRecord1 := RECORD 
  UNSIGNED box; 
  STRING contents{MAXLENGTH(200)}; 
END; 
outRecord1 t1(inRecord l, outRecord1 r) := TRANSFORM 
  SELF.box := l.box; 
  SELF.contents := r.contents + IF(r.contents <> '', ',', '') + l.text; 
END; 
      
outRecord1 t2(outRecord1 r1, outRecord1 r2) := TRANSFORM 
  SELF.box := r1.box; 
  SELF.contents := r1.contents + ',' + r2.contents; 
END; 
OUTPUT(AGGREGATE(inTable, outRecord1, t1(LEFT, RIGHT), t2(RIGHT1, RIGHT2), LEFT.box));
      
//This example could eliminate the merge transform if the SELF.contents expression in
//the t1 TRANSFORM were simpler, like this:
//     SELF.contents := r.contents + ',' + l.text;
//which would make the AGGREGATE function like this:
//   OUTPUT(AGGREGATE(inTable, outRecord1, t1(LEFT, RIGHT), LEFT.box));

      
//Example 2: A PIGMIX style grouping operation:
outRecord2 := RECORD 
  UNSIGNED box; 
  DATASET(inRecord) items; 
END; 
outRecord2 t3(inRecord l, outRecord2 r) := TRANSFORM 
  SELF.box := l.box; 
  SELF.items:= r.items + l; 
END; 
OUTPUT(AGGREGATE(inTable, outRecord2, t3(LEFT, RIGHT), LEFT.box));

See Also: TRANSFORM Structure, RECORD Structure, ROLLUP, TABLE