How AGGREGATE Works

In the maintransform, LEFT refers to the next input record and RIGHT refers to the result of the previous call to the transform.

There are four interesting cases:

(a) If no records match (and the operation isn't grouped), the output is a single record with all the fields set to blank values.

(b) If a single record matches, that record calls the maintransform as you would expect.

(c) If multiple records match on a single node, each subsequent matching record also calls the maintransform, but any field expression in the maintransform that does not reference the RIGHT record is not processed again. The value of such a field is therefore set by the first matching record instead of the last.

(d) If multiple records match on multiple nodes, case (c) is applied on each node and the resulting summary records are then merged. This requires a mergetransform that takes two records of the same type as RIGHT. Whenever possible, the code generator tries to deduce the mergetransform from the maintransform. If it can't, you must specify one.
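
Case (c) can be illustrated with a minimal sketch. The names cntInRec, cntInDs, cntOutRec, and tCount are hypothetical and used only for illustration. Going by the description above, the cnt field references the RIGHT record and so should be evaluated for every matching record, while box and firstText do not reference RIGHT and so should keep the values from the first matching record in each group.

```ecl
//Sketch only: all names here are hypothetical, not part of the examples that follow
cntInRec := RECORD
  UNSIGNED box;
  STRING10 text;
END;
cntInDs := DATASET([{1,'Fred'},{1,'Freddy'},{2,'Freddi'}], cntInRec);

cntOutRec := RECORD
  UNSIGNED box;
  UNSIGNED cnt;
  STRING10 firstText;
END;
cntOutRec tCount(cntInRec l, cntOutRec r) := TRANSFORM
  SELF.box := l.box;          //no reference to RIGHT: set by the first matching record
  SELF.cnt := r.cnt + 1;      //references RIGHT: evaluated for every matching record
  SELF.firstText := l.text;   //no reference to RIGHT: set by the first matching record
END;
OUTPUT(AGGREGATE(cntInDs, cntOutRec, tCount(LEFT, RIGHT), LEFT.box));
```

The call is grouped on LEFT.box and supplies no mergetransform, following the same form as the grouped call in Example 2.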

//Example 1: Produce a list of box contents by concatenating a string: 
IMPORT Std;
inRec := RECORD 
  UNSIGNED box; 
  STRING text{MAXLENGTH(100)}; 
END; 
inds := DATASET([{1,'Fred1'},{1,'Freddy1'},{1,'FredJon1'},
                 {3,'Fred3'},{3,'Freddy3'},{3,'FredJon3'},
                 {4,'Fred4'},{4,'Freddy4'},{4,'FredJon4'},
                 {2,'Freddi'},{2,'Fredrik'}], inRec,DISTRIBUTED);
outRec := RECORD 
  UNSIGNED box; 
  STRING contents{MAXLENGTH(200)}; 
END; 
outRec t1(inds l, outRec r) := TRANSFORM 
  SELF.box := l.box; 
  //append each text value, tagged with the 1-based number of the node that processed it
  SELF.contents := r.contents + IF(r.contents <> '', ',', '') + l.text + '-' + (Std.System.ThorLib.Node()+1); 
END; 
      
//mergetransform: combines two partial results produced on different nodes
outRec t2(outRec r1, outRec r2) := TRANSFORM 
  SELF.box := r1.box; 
  SELF.contents := r1.contents + '::' + r2.contents; 
END; 
OUTPUT(AGGREGATE(inds, outRec, t1(LEFT, RIGHT), t2(RIGHT1, RIGHT2), LEFT.box));
//Because there is a "group by" field, this never calls the second TRANSFORM:
//"group by" puts all the records in a group on a single node,
//and AGGREGATE produces one result record for each unique "group by" value

OUTPUT(AGGREGATE(inds, outRec, t1(LEFT, RIGHT), t2(RIGHT1, RIGHT2)));
//Without the "group by" field, this calls the second TRANSFORM on a multi-node cluster.
//The second TRANSFORM merges the results from each node into a single result record.
  
      
//Example 2: A PIGMIX style grouping operation:
inRecord := RECORD 
  UNSIGNED box; 
  STRING text{MAXLENGTH(10)}; 
END; 
inTable := DATASET([{1,'Fred'},{1,'Freddy'},
                    {2,'Freddi'},{3,'Fredrik'},{1,'FredJon'}], inRecord);

outRecord2 := RECORD 
  UNSIGNED box; 
  DATASET(inRecord) items; 
END; 
outRecord2 t3(inRecord l, outRecord2 r) := TRANSFORM 
  SELF.box := l.box; 
  //append each input record to the child dataset collected so far for its box
  SELF.items := r.items + l; 
END; 
OUTPUT(AGGREGATE(inTable, outRecord2, t3(LEFT, RIGHT), LEFT.box));