Actions (1): SEQUENTIAL, ORDERED, PARALLEL

Often an ECL query consists of a series of results or actions that should all be executed together. Historically, there have been two ways of grouping actions together.

P := PARALLEL(a1, a2, a3, a4);

PARALLEL indicates that all of the actions can be executed in parallel. If any intermediate values are common to more than one of the actions, then the values should only be evaluated once, and the result reused by each action.

S := SEQUENTIAL(a1, a2, a3, a4);

SEQUENTIAL is used when actions need to be executed in a particular order, e.g., when ingesting data, processing it, and then marking the processed data as live. However, SEQUENTIAL also has an additional effect that is often overlooked: intermediate values are not shared between any of the different sequential actions. Why?

Say action ‘a1’ counts the records in a superfile, action ‘a2’ updates the contents of the superfile, and action ‘a3’ counts the records in the updated superfile. If intermediate values were shared between the actions, the count in ‘a3’ would reuse the out-of-date count evaluated for ‘a1’ rather than counting the new contents of the superfile. For a similar reason, actions in a SEQUENTIAL do not share any values with expressions evaluated outside the SEQUENTIAL.
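The scenario above might look something like the following minimal sketch. The record layout, the logical file names and the choice of `Std.File.AddSuperFile` as the update are hypothetical, chosen only for illustration. The point is that `a1` and `a3` contain the identical expression `COUNT(superDs)`, and SEQUENTIAL is what forces it to be evaluated twice.

```ecl
IMPORT Std;

// Hypothetical record layout and logical file names, for illustration only.
rec := RECORD
  STRING20  category;
  UNSIGNED4 amount;
END;

superName := '~example::super';
newSub    := '~example::new_data';

superDs := DATASET(superName, rec, THOR);

// a1: count the records currently in the superfile.
a1 := OUTPUT(COUNT(superDs), NAMED('countBefore'));

// a2: add a new subfile to the superfile inside a transaction.
a2 := SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(superName, newSub),
  Std.File.FinishSuperFileTransaction()
);

// a3: the same expression as in a1, but it must be re-evaluated after a2
// so that it sees the updated contents of the superfile.
a3 := OUTPUT(COUNT(superDs), NAMED('countAfter'));

SEQUENTIAL(a1, a2, a3);
```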

Unfortunately, this means that if the actions share some complicated processing, that work is going to be repeated. This is probably one of the most common causes of code being executed twice, and something to watch out for. So how can you avoid it?

If two sequential actions share a value that should be evaluated only once, marking it with : INDEPENDENT will ensure the evaluation is shared. However, it is a poor solution for a couple of reasons. Firstly, the ECL programmer doesn’t always know ahead of time which parts will be shared; they may not have written the code for the actions. Secondly, adding INDEPENDENT can cause other code to be duplicated, because values will no longer be shared between an INDEPENDENT value and the context that expression is used in.
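As a minimal sketch of the INDEPENDENT workaround, assume a hypothetical input file and an expensive aggregation that two sequential actions both use. The file name, layout and threshold are illustrative assumptions, not taken from the original code.

```ecl
// Hypothetical input and layout, for illustration only.
rec := RECORD
  STRING20  category;
  UNSIGNED4 amount;
END;
inputDs := DATASET('~example::input', rec, THOR);

// An expensive intermediate value used by both actions.
// Marking it INDEPENDENT means it is evaluated once and the result is
// reused by both actions, even though they are inside a SEQUENTIAL.
summary := TABLE(inputDs,
                 {category, UNSIGNED8 total := SUM(GROUP, amount)},
                 category) : INDEPENDENT;

s1 := OUTPUT(COUNT(summary), NAMED('groups'));
s2 := OUTPUT(summary(total > 100), NAMED('largeGroups'));

SEQUENTIAL(s1, s2);
```

Note the trade-off described above: anything else in the query that could have shared work with `summary` (for example, another expression built on `inputDs`) no longer shares it, because `summary` is now evaluated in its own context.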

One example I saw recently had several pieces of similar code, each of which had some preprocessing, main processing and postprocessing. They had each been coded as:

myAction := SEQUENTIAL(preprocessing, mainprocessing, postprocessing);

All of these actions were then executed in parallel:

myActions := PARALLEL(myAction1, myAction2, myAction3);

The problem is that the different main processing actions all used some common values, but because they were inside separate SEQUENTIALs the code was being duplicated.

What would be a way to avoid it?

One solution would be for each of the processes to be defined as a module:

myAction := MODULE
  EXPORT preprocessing := ...;
  EXPORT mainprocessing := ...;
  EXPORT postprocessing := ...;
END;

And then combine the different actions into another module:

myActions := MODULE
  EXPORT preprocessing := PARALLEL(myAction1.preprocessing, myAction2.preprocessing);
  EXPORT mainprocessing := PARALLEL(myAction1.mainprocessing, myAction2.mainprocessing);
  EXPORT postprocessing := PARALLEL(myAction1.postprocessing, myAction2.postprocessing);
END;

Only in the query itself are they then combined with SEQUENTIAL:

SEQUENTIAL(myActions.preprocessing, myActions.mainprocessing, myActions.postprocessing);

If you end up doing this frequently, you could use a virtual module to simplify the whole process. (Left as an exercise for the reader…)

So, in conclusion, be careful about the use of SEQUENTIAL: it can cause expressions to be evaluated multiple times. If you’re likely to combine SEQUENTIAL items, it is worth spending some time thinking about the best way of structuring your ECL.

P.S. Version 4.2 introduces a new keyword, ORDERED. It has the ordering requirements of SEQUENTIAL, but without the additional restriction on sharing intermediate values. I’m not 100% sure it is a good idea! It is probably most useful for ordering actions that have nothing in common, e.g. generating files and then sending emails, but use it with care. If there is any chance of a shared value whose meaning may change between the actions, you need to use SEQUENTIAL.
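A minimal sketch of that "generate a file, then send an email" case, assuming hypothetical data, a hypothetical logical file name and a hypothetical recipient address. The two actions share no intermediate values, so ORDERED provides the required ordering without SEQUENTIAL’s extra restriction.

```ecl
IMPORT Std;

// Hypothetical data, file name and email details, for illustration only.
rec := RECORD
  STRING20  category;
  UNSIGNED4 amount;
END;
results := DATASET([{'a', 1}, {'b', 2}], rec);

// Write the results to a logical file.
writeFile := OUTPUT(results, , '~example::results', OVERWRITE);

// Notify someone once the file has been written.
notify := Std.System.Email.SendEmail('team@example.com',
                                     'Results ready',
                                     'The results file has been written.');

// Nothing is shared between the two actions, so ORDERED is sufficient
// to guarantee that the file is written before the email is sent.
ORDERED(writeFile, notify);
```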
