Fri Aug 19, 2022 2:58 pm
Login Register Lost Password? Contact Us

Please Note: The HPCC Systems forums are moving to Stack Overflow. We invite you to post your questions on Stack Overflow utilizing the tag hpcc-ecl ( This legacy forum will be active and monitored during our transition to Stack Overflow but will become read only beginning September 1, 2022.

Universal Workunit Scheduler

Share ideas, code, best practices and techniques with other community members

Mon Mar 09, 2020 7:47 pm Change Time Zone

Attached is example code (use without restriction) for a workunit scheduler.


The Harness, Instrucitons for installing, configuring and running in the code.
(24.59 KiB) Downloaded 250 times

Simple example, runs 3 workunits in sequence.
(2.72 KiB) Downloaded 239 times

Demonstrates Parameter passing of parent WUID.
(1.41 KiB) Downloaded 241 times

This document describes a harness for scheduling one or more streams of work within the HPCC system.
HPCC has the concept of a 'Workunit' on THOR. A 'Workunit' Performs a task on THOR, it may be a build of a keyfile, a spray of data into THOR, an analysis of historic data or myriad other tasks. This is all well and good but there are numerous scenarios where one workunit cannot, in itself, complete the entire job. The Job requires multiple stages, some examples:
    * (E)xtraction; (T)ransform; (L)oad of data into THOR is almost invariably a multi stage process where the 'scrubbing' or 'Transforming' of data has to wait upon the extraction and presentation of its input.
    * One Workunit may be 'watching' the run of another workunit and e-mailing out progress reports.
    * Build of retro datasets almost always requires multiple builds of the same dataset, consequently multiple runs of the same workunit but for different dates.
    * Multiple tasks (or workunits) can be dependent upon the same event (say presentation of data). All dependent tasks being able to run in parallel.
    * The decision on what stream of work to do may be dependent upon results from a workunit, e.g. an unusually large amount of ingested data may require extra tasks inserted into the normal stream of work.
    * Stats on say the 'process time' of a workunit obviously cannot be generated by the target workunit as it's not completed.
    * A build of a complex product can well be a choreograph of builds of components from disparate teams. A final assembly of pre-fabricated components all of which have to be in the right state at the right time.
This scheduler harness is suitable for all the scenarios above.
The basic idea is to implement each workunit as an ECL FUNCTION with the scheduler harness as a wrapper around said FUNCTION. So the harness both calls the target application function and is directed by the same function as to what to do next on return from the function. Note All target code must be accessible to the target THOR(s).Plural as the workunits do not have to all run on the same THOR.

As pseudo code its:
Code: Select all

To use the scheduling harness the only mandatory conditions are that the target application take, as a parameter, the 'Job Number' of the next job to perform in the sequence and return a 'Job Number'.
There are three options the FUNCTION has regarding the return of a 'Job Number'
1. Just echo back the 'Job Number' supplied as the input parameter and let the harness work out what to do next.
2. Return 0 (zero). This unconditionally terminates the sequence of executed workunits regardless of any default behaviour the harness may have had planned. (But NOT any workunit sequences running in PARALLEL in the same finite state machine.)
3. Return a completely new/different 'Job Number'. This directs the harness to start a different WU than that specified by the defined machine. This allows sequence of workunits to be decided at run time. With this option, one has to have detailed knowledge of the finite state machine driving the run time operation of the harness.

ECL allows one to tie an action to a returned result thus:
Code: Select all
TYPEOF(<NextJob>)TargetApplicationFUNCTION(TYPEOF(<NextJob>) <NextJob>) := FUNCTION
    RETURN WHEN(<NextJob>,Action);

Defining the Sequence of workunits
The Record layout defining one workunit is:
Code: Select all
OneTask := RECORD
    STRING WUName;      // The 'name' to give to the workunit via the #WORKUNIT('name'...) option.
    STRING eclFUNCTION; // The full path to the target application FUNCTION

Then a sequence of workunits is just a DATASET(OneTask), where the order of the records in the DATASET defines the order of execution.
There is also the option to define PARALLEL running with a DATASET wrapper around the DATASET(OneTask)
In pseudo code:
Code: Select all
ParallelRuns := RECORD
    DATASET(DATASET(OneTask)) Queues;

Marcos are supplied to allocate <NextJob> identifiers and to generate initiating ECL that can be passed to
.../WsWorkunits/WUSubmit to programmatically start the whole sequence.
You can also do everything by hand, defining the machine yourself, allocating out your own <NextJob> identifiers. Note that within the 'OneTask' structure place holder 'NextJob' is signified by the text '<NextJob>' with a synonym of '<state>'.

Additional Features
Along with the mandatory place holder <NextJob>, one can also use the <ParentWUID> place holder in either the WUName or parameter to the target ECL function. (Its type being STRING). This allows communication from parent to child workunits using:
Code: Select all
TYPEOF(<NextJob>)TargetApplicationFUNCTION(TYPEOF(<NextJob>) NextJob,STRING ParentWUID) := FUNCTION
SomeData := DATASET(WORKUNIT(ParentWUID, '<OUTPUT Identifier>'), {STRING Somedata})[1].Somedata;

The target ECL Function can also take, as a parameter, the entire machine driving the sequence of work, using place holder <fsm>. Obviously this has no meaning in the WUName.
In pseudo code:
Code: Select all
TYPEOF(<NextJob>)TargetApplicationFUNCTION(TYPEOF(<NextJob>) NextJob,TYPEOF(<fsm>) fsm) := FUNCTION

    This design is truly independent of, and agnostic to, the workings of any application.

    Being based on a State Machine there is no limit to its flexibility. For example: applications can themselves use the harness to initiate their own scheduling sequence, in effect allowing nested scheduling of workunits.

    The harness wrapper to any individual workunit runs in the same said workunit. Consequently if the workunit crashes the wrapper is terminated as well. The machine just stops without defunct child workunits left running or EVENTs left to be de-scheduled. (Note only the individual stream stops, other streams running in PARALLEL in the same state Machine are unaffected.)

Finally my Thanks and Acknowledgement to:
Richard Taylor
Robert Foreman
Tony Kirk
Dan Camper
Charles Kaminski
Without whose help this project would not have got over the line.

Posts: 444
Joined: Sat Oct 01, 2011 7:26 pm

Return to Tips & Tricks

Who is online

Users browsing this forum: No registered users and 1 guest