Quantile 4 – The engine interface

The next stage in adding a new activity to the system is to define the interface between the generated code and the engines. The important file for this stage is rtl/include/eclhelper.hpp. The interfaces it contains define the information the engines require to customize each of the different activities. The changes that define the interface for quantile are found in commit https://github.com/ghalliday/HPCC-Platform/commit/06534d8e9962637fe9a518…. Adding a quantile activity involves the following changes:

  • ThorActivityKind – TAKquantile

    Each activity that the engines support has an entry in this enumeration. This value is stored in the graph as the _kind attribute of the node.

  • ActivityInterfaceEnum – TAIquantilearg_1

    This enumeration in combination with the selectInterface() member of IHThorArg provides a mechanism for helper interfaces to be extended while preserving backwards compatibility with older workunits. The mechanism is rarely used (but valuable when it is), and adding a new activity only requires a single new entry.

  • IHThorArg

    This is the base interface that all activity interfaces are derived from. This interface does not need to change, but it is worth noting because each activity defines a specialized version of it. The names of the specialized interfaces follow a pattern; in this case the new interface is IHThorQuantileArg.

  • IHThorQuantileArg

    The following is an outline of the new member functions, with comments on their use:

    • getFlags()

      Many of the interfaces have a getFlags() function. It provides a concise way of returning several Boolean options in a single call, provided those options do not change during the execution of the activity. The flags are normally defined with explicit values in an enumeration before the interface. The labels often follow the pattern T[activity]F[name], e.g. TQFxxx ~= Thor-Quantile-Flag-XXX.

    • getNumDivisions()

      Returns how many parts to split the dataset into.

    • getSkew()

      Corresponds to the SKEW() attribute.

    • queryCompare()

      Returns an implementation of the interface used to compare two rows.

    • createDefault(rowBuilder)

      A function used to create a default row – used if there are no input rows.

    • transform(rowBuilder, _left, _counter)

      The function to create the output record from the input record and the partition number (passed as counter).

    • getScore(_left)

      Returns the weighting that should be given to this row.

    • getRange(isAll, tlen, tgt)

      Corresponds to the RANGE() attribute.
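To make the outline above more concrete, here is a rough sketch of how such an interface might be laid out. It is not the declaration from eclhelper.hpp. The member function names come from the list above, but every type with a Sketch suffix, all parameter and return types, and the three flag names and values are assumptions made for illustration. The scoreForRow() function is also hypothetical; it shows one way an engine could read the flags and only call an optional member when the corresponding flag is set.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-ins for the platform's row types.
struct InputRowSketch { std::uint64_t weight; };
struct RowBuilderSketch { std::uint64_t weight = 0; std::uint64_t part = 0; };

// Stand-in for the interface used to compare two rows.
struct ICompareSketch
{
    virtual ~ICompareSketch() = default;
    virtual int docompare(const InputRowSketch & left, const InputRowSketch & right) const = 0;
};

// Flags are given explicit values in an enumeration before the interface.
// The labels follow the TQFxxx pattern; these names and values are assumptions.
enum : unsigned
{
    TQFhasscore = 0x0001,
    TQFhasrange = 0x0002,
    TQFhasskew  = 0x0004,
};

// Sketch of the specialized helper interface (all signatures are assumed).
struct IHThorQuantileArgSketch
{
    virtual ~IHThorQuantileArgSketch() = default;
    virtual unsigned getFlags() = 0;                 // several Boolean options in one call
    virtual std::uint64_t getNumDivisions() = 0;     // how many parts to split the dataset into
    virtual double getSkew() = 0;                    // corresponds to SKEW()
    virtual ICompareSketch * queryCompare() = 0;     // how to compare two rows
    virtual std::size_t createDefault(RowBuilderSketch & rowBuilder) = 0;
    virtual std::size_t transform(RowBuilderSketch & rowBuilder,
                                  const InputRowSketch & _left,
                                  std::uint64_t _counter) = 0;
    virtual std::uint64_t getScore(const InputRowSketch & _left) = 0;
    virtual void getRange(bool & isAll, std::size_t & tlen, void * & tgt) = 0;
};

// Hypothetical engine-side use: consult the flags, and fall back to a
// weighting of 1 when the query did not supply a score.
inline std::uint64_t scoreForRow(IHThorQuantileArgSketch & helper, const InputRowSketch & row)
{
    const unsigned flags = helper.getFlags();
    return (flags & TQFhasscore) ? helper.getScore(row) : 1;
}
```

Because the flags do not change while the activity executes, an engine could equally read them once when the activity starts and cache the result rather than calling getFlags() for every row.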

Note that the different engines all use the same specialized interface – it contains a superset of the functions required by the different targets. Occasionally some of the engines do not need to use some of the functions (e.g., to serialize information between nodes) so the code generator may output empty implementations.

For each interface defined in eclhelper.hpp there is a base implementation class defined in eclhelper_base.hpp. The classes generated for each activity in a query by the code generator are derived from one of these base classes. Therefore we need to create a corresponding new class CThorQuantileArg. It often provides default implementations for some of the helper functions to help reduce the size of the generated code (e.g., getScore returning 1).

Often the process of designing the helper interface is dynamic. As the implementation is created, new options or possibilities for optimizations appear. These require extensions and changes to the helper interface in order to be implemented by the engines. Once the initial interface has been agreed, work on the code generator and the engines can proceed in parallel. (It is equally possible to design this interface before any work on the parser begins, allowing more work to overlap.)

There are some more details on the contents of thorhelper.hpp in the documentation ecl/eclcc/WORKUNIT.rst within the HPCC repository.
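The relationship between a helper interface, its base implementation class, and a class produced by the code generator can be sketched as follows. This is a cut-down illustration rather than the code in eclhelper_base.hpp: the interface shows only three of the members, the names IQuantileArgSketch, CQuantileArgSketch and cAc2Sketch are hypothetical, and the signatures are assumptions. The only default taken from the text above is getScore returning 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an input row.
struct InputRowSketch { std::uint64_t weight; };

// Cut-down stand-in for the helper interface (three members only).
struct IQuantileArgSketch
{
    virtual ~IQuantileArgSketch() = default;
    virtual unsigned getFlags() = 0;
    virtual std::uint64_t getNumDivisions() = 0;
    virtual std::uint64_t getScore(const InputRowSketch & _left) = 0;
};

// Stand-in for the base implementation class. It supplies defaults so that
// a generated class only needs to override what differs for its query.
class CQuantileArgSketch : public IQuantileArgSketch
{
public:
    unsigned getFlags() override { return 0; }                           // no options set
    std::uint64_t getScore(const InputRowSketch &) override { return 1; } // every row weighs 1
    // getNumDivisions() is left pure here: each query must provide it.
};

// What a generated class for one activity might look like when the query
// asks for four divisions and uses no other options.
class cAc2Sketch final : public CQuantileArgSketch
{
public:
    std::uint64_t getNumDivisions() override { return 4; }
};

// Usage: the engine only ever sees the interface, never the generated class.
inline void exampleUse()
{
    cAc2Sketch helper;
    IQuantileArgSketch & arg = helper;
    assert(arg.getNumDivisions() == 4);
    assert(arg.getScore(InputRowSketch{10}) == 1);  // default from the base class
    assert(arg.getFlags() == 0);
}
```

With this arrangement the generated class contains a single one-line override, which is the reduction in generated code size that the defaults are intended to give.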