Recently, we received a great question from one of our HPCC Systems users: when processing many different file formats in a similar manner, how can ECL be structured to avoid code duplication? The user explained that they receive files from multiple customers in a few different formats. After an ETL process, which is customized for every file format, the data is pushed through an analytics phase that differs somewhat for each customer but is mostly the same for everyone.
Their initial solution was to simply write different ECL for each customer, but that resulted in a lot of copying and pasting of code. Instead, they wanted to pass the incoming file to a single function and have it do all the “common” tasks, thus eliminating the need to copy and paste.
The answer to that question is module inheritance. Inheritance is an important concept in programming, and it provides a way for objects to define relationships with each other. As the name suggests, an object is able to inherit characteristics from another object. In more concrete terms, an object is able to pass on its behaviors to its children. For inheritance to be successful, the objects need to have common characteristics.
So how do you get started? In other languages, you can use reflection, code injection, function polymorphism, or code specialization. Code specialization is a long-established technique found in most object-oriented programming (OOP) languages: common or default code lives in parent classes and specialized code lives in child classes. That is the technique you use in ECL, and you express it with modules.
Modules in ECL are containers of attributes. Physically, they are either file system directories within your ECL code base or a special MODULE attribute within an ECL file. Modules can be nested and, if they are of the attribute kind, they can inherit from a parent module. A child module that inherits from a parent gains visibility to the parent's exported or shared attributes. The child module can also override most of those exported or shared attributes to provide new definitions. That is the basis for OOP-like parent/child class relationships within ECL.
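As a minimal sketch of that visibility rule, the following shows a child module using both an exported function and a shared value from its parent. The module and attribute names (Parent, Child, Factor, Scale, ScaleTwice) are made up for illustration.

```ecl
// Hypothetical parent module with one SHARED and one EXPORTed attribute
Parent := MODULE
    // SHARED: visible inside Parent and inside modules that inherit from it
    SHARED UNSIGNED1 Factor := 3;

    // EXPORT: visible to callers outside the module as well
    EXPORT UNSIGNED2 Scale(UNSIGNED1 n) := n * Factor;
END;

// Child module; it inherits Scale() and can also see Factor
Child := MODULE(Parent)
    EXPORT UNSIGNED2 ScaleTwice(UNSIGNED1 n) := Scale(n) * Factor;
END;

OUTPUT(Child.Scale(2), NAMED('Inherited'));        // 6
OUTPUT(Child.ScaleTwice(2), NAMED('UsesShared'));  // 18
```

Nothing is overridden here, so no VIRTUAL marking is needed; the child only adds to what it inherits.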
There are some important things to note or remember about module inheritance:
- Attributes in a parent module that will be overridden must be marked as VIRTUAL. You can either mark the entire module as virtual (allowing all shared or exported attributes to be overridden), or you can specify that only particular attributes are virtual.
- You can override most definitions that appear in a parent module. You cannot override a record definition.
- If you override something, the new definition's signature must match. For value attributes, the data types (STRING, UNSIGNED, etc.) must match. For functions, transforms, etc. the return data type and arguments must match.
- When you call into code structured this way, it is important to call into the child module every time, even if the attribute is defined only in a parent module and not in the child. This establishes the naming scope, and you will have fewer maintenance issues if you remain consistent.
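The rules above can be sketched with a small example in which only one attribute, rather than the whole module, is marked as virtual. The names (Formatter, Excited, Prefix, Suffix, Finish) are hypothetical, and the placement of the VIRTUAL keyword after EXPORT is written from memory of the ECL language reference, so treat it as an assumption to verify.

```ecl
// Hypothetical parent: only Suffix is marked VIRTUAL, so only
// Suffix is intended to be overridden by child modules
Formatter := MODULE
    EXPORT VIRTUAL STRING Suffix := '.';
    EXPORT STRING Prefix := '> ';
    EXPORT STRING Finish(STRING s) := Prefix + s + Suffix;
END;

// The override keeps the parent's data type (STRING)
Excited := MODULE(Formatter)
    EXPORT STRING Suffix := '!';
END;

// Always call through the child module, even for attributes
// (Finish, Prefix) that are defined only in the parent
OUTPUT(Excited.Finish('Done'), NAMED('Finish'));
OUTPUT(Excited.Prefix, NAMED('Prefix'));
```

Calling Excited.Prefix rather than Formatter.Prefix means the call site does not need to change if Prefix is later made virtual and overridden in the child.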
So how is this applied to a project? First, break your data flow down into discrete functions that represent the logical steps you need to take (and may override), then create a single controlling function that calls those functions in order. For simple projects, all of this code can be placed within a single module. Any attribute that may be specialized or change value should be marked as virtual, or the entire module can be marked as virtual to indicate that everything may be specialized. Second, create another module that inherits from the first module. Place your specialized functions or attribute declarations within the second module. Finally, to kick everything off, call your controlling function from within the scope of the second module.
A brief example will perhaps make this clearer. The following code can be pasted into a Builder window in the Windows ECL IDE and executed. For the sake of brevity, the example manipulates only simple scalar values (strings and numbers) rather than datasets, but the concept is the same.
    // Parent module containing all attributes; note that it
    // is marked as virtual
    Base := MODULE, VIRTUAL
        EXPORT STRING DELIM := '';

        EXPORT UNSIGNED1 FirstNum() := FUNCTION
            RETURN 4;
        END;

        EXPORT UNSIGNED1 SecondNum() := FUNCTION
            RETURN 2;
        END;

        // This is the 'controlling function' that performs all
        // of the actions we are interested in; it gathers
        // numeric values from other functions, converts them
        // to strings, concatenates them with a delimiter,
        // then returns the result as a string
        EXPORT STRING ConcatNums() := FUNCTION
            RETURN (STRING)FirstNum() + DELIM + (STRING)SecondNum();
        END;
    END;

    // Child module inheriting from the base module; this one
    // overrides only the delimiter used
    Example1 := MODULE(Base)
        EXPORT STRING DELIM := ':';
    END;

    // Child module inheriting from the base module; override
    // both the delimiter and how the second number is
    // generated
    Example2 := MODULE(Base)
        EXPORT STRING DELIM := '-';

        EXPORT UNSIGNED1 SecondNum() := FUNCTION
            RETURN RANDOM() % 256;
        END;
    END;

    // Child module inheriting from another child module;
    // this will pick up all exported attributes from
    // the base and child, including any overridden
    // definitions
    Example3 := MODULE(Example1)
        EXPORT UNSIGNED1 FirstNum() := FUNCTION
            RETURN 10;
        END;
    END;

    // Example calls; results are shown as comments
    // (note the '4-162' result will vary, as part of
    // it is randomly generated)
    OUTPUT(Base.ConcatNums(), NAMED('Base'));     // 42
    OUTPUT(Example1.ConcatNums(), NAMED('Ex1'));  // 4:2
    OUTPUT(Example2.ConcatNums(), NAMED('Ex2'));  // 4-162
    OUTPUT(Example3.ConcatNums(), NAMED('Ex3'));  // 10:2
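This should be familiar to anyone with OOP experience. You can override any virtual definition as long as the signature of the specialization matches the parent’s signature exactly. ECL does not support polymorphism in the overloading sense, so a child module cannot supply a second version of a definition with a different signature.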
It takes some careful software architecture, but more complex functionality, with differing layers of specialization mapped to a module hierarchy, is certainly possible.
One thing module inheritance will not let you override is a record definition, as noted above. The signature of a function with a dataset argument implicitly includes the record definition for that dataset, so you cannot create a specialized version of the function that accepts datasets with different record definitions. This keeps you from going full-OOP-style on the ECL design, where you could override everything cleanly. Two different tactics can be used to address this:
- If the dataset is read from a logical file, pass the filename to the function instead. The function then creates the dataset definition locally and processes it as usual. The function’s signature is now something that can be overridden, as its argument is a string instead of a dataset. Note that this function can call out to other functions that themselves may be "common" and physically located in a parent module or elsewhere.
- A FUNCTIONMACRO provides a way of generalizing manipulation of a dataset when you don't know its structure, but you do know that the manipulation is the same. For instance, appending a unique numeric identifier to every record in a dataset is an action that does not depend on the structure of the dataset. Whatever the structure is, you want to append a new attribute with a certain value. Note that the *caller* of the FUNCTIONMACRO knows exactly what the dataset looks like, though. So, while the FUNCTIONMACRO is written in a generic way, when it is instantiated the caller will provide the details/context and the FUNCTIONMACRO can be made concrete by the ECL compiler.
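Both tactics can be sketched together as follows. This is an illustration under stated assumptions, not a definitive implementation: the record layouts, module names, the AppendId macro, and the '~demo::inherit::...' logical filenames are all hypothetical, and the sketch writes its own small seed files so that it can run on its own.

```ecl
// Hypothetical common layout that every customer's data is mapped into
CommonRec := RECORD
    STRING    name;
    UNSIGNED4 amount;
END;

// Hypothetical raw layout for one customer
RawRecA := RECORD
    STRING first_name;
    STRING last_name;
    STRING amount_text;
END;

// Tactic 2: a generic helper that prepends a numeric ID to any dataset;
// the compiler expands it at the call site, where the layout is known
AppendId(inFile) := FUNCTIONMACRO
    LOCAL OutRec := RECORD
        UNSIGNED8 rec_id;
        RECORDOF(inFile);
    END;
    RETURN PROJECT(inFile,
                   TRANSFORM(OutRec,
                             SELF.rec_id := COUNTER,
                             SELF := LEFT));
ENDMACRO;

// Tactic 1: functions accept a logical filename (a STRING), so every
// module in the hierarchy can share the same signature
Loader := MODULE, VIRTUAL
    // Default: the file already uses the common layout
    EXPORT DATASET(CommonRec) Load(STRING path) := FUNCTION
        RETURN DATASET(path, CommonRec, THOR);
    END;

    // Controlling function; identical for every customer
    EXPORT UNSIGNED8 TotalAmount(STRING path) := SUM(Load(path), amount);
END;

// Specialization: same signature, different file layout handled inside
CustomerA := MODULE(Loader)
    EXPORT DATASET(CommonRec) Load(STRING path) := FUNCTION
        raw := DATASET(path, RawRecA, THOR);
        RETURN PROJECT(raw,
                       TRANSFORM(CommonRec,
                                 SELF.name := TRIM(LEFT.first_name) + ' ' + TRIM(LEFT.last_name),
                                 SELF.amount := (UNSIGNED4)LEFT.amount_text));
    END;
END;

// Seed data so the sketch is self-contained
seedCommon := DATASET([{'Ann Lee', 5}, {'Bo Ng', 7}], CommonRec);
seedA := DATASET([{'Ann', 'Lee', '5'}, {'Bo', 'Ng', '7'}], RawRecA);

SEQUENTIAL(
    OUTPUT(seedCommon, , '~demo::inherit::common', OVERWRITE),
    OUTPUT(seedA, , '~demo::inherit::customer_a', OVERWRITE),
    OUTPUT(Loader.TotalAmount('~demo::inherit::common'), NAMED('DefaultTotal')),
    OUTPUT(CustomerA.TotalAmount('~demo::inherit::customer_a'), NAMED('CustomerATotal')),
    OUTPUT(AppendId(seedA), NAMED('RawWithIds')),
    OUTPUT(AppendId(seedCommon), NAMED('CommonWithIds'))
);
```

The design choice is that each Load() hides its customer-specific record definition behind a common return layout, so TotalAmount() never has to change, while AppendId works on either layout because it is made concrete only when it is called.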
The biggest benefit of using inheritance is that it allows programmers to reuse code they have already written. The resulting code is also easier to change, and when problems do come up, maintenance is easier and takes less time.
Hopefully, this clarifies module inheritance and how to get started with it in ECL. If you have not visited our forum for HPCC Systems, please do. It features news and events, and it is a wonderful place for developers to ask questions, make comments, or share ideas.