Compiler Tutorial, Step 4: Inlining

See the Introduction for this article here.

Previous chapter: Step 3: The Optimisation, and More Tests.

This step is based on the following commit:

Git commit: Inline dataset_from_transform https://github.com/hpcc-systems/HPCC-Platform/pull/3323

Now that we have our DATASET running distributed on Thor, and we’re quite happy with its behaviour, we can tackle some of the problems we created during the process.

The main issue is that, because some NORMALIZE were transformed into a DATASET(count), the compiler ended up joining them into one sub-expression (via CSE), and it is now a work-unit in itself. Normally, common code is a good thing, but in this case, the cost of spawning a new work-unit might not be worth the benefit of the DISTRIBUTED (when the inline table is too small). In cases like these, we want to inline code.

The Inliner

The main inlining decision is taken on ‘ecl/hqlcpp/hqlinline.cpp’, when defining the inline flags. The main function that calculates if an activity has to be inlined or not is ‘calcInlineFlags()’. In there, you need to put heuristics to check when can the expression be inlined.

In our case, we can’t inline distributed datasets (or we’d have to replicate the engines’ logic to know the ranges for every activity our dataset could be inlined, which doesn’t make sense). Inlining a transform that has a SKIP could be done, but we have chosen not to, since the benefits would probably not outweigh the costs of supporting it in the long run.

case no_dataset_from_transform: { if (expr->hasProperty(distributedAtom)) return 0; if (transformContainsSkip(expr->queryChild(1))) return 0; return RETassign; }

RETassign is a flag telling we can inline, if we’re using the dataset when assigning to a new variable without having to create a new temporary.

The function that will know how to inline, if the flag is non-zero, that is, is the ‘buildDatasetAssign()’ in ‘ecl/hqlcpp/hqlcppds.cpp’ (HQL CPP DataSet). As most of the other high-level functions in the compiler, it’s just a dispatcher to more specialized functions, in this case, a new one we’ll create.

case no_dataset_from_transform: buildDatasetAssignDatasetFromTransform(subctx, target, expr); return;

Coding the Inlined expression

On most compilers, inlining happens to already lowered expressions, as an optimisation pass on intermediate code. On our compiler, the inlining is, as you could see above, a step during the lowering of the expression.

It means you have the context in memory (selectors, sources, sinks), instead of having to calculate it afterwards, but it also mean you have to duplicate your generation procedures, producing very different code from each other.

Our new function will have to cope with all that an activity would, so it’s not just a matter of overriding the specific members of the base activity class. We have to create the whole thing inside another activity.

We’ll need to allocate memory for the rows, search the source dataset for a viable row, iterate through the counter and produce rows in a local sink dataset that will be used by the activity. It also means we have to account for range problems that the engines would.

Preamble

For instance, if the count is known to be zero or negative, nothing needs to be generated:

if (isZero(count) || isNegative(count)) return;

If the count is known to be a positive number, we just iterate. But if we’re not sure (say, for a stored – run time – variable), we have to add a run time check.

if (couldBeNegative(count)) { OwnedHqlExpr zero = createConstant(0, count->getType()); OwnedHqlExpr ifTest = createValue(no_gt, makeBoolType(), boundCount.getTranslatedExpr(), LINK(zero)); buildFilter(subctx, ifTest); }

Filter is the name of an IF statement, for historical reasons. The function ‘buildFilter()’ will update the context ‘subctx’, so that if it ends up building the IF block, it’ll move the insertion point inside it. This means every new code that will be added (including the loop) below, will be added inside the loop.

If we wanted to have code outside of it we should have worried about keeping a reference to it, or nesting it with a function call, but we don’t, so we just conditionally update the context.

Both functions ‘isNegative()’ and ‘couldBeNegative()’ were introduced in this commit, so it’s good if you could examine it and familiarise yourself with the type system for the expression graph.

Note that we’re creating constants of the same type of the counter, or the comparison could produce unexpected results (regarding sign).

It’s not hard to see that this code is creating a boolean value with the result of a ‘GreaterThen’ comparison between a zero constant and our count expression. However, it’s important to notice that we’re using the ‘boundCount’, and not the original ‘count’ from the expression graph.

If we were to use the counter here and in the iteration just after that, the compiler would generate two function calls to get the counter value, which would be inefficient. So, we ‘bind’ the counter to a variable:

CHqlBoundExpr boundCount;
buildSimpleExpr(ctx, count, boundCount);

which will be translated to some temporary in the code, like ‘v123’, and use it for both the if and the for loop later on.

The Main Loop

The loop code is also simple and straightforward. First, we create the induction variable:

OwnedHqlExpr loopVar = subctx.getTempDeclare(counterType, NULL);
OwnedHqlExpr one = getSizetConstant(1);
buildAssignToTemp(subctx, loopVar, one);

Second, we compare it to the bound counter:

OwnedHqlExpr loopTest = createValue(no_le, makeBoolType(), LINK(loopVar), LINK(boundCount.expr));

Finally, we increment the induction variable: OwnedHqlExpr inc = createValue(no_postinc, loopVar->getType(), LINK(loopVar));

And create the loop:

subctx.addLoop(loopTest, inc, false);

Here, again, ‘addLoop()’ will change the insertion point, from now on, to inside the loop. Since we know now that we need to iterate anyway, and there’s nothing for us to do outside the loop, we, again, will not bother with the lexical block outside out IF/FOR.

Finally, we need to create each row, from the selector of the first row of the source dataset into ‘count’ rows in the sink dataset.

First, we get the ‘value’ of the row, that will be generated by the given transform: OwnedHqlExpr rowValue = createRow(no_createrow, LINK(transform));

Then, we create a physical row, and attach that value to it:

BoundRow * targetRow = target->buildCreateRow(subctx); Owned targetRef = buildActiveRow(subctx, targetRow->querySelector()); buildRowAssign(subctx, targetRef, rowValue);

Finally, we finish the row, making sure memory is handled correctly: target->finishRow(subctx, targetRow);

Note that the ‘buildRowAssign’ will take care of the code in the transform, and make sure that any complex transformations that could come from it, including child queries and complex filters, will be coded before actually assigning that value to the row. This is an important feature, if we want to support every possible combination that datasets have available as activities, when inlined.

Side Effects So far, so good, the code looks to be inlined for the cases we were generating new work-units before. But there is a new issue when we run the compiler regression tests: one activity is inlining too much. Because we do inlining as we export the code, the compiler failed to notice that some code could be joined up. This was detected and a new function was added to try to reach a balance when it was safe to hoist a dataset and when it was better to inline, called ‘isEfficientToHoistDataset()’. Because the problem was spotted on a resource, so ‘ecl/hqlcpp/hqlresource.cpp’ was changed on the spot of figuring out whether to hoist or not. The new function added could be in a more generic file, if this problem is about to happen on other places, but for now, it should suffice.

Testing After applying the patch and running the standard set of regressions, we see that only the changes we were expecting (ie. the ones that improve code) have occurred. This does not mean others won’t happen in the wild, but do minimise the problems we might find in production code. The added test cases, in both compiler and engines regressions, have accounted for the expected cases of negative numbers (constants or not), zero and variable counts. It also demonstrates, on the compiler regression, that the DISTRIBUTED DATASET does not inline, even having every property to do so otherwise. The execution tests not only will prevent optimisations to destroy our inliner, but also show to the history what were the intentions when we coded it. It clearly shows that we’re expecting no rows for zero and negative (and not an error, for example). And with that, we finish our compiler tutorial.

Conclusion

Adding features, optimisations and fixing bugs in the ECL compiler is a bit of black magic, as any compiler, and help on the mailing list or the pull request reviews is much appreciated. Having tutorials like this will help you start changes and ask much better questions, which saves time for you and other developers, but they won’t tell you what to do in your case, nor help you with specific questions. Learning that is part of the process and best done if interacting with other developers. Also, have in mind that the tests require deep knowledge of the ECL syntax and the syntax that is valid (as opposed to what is actually described in the docs), so you will be asking a lot of questions based on the differences you see in the test suites. For further reference, please check the ECL reference manual and examples.