
HPCC Systems blog contributors are engineers and data scientists who for years have enabled LexisNexis customers to use big data to fulfill critical missions, gain competitive advantage, or unearth new discoveries. Check this blog regularly for insights into how HPCC Systems technology can put big data to work for your own organization.

Renato Golin on 11/05/2012

See the Introduction for this article here.

Previous chapter: Step 3: The Optimisation, and More Tests.

This step is based on the following commit:

Renato Golin on 11/05/2012

See the Introduction for this article here.

Previous chapter: Step 2: The Distributed Flag, and Execution Tests.

This step is based on the following commit:

Renato Golin on 11/05/2012

See the Introduction for this article here.

Previous chapter: Step 1: The Parser, The Expression Tree and the Activity.

Renato Golin on 11/05/2012

See the Introduction for this article here.

This part of the tutorial refers to the commit below:

Git commit: DATASET (N, transform(COUNTER))
https://github.com/hpcc-systems/HPCC-Platform/pull/1285/files
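In ECL, this form of `DATASET` produces `N` records by evaluating the transform once per record, with `COUNTER` taking the values 1 to `N`. The short Python model below only mimics that behaviour to make the semantics concrete. `dataset_from_counter` and `make_row` are hypothetical names, and nothing here is HPCC Systems code.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

def dataset_from_counter(n: int, transform: Callable[[int], Row]) -> List[Row]:
    """Toy model of DATASET(n, transform(COUNTER)): one row per counter value."""
    # COUNTER is 1-based, so the transform is called with 1, 2, ..., n.
    return [transform(counter) for counter in range(1, n + 1)]

def make_row(counter: int) -> Row:
    # Hypothetical transform: keep the counter and its square.
    return {"id": counter, "square": counter * counter}

rows = dataset_from_counter(3, make_row)
print(rows)  # [{'id': 1, 'square': 1}, {'id': 2, 'square': 4}, {'id': 3, 'square': 9}]
```

In this model a count of zero or less simply yields an empty list; the sketch does not try to reproduce how the real engines treat such inputs.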

Renato Golin on 11/05/2012

This tutorial will walk you through adding a new feature to the compiler, making sure it executes correctly in the engines, and performing some basic optimisations such as replacing and inlining expressions.
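Replacing and inlining expressions are general compiler techniques. As a rough illustration only (this is not the HPCC compiler's code), the Python sketch below inlines named definitions into an expression tree and then replaces any addition of two constants with a single constant. The `Const`, `Var`, `Add` and `optimise` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Var, Add]

def optimise(expr: Expr, defs: Dict[str, Expr]) -> Expr:
    """Inline named definitions, then fold constant additions.

    Assumes the definitions are not recursive.
    """
    if isinstance(expr, Var) and expr.name in defs:
        # Inlining: substitute the definition's body for the reference.
        return optimise(defs[expr.name], defs)
    if isinstance(expr, Add):
        left = optimise(expr.left, defs)
        right = optimise(expr.right, defs)
        if isinstance(left, Const) and isinstance(right, Const):
            # Replacement: an Add of two constants becomes one constant.
            return Const(left.value + right.value)
        return Add(left, right)
    return expr

defs = {"n": Add(Const(2), Const(3))}
print(optimise(Add(Var("n"), Var("x")), defs))  # Add(left=Const(value=5), right=Var(name='x'))
```

Inlining first is what exposes the constant sub-tree; without it, `n` would stay an opaque reference and nothing could be folded.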

When adding features to the compiler, there are two main places where you have to add code: the compiler itself (the parser, the expression builder and the exporter) and the engines (Roxie, Thor and HThor), including the common graph node representation.
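One way to picture why both sides need code: the compiler emits a node of some kind into the graph, and each engine must have an activity that knows how to execute that kind. The Python sketch below is a deliberately simplified, hypothetical model of that split. `GraphNode`, `ACTIVITIES`, `activity` and `run_node` are invented names, a single registry stands in for all three engines, and none of it mirrors the HPCC Platform's actual C++ classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class GraphNode:
    """Hypothetical compiler output: an activity kind plus its arguments."""
    kind: str
    args: Tuple = field(default_factory=tuple)

# Hypothetical engine-side registry: node kind -> activity implementation.
ACTIVITIES: Dict[str, Callable[..., List[int]]] = {}

def activity(kind: str):
    """Decorator that registers an activity implementation for a node kind."""
    def register(fn):
        ACTIVITIES[kind] = fn
        return fn
    return register

@activity("counter_dataset")
def counter_dataset(n: int) -> List[int]:
    # Engine side of the feature: produce counter values 1..n.
    return list(range(1, n + 1))

def run_node(node: GraphNode) -> List[int]:
    # An engine can only execute node kinds it has an activity for.
    try:
        impl = ACTIVITIES[node.kind]
    except KeyError:
        raise NotImplementedError(f"no activity for {node.kind!r}") from None
    return impl(*node.args)

print(run_node(GraphNode("counter_dataset", (4,))))  # [1, 2, 3, 4]
```

In this model, a node kind that the compiler can emit but the engine has not registered fails at run time, which illustrates why a new feature needs code on both sides.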

Renato Golin on 06/27/2012

The ECL compiler

When ECL code is compiled, the internal representation is a graph of expressions that records how each ECL instruction depends on the others. The compiler then walks this expression graph, looking for patterns, dead code, common expressions and so on, until, in theory, optimal code is emitted at the end.
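Two of the searches mentioned above can be illustrated on a toy expression graph: counting structurally identical sub-expressions finds common expressions, and a reachability walk from the outputs finds dead code. This Python sketch uses a made-up tuple encoding and hypothetical function names; it shows the general techniques, not how the ECL compiler (which is written in C++) implements them.

```python
from collections import Counter
from typing import Dict, Iterator, List, Set, Tuple

# Expressions are nested tuples, e.g. ("add", ("ref", "a"), ("const", 1)).
Expr = Tuple

def subexpressions(expr: Expr) -> Iterator[Expr]:
    """Yield expr and every compound sub-expression below it."""
    if expr[0] in ("const", "ref"):
        return  # leaves are not worth sharing
    yield expr
    for child in expr[1:]:
        yield from subexpressions(child)

def common_subexpressions(defs: Dict[str, Expr]) -> List[Expr]:
    """Compound sub-expressions that occur more than once."""
    counts = Counter(s for e in defs.values() for s in subexpressions(e))
    return [e for e, c in counts.items() if c > 1]

def refs(expr: Expr) -> Set[str]:
    """Names of the definitions an expression refers to."""
    if expr[0] == "ref":
        return {expr[1]}
    if expr[0] == "const":
        return set()
    return set().union(*(refs(child) for child in expr[1:]))

def live_definitions(defs: Dict[str, Expr], outputs: List[str]) -> Set[str]:
    """Definitions reachable from the outputs; everything else is dead code."""
    live: Set[str] = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(refs(defs[name]))
    return live

defs = {
    "a": ("const", 2),
    "x": ("mul", ("add", ("ref", "a"), ("const", 1)), ("const", 3)),
    "y": ("add", ("ref", "a"), ("const", 1)),
    "unused": ("add", ("ref", "x"), ("ref", "y")),
    "out": ("mul", ("ref", "x"), ("ref", "y")),
}
print(common_subexpressions(defs))  # [('add', ('ref', 'a'), ('const', 1))]
print(sorted(set(defs) - live_definitions(defs, ["out"])))  # ['unused']
```

A real optimiser would go on to replace each repeated sub-expression with a single shared node and drop the dead definitions; the sketch stops at finding them.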

Renato Golin on 06/20/2012

HPCC's distributed file system has the concept of SuperFiles: collections of files with the same format, used to aggregate data and automate disk reads.
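To make the idea concrete, here is a toy in-memory model of a SuperFile: it accepts only sub-files whose record layout matches its own, and reading it streams the rows of each sub-file in order, as if they were one file. The classes, file names and layout check are hypothetical illustrations, not the HPCC Systems file API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class SubFile:
    name: str
    fields: Tuple[str, ...]  # record layout (field names) of this file
    rows: List[tuple]

@dataclass
class SuperFile:
    """Toy model: a named list of sub-files that share one record layout."""
    name: str
    fields: Tuple[str, ...]
    subfiles: List[SubFile] = field(default_factory=list)

    def add(self, sub: SubFile) -> None:
        # Every member must have the same format as the super-file.
        if sub.fields != self.fields:
            raise ValueError(f"{sub.name}: layout {sub.fields} != {self.fields}")
        self.subfiles.append(sub)

    def read(self) -> Iterator[tuple]:
        # Reading the super-file reads each sub-file in turn,
        # so callers see a single logical file.
        for sub in self.subfiles:
            yield from sub.rows

daily = SuperFile("sales::all", ("day", "amount"))
daily.add(SubFile("sales::mon", ("day", "amount"), [("mon", 10)]))
daily.add(SubFile("sales::tue", ("day", "amount"), [("tue", 7), ("tue", 3)]))
print(sum(amount for _, amount in daily.read()))  # 20
```

Because new data arrives as another sub-file, a reader of `sales::all` picks it up without changing its own code, which is the aggregation benefit described above.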