
HPCC Systems blog contributors are engineers, data scientists, and fellow community members who want to share knowledge, tips, and other helpful information about what is happening in the HPCC Systems community. Check this blog regularly for insights into how HPCC Systems technology can put big data analytics to work for your own needs.

Gavin Halliday on 06/10/2020
This is the third blog in our series focusing on using our new HPCC Systems Cloud native platform. The previous two blogs track our Path to the Cloud and provide details of how to set up a default system using a Helm chart and Microsoft Azure using Kubernetes. Our Cloud development project is ongoing, so keep checking the blogs in this series for updates. This blog covers how to persist your data in an HPCC Systems cloud native environment.
Gavin Halliday on 02/09/2017

ECL has two commonly known keywords for grouping together actions – PARALLEL and SEQUENTIAL.  Surely it is fairly obvious which one to use?  Use PARALLEL if you do not mind in which order operations are executed and use SEQUENTIAL if you do.  Unfortunately many people do not realise that SEQUENTIAL can be very bad for your query’s health.

Gavin Halliday on 11/06/2015

The next stage in adding a new activity to the system is to define the interface between the generated code and the engines. The important file for this stage is rtl/include/eclhelper.hpp, which contains the interfaces between the engines and the generated code. These interfaces define the information required by the engines to customize each of the different activities.

Gavin Halliday on 09/21/2015

The first stage in implementing QUANTILE will be to add it to the parser. This can sometimes highlight issues with the syntax and cause revisions to the design. In this case there were two technical issues integrating the syntax into the grammar. (If you are not interested in shift/reduce conflicts you may want to skip a few paragraphs and jump to the walkthrough of the changes.)

Gavin Halliday on 06/29/2015

When adding new features to the system, or changing the code generator, the first step is often to write some ECL test cases. They have proved very useful for several reasons:

Gavin Halliday on 05/11/2015

This series of blog posts started life as a series of walkthroughs and brainstorming sessions at a team offsite. This series will look at adding a new activity to the system. The idea is to give a walkthrough of the work involved, to highlight the different areas that need changing, and hopefully to encourage others to add their own activities.

Gavin Halliday on 03/10/2015

As your body of ECL code grows, it gets harder to track the dependencies between the different ECL definitions (or source files). Providing more information about the dependencies between those definitions makes it easier to understand the structure of the ECL code, and also gives you a better understanding of which queries would be affected by changing a particular definition (i.e., if I change this, what am I going to break?).

Gavin Halliday on 03/24/2014

Different Types of Joins

Matching records from multiple data sources is one of the fundamental operations you need to process data. As you would expect, ECL makes it easy – but the number of options can be bewildering. The following aims to guide you through the different options and explain when they are appropriate.
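As a minimal sketch of the simplest case, the ECL below joins two small inline datasets, once as an inner join and once as a LEFT OUTER join. The record layouts, field names, and data (PersonRec, OrderRec, people, orders, combine) are hypothetical and invented for illustration; they are not taken from the original post.

```ecl
// Hypothetical record layouts and inline data, for illustration only.
PersonRec := RECORD
  UNSIGNED4 id;
  STRING20  name;
END;

OrderRec := RECORD
  UNSIGNED4 personId;
  REAL8     amount;
END;

people := DATASET([{1, 'Ann'}, {2, 'Bob'}, {3, 'Cy'}], PersonRec);
orders := DATASET([{1, 10.0}, {1, 5.5}, {3, 7.25}], OrderRec);

ResultRec := RECORD
  STRING20 name;
  REAL8    amount;
END;

// Builds one output row from a matching pair of rows.
ResultRec combine(PersonRec L, OrderRec R) := TRANSFORM
  SELF.name   := L.name;
  SELF.amount := R.amount;
END;

// Inner join (the default): only people with at least one matching order.
matched := JOIN(people, orders,
                LEFT.id = RIGHT.personId,
                combine(LEFT, RIGHT));

// LEFT OUTER: every person is kept; people without orders get a blank RIGHT row.
everyone := JOIN(people, orders,
                 LEFT.id = RIGHT.personId,
                 combine(LEFT, RIGHT),
                 LEFT OUTER);

OUTPUT(matched, NAMED('Matched'));
OUTPUT(everyone, NAMED('Everyone'));
```

Which join type is appropriate depends on whether unmatched records need to survive into the result.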

Gavin Halliday on 01/03/2014

The most common way of filtering out records is to apply a filter to a dataset. E.g.,

Centenarians := myDataset(age >= 100);

However, sometimes the easiest place to decide whether or not a record is required is within the transform that creates it. ECL has the SKIP keyword to indicate that a record shouldn’t be generated from a transform. SKIP has two different syntaxes:

1. SKIP as an attribute on a transform
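The attribute form could look something like the sketch below, which reuses the centenarian filter from the earlier example. The record layouts, the inline data, and the transform name keepCentenarians are assumptions made for illustration.

```ecl
// Hypothetical layout and data, for illustration only.
PersonRec := RECORD
  STRING20  name;
  UNSIGNED1 age;
END;

myDataset := DATASET([{'Ann', 101}, {'Bob', 42}], PersonRec);

NameRec := RECORD
  STRING20 name;
END;

// SKIP as an attribute on the transform: when the condition is true,
// no output record is generated for that input row.
NameRec keepCentenarians(PersonRec L) := TRANSFORM, SKIP(L.age < 100)
  SELF.name := L.name;
END;

Centenarians := PROJECT(myDataset, keepCentenarians(LEFT));
OUTPUT(Centenarians);
```

Note that the SKIP condition describes the rows to drop, so it is the inverse of the filter condition age >= 100.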

Gavin Halliday on 11/18/2013

Often an ECL query will consist of a series of results or actions which should all be executed together. Historically there have been two ways of grouping actions together.

P := PARALLEL(a1, a2, a3, a4);

PARALLEL indicates that all of the actions can be executed in parallel. If any intermediate values are common to more than one of the actions, then the values should only be evaluated once, and the result reused by each action.
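A minimal sketch of what a1 to a4 might be: here two of the actions depend on the same intermediate dataset. The dataset nums, the definition evens, and the result names are hypothetical, chosen only to give the actions a shared value.

```ecl
// Hypothetical data, for illustration only.
nums := DATASET([{1}, {2}, {3}, {4}], {UNSIGNED4 n});

// Intermediate value shared by a1 and a2.
evens := nums(n % 2 = 0);

a1 := OUTPUT(COUNT(evens), NAMED('EvenCount'));
a2 := OUTPUT(SUM(evens, n), NAMED('EvenSum'));
a3 := OUTPUT(MAX(nums, n), NAMED('Largest'));
a4 := OUTPUT(nums, NAMED('AllNumbers'));

// The four actions may run in any order, or at the same time.
P := PARALLEL(a1, a2, a3, a4);
P;
```

Because evens feeds both a1 and a2, it is the kind of common intermediate value that should be evaluated once and reused.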