
HPCC Systems blog contributors are engineers, data scientists, and fellow community members who want to share knowledge, tips, and other helpful news from the HPCC Systems community. Check this blog regularly for insights into how HPCC Systems technology can put big data analytics to work for your own needs.

Renato Golin on 11/05/2012

See the Introduction for this article here.

Previous chapter: Step 2: The Distributed Flag, and Execution Tests.

This step is based on the following commit:

Renato Golin on 11/05/2012

See the Introduction for this article here.

Previous chapter: Step 1: The Parser, The Expression Tree and the Activity.

Renato Golin on 11/05/2012

See the Introduction for this article here.

This part of the tutorial refers to the commit below:

Git commit: DATASET (N, transform(COUNTER))

Renato Golin on 11/05/2012

This tutorial will walk you through adding a new feature to the compiler, making sure it executes correctly in the engines, and performing some basic optimisations such as replacing and inlining expressions.

When adding features to the compiler, there are two main places where you have to add code: the compiler itself (the parser, the expression builder and the exporter) and the engines (Roxie, Thor and HThor), including the common graph node representation.
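The feature in question, DATASET(N, TRANSFORM(COUNTER)), generates a dataset of N rows by invoking a transform once per row with a 1-based counter. As a rough, hedged sketch of those semantics (in Python rather than ECL, purely for illustration — the names here are not part of any HPCC API):

```python
def dataset(n, transform):
    """Toy analogue of ECL's DATASET(N, TRANSFORM(COUNTER)):
    produce n rows, each built by calling transform with a
    1-based counter value."""
    return [transform(counter) for counter in range(1, n + 1)]

# Example transform: a row whose single field holds the counter.
rows = dataset(5, lambda counter: {"value": counter})
# rows is [{"value": 1}, {"value": 2}, ..., {"value": 5}]
```

In ECL the transform populates a record layout; the counter-driven iteration shown here is the behaviour the engines must implement for the new activity.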

Flavio Villanustre on 07/20/2012

As I was preparing the Keynote that I delivered at World-Comp'12, about Machine Learning on the HPCC Systems platform, it occurred to me that it was important to remark that when dealing with big data and machine learning, most of the time and effort is usually spent on the data ETL (Extraction, Transformation and Loading) and feature extraction process, and not on the specific learning algorithm applied.

Renato Golin on 06/27/2012

The ECL compiler

When ECL code is compiled, the internal representation is a graph of expressions that correlates each ECL instruction as a dependency to others. The compiler then walks through this expression graph, looking for patterns, dead code, common expressions and so on, until supposedly optimal code is printed at the end.
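One of the simplest transformations such a walk enables is common-subexpression elimination: interning each subtree in a table so that structurally identical expressions become a single shared node. A minimal sketch of that idea (illustrative Python, not the actual eclcc data structures):

```python
# A toy expression "graph": nodes are (op, child, child, ...) tuples,
# leaves are constants or names. Hash-consing nodes into a table makes
# identical subtrees share one instance — a simple form of
# common-subexpression elimination.

def intern(node, table):
    """Return the canonical instance of node, rebuilding it bottom-up
    so every duplicate subtree collapses onto one table entry."""
    if not isinstance(node, tuple):          # leaf: constant or name
        return node
    op, *kids = node
    canonical = (op, *(intern(k, table) for k in kids))
    return table.setdefault(canonical, canonical)

table = {}
expr = intern(("+", ("*", "x", 2), ("*", "x", 2)), table)
# Both ("*", "x", 2) subtrees now refer to the same node, so a code
# generator visiting the graph emits the multiplication only once.
```

Dead-code removal falls out of the same representation: any node unreachable from the graph's roots is simply never visited when code is emitted.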

Renato Golin on 06/20/2012

HPCC's distributed file system has the concept of SuperFiles: a collection of files with the same format that is used to aggregate data and automate disk reads.
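Conceptually, a SuperFile behaves like one logical dataset backed by an ordered list of same-format subfiles. A hedged toy model of that behaviour (illustrative Python only; real SuperFiles are managed through ECL and Dali, and the names below are invented for the sketch):

```python
class SuperFile:
    """Toy model of an HPCC SuperFile: an ordered collection of
    same-format logical files that reads back as one concatenated
    dataset. Illustrative only, not an HPCC API."""

    def __init__(self):
        self.subfiles = []           # ordered (name, rows) pairs

    def add(self, name, rows):
        """Append a subfile; all subfiles share one record format."""
        self.subfiles.append((name, rows))

    def read(self):
        """Yield every row, subfile by subfile, as one dataset."""
        for _, rows in self.subfiles:
            yield from rows

sf = SuperFile()
sf.add("daily::2012-06-19", [{"id": 1}, {"id": 2}])
sf.add("daily::2012-06-20", [{"id": 3}])
all_rows = list(sf.read())           # three rows, in subfile order
```

This is what makes SuperFiles convenient for incremental loads: new daily files are added to the collection, and readers continue to see a single dataset.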

Richard Taylor on 06/12/2012

How does one get started writing a Blog?

Flavio suggested to me recently, "You should write a blog about ECL to give the community an additional resource to learn more about it." So I said, "OK, I know quite a bit about ECL, so what specifically are you suggesting I write about?" And he replied, "Machine Learning."

Flavio Villanustre on 06/11/2012

More than 12 years ago, back in 2000, LexisNexis was pushing the envelope on what could be done to process and analyze large amounts of data with commercially available solutions at the time. The overall data size, combined with the large number of records and the complexity of the processing required made existing solutions non-viable.

Flavio Villanustre on 05/23/2012

One of our community members recently asked about fraud detection using the HPCC Systems platform. The case that this person described involved identifying potentially fraudulent traders, who were performing a significant number of transactions over a relatively short time period.