
HPCC Systems blog contributors are engineers and data scientists who for years have enabled LexisNexis customers to use big data to fulfill critical missions, gain competitive advantage, or unearth new discoveries. Check this blog regularly for insights into how HPCC Systems technology can put big data to work for your own organization.

Renato Golin on 11/05/2012

This tutorial will walk you through adding a new feature in the compiler, making sure it executes correctly in the engines, and performing some basic optimisations such as replacing and inlining expressions.

When adding features to the compiler, there are two main places where you have to add code: the compiler itself, including the parser, the expression builder and exporter, and the engines (Roxie, Thor and HThor), including the common graph node representation.

Flavio Villanustre on 07/20/2012

As I was preparing the Keynote that I delivered at World-Comp'12, about Machine Learning on the HPCC Systems platform, it occurred to me that it was important to remark that when dealing with big data and machine learning, most of the time and effort is usually spent on the data ETL (Extraction, Transformation and Loading) and feature extraction process, and not on the specific learning algorithm applied.

Renato Golin on 06/27/2012

The ECL compiler

When ECL code is compiled, the internal representation is a graph of expressions in which each ECL instruction is linked to the others it depends on. The compiler then walks through this expression graph, looking for patterns, dead code, common expressions and so on, until supposedly optimal code is printed at the end.
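To make the idea of an expression graph concrete, here is a minimal Python sketch of two of the checks mentioned above: sharing common expressions and finding dead code. It is only an illustration under simple assumptions; the names `Expr`, `Graph`, `make` and `reachable` are hypothetical and are not taken from the ECL compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """One node of the expression graph: an operator plus its operands."""
    op: str
    args: tuple = ()


class Graph:
    """Builds nodes so that structurally identical expressions are shared."""

    def __init__(self):
        self._nodes = {}  # maps a node to its single shared instance

    def make(self, op, *args):
        node = Expr(op, tuple(args))
        # Return the existing node if an equal one was already built.
        return self._nodes.setdefault(node, node)

    def size(self):
        return len(self._nodes)


def reachable(root):
    """Collect every node the root depends on; anything else is dead code."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(a for a in node.args if isinstance(a, Expr))
    return seen


g = Graph()
a = g.make("field", "a")
b = g.make("field", "b")
s1 = g.make("add", a, b)
s2 = g.make("add", a, b)      # same expression, so the same node comes back
unused = g.make("mul", a, a)  # never used by the output below
root = g.make("output", s1, s2)

live = reachable(root)
print(s1 is s2)        # the common expression is shared
print(unused in live)  # the unused node is not reachable from the output
```

In this sketch, common expressions collapse at construction time because equal nodes hash to the same key, and dead code is simply whatever the walk from the output never reaches.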

Renato Golin on 06/20/2012

HPCC's distributed file system has the concept of SuperFiles, a collection of files with the same format, that is used to aggregate data and automate disk reads.

Richard Taylor on 06/12/2012

How does one get started writing a Blog?

Flavio suggested to me recently, "You should write a blog about ECL to give the community an additional resource to learn more about it." So I said, "OK, I know quite a bit about ECL, so what specifically are you suggesting I write about?" And he replied, "Machine Learning."

Flavio Villanustre on 06/11/2012

More than 12 years ago, back in 2000, LexisNexis was pushing the envelope on what could be done to process and analyze large amounts of data with the commercially available solutions of the time. The overall data size, combined with the large number of records and the complexity of the processing required, made existing solutions non-viable.

Flavio Villanustre on 05/23/2012

One of our community members recently asked about fraud detection using the HPCC Systems platform. The case that this person described involved identifying potentially fraudulent traders, who were performing a significant number of transactions over a relatively short time period.

Flavio Villanustre on 05/15/2012

You probably thought that the HPCC Systems platform and Hadoop were two technologies that represented the opposite ends of a spectrum, and that choosing one would make attempting to use the other unrealistic. If this is what you believed, think again (and keep reading).

Flavio Villanustre on 05/10/2012

It is not uncommon to find situations where a classification model needs to be trained using a very large amount of historic data, but the ability to perform classification of new data in real time is required. There are many examples of this need, from real time sentiment analysis in tweets or news, to anomaly detection for fraud or fault identification.

Flavio Villanustre on 05/07/2012

At HPCC Systems we have been very busy finding better ways to communicate with our Community. As a result, we have just released the first edition of our official HPCC Systems podcast, in which our host and Program Manager, Trish McCall, talks with our senior trainer Bob Foreman about different aspects of the HPCC Systems platform, the ECL data-intensive programming language and some other topics that we hope you will find interesting.