HPCC Systems blog contributors are engineers and data scientists who for years have enabled LexisNexis customers to use big data to fulfill critical missions, gain competitive advantage, or unearth new discoveries. Check this blog regularly for insights into how HPCC Systems technology can put big data to work for your own organization.

Renato Golin on 06/20/2012

HPCC's distributed file system has the concept of a SuperFile: a collection of files with the same format that is used to aggregate data and automate disk reads.

Richard Taylor on 06/12/2012

How does one get started writing a Blog?

Flavio suggested to me recently, "You should write a blog about ECL to give the community an additional resource to learn more about it." So I said, "OK, I know quite a bit about ECL, so what specifically are you suggesting I write about?" And he replied, "Machine Learning."

Flavio Villanustre on 06/11/2012

More than 12 years ago, back in 2000, LexisNexis was pushing the envelope on what could be done to process and analyze large amounts of data with the solutions commercially available at the time. The overall data size, combined with the large number of records and the complexity of the processing required, made existing solutions non-viable.

Flavio Villanustre on 05/23/2012

One of our community members recently asked about fraud detection using the HPCC Systems platform. The case that this person described involved identifying potentially fraudulent traders, who were performing a significant number of transactions over a relatively short time period.

Flavio Villanustre on 05/15/2012

You probably thought that the HPCC Systems platform and Hadoop were two technologies at opposite ends of a spectrum, and that choosing one would make attempting to use the other unrealistic. If this is what you believed, think again (and keep reading).

Flavio Villanustre on 05/10/2012

It is not uncommon to find situations where a classification model needs to be trained on a very large amount of historical data, yet new data must be classified in real time. There are many examples of this need, from real-time sentiment analysis of tweets or news to anomaly detection for fraud or fault identification.

Flavio Villanustre on 05/07/2012

At HPCC Systems we have been very busy finding better ways to communicate with our Community. As a result, we have just released the first edition of our official HPCC Systems podcast, in which our host and Program Manager, Trish McCall, talks with our senior trainer, Bob Foreman, about different aspects of the HPCC Systems platform, the ECL data-intensive programming language, and some other topics that we hope you will find interesting.

Flavio Villanustre on 04/30/2012

Don't be surprised by the title: I'm not trying to play down the link between high blood pressure and a diet rich in sodium. In the HPCC Systems platform world, SALT has a completely different meaning.

Flavio Villanustre on 04/27/2012

While the ECL-ML (ECL Machine Learning) libraries currently support a variety of prevalent machine learning algorithms, there may always be a need for one that has not been added yet. ECL-ML provides a distributed linear algebra library, which greatly simplifies distributed vectorized implementations. That is a blessing, but adding a new algorithm still requires some coding in ECL.

Flavio Villanustre on 04/24/2012

A lot has happened since the version 1.0 release of our Machine Learning libraries. As you can see by checking out our ML portal (http://hpccsystems.com/ML), there are a ton of new algorithms, and significant improvements to existing ones.