HPCC Systems blog contributors are engineers and data scientists who for years have enabled LexisNexis customers to use big data to fulfill critical missions, gain competitive advantage, or unearth new discoveries. Check this blog regularly for insights into how HPCC Systems technology can put big data to work for your own organization.

Lorraine Chapman on 03/08/2016

This year, we are running our own summer internship program again. If you are a student wondering how to spend your summer doing something interesting, worthwhile and fun, read on!

Richard Taylor on 03/02/2016

Given the recent buzz around this year’s 88th Academy Awards, our team decided to put HPCC Systems to the test. We used the platform to scan IMDb’s movie database for the “Six Degrees of Kevin Bacon.” But we didn’t stop there.
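For readers unfamiliar with the "Six Degrees" idea, it amounts to a shortest-path search over a graph in which actors are linked by shared movies. The following is a minimal Python sketch of that search, not the ECL used in the post; the function name `degrees_of_separation`, the `credits` mapping, and the placeholder titles and names are all assumptions made for illustration.

```python
from collections import deque


def degrees_of_separation(credits, start, target):
    """Return the number of shared-movie hops between two actors, or None.

    `credits` is a hypothetical mapping of movie title -> list of actor names.
    """
    # Index the cast lists by actor so co-stars can be looked up quickly.
    movies_by_actor = {}
    for movie, cast in credits.items():
        for actor in cast:
            movies_by_actor.setdefault(actor, set()).add(movie)

    if start not in movies_by_actor or target not in movies_by_actor:
        return None

    # Breadth-first search: the first time the target is reached,
    # the hop count is the shortest possible.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, hops = queue.popleft()
        if actor == target:
            return hops
        for movie in movies_by_actor[actor]:
            for co_star in credits[movie]:
                if co_star not in seen:
                    seen.add(co_star)
                    queue.append((co_star, hops + 1))
    return None


# Placeholder data, not real film credits.
sample_credits = {
    "Film A": ["Ann", "Bob"],
    "Film B": ["Bob", "Cy"],
    "Film C": ["Dee"],
}
print(degrees_of_separation(sample_credits, "Ann", "Cy"))
```

On a dataset the size of a full movie database, the same idea would normally be expressed as distributed joins rather than an in-memory queue; this sketch only shows the logic.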

Lorraine Chapman on 02/11/2016

It’s another year for big ideas in big data. Those ideas don’t just come from developers, but from the students they started out as – students like you.

Lorraine Chapman on 01/28/2016

You may have already downloaded the HPCC Systems 6.0.0 Beta version and read my earlier blog post about the features included. So what's been happening on the HPCC Systems Open Source Project since then?

Charles Kaminski on 01/20/2016

Prefix trees can be an important addition to a big-data toolbox. In two prior posts I showed how to combine a prefix tree and an edit-distance algorithm on a big-data platform for a significant performance boost. In this post, I show how to further improve performance by layering on additional pruning strategies. A reasonable expectation here is an additional performance improvement of 10% to 50%, depending on your real-world data and the pruning strategies you choose.
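As one illustration of what a pruning strategy can look like, the Python sketch below skips work whenever a cheap lower bound already exceeds the allowed distance. It is an assumption for illustration, not necessarily one of the strategies covered in the post: it relies on the fact that the edit distance between two strings is never smaller than the difference in their lengths, and on abandoning the calculation once every cell in a row is over the limit. The function name `bounded_edit_distance` is hypothetical.

```python
def bounded_edit_distance(a, b, max_dist):
    """Return the Levenshtein distance between a and b if it is <= max_dist,
    otherwise None. Prunes as early as it safely can."""
    # Pruning 1: the length difference is a lower bound on the distance.
    if abs(len(a) - len(b)) > max_dist:
        return None

    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                cur[j - 1] + 1,           # insertion
                prev[j] + 1,              # deletion
                prev[j - 1] + (ca != cb)  # substitution or match
            ))
        # Pruning 2: row minimums never decrease, so once a whole row
        # is over the limit the final distance must be too.
        if min(cur) > max_dist:
            return None
        prev = cur

    return prev[-1] if prev[-1] <= max_dist else None


print(bounded_edit_distance("kitten", "sitting", 3))
print(bounded_edit_distance("a", "abcdef", 2))
```

How much a check like this saves depends heavily on the length distribution of the data and on the distance threshold.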

Lorraine Chapman on 01/06/2016

Happy New Year from the HPCC Systems Platform Team! 

Charles Kaminski on 12/01/2015

In this blog post, I will walk you through using prefix trees and a big-data platform to build fast edit-distance queries. You can use the examples here to begin processing large volumes of data using an edit-distance algorithm or to build queries fast enough to query very large datasets interactively using an edit-distance algorithm. This blog post builds on a previous blog post.
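The general technique of pairing a prefix tree with an edit-distance calculation can be sketched in a few lines. The version below is a single-machine Python sketch, not the ECL from the post, and the names `TrieNode`, `build_trie`, and `search` are assumptions. Words that share a prefix share the rows of the edit-distance table computed for that prefix, and a whole subtree is skipped once no cell in the current row is within the limit.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.word = None    # set when a complete word ends at this node


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def search(root, query, max_dist):
    """Return sorted (word, distance) pairs within max_dist of query."""
    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, ch, prev_row):
        # Extend the edit-distance table by one row for this character.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            row.append(min(
                row[i - 1] + 1,
                prev_row[i] + 1,
                prev_row[i - 1] + (query[i - 1] != ch),
            ))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Only descend if some extension could still be within the limit.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    # Handle the empty string if it was stored as a word.
    if root.word is not None and first_row[-1] <= max_dist:
        results.append((root.word, first_row[-1]))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results)


trie = build_trie(["cat", "cart", "cot", "dog"])
print(search(trie, "cat", 1))
```

The recursion depth here is bounded by the longest stored word, which is fine for a sketch; a production version on a distributed platform would be structured quite differently.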

Jim DeFabia on 11/18/2015

One aspect of multitenancy is the ability to share data across tenants, but still maintain some data silos. A multitenant application may need to share a file across tenants, but maintain columnar restrictions based upon the user’s rights and permissions.
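To make the idea of columnar restrictions concrete, here is a minimal Python sketch, assuming a simple role-based model: each column lists the roles allowed to see it, and a user sees only the columns that match at least one of their roles. The function name `project_for_user`, the role names, and the sample data are all hypothetical; this is not how HPCC Systems implements the feature.

```python
def project_for_user(rows, column_permissions, user_roles):
    """Return copies of rows containing only the columns the user may see.

    `column_permissions` maps column name -> set of roles allowed to read it.
    `user_roles` is the set of roles held by the requesting user.
    """
    allowed = [
        col for col, roles in column_permissions.items()
        if roles & user_roles
    ]
    return [
        {col: row[col] for col in allowed if col in row}
        for row in rows
    ]


# Hypothetical shared file with one restricted column.
shared_rows = [
    {"id": 1, "name": "Alpha", "salary": 100},
    {"id": 2, "name": "Beta", "salary": 200},
]
permissions = {
    "id": {"tenant_a", "tenant_b"},
    "name": {"tenant_a", "tenant_b"},
    "salary": {"tenant_a"},
}
print(project_for_user(shared_rows, permissions, {"tenant_b"}))
```

The same file serves both tenants, but the second tenant never receives the restricted column.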

Gavin Halliday on 11/06/2015

The next stage in adding a new activity to the system is to define the interface between the generated code and the engines. The important file for this stage is rtl/include/eclhelper.hpp, which contains the interfaces between the engines and the generated code. These interfaces define the information required by the engines to customize each of the different activities.

Richard Chapman on 10/26/2015

If you are an ECL programmer, there are a lot of things you don’t need to worry about that programmers in lower-level languages like C or C++ need to think about: