End-to-end big data in a massively scalable supercomputing platform.

Open-source. Easy to use. Proven.

An Industry Insight from Arjuna Chala, Sr Director of Special Projects for HPCC Systems®

Mastering Data Lakes with HPCC Systems


Companies of all sizes now understand there’s untapped value in the large data sets they generate as they conduct day-to-day operations. To analyze that data and reveal key findings that make their products or services more competitive, these companies are searching for platforms and tools to manage their rapidly growing data sets. However, with the variety of database and big data analysis platforms currently available, the challenge becomes how to integrate and analyze data stored in different schemas and different locations in one homogeneous environment. Data lakes present an interesting option for handling different data types, but bringing heterogeneous data into a homogeneous data lake environment is a daunting part of any big data implementation.

Thankfully, it can be done, and HPCC Systems can show you how. In this video presentation, HPCC Systems’ Flavio Villanustre and Arjuna Chala demonstrate how the HPCC Systems platform manages disparate data with ease and efficiency. Using actual data generated by a New York City taxi company, they walk through how to profile, transform, aggregate, and analyze data to draw conclusions about traffic volume by day, hour, and week for a given area; the number of trips to and from an airport; cash versus credit transactions; and much more, all on the open-source (and completely free) HPCC Systems platform. Built by LexisNexis® Risk Solutions, one of the largest data aggregating companies in the world, the HPCC Systems platform was designed on the premise that big data is actually a solution, not a problem, and that harvesting data from thousands of sources helps incorporate a learning-based approach to handling data.
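
To give a feel for the kind of aggregation described above, here is a minimal sketch in plain Python. It is not the code used in the video and it does not use the HPCC Systems platform or its ECL language; the record layout, field names, and sample values are all invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records. Field names and values are assumptions made
# for this sketch, not the schema of the taxi data set used in the video.
TRIPS = [
    {"pickup": "2015-01-05 08:15:00", "payment": "credit", "dropoff_zone": "JFK Airport"},
    {"pickup": "2015-01-05 08:40:00", "payment": "cash", "dropoff_zone": "Midtown"},
    {"pickup": "2015-01-05 17:05:00", "payment": "credit", "dropoff_zone": "Midtown"},
    {"pickup": "2015-01-06 08:20:00", "payment": "cash", "dropoff_zone": "JFK Airport"},
]


def summarize(trips):
    """Count trips by pickup hour, weekday, and payment type, plus airport drop-offs."""
    by_hour = Counter()
    by_weekday = Counter()  # keys follow datetime.weekday(): Monday is 0
    by_payment = Counter()
    airport_trips = 0
    for trip in trips:
        picked_up = datetime.strptime(trip["pickup"], "%Y-%m-%d %H:%M:%S")
        by_hour[picked_up.hour] += 1
        by_weekday[picked_up.weekday()] += 1
        by_payment[trip["payment"]] += 1
        # Treat any zone whose name mentions "airport" as an airport trip.
        if "airport" in trip["dropoff_zone"].lower():
            airport_trips += 1
    return {
        "by_hour": by_hour,
        "by_weekday": by_weekday,
        "by_payment": by_payment,
        "airport_trips": airport_trips,
    }


summary = summarize(TRIPS)
```

On a real data set the same counts would be computed in parallel across a cluster rather than in a single loop, which is the part a platform such as HPCC Systems is meant to handle.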

In this video, you will learn how to:

  • Improve the speed and accuracy of the processes for transformation, cleaning, normalization, and aggregation
  • Enable efficient use of developer resources and development budgets
  • Facilitate the use of standard hardware, operating systems, and protocols


Additional Resources

After watching the video presentation, if you’d like to try the HPCC Systems platform firsthand: