Welcome to the second installment of the summer series of The Download: Tech Talks by HPCC Systems.

This series focuses on information about the latest developments within HPCC Systems related to the cloud. HPCC Systems has been very active with projects over the past few months, working to provide better support for dynamic environments via container orchestration and other capabilities.

The second installment of the summer series of The Download: Tech Talks by HPCC Systems features an interview with Jake Smith (Lead Architect, LexisNexis Risk Solutions) and Flavio Villanustre (VP and Chief Security Officer, LexisNexis Risk Solutions).

The full recording of Flavio and Jake’s interview, “The Download: Tech Talks by HPCC Systems -  An Interview with Jake Smith,” is available on YouTube.

 

Meet our presenters

Flavio Villanustre is CISO and VP of Technology for LexisNexis Risk Solutions. He also leads the open source HPCC Systems platform initiative, which is focused on expanding the community gathering around the HPCC Systems Big Data platform, originally developed by LexisNexis Risk Solutions in 2001 and later released under an open source license in 2011. Flavio’s expertise covers a broad range of subjects, including hardware and systems, software engineering, and data analytics and machine learning. He has been involved with open source software for more than two decades, founding the first Linux users’ group in Buenos Aires in 1994.

 

Jake Smith is a Lead Architect at LexisNexis Risk Solutions. He is one of the original HPCC Systems architects and has worked with the company for more than 20 years. Jake is the lead developer of Thor, the HPCC Systems Data Refinery Cluster, as well as a number of the other core HPCC Systems components.

 

This blog provides a brief summary of topics discussed in the interview, including:

  • Latest Developments
  • Challenges
  • Additional Information

Latest Developments

The goal of the HPCC Systems cloud development project is to transform the HPCC Systems platform into a system that works natively in the cloud. HPCC Systems was initially designed for commodity hardware, where the organization running the platform owns all of the hardware and manages it in data centers.

Beginning with the 7.8.x series, HPCC Systems provides native support for containerization. The HPCC Systems 7.8.x series is available for download from the website for bare metal installations. This release also includes the first preview of the new design for providing a cloud native HPCC Systems platform.

The new operating environment consists of Docker containers managed by Kubernetes, along with continued support for “bare metal” installations. Helm Charts will be used to deploy these containers on any cloud platform that supports Kubernetes. Example configurations are provided that illustrate how to use persistent data on various cloud providers.

In version 7.10, HPCC Systems will also natively support Azure Blobs and AWS S3 for persistent data storage.

Challenges

There are challenges to operating the current HPCC Systems platform on the cloud. The assumptions made during the development of the HPCC Systems platform over the past 20 years were made for very good reasons. 

  • The first assumption is the use of local computers and computing nodes as storage for data. This has proven to be cost effective, and performs well. 
  • The second assumption is the use of IP addresses as persistent identifiers. 

Local storage does not work well in a cloud environment because it would require keeping expensive nodes in operation when compute power is not needed, which is inefficient and cost prohibitive. Computers in the cloud are also ephemeral: the computer used today may not be the same one used at some point in the future. As a result, storing data long term on a local disk is not possible, and an IP address cannot be used as a persistent identifier.

Changes are being made to address these assumptions. One change HPCC Systems made to address the assumption of persistent identifiers is to resolve hostnames to IP addresses much later in the start-up process. This incremental change has helped, but more needs to be done to ensure that the HPCC Systems platform’s internal assumptions meet the requirements of a modern public cloud infrastructure.
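
To illustrate the idea of resolving hostnames late, here is a minimal Python sketch. It is not the HPCC Systems implementation; the class name `LazyEndpoint`, the hostname `thor-worker-1`, and the port number are hypothetical and chosen only for illustration. The point is that a component stores the hostname and looks up the IP address each time it needs to connect, rather than caching an address once at start-up.

```python
import socket
from typing import Callable, Tuple


class LazyEndpoint:
    """Hypothetical helper: keeps a hostname and resolves it only on demand."""

    def __init__(self, hostname: str, port: int,
                 resolver: Callable[[str], str] = socket.gethostbyname) -> None:
        # Store the stable name, not the (possibly short-lived) IP address.
        self.hostname = hostname
        self.port = port
        self._resolver = resolver

    def address(self) -> Tuple[str, int]:
        # Resolve at call time, so a node that was rescheduled onto a new
        # IP address is picked up on the next lookup.
        return (self._resolver(self.hostname), self.port)


if __name__ == "__main__":
    # Stand-in for DNS: a table whose entry changes when the node moves.
    dns_table = {"thor-worker-1": "10.0.0.5"}
    endpoint = LazyEndpoint("thor-worker-1", 20100,
                            resolver=lambda name: dns_table[name])
    print(endpoint.address())
    dns_table["thor-worker-1"] = "10.0.0.9"  # node restarted elsewhere
    print(endpoint.address())
```

In this sketch the second lookup returns the new address without the endpoint being recreated, which is the behavior an ephemeral environment calls for. A real system would also need to consider lookup cost and failure handling, which are left out here.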

Additional Information

More information about HPCC Systems and the cloud can be found in the following blogs:

HPCC Systems and the Path to the Cloud by Richard Chapman: In this blog, Richard Chapman talks about the journey to the Cloud and demonstrates how to set up a simple test cluster using a default Helm Chart.

Setting up a default HPCC Systems cluster on Microsoft Azure Cloud Using HPCC Systems 7.8.x and Kubernetes, by Jake Smith: This blog outlines the steps required to set up and run the HPCC Systems Helm Charts using the Azure Kubernetes Service (AKS).

Persisting data in an HPCC Systems Cloud native environment, by Gavin Halliday: This blog covers how to persist your data in an HPCC Systems cloud native environment.

Acknowledgements

A special thank you to Jake Smith and Flavio Villanustre for this informative interview. 
