HPCC Systems 8.0.0 – Cloud Native Platform Highlights

At the start of 2020, I joined the HPCC Systems Platform Team at one of their offsite meetings, where the main focus of the discussion was implementing a Cloud Native version of our big data analytics platform. HPCC Systems 8.0.0 is the first release that provides a feature-complete Cloud Native version of our platform. While it may not be fully production ready, it is now ready for Cloud performance evaluation. What does this mean for you?

It means that the platform is ready for you to start developing and testing cloud deployments. It is brand new, however, so expect to see some issues, including cosmetic issues and potentially some missing functionality. We would like to hear about your experience, so do let us know how your evaluation went by posting in our Community Forum (requires registration and a login to post).

If you do find issues along the way, we want to know about them. Please report any issues you find using our Community Issue Tracker, so we can investigate and fix them!

Getting started with the HPCC Systems Cloud Native Platform

To get a complete overview of our Cloud Native platform, including the main changes we have implemented and how to use the helm charts, read our HPCC Systems and the Path to the Cloud blog by Richard Chapman, VP and Head of Research and Development, LexisNexis Risk Solutions Group.

More resources are available on our HPCC Systems is going Cloud Native Wiki, which includes some How To videos, links to our GitHub Repository and Helm Charts as well as more blogs on topics such as persisting, exporting and importing data, using a service mesh, log visualisation and getting set up using Azure AKS and AWS EKS.

I’m already using HPCC Systems in the Cloud, why should I change to the Cloud Native Platform?

Using HPCC Systems in the Cloud has been possible for a number of years now via the ‘lift and shift’ approach, which involves setting up virtual machines on a Cloud service using our bare metal platform. To preserve the local data storage, these virtual machines need to stay running constantly (and be paid for), whether or not they are in use. Our Cloud Native platform allows you to take advantage of all the benefits of Cloud capabilities, including significant cost savings, by ensuring that you are paying only for what you are using when you are using it.

I’m an HPCC Systems Bare Metal user, what changes do I need to make to get started using the Cloud Native Platform?

ECL Developers

ECL developers will find that not much has changed in terms of using the ECL language. However, there may be some differences to bear in mind relating to:

  • Where your data is stored
  • How foreign files are accessed
  • How data is shared between environments

The biggest changes are more related to the way the system scales. Currently, using a Bare Metal system, a fixed number of Thors are available to users. These Thors are (ideally) kept as busy as possible, although sometimes users may find they need to wait for a Thor cluster to become available. The hidden costs associated with this approach revolve around someone not being able to use a cluster when they need to, or a cluster sitting idle.

Using the Cloud Native platform, your system can be configured to launch a new Thor cluster when a job is submitted, so the cluster is never idle and users are not waiting in a queue for a cluster to become available. Thor clusters will expand according to demand, although there are cost considerations to bear in mind here too: costs expand with the cluster, so users will need to be aware of the cost of running a job.
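To make the cost trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Every figure in it (the price per node-hour, the node counts and the job durations) is a hypothetical placeholder rather than real Cloud pricing, and it ignores storage and startup overhead. It only illustrates that a fixed cluster is billed for every hour it exists, while an on-demand cluster is billed for the node-hours actually used.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A submitted job; both fields are illustrative assumptions."""
    nodes: int    # worker nodes launched for this job
    hours: float  # wall-clock run time in hours


def fixed_cluster_cost(nodes: int, hours_in_period: float, rate: float) -> float:
    """Cost of a cluster that stays up for the whole period, busy or idle."""
    return nodes * hours_in_period * rate


def on_demand_cost(jobs: list[Job], rate: float) -> float:
    """Cost when a cluster exists only while each job is running."""
    return sum(job.nodes * job.hours * rate for job in jobs)


RATE = 0.5  # hypothetical price per node-hour
jobs = [Job(nodes=10, hours=2.0), Job(nodes=40, hours=1.0), Job(nodes=10, hours=4.0)]

always_on = fixed_cluster_cost(nodes=10, hours_in_period=24.0, rate=RATE)
pay_per_use = on_demand_cost(jobs, RATE)
print(f"fixed 10-node cluster for 24h: {always_on:.2f}")
print(f"on-demand clusters for the same jobs: {pay_per_use:.2f}")
```

With these made-up numbers, the one-hour job on 40 nodes costs the same as the four-hour job on 10 nodes, which is why users need to keep an eye on cluster size as well as run time.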

Operations/Systems Management

The main change for system management is that the Cloud Native platform will be managed using Helm and Terraform rather than by using Genesis and the HPCC Systems Config Manager. In the Cloud Native platform, systems are configured by defining what the system should look like, rather than by modifying the configuration of a running system until it looks a certain way. Each Cloud Native system uses a completely new configuration.
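The idea of describing the target system, rather than editing a running one, can be sketched in a few lines of Python. This is a conceptual illustration only and not how Helm, Terraform or Kubernetes are implemented: the component names and replica counts are made up, and `plan_changes` is a hypothetical helper. The operator writes down the desired state and the tooling works out the steps needed to get there.

```python
def plan_changes(current: dict[str, int], desired: dict[str, int]) -> list[str]:
    """Return the actions needed to move from `current` to `desired`.

    Both dicts map a component name to its replica count.
    """
    actions = []
    for name in sorted(desired.keys() | current.keys()):
        have = current.get(name, 0)
        want = desired.get(name, 0)
        if have == 0 and want > 0:
            actions.append(f"create {name} x{want}")
        elif want == 0 and have > 0:
            actions.append(f"remove {name}")
        elif have != want:
            actions.append(f"scale {name} {have} -> {want}")
    return actions


# Hypothetical states, for illustration only.
running = {"esp": 1, "roxie": 2}
desired = {"esp": 1, "roxie": 4, "thor": 1}

for step in plan_changes(running, desired):
    print(step)
```

The operator only ever edits `desired`; nobody issues the individual "scale" or "create" commands by hand.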

The Cloud Native platform offers easier upgrades to new versions. Moreover, there is no machine provisioning involved with clusters that are managed on demand in the Cloud.

What’s new in the HPCC Systems 8.0.0 Cloud Native Platform

HPCC Systems 8.0.0 also includes features and enhancements that are relevant to both our Cloud Native and Bare Metal platform users, which are covered separately in our HPCC Systems 8.0.0 – Cross Platform Highlights blog. 

This blog highlights the new features and enhancements added to our Cloud Native platform since 7.12.0 Gold was released. If you want to track the journey from the start, here are some helpful resources:

System Setup and Helm Chart Usage

The platform team have been improving the helm charts continually throughout the ongoing development of our Cloud Native platform. You will find them along with usage information in our Helm Chart GitHub Repository. You can also read Richard Chapman’s Path to the Cloud blog for more information on how to get started.

Early adopters have also helped us to work through other system setup testing. A team from Infosys has been looking at using a service mesh and two team members have produced blogs providing more information about how to install our Cloud Native platform using Istio and Linkerd on Microsoft Azure. During this work, they ran into a known issue in Kubernetes, which does not support sidecar termination ordering properly. Read these blogs to find out more:

We have added a feature that allows you to run with Linkerd, Istio or any sidecar mesh, without jobs stalling or lingering indefinitely because the sidecar has not been properly terminated.

Some new features and enhancements have also been added to the helm charts. You can now:

Security

Before we look at the new features and tweaks added in this area, these are the main security features supported by our Cloud Native platform:

At the system level, HPCC Systems 8.0.0 introduces cert-manager integration, which provides a way of defining, generating and managing PKI certificates and automates the process of setting up HPCC Systems to use HTTPS externally and mTLS internally.

For ECL HTTPCALLs and SOAPCALLs, the connection and authentication information can now be provided via Kubernetes secrets or HashiCorp Vault secrets. This provides a standard mechanism for securely storing service credentials and access information for remote services.

HPCC Systems has always provided mechanisms for controlling which operations are permitted in ECL code, to ensure that operations like PIPE and embedded C++ cannot be used to circumvent access control on files. Our Cloud Native platform also supports ECL security options to the same level as the Bare Metal platform.

In the Helm files, signing keys may be stored in Kubernetes secrets or HashiCorp Vault and are now deployed automatically through the HPCC Systems helm deployment.

There is a new ESP service to give secure access to daliadmin functionality.

Please see our HPCC Systems 8.0.0 – Cross Platform Highlights blog for details of other security features and improvements that are available in this release.

Data Handling

One of the main focuses of the Cloud Native platform development project has been how to handle data in a world where storage is not local and your data and queries disappear when the system is uninstalled.

Gavin Halliday, Enterprise/Lead Architect, LexisNexis Risk Solutions Group, has written several blogs covering a number of different aspects relating to this issue to accompany the functionality added:

The most recent changes in this area include an improvement to the generation of storage plane definitions and the use of Persistent Volume Claims.

Next Steps

The release of the Cloud Native platform in HPCC Systems 8.0.0 is the culmination of a development project that has been ongoing for over a year, so what’s next on the agenda?

The Platform Team are looking at a number of research projects that will provide more detailed information about using our Cloud Native platform:

  • Autoscaling
    Providing ways of detecting when the load increases or decreases, so that the number of nodes in use is adjusted accordingly.
  • Cost analysis
    Providing information that automatically reports on the running cost for each workunit and the cost of storage for each file.
  • More documentation
    Case studies using different configurations on Microsoft Azure, which will help us to fine tune the way our Cloud Native platform works.

The team will also be responding to the findings of our Cloud Native platform users, providing features and fixes to address issues as they are discovered and reported. To report an issue or make a feature request, please use our Community Issue Tracker and keep up to date with known issues and workarounds in the HPCC Systems Red Book.

HPCC Systems is Celebrating 10 Years as an Open Source Big Data Analytics Platform

It seems very fitting that in the year we celebrate this anniversary, we are also launching our Cloud Native Platform. Join us as we mark this anniversary event with users, colleagues, ambassadors and collaborators via a series of video podcasts. It’s great to reflect on how we got to where we are today with the stories shared in this series and look forward to what may lie ahead in the future. View the full list of podcasts on our 10 Year Anniversary Podcast Series Wiki.

Featured Podcast

Join Flavio Villanustre (VP Technology and CISO, LexisNexis Risk Solutions Group) and Richard Chapman (VP and Head of Research and Development, LexisNexis Risk Solutions Group) as they look back on what HPCC Systems looked like at the start in 2000 and our open source journey, bringing you right up to date with HPCC Systems 8.0.0 and our Cloud Native Platform.

Image of the Podcast Video by Flavio Villanustre and Richard Chapman