HPCC Systems and the cloud should be a natural fit. Both involve large quantities of identical, commodity, interchangeable hardware, working together on a problem. However, two design decisions in HPCC Systems (made for very good reasons back in the day) can make operation of today’s HPCC Systems platform on the cloud a little complex:
- Firstly, HPCC Systems (and in particular, the Thor engine) gains significant economies of cost, and benefits in performance, from using the local storage that comes “for free” along with the computing nodes as the storage for the data that is consumed and created by the cluster.
- Secondly, HPCC Systems tends to assume a known, static topology on pre-allocated IP addresses and that an IP address can be used to refer to a machine (and the data stored on it) indefinitely.
In a public cloud scenario such as AWS or Azure, machines are generally viewed as much more ephemeral than we treat them in our data center. The machine you are running on tomorrow may not be the same one you ran on today, so you can’t store files long-term on local disk nor can you assume an IP is a persistent identifier. The only way to do so would be to keep your machine instances up 24/7, but that’s expensive and defeats many of the benefits of being in the cloud.
We’ve been making changes to reduce the scope of the above assumptions in HPCC Systems over the past few releases, so that (for example) we resolve hostnames to IP addresses much later in the start-up process. Such incremental changes have helped, but the next change needs to be a bigger one that changes the HPCC Systems platform’s internal assumptions to ones that match the requirements of a modern Public Cloud infrastructure.
Starting with the newly released 7.8.x series, HPCC Systems will include native support for containerization. In particular, Docker containers managed by Kubernetes (“k8s”) are a new target operating environment, alongside continued support for traditional “bare metal” installations via .deb or .rpm files.
This is not a lift and shift type change, where we run the existing structure unchanged and treat the containers as just a way of providing virtual machines for us to run on, but a significant change in how components are configured, how and when they start up and where they store their data.
We aim to adhere to standard conventions regarding how Kubernetes deployments are normally configured and managed so far as possible, so that it should be easy for someone familiar with Kubernetes to install and manage HPCC Systems.
Processes and pods, not machines
Anyone familiar with our existing configuration system will know that part of the configuration involves creating instances of each process and specifying on which physical machines they should run. In a k8s world, this is something that is done dynamically by the k8s system itself (and may be changed dynamically as the system runs). Additionally, a containerized system is much simpler to manage if you stick to a one-process-per-container paradigm, where the decisions about which containers need grouping into a pod, and which pods can run on which physical nodes, can be made automatically.
In the containerized world, the information that the operator needs to supply to configure an HPCC Systems environment is greatly reduced. There is no need to specify any information about what machines are in use by what process, as mentioned above, but there is also no need to change a lot of options that might be dependent on the operating environment, since much of that is standardized at the time the container images were built. Therefore, in most cases, most settings should be left to use the default. As such, we designed our new configuration paradigm such that only the bare minimum of information needs to be specified and any parameters not specified will use the appropriate defaults.
The default environment.xml that we include in our bare-metal packages to describe the default single-node system runs to 1300 lines and it is complex enough that we had to provide a special tool for editing it. By contrast, the values.yaml from our default helm chart is well under 100 lines and can be edited in any editor, and/or modified via helm’s command-line overrides.
In order to realize the potential cost savings of a cloud environment while at the same time taking advantage of the ability to scale up when needed, some services which are always on in our traditional bare-metal installations are launched on-demand in our cloud installations.
An eclccserver component will launch a stub requiring minimal resources, where the sole task is to watch for workunits submitted for compilation and launch an independent k8s job to perform the actual compile. Similarly, the eclagent component is a stub that launches a k8s job when a workunit is submitted and the Thor stub starts up a Thor cluster only when required. Using this design, not only does the capacity of the system automatically scale up to use as many pods as needed to handle the submitted load, it scales down to use minimal resources (as little as a fraction of a single node) during idle times when waiting for jobs to be submitted.
ESP and Dali components are always on as long as the k8s cluster is started, as it is not feasible to start/stop them on demand without excessive latency. However, ESP can be scaled up and down dynamically to run as many instances as required to handle the current load. In theory it should be possible to have k8s do this scaling automatically, though this has not yet been tried.
Topology settings – Clusters versus queues
In the existing bare-metal installations, there is a section called Topology where the various queues that workunits can be submitted to are set up. It is the responsibility of the person editing the environment to ensure that each named target has the appropriate eclccserver, hthor (or ROXIE) and Thor (if desired) instances set up, to handle workunits submitted to that target queue.
This setup has been greatly simplified when using Helm charts to set up a containerized system. Each named Thor or eclagent component creates a corresponding queue (with the same name) and each eclccserver listens on all queues by default (but you can restrict to certain queues only if you really want to). Defining a Thor component automatically ensures that the required agent components are provisioned.
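The queue rules described above can be modelled in a few lines of Python. This is only an illustrative sketch, not code from the platform or the helm chart: the function name derive_queues is hypothetical, and it assumes the values file has already been loaded into a plain dictionary with the same section names as the chart (eclagent, thor, eclccserver).

```python
def derive_queues(values):
    """Model which queues exist and which queues each eclccserver listens on.

    Assumption: `values` mirrors the layout of the chart's values file,
    already parsed into Python dicts and lists.
    """
    # Each named eclagent or Thor component creates a queue with the same name.
    queues = []
    for section in ("eclagent", "thor"):
        for component in values.get(section) or []:
            queues.append(component["name"])

    # An eclccserver listens on every queue unless a `listen` list restricts it.
    listeners = {}
    for server in values.get("eclccserver") or []:
        restricted = server.get("listen") or []
        listeners[server["name"]] = list(restricted) if restricted else list(queues)
    return queues, listeners


values = {
    "eclagent": [{"name": "hthor"}, {"name": "roxie", "type": "roxie"}],
    "thor": [{"name": "thor"}],
    "eclccserver": [{"name": "myeclccserver", "listen": None}],
}
queues, listeners = derive_queues(values)
print(queues)     # ['hthor', 'roxie', 'thor']
print(listeners)  # {'myeclccserver': ['hthor', 'roxie', 'thor']}
```

With the component names used in the default chart, this gives three queues and a single eclccserver listening on all of them.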
When setting up an eclagent component, there is a new type setting that allows you to indicate that the ROXIE engine should be used rather than the hthor engine. At some point, the hthor engine may be deprecated so that we only need to support ROXIE.
Note: ROXIE running in this mode is a new “one-shot” ROXIE where the ROXIE process is started per-workunit and terminates at the end, just as hthor has always done.
Replicas versus maxActive
Most components in the helm chart support a replicas setting, which specifies the initial number of pods that will be started running the relevant container or stub. Stub components that launch child jobs on demand also have a maxActive setting, that can be used to limit the number of child jobs which will be allowed to run at once, for example, to limit the maximum spend rate.
Note: This setting is (currently) per pod, so if your eclagent is specified to have 2 replicas, and maxActive is set to 100, there may be up to 200 child jobs running at once.
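Because the limit applies per pod, the overall ceiling is simply the product of the two settings. The small Python sketch below makes that arithmetic explicit. The function names are hypothetical and exist only for illustration, and the sketch assumes the current per-pod behaviour described in the note above.

```python
def max_child_jobs(replicas, max_active):
    """Upper bound on simultaneous child jobs for one stub component.

    Assumes maxActive is enforced independently by each replica (pod).
    """
    if replicas < 0 or max_active < 0:
        raise ValueError("replicas and max_active must not be negative")
    return replicas * max_active


def per_pod_limit(total_cap, replicas):
    """Largest per-pod maxActive that keeps the total at or below total_cap."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    return total_cap // replicas


print(max_child_jobs(2, 100))  # 200
print(per_pod_limit(100, 3))   # 33
```

So if you want to cap the whole component at a given number of child jobs, divide that cap by the number of replicas before setting maxActive.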
Fetching the hpcc helm chart
To use the HPCC Systems helm chart, you need to first add it to the helm repository list, as shown below:
helm repo add hpcc https://hpcc-systems.github.io/helm-chart/
"hpcc" has been added to your repositories
helm search repo hpcc
NAME       CHART VERSION  APP VERSION  DESCRIPTION
hpcc/hpcc  7.8.22         7.8.22       An example Helm chart for launching a HPCC clus...
Starting a default system
The default helm chart with nothing overridden will start a simple test system with Dali, ESP, eclccserver, two eclagent queues (ROXIE and hthor mode), and one Thor queue. There is a ROXIE cluster defined in the example too, but it is disabled. I will talk about how to use ROXIE in containerized mode in a future blog.
To start this simple system, it is enough to run:
helm install mycluster hpcc/hpcc --set global.image.version=latest
Assuming you have kubectl properly configured (for example, pointing to a local cluster set up using minikube or docker desktop), you should get output similar to this:
NAME: mycluster
LAST DEPLOYED: Fri Apr 3 12:32:56 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl get pods
NAME                            READY  STATUS   RESTARTS  AGE
hthor-56bd86f4b9-h5746          1/1    Running  0         30s
mydali-7f669fd7d6-frjbh         1/1    Running  0         30s
myeclccserver-8665655fc8-krh8r  1/1    Running  0         30s
myesp-77f4c69ddf-chjrn          1/1    Running  0         30s
roxie-bb8dcdcd9-6gmlj           1/1    Running  0         30s
thor-agent-797fdf698f-lmlzw     1/1    Running  0         30s
thor-thoragent-974c58fbb-c5cd4  1/1    Running  0         30s
It may take a while before all components are running, especially the first time as the container images will need to be downloaded from docker hub.
hthor-56bd86f4b9-qztmx          0/1  Init:0/1           0  77s
mydali-7f669fd7d6-sbnhb         0/1  PodInitializing    0  77s
myeclccserver-8665655fc8-48vpw  0/1  Init:0/1           0  77s
myesp-77f4c69ddf-v86dw          0/1  ContainerCreating  0  77s
roxie-bb8dcdcd9-bhzh8           0/1  Init:0/1           0  77s
thor-agent-797fdf698f-6lvj7     0/1  ContainerCreating  0  77s
thor-thoragent-974c58fbb-gt4ks  0/1  ContainerCreating  0  77s
Once all pods show a status of Running, you can submit a job. If you have the HPCC Systems client tools installed, you can run the following from the command line:
echo " 'Hello from ECL' " | ecl run - -t=hthor --server=<ip:port>
Here, ip:port is the IP address and port on which the myesp service has been exposed by Kubernetes.
If you are running Docker Desktop, the ip:port is usually localhost:8010 (so the --server parameter can be omitted), but when using minikube there is a proxy on the minikube host machine and you can obtain the ip and port using:
minikube service myesp --url
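If you script this step, the URL printed by minikube has to be reduced to the ip:port form used by the --server parameter. The Python sketch below shows one way to do that with the standard library. The function name server_arg is hypothetical, and the URL in the example is a made-up placeholder rather than real minikube output.

```python
from urllib.parse import urlsplit


def server_arg(service_url):
    """Turn a service URL such as http://<ip>:<port> into --server=<ip>:<port>."""
    parts = urlsplit(service_url.strip())
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected a URL with host and port, got {service_url!r}")
    return f"--server={parts.hostname}:{parts.port}"


# Placeholder URL for illustration only; use the one minikube prints for you.
print(server_arg("http://192.168.49.2:31234"))  # --server=192.168.49.2:31234
```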
You can also access the ECL Watch page by visiting http://localhost:8010 (for Docker Desktop) or by running minikube service myesp (on minikube).
Note: Some pages in ECL Watch, such as those displaying topology information, are not yet fully functional in containerized mode.
To stop the cluster, run:
helm uninstall mycluster
Modifying the cluster
If you want to set up a k8s cluster with different settings to the example in the standard helm chart, you can supply your own configuration using the example as your starting point. Some global settings can be overridden individually using --set as we did when specifying the image label above, but to change which components are available, or the settings within components, you’ll need to pass in a replacement yaml file for each section you want to replace (this is a limitation of Helm, which does not handle modifying settings within arrays well). The simplest way to get a starting point for your cluster definition is by using:
helm show values hpcc/hpcc
This will show you the settings that are applied by default and should look something like this:
# Default values for hpcc.

global:
  # Settings in the global section apply to all HPCC components in all subcharts
  image:
    ## It is recommended to name a specific version rather than latest, for any non-trivial deployment
    #version: latest
    root: "hpccsystems"    # change this if you want to pull your images from somewhere other than DockerHub hpccsystems
    pullPolicy: IfNotPresent

  # logging sets the default logging information for all components. Can be overridden locally
  logging:
    detail: 100

  # Specify a defaultEsp to control which EclWatch service is returned from Std.File.GetEspURL, and other uses
  # If not specified, the first esp component present is assumed.
  # Can also be overridden locally in individual components
  ## defaultEsp: myesp

## storage:
##
## If storage.[type].existingClaim is defined, a Persistent Volume Claim must
## exist with that name. If an existingClaim is specified, storageClass and
## storageSize are not used.
##
## If storage.[type].storageClass is defined, storageClassName: <storageClass>
## If set to "-", storageClassName: "", which disables dynamic provisioning
## If undefined (the default) or set to null, no storageClassName spec is
## set, choosing the default provisioner. (gp2 on AWS, standard on
## GKE, AWS & OpenStack)
##
## storage.[type].forcePermissions=true is required by some types of provisioned
## storage, where the mounted filing system has insufficient permissions to be
## read by the hpcc pods. Examples include using hostpath storage (e.g. on
## minikube and docker for desktop), or using NFS mounted storage.

storage:
  dllStorage:
    storageSize: 3Gi
    storageClass: ""
    # existingClaim: ""
    # forcePermissions: false

  daliStorage:
    storageSize: 1Gi
    storageClass: ""
    # existingClaim: ""
    # forcePermissions: false

  dataStorage:
    storageSize: 1Gi
    storageClass: ""
    # existingClaim: ""
    # forcePermissions: false

dali:
- name: mydali

eclagent:
- name: hthor
  ## replicas indicates how many eclagent pods should be started
  replicas: 1
  ## maxActive controls how many workunits may be active at once (per replica)
  maxActive: 100
  ## prefix may be used to set a filename prefix applied to any relative filenames used by jobs submitted to this queue
  prefix: hthor
  ## Set to false if you want to launch each workunit in its own container, true to run as child processes in eclagent pod
  useChildProcesses: false
  ## type may be 'hthor' (the default) or 'roxie', to specify that the roxie engine rather than the hthor engine should be used for eclagent workunit processing
  type: hthor
- name: roxie
  replicas: 1
  prefix: roxie
  useChildProcesses: false
  type: roxie

eclccserver:
- name: myeclccserver
  replicas: 1
  ## Set to false if you want to launch each workunit compile in its own container, true to run as child processes in eclccserver pod.
  useChildProcesses: false
  ## Specify a list of queues to listen on if you don't want this eclccserver listening on all queues. If empty or missing, listens on all queues
  listen:

esp:
- name: myesp
  replicas: 1

roxie:
- name: roxie-cluster
  disabled: false
  prefix: roxiecluster
  services:
  - name: query
    port: 9876
    listenQueue: 200
    numThreads: 0
    external: true
  - name: on-demand
    port: 0
  numChannels: 2
  ## Set serverReplicas to indicate a separate replicaSet of roxie servers, with slave nodes not acting as servers
  serverReplicas: 0
  topoReplicas: 1
  topoport: 9004
  ## Set localSlave to true for a scalable cluster of "single-node" roxie servers
  localSlave: false
  useAeron: false

thor:
- name: thor
  numSlaves: 2
  globalMemorySize: 4096
  prefix: thor
  eclagent:
    maxActive: 4
  thoragent:
    maxActive: 2
If you redirect the output to a file (let’s call it myfile.yaml), you can use it as the basis for defining your own HPCC Systems cluster, then start it using:
helm install mycluster hpcc/hpcc --values myfile.yaml
For example, we could use the myfile.yaml to define the simplest possible useable HPCC Systems cluster:
global:
  image:
    version: community_7.8.20-1
dali:
- name: mydali
eclagent:
- name: hthor
eclccserver:
- name: myeclccserver
esp:
- name: myesp
roxie:
thor:
Here we have left as many settings as possible at their default values and defined an HPCC Systems cluster containing a dali, eclagent, eclccserver, and esp, all without specifying any persistence. Note that we define empty ROXIE and Thor sections in order to override the default ROXIE and Thor sections from the helm chart.
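To see why whole sections have to be replaced, and why the empty ROXIE and Thor entries matter, it helps to model how a user-supplied values file is combined with the chart defaults. The Python sketch below is a simplified model based on Helm's documented behaviour (maps are merged key by key, lists are replaced wholesale, and a null value removes the default key). It is not Helm's actual implementation, and the function name merge_values is hypothetical.

```python
import copy


def merge_values(defaults, overrides):
    """Simplified model of combining chart defaults with a user values file."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if value is None:
            # An empty entry such as "roxie:" parses as null and drops the default.
            merged.pop(key, None)
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Maps are merged recursively, so unrelated defaults survive.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists replace the default outright; lists are not merged.
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "global": {"image": {"root": "hpccsystems", "pullPolicy": "IfNotPresent"}},
    "eclagent": [{"name": "hthor"}, {"name": "roxie", "type": "roxie"}],
    "roxie": [{"name": "roxie-cluster"}],
    "thor": [{"name": "thor"}],
}
overrides = {
    "global": {"image": {"version": "community_7.8.20-1"}},
    "eclagent": [{"name": "hthor"}],
    "roxie": None,
    "thor": None,
}
result = merge_values(defaults, overrides)
print(sorted(result))      # ['eclagent', 'global']
print(result["eclagent"])  # [{'name': 'hthor'}]
```

In this model the image settings keep their default root and pullPolicy while gaining a version, the eclagent list is replaced in full, and the ROXIE and Thor sections disappear.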
Storage and persistence
As mentioned earlier, containerized HPCC Systems clusters do not use local storage for any persistent data. Instead, data is stored on persistent volumes set up using the standard k8s PersistentVolumeClaim mechanism. There are three types of persistent storage employed by the HPCC Systems cluster, and each has its own subsection under storage: in the helm chart’s values file:
- Data storage: This is where your data files are stored. Controlled by the storage.dataStorage section.
- DLL storage: Used for generated workunit dlls. Controlled by the storage.dllStorage section.
- Dali storage: Used for the dali system state information store. Controlled by the storage.daliStorage section.
Within each of the sections mentioned above, the information that can be provided is the same:
- storageSize: Indicates the size of the PersistentVolumeClaim that will be created.
- storageClass: Indicates a specific storage class to be used for dynamic storage provisioning. If blank, the default storage class is used. If “-”, dynamic provisioning is disabled.
- existingClaim: Indicates that an existing persistent volume claim should be used to provide the storage in question.
- forcePermissions: Indicates that we need to apply permissions to allow the hpcc user to access this storage.
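The way these settings interact can be summarized in a short Python sketch. It follows the rules stated above and in the comments of the default values file, but it is only a model: the function name describe_claim and the shape of the returned dictionary are hypothetical, and the chart's real templates may differ in detail.

```python
def describe_claim(settings):
    """Model how one storage section translates into a volume claim request."""
    existing = settings.get("existingClaim")
    if existing:
        # An existing claim wins; storageClass and storageSize are not used.
        return {"creates_claim": False, "claim": existing}

    spec = {"creates_claim": True, "size": settings.get("storageSize")}
    storage_class = settings.get("storageClass")
    if storage_class == "-":
        # An empty storageClassName disables dynamic provisioning.
        spec["storageClassName"] = ""
    elif storage_class:
        spec["storageClassName"] = storage_class
    # Blank or missing: leave storageClassName out so the default class applies.
    return spec


print(describe_claim({"storageSize": "1Gi", "storageClass": ""}))
# {'creates_claim': True, 'size': '1Gi'}
print(describe_claim({"storageSize": "3Gi", "storageClass": "-"}))
# {'creates_claim': True, 'size': '3Gi', 'storageClassName': ''}
print(describe_claim({"existingClaim": "my-data-claim", "storageSize": "1Gi"}))
# {'creates_claim': False, 'claim': 'my-data-claim'}
```

The claim name my-data-claim is a placeholder. The forcePermissions setting is left out of the model because it affects how the volume is prepared rather than how the claim is made.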
When running on a single-node test system such as minikube or Docker Desktop, the default storage class will normally mean that all persistent volume claims map to temporary local directories on the host machine, which will be removed when the cluster is stopped. This is great for testing but for any real application you probably want your data to be stored somewhere a little more persistent! There is also an interesting feature of this default local storage class that means it may only be accessible to the root user. Since HPCC Systems processes are not run as root, we had to add a workaround to set the permissions on these temporary volumes before they can be used. Setting forcePermissions to true will enable this workaround.
More information on this topic can be found in Gavin Halliday's blog, “Persisting data in an HPCC Systems Cloud native environment”.
To get the first preview of the new design, reference it from helm and let Kubernetes install it automatically, using the instructions provided in this blog. However, there are some caveats. In particular any early adopters should expect that things are likely to change without notice in subsequent releases as we discover more about how things should fit together in the new Kubernetes world.
We also value your input into this process, so if you find something that needs addressing, please add an issue into our JIRA Community Issue Tracker, so we can investigate and provide a solution.
If you’re using Microsoft Azure, specific instructions on how to use the new HPCC Systems 7.8.x design on this type of cloud environment are available in Jake Smith’s blog Setting up a default HPCC Systems cluster on Microsoft Azure Cloud.