Configuring Storage in the Cloud Native HPCC Systems Platform

If you have not yet experimented with our new cloud native platform, you might like to take a look at the previous blogs in our HPCC Systems Cloud series.

These blogs are good starting points for understanding the new containerized version of the HPCC Systems platform, which is still under active development.

This blog follows on from Persisting data in an HPCC Systems Cloud Native Environment, which walked through how to persist your data and provided a temporary solution for getting data into the cloud platform while the team continued to work on a more permanent and seamless solution.

In this blog Gavin Halliday (Enterprise/Lead Architect, LexisNexis Risk Solutions) describes recent changes that allow the storage to be further configured, provides an example of its use and outlines some of the next steps in the roadmap.

First some background.

Bare-metal

In a bare-metal system, the storage tends to be tightly tied to the Thor nodes that perform the processing.  Data is written locally by the Thor nodes, and when other components want to read that data they access it via the dafilesrv component running on that Thor node.  These nodes tend to have a single type of local disk (or RAID disk array) to which all the data is written.  (There is some very rudimentary support for off-node storage, but it isn’t generally used.)

Cloud

By contrast, in a cloud system the processing nodes are ephemeral.  Temporary data may be stored on local disks, but anything with any persistence is stored in independent storage.  Also, rather than a single local disk, cloud providers provide a large selection of different storage options, for example:

  • Local disk
    Useful for temporary files whose lifetime is shorter than the running component.
  • Local NVMe disks
    Very fast local disks.  Could be very useful for storing spill files from Thor jobs, or a local cache for data stored in other locations.
  • Networked hard disks
  • Networked SSDs
  • High performance storage (For example, Azure NetApp Files)
  • Blob storage
    Lower storage cost than mounted disks, but higher latency; a good option for storing large amounts of data.
  • Archived storage
    A variety of blob storage for items that are rarely required.  Lower storage cost, but high latency and cost if access is needed.

HPCC Systems reads and writes many different types of data, each of which has different performance (and lifetime) characteristics.  Here are some examples of the different types of data, along with some ideas for the most suitable storage media.

  • DLLs
    Written once, read a few times, but can be recreated from the ECL if needed.  A networked hard disk, with copies made to a local disk, would work well.
  • Logs
    Speed of access is not likely to be critical.  Write granularity may be too small for blob storage.  A possible solution may be to write to a networked hard drive and then archive to blobs at the end of each day.
  • Dali
    Access to lots of relatively small files and sequential access to some large ones.  Mission critical component.  Networked SSDs are likely to be the best option.
  • Thor data
    Tends to be large and read sequentially.  Blobs or networked hard disks might be a good choice.
  • Roxie data
    Tends to be reasonably large and read randomly.  Networked SSDs, or another storage option with a cache in NVMe local disk is likely to be the best option.

The best options will depend on what the cluster is being used for and the SLAs or budget associated with the work being performed on the cluster.  They may even vary from job to job.

Goals for Storage Changes

The goals for the storage changes are to provide an extensible framework which supports all of these use cases, on both bare-metal and cloud systems.  It should also extend the flexibility of the bare-metal system, for example, by allowing data to be distributed over multiple disks in a single machine.  (This JIRA issue provides details of a long-standing feature request.)

Before looking at the changes in detail, it might be worth clarifying some terms:

  • Queue – The name of the queue that a workunit is submitted to.
  • Instance – A named instance of a component.
  • Replica – A load-balanced replica of a component instance.
  • Group  – The physical hosts that a particular replica of an instance is running on.
  • Storage plane – A set of one or more logical storage devices that the data will be written to.

Each instance of a component can have multiple replicas (each running on a different group) processing workunits submitted to the same queue, with data being written to one or more storage planes.  (The name of the queue matches the name of the component instance.)

Note: Currently the term “cluster” is used to mean the queue a query is submitted to, the set of machines the query runs on, and where the data is stored.  Group has also been used with multiple meanings.

Storage Planes

The new design is an extension of the simple configuration that was described in my previous blog on Persisting Data in an HPCC Systems Cloud Native Environment.  It allows extra flexibility when it is required.

HPCC Systems 7.12.0 introduces the concept of storage planes into the helm charts.  The storage planes section has the following format:

 planes:
    - name:        # The name of this storage plane.
      prefix:      # For locally mounted file systems this is the path to the local mount.  
                     For file systems accessed via a protocol/url this is the url prefix.
      pvc:         # The pvc used to access this data.  If supplied the pvc will be mounted 
                     in each component that is required to access the data.  (This option 
                     is not specified for blob storage accessed via a url.)
      secret:      # If a secret is required to access the storage (e.g. an access token), 
                     the name of the entry in the secrets section of the helm chart.  
                     (Not the k8s secret name.)
      numDevices:  # An advanced option - the number of logical storage devices.  This 
                     allows data to be striped over multiple storage locations.  If it is 
                     set, then the pvc corresponds to a set of pvcs: 
                     ``pvc-1`` .. ``pvc-<n>``, which are mounted at locations 
                     ``prefix/d1`` .. ``prefix/d<n>``.
      replication: # Not normally set - the cloud replication is likely to be sufficient, 
                     but if specified data will be copied in the background to each of 
                     the replication storage planes.
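
For example, a minimal plane for data stored on a locally mounted pvc might look like the following (the plane and pvc names here are purely illustrative):

    planes:
    - name: data
      prefix: /var/lib/HPCCSystems/hpcc-data   # mount point within each container
      pvc: data-storage-pvc                    # pvc providing the underlying storage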

The definition of the storage plane contains all the information that is required by the components to access that storage.  The planes are used by referring to them in the appropriate storage section.  For example,

 --set storage.daliStorage.plane=dali-plane
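
The same assignment can be made in a values file instead of on the command line; the --set option above is equivalent to:

    storage:
      daliStorage:
        plane: dali-plane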

If no plane is specified for a storage section, the simpler configuration described in my previous blog on Persisting Data in an HPCC Systems Cloud Native Environment is used.

Each component can also select the default storage plane used for its data via the dataPlane attribute.  In future versions of the platform, logs, spill files and other categories of storage will also be configurable with similar options.
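
As a minimal sketch of that component-level option (assuming a Thor instance named thor, and using the localHD plane that is defined in the example later in this blog):

    thor:
    - name: thor
      dataPlane: localHD   # default plane for data written by this Thor instance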

It is also possible to specify where data should be stored from within the ECL language by using the PLANE option on an OUTPUT.  (An alias for the existing CLUSTER option.)  This allows the ECL programmer to write data to different storage planes depending on the performance/cost requirements for that particular data file.

Note: The logical file name is independent of where the data is stored and the engines can read data from any of the storage planes.

A plane can only be used for a single purpose, for example, for Dali, data, DLLs or spills.  An error will be reported if a plane is used for more than one purpose.  This prevents the potential security and corruption issues that could arise if ECL queries had access to the Dali or DLL storage.

Secrets

The current secret implementation in HPCC Systems 7.12.0 is provisional and will be refined in future versions. It currently works as described below.

Secrets can be installed in a Kubernetes cluster using the kubectl apply command, for example:

 kubectl apply -f mysecret.yaml

where mysecret.yaml contains:

    apiVersion: v1
    kind: Secret
    metadata:
      name: ghallidaystorage
    type: Opaque
    stringData:
      key: <base64-encoded-access-key>

Having been installed on the cluster, they then need to be made available to the different components running on the cluster.  Secrets are currently grouped into two categories:

  • ecl
    ecl secrets are installed on all components that run ECL code.
  • storage
    storage secrets are installed on all components that need to access storage.

Other categories are planned for future versions.  Within each section there is a list of definitions of the form:

    <hpcc-logical-secret-name>: <actual-deployed-secret-name>

So if you have the following example:

    secrets:
      storage:
        azure-ghallidayblobs: ghallidaystorage

This indicates that there is a secret associated with storage that is known as azure-ghallidayblobs to HPCC Systems, which corresponds to the k8s secret ghallidaystorage.

Azure blob storage

The azure blob storage is currently specified as a data plane with a prefix of the following form:

 azure://<storage-account-name>@<container-name>

To read and write to the storage account, the platform needs an access key, which is provided via a secret.  The system currently prefixes the storage account name with azure- to implicitly generate the name of the secret containing the access key.  (The information about the secret within the plane definition is not currently used, but will be required in future versions.)
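
Putting those conventions together, a blob plane and its corresponding secrets entry might look like the following sketch (the storage account, container and secret names are purely illustrative):

    storage:
      planes:
      - name: blobData
        prefix: azure://mystorageaccount@mycontainer
        secret: azure-mystorageaccount      # declared for future use; not yet read by the platform

    secrets:
      storage:
        azure-mystorageaccount: my-k8s-secret   # implicit name is "azure-" plus the storage account name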

An example

In this example, assume I have a Linux machine running Minikube, with two local drives, one SSD, and the other a spinning hard disk (hd).  I also want to write the results to azure blob storage.

The first step is to ensure that the pvcs are created for the current cluster.  To simplify this process there is a new example Helm chart in examples/hpcc-localplanes, which allows you to bind any number of pvcs in a single chart install.  I created the following values.yaml file:

    planes:
    - name: dll
      path: /mnt/hpccdata/dlls
      size: 3Gi
      rwmany: true
    - name: dali
      path: /mnt/hpccdata/dali
      size: 1Gi
    - name: datahd
      path: /mnt/hpccdata/data-hd
      size: 3Gi
      rwmany: true
    - name: datassd
      path: /mnt/hpccdata/data-ssd
      size: 3Gi
      rwmany: true

and deployed it to the Kubernetes cluster with the following command:

    helm install gchpvc examples/hpcc-localplanes --values <myvalues.yaml>

This creates the following pvcs:

    dll-gchpvc-pvc
    dali-gchpvc-pvc
    datahd-gchpvc-pvc
    datassd-gchpvc-pvc

The pvc names are a combination of the name specified in the values.yaml and the release name.  The paths in the plane definitions are the paths within the Minikube host.

Note: These do not necessarily correspond to the paths on the host machine.  The following Minikube mount commands are used to mount the host directories within the Minikube host:

    minikube mount /home/gavin/hpccdata/dlls:/mnt/hpccdata/dlls --gid=1000 --uid=999
    minikube mount /home/gavin/hpccdata/dali:/mnt/hpccdata/dali --gid=1000 --uid=999
    minikube mount /home/gavin/hpccdata/data:/mnt/hpccdata/data-ssd --gid=1000 --uid=999
    minikube mount /media/gavin/ddrive1/hpccdata:/mnt/hpccdata/data-hd --gid=1000 --uid=999

Note: For Windows Docker Desktop the paths need to be of the form /run/desktop/mnt/host/c/<subdir> to correspond to files in c:\<subdir>.

Next, I installed a secret that provides the access key to the azure storage account:

    apiVersion: v1
    kind: Secret
    metadata:
      name: ghallidaystorage
    type: Opaque
    stringData:
      key: <base64-encoded-key>

with the command:

     kubectl apply -f ghallidaystorage.yaml

I then installed an instance of the hpcc chart, supplying the extra values file:

    storage:
      planes:
      - name: localSSD
        prefix: /var/lib/hpccsystems/ssd
        pvc: "datassd-gchpvc-pvc"
      - name: localHD
        prefix: /var/lib/hpccsystems/hd
        pvc: "datahd-gchpvc-pvc"
      - name: azureBlobs
        prefix: azure://ghallidayblobs@data
        secret: azure-ghallidayblobs
      dllStorage:
        existingClaim: "dll-gchpvc-pvc"
      daliStorage:
        existingClaim: "dali-gchpvc-pvc"
      dataStorage:
        plane: localHD

    secrets:
      storage:
        azure-ghallidayblobs: ghallidaystorage

The first plane definition is for the SSD data storage.  The pvc name matches a pvc created by the previous pvc deployment, and the prefix provides the mount point within the container.  Again, this path does not necessarily match the host pathname.

The second plane definition is similar for the local hard disk storage.  The third plane is for the azure storage, using the data container within the ghallidayblobs storage account.

DLLs and Dali still use the simpler mechanism of directly specifying the pvc to use for the storage.  The data storage is configured to use the local hard disk by default.

Accessing the planes from ECL

Finally you can run the following ECL:

    Layout_Person := RECORD
      UNSIGNED1 PersonID;
      STRING15  FirstName;
      STRING25  LastName;
    END;
    allPeople := DATASET([ {1,'Fred','Smith'},
                           {2,'Joe','Blow'},
                           {3,'Jane','Smith'}],Layout_Person);
    somePeople := allPeople(LastName = 'Smith');
    //  Outputs  ---
    output(somePeople,,'somefile1',overwrite,cluster('localssd'));
    output(somePeople,,'somefile2',overwrite,cluster('localhd'));
    output(somePeople,,'somefile3',overwrite,cluster('azureblobs'));

This will generate an output file on each of the three storage planes.

Next steps

An initial implementation is included in HPCC Systems 7.12.0, but the functionality is not yet complete. The following is a list of some of the next steps and future milestones:

  • Imports/landing zones
    Introduce a method of specifying landing zones mapped onto the storage planes and allow easy direct access to files from within ECL.
  • Secrets
    The implicit names of the secrets, and the way they are defined, may change.  This will become clearer once we support external secret vaults.
  • Allow the same file to exist on multiple storage planes
    This is similar to the bare-metal ability to have a file stored in multiple places (e.g. Thor and Roxie).
    Note: For a bare-metal system the files in the different locations have the same path, but that is no longer true for storage planes.  This will require refactoring the way file metadata is stored, so it will not be part of the 7.12.x series.
  • Clean up the current cluster/group terminology and provide access from ECL
    This involves three things.  Firstly, implementing OUTPUT(ds,,PLANE()), which is a synonym for CLUSTER that fits in better with the storage plane structure; this is likely to be available in a 7.12.x release.  Secondly, allowing the default data plane to be configured for a workunit, probably via a #option.  Finally, allowing QUEUE() and PLANE() attributes on a workflow item.
  • Implement bare-metal storage planes
    The storage plane definitions above only cover storage that is either locally mounted or accessed via a URL.  Bare-metal storage planes are not yet implemented, but they are likely to have a couple of extra elements.  There will be another section in the helm chart which allows a host group to be defined.  This is a list of host names that define the nodes containing the storage.  The storage plane will indicate which host group it uses.  The goal is to allow existing bare-metal configurations to be fully supported by storage plane definitions (see the illustrative sketch after this list).
  • Improve support for blobs, ADLS, S3, etc.
    The current Azure blob implementation is a proof of concept.  There is a large amount of work left to do, for example, refactoring the file reading interfaces, profiling and optimizing the code, implementing asynchronous read-ahead, adopting the Azure Data Lake Storage API, and adding support for AWS S3 blobs.  There are also several operations which are free on locally mounted storage, but potentially slow or costly on blob storage; these all need to be reviewed.
  • Allow logs/spills/etc. to be written to a pvc
    Currently only data, Dali and DLL storage can be configured.  This should be extended to allow logs, spills and other information to be configured for the system as a whole and for each individual component.
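
As a purely illustrative sketch of the bare-metal direction described above (none of these element or attribute names are final, and the syntax may well change before it is implemented), a host group and a plane referring to it might eventually look something like this:

    storage:
      hostGroups:                     # hypothetical section defining named sets of hosts
      - name: thor-hosts
        hosts: [ node001, node002, node003 ]
      planes:
      - name: bareMetalData
        prefix: /var/lib/HPCCSystems/hpcc-data
        hostGroup: thor-hosts         # hypothetical attribute indicating which hosts hold the storage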

There are bound to be more exotic storage configurations that HPCC Systems should support.  These may require extensions to the plane definitions.

Find out more about the HPCC Systems Cloud Native Platform, including blog posts and interviews with the developers working on this ongoing development project.