HPCC Systems Cloud Native Platform – Importing Data

Getting data onto a cloud environment and persisting it there is both important and complex, given that IP addresses are not fixed and clusters scale up and down on demand. Gavin Halliday has previously covered some of the challenges involved in Persisting data in an HPCC Systems Cloud native environment. In his blog about Configuring Storage in the Cloud Native HPCC Systems Platform, he introduced the concept of a Storage Plane as a holding space for storing data.

In this blog, Gavin looks at Storage Planes in a little more detail, focusing on importing data into our cloud native platform, the equivalent of using a Landing Zone in the HPCC Systems bare metal world.

Storage Planes and How To Use Them

Storage planes allow you to flexibly configure where data is stored within HPCC Systems, but they don’t directly address the question of how to get data onto the platform in the first place.

To help with this problem, HPCC Systems 7.12.0 includes a new syntax for accessing files directly from a storage plane, which is very similar to the file:: syntax for directly reading files from a physical machine.  The new syntax is:

     ~plane::<storage-plane-name>::<path>::<filename>

The syntax of the path and filename is the same as for the file:: syntax. This includes requiring uppercase letters to be quoted with a ‘^’. For more details, see the Landing Zone Files section of the ECL Language Reference.

How does this allow you to import data into the system?

Suppose the myblobs storage plane is configured to use Azure blobs, and you upload a file to that blob storage account with the name:

 one/two/Three/example.csv

You can read it with the filename:

 ~plane::myblobs::one::two::^Three::example.csv

For example, where rec is the RECORD structure describing the layout of the file:

 myInput1 := DATASET('~plane::myblobs::one::two::^Three::example.csv', rec, csv);
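The mapping from an uploaded path to a logical filename is mechanical, so it can be scripted when many files are involved. The Python sketch below is illustrative only and is not part of HPCC Systems: the helper name plane_logical_name is hypothetical, and it assumes that only the '/' separators and ASCII uppercase letters need translating. Any other characters are passed through unchanged, which may not cover every rule in the ECL Language Reference.

```python
def plane_logical_name(plane: str, blob_path: str) -> str:
    """Build a ~plane:: logical filename from a storage plane name
    and a '/'-separated path (hypothetical helper, see assumptions above)."""
    # Split the path into its components, ignoring empty ones
    # produced by leading, trailing, or doubled slashes.
    parts = [p for p in blob_path.split("/") if p]
    if not parts:
        raise ValueError("blob_path must contain at least a filename")

    # Quote each ASCII uppercase letter by prefixing it with '^'.
    escaped = [
        "".join("^" + ch if "A" <= ch <= "Z" else ch for ch in part)
        for part in parts
    ]

    # Join the plane name and path components with '::'.
    return "~plane::" + "::".join([plane] + escaped)


print(plane_logical_name("myblobs", "one/two/Three/example.csv"))
# prints ~plane::myblobs::one::two::^Three::example.csv
```

The resulting string can then be used as the filename in a DATASET definition like the one above.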

This functionality is also supported on a bare-metal system. You can define a DropZone in the environment that uses Azure blob storage. For example:

<DropZone build="_"
           directory="azure://ghallidayblobs@data"
           name="myblobs"
           ECLWatchVisible="true"
           umask="022"/>

Note: This also requires the associated secret to be deployed to the /opt/HPCCSystems/secrets/storage directory.

Fully integrating storage planes as dropzones is still a work in progress. For instance, there is currently no way to use a storage plane as a drop zone from within ECL Watch, and the only way to restrict access to a storage plane is to use file scope permissions. Also, for both Kubernetes and bare metal, the external data is currently only accessible from ECL. Future changes will ensure these storage planes are accessible as dropzones from the spray operations in ECL Watch.

Getting Started with the HPCC Systems Cloud Native Platform

If you have not yet experimented with our new cloud native platform, you might like to take a look at the previous blogs in our HPCC Systems Cloud series.

These blogs are a good starting point for understanding the new containerized version of the HPCC Systems platform, which is still under active development. For a full list of resources to help guide you as you learn about and start to use our new cloud native platform, visit our HPCC Systems is going Cloud Native wiki page.
