HPCC Systems 6.0.0 Beta 1 feature details

We have prepared a Beta version of HPCC Systems® 6.0.0 to coincide with the HPCC Systems® Summit 2015.

While the gold version of this release will not be available until early 2016, we have already started implementing a number of features that will seriously improve your HPCC Systems® experience. This Beta release includes a number of technical previews which you can take a look at in advance of the gold release. As with all beta releases, it’s a work in progress. Do download it and play with it, just don’t expect it to be perfect… yet!

Tell us what you think and please do let us know if you encounter problems using our Community Issue Tracker.

Dali replacement for workunit storage –https://hpccsystems.atlassian.net/browse/HPCC-12251

We have been working on an alternative solution for storing workunit information which has been one of DALI’s functions since HPCC Systems® was created 11 years ago. There are a number of reasons why we need to make changes in this area, the most central reason being that DALI is a single point of failure which does not support redundancy, hot swap or failover. While DALI has held its own admirably over the years, we must think about the future and the need to plan for and respond to the challenges of ever increasing amounts of storage capacity that is needed to meet the demands put on DALI, which eventually may start to have an impact its reliability and maintainability.

There are also scalability issues in that DALI has to fit in memory on one machine and so its performance cannot be enhanced by adding more nodes.

The HPCC Systems® 6.0.0 Beta is the first release to contain an option to use a Cassandra database in place of DALI for storing workunit information. This option is configured via the environment.xml file. When in use, it will be transparent to users, (although this is a technical preview and should be used with care while we complete development).

This Beta version doesn’t provide a method for migrating existing workunits from a DALI store to a Cassandra one or vice versa, although we will be providing this capability in the gold version of this release.

Our expectation is that from a capacity point of view, there will be no need to archive off workunits once you start to use the Cassandra store. In this Beta version, Sasha will not attempt to do this.

All other information will remain in DALI but, this work at least starts to take the load off DALI a little.

New Quantile Activity – https://hpccsystems.atlassian.net/browse/HPCC-12267 
This new activity is a technical preview in HPCC Systems® 6.0.0 Beta. It can be used to find the records that split a dataset into two equal sized blocks. Two common uses are to find the median of a set of data (split into 2) or percentiles (split into 100). It can also be used to split a dataset for distribution across the nodes in a system. One hope is that the classes used to implement quantile in Thor can also be used to improve the performance of the global sort operation. Gavin Halliday has written a blog series on this feature:

Security enhancements in the compiler
Versions of HPCC Systems® prior to 6.0.0 have always allowed some control over which operations were permitted in ECL code. This was done, primarily, as a way to ensure that operations like PIPE or Embedded C++ could not be used to circumvent access controls on files by reading them directly from the operating system. In HPCC Systems® 6.0 Beta, we have implemented two features to provide more flexibility over the control of operations.

  • We now limit which SERVICE functions are called at compile time using the FOLD attribute.
    https://hpccsystems.atlassian.net/browse/HPCC-13173
  • You can configure the access rights (which control the ability to use PIPE, embed C++, or declare a SERVICE as foldable) to be dependent on the code being signed. This means that we can provide signed code in the ECL Standard Library that makes use of these features, without opening them up for anyone to call anything.
    https://hpccsystems.atlassian.net/browse/HPCC-14185

The way that code signing works is similar to the way that emails can be signed to prove that they are from who they say they are and they have not been tampered with, using the standard gpg package. A file that has been signed will have an attached signature containing a cryptographic hash of the file contents with the signer’s private key. Anyone in possession of the signer’s public key can then verify that the signature is valid and that the content is unchanged.

So, in the ECL scenario, we will sign the SERVICE definitions provided by the ECL standard plugins and include the public key in the HPCC platform installation. Code that tries to use service definitions that are signed will continue to work as before but, code that tries to call arbitrary library functions using user-supplied SERVICE definitions will give compile errors.

Operations/system administrators can install additional keys on the eclserver machine, so if you want to use your own service definitions, they can be signed using a key that has been installed in this way. Using this method, a trusted person can sign code to indicate that it is ok for untrusted people to use, without allowing the untrusted people to execute arbitrary code.

Virtual-slave thor – https://hpccsystems.atlassian.net/browse/HPCC-8710 
We are in the process of implementing this feature which provides the ability to configure thor to run multiple virtual slaves. In previous versions, thor can run multiple thorslaves per node using the setting slavesPerNode but none of the slaves can share resources. The new implementation means each thorslave process will be able to run multiple virtual slaves within the same process allowing the sharing of a number of useful resources.

The downside of the old approach, with a process per slave, was that slaves did not share any resources and are forced to have a predefined slice of the total memory. It also meant that communication between master and slaves had to broadcast the same information to each slave. The slaves were completely unaware of one another.

The new approach with virtual slaves, has several advantages:

  • All virtual slaves will share a common memory pool. This will mean that instead of every slave being tied to a limited slice, it will be possible that some can use more than others where total resources permit it. The upshot of this is that there will be less overall spilling, i.e. Thor will utilize a greater percentage of total memory at each phase. Also, there is greater control over what and when spilling can take place.
  • Collective caching. Thor slaves cache various things. In the new virtual slave setup, the caching will be shared, meaning it can be larger and have more cache hits. Index key page caching is an example.

Future improvements… some in RC:1

  • Activity implementations can take advantage of the single process, multi channel view. One of the best examples of this is probably Lookup Join & Smart Join, which builds a table of the global RHS and broadcasts it to all other nodes. The old process per slave approach meant than there were N copies of the RHS table on each physical node, limiting the size that could be used in a Lookup Join to a fraction of what it could be.

In the new virtual slave implementation, only 1 of the RHS copies will exist, meaning a LookupJoin can actually utilize all of memory. It also means that the RHS is no longer broadcast N times to the same physical node, saving on network traffic and time. The upshot is that Lookup/Smart join will be faster and scale to greater sizes.

Many other activities will also be able to take direct advantage of the single shared process, for example, PARSE, which can share the same parsed AST between virtual slaves.

  • More efficient communication
    In the initial version, the virtual slaves will still communicate to a large extent via their own MP port/listener, that communicates over the loopback device (as they did before). However future versions will involve more direct slave to slave communication, which will not only avoid using the network interface and MP, but involve a lot less costly serialization and deserialization of rows.
  • Improved CPU utilization
    In the process per slave world, highly parallel tasks like sort tend to use all available cores. Where there are multiple slaves per node, this results in the inefficient over commitment of the CPU/core’s available. In the new virtual slave scheme, all slaves will be able to use a common pool and collaborate to increase utilization and avoid contention.
  • Simplified startup and configuration
    The old process per slave startup process, is a bit ‘tired’, launching and co-ordinating N processes per node that register with the master is scripted, Thor is unaware on the whole that the slaves co-exist on the same node. This makes the start/stop cleanup process and scripts overly complicated. The new setup should mean that the Thor cluster scripts can be streamlined.

TRACE Activity – https://hpccsystems.atlassian.net/browse/HPCC-12872 
A new TRACE activity is now available which provides a way of putting ‘tracepoints’ in to ECL code that will cause some or all of the data going through that part of the graph to be saved into the log file.

Tracing is not output by default even if TRACE statements are present. The workunit debug value traceEnabled must be set or the default platform settings changed to always output tracing.

In Roxie you can also request tracing on a deployed query by specifying traceEnabled=1 in the query XML. So you can leave trace statements in the ECL without any detectable overhead until tracing is enabled. Tracing is output to the log file, in the form:

 TRACE: value...

The syntax to use is:

 myds := TRACE(ds [, traceOptions]);

The available options are:

  • Zero or more expressions, which act as a filter. Only rows matching the filter will be included in the tracing.
  • KEEP (n) indicating how many rows will be traced.
  • SKIP (n) indicating that n rows will be skipped before tracing starts.
  • SAMPLE (n) indicating that only every nth row is traced.
  • NAMED(string) providing the name for the rows in the tracing.

It is also possible to override the default value for KEEP at a global, per-workunit, or per-query level.

 #option (traceEnabled, 1) // trace statements will be enabled #option (traceLimit, 100) // overrides the default keep value (10) for activities that don't specify it.

Note: You can use a stored boolean as the filter expression for a trace activity. This can allow you to turn individual trace activities on and off.

New option in Roxie to bind queries to cores –https://hpccsystems.atlassian.net/browse/HPCC-12599
This option has been added because we believe that Roxie will perform better in some circumstances if thread affinities are used to restrict a query’s threads to a subset of cores on a machine. To use this option, either set coresPerQuery in the RoxieTopology.xml or set bindCores in the workunit debug values located in the query XML, to indicate that execution of this query should be bound to, at the most, N cores out of all the ones that the Roxie process itself is bound to.

DFUPLUS restore super files feature –https://hpccsystems.atlassian.net/browse/HPCC-14111
We have extended the DFUPLUS feature allowing users to restore all the files from a previous session to include the ability to export and restore superfiles.

Init System improvements
Building on the improvements made in 5.4.0, here are a few more features we have added to building and using HPCC Systems® from an operational standpoint easier:

  • If you want to use a different jvm for the Java plugin, you can now do this by specifying the jvm you want to use in the environment.conf file. Simply assign the absolute path of the libjvm.so to the JNI_PATH variable.  When the component starts, it will be automatically recognized and linked to.
    https://hpccsystems.atlassian.net/browse/HPCC-12943
    https://hpccsystems.atlassian.net/browse/HPCC-14245
  • Administrators have always had the ability to save attributes for a roxie or thor component in the environment.xml, however, on startup, the system would try to start the unlisted components which could cause confusion. Now, the component will only start if it’s also declared in the topology section of the environment.xml.
    https://hpccsystems.atlassian.net/browse/HPCC-13844
  • We have added some checking features to detect configgen failures and produce improved error messages with better information about the problem found. We are also working on further improvements to distinguish between errors and warnings. This will mean that valid scenarios that users may want to experiment with from a testing perspective will throw a warning but will not stop a component from starting, which is what currently happens.
    https://hpccsystems.atlassian.net/browse/HPCC-13980

New ESP service methods
Two new ESP service methods are now available in HPCC Systems® 6.0.0 Beta:

  • WSESPControl.SetLogging Previously, to make changes to the logging level on an ESP, the logging level had to be changed in the environment.xml and a restart of the ESP was required. Using this feature, an admin user may temporarily change the ESP logging level when an ESP is running. The newly set logging level remains in use only for as long as the ESP remains running the same session. If the ESP is restarted at any time, the ESP logging level reverts back to the original logging level defined in environment.xml.
    https://hpccsystems.atlassian.net/browse/HPCC-12914
  • WsWorkunit.WUGraphQuery Previously, to get a list of graphs, a call to WsWorkunit.WUInfo was needed which uses a lot of ESP time and resources. We have implemented this new method to make it easier for you to get the list of graphs and the say what kind of graphs you want listed.
    https://hpccsystems.atlassian.net/browse/HPCC-13939

DynamicESDL support for writing web-services in JAVA –
https://hpccsystems.atlassian.net/browse/HPCC-14255 

This new feature is a technical preview for HPCC Systems® 6.0.0 Beta. You can now write ESP web-services in the JAVA language using Dynamic ESDL. Creating a JAVA service is made simple by the generation of Java base classes after the ESDL interface has been defined. A DynamicESDL web-service can now contain a mixture of operations written in JAVA and those written in ECL (running on roxie).

Once the HPCC Systems® platform is installed, configure an instance of DynamicESDL to run on ESP using configmgr.   The steps are documented in this manual: https://hpccsystems.com/download/documentation/learning-ecl/dynamic-esdl

For this technical preview, the documentation (including the ReadMe.txt) assumes that you are using port 8088 which we recommend you do to make following the walkthrough example easier.

The DynamicESDL example provided, includes a walkthrough of creating a JAVA service and is located here: /opt/HPCCSystems/examples/EsdlExample.

Additional Nagios monitoring feature
HPCC Systems® 6.0 Beta includes an additional feature in this area which means that if you have Nagios setup on your system, it will now carry out thormaster process checks and report the status back, which you can clearly see via the monitoring interface now built in to ECL Watch.