While this blog introduces you to some of the most notable new features and enhancements available in HPCC Systems 7.0.0 Gold, you may also want to use the following resources to get more detailed information:
We are also working on feature-specific blogs designed to help you get started using some of the features mentioned here. So keep checking back for more information about the features that interest you the most.
Let’s take a look at the performance, usability and ECL Language and library improvements you can expect to see.
We have worked hard to make the 7.0.0 series of releases faster and more efficient. While some of these improvements are behind the scenes, we hope you notice the positive impact all the same!
When connecting to ESP/ROXIE to make SOAP calls, rather than having to open a new connection for each call (which can be expensive under SSL), you can leave the connection open and make multiple calls, reducing the overhead. For more information, see the JIRA issue.
The following new features will mainly be of interest to ECL developers (more information in the ECL Language and Library section below) but they deserve a mention in this section for the improved performance they provide.
Improved Performance of Keyed Joins
We have completely reworked how Thor implements keyed joins, significantly improving performance in many cases. For more information, see the JIRA issue.
Using a BLOOM filter means the system avoids having to look values up in an index if there is no chance of them being there, giving you better performance. More information is given below and in the JIRA issue. A blog discussing using BLOOM filters in more detail is also available for you to read.
Remote projection and record translation
This new feature reduces the amount of data that has to be sent across the network.
When reading data remotely, only the rows and columns which are needed are returned, rather than returning everything and then having to discard what is not needed.
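The idea of projecting at the source can be sketched in a few lines of Python. This is only a conceptual illustration, not the platform's implementation: `remote_read`, `stored_rows` and the field names are hypothetical, and the point is simply that the filter and column selection run where the data lives, so only the needed values cross the network.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def remote_read(rows: Iterable[Row],
                wanted_columns: List[str],
                row_filter: Callable[[Row], bool]) -> Iterator[Row]:
    """Hypothetical 'server side' read: filter rows and trim columns
    before anything is returned to the caller."""
    for row in rows:
        if row_filter(row):
            # Only the requested columns are copied into the result.
            yield {name: row[name] for name in wanted_columns}


# Hypothetical stored data; 'notes' stands in for a wide column nobody asked for.
stored_rows = [
    {"id": 1, "name": "Ann", "state": "FL", "notes": "x" * 1000},
    {"id": 2, "name": "Bob", "state": "GA", "notes": "y" * 1000},
    {"id": 3, "name": "Cat", "state": "FL", "notes": "z" * 1000},
]

result = list(remote_read(stored_rows, ["id", "name"],
                          lambda r: r["state"] == "FL"))
```

Here `result` holds two small rows with just `id` and `name`; the wide `notes` column and the non-matching row are never returned.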
Record translation decouples the declaration of record layouts from the code, meaning that the code does not have to be updated when the layout changes. This makes it much easier to manage data upgrades. For more information, see the JIRA issue.
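As a rough illustration of what translating between layouts means, the Python sketch below maps a stored record onto the layout the code expects, matching fields by name. It is a simplified assumption of the concept, not how the platform performs translation: `translate_record`, `expected_layout` and the field names are all hypothetical.

```python
from typing import Dict

Record = Dict[str, object]


def translate_record(stored: Record, expected_layout: Record) -> Record:
    """Build a record in the expected layout from a stored record.

    expected_layout maps each expected field name to a default value.
    Fields missing from the stored record take the default; stored
    fields the code does not expect are dropped.
    """
    return {field: stored.get(field, default)
            for field, default in expected_layout.items()}


# Hypothetical layouts: the file was written with an older layout
# that has 'legacy_code' but no 'country' field.
expected_layout = {"id": 0, "name": "", "country": "unknown"}
stored_record = {"id": 7, "name": "Ann", "legacy_code": "X1"}

translated = translate_record(stored_record, expected_layout)
```

The code reading the data keeps working against `expected_layout` even though the file on disk was written with a different set of fields.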
The following features will help you manage your system and third-party products:
Systemd and daemonisation of components
These improvements allow you to manage the start-up process for the HPCC Systems platform with each component starting as a background process. While the systemd scripts will eventually supersede our init scripts, both are available to you in HPCC Systems 7.0.0. For more information see the related JIRA issue and our blog post, Systemd – Easier management of your HPCC Systems components.
HPCC Systems 7.0.0 Beta includes a fully supported VS Code extension which is available in the VS Code Marketplace. More details, including installation information, are available in the vscode-ecl GitHub repository.
Using our new connector, you can read and write Thor files natively from Spark. See this JIRA issue for more details about the integration and this JIRA issue to learn more about the implementation of the read/write capabilities.
HPCC Systems-Spark Integration consists of a plug-in to the HPCC Systems platform and a Java library that facilitates access from a Spark cluster to data stored on an HPCC Systems cluster.
The HPCC Systems Spark plug-in integrates Spark into your HPCC Systems platform. Once installed and configured, the Sparkthor component manages the Integrated Spark cluster. It dynamically configures, starts, and stops your Integrated Spark cluster when you start or stop your HPCC Systems platform.
The Spark-HPCC Systems Distributed Spark Connector employs the standard remote file read facility to read and write data to/from either sequential or indexed HPCC datasets.
Get everything you need to use this new feature.
Note: You must have HPCC Systems 7.0.0 Beta installed to use this connector, which relies on newly implemented remote read capabilities.
WsSQL is now included in the platform
Prior to HPCC Systems 7.0.0 Beta, WsSQL was available as a free module downloaded separately from the HPCC Systems core platform. However, it is now included as part of the core platform distro.
To avoid potential compatibility issues with previous versions of WsSQL, you will need to uninstall your existing version of WsSQL before installing HPCC Systems 7.0.0. More information about this is available in the HPCC Systems Red Book.
There are a number of features to look out for in ECL Watch and ECL IDE in this release:
In line with other security improvements we have made to ECL Watch, users are now required to re-login after a period of time. They will be returned to a previously saved state after re-logging in. For more information, see the related JIRA issue.
Log visualisation tool using ELK
Using ELK (Elasticsearch, Logstash and Kibana) alongside ECL Watch, it is now possible to carry out predictive monitoring on your HPCC Systems environment. You can customise the interface to show you exactly what you want to monitor, viewed as visualizations in ECL Watch.
Since visualizations are so much easier to read than trawling through logs, you and your users can anticipate problems far more effectively, which may mean finding a problem sooner rather than later or maybe even preventing it from happening at all.
Our developers have written a blog showing an example use case created during their testing. It also shows how to integrate your ELK visualisations into the Operations area of ECL Watch under a new tab.
There is a new visual flowchart for running workunits in ECL Watch located on the WU details page. It shows where a job is in the overall process. The flowchart sections start off gray, turning green as each process completes. All will show red if the workunit fails to complete.
On the Timers page, there are two ways of looking at where the time was spent, allowing you to see why your job took as long as it did. There is a Table view, which gives you a detailed breakdown. By clicking on the different areas shown, you can see, for example, how long a Parse or Generate took within the compilation process.
The Chart view displays a column chart showing the same information in a different way.
For more details, see the JIRA issue.
XREF for ROXIE
Manage and report on the data on ROXIE clusters. There are some helpful changes to the XREF UI, including column sorting. With this functionality added for ROXIE, you can now manage and report on data across all clusters in your HPCC Systems environment. For more information, see the related JIRA issue.
Dynamic ESDL functionality is now an integral part of ESP. You can now add ECL/ROXIE and Java-based services to any ESP on the fly. You can also add new service ports dynamically. Big improvements have been made to the way dynamic ESP services can be added, configured and removed from within the Operations area of ECL Watch. For more information, see the related JIRA issue.
This checkbox has been added to the workunits, logical files, published queries and DFU workunits pages so users can see their own queries only in these lists. For more information, see the related JIRA issue.
The new eclcc indexer speeds up syntax checks by avoiding the reparsing of ECL when compiling with a local repository. For more information, see the related JIRA issue.
ECL Language and library improvements
There are some great additions to the ECL Language and library in HPCC Systems 7.0.0 Beta, which are illustrated briefly below. More detailed information showing how to use them can be found in our ECL Language documentation.
BEST attribute added to DEDUP
This ECL language feature allows you to indicate which records should be retained, rather than retaining the first or last. More information is available in the related JIRA issue.
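To make the idea concrete, here is a minimal Python sketch of deduplication that keeps the "best" record per key instead of the first or last one encountered. It is not ECL and makes no claim about how the platform implements the feature: `dedup_best`, the `rank` function and the sample fields are hypothetical, and "best" is assumed to mean the record with the lowest rank value.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Record = Dict[str, object]


def dedup_best(records: Iterable[Record],
               key: Callable[[Record], Hashable],
               rank: Callable[[Record], float]) -> List[Record]:
    """Keep one record per key: the one with the lowest rank value.

    On a tie, the record seen first is retained.
    """
    best: Dict[Hashable, Record] = {}
    for rec in records:
        k = key(rec)
        if k not in best or rank(rec) < rank(best[k]):
            best[k] = rec
    return list(best.values())


# Hypothetical data: several versions of the same person record.
records = [
    {"id": 1, "email": "ann@old.example", "updated": 20170101},
    {"id": 1, "email": "ann@new.example", "updated": 20180601},
    {"id": 2, "email": "bob@example.com", "updated": 20180301},
    {"id": 1, "email": "ann@mid.example", "updated": 20171201},
]

# Negating the date makes the most recently updated record rank best.
deduped = dedup_best(records,
                     key=lambda r: r["id"],
                     rank=lambda r: -r["updated"])
```

With this input, the record retained for `id` 1 is the most recently updated one, regardless of where it appears in the input.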
New attribute in EMBED
Default implementation of GROUP(dataset, fields, ,ALL) has changed
This is something to be aware of because it requires changes to your ECL which may generate different results. See the Red Book for more information about the implications of this change and also the JIRA issue.
Remote read/disk projection and record translation
We have extended this capability to index reads and disk reads on all platforms (not just ROXIE). Please also see the notes in the Performance Enhancements section above. More information is also available in the related JIRA issue.
Unicode standard library functions
View the list of new functions available in the related JIRA issue.
New BLOOM filter
Anyone who is designing indexes or using them will be interested in this new feature. It speeds up the performance of indexes by doing a quick check against a hash table (BLOOM table). More information on how to control which fields you can do this with is available in the related JIRA issue.
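For readers new to the technique, the following Python sketch shows the general principle of a Bloom filter placed in front of a lookup. It is a generic textbook-style illustration, not the platform's implementation: the `BloomFilter` class, its sizes, the `lookup` helper and the sample `index` are all hypothetical. A Bloom filter can answer "definitely not present" or "possibly present", so a negative answer lets the caller skip the more expensive lookup entirely.

```python
import hashlib
from typing import Dict, Iterator, Optional


class BloomFilter:
    """Small illustrative Bloom filter over string values."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> Iterator[int]:
        # Derive several bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means the value was never added; True means it may have been.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


# Hypothetical index: a dict stands in for an expensive index lookup.
index: Dict[str, int] = {"smith": 101, "jones": 102, "patel": 103}
bloom = BloomFilter()
for indexed_key in index:
    bloom.add(indexed_key)


def lookup(search_key: str) -> Optional[int]:
    """Consult the filter first; only touch the index if the key may exist."""
    if not bloom.might_contain(search_key):
        return None  # definitely absent, index lookup skipped
    return index.get(search_key)  # possibly present, confirm in the index
```

False positives are possible (the filter may say "possibly present" for a key that is not there), which is why the index is still consulted on a positive answer. False negatives are not possible, so skipping the lookup on a negative answer is always safe.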
These new date functions are designed to help with day-to-day project work:
- Find Nth week of Month for the given Date
- Find Nth week of Year for the given Date
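One simple way to compute week numbers like these is sketched below in Python. This is an assumption about the arithmetic, not a description of the ECL functions: it treats week 1 as starting on the first day of the month (or year) and counts consecutive 7-day blocks. The actual library functions may count differently, for example by starting weeks on a particular weekday, so check the documentation for their exact behaviour. The function names here are hypothetical.

```python
from datetime import date


def week_of_month(d: date) -> int:
    """Week number within the month, assuming week 1 starts on day 1."""
    return (d.day - 1) // 7 + 1


def week_of_year(d: date) -> int:
    """Week number within the year, assuming week 1 starts on 1 January."""
    day_of_year = d.timetuple().tm_yday
    return (day_of_year - 1) // 7 + 1


# Sample dates used to exercise both functions.
first_of_november = date(2018, 11, 1)
end_of_november = date(2018, 11, 30)
new_years_eve = date(2018, 12, 31)
```

Under this convention a month can have up to five weeks and a year up to 53, with the final week usually shorter than seven days.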
Data Patterns Bundle
This new bundle provides useful summary statistical data, allowing you to analyse the content and shape (patterns) of the data in your data files. This helps you to make important decisions about filtering, de-duping and the linking of records, as well as providing information on the changing characteristics of your data over time.
More information about this bundle (including installation instructions) is available in the Data Patterns repository on GitHub.
Machine Learning Improvements
We have made a number of improvements to the HPCC Systems Machine Learning Library. We are now able to generate the documentation for our machine learning bundles from the sources (see the JIRA issue).
We have also made a number of significant improvements to some of our existing ML bundles:
- Gradient Boosted Trees has been added to the Learning Trees bundle (see the JIRA issue).
- The ML_Core bundle has been extended by the addition of some more descriptive stats in the Field Aggregates module (see the JIRA issue) and also by the inclusion of the Regression2 and Model2 definitions in the Analysis module (see the JIRA issue). Also in the Analysis module, we have added some common capabilities to the IClassify and IRegression interfaces so they are implemented once within ML_Core rather than within each algorithm (see the JIRA issue).
There are two machine learning bundles that are new in HPCC Systems 7.0.0:
- GLM (Generalized Linear Model) bundle. Provides Regression and Classification algorithms for situations in which your data does not match the assumptions of LinearRegression or LogisticRegression. Handles a variety of data distribution assumptions. See the HPCC Systems GLM repository on GitHub for more information.
- SVM (Support Vector Machines) bundle. SVM implementation for Classification and Regression using the popular LibSVM under the hood. See the HPCC Systems Support Vector Machines repository on GitHub for more information.
See the full list of bundles available which includes all supported machine learning bundles.
Interested in learning more about how to use our machine learning library? These blogs are a great source of information:
- Introducing our new, improved Machine Learning Library
- Introduction to using PBBlas on HPCC Systems
- Machine Learning Demystified
- Using HPCC Systems Machine Learning
- Understanding the Myriad interface feature of HPCC Systems Machine Learning
- Musings on Causality and Machine Learning
- Learning Trees – A guide to Decision Tree based machine learning
New ECL Bundles
We have a number of new ECL bundles available in HPCC Systems 7.0.0 providing a variety of new functionality. Some have already had a mention, but here’s the full list:
- GLM bundle – Generalized Linear Model machine learning bundle.
- SVM bundle – Support Vector Machines machine learning bundle.
- Data Patterns Bundle – Data profiling tool. For more information, hear Dan Camper speak about this bundle and how to use it at our 2018 Community Day Summit (Watch Recording / View Slides).
- Dapper Bundle – Turns verbose ECL calls into simple verbs.
- Sassy Bundle – ECL helper for SAS calls. For more information, hear Luke Pezet speak at our 2018 Community Day Summit about his experience of using SAS and HPCC Systems, including details of this bundle (Watch Recording / View Slides).
Thinking of contributing an ECL bundle? Take a look at the ECL bundle writer’s guide for more information.
We want your feedback and contributions
Your feedback will help us to make improvements in the future, so do let us know if you have issues, comments or questions. Perhaps you also have an idea for or want to contribute a new feature, enhancement or bundle. Here’s how to interact with our community and get in touch:
Tell us about your experience
- To get advice about usage or to work through a specific issue, post in our Developer forum (you must register to post).
- Found an issue? Raise a ticket in the HPCC Systems Community Issue Tracker – JIRA
- Post comments directly into an existing JIRA ticket that is relevant to the issue you are experiencing.
- Want to contribute? Walk through the process and take a look at the notes for developers included in the readme in our GitHub repository.
- Read our blog to find out more about new features and what’s going on in our HPCC Systems open source project.
- Join us at an upcoming event.
- Subscribe to our forum. Keep up to date with new release information in our Announcements area, read about what other users are doing in our Developer forum and learn about Tips and Tricks.
- Subscribe to our developer newsletter
- Attend or speak at one of our Tech Talk webcasts, or attend our annual Community Day Summit held in the fall. Find out about past speakers at this event and watch recordings.