HPCC Systems 7.0.0 Beta is now available for download. It includes all the major new features we expect to be included in the final gold version, targeted for release later in the year. There are some great new features and performance enhancements you might like to test drive and we're looking for your feedback as we make the final tweaks.
While this blog introduces you to some of the most notable new features and enhancements, you may also want to use the following resources to get more detailed information:
We are also working on feature-specific blogs designed to help you get started with some of the features mentioned here, so keep checking back for more information about the features that interest you the most.
Let's take a look at the performance, usability and ECL Language and library improvements you can expect to see.
We have worked hard to make the 7.0.0 series of releases faster and more efficient. While some of these improvements are behind the scenes, we hope you notice the positive impact all the same!
In HPCC Systems 7.0.0, Memcached is enabled by default, which avoids recalculating pages that have already been delivered to users. This reduces the load on ESP and improves its responsiveness. For more information, see the JIRA issue.
When connecting to ESP/ROXIE to make SOAP calls, you no longer have to open a new connection for each call (which can be expensive under SSL). Instead, you can leave the connection open and make multiple calls, reducing the overhead. For more information, see the JIRA issue.
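To make the idea of connection reuse concrete, here is a minimal Python sketch using only the standard library. It does not talk to ESP or ROXIE: a small local echo server stands in for the service, and the handler, function names and payloads are all hypothetical. It simply shows several requests travelling over one open connection rather than one connection per call.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Stand-in service: echoes the request body back to the caller."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open by default
    seen_clients = set()           # distinct (host, port) pairs that connected

    def do_POST(self):
        EchoHandler.seen_clients.add(self.client_address)
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def call_many(host, port, payloads):
    """Send every payload over a single connection and collect the replies."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        replies = []
        for payload in payloads:
            conn.request("POST", "/", body=payload)
            replies.append(conn.getresponse().read())
        return replies
    finally:
        conn.close()


def demo():
    """Start the stand-in server, make three calls, report connections used."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        replies = call_many(host, port, [b"<a/>", b"<b/>", b"<c/>"])
        return replies, len(EchoHandler.seen_clients)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the three echoed bodies together with the number of distinct client connections the server saw. Under SSL the saving is larger, because the handshake is paid once instead of once per call.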
The following new features will mainly be of interest to ECL developers (more information in the ECL Language and Library section below) but they deserve a mention in this section for the improved performance they provide.
Improved Performance of KeyedJoins
We have completely reworked how Thor implements keyedjoins, significantly improving performance in many cases. For more information, see the JIRA issue.
Using a BLOOM filter means the system avoids having to look values up in an index if there is no chance of them being there, giving you better performance. More information is given below and in the JIRA issue. A blog discussing using the BLOOM filter in more detail is also available for you to read.
Remote projection and record translation
This new feature reduces the amount of data having to be sent across the network.
When reading data remotely, only the rows and columns which are needed are returned, rather than returning everything and then having to discard what is not needed.
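As a rough illustration of the idea (not the platform's implementation), the Python sketch below contrasts fetching everything and discarding afterwards with applying the row filter and column list at the source. The sample rows, function names and the crude size measure are all hypothetical.

```python
# Hypothetical "remote file": a list of rows, each with a bulky column.
REMOTE_ROWS = [
    {"id": 1, "name": "Ann", "state": "FL", "notes": "x" * 100},
    {"id": 2, "name": "Bob", "state": "GA", "notes": "y" * 100},
    {"id": 3, "name": "Cy", "state": "FL", "notes": "z" * 100},
]


def is_florida(row):
    """Example row filter."""
    return row["state"] == "FL"


def read_all_then_discard(rows, wanted_columns, row_filter):
    """Every row and column is transferred; the reader trims afterwards."""
    transferred = [dict(row) for row in rows]
    kept = [{c: r[c] for c in wanted_columns} for r in transferred if row_filter(r)]
    return kept, transferred


def read_with_remote_projection(rows, wanted_columns, row_filter):
    """Filter and column list are applied at the source before transfer."""
    transferred = [{c: r[c] for c in wanted_columns} for r in rows if row_filter(r)]
    return transferred, transferred


def payload_size(rows):
    """Crude stand-in for bytes on the wire: total length of all values as text."""
    return sum(len(str(value)) for row in rows for value in row.values())
```

Both readers end up with the same rows, but the second transfers far less data to get there.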
Record translation decouples the declaration of record layouts from the code, meaning that the code does not have to be updated when the layout changes. This makes it much easier to manage data upgrades. For more information, see the JIRA issue.
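The general idea of translating a stored record onto the layout the code expects can be sketched in a few lines of Python. This is only an assumption-laden illustration: it matches fields by name, fills in a default for fields the stored data lacks and drops fields the code no longer uses. The platform's actual translation rules (type conversions, for example) are not described here, and the layouts and function name are hypothetical.

```python
def translate_record(row, expected_layout):
    """Map a stored row onto the expected layout, matching fields by name.

    expected_layout maps each field name to the default used when the
    stored row does not contain that field. Fields present in the row
    but absent from the layout are dropped.
    """
    return {field: row.get(field, default) for field, default in expected_layout.items()}


# Hypothetical example: the file was written before 'email' existed and
# still carries a 'fax' field that the code no longer uses.
stored_row = {"id": 7, "name": "Ann", "fax": "555-0100"}
expected_layout = {"id": 0, "name": "", "email": ""}

translated = translate_record(stored_row, expected_layout)
```

Because the mapping is driven by the layout rather than hard-coded, the reading code stays the same when the stored layout gains or loses fields.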
The following features will help you manage your system and third party products:
Systemd and daemonisation of components
These improvements allow you to manage the start-up process for the HPCC Systems platform with each component starting as a background process. While the systemd scripts will eventually supersede our init scripts, both are available to you in HPCC Systems 7.0.0. For more information, see the related JIRA issue and our blog post, Systemd - Easier management of your HPCC Systems components.
HPCC Systems 7.0.0 Beta includes a fully supported VS Code extension which is available in the VS Code Marketplace. More details, including installation information, are available in the vscode-ecl GitHub repository.
Using our new connector, you have the ability to read Thor files natively from Spark. We are still working on this connector and hope to include the ability to write files by the time HPCC Systems 7.0.0 goes gold later in the year. For more information, see the related JIRA issue.
The JAR files are available on Maven, although our intention is to make the download available ourselves for the main gold release. Get everything you need from the Spark-HPCC Systems Connector page, located in the Third Party Integrations area of our website.
You must have HPCC Systems 7.0.0 Beta installed to use this connector, which relies on newly implemented remote read capabilities.
WsSQL is now included in the platform
Prior to HPCC Systems 7.0.0 Beta, WsSQL was available as a free module downloaded separately from the HPCC Systems core platform. However, it is now included as part of the core platform distro.
To avoid potential compatibility issues with previous versions of WsSQL, you will need to uninstall your existing version of WsSQL before installing HPCC Systems 7.0.0. More information about this is available in the HPCC Systems Red Book.
There are a number of features to look out for in ECL Watch and ECL IDE in this release:
In line with other security improvements we have made to ECL Watch, users are now required to re-login after a period of time. They will be returned to a previously saved state after re-logging in. For more information, see the related JIRA issue.
Log visualisation tool using ELK
Using ELK (ElasticSearch, Logstash and Kibana) alongside ECL Watch, it is now possible to carry out predictive monitoring on your HPCC Systems environment. You can customise the interface to show you exactly what you want to monitor, viewed as visualizations in ECL Watch.
Since visualizations are so much easier to read than trawling through logs, you and your users can anticipate problems far more effectively, which may mean finding a problem sooner rather than later, or even preventing it from happening at all.
Our developers have written a blog showing an example use case created during their testing. It also shows how to integrate your ELK visualisations into the Operations area of ECL Watch under a new tab.
There is a new visual flowchart for running workunits in ECL Watch, located on the WU details page. It shows where a job is in the overall process. The flowchart sections start off gray and turn green as each process completes. All sections show red if the workunit fails to complete.
On the Timers page, there are two ways of looking at where the time was spent, allowing you to see why your job took as long as it did. The Table view gives you a detailed breakdown. By clicking on the different areas shown, you can see, for example, how long a Parse or Generate took within the compilation process.
The Chart view displays the same information as a column chart.
For more details, see the JIRA issue.
XREF for ROXIE
You can now manage and report on the data on ROXIE clusters. There are also some helpful changes to the XREF UI, including column sorting. With this functionality added for ROXIE, you can manage and report on data across all clusters in your HPCC Systems environment. For more information, see the related JIRA issue.
Dynamic ESDL functionality is now an integral part of ESP. You can add ECL/ROXIE and Java-based services to any ESP on the fly. You can also add new service ports dynamically. Big improvements have been made to the way dynamic ESP services can be added, configured and removed from within the Operations area of ECL Watch. For more information, see the related JIRA issue.
A checkbox has been added to the workunits, logical files, published queries and DFU workunits pages so users can restrict these lists to show only their own items. For more information, see the related JIRA issue.
The new eclcc indexer speeds up syntax checks by avoiding the reparsing of ECL when compiling with a local repository. For more information, see the related JIRA issue.
ECL Language and library improvements
There are some great additions to the ECL Language and library in HPCC Systems 7.0.0 Beta, which are illustrated briefly below. More detailed information showing how to use them can be found in our ECL Language documentation.
Best attribute added to dedup
This ECL language feature allows you to indicate which records should be retained, rather than retaining the first or last. More information is available in the related JIRA issue.
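The semantics can be illustrated outside ECL. The following Python sketch (not ECL syntax, and not the platform's implementation) keeps one record per key, choosing the record that ranks best under a caller-supplied ordering instead of simply keeping the first or last one seen. The function name and sample data are hypothetical; see the ECL Language documentation for the actual syntax.

```python
def dedup_best(rows, key, best):
    """Keep one row per key: the one that sorts first under `best`.

    On a tie the earlier row is kept. Output follows the order in which
    each key was first seen.
    """
    chosen = {}
    for row in rows:
        k = key(row)
        if k not in chosen or best(row) < best(chosen[k]):
            chosen[k] = row
    return list(chosen.values())


people = [
    {"name": "Ann", "updated": 20170101, "city": "Tampa"},
    {"name": "Ann", "updated": 20180301, "city": "Miami"},
    {"name": "Bob", "updated": 20160515, "city": "Macon"},
]

# Retain the most recently updated row for each name
# (the value is negated so that larger dates sort first).
latest = dedup_best(people, key=lambda r: r["name"], best=lambda r: -r["updated"])
```

Here the second "Ann" row is retained because it has the later `updated` value, even though it is neither the first nor the only candidate for that name.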
New attribute in EMBED
Default implementation of GROUP(dataset, fields, ,ALL) has changed
This is something to be aware of because it requires changes to your ECL which may generate different results. See the Red Book for more information about the implications of this change and also the JIRA issue.
Remote read/disk projection and record translation
We have extended this capability to index reads and disk reads on all platforms (not just ROXIE). Please also see the notes in the Performance Enhancements section above. More information is also available in the related JIRA issue.
Unicode standard library functions
These functions were coded by a high school student whose proposal to complete this work was accepted as an HPCC Systems intern program project in 2017. View the list of new functions available in the related JIRA issue.
New BLOOM filter
Anyone who designs or uses indexes will be interested in this new feature. It speeds up index lookups by doing a quick check against a hash table (BLOOM table) first. More information on how to control which fields this applies to is available in the related JIRA issue.
General information about BLOOM filters is available here. Please also see the notes in the Performance Enhancements section above and check back to read a blog (coming soon) about using this feature.
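For readers new to the technique, here is a minimal, generic Bloom filter in Python, together with a hypothetical index wrapper that consults the filter before doing a lookup. It is a sketch of the general idea only: the sizes, the hashing scheme and the class names are assumptions and say nothing about how the platform builds or stores its BLOOM tables.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive several bit positions by salting a single hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))


class FilteredIndex:
    """Hypothetical index wrapper that checks the filter before a lookup."""

    def __init__(self, entries):
        self.entries = dict(entries)
        self.bloom = BloomFilter()
        for key in self.entries:
            self.bloom.add(key)
        self.lookups = 0  # counts the (expensive) index lookups actually made

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent, so skip the index lookup
        self.lookups += 1
        return self.entries.get(key)
```

A key that was added always passes the filter, so present values are never missed. A key that was not added is usually rejected by the filter without touching the index, and only an occasional false positive costs a wasted lookup.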
These new date functions are designed to help with day to day project work:
- Find Nth week of Month for the given Date
- Find Nth week of Year for the given Date
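To show the kind of calculation involved, here is a small Python sketch of both week numbers. It is not the ECL library code: it assumes weeks start on Sunday and that week 1 is the week containing the first day of the month or year, and the library's own definitions may differ. The function names are hypothetical.

```python
from datetime import date


def _sunday_offset(d):
    """Days since the most recent Sunday (Sunday = 0 ... Saturday = 6)."""
    return (d.weekday() + 1) % 7  # date.weekday(): Monday = 0 ... Sunday = 6


def _week_number(day_index, first_day_offset):
    """1-based week number for a 0-based day index within a period,
    given how far into its week the period's first day falls."""
    return (day_index + first_day_offset) // 7 + 1


def week_of_month(d):
    """Week of the month containing d (assumed Sunday-based weeks)."""
    first = d.replace(day=1)
    return _week_number(d.day - 1, _sunday_offset(first))


def week_of_year(d):
    """Week of the year containing d (assumed Sunday-based weeks)."""
    first = date(d.year, 1, 1)
    return _week_number((d - first).days, _sunday_offset(first))
```

For example, 1 June 2018 was a Friday, so under these assumptions Saturday 2 June falls in week 1 of the month and Sunday 3 June starts week 2.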
SALT Profile Bundle
This new bundle provides useful summary statistical data, allowing you to analyse the content and shape (patterns) of the data in your data files. This helps you to make important decisions about filtering, de-duping and the linking of records, as well as providing information on the changing characteristics of your data over time.
More information about this bundle (including installation instructions) is available in the Salt_Profile GitHub repository.
We need your feedback
Once you've downloaded this beta version and had an opportunity to try out some of the new features, do let us know what you think. Your feedback will help us to make the final gold release even better!
Tell us about your experience
- To get advice about usage or to work through a specific issue, post in our Developer forum (you must register to post).
- Found an issue? Raise a ticket in the HPCC Systems Community Issue Tracker - JIRA
- Post comments directly into an existing JIRA ticket that is relevant to the issue you are experiencing.
- Read our blog to find out more about new features and what's happening in our HPCC Systems open source project.
- Join us at an upcoming event.
- Subscribe to our forum. Keep up to date with new release information in our Announcements area, read about what other users are doing in our Developer forum and learn about Tips and Tricks.
- Subscribe to our developer newsletter
- Attend one of our Tech Talk webcasts and be sure to participate in our upcoming 2018 Community Day Summit