Add-on Modules

HPCC Systems continues to develop new stand-alone applications or plug-in modules that extend the capabilities of the base HPCC platform.


MySQL Import

Import schemas from MySQL.

Hadoop Data Integration

This connector provides access to HDFS data files from HPCC Systems.
Note: This module is no longer actively supported, but is listed here for archival purposes.


WsSQL is a service that provides a SQL interface into HPCC Systems.

Previously, WsSQL was installed using a separate download package. From HPCC Systems 7.0.0 Beta onwards, WsSQL functionality has been fully integrated into the HPCC Systems platform package.

We recommend that existing WsSQL users preparing to upgrade from an earlier version of our platform (such as version 6.x.x) uninstall the old WsSQL package before upgrading, to avoid potential compatibility issues.

Text Processing


Trigram manipulation.

Web Log Analytic Module

Our Web Log Analytic Module (WLAM) can correlate terabytes of log data in a matter of minutes, and perform complex transformations and linking that enhance the value of the existing data.

Prefix Tree

Improves Levenshtein edit distance performance
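To illustrate why a prefix tree helps with edit distance, here is a minimal Python sketch. It is not the module's ECL implementation; the names TrieNode, build_trie and search are hypothetical and chosen only for this example. Words that share a prefix share the same rows of the Levenshtein table, and a whole branch is abandoned as soon as no cell in the current row is within the allowed distance.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set when a dictionary word ends at this node


def build_trie(words):
    """Build a prefix tree from an iterable of words (hypothetical helper)."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def search(root, query, max_dist):
    """Return sorted (word, distance) pairs with edit distance <= max_dist."""
    results = []
    first_row = list(range(len(query) + 1))
    # The empty word, if stored, is len(query) edits away from the query.
    if root.word is not None and first_row[-1] <= max_dist:
        results.append((root.word, first_row[-1]))
    for ch, child in root.children.items():
        _walk(child, ch, query, first_row, max_dist, results)
    return sorted(results)


def _walk(node, ch, query, prev_row, max_dist, results):
    # Extend the Levenshtein table by one row for this trie edge.
    row = [prev_row[0] + 1]
    for i in range(1, len(query) + 1):
        insert_cost = row[i - 1] + 1
        delete_cost = prev_row[i] + 1
        replace_cost = prev_row[i - 1] + (query[i - 1] != ch)
        row.append(min(insert_cost, delete_cost, replace_cost))
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))
    # Prune: if every cell exceeds the limit, no descendant can match.
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, query, row, max_dist, results)
```

For example, search(build_trie(["cat", "cart", "car", "dog", "cot"]), "cat", 1) visits the shared prefix "ca" once for three words and never descends past the first letter of "dog" once that branch can no longer match.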


Natural Language Parsing (NLP) – The ability to parse and mine complex (or simple) structured data out of unstructured text using linguistic or ‘regular expression’ techniques.
Note: The NLP feature is included in the HPCC Systems platform without requiring additional modules.



Visualizer Bundle

The Visualizer bundle is an open-source add-on to the HPCC Systems platform that lets you create visualizations from the results of queries written in ECL. Visualizations are an important means of conveying information from massive data.

Cell Formatter

Format ECL data for display


HPCC Systems continues to develop new plugins which are free to the community to help you integrate third party tools with the HPCC Systems platform.

Spark-HPCC Systems Integration

HPCC Systems-Spark Integration consists of a plug-in to the HPCC Systems platform and a Java library that facilitates access from a Spark cluster to data stored on an HPCC Systems cluster, in both directions.
The HPCC Systems Spark plug-in integrates Spark into your HPCC Systems platform. Once installed and configured, the Sparkthor component manages the Integrated Spark cluster. It dynamically configures, starts, and stops your Integrated Spark cluster when you start or stop your HPCC Systems platform.

The Spark-HPCC Systems Distributed Spark Connector employs the standard remote file read facility to read data from, and write data to, either sequential or indexed HPCC Systems datasets.

Eclipse IDE for HPCC Systems

The Eclipse IDE is an alternative Integrated Development Environment (IDE) which can be used with the HPCC Systems Platform.
Note: This module is no longer actively supported, but is listed here for archival purposes.

ECL Data Integration Plugins for Pentaho

A set of plugins for Pentaho Data Integration that makes Big Data development as easy as drag and drop. The plugins are based on the powerful features of the ECL language and help developers not only to develop solutions quickly but also to document them in a way that is easily understood.

R Integration

rHPCC is an R package providing an interface between R and HPCC Systems.

ECL Extension for VS Code

This extension brings ECL Language and HPCC Systems platform support to the popular VS Code editor. Visual Studio Code is a lightweight but powerful source code editor which runs on your desktop and is available for Windows, macOS and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for other languages (such as C++, C#, Java, Python, PHP, Go) and runtimes (such as .NET and Unity).

JDBC Driver

Allows you to connect to the HPCC Systems platform through your favorite JDBC client and access your data without the need to write a single line of ECL.

Java API Project

The HPCC Systems Java API Project provides a set of Java-based APIs (JAPI) which facilitate interaction with HPCC Systems Web Services and C++-based tools.

Other Modules

Finance Library

Commonly used financial operations

Performance Testing

Performance test suite


Bloom filter support


The following features are included in the HPCC Systems platform without requiring additional modules:

Smart Stepping is a set of indexing techniques that, taken together, comprise a method of doing n-ary join/merge-join operations, where n is defined as two or more datasets. Smart Stepping enables the supercomputer to efficiently join records from multiple filtered data sources, including subsets of the same dataset. It is particularly efficient when the matches are sparse and uncorrelated. Smart Stepping also supports matching records from M-of-N datasets.

Before the advent of Smart Stepping, finding the intersection of records from multiple datasets was performed by extracting the potential matches from one dataset, and then joining that candidate set to each of the other datasets in turn. The joins would use various mechanisms including index lookups, or reading the potential matches from a dataset, and then joining them. This meant that joining multiple datasets required at least one dataset to be read in its entirety and then joined to the others. This could be very inefficient if the programmer didn’t take care to select the most efficient order in which to read the datasets. Unfortunately, it is often impossible to know beforehand which order would be the best. It is also often impossible to order the joins so that the two least frequent terms are joined. It was also particularly difficult to efficiently implement the M-of-N join varieties.

With Smart Stepping technology, these multiple dataset joins become a single efficient operation instead of a series of multiple operations. Smart Stepping can be used only where the join condition is primarily an equality test between columns in the input datasets, and the input datasets must be sorted by those columns.
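The general idea of joining several sorted inputs in one pass can be sketched in Python as follows. This is an illustration of the technique under simplifying assumptions, not the platform's implementation: each input is an in-memory sorted list of unique keys, a binary search stands in for an index seek, and the function name stepped_intersection is hypothetical. M-of-N matching is left out to keep the sketch short.

```python
from bisect import bisect_left


def stepped_intersection(sorted_inputs):
    """Return the keys common to every input.

    Each input is assumed to be a sorted list of unique keys.
    """
    if not sorted_inputs or any(len(keys) == 0 for keys in sorted_inputs):
        return []
    positions = [0] * len(sorted_inputs)
    matches = []
    # No key smaller than the largest first key can appear in every input.
    candidate = max(keys[0] for keys in sorted_inputs)
    while True:
        agreed = True
        for i, keys in enumerate(sorted_inputs):
            # Step: jump straight to the first key >= candidate,
            # skipping everything in between without reading it.
            positions[i] = bisect_left(keys, candidate, positions[i])
            if positions[i] == len(keys):
                return matches  # one input is exhausted, no more matches
            if keys[positions[i]] > candidate:
                # This input has no such key; its next key becomes the target.
                candidate = keys[positions[i]]
                agreed = False
        if agreed:
            matches.append(candidate)
            # Move past the match and take the next key as the new candidate.
            positions[0] += 1
            if positions[0] == len(sorted_inputs[0]):
                return matches
            candidate = sorted_inputs[0][positions[0]]
```

Because every input is asked only for "the first key at or after the current candidate", no input has to be read in full, and the order in which the inputs are listed does not need to be tuned by the programmer. The skipping pays off most when matches are sparse.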

Smart Stepping also provides an efficient way of streaming information from a dataset, sorted by any trailing sort order. Previously, if you had a sorted dataset (often an index) which was required to be filtered by some leading components, and then have the resulting rows sorted by the trailing components, you would have had to read the entire filtered result and then sort it afterwards.
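One way to picture this streaming behavior is as a merge of already-sorted runs. The Python sketch below assumes an in-memory list of (leading, trailing) tuples sorted by both components, standing in for an index; the function names run_for and stream_by_trailing are hypothetical and the platform's actual mechanism may differ. Each wanted leading value selects a run that is already sorted by the trailing component, so the runs can be merged lazily instead of being collected and sorted afterwards.

```python
from bisect import bisect_left
from heapq import merge


def run_for(index_rows, leading):
    """Lazily yield the rows whose leading component equals `leading`."""
    # (leading,) sorts before every (leading, trailing) row, so this
    # finds the first row of the run without scanning from the start.
    pos = bisect_left(index_rows, (leading,))
    while pos < len(index_rows) and index_rows[pos][0] == leading:
        yield index_rows[pos]
        pos += 1


def stream_by_trailing(index_rows, wanted_leading):
    """Stream rows for the wanted leading values, ordered by trailing value."""
    runs = [run_for(index_rows, leading) for leading in wanted_leading]
    # heapq.merge consumes each run only as far as the caller iterates.
    return merge(*runs, key=lambda row: row[1])
```

A caller that needs only the first few rows in trailing order can stop iterating early, and the remaining rows of each run are never read.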

Data Encryption at Rest – support for encrypted data access.