
HPCC Systems continues to develop new stand-alone applications or plug-in modules that extend the capabilities of the base HPCC platform.

WsSQL

WsSQL is a service that provides a SQL interface into HPCC Systems.

Previously, WsSQL was installed from a separate download package. Since HPCC Systems 7.0.0 Beta, WsSQL functionality has been fully integrated into the HPCC Systems platform package.

We recommend that existing WsSQL users who are preparing to upgrade from an earlier version of our platform (such as version 6.x.x) uninstall the old WsSQL package before upgrading, to avoid potential compatibility issues.


Web Log Analytic Module

Our Web Log Analytic Module (WLAM) can correlate terabytes of log data in a matter of minutes and perform complex transformation and linking that enhance the value of the existing data.


Visualizer Bundle

The Visualizer bundle is an open-source add-on to the HPCC platform that lets you create visualizations from the results of queries written in ECL. Visualizations are an important means of conveying the information contained in massive datasets.


ECL Bundles

The ECL Bundles Repository on GitHub serves as a central list of all known ECL bundles.

See the repository for the latest list of bundles.


Supported bundles include:

Bundle               Description
ML_Core              Machine Learning core bundle
PBblas               Parallel BLAS support for machine learning
Performance Testing  Performance test suite
Visualizer           HPCC Visualizations support

Approved bundles include:

Bundle               Description
Bloom                Bloom filter support
Cell Formatter       Format ECL data for display
MySql Import         Import schemas from MySQL
String Match         Various string matching algorithms
Trigram              Trigram manipulation

Other bundles include:

Bundle               Description
Finance Library      Commonly used financial operations
Prefix Tree          Improves Levenshtein edit distance performance


Legacy Modules

The following modules are no longer actively supported, but are listed here for archival purposes.

Hadoop Data Integration

This connector provides access to HDFS data files from HPCC Systems.


The following features are included in the HPCC Systems platform without requiring additional modules:

Data Encryption at Rest

Support for storing data encrypted at rest and for accessing that encrypted data.

Natural Language Parsing (NLP)

The ability to parse and extract structured data, whether complex or simple, from unstructured text using linguistic or ‘regular expression’ techniques.

Smart Stepping

Smart Stepping is a set of indexing techniques that, taken together, provide a method for performing n-ary join/merge-join operations, where n is two or more datasets. Smart Stepping enables the supercomputer to efficiently join records from multiple filtered data sources, including subsets of the same dataset. It is particularly efficient when the matches are sparse and uncorrelated. Smart Stepping also supports matching records from M-of-N datasets.
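The core stepping idea can be illustrated outside ECL. The Python sketch below is not HPCC's implementation: it assumes each input is a sorted, duplicate-free list of integer keys, uses binary search (`bisect_left`) as a stand-in for an index seek, and the function name `stepping_intersect` is hypothetical. Each input jumps directly to the current candidate key instead of being read row by row, which is why sparse, uncorrelated matches are cheap.

```python
from bisect import bisect_left
from typing import List, Sequence


def stepping_intersect(inputs: Sequence[Sequence[int]]) -> List[int]:
    """Return the keys present in every input (hypothetical illustration).

    Assumes each input is sorted ascending and free of duplicates.
    """
    if not inputs or any(len(seq) == 0 for seq in inputs):
        return []
    positions = [0] * len(inputs)
    result: List[int] = []
    # No key smaller than the largest current head can appear in every input.
    candidate = max(seq[0] for seq in inputs)
    while True:
        matched_all = True
        for i, seq in enumerate(inputs):
            # Step: seek directly to the first key >= candidate, skipping rows in between.
            positions[i] = bisect_left(seq, candidate, positions[i])
            if positions[i] == len(seq):
                return result  # this input is exhausted, so nothing further can match
            key = seq[positions[i]]
            if key != candidate:
                candidate = key  # key > candidate: raise the candidate and start a new round
                matched_all = False
                break
        if matched_all:
            result.append(candidate)
            positions = [p + 1 for p in positions]  # move every input past the match
            if any(p == len(seq) for p, seq in zip(positions, inputs)):
                return result
            candidate = max(seq[p] for p, seq in zip(positions, inputs))
```

For example, `stepping_intersect([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9, 11, 20], [0, 3, 9, 10, 11]])` returns `[3, 9, 11]`. No input acts as a driving dataset that must be read in full, so the result does not depend on the order in which the inputs are listed.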

Before the advent of Smart Stepping, finding the intersection of records from multiple datasets meant extracting the potential matches from one dataset and then joining that candidate set to each of the other datasets in turn. The joins used various mechanisms, including index lookups or reading the potential matches from a dataset and then joining them. As a result, joining multiple datasets required that at least one dataset be read in its entirety and then joined to the others. This could be very inefficient if the programmer did not take care to select the most efficient order in which to read the datasets, and it is often impossible to know beforehand which order is best. It is also often impossible to order the joins so that the two least frequent terms are joined first. The M-of-N join varieties were also particularly difficult to implement efficiently.
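An M-of-N match fits naturally into a stepping approach over sorted inputs. In this hypothetical Python sketch (assuming sorted, duplicate-free integer keys, with `bisect_left` standing in for an index seek; `m_of_n_stepping` is an illustrative name, not an HPCC API), the next candidate is the m-th smallest current key across the inputs, because any key found in at least m inputs cannot be smaller than that value. Every input can therefore skip straight past keys that have no chance of reaching m matches.

```python
from bisect import bisect_left
from typing import List, Sequence


def m_of_n_stepping(inputs: Sequence[Sequence[int]], m: int) -> List[int]:
    """Return keys found in at least m of the inputs (hypothetical illustration).

    Assumes each input is sorted ascending and free of duplicates.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    positions = [0] * len(inputs)
    result: List[int] = []
    while True:
        heads = sorted(seq[p] for p, seq in zip(positions, inputs) if p < len(seq))
        if len(heads) < m:
            return result  # fewer than m live inputs: no key can reach m matches
        # Any key matched by m inputs is >= the m-th smallest head, so skip below it.
        candidate = heads[m - 1]
        matches = 0
        for i, seq in enumerate(inputs):
            positions[i] = bisect_left(seq, candidate, positions[i])
            if positions[i] < len(seq) and seq[positions[i]] == candidate:
                matches += 1
                positions[i] += 1  # consume the candidate in this input
        if matches >= m:
            result.append(candidate)
```

For example, `m_of_n_stepping([[1, 3, 5, 7], [3, 4, 5, 8], [5, 7, 9]], 2)` returns `[3, 5, 7]`. With `m` equal to the number of inputs this reduces to a plain intersection, and with `m = 1` it produces the union.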

With Smart Stepping technology, these multiple-dataset joins become a single efficient operation instead of a series of separate operations. Smart Stepping can only be used where the join condition is primarily an equality test between columns in the input datasets, and the input datasets must be sorted by those columns.

Smart Stepping also provides an efficient way to stream rows from a dataset in the order of any trailing sort components. Previously, if you had a sorted dataset (often an index) that needed to be filtered by some leading components, with the resulting rows sorted by the trailing components, you had to read the entire filtered result and then sort it afterwards.
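Streaming by a trailing sort order can be pictured as a k-way merge. In the Python sketch below (an illustration under stated assumptions, not HPCC's implementation), the hypothetical `index` is a list of `(leading, trailing)` tuples sorted on both components, and `leading_values` holds distinct leading values to filter on. Each matching run is already ordered by its trailing component, so `heapq.merge` can lazily interleave the runs instead of collecting the whole filtered result and sorting it afterwards.

```python
import heapq
from bisect import bisect_left
from typing import Iterator, Sequence, Tuple

# Hypothetical index row: (leading component, trailing component).
Row = Tuple[str, int]


def _rows_for(index: Sequence[Row], leading: str) -> Iterator[Row]:
    """Yield the contiguous run of rows with this leading value, in trailing order."""
    # (leading,) sorts before every (leading, trailing) tuple, so this seeks to the run's start.
    i = bisect_left(index, (leading,))
    while i < len(index) and index[i][0] == leading:
        yield index[i]
        i += 1


def stream_by_trailing(index: Sequence[Row], leading_values: Sequence[str]) -> Iterator[Row]:
    """Stream rows matching any leading value, ordered by the trailing component.

    Assumes index is sorted by (leading, trailing) and leading_values are distinct.
    """
    runs = [_rows_for(index, value) for value in leading_values]
    # Lazy k-way merge: no full read of the filtered result, no post-sort.
    return heapq.merge(*runs, key=lambda row: row[1])
```

For example, with `index = [("CA", 5), ("CA", 12), ("NY", 1), ("NY", 9), ("TX", 3), ("TX", 7)]`, `list(stream_by_trailing(index, ["CA", "TX"]))` yields `[("TX", 3), ("CA", 5), ("TX", 7), ("CA", 12)]`. Because the merge is lazy, a consumer that only needs the first few rows never touches the rest of the filtered data.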