Web Log Analytic Module
The Visualization bundle is an open-source add-on to the HPCC platform that lets you create visualizations from the results of queries written in ECL. Visualizations are an important means of conveying information from massive data.
The ECL Bundles Repository on GitHub serves as a central list of all known ECL bundles.
Supported bundles include:
|ML_Core||Machine Learning core bundle|
|PBblas||Parallel BLAS support for machine learning|
|Performance Testing||Performance test suite|
|Visualizer||HPCC Visualizations support|
Approved bundles include:
|Bloom||Bloom filter support|
|Cell Formatter||Format ECL data for display|
|MySql Import||Import schemas from MySQL|
|String Match||Various string matching algorithms|
Other bundles include:
|Finance Library||Commonly used financial operations|
|Prefix Tree||Improves Levenshtein edit distance performance|
The following modules are no longer actively supported, but are listed here for archival purposes.
The following features are included in the HPCC Systems platform without requiring additional modules:
Data Encryption at Rest
Data Encryption support for encrypted data access.
Natural Language Parsing (NLP)
The ability to parse and mine complex (or simple) structured data out of unstructured text using linguistic or ‘regular expression’ techniques.
Smart Stepping is a set of indexing techniques that, taken together, comprise a method of doing n-ary join/merge-join operations, where n is defined as two or more datasets. Smart Stepping enables the supercomputer to efficiently join records from multiple filtered data sources, including subsets of the same dataset. It is particularly efficient when the matches are sparse and uncorrelated. Smart Stepping also supports matching records from M-of-N datasets.
Before the advent of Smart Stepping, finding the intersection of records from multiple datasets was performed by extracting the potential matches from one dataset and then joining that candidate set to each of the other datasets in turn. The joins would use various mechanisms, including index lookups or reading the potential matches from a dataset and then joining them. This meant that joining multiple datasets required at least one dataset to be read in its entirety and then joined to the others. This could be very inefficient if the programmer didn't take care to select the most efficient order in which to read the datasets. Unfortunately, it is often impossible to know beforehand which order would be best. It is also often impossible to order the joins so that the two least frequent terms are joined. It was also particularly difficult to implement the M-of-N join varieties efficiently.
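To make the M-of-N variety concrete, the following Python sketch shows only its semantics: a key matches when it appears in at least M of the N sorted inputs. It is an illustration under simplifying assumptions (each input is an in-memory sorted list of keys, and the function name `m_of_n_matches` is hypothetical). It is not HPCC Systems code and does not perform the skipping that makes Smart Stepping efficient.

```python
import heapq
from itertools import groupby


def m_of_n_matches(sorted_inputs, m):
    """Yield each key that occurs in at least m of the sorted inputs.

    sorted_inputs: list of lists, each sorted ascending by key (assumption).
    """
    # Tag every key with the number of the input it came from, so that
    # after merging we can tell how many distinct inputs contributed it.
    tagged = ([(key, idx) for key in keys] for idx, keys in enumerate(sorted_inputs))

    # heapq.merge lazily combines the already-sorted inputs into one sorted stream.
    merged = heapq.merge(*tagged)

    # Equal keys are now adjacent, so group them and count distinct sources.
    for key, group in groupby(merged, key=lambda pair: pair[0]):
        sources = {idx for _, idx in group}
        if len(sources) >= m:
            yield key


if __name__ == "__main__":
    a = [1, 3, 5, 7]
    b = [3, 4, 5]
    c = [0, 3, 7]
    print(list(m_of_n_matches([a, b, c], 2)))
```

Setting `m` equal to the number of inputs gives an ordinary intersection, and setting it to 1 gives a union, which is why M-of-N is hard to express as a fixed chain of two-way joins.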
With Smart Stepping technology, these multiple dataset joins become a single efficient operation instead of a series of separate operations. Smart Stepping can be used only where the join condition is primarily an equality test between columns in the input datasets, and the input datasets must be sorted by those columns.
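The idea of a single n-ary pass over sorted inputs can be pictured with a small Python sketch. It is a conceptual illustration only, not the platform's implementation: it assumes each input is an in-memory sorted list of unique keys, and the name `stepped_intersection` is hypothetical. Each input is asked to "step" directly to the first key at or beyond the current candidate, so long runs of non-matching keys are skipped rather than read.

```python
from bisect import bisect_left


def stepped_intersection(sorted_inputs):
    """Yield keys present in every input.

    sorted_inputs: list of lists, each sorted ascending with unique keys (assumption).
    """
    if not sorted_inputs or any(len(keys) == 0 for keys in sorted_inputs):
        return

    positions = [0] * len(sorted_inputs)
    # No key smaller than the largest first key can appear in every input.
    candidate = max(keys[0] for keys in sorted_inputs)

    while True:
        agreed = True
        for i, keys in enumerate(sorted_inputs):
            # Step this input forward to the first key >= candidate.
            pos = bisect_left(keys, candidate, positions[i])
            if pos == len(keys):
                return  # this input is exhausted, so no further matches exist
            positions[i] = pos
            if keys[pos] != candidate:
                # This input has no such key; its next key becomes the new candidate.
                candidate = keys[pos]
                agreed = False
                break
        if agreed:
            yield candidate
            # Move past the match in the first input to pick the next candidate.
            positions[0] += 1
            if positions[0] == len(sorted_inputs[0]):
                return
            candidate = sorted_inputs[0][positions[0]]


if __name__ == "__main__":
    a = [1, 3, 5, 7, 9, 11]
    b = [3, 4, 5, 9, 20]
    c = [0, 3, 9, 10]
    print(list(stepped_intersection([a, b, c])))
```

The sketch also shows why the two requirements matter: the equality test is what lets one input's key serve as a seek target for the others, and the sort order is what makes seeking forward safe.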
Smart Stepping also provides an efficient way of streaming information from a dataset, sorted by any trailing sort order. Previously, if you had a sorted dataset (often an index) that had to be filtered by some leading components and then have the resulting rows sorted by the trailing components, you would have had to read the entire filtered result and then sort that result afterwards.
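One way to picture streaming by a trailing component is sketched below in Python. It assumes an index held as a list of `(leading, trailing)` tuples sorted by both fields; the function name `stream_by_trailing` and the sample data are hypothetical, and this is not how the platform itself is implemented. Within each selected leading value the rows are already ordered by the trailing field, so the blocks can be merged lazily instead of being collected and sorted afterwards.

```python
import heapq
from bisect import bisect_left


def stream_by_trailing(index, leading_values):
    """Lazily yield rows whose leading field is in leading_values, ordered by trailing field.

    index: list of (leading, trailing) tuples sorted ascending (assumption).
    """

    def rows_for(leading):
        # (leading,) sorts before every (leading, x), so this finds the block start.
        pos = bisect_left(index, (leading,))
        while pos < len(index) and index[pos][0] == leading:
            yield index[pos]
            pos += 1

    # Each block is already sorted by the trailing field, so a k-way merge
    # produces the combined order without a separate sort step.
    return heapq.merge(*(rows_for(value) for value in leading_values),
                       key=lambda row: row[1])


if __name__ == "__main__":
    index = sorted([
        ("CA", "Adams"), ("CA", "Young"), ("FL", "Baker"),
        ("NY", "Clark"), ("NY", "Zane"), ("TX", "Brown"),
    ])
    for row in stream_by_trailing(index, ["CA", "NY"]):
        print(row)
```

Because the result is a generator, a consumer that needs only the first few rows never touches the rest of the filtered data.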