ECL-ML Machine Learning Module

An extensible set of fully parallel Machine Learning (ML) and matrix processing algorithms to assist with business intelligence, covering supervised and unsupervised learning, document and text analysis, statistics and probabilities, and general inductive inference problems.

 

The ML project is designed to create an extensible library of fully parallel machine learning routines. This library leverages the distributed nature of the HPCC Systems architecture, giving extreme scalability to both the high-level implementation of the machine learning algorithms and the underlying matrix algebra library, scaling to tens of thousands of features on billions of training examples.

The existing code is in beta, and testing for different use cases continues.

Follow these simple steps to download and get started with ECL-ML:
  1. Download the ML Library BETA
     
  2. Extract the contents of the zip file to the ECL IDE source folder

    The ECL source folder is typically located at “C:\Users\Public\Documents\HPCC Systems\ECL\My Files”. To locate your source folder, refer to your ECL IDE preferences, located at Preferences -> Compiler -> ECL Folders.
     

  3. Reference the library in your ECL source using an import statement as shown below:
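    Example:
      IMPORT * FROM ML;
      IMPORT * FROM ML.Cluster;
      IMPORT * FROM ML.Types;

      // Seven points with two features each.
      // NumericField rows are {record id, field number, value}.
      x2 := DATASET([
        {1, 1, 1}, {1, 2, 5},
        {2, 1, 5}, {2, 2, 7},
        {3, 1, 8}, {3, 2, 1},
        {4, 1, 0}, {4, 2, 0},
        {5, 1, 9}, {5, 2, 3},
        {6, 1, 1}, {6, 2, 4},
        {7, 1, 9}, {7, 2, 4}], NumericField);

      // Three initial centroids in the same layout.
      c := DATASET([
        {1, 1, 1}, {1, 2, 5},
        {2, 1, 5}, {2, 2, 7},
        {3, 1, 9}, {3, 2, 4}], NumericField);

      // Cluster x2 using c as the starting centroids.
      x3 := Kmeans(x2, c);

      OUTPUT(x3);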
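The datasets in the example above hold one row per cell (record id, field number, value) rather than one row per point. The following is a minimal sketch, in plain ECL without the ML library, of how an ordinary record with one column per feature could be reshaped into that layout. The names PointRec, CellRec, pts, toCell and cells are hypothetical and used only for illustration, and CellRec is an assumed stand-in for NumericField; the library's own definition may use different field types.

```ecl
// Hypothetical input layout: one row per point, one column per feature.
PointRec := RECORD
  UNSIGNED id;
  REAL x;
  REAL y;
END;

pts := DATASET([{1, 1, 5}, {2, 5, 7}, {3, 8, 1}], PointRec);

// Assumed stand-in for NumericField: one row per (id, field number) cell.
CellRec := RECORD
  UNSIGNED id;
  UNSIGNED number;
  REAL value;
END;

// Emit cell c of point p: field 1 carries x, field 2 carries y.
CellRec toCell(PointRec p, UNSIGNED c) := TRANSFORM
  SELF.id := p.id;
  SELF.number := c;
  SELF.value := CHOOSE(c, p.x, p.y, 0);
END;

// Call the transform twice per input row, once per feature.
cells := NORMALIZE(pts, 2, toCell(LEFT, COUNTER));

OUTPUT(cells);
```

With three input points and two features each, this yields six cell rows, the same shape as the first rows of x2 in the example.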

 

Reference Material

The machine learning library contains an extensible collection of machine learning routines which are easy and efficient to use and are designed to execute in parallel across a cluster. The list of modules supported will continue to grow over time.

View the PDF

Have a question? Visit the Machine Learning forum.