

HPCC Systems ML library in action

Topics related to the set of Machine Learning libraries and Matrix processing algorithms

Tue Mar 13, 2012 6:19 pm

Below is a link to a Pinterest POV conducted by Engauge. They used the HPCC Systems platform and ECL to process the data for analysis, and several ML methods to generate results.

Check it out!
http://hpccsystems.com/Why-HPCC/case-st ... -pinterest
admin
Site Admin
Posts: 208
Joined: Thu Jan 27, 2011 10:58 am

Tue Apr 10, 2012 8:28 pm

Is there any performance and scalability data for this ML library? In general, how does HPCC Systems with ML compare with SAS? Are there any solid performance and feature comparisons?

Thanks,


sjz
szhou
Posts: 7
Joined: Tue Apr 10, 2012 8:13 pm

Wed Apr 11, 2012 1:32 pm

Szhou,

I personally don't have much experience with SAS in large-scale analytics, and I don't even know whether it performs properly (or at all) on distributed computing platforms.

We did benchmark scalability on ECL-ML, and it does scale linearly with the number of nodes in the cluster, achieving almost perfect parallelism.

Do you have any specific algorithms in mind? If so, and if you can provide some base cases, we could help you run some benchmarks too.

Thanks,

Flavio
flavio
Community Advisory Board Member
Posts: 73
Joined: Wed Apr 27, 2011 8:59 pm

Wed Apr 11, 2012 2:39 pm

Hi, Flavio:

Typically we use k-means and decision trees for analysis. If HPCC ML scales linearly for them, that would be great. Do you have the source code for the HPCC ML k-means? I am curious how it is implemented and how it achieves linear scalability.

Thanks,


SJZ
szhou
Posts: 7
Joined: Tue Apr 10, 2012 8:13 pm

Wed Apr 11, 2012 2:54 pm

Szhou,

Source code is available here - https://github.com/hpcc-systems/ecl-ml.
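The ECL code itself is in the repository linked above. As a language-neutral illustration of why k-means can scale with the number of nodes, here is a minimal Python sketch of the generic data-parallel pattern. It is an assumption for illustration, not the ECL-ML implementation: each node assigns its own points to the nearest centroid and returns only per-cluster sums and counts, so the raw data never moves and the combine step only handles k·d numbers per node. The names `partial_stats`, `combine` and `kmeans_iteration` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]


def sq_dist(p: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def partial_stats(partition: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Per-node work (hypothetical): assign each local point to its nearest
    centroid and accumulate per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        best = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts


def combine(stats: Sequence[Stats], centroids: Sequence[Point]) -> List[Point]:
    """Global step: add up the small per-node summaries and recompute centroids."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    new_centroids: List[Point] = []
    for j in range(k):
        if total_counts[j] == 0:
            new_centroids.append(tuple(centroids[j]))  # empty cluster: keep old centroid
        else:
            new_centroids.append(tuple(s / total_counts[j] for s in total_sums[j]))
    return new_centroids


def kmeans_iteration(partitions: Sequence[Sequence[Point]],
                     centroids: Sequence[Point]) -> List[Point]:
    # On a real cluster each partial_stats call would run on its own node in parallel;
    # here they simply run one after another.
    return combine([partial_stats(part, centroids) for part in partitions], centroids)


if __name__ == "__main__":
    parts = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans_iteration(parts, [(0.0, 0.0), (10.0, 10.0)]))  # prints [(0.0, 0.5), (10.0, 10.5)]
```

Splitting the same points across more partitions yields the same centroids, which is what allows the per-node share of the work to shrink as nodes are added.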

Thanks

Arjuna
arjuna chala
Community Advisory Board Member
Posts: 21
Joined: Mon Jul 11, 2011 1:57 pm

Wed Apr 11, 2012 3:31 pm

To be clear - the performance scales linearly with the number of NODES - so 10x as many nodes = 10x faster.
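The "10x as many nodes = 10x faster" rule of thumb can be written as speedup S = T_base / T_scaled and parallel efficiency E = S / node_ratio, where E = 1.0 means perfectly linear scaling. A minimal Python sketch; the function names are hypothetical and the timings in the usage line are made-up inputs, not ECL-ML measurements.

```python
def speedup(t_base: float, t_scaled: float) -> float:
    """S = T_base / T_scaled: how many times faster the same job ran."""
    return t_base / t_scaled


def parallel_efficiency(t_base: float, t_scaled: float, node_ratio: float) -> float:
    """E = S / node_ratio; 1.0 is perfectly linear scaling, below 1.0 is sub-linear."""
    return speedup(t_base, t_scaled) / node_ratio


if __name__ == "__main__":
    # Hypothetical timings: 100 s on N nodes, 10 s on 10*N nodes.
    print(speedup(100.0, 10.0), parallel_efficiency(100.0, 10.0, 10))  # prints 10.0 1.0
```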

The algorithm itself is NOT linear. If you look at the different distance metrics we support, some of them have a cost significantly lower than the standard kN (especially for sparse data).
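The post does not say which metrics are meant, so the following is only a generic illustration (an assumption, not ECL-ML code) of why sparse data helps: if vectors store only their non-zero entries, a squared Euclidean distance can be computed by visiting len(a) + len(b) stored entries instead of every dimension of the space. `SparseVec` and `sparse_sq_euclidean` are hypothetical names.

```python
from typing import Dict

# Hypothetical sparse representation: dimension index -> non-zero value.
SparseVec = Dict[int, float]


def sparse_sq_euclidean(a: SparseVec, b: SparseVec) -> float:
    """Squared Euclidean distance touching only stored (non-zero) entries.

    Work is proportional to len(a) + len(b), independent of the nominal
    dimensionality of the space."""
    total = 0.0
    for i, va in a.items():
        diff = va - b.get(i, 0.0)  # dimensions missing from b are zero
        total += diff * diff
    for i, vb in b.items():
        if i not in a:  # dimensions present only in b contribute vb**2
            total += vb * vb
    return total


if __name__ == "__main__":
    a = {0: 1.0, 5: 2.0}
    b = {5: 1.0, 999_999: 3.0}
    print(sparse_sq_euclidean(a, b))  # prints 11.0
```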

Note: we have decision tree support on the master branch - it is not yet in our official release ...

David
dabayliss
Community Advisory Board Member
Posts: 109
Joined: Fri Apr 29, 2011 1:35 pm

Wed Apr 11, 2012 4:57 pm

Hi,

Looking at the source of hpcc/ML/k-means, I wonder how long it took you to implement it. Have you compared its absolute performance against a corresponding serial C implementation and/or a C implementation with MPI?
szhou
Posts: 7
Joined: Tue Apr 10, 2012 8:13 pm

Thu Apr 12, 2012 5:50 pm

Szhou,

I'm not familiar with the implementations you mention, but I did run some comparisons with MATLAB/Octave (and also some code that I wrote in Python), and our current ECL implementation was faster, even for reasonably small data.

In addition to our ECL-ML libraries, we currently have an alpha/beta-stage HPCC integration with Ismion's PaperBoat ML library (http://ismion.com/documentation/paperboat/introduction.html), in case you want to check it out too: http://ismion.com/documentation/ecl-pb/index.html.

Please let me know.

Thanks,

Flavio
flavio
Community Advisory Board Member
Posts: 73
Joined: Wed Apr 27, 2011 8:59 pm

Thu Apr 12, 2012 6:03 pm

Numerical performance in Python and MATLAB can be poor if they do not use C libraries internally. Another fair comparison for hpcc/ML would be against Java/Hadoop/Mahout. The integration of PaperBoat with ECL is interesting.
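To illustrate the point that interpreted code is slow unless the heavy loop runs in compiled code, here is a minimal standard-library-only Python sketch. The C-implemented builtins `sum`, `map` and `operator.mul` stand in for "C libraries" such as a BLAS; the function names are hypothetical, and no timing numbers are claimed since they depend on the machine and interpreter.

```python
import operator
import random
import time
from typing import Callable, Sequence


def dot_python_loop(a: Sequence[float], b: Sequence[float]) -> float:
    # Every multiply-add is executed by the bytecode interpreter.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_builtin(a: Sequence[float], b: Sequence[float]) -> float:
    # Iteration, multiplication and summation all run inside C-implemented builtins.
    # Results may differ from the loop in the last bits (summation order/algorithm).
    return sum(map(operator.mul, a, b))


def time_it(fn: Callable[..., float], *args, repeat: int = 3) -> float:
    """Best wall-clock time over `repeat` runs, in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1_000_000
    a = [rng.random() for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    print(f"python loop: {time_it(dot_python_loop, a, b):.3f}s")
    print(f"builtins:    {time_it(dot_builtin, a, b):.3f}s")
```

The same gap, usually much larger, appears when a real numerical library replaces the builtins, which is why benchmarks against Python or MATLAB depend heavily on how the baseline code was written.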

Thanks,

Shujia
szhou
Posts: 7
Joined: Tue Apr 10, 2012 8:13 pm

Thu Apr 12, 2012 6:07 pm

I agree! Please take a look at PaperBoat and let me know if we can work together on a fair benchmark between ECL-ML/PaperBoat and Mahout...

Flavio
flavio
Community Advisory Board Member
Posts: 73
Joined: Wed Apr 27, 2011 8:59 pm
