

ECL ML Kmeans - SKEW error

Questions around writing code and queries

Tue Aug 14, 2018 5:45 pm

System error: 10084: Graph graph2[11], sort[19]: SORT failed. Graph graph2[11], sort[19]: Exceeded skew limit: 0.250000, estimated skew: 1.000000


When I try to cluster 800k elements on a 4-node cluster using KMeans, I get the error above.


IMPORT STD;
IMPORT $;
IMPORT * FROM ML;
IMPORT * FROM ML.Cluster;
IMPORT * FROM ML.Types;

// For privacy reasons, part of the code cannot be presented.

// result is a set of 800k 300-dimensional vectors
// c is a set of 3 300-dimensional vectors used as the initial centroids

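// Convert both datasets to the ML NumericField format
ToField(result,dresult);
ToField(c,dc);

// 30 = maximum number of iterations, 0.3 = convergence threshold, cosine distance
target := KMeans(dresult,dc,30,0.3,fDist:= DF.Cosine);

output(target);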

maniblitz
 
Posts: 2
Joined: Tue Aug 14, 2018 5:34 pm

Wed Aug 15, 2018 7:10 pm

maniblitz,
SORT failed. Graph graph2[11], sort[19]: Exceeded skew limit: 0.250000, estimated skew: 1.000000
Since I don't see a SORT in your posted code, I have to assume it's in that code "### for privacy reasons, part of the code cannot be presented ###" section.

So, just addressing the error message: it's telling you the estimated skew is 1.0 and the maximum allowed skew is 0.25. That tells me all the data is being SORTed on a single node for some reason. Is it possible that every record has the same value in the SORT expression(s)?
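One way to test that hypothesis is to count how many records each Thor node holds after distributing on the suspect key. The sketch below is illustrative only and not code from this thread: the layout `Rec`, the field `constKey`, and the generated data are hypothetical. It hash-distributes on a key that is identical for every record and then tags each row with `STD.System.Thorlib.Node()`; you would expect every row to land on one node, the situation an estimated skew of 1.0 points to.

```ecl
IMPORT STD;

// Hypothetical layout: every record carries the same key value.
Rec := RECORD
    UNSIGNED4 id;
    UNSIGNED4 constKey;
END;

// 100,000 generated rows, all with constKey = 1.
ds := DATASET(100000, TRANSFORM(Rec, SELF.id := COUNTER, SELF.constKey := 1));

// Hashing a constant sends every row to the same node.
skewed := DISTRIBUTE(ds, HASH32(constKey));

// Tag each row with the number of the node it currently lives on.
NodeRec := RECORD
    UNSIGNED4 node;
END;
tagged := PROJECT(skewed, TRANSFORM(NodeRec, SELF.node := STD.System.Thorlib.Node()));

// Rows per node; a single group means all the data sits on one node.
perNode := TABLE(tagged, {node, UNSIGNED8 cnt := COUNT(GROUP)}, node);
OUTPUT(perNode);
```

Running the same per-node count on your real data, keyed on whatever the failing SORT uses, shows whether the input itself is skewed.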

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1606
Joined: Wed Oct 26, 2011 7:40 pm

Mon Aug 20, 2018 2:33 pm

The masked area only defines two datasets; no transformations are applied to them. There is no SORT, TRANSFORM or DISTRIBUTE involved in that part. However, after checking the KMeans function in the GitHub repository, I found that the KMeans algorithm itself uses a SORT.

The masked code has no problems: it was tested before and produced the expected results. I would like to know whether there is a way to change the skew limit for the KMeans function without having to modify the ECL ML source code.
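For a SORT you write yourself, ECL's SORT accepts a `SKEW(limit)` option that overrides the skew limit for that one activity. As far as this thread shows, that option cannot be applied from the outside to the SORT inside KMeans; there it would still take an edit to the ML source or better-distributed input. A minimal sketch, with a hypothetical layout `Rec` and hypothetical data:

```ecl
// Hypothetical layout and data, for illustration only.
Rec := RECORD
    UNSIGNED4 id;
    UNSIGNED4 grp;
END;

ds := DATASET([{1, 7}, {2, 7}, {3, 7}, {4, 3}], Rec);

// Global sort with a relaxed skew limit for this activity only
// (the error in this thread reports a limit of 0.25).
sorted := SORT(ds, grp, id, SKEW(1.0));
OUTPUT(sorted);
```

Relaxing the limit only lets a skewed job continue; it does not spread the work, so distributing the input evenly is usually the better fix.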
maniblitz
 
Posts: 2
Joined: Tue Aug 14, 2018 5:34 pm

Tue Aug 21, 2018 4:10 pm

I would try making sure the records are distributed before calling Kmeans.
For example:
dresult2 := DISTRIBUTE(dresult, id);
dc2 := DISTRIBUTE(dc, id);

Then use dresult2 and dc2 as the input to Kmeans.
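A self-contained sketch of the same idea, assuming a hypothetical row-per-dimension layout in which several rows share one `id` (one vector). It distributes on `HASH32(id)`, the hashing style the ECL documentation typically uses with DISTRIBUTE, which spreads the vectors across nodes while keeping all rows of one vector together. The distributed datasets would then be passed to KMeans in place of the originals.

```ecl
// Hypothetical layout: one row per (vector id, dimension) pair.
Rec := RECORD
    UNSIGNED4 id;
    UNSIGNED4 number;
    REAL8     value;
END;

// 1,000 vectors x 3 dimensions, generated without the DISTRIBUTED
// option, so the rows start out on a single node.
raw := DATASET(3000, TRANSFORM(Rec,
                               SELF.id     := (COUNTER - 1) DIV 3 + 1,
                               SELF.number := (COUNTER - 1) % 3 + 1,
                               SELF.value  := COUNTER / 3000.0));

// Spread the vectors across all nodes; rows with the same id stay together.
distributed := DISTRIBUTE(raw, HASH32(id));
OUTPUT(COUNT(distributed));
```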

If that fails, it would be helpful if you could look at the graph (using ECL Watch) and try to identify the line of code represented by the SORT (i.e. graph2, subgraph 11, activity 19).
Roger Dev
 
Posts: 2
Joined: Tue Aug 21, 2018 4:05 pm

Tue Aug 21, 2018 4:34 pm

Did you randomly distribute your data across all the nodes of your cluster before executing kmeans? If not, that is probably why the sort exceeded the skew limit.
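ECL's DISTRIBUTE also has a form with no expression, which redistributes records randomly across the nodes. A minimal sketch with a hypothetical layout and generated data; note that for row-per-dimension data, a random distribution also separates the rows of one vector, so hashing on the vector id may suit later joins better.

```ecl
// Hypothetical layout and data, for illustration only.
Rec := RECORD
    UNSIGNED4 id;
    REAL8     value;
END;

// 10,000 rows generated without the DISTRIBUTED option (one node).
ds := DATASET(10000, TRANSFORM(Rec, SELF.id := COUNTER, SELF.value := COUNTER * 0.5));

// No expression: records are redistributed randomly across all nodes.
spread := DISTRIBUTE(ds);
OUTPUT(spread);
```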
tlhumphrey2
 
Posts: 260
Joined: Mon May 07, 2012 6:23 pm

Tue May 05, 2020 10:30 pm

I am currently having the exact same issue when I try to run LogisticRegression.

System error: 10084: Graph graph1[431], sort[433]: SORT failed. Graph graph1[431], sort[433]: Exceeded skew limit: 0.250000, estimated skew: 1.000000

The call to the Sort() function in ML_Core > Analysis.ecl is producing this error.
tpay
 
Posts: 5
Joined: Tue May 05, 2020 10:28 pm

Wed May 06, 2020 1:41 pm

Tpay,

Did you try the DISTRIBUTE as Roger suggested? If you did and you are still seeing the error, a JIRA may be needed so we can investigate further.

Regards,

Bob
bforeman
Community Advisory Board Member
 
Posts: 1005
Joined: Wed Jun 29, 2011 7:13 pm

Wed May 06, 2020 4:34 pm

Hi Bob,

When I use DISTRIBUTE, ML.Analysis.Classification.Accuracy works, but ML.Analysis.Classification.AccuracyByClass still produces the same error.

Thanks
Tayfun Pay
tpay
 
Posts: 5
Joined: Tue May 05, 2020 10:28 pm

Thu May 07, 2020 1:18 pm

When I use distribute ML.Analysis.Classification.Accuracy works, but ML.Analysis.Classification.AccuracyByClass still produces the same error.


Thank you, Tayfun! Would you please open a Jira issue for this, including sample code if possible and the steps to reproduce?
https://track.hpccsystems.com/secure/Dashboard.jspa

There is a category there for Machine Learning.

Thank You!

Bob
bforeman
Community Advisory Board Member
 
Posts: 1005
Joined: Wed Jun 29, 2011 7:13 pm

Wed May 20, 2020 8:24 pm

Hi Bob, is it possible for you to take a look at this as well? viewtopic.php?uid=4453&f=10&t=8093&start=0 I can email you the ZAP reports and forward my email exchanges with Roger. Thanks, Tayfun
tpay
 
Posts: 5
Joined: Tue May 05, 2020 10:28 pm
