ECL ML Kmeans - SKEW error
Code:
System error: 10084: Graph graph2[11], sort[19]: SORT failed. Graph graph2[11], sort[19]: Exceeded skew limit: 0.250000, estimated skew: 1.000000
After trying to cluster 800k elements on a 4-node cluster using KMeans, I got the error listed above.
Code:
IMPORT STD;
IMPORT $;
IMPORT * FROM ML;
IMPORT * FROM ML.Cluster;
IMPORT * FROM ML.Types;
### for privacy reasons, part of the code cannot be presented ###
// result is a set of 800k 300-dimensional vectors
// c is a set of three 300-dimensional vectors used as centroids
ToField(result,dresult);
ToField(c,dc);
target := KMeans(dresult,dc,30,0.3,fDist:= DF.Cosine);
output(target);
- maniblitz
- Posts: 2
- Joined: Tue Aug 14, 2018 5:34 pm
maniblitz,
SORT failed. Graph graph2[11], sort[19]: Exceeded skew limit: 0.250000, estimated skew: 1.000000
Since I don't see a SORT in your posted code, I have to assume it's in that "### for privacy reasons, part of the code cannot be presented ###" section.
So, just addressing the error message, it's telling you the skew is 1.0 and the max skew should be 0.25 -- which tells me that all the data is being SORTed on a single node for some reason. Is it possible that every record has the same value in the SORT expression(s)?
HTH,
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm
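One way to check whether the records really are concentrated on a single node is to count how many records each Thor node holds before the failing step. The following is a minimal sketch, not code from the thread: the `rec` layout and the generated `ds` are hypothetical stand-ins for the real input, and it assumes a LOCAL aggregate TABLE combined with `STD.System.Thorlib.Node()` yields one row per node.

```ecl
IMPORT STD;

// Hypothetical stand-in data; substitute the dataset that is passed to KMeans.
rec := {UNSIGNED id, REAL8 value};
ds := DATASET(1000, TRANSFORM(rec,
                              SELF.id := COUNTER,
                              SELF.value := COUNTER / 10));

// LOCAL makes each node aggregate only its own records,
// so the result is expected to hold one row per node.
perNode := TABLE(ds,
                 {UNSIGNED4 node := STD.System.Thorlib.Node(),
                  UNSIGNED8 cnt  := COUNT(GROUP)},
                 LOCAL);

OUTPUT(perNode, NAMED('RecordsPerNode'));
```

If one node reports nearly all of the records, the data was already skewed going into the SORT.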
The masked area contains two datasets with no transformations applied to them. No sort, transform, or distribute function is involved in that part. Nevertheless, after checking the KMeans function in the GitHub repository, I found that the KMeans algorithm itself involves a SORT.
The masked code has no problem: it was tested before and produced the anticipated results. I wish to know whether there is a way to modify the skew limit for the KMeans function without having to completely modify the ECL ML source code.
- maniblitz
- Posts: 2
- Joined: Tue Aug 14, 2018 5:34 pm
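On the skew-limit question, the ECL language itself allows a SORT to carry a SKEW option that sets the maximum skew tolerated before the job fails. The sketch below only illustrates that language feature on a hypothetical dataset (`rec` and `ds` are made up for the example). It is not taken from the ML library, and applying it to the SORT inside KMeans would presumably still mean editing the library source.

```ecl
// Hypothetical data in which every record has the same sort value,
// the situation most likely to push a global SORT onto one node.
rec := {UNSIGNED id, UNSIGNED4 grp};
ds := DATASET(1000, TRANSFORM(rec,
                              SELF.id := COUNTER,
                              SELF.grp := 1));

// SKEW(1.0) raises the allowed skew for this one SORT to 100%.
sorted := SORT(ds, grp, SKEW(1.0));

OUTPUT(COUNT(sorted), NAMED('SortedCount'));
```

Raising the limit only suppresses the failure. The sort still runs on unevenly spread data, so redistributing the input is usually the first thing to try.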
I would try making sure the records are distributed before calling Kmeans.
For example:
dresult2 := DISTRIBUTE(dresult, id);
dc2 := DISTRIBUTE(dc, id);
Then use dresult2 and dc2 as the input to Kmeans.
If that fails, then it would be helpful if you could look into the graph (using ECLWatch) and try to identify the line of code represented by the sort (i.e. graph2, subgraph 11, activity 19).
- Roger Dev
- Posts: 2
- Joined: Tue Aug 21, 2018 4:05 pm
Did you randomly distribute your data across all the nodes of your cluster before executing KMeans? If not, that is probably why the sort exceeded the skew limit.
- tlhumphrey2
- Posts: 260
- Joined: Mon May 07, 2012 6:23 pm
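The two distribution suggestions in this thread (by id, or at random) can be sketched as follows. This is only an illustration under stated assumptions: the `rec` layout and generated `ds` are hypothetical, loosely modeled on an id/number/value field layout, and hashing the id with HASH32 is a common variation on the plain `DISTRIBUTE(ds, id)` shown earlier rather than something either poster wrote.

```ecl
// Hypothetical stand-in data; substitute the real field dataset.
rec := {UNSIGNED id, UNSIGNED4 number, REAL8 value};
ds := DATASET(1000, TRANSFORM(rec,
                              SELF.id := COUNTER,
                              SELF.number := 1,
                              SELF.value := COUNTER));

// Option 1: hash on id, so all records sharing an id land on the same node.
byId := DISTRIBUTE(ds, HASH32(id));

// Option 2: random spread, which evens out node counts
// but does not keep records with the same id together.
atRandom := DISTRIBUTE(ds, RANDOM());

OUTPUT(COUNT(byId), NAMED('CountById'));
OUTPUT(COUNT(atRandom), NAMED('CountRandom'));
```

Which option fits depends on what the downstream code expects: logic that processes each id locally needs the first form, while the second only guarantees an even spread.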
I am currently having the same exact issue when I try to run LogisticRegression.
System error: 10084: Graph graph1[431], sort[433]: SORT failed. Graph graph1[431], sort[433]: Exceeded skew limit: 0.250000, estimated skew: 1.000000
The call to the Sort() function in ML_Core > Analysis.ecl is producing this error.
- tpay
- Posts: 5
- Joined: Tue May 05, 2020 10:28 pm
Tpay,
Did you try the DISTRIBUTE as Roger suggested? If you did and you are still seeing the error, a JIRA may be needed so we can investigate further.
Regards,
Bob
- bforeman
- Community Advisory Board Member
- Posts: 1006
- Joined: Wed Jun 29, 2011 7:13 pm
Hi Bob,
When I use distribute ML.Analysis.Classification.Accuracy works, but ML.Analysis.Classification.AccuracyByClass still produces the same error.
Thanks
Tayfun Pay
- tpay
- Posts: 5
- Joined: Tue May 05, 2020 10:28 pm
When I use distribute ML.Analysis.Classification.Accuracy works, but ML.Analysis.Classification.AccuracyByClass still produces the same error.
Thank you Tayfun! Would you please open a Jira issue on this, with sample code if possible and steps to reproduce contained in the report?
https://track.hpccsystems.com/secure/Dashboard.jspa
There is a category there for Machine Learning.
Thank You!
Bob
- bforeman
- Community Advisory Board Member
- Posts: 1006
- Joined: Wed Jun 29, 2011 7:13 pm
Hi Bob,
Is it possible for you to take a look at this as well? viewtopic.php?uid=4453&f=10&t=8093&start=0
I can email you the Zap reports and forward my email exchanges with Roger.
Thanks
Tayfun
- tpay
- Posts: 5
- Joined: Tue May 05, 2020 10:28 pm