Reducing Skew


Thu Dec 20, 2012 10:30 pm

Hello,

I'm facing an issue wherein my data file is extremely skewed (+2900%, -100%) across slaves.

This is in spite of doing a HASH32 distribute on two fields (one of which admittedly has lots of 0s, while the other is a mostly unique 38-digit integer).
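
For anyone reading the skew figures above: they are usually read as the percentage deviation of the largest and smallest file part from the mean part size. Here is a minimal sketch of that arithmetic in Python. The function name and the 30-node layout are hypothetical, chosen only because they reproduce the quoted numbers; the actual cluster size is not stated in this thread.

```python
def skew_percentages(counts):
    """Return (max_skew, min_skew) as percentage deviation from the mean part size."""
    mean = sum(counts) / len(counts)
    max_skew = (max(counts) - mean) / mean * 100
    min_skew = (min(counts) - mean) / mean * 100
    return max_skew, min_skew


# Hypothetical layout: 30 nodes, with every record sitting on a single node.
counts = [3000] + [0] * 29
print(skew_percentages(counts))  # (2900.0, -100.0)
```

Read this way, -100% means at least one node holds no records at all, and +2900% means the fullest node holds 30 times the average. Many other layouts would produce the same pair of figures.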

Is there anything that I can do to reduce this skew? I could:
1. Exclude the field with 0s from the hash key
2. Use some other hashing function

Will be happy to provide whatever other information you need.
sban
 
Posts: 3
Joined: Fri Dec 14, 2012 7:08 am

Fri Dec 21, 2012 6:54 pm

sban,

I would think your #1 (excluding the field with 0s from the hash key) would be your best/simplest/easiest option.
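
To illustrate how much the choice of hash key can matter, here is a rough simulation in Python rather than ECL. Everything in it is an assumption made for illustration: `zlib.crc32` is only a stand-in for a 32-bit hash and is not ECL's HASH32, the four-node cluster and the record layout are made up, and `node_for` and `part_counts` are hypothetical helpers. It only compares keying on a low-cardinality field with keying on a mostly unique one; it does not reproduce the two-field hash from the original post.

```python
import zlib
from collections import Counter


def node_for(key: str, n_nodes: int) -> int:
    # Stand-in 32-bit hash (NOT ECL's HASH32), reduced modulo the node count.
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def part_counts(records, key_fn, n_nodes):
    """Count how many records each node would receive for a given key function."""
    counts = Counter(node_for(key_fn(r), n_nodes) for r in records)
    return [counts.get(n, 0) for n in range(n_nodes)]


# Hypothetical records: (zero_heavy_field, mostly_unique_38_digit_id)
records = [(0, str(10**37 + i)) for i in range(1000)]

# Keyed only on the zero-heavy field: identical keys all hash to the same node.
by_zero_field = part_counts(records, lambda r: str(r[0]), 4)

# Keyed only on the mostly unique id: records spread across the nodes.
by_unique_id = part_counts(records, lambda r: r[1], 4)

print(by_zero_field)
print(by_unique_id)
```

The general point is that records with identical key values always land on the same node, so the more distinct values the key has, the more evenly a hash distribute can spread the data.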

HTH,

Richard
rtaylor
Community Advisory Board Member

