
Reducing Skew


Thu Dec 20, 2012 10:30 pm


I'm facing an issue where my data file is extremely skewed (+2900%, -100%) across slaves.

This is in spite of doing a hash32 distribute on two fields (one of which admittedly has lots of 0s, while the other is a mostly unique 38-digit integer).

Is there anything that I can do to reduce this skew? I could:
1. Exclude the field with 0s from the hash key
2. Use some other hashing function

I'll be happy to provide whatever other information you need.

Fri Dec 21, 2012 6:54 pm


I would think your option #1 would be the best, simplest, and easiest one to try.
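
Before changing the job, it may help to check on a sample which hash key actually reproduces the skew. Below is a rough Python sketch of that check, not the cluster's real behaviour: `zlib.crc32` is only a stand-in for hash32, the 30-node count and the synthetic rows (a flag that is 0 about 95% of the time plus a random 38-digit integer) are assumptions, and the names `node_for` and `skew_percent` are made up for illustration. Skew is taken here as the percent deviation of the fullest and emptiest node from the average rows per node, which is how I read the +2900% / -100% figures.

```python
import random
import zlib
from collections import Counter

NODES = 30  # hypothetical cluster size, not taken from the thread


def node_for(key, nodes=NODES):
    # crc32 is only a stand-in for the cluster's real hash function.
    return zlib.crc32(repr(key).encode("utf-8")) % nodes


def skew_percent(keys, nodes=NODES):
    """Return (max_skew, min_skew): percent deviation of the fullest and
    emptiest node from the mean number of rows per node."""
    keys = list(keys)
    counts = Counter(node_for(k, nodes) for k in keys)
    mean = len(keys) / nodes
    per_node = [counts.get(n, 0) for n in range(nodes)]
    return ((max(per_node) - mean) / mean * 100.0,
            (min(per_node) - mean) / mean * 100.0)


random.seed(0)
# Synthetic rows (assumed shape): flag is 0 about 95% of the time,
# big_id is a random 38-digit integer.
rows = [(0 if random.random() < 0.95 else random.randint(1, 5),
         random.randrange(10**37, 10**38))
        for _ in range(30_000)]

# Candidate distribution keys: both fields, the big id alone, the flag alone.
both_fields = skew_percent((flag, big_id) for flag, big_id in rows)
id_only = skew_percent((big_id,) for _, big_id in rows)
flag_only = skew_percent((flag,) for flag, _ in rows)

for name, (high, low) in [("both fields", both_fields),
                          ("id only", id_only),
                          ("flag only", flag_only)]:
    print(f"{name:12s} max {high:+.0f}%  min {low:+.0f}%")
```

Swapping the synthetic `rows` for a sample of the real records (and the real field values as they are stored) should show whether dropping the 0-heavy field is enough, or whether the large integer field is hashing less evenly than expected.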


Community Advisory Board Member
