Join Options (PARTITION LEFT and PARTITION RIGHT)
Hi Team,
Can anyone explain the functionality of PARTITION LEFT and PARTITION RIGHT with an example?
Thanks & Regards,
Manikandan N
- Daniel_mani
- Posts: 11
- Joined: Wed Jan 23, 2019 2:15 pm
Manikandan N,
Here's an example: if you're joining by "lastname, firstname", then for the JOIN to work, all the "Tom Jones" records from both datasets have to be on the same node. That means JOIN moves data between nodes as needed to accomplish the task.
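To make the "same node" requirement concrete, here is a minimal ECL sketch that does the co-location by hand: it redistributes both datasets on the join key and then joins with LOCAL. The record layout, definition names, and inline rows are hypothetical, and this only illustrates the idea; it is not meant as a description of what a plain JOIN does internally.

```ecl
// Hypothetical layout and inline rows, for illustration only.
NameRec := RECORD
  STRING20  lastname;
  STRING20  firstname;
  UNSIGNED4 id;
END;

People := DATASET([{'Jones', 'Tom', 1},
                   {'Smith', 'Ann', 2}], NameRec);
Visits := DATASET([{'Jones', 'Tom', 100},
                   {'Jones', 'Tom', 101}], NameRec);

// Hash on the join key so every "Jones, Tom" row from either dataset
// is sent to the same node.
PeopleDist := DISTRIBUTE(People, HASH32(lastname, firstname));
VisitsDist := DISTRIBUTE(Visits, HASH32(lastname, firstname));

// Keep the name from the LEFT row and take the id from the RIGHT row.
NameRec KeepMatch(NameRec L, NameRec R) := TRANSFORM
  SELF.id := R.id;
  SELF    := L;
END;

// LOCAL tells JOIN that matching rows are already on the same node,
// so each node joins only the rows it holds.
Matched := JOIN(PeopleDist, VisitsDist,
                LEFT.lastname = RIGHT.lastname AND
                LEFT.firstname = RIGHT.firstname,
                KeepMatch(LEFT, RIGHT),
                LOCAL);

OUTPUT(Matched);
```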
PARTITION LEFT (the default behavior) says that the distribution of the data from both datasets is determined by the LEFT dataset, while PARTITION RIGHT says that it is determined by the RIGHT dataset.
So, if you're JOINing a 10-billion-record dataset to a 20-million-record dataset, the most even distribution of all data from both datasets would be the one determined by the larger dataset. Generally, you would make the larger one the LEFT dataset and go with the default partitioning, but if you have some particular need for the larger file to be the RIGHT dataset, then you should specify PARTITION RIGHT on the JOIN.
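As a rough sketch of where the flag goes, the ECL below joins a small dataset (LEFT) to a larger one (RIGHT), first with the default partitioning and then with PARTITION RIGHT. The layouts, definition names, and inline rows are hypothetical stand-ins for the real files; with a handful of rows the two forms will behave the same, and the flag should only change how the work is spread across nodes, not which rows match.

```ecl
// Hypothetical layouts and inline rows standing in for the real files.
PersonRec := RECORD
  STRING20  lastname;
  STRING20  firstname;
  UNSIGNED4 personid;
END;

OrderRec := RECORD
  STRING20  lastname;
  STRING20  firstname;
  UNSIGNED4 orderid;
END;

OutRec := RECORD
  STRING20  lastname;
  STRING20  firstname;
  UNSIGNED4 personid;
  UNSIGNED4 orderid;
END;

SmallDS := DATASET([{'Jones', 'Tom', 1},
                    {'Smith', 'Ann', 2}], PersonRec);
BigDS   := DATASET([{'Jones', 'Tom', 100},
                    {'Jones', 'Tom', 101},
                    {'Smith', 'Ann', 102}], OrderRec);

// Name and personid come from the LEFT row, orderid from the RIGHT row.
OutRec JoinThem(PersonRec L, OrderRec R) := TRANSFORM
  SELF.orderid := R.orderid;
  SELF         := L;
END;

// Default (same as writing PARTITION LEFT): the distribution is
// determined by the LEFT dataset, here the small one.
ByLeft := JOIN(SmallDS, BigDS,
               LEFT.lastname = RIGHT.lastname AND
               LEFT.firstname = RIGHT.firstname,
               JoinThem(LEFT, RIGHT));

// PARTITION RIGHT: the distribution is determined by the RIGHT
// dataset, here the large one.
ByRight := JOIN(SmallDS, BigDS,
                LEFT.lastname = RIGHT.lastname AND
                LEFT.firstname = RIGHT.firstname,
                JoinThem(LEFT, RIGHT),
                PARTITION RIGHT);

OUTPUT(ByLeft,  NAMED('DefaultPartitionLeft'));
OUTPUT(ByRight, NAMED('PartitionRight'));
```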
HTH,
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm
Thanks for the clarification, Richard.
Regards,
Manikandan N
- Daniel_mani
- Posts: 11
- Joined: Wed Jan 23, 2019 2:15 pm