
Question: behaviour of LOCAL JOINs with FULL ONLY

Comments and questions related to the Enterprise Control Language

Wed May 13, 2020 4:03 pm


I'm comparing two DATASETs and want to find differences (any differences).
I've started my implementation attempting to use:
dBase      := DISTRIBUTE(base);          // On whole record
dcandidate := DISTRIBUTE(candidate);
JOIN(dbase,dcandidate,<equality comparison on every field>,FULL ONLY,LOCAL);

I'm trying to make sense of the results I'm getting back.
Some differing records return two results, one from the LEFT and one from the RIGHT, but other differing records return just a single result, from either the LEFT or the RIGHT.

Can someone explain this behaviour?
Say there is a single letter-case difference in a STRING field between the two DATASETs. The DISTRIBUTE can then do one of two things: allocate the two records to the same node, or allocate them to different nodes.
Is that the crux of the difference? I can think of no other.



Wed May 13, 2020 4:17 pm

Just to add: if I run the JOIN non-LOCAL, i.e. across all nodes, I always get two records back for each differing record.
But I would still like to understand the LOCAL behaviour.

Wed May 13, 2020 7:24 pm


The form of DISTRIBUTE you're using is the "random" version, which basically distributes the records based on a hash of the entire record.

Since your JOIN is LOCAL, my guess/explanation would be that the "single" results you're seeing are those records where the left or right "matching" record ended up on a different node.
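
One way to check that guess is to tag every record with the node it landed on after the DISTRIBUTE and compare the two outputs. This is only a rough sketch: the layout (Rec, id, name), the inline data, and the names NodeRec and AddNode are made up for illustration, and it needs to run on a multi-node Thor for the node numbers to mean anything.

```ecl
IMPORT Std;

// Hypothetical layout and data; substitute the real record structure.
Rec := RECORD
  UNSIGNED4 id;
  STRING10  name;
END;

base      := DATASET([{1, 'alpha'}, {2, 'beta'}], Rec);
candidate := DATASET([{1, 'alpha'}, {2, 'Beta'}], Rec);

// Same fields as Rec plus the node number the record is sitting on.
NodeRec := RECORD
  UNSIGNED4 node;
  UNSIGNED4 id;
  STRING10  name;
END;

NodeRec AddNode(Rec L) := TRANSFORM
  SELF.node := Std.System.Thorlib.Node();  // zero-based node number
  SELF      := L;
END;

// Same single-parameter DISTRIBUTE as in the original post.
OUTPUT(PROJECT(DISTRIBUTE(base),      AddNode(LEFT)), NAMED('BaseNodes'));
OUTPUT(PROJECT(DISTRIBUTE(candidate), AddNode(LEFT)), NAMED('CandidateNodes'));
```

If two records that you expect to pair up show different node numbers, a LOCAL JOIN will never compare them with each other.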

Since your global JOIN version works correctly, I'm pretty sure that's the reason.
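
If you want to keep the JOIN LOCAL, a common approach is to DISTRIBUTE both datasets with the same explicit hash on a field that appears in the join condition, so that any two records able to match are guaranteed to be on the same node. A minimal sketch, assuming a hypothetical layout with an identifying id field (Rec, DiffRec, TagSide and the inline data are all made up for illustration):

```ecl
// Hypothetical layout and data; substitute the real record structure.
Rec := RECORD
  UNSIGNED4 id;
  STRING10  name;
END;

base      := DATASET([{1, 'alpha'}, {2, 'beta'}, {3, 'gamma'}], Rec);
candidate := DATASET([{1, 'alpha'}, {2, 'Beta'}, {3, 'gamma'}], Rec);

// Same hash expression on both sides, so records with equal id values
// always end up on the same node.
dBase      := DISTRIBUTE(base,      HASH32(id));
dCandidate := DISTRIBUTE(candidate, HASH32(id));

DiffRec := RECORD
  STRING5   side;
  UNSIGNED4 id;
  STRING10  name;
END;

// In a FULL ONLY join the side without a record arrives blank/zero.
// Assumption: id 0 never occurs in the real data.
DiffRec TagSide(Rec L, Rec R) := TRANSFORM
  SELF.side := IF(L.id <> 0, 'LEFT', 'RIGHT');
  SELF.id   := IF(L.id <> 0, L.id, R.id);
  SELF.name := IF(L.id <> 0, L.name, R.name);
END;

diffs := JOIN(dBase, dCandidate,
              LEFT.id = RIGHT.id AND LEFT.name = RIGHT.name,
              TagSide(LEFT, RIGHT),
              FULL ONLY, LOCAL);

OUTPUT(diffs, NAMED('Differences'));
```

The hash only needs to cover fields that are compared for equality in the join condition; the condition itself can still compare every field.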


