Smart Join
checkCorrectionKey := join(InputFile, CorrectionIndex,
keyed(left.DID = right.DID),
transform(left), smart);
"InputFile" may have one record or 15 billion records (as per one dataset we have in prod today).
"CorrectionIndex" currently has 300 records; it will eventually grow, but not too fast or too big.
With that said, is the SMART option a good choice for this join, or is there a better approach?
Also, when tested this join it gave warning:
Warning: Smart specified on an unfiltered keyed join - was this intended?
- HPCC_KK
- Posts: 2
- Joined: Mon Nov 15, 2021 5:32 pm
HPCC_KK,
Personally, given that your CorrectionIndex has only 300 records and is expected to grow slowly, I would not use JOIN at all. I'd do it more like this:

Code:
SetCorrectionKey := SET(CorrectionIndex, DID);
checkCorrectionKey := InputFile(DID IN SetCorrectionKey);

Your inner JOIN accomplishes exactly the same thing, since your TRANSFORM(LEFT) just gives you the records from InputFile whose DID matches the set of DID values in CorrectionIndex.
HTH,
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm
CorrectionIndex has 300 records as of last week, but it will grow rapidly in a few months and could reach 10k+ records. I don't think SET is an option for 10k+ records. I am thinking of a LOOKUP, HASH, or SMART join instead. Please suggest.
Thanks Richard.
- HPCC_KK
- Posts: 2
- Joined: Mon Nov 15, 2021 5:32 pm
HPCC_KK,
Just to test it, I created a SET containing 84,140 elements (a 10% sampling of the unique ID field from an 841,400 record dataset) then used that SET to filter the original dataset from which it came. It ran successfully in a bit over a minute on my 1-node VM environment.
So, IMO if it works well with an 84K SET, then a 10K SET will also work nicely, so I would personally still use that method.
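For completeness: if the filter SET ever does become impractical, the usual ECL alternative for a small right-hand recordset is a LOOKUP join, which copies the right side to every node so no redistribution of the big left side is needed. A sketch only (this assumes CorrectionIndex can be read as an ordinary dataset, with the field names from your post):

Code:
checkCorrectionKey := JOIN(InputFile, CorrectionIndex,
                           LEFT.DID = RIGHT.DID,
                           TRANSFORM(LEFT), LOOKUP);

Because the right side is broadcast to every node, LOOKUP is only sensible while CorrectionIndex stays small relative to memory; it is not a fit if it ever grows toward the size of InputFile.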
HTH,
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm