Mon Oct 18, 2021 6:50 am

## Testing for uniqueness in a set of values

Questions around writing code and queries
Hi team,

What is the best way to test for uniqueness in a set of values? For example, if I have the following set definition:

SetNum := [1,2,3,4,5,6,3,2,7];

What's the best way to test that each element is unique?

I was creating a training exercise where a student needs to determine the highest count in a series of counts presented to them. This is a basic phase of the training course where TRANSFORM and the ETL processes have yet to be introduced.

After returning a series of COUNT values, and then using the MAX function I was able to get the maximium count in that set, but then I realized, "How do I know it was a unique value?". I ended up using WHICH and checked for equality across all count values. If WHICH returned anything other than zero, I knew that there was a duplicate in my counts.

This worked because I was only comparing 5 values, and using WHICH this gave me only 15 combinations to test, but in a larger set of values the combinations would be too many and WHICH would not be the best solution.

I know that we can use DEDUP and ROLLUP to remove duplicates and salvage data when needed, but what I was looking for is something like this:

val := ISUNIQUE(set); //returns FALSE if duplicates are found in the set
val2 := ISUNIQUE(recordset,field); //returns FALSE if duplicate in field

Do we have anything in ECL similar to this that I might have overlooked? Or is there a better or simpler approach to what I need to do?

Best Regards,

Bob Foreman
bforeman Posts: 1005
Joined: Wed Jun 29, 2011 7:13 pm

You need to make use of the form of DATASET that converts a set to a dataset. Simplest, and relatively efficient for small lists would be:

isUnique1(set of integer x) := function
ds1 := dataset(x, { integer value1; });
ds2 := dataset(x, { integer value2; });
return not exists(ds1(count(ds2(ds1.value1 = ds2.value2)) > 1));
END;

i.e. count the number of times each element occurs in the list, and not duplicates if it never exceeds 1. This will typically be O(N^2)

For longer lists you're better off sorting, deduping and comparing the counts, which is O(Nln(N)):

isUnique2(set of integer x) := function
ds1 := dataset(x, { integer value; });
return count(x) = count(dedup(sort(ds1, value), value));
END;
ghalliday Posts: 198
Joined: Wed May 18, 2011 9:48 am

VERY elegant AND efficient, thank you very much!

Bob
bforeman 