Return the difference between two strings

Comments and questions related to the Enterprise Control Language

Thu Jan 16, 2020 12:45 am

Has anyone come up with a good way to return the difference between two strings in ECL? I'm trying not to reinvent the wheel here.

There are a lot of good tools that tell us the degree to which two strings differ, but nothing I can find returns the actual differences.
newportm
 
Posts: 17
Joined: Tue Nov 15, 2016 2:48 pm

Thu Jan 16, 2020 3:28 pm

newportm,

Do you have some specifics of what you would expect and the type of string data you're looking at comparing?

IOW, what's the real scope of your problem?
  • Letter-based:
    where f('abc','abs') might return 'c' or 's' depending on what you're interested in
  • Word-based:
    where f('abc def','abs def') might return 'abc' or 'abs' depending on what you're interested in
  • Sentence-based:
    where f('Abc def ghi. Fred loves Mary.','Abc def ghi. Fred loves Susie.') might return 'Fred loves Mary' or 'Fred loves Susie' depending on what you're interested in
Or possibly something else?
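
As a rough illustration of the word-based case above, here is a minimal sketch. It assumes words are separated by single spaces and that "difference" means the words of the second string that do not occur anywhere in the first, regardless of position. The names WordRec and WordsOnlyInSecond are hypothetical and not part of any library; only STD.Str.SplitWords comes from the standard library.

```ecl
IMPORT STD;

// Hypothetical one-field layout holding a single word
WordRec := {STRING word};

// Hypothetical helper: the words of s2 that do not occur anywhere in s1
WordsOnlyInSecond(STRING s1, STRING s2) := FUNCTION
  set1 := STD.Str.SplitWords(s1, ' ');                    // words of s1 as a SET OF STRING
  w2   := DATASET(STD.Str.SplitWords(s2, ' '), WordRec);  // words of s2, one row each
  RETURN w2(word NOT IN set1);                            // keep only the words missing from s1
END;

OUTPUT(WordsOnlyInSecond('abc def', 'abs def'));
```

Swapping the two arguments gives the words found only in the first string. A position-aware comparison would need more work than this simple membership test.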

FWIW, I don't know of any ECL/HPCC functions that do any of these (although I use Beyond Compare all the time, so I know it has all been done before on other platforms).

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1519
Joined: Wed Oct 26, 2011 7:40 pm

Thu Jan 16, 2020 3:55 pm

Hey Richard,

What I am looking to do is compare XML strings to each other and pull out anything that is different between the two. I was not being picky about word vs. character because I could implement either. I also do not see any tools for this in the STD library at this time. I opened HPCC-23304 just to be sure and to get that conversation started.

There are certainly ways to do it in ECL using NORMALIZE and writing some helper functions to detect character/word shifts. SALT has a few tools available, but they only tell you the % difference, not what is different. So I was mostly looking to see whether someone has already gone down this path while I wait for the platform team's consideration.
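
For reference, a minimal sketch of the NORMALIZE idea at the character level might look like the following. It only compares characters position by position and does not detect shifted characters or words. The names PairRec, DiffRec and CharDiffs are hypothetical, chosen for illustration only.

```ecl
// Hypothetical layout holding the two strings to compare
PairRec := {STRING str1, STRING str2};

// Hypothetical layout: one row per character position
DiffRec := RECORD
  UNSIGNED4 pos;  // 1-based character position
  STRING1   c1;   // character from the first string (blank past its end)
  STRING1   c2;   // character from the second string (blank past its end)
END;

CharDiffs(STRING s1, STRING s2) := FUNCTION
  inputDs := DATASET([{s1, s2}], PairRec);
  // NORMALIZE calls the TRANSFORM once per position of the longer string
  allPos := NORMALIZE(inputDs,
                      MAX(LENGTH(LEFT.str1), LENGTH(LEFT.str2)),
                      TRANSFORM(DiffRec,
                                SELF.pos := COUNTER,
                                SELF.c1  := LEFT.str1[COUNTER],
                                SELF.c2  := LEFT.str2[COUNTER]));
  RETURN allPos(c1 <> c2);  // keep only the positions where the characters differ
END;

OUTPUT(CharDiffs('ABC', 'ABS'));
```

Because positions past the end of the shorter string are treated as blanks, a real space in one string opposite the end of the other is not reported as a difference.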

Tim
newportm
 
Posts: 17
Joined: Tue Nov 15, 2016 2:48 pm

Thu Jan 16, 2020 8:06 pm

Tim,

OK, here's a simple example of the way I would start approaching the problem:
Code:
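StringDiff(STRING S1, STRING S2) := FUNCTION
  L1 := LENGTH(S1);
  L2 := LENGTH(S2);
  // One row per character position, up to the length of the longer string:
  // a blank where the strings match, otherwise the character from S2
  ds := DATASET(MAX(L1,L2),
                TRANSFORM({STRING char},
                          SELF.char := IF(S1[COUNTER]=S2[COUNTER],' ',S2[COUNTER])));
  // Concatenate the per-position rows back into a single string
  Rs := ROLLUP(ds,TRUE,TRANSFORM({STRING char},
                                 SELF.char := LEFT.char + RIGHT.char))[1].char;
  // RETURN Rs;
  // Return both inputs plus the difference string so they can be compared row by row
  RETURN DATASET([{S1},{S2},{Rs}],{STRING char});
END;

StringDiff('ABC','ABS');

// The comparison is case-sensitive and positional
C1 := 'ABC DEF';
C2 := 'Abc Def Ghi';
StringDiff(C1,C2);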

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1519
Joined: Wed Oct 26, 2011 7:40 pm

Tue Jan 21, 2020 3:33 pm

Not exactly what you're looking for, but in the standard library there are:
Code:
STD.Str.EditDistance
STD.Str.EditDistanceWithinRadius

These give you a metric of how different two strings are (there are also Unicode versions of these functions).
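
A minimal usage sketch of the two functions follows. The attribute names editDist and isClose and the result names are made up for illustration, and the radius of 1 is an arbitrary choice.

```ecl
IMPORT STD;

// Number of single-character edits separating the two strings
editDist := STD.Str.EditDistance('ABC', 'ABS');

// TRUE when the edit distance is no greater than the given radius (here, 1)
isClose := STD.Str.EditDistanceWithinRadius('ABC', 'ABS', 1);

OUTPUT(editDist, NAMED('Distance'));
OUTPUT(isClose, NAMED('WithinRadius'));
```

Both calls measure how far apart the strings are. Neither says which characters differ.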

Allan
Allan
 
Posts: 419
Joined: Sat Oct 01, 2011 7:26 pm

