Fri Jan 21, 2022 6:09 pm
Login Register Lost Password? Contact Us


Interesting grouping problem.

Questions around writing code and queries

Sun Nov 20, 2016 8:56 am Change Time Zone

Hi,

I'm been working on this for a couple of days now and, to be frank, its got me beat.
I need to group a set of records that are all within 1 year of each other, so in the example below:
Code: Select all
d := DATASET([{1,20150101}
             ,{2,20150201}
             ,{3,20150301}
             ,{4,20150401}
             ,{5,20160229}
             ,{6,20170101}
             ,{7,20170102}
             ,{8,20170103}
             ,{9,20180601}],{unsigned1 id,UNSIGNED4 date});

The records should end up grouped thus:
Code: Select all
1,2,3,4
3,4,5
5,6,7,8
9

Note that records 3,4 and 5 end up on more than 1 group.
With JOINS I can match records within 1 year and those that are definitely in separate groups, but that does not get me any further into the grouping process.

So simplify a bit, the problem can be stated as just numbers where records within a numeric range are in a group.
I'm wondering if the dreaded GRAPH or LOOP functions get involved?

Any road up, any help/pointers would be gratefully received.
Cheers
Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Tue Nov 29, 2016 2:49 pm Change Time Zone

Allan,

I wouod approach it something like this:
Code: Select all
IMPORT Std;
d := DATASET([{1,20150101}
             ,{2,20150201}
             ,{3,20150301}
             ,{4,20150401}
             ,{5,20160229}
             ,{6,20170101}
             ,{7,20170102}
             ,{8,20170103}
             ,{9,20180601}],{unsigned1 id,UNSIGNED4 date});

OutRec := RECORD
  unsigned1 id;
  DATASET(RECORDOF(d)) Grp;
END;

OutRec XF(d L) := TRANSFORM
  SELF.Grp := d(Date BETWEEN L.Date AND Std.Date.AdjustCalendar(L.Date,1));
  SELF := L;
END;
PROJECT(d,XF(LEFT));

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm

Wed Nov 30, 2016 10:25 am Change Time Zone

Thanks for this Richard,

I was, in fact, generating something like this but it does not generate the groups as set out above, In the results from our example one has to take this result and somehow detect that group 3 contains an entry that can not below with an entry in group 1.

Yours

Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Wed Nov 30, 2016 1:08 pm Change Time Zone

Allan,
one has to take this result and somehow detect that group 3 contains an entry that can not below with an entry in group 1.
What exactly is the criteria for determining that? I didn't detect a pattern to your example result groups.

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm

Wed Dec 07, 2016 8:42 am Change Time Zone

Well,

Items 1,2,3 and 4 are obviously within one group as they are all within 1 year of each other.
Items 3,4 and 5 are also within 1 year of each other but items 1 and 2 are not within that group as they are too early.
Items 5,6,7 and 8 are within 1 year of each other, but all previous items are too early to be in this group.
Item 9 is a soul member of a group as no other dates are within 1 year of it.
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Wed Dec 07, 2016 11:19 am Change Time Zone

Allan,

OK, here's my take on getting to where you want to be:
Code: Select all
IMPORT Std;
d := DATASET([{1,20150101}
             ,{2,20150201}
             ,{3,20150301}
             ,{4,20150401}
             ,{5,20160229}
             ,{6,20170101}
             ,{7,20170102}
             ,{8,20170103}
             ,{9,20180601}],{unsigned1 id,UNSIGNED4 date});
d_rec := RECORDOF(d);
OutRec := RECORD
  unsigned1 id;
  DATASET(d_rec) Grp;
END;

OutRec XF(d L) := TRANSFORM
  SELF.Grp := d(Date BETWEEN L.Date AND Std.Date.AdjustCalendar(L.Date,1));
  SELF := L;
END;
RawGrps := PROJECT(d,XF(LEFT));

OutRec IterXF(OutRec L, OutRec R) := TRANSFORM
  FullSet := SET(L.Grp,id);
  d_rec ProjXF(d_rec Chld) := TRANSFORM,SKIP(Chld.id IN FullSet)
    SELF := Chld;
  END;      
  SELF.ID := IF(EXISTS(PROJECT(R.Grp,ProjXF(LEFT))),R.ID, SKIP);
  SELF := R;
END;

JustGrps := ITERATE(RawGrps,IterXF(LEFT,RIGHT));
JustGrps;
I'm nesting a PROJECT inside an ITERATE to SKIP those RawGrps whose child datasets are simply subsets of the previous Group. The result is exactly what your example suggested it should be. Hopefully this technique will be applicable to your real problem. :)

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm

Mon Dec 12, 2016 7:34 am Change Time Zone

Brilliant Richard!

Thanks very much.
Yours
Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Tue Dec 13, 2016 7:55 am Change Time Zone

Allan,

And a slightly more terse version that also numbers each group sequentially:
Code: Select all
IMPORT Std;
d_rec := {unsigned1 id,UNSIGNED4 date};
d := DATASET([{1,20150101}
             ,{2,20150201}
             ,{3,20150301}
             ,{4,20150401}
             ,{5,20160229}
             ,{6,20170101}
             ,{7,20170102}
             ,{8,20170103}
             ,{9,20180601}],d_rec);
OutRec := RECORD
  unsigned1 id;
  DATASET(d_rec) Grp;
END;

RawGrps := PROJECT(d,TRANSFORM(OutRec,
                               SELF.Grp := d(Date BETWEEN LEFT.Date AND
                                             Std.Date.AdjustCalendar(LEFT.Date,1)),
                               SELF := LEFT));
OutRec IterXF(OutRec L, OutRec R) := TRANSFORM
  ds := PROJECT(R.Grp,TRANSFORM(d_rec,
                                SKIP(LEFT.id IN SET(L.Grp,id)),
                                SELF := LEFT));
  SELF.ID := IF(EXISTS(ds),L.ID+1, SKIP);
  SELF := R;
END;

JustGrps := ITERATE(RawGrps,IterXF(LEFT,RIGHT));
JustGrps;

And, of course, you could turn it into a FUNCTION, like this:
Code: Select all
IMPORT Std;
d_rec := {UNSIGNED4 id,UNSIGNED4 date};
GroupWithinYear(DATASET(d_rec) d) := FUNCTION
  OutRec := RECORD
    UNSIGNED4 id;
    DATASET(d_rec) Grp;
  END;
  RawGrps := PROJECT(d,TRANSFORM(OutRec,
                                 SELF.Grp := d(Date BETWEEN LEFT.Date AND
                                 Std.Date.AdjustCalendar(LEFT.Date,1)),
                                 SELF := LEFT));
  OutRec IterXF(OutRec L, OutRec R) := TRANSFORM
    ds := PROJECT(R.Grp,TRANSFORM(d_rec,
                                  SKIP(LEFT.id IN SET(L.Grp,id)),
                                  SELF := LEFT));
    SELF.ID := IF(EXISTS(ds),L.ID+1, SKIP);
    SELF := R;
  END;
  JustGrps := ITERATE(RawGrps,IterXF(LEFT,RIGHT));
  RETURN JustGrps;
END;   

ds := DATASET([{1,20150101}
             ,{2,20150201}
             ,{3,20150301}
             ,{4,20150401}
             ,{5,20160229}
             ,{6,20170101}
             ,{7,20170102}
             ,{8,20170103}
             ,{9,20180601}],d_rec);

GroupWithinYear(ds);

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm

Tue Dec 13, 2016 8:05 pm Change Time Zone

Once again, Thanks Richard.

These will take a bit of time to digest. Just wondered if this would be a good class exercise, as its simple to state, yet tricky to implement. It would at least sort the men form the boys, and you may get back some interesting implementations. (I know which category I'd end up in)

Yours

Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Wed Dec 14, 2016 4:37 am Change Time Zone

Allan,

Interesting. I'll have a think on it.

The key elements of this solution are:
  • understanding how ITERATE works
  • knowing that PROJECT can be nested inside the ITERATE's TRANSFORM (as can any other transform operation)
  • knowing how to use both forms of SKIP in any TRANSFORM
Simple :D

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm


Return to Programming

Who is online

Users browsing this forum: No registered users and 1 guest