Fri Jan 21, 2022 6:09 pm

Interesting grouping problem.

Questions around writing code and queries
Hi,

I'm been working on this for a couple of days now and, to be frank, its got me beat.
I need to group a set of records that are all within 1 year of each other, so in the example below:
Code: Select all
d := DATASET([{1,20150101}
,{2,20150201}
,{3,20150301}
,{4,20150401}
,{5,20160229}
,{6,20170101}
,{7,20170102}
,{8,20170103}
,{9,20180601}],{unsigned1 id,UNSIGNED4 date});

The records should end up grouped thus:
Code: Select all
1,2,3,4
3,4,5
5,6,7,8
9

Note that records 3,4 and 5 end up on more than 1 group.
With JOINS I can match records within 1 year and those that are definitely in separate groups, but that does not get me any further into the grouping process.

So simplify a bit, the problem can be stated as just numbers where records within a numeric range are in a group.
I'm wondering if the dreaded GRAPH or LOOP functions get involved?

Cheers
Allan
Allan

Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Allan,

I wouod approach it something like this:
Code: Select all
IMPORT Std;
d := DATASET([{1,20150101}
,{2,20150201}
,{3,20150301}
,{4,20150401}
,{5,20160229}
,{6,20170101}
,{7,20170102}
,{8,20170103}
,{9,20180601}],{unsigned1 id,UNSIGNED4 date});

OutRec := RECORD
unsigned1 id;
DATASET(RECORDOF(d)) Grp;
END;

OutRec XF(d L) := TRANSFORM
SELF.Grp := d(Date BETWEEN L.Date AND Std.Date.AdjustCalendar(L.Date,1));
SELF := L;
END;
PROJECT(d,XF(LEFT));

HTH,

Richard
rtaylor Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm

Thanks for this Richard,

I was, in fact, generating something like this but it does not generate the groups as set out above, In the results from our example one has to take this result and somehow detect that group 3 contains an entry that can not below with an entry in group 1.

Yours

Allan
Allan

Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Allan,
one has to take this result and somehow detect that group 3 contains an entry that can not below with an entry in group 1.
What exactly is the criteria for determining that? I didn't detect a pattern to your example result groups.

Richard
rtaylor Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm

Well,

Items 1,2,3 and 4 are obviously within one group as they are all within 1 year of each other.
Items 3,4 and 5 are also within 1 year of each other but items 1 and 2 are not within that group as they are too early.
Items 5,6,7 and 8 are within 1 year of each other, but all previous items are too early to be in this group.
Item 9 is a soul member of a group as no other dates are within 1 year of it.
Allan

Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Allan,

OK, here's my take on getting to where you want to be:
Code: Select all
IMPORT Std;
d := DATASET([{1,20150101}
,{2,20150201}
,{3,20150301}
,{4,20150401}
,{5,20160229}
,{6,20170101}
,{7,20170102}
,{8,20170103}
,{9,20180601}],{unsigned1 id,UNSIGNED4 date});
d_rec := RECORDOF(d);
OutRec := RECORD
unsigned1 id;
DATASET(d_rec) Grp;
END;

OutRec XF(d L) := TRANSFORM
SELF.Grp := d(Date BETWEEN L.Date AND Std.Date.AdjustCalendar(L.Date,1));
SELF := L;
END;
RawGrps := PROJECT(d,XF(LEFT));

OutRec IterXF(OutRec L, OutRec R) := TRANSFORM
FullSet := SET(L.Grp,id);
d_rec ProjXF(d_rec Chld) := TRANSFORM,SKIP(Chld.id IN FullSet)
SELF := Chld;
END;
SELF.ID := IF(EXISTS(PROJECT(R.Grp,ProjXF(LEFT))),R.ID, SKIP);
SELF := R;
END;

JustGrps := ITERATE(RawGrps,IterXF(LEFT,RIGHT));
JustGrps;
I'm nesting a PROJECT inside an ITERATE to SKIP those RawGrps whose child datasets are simply subsets of the previous Group. The result is exactly what your example suggested it should be. Hopefully this technique will be applicable to your real problem. HTH,

Richard
rtaylor Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm

Brilliant Richard!

Thanks very much.
Yours
Allan
Allan

Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Allan,

And a slightly more terse version that also numbers each group sequentially:
Code: Select all
IMPORT Std;
d_rec := {unsigned1 id,UNSIGNED4 date};
d := DATASET([{1,20150101}
,{2,20150201}
,{3,20150301}
,{4,20150401}
,{5,20160229}
,{6,20170101}
,{7,20170102}
,{8,20170103}
,{9,20180601}],d_rec);
OutRec := RECORD
unsigned1 id;
DATASET(d_rec) Grp;
END;

RawGrps := PROJECT(d,TRANSFORM(OutRec,
SELF.Grp := d(Date BETWEEN LEFT.Date AND
SELF := LEFT));
OutRec IterXF(OutRec L, OutRec R) := TRANSFORM
ds := PROJECT(R.Grp,TRANSFORM(d_rec,
SKIP(LEFT.id IN SET(L.Grp,id)),
SELF := LEFT));
SELF.ID := IF(EXISTS(ds),L.ID+1, SKIP);
SELF := R;
END;

JustGrps := ITERATE(RawGrps,IterXF(LEFT,RIGHT));
JustGrps;

And, of course, you could turn it into a FUNCTION, like this:
Code: Select all
IMPORT Std;
d_rec := {UNSIGNED4 id,UNSIGNED4 date};
GroupWithinYear(DATASET(d_rec) d) := FUNCTION
OutRec := RECORD
UNSIGNED4 id;
DATASET(d_rec) Grp;
END;
RawGrps := PROJECT(d,TRANSFORM(OutRec,
SELF.Grp := d(Date BETWEEN LEFT.Date AND
SELF := LEFT));
OutRec IterXF(OutRec L, OutRec R) := TRANSFORM
ds := PROJECT(R.Grp,TRANSFORM(d_rec,
SKIP(LEFT.id IN SET(L.Grp,id)),
SELF := LEFT));
SELF.ID := IF(EXISTS(ds),L.ID+1, SKIP);
SELF := R;
END;
JustGrps := ITERATE(RawGrps,IterXF(LEFT,RIGHT));
RETURN JustGrps;
END;

ds := DATASET([{1,20150101}
,{2,20150201}
,{3,20150301}
,{4,20150401}
,{5,20160229}
,{6,20170101}
,{7,20170102}
,{8,20170103}
,{9,20180601}],d_rec);

GroupWithinYear(ds);

HTH,

Richard
rtaylor Posts: 1610
Joined: Wed Oct 26, 2011 7:40 pm

Once again, Thanks Richard.

These will take a bit of time to digest. Just wondered if this would be a good class exercise, as its simple to state, yet tricky to implement. It would at least sort the men form the boys, and you may get back some interesting implementations. (I know which category I'd end up in)

Yours

Allan
Allan

Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Allan,

Interesting. I'll have a think on it.

The key elements of this solution are:
• understanding how ITERATE works
• knowing that PROJECT can be nested inside the ITERATE's TRANSFORM (as can any other transform operation)
• knowing how to use both forms of SKIP in any TRANSFORM
Simple HTH,

Richard
rtaylor 