Calculate Median
Hi,
How to calculate the median for a particular column..?
Tried looking at the concept of FieldAggregates.Medians, but not clear about the usage.
Can anybody share a example on how to use Median..?
Can median be used for cross tab reports (to calculate median based on a grouped value)...?
Kindly help me regarding this.
Thanks a lot in advance.
Regards,
ksviswa
How to calculate the median for a particular column..?
Tried looking at the concept of FieldAggregates.Medians, but not clear about the usage.
Can anybody share a example on how to use Median..?
Can median be used for cross tab reports (to calculate median based on a grouped value)...?
Kindly help me regarding this.
Thanks a lot in advance.
Regards,
ksviswa
- ksviswa
- Posts: 129
- Joined: Sat Jun 09, 2012 9:43 am
ksviswa,
This code shows you how it works:
Richard
This code shows you how it works:
- Code: Select all
IMPORT ML,STD;
ds := DATASET([{20120101},{20120102},{20120103},{20120104},{20120105},{20120106}],{integer d});
ML.Types.NumericField XF(ds L, integer C) := TRANSFORM
SELF.id := C;
SELF.number := 1;
SELF.value := STD.Date.FromJulianDate(L.D);
END;
P := PROJECT(ds,XF(LEFT,COUNTER));
AVG2(REAL L, REAL R) := AVE(L,R);
DateStr(REAL jdate) := STD.Date.ToString(STD.Date.ToJulianDate(ROUND(jdate)),'%Y-%m-%d');
Simples := ML.FieldAggregates(P).Simple;
Minval := Simples[1].minval;
Maxval := Simples[1].maxval;
Meanval := Simples[1].mean;
Quarts := ML.FieldAggregates(P).NTileRanges(4);
Quart1 := AVG2(Quarts[1].max,Quarts[2].min);
// Quart2 := AVG2(Quarts[2].max,Quarts[3].min);
Quart2 := ML.FieldAggregates(P).Medians[1].median;
Quart3 := AVG2(Quarts[3].max,Quarts[4].min);
OutDS := DATASET([{'Min',DateStr(Minval)},
{'1st Qu',DateStr(Quart1)},
{'Median',DateStr(Quart2)},
{'Mean',DateStr(Meanval)},
{'3rd Qu',DateStr(Quart3)},
{'Max',DateStr(Maxval)}],
{STRING10 Prompt,STRING10 Val});
OutDS;
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm
Hi ksviswa,
Here's how you can use the FieldAggregates.Medians:
IMPORT ML;
myData := DATASET([{1,1,1}, {2,2,1}, {3,5,2}, {4,7,6}],{UNSIGNED rid; REAL f1; REAL f2;});
// Turn into regular NumericField file (with continuous variables)
ML.ToField(myData,o);
ML.FieldAggregates(o).Medians;
The output will have median values for both f1 and f2 columns.
Please make sure your ecl-ml code is up to date. You can get the most recent version at : https://github.com/hpcc-systems/ecl-ml
Edin
Here's how you can use the FieldAggregates.Medians:
IMPORT ML;
myData := DATASET([{1,1,1}, {2,2,1}, {3,5,2}, {4,7,6}],{UNSIGNED rid; REAL f1; REAL f2;});
// Turn into regular NumericField file (with continuous variables)
ML.ToField(myData,o);
ML.FieldAggregates(o).Medians;
The output will have median values for both f1 and f2 columns.
Please make sure your ecl-ml code is up to date. You can get the most recent version at : https://github.com/hpcc-systems/ecl-ml
Edin
- [email protected]
- Community Advisory Board Member
- Posts: 3
- Joined: Wed Jul 13, 2011 9:38 pm
Thanks Richard and Edin,
The recent code from github solved the median calculation issue for a particular column..
How the same can be applied to a group..?
Sample Input to calculate median based on a grouped value.
Problem Statement : Median age per city
Sample Output :
Can we use FieldAggregates.Medians to compute this result or any other logic to be implemented.
Any pointers regarding the same would be highly appreciated.
Thanks a lot in advance.
Thanks and Regards,
ksviswa
The recent code from github solved the median calculation issue for a particular column..
How the same can be applied to a group..?
Sample Input to calculate median based on a grouped value.
- Code: Select all
'A','Boston',23
'B','Boston',43
'C','Boston',29
'D','Chicago',15
'E','NY',12
'F','NY',55
'G','NY',57
'H','NY',61
Problem Statement : Median age per city
Sample Output :
- Code: Select all
city MedianAge
---------- -----------
Boston 29
Chicago 15
NY 56
Can we use FieldAggregates.Medians to compute this result or any other logic to be implemented.
Any pointers regarding the same would be highly appreciated.
Thanks a lot in advance.
Thanks and Regards,
ksviswa
- ksviswa
- Posts: 129
- Joined: Sat Jun 09, 2012 9:43 am
ksviswa,
Try this:
Richard
Try this:
- Code: Select all
IMPORT ML,STD;
rec := RECORD
STRING1 Ltr;
STRING10 City;
unsigned1 age;
END;
ds := dataset([ {'A','Boston',23},
{'B','Boston',43},
{'C','Boston',29},
{'D','Chicago',15},
{'E','NY',12},
{'F','NY',55},
{'G','NY',57},
{'H','NY',61}],rec);
t1 := SORT(TABLE(ds,{city},City),City);
CityNums := PROJECT(t1,TRANSFORM({UNSIGNED4 Num,STRING10 City},SELF.Num := COUNTER;SELF := LEFT));
ML.Types.NumericField XF(ds L, integer C) := TRANSFORM
SELF.id := C;
SELF.number := CityNums(City = L.City)[1].Num;
SELF.value := L.age;
END;
P := PROJECT(ds,XF(LEFT,COUNTER));
ML.FieldAggregates(P).Medians;
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm
Perhaps not most efficient, but in a situation where I couldn't directly use the ML package I decided to calculate the median by taking the middle value(s):
- Code: Select all
rec := RECORD
STRING1 Ltr;
STRING10 City;
unsigned1 age;
END;
ds := dataset([ {'A','Boston',23},
{'B','Boston',43},
{'C','Boston',29},
{'D','Chicago',15},
{'E','NY',12},
{'F','NY',55},
{'G','NY',57},
{'H','NY',11}],rec);
// count records per city
t_counts_format := RECORD
STRING10 City:=ds.City;
UNSIGNED3 recordCount:=COUNT(GROUP);
END;
t_counts := TABLE(
ds,
t_counts_format,
ds.City
);
// join ds with count
ds_city_count_layout:=RECORD
STRING1 Ltr;
STRING10 City;
unsigned1 age;
UNSIGNED3 recordCount;
END;
ds_city_count_layout join_ds_city_count(ds L, t_counts R):=TRANSFORM
SELF:=L;
SELF.recordCount:=R.recordCount;
END;
ds_city_count:=JOIN(
ds,
t_counts,
LEFT.City=RIGHT.City
);
//iterate and only keep the median(s): 1 value for uneven recordCount, 2 values if the group count is even.
ds_city_count_grouped_city := GROUP(SORT(ds_city_count,City,age),City); //then group them
ds_city_count_layout MedianValues(ds_city_count_grouped_city L, INTEGER C) :=
TRANSFORM, SKIP((L.recordCount % 2 = 0 AND (C < L.recordCount/2 OR C > ((L.recordCount/2)+1))) OR (L.recordCount % 2 != 0 AND C != ((L.recordCount+1)/2)))
SELF := L;
END;
ds_city_count_grouped_city_median_values := PROJECT(ds_city_count_grouped_city,
MedianValues(LEFT,COUNTER));
// median is now average of values in the group.
city_median_age_format:=RECORD
STRING10 City:=ds_city_count_grouped_city_median_values.City;
UDECIMAL10_5 median_age:=AVE(GROUP,ds_city_count_grouped_city_median_values.age);
END;
city_median_age:=TABLE(
ds_city_count_grouped_city_median_values,
city_median_age_format,
ds_city_count_grouped_city_median_values.City
);
OUTPUT(city_median_age);
- jeroenbaas
- Posts: 11
- Joined: Mon Sep 30, 2013 2:20 pm
Jeroen,
Here's a slightly different approach:
I ran each version several times, and this version seems to be consistently faster than your original code (although that could change given large amounts of data -- so YMMV
);
HTH,
Richard
Here's a slightly different approach:
- Code: Select all
rec := RECORD
STRING1 Ltr;
STRING10 City;
UNSIGNED1 age;
END;
ds := DATASET([ {'A','Boston',23},
{'B','Boston',43},
{'C','Boston',29},
{'D','Chicago',15},
{'E','NY',12},
{'F','NY',55},
{'G','NY',57},
{'H','NY',11}],rec);
DNrec := RECORD
STRING10 City;
DATASET({UNSIGNED1 age}) ages;
END;
pds := PROJECT(TABLE(ds,{city},city),
TRANSFORM(DNrec,SELF.Ages := [],SELF := LEFT));
dnds := DENORMALIZE(pds, ds,
LEFT.city = RIGHT.city,
TRANSFORM(DNrec,
SELF.Ages := LEFT.Ages +
ROW({RIGHT.age},{UNSIGNED1 age});
SELF := LEFT));
Outrec := RECORD
STRING10 City;
UDECIMAL10_1 MedianAge;
END;
OutRec XF2(dnds L) := TRANSFORM
AgeCnt := COUNT(L.ages);
S_ages := SORT(L.ages,age);
MidRec := AgeCnt - (AgeCnt DIV 2);
SELF.MedianAge := IF(AgeCnt % 2 = 1,
S_ages[MidRec].age,
(S_ages[MidRec].age + S_ages[MidRec+1].age)/2);
SELF := L;
END;
PROJECT(dnds,XF2(LEFT));
I ran each version several times, and this version seems to be consistently faster than your original code (although that could change given large amounts of data -- so YMMV

HTH,
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm
8 posts
• Page 1 of 1
Who is online
Users browsing this forum: No registered users and 1 guest