
PigMix is a set of 17 Pig programs used as a benchmark to measure the comparative performance of the Pig programming language versus hand-coded Java running in a Hadoop environment. The algorithms were chosen and coded by the Pig community, so they should be representative of what Pig is used for and embody best practices for how to use it.

HPCC Systems provides a utility program called Bacon, which can automatically translate Pig programs into the equivalent ECL. The Bacon-translated versions of the PigMix tests are presented below.

For more information on the PigMix benchmark, and actual performance results of ECL versus the Pig and Java versions on the benchmark tests on an identical hardware configuration, refer to the whitepaper “Performing in the PigPen”. To learn more about Bacon and equivalent ECL for Pig language statements, refer to “ECL for Piggers”.

ECL significantly outperforms both Pig and Java on the Hadoop PigMix benchmark on an identical hardware configuration! Across all tests, ECL was on average 4.45x faster than Pig and 3.23x faster than hand-coded Java.

Additional results from a more recent benchmark test are also available.


SCRIPT INDEX
Click a script to view its full code comparison

Script L2

Script L3

Script L4

Script L5

Script L5 Modified

Script L6

Script L7

Script L8

Script L9

Script L10

Script L11

Script L12

Script L13

Script L14

Script L15

Script L16

Script L17
Common ECL attribute definitions (ZIP)

*Script L1 is not included; see the note at the bottom of this page.

Script L2

This script tests using a join small enough to do in fragment and replicate.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views' using 
  org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, estimated_revenue;

 alpha = load '/user/pig/tests/data/pigmix/power_users' using
  PigStorage('\u0001') as (name, phone,
  address, city, state, zip);

 beta = foreach alpha generate name;

 C = join B by user, beta by name using "replicated" parallel 40;

 store C into 'L2out';

ECL

//BACON V0.0.7 Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

 b := TABLE(a,{user,estimated_revenue});

 alpha := DATASET('~pigmix::subset::power_users',{PigTypes.chararray name,
  /*PigTypes.chararray phone,*/PigTypes.chararray address,
  PigTypes.chararray city,PigTypes.NoType state,PigTypes.int zip},
  CSV(heading(0),separator('\t'),quote(''),terminator('\n')));

 beta := TABLE(alpha,{name});

 c := JOIN(b,beta,LEFT.user = RIGHT.name,LOOKUP);

 OUTPUT(c,,'pigmix::test::L2out',overwrite);
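The fragment-and-replicate (in ECL, LOOKUP) join semantics can be sketched in Python. This is a minimal illustration only, with made-up sample rows: the small side is held as an in-memory set and the large side is streamed past it, so the large side never needs to be redistributed or sorted.

```python
# Hypothetical sample rows standing in for page_views and power_users.
page_views = [
    {"user": "alice", "estimated_revenue": 1.5},
    {"user": "bob",   "estimated_revenue": 0.4},
    {"user": "carol", "estimated_revenue": 2.1},
]
power_users = [{"name": "alice"}, {"name": "carol"}]

# Replicate the small side as an in-memory set, then probe it once per
# row of the large side -- the essence of a fragment-and-replicate join.
names = {row["name"] for row in power_users}
c = [row for row in page_views if row["user"] in names]
```

In the distributed case the `names` set is replicated to every node, which is why this strategy only works when one input is small enough to fit in memory.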


Script L3

This script tests a join too large for fragment and replicate. It also contains a join followed by a group by on the same key, something that we could potentially optimize by not regrouping.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, (double)estimated_revenue;

 alpha = load '/user/pig/tests/data/pigmix/users' using PigStorage('\u0001')
  as (name, phone, address,
  city, state, zip);

 beta = foreach alpha generate name;

 C = join beta by name, B by user parallel 40;

 D = group C by $0 parallel 40;

 E = foreach D generate group, SUM(C.estimated_revenue);

 store E into 'L3out';

ECL

//BACON V0.0.7 Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

 b := TABLE(a,{user,PigTypes.double estimated_revenue := (PigTypes.double)
  estimated_revenue});

 alpha := DATASET('~pigmix::subset::users',{PigTypes.chararray name,
  /*PigTypes.chararray phone,*/PigTypes.chararray address,
  PigTypes.chararray city,PigTypes.NoType state,PigTypes.int zip},
  CSV(heading(0),separator('\001'),quote(''),terminator('\n')));

 beta := TABLE(alpha,{name});

 c := JOIN(beta,b,LEFT.name = RIGHT.user,HASH);

 d := c;

 e := TABLE(d,{name,SUM(GROUP,estimated_revenue)},name,MERGE);

 OUTPUT(e,,'pigmix::test::L3out',overwrite);
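The join-then-group-on-the-same-key pattern can be sketched in Python with hypothetical rows. Because the group key equals the join key, a single pass over the joined rows suffices, which is the "not regrouping" optimization the script description mentions.

```python
from collections import defaultdict

# Hypothetical (user, estimated_revenue) pairs from page_views and a
# list of user names; both are stand-ins for the real benchmark inputs.
b = [("alice", 1.5), ("alice", 0.5), ("bob", 2.0), ("eve", 9.9)]
beta = ["alice", "bob"]

# Join on name, then sum revenue grouped by the same key in one pass.
names = set(beta)
revenue = defaultdict(float)
for user, rev in b:
    if user in names:          # the join
        revenue[user] += rev   # the group-by-and-SUM on the same key
```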


Script L4

This script covers foreach/generate with a nested distinct.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, action;

 C = group B by user parallel 40;

 D = foreach C {
  aleph = B.action;
  beth = distinct aleph;
  generate group, COUNT(beth);
  }

 store D into 'L4out';

ECL

//BACON V0.0.11 Beta generated ECL

//IMPORT PigTypes AS *;

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

 b := TABLE(a,{user,action});

 PigTypes._GROUP(b,user,b_records,group_out0_0);

 c := group_out0_0;

 d := TABLE(c,{Group_Key,COUNT(DEDUP(TABLE(b_records,{action}),
  WHOLE RECORD,ALL))});

 OUTPUT(d,,'pigmix::test::L4out');
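The nested-distinct pattern (distinct within each group, then count) can be sketched in Python with hypothetical (user, action) pairs:

```python
from collections import defaultdict

# Hypothetical (user, action) pairs in place of the page_views columns.
rows = [("alice", "click"), ("alice", "click"),
        ("alice", "view"), ("bob", "view")]

# Per-group distinct-then-count: collect each user's distinct actions,
# then emit (group, COUNT(distinct actions)), matching D above.
actions = defaultdict(set)
for user, action in rows:
    actions[user].add(action)
d = {user: len(acts) for user, acts in actions.items()}
```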


Script L5

This script does an anti-join. This is useful because it is a use of cogroup that is not a regular join.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user;

 alpha = load '/user/pig/tests/data/pigmix/users' using PigStorage('\u0001')
  as (name, phone, address,
  city, state, zip);

 beta = foreach alpha generate name;

 C = cogroup beta by name, B by user parallel 40;

 D = filter C by COUNT(beta) == 0;

 E = foreach D generate group;

 store E into 'L5out';

ECL

//BACON V0.0.9.001 Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.layout_page_views,
  thor);

 b := TABLE(a,{user});

 alpha := dataset('~pigmix::subset::users', PigMixHelper.Layout_Users,
  CSV(heading(0),separator('\001'),quote(''),terminator('\n')));

 beta := TABLE(alpha,{name});
  PigTypes._GROUP(beta,name,beta_records,group_out0_0);
  PigTypes._GROUP(b,user,b_records,group_out0_1);

 join_out0_1 := JOIN(group_out0_0,group_out0_1,
  left.Group_Key=right.Group_Key,FULL OUTER,LOCAL);

 c := join_out0_1;

 d := c(COUNT(beta_records)=0);

 e := TABLE(d,{Group_Key});

 OUTPUT(e,,'pigmix::test::L5out',overwrite);
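The anti-join semantics can be sketched in Python with hypothetical users: keep the page-view users that have no match in the users file, i.e. the groups where COUNT(beta) == 0 in the Pig script above.

```python
# Hypothetical inputs: page-view users and the known-users file.
page_view_users = ["alice", "bob", "alice", "dave"]
known_names = ["alice", "carol"]

# Anti-join: page-view users with no matching name in the users file.
known = set(known_names)
e = sorted(set(page_view_users) - known)
```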


Script L5 Modified

This script does an anti-join. This is useful because it is a use of cogroup that is not a regular join.

ECL (literal translation of the Pig approach)

 IMPORT PigTypes;

 a := DATASET('~pigmix::subset::page_views',PigMix.layout_page_views,
  thor);

 b := TABLE(a,{user});

 bcnt := TABLE(b, {user, cntb:=count(group)}, user);

 alpha := dataset('~pigmix::subset::users', PigMix.layout_users,
  CSV(heading(0),separator('\001'),quote(''),
  terminator('\n')));

 beta := TABLE(alpha,{user:=name});

 betacnt := TABLE(beta, {user, cntbeta:=count(group)}, user);

 c:= JOIN(bcnt,betacnt,left.user=right.user,FULL OUTER);

 d := TABLE(c(cntbeta=0),{user});

 OUTPUT(d,,'pigmix::test::L5out',overwrite);

ECL (hand-tuned)

// Dataset definitions stored as attributes

// Uses LEFT ONLY Join (not available in Pig) to derive answer

// Determine unique Page View user and user names

 IMPORT PigMixHelper;

 a := DEDUP(TABLE(PigMixHelper.File_Page_Views,{user}),user,all);

 b := DEDUP(TABLE(PigMixHelper.File_Users,{name}),name,all);

 c:= JOIN(a,b,left.user=right.name,LEFT ONLY,LOCAL);

 OUTPUT(c,,'pigmix::test::L5out',overwrite);


Script L6

This script covers the case where the group by key is a significant percentage of the row.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, action, (int)timespent as timespent,
  query_term, ip_addr, timestamp;

 C = group B by (user, query_term, ip_addr, timestamp) parallel 40;

 D = foreach C generate flatten(group), SUM(B.timespent);

 store D into 'L6out';

ECL

//BACON V0.0.8Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

 b := TABLE(a,{user,action,PigTypes.int timespent := (PigTypes.int)
  timespent,query_term,ip_addr,timestamp});

 c := b;

 d := TABLE(c,{user,query_term,ip_addr,timestamp,SUM(GROUP,timespent)},
  user,query_term,ip_addr,timestamp,MERGE);

 OUTPUT(d,,'pigmix::test::L6out',overwrite);


Script L7

This script covers having a nested plan with splits.

PigMix

register pigperf.jar;
 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term,
  ip_addr, timestamp, estimated_revenue, page_info, page_links);

 B = foreach A generate user, timestamp;

 C = group B by user parallel 40;

 D = foreach C {
  morning = filter B by timestamp < 43200;
  afternoon = filter B by timestamp >= 43200;
  generate group, COUNT(morning), COUNT(afternoon);
  }

 store D into 'L7out';

ECL

//BACON V0.0.11 Beta generated ECL

//IMPORT PigTypes AS *;

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

 b := TABLE(a,{user,timestamp});
  PigTypes._GROUP(b,user,b_records,group_out0_0);

 c := group_out0_0;

 d := TABLE(c,{Group_Key,COUNT(b_records(timestamp<43200)),
  COUNT(b_records(timestamp>=43200))});

 OUTPUT(d,,'pigmix::test::L7out');


Script L8

This script covers group all.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views' using
  org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, (int)timespent as timespent,
  (double)estimated_revenue as estimated_revenue;

 C = group B all;

 D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);

 store D into 'L8out';

ECL

// Keep from going to hThor

 #option ('pickBestEngine', false);


//BACON V0.0.8Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views
  ,thor);

 b := TABLE(a,{user,PigTypes.int timespent := (PigTypes.int) timespent,
  PigTypes.double estimated_revenue := (PigTypes.double) estimated_revenue});

 c := b;

 d := TABLE(c,{SUM(GROUP,timespent),AVE(GROUP,estimated_revenue)});

 OUTPUT(d,,'pigmix::test::L8out',overwrite);


Script L9

This script covers order by of a single value.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views' using
  org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = order A by query_term parallel 40;

 store B into 'L9out';

ECL

//BACON V0.0.8Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

// PARALLEL n ignored; ECL automatically distributes collation across all nodes

 b := SORT(a,query_term);

 OUTPUT(b,,'pigmix::test::L9out',overwrite);


Script L10

This script covers order by of multiple values.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent:int, query_term, ip_addr, timestamp,
  estimated_revenue:double, page_info, page_links);

 B = order A by query_term, estimated_revenue desc, timespent parallel 40;

 store B into 'L10out';

ECL

//BACON V0.0.8.001Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

// PARALLEL n ignored; ECL automatically distributes collation across all nodes

 b := SORT(a,query_term,-estimated_revenue,timespent);

 OUTPUT(b,,'pigmix::test::L10out',overwrite);


Script L11

This script covers distinct and union.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user;

 C = distinct B parallel 40;

 alpha = load '/user/pig/tests/data/pigmix/widerow'
  using PigStorage('\u0001');

 beta = foreach alpha generate $0 as name;

 gamma = distinct beta parallel 40;

 D = union C, gamma;

 E = distinct D parallel 40;

 store E into 'L11out';

ECL

//BACON V0.0.8Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

 b := TABLE(a,{user});

 c := DEDUP(b,WHOLE RECORD,ALL);

 alpha := DATASET('~pigmix::subset::widerow',PigMixHelper.Layout_Widerow,
  CSV(heading(0),separator('\001'),quote(''),terminator('\n')));

 beta := TABLE(alpha,{name := user});

 gamma := DEDUP(beta,WHOLE RECORD,ALL);

 d := c+gamma;

 e := DEDUP(d,WHOLE RECORD,ALL);

 OUTPUT(e,,'pigmix::test::L11out',overwrite);


Script L12

This script covers multi-store queries.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, action, (int)timespent
  as timespent, query_term, (double)estimated_revenue as estimated_revenue;

 split B into C if user is not null, alpha if user is null;

 split C into D if query_term is not null, aleph if query_term is null;

 E = group D by user parallel 40;

 F = foreach E generate group, MAX(D.estimated_revenue);

 store F into 'highest_value_page_per_user';

 beta = group alpha by query_term parallel 40;

 gamma = foreach beta generate group, SUM(alpha.timespent);

 store gamma into 'total_timespent_per_term';

 beth = group aleph by action parallel 40;

 gimel = foreach beth generate group, COUNT(aleph);

 store gimel into 'queries_per_action';

ECL

//BACON V0.0.8Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

 b := TABLE(a,{user,action,PigTypes.int timespent := (PigTypes.int)
  timespent,query_term,PigTypes.double estimated_revenue := 
  (PigTypes.double) estimated_revenue});

 c := b(NOT (user = (typeof(user))''));

 alpha := b(user = (typeof(user))'');

 d := c(NOT (query_term = (typeof(query_term))''));

 aleph := c(query_term = (typeof(query_term))'');

 e := d;

// Note ECL automatically creates 3 parallel graphs for the three outputs

 f := TABLE(e,{user,MAX(GROUP,estimated_revenue)},user,MERGE);

 OUTPUT(f,,'pigmix::test::L12_highest_value_page_per_user',overwrite);

 beta := alpha;

 gamma := TABLE(beta,{query_term,SUM(GROUP,timespent)},query_term,MERGE);

 OUTPUT(gamma,,'pigmix::test::L12_total_timespent_per_term',overwrite);

 beth := aleph;

 gimel := TABLE(beth,{action,COUNT(GROUP)},action,MERGE);

 OUTPUT(gimel,,'pigmix::test::L12_queries_per_action',overwrite);
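The two-level split can be sketched in Python with hypothetical rows. Note that the Bacon-generated ECL above maps Pig's "is null" tests onto empty-string comparisons, so empty strings stand in for nulls here.

```python
# Hypothetical rows; empty strings play the role of Pig nulls.
rows = [
    {"user": "alice", "query_term": "ecl", "action": "click"},
    {"user": "",      "query_term": "pig", "action": "view"},
    {"user": "bob",   "query_term": "",    "action": "click"},
]

# split B into C / alpha on user, then C into D / aleph on query_term,
# mirroring the two split statements in the Pig script.
alpha = [r for r in rows if not r["user"]]
c     = [r for r in rows if r["user"]]
d     = [r for r in c if r["query_term"]]
aleph = [r for r in c if not r["query_term"]]
```

Each of the three output streams (d, alpha, aleph) is then aggregated and stored independently, which ECL expresses as three parallel graphs.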


Script L13

This script covers outer join.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, estimated_revenue;

 alpha = load '/user/pig/tests/data/pigmix/power_users_samples'
  using PigStorage('\u0001') 
     as (name, phone, address, city, state, zip);

 beta = foreach alpha generate name, phone;

 C = join B by user left outer, beta by name parallel 40;

 store C into 'L13out';

ECL

//BACON V0.0.8Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

 b := TABLE(a,{user,estimated_revenue});

 alpha := DATASET('~pigmix::subset::power_users_samples',{PigTypes.chararray
  name,/*PigTypes.NoType phone,*/PigTypes.chararray address,PigTypes.chararray
  city,PigTypes.chararray state,PigTypes.int zip},CSV(heading(0),
  separator('\001'),quote(''),terminator('\n')));

 beta := TABLE(alpha,{name,address});

 c := JOIN(b,beta,LEFT.user = RIGHT.name,LEFT OUTER,HASH);

 OUTPUT(c,,'pigmix::test::L13out',overwrite);


Script L14

This script covers merge join.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views_sorted'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, estimated_revenue;

 alpha = load '/user/pig/tests/data/pigmix/users_sorted'
  using PigStorage('\u0001') as (name, phone, address, city, state, zip);

 beta = foreach alpha generate name;

 C = join B by user, beta by name using "merge";

 store C into 'L14out';

ECL

//BACON V0.0.8Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views_sorted',
  PigMixHelper.Layout_Page_Views,thor);

 b := TABLE(a,{user,estimated_revenue});

 alpha := DATASET('~pigmix::subset::users_sorted',
  PigMixHelper.Layout_Users_Sorted,thor);

 beta := TABLE(alpha,{name});

 c := JOIN(b,beta,LEFT.user = RIGHT.name,NOSORT);

 OUTPUT(c,,'pigmix::test::L14out',overwrite);
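A merge join exploits the fact that both inputs are already sorted on the join key, which is what the NOSORT option asserts in the ECL above. A minimal Python sketch, with hypothetical pre-sorted rows standing in for the *_sorted files:

```python
def merge_join(left, right, lkey, rkey):
    """One-pass join of two inputs already sorted on their keys,
    so neither side needs to be re-sorted."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][lkey], right[j][rkey]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            run_start = j                 # remember start of the key run
            while j < len(right) and right[j][rkey] == lk:
                out.append({**left[i], **right[j]})
                j += 1
            i += 1
            if i < len(left) and left[i][lkey] == lk:
                j = run_start             # rewind for duplicate left keys
    return out

# Hypothetical pre-sorted inputs.
b = [{"user": "alice", "rev": 1.0},
     {"user": "bob",   "rev": 2.0},
     {"user": "bob",   "rev": 3.0}]
beta = [{"name": "bob"}, {"name": "carol"}]
c = merge_join(b, beta, "user", "name")
```

Because both sides advance monotonically, the join completes in a single linear scan instead of the hash-and-shuffle a general join requires.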


Script L15

This script covers multiple distinct aggregates.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views' using
  org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp, 
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, action, estimated_revenue, timespent;

 C = group B by user parallel 40;

 D = foreach C {
  beth = distinct B.action;
  rev = distinct B.estimated_revenue;
  ts = distinct B.timespent;
  generate group, COUNT(beth), SUM(rev), (int)AVG(ts);
  }

 store D into 'L15out';

ECL

//BACON V0.0.11 Beta generated ECL

//IMPORT PigTypes AS *;

  IMPORT PigTypes;

  IMPORT PigMixHelper;

  a := DATASET('~pigmix::subset::page_views',PigMixHelper.Layout_Page_Views,
  thor);

  b := TABLE(a,{user,action,estimated_revenue,timespent});
  PigTypes._GROUP(b,user,b_records,group_out0_0);

  c := group_out0_0;

  d := TABLE(c,{Group_Key,COUNT(DEDUP(TABLE(b_records,{action}),
  WHOLE RECORD,ALL)),SUM(DEDUP(TABLE(b_records,
  {estimated_revenue}),WHOLE RECORD,ALL),estimated_revenue),(PigTypes.int)
  AVE(DEDUP(TABLE(b_records,{timespent}),WHOLE RECORD,ALL),timespent)});

  OUTPUT(d,,'pigmix::test::L15out');


Script L16

This script covers accumulative mode.

PigMix

register pigperf.jar;

 A = load '/user/pig/tests/data/pigmix/page_views'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links);

 B = foreach A generate user, estimated_revenue;

 C = group B by user parallel 40;

 D = foreach C {
  E = order B by estimated_revenue;
  F = E.estimated_revenue;
  generate group, SUM(F);
  }

 store D into 'L16out';

ECL

//BACON V0.0.11 Beta generated ECL

//IMPORT PigTypes AS *;

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := DATASET('~pigmix::subset::page_views',
  PigMixHelper.Layout_Page_Views,thor);

 b := TABLE(a,{user,estimated_revenue});
  PigTypes._GROUP(b,user,b_records,group_out0_0);

 c := group_out0_0;

 d := TABLE(c,{Group_Key,SUM(TABLE(SORT(b_records,estimated_revenue),
  {estimated_revenue}),estimated_revenue)});

 OUTPUT(d,,'pigmix::test::L16out');


Script L17

This script covers wide key group.

PigMix

register pigperf.jar;

  A = load '/user/pig/tests/data/pigmix/widegroupbydata'
  using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
  as (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, page_info, page_links, user_1, action_1, timespent_1,
  query_term_1, ip_addr_1, timestamp_1, estimated_revenue_1, page_info_1,
  page_links_1, user_2, action_2, timespent_2, query_term_2, ip_addr_2,
  timestamp_2, estimated_revenue_2, page_info_2, page_links_2);

 B = group A by (user, action, timespent, query_term, ip_addr, timestamp,
  estimated_revenue, user_1, action_1, timespent_1, query_term_1, ip_addr_1,
  timestamp_1,
  estimated_revenue_1, user_2, action_2, timespent_2, query_term_2, ip_addr_2,
  timestamp_2,
  estimated_revenue_2) parallel 40;

 C = foreach B generate SUM(A.timespent), SUM(A.timespent_1),
  SUM(A.timespent_2),
  AVG(A.estimated_revenue), AVG(A.estimated_revenue_1),
  AVG(A.estimated_revenue_2);

 store C into 'L17out';

ECL

//BACON V0.0.7Alpha generated ECL

 IMPORT PigTypes;

 IMPORT PigMixHelper;

 a := dataset('~pigmix::subset::widegroupbydata',
  PigMixHelper.Layout_Widegroupbydata, thor);

 b := a;

 c := TABLE(b,{SUM(GROUP,timespent),SUM(GROUP,timespent_1),
  SUM(GROUP,timespent_2), AVE(GROUP,estimated_revenue),
  AVE(GROUP,estimated_revenue_1), AVE(GROUP,estimated_revenue_2)}
  ,user,action,timespent,query_term,ip_addr,timestamp,
  estimated_revenue,user_1,action_1,timespent_1,query_term_1,
  ip_addr_1,timestamp_1,estimated_revenue_1,user_2,action_2,
  timespent_2,query_term_2, ip_addr_2,timestamp_2,
  estimated_revenue_2,MERGE);

 OUTPUT(c,,'pigmix::test::L17out',overwrite);


Note: A Bacon translation for PigMix test L1 is not included, since there is no direct translation of the native Pig map[ ] data type to an equivalent native ECL data type, and accessing this data format currently requires custom ECL coding. Support is expected in a future release of Bacon.