Available White Papers

Topics related to the Hadoop Connector or migrating data from Hadoop

Tue Jun 21, 2011 8:18 pm

In response to recent social media posts comparing HPCC/ECL to Hadoop/MapReduce/PIG, below are references to three helpful white papers that provide an introduction, comparison and benchmarks on the technology.

http://hpccsystems.com/community/white-papers/ecl-for-piggers
http://hpccsystems.com/community/white-papers/ecl-for-hadoopers
http://hpccsystems.com/community/white-papers/performing-pig-pen
admin
Site Admin
 
Posts: 208
Joined: Thu Jan 27, 2011 10:58 am

Thu Jun 30, 2011 7:56 pm

I noticed on page 6 of wp_ecl_for_hadoopers.pdf that LOCAL was absent from the SORT (to help with possible skew), but the following DEDUP had LOCAL included. Is that what you intended, given that LOCAL keeps the operation on a per-node basis? Thanks.
udetelx
 
Posts: 7
Joined: Thu Jun 30, 2011 7:38 pm

Thu Jun 30, 2011 8:04 pm

The LOCAL is on a ROLLUP, not DEDUP.
udetelx
 
Posts: 7
Joined: Thu Jun 30, 2011 7:38 pm

Thu Jun 30, 2011 10:08 pm

You are correct. The SORT will distribute the data evenly - but it does ensure that all of the records with the same value for the key are on the same node.
GIVEN that the rollup condition contained an equality on the whole key from the sort, I could guarantee that a rollup would never need to pull records from the following node. I thus used the LOCAL flag so that all nodes could act independently.
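
To illustrate the reasoning above, here is a minimal sketch in plain Python (not ECL, and not how HPCC is implemented). It assumes, as the post states, that the global sort leaves every record with the same key on the same node. The helper names `global_sort`, `rollup` and `local_rollup`, the sample records, and the partitioning rule are all hypothetical, chosen only for this illustration.

```python
from itertools import groupby

def global_sort(records, num_nodes):
    """Sort by key, then hand out contiguous key ranges to nodes.

    Assumption for this sketch: a run of equal keys is never split
    across two nodes.
    """
    ordered = sorted(records, key=lambda r: r[0])
    target = max(1, -(-len(ordered) // num_nodes))  # ceiling division
    nodes = [[] for _ in range(num_nodes)]
    node = 0
    for _, run in groupby(ordered, key=lambda r: r[0]):
        run = list(run)
        # Move to the next node only between runs of equal keys.
        if nodes[node] and len(nodes[node]) + len(run) > target and node < num_nodes - 1:
            node += 1
        nodes[node].extend(run)
    return nodes

def rollup(records):
    """Merge adjacent records whose keys are equal, summing their counts."""
    out = []
    for key, count in records:
        if out and out[-1][0] == key:
            out[-1] = (key, out[-1][1] + count)
        else:
            out.append((key, count))
    return out

def local_rollup(nodes):
    """Each node rolls up its own records without looking at any other node."""
    return [rollup(part) for part in nodes]

records = [("pig", 1), ("ecl", 1), ("hadoop", 1), ("ecl", 1),
           ("pig", 1), ("ecl", 1), ("thor", 1)]

nodes = global_sort(records, 3)
local_result = [rec for part in local_rollup(nodes) for rec in part]
global_result = rollup(sorted(records, key=lambda r: r[0]))

print(local_result == global_result)
```

Because no key spans two nodes, the per-node rollups concatenated in node order give the same result as a single rollup over the whole sorted dataset, which is why no node needs records from its neighbour.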
dabayliss
Community Advisory Board Member
 
Posts: 109
Joined: Fri Apr 29, 2011 1:35 pm

Fri Jul 01, 2011 8:10 am

In this case (and many others), if you don't add LOCAL to the ROLLUP, the code generator will spot that all the matching records must be on the same node and will add the LOCAL attribute for you automatically.
ghalliday
Community Advisory Board Member
 
Posts: 198
Joined: Wed May 18, 2011 9:48 am

Fri Jul 01, 2011 1:34 pm

@Gavin
Yes - that is true - but as this is an introductory text:
a) I wanted to show how it actually would execute
b) I wasn't quite ready to admit that you would pretty much ignore them and build the graph you thought they should have written :shock:
dabayliss
Community Advisory Board Member
 
Posts: 109
Joined: Fri Apr 29, 2011 1:35 pm

