Call for workload benchmarks: HPCC/Hadoop/Spark
*Caveat: some of you may know I used to work for LN.*
HPCC vs Hadoop vs Spark over Mellanox IB on x86
I am about to get Mellanox Unstructured Data Acceleration (UDA) for Hadoop on the same cluster where I have HPCC installed (http://www.mellanox.com/page/hpcc, http://www.mellanox.com/page/hadoop).
That will let me do an apples-to-apples comparison on the same hardware. I can't go into specifics on the cluster, but it is an 8-node system (1 master, 7 slaves) with a high CPU count and more than 512 GB of RAM on each node.
I'll be testing both spinning disks and SanDisk FusionIO flash storage.
I'd like ideas for honest workload benchmarks so I can document performance. My gut says the C++-based HPCC will be faster, but I want defensible results that show a true apples-to-apples comparison.
My goal is to show results for HPCC, MapReduce, and Spark jobs.
- Lee_Meadows
- Posts: 16
- Joined: Mon Jul 21, 2014 1:43 pm
Besides SORTs, other benchmarks that quickly come to mind are:
1. Joining two datasets, where you can try different join types: two large datasets with a sort-merge join as in Hadoop, a large left-hand side and a small right-hand side for a lookup join, a large left-hand side and an indexed right-hand side for a keyed join, etc. – Hadoop vs. Thor
2. SALT’s scored search for real-time, in-memory slicing and dicing in Roxie – Spark vs. Roxie
3. Graph processing using KEL (you can follow the tutorial on David’s blog: http://hpccsystems.com/blog/dabayliss) – Hadoop and Spark vs. HPCC
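The join types in item 1 differ mainly in which side has to fit in memory and whether the right-hand side is already indexed. Below is a minimal, single-process Python sketch of the idea behind each strategy. It is only an illustration: Hadoop and Thor distribute this work across nodes, the function names are hypothetical, and the sorted list plus binary search merely stands in for a pre-built index (an assumption, not how an HPCC INDEX is stored). Because all three strategies should return the same matches, it is also worth checking that outputs agree across platforms before comparing timings.

```python
from bisect import bisect_left
from collections import defaultdict


def sort_merge_join(left, right, key):
    """Sort both inputs on the key, then walk them in step (both sides may be large)."""
    left, right = sorted(left, key=key), sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with every matching left row.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            while i < len(left) and key(left[i]) == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def lookup_join(left, right, key):
    """Load the small right side into an in-memory hash table, then stream the large left side."""
    table = defaultdict(list)
    for r in right:
        table[key(r)].append(r)
    return [(l, r) for l in left for r in table.get(key(l), [])]


def keyed_join(left, right, key):
    """Probe an index on the right side for each left row (sorted list = hypothetical index)."""
    index = sorted(right, key=key)
    keys = [key(r) for r in index]
    out = []
    for l in left:
        k = key(l)
        pos = bisect_left(keys, k)  # first index entry with key >= k
        while pos < len(keys) and keys[pos] == k:
            out.append((l, index[pos]))
            pos += 1
    return out


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
    by_id = lambda row: row[0]
    # Output order differs per strategy, so compare sorted results.
    results = [sorted(f(left, right, by_id)) for f in (sort_merge_join, lookup_join, keyed_join)]
    assert results[0] == results[1] == results[2]
    print(len(results[0]))  # 5
```

The lookup join only pays off when the right side fits in memory on every node, while the keyed join assumes the index was built ahead of time, so index build time should be reported separately from query time.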
I hope this helps.
- flavio
- Community Advisory Board Member
- Posts: 73
- Joined: Wed Apr 27, 2011 8:59 pm