
Call for workload benchmarks hpcc/hadoop/spark

Topics related to the Hadoop Connector or migrating data from Hadoop

Fri May 22, 2015 3:34 pm

*Caveat: some of you may know I used to work for LN.*

HPCC vs Hadoop vs Spark over Mellanox IB on x86

I am about to set up Mellanox Unstructured Data Acceleration (UDA) for Hadoop on the same cluster where I have HPCC installed.

I'll be able to do an apples-to-apples comparison on the same hardware. I can't go into specifics on the clusters, but each node has a high CPU count and more than 512 GB of RAM, in an 8-node system (1 master, 7 slaves).

I'll be testing spinning disk and SanDisk Fusion-io.

I'd like to get ideas on honest workload benchmarks so I can document performance. My gut says the C++-based HPCC will be faster, but I want to have defensible work to show a true apples-to-apples comparison.

My goal is to show results for HPCC, MapReduce, and Spark jobs.
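Whatever workloads end up on the list, a reproducible timing harness helps make the numbers defensible. A minimal sketch in pure Python (the commented-out submit commands are hypothetical placeholders; substitute the real `ecl`, `hadoop`, or `spark-submit` invocations for each platform):

```python
import statistics
import subprocess
import time

def time_job(cmd, runs=5):
    """Run a shell command several times and return the median wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, shell=True, check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Hypothetical job submissions -- replace with your actual commands:
# time_job("ecl run thor sort_bench.ecl")
# time_job("spark-submit sort_bench.py")
print(time_job("true", runs=3))  # trivial no-op command, just to exercise the harness
```

Reporting the median over several runs (rather than a single run) smooths out cache-warming and JVM startup effects, which otherwise skew Hadoop/Spark numbers against the compiled HPCC jobs.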
Posts: 16
Joined: Mon Jul 21, 2014 1:43 pm

Fri May 22, 2015 4:38 pm

Outside of SORTs, other benchmarks that quickly come to mind are:

1. Join two datasets, trying different types: two large datasets with a sort-merge join, a large left-hand side and a small right-hand side for a lookup join, a large left-hand side and an indexed right-hand side for a keyed join, etc. – Hadoop vs. Thor
2. Play with SALT’s scored search for real-time slicing and dicing in memory in Roxie – Spark vs. Roxie
3. Graph processing using KEL (you can follow the tutorial in David's blog) – Hadoop and Spark vs. HPCC
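To make the join comparison in item 1 concrete, here is a small pure-Python sketch contrasting the two strategies: a sort-merge join (both sides sorted, then merged) versus a hash-based lookup join (small right-hand side held in memory). Data and function names are illustrative only, not any platform's API:

```python
def sort_merge_join(left, right):
    """Join two (key, value) lists by sorting both sides, then merging."""
    left, right = sorted(left), sorted(right)
    out, j = [], 0
    for lk, lv in left:
        while j < len(right) and right[j][0] < lk:
            j += 1  # advance the right cursor past smaller keys
        k = j
        while k < len(right) and right[k][0] == lk:
            out.append((lk, lv, right[k][1]))  # emit one row per match
            k += 1
    return out

def lookup_join(left, right):
    """Build a hash table on the (small) right side; probe it from the left."""
    table = {}
    for rk, rv in right:
        table.setdefault(rk, []).append(rv)
    return [(lk, lv, rv) for lk, lv in left for rv in table.get(lk, [])]

left = [(1, "a"), (2, "b"), (2, "c")]
right = [(2, "x"), (3, "y")]
assert sort_merge_join(left, right) == lookup_join(left, right)
```

The sort-merge variant pays a sort on both inputs (what Hadoop's shuffle does), while the lookup variant only needs the right-hand side to fit in memory (analogous to Thor's LOOKUP join); benchmarking both shapes on the same data is what exposes the difference between the platforms.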

I hope this helps.
Community Advisory Board Member
Posts: 73
Joined: Wed Apr 27, 2011 8:59 pm
