HPCC vs Hadoop Detailed Comparison

Learn how the HPCC Systems Platform compares to Hadoop and why HPCC Systems is a superior alternative:

Item | HPCC Systems | Hadoop

Origin

HPCC Systems: Thor was invented in 1999 by LexisNexis specifically to solve large graph problems. The business model is based on consuming large volumes of structured and unstructured data and converting it into a massive social graph of people and businesses.
Hadoop: Hadoop grew out of the open source Apache Nutch project and was developed largely at Yahoo to index web data. The project was started in 2006, and in 2008 it became a top-level Apache project.

Parallelism Architecture

HPCC Systems: The Thor architecture is based on the data flow paradigm, which supports three types of parallelism:
1. Data Parallelism: The data is divided into parts and distributed across multiple nodes. Compute occurs on each node in parallel.
2. Pipeline Parallelism: Two consecutive operations can work on the same dataset at the same time. As soon as part of the data has been processed by the first operation, it becomes available to the next.
3. System Parallelism: If the system detects that two or more operations are independent, it will try to execute all of them in parallel.
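The three kinds of parallelism listed above can be sketched in plain Python. This is only an illustration of the concepts, not how Thor is implemented: the function names are hypothetical, threads stand in for cluster nodes, and the workloads (squaring, adding one, doubling) are placeholders chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread


def partition(records, n_parts):
    """Split records into n_parts slices, one per (simulated) node."""
    return [records[i::n_parts] for i in range(n_parts)]


def data_parallel_sum_of_squares(records, n_parts=4):
    """Data parallelism: every partition is processed independently,
    then the partial results are combined."""
    def work(part):
        return sum(x * x for x in part)

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        return sum(pool.map(work, partition(records, n_parts)))


_DONE = object()  # sentinel that tells the second stage to stop


def pipeline(records):
    """Pipeline parallelism: stage two consumes each row as soon as
    stage one has produced it, while stage one keeps working."""
    handoff = Queue(maxsize=2)
    out = []

    def stage_one():
        for x in records:
            handoff.put(x + 1)
        handoff.put(_DONE)

    def stage_two():
        while True:
            item = handoff.get()
            if item is _DONE:
                break
            out.append(item * 2)

    threads = [Thread(target=stage_one), Thread(target=stage_two)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def independent_operations(records):
    """System parallelism: two operations that do not depend on each
    other are submitted together and run side by side."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        total = pool.submit(sum, records)
        largest = pool.submit(max, records)
        return total.result(), largest.result()
```

In a real data flow engine the scheduler discovers these opportunities from the job graph; here each one is written out by hand to make the distinction visible.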
Hadoop: The Hadoop architecture is based on the MapReduce paradigm, which was originally used by Google to index large volumes of web content. Its only source of parallelism is splitting the data into multiple parts (Map phase), processing those parts, and then consolidating the results (Reduce phase) into an output format.
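The Map and Reduce phases described above can be illustrated with a single-process word count. This is a minimal sketch of the paradigm, assuming in-memory lists in place of distributed input splits; it does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: consolidate each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run map over every split, then shuffle and reduce the results."""
    pairs = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(pairs))
```

On a cluster each split would be mapped on a different node; the example runs them one after another so that only the data movement between phases is shown.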

High Level Scripting Languages

HPCC Systems: ECL (Enterprise Control Language) is a language built specifically to tackle the complexities of massively parallel processing (MPP) data problems. KEL (Knowledge Engineering Language) is dedicated to tackling graph problems in big data.
Hadoop: Pig and Hive. Both languages convert high-level scripts into MapReduce jobs.

Shared Nothing Architecture

HPCC Systems: Yes
Hadoop: Yes

License

HPCC Systems: Apache 2.0
Hadoop: Apache 2.0

Native Binaries

HPCC Systems: Yes. Coded in C++ and compiled to native binaries.
Hadoop: No. Based on a JVM.

Realtime Query

HPCC Systems: Yes. Thor data can be indexed and deployed to Roxie for high-performance real-time queries. The Roxie and Thor components together form the HPCC Systems platform and are completely open source and available for free. The Roxie architecture was built specifically for random-access performance, low latency, and highly concurrent queries.
Hadoop: No. Third-party vendors have built integrations on top of HDFS to provide this feature, for example Pivotal HD and Teradata. However, HDFS is block oriented and was built with sequential access in mind.

Monitoring

HPCC Systems: Tightly integrated with the Ganglia and Nagios monitoring tools, which are available as open source and packaged as part of the platform.
Hadoop: Mostly vendor dependent.

Enterprise Licensing Required

HPCC Systems: No. There is exactly one package available: the HPCC Systems Platform. It is enterprise ready and is the platform that runs the 1.4 billion dollar LexisNexis business and several Reed Elsevier projects.
Hadoop: Multiple packages, and vendor dependent. Most vendors offer a separate enterprise license.
