
Your End-to-End Data Lake Management Solution

HPCC Systems lets you quickly develop the data your applications need.

Simple. Fast. Accurate. Cost effective.

A platform purpose-built for high-speed data engineering.

HPCC Systems' key advantage comes from its lightweight core architecture: better performance, near real-time results and full-spectrum operational scale, without a massive development team, unnecessary add-ons or increased processing costs.

I can write in 4 lines of ECL what would take me 200 lines in SQL. That makes it really easy to read, understand, and maintain as a code base.

- Adwait Joshi, CEO, DataSeers

Functionality

Seven aspects of HPCC Systems make processing and analyzing big data easier than with alternative platforms.

Standard hardware, operating system and protocols

  • Processing clusters use commodity hardware and high-speed networking.
  • Clusters run on the Linux operating system.
  • Supports SOAP, XML, HTTP/HTTPS, REST, and JSON.
  • Enterprise Services Platform (ESP) enables end-user access to ROXIE queries via common web services protocols.

High redundancy and availability

  • Thor and ROXIE are both fault-resilient, based on replication within the cluster.
  • The systems store file part replicas on multiple nodes to protect against disk or node failures.
  • Both are designed for resiliency and continued availability in the event of hardware failures.
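To make the replication idea concrete, here is a minimal Python sketch of why storing each file part on more than one node survives a single node failure. It assumes a simple placement rule chosen for illustration (not taken from this page): the primary copy of part i lives on node i mod N and its replica on the next node, (i + 1) mod N. The function names are hypothetical.

```python
def place_parts(num_parts: int, num_nodes: int) -> dict[int, tuple[int, int]]:
    """Map each file part to (primary node, replica node).

    Assumed rule for illustration only: primary on node i mod N,
    replica on the next node, (i + 1) mod N.
    """
    if num_nodes < 2:
        raise ValueError("replication needs at least two nodes")
    return {
        part: (part % num_nodes, (part + 1) % num_nodes)
        for part in range(num_parts)
    }


def readable_parts(placement: dict[int, tuple[int, int]], failed: set[int]) -> set[int]:
    """Return the parts that still have at least one copy on a healthy node."""
    return {
        part
        for part, (primary, replica) in placement.items()
        if primary not in failed or replica not in failed
    }


if __name__ == "__main__":
    placement = place_parts(num_parts=4, num_nodes=4)
    # Losing any single node leaves every part readable.
    for node in range(4):
        assert readable_parts(placement, {node}) == set(range(4))
    # Losing nodes 1 and 2 together loses part 1, whose only copies live there.
    print(sorted(readable_parts(placement, {1, 2})))
```

With two copies per part, any one node can fail without data loss; losing two adjacent nodes can make a part unavailable, which is why the sketch is only a model of the protection the bullets describe.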

Practical tools and extensions

  • Administrative tools for environment configuration, job monitoring, system performance management, distributed file system management, and more.
  • Extension modules for web log analytics, natural language parsing, machine learning, data encryption, and more.

Efficient programming

Declarative, modular, extensible Enterprise Control Language (ECL) is designed specifically for processing big data.

  • Highly efficient: accomplish big data tasks with far less code.
  • Flexible: can be used both for complex data processing on a Thor cluster and for query and report processing on a ROXIE cluster.
  • Graphical IDE for ECL simplifies development, testing, and debugging.
  • ECL compiler is cluster-aware and automatically optimizes code for parallel processing.
  • ECL code compiles into optimized C++ and can be easily extended using C++ libraries.

End-to-end configuration

The two main systems, Thor and ROXIE, work together to provide an end-to-end solution for big data processing and analytics. Data and indexes to support queries are pre-built on Thor and then deployed to ROXIE.
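The build-then-serve split can be pictured with a small Python sketch: the expensive sort happens once in a build step (the role this page assigns to Thor), and each lookup then only binary-searches the prebuilt structure (the role of ROXIE). A sorted list searched with the standard `bisect` module stands in here for the custom B+ tree index mentioned later on this page; the function names are hypothetical.

```python
import bisect


def build_index(records: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Build phase: sort (key, value) records once, ahead of query time."""
    ordered = sorted(records)
    return [key for key, _ in ordered], [value for _, value in ordered]


def lookup(index: tuple[list[str], list[str]], key: str) -> list[str]:
    """Query phase: binary-search the prebuilt index for all values with this key."""
    keys, values = index
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return values[lo:hi]


if __name__ == "__main__":
    idx = build_index([("smith", "rec3"), ("jones", "rec1"), ("smith", "rec2")])
    print(lookup(idx, "smith"))
```

Sorting costs O(n log n) once, while each lookup costs O(log n), which is the trade-off that makes pre-building indexes worthwhile when many queries hit the same data.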

Thor, the Data Refinery Engine, handles data ingestion and enrichment.

  • Thor uses a master-slave topology in which slaves provide localized data storage and processing power, while the master monitors and coordinates the activities of the slave nodes and communicates job status information.
  • Middleware components provide name services and other services in support of the distributed job execution environment.
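As a rough illustration of the master-slave division of labor described above, the following Python sketch simulates a master that hands each slave its local data partition, waits for the results, tracks per-node status, and combines the answers. Threads stand in for cluster nodes, and the names `run_job` and `local_count` are hypothetical; this is not Thor's actual protocol.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def local_count(node_id: int, records: list[str], keyword: str) -> tuple[int, int]:
    """Slave work: count matching records in this node's local partition."""
    return node_id, sum(1 for record in records if keyword in record)


def run_job(partitions: list[list[str]], keyword: str) -> dict:
    """Master: dispatch one task per slave, monitor completion, combine results."""
    status = {node: "running" for node in range(len(partitions))}
    total = 0
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [
            pool.submit(local_count, node, records, keyword)
            for node, records in enumerate(partitions)
        ]
        # Record each slave's status as soon as its task finishes.
        for future in as_completed(futures):
            node, count = future.result()
            status[node] = "done"
            total += count
    return {"total": total, "status": status}


if __name__ == "__main__":
    partitions = [["error a", "ok"], ["error b"], ["ok", "ok"]]
    print(run_job(partitions, "error"))
```

The key design point mirrored here is that data never moves to the master: each slave works only on the records it stores locally, and the master exchanges small status and result messages.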

ROXIE, the Information Delivery Engine, provides high-performance online processing and data warehouse capabilities.

  • Each ROXIE node runs a Server process and an Agent process. The Server process handles incoming query requests from users, allocates the processing of the queries to the appropriate Agents across the ROXIE cluster, collates the results, and returns the payload to the client.
  • Queries may include joins and other complex transformations, and payloads can contain structured or unstructured data.
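The Server/Agent query flow above follows a scatter-gather pattern. The Python sketch below models it with threads standing in for nodes. The `Agent` and `Server` classes, the in-memory index slices, and the policy of sending every query to every agent are assumptions for illustration; a real system could instead route a query only to the agents holding the relevant data.

```python
from concurrent.futures import ThreadPoolExecutor


class Agent:
    """Hypothetical agent holding one slice of an index: key -> list of records."""

    def __init__(self, index_slice: dict[str, list[str]]):
        self.index_slice = index_slice

    def lookup(self, key: str) -> list[str]:
        return self.index_slice.get(key, [])


class Server:
    """Hypothetical server: fans a query out to agents and collates the results."""

    def __init__(self, agents: list[Agent]):
        self.agents = agents

    def query(self, key: str) -> list[str]:
        with ThreadPoolExecutor(max_workers=len(self.agents)) as pool:
            # Scatter: every agent searches its own slice in parallel.
            partial_results = pool.map(lambda agent: agent.lookup(key), self.agents)
            # Gather: merge the partial answers into one sorted payload.
            return sorted(record for part in partial_results for record in part)


if __name__ == "__main__":
    agents = [
        Agent({"smith": ["smith,1"]}),
        Agent({"smith": ["smith,2"], "jones": ["jones,1"]}),
    ]
    print(Server(agents).query("smith"))
```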

Optimized distributed file system (DFS)

  • Thor DFS is record-oriented and optimized for big data ETL (extract-transform-load). A big data input file containing fixed or variable length records in standard or custom formats is partitioned across the cluster’s DFS, with each node getting approximately the same amount of record data and with no splitting of individual records.
  • ROXIE DFS is index-based and optimized for concurrent query processing. Based on a custom B+ tree structure, the system enables fast, efficient data retrieval.
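The record-oriented partitioning described for Thor DFS (roughly equal data per node, no record split across nodes) can be sketched in a few lines of Python. The rule used here is an assumption for illustration: each record goes to the node whose equal share of the file's byte range contains the record's starting offset. That keeps records whole and in order, and each node's byte count differs from total_bytes / num_nodes by less than the length of the longest record.

```python
def partition_records(records: list[bytes], num_nodes: int) -> list[list[bytes]]:
    """Split an ordered file of whole records into num_nodes contiguous parts.

    Hypothetical rule: a record is assigned to the node whose share of the
    byte range contains the record's starting offset, so no record is split.
    """
    total = sum(len(record) for record in records)
    parts: list[list[bytes]] = [[] for _ in range(num_nodes)]
    offset = 0
    for record in records:
        node = min(num_nodes - 1, offset * num_nodes // total) if total else 0
        parts[node].append(record)
        offset += len(record)
    return parts


if __name__ == "__main__":
    # Variable-length records: the 30-byte record stays whole on node 0.
    parts = partition_records([b"x" * 30, b"y" * 5, b"z" * 5], num_nodes=2)
    print([sum(len(r) for r in part) for part in parts])
```

Fixed-length records divide evenly; with variable-length records the balance is approximate, which matches the page's wording that each node gets "approximately the same amount of record data."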

Massive scalability and performance

  • Horizontal scalability from one node to thousands of nodes.
  • Thor can process up to billions of records per second.
  • ROXIE can support thousands of users with sub-second response time, depending on the application.

Ready. Set. Go.

Are you ready to get started using HPCC Systems? Visit our Get Started page to explore the power of the HPCC Systems platform, test ECL code in a virtual playground, and learn how to get up and running with our Virtual Machine or create your own cloud cluster. Still want to learn more? Continue reading below.

How does HPCC Systems compare?

Benchmarks and comparisons with competitive platforms

HPCC Systems Whitepaper

Coming Soon! Evaluating the benefits of building your own data lake using the HPCC Systems platform

READ WHITEPAPER

Technical Whitepaper

Comparison of HPCC Systems Thor vs Apache Spark Performance on AWS

READ WHITEPAPER

Webinar Overview

Baselines & Benchmarks – Making Open Source Big Data Analytics Easy

READ BLOG

Terasort Results

How to accurately calculate the real ROI of a Big Data analytics system

READ PDF

PigMix Comparison

ECL outperforms Pig and Java significantly on the Hadoop PigMix benchmark on an identical hardware configuration

READ MORE

Comparison to Hadoop

Learn how the HPCC Systems platform compares to Hadoop and why HPCC Systems is a superior alternative

READ MORE

More about HPCC Systems

Whitepapers

More than a dozen whitepapers provide in-depth analysis of topics that are important to members of the HPCC Systems community and anyone interested in big data processing and analytics.

VIEW WHITEPAPERS

Books

HPCC Systems offers several books that are designed as a reference for researchers, programmers, business managers, entrepreneurs and investors within the big data industry.

VIEW BOOKS

Podcast

Flavio Villanustre, VP of Infrastructure and Products at HPCC Systems, shares the history of the platform, how it is architected for scale and speed, and the unique solutions that it provides for enterprise-grade data analytics (1 hr 13 min).

VIEW PODCASTS

Versatile. Flexible. Refined.

An experienced HPCC Systems user explains the benefits and advantages of using HPCC Systems as your big data management solution.