How It Works

Two integrated clusters, a declarative programming language, and a standards-based web services platform form the basis of this comprehensive, massively scalable big data solution.

The HPCC Systems platform consists of two integrated but distinct clusters: a back-end data refinery cluster (called Thor) for ingesting, refining, and transforming big data; and a front-end data delivery cluster (called Roxie) supporting high-performance online querying of processed data. Both clusters run on commodity off-the-shelf hardware. A single programming language, Enterprise Control Language (ECL), is used to write both the applications that run on the data refinery cluster and those that drive the data delivery cluster. End-user access to the system’s real-time querying capabilities is supported through a standards-based web services platform. In combination, these components provide a comprehensive, massively scalable solution for big data processing and analytics.

Thor: The Data Refinery Cluster for Big Data Ingest and Transformation

The HPCC Systems Data Refinery Cluster – known as “Thor”, after the hammer-wielding god of thunder – is responsible for ingesting, cleaning, transforming, linking, and indexing vast amounts of data. It functions as a distributed file system with parallel processing power spread across the nodes. A Thor cluster can scale from a single node to thousands of nodes.

A Thor cluster:

  • Provides a massively parallel job execution environment for programs coded in ECL.
  • Utilizes a master-slave topology in which slaves provide localized data storage and processing power, while the master monitors and coordinates the activities of the slave nodes and communicates job status information.
  • Provides a record-oriented distributed file system (DFS). A big data input file containing fixed or variable length records in standard or custom formats is partitioned across the cluster’s DFS, with each node getting approximately the same amount of record data and with no splitting of individual records.
  • Is fault resilient, based on configurable replication of file parts within the cluster.
  • Utilizes middleware components that provide name services and other services in support of the distributed job execution environment.
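The record-oriented partitioning described above can be illustrated with a minimal sketch. This is not Thor's actual implementation: the function name `partition_records`, the in-memory lists standing in for nodes, and the cumulative-share rule for choosing a node are all assumptions made for illustration. It shows only the two properties named in the text: each node receives roughly the same amount of record data, and no record is split.

```python
from typing import List


def partition_records(records: List[bytes], num_nodes: int) -> List[List[bytes]]:
    """Assign whole records to nodes so that byte totals are roughly equal.

    Hypothetical helper: each inner list stands in for one node's file part.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    total_bytes = sum(len(rec) for rec in records)
    parts: List[List[bytes]] = [[] for _ in range(num_nodes)]

    node = 0
    consumed = 0  # bytes assigned so far across all nodes
    for rec in records:
        # Move on once the bytes assigned so far reach this node's
        # cumulative share of the file; the last node takes the remainder.
        while node < num_nodes - 1 and consumed >= total_bytes * (node + 1) / num_nodes:
            node += 1
        parts[node].append(rec)  # the record is always placed whole
        consumed += len(rec)
    return parts


# Variable-length records, as in the text's "fixed or variable length" case.
sample = [b"x" * n for n in (5, 12, 7, 3, 9, 4, 10, 6)]
for i, part in enumerate(partition_records(sample, 4)):
    print(f"node {i}: {len(part)} records, {sum(len(r) for r in part)} bytes")
```

Because records are never cut, the parts are only approximately equal in size; the imbalance is bounded by the length of the largest record near each boundary.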

Roxie: The Data Delivery Engine Supporting Up to Thousands of Requests Per Second

Roxie – for Rapid Online XML Inquiry Engine – is the front-end cluster providing high-performance online query processing and data warehouse capabilities.

  • Data and indexes to support queries are pre-built on Thor and then deployed to Roxie.
  • Roxie uses an index-based distributed file system, based on a custom B+ tree structure, to enable fast, efficient data retrieval.
  • Queries may include joins and other complex transformations, and payloads can contain structured or unstructured data.
  • Each Roxie node runs a Server process and an Agent process. The Server process handles incoming query requests from users, allocates the processing of the queries to the appropriate Agents across the Roxie cluster, collates the results, and returns the payload to the client.
  • A Roxie cluster is fault resilient, based on data replication within the cluster.
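The Server/Agent division of work can be sketched as a simple scatter-gather pattern. Everything here is an illustrative assumption rather than Roxie's real design: the `Agent` and `Server` classes are hypothetical, a sorted list with binary search stands in for the custom B+ tree index, and the Server asks every Agent instead of routing to only the appropriate ones.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


class Agent:
    """Holds one slice of a keyed index (hypothetical stand-in for an index part)."""

    def __init__(self, entries: List[Tuple[str, str]]):
        # Keep entries sorted by key so lookups can use binary search.
        self.entries = sorted(entries)
        self.keys = [key for key, _ in self.entries]

    def lookup(self, key: str) -> List[str]:
        """Return the payloads of all entries in this slice that match the key."""
        i = bisect_left(self.keys, key)
        payloads = []
        while i < len(self.keys) and self.keys[i] == key:
            payloads.append(self.entries[i][1])
            i += 1
        return payloads


class Server:
    """Accepts a query, fans it out to the agents, and collates the results."""

    def __init__(self, agents: List[Agent]):
        self.agents = agents

    def query(self, key: str) -> List[str]:
        # Scatter: ask each agent in parallel. Gather: map() keeps agent order.
        with ThreadPoolExecutor(max_workers=len(self.agents)) as pool:
            partials = list(pool.map(lambda agent: agent.lookup(key), self.agents))
        return [payload for part in partials for payload in part]


agents = [
    Agent([("smith", "rec1"), ("jones", "rec2")]),
    Agent([("smith", "rec3")]),
    Agent([("lee", "rec4")]),
]
print(Server(agents).query("smith"))  # prints ['rec1', 'rec3']
```

The client sees a single collated payload and never addresses the Agents directly, which is the property the Server process provides in the description above.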

ECL: The Powerful, Efficient Programming Language Built for Big Data

Enterprise Control Language (ECL) is a key factor in the flexibility and capabilities of the HPCC Systems platform. This declarative programming language was designed specifically to enable the processing of massive data sets as efficiently as possible.

  • ECL accomplishes big data processing and analysis objectives with a minimum of coding.
  • The sophisticated ECL compiler is cluster-aware and automatically optimizes code for parallel processing. Programmers needn’t be concerned about whether their code will be deployed on one node or hundreds of nodes.
  • An included graphical IDE for ECL simplifies development, testing, and debugging.
  • ECL code compiles into optimized C++ and can be easily extended using C++ libraries.
  • ECL can be used both for complex data processing on a Thor cluster and for query and report processing on a Roxie cluster.
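The idea that a programmer states what to compute and leaves the node count to the platform can be illustrated in Python. This is neither ECL nor a model of the ECL compiler: `run_distributed`, `count_by_key`, and `merge_counts` are hypothetical names, and the round-robin split is an assumption. The sketch only shows that the same job definition gives the same answer on one simulated node or on many.

```python
from collections import Counter
from typing import Callable, List


def run_distributed(
    records: List[str],
    num_nodes: int,
    local_step: Callable[[List[str]], Counter],
    merge_step: Callable[[List[Counter]], Counter],
) -> Counter:
    """Hypothetical runner: split the data, run the job per node, merge the results."""
    parts = [records[i::num_nodes] for i in range(num_nodes)]  # round-robin split
    partials = [local_step(part) for part in parts]  # would run in parallel on a cluster
    return merge_step(partials)


def count_by_key(part: List[str]) -> Counter:
    """Job logic, written once with no reference to the number of nodes."""
    return Counter(part)


def merge_counts(partials: List[Counter]) -> Counter:
    """Combine the per-node counts into one global result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


records = ["smith", "jones", "smith", "lee", "jones", "smith"]
one_node = run_distributed(records, 1, count_by_key, merge_counts)
many_nodes = run_distributed(records, 100, count_by_key, merge_counts)
print(one_node == many_nodes)  # prints True
```

In this sketch the caller still supplies the local and merge steps by hand; the text above describes the ECL compiler as handling that parallelization automatically.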

ESP: A Versatile, Standards-Based End User Services Platform

The Enterprise Services Platform (ESP) provides the means for end users to access Roxie queries through common web services protocols.

  • Supports SOAP, XML, HTTP, and REST.
  • Provides authentication, security, and logging functions.
