A powerful, open-source, enterprise-proven big data analytics platform.

Overview

HPCC Systems® helps businesses of all sizes find the answers they need by making data easier to process, analyze, and understand.

Born from the deep data analytics history of LexisNexis® Risk Solutions, HPCC Systems provides high-performance, parallel processing and delivery for applications using big data.

The open-source platform incorporates a software architecture implemented on commodity shared-nothing computing clusters for resilience and scalability. It is configurable to support both parallel batch data processing and high-performance data delivery applications using indexed data files. The platform includes a high-level, implicitly parallel data-centric declarative programming language that adds to its flexibility and efficiency.

Developers, data scientists and technology leaders adopt HPCC Systems because it is cost-effective, comprehensive, fast, powerful and scalable.

Ultimately, it makes managing big data easier.

The HPCC Systems Platform

The HPCC Systems platform is a set of easy-to-use software features that enable developers and data scientists to process and analyze data at any scale. With a strong commitment to the open source community, the HPCC Systems platform is available free of licensing and service costs. Download the HPCC Systems Brochure.

HPCC Systems provides all the functionality needed to execute a data project. Specifically, the HPCC Systems stack comprises:

Thor: The Data Refinery Cluster

Known as “Thor” after the hammer-wielding god of thunder, this cluster is designed to execute big data workflows including extraction, loading, cleansing, transformations, linking and indexing.

Data Management Tools

Key features include data profiling, data cleansing, snapshot data updates and consolidation, and job scheduling and automation.

ROXIE: The Data Delivery Engine

The rapid data delivery cluster provides separate, high-performance online query delivery for big data. ROXIE (Rapid Online XML Inquiry Engine) uses highly optimized, distributed B-tree indexed data structures designed for highly concurrent use.

Predictive Modeling Tools

In-place predictive modeling functionality, with support for distributed linear algebra, covers Linear Regression, Logistic Regression, Decision Trees and Random Forests.

Features

Seven aspects of HPCC Systems make it easier than alternatives for processing and analyzing big data.

Standard hardware, operating system and protocols

  • Processing clusters using commodity hardware and high-speed networking
  • Linux operating system
  • Supports SOAP, XML, HTTP, REST and JSON
  • Enterprise Services Platform (ESP) enables end-user access to ROXIE queries via common web services protocols

High redundancy and availability

  • Thor and ROXIE are both fault-resilient, based on replication within the cluster.
  • The systems store file part replicas on multiple nodes to protect against disk or node failures.
  • Both are designed for resiliency and continued availability in the event of hardware failures.
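
The replication idea in the list above can be illustrated with a small sketch. It assumes a simple round-robin scheme in which each file part is copied to the next node along; the function names `place_replicas` and `readable_parts` and the node names are hypothetical, and this is not a description of how HPCC Systems actually places replicas.

```python
def place_replicas(num_parts, nodes, copies=2):
    """Assign each file part to `copies` distinct nodes in round-robin order."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    placement = {}
    for part in range(num_parts):
        # First copy on node (part mod n); each further copy on the next node along.
        placement[part] = [nodes[(part + k) % len(nodes)] for k in range(copies)]
    return placement


def readable_parts(placement, failed_nodes):
    """Return the parts that still have at least one copy on a live node."""
    return [part for part, holders in placement.items()
            if any(node not in failed_nodes for node in holders)]


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
placement = place_replicas(num_parts=4, nodes=nodes)
print(placement[0])
print(readable_parts(placement, failed_nodes={"node2"}))
```

With two copies per part, losing any single node in this sketch still leaves every part readable; losing two adjacent nodes does not.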

Practical tools and extensions

  • Administrative tools for environment configuration, job monitoring, system performance management, distributed file system management, and more.
  • Extension modules for web log analytics, natural language parsing, machine learning, data encryption, and more.

Efficient programming

Declarative, modular, extensible Enterprise Control Language (ECL) is designed specifically for processing big data.

  • Highly efficient — accomplish big data tasks with far less code.
  • Flexible — can be used both for complex data processing on a Thor cluster and for query and report processing on a ROXIE cluster.
  • Graphical IDE for ECL simplifies development, testing and debugging.
  • ECL compiler is cluster-aware and automatically optimizes code for parallel processing.
  • ECL code compiles into optimized C++ and can be easily extended using C++ libraries.

End-to-end configuration

The two main systems — Thor and ROXIE — work together to provide an end-to-end solution for big data processing and analytics. Data and indexes to support queries are pre-built on Thor and then deployed to ROXIE.

Thor, the Data Refinery, is the extraction, transformation and loading engine.

  • Thor uses a master-slave topology in which slaves provide localized data storage and processing power, while the master monitors and coordinates the activities of the slave nodes and communicates job status information.
  • Middleware components provide name services and other services in support of the distributed job execution environment.
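
A minimal sketch of this topology, assuming an in-process model in which threads stand in for cluster nodes. The `Master` and `Worker` classes are hypothetical illustrations (`Worker` plays the role of a slave node in the description above) and do not reflect Thor's real interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Holds one local partition of the data and processes only that partition."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def run(self, transform):
        # Apply the job's transform to the locally stored records.
        self.records = [transform(record) for record in self.records]
        return self.name, len(self.records)


class Master:
    """Coordinates the workers and reports job status; it stores no data itself."""

    def __init__(self, workers):
        self.workers = workers

    def run_job(self, transform):
        # Dispatch the same job to every worker and wait for all of them.
        with ThreadPoolExecutor(max_workers=len(self.workers)) as pool:
            results = list(pool.map(lambda w: w.run(transform), self.workers))
        return {"status": "completed", "records_per_worker": dict(results)}


workers = [Worker("worker1", ["a", "b"]), Worker("worker2", ["c"])]
status = Master(workers).run_job(str.upper)
print(status)
```

The point of the design is that data stays where it is stored: the master sends the work to the data instead of pulling the data to a central place.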

ROXIE, the Data Delivery Engine, provides high-performance online processing and data warehouse capabilities.

  • Each ROXIE node runs a Server process and an Agent process. The Server process handles incoming query requests from users, allocates the processing of the queries to the appropriate Agents across the ROXIE cluster, collates the results, and returns the payload to the client.
  • Queries may include joins and other complex transformations, and payloads can contain structured or unstructured data.
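
The Server and Agent roles described above follow a scatter-gather pattern, which can be sketched as follows. This is an assumption-laden illustration: the `Server` and `Agent` classes and the sample data are hypothetical, and for simplicity the server asks every agent, whereas the text says ROXIE sends work only to the appropriate Agents.

```python
class Agent:
    """Serves lookups against the slice of indexed data it holds."""

    def __init__(self, index_slice):
        self.index_slice = index_slice  # maps a key to a list of payload rows

    def lookup(self, key):
        return self.index_slice.get(key, [])


class Server:
    """Receives a query, fans it out to the agents, and collates the results."""

    def __init__(self, agents):
        self.agents = agents

    def query(self, key):
        rows = []
        for agent in self.agents:
            rows.extend(agent.lookup(key))
        # Collate into one ordered payload for the client.
        return sorted(rows)


agents = [
    Agent({"smith": ["smith, 12 Oak St"]}),
    Agent({"smith": ["smith, 9 Elm Rd"], "jones": ["jones, 4 Pine Ave"]}),
]
server = Server(agents)
print(server.query("smith"))
```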

Optimized distributed file systems (DFS)

  • Thor DFS is record-oriented and optimized for big data ETL (extract-transform-load). A big data input file containing fixed or variable length records in standard or custom formats is partitioned across the cluster’s DFS, with each node getting approximately the same amount of record data and with no splitting of individual records.
  • ROXIE DFS is index-based and optimized for concurrent query processing. Based on a custom B+ tree structure, the system enables fast, efficient data retrieval.
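
The Thor partitioning rule above (roughly equal data per node, no record split across nodes) can be sketched as below. This is one possible scheme, assumed for illustration only: each whole record goes to the node whose byte range contains the record's midpoint. The function name `partition_records` and the sample records are hypothetical.

```python
def partition_records(records, num_nodes):
    """Split whole records into num_nodes parts of roughly equal byte size."""
    total = sum(len(record) for record in records) or 1  # avoid dividing by zero
    parts = [[] for _ in range(num_nodes)]
    offset = 0
    for record in records:
        # Choose the node whose share of the byte range holds this record's midpoint.
        midpoint = offset + len(record) / 2
        node = min(int(midpoint * num_nodes / total), num_nodes - 1)
        parts[node].append(record)  # the record is never split
        offset += len(record)
    return parts


# Variable-length records: ten of 10 bytes and two of 12 bytes.
records = [f"id{i},value{i}".encode() for i in range(12)]
parts = partition_records(records, num_nodes=3)
sizes = [sum(len(record) for record in part) for part in parts]
print(sizes)
```

Because records are assigned whole, part sizes differ by at most about one record length per boundary, which matters less and less as files grow.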

Massive scalability and performance

  • Horizontal scalability from one node to thousands of nodes.
  • Thor Data Refinery can process up to billions of records per second.
  • ROXIE Data Delivery Engine can support thousands of users with sub-second response time, depending on the application.

Case studies: Solving big data problems

Organizations have used HPCC Systems in demanding production environments for more than a decade, making it the most proven solution of its type. Learn how innovators are using the HPCC Systems platform in these detailed case studies.
3LOQ
3LOQ uses proprietary machine learning algorithms to analyze billions of data points and map out customer recommendations. Learn more...

Changing the Landscape of Customer Engagement with AI

Proagrica
Searching for ways to help the agriculture industry use data-driven decision making for crops and livestock production. Learn more...

Using Big Data to help feed the world

GuardHat
Redefining worker safety with a smart hardhat continuously transmitting data to a safety control center. Learn more...

Managing worker safety in real time

InfoSys
360-degree customer view helps leading Chinese appliance manufacturer transition to an internet-centric world. Learn more...

360-Degree Customer View

Overview videos

Free videos are available to help you get started quickly and get the most from HPCC Systems. The videos cover a variety of topics, from beginners trying to solve their first problem to advanced users looking to tune their programs to meet performance requirements.

View getting started videos

Read more about HPCC Systems

White Papers

More than a dozen white papers provide in-depth analysis of topics that are important to members of the HPCC Systems community and anyone interested in big data processing and analytics.

Books

HPCC Systems offers several books that are designed as a reference for researchers, programmers, business managers, entrepreneurs and investors within the big data industry.

FAQs

Read the FAQs for answers to common questions about HPCC, ECL and more.

News

See our latest news updates about how we work with organizations to solve big data problems.

What people are saying about HPCC Systems