Simple. Fast. Accurate. Cost effective.
A platform purpose-built for high-speed data engineering.
HPCC Systems' key advantage comes from its lightweight core architecture: better performance, near real-time results and full-spectrum operational scale, without a massive development team, unnecessary add-ons or increased processing costs.
I can write in 4 lines of ECL what would take me 200 lines in SQL. That makes it really easy to read, understand, and maintain as a code base.
- Adwait Joshi, CEO, DataSeers
Discover Cloud Native
Learn how being cloud native can improve your current cloud deployments. HPCC Systems 8.0 combines the usability of our bare metal platform with the automation of Kubernetes to make it easy to set up, manage and scale your big data and data lake environments.
- Support for Azure Kubernetes Service
- Support for Amazon Elastic Kubernetes Service
- Object Stores: AWS Simple Storage Service (S3) and Azure Blob Storage
- Disk Stores: AWS Elastic Block Storage and Azure Files/Azure Disks
- Scaling a cluster without moving the data
- Auto wakeup to enable on-demand processing by compute resources
- End-to-end encryption
- Service Mesh Options (Linkerd and Istio)
- OAuth 2.0 support for Authentication, with built-in support for Azure AD
- JWT
- Authorization (Roles and Permissions) including Logical Files, Workunits, Admin Resources
Seven aspects of HPCC Systems make it easier than alternatives for processing and analyzing big data.
- Standard hardware, operating system and protocols
- High redundancy and availability
- Practical tools and extensions
- Efficient programming
- End-to-end configuration
- Optimized distributed file system (DFS)
- Massive scalability and performance
Standard hardware, operating system and protocols
- Processing clusters use commodity hardware and high-speed networking.
- Clusters run on the Linux operating system.
- Supports SOAP, XML, HTTP/HTTPS, REST, and JSON.
- Enterprise Services Platform (ESP) enables end-user access to ROXIE queries via common web services protocols.
High redundancy and availability
- Thor and ROXIE are both fault-resilient, based on replication within the cluster.
- The systems store file part replicas on multiple nodes to protect against disk or node failures.
- Both are designed for resiliency and continued availability in the event of hardware failures.
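To make the replication idea concrete, here is a minimal Python sketch of how storing each file part on more than one node protects against a node failure. The round-robin placement rule and the names `place_parts` and `readable_parts` are assumptions made for illustration; the actual placement policy used by Thor and ROXIE is not described here.

```python
def place_parts(num_parts: int, num_nodes: int, copies: int = 2) -> dict[int, list[int]]:
    """Map each file part to the nodes that hold a copy of it (hypothetical policy)."""
    if not 1 <= copies <= num_nodes:
        raise ValueError("copies must be between 1 and num_nodes")
    placement = {}
    for part in range(num_parts):
        primary = part % num_nodes
        # Extra copies go to the following nodes, so no node holds the same part twice.
        placement[part] = [(primary + k) % num_nodes for k in range(copies)]
    return placement


def readable_parts(placement: dict[int, list[int]], failed_nodes: set[int]) -> set[int]:
    """Return the parts that still have at least one copy on a healthy node."""
    return {
        part
        for part, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    }


placement = place_parts(num_parts=4, num_nodes=4)
survivors = readable_parts(placement, failed_nodes={2})
```

With two copies per part, losing a single node leaves every part readable in this sketch; losing two adjacent nodes would not, which is why the number of copies is a tuning decision.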
Practical tools and extensions
- Administrative tools for environment configuration, job monitoring, system performance management, distributed file system management, and more.
- Extension modules for web log analytics, natural language parsing, machine learning, data encryption, and more.
Efficient programming
Declarative, modular, extensible Enterprise Control Language (ECL) is designed specifically for processing big data.
- Highly efficient — accomplish big data tasks with far less code.
- Flexible — can be used both for complex data processing on a Thor cluster and for query and report processing on a ROXIE cluster.
- Graphical IDE for ECL simplifies development, testing, and debugging.
- ECL compiler is cluster-aware and automatically optimizes code for parallel processing.
- ECL code compiles into optimized C++ and can be easily extended using C++ libraries.
End-to-end configuration
The two main systems, Thor and ROXIE, work together to provide an end-to-end solution for big data processing and analytics. Data and indexes to support queries are pre-built on Thor and then deployed to ROXIE.
Thor, the Data Refinery Engine, is the ingestion and enrichment engine.
- Thor uses a master-slave topology in which slaves provide localized data storage and processing power, while the master monitors and coordinates the activities of the slave nodes and communicates job status information.
- Middleware components provide name services and other services in support of the distributed job execution environment.
ROXIE, the Information Delivery Engine, provides high-performance online processing and data warehouse capabilities.
- Each ROXIE node runs a Server process and an Agent process. The Server process handles incoming query requests from users, allocates the processing of the queries to the appropriate Agents across the ROXIE cluster, collates the results, and returns the payload to the client.
- Queries may include joins and other complex transformations, and payloads can contain structured or unstructured data.
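The Server and Agent roles described above follow a scatter-gather pattern, which the following Python sketch illustrates. It is not ROXIE code: the data slices, `agent_lookup` and `server_query` are hypothetical, and for simplicity the server fans out to every agent rather than selecting only the appropriate ones.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index slices, one per Agent; keys are customer ids.
AGENT_DATA = [
    {1: "Ada", 4: "Dan"},
    {2: "Bob", 5: "Eve"},
    {3: "Cy"},
]


def agent_lookup(agent_id: int, keys: list[int]) -> dict[int, str]:
    """Agent role: return the matches found in this agent's slice."""
    local = AGENT_DATA[agent_id]
    return {k: local[k] for k in keys if k in local}


def server_query(keys: list[int]) -> dict[int, str]:
    """Server role: fan the request out, collate the partial results, return the payload."""
    with ThreadPoolExecutor(max_workers=len(AGENT_DATA)) as pool:
        partials = pool.map(lambda a: agent_lookup(a, keys), range(len(AGENT_DATA)))
        merged: dict[int, str] = {}
        for partial in partials:
            merged.update(partial)
    return dict(sorted(merged.items()))


result = server_query([5, 1, 9])
```

Here `result` contains the entries for keys 1 and 5; key 9 is absent from every slice, so it is simply missing from the payload.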
Optimized distributed file system (DFS)
- Thor DFS is record-oriented and optimized for big data ETL (extract-transform-load). A big data input file containing fixed or variable length records in standard or custom formats is partitioned across the cluster’s DFS, with each node getting approximately the same amount of record data and with no splitting of individual records.
- ROXIE DFS is index-based and optimized for concurrent query processing. Based on a custom B+ tree structure, the system enables fast, efficient data retrieval.
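The Thor partitioning rule described above (roughly equal data per node, no record split across nodes) can be sketched in Python as follows. This is an illustrative assumption about one way to meet those two constraints, not the platform's actual algorithm; `partition_records` is a hypothetical name.

```python
def partition_records(records: list[bytes], num_nodes: int) -> list[list[bytes]]:
    """Split records into num_nodes contiguous parts of roughly equal byte size."""
    total = sum(len(r) for r in records)
    parts: list[list[bytes]] = [[] for _ in range(num_nodes)]
    node = 0
    seen = 0  # bytes already assigned
    for rec in records:
        # Move to the next node once the bytes placed so far reach that node's share boundary.
        while node < num_nodes - 1 and seen >= total * (node + 1) / num_nodes:
            node += 1
        parts[node].append(rec)  # a record is always placed whole
        seen += len(rec)
    return parts


records = [b"aaaa", b"bbb", b"c", b"dddd", b"ee", b"ffff", b"gg", b"hhhh"]
parts = partition_records(records, num_nodes=3)
sizes = [sum(len(r) for r in part) for part in parts]
```

For these 24 bytes of variable-length records, the three parts hold 8, 10 and 6 bytes: close to the ideal of 8 each, with the imbalance coming from the requirement to keep every record intact.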
Massive scalability and performance
- Horizontal scalability from one node to thousands of nodes.
- Thor can process up to billions of records per second.
- ROXIE can support thousands of users with sub-second response time, depending on the application.
Are you ready to get started using HPCC Systems? Visit our Get Started page to explore the power of the HPCC Systems platform, test ECL code in a virtual playground, and learn how to get up and running with our Virtual Machine or create your own cloud cluster. Still want to learn more? Continue reading below.

Whitepapers
More than a dozen whitepapers provide in-depth analysis of topics that are important to members of the HPCC Systems community and anyone interested in big data processing and analytics.

Books
HPCC Systems offers several books that are designed as a reference for researchers, programmers, business managers, entrepreneurs and investors within the big data industry.

Podcast
Flavio Villanustre, VP of Infrastructure and Products at HPCC Systems, shares the history of the platform, how it is architected for scale and speed, and the unique solutions that it provides for enterprise-grade data analytics. (1 hr 13 min)
An experienced HPCC Systems user explains the benefits and advantages of using HPCC Systems as your big data management solution.