High Performance Innovative Data Lake Management

Proven, tested end-to-end Data Lake management and processing

What are High Performance Computing Clusters?

A High Performance Cluster Computing platform built for high-speed data engineering.

HPCC Systems' key advantage comes from its lightweight core architecture: better performance, near real-time results and full-spectrum operational scale, with no need for a massive development team, unnecessary add-ons or increased processing costs.

Innovative Features

View the many advantages HPCC Systems brings to the maintenance of your Data Lake or Big Data environment.

Harness the benefits of Kubernetes

Learn how using our containerized, cloud native platform can improve your current cloud deployments. Today’s HPCC Systems combines the usability of our legacy bare metal platform with the automation of Kubernetes to make it easy to set up, manage and scale your implementation.

Runs on Kubernetes
Support for Azure Kubernetes Service (AKS)
Support for Amazon Elastic Kubernetes Service (EKS)

New Storage Plane Architecture supports
Object stores: Amazon Simple Storage Service (S3) and Azure Blob Storage
Disk stores: Amazon Elastic Block Store (EBS) and Azure Files/Azure Disks

Elasticity
Scaling a cluster without moving the data
Automatic wake-up to enable on-demand processing by compute resources

Security
End-to-end encryption
Service mesh options (Linkerd and Istio)
OAuth 2.0 support for authentication, with built-in support for Azure AD
JSON Web Token (JWT) support

Learn About our Cloud Native Platform

Visit the Cloud Native Wiki page for access to Helm charts, blog content, videos and other instructional information.

Running HPCC Systems on a Local Machine

A virtual machine or a containerized deployment with Docker Desktop or Minikube is an excellent resource for experimenting with, evaluating and training on the HPCC Systems platform.

Containerized Platform Documentation

Documentation for cloud-based deployments of any size, including those using Terraform and Helm, as well as for local testing and development deployments.

Ultra Performance

HPCC Systems' key advantage comes from its lightweight core architecture: better performance, near real-time results and full-spectrum operational scale, without a massive development team, unnecessary add-ons or increased processing costs.

HPCC Systems Overview

HPCC Systems is an open source platform for big data implementations, whether as a data lake or data warehouse, providing users with a clear path from data discovery to production.

End to End Data Lake Management

Data lakes are helping leading organizations solve the problem of extremely large, unstructured datasets, allowing them to increase responsiveness and scalability while reducing costs.

Spark Comparative Analysis

A comparative analysis of Spark and HPCC Systems, covering their architectures, their feature support for data lake capabilities, and their focus on different parts of the big data pipeline.

Code Less — Accomplish More

A declarative programming language, ECL allows a programmer to express the logic of a computation without describing its flow control. Developers tell the system what they need and leave it up to the system to determine the best way to do it.
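As a loose analogy only (Python and SQL, not ECL, and not how HPCC Systems executes anything), the sketch below contrasts the two styles on a handful of hypothetical sales records. The imperative function spells out the loop and the running totals; the declarative one states the desired result as a SQL `GROUP BY` query through Python's built-in `sqlite3` module and leaves the scanning, grouping and aggregation strategy to the database engine. The data and function names are illustrative assumptions.

```python
import sqlite3

# Hypothetical sample records: (state, amount). Illustrative only.
SALES = [("FL", 120.0), ("GA", 80.0), ("FL", 45.5), ("NY", 200.0)]


def totals_imperative(rows):
    """Imperative style: we describe *how*, step by step (loop + accumulator)."""
    totals = {}
    for state, amount in rows:
        totals[state] = totals.get(state, 0.0) + amount
    return totals


def totals_declarative(rows):
    """Declarative style: we describe *what* we want (a sum per state) and
    let the SQL engine decide how to scan, group and aggregate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (state TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cur = conn.execute("SELECT state, SUM(amount) FROM sales GROUP BY state")
        return dict(cur.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    # Both approaches produce the same per-state totals.
    print(totals_imperative(SALES))
    print(totals_declarative(SALES))
```

The two functions return the same result; the difference is who owns the execution strategy, which is the idea the paragraph above attributes to ECL.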

DataSeers Case Study

With the efficiency of ECL, fewer lines of code allow prototypes to be built and iterated quickly, speeding both time to market and time to revenue.

Try the ECL Playground

Try our Enterprise Control Language (ECL), the data-oriented programming language specially designed for data processing and analytics.

Access Free Training

From free training courses to rich community resources and a comprehensive wiki, we have resources for every stage, from initial installation all the way to power user.

Machine Learning Library and Causality Analytics

The ML Library provides a wide range of algorithms and is designed to utilize the parallel computing capabilities of HPCC Systems. Build and test ML models, and use those models to predict qualitative or quantitative values.

Machine Learning Demystified

A quick but potent intro to Machine Learning for those who are new to the subject. This article provides enough of the basic theory and terminology to make you dangerous.

Machine Learning Workshop

Follow along with our trainers as they demonstrate our DBSCAN, K-Means, Logistic and Linear Regression, Generalized Neural Networks and Learning Trees bundles.

Machine Learning Library

The HPCC Systems Machine Learning Library provides a wide range of Machine Learning algorithms accessible from ECL, and designed to utilize the parallel computing capabilities of HPCC Systems.

Integrate with Ease

HPCC Systems continues to develop new plugins, connectors and stand-alone applications, freely available to the community, to help you integrate many popular third-party tools with the HPCC Systems platform.

Free Add-On Modules

HPCC Systems continues to develop new stand-alone applications and plug-in modules that extend the capabilities of the base HPCC Systems platform.

ECL Bundles

An ECL Bundle is a self-contained set of ECL files, designed to accomplish specific tasks. They are encapsulated for versioning, distribution and download.

Third Party Integrations

Use embedded languages and external datastores with HPCC Systems to connect your system to your data.

Use your favorite language or data source 

ECL is very flexible. You can embed a number of different languages within your ECL code and process data on an HPCC Systems cluster from a variety of sources, using the plugins and connectors we provide specifically to help you bridge the gap.

Using Your Favorite Language or Data Source

How flexible is ECL? Read about supported languages, plugins and connectors.

Embedded Languages and Data Stores Wiki

The full list of supported languages, plugins and connectors, including links to other information you might find useful.

Advanced Python Embedding

Learn how ECL makes it easy to transition between declarative and procedural worlds through use of embedding.

Committed to Open Source Innovation

HPCC Systems has been freely available to the open source community for more than 10 years and is licensed under Apache 2.0. We continue to push the boundaries of Big Data with a vibrant development community both online and in academic institutions.

GitHub Repository

HPCC Systems is an open source, massively parallel processing computing platform for Big Data processing and analytics.

Stack Overflow Community Forum

Get peer-to-peer support on our Stack Overflow forum. Ask questions specific to your development, or read and answer questions others have posted.

Academic Research

The HPCC Systems Team collaborates with multiple colleges, universities, high schools and institutions of higher learning around the world to help train and develop the future managers of Big Data projects.

Proven, Stable and Secure

HPCC Systems is a mature platform that has been heavily used in commercial applications for more than two decades, predating even the development of Hadoop. Created by LexisNexis Risk Solutions, an innovative pioneer in big data processing, and open source for nearly a decade now, HPCC Systems features a vibrant development community that continues to push the boundaries of big data.

Securing your environment & protecting your data

This blog highlights some of the many security features that make HPCC Systems a compelling solution for users that require a robust, configurable, highly secure computing platform.

Details of the many security features that make HPCC Systems a compelling solution for users who require a robust, configurable, highly secure computing platform.

Data Lake Curation and Governance with Tombolo

Tombolo provides the tools required to implement, document, and maintain an organizational data infrastructure, and can implement safeguards that govern which users and applications have access to the organization's data assets.

Conduct curation and governance operations in an automated fashion to consistently and reliably curate huge amounts of inbound new data and ensure continuous availability.

What You Need to Know About Securing Your Platform

Blog discussing some of the basic security considerations to properly secure a Big Data platform from unauthorized access or data theft.

Get Started

Want to do a little more testing before you install a full cluster? Start with a local deployment. If you're ready to start building your Data Lake, you can jump straight to learning how to install your first complete HPCC Systems cluster. Interested in learning just how powerful, flexible, and efficient ECL really is? Take a look at our ECL guide.

Localized Machine

Containerized deployments using Docker Desktop or Minikube are easy to start up locally and provide more flexibility and stability than a virtual machine.

Documentation & Training

Tackling big data problems? We’ve got you covered, with documentation and training to support you from initial installation all the way to power user.

Get Up and Running

Get a high-level overview to help new users get started with HPCC Systems and ECL (Enterprise Control Language).

Test Drive

Test our code in a virtual playground using a sample dataset. Or, create your own high performance computing cluster (Thor) and/or query cluster (Roxie).

HPCC Systems: The End-to-End Data Lake Management Solution

Ready. Set. Go.

Are you ready to get started using HPCC Systems? Use the panels below to get a quick overview of the HPCC Systems platform and learn how you can ingest, clean and deliver your mixed-schema data to make it useful and relevant for both you and your customers.

Versatile. Flexible. Refined.

An experienced HPCC Systems user explains the benefits and advantages of using HPCC Systems as your big data management solution.

Ingest data from your Data Lake

Here are some example datasets, for use with example programs, provided by members of the HPCC Systems community.

Get more from your data with the Machine Learning Library

The HPCC Systems Machine Learning Library provides a wide range of Machine Learning algorithms accessible from ECL, and designed to utilize the parallel computing capabilities of HPCC Systems.

Design and automate your data workflows

Tombolo technology is the central console for developers and operators, providing all of the facilities needed for designing, developing, automating, documenting, and governing data lakes.

A legacy of Innovation and Open Source software for more than 10 years

HPCC Systems has been freely available to the open source community for more than 10 years and is licensed under Apache 2.0. We continue to push the boundaries of big data with a vibrant development community both online and in academic institutions.

Have a Question?

Check out our FAQ page. Browse the topics to discover more about HPCC Systems technology and answers to common questions about HPCC Systems, ECL and more.

Stay Informed

Keep up with the latest in HPCC Systems developer news and community information. Sign up for our newsletter.

Get the latest news covering platform updates, technical blogs, events and other related announcements. We will not send you junk mail or sell your email address; just the latest information to keep you up to date.