HPCC Systems blog contributors are engineers, data scientists, and fellow community members who want to share knowledge, tips, and news about what is happening in the HPCC Systems community. Check this blog regularly for insights into how HPCC Systems technology can put big data analytics to work for your own needs.

Cassandra Walker on 08/10/2021
Jingqing Zhang, a Ph.D. (HiPEDS) candidate at the Department of Computing, Imperial College London, UK, has done incredible work in the areas of biomedical natural language processing, traffic prediction, and text classification. Jingqing is nearing the end of his doctoral studies, and this blog highlights his latest HPCC Systems community-sponsored research project.
Nikita Jha on 08/03/2021
Nikita Jha is a high school student who joined the 2021 HPCC Systems Intern Program to complete a project focusing on applying Docker image build and Kubernetes security principles to our new Cloud Native platform. This tutorial-style blog covers the importance of certificate management and provides instructions for setting up and configuring a HashiCorp Vault.
Lorraine Chapman on 07/20/2021
It's that time of year again! The HPCC Systems Intern Program is well underway. In 2021, we have welcomed 12 students onto the program to complete a range of projects contributing to our open source platform and machine learning library. With the students well into their project tasks, this is a good time to share the progress made so far.
Lorraine Chapman on 07/15/2021
Our newest release, HPCC Systems 8.2.0 Gold, contains a number of improvements and features that mean our Cloud Native Platform is now ready to be tested and used in production environments. Find out more about the new features and enhancements now available for both our Cloud Native and Bare Metal Platform users.
Gavin Halliday on 07/15/2021
The method of defining storage on our Cloud Native platform has been rationalized and simplified in HPCC Systems 8.2.0. This blog, by Gavin Halliday, takes you through the new way storage is defined in the values.yaml file, providing some examples of how it can be used.
Roger Dev on 06/17/2021
Causality 2021 is an HPCC Systems research and development program. The goal is to increase our understanding of the latest causal algorithms, assess and challenge the current state of the art, and develop a Causality Toolkit for the HPCC Systems Platform. This project encompasses all three levels of the "Ladder of Causality", as well as Causal Model Validation and Causal Discovery.
Roger Dev on 06/17/2021
Reproducing Kernel Hilbert Spaces (RKHS) and their associated "kernel methods" have become powerful analytic tools in the hands of statisticians and data scientists. Unfortunately, they are little known to most computer scientists, and those who have heard of them don't always appreciate their utility. To make the subject more accessible to a general technical audience, we skip over the hard parts and get right down to three questions: What is it good for? What is it? How does it work? Our aim is to encourage more use of these powerful techniques by more people.
Flavio Villanustre on 06/15/2021
June 15, 2021 marks the 10-year anniversary of HPCC Systems in the open source community. To commemorate this milestone achievement, I sit down with Vijay Raghavan, EVP & CTO, LexisNexis Risk Solutions Group, in the latest episode of the 10-year anniversary podcast series.
Lorraine Chapman on 05/27/2021
Bahar Fardanian is a key ambassador for the HPCC Systems Open Source Project. Bahar's work is an inspiration to all who experience her presentations, workshops and Hackathon challenges, in particular students looking to become software engineers or data scientists. Find out about her latest venture, which involves delivering a big data analytics course to students at Kennesaw State University.
Roger Dev on 05/18/2021
As Data Lakes become more complex, it can become difficult to locate information and to understand the inner workings. Curation is the process of documenting a Data Lake so that resources can be located, and its flows understood. Tombolo is an open-source Curation and Governance system for HPCC Systems Data Lakes. It provides visibility into the Data Lake and a central repository for documentation of all of its aspects. It is tightly integrated with the HPCC Systems Platform, automatically exchanging information to help automate the Curation and operation of the Data Lake.