HPCC Systems blog contributors are engineers, data scientists, and fellow community members who want to share knowledge, tips, and other helpful information from the HPCC Systems community. Check this blog regularly for insights into how HPCC Systems technology can put big data analytics to work for your own needs.

Roger Dev on 05/10/2022
ECL provides a powerful capability to combine the benefits of declarative programming (ECL) with those of procedural languages such as C++, Java, or Python. This is known as Embedding. This guide provides a comprehensive review of various methods and patterns for Python Embedding within ECL programs. It reviews elementary embedding techniques and provides a guide to several more advanced embedding patterns.
Roger Dev on 06/17/2021
Causality 2021 is an HPCC Systems research and development program. The goal is to increase our understanding of the latest causal algorithms, assess and challenge the current state of the art, and develop a Causality Toolkit for the HPCC Systems Platform. This project encompasses all three levels of the "Ladder of Causality", as well as Causal Model Validation and Causal Discovery.
Roger Dev on 06/17/2021
Reproducing Kernel Hilbert Spaces (RKHS) and their associated "kernel methods" have become powerful analytic tools in the hands of statisticians and data scientists. Unfortunately, they are little known to most computer scientists, and those who have heard of them don't always appreciate their utility. To make the subject more accessible to a general technical audience, we skip over the hard parts and get right down to: What is it good for? What is it? How does it work? Our aim is to encourage more use of these powerful techniques by more people.
Roger Dev on 05/18/2021
As Data Lakes become more complex, it can become difficult to locate information and to understand their inner workings. Curation is the process of documenting a Data Lake so that its resources can be located and its flows understood. Tombolo is an open-source Curation and Governance system for HPCC Systems Data Lakes. It provides visibility into the Data Lake and a central repository for documentation of all of its aspects. It is tightly integrated with the HPCC Systems Platform, automatically exchanging information to help automate the Curation and operation of the Data Lake.
Roger Dev on 07/20/2020
The HPCC Systems COVID-19 Tracker provides enhanced insight into the state and evolution of the COVID-19 pandemic at Country, State, and County levels. It provides unique metrics and a comprehensive dashboard for use by health officials and curious individuals alike.
Roger Dev on 01/07/2020
The GNN (Generalized Neural Network) bundle provides an ECL interface to Keras and TensorFlow. Using GNN, an ECL developer can construct, train, and utilize arbitrarily complex Neural Networks such as Classical, Convolutional, and Recurrent networks. These networks can be utilized to analyze complex data such as images, video, and time-series.
Roger Dev on 04/09/2019
Text Vectorization allows for the mathematical treatment of textual information. Words, phrases, sentences, and paragraphs can be organized as points in high-dimensional space such that closeness in space implies closeness of meaning. HPCC Systems' new TextVectors module supports vectorization for words, phrases, or sentences in a parallelized, high-performance, and user-friendly package.
Roger Dev on 10/05/2018
Decision Tree based learning methods have proven to be some of the most accurate and easy-to-use Machine Learning mechanisms. We call these mechanisms "Learning Trees". We explore the hows and whys of the various Learning Tree methods and provide an overview of our recently upgraded LearningTrees bundle.
Roger Dev on 08/30/2018
Cause and effect lie at the heart of human discourse and knowledge. Yet until recently, computer science and mathematics had very little to say on the subject. There are now algorithms that can detect patterns of cause and effect from data. We explore these mechanisms and how they relate to Machine Learning and Artificial Intelligence.
Roger Dev on 04/18/2018
The Myriad Interface allows users of the HPCC Systems Machine Learning bundles to execute multiple independent machine learning activities within a single interface invocation. Learn how this works and how to use it.