
The HPCC Systems Causality Toolkit is now released.  This toolkit provides a set of leading-edge algorithms for Causal Analysis of data.  These computationally intensive algorithms are parallelized to run on HPCC Systems clusters.

Causal Analysis is a fairly new area of scientific inquiry, and as such, its algorithms and techniques are evolving rapidly.  We are currently using the Toolkit to explore how Causal Analysis can help us understand and make use of real-world data.

There are three main components to the toolkit, any of which can be used independently:

  • Synthetic Data Generator -- A generalized facility for generating synthetic, multivariate datasets with known statistical and causal characteristics for use in testing and validating various algorithms.
  • Probability Layer -- A powerful probability analysis subsystem that supports a wide range of probability queries for any combination of discrete and continuous variables.  It supports basic probabilities, conditional probabilities, conditionalizing, independence testing, prediction, and classification.  It is designed to answer almost any question that can be asked at the level of probability and statistics.
  • Causality Layer -- A range of Causal Analysis algorithms that use the Probability Layer together with a Causal Hypothesis.  The presence of a Causal Hypothesis supports an extended set of queries that cannot be asked or answered at the Probability Layer.  Facilities include: Causal Inference, Causal Metrics, Causal Model Validation, and Causal Model Discovery.  These algorithms are strictly experimental: they are evolving rapidly, and we are just beginning to investigate their power against real-world data.
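To make the difference between a probability query and a causal query concrete, here is a minimal sketch in plain Python. It does not use the Toolkit or the "Because" module. The model, its probabilities, and the helper names (`sample`, `prob`) are hypothetical and were chosen only for illustration. The sketch generates synthetic data from a known causal model, in the spirit of the Synthetic Data Generator. In this model a confounder C drives both A and B, and A also drives B. The conditional probability P(B=1 | A=1) can be computed from the data alone. The interventional query P(B=1 | do(A=1)) additionally needs the causal hypothesis that C is the only common cause. That hypothesis justifies the backdoor adjustment: the sum over c of P(B=1 | A=1, C=c) P(C=c).

```python
import random

def sample(n, do_a=None, seed=0):
    """Draw n (c, a, b) rows from a hypothetical model C -> A, C -> B, A -> B.

    If do_a is given, A is set to that value regardless of C (an intervention).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0                      # confounder
        if do_a is None:
            a = 1 if rng.random() < (0.8 if c else 0.2) else 0  # A depends on C
        else:
            a = do_a                                            # intervention cuts C -> A
        p_b = 0.1 + 0.3 * a + 0.5 * c                           # B depends on A and C
        b = 1 if rng.random() < p_b else 0
        rows.append((c, a, b))
    return rows

def prob(rows, event, given=lambda r: True):
    """Empirical P(event | given) over the rows that satisfy `given`."""
    subset = [r for r in rows if given(r)]
    return sum(1 for r in subset if event(r)) / len(subset)

C, A, B = 0, 1, 2  # column indices into each (c, a, b) row

obs = sample(100_000)

# Probability-level query: answered from observational data alone.
p_b_given_a = prob(obs, lambda r: r[B] == 1, lambda r: r[A] == 1)

# Causal query estimated from the same observational data via backdoor
# adjustment, justified by the hypothesis that C is the only common cause.
p_b_adjusted = sum(
    prob(obs, lambda r: r[B] == 1, lambda r, c=c: r[A] == 1 and r[C] == c)
    * prob(obs, lambda r, c=c: r[C] == c)
    for c in (0, 1)
)

# Ground truth for comparison: simulate the intervention do(A=1) directly.
p_b_do_a = prob(sample(100_000, do_a=1, seed=1), lambda r: r[B] == 1)

print(f"P(B=1 | A=1)     = {p_b_given_a:.3f}")
print(f"P(B=1 | do(A=1)) = {p_b_adjusted:.3f} (adjusted)")
print(f"P(B=1 | do(A=1)) = {p_b_do_a:.3f} (simulated)")
```

Under this hypothetical model, the conditional probability should come out near 0.80. The adjusted estimate and the directly simulated intervention should both come out near 0.65. The gap between the two figures is confounding, and only a causal hypothesis lets us remove it.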

The HPCC Systems Causality Toolkit Bundle is available here.  It requires the Python module "Because" to be installed on each server in the cluster.  The Bundle provides full installation instructions, extensive documentation, and example code.

For an introduction to the HPCC Systems Causality Project, see Causality 2021.  This article includes background material, as well as a rich references section.