
Causality 2021

Causality 2021 is an HPCC Systems research and development program. The goal is to increase our understanding of the latest causal algorithms, assess and challenge the current state of the art, and develop a Causality Toolkit for the HPCC Systems Platform. This project encompasses all three levels of the "Ladder of Causality", as well as Causal Model Validation and Causal Discovery.

Reproducing Kernel Hilbert Space (RKHS) -- A primer for non-mathematicians

Reproducing Kernel Hilbert Spaces (RKHS) and their associated "kernel methods" have become powerful analytic tools in the hands of statisticians and data scientists. Unfortunately, they are little known to most computer scientists, and those who have heard of them don't always appreciate their utility. To make the subject more accessible to a general technical audience, we skip over the hard parts and get right down to three questions: What is it good for? What is it? How does it work? Our aim is to encourage more people to use these powerful techniques.

Introducing the HPCC Systems Machine Learning Preprocessing Bundle

It is well known in the Machine Learning community that data preparation is the most time-consuming phase of a Machine Learning project. To make that phase smoother on the HPCC Systems platform, Vannel Zeufack has provided a machine learning preprocessing bundle. This work was completed as an intern project in 2020, when Vannel joined the program for the second year running.

Using the HPCC Systems Generalized Neural Network (GNN)

One of the biggest concerns that we have in education is the safety of our school campuses. To address this problem, the American Heritage School robotics team developed an autonomous, mobile robot that gathers and provides vital information to school security personnel and, if necessary, first responders. This blog details the facial recognition training project completed by HPCC Systems summer intern Jack Fields, a 12th-grade student at American Heritage School in Delray, Florida.

Integrating Prior Knowledge with Learning in Biomedical Natural Language Processing

Natural language processing (NLP) research has progressed rapidly in the last year, with significant performance improvements in a variety of NLP tasks. Yet research on NLP techniques in the biomedical domain has not progressed much. The advantage of the biomedical domain is that it has a high-quality, human-curated knowledge base and large-scale literature, both of which contain rich prior knowledge.

Leveraging the Spark-HPCC Ecosystem

James McMullan, Sr. Software Engineer at LexisNexis Risk Solutions, gave an overview of the Spark-HPCC Plugin & Connector in a breakout session at the 2019 HPCC Systems Community Day. This presentation also included an introduction to Apache Zeppelin, a demonstration of a random forest model created in Spark, and a discussion about the future of the Spark-HPCC Ecosystem.

Deep Sequence Learning in Traffic Prediction and Text Classification

Deep learning is a subset of machine learning that is modeled on the human brain. It essentially teaches computers what comes naturally to humans: learning by example. In this blog, we discuss how deep learning models using background knowledge were used to achieve sequence learning on traffic and natural language data. We also introduce the deep learning tool TensorLayer.

An Investigation into Time Series Analysis

Time series forecasting is an important statistical tool for predicting future events, needs, trends, etc., and can be applied to a variety of data sources. Jeremy Meier and David Noh, recent graduates of Clemson University’s Computer Science program, spoke at HPCC Systems Tech Talk episode 23 about the basic principles and components of time series forecasting using modern machine learning methods. This blog gives insight into their semester-long project, which focused on time series analysis and forecasting using financial datasets. 