Causality 2021 is an HPCC Systems research and development program. Its goals are to deepen our understanding of the latest causal algorithms, assess and challenge the current state of the art, and develop a Causality Toolkit for the HPCC Systems Platform. The project encompasses all three levels of the "Ladder of Causality," as well as Causal Model Validation and Causal Discovery.
Reproducing Kernel Hilbert Spaces (RKHS) and their associated "kernel methods" have become powerful analytic tools in the hands of statisticians and data scientists. Unfortunately, they are little known to most computer scientists, and those who have heard of them don't always appreciate their utility. To make the subject more accessible to a general technical audience, we skip over the hard parts and get right down to: What is it good for? What is it? And how does it work? Our aim is to encourage wider use of these powerful techniques.
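To give a flavor of what a kernel method looks like in practice, here is a minimal sketch of kernel ridge regression with a Gaussian (RBF) kernel, written in plain NumPy. The data, parameter values, and function names are illustrative assumptions, not part of the RKHS article itself:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel: an inner product in an implicit,
    # infinite-dimensional feature space, computed without ever
    # constructing that space (the "kernel trick").
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Toy 1-D regression problem (hypothetical data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0])

# Gram matrix K[i, j] = k(x_i, x_j) over the training points.
K = np.array([[rbf_kernel(a, b) for b in X] for a in X])

# Kernel ridge regression: solve (K + lambda*I) alpha = y.
alpha = np.linalg.solve(K + 0.1 * np.eye(len(X)), y)

def predict(x_new):
    # Prediction is a kernel-weighted sum over training points.
    return sum(a * rbf_kernel(x_new, xi) for a, xi in zip(alpha, X))
```

Despite fitting a highly nonlinear function, the only linear algebra involved is a single solve against the Gram matrix, which is the essential appeal of kernel methods.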
It is well known in the Machine Learning community that data preparation is the most time-consuming phase of a Machine Learning project. To make that phase smoother on the HPCC Systems platform, Vannel Zeufack has provided a machine learning preprocessing bundle. This work was completed as an intern project in 2020, when Vannel joined the program for the second year running.
One of the biggest concerns in education is the safety of our school campuses. To address this problem, the American Heritage School robotics team developed an autonomous mobile robot that gathers and provides vital information to school security personnel and, if necessary, first responders. This blog details the facial recognition training project completed by HPCC Systems summer intern Jack Fields, a 12th-grade student at American Heritage School in Delray, Florida.
Natural language processing (NLP) research has progressed rapidly in the last year, with significant performance improvements across a variety of NLP tasks. Yet research on NLP techniques in the biomedical domain has not kept pace. The biomedical domain has a distinct advantage: it offers a high-quality, human-curated knowledge base and a large-scale literature, both of which contain rich prior knowledge.
Audio forensics is the field of forensic science relating to the acquisition, analysis, and evaluation of sound recordings. These recordings are normally used as evidence in an official venue. In this blog, we discuss a modern approach to forensic audio analysis, using artificial neural networks, digital signal processing, and big data.
James McMullan, Sr. Software Engineer at LexisNexis Risk Solutions, gave an overview of the Spark-HPCC Plugin & Connector in a breakout session at the 2019 HPCC Systems Community Day. This presentation also included an introduction to Apache Zeppelin, a demonstration of a random forest model created in Spark, and a discussion about the future of the Spark-HPCC Ecosystem.
Deep learning is a subset of machine learning loosely modeled on the human brain. It essentially teaches computers to do what comes naturally to humans: learning by example. In this blog, we discuss how deep learning models using background knowledge were used to achieve sequence learning on traffic and natural language. We also introduce the deep learning tool TensorLayer.
Time series forecasting is an important statistical tool for predicting future events, needs, trends, etc., and can be applied to a variety of data sources. Jeremy Meier and David Noh, recent graduates of Clemson University’s Computer Science program, spoke at HPCC Systems Tech Talk episode 23 about the basic principles and components of time series forecasting using modern machine learning methods. This blog gives insight into their semester-long project, which focused on time series analysis and forecasting using financial datasets.
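As a taste of the basic principles behind time series forecasting, here is a minimal sketch of simple exponential smoothing, one of the classic baseline forecasters. The price series and the smoothing parameter are hypothetical illustrations, not data from the Clemson project:

```python
def exp_smooth_forecast(series, alpha=0.3):
    """Simple exponential smoothing: the level is a weighted blend of
    the newest observation and the previous level; the final level
    serves as the one-step-ahead forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Toy daily closing prices (hypothetical).
prices = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5]
forecast = exp_smooth_forecast(prices)  # ~102.89
```

A larger `alpha` weights recent observations more heavily; modern machine learning forecasters are usually benchmarked against simple baselines like this one.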
Chris Gropp, a PhD student at Clemson University, spoke at HPCC Systems Tech Talk 10 about how to plan effectively at the start of a machine learning research project to achieve a successful outcome. This blog shares his experience in asking the right questions of machine learning by taking a step back and carefully examining the requirements.