October 2023 kicked off with an incredibly busy week of events, including the 8th annual HPCC Systems Poster Competition, the 10th annual HPCC Systems Community Day Summit, and a series of free ECL workshops to help attendees explore the platform in more depth. The purpose of the Summit is to bring together engineers, data scientists and technology professionals to share knowledge and future roadmap plans for the HPCC Systems platform. The event is dedicated to showcasing our community, with presenters from industry and academia sharing their HPCC Systems use cases, research projects and experience leveraging the platform. Below you will find some of the standout moments and highlights from these events.
HPCC Systems Poster Competition
The 8th annual poster competition was held as part of the Community Virtual Summit, showcasing how students in our academic community leverage the HPCC Systems platform. In addition to the standard judged categories, there was a Community Choice Award, for which our virtual attendees voted for their favorite poster. Winners were revealed as part of the Community Recognition and Awards ceremony.
This contest was run entirely remotely. Posters were divided into four categories: Data Analytics, Platform Enhancement, Use Case, and Research. Students submitted an abstract, their completed poster and a five-minute video. A page on our Community Wiki was then created for each student, providing a space to access all these resources. At first the resources were available only to the judges. Once the judges had viewed them in private and decided on the winners of the four categories, the posters were made available to the public in our 2023 Poster Contest Wiki. The community then voted for the winner of the Community Choice Award.
Poster Competition Judges
In 2023, we had a great team of four judges who were given the challenging task of evaluating each student’s poster and providing valuable feedback. If you recognize Amila, there is a good reason for it: she presented at the summit last year with a talk about PyHPCC. You may also recognize Jeff Mao, who was an HPCC Systems summer intern twice and is now pursuing a computer science degree at Georgia Tech. He has also launched his own startup in the e-commerce sector. Michael Gardner is a former intern and is now a software engineer at LexisNexis Risk Solutions, where he is a member of the HPCC Systems platform team and plays a key role in the platform project builds. Finally, Mauricio Oliveira, a manager of software engineering at LexisNexis Risk Solutions and also a former intern, now leads the technical development and support of data solutions for the Brazilian market. Thank you, judges, for volunteering your time to support the growth of our student community.
Amila De Silva
Senior Software Engineer, LexisNexis Risk Solutions
Michael Gardner
Software Engineer III, LexisNexis Risk Solutions
Jeff Mao
Co-founder @SellRaze, CS student at Georgia Tech
Mauricio Nunes de Oliveira
Manager Software Engineering, LexisNexis Risk Solutions
Click on the links below to read the abstracts and view the posters in each of the following four categories:
Data Analytics
- Skanda P R, R.V. College of Engineering, India
  Analyzing and Visualizing Causes of Poverty in India
- S Dhanush, R.V. College of Engineering, India
  Data-Driven IPL Team Selection: Leveraging T20 Statistics
- Samhitha S, R.V. College of Engineering, India
  Visualization and Determination of Air Quality
- Prashant Ronad, R.V. College of Engineering, India
  Leveraging HPCC Systems for natural disaster management in Asia
- Eshaan Mathur, R.V. College of Engineering, India
  Analysis of Country-wide Healthcare Data and Selection of an Ideal Machine Learning Model for Prediction of GHSI
- Fulvio Favilla Filho, University of São Paulo, Brazil
  Machine Learning Approach to Cardiovascular Disease (CVD) Prediction
Platform Enhancement
- Johnny Huang, University of Toronto, Canada
  Improve Error Handling and Reporting for Automated Test Systems
- Boqiang Li, Clemson University, USA
  Convert Generalized Neural Network bundle (GNN) to native Tensorflow 2.0
- Noah Seligson, University of Central Florida, USA
  Convert Automated Test Systems from Python2 to Python3
- Jessie Mao, Lambert High School, USA
  HPCC Systems Deployment with Various Helm Chart Configurations
- Ryan Rao, American Heritage School, USA
  HPCC Systems Storage Support With Container Storage Interface (CSI)
- Nivedha Sivakumar, Georgia State University, USA
  Test Suite for a Roxie Cluster on Kubernetes
Use Case
- K Dheemonth, R.V. College of Engineering, India
  Sentiment Analysis in English
- Adarsh U, R.V. College of Engineering, India
  Enhance the NLP English Dictionary
- Jayanth C, R.V. College of Engineering, India
  Scalable Analysis of English Dictionary Files on HPCC Systems
- Carlos Caceres, American Heritage School, USA
  Practical Application of Generative AI Technology
- Kruthika Pinnada, R.V. College of Engineering, India
  Resume Analyzer
- Shyamaa Karthik, Saint Andrew’s School, USA
  Processing the Tamil Wiktionary Pages into a NLP++ Dictionary
Research
- Sudershan K S, R.V. College of Engineering, India
  Performance optimization of Learning Trees modules in the ECL repository
- Hiroki Sato, Indiana University, USA
  Automation of HPCC Systems Cloud Native Deployment to AWS with Terraform
- Rohhun Laiju, R.V. College of Engineering, India
  Image Conversion Plugin for GNN Module on HPCC Systems
- Logan Patterson, New College of Florida, USA
  Designing Test Algorithms for Causal Model Discovery Within the HPCC Systems Causality Framework
- Narayan Kandel, Clemson University, USA
  Enhancing Performance of Distributed Neural Network with GNN Bundle
- Sarah Nash, New College of Florida, USA
  Causal Discovery and Validation with Categorical Data
And the winners are!
- Data Analytics: Fulvio Favilla Filho
  Machine Learning Approach to Cardiovascular Disease (CVD) Prediction
- Platform Enhancement: Nivedha Sivakumar
  Test Suite for a Roxie Cluster on Kubernetes
- Use Case: Shyamaa Karthik
  Processing the Tamil Wiktionary Pages into a NLP++ Dictionary
- Research: Sarah Nash
  Causal Discovery and Validation with Categorical Data
Special congratulations go to Fulvio for being the first student to receive both a judged category award and the Community Choice Award. Fulvio is an undergraduate electrical engineering student at the University of São Paulo in Brazil. For his bachelor’s dissertation, he has been developing a study on cardiovascular disease prediction using machine learning models. The HPCC Systems platform was used to compare different ML algorithms and evaluate which model most accurately predicts the presence or absence of cardiovascular disease in patients, using simple clinical data as input.
2023 Community Choice Award
Fulvio Favilla Filho
Machine Learning Approach to Cardiovascular Disease (CVD) Prediction
HPCC Systems Community Day Summit
Plenary Sessions
Let’s begin with the two plenary sessions. The first featured LexisNexis Risk Solutions technology leaders and community keynote speakers. In the second, the winners of the 2023 Poster Competition were announced, along with a recap of all the academic and industry events the HPCC Systems team participated in throughout 2023. Please also take a look at the 2023 Community Catch Up blog for full details on these events.
2023 HPCC Systems Community Summit:
Welcome & Plenary Keynotes
Join Gavin Halliday, SVP and Head of Platform Engineering, LexisNexis Risk Solutions, as he kicks off the 10th annual HPCC Systems Community Virtual Summit. Also featured are our community keynote speakers: Bill Franks, Director of the Center for Data Science and Analytics, Kennesaw State University, followed by Gus Cawley, Chief Executive Officer, ZipApply, who is introduced by Bahar Fardanian, Manager, Solutions Engineering, LexisNexis Risk Solutions.
2023 HPCC Systems Community Summit:
Awards Ceremony & Plenary
Join Trish McCall, Hugo Watanuki, George S Foreman, and Bob Foreman, from LexisNexis Risk Solutions, for the long-awaited announcement of the 2023 Community Awards and Poster Competition winners, followed by the afternoon keynote honoring the academic community and engagement.
Cloud Strategies with HPCC Systems
Although the HPCC Systems journey to the cloud started a few years ago, community members continue to push for improvements and state-of-the-art technologies to optimize HPCC Systems deployments and usage in the cloud. These sessions present the latest techniques for optimizing the Cloud Native HPCC Systems platform.
2023 HPCC Systems Community Summit:
HPCC Systems Deployment using K3D Instance
This integration package enables HPCC Systems trainers and trainees to install the HPCC Systems engine on any standalone system for training and practice. It lets you work with HPCC Systems components including Thor, Roxie, Dali Storage, and ECL in a local environment.
2023 HPCC Systems Community Summit:
A Better Understanding of Thor & Roxie Config Using Terraform
In the HPCC Systems Cloud Native Platform, configurations for the Thor and Roxie engines are coded in YAML, stored in a file and passed to the Helm deployment. The goal of this presentation is to go over those settings, recommendations and best practices, then deploy the HPCC Systems Platform using Terraform.
2023 HPCC Systems Community Summit:
How to Enable Azure Log Analytics for the Containerized HPCC…
For log solutions, the HPCC Systems Cloud Native Platform provides a standard log interface to process HPCC Systems logs. This presentation will demonstrate how to enable Azure Log Analytics using the newly developed Terraform module for HPCC Systems logs.
2023 HPCC Systems Community Summit:
Optimizing Business Solutions with Azure Full-stack Products
Learn how the LexisNexis Risk Solutions business in Brazil launched two data products using an HPCC Systems implementation on Azure. The speakers will provide an overview of the components including a Web portal, Batch and ESP API as delivery channels served from a Roxie and MySQL backend, all on commercial Azure. They will close with a product demo of identity verification and fraud solutions using fictitious data while diving into the architecture, best practices, challenges, and technical experiences shared by the team.
Productivity with HPCC Systems
Whether you are new to the platform or an experienced HPCC Systems user, the community members always find new ways of doing things better. These sessions contain useful content put together by some of the distinguished community members to help you make the most of the improvements and innovative features in the latest HPCC Systems 9.x release.
2023 HPCC Systems Community Summit:
Roxie Performance: A Deeper Technical Dive into 9.x
In this presentation, Gavin will discuss some of the different factors that affect Roxie performance, as well as cover details including some of the recent platform changes which aim to make this job easier. Anyone deploying production Roxie queries or interested in the technical details of how Roxie indexes are implemented can benefit from attending this session.
2023 HPCC Systems Community Summit:
HPCC Systems Landing Zone Security
Data is imported into and exported from HPCC Systems logical files through a file system location known as a Landing Zone. Landing Zone access was previously authorized via the LDAP Security Manager feature permissions, but once users had access, they could reach the entire Landing Zone directory. Beginning with HPCC Systems Version 9.0, admins can specify Landing Zone scopes, using the same ECL Watch interface as file scopes and workunit scopes. This presentation describes the motivation for the additional security and walks the attendee through the creation of these scopes.
2023 HPCC Systems Community Summit:
ECL Watch: Redefining Progress
Delve into ECL Watch and check out the snapshot queries in Roxie, logging, hot spot identification, cost estimation, and metrics for efficient data retrieval.
2023 HPCC Systems Community Summit:
A Threat Detection & Mitigation Framework: HPCC Systems & Azure…
While it is fantastic to advertise that we are conforming to various security controls (CIS, FedRAMP), how do we actually prove it? This talk explores in depth the concepts around control effectiveness by showing the audience some example security controls we are implementing at LexisNexis Risk Solutions and how the HPCC Systems data analytics platform is used to collect, explore and analyze the data.
Programming with HPCC Systems
As a complete and powerful programming language, ECL allows our community members to work with and explore a wide variety of data formats. These sessions present innovative ways ECL can be extended to support different use cases involving big data processing and analysis.
2023 HPCC Systems Community Summit:
Bitcoin Blockchain Parser + Optimization of Learning Trees…
This joint session includes two presentations from RV College of Engineering featuring the work from academic collaboration as part of the HPCC Systems Centre of Excellence (CoE) in Cognitive Intelligent Systems for Sustainable Solutions (CISSS).
2023 HPCC Systems Community Summit:
Building Trustworthy and Auditable Digital Human Readers…
Machine Learning, Neural Networks and Large Language Models are statistical and opaque, and human memory is fallible. NLP++ is now stepping in to fix these problems and build trustworthy systems in areas such as law enforcement, healthcare, and sentiment analysis. Not only can NLP++ perform better than humans, but it can build systems that can eventually be better than human experts. Unlike the statistical methods of ML, NN, and LLM, which are opaque and not auditable, NLP++ is a verifiable and auditable technology, which is a game-changer in today’s critical systems. NLP++ is a glass box computer programming language and framework that seamlessly integrates with HPCC Systems to provide sophisticated text processing that, until now, has been achievable only by humans. We will look at work being done at Clemson to help solve the medical coding task that is currently done by human readers and has a 50% failure rate.
2023 HPCC Systems Community Summit:
Digital Human Readers Making History with NLP++ and HPCC Systems
From resume processing, to sentiment analysis, to digital dictionaries, NLP++ and HPCC Systems are making history. Searching for resumes is precarious when using simple keyword search, and new work using NLP++ is making resume searching an exact science. Until now, sentiment analysis has been done with statistical and keyword methods and has been too generic to be used in real-world systems. Two sentiment analyzers break that mold, one involving soccer teams in Brazil, and the other cricket teams in India. NLP digital dictionaries have either been stuck in time for decades or are non-existent. Two Wiktionary projects use NLP++ to parse linguistic information from Wiktionary pages, processing hundreds of thousands and millions of pages using HPCC Systems, and creating dictionaries that can be used in future NLP systems. The result will be the most comprehensive English dictionary ever created for NLP and the first digital dictionary with linguistic information ever created for the Tamil language.
2023 HPCC Systems Community Summit:
Parquet Support for ECL
Introducing the Parquet Plugin, an interface between ECL and Apache Arrow that gives the ECL programmer the capability to interact with the Parquet file format. This talk will demonstrate how ECL programmers can efficiently read and write Parquet files with ease. With this interface, ECL programmers can partition datasets, read any partitioned or non-partitioned dataset, and write to a Parquet file. The demo will show all the functions of the plugin. One of the key highlights of the plugin is its capability to handle datasets larger than memory. By leveraging streaming techniques, we ensure efficient processing of large-scale datasets without sacrificing performance or data integrity. Attendees will gain insights into the integration of Parquet and how the Apache Arrow library was leveraged to give the ECL programmer efficient access to the Parquet file format. A variety of demos, including usage examples, will be shown, and there will be opportunities to ask questions and learn more about the plugin.
HPCC Systems in Action
Born from the deep data analysis experience of our platform engineering team, HPCC Systems is a proven, comprehensive, and dedicated big data platform that can be leveraged in many different scenarios. These sessions present recent real-world use cases with HPCC Systems.
2023 HPCC Systems Community Summit:
An Understanding of HPCC Systems Platform Metrics
HPCC Systems provides a metric framework for collecting low-level, platform-specific metrics. This presentation covers the framework, available metrics, component instrumentation, and metric collection. It also covers the new ESP method execution profiling. All of it is tied together with examples from a running cluster.
2023 HPCC Systems Community Summit:
Robots, Drones and Generative AI: Exploring Cutting Edge Tech…
School safety, security, and wellbeing have been a driving force for our most recent academic partnership with HPCC Systems. The first phase of our security project began in 2020 with an autonomous mobile sentry designed, built and programmed with facial recognition software developed by our students. This project was very effective, and we wanted to enhance its capabilities by improving response times for any security concern with the use of drones. This latest version uses autonomous drone technology to enhance the security features of our school by detecting unauthorized personnel on campus, assisting security in response to active shooter situations, gathering information during lockdowns and tracking the student/staff evacuation process if initiated.
2023 HPCC Systems Community Summit:
Tombolo – Open-Source Tools for Interacting with HPCC Systems…
Introducing Tombolo version 2.0, an innovative open-source project initiated by LexisNexis Risk Solutions, aimed at providing data catalog capabilities for HPCC Systems clusters. Tombolo is a web application that caters to both technical and non-technical users, facilitating seamless interaction with HPCC Systems clusters. Join us in this session as we demonstrate the capabilities of Tombolo. Discover how users can create workflows by leveraging assets from HPCC Systems clusters, enhanced with our newly introduced versioning tool. This tool gives users increased confidence when editing existing workflows. See how the application monitors assets and proactively sends timely notifications via different media, ensuring crucial data management tasks are completed promptly. We’ll also showcase Tombolo’s intuitive dashboard and its extensive range of API endpoints, offering a comprehensive overview of asset performance. During the presentation, our team will also share insights into the future development goals of Tombolo, providing a glimpse into the features currently in development and planned for future additions.
2023 HPCC Systems Community Summit:
Vulnerable Victim Monitoring – Connecting the Dots to Find…
According to Child Find of America, an estimated 2,300 children go missing each day. That equates to over half a million missing children per year. Most of these children are runaways, a population that is quite vulnerable to trafficking. Trafficked victims can be exploited for their PII in account takeover and stolen or synthetic identity fraud. LexisNexis Risk Solutions inquiries consist of searches for fraud and credit-seeking consumers, which are stored in HPCC Systems. These hundreds of billions of inquiry records can be searched retroactively and monitored proactively for matches on missing children within a reasonable geographic area. Additional topics of discussion will include data visualization capabilities in Python, using data obtained from both the National Center for Missing & Exploited Children’s public RSS feed (current missing cases) and all missing cases from the past 20 years (obtained from the ADAM Program). We will also discuss proposals for future studies that use the power of HPCC Systems to extract other types of big data and provide further insights to law enforcement and social services, to better aid them in the search for and prevention of human trafficking.
HPCC Systems Community Day Summit Workshops
Workshops
The HPCC Systems Training and Support team was very busy last year visiting universities and trade show events to promote HPCC Systems and ECL through Code Days, hackathons, and workshops. The 2023 Community Summit workshops presented an in-depth look at some of these events in three 1-hour sessions. Lead trainer Bob Foreman crafted each of these hackathons and then adapted them into workshop format for this event. Bob Foreman has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems online training courses and is the senior instructor for all classroom and remote based training.
Part 1:
The “Music is Life!” Workshop
Who doesn’t love music? Take a break from your daily routine datasets and join us in this first hour. We break down a popular open-source music dataset, explore normalizing the dataset and its effects, and look at a variety of data evaluation and query techniques.
Part 2:
The “Find your Paradise” Hackathon
Have you ever thought about building an application that can help people find places to live that maximize their quality of life and happiness?
The goal of this challenge is to analyze datasets across different categories and correlate them using the HPCC Systems platform. After the analysis, participants are asked to design an interface to query this data, assign it a scoring system, deliver it to the user via ROXIE, and show the user where they would most likely want to live. Users should be given choices in an easy-to-use form that, when submitted, generates a unique set of location scores.
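To make the scoring idea more concrete, here is a minimal sketch of one way such a system could work. It is written in Python purely for illustration; the hackathon itself is built in ECL and served via ROXIE, and this is not the workshop's solution. The location names, categories, values, and the `rank_locations` helper are all hypothetical. Each category is rescaled to a 0 to 1 range across locations, then combined using weights that stand in for the choices a user submits in the form.

```python
# Illustrative sketch only: all names and numbers below are made up.

def normalize(metrics_by_location):
    """Min-max scale every category to [0, 1] across all locations."""
    categories = sorted({c for m in metrics_by_location.values() for c in m})
    scaled = {name: {} for name in metrics_by_location}
    for cat in categories:
        values = [m.get(cat, 0.0) for m in metrics_by_location.values()]
        low, span = min(values), max(values) - min(values)
        for name, m in metrics_by_location.items():
            # A category with identical values everywhere carries no signal.
            scaled[name][cat] = (m.get(cat, 0.0) - low) / span if span else 0.0
    return scaled


def rank_locations(metrics_by_location, weights):
    """Return (location, score) pairs, best first, for the given user weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    scaled = normalize(metrics_by_location)
    scores = [
        (name, sum(w * cats.get(cat, 0.0) for cat, w in weights.items()) / total)
        for name, cats in scaled.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-location metrics, where higher means better.
SAMPLE = {
    "Alder Bay": {"education": 80, "safety": 60, "climate": 70},
    "Birchton": {"education": 60, "safety": 90, "climate": 50},
    "Crestview": {"education": 70, "safety": 75, "climate": 90},
}

# Hypothetical form input: this user cares twice as much about safety.
user_weights = {"education": 1, "safety": 2, "climate": 1}

ranking = rank_locations(SAMPLE, user_weights)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the weights come from the user, two people submitting different preferences get different rankings from the same underlying data, which is the "unique set of scores" behavior the challenge asks for.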
Part 3:
Better Customer Insights Through Relational Dataset Queries
In this final hour of our workshop series, we look at the power of the open data model, transform normalized data into a denormalized relational dataset, and use the power of implicit relationality to analyze relationships and provide better customer insights.
2023 Community Award Winners
In the closing plenary session, the recipients of our David Kan Ambassador and Community Recognition awards were announced. These awards are presented to people who have made a huge difference and contribution to the HPCC Systems Open Source Project. Every year, there are many successes achieved by LexisNexis Risk Solutions employees and community members. All contributions are valuable to the community and this is one way to show appreciation for the commitment and hard work achieved during the year.
2023 Community Recognition Award Winner
Renato de Oliveira Moraes, PhD
Production Engineering Department
University of São Paulo
2023 David Kan Ambassador Award Winner
Lili Xu
Sr Software Engineer
LexisNexis Risk Solutions
See you next year!
The dates have already been set for the next one. Mark your calendars for the week of October 7 for the 2024 HPCC Systems Community Summit. Stay tuned for details!