Welcoming the 2023 HPCC Systems Interns

The HPCC Systems Intern Program has kicked off and the 2023 cohort of interns is the biggest ever, totaling 15 students from North America, Europe and Asia.

Those of you who have followed the intern program over the years will know that students across the academic spectrum are eligible to take part. This year is no different, with students spanning the entire academic range from high school through PhD. Here are some details about the group:

  • Four of the students are in high school, six are studying for a bachelor’s degree, three are studying for a master’s degree, and two are pursuing a PhD.
  • Eleven students are located across the USA, two are in India, one is in Italy, and one is in Canada.
  • Nine mentors in total, spread across the USA, UK, and Italy.

This year’s program follows a well-established pattern, covering a wide range of abilities and project types, which is great to see. There is something for everyone here, so prepare to get excited about the contributions being produced by this hardworking group of young technologists.

The 2023 HPCC Systems Summer Internship project categories include:

  • Machine Learning
  • Natural Language Processing
  • Test Automation
  • Deployment Automation
  • Architectural Enhancement
  • Technology Marketing

Below you will find a snapshot of each intern and their project. The interns are also encouraged to keep their own blog journal, so if you find any of the 2023 projects particularly interesting, you can follow its progress via the blog journal link listed in each section below.

HPCC Systems Machine Learning Intern Projects

The Machine Learning category this year encompasses two major subcategories: improvements to the HPCC Systems Generalized Neural Network (GNN) bundle and to the Causality toolkit.

The HPCC Systems GNN bundle allows the ECL programmer to combine the parallel processing power of HPCC Systems with the powerful neural network capabilities of Keras and TensorFlow. Even so, new variations and combinations of neural network techniques continue to be proposed to push the state of the art in machine learning and optimization. The projects in this subcategory therefore research, implement, and evaluate alternative methods for distributed training of neural networks, and then implement the most promising methods on the HPCC Systems platform using the GNN bundle. This year, three students, Boqiang Li, Narayan Kandel and Carlos Caceres, joined the HPCC Systems Intern Program to contribute to the GNN bundle, mentored by Lili Xu (Software Engineer, LexisNexis Risk Solutions).

Causal analysis can be performed on the HPCC Systems platform via the Causality toolkit. This toolkit provides a set of leading-edge algorithms for causal analysis of big data. These computationally intensive algorithms are parallelized on HPCC Systems clusters, and we are currently using the toolkit to explore how causal analysis can help us understand and use real-world data. This year, two students, Logan Patterson and Sarah Nash, joined the 2023 HPCC Systems Intern Program to contribute to the causality project, mentored by Roger Dev (Senior Architect, LexisNexis Risk Solutions).

Boqiang Li
PhD in Computer Science, Clemson University, USA
Convert Generalized Neural Network bundle (GNN) to native TensorFlow 2.0

The HPCC Systems Generalized Neural Network (GNN) bundle currently uses TensorFlow 2.0 in compatibility mode, and to achieve superior performance, the bundle needs to be updated. Boqiang Li joined the HPCC Systems Intern Program to analyze the current GNN code base and update it so that TensorFlow 2.0 is supported natively. Boqiang’s work involves upgrading the GNN API to TensorFlow 2.0, developing and executing test cases, and writing documentation. Find out more about this project by reading Boqiang’s blog journal.

Narayan Kandel
PhD in Computer Science, Clemson University, USA
Distributed Neural Network Training and Prediction

During this internship, Narayan’s goal is both to research and to implement alternative methods for distributed training of neural networks on the HPCC Systems platform. To this end, Narayan is designing and implementing alternative distribution models for parallelized training and evaluation of neural networks using the HPCC Systems GNN bundle and TensorFlow. Find out more about this project by reading Narayan’s blog journal.

Carlos Caceres
American Heritage School, Florida, USA
Practical Application of Generative AI Capabilities

Carlos is a high school junior looking to combine HPCC Systems neural network capabilities for facial recognition with generative language models such as ChatGPT. Carlos proposed this project himself. Its goal is to design and implement a proof-of-concept solution for a school campus environment that recognizes a student’s emotional state (via a face recognition model trained using the GNN bundle) and greets the student according to the detected emotional state. Find out more about Carlos’s project by reading his blog journal.

Logan Patterson
MSc in Applied Data Science, New College of Florida, USA
Causal Discovery Algorithms

A wide variety of causal discovery algorithms have been described and implemented to date, but they have yet to be thoroughly evaluated in HPCC Systems. Logan’s project evaluates novel algorithms for causal relationship discovery using large, real-world datasets with mixed data types. This involves identifying candidate datasets, defining appropriate analytics, performing causal analysis and publishing the results. Find out more about this project by reading Logan’s blog journal.

Sarah Nash
MSc in Applied Data Science, New College of Florida, USA
Causal Model Validation

Sarah’s project focuses on methods for assessing the correctness of a causal model, given a dataset thought to be produced by that model. As part of her project, Sarah needs to survey the current state of causal inference and model validation, implement the methods collected, test them, and compare the different validation methods to determine the best one to implement in the Causality toolkit. Find out more about Sarah’s project by reading her blog journal.

HPCC Systems Natural Language Processing Intern Projects

A separate category for NLP projects exists in the HPCC Systems Intern Program because the approach used by the NLP++ plugin does not rely on traditional machine learning techniques for NLP. David de Hilster (Consulting Software Engineer, LexisNexis Risk Solutions) leads this initiative, and it is certainly a passion of his. He wants to create Digital Human Readers for different languages that can understand text as well as a human who speaks that language (read his blog on the subject here). Three students, Dheemonth Kodali, Kruthika Pinnada, and Shyamaa Karthik, joined the 2023 HPCC Systems Intern Program to complete NLP projects mentored by David de Hilster.

Dheemonth Kodali
RV College of Engineering, India
Sentiment Analysis in English

The goal of Dheemonth’s project is to perform sentiment analysis in English using the HPCC Systems NLP++ plugin. To this end, Dheemonth needs to collect a large number of tweets from a national cricket team in India and build analyzers that identify the emotion of each tweet’s author. Find out more about Dheemonth’s project by reading his blog journal.

Kruthika Pinnada
RV College of Engineering, India
Resumé Analyzer

Kruthika’s work involves developing an analyzer capable of parsing the different sections of a resume, such as education, professional experience, and demographic information, just as a human reader does. The parsed data is used to build a knowledge base that stores the data from all analyzed resumes. Find out more about Kruthika’s project by reading her blog journal.

Shyamaa Karthik
Saint Andrew’s School, Florida, USA
Processing the Tamil Wiktionary pages into an NLP++ dictionary

Shyamaa is a high school student looking to create a solution that parses Wiktionary pages written in Tamil (a language spoken primarily in India) into an NLP++ dictionary. Once developed, the dictionary can be used to process text written in Tamil, similar to a ciphering machine for electronic reading. Find out more about Shyamaa’s project by reading his blog journal.

HPCC Systems Test Automation Intern Projects

The HPCC Systems platform team has two automated test systems: OBT (Overnight Build and Test) and Smoketest. In Smoketest, the execution time of the Regression Suite is important, and failures, time-outs or engine crashes can increase it significantly. In OBT, by contrast, execution time is less critical; the goal is to obtain a complete picture of the platform.

A separate category for test automation projects exists in the HPCC Systems Intern Program. These projects are usually mentored by Attila Vamos (Consulting Software Engineer, LexisNexis Risk Solutions) in coordination with other mentors, such as Krishna Turlapathi (Director Software Engineering, LexisNexis Risk Solutions).

Three students, Johnny Huang, Nivedha Sivakumar, and Noah Seligson, joined the 2023 HPCC Systems Intern Program to complete test automation projects.

Johnny Huang
BSc in Computer Science, University of Toronto, Canada
Improve Error Handling and Reporting for Automated Test Systems

Johnny’s work aims to help HPCC Systems testers and developers easily access detailed information from the test reports generated by GitHub Actions. To do this, Johnny focuses primarily on improving the GitHub Actions scripts so that they analyze logs and provide detailed information on the tests executed. As a secondary goal, Johnny also aims to improve the fault tolerance of the test systems by adding logic to retry failed actions. Find out more about Johnny’s project by reading his blog journal.

Nivedha Sivakumar
BSc in Computer Science, Georgia State University, USA
Test Suite for a Roxie Cluster on Kubernetes

Nivedha’s project focuses on developing a test suite for the Roxie cluster of a containerized HPCC Systems environment. To this end, Nivedha is adapting the HPCC Systems regression and performance test suites to a containerized paradigm, focusing on Roxie jobs in various cloud setup configurations, such as different storage types, cluster sizes, Kubernetes node sizes, etc. Find out more about Nivedha’s project by reading her blog journal.

Noah Seligson
BSc in Computer Science, University of Central Florida, USA
Convert Automated Test Systems from Python2 to Python3

Noah is a returning student from 2022. This year, Noah’s work involves updating the OBT and Smoketest code from Python2 to Python3, using a combination of manual changes and automated migration tools such as 2to3. During this process, Noah also aims to remove source code that serves no tangible purpose and to document potentially duplicated functionality so it can be analyzed and replaced by a single function. Find out more about Noah’s project by reading his blog journal.

HPCC Systems Deployment Automation Projects

The introduction of cloud-native support for HPCC Systems has opened up a plethora of opportunities for students looking to learn DevOps skills. One such opportunity is exploring best practices for adopting concepts such as Infrastructure as Code (IaC) for HPCC Systems deployments in the cloud. Starting in 2023, deployment automation projects have been grouped under their own category.

Two students, Hiroki Sato and Jessie Mao, joined the 2023 HPCC Systems Intern Program to complete HPCC Systems deployment automation projects. Hiroki and Jessie are being mentored by Wayne Carty (Architect, LexisNexis Risk Solutions) and Xiaoming Wang (Consulting Software Engineer, LexisNexis Risk Solutions), respectively, with Godji Fortil (Software Engineer, LexisNexis Risk Solutions) serving as backup mentor to both students.

Jessie Mao
Lambert High School, Georgia, USA
HPCC Systems Cloud Deployment with Various Helm Chart Configurations

Jessie is a high school student exploring the automation of various HPCC Systems deployments on Kubernetes using Helm charts. The end goal of Jessie’s work is to provide template Helm charts for deploying HPCC Systems in different scenarios, such as a Thor-only cluster, a Roxie-only cluster, logging, monitoring, and much more. Find out more about Jessie’s project by reading her blog journal.

Hiroki Sato
MSc in Computer Science, Indiana University, USA
Investigate Frameworks and Best Practices for HPCC Systems Cloud Native

Hiroki’s work aims to develop opinionated Terraform modules for automating HPCC Systems deployments in Amazon Web Services (AWS). While building these modules, Hiroki needs to consider cloud costs and system performance, as well as the system logging and metrics that are essential to developers and sysadmins. Find out more about Hiroki’s project by reading his blog journal.

HPCC Systems Architectural Enhancement Projects

Students who would like to contribute to the HPCC Systems open source project by exploring potential architectural enhancements can choose from a separate category of projects: HPCC Systems Architectural Enhancements. These projects are usually mentored by the HPCC Systems platform engineering team and focus specifically on improving the system architecture or adding new capabilities to the platform.

One student, Ryan Rao, joined the 2023 HPCC Systems Intern Program to complete an HPCC Systems architectural enhancement project and is being mentored by Xiaoming Wang (Consulting Software Engineer, LexisNexis Risk Solutions) and Godji Fortil (Software Engineer, LexisNexis Risk Solutions).

Ryan Rao
American Heritage School, Florida, USA
HPCC Systems Storage Support with Container Storage Interface (CSI)

Ryan Rao is a high school student interested in exploring the HPCC Systems storage options available in AWS. The objective of Ryan’s project is to provide Helm chart examples for AWS Container Storage Interface (CSI) driver support on various storage types, such as AWS EFS and FSx for Lustre. Find out more about Ryan’s project by reading his blog journal.

HPCC Systems Technology Marketing Intern Projects

This category was created in 2022 as part of an effort to extend the HPCC Systems Intern Program to students who are interested in technology but are not studying computer science, data science or a related subject. Specifically, the technology branding and marketing projects allow students to help improve the HPCC Systems market presence.

One student, Elizabeth Lorti, joined the 2023 HPCC Systems Intern Program to complete a technology branding project and is being mentored by Jessica Lorti (Director of Marketing, LexisNexis Risk Solutions).

Elizabeth Lorti
Bachelor of International Development, King’s College, UK
Technology Branding and Marketing

Elizabeth is also a returning student from 2022. In 2023, having gained experience in marketing, messaging and communications, Elizabeth is conducting a complete review of the HPCC Systems website, digital campaigns and social channels, looking for potential areas of improvement. Another area of focus for Elizabeth this year is the HPCC Systems Community Summit, where she supports speaker and attendee engagement. Find out more about Elizabeth’s project by reading her blog journal.

More to come on these projects

The 2023 program is in full swing now, and there will be plenty of opportunities to hear more about individual projects in the coming months. Some interns will write a blog to be featured on our website, while others will speak at our 2023 Community Day Summit in October.

All the interns are also producing posters about their work, which will be available on the HPCC Systems student wiki alongside an abstract and a 5-minute video presentation (see last year’s here).

Please keep an eye out later in the year for the posters to be submitted into our 2023 Poster Contest and remember, you get to vote for the winner as an attendee of our 2023 Community Day Event in October. The Community Choice Award goes to the poster presenter who wins the vote of attendees on the day, so make sure you plan to be there and have your say!

Take a look at all of last year’s poster contest entries, which include posters from students working with our academic partners on HPCC Systems related projects.

HPCC Systems Intern Program 2024

The proposal period for the 2024 HPCC Systems Intern Program is now open! If you are a student thinking of applying, or know someone you’d like to encourage to join, visit our list of available projects (new projects will be added soon), watch our info session webinar and read our blog about the program for more information.