2021 Poster Contest – Judges, participants and of course, the winners!

As usual, we announced the winners of the 2021 HPCC Systems Poster Contest at our Virtual Community Day Summit event. But the contest really started months earlier. Students, particularly those involved in the 2021 HPCC Systems Intern Program, were working on their posters in July and August, while the judges had already been invited to participate way back in April.

Since we knew the contest was going to be completely virtual again, Trish McCall and I set in motion the process involved in collecting all the resources together. Last year’s contest worked so well for everyone that we decided to follow the same process. All students had to submit their poster, an abstract and a five-minute video presentation. The 2021 HPCC Systems Technical Presentations Wiki was set up and, as the materials arrived, a page was created to display each student’s resources. The judges used these resources to review each poster and choose winners in four categories:

  • Best Use Case
  • Best Platform Enhancement
  • Best Research
  • Best Data Analytics

Q and A sessions followed in September, where judges got to meet the students in groups and find out more information to help them come to their final decisions.

The poster resources were also made available in our 2021 Virtual Community Day web application. Delegates were invited to view these resources and vote for their top three favourite posters, culminating in the presentation of our Community Choice Award to the student whose poster received the most votes. This award and those chosen by our poster contest judges were announced at the end of the day.

You can watch our 2021 Community Awards Ceremony here. Links to each poster presenter’s resources are included in this blog (via the 2021 Poster Contest Wiki) so you can take your time reviewing them.

Our 2021 Poster Contest Judges

Every year, the quality of the projects illustrated by each poster is incredibly high and the wide variety of topics makes it very difficult to be an expert across all of the areas covered. So our judges always have their work cut out for them. While they make their judging decisions individually, as a group, they bring with them an equally diverse range of knowledge, experience and critical thinking.

Three of our judges were colleagues from LexisNexis Risk Solutions Group:

Greg Panagiotatos
Software Engineer III

Image showing Greg Panagiotatos

Greg has been with LexisNexis Risk Solutions Group since 2001, starting off in LexisNexis Legal and Professional, Content Creation/Global Electronic Product Delivery. He joined the HPCC Systems Platform team in 2010 just as we were going open source. In that time, he has been responsible for producing platform-related documentation. He has helped to build a documentation delivery system based on open source standards which automates documentation production and even helps to develop some self-documenting software.

More recently, he has been working on a series of How-To videos to supplement the traditional HPCC Systems documentation and support adoption of our new Cloud Native Platform.

Greg spoke at our 2021 Virtual Community Day Summit about HPCC Systems Logging in the Cloud and an Elastic Stack Solution alongside our colleague Rodrigo Pastrana, Architect, LexisNexis Risk Solutions Group.

Jessica Skaggs
Consulting Software Engineer

Image showing Jessica Skaggs

Jessica has been with LexisNexis Risk Solutions Group since 2001. She spent 16 years as a developer and technical lead with LexisNexis Legal & Professional, before moving to her current role 4 years ago, when she began learning about and working with HPCC Systems, ECL, and Power BI. She has a BS degree in Systems Analysis from Miami University (OH). Jessica spoke at our 2021 Virtual Community Day Summit about HPCC Systems Thor Monitor – Using Workunit Services and Power BI to Monitor Thor Activity.

Kevin Wilmoth
Manager, Software Engineering

Image showing Kevin Wilmoth

Kevin began his career at LexisNexis Legal and Professional in 1997 and 10 years later moved to another group to work as a user of HPCC Systems. He developed an entity recognition and fact extraction system to automatically identify attorneys, lawyers, and facts about US Caselaw to enhance our product offerings to lawyers.

Kevin then joined LexisNexis Risk Solutions Group as an engineer on the Linking team, supporting an existing process that builds business profiles for the purpose of searching businesses. He is now the manager of the External Linking team, which is responsible for providing search interfaces that allow our products to uniquely search and identify persons, businesses, and health care providers.

Kevin has provided mentoring to a student who joined the HPCC Systems Intern Program two years running. Farah Alshank, Masters in Computer Science from Clemson University, submitted posters about her 2018 and 2019 intern projects, which focused on using the HPCC Systems Machine Learning Library.

We were introduced to our fourth judge thanks to a longstanding connection between Florida Atlantic University and the HPCC Systems Academic Program. In 2018, Taghi Khoshgoftaar (Motorola Professor from the Department of Electrical Engineering and Computer Science) suggested that a student of his might like to join our HPCC Systems Intern Program, which he did, three years running!

Robert Kennedy
PhD Candidate
Florida Atlantic University

Image showing Robert Kennedy

Robert joined the HPCC Systems Intern Program for the first time in 2018, completing a project using HPCC Systems to research distributed deep learning with TensorFlow (Watch 2018 Conference Presentation / View Slides). He was the 3rd Prize Winner at our 2018 Poster Contest (View Poster). In 2019, he returned to complete another internship and his poster, GPU Accelerated Neural Networks on HPCC Systems Platform (View Poster / Watch 2019 Conference Presentation / View Slides), won first prize in our contest that year. In 2020, he won the best poster prize in our Data Analytics category. His project involved expanding on the previous year’s work, focusing on Distributed GPU Accelerated Neural Networks with GNN (View Poster / Watch 2020 Conference Presentation).

Having been a winner of our poster contest three years running, Robert was perfectly placed to be a judge in 2021 and have the opportunity to see this contest from the other side of the fence.

Judging the Posters

Each poster is reviewed, the supporting video watched, Q and A sessions attended and at that point, our judges have all the information they need to record their results, making a note of any comments too.

During the Q and A group meetings, each student was in the ‘hot seat’ for a while. Judges asked their questions and engaged with each poster presenter ‘in person’. These meetings were really helpful in teasing out extra information to support the judges’ scoring decisions, while providing students with the opportunity to drill down and elucidate, particularly on unfamiliar or niche topics. I am sure it felt like an intense experience some of the time, but each Q and A session was also a very positive and fun hour, with students showing an interest in the projects of the other students joining their group session.

Image showing Q and A sessions with poster presenters and judges

After the Q and A sessions were complete, the judges’ scorecards were returned and the results compiled. Trish and I usually do this together and it is very exciting and also quite hard to keep the award winners a secret for several weeks!

Meet the 2021 Poster Presenters

The number of entries in 2021 was the highest ever with 18 posters submitted, so prepare yourself for a treat. There is something for everyone here, from data analytics and machine learning to platform enhancements, interesting use cases and research projects. These are the categories the judges are focused on and a prize is awarded to the winner in each one. Take a look at the full suite of submissions and see whether you agree with the judges’ choices!

This year’s presenters included nine HPCC Systems interns and nine presenters from other academic partner projects. Here are some additional statistics showing the diversity of this group of talented students:

Click on the poster title to view the resources available for each poster presenter.

Poster Presentations by 2021 HPCC Systems Interns

It is a requirement of the HPCC Systems Intern Program that students produce a poster illustrating the progress and achievements made during their 12-week internship. Not all students are available to take part in the contest, but in 2021, nine out of the 12 interns that joined the program entered the contest. They achieve a lot in 12 weeks with support and guidance from mentors who are experts in the chosen project area.

Achinthya Sreedhar
Bachelor of Computer Science
RV College of Engineering, Bengaluru, India

Image showing Achinthya Sreedhar

Improving conditional probability calculations using kernel methods in Reproducing Kernel Hilbert Space (RKHS) as a part of the Causality Project

Conditional Probability is a key enabling technology for Causal Inference. For real valued variables, calculating conditional probabilities is particularly challenging because they can take on an infinite set of values. With the increase in conditional dimensions, the data appears sparser and sparser making it difficult to derive accurate results. After looking at various ways of modelling conditional probabilities, we found that using RKHS kernel methods, it was possible to estimate the density and cumulative density of conditional probabilities with a single conditioning variable.

Comments from the judges’ scorecards

Judges were interested to hear how all the methods for this project are included in our emerging Causality Toolkit and were developed by Achinthya alongside the leader of our Machine Learning Library open source project, Roger Dev. The algorithm Achinthya worked on is portable to other data analytics platforms, but the benefit of using HPCC Systems is the speed due to the parallel processing. A real world application for this work might be in the healthcare field where it could be used to calculate outcomes for patients based on events and medication taken.
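To make the idea of a conditional probability with a single real-valued conditioning variable more concrete, here is a minimal sketch in Python. It is not the RKHS method developed for the Causality Toolkit. It swaps in a simpler, well-known kernel smoothing estimate (Nadaraya-Watson style) of P(Y <= y0 | X = x0), and the function names and the bandwidth parameter are assumptions made purely for illustration.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Unnormalised Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def conditional_cdf(
    xs: Sequence[float],
    ys: Sequence[float],
    x0: float,
    y0: float,
    bandwidth: float,
) -> float:
    """Estimate P(Y <= y0 | X = x0) by weighting each sample by how close
    its x value is to x0 (hypothetical helper, not a toolkit API)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no samples close enough to x0 for this bandwidth")
    below = sum(w for w, y in zip(weights, ys) if y <= y0)
    return below / total


# Toy data: y is small when x is near 0 and large when x is near 1.
xs = [0.0, 0.0, 1.0, 1.0]
ys = [0.0, 1.0, 10.0, 11.0]
near_zero = conditional_cdf(xs, ys, x0=0.0, y0=1.0, bandwidth=0.1)
near_one = conditional_cdf(xs, ys, x0=1.0, y0=1.0, bandwidth=0.1)
```

Because the weights fall off quickly away from x0, the estimate for x0 = 0 is dominated by the samples with small y values, while the estimate for x0 = 1 is dominated by the samples with large y values. With more conditioning variables, fewer samples fall near any given point, which is the sparsity problem the abstract describes.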

Amy Ma
Marjory Stoneman Douglas High School, FL, USA

Image showing Amy Ma

Ingress Configuration

An Ingress is an object that allows access to Kubernetes services from outside the Kubernetes cluster. Ingress is made up of an Ingress object and the Ingress Controller. An Ingress Controller is the implementation of the Ingress. In this project, two Ingress implementations, HAProxy and Nginx, were examined in an Azure environment. These two Ingress controllers both use the in-cluster Ingress solutions, where load balancing is performed by pods within the cluster. Amy’s work explores the different setups used to configure Ingress features through annotations and Kubernetes ingress specifications.

Comments from the judges’ scorecards

The judges were interested to discover that the latency testing Amy carried out showed that there is not much latency overhead, which is extremely beneficial. She chose HAProxy and Nginx because they have similar features, making it possible to test and compare the ones they had in common. She identified that the TLS and basic routing features were the most important ones to use. In her presentation and Q and A session, Amy demonstrated the knowledge and understanding she gained while working on a topic that was previously unknown to her.

Atreya Bain
Bachelor of Computer Science
RV College of Engineering, Bengaluru, India

Image showing Atreya Bain

Improvements on HSQL: A SQL-like language for HPCC Systems

Big Data has become an important field, and there is a steep learning curve to getting used to handling Big Data, especially in distributed systems. HSQL for HPCC Systems is a solution that is developed for allowing users to get used to its architecture and the ECL language with which it primarily operates. HSQL aims to provide a seamless interface for data science developers to use, for working with data. It is designed to work in conjunction with ECL, the primary programming language for HPCC Systems, and should prove to be easy to work with and robust for general purpose analysis.

Comments from the judges’ scorecards

Judges discovered that HSQL is available for use now. It is aimed at people who may already know SQL but may be unfamiliar with the ECL language and HPCC Systems, as a way to get up and running quickly. When the ECL has been generated, users have the flexibility to choose to submit it automatically, or submit it themselves. Users can also export the generated ECL for use on other platforms. Ongoing work on this project will focus on trying to find a way to work with keyed datasets and adding a feature to support the SELECT statement.

Carina Wang
American Heritage School, FL, USA

Image showing Carina Wang

Processing Student Image Data with Kubernetes and HPCC Systems GNN on Azure

In order to foster a safe learning environment, measures to bolster campus security have emerged as a top priority around the world. The developments from my internship will be applied to a tangible security system at American Heritage High School (AHS). Processing student images on the HPCC Systems Cloud Native Platform and evaluating the HPCC Systems Generalized Neural Network (GNN) bundle on cloud ultimately facilitated a model’s classification of an individual as “AHS student” or “Not an AHS student”. This will allow a person to receive confirmation from the robot that they are in the student database and retrieve information as part of a larger, interactive security feature.

Comments from the judges’ scorecards

Carina contributed to the HPCC Systems Open Source Project in many different ways during her 2021 internship, including creating a number of JIRA tickets to resolve issues found and suggest possible enhancements. The judges were impressed by this real world use case of HPCC Systems. Using her work, the robot was able to provide a yes/no answer when attempting to identify a student and was also able to output the correct student ID. It is not currently set up to cope with twins, but that may be something to factor in at some point, which would require more variations of similarity. Carina would also like to test detection while the robot is in motion as well as when stationary. The big picture goal here is to help with school security and provide a version that is compatible with a smartphone.

Chris Connelly
Data Scientist
North Carolina State University, USA

Image showing Chris Connelly

Ingestion and Analysis of Collegiate Women’s Basketball GPS Data in HPCC Systems and RealBI

In the past NC State Strength and Conditioning has worked with HPCC Systems to create solutions for taking different data streams, bringing them together for comprehensive analysis to improve athlete wellbeing and performance. This solution using HPCC Systems and RealBI provides insights from data provided by the NC State Women’s basketball team. Also, see some differences from working with a Bare Metal environment to a Kubernetes environment. See how these solutions can help our understanding of this data to provide better service to these student athletes.

Comments from the judges’ scorecards

Chris’s data was mainly GPS data in one big file and involved pulling the data from logical files stored on a Thor cluster. Chris found RealBI easy to work with, since there was no need to upload data to tables to use it. Judges found this project to be a fascinating use case of HPCC Systems, which shows clearly how to process data from a real world activity and present the information to users in a very accessible way. The data was collected from athletes wearing a vest and heart rate monitor. Work done over time was also monitored. Future work might include cluster analysis of the data, potentially providing predictive capabilities focusing, for example, on injury based on drills or player load.

Jefferson Mao
Lambert High School, GA, USA

Image showing Jefferson Mao

Toxicity Detection

Not only was the creation of the internet the largest technological breakthrough of the 20th century, it also happened to become a hidden double-edged sword. The internet has allowed us to access information and communicate at unprecedented levels, across the globe. Yet, this comes at an enormous cost. Hidden behind computer screens, we enjoy a security blanket of anonymity, which emboldens some to say and do things that are labeled as disturbing in a public setting. By creating a Toxicity Detection Platform, I aim to curb this harassment and provide a healthier web environment for everyone.

Comments from the judges’ scorecards

Jefferson used a labeled Kaggle dataset focused on posts from different social media message boards and chat rooms. His project provided an interesting use case for the HPCC Systems GNN machine learning bundle and he ran ROXIE queries to provide a UI via ECL Watch. He first worked on a proof of concept using Python and then moved it over to HPCC Systems, which was an easy transfer process. Judges felt that this project was an excellent choice since it is a topic that is very pertinent to today’s world and something that is on the minds of anyone who uses social media platforms, particularly parents.

Mayank Agarwal
Bachelor of Computer Science and Engineering
RV College of Engineering, Bengaluru, India

Image showing Mayank Agarwal

Independence Testing with RCoT: Causal Validation and Discovery for HPCC Systems Causal Toolkit

The new science of Causality promises to open new frontiers in Data Science and Machine Learning, but requires an accurate model of the causal relationships between variables. This causal model takes the form of a Directed Acyclic Graph (DAG). Nature provides a few subtle cues to the structure of the causal model, the most important of which are the independencies or conditional independencies between variables. These independencies allow us to test a causal model to determine if it is consistent with the observed data, and in some cases to discover the causal model from data alone.

Comments from the judges’ scorecards

Judges felt that this work is a great addition to the HPCC Systems Machine Learning Library. This project involved intensive research in an emerging area and Mayank’s initial results were interesting and will be improved with future work. This project marks the initial stage of the work involved. All the results were measured in Python and have yet to be integrated into the HPCC Systems environment. This is the focus of future work and will involve embedding the Python code within the ECL language. Causal models and independence are useful for computing scenarios where we don’t exactly know what may happen.

Nikita Jha
Northview High School, GA, USA

Image showing Nikita Jha

Apply Docker Image Build and Kubernetes Security Principles

With cybersecurity attacks becoming more prevalent, organizations are constantly looking for ways to improve security on their platforms. The new HPCC Systems cloud-native platform uses Docker containers managed by Kubernetes to store and manage data. With this new change, it is of utmost importance that HPCC Systems has a secure cloud environment since they are using it to manage secure data from other companies.

Comments from the judges’ scorecards

Nikita was completely new to HPCC Systems and the area covered by her intern project. Since security is such an important topic, judges commented on the value of her achievements and were impressed by her blog contribution on using HashiCorp Vault. Nikita did some initial research to identify tests to perform, which is how she landed on the idea to disable caching. When using her Windows laptop to build a module, she encountered a problem which turned out to be an issue with the Apple M1 chip configuration. She discovered that building the module took significantly longer with caching enabled than with caching disabled. Disabling caching brought the build time down from 3 hours or more to 10 minutes. A great finding to share!

Roshan Bhandari
Masters in Computer Science
Clemson University, USA

Image showing Roshan Bhandari

Use Azure Spot Instance with HPCC Systems for Cost Optimization

Minimizing the cost of setting up cloud infrastructure is very important for all companies. Azure Spot Instances can provide great cost savings for cloud infrastructure setup. Azure Spot Instances are unused computing resources (virtual machines). Azure provides them for a lower price compared to normal virtual machines. Azure offers these instances at a price that can be as much as 90% below that of a normal instance. The price varies based on region and size. This project analyzes different aspects related to the use of Azure Spot Instances with HPCC Systems.

Comments from the judges’ scorecards

Roshan’s project involved testing HPCC Systems with 2 instances, spot and Kubernetes. Roshan used the information provided by Microsoft to learn how often the various nodes will be evicted. He wrote scripts to bring up the nodes and track whether they were evicted, including whether they were evicted when running specific types of jobs. This is a particularly timely piece of work for anyone who is looking to transition to the cloud and wants to be mindful of identifying potential cost savings. Judges were interested to learn that running outside of regular working hours meant a lower likelihood of eviction and lower costs.
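The trade-off between the spot discount and the cost of evictions can be illustrated with a little arithmetic. The following Python sketch is not taken from Roshan's scripts. The function names, the hourly rate and the rework time per eviction are all hypothetical values chosen only to show the shape of the calculation.

```python
def regular_cost(job_hours: float, regular_rate: float) -> float:
    """Cost of running a job on normal (non-spot) virtual machines."""
    return job_hours * regular_rate


def spot_cost(
    job_hours: float,
    regular_rate: float,
    discount: float,
    evictions: int = 0,
    rework_hours_per_eviction: float = 0.0,
) -> float:
    """Cost of the same job on spot instances.

    discount is the fraction taken off the regular rate (0.9 means 90% off).
    Each eviction is assumed to add a fixed amount of repeated work.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    spot_rate = regular_rate * (1.0 - discount)
    billed_hours = job_hours + evictions * rework_hours_per_eviction
    return spot_rate * billed_hours


# Hypothetical job: 10 hours at a regular rate of 1.0 per hour,
# a 90% spot discount, and 2 evictions costing 1.5 hours of rework each.
on_regular = regular_cost(10.0, 1.0)
on_spot = spot_cost(10.0, 1.0, 0.9, evictions=2, rework_hours_per_eviction=1.5)
```

Under these assumptions, spot instances stay cheaper for as long as (1 - discount) multiplied by the total billed hours is less than the original job hours, so a deep discount can absorb a fair number of evictions before the saving disappears.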

Poster Presentations by Academic Partners

The HPCC Systems Academic Program supports research and collaborative projects using the HPCC Systems open source platform. We partner with a number of academic institutions, including schools and universities around the world. The posters shown below were submitted by students working on projects with their professors in collaboration with LexisNexis Risk Solutions Group over several months or even years. Some of these projects may form part of a course that either includes a work placement segment or requires the completion of an in-depth project. Others contribute to ongoing long term research projects by university departments. Mentoring of students is shared between university professors and our LexisNexis Risk Solutions Group colleagues.

André Fontanez Bravo
Industrial Engineering
University of Sao Paulo, Brazil

Image showing André Fontanez Bravo

Big Data and Logistic Regression applied to Analysis of Loan Requests

Big Data and its applications are becoming more and more important across many different fields. In this context, techniques and tools that are able to process the immense flow of information to create value can be powerful instruments. This study focuses on the application of data analysis to financial investments at LendingClub’s platform. LendingClub is an American peer-to-peer lending company. As investing in loans that end up not being paid evidently incurs financial losses, we need a way to identify loan requests that have a higher probability of being paid on time.

Comments from the judges’ scorecards

André used a Kaggle dataset which was cleaned and organised using HPCC Systems and the ECL language. Judges felt that this piece of research was highly relevant to the finance industry as well as a great use case for HPCC Systems and our Machine Learning Library. Future work might include comparing the model used with others such as a Decision Tree.

Bruno Carneiro Camara
Electrical Engineering
University of Sao Paulo, Brazil

Image showing Bruno Camara

Preventing Fraud by Registration Inconsistencies

Lots of money is lost because of fraud committed by companies. There are already laws to punish company partners for these abusive acts for their own benefit, but how can the authorities locate and take the necessary actions? Identifying registration inconsistencies, suspicious behaviors or unusual situations may prevent or locate fraud. Using three different public databases as the starting point, I was able to link companies and partners to suspicious behaviors, such as receipt of undue government benefit by company partners and reports of work analogous to slavery in companies.

Comments from the judges’ scorecards

Judges were interested in the definition of work analogous to slavery. Interestingly, in Brazil, the government has a database of companies that have been reported for this type of activity. There is also some monitoring of companies that may be improperly receiving government financial aid, focusing on people high up in organisations who make claims. This project piqued the curiosity of the group as it focused specifically on issues of current local interest in Brazil, although the wider relevance of this project work is clear to see. Bruno mentioned how one of the greatest challenges in the data cleaning process was replacing the accented characters.

Chirag Bapat
RV College of Engineering, Bengaluru, India

Image showing Chirag Bapat

Comparative study of HPCC Systems and Hadoop

In order to constantly evolve and generate better results from any system, we require constant studies to be conducted to assess and compare the performance of new and upcoming systems with the current industry standards. Through our project, we intend to perform a similar comprehensive comparative study between the current standard in Big Data Analytics systems (Hadoop) and that provided by HPCC Systems. This will allow us to assess both the similarities and differences between the two setups, which in turn will assist the end user or the client to make a better and more informed choice about the kind of system to be set up for their specific requirements.

Comments from the judges’ scorecards

This project did a great job of providing useful guidance about how to choose a big data platform, particularly for students. This was Chirag’s first time using HPCC Systems and the ECL language and for him, the overriding factor was which system performed better. While he found Hadoop slightly easier to use at the start, HPCC Systems performed better. This was what he had expected but all the same, it’s good to see this hypothesis confirmed in his results. Runtimes were almost 3 times shorter with HPCC Systems than with Hadoop. Chirag clarified this finding by telling the judges how for a small dataset with 1200 records, HPCC Systems was 3 times faster and for a large dataset of 1.7 million records, HPCC Systems was 1.7 times faster than Hadoop.
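For readers who want to relate "times faster" to a reduction in runtime, the arithmetic can be sketched in a few lines of Python. The runtimes used here are hypothetical and were picked only to reproduce the 3 times and 1.7 times ratios quoted above. They are not measurements from the poster.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_seconds / candidate_seconds


def runtime_reduction_percent(
    baseline_seconds: float, candidate_seconds: float
) -> float:
    """Percentage by which the runtime shrinks relative to the baseline."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return (1.0 - candidate_seconds / baseline_seconds) * 100.0


# Hypothetical runtimes (seconds) chosen only to match the quoted ratios.
small_baseline, small_candidate = 30.0, 10.0    # 3 times faster
large_baseline, large_candidate = 170.0, 100.0  # 1.7 times faster

small_speedup = speedup(small_baseline, small_candidate)
large_speedup = speedup(large_baseline, large_candidate)
small_reduction = runtime_reduction_percent(small_baseline, small_candidate)
large_reduction = runtime_reduction_percent(large_baseline, large_candidate)
```

A speedup of 3 corresponds to a runtime reduction of roughly two thirds, while a speedup of 1.7 corresponds to a reduction of roughly 41%, which is why the two ways of quoting the same result can look quite different.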

Deeksha Shravani
Bachelor of Computer Science and Engineering
RV College of Engineering, Bengaluru, India

Image showing Deeksha Shravani

Developing a Recommendation System for a Virtual Reality based Supermarket using Big Data Platforms

This poster introduces a Virtual Reality (VR) based online shopping platform and its integration with a recommendation system with the demonstration of the virtual environment. With the advent of the pandemic, the ability of virtual reality platforms to provide a realistic shopping experience puts it in a unique position that assures safety and isolation while also offering the benefits of online shopping platforms to both customers and retailers. To foster user adoption and improve the experience of the user beyond the confines of traditional shopping experiences, a recommendation system is needed.

Comments from the judges’ scorecards

Deeksha shared with judges that Python and TensorFlow were used for the GPU training and the GPU was at around 40% with 17 gigabytes of VRAM. As well as building the recommendation system, Deeksha also built the virtual reality supermarket used for testing. Future work might include looking at how the recommendations compare across similar users and establishing a way to evaluate accuracy.

Francisco Ciol Rodrigues Aveiro
Bachelor of Computer Engineering
INSPER, Sao Paulo, Brazil

Image showing Francisco Ciol Rodrigues Aveiro

HPCC Systems Ingress Configuration with AWS ALB

During this current era of information, the use of cloud computing has become a necessity due to the amount of computational power needed. Access to storage and processing power at low cost, allied with ease of access, are some of the advantages of using such a service, which is available as platform as a service (PaaS), software as a service (SaaS), infrastructure as a service (IaaS), and hardware as a service (HaaS). In the IaaS model, payment is normally under a pay-as-you-go policy, where you pay for what you’re using. Though pricing may be cheap, the misuse of resources and unnecessary uptime can drive up the cost.

Comments from the judges’ scorecards

Francisco was completely new to AWS and Kubernetes when he started this project, which focused on cost effectiveness and security. He was also using HPCC Systems for the first time, which involved having to quickly get up to speed with the platform. Future work may move on to look at comparisons with other cloud providers and creating a Helm chart for people to use when setting up their own system.

Guilherme Santos da Silva
Bachelor of Computer Engineering
Universidade Tecnológica Federal do Paraná, Brazil

Image showing Guilherme Santos da Silva

HPCC Systems File Usage Monitor

A cluster is a connection between two or more computers with the purpose of improving the performance of systems in performing different tasks. In the cluster, each computer is called “node” and there is no limit to how many nodes can be interconnected. Then, computers start to act within a single system, working together in processing, analyzing and interpreting data, information and/or performing simultaneous tasks.

It is interesting to know information about a cluster, such as its capacity and availability.

Comments from the judges’ scorecards

The intention is that the results of this project will be used in production environments by the LexisNexis Risk Solutions teams in Brazil. The ECL jobs that Guilherme ran on hThor took just one second to run. There are many different features that could be added to help users improve the use of and maintenance of an HPCC Systems environment. Disk space usage is one area that was looked at here. As well as factoring in the space taken up by logical files, the physical files on the landing zone were also included in the calculation. In the future, being able to include the tracking of cloud storage costs would be a great addition.

Luiz Fernando Cavalcante Silva
Bachelor of Civil Engineering
University of Sao Paulo, Brazil

Image showing Luiz Fernando Cavalcante Silva

Massive data analysis in public management: A proposal to identify outliers in the São Paulo city government’s real estate registry

The amount of open data made available by government agencies is increasing, resulting in large numbers of datasets with different layouts, formats and update frequencies. Despite being difficult to analyze, these datasets have a large amount of rich information of use for applications involving public policies. The objective of this project is to develop a machine learning pipeline using HPCC Systems that can ultimately be used to identify outliers in the São Paulo city government’s real estate registry extract.

Comments from the judges’ scorecards

The project exposed Luiz to real-world use of modelling, which is not easy, and highlighted how preparing the data can take more work than training and measuring the results. One of the biggest challenges was choosing the right attributes. Using a dataset of 3 million records which had 30 attributes, he reduced the number of attributes to 4. Another challenge was that, using DBSCAN, the job had not completed after several days. He was advised to use K-Means, which reduced the runtime to 10 minutes and was a solution well worth discovering!
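As a rough illustration of why K-Means can be a practical choice here, the sketch below implements the basic algorithm (Lloyd's iteration) in plain Python and then uses the distance to the nearest centroid as a simple outlier score. This is not Luiz's ECL pipeline or the HPCC Systems Machine Learning Library implementation. The function names, the toy points and the scoring rule are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def k_means(points: Sequence[Point], k: int, iterations: int = 20,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Run Lloyd's algorithm and return (centroids, label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # k distinct starting points
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # New centroid is the per-coordinate mean of its members.
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def outlier_scores(points: Sequence[Point],
                   centroids: Sequence[Point]) -> List[float]:
    """Squared distance from each point to its nearest centroid."""
    return [squared_distance(p, centroids[nearest(p, centroids)])
            for p in points]


# Toy data: two tight groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(points, k=2)
scores = outlier_scores(points + [(5.0, 20.0)], centroids)
```

Each iteration only compares every point with k centroids, so the work per pass grows linearly with the number of records. Records whose score is much larger than the rest, such as the extra point added in the last line, are candidates for closer inspection as outliers.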

Murtadha D. Hssayeni
PhD Candidate
Florida Atlantic University

Image showing Murtadha Hssayeni

The Forecast of COVID-19 Spread Risk at The County Level

Early detection of the COVID-19 outbreak is important to save people’s lives and restart the economy quickly and safely. Social behavior, reflected in mobility data, plays a major role in spreading the disease. Daily mobility data is aggregated at the county level with COVID-19 statistics and demographic information for short-term forecasting of COVID-19 outbreaks in the United States. The daily data are fed to a deep learning model based on Long Short-Term Memory (LSTM) to predict the accumulated number of COVID-19 cases in the next two weeks.

Comments from the judges’ scorecards

The aim of this work is to integrate this feature into the main COVID-19 tracker tool. When vaccinations became available, this information was also added to the model, which is fine-tuned every two weeks. The model was also adjusted to take account of the Delta variant and it held up well, showing that there is great potential for using it in future pandemic-like situations to help with public health policies and save lives.

Shivani C H
Bachelor of Engineering and Computer Science
RV College of Engineering, Bengaluru, India

Image showing Shivani C H

COVID-19 Cases and Vaccination Data Tracker in India

With the global outbreak of the COVID-19 pandemic, it has become crucial to track the active cases and vaccination data in order to analyse the current situation and trends. Hence, a systematic way of collecting, processing, enhancing, analysing and visualising the data and trends for better understanding is needed. Through this project, we aim to provide users with the required information about COVID-19 cases since its outbreak and the vaccination data in different states of India and the country as a whole.

Comments from the judges’ scorecards

Judges were interested to hear that Shivani had made use of the HPCC Systems Visualizer Bundle for her project, which she has found very easy to use. The data she used was pre-cleaned and there were challenges with normalizing the data for the analysis and visualisation steps. Shivani had to use null values which made it hard to compare averages and she had to remove the accumulated data, demonstrating an impressive understanding of the ECL language. The data came from Kaggle and she used embedded Python code before running the jobs on a Thor cluster. Future work on this project might include showing a per capita vaccination rate and a visualization showing the correlation between cases and vaccines.
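As a rough illustration of two of the steps mentioned above, the sketch below turns a running (accumulated) total into daily figures while keeping missing days as nulls, and computes a per capita rate of the kind suggested as future work. It is written in plain Python rather than ECL, it is not Shivani's code, and the function names and sample numbers are hypothetical.

```python
def daily_from_cumulative(cumulative):
    """Convert running totals to daily counts. Missing days (None) stay None."""
    daily = []
    previous = 0
    for total in cumulative:
        if total is None:
            # No report for this day: keep the gap rather than inventing a value.
            daily.append(None)
        else:
            daily.append(total - previous)
            previous = total
    return daily


def per_capita_rate(doses, population, per=100):
    """Doses administered per `per` people (default: per 100 people)."""
    return doses * per / population


# Hypothetical running totals for one state, with one unreported day.
running_totals = [10, 25, None, 40]
daily_counts = daily_from_cumulative(running_totals)
rate = per_capita_rate(doses=500, population=10_000)
```

Note that the figure reported after a gap covers every day since the last report, which is one reason nulls make averages harder to compare.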

And the 2021 Poster Contest Awards go to…

Judges record marks for each poster relating to the following criteria:

  • Content
    Project originality and relevance to our open source project and community.
  • Poster design
    The overall appearance of the poster, including organisation, use of visual aids and applicability to the topic video presentation.
  • Presentation
    How effectively the project ideas and challenges are communicated, including an assessment of the clarity and flow of the presentation and level of enthusiasm and confidence shown in their chosen subject.

There is no conferring between the judges! Each judge records their own scores. With 18 high-quality posters to review, you can imagine how hard it must be to come to a final decision. But winners must be chosen. Thanks to our judges for their time and careful consideration of all the posters submitted.

Alongside their contest prize, our winners are awarded a digital badge they can use on their social media profiles.

2021 Poster Contest Winner Badge   2021 Best Poster Winner – Use Case

Jefferson Mao, Lambert High School, GA, USA
Toxicity Detection

View Poster Resources Image showing Jefferson Mao's Poster

Jefferson joined the HPCC Systems Intern Program for the second time in 2021. In fact, this is the second time he has won this award, having walked away with it in 2020. This year, Jefferson combined his interest in machine learning with his desire to create a practical solution for detecting toxic language, which could be used to filter interactions across chat rooms and other social media channels. His Toxicity Detection feature uses our GNN machine learning module to spot toxic language in the text of messages. Jeff was mentored by our LexisNexis Risk Solutions Group colleague, Bob Foreman (Senior Software Engineer), who is one of our ECL language trainers.

Congratulations Jeff. This is a great project and a use case that will be close to the heart of many people.

2021 Poster Contest Winner Badge   2021 Best Poster Winner – Platform Enhancement

Nikita Jha, Northview High School, GA, USA
Apply Docker Image Build and Kubernetes Security Principles

View Poster Resources Image showing Nikita Jha's Poster

Nikita is a high school student who joined the HPCC Systems Intern Program in 2021 and was mentored by our colleague Michael Gardner with support from Xiaoming Wang and Godson Fortil. Nikita’s intern project directly supports our Cloud Native platform, focusing on the all-important area of security. She looked at the Docker security components and security best practices for Kubernetes. She has contributed a blog about using HashiCorp Vault as a security manager on our cloud native platform.

Congratulations Nikita on winning this well-deserved award.

2021 Poster Contest Winner Badge   2021 Best Poster Winner – Research

André Fontanez Bravo
Big Data and Logistic Regression Applied to Analysis of Loan Requests

View Poster Resources Image showing André Fontanez Bravo's Poster

André is one of a number of students whose academic studies are supported by the collaboration between his university and our colleagues, Hugo Watanuki (Senior Software Engineer) and Alysson Olivera (Software Engineer I) in Brazil. André’s project used the HPCC Systems Platform to handle the large amounts of data and he used our Machine Learning Library to create the models he used to carry out the analysis.

Congratulations to you André on being the first ever winner in this new award category.

2021 Poster Contest Winner Badge   2021 Best Poster Winner – Data Analytics

Carina Wang, American Heritage School, FL, USA
Processing Student Images with Kubernetes on HPCC Systems

View Poster Resources Image showing Carina Wang's Poster

Carina is a high school student who joined the HPCC Systems Intern Program in 2021 and was mentored by our colleague David DeHilster. She is a member of the Stallion Robotics Team 5472 at American Heritage School, which is run by Tai Donovan, the Robotics Program Director. The aim of Carina’s project was to provide a mechanism for their autonomous security robot to recognise known faces from images in a database. She used data from an augmented image set to train a GNN model on Azure using our Cloud Native Platform to classify the images. The aim is to allow a student to walk up to the robot and retrieve information as part of a larger, interactive security feature.

Congratulations to you Carina, very well done indeed!

2021 Poster Contest Winner Badge   2021 Community Choice Award Winner

Atreya Bain, RV College of Engineering, Bengaluru, India
Improvements on HSQL: An SQL-like Language for HPCC Systems

View Poster Resources Image showing Atreya Bain's Poster

Atreya Bain was the poster presenter who received the most votes from our Virtual Community Day Summit attendees.

Atreya joined the HPCC Systems Intern Program in 2021 to complete this project but his involvement started much earlier. Atreya’s university, RV College of Engineering in India, is a member of our Academic Program. Atreya began working on HSQL alongside our LexisNexis Risk Solutions Group colleague, Arjuna Chala (Senior Director, Operations). Having completed an initial implementation in 2020, Atreya’s project in 2021 builds and improves on this new language, providing a solution that helps new users to get up and running with their big data analytics projects using HPCC Systems. Atreya was mentored by Arjuna Chala with support from Dr Shobha G and Professor Jyothi Shetty from RVCE.

Congratulations Atreya. It is wonderful to see your work recognised by the HPCC Systems Open Source Community.

Thanks and ways you can get involved

Thank you to all the students who made our 2021 HPCC Systems Poster Contest such a resounding success. Thanks also to all the colleagues and school/university teachers and professors who mentored the students and everyone who participated in voting for our 2021 Community Choice Award Winner.

Our poster contest presenters do a wonderful job of demonstrating the versatility of the HPCC Systems Platform. We look forward to seeing how others use HPCC Systems and the posters to emerge from these endeavours in the future.

If you have project ideas, or would like to participate in our academic program in some way, do get in touch. Perhaps you’d like to mentor a student on our intern program or be involved in our poster contest. We welcome your ideas, contributions and collaborations. They are an integral part of our vibrant open source community.