Welcome to our 2020 HPCC Systems Interns
The 2020 HPCC Systems Intern Program is now underway and we warmly welcome all students who are joining us to work on coding projects this summer. All our students are working remotely this year and in the true spirit of the program, this is no different to all the developers on our open source project who are currently doing the same thing.
Students who live near one of our offices and want to be office based are usually given the choice to come into the office to work, and we recommend that high school students are office based, to experience working physically with a team and following the rules of office based working. However, the challenges presented by COVID-19 this year mean that the three high school students joining the program are also going to be fully immersed in the remote working experience.
We’ve adjusted the program a little to make sure our interns feel as connected as possible to their mentor, the team and each other, to give them the best possible intern experience during these challenging times.
Each student joining the HPCC Systems Intern Program has already put a great deal of effort into making their project a success. To be accepted on to the HPCC Systems Intern Program, students must submit a proposal to complete a specific project from our ideas list, or suggest one of their own that leverages HPCC Systems in some way. The proposal must include a detailed description of their aims and a timeline showing the tasks they expect to complete for each week of their internship. Before they complete their internship, students are expected to check in all code, submit test cases and provide documentation.
Each student works alongside an HPCC Systems mentor, using the same working practices as all the developers who contribute to the HPCC Systems Open Source Project. We also welcome university professors and school teachers to the program, who are co-mentoring their student(s) alongside our LexisNexis Risk Solutions colleagues, providing additional support and guidance.
A warm welcome to the 7 students who have joined the HPCC Systems Intern Program in 2020. We’d also like to extend this warm welcome to other students who are working on HPCC Systems projects over the summer.
Masters of Computer Science, Clemson University
Leveraging and evaluating Kubernetes support on Microsoft Azure
Yash Mishra is a member of Dr Amy Apon’s research team which carries out research on Big Data systems, focusing on optimizations and improvements at both the data and network layers. HPCC Systems is a sponsor of their Data Intensive Computing Lab and we have been collaborating with Dr Apon and her team since our Academic Program began. Yash was introduced to HPCC Systems in the Cloud Computing Architecture class at Clemson University and has been involved in identifying different configuration options to deploy HPCC Systems on the commercial cloud. Last year, he worked on a project looking at auto-provisioning HPCC Systems on AWS and entered a poster illustrating this work into our 2019 Technical Poster Contest (View Poster / Read Extract). More recently, he has moved on to using HPCC Systems on Microsoft Azure and spoke about deploying onto this cloud provider using our bare metal version in our January 2020 Tech Talk Series (Watch Recording / View Slides).
Yash’s intern project follows on from the valuable work he has been doing using the bare metal version of HPCC Systems in the Cloud. He is now moving on to our new Cloud native platform to leverage the Kubernetes support for HPCC Systems, and will also focus on performance measurements, cost analysis and various configuration options. This, along with his previous work, will provide a comparison of running the HPCC Systems bare metal version and the new Kubernetes support of cloud native HPCC Systems on Microsoft Azure. He will also supply a user guide as part of his internship which will be a valuable resource to our users. Yash’s mentor is Dan Camper (Senior Architect, LexisNexis Risk Solutions) who has been supporting Yash throughout his various HPCC Systems related research projects alongside Dr Amy Apon. More information about this project is available in the associated JIRA issue.
American Heritage School, Boca/Delray, Florida
Using the GNN Bundle with TensorFlow to train a model to find known faces
Jack is a 12th grade high school student who has developed an impressive amount of experience in Java, Python and C++ from the robotics and computer science courses provided by his school and his involvement in the Stallion Robotics Team 5472, run by Tai Donovan (Robotics Program Director and Instructor). If you have attended one of our Community Day Summits in recent years or followed us on social media, you may have seen demonstrations of the impressive robots Tai and the team have built from the ground up over the years. Jack is the Director of Programming for the team, who are currently working on an Autonomous Security Robot (Watch Demo) that can recognise potential risks on a school campus that might otherwise be missed by the human eye. Using object and facial recognition, they can capture faces and recognise them with 93% accuracy using TensorFlow. He submitted a poster into our 2019 Technical Poster Contest showcasing the progress the team has made on the robot (View Poster / Read Extract).
Jack’s project involves processing the data from collected images using our Generalized Neural Network (GNN) Bundle with TensorFlow to train a model that can recognise known faces. Jack’s mentors are David De Hilster (Consulting Software Engineer, LexisNexis Risk Solutions), who has been supporting the robotics program at AHS for a number of years, and Xiaoming Wang (Senior Consulting Software Engineer, LexisNexis Risk Solutions), who will be providing technical guidance. Tai Donovan is providing the support needed to link in with the school project and our Machine Learning Library expert, Roger Dev (Senior Architect, LexisNexis Risk Solutions), will also be available to provide any help Jack may need when using the GNN bundle. Robert Kennedy (Research Assistant, Florida Atlantic University) is interning with us this year (see below) and has also contributed to our GNN bundle. Jack may find their paths cross, which fits in well with the collaborative nature of our intern program, development team and open source project. More information about this project is available in the associated JIRA issue.
Lambert High School, Suwanee, GA
Establish HPCC Systems on the Google Cloud Platform
Jefferson is a 12th grade high school student who has his eyes set on studying business at the University of Pennsylvania in the future. He is already something of an entrepreneur having founded Philosophy Robotics LLC, a software company that produces software for resellers such as automated checkout services, reselling tools and web scraping applications.
He heard about the HPCC Systems intern program when taking part in CodeDay Atlanta, a 24 hour event where student programmers and designers get together to create apps and games. LexisNexis Risk Solutions is a sponsor of this annual event which was hosted at our office in Alpharetta in 2019 and Jefferson was a Best in Show prize winner.
Jefferson’s project contributes to our ongoing HPCC Systems Cloud native development project. As well as working through the steps required to use HPCC Systems on the Google Cloud platform, he will design a web application for creating a new HPCC Systems cluster on this cloud service. He is also exploring Google Cloud Anthos (a new Google Kubernetes deployment platform) with an HPCC Systems cluster.
The final part of Jefferson’s project involves analysing how running HPCC Systems on the Google Cloud works in comparison with other cloud services (such as AWS), looking at performance, security and cost effectiveness. More information about this project is available in the associated JIRA issue. Jefferson’s mentor is Xiaoming Wang, Senior Consulting Software Engineer, LexisNexis Risk Solutions.
Research Assistant, Florida Atlantic University
Implement a Multi-node, Multi-GPU Accelerated Deep Learning Algorithm using GNN
We are delighted to welcome Robert Kennedy back to join our intern program for the third year running.
During his 2020 internship, he aims to expand on our existing GNN bundle to improve our GPU accelerated neural network training. By the end of his internship, HPCC Systems will be able to train neural networks, at scale, across many GPUs, across many GPU enabled nodes using different parallelisation techniques that are suited to deep learning tasks. Robert’s work will increase the robustness of the underlying GNN library by identifying areas for improvement while documenting best practices to be used when training neural networks on GPUs using the GNN bundle. More information about this project is available in the associated JIRA issue.
Throughout his intern projects he has been supported by Dr Taghi Khoshgoftaar, Florida Atlantic University, who is an old friend of the HPCC Systems Open Source Project. As in all previous years, Robert’s mentor is Tim Humphrey, Consulting Software Engineer, LexisNexis Risk Solutions.
You can find out about Robert’s previous projects using the links below. He has also entered our poster contest during his previous internships, placing third in 2018 and a well deserved first in 2019.
- GPU Accelerated Neural Networks on HPCC Systems – Tech Talk Presentation / View Poster / Community Day 2019 Presentation (Watch Recording / View Slides)
- Begin development of a software library that would provide HPCC Systems distributed neural network training – Tech Talk Presentation / View Poster / Community Day 2018 Presentation (Watch Recording / View Slides)
Masters in Data Science, New College of Florida
Applying HPCC Systems Word Vectors to SEC Filings
Matthias Murray graduates with a Masters in Data Science in 2020. He previously studied for a BA in Maths and Physics, also at New College of Florida, producing a thesis on Thin Film Fracture and Finite Element Analysis Fundamentals.
This project involves reporting on the current status of vectorisation and NLP representation of SEC filings and then compiling identified SEC filing cases and their intersection from a LexisNexis perspective. He will need to sort and transform SEC data, creating a function to convert the data into a format required by the HPCC Systems Word Vectors ML bundle. More information about this project is available in the associated JIRA issue.
Matthias has lots of ideas about how the results of his project may be of practical help in a business setting, including providing a tool for calling particular filing details for a specific company, predictions such as expected analyst rating upgrades/downgrades before they are officially issued and providing a visualisation tool on extracted filings showing interesting patterns.
Matthias’s mentor is Lili Xu, Software Engineer III, LexisNexis Risk Solutions but he is also being supported by Professor Burcin Bozkaya from New College of Florida.
Masters in Computer Science
Kennesaw State University, USA
Implement a Preprocessing Bundle for the HPCC Systems ML Library
Vannel Zeufack joins the HPCC Systems Intern Program for the second year running, having completed an internship with us in 2019. Last year he completed a project to Develop and Assess Unsupervised Anomaly Detections Methods using HPCC Systems. If you want to know more about this project, listen to Vannel present at our Tech Talk in September 2019. You can also see the poster he entered into our 2019 Technical Poster Contest, where he placed third. His 2019 blog journal also provides details about the progress made on his project as well as his HPCC Systems internship experience.
Vannel’s project this year is quite different to the one he completed in 2019. The purpose of his 2020 project is to make the data preprocessing phase of machine learning on HPCC Systems easier and faster. He also plans to produce a preprocessing bundle tutorial to demonstrate how the different modules in the preprocessing bundle could be used together to easily prepare data for a machine learning project. More information about this project is available in the associated JIRA issue.
Vannel’s mentor is Lili Xu, Software Engineer III, LexisNexis Risk Solutions.
Hills Road Sixth Form College, Cambridge, UK
Execute Multiple Workflow Items in Parallel
Nathan Halliday is a high school student who has just completed his A Levels. He will start university later this year, having received an offer to study mathematics at St Anne’s College, University of Oxford. Nathan plays 1st trombone in a 15 piece band called ‘The Umbrella Big Band’ and enjoys playing tennis and reading murder mystery books. He is interested in quantum computing, which may influence his future career path.
The aim of Nathan’s project is to restructure the workflow engine to create a graph of tasks that can be used to track which tasks have been executed and which tasks should be executed next. Part of this work is to ensure that there are no multi-threading issues in the workflow engine. The plan is to support ROXIE and Thor by the end of Nathan’s internship. Nathan has already submitted his first pull request to add some workflow examples to our ECL regression suite. More information about this project is available in the associated JIRA issue.
Nathan’s mentor is Gavin Halliday, Enterprise/Lead Architect, LexisNexis Risk Solutions.
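For readers curious about the idea behind Nathan’s project, here is a minimal sketch of a task graph that tracks which tasks have been executed and which can run next. It is written in Python purely for illustration and is not the HPCC Systems workflow engine’s actual design; the `TaskGraph` class, its methods and the task names are all hypothetical.

```python
class TaskGraph:
    """Toy dependency graph: records finished tasks and reports runnable ones."""

    def __init__(self):
        self.deps = {}     # task name -> set of task names it depends on
        self.done = set()  # task names that have already been executed

    def add_task(self, task, depends_on=()):
        self.deps.setdefault(task, set()).update(depends_on)
        # Make sure every dependency is also known as a task.
        for dep in depends_on:
            self.deps.setdefault(dep, set())

    def ready(self):
        """Unexecuted tasks whose dependencies have all been executed."""
        return sorted(
            task for task, needed in self.deps.items()
            if task not in self.done and needed <= self.done
        )

    def mark_done(self, task):
        if task not in self.deps:
            raise KeyError(f"unknown task: {task}")
        if not self.deps[task] <= self.done:
            raise ValueError(f"{task} still has unfinished dependencies")
        self.done.add(task)


# Hypothetical workflow: two independent steps sit between "load" and "report".
graph = TaskGraph()
graph.add_task("load")
graph.add_task("clean", depends_on=["load"])
graph.add_task("index", depends_on=["load"])
graph.add_task("report", depends_on=["clean", "index"])

print(graph.ready())  # ['load']
graph.mark_done("load")
print(graph.ready())  # ['clean', 'index']
```

Once "load" has finished, "clean" and "index" are both ready and do not depend on each other, which is the kind of situation where workflow items could be run in parallel. A real engine would also need to protect the shared state against concurrent updates, which this sketch does not attempt.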
And there’s more…
There are other intern programs available through LexisNexis Risk Solutions and as they get underway, I am hearing about students on these programs who are working on HPCC Systems related projects. Here are some more students working on HPCC Systems projects this summer and throughout the year that deserve to be recognised for the contribution they are making to our open source project. Keep checking back for details of more research and cool tools underway over the summer.
If you know of a student working on an HPCC Systems related project that you’d like us to feature, get in touch. We’d love to hear about them.
Masters of Computer Science, Clemson University
Lohith is working on Jerry Jacob’s team (Manager, Software Engineering, LexisNexis Risk Solutions) with Chris Human (Software Engineer II, LexisNexis Risk Solutions) on a project called REAL-BI (ROXIE Enabled Business Intelligence). Chris and Lohith recently showed us the progress made on this project which allows users to display visualisations of data that can be shared with others using the application. The plan is to eventually be able to share these visualisations outside of the application. Lohith completes his internship in July and we plan to feature his work in more detail in a blog post. Find out more about Lohith on his website.
Information Systems, Federal University of Santa Catarina (UFSC), Brazil
Using high performance computing applications in containerized cloud environments
Lucas’s university course covers a broad range of subjects including Object Oriented Programming, Data Structures, Database (Relational and up to newSQL), as well as some marketing and business modules. In his spare time he plays the guitar and has a keen interest in snowboarding and computer games, particularly D&D!
Lucas is working with Hugo Watanuki (Senior Technical Support Engineer, LexisNexis Risk Solutions) and is completing a year long internship in the Brazil office. His project involves getting HPCC Systems up and running in a Cloud environment using Kubernetes, analysing which is the best provider to use and also looking at configuration options. Later in his internship, he will be looking at performance comparisons between the bare metal and Cloud systems.
Welcome to the HPCC Systems open source project!
It’s great to be able to welcome these students to the HPCC Systems open source project and introduce them to our community. I am looking forward to hearing about their achievements and wish them all an enjoyable and productive experience.
For those who are already thinking about 2021 internships, the proposal application period for the HPCC Systems intern program will open in the Fall. Subscribe to our student forum to get notifications and find out more about the program here.