Before I introduce you to the students and their projects, it's important to mention that they have already put a great deal of effort into making their projects a success. To be accepted onto the HPCC Systems Intern Program, students must submit a proposal to complete a specific project from our ideas list, or suggest one of their own that leverages HPCC Systems in some way. The proposal must include a detailed description of their aims and a timeline showing the tasks they expect to complete for each week of their internship. Before they complete their internship, students are expected to check in all code, submit test cases and provide documentation.
Each student works alongside an HPCC Systems mentor, using the same working practices as all the developers who contribute to the HPCC Systems Open Source Project. We also welcome a number of university professors to the program, who are co-mentoring their student(s) alongside our LexisNexis Risk Solutions colleagues, providing additional support and guidance.
By mid-July, all students will have started their internship. So far, eight have already started. Let's meet them!
North Carolina State University
Cleaning and Analysis of Collegiate Soccer GPS Data in HPCC Systems
Chris has already been using HPCC Systems in his work on the Athlete 360 project, which is led by Dr Vincent Freeh, Assistant Director of Undergraduate Programs & Associate Professor at NCSU. Chris's intern project leverages his knowledge of Python and SQL to build programs and a database that will support an athlete data monitoring platform for the Strength and Conditioning department. The plan is to collect GPS data with the help of the NCSU men's and women's soccer teams; this data will then be processed and analysed using HPCC Systems. Examples of the analysis he will perform include typical movement patterns for certain drills, differences between positions, distances for typical efforts at different speeds, and a comparison of the distance recorded in the GPS data for a specific effort against a known distance.
Chris also has his eye on a number of other measurements, which form part of a wish list for extending this work, including finding interesting insights in the athletes' heart rate data.
Chris will be mentored by our colleague Raja Sundarrajan (LexisNexis Risk Solutions, Software Engineer III) with co-mentoring from the professor leading the Athlete 360 project at NCSU, Dr Vincent Freeh.
PhD Computer Science
Domain Based Common Words List Using High Dimensional Representation of Words
It's great to have Farah join the team again in 2019. In 2018, Farah completed an intern project with us to implement equivalence terms for the Text Search Bundle. This project addressed the question: how similar does a term in a search request need to be to a term in the document to be considered a term match? The aim was to provide the ability to automatically create equivalents for initialisms and acronyms. The project also provides a means of applying a table of equivalents, along with the attributes needed to build that table from an open source thesaurus such as Moby. Farah presented this project at one of our Community Tech Talk webcasts (Watch Recording / View Slides) and also entered a poster into our 2018 Technical Poster Contest, which took place during our 2018 Community Day Summit (See Farah's poster / Read the abstract).
The aim of her 2019 internship project is to use a text vectors bundle (CBOW) with HPCC Systems to find the common words for any dataset. Her project is based on the hypothesis that eliminating domain-based common words will enhance the performance of the classification methods used, as well as improve the results of topic modeling. HPCC Systems' ability to scale up massively, together with its fast distributed data storage, is expected to enhance the performance of the methodology.
Farah's mentor for the second year running is our LexisNexis Risk Solutions colleague Kevin Wilmoth (Consulting Software Engineer), and her supervising professor at Clemson University, Dr Amy Apon, will also be providing mentoring support and guidance. Follow Farah's experience by reading her blog.
PhD Computer Science
Florida Atlantic University
Create HPCC Systems VM on Hyper-V
Robert is joining the team as an intern for the second year running. In 2018, he completed a project that involved integrating ECL and third-party open source libraries to extend our Deep Learning capabilities. He presented this work to our open source community on the main stage at our Community Day Summit (Watch Recording / View Slides) and also gave a Tech Talk webcast (Watch Recording / View Slides) to demonstrate what he had achieved. He was also a third-place prize winner in our 2018 Technical Poster Contest, which took place during our 2018 Community Day Summit (See Robert's poster / Read the abstract).
Robert will be researching and developing GPU-accelerated Deep Learning algorithms on HPCC Systems. In his proposal, Robert explains how GPU acceleration vastly reduces Deep Learning training time. His work will produce the first GPU-accelerated library (to his knowledge) and expand our deep neural network capabilities. Creating an HPCC Systems VM on Hyper-V as part of this project will increase the number of configurations on which HPCC Systems can be deployed. It will also provide the building blocks needed for possible future development of distributed configurations that we don't currently provide, such as model parallelism, and for enabling HPCC Systems to Deep Learn using asynchronous algorithms.
Robert's mentor for the second year running is our LexisNexis Risk Solutions colleague Tim Humphrey (Consulting Software Engineer), and his supervising professor at FAU, Dr Taghi Khoshgoftaar, will also be providing mentoring support and guidance. Follow Robert's experience by reading his blog.
Bachelor of Engineering (Computer Engineering)
University of Mumbai, India
Cluster Deployment with Juju Charm
The aim of this project is to deliver charms for the HPCC Systems platform and HPCC Systems plugins, ported to the Charms helper framework, with corresponding tests in Amulet.
Yash was the first student to join the program at the start of May and is already making good progress. Follow his experience by reading his blog.
Yash's mentor is Xiaoming Wang, who works on a number of cloud solutions for HPCC Systems development.
BTech in Computer Science
RV College of Engineering, Bengaluru, India
Fraud Detection in Value Based Cards
The full title of this project is 'Detection of fraud in stored-value cards by applying CNN and Random Forest machine learning models on transactional data to classify a transaction as "Fraudulent" or "Not fraudulent"'. The two models will be compared for efficacy.
In his proposal, Akshar pointed out that the features of a stored-value card, while attractive to consumers from a data privacy and anonymity point of view, also make it susceptible to fraud. Identifying fraud in a cost-effective and timely manner is a challenge for the companies that supply these cards.
Akshar aims to show that the machine learning approach he has chosen provides an easier way to quickly identify anomalies that may suggest a fraudulent transaction has taken place.
Akshar will be mentored by Roger Dev, who is the leader of the HPCC Systems Machine Learning Library with co-mentoring and support from Dr Shobha G and Jyoti Shetty from Rashtreeya Vidyalaya College of Engineering (RVCE).
Bachelor of Engineering
RV College of Engineering, Bengaluru, India
Evaluation of Machine Learning Algorithms
Surya's project involves providing additional evaluation methods for our Machine Learning Library, including running comparisons with existing benchmarks, the addition of new evaluation metrics and the enhancement of performance checking.
These evaluations will help users assess the performance of the models and choose the appropriate method for the given data. The aim is to provide an evaluation tool that integrates well with features such as the Myriad Interface.
Surya will be mentored by Arjuna Chala (LexisNexis Risk Solutions, Sr Dir Operations) with co-mentoring from Lili Xu (LexisNexis Risk Solutions, Software Engineer III), Dr Shobha G and Jyoti Shetty from Rashtreeya Vidyalaya College of Engineering (RVCE). Follow his experience by reading his blog.
Sathvik K R
Bachelor of Engineering (Computer Science)
RV College of Engineering, Bengaluru, India
Interfacing Octave with ECL
The aim of this project is to support Octave by allowing Octave code to be embedded within ECL code. This will be done with the help of simple wrapper classes that handle scalar values and structured data, including multi-threaded access from the ECL side. This will add to the growing list of embedded languages and datastores we currently support.
Sathvik will be mentored by Dan Camper (LexisNexis Risk Solutions, Sr Architect) with co-mentoring and support from Dr Shobha G and Jyoti Shetty from Rashtreeya Vidyalaya College of Engineering (RVCE). Follow his experience by reading his blog.
Masters in Computer Science
Kennesaw State University, USA
Developing and Assessing Unsupervised Anomaly Detection Methods using HPCC Systems
Alongside his professor at KSU, Vannel has been researching various log analysis techniques for detecting abnormal activity on computing and network systems. His aim is to adopt a number of machine learning and big data analysis techniques to implement an algorithm that can detect unknown cybersecurity threats. The ideas for his project are based on this paper: Experience Report: System Log Analysis for Anomaly Detection by Shilin He, Jieming Zhu, Pinjia He and Michael R. Lyu.
Vannel will be mentored by Arjuna Chala (LexisNexis Risk Solutions, Sr Dir Operations) with co-mentoring and support from Lili Xu (LexisNexis Risk Solutions, Software Engineer III). Follow his experience by reading his blog.
Welcome to the HPCC Systems platform team!
It's great to see such an interesting and impressive set of projects. A warm welcome to all our interns and a big thank you to their mentors and professors who will be supporting them as they complete their projects.
Check back here in a few weeks' time to find out how these projects are progressing and to meet the rest of the students, who are due to start their internships by mid-July.
For those who are already thinking about 2020 internships, the proposal application period will open later this year, in the fall. Subscribe to our student forum to get notifications and find out more about the program here.