Academic Program 2018 – Achievements and Reflections
Our Academic Program has been running since the HPCC Systems platform went open source in 2011. Through our collaborations with an increasing number of academic partners, we help to train and develop skills in big data analytics and coding. We support research projects focused on finding solutions for important problems affecting today’s world. It’s also an opportunity for us to attract talented technologists to the LexisNexis Risk Solutions and wider RELX Group family.
Some of our academic partners have been with us right from the very beginning and this program continues to grow as we welcome universities, professors and students who are new to HPCC Systems every year. As we look forward to the collaborations to come in 2019, it’s also a great opportunity to look back at what was achieved in 2018.
Sponsoring research projects
Grants are awarded to universities to support the work of students studying a STEM-related subject, such as computer science, mathematics, statistics, physics, engineering or information technology. The projects leverage HPCC Systems in some way, often involving machine learning and data mining. A number of papers have been published with the support of our Academic Program. Some of the more recent ones are mentioned in this blog and we are continually adding more to the growing list which is available on our wiki.
At Clemson University, we supported a PhD student working as a research assistant on a project involving machine learning of large data sets using HPCC Systems. This student, Lili Xu, also joined our intern program for the third year running in 2018 and spoke about the work she completed alongside our LexisNexis colleague, Gus Reyna, at our Community Day Summit. She also presented a poster about this work at our annual poster contest, hosted at the same event. We are very pleased that she has now become a LexisNexis colleague having completed her studies.
Lili Xu presenting at a workshop held at New College of Florida
Not only is Lili continuing her work with machine learning algorithms, she is also still contributing to the success of our Academic Program by presenting big data analytics challenges at our workshops, which are designed to get students up and running using HPCC Systems. This is a great success story, demonstrating how our Academic Program has come full circle in exactly the way we intended.
At North Carolina State University, we supported the research of a student focusing on the optimisation of cloud computing resources and how to choose the best cloud configuration. A supporting paper, CAT: Cloud Architecture Tuning, is currently under submission. Another student at NCSU carried out benchmarking tests as part of the final project for his Masters in Computer Science. In his report (HPCC Systems Benchmarking), he benchmarks HPCC Systems alongside Hadoop, studying scalability, execution time and system metrics. The aim was to identify trends in performance and establish which platform worked best for a particular workload.
In the UK, we continue to support a PhD student at Imperial College London who has been working on a project that involves using HPCC Systems alongside TensorFlow. He has been liaising with our colleagues to work through setup issues, and the insights gained may be useful to others who want to do the same. One of his research topics focused on predicting traffic conditions to help make rerouting decisions, either when congestion occurs or when it can be predicted due to known events taking place at a specific location. A paper outlining this work was published in June 2018. Jingqing Zhang presented his findings at our Community Day Summit (Watch Recording / View Slides).
In India, at the Rashtreeya Vidyalaya College of Engineering, students have been working on a wide variety of projects with guidance from Dr Shobha G and Professor Jyoti Shetty, and mentoring support from the HPCC Systems team. One project involved the design and development of a plugin to support IoT applications on HPCC Systems. This has been fully documented and is available in our GitHub repository.
Another project provided us with the Data Skew Profiler (available here in our GitHub repository), which automates the process of skew prediction by analysing the execution of the graphs associated with a job running on an HPCC Systems cluster. Using a Random Forest Regressor model, it predicts the probable performance skew for a given set of queries.
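To give a feel for the quantity being predicted, here is a minimal Python sketch of one common way to measure skew across the nodes of a cluster. It is not the Data Skew Profiler's method, which predicts skew with a random forest model. The function name, the sample figures and the definition of skew used here (how far the busiest node sits above the average load) are all assumptions made for illustration.

```python
from statistics import mean


def partition_skew(rows_per_node):
    """Return how far the busiest node sits above the mean load.

    Assumed definition for illustration: (max - mean) / mean.
    0.0 means perfectly balanced; 2.1 means the busiest node
    holds 210% more rows than the average node.
    """
    if not rows_per_node:
        raise ValueError("need at least one node")
    average = mean(rows_per_node)
    if average == 0:
        return 0.0
    return (max(rows_per_node) - average) / average


# Hypothetical row counts per node for two jobs on a four-node cluster.
balanced = [1000, 1000, 1000, 1000]
skewed = [3100, 300, 300, 300]

print(f"balanced job skew: {partition_skew(balanced):.2f}")
print(f"skewed job skew:   {partition_skew(skewed):.2f}")
```

A job with high skew leaves most nodes idle while one node finishes its share, which is why being able to predict it before running a set of queries is useful.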
As well as providing grants which support specific research projects, we also share our industry knowledge and experience with Academic Partners who seek to extend their coursework outside of the classroom.
At Clemson University, we supported four undergraduate students in 2018, who worked alongside our machine learning experts on our Text Search bundle, focusing on time series forecasting. This research is ongoing as part of an undergraduate research class and we hope to see a research paper published some time in 2019.
We were delighted to hear that one of these students is keen to go on and study for a PhD on completion of his BS in Computer Science later this year. Dr Amy Apon (Professor and Chair, Division of Computer Science, Clemson University) said:
“I am confident that the research experience funded by HPCC Systems was a key factor in [this] decision.”
Dr Vincent Freeh (Assistant Director of Undergraduate Programs and Associate Professor, NCSU) also incorporates the use of HPCC Systems and big data analytics into his coursework to encourage students to use HPCC Systems. Dr Freeh, Dr Tim Menzies and their students have presented at our annual Community Day Summit many times, demonstrating their achievements using our platform.
At Kennesaw State University, the HPCC Systems Lab, which was set up in 2015, provides a space for students to work on their data science projects together. A certification program in Big Data Analytics, which uses HPCC Systems, was launched in 2017. The course is designed to help students develop skills and experience in a number of areas, including database systems, data warehousing and mining, modern programming languages (R, Python and ECL), computer architecture (including the cloud) and more.
Dan Camper, Richard Taylor and Arjuna Chala outside the KSU HPCC Systems Open Lab
Also at Kennesaw State University, the LexisNexis HPCC Systems Endowment Scholarship contributes to tuition fees, housing, books and meals, providing financial assistance that allows a student based in the College of Computing and Software Engineering to focus completely on their studies.
Workshops and Hackathons
Workshops and Hackathons are becoming a regular feature for the HPCC Systems team. Our workshops are designed to introduce students to the HPCC Systems platform. Real world examples of data mining using machine learning are used to demonstrate the valuable insights into data that can be achieved. Students learn how to use the ECL language, which was specifically designed for writing queries for big data analytics. The workshop we presented at New College of Florida early in the year set the tone for workshops for the rest of the year. Students analysed 18 months of yellow cab fares in New York City after having worked through the following workshop topics:
- Exploring the shape of the data
- Looking at the data quality
- Learning how to transform and clean the data
- Appending 18 months of periodic weather data for NYC
- Deriving attributes for analysis
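The workshop itself used ECL on HPCC Systems, but the flow of these topics can be sketched in a few lines of Python. Everything below is a hypothetical stand-in: the sample rows, the weather values, the field names and the cleaning rules are assumptions chosen to illustrate the steps, not the workshop's actual data or code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (pickup time, fare in dollars as text).
trips = [
    ("2015-01-01 08:15:00", "12.50"),
    ("2015-01-01 08:40:00", "-3.00"),  # negative fare: a data quality problem
    ("2015-01-01 17:05:00", "9.00"),
    ("2015-01-02 09:30:00", ""),       # missing fare
    ("2015-01-02 18:20:00", "22.00"),
]

# Hypothetical daily weather keyed by date: precipitation in inches.
weather = {"2015-01-01": 0.0, "2015-01-02": 0.4}


def clean(rows):
    """Parse each row and drop those with a missing or non-positive fare."""
    cleaned = []
    for pickup, fare in rows:
        if not fare or float(fare) <= 0:
            continue
        cleaned.append({
            "pickup": datetime.strptime(pickup, "%Y-%m-%d %H:%M:%S"),
            "fare": float(fare),
        })
    return cleaned


def append_weather(rows, daily_weather):
    """Attach the day's precipitation to each trip (None if unknown)."""
    for row in rows:
        row["precip"] = daily_weather.get(row["pickup"].strftime("%Y-%m-%d"))
    return rows


def derive_attributes(rows):
    """Add attributes that are easier to analyse than the raw values."""
    for row in rows:
        row["hour"] = row["pickup"].hour
        row["is_wet_day"] = row["precip"] is not None and row["precip"] > 0
    return rows


def mean_fare_by(rows, key):
    """Average the fare for each distinct value of the given attribute."""
    fares = defaultdict(list)
    for row in rows:
        fares[row[key]].append(row["fare"])
    return {value: sum(f) / len(f) for value, f in fares.items()}


prepared = derive_attributes(append_weather(clean(trips), weather))
print(f"{len(trips)} raw rows, {len(prepared)} after cleaning")
print(mean_fare_by(prepared, "is_wet_day"))
```

Keeping each topic as its own small function mirrors the way the workshop builds up: each step can be inspected on its own before the next one is applied.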
We have been providing pre-event training workshops the day before our Community Day Summit for the last few years. In 2018, delegates joined an introductory course in the morning about data extraction and transformation with the ECL language. In the afternoon, a deep dive approach was taken designed to demonstrate how to manage disparate data efficiently including how to profile, aggregate and analyse data. These workshops brought people together from across our open source community who embraced the lunch time social gatherings to meet, discuss and share their knowledge and experience.
Already in 2019, we have presented two workshops: one at the University of Georgia as part of UGAHacks4 and another at the Mathematical Institute, Oxford University, in the UK.
Dan Camper presenting at a workshop recently held at the Mathematical Institute, Oxford University with Lili Xu presenting remotely from the USA
HPCC Systems is a regular sponsor of the Kennesaw State University College of Computing and Software Engineering Hackathon. In 2018, we were supported at this event by our industry partner DataSeers. Two challenges were provided using HPCC Systems and our ECL-ML machine learning library to build predictive models relevant to the financial industry. One of the challenges involved identifying ‘bad actors’ (for example, those appearing on an OFAC list) within the main user dataset. A second challenge was designed to get students thinking about data cleaning and the grouping together of merchant names that can be listed inconsistently in a dataset. The event was a great success. Watch this video, presented by Adwait Joshi, CEO, DataSeers to find out how students approached our big data challenges.
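As a rough illustration of the merchant name challenge, the Python sketch below groups names that differ only in case, punctuation, store numbers or company suffixes. It is not taken from any student's solution or from ECL-ML. The merchant names, the list of words to ignore and the normalisation rules are assumptions, and real data would usually need fuzzier matching than this.

```python
import re
from collections import defaultdict

# Hypothetical words to ignore when comparing merchant names.
NOISE_WORDS = {"inc", "llc", "ltd", "co", "corp", "store"}


def normalise(name):
    """Reduce a merchant name to a comparable key.

    Lowercases the name, replaces punctuation with spaces, then drops
    noise words and purely numeric tokens such as store numbers.
    """
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    kept = [w for w in words if w not in NOISE_WORDS and not w.isdigit()]
    return " ".join(kept)


def group_merchants(names):
    """Group the original names under their normalised key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalise(name)].append(name)
    return dict(groups)


# Hypothetical merchant names as they might appear in a transaction file.
merchants = [
    "ACME Hardware Inc.",
    "Acme Hardware #1042",
    "acme  hardware",
    "Bolt Coffee Co",
    "BOLT COFFEE",
]

for key, variants in group_merchants(merchants).items():
    print(f"{key}: {variants}")
```

Exact matching on a normalised key is the simplest approach; it will not catch misspellings or abbreviations, which is where similarity measures or a trained model would come in.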
This event brought together partners from both our academic and industry community, supporting students alongside our LexisNexis colleagues, Arjuna Chala, Dan Camper and Richard Taylor.
Plans to take part in this event in 2019 are already in progress!
Interns and Posters
Ten students from eight different universities joined the HPCC Systems intern program in 2018. This was almost double the number that joined the program in 2017.
Eight of the ten students who joined our intern program this year.
Back row from the left: Robert Kennedy, Aramis Tanelus, Everett Matthew Upchurch Butler, Shah Muhammad Hamdi, Saminda Wijeratne.
Front row from the left: Farah Alshanik, Nicole Navarro, Lili Xu
To be accepted on to this program, students must submit a proposal to complete a specific coding project during their 12 week internship. They can either choose a project from our list of available projects, or suggest one of their own that:
- Leverages HPCC Systems in some way.
- Contributes a new feature or enhancement.
- Provides an interesting use case.
The projects completed in 2018 covered a wide spectrum of topic areas from machine learning algorithms and use cases to platform related enhancements and development research. Details about all the intern projects completed in 2018 are available on our Student Wiki.
Each intern presented a Tech Talk about their achievements. Catch up on their presentations in episodes 16, 17 and 19. Two interns also presented at our Community Day Summit:
- Lili Xu – Using HPCC Systems ML to Map Thousands of Public Records Data Descriptions to Standard Codes (Watch Recording / View Slides / See Poster)
- Robert Kennedy – Parallel Distributed Deep Learning on HPCC Systems (Watch Recording / View Slides / See Poster). Robert recently had a paper published about this research and you will find his Distributed Deep Learning Library available in the HPCC Systems GitHub Repository.
Our Technical Poster Contest, held at our Community Day event, showcased the work of ten students: eight interns from our 2018 program and two students working on projects as part of our wider Academic Program. The winners were:
- 1st Place – Saminda Wijeratne – Georgia Institute of Technology
MPI Proof of Concept
- 2nd Place – Nicole Navarro – New College of Florida
Measuring the Geo Social Distribution of Opioid Prescriptions
- 3rd Place – Robert Kennedy – Florida Atlantic University
Distributed Deep Learning with TensorFlow
Poster Contest 2018 Winners.
Congratulations to Saminda Wijeratne, Robert Kennedy and Nicole Navarro
See all the posters entered into our 2018 contest in our Poster Presentation Wiki and read our Fly on the Wall blog post about this event to immerse yourself in the experience!
As I mentioned at the start, one of the aims of our Academic Program and in particular our intern program is to seek out and attract talented individuals to join the LexisNexis family. Not all students are looking for employment the year they intern with us, but if they are, we want to find them an opportunity that suits them. We were delighted to welcome 2 students from our intern program to the LexisNexis family in 2018.
Anyone looking for an internship in 2019 or who knows someone who is, please note that the 2019 proposal period is open now and the final deadline is Friday 29th March. Read our blog to find out more about the experience and application process.
Our first Tech Talk speaker in 2018 was Chris Gropp, a PhD Candidate from Clemson University, who spoke about Asking the Right Questions with Machine Learning (Go to the Tech Talk / Read the blog). Throughout the year, our Academic Partners have been generous with their time, presenting at our Tech Talks on topics of interest to our wider community. Tech Talk recordings are available on catchup and if you are interested in finding out more about the wide range of topics covered, browse the full list on our wiki.
Itauma Itauma, PhD Candidate, Keiser University, spoke during Tech Talk Episode 12 about Conducting Exploratory Analysis in Education Research Using HPCC Systems. He returned to this subject later in the year at our Community Day Summit, when he gave an extremely popular breakout presentation about Predicting College STEM enrollment using HPCC Systems in Education Research (Watch Recording / View Slides). Itauma also entered a poster into our 2018 poster contest on Cervical Cancer Risk Factors: Exploratory Analysis Using HPCC Systems and you will find the supporting paper here.
Tai Donovan, Robotics Director at American Heritage School in Florida, spoke during Tech Talk Episode 14 and presented at our Community Day summit about the Autonomous Agricultural Project (Watch Recording / View Slides). He brought the robot and his team to demonstrate what they have achieved (View Robotics Demo). We were delighted to present our Community Recognition Award 2018 to Tai in appreciation of his dedication and the success of the robotics program he has developed.
One of the high school students from the robotics team, Aramis Tanelus, completed an internship with us over the summer, implementing APIs for HPCC Systems Data Ingestion for Common Robot Sensors. He presented his progress during Tech Talk Episode 16 and entered a poster into our 2018 contest.
Aramis presents his poster at our 2018 poster contest.
From the left: HPCC Systems mentors, David DeHilster and Kevin Wang with Aramis Tanelus and his teacher from American Heritage School, Tai Donovan
The theme of our Community Day Summit in 2018 was Innovation and Reinvention Driving Transformation. There were four tracks throughout the day:
- HPCC Systems in Industry
- HPCC Systems in Academia
- Roadmap Tech Talks
- Breakout sessions
It was a full, action packed event providing opportunities to make new connections, reconnect with old friends, share ideas and discuss the future. Our Fly on the Wall blog walks through the events of the day, providing links to the recordings of each presentation.
Late in the year, we heard the wonderful news that Everett Matthew Upchurch Butler, a BS in Information Technology student at Kennesaw State University, had been awarded the Student Service award by the university’s IT Department.
Matt spoke during Tech Talk Episode 13 about The Future of Automotive Technology: Assessing autonomous vehicle risk implications using simulated data. Matt’s interest in this area was fuelled by the work his Hackathon team completed as part of our 2017 KSU hackathon challenge. He developed this theme further, producing some very interesting findings using HPCC Systems, which he shared not only as a Tech Talk presentation, but also as a poster entered into our contest in 2018. He was presented with the Student Service award in recognition of this work and the contribution he made to our Academic Program and community.
Everett Matthew Upchurch Butler receiving his Student Service award.
Presented by Dr Rutherfoord, Interim Assistant Dean, College of Computing and Software Engineering, Kennesaw State University
Roll on 2019!
What a successful year 2018 was for the HPCC Systems Academic Program, so much so that I have really only given you a taste of all the events, contributions, connections and collaborations that have taken place in just 12 months.
The final word has to be a big ‘thank you’ to all those who have taken part in and supported our Academic Program not just last year, but every year since it began.
We look forward to celebrating more successes in 2019 and beyond!