If you have attended Community Day at one of our HPCC Systems engineering summits, you will have noticed a significant number of presentations from academic institutions. This is no coincidence. Since HPCC Systems went open source in 2011, we have collaborated with a number of universities, sponsoring research papers and projects requiring the processing of big data, making the HPCC Systems platform available for download to do the heavy lifting.
One problem facing those who work in the field of big data analytics is that there can often be significant barriers to mining data. The big data explosion is nowhere near its peak; it already permeates many aspects of our lives and will continue to do so. The question is how we can successfully scale up computing infrastructure and algorithms while improving efficiency at the same time. This is the focus of one of the research projects currently being carried out by the HEADT Centre (Humboldt Elsevier Advanced Data and Text Centre). The HEADT Centre is a collaboration between Humboldt-Universität zu Berlin and Elsevier (part of the RELX group and project partner to LexisNexis). A second project, running concurrently, focuses on data integrity issues such as data falsification and fabrication, which are of significant interest to researchers, universities and publishers alike. At the HPCC Systems Engineering Summit in 2016, Professor Johann-Christoph Freytag and Fabian Fier from the HEADT Centre spoke about text searching, highlighting the difficulties involved in finding similarities. They shared their approach to this problem and some of their findings, including some comparisons between implementations of algorithms on Hadoop vs HPCC Systems.
I’m sure we all remember the Ebola outbreak in West Africa which hit the global news networks a couple of years ago. Containing that outbreak was a monumental task, with infections doubling every few weeks at its height and thousands of deaths. At Florida Atlantic University (FAU), Professor Borko Furht has been working on a research project to provide a model that can predict how fast an outbreak of a disease like Ebola can spread, based on factors such as geographical location and population density. It has the potential to have a major positive impact on public health and safety. This project is funded by a grant from the National Science Foundation’s Industry-University Cooperative Research Center for Advanced Knowledge Enablement (CAKE). LexisNexis is a member of CAKE and is not only providing data to help develop and model the program, but is also providing support to the research team, who will be using HPCC Systems to process and analyse the massive amount of data involved in this and other research projects at FAU.
Much has been written about the ‘talent gap’ in big data analytics and we are certainly aware of it. We are making our own contribution to narrowing this gap by working with universities, helping them to provide the best courses, facilities and tools for students to achieve the best results possible. The HPCC Systems Lab at Kennesaw State University (established in 2015) is a designated space in the College of Computing and Software Engineering where students can work together on their projects. LexisNexis also recently gifted an endowment, which the college is using to set up a scholarship program that will run for the foreseeable future. We have also worked together to launch a certification program in high performance computing which leverages HPCC Systems. This program is open to students and will soon also be available to industry professionals who may not be pursuing a graduate or postgraduate degree.
Over the years, LexisNexis has given research grants to a number of universities, contributing to their work on some important problems affecting today’s world. Because access to HPCC Systems is part of the package, researchers also benefit from a fast, scalable big data processing platform that was built specifically for that purpose. Watch recordings from some of the professors and students involved in our academic program:
- Amy Apon – Professor, School of Computing, Clemson University, talks about training students to use HPCC Systems in a day.
- Dr Vincent W. Freeh – Associate Professor, North Carolina State University, talks about using HPCC Systems in his research.
- Victor Herrera – Graduate Student and Research Assistant at the Department of Computer Science, Florida Atlantic University, talks about how ECL helped him to develop machine learning algorithms for his project.
- Itauma Itauma – Wayne State University, talks about Unsupervised Learning and Image Classification in High Performance Computing Cluster at Community Day during the HPCC Systems Engineering Summit in 2016.
Great things are also being done by high school students as they reap the benefits of modern technology, while at the same time pushing the boundaries further. Two high schools in Florida have teams working on successful robotics projects. High school students at NSU University School in Fort Lauderdale have built an autonomous robot which recently won the Amaze Award at the 2017 Vex Robotics World Competition. An autonomous golf cart built by high school students at American Heritage School in Delray Beach also won the NASA-sponsored Engineering Inspiration Award at the Brazos Valley Regional tournament earlier this year. By sponsoring projects like these, we want to encourage and invest in helping today’s youngsters follow in the footsteps of our most experienced software developers and system architects, some of whom may have cut their own teeth as teenagers, tapping away in BASIC on a BBC Microcomputer or early Amstrad in their bedrooms!
There is another way that today’s aspiring technologists can get coding experience with us, and that is by applying for an internship on the HPCC Systems open source project during the summer recess. This is the third year we have run this program, which was born out of our successful involvement with Google Summer of Code in 2015. The projects we offer are significant, ranging from implementing machine learning algorithms and core platform features or enhancements to providing connectors between HPCC Systems and common programming languages or datastores. The application period for the HPCC Systems Intern Program opens towards the end of September every year and is open to high school students, undergraduates, Masters and PhD students. Applicants must submit their resume and a proposal outlining the tasks involved in completing their chosen project. Project mentors are available to give advice or comments during the proposal stage as well as guidance and support during the internship. Thanks to modern-day technology which supports remote working, students from across the globe can be accepted onto this program. US-based students working on an HPCC Systems-related project are eligible to enter our annual Technical Presentation Competition, which is held at the HPCC Systems Engineering Summit.
Here at LexisNexis, we believe that having a solid academic program supporting STEM-related subjects provides enormous mutual benefits to universities, high schools, their students and our own business. We have recruited talented individuals as a result and, as we expand the opportunities we offer to the young technologists of today, we have high hopes of increasing this tally in the future.