While this is the first year we have run a GSoC (Google Summer of Code) program, it is not the first year we have run an intern program.
We have affiliations with a number of universities in the US, including Florida State, Florida Atlantic University (FAU), North Carolina State, Clemson and Georgia Tech, as well as University College London in the UK. Students have successfully completed projects for us, particularly in the Machine Learning area, including the coding of decision trees and random forests and the porting of logistic regression to PB-BLAS. We currently have a student from Florida Atlantic University working with us, who is developing multi-layer perceptrons, back propagation and deep learning. Working with interns has been a good experience for us and is something we will continue to do.
We are therefore pleased to be mentoring two students who will be working on HPCC Systems® projects this summer. Both projects originated as GSoC 2015 proposals, but since we did not have enough slots to accept them, we have included them in our summer intern program.
Machine Learning - CONCORD Algorithm
Syed Rahman is working on this Machine Learning project. Syed’s GSoC proposal was particularly interesting to us because it was an idea he had developed himself: to implement high dimensional covariance estimate algorithms in ECL. Syed is studying for a PhD in Statistics at the University of Florida. The mentor for this project is John Holt, who is one of the founders of the HPCC Systems Machine Learning Library. The CONCORD algorithm Syed has suggested will be a noteworthy addition to our ML Library, adding real value. Correlations are extremely useful in data analysis, and working efficiently with high dimensional data is critical in many ML applications. Syed has been preparing the way for this project by getting to grips with running the HPCC Systems® platform, learning ECL and refining his development plan.
Code Generator - Child Queries
Anshu Ranjan will be working on the HPCC Systems® platform project Improve Child Query Processing. This project involves delving into the code generator, which is a highly specific and complex area. The mentor for this project is Gavin Halliday, who is the ‘keeper of the keys’ to the code generator, so Anshu will have access to the best guidance and knowledge possible. Anshu is studying for a PhD in Computing Engineering at the University of Florida. This is an important project addressing some long-standing issues, and it will help us to improve the speed of, and reduce the generated code size for, complex queries that perform significant processing on child datasets. Anshu has been preparing for the coding period by improving his understanding of the platform and working on some of our online training courses. Evaluations will be due for interns according to the same schedule as GSoC, so look out for an update on progress and milestones achieved sometime in July.
Project ideas and contributions are welcome
Project ideas that didn’t make it into either GSoC or the summer intern program this year will be reviewed and may stay on the list for 2016. Other interesting new projects will also be added later this year. We are, of course, open to suggestions and requests via the HPCC Systems® Community Forums, or students may contact one of our mentors by email using the details supplied on our GSoC Wiki here: https://wiki.hpccsystems.com/x/8ABF.
As a result of both student programs, we hope to complete a few more projects of value to our open source community this year. Students are also potential new, young developers to add to the HPCC Systems® team in the future. We want to encourage them to stay in touch once they have completed their program with us. Mentors will also want to keep in touch with students from time to time, keeping the communication links open, finding out how they are progressing with their studies and checking on their availability for further contributions. HPCC Systems® is an open source project after all, so we want to encourage contributions from outside our team. In all honesty, what can be better than attracting new, upcoming talent from the best universities and colleges?
1. The HPCC Systems® Summer Internship runs for 10 weeks, beginning at the start of June and ending in the first week of August. For more information, contact Molly O'Neal, who administers the program.
2. For more information about contributing to the HPCC Systems® code base, go to the Contributions area on this website: http://hpccsystems.com/community/contributions.
3. If you want to dive right in and resolve an outstanding issue, go to the Community Issue Tracker (JIRA): https://track.hpccsystems.com/secure/Dashboard.jspa. Create an account and search for issues with the Assignee field set to "Available for Anyone" to get some contribution ideas. Either post your interest in the Comments section and a developer will get back to you, or email Lorraine Chapman.