CodeDay Big Data Challenge December 2020
HPCC Systems was delighted to support CodeDay in December 2020, which was attended virtually by over 1000 high school students representing 49 US states and 27 countries. Students attended 26 workshops and events. One of these was the Big Data Challenge, and our contribution to this part of CodeDay involved using the HPCC Systems Platform and the ECL language.
Students worked together in teams to solve the various challenges provided by the sponsors. This event attracted more beginners than usual, so setting a slightly larger team size of 12 provided a great opportunity for knowledge sharing, as well as a collaborative environment in which seasoned coders could support those just getting started.
The CodeDay organiser is Tyler Menezes and HPCC Systems was represented by our challenge co-ordinators Bahar Fardanian and Arjuna Chala, who provided support to students using our platform as well as helping out with answers to questions about our big data challenge.
Tyler Menezes is the Executive Director of CodeDay, a non-profit organisation providing coding opportunities for under-served students to explore a future in technology. CodeDay encourages students to develop an interest in big data analytics by providing coding challenges for those who take part. Find out more about Tyler on Wikipedia, visit the CodeDay website and hear him speak at our Community Day Summit in 2020.
Bahar Fardanian is a Tech Evangelist at LexisNexis Risk Solutions Group. She is a software engineer with more than five years of experience in Big Data, ETL and Data Science. She is an expert in designing and developing scalable solutions using virtualized clustered technologies to address business needs, from the problem statement right through to solution delivery. Bahar has been with LexisNexis Risk Solutions Group since 2015, working with big data and ECL in different business verticals.
Arjuna Chala is a Senior Director Operations at LexisNexis Risk Solutions Group with almost 20 years of experience in software design. He leads the development of next-generation big data capabilities, including tools for exploratory data analysis, data streaming and business intelligence. Arjuna served as a key member of the team that brought the HPCC Systems platform to our open source community.
The HPCC Systems CodeDay Challenge
Our Big Data Challenge introduced students to big data concepts, providing opportunities to learn some ECL and practise what they had learned. In advance of the event, we provided resources to help students familiarise themselves with HPCC Systems and ECL. We supplied an ECL cheat sheet, some initial getting started information and copies of the ECL Language Reference and Visualising ECL Results user guides. There are, of course, many other training resources available on the HPCC Systems website for any keen student to explore.
We provided sample ECL code to help students solve questions ranging from easy to mid-level difficulty. We prepared a dataset using data from Spotify’s Top 2000 songs and provided a cluster with the data already loaded. Students were given access to a workspace in our CloudIDE (login required and video guide provided) which included the sample data, giving them a head start. Guidelines were also supplied to help them work through the challenge questions.
Here is a sample of some of the challenge questions students had to solve:
- Sort the music dataset by TopGenre, count the total number of records and display the first 50 results
- Display the first 50 songs by “garage rock” genre and then count the total number of records
- Count how many songs were produced by “Prince” in 1984
- Find the least popular song using “Popularity” field
- Count all songs where “SongDuration” is between 200 AND 250 AND “Speechiness” is above 14
- Create a new dataset which only has columns for Artist, Title and Year
- Calculate the average “Danceability” per “Artist” for “Year” 2008
Some of the challenges gave students the chance to learn the basics of query creation, while others allowed them to move on to finding more interesting and fun insights!
Students were given hints along the way to help them organise their thoughts and plan their problem solving.
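To give a feel for the logic behind a few of the questions listed above, here is a minimal sketch. The challenge itself was solved in ECL on the HPCC Systems platform; this version uses plain Python purely as an illustration. The rows are made up for the example and are not taken from the challenge dataset, the field names are borrowed from the question text, and "between 200 and 250" is assumed to be inclusive.

```python
from collections import defaultdict

# Made-up rows for illustration only; NOT the real challenge data.
# Field names follow the question text; the layout itself is an assumption.
songs = [
    {"Artist": "Artist A", "Title": "Song 1", "Year": 2008,
     "SongDuration": 210, "Speechiness": 20, "Danceability": 60, "Popularity": 40},
    {"Artist": "Artist A", "Title": "Song 2", "Year": 2008,
     "SongDuration": 240, "Speechiness": 10, "Danceability": 80, "Popularity": 75},
    {"Artist": "Artist B", "Title": "Song 3", "Year": 2008,
     "SongDuration": 260, "Speechiness": 30, "Danceability": 50, "Popularity": 12},
    {"Artist": "Artist B", "Title": "Song 4", "Year": 1984,
     "SongDuration": 200, "Speechiness": 15, "Danceability": 90, "Popularity": 66},
]

# Count the songs by a given artist in a given year.
count_b_1984 = sum(
    1 for s in songs if s["Artist"] == "Artist B" and s["Year"] == 1984
)

# Find the least popular song using the Popularity field.
least_popular = min(songs, key=lambda s: s["Popularity"])

# Count songs with duration in [200, 250] (assumed inclusive) and Speechiness > 14.
filtered_count = sum(
    1 for s in songs
    if 200 <= s["SongDuration"] <= 250 and s["Speechiness"] > 14
)

# Create a new dataset that keeps only Artist, Title and Year.
slim = [{k: s[k] for k in ("Artist", "Title", "Year")} for s in songs]

# Average Danceability per Artist for Year 2008.
totals = defaultdict(lambda: [0, 0])  # artist -> [sum, count]
for s in songs:
    if s["Year"] == 2008:
        totals[s["Artist"]][0] += s["Danceability"]
        totals[s["Artist"]][1] += 1
avg_dance_2008 = {artist: total / n for artist, (total, n) in totals.items()}

print(count_b_1984, least_popular["Title"], filtered_count)
print(slim[0])
print(avg_dance_2008)
```

Each step is a filter, a projection or a grouped aggregate over the records, which is the same shape of problem students worked through in ECL.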
So how did the December CodeDay go?
312 students took part in the HPCC Systems challenge. Each student was given a score per challenge and the teams with the highest scores were the winners. We had 26 teams of 12 students and this graph, provided by Tyler, shows the highest scoring HPCC Systems Challenge teams:
CodeDay had a great year in 2020. Like most organisations, they had to adapt to the challenges we all faced during the COVID-19 pandemic. They hosted their first virtual event early in 2020, followed by others over the summer and in December. In fact, the virtual nature of the events this year attracted more students than ever, and we are delighted to have been a supporter and sponsor.
Find out more about our involvement in CodeDay November 2019 and CodeDay June 2020.
HPCC Systems runs an intern program over the summer months every year. It was wonderful to welcome a CodeDay student on to our intern program in 2020. Jefferson Mao contributed to our HPCC Systems Cloud Native development project with support from our LexisNexis Risk Solutions Group colleagues, Xioaming Wang (Senior Consulting Software Engineer) and Godson Fortil (Software Engineer I).
Find out more about Jeff’s project and view the poster he entered into our annual poster contest, which won him the top prize in the Best Poster – Use Case Category.
It would be great to welcome more CodeDay students on to the HPCC Systems Intern Program in the future and we wish Tyler and the CodeDay team a successful 2021 and beyond.