Hugo Watanuki (Senior Software Engineer, LexisNexis Risk Solutions Group) has over 15 years’ experience in the IT field. He specializes in High Performance Computing and has been developing and teaching ECL courses in Brazil for the past 3 years. Brazil is the largest country in South America and generates petabytes of data that need solutions. In this blog, Hugo talks about our partnership with the University of São Paulo, focusing on how HPCC Systems and the ECL language are being used to support academic research as well as providing additional curriculum benefits to students studying big data analytics.
From the moment I became aware of HPCC Systems, I was struck by its potential to be a complete teaching & learning platform for Big Data subjects. During my first days working with it back in early 2019, I was introduced to HPCC Systems and the Enterprise Control Language (ECL) by my colleagues Richard Taylor and Bob Foreman via the online training courses. I could not stop thinking about how useful this open-source, totally free, and end-to-end mindset of the platform might be when applied to teaching Big Data processing and analysis from different perspectives, as well as the benefits to students with different backgrounds and levels of expertise.
These ideas started to solidify once the LexisNexis Risk Solutions office in Brazil began partnering with the University of São Paulo (USP) in late 2019. With guidance and support from Professor Renato de Oliveira Moraes, a series of initiatives covering the three core areas of the University’s involvement in Brazilian society has been implemented in the form of teaching & learning, research, and extension activities.
This is the first in a series of three blogs where each of these core areas of the University’s activities will be discussed in more detail. This blog focuses on how HPCC Systems has been leveraged as a teaching & learning platform for undergraduate students from the University of São Paulo.
The University of São Paulo is the largest public University in Brazil and the country’s most prestigious educational institution. It is usually ranked number 100 worldwide by the Times Higher Education World University Rankings. Professor Renato Moraes is a member of the University’s Polytechnique School of Engineering and conducts teaching & learning, research and extension activities in the area of information technology and multivariate data analysis. For more information, please check out Professor Renato’s talk from our latest HPCC Systems Community Day event in 2021.
Initial discussions started in late 2019 when Professor Renato first visited the LexisNexis Risk Solutions office located in the Alphaville area of Brazil. The main idea was to provide a course where HPCC Systems could be leveraged to teach Big Data processing and analysis to undergraduate students. The course was designed to provide students with the theoretical and practical knowledge necessary for the development of research dealing with massive datasets, as well as decision-making supported by data analysis. This felt like the perfect context for leveraging an open-source, totally free and end-to-end platform such as HPCC Systems.
The local LexisNexis Risk Solutions office offered support to Professor Renato, who initiated the proposal submission for an elective course in early 2020. The University committee’s screening and review of the proposed course took almost one year, and the Dean’s approval was obtained in late 2020, allowing the first offering of the course to take place in early 2021. The course was approved with a total workload of 60 hours.
The basic program covers the following:
- Basic concepts in Big Data processing, data types, data transformation processes, data storage and visualization solutions
- Practical application of massive data analysis applied to operations management
- Usage of data extraction, transformation, loading and visualization techniques
- The leveraging of statistical analysis to support machine learning
Those interested in learning more about the content of the course will find more details in the elective course curriculum.
The First Class
Once the course was approved, the planning and execution of the first curriculum began. Since this was the first offering of an elective course, we were conservative with our hopes for enrollment. We needed a minimum of 20 students for the course to take place and planned for a maximum of 40. Beyond our most optimistic expectations, the minimum threshold of 20 students was quickly reached. Word of mouth spread very fast on campus, and soon we had 48 students interested in the course. Given that classes were held remotely back in 2021, we could stretch our upper limit and accept all 48 students who applied for the course.
Overall, the students for the first offering came from different specialization areas of engineering and from different tenures, but with a prevalence of mid-course students from construction, electronics, naval and industrial engineering, as shown in this table.
The total 60 hours of the course were split across a 15-week schedule where each 4-hour class was composed of both expository and practical activities. In the first half of the class, students were presented with the basic concepts around Big Data processing and analysis, followed by practical labs with HPCC Systems in the second half. After every class, the students had an assignment to solve with ECL, to be presented the following week. The curriculum for the practical labs and exercises followed the same content as our famous community courses, including but not limited to:
- Introduction to ECL (parts 1 and 2)
- Advanced ECL (parts 1 and 2)
- Roxie (part 1)
At the end of the course, the students were also awarded their respective HPCC Systems badges!
As part of the course’s core objective of presenting students with the basics of massive data processing and analysis techniques for operations management, the students also needed to develop an applied Big Data use case during the final weeks of the course.
Since the goal of the course was also to align the students’ learning experiences with industry needs, three external business partners were invited to propose challenges for the students to solve with Big Data analysis in HPCC Systems: LexisNexis Risk Solutions Business Services, LexisNexis Risk Solutions Insurance and Brazil’s most innovative credit bureau.
A total of 6 challenges were proposed by the partners and the scope of the work focused on leveraging public datasets to achieve a diversified set of objectives, such as:
- Finding potential relationships between PEPs (politically exposed persons), Brazilian company registration data, beneficiaries of social programs, and owners of companies denounced for slave labor
- Developing an insurance risk score for companies in Brazil according to their physical location, taking into consideration the risk of flooding (very common in Brazilian metropolitan areas) and the social indicators of the region
- Developing a credit score and its associated attributes to measure the risk of loan requests made in a peer-to-peer lending platform
The 48 students were then divided into a total of 12 groups and each proposed challenge was tackled by two different groups, which contributed to a little competition between the groups for the best solution for each challenge.
During the last 4 weeks of the course, mentoring was provided and the students had the opportunity to clarify any questions they had about their chosen challenge. In the final class, a special session was arranged with representatives from the external business partners, and the students presented their final solutions for evaluation by our partners. The session recordings were split into parts and are available here:
- Students’ Final Presentations Part 1
- Students’ Final Presentations Part 2
- Students’ Final Presentations Part 3
The assessment of each group’s solutions was done jointly with the professor and the external business partners. Along with the rich learning experience for the students, which was the primary goal of the course, some very interesting and unexpected outcomes were achieved in this first offering of the course:
- One of the students decided to leverage the knowledge acquired during the course to work on a submission to our 2021 HPCC Systems Community Day Poster Competition. The title of the poster was “Preventing Fraud by Registration Inconsistencies using HPCC Systems” and more details can be found on our wiki.
- The external business partner representatives were very pleased with the quality of the solutions proposed by the students.
Here are some of the comments from our business partners, which were captured via an anonymous survey distributed at the end of the student presentations:
“They [students] managed to properly handle the challenge both from a technical point of view and also from a business point of view…” (anonymous external business representative)
“The logical organization of the workflow was great including bringing codes stored on Github…” (anonymous external business representative)
“Their project presented an interesting beginning of data exploration; we will complement the solutions and continue with product development.” (anonymous external business representative)
Then suddenly some success stories from our students started to pop up. Right after the conclusion of the course, the students were invited to apply for our LexisNexis Risk Solutions corporate internship program and two of them were selected to join the program a couple of months after the end of the course. Students were also invited to join the Brazil Credit Bureau corporate internship program and just recently we heard of two additional students who started their tenure at the Credit Bureau. We will keep updating this blog as more success stories like these come to our attention.
After the successful experience of our first course, the elective course is now an integral part of the University of São Paulo’s curriculum, and Professor Renato will be able to offer it on a yearly basis going forward.
For 2022, the elective course has already started, having begun on March 17th. At the time of writing, 145 students have already enrolled for our second offering of this class. According to the table below, it seems that news of our course continues to spread by word of mouth across the university campus, and it is now attracting students from courses other than engineering as well as students from senior years.
In 2022, we expect to return to onsite activities, and the computing laboratory at the university can only accommodate a maximum of 40 students. Unfortunately, this means that we might need to defer entry to the course for some students in the future. But on the positive side, the number of students interested in the elective course is continuously growing, and this has led the university to provide additional funds for Professor Renato to hire a graduate tutor to support the students. The hiring process is ongoing (interested students can apply here) and we expect to announce the selected student in the coming weeks. The candidate must be a graduate student from our previous 2021 class.
We are delighted with the success of the course so far and the growing interest in our elective course. More exciting news is bound to come your way from this year’s cohort and beyond, so stay tuned!