Contributions to the HPCC Systems open source project from the interns in Mumbai

As part of the academic program, HPCC Systems is committed to supporting students around the world as they take their first steps into the big data industry. The most recent example of this collaborative effort comes from India, specifically from the LexisNexis Risk Solutions office in Mumbai. During the first quarter of 2023, HPCC Systems and the LexisNexis Risk Solutions Technology team based in Mumbai partnered to support two students from NMIMS (Narsee Monjee Institute of Management Studies) in India through a 3-month internship with HPCC Systems.

Conversations about this potential collaboration started in mid-2022, when employees from the LexisNexis Risk Solutions office in Mumbai spotted an opportunity to collaborate with one of the top-ranked universities in India. NMIMS was established in Mumbai in 1981 and has since expanded to several other regions in India. Among the schools housed at the Mumbai campus, the Mukesh Patel School of Technology Management & Engineering (MPSTME) is home to the computing and technology courses, including a master’s and a PhD program.

Because NMIMS is close to the LexisNexis Risk Solutions office in Mumbai, it seemed natural to form a collaboration in which both organizations could benefit from some of the HPCC Systems open source projects. Rahul Jain, a software engineering manager at LexisNexis Risk Solutions, and his team organized themselves internally to interview, onboard and mentor the students. After a very competitive selection process, two students were selected to join us for a 3-month internship starting in January 2023: Charvi Dave and Aryaman Gautam. Curious to know how this journey ended? Below is a short description of what Charvi and Aryaman achieved with their respective projects during their internship with us…

Meet the Interns

Charvi Dave

Data Science BTech student
NMIMS, India

Charvi Dave is a Bachelor of Technology Data Science student at NMIMS’ Mukesh Patel School of Technology Management & Engineering. Charvi joined LexisNexis Risk Solutions as an intern to develop a Resume Analyzer built with NLP++, the programming language used by one of the HPCC Systems specialized plugins.

NLP++ is a “smart” language that reads and learns textual content the way humans do and stores the information in Knowledge Bases (read more about NLP++ on this blog).

A Resume Analyzer applies various techniques to analyze the resumes a company receives and retrieve their main sections. The analyzer developed by Charvi works on text files of resumes and extracts their main headers and sections efficiently and accurately. After the analyzer extracts this information, it is stored in the Knowledge Base, which makes the core information in the resume easier to view and analyze. The intent was to create an automated solution to support the parsing of resumes during recruitment processes.

According to Charvi, “the analyzer works regardless of the original format of the resume. Word/PDF/Image resumes are preprocessed into text files since NLP++ expects to take text files as an input. It automatically breaks down text into tokens, making it easier to work with Natural Language Processing. According to the formatting and words, we can write rules and use knowledge in NLP++. We created an analyzer which consists of multiple passes, where each pass of code performs a particular task or extracts a different piece of information from the resume. When the analyzer is run, all the passes of code run on the file that is selected. The analyzer then identifies, classifies, and extracts the headers and sections of the resume, such as the candidate’s skills, work experience, email address, education, etc.”

It is important to highlight that all the information the analyzer extracts is then stored in text files in the Knowledge Base, because the Knowledge Base is easier to view and analyze than the resume itself. Dictionaries listing many different headers have been developed to help the analyzer identify each piece of information and store it in the Knowledge Base accordingly.
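To make the multi-pass idea concrete, here is a minimal sketch in Python rather than NLP++. It is not Charvi's analyzer: the header dictionary, the pass functions, and the plain `dict` standing in for the Knowledge Base are all hypothetical names chosen for illustration. It only shows the general shape described above, where each pass performs one task and a dictionary of known headers decides which section a line belongs to.

```python
import re

# Hypothetical header dictionary: maps header spellings to a canonical section name.
HEADER_DICTIONARY = {
    "skills": "skills",
    "technical skills": "skills",
    "experience": "experience",
    "work experience": "experience",
    "education": "education",
}

# Deliberately simple email pattern, good enough for an illustration.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pass_split_lines(text, kb):
    """Pass 1: break the raw text into stripped, non-empty lines."""
    kb["lines"] = [line.strip() for line in text.splitlines() if line.strip()]


def pass_find_sections(text, kb):
    """Pass 2: group lines under the headers found in the dictionary."""
    sections = {}
    current = None
    for line in kb["lines"]:
        key = line.rstrip(":").lower()
        if key in HEADER_DICTIONARY:
            current = HEADER_DICTIONARY[key]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    kb["sections"] = sections


def pass_find_email(text, kb):
    """Pass 3: extract the first email address, if any."""
    match = EMAIL_PATTERN.search(text)
    kb["email"] = match.group(0) if match else None


# The analyzer is an ordered list of passes that all run on the same input.
PASSES = [pass_split_lines, pass_find_sections, pass_find_email]


def analyze(text):
    """Run every pass in order and return the stand-in knowledge base."""
    kb = {}
    for run_pass in PASSES:
        run_pass(text, kb)
    return kb
```

Calling `analyze` on the text of a resume returns a dictionary with the detected sections and email address, which plays the role the Knowledge Base plays in the real analyzer. Keeping each pass small makes it possible to add a new extraction step without touching the others.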

Charvi also commented that she has high expectations for the future of her internship project: “We are attempting to reduce time and efforts on the companies’ side. The companies can adopt the system as part of their recruitment process. We are attempting to simplify the recruitment process and help companies to identify the right, relevant talent.”

Charvi was mentored remotely by our LexisNexis Risk Solutions colleague based in the US, David de Hilster, and also locally in Mumbai by Umesh Mahind and Nandhini Velu.

Aryaman Gautam

Data Science BTech student
NMIMS, India

Aryaman Gautam is a final-year Bachelor of Technology Data Science student completing his four-year degree at NMIMS’ Mukesh Patel School of Technology Management & Engineering.

Aryaman joined this internship to work on an HPCC Systems cloud native project: local deployment of HPCC Systems on a K3D cluster for ECL training. K3D is a lightweight wrapper that runs K3S (Rancher Labs’ minimal Kubernetes distribution) in Docker, which makes it very easy to create single-node and multi-node K3S clusters in Docker.

The solution developed by Aryaman uses the Docker daemon, K3D, Helm, and kubectl. It also uses local storage that is mounted into the K3D cluster, and users can access ECL Watch through the localhost address in a browser. The ECL IDE can also be linked to this deployed HPCC Systems cluster by designating the server as localhost.
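As a rough illustration of how these tools fit together, the sketch below builds (but does not run) the kind of command sequence such a deployment might use. It is not Aryaman's actual procedure. The cluster name, release name, host directory, and the choice to publish ECL Watch on port 8010 through the K3D load balancer are assumptions made for this example, and the chart values needed to point HPCC Systems storage at the mounted directory are omitted.

```python
import shlex

# Public HPCC Systems Helm chart repository.
HPCC_HELM_REPO = "https://hpcc-systems.github.io/helm-chart/"


def build_deploy_commands(cluster="hpcc-training",        # hypothetical cluster name
                          release="mycluster",            # hypothetical Helm release name
                          host_data_dir="/tmp/hpcc-data", # hypothetical host directory
                          eclwatch_port=8010):
    """Return the commands as argument lists, in the order they would run."""
    return [
        # Create the K3D cluster, mount a host directory into its nodes and
        # publish ECL Watch on localhost through the K3D load balancer.
        ["k3d", "cluster", "create", cluster,
         "--volume", f"{host_data_dir}:/hpcc-data",
         "--port", f"{eclwatch_port}:8010@loadbalancer"],
        # Register the HPCC Systems chart repository and install the platform
        # with default values (storage configuration is not shown here).
        ["helm", "repo", "add", "hpcc", HPCC_HELM_REPO],
        ["helm", "repo", "update"],
        ["helm", "install", release, "hpcc/hpcc"],
        # Check that the platform pods come up.
        ["kubectl", "get", "pods"],
    ]


if __name__ == "__main__":
    for command in build_deploy_commands():
        print(shlex.join(command))
```

Running the script only prints the commands, so they can be reviewed and adapted before anything is executed. With a port mapping like the one assumed here, ECL Watch would be reached at the localhost address on the chosen port once the pods are running.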

According to Aryaman, the motivation for this project is to contribute a lightweight HPCC Systems deployment solution for developers interested in learning ECL, the HPCC Systems programming language: “Earlier, HPCC Systems had a prepackaged Virtual Machine to use with VirtualBox or Hyper-V to help new users experiment and learn more about HPCC Systems and ECL. HPCC Systems has since moved to the cloud, which now supports several alternative local cloud environments such as Docker Desktop, Minikube, etc. The objective of this project was to deploy HPCC Systems in any standalone machine (Linux, Windows) to allow users to experiment and learn more about HPCC Systems.”

Aryaman is also excited about the future possibilities of his initial development: “We are employing a local deployment procedure which is cost effective, since it removes dependency on a cloud provider and its associated costs. This deployment can use local storage as well as an external storage as per the users’ requirements. The users can also configure their HPCC Systems deployment on their own through YAML configuration files. This is deployed through a K3D cluster on the users’ local system that is dependent only on open source/ free CLIs & software and thus reduces the overall cost for trainers, trainees, and any other HPCC Systems or Big Data enthusiast.”

Aryaman was mentored by our LexisNexis Risk Solutions colleagues Xiaoming Wang and Godji Fortil, both located in the United States, and also by Sidharth Ganesan and Srinivasan Kothandam, locally in Mumbai.

Congratulations to our interns in Mumbai

Now that you’ve read all about their work, you can understand the value added to the HPCC Systems open source project by the interns. Each of these projects makes a positive impact on the platform and provides benefits to the community users. It is also always very rewarding to watch how students from all around the world can take their first steps into the big data industry via academic collaborations fostered by HPCC Systems.

Rahul highlights that during their tenure, Charvi and Aryaman were also exposed to a multitude of experiences at the Mumbai office: “Our focus was on developing the interns from a 360° perspective. Beyond having the opportunity to develop themselves professionally, Charvi and Aryaman also had the opportunity to innovate, create and present to bigger forums. At the same time, they were an integral part of the team’s celebrations, such as Republic Day, Women’s Day and several other team building activities.”

Looking Forward

As next steps, Charvi and Aryaman are planning to complete their studies at NMIMS around May 2023. Because they were part of the intern program, they will also have the opportunity to join the HPCC Systems Poster Contest during the 2023 HPCC Systems Community Day, where they can showcase their work to our global community and compete for prizes. Access our 2023 Poster Contest Rules for more information on this opportunity.

Special thanks go out to Charvi and Aryaman for their contributions to HPCC Systems, to our colleagues from the LexisNexis Risk Solutions offices in Mumbai and in the United States, and to everyone who played a part in making this another extremely successful internship experience for the HPCC Systems Academic Program.

Find out more about student opportunities like this by visiting our student wiki and our academic program page. Have a question about the program? Contact us.