Academic Spotlight: Performance Skew Prediction in HPCC Systems
HPCC Systems has a robust Academic Program that collaborates with colleges, universities, high schools and institutions of higher learning around the world.
One such collaboration, with the Rashtreeya Vidyalaya College of Engineering (RVCE) in Bengaluru, India, has resulted in a new publication: Performance Skew Prediction in HPCC Systems, published by IEEE Xplore®.
Below is the paper abstract:
Over the last decade, the volume of data has been growing at a larger rate in comparison to the processing power available. The advent of distributed computing was essential in being able to handle these vast amounts of data. However, the distribution of data across the systems may not be uniform and gives rise to the problems of data skew and performance skew. A key challenge is to estimate the effective performance skew of a set of queries based on the data skew of the dataset on a multi-computing cluster. We use HPCC Systems, a modern big data management and analysis tool. Methods used to measure the impact of performance skew on the performance of queries on a HPCC cluster are heavily dependent on human interpretation. This project aims to automate the process of skew prediction by analyzing the execution graphs of a job on the HPCC Systems cluster and predicting the probable performance skew for a given set of queries using a Random Forest Regressor Model.
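To make the idea of data skew in the abstract more concrete, here is a minimal sketch in Python. It is not taken from the paper. It assumes one common way of expressing skew: how far the busiest node deviates from the cluster average, relative to that average. The function name `max_skew` and the row counts are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def max_skew(rows_per_node):
    """Relative deviation of the busiest node from the cluster average.

    Assumption: skew = (max - average) / average. This is an illustrative
    definition, not necessarily the metric used in the paper.
    """
    if not rows_per_node:
        raise ValueError("need at least one node")
    avg = mean(rows_per_node)
    if avg == 0:
        # No data anywhere, so there is nothing to be skewed.
        return 0.0
    return (max(rows_per_node) - avg) / avg


# Hypothetical row counts on a four-node cluster.
balanced = [250, 250, 250, 250]
skewed = [700, 100, 100, 100]

print(max_skew(balanced))
print(max_skew(skewed))
```

Under this definition, an evenly distributed dataset scores zero, and the score grows as one node holds a larger share of the rows. A per-step measure of this kind is the sort of feature a regression model could take as input, although the features actually used by the authors are described in the paper itself.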
Technical poster and collaborative working group
One of the student authors of this paper, Ambu Karthik, is studying for a Bachelor of Computer Science and Engineering at the RV College of Engineering, where he has worked on robotics, data analysis and networking. HPCC Systems combines networking and big data analysis with machine learning, which gives him a great opportunity to expand his knowledge in his specific areas of interest.
Ambu was a recent participant in the 2020 Technical Poster competition at our annual HPCC Systems Community Day where he presented a poster entitled Implementation of Generative Adversarial Networks in HPCC Systems using GNN Bundle. You can see some aspects of this poster presentation reflected in the new IEEE article.
(Note: The poster can be viewed in a larger format on the Technical Poster competition wiki page.)
In his poster, Ambu introduces the Generalized Neural Network (GNN) bundle, which has a wide variety of features that can be used for various neural network applications. To enhance the functionality of the bundle, he proposes the design and development of Generative Adversarial Networks (GANs) on the HPCC Systems platform using ECL, the powerful, declarative language that drives the platform.
HPCC Systems and the Rashtreeya Vidyalaya College of Engineering (RVCE) have had a long and very productive working group on GANs and the GNN bundle. This working group has included several authors of this paper working in conjunction with Roger Dev, a Senior Architect for HPCC Systems and LexisNexis Risk Solutions, who is the leader of the HPCC Systems Machine Learning Library. You can learn more about this working group in the blog Academic Program Spotlight – HSQL, Generative Adversarial Networks and the DBSCAN clustering algorithm.
The LexisNexis Risk Solutions HPCC Systems team has collaborated with RVCE since 2017, supporting faculty and students through HPCC Systems internships and funding for research projects. We are excited to sponsor the proposed RV College of Engineering – HPCC Systems Centre of Excellence in Cognitive Intelligent Systems for Sustainable Solutions, and we look forward to collaborating with the CoE and continuing our partnership with RVCE.
You can see Ambu explain his technical poster presentation below to learn more about GANs.
About the Authors:
Ambu Karthik is a second-year undergraduate student at the RV College of Engineering.
Ambu has worked on robotics, data analysis and networking in previous projects. HPCC Systems combines networking and big data analysis with machine learning, which gives him a great opportunity to expand his knowledge in his specific areas of interest.
Harsh Mishra was first introduced to HPCC Systems as an undergraduate student at the RV College of Engineering in 2018. He conducted original research on profiling data skew in multicomputing clusters and its impact on the runtimes of simple and compound queries, and published his findings at the International Conference on Big Data and Education in London in April 2019. He has since finished graduate school at Columbia University and now works as a quantitative trader at a hedge fund in New York City. His work involves developing mathematical and statistical models that process large volumes of high-frequency financial data to identify exploitable price patterns in the markets.
S. Jayanth is an undergraduate student at the RV College of Engineering.
Dr. G. Shobha is a Professor in the Computer Science and Engineering Department at the RV College of Engineering with 25 years of teaching experience. Her specializations include data mining, machine learning and image processing. She has published more than 150 papers in reputed journals and conferences. She has also executed sponsored projects worth INR 200 lakhs, funded by various national and international agencies. She is a recipient of various awards, including the Career Award for Young Teachers 2007-08 instituted by the All India Council for Technical Education, the Best Researcher award from Cognizant in 2017, GHC Faculty Scholar for Women in Computing in 2018, the IBM Shared University Research Award in 2019, and the HPCC Systems Community Recognition Award in 2020. Dr. Shobha was the recipient of the 2021 HPCC Systems Mentor Badge Award for providing guidance and direction towards the successful completion of intern open source projects.
Dr. Jyoti Shetty is an Assistant Professor in the Computer Science and Engineering Department at the RV College of Engineering. In collaboration with students, she has executed several projects on HPCC Systems, including implementing a distributed DBSCAN, providing evaluation metrics for a clustering algorithm, an IoT plugin for HPCC Systems, an OpenCV interface for HPCC Systems and more. She finds HPCC Systems a simple and powerful open source platform for solving complex real-world problems. Professor Shetty was the recipient of the 2021 HPCC Systems Mentor Badge Award for providing guidance and direction towards the successful completion of intern open source projects.