Shweta Oak evaluates Non-negative Matrix Factorization on HPCC Systems

Shweta Oak is studying for a Bachelor of Engineering at the Sardar Patel Institute of Technology in India, which is affiliated to Mumbai University. Shweta applied to complete a project as part of the HPCC Systems intern program in 2016. Unfortunately, we were not able to offer her a place. Most students at this stage would probably have moved on, but not Shweta! She emailed to ask whether there was a project she could work on as a regular contributor to get some experience to support her interest in machine learning. So, I put her in touch with our Machine Learning project leader, John Holt, who supported her work on a project about non-negative matrix factorization (NMF).

The overall aim of this project is to implement an ECL version of the Small K project implemented by Georgia Tech. Shweta was tasked with evaluating different algorithms, to assess which might be the best to use to implement NMF in ECL.

NMF is significant in data analytics, giving superior interpretative results for many problems, including:

  • Image processing
  • Chemometrics
  • Bioinformatics
  • Topic modelling for text analytics

Shweta analysed four algorithms for the implementation of NMF to find out which would be the most feasible:

  • Multiplicative Update
  • Hierarchical Alternating Least Squares
  • Block Principal Pivoting
  • Rank 2

She compared the performance of these algorithms against four criteria:

  • Computational Efficiency
  • Convergence
  • Speed
  • Feasibility of implementation with the current PBblas functions

This table shows the results of her analysis, indicating that Multiplicative Update and Rank 2 are the best performing algorithms:

As always, there are pros and cons when choosing which algorithm to use in any given situation. Shweta discovered that while Multiplicative Update is easy to implement, is useful in practical applications and guarantees convergence, other algorithms may be more efficient.
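To make the trade-off concrete, here is a minimal sketch of the classic multiplicative update rules for NMF (the Lee and Seung form, which reduces the Frobenius norm of A minus W times H). It is written in plain Python for readability rather than in ECL with PBblas, and it is not Shweta's implementation. The function names, the default iteration count and the small `eps` guard against division by zero are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(a, w, h):
    """Frobenius norm of A - W*H, used here to track progress."""
    wh = matmul(w, h)
    total = 0.0
    for i in range(len(a)):
        for j in range(len(a[0])):
            total += (a[i][j] - wh[i][j]) ** 2
    return total ** 0.5


def nmf_multiplicative(a, k, iterations=200, eps=1e-9, seed=0):
    """Factor a non-negative m x n matrix A into W (m x k) and H (k x n).

    Hypothetical helper for illustration only. Returns W, H and the
    error recorded before the first pass and after every pass.
    """
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    # Strictly positive starting values keep every later value non-negative.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    errors = [frobenius_error(a, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), element by element.
        wt = transpose(w)
        num = matmul(wt, a)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), element by element.
        ht = transpose(h)
        num = matmul(a, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
        errors.append(frobenius_error(a, w, h))
    return w, h, errors
```

Each pass needs only a handful of matrix products, which is why the method is simple to express. The price is that it can take many passes before the error flattens out, which is where the other algorithms may have the edge.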

Unlike algorithms that may need an unbounded number of iterations to converge, Rank 2 uses a finite active-set method. It is cache-efficient, computationally efficient and fast, is guaranteed to converge, and is easy to parallelize. However, its overall complexity is the same as that of other active-set-like algorithms.
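One way to see why the active-set search is finite in the rank-2 case: when the factorization has only two components, each non-negative least squares subproblem has just two unknowns, so there are only four possible choices of which unknowns are pinned at zero. The sketch below simply tries each choice and keeps the best feasible one. It is an illustration in plain Python under that assumption, not the Georgia Tech code or Shweta's work, and the function name `nnls_two_columns` is hypothetical.

```python
def nnls_two_columns(b1, b2, y):
    """Minimise ||g1*b1 + g2*b2 - y||^2 subject to g1 >= 0 and g2 >= 0.

    b1, b2 and y are equal-length lists of numbers. Returns (g1, g2).
    """
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    def residual(g1, g2):
        return sum((g1 * p + g2 * q - r) ** 2 for p, q, r in zip(b1, b2, y))

    a11, a12, a22 = dot(b1, b1), dot(b1, b2), dot(b2, b2)
    c1, c2 = dot(b1, y), dot(b2, y)

    # Candidate 1: both coefficients pinned at zero.
    candidates = [(0.0, 0.0)]
    # Candidate 2: only g1 free, clipped at zero.
    if a11 > 0:
        candidates.append((max(c1 / a11, 0.0), 0.0))
    # Candidate 3: only g2 free, clipped at zero.
    if a22 > 0:
        candidates.append((0.0, max(c2 / a22, 0.0)))
    # Candidate 4: both free, kept only if the unconstrained
    # least squares solution happens to be non-negative.
    det = a11 * a22 - a12 * a12
    if det > 1e-12:
        g1 = (a22 * c1 - a12 * c2) / det
        g2 = (a11 * c2 - a12 * c1) / det
        if g1 >= 0 and g2 >= 0:
            candidates.append((g1, g2))

    # Every candidate is feasible, so the smallest residual wins.
    return min(candidates, key=lambda g: residual(*g))
```

Because each subproblem is solved in a fixed number of arithmetic steps and the subproblems are independent of one another, this style of update lends itself to being run in parallel across rows or columns.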

There is still work to do on this project. It requires more performance testing before we can make the code available as part of the HPCC Systems Machine Learning Library. We also might want to evaluate other algorithms that may provide better results. The evaluation work that Shweta has completed for us on NMF gives us the information we need to move towards providing an implementation that gives the best results possible for those who may want to use NMF in their machine learning analysis. I would like to thank Shweta for the part she has played in helping us to achieve this goal.

Shweta provides more details about the algorithms and theorems used as part of her evaluation process in a short presentation she prepared for us to showcase the work she completed on NMF.

As the organiser of the HPCC Systems intern program, I’m lucky enough to connect with extremely able students who have a bright future ahead of them. However, I find that the qualities I admire the most are determination, genuine interest and willingness to participate. Shweta seems to have these qualities in abundance. Her positive outlook encouraged her to pursue working on a project with us as a volunteer, even when all places on our intern program of 2016 had already been filled. I hope other students reading this post will be inspired by her initiative and strength of character. It just goes to show that a bit of persistence and drive can go a long way. I have no doubt she will build a great career from the opportunities that will come her way in the future.

More information about our HPCC Systems interns of 2016…

  1. Read about Suk Hwan Hong and Column Level Security on HPCC Systems
  2. Read about Syed Rahman and the CSCS Machine Learning Algorithm 
  3. Read about Sarthak Jain and the Latent Semantic Analysis Machine Learning Algorithm
  4. Read about Lily Xu and the YinYang K-Means Clustering Machine Learning Algorithm
  5. Read about Vivek Nair and his machine learning regression suite and ML plugins for the Data Science Portal

More about internship opportunities…

  1. Find out about intern opportunities available with LexisNexis.
  2. Interested in a student internship involving coding, machine learning etc? Read about the HPCC Systems intern program.