Dr. Taghi M. Khoshgoftaar is the Motorola-endowed chair of the Florida Atlantic University Department of Computer and Electrical Engineering and Computer Science and the director of the National Science Foundation Big Data Training and Research Laboratory.
With more than 750 published journal and conference papers in big data analytics, data mining and machine learning, health informatics and bioinformatics, social network mining, fraud detection, and software engineering, Dr. Khoshgoftaar is recognized as one of the foremost experts in the field. He also has a long history of outstanding work in the HPCC Systems community, serving as a highly respected mentor for students working on machine learning and other areas of applied research.
Flavio Villanustre, our VP of technology, recently spoke with Dr. Khoshgoftaar and discussed big data, Khoshgoftaar’s interest in academics and education, and his time as a mentor.
What led you to pursue a career in academia? What do you most enjoy about working with big data in an academic setting?
I recognized when I was in high school that I was interested in teaching. I started teaching my fellow students when I was just a sophomore in college. Of course, as a professor you have to conduct research as well, which I enjoy very much, but my first interest was in teaching.
Working with big data is very interesting and brings many challenges. Our environment is somewhat limited in terms of computational power and capabilities, but that hasn’t interfered at all with my enjoyment of working with big data in an academic setting.
FAU was one of HPCC Systems’ first academic partners, and you’ve mentored several students who have worked on the platform over the years. How would you describe your experience? Are there any projects or moments that stick out to you in particular?
I have been really fortunate to work in the mentoring program at HPCC Systems. I’ve mentored several students, starting back in 2012. In fact, one of my recent mentees, Miriam, just became the first FAU alumna to work as a data scientist at Microsoft.
Through the cooperation between FAU students and HPCC Systems, we’re able to get hands-on with ECL for big data research and experiments. One of our recent papers, published in 2017, was an experiment with the LBFGS algorithm. In this project we used HPCC Systems and AWS to optimize more than a billion parameters. This typically isn’t possible in an academic environment, but with HPCC Systems, it is.
Are you working on anything right now that you are particularly excited about?
Yes, currently we are looking at rarity in big data. When you work with big data, you are dealing with billions of instances, but the cases you are interested in, the positive cases, are very limited. That is what I refer to when I say “rarity.” So, we are trying to figure out how to address this issue of learning when there are very few positive instances.
I like to work on projects that are interesting, industry-related, and of practical use in tackling current challenges.
Where do you see the world of big data in 10 years?
I recently met with several doctors who have a concussion research center in LA. I told them that if all the parties involved in concussion research and development came together and shared their data, they could make unprecedented progress in the field. Generally speaking, most companies and stakeholders don’t want to share their data, which severely hinders a field’s ability to grow. If you were to bring all this data together and fuse it across the different sources, the possibilities would be endless.
Each data set is an individual puzzle piece, and the pieces could be assembled to solve larger problems. Unfortunately, what is top-of-mind for people these days is security, so our ability to collaborate has been stunted. This extends well beyond concussion research and is true for many areas of research across industries.
Is there an area of big data that you would like to explore in the future?
I would like to explore what can be done when we have terabytes of data but very little of it is labeled. What do we do? How much labeled data do we need, and how do we go about solving this? I think that is the really important question to answer, and it is the future of big data.
Want to hear more from Taghi’s interview with Flavio Villanustre, VP of technology, LexisNexis Risk Solutions? Listen to the webcast where Taghi continues to discuss his academic research, HPCC Systems, and more.