Jo Prichard is a senior data scientist at LexisNexis Risk Solutions and a long-time HPCC Systems user. Focusing on big data research and development, Jo helps enterprises target fraud, collusion and other socio-behavioral risk factors.
Jo’s experience and interest in applying large-scale graph analysis to solve challenges for companies in financial services, healthcare, government, insurance and retail sectors have given him deep knowledge and understanding of the world of big data.
Our own VP of Technology, Flavio Villanustre, sat down with Jo to discuss his love for computer science, his work with LexisNexis Risk Solutions (including some interesting law enforcement projects) and his experience mentoring in the HPCC Systems internship program.
What attracted you to a career as a data scientist?
When I started my career, I was introduced to a phenomenal team that taught me about computer programming and introduced me to the world of managing big data. That’s when I realized data science was for me.
Originally, I did a lot of work in graph visualization, which we used predominantly for visualizing networks of things revolving around public records – this includes people, addresses, vehicles and properties, among others. In doing this, we discovered there was a natural transition between graph analysis and measuring networks of connected things.
The tipping point for me was when I realized that HPCC Systems was a really good way to explore the ocean that is big data. I can recall watching David Bayliss solve some very complex challenges with a ridiculously small amount of code. Watching him work was exciting because it helped me understand how to become a large-scale data detective where you can look at your data and ask, “What does this tell me about the world around us?”
How does your research and development work help LexisNexis Risk Solutions prevent and/or mitigate fraud?
We have several different verticals, and we leverage our technology in different ways. From a social network point of view or a graph perspective, the use cases and the domain of data might differ, but we see the same types of behavioral networks set up to target the system in a specific way, usually to commit fraud. We use graph analysis to measure everything that’s happening within a large network of connected things to understand what areas need our attention.
Real estate transactions are a good example to explain how we do this. We want to understand how networks of people are operating and, at a transaction level, how we can leverage graph analysis to identify risk. By overlaying the relationship graph drawn from public records with the data from a real estate transaction, we can see if there is a potential relationship between the buyer and seller. If there is, it doesn’t mean it’s fraud, but it is considered a higher risk transaction. From there you ask, “Is there a network around the parties in this transaction that have done this more than once? Is it systemic in that part of the graph?”
We then compute everything in the graph to flatten out the results, which produces attributes that tell you more about what’s happening around that particular point. That’s where HPCC Systems is phenomenal. It allows you to pull in massive amounts of data and compute them all in one process.
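The overlay and flattening steps described above can be illustrated with a minimal Python sketch. It is not the ECL code used at LexisNexis Risk Solutions. The sample names, the `max_hops` cutoff, and the attribute names (`hops`, `linked`, `party_linked_deals`) are all hypothetical, chosen only to show the idea: check whether buyer and seller are connected in a relationship graph, then flatten the result into per-transaction attributes.

```python
from collections import deque

# Hypothetical relationship graph drawn from public records: each edge links
# two people who share something such as an address, vehicle, or property.
RELATIONSHIPS = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]

# Hypothetical real estate transactions as (buyer, seller) pairs.
TRANSACTIONS = [("alice", "carol"), ("alice", "dave"), ("bob", "alice")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph, start, goal, max_hops=3):
    """Breadth-first search; return the hop count, or None if the two
    people are not connected within max_hops."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the cutoff
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


def flatten_transactions(graph, transactions):
    """Overlay transactions on the graph and emit one flat row of
    attributes per transaction."""
    rows = []
    linked_deals = {}  # person -> number of linked transactions they appear in
    for buyer, seller in transactions:
        hops = degrees_of_separation(graph, buyer, seller)
        if hops is not None:
            for person in (buyer, seller):
                linked_deals[person] = linked_deals.get(person, 0) + 1
        rows.append({"buyer": buyer, "seller": seller, "hops": hops})
    for row in rows:
        # A link is not proof of fraud; it only marks a higher-risk transaction.
        row["linked"] = row["hops"] is not None
        # Has either party shown up in linked transactions more than once?
        row["party_linked_deals"] = max(
            linked_deals.get(row["buyer"], 0), linked_deals.get(row["seller"], 0)
        )
    return rows


if __name__ == "__main__":
    for row in flatten_transactions(build_graph(RELATIONSHIPS), TRANSACTIONS):
        print(row)
```

Each output row is a flat record that could feed a scoring model. At the scale described in the interview, this per-transaction loop would be replaced by a distributed join over the whole graph.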
What are some of the most interesting projects you’ve been involved in while working at LexisNexis Risk Solutions?
There are a couple of areas that we work in that are incredibly interesting. For example, we occasionally focus on misuse or abuse of identity where we utilize analytics and features that will help identify which consumers are potentially in danger of having their identities compromised. We are very careful about how we tackle these scenarios and are continuously working towards improving what we produce so that we can build processes to protect the consumer.
We also have a responsibility to actively use our data for good, so we have an initiative where we do pro bono work for law enforcement. At the moment, we predominantly focus on missing and exploited children, working with the National Center for Missing & Exploited Children (NCMEC) to help find adolescents who’ve gone missing. There are cases where we attempt to triangulate the abductor’s digital and social network footprint so we can map out their most probable direction or location.
The thing that excited me most about coming to LexisNexis Risk Solutions and using the data for law enforcement purposes was watching David Bayliss wrangle data and compute results to discover which people might have been in an area related to a sequence of serial crimes. It took David less than 20 minutes to work with more than 30 billion rows of data. He gave the law enforcement agencies a list of about 40 people to investigate and I think the suspect was number eight or nine on the list.
In your opinion, what advantages does HPCC Systems provide compared to other open source platforms?
The platform was built with efficiency and simplicity in mind. It’s ridiculously easy to bring up either a single node or a multi-node cluster and load data. The core platform itself has very few moving parts, which is important because it’s like an electric car – there’s less that can go wrong because there’s less to maintain. It enables you to focus your resources on other important functions.
That’s crucial when running a company. You need your platform to be efficient, robust, and consistent so that you can drive your product and generate revenue. HPCC Systems can lower your personnel costs because you don’t need as many people furiously coding a ton of stuff. You are writing significantly less code because you have a high-level language that sits on top of all the C++ that it generates.
At LexisNexis Risk Solutions, we’re used to the fact that capability just exists. Most of the people here who’ve been using ECL have no idea how complicated it could get if they were using other technologies.
Can you tell us a bit about your experience as a mentor for the HPCC Systems internship program last year?
Last year I worked with Nicole Navarro, and it was the first time I’ve mentored someone specifically from a data science perspective. The first thing I had to do was get her up and running within a relatively short amount of time so that she could do something engaging and meaningful. Data science typically is 90% grunt work and 10% glory.
Fortunately, we have some neat tools we’ve built on top of HPCC Systems that allowed us to accelerate Nicole’s experience, enabling her to focus on doing important work for her internship. I really wanted to immerse her in the problem she was focusing on and provide her with all of the data and tools to succeed. We also wrapped a larger team around her to fill in gaps at different points, so she wasn’t trying to do it all on her own. It wasn’t a case of, “Here’s a ton of data, go figure it out.” It was, “Here’s a ton of data, plus a support group and supporting tools.”
Nicole’s project focused on the opioid epidemic. She was looking at areas in a specific part of the country where the communities with prescription drugs had a more connected social network. She wanted to identify areas that were potentially receptive to community-based intervention programs. We had the tools and support to get her to that point quickly so she could focus her attention on her project, but it also gave her the ability to visualize and measure things around that so she could grow her understanding. I think having a framework to do that is really important. If that had happened maybe two or three years ago, we wouldn’t have had all the things in place to be able to do that.
Want to hear more from Jo’s interview with Flavio Villanustre, VP of Technology, LexisNexis Risk Solutions? Listen to the webcast for the rest of the conversation.