Earlier this month, DJ Patil, U.S. Chief Data Scientist with the White House Office of Science and Technology Policy, held a Q&A session on ProductHunt.com. We collected our favorite questions into a blog post, which you can read here. We thought it would be interesting to take key questions from the session and answer them ourselves. In the previous post in this series, Becky Champion, Senior Consulting Software Engineer at LexisNexis, responded to the questions.
In this post, Jesse Shaw, Senior Consulting Software Engineer, also at LexisNexis, provides his insight.
Q. Where do you look for emerging trends (e.g. sources, events, websites, etc.)?
I subscribe to over 200 topic-diverse RSS feeds ranging from architecture to zoology. Some of the notable ones are: Flowing Data, StatsBlog.com, 99U, R-Bloggers, SmartDataCollective, Inhabitat, Calculated Risk, Xconomy, The Verge, +Plus Magazine, arXiv, Fierce Healthcare, Hacker News, PhysOrg, ReCode, StrongTowns, ScienceDaily, and SuppVersity.
Q. What kind of tools do you currently use?
My primary tool set is the complete HPCC Systems suite, including the Machine Learning Libraries, SALT (Scalable Automated Linking Technology), KEL (Knowledge Engineering Language), and the Data Science Portal (DSP). I also use a few specialty text editors, Gephi, R-Studio, and Spotify.
Q. Is it better to be data-driven or data-informed?
I would say data-informed even though I don’t see the application of data science in this way. There is a vast difference between tradition (what you’ve always done), intuition (what you think you should do), and automation (future actions based on measured outcomes). Tradition and automation act as polar opposites while intuition is that fuzzier space where machines don’t yet tread. Tradition should be changed intuitively where machine learning guides.
Q. What drives you towards data science as opposed to computer science where the skill set can often overlap?
My initial interest in data science was driven largely by my love of triathlon and wanting to improve. I began collecting performance and biological markers and built a workout/diet plan to reach quarterly goals. From there, things got a little out of hand and I now have a few petabytes of data to explore. I love data exploration and uncovering counter-intuitive perspectives; however, my computer science skills always seem to lag behind my snowballing curiosity. When this happens, I reach out to specialists to bridge the gap. LexisNexis have developed the HPCC Systems suite of amazing tools, which I have at my disposal, allowing me to mine deeper and visualize faster than ever before.
Q. Can you tell us how a data scientist or a team of data scientists can change or affect a nation’s path?
With unfettered data access, a small team could re-invent the global healthcare market by developing patient-specific, data-driven, treatment-outcome-based care chronologies. This would require access to all of the US’s healthcare records, complete with identity information. Of course, for this revolution, HIPAA (the Health Insurance Portability and Accountability Act of 1996) would have to be re-written, and I’m sure corporate stakeholders with everything to lose would never allow this. Germany is beginning this process for the treatment of type 2 diabetes.
- The Data Science Portal is proprietary software which is only available outside of LexisNexis by special arrangement. It was created to answer the question, “How do we solve ‘difficult but similar’ data science problems easily?” It has been designed to solve these challenges by providing:
- A generic framework to design and implement data solutions based on reusable plugins.
- A visual drag-and-drop interface to selectively string these reusable plugins together to solve a larger problem.
- A visualization canvas to interpret the output of the solution using charts.
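To give a feel for what "stringing reusable plugins together" can mean, here is a minimal sketch in Python. It is not the Data Science Portal's API, which is proprietary and not shown in this post; the names `drop_missing`, `add_ratio`, and `run_pipeline`, and the sample workout records, are all hypothetical and made up for illustration. The assumption is simply that each plugin takes a list of records and returns a new list of records, so plugins can be chained in any order.

```python
from typing import Callable, Dict, List

# Assumption for this sketch: a "plugin" is any function that maps a list of
# records to a new list of records. This is NOT the Data Science Portal's API.
Record = Dict[str, object]
Plugin = Callable[[List[Record]], List[Record]]


def drop_missing(field: str) -> Plugin:
    """Hypothetical plugin: remove records where `field` is missing or None."""
    def plugin(records: List[Record]) -> List[Record]:
        return [r for r in records if r.get(field) is not None]
    return plugin


def add_ratio(numerator: str, denominator: str, output: str) -> Plugin:
    """Hypothetical plugin: add `output` = numerator / denominator to each record."""
    def plugin(records: List[Record]) -> List[Record]:
        return [{**r, output: r[numerator] / r[denominator]} for r in records]
    return plugin


def run_pipeline(plugins: List[Plugin], records: List[Record]) -> List[Record]:
    """Apply each plugin in order, feeding its output to the next one."""
    for plugin in plugins:
        records = plugin(records)
    return records


# Made-up sample data, purely for illustration.
workouts = [
    {"km": 10.0, "minutes": 50.0},
    {"km": 5.0, "minutes": None},
    {"km": 21.0, "minutes": 105.0},
]

# "Stringing plugins together": clean the data, then derive pace per km.
pipeline = [drop_missing("minutes"), add_ratio("minutes", "km", "pace")]
result = run_pipeline(pipeline, workouts)
```

Because every plugin shares the same input and output shape, the same two plugins could be reused in a different order or in a different pipeline, which is the general idea behind solving "difficult but similar" problems with reusable parts.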