Does Big Data Equal Big Security Problems?

On May 1, the report “Big Data: Preserving Values, Seizing Opportunities” was released by the Executive Office of the President in response to a directive from President Obama to examine the impact of big data technology on society at large.

Big data is a movement in its own right, but after the White House report was released there was an influx of media articles questioning big data, and in particular, the safety of big data. Questions began to circulate on not only how secure the data is but also on the privacy rights of the citizens’ records whose very personal information is stored in this data.

For example, GigaOM posted an article titled “It’s 2014. Do You Know Where your Data Is?” and a LinkedIn blog that declared, “Big Data has Big Problems.” Both stories addressed the security and privacy of information stored and utilized for big data purposes.

Recently, I gave an interview to discuss how LexisNexis Risk Solutions protects our data and customer information as well as address the recent concerns raised in the media regarding big data, and what we are doing on our HPCC Systems big data platform to maximize security. Below is the Q&A transcript from the interview.


Moderator: Why is LexisNexis’ information safe and then why should customers trust us?
Flavio Villanustre: We are secure because we have a comprehensive security and privacy program in place. We continuously test our security posture and address any weaknesses that we may find, and we also have state of the art controls around access to the data.

But security goes far beyond just technology. Security isn’t just about making your software secure so that it cannot be breached, you need to also make your processes secure. You need to provide awareness to your employees so that they don’t get socially engineered, for example, and apply controls around security across your entire organization. It’s not just technology, and it’s not just customer support or customer operations.

What are some specific things we do to protect the data?
We do a lot of things. On the administrative side, we have a comprehensive set of security policies, procedures and standards. We provide security through training and we require that employees have knowledge of our policies. We do internal and external (independent third party) risk assessments to ensure that every part of the process is assessed from a risk standpoint, and that controls are commensurate with the exposure and overall risk.

We also employ technical controls, which are things like firewalls and network segmentation, data loss prevention systems and anti-malware protection systems. Administrative and technical controls complement each other.

In addition, we draw a distinction between “security” and “compliance.” Too often, we see organizations “checking the box” to assure themselves that they are compliant with respect to some framework. Our viewpoint is: if we do a very good job with information security (at a technical level and at a process level), compliance more or less takes care of itself.

In general, controls can be classified in three categories: preventive, detective, and corrective. In general, the most important ones are the preventive controls, which are put in place to prevent an attack or mitigate a threat. You need to keep in mind that it is very difficult to undo the damage when sensitive data is leaked or exposed. This is why we put significant emphasis on preventive controls and prioritize prevention. At the same time, we have to always be prepared for the possibility that data might be leaked or exposed, which is where detective controls come in handy, i.e. the sooner we can detect an intrusion or malicious attack, we can minimize potential damage, as opposed to detecting the event weeks or months later.

How does the security of HPCC Systems compare to the threat of other big data systems like Hadoop?
[HPCC Systems] is a lot better. We have been doing this for 20 years, and we built HPCC Systems specifically to support our business. As a result, many of the requirements that we have in our business around security and privacy are also incorporated into the HPCC Systems platform.

By contrast, Hadoop was designed from the ground up to allow people to store massive amounts of data on relatively inexpensive hardware and then be able to perform searches like the “find the needle in a haystack” type of problem. This is what it has been focused on for the longest time, rather than on making it work securely. So the security on Hadoop came as an afterthought, and even the basic authentication mechanisms weren’t deployed until a couple of years ago.

I saw that HPCC Systems went open source in 2011. Does that cause any vulnerability issues?
Not at all! On the contrary, this increases security through transparency and collaborative development. Generally speaking, the open source movement – started back in the 80s – is about releasing not just the compiled (or binary) version of the software, but also the programming language version of the software, the source code from which the binary code is generated. Rather than making things less secure, the increased number of reviewers and contributors can identify and correct security issues much faster with their combined efforts, making HPCC Systems even less vulnerable.

When legacy systems are converged onto the HPCC Systems platform, are there any concerns that one needs to be aware of? Some leading journals suggest that technology has progressed so quickly that legacy systems may have issues with merging into a new platform?
It’s true that technology has changed, and that it changes very rapidly. It’s no longer a time where we have a new generation of technology every 20 years. Now, a new generation happens every two or three years.

These days, big data encompasses many things: social media, videos, free text – which is not well supported by legacy systems. When you’re trying to deploy HPCC Systems in that environment, there are two ways you can do it. You can say, “Well, I’m going to phase out all my legacy systems completely and move all the data,” but that might not be viable for many companies since they may need to continue to operate while they do that migration, so integration is needed. As with any other integration process, there is some complexity, which could generate some potential security issues in between the interfaces, while you are trying to connect one system to the other and move data. Which is why, when we migrate legacy systems on to the HPCC Systems platform, we play close attention to the security controls that may need to be implemented or refactored as a function of the migration.

Do we ever have risks of losing data?
Well, the reality is that everyone does. There is the myth of complete security, and it’s just that, a myth. There is no way you can say, “I’m 100 percent exempt from having any security issues ever.” Of course, we believe, based on applying best-in-class security practices, having thorough and comprehensive monitoring and surveillance and having a mature set of processes that adapts to the ever changing security threat landscape, that we have a very low risk of losing data.

Maybe I’ve been watching too many political action and sci-fi shows lately, but I was watching 24 and my mind kind of races, which makes me ask: do we ever have anybody try to intentionally hack our systems to see if they can get in?
We don’t have any controls against aliens from outer space, but we do try to intentionally hack into our systems. We have security assessments and penetration testing and we regularly perform both, internally and externally. In addition to our own security experts – who are very well-trained and very knowledgeable of these practices – we also have third parties that we hire on a regular basis, to attempt to break into our systems.

Unfortunately, the number of hackers or wannabe hackers is very large, and you can bet they will be creative in trying to find new ways of breaking into your systems. So, if you’re not performing continuous surveillance and scanning for new threats and attack methodologies, it will potentially expose you in the long run.

What are some challenges that you see with protecting big data in general in the future? And what do you think we need to do to combat those threats?
First of all, we need to draw a distinction between security and privacy. I think the biggest challenges are going to be potentially around privacy, which is a very touchy subject because there is no universal concept of privacy. This distinction is necessary because some people often confuse a perceived lack of privacy with a lack of security.

What is considered acceptable privacy in the US might not be acceptable privacy in Germany or China. What’s privacy to us today is not the type of privacy we used to consider 20 years ago, and it won’t be the privacy 20 years from now. It’s important to have a clear understanding of what the society accepts as privacy, to ensure that you don’t go beyond that. You never want to be seen as creepy, and I can’t define exactly what creepy is, but you will know when you see it.

There can also be better education. For example, when you go to install an application on your smart phone, and the list of permissions pops up, the list is so long, you probably don’t even read it. You say, “I need the application, so accept.” Well, I don’t think that is the right way of doing it. There needs to be some bullet points, saying, “Look, you give me your data, and for giving me your data, I will use your data in this way.” It needs to be clear and understandable by anyone.

At the end of the day, there needs to be an exchange of value between the data owner (the person) and the data user, and that value needs to be measurable and tangible. I am glad to allow others to use my data, if that gives me access to better credit, simplifies my access to online services and makes my children safer in the streets.