Wherever you look, security and big data are hitting the headlines, which confirms their place in my mind as the two hottest topics in tech at the moment. Both have had a profound effect on how businesses interact with customers and develop new products and services. In order to conduct the complex analytics that bring meaning to data, big data platforms require access to massive amounts of potentially sensitive data. And, no matter how powerful or easy to use a big data platform is, it can become a serious liability if it isn’t properly secured. Which raises the question: What is required to properly secure a big data platform from unauthorized access or data theft? While every company and platform vendor will have its own opinion on how secure a big data platform is or needs to be, there are some basic security considerations that must be met.
1) The first issue to consider is how well your company’s big data platform addresses security issues on its own. For example, does the platform have an integrated security manager application? Security managers must provide authentication and authorization features to control which users can access which data sets. Additionally, security managers should keep track of each and every user interaction on the platform, so that in case of a breach the IT team can determine if the system was compromised from the inside by a legitimate user (either knowingly or unknowingly) or by way of an external attack. But not all integrated security managers are created equal, so if you’re looking to adopt a big data platform in the near future, look for one which provides the right mix of granular control and ease of use. For example, does the security manager support both user authentication and authorization? The two terms sound similar, but they mean different things. Authentication confirms that a user is who they claim to be and has legitimate access to the platform; authorization goes a step further and determines which specific data and platform functionality that authenticated user is allowed to use. Likewise, does the security manager have a hierarchical system for granting access rights to data? Without support for hierarchies, it becomes difficult to implement data access controls uniformly across a large organization. But if the security manager does support hierarchies, any new data access restrictions implemented above existing restrictions will automatically take precedence.
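To make the distinction concrete, here is a minimal Python sketch of the three ideas above: authentication, hierarchical authorization, and an audit trail. It is an illustration only, not the HPCC Systems Security Manager or any vendor's API. The `SecurityManager` class, its fields, and the slash-separated data set paths are all hypothetical, and passwords are kept in plain text purely to keep the example short.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class SecurityManager:
    """Hypothetical, simplified security manager (illustration only)."""
    users: dict = field(default_factory=dict)      # username -> password (plain text for brevity only)
    rules: dict = field(default_factory=dict)      # (username, path) -> True (allow) or False (deny)
    audit_log: list = field(default_factory=list)  # every interaction is recorded here

    def authenticate(self, username: str, password: str) -> bool:
        """Authentication: is this user who they claim to be?"""
        stored = self.users.get(username)
        ok = stored is not None and hmac.compare_digest(
            stored.encode(), password.encode()
        )
        self.audit_log.append(("authenticate", username, ok))
        return ok

    def authorize(self, username: str, path: str) -> bool:
        """Authorization: may this user touch this data set?

        Rules are checked from the top of the hierarchy down, and a
        deny at a higher level overrides any allow beneath it.
        """
        parts = path.split("/")
        allowed = False  # nothing is permitted unless a rule allows it
        for depth in range(1, len(parts) + 1):
            rule = self.rules.get((username, "/".join(parts[:depth])))
            if rule is False:
                allowed = False
                break
            if rule is True:
                allowed = True
        self.audit_log.append(("authorize", username, path, allowed))
        return allowed


sm = SecurityManager()
sm.users["alice"] = "s3cret"
sm.rules[("alice", "claims")] = True            # broad grant
sm.rules[("alice", "claims/2023/raw")] = True   # older, more specific grant
sm.rules[("alice", "claims/2023")] = False      # new restriction placed above it

print(sm.authenticate("alice", "s3cret"))       # identity check
print(sm.authorize("alice", "claims/2022"))     # inherits the broad grant
print(sm.authorize("alice", "claims/2023/raw")) # blocked by the higher-level deny
```

Because the deny on `claims/2023` sits above the older grant on `claims/2023/raw`, it wins without anyone having to track down and edit the lower-level rule, and the audit log keeps a record of every check for later review.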
2) After evaluating their big data platform’s native support for security, companies must then determine if the platform is compatible with the security protocols they already have in place. For example, a medical device company with access to highly sensitive or proprietary information about its customers (medical records, social security numbers, even credit card data) may protect that data by requiring that internal users pass a fingerprint or facial recognition check before the data can be accessed. Does restricting data access like that interrupt the big data platform’s workflow? Will security checks cause bottlenecks in the flow of data that could slow the platform’s response time as it analyzes data? This could be a serious problem in mission-critical data analysis situations where the time between analysis and a recommended action is measured in milliseconds; any delay could have unacceptable consequences.
3) A common way to secure data is to encrypt it, so companies should also confirm that any big data platform under consideration supports standard encryption specifications in both hardware and software. Hardware support for data-at-rest encryption and decryption is vital, as it typically offers much faster performance than software-only support. Additionally, look for big data platforms that are compatible with established software standards for protecting data, including data in transit, such as:
- TLS (Transport Layer Security) - Encrypts data as it is moved between software components in order to keep hackers from using sniffers and other hacking tools to intercept it.
- AES (Advanced Encryption Standard) - A symmetric-key encryption standard for securing electronic data, established by the National Institute of Standards and Technology (NIST) and adopted by the U.S. government.
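- SHA (Secure Hash Algorithm) - A family of cryptographic hash functions, also published by NIST. Rather than encrypting data, a hash function produces a fixed-length, one-way digest that is used to verify that data has not been altered.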
- PKC (Public Key Cryptography) - A key-based encryption system that lets a sender encrypt data using the receiver’s public key; the encrypted message can then only be decrypted with the receiver’s private key. It allows for secure communication between an entity and a large number of users.
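As a rough illustration of two of these standards, the Python sketch below uses only the standard library to compute a SHA-256 digest and to build a client-side TLS configuration that verifies certificates and refuses protocol versions older than TLS 1.2. The helper names `sha256_hex` and `strict_client_context` are made up for this example, and the TLS 1.2 floor is an assumption about what a reasonable policy might look like. AES and public-key operations are not part of Python's standard library and would need a third-party cryptography package, so they are left out here.

```python
import hashlib
import ssl


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string.

    A hash is one-way: it lets you check that data is unchanged,
    but it cannot be reversed to recover the data.
    """
    return hashlib.sha256(data).hexdigest()


def strict_client_context() -> ssl.SSLContext:
    """Build a TLS client context with certificate checks enabled.

    create_default_context() already turns on certificate and host name
    verification; the minimum version below is this example's assumption.
    """
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


digest = sha256_hex(b"patient-record-0001")
ctx = strict_client_context()

print(len(digest))                            # hex digest length in characters
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # certificates must validate
print(ctx.check_hostname)                     # host name must match the certificate
```

A context built this way can be passed to the standard library's networking calls (for example as the `context` argument of `http.client.HTTPSConnection`) so that data moving between components is encrypted in transit.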
4) The final consideration, as with many other software applications, is that the security settings in your big data platform may be set to the lowest, most permissive level by default. This allows for easier setup and configuration, but many companies forget to review and adjust those defaults afterward. One example involves TLS: many data applications ship with TLS disabled for data in transit to avoid slowing down network performance. Another default that is routinely missed is the platform administrator password. Vendors often ship their product without usernames and passwords in place, or with obvious passwords (like “Admin” or “1234”), to simplify initial installation. Hackers are well aware of this and routinely look for gaps in TLS settings or other default security settings that can be exploited to gain illegitimate access to data.
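A review of default settings can be partly automated. The following Python sketch checks a configuration for the two problems described above: TLS left disabled and an administrator password that is missing or obvious. The configuration keys (`tls_enabled`, `admin_password`) and the list of weak passwords are hypothetical and do not correspond to any particular platform's settings, so treat this as a starting point rather than a complete audit.

```python
# Hypothetical list of passwords that should never survive installation.
WEAK_PASSWORDS = {"", "admin", "1234", "password"}


def audit_defaults(config: dict) -> list:
    """Return a list of findings for risky default settings.

    The keys read from config are assumptions made for this example.
    """
    findings = []

    # A missing key is treated the same as an explicit "disabled".
    if not config.get("tls_enabled", False):
        findings.append("TLS is disabled for data in transit")

    # A missing or empty password counts as weak.
    password = config.get("admin_password") or ""
    if password.lower() in WEAK_PASSWORDS:
        findings.append("administrator password is missing or a well-known default")

    return findings


shipped_defaults = {"tls_enabled": False, "admin_password": "Admin"}
for finding in audit_defaults(shipped_defaults):
    print("WARNING:", finding)
```

Running a check like this as part of installation, and again on a schedule, helps catch settings that were relaxed for setup and never tightened.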
In summary, big data security should never be an afterthought, and security policies and procedures should be reviewed frequently. Auditing policies and user access rights should be closely scrutinized and managed. Although data breaches are always possible, adherence to these standards adds a level of confidence that your data fortress is well protected.
For more information on the HPCC Systems Security Manager, please consult the following resources:
• Installation and Administration Guide: HPCC Systems Security Manager
• Video: Configuring Security Manager Plugins on HPCC Systems
• Blog: Cryptographic Standard Library ECL Module
• Tech Talk: Security & HPCC Systems - Cryptographic ECL Library (go to 1:15:15 timestamp)
Learn more about HPCC Systems, the open source platform that provides flexible and responsive data lake management with better performance, near real-time results, and full-spectrum operational scale. Simple. Fast. Accurate. Cost Effective.