Securing your HPCC Systems environment and protecting your data
Russ Whitehead (William) is an HPCC Systems Software Architect responsible for the HPCC Systems security framework. Russ has a BS degree in Computer Science from the University of Florida and an MBA from the University of Miami. His background is in operating systems development, network management and voice recognition.
In this blog, Russ walks through some of the security features that are available to protect your HPCC Systems clusters and data. A number of different methods are available to suit your requirements, whether you are setting up a test system or providing a multi-user, production-ready environment.
HPCC Systems is a mature, open source, massively parallel-processing computing platform that solves Big Data problems. With every release, advancements in the security framework have helped to make HPCC Systems one of the most secure big data processing platforms available to the open source community. This blog highlights some of the many security features that make HPCC Systems a compelling solution for users that require a robust, configurable, highly secure computing platform.
Security Managers
HPCC Systems provides a number of different configurable security managers, any of which can be chosen by the system administrator based on the need to ensure the identity of users and administrators and to safeguard access to HPCC Systems resources and services such as:
- Landing zones, for files ready to be submitted to HPCC Systems for ETL
- File scopes
- Recordset data and metadata
- ECL (Enterprise Control Language) Query/Workunit execution
- File spraying and de-spraying
- The ever growing array of new ESP (Enterprise Services Platform) services and features
- And more…
Security Managers are selected and configured by an administrator using the HPCC Systems Configuration Manager service and a web browser pointed at the Configuration Manager service port. When configuring the HPCC Systems platform, or at any later time, an administrator can create one or more security manager software components and bind them to ESP/Dali/ECLServer components depending on the individual security requirements.
The following image shows how you can select a particular security manager using the HPCC Systems configuration manager tool:
In this blog, each security manager is discussed at a high level.
Username-only Security Manager
By default, HPCC Systems does not enable any type of security manager. Any user that can access the ECLWatch IP address can anonymously create and run workunits (HPCC Systems queries), view/modify/delete files and execute platform services.
To provide a higher degree of accountability, an administrator can enable the Username-only Security Manager. It is technically not a security manager, but when configured it requires a username to be entered in order to access ECLWatch. The username is not verified, but is used to log HPCC Systems activities and workunit execution. This is appropriate for smaller multi-user deployments that want to track user activity and operate in a protected silo or do not contain sensitive data.
Single User Security Manager
The Single User security manager is a very specialized security manager that allows a username/password combination to be specified on the ESP startup command line.
At run time, whenever a user tries to log in to ECLWatch or access any other authenticating ESP feature, they must specify this same username/password combination in order to succeed. Single User Security could be useful for a custom deployment where you do not want to configure an LDAP server or create a Linux HTPASSWD file, such as a classroom environment or a custom HPCC Systems Virtual Machine.
HTPASSWD Security Manager
The HTPASSWD security manager is provided for deployments that only want to provide username/password authentication of a user or users.
It interfaces with the encrypted Linux Apache HTPASSWD file and programmatically verifies the user’s provided username/password against the password hash stored in the password file. Once successfully authenticated, the user is granted full access to HPCC Systems and all of its services and resources. There is no concept of an administrator; every authenticated user has full rights and permissions. HTPASSWD security is typically chosen for smaller deployments, perhaps on a campus or a public cloud, which only require protection from unauthorized access.
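To make the verification step concrete, here is a minimal Python sketch of checking a username/password pair against an htpasswd-style file. It is not the HPCC Systems implementation. It only handles the `{SHA}` scheme written by `htpasswd -s`, chosen because it can be checked with the standard library alone; the function names and sample user are hypothetical.

```python
import base64
import hashlib
import hmac


def parse_htpasswd(text):
    """Parse htpasswd-style 'username:hash' lines into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        user, stored_hash = line.split(":", 1)
        entries[user] = stored_hash
    return entries


def sha1_entry(password):
    """Build a '{SHA}' hash string, the format written by `htpasswd -s`."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "{SHA}" + base64.b64encode(digest).decode("ascii")


def verify(entries, user, password):
    """Return True only if the user exists and the password hash matches.

    Only the '{SHA}' scheme is handled in this sketch; entries using any
    other scheme are rejected rather than guessed at.
    """
    stored = entries.get(user)
    if stored is None or not stored.startswith("{SHA}"):
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(stored, sha1_entry(password))


# Hypothetical file contents, for illustration only.
sample = "alice:" + sha1_entry("s3cret") + "\n"
users = parse_htpasswd(sample)
print(verify(users, "alice", "s3cret"))  # True
print(verify(users, "alice", "wrong"))   # False
print(verify(users, "bob", "s3cret"))    # False
```

Note that the result is a plain yes/no: as described above, there is no notion of roles or administrators, so a successful match is all that stands between a user and full access. Unsalted SHA-1 is used here only to keep the sketch dependency-free; a stronger scheme such as bcrypt is preferable for real password files.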
LDAP Active Directory Security Manager
The LDAP (Lightweight Directory Access Protocol) Active Directory Security Manager is the most powerful and dynamic security manager offered with HPCC Systems.
When selected and configured to bind to an Active Directory server, this security manager enables user and administrator authentication and provides a vast array of individual resource and feature level authorizations at run time. Permissions can be assigned to individual users and to Active Directory groups to which users can be assigned.
The Active Directory Security Manager can be configured to support either Microsoft® Active Directory or the open source Red Hat 389 Directory Server, and supports binding over either the standard LDAP connection port or the secure connection port. This security manager is most appropriate for enterprise deployments where there are many users working on different projects and there is a need to restrict users to only the data, features and resources appropriate for their workflow.
When the LDAP Active Directory Security Manager is configured, an administrator can create LDAP “file scopes” (and similarly workunit scopes and ECL Attribute scopes), which are representations of HPCC Systems hierarchical logical paths. The administrator can then assign permissions on those scopes to both individual users and groups. To enable file scope verification, the administrator must bind the security manager to the Dali metadata server component at configuration time. When access to a logical file is requested, all permissions related to the user and their group memberships are queried, and a single DENY takes precedence over all ALLOWs. Therefore, if a user is a member of a group that is granted access to a scope, but the user is individually denied, the deny wins. Because the HPCC Systems file system is logically hierarchical, a deny at any level above the one being requested also takes precedence. The ECL Attribute repository and workunits can be protected in a similar manner.
Access to numerous HPCC Systems features and resources can also be controlled, such as whether or not a user/group can utilize embedded C++ code in their ECL and which programs can be run via the Pipe command. These powerful ECL extensions can greatly enhance ECL productivity, but they can also be abused, for example by an unscrupulous user attempting to read a file directly from the file system. Access to these features can be explicitly allowed, denied, or allowed only in ECL code that is digitally signed with a key that is present on the system. Read more about this in the section Code Signing, Embedded Languages, and Security in the HPCC Systems® Programmer’s Guide.
Similarly, permissions can be granted to numerous other HPCC Systems resources and features. As with scopes, permissions can be granted and denied by user and by group, and the user is authenticated and authorized whenever they attempt to access these items.
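The deny-over-allow rule across a scope hierarchy can be sketched in a few lines of Python. This only illustrates the resolution logic described above and is not the security manager's actual code. The data model (a dictionary keyed by scope and principal), the `::` separator handling and the default of refusing access when no entry matches are assumptions made for the example.

```python
def ancestors(scope):
    """Yield the scope and each parent: 'a::b::c' -> 'a::b::c', 'a::b', 'a'."""
    parts = scope.split("::")
    for end in range(len(parts), 0, -1):
        yield "::".join(parts[:end])


def is_allowed(permissions, user, groups, scope):
    """Resolve access with deny-over-allow semantics across the hierarchy.

    permissions maps (scope, principal) -> "ALLOW" or "DENY", where the
    principal is a user name or a group name (hypothetical data model).
    """
    principals = [user, *groups]
    found_allow = False
    for level in ancestors(scope):
        for principal in principals:
            decision = permissions.get((level, principal))
            if decision == "DENY":
                return False  # a single DENY wins, at this level or above
            if decision == "ALLOW":
                found_allow = True
    # Assumption: with no matching entry at all, access is refused.
    return found_allow


# Hypothetical scopes, users and groups.
perms = {
    ("projects::fraud", "analysts"): "ALLOW",
    ("projects::fraud", "carol"): "DENY",
    ("projects", "contractors"): "DENY",
}
target = "projects::fraud::scores"
print(is_allowed(perms, "alice", ["analysts"], target))                # True
print(is_allowed(perms, "carol", ["analysts"], target))                # False
print(is_allowed(perms, "dave", ["analysts", "contractors"], target))  # False
```

In the example, carol is refused despite her group's ALLOW because she is denied individually, and dave is refused because one of his groups is denied at a higher level of the hierarchy.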
The following image shows a fully configured LDAP Security Manager:
The LDAP Active Directory Security Manager is documented in the User Security Maintenance section of the HPCC Systems® Administrators Guide.
Pluggable Security Managers
Some HPCC Systems users may require a more customized manner of authentication such as:
- Fingerprint recognition.
- Access card reader.
- Storing security related settings in a private database, such as MySQL.
- Requiring an implementation with much higher security classifications and audit trails.
HPCC Systems provides a framework where security developers can create their own security manager software library component that can be plugged into the configuration process and enabled at HPCC Systems run time. Once configured and bound to an ESP/ECLServer/Dali component, this custom security manager is dynamically loaded and called upon by the HPCC Systems security framework to perform security related tasks.
A security manager of this type would typically be coded as a C++ Dynamic Link Library (SO/DLL), implementing the HPCC Systems ISecManager security interface. This interface is called upon by the HPCC Systems security framework to perform authentication, authorization and numerous other security related tasks. Once called on by HPCC Systems security framework, the pluggable security manager can perform these tasks in whatever manner is dictated by their security needs.
In addition to developing the security manager DLL, a set of configuration files (XML/XSD) must be created that describe the security manager’s configuration and the management options needed for the security manager to initialize and operate. At configuration time, the Configuration Manager will expose these options to the administrator, who can select from the choices made available by these configuration files. For instance, you might want to specify the interface details to a fingerprint scanner, or the IP address and credentials of a MySQL server.
The HTPASSWD security manager was developed as a Pluggable Security Manager, as a simple reference model for developers wanting to develop their own custom manager. A review of this project will demonstrate how quickly and easily a custom security manager can be implemented to provide specialized behavior.
More information about Pluggable Security managers is provided in the HPCC Systems Security Manager Plugin Framework Guide, which is available on the HPCC Systems Website.
Encryption At Rest
In addition to providing security managers to control authentication and authorization, HPCC Systems provides other security features to protect user data stored on a cluster.
Encryption and compression are supported by the ECL language and are documented in our ECL Language Reference. When an HPCC Systems recordset file is generated by ECL code, the OUTPUT statement can contain an option that instructs the data to be encrypted. When the ENCRYPT option is specified and a variable length encryption key is provided, the data is both 256-bit AES encrypted and LZW compressed before being serialized. Even if a malicious user is able to access the data from physical media, it is unusable to them without the key and a method to decompress and decrypt.
In addition, we recommend that hardware encryption should also be considered. Current physical storage technologies often provide high speed, on-the-fly in-hardware encryption/decryption when data is stored and retrieved, adding yet another level of confidence.
ECL Standard Library Cryptographic Module
Introduced in the HPCC Systems 7.0 release is a new STD.Crypto module that provides ECL developers with an assortment of cryptographic features to utilize in order to safeguard their sensitive data at the column level, using industry standard cryptographic algorithms. These features include digital hash algorithms, symmetric and asymmetric encryption and decryption, and digital signatures, all of which can be applied to individual columns within an ECL dataset.
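To illustrate what protecting data "at the column level" means, the following Python sketch replaces a single sensitive column with its SHA-256 digest while leaving the other columns readable. It uses Python's `hashlib` rather than the ECL module, so it shows the idea only and says nothing about the STD.Crypto function names or signatures. The record layout and column names are hypothetical.

```python
import hashlib


def hash_column(rows, column):
    """Return copies of the rows with one column replaced by its SHA-256 hex digest."""
    protected = []
    for row in rows:
        copy = dict(row)  # leave the caller's records untouched
        value = str(row[column]).encode("utf-8")
        copy[column] = hashlib.sha256(value).hexdigest()
        protected.append(copy)
    return protected


# Hypothetical records, for illustration only.
records = [
    {"name": "Alice", "account": "12345678"},
    {"name": "Bob", "account": "87654321"},
]
for row in hash_column(records, "account"):
    print(row["name"], row["account"][:16] + "...")
```

A hash is one-way, so this suits columns that only need to be compared or joined on. Columns that must be read back later call for the symmetric or asymmetric encryption features instead.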
More information about this feature is provided in our Standard Library Reference Guide.
Encryption In Transit
In addition to providing a means to encrypt data in persistent storage, it is important to ensure network snoopers cannot view and capture HPCC Systems socket data as it is being transmitted between and within HPCC Systems components.
Industry standard HTTPS is available as a configuration option for ECLWatch browser communications with the Enterprise Services Platform (ESP). When configured with a public/private key pair, the user must access the ESP IP using a Transport Layer Security (TLS) connection over a secure port. TLS is a cryptographic protocol that protects content while it is in transit across the network. It does introduce a small latency, as it encrypts and decrypts on the fly.
Internally, an administrator can configure a public/private key pair to be used to enable TLS on socket communication between HPCC Systems components and within a component. This ensures that even if a snooper were able to listen to the wire, the content would be encrypted.
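From the client side, connecting to a TLS-protected service means verifying the server's certificate before any data is exchanged. The Python sketch below builds such a client context with the standard `ssl` module. The helper name and the optional CA file parameter are illustrative, and nothing here is specific to ESP configuration.

```python
import ssl


def make_client_context(ca_file=None):
    """Build a TLS client context that verifies the server certificate.

    ca_file is an optional path to a CA bundle (for example, a private CA
    that signed the server's certificate); None uses the system defaults.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

A client would then wrap its socket with `context.wrap_socket(sock, server_hostname=host)`, where `host` is the name on the server's certificate. The handshake fails if the certificate cannot be validated.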
A new feature recently added to the HPCC Systems 7.0.0 series is signed digital tokens. When components request data, that request must contain a digital token that was created and PKI signed by a trusted component, using the configured public/private key pair. Before honoring any data request, the token contents and signature are verified by the recipient.
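The verify-before-honoring flow can be sketched as follows. One important caveat: HPCC Systems signs tokens with a public/private key pair, whereas this sketch substitutes a shared-secret HMAC-SHA256, because the Python standard library has no asymmetric signing primitives. The token layout, field names and key are all hypothetical. Only the overall pattern (sign on issue, verify contents and signature before serving the request) reflects the text above.

```python
import hashlib
import hmac
import json


def sign_token(claims, key):
    """Serialize the claims and attach an HMAC-SHA256 signature (stand-in for PKI)."""
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode("utf-8"), "signature": signature}


def verify_token(token, key):
    """Return the claims if the signature is valid, otherwise None."""
    payload = token["payload"].encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return None  # tampered contents or wrong key: do not honor the request
    return json.loads(payload)


# Hypothetical key and claims, for illustration only.
key = b"example-shared-secret"
token = sign_token({"file": "projects::fraud::scores", "user": "alice"}, key)
print(verify_token(token, key) is not None)  # True

tampered = dict(token, payload=token["payload"].replace("alice", "mallory"))
print(verify_token(tampered, key))           # None
```

With a real public/private key pair the recipient needs only the public key to verify, so it can check tokens without being able to forge them. A shared-secret scheme like the one in this sketch does not offer that property.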
Allowed IP List
The Dali server component can restrict access to only those companion HPCC Systems nodes and utilities that are specified in a configurable “allowed IP list” in Dali’s configuration (“whitelist” in environment.xml). Friendly nodes are specified in this section of the Dali configuration, either by IP address or hostname, together with a comma separated list of the role(s) each one plays in an HPCC Systems cluster. When this feature is enabled, any attempt to communicate with Dali from a component not specified in the list is denied, and an entry identifying the offending caller is added to the DaServer log file. Note that it is not necessary to list components that are explicitly defined in the environment, such as Roxie and Thor.
This feature can be disabled, in which case log file entries alerting the administrator are still created, but the Dali communication is allowed. Dali administrative tools such as updtdalienv and envmod are available to add, remove and view the allowed IP list dynamically, without having to restart the Dali component.
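The two modes described above (enforcing, and log-only when the feature is disabled) can be summarized in a short Python sketch. This is not Dali's implementation. The dictionary structure, the role names and the logger name are hypothetical, and the real list lives in the Dali configuration rather than in code.

```python
import logging

logger = logging.getLogger("dali_allowlist_sketch")


def check_caller(allowed, caller, role, enforce=True):
    """Decide whether a caller may talk to the server.

    allowed maps an IP address or hostname to the set of roles it may
    play (hypothetical structure). With enforce=False the caller is still
    logged when it is not listed, but the connection is permitted.
    """
    if role in allowed.get(caller, set()):
        return True
    logger.warning("caller %s (role %s) is not in the allowed list", caller, role)
    return not enforce


# Hypothetical entries and role names, for illustration only.
allowed = {
    "10.0.0.5": {"admin_tool", "monitor"},
    "build-host": {"admin_tool"},
}
print(check_caller(allowed, "10.0.0.5", "monitor"))                  # True
print(check_caller(allowed, "10.0.0.9", "monitor"))                  # False
print(check_caller(allowed, "10.0.0.9", "monitor", enforce=False))   # True
```

Running in log-only mode first is a reasonable way to discover which callers would be blocked before switching enforcement on.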
This feature is documented in the HPCC Systems® Administrators Guide.
Auditing and Logging
All of the components in an HPCC Systems cluster perform extensive logging of requests and responses. From these logs, users and administrators can monitor for unusual patterns of failed access attempts and take action to block malicious access.
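As a small illustration of monitoring logs for repeated failures, the Python sketch below counts failed access attempts per user and reports anyone who reaches a threshold. The log line format (`AUTH_FAILED user=<name>`) is invented for the example and does not match any particular HPCC Systems component log. A real monitor would need to be adapted to the actual log layout.

```python
from collections import Counter


def failed_attempts(log_lines, threshold=3):
    """Return {user: count} for users whose failed attempts reach the threshold.

    Assumes a hypothetical format in which failures contain the marker
    'AUTH_FAILED' and a whitespace-separated 'user=<name>' field.
    """
    counts = Counter()
    for line in log_lines:
        if "AUTH_FAILED" not in line:
            continue
        for field in line.split():
            if field.startswith("user="):
                counts[field[len("user="):]] += 1
    return {user: n for user, n in counts.items() if n >= threshold}


# Hypothetical log lines, for illustration only.
sample_log = [
    "2019-01-01 10:00:01 AUTH_FAILED user=mallory ip=10.0.0.9",
    "2019-01-01 10:00:02 AUTH_OK user=alice ip=10.0.0.5",
    "2019-01-01 10:00:03 AUTH_FAILED user=mallory ip=10.0.0.9",
    "2019-01-01 10:00:04 AUTH_FAILED user=bob ip=10.0.0.7",
    "2019-01-01 10:00:05 AUTH_FAILED user=mallory ip=10.0.0.9",
]
print(failed_attempts(sample_log))  # {'mallory': 3}
```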
Future security challenges
Securing your system and data is a never ending challenge and we will continue to rise to that challenge by providing innovative solutions to help our users stay one step ahead of the hackers.
HPCC Systems has been designed from the bottom up and inside out to be a secure, trusted framework for housing and accessing sensitive data. We are always moving forward with the latest security technologies and, as you can see, we have solutions available to implement different levels of security to suit your requirements.