Leveraging HPCC Systems as Part of an Information Security, Privacy, and Compliance Framework

Overview of IADP Organization

Pic 1

Andy Bayer, Information Assurance Program Director, RELX, gave an overview of the IADP organization during the General Session at HPCC Systems Community Day 2019. The RELX Group Information Assurance and Data Protection organization (IADP) provides oversight of privacy, security, and compliance practices as part of the company’s comprehensive risk mitigation program. The IADP generally works with the Risk Solutions and Legal and Professional businesses, focusing on PII (personally identifiable information) and SPII (sensitive personally identifiable information) that is available through LexisNexis online products. Key functions of this organization include:

Fraud Detection – The IADP mitigates risk by detecting anomalous activity that may be an indication of fraudulent behavior. 

Investigations and Incident Response – The investigations team conducts data security investigations and responds to incidents utilizing various tools in accordance with established processes.

Ongoing Compliance – The IADP complies with regulatory requirements, applicable laws, data provider restrictions, and internal policies. 

In order for the IADP to perform these functions, relevant data is collected and retained from numerous online product repositories across the business, including:

  • Customer account metadata
  • Administrative activity logs
  • Product authentication logs
  • Transaction logs

Fraud Detection

The IADP fraud detection system detects suspicious activity through a set of rules that identify anomalous behavior. The rules-based criteria include:

  • Geographical location
  • Time Stamp
  • Industry
  • Authentication Method
  • Search Type
  • Search Velocity
  • Search Volume
  • Previous User Activity

An indication of potential fraud then goes through a workflow tool that is used by the Investigations Team.
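
As an illustration of how rules of this kind might be expressed, the following is a minimal Python sketch covering two of the criteria listed above (geographical location and search velocity). It is not the IADP implementation: the `SearchEvent` type, the rule functions, and the thresholds (more than three searches within one minute) are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SearchEvent:
    # Hypothetical event shape; the real log fields are not given in the talk.
    user_id: str
    timestamp: datetime
    country: str


def unusual_country(event, usual_countries):
    """Geographical rule: the search came from a country not seen before for this user."""
    return event.country not in usual_countries


def high_velocity(events, window=timedelta(minutes=1), max_searches=3):
    """Velocity rule: more than max_searches fall inside some sliding time window."""
    times = sorted(e.timestamp for e in events)
    start = 0
    for end, t in enumerate(times):
        # Advance the left edge until the window spans at most `window`.
        while t - times[start] > window:
            start += 1
        if end - start + 1 > max_searches:
            return True
    return False


def evaluate(events, usual_countries):
    """Return the names of the rules that fired for one user's events."""
    alerts = []
    if any(unusual_country(e, usual_countries) for e in events):
        alerts.append("unusual_country")
    if high_velocity(events):
        alerts.append("high_velocity")
    return alerts


base = datetime(2019, 10, 15, 9, 0, 0)
# Five searches ten seconds apart, then one from a different country.
events = [SearchEvent("u1", base + timedelta(seconds=10 * i), "US") for i in range(5)]
events.append(SearchEvent("u1", base + timedelta(hours=2), "FR"))
alerts = evaluate(events, usual_countries={"US"})
print(alerts)
```

In a setup like this, each fired rule name would be what gets handed to the workflow tool for an investigator to review.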

Investigations & Incident Response

Trained investigators follow a documented investigative process to respond to raised alerts. During the investigation process, investigators:

  • Log and maintain user search records and review and investigate exception activity reports, as necessary.
  • Respond to security incidents.

The last part of this overview of the IADP organization focuses on ongoing compliance.

Ongoing Compliance

In the ongoing compliance process, the organization works with customers, sending them records of the searches they have run so that they can verify that the searches were authorized, conducted for legitimate business purposes, and performed in accordance with legal and regulatory requirements.

The compliance process includes three types of audits:

  • Random audits
  • Event-driven audits
  • Audits triggered when customers conduct searches on high-profile individuals

So, how is IADP continuously improving?

Continuous Improvement

To respond better to the current and future environment, the IADP must adapt, grow, and capitalize on all data and analysis available, using new technologies to their fullest capabilities. 

There are two areas of focus for continuous improvement:

  • Effectiveness: Is the IADP monitoring, alerting, and auditing on the right things?
  • Efficiency: Does the IADP identify suspicious activity and respond quickly and accurately?

Going forward, there will be more emphasis on behavioral analytics than on location- and time-based analytics, and a shift toward data modeling instead of reactive response.

This completes the overview of the IADP organization. Now we will move on to operational aspects that were considered when integrating HPCC Systems into the current infrastructure.

Operational Considerations

Pic 2

Marcus Mullins, CISSP, Security Engineering Manager, RELX, spoke about the following operational aspects that were considered during integration: 

  •   Workflow Integration
  •   Data Management
  •   Security 

Workflow Integration

Workflow Integration – A Customer logs into a Product and performs various transactions. All activity is logged and collected into the Landing Zone.

Pic 3

Data Ingestion – The data is then ingested from the Landing Zone into an HPCC Systems Cluster.

Pic 4

Report Queries – Next comes the output side. Reporting Tools provide various ways of querying and accessing the ingested data.

Pic 5

Process State Tracking – A MySQL database is used to coordinate state between the processes on the Landing Zone and the Reporting Tools.

Pic 6

The data in MySQL is also used to drive alerts, a Dashboard, and various other operations, so that the IADP Operations staff can monitor data ingestion, the flow of report queries, and so on.

Pic 7

Data Management

Landing Zone Space Management – Over 30 Gigabytes of data are pulled into the HPCC Systems Landing Zone for processing. As such, an automated truncation process is used to roll old data out of the Landing Zone, so that it will not fill up.  
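
An automated truncation process of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `truncate_landing_zone`, the seven-day retention period, and the file names are hypothetical, and the presentation does not describe how the actual process decides what to remove.

```python
import os
import tempfile
import time


def truncate_landing_zone(directory, max_age_seconds, now=None):
    """Delete regular files older than max_age_seconds; return the sorted names removed."""
    now = time.time() if now is None else now
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot first, so we are not deleting while scanning
    removed = []
    for entry in entries:
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


# Demo against a throwaway directory standing in for the Landing Zone.
with tempfile.TemporaryDirectory() as lz:
    now = time.time()
    for name, age_days in [("old.log", 10), ("new.log", 1)]:
        path = os.path.join(lz, name)
        with open(path, "w") as f:
            f.write("sample\n")
        mtime = now - age_days * 86400
        os.utime(path, (mtime, mtime))  # backdate the file's modification time

    # Assumed retention: keep seven days of files.
    removed = truncate_landing_zone(lz, max_age_seconds=7 * 86400, now=now)
    remaining = sorted(os.listdir(lz))

print(removed, remaining)
```

A production version would more likely confirm that a file has been ingested successfully before deleting it, rather than relying on age alone.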

Data Normalization – Data is pulled in from over 60 sources coming from various products, so the incoming data must be normalized into a common format. This is done to ensure that ECL code has a consistent way of accessing the data. 
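
To make the idea concrete, here is a small Python sketch of per-source normalization into one common record layout. The source names, field names, and the `normalize` helper are hypothetical; the actual feeds and the common format used by the IADP are not described in the presentation.

```python
# Hypothetical mapping from each source's field names to the common names.
FIELD_MAPS = {
    "product_a": {"uid": "user_id", "ts": "timestamp", "ip_addr": "ip"},
    "product_b": {"UserName": "user_id", "EventTime": "timestamp", "ClientIP": "ip"},
}

# Every normalized record has exactly these fields, in this order.
COMMON_FIELDS = ("source", "user_id", "timestamp", "ip")


def normalize(source, raw):
    """Rename a raw record's fields to the common layout; missing fields become None."""
    record = {"source": source}
    for raw_name, common_name in FIELD_MAPS[source].items():
        record[common_name] = raw.get(raw_name)
    return {name: record.get(name) for name in COMMON_FIELDS}


a = normalize("product_a", {"uid": "jdoe", "ts": "2019-10-15T09:00:00", "ip_addr": "192.0.2.10"})
b = normalize("product_b", {"UserName": "asmith", "EventTime": "2019-10-15T09:05:00", "ClientIP": "198.51.100.7"})
print(a)
print(b)
```

Because both records come out with the same field names, downstream code only has to be written once, which is the benefit the paragraph above describes for ECL.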

Record Versioning – Metadata, user accounts, and customer information are not static; they change over time, so it is necessary to keep track of versions of the data.

The diagram below shows a user record.

Pic 8

The phone number and date in the record change, and another version is created.

Pic 9

The phone and date change again, and another version is created.

Pic 10
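
The versioning behavior walked through above (each change to the phone number produces a new version) can be sketched as follows. This is a simplified Python model, not the ECL used in the project; the `VersionedRecord` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRecord:
    user_id: str
    versions: list = field(default_factory=list)  # oldest first

    def update(self, effective_date, **fields):
        """Append a new version if any field changed; return True when one was added."""
        current = self.versions[-1]["fields"] if self.versions else {}
        merged = {**current, **fields}
        if merged == current:
            return False  # nothing changed, so no new version
        self.versions.append({
            "version": len(self.versions) + 1,
            "effective_date": effective_date,  # ISO date string, so it sorts as text
            "fields": merged,
        })
        return True

    def as_of(self, date):
        """Return the fields in force on the given ISO date, or None if none existed yet."""
        match = None
        for v in self.versions:
            if v["effective_date"] <= date:
                match = v["fields"]
        return match


rec = VersionedRecord("jdoe")
rec.update("2019-01-01", phone="555-0100")
rec.update("2019-03-01", phone="555-0101")  # phone and date change: version 2
rec.update("2019-06-01", phone="555-0102")  # they change again: version 3
unchanged = rec.update("2019-07-01", phone="555-0102")

print(len(rec.versions), rec.as_of("2019-04-15"))
```

Keeping every version makes it possible to answer what a user's details were at the time of a given search, which is presumably why a single overwritten record would not be enough for investigations.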

Data Truncation – Due to the large amount of data ingested into the cluster, a data truncation process is required. Storage is not unlimited in the cluster, and care must be taken to prevent over-utilization.  

Pic 11


Security

User Authentication/Authorization – Protection of PII and SPII data is vital. There are various user authentication/authorization mechanisms that use Active Directory to control the communications between the elements of the system, including users.

Pic 12

The data at rest (inactive data that is stored physically in any digital form) must also be protected, so encryption is used on the cluster, as well as on the reporting tool server. This offers an additional layer of protection in the event that someone is able to get onto the machine. 

The final discussion on leveraging HPCC Systems as part of an information security, privacy, and compliance framework offers details of how IADP applies HPCC Systems in their work.

HPCC Systems Project Detail

Pic 13

Mohammed Naweed, Consulting Software Engineer for LexisNexis Risk Solutions, spoke about how IADP applies HPCC Systems in daily operations.

LexisNexis has a wide variety of products catering to numerous industries. Thousands of users use these products on a daily basis. These products have built-in safety mechanisms for identity and access management, but as an organization, responsibility does not end there. One of the main responsibilities of the IADP is to make sure that LexisNexis products are used by the right people in the right way, and for the right purpose, and this application is one of the main tools used to ensure that. 

Project Requirements

The two main requirements for the HPCC Systems Project are to:
1. Check every user event for fraud as soon as it is created.
The sooner that fraudulent activity is identified, the sooner that action can be taken against it to prevent damage.

2. Create and deliver the reports requested by investigators in the shortest time possible.
In any investigation of fraud or real-life crime, getting this information to the investigators quickly is what matters most.

Project Challenges

The challenges for this project are:

  • More than 30 million user events are processed every day.
  • Data comes from more than 60 Data Sources, and these sources are increasing.
  • Apart from user events, the only types of data available are Meta Data, Look up Data, and IP Geo Location Data.
  • Reports are created based on Investigators’ search requests. 
  • Data Corruption – When dealing with such a high volume and velocity of data, there is always a risk of data corruption. As such, IADP maintains a “Last Known Good State of the System,” to ensure data recovery, as required.

After considering the requirements and challenges, IADP chose HPCC Systems as the technology for this project. Its processing capabilities, scalability, low-cost maintenance, and open source availability make for a simple and effective architecture.

Application Architecture

The diagram below is a high-level representation of the application architecture. Meta Data, IP Geo Location Data, and Authentication Transaction Logs are fed into HPCC Systems, where they are processed (data ingestion, fraud detection/alerts, advanced search/reporting). Investigators use the IADP Web Application to create their searches. Elasticsearch Meta Data is used by the IADP Web Application. When the Investigators finalize the query for the search, the information is stored in the MySQL database. HPCC Systems has a process that reads the database, picks up the new queries, and generates reports. The reports are stored in the Reports Repository.

Pic 14
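
The hand-off described above, in which a process reads the database, picks up new queries, and generates reports, follows a common polling pattern. The Python sketch below illustrates that pattern only. It uses an in-memory SQLite database as a stand-in for MySQL so that it is self-contained, and the table name, column names, status values, and `fake_run_report` function are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_requests ("
    " id INTEGER PRIMARY KEY,"
    " query TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'NEW',"
    " report_path TEXT)"
)
# Two finalized investigator queries, as the web application might store them.
conn.executemany(
    "INSERT INTO report_requests (query) VALUES (?)",
    [("user_id = 'jdoe'",), ("country = 'FR'",)],
)
conn.commit()


def process_new_requests(conn, run_report):
    """Run every request still marked NEW, record where its report went, mark it DONE."""
    rows = conn.execute(
        "SELECT id, query FROM report_requests WHERE status = 'NEW' ORDER BY id"
    ).fetchall()
    for request_id, query in rows:
        path = run_report(request_id, query)
        conn.execute(
            "UPDATE report_requests SET status = 'DONE', report_path = ? WHERE id = ?",
            (path, request_id),
        )
    conn.commit()
    return len(rows)


def fake_run_report(request_id, query):
    # Placeholder for the real report job; it only invents a repository path.
    return f"reports/request_{request_id}.csv"


first = process_new_requests(conn, fake_run_report)
second = process_new_requests(conn, fake_run_report)  # nothing new on the second pass
statuses = [row[0] for row in conn.execute("SELECT status FROM report_requests ORDER BY id")]
print(first, second, statuses)
```

Tracking a status per request is also what would let a dashboard show which report queries are pending and which are complete.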

Data Processing

On the HPCC Systems side, the Meta Data is ingested with the file structure shown in the diagram below. Multiple generations of the data are maintained, so that if bad data is received, the previous version can be restored.

Pic 15

A similar file structure is used for the IP Geo Location Data. 

Pic 16
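
The idea of keeping multiple generations and falling back to the previous one can be modeled in a few lines. The Python sketch below is a simplified illustration, not the HPCC Systems file structure shown in the diagrams; the `GenerationStore` class, the choice of three retained generations, and the sample data are hypothetical.

```python
class GenerationStore:
    """Keeps the most recent `keep` generations of a dataset, newest last."""

    def __init__(self, keep=3):
        if keep < 1:
            raise ValueError("keep must be at least 1")
        self.keep = keep
        self.generations = []

    def publish(self, data):
        """Make `data` the current generation and drop generations beyond `keep`."""
        self.generations.append(data)
        while len(self.generations) > self.keep:
            self.generations.pop(0)

    def current(self):
        return self.generations[-1]

    def rollback(self):
        """Discard the current generation and return to the previous one."""
        if len(self.generations) < 2:
            raise RuntimeError("no previous generation to roll back to")
        return self.generations.pop()


store = GenerationStore(keep=3)
for label in ["gen1", "gen2", "gen3", "gen4-bad"]:
    store.publish({"label": label})

kept = [g["label"] for g in store.generations]  # gen1 has aged out
discarded = store.rollback()                    # gen4-bad turned out to be bad
print(kept, store.current())
```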

Data processing for User Action Logs involves the following steps:

Data Validation – The data must be accurate and in the correct format.

Pic 17
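
A validation step of this kind might look like the following Python sketch. The required fields, the timestamp format, and the `validate` function are assumptions made for the example; the presentation does not list the actual validation rules.

```python
import ipaddress
from datetime import datetime

# Assumed required fields for a user action log record.
REQUIRED = ("user_id", "timestamp", "ip")


def validate(record):
    """Return a list of problems found in the record; an empty list means it is valid."""
    errors = [f"missing {name}" for name in REQUIRED if not record.get(name)]
    if record.get("timestamp"):
        try:
            # Assumed format: ISO-style date and time without a time zone.
            datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            errors.append("bad timestamp")
    if record.get("ip"):
        try:
            ipaddress.ip_address(record["ip"])
        except ValueError:
            errors.append("bad ip")
    return errors


good = {"user_id": "jdoe", "timestamp": "2019-10-15T09:00:00", "ip": "192.0.2.10"}
bad = {"user_id": "", "timestamp": "15/10/2019", "ip": "999.1.1.1"}
print(validate(good))
print(validate(bad))
```

Returning the full list of problems, rather than stopping at the first, makes it easier to report exactly why a record was rejected.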

Data Formatting – The data must be formatted according to requirements.

Pic 18

Data Enhancement – Data is linked to the Meta Data and IP Geo Location Data, which provides more insight for the investigators.

Pic 19
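
Linking an event to account metadata and IP geolocation amounts to two lookups. The Python sketch below shows one way to express that; the lookup tables, the field names, and the `enhance` function are hypothetical, and the IP ranges are reserved documentation addresses rather than real geolocation data.

```python
import ipaddress

# Hypothetical account metadata keyed by user ID.
ACCOUNT_METADATA = {
    "jdoe": {"customer": "Example Corp", "industry": "Insurance"},
}

# Hypothetical IP geolocation table: (network, country code).
IP_GEO = [
    (ipaddress.ip_network("192.0.2.0/24"), "US"),
    (ipaddress.ip_network("198.51.100.0/24"), "FR"),
]


def lookup_country(ip):
    """Return the country of the first network containing the address, else None."""
    addr = ipaddress.ip_address(ip)
    for network, country in IP_GEO:
        if addr in network:
            return country
    return None


def enhance(event):
    """Return a copy of the event with account metadata and a country attached."""
    enriched = dict(event)
    enriched.update(ACCOUNT_METADATA.get(event["user_id"], {}))
    enriched["country"] = lookup_country(event["ip"])
    return enriched


known = enhance({"user_id": "jdoe", "ip": "198.51.100.7"})
unknown = enhance({"user_id": "nobody", "ip": "203.0.113.5"})
print(known)
print(unknown)
```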

Fraud Check – The data is checked for suspicious activity, and the proper personnel are alerted.

Pic 20

Build Indexes – Both payload and non-payload indexes are built. Each data feed has its own index, which allows for isolation of a bad data feed. The bad data can be quarantined, fixed, and fed back into the system without affecting the functionality of the system.

Pic 21
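
The benefit of one index per data feed is that a single bad feed can be taken out of service without touching the others. The Python sketch below models only that isolation property; it does not model the distinction between payload and non-payload indexes, and the `FeedIndexes` class, feed names, and records are hypothetical.

```python
from collections import defaultdict


class FeedIndexes:
    """One in-memory index per feed, with the ability to quarantine a feed."""

    def __init__(self):
        self.indexes = {}        # feed name -> {user_id: [records]}
        self.quarantined = set()

    def build(self, feed, records):
        """(Re)build the index for one feed; rebuilding lifts any quarantine on it."""
        index = defaultdict(list)
        for record in records:
            index[record["user_id"]].append(record)
        self.indexes[feed] = dict(index)
        self.quarantined.discard(feed)

    def quarantine(self, feed):
        self.quarantined.add(feed)

    def search(self, user_id):
        """Search every feed that is not quarantined."""
        hits = []
        for feed in sorted(self.indexes):
            if feed in self.quarantined:
                continue
            hits.extend(self.indexes[feed].get(user_id, []))
        return hits


idx = FeedIndexes()
idx.build("feed_a", [{"user_id": "jdoe", "action": "login"}])
idx.build("feed_b", [{"user_id": "jdoe", "action": "<corrupt>"}])

before = len(idx.search("jdoe"))          # both feeds answer
idx.quarantine("feed_b")
during = idx.search("jdoe")               # only feed_a answers
idx.build("feed_b", [{"user_id": "jdoe", "action": "search"}])  # fixed data fed back in
after = [r["action"] for r in idx.search("jdoe")]
print(before, during, after)
```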

HPCC Systems – Cluster Architecture

In the diagram below, data goes into Thor Q1, where data ingestion and fraud detection take place. On the right-hand side there is Thor Q2. Normally, a Roxie cluster would fill this role, but HPCC Systems allows the flexibility to add a second Thor for query processing and report creation.

Pic 22


The benefits of applying HPCC Systems to IADP operations for the purpose of information security, privacy, and compliance are:

  • Time taken to check every user event for fraud is reduced from hours to minutes.
  • More exhaustive fraud detection – able to check for more types of fraud.
  • Faster report creation 
  • Ability to create more complex queries


A special thank you to Andy Bayer, Marcus Mullins, and Naweed Mohammed for a wonderful presentation. Their presentation, “Leveraging HPCC Systems as Part of an Information Security, Privacy, and Compliance Framework,” is available on YouTube.