Following on from my earlier blogs about intern opportunities with LexisNexis and, more specifically, the HPCC Systems intern program, the next six blogs in this series will each feature a student and the project they completed this summer as a developer on the HPCC Systems open source project. First up is Suk Hwan Hong, who is a master's student in Computer Science at Georgia Tech. He joined the HPCC Systems platform team as part of the LexisNexis intern program. His project was to come up with a proof of concept for implementing Column Level Security (CLS) on HPCC Systems.
Suk worked on this project alongside his mentor, Russ Whitehead, who is our go-to expert on security. During Suk’s internship, they implemented most of the high-level Column Level Security, which allows an administrator to create views to be applied when a user tries to run an existing query. So the focus here is on controlling access to columns in the data delivered by queries, for end users who simply run these deployed queries.
It’s not a simple problem to solve because, although the data may be delivered with all columns available, some columns may include data that is more sensitive than others. It may be OK to authorize some users to see this sensitive data, but there may also be a need to deny access to others. However, it’s not just a matter of granting or denying access. Sometimes it is appropriate to allow users to see that a column exists but not show the data within, and sometimes it is necessary to hide the fact that the column exists at all. Suk came up with four control modes to resolve this issue: grant, deny, mask and hide.
But sometimes it’s not a single column we are interested in. We may want to control access to a group of related columns, which Suk has called ‘Views’. These are simply named ‘logical containers’ for the related data columns of interest, to which users can then be assigned access rights as shown below:
Along the way, Suk had to consider how to manage the following implementation challenges:
- Any post-processing that may be required to mask or hide columns
- Protecting against data inference
- Stricter checking of record layout hashes
- Minimising performance overheads
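To make the four control modes and the post-processing step a little more concrete, here is a minimal Python sketch. It is not Suk's implementation and does not use any HPCC Systems API. The names `Mode`, `MASK_VALUE` and `apply_view` are hypothetical, and the sketch assumes that "deny" refuses the whole request, that "mask" replaces the value with a placeholder, and that columns not listed in a view are granted by default.

```python
from enum import Enum


class Mode(Enum):
    GRANT = "grant"  # value is returned unchanged
    DENY = "deny"    # assumption: the whole request is refused
    MASK = "mask"    # column stays visible, value is replaced
    HIDE = "hide"    # column is dropped from the result entirely


MASK_VALUE = "****"  # hypothetical placeholder for masked data


def apply_view(rows, view):
    """Post-process query result rows (a list of dicts) against a view.

    `view` maps column names to a Mode. Columns missing from the view
    default to GRANT, which is an assumption made for this sketch.
    """
    # Refuse the request if any returned column is denied.
    denied = sorted(
        {col for row in rows for col in row if view.get(col) is Mode.DENY}
    )
    if denied:
        raise PermissionError("access denied to column(s): " + ", ".join(denied))

    result = []
    for row in rows:
        out = {}
        for col, value in row.items():
            mode = view.get(col, Mode.GRANT)
            if mode is Mode.HIDE:
                continue  # the user never learns the column exists
            out[col] = MASK_VALUE if mode is Mode.MASK else value
        result.append(out)
    return result


# Example data, invented purely for illustration.
rows = [{"name": "Ann", "ssn": "123-45-6789", "salary": 50000}]
analyst_view = {"ssn": Mode.MASK, "salary": Mode.HIDE}
print(apply_view(rows, analyst_view))
```

With the example view, the `ssn` column is still present but masked, and `salary` does not appear at all. A production design might well default unlisted columns to deny rather than grant, and would also need to address the data inference and performance concerns listed above, which this sketch ignores.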
There is a second part to this project, which is to implement low-level CLS on HPCC Systems, focusing on columns in huge flat files in Thor. The target audience for this would be ECL developers who frequently deploy queries to Thor. More to come on this in the future!
Suk produced a poster of his project and entered it into the poster competition on Community Day at the HPCC Systems Engineering Summit. It generated a lot of interest and showed the hard work and dedication he put into making his internship with us such a success:
Congratulations and thank you Suk, for making such a valuable contribution to the HPCC Systems open source project!
More information about our HPCC Systems interns of 2016...
- Read about Syed Rahman and the CSCS Machine Learning Algorithm
- Read about Sarthak Jain and the Latent Semantic Analysis Machine Learning Algorithm
- Read about Lily Xu and the YinYang K-Means Clustering Machine Learning Algorithm
- Read about Vivek Nair and his machine learning regression suite and ML plugins for the Data Science Portal
- Read about Shweta Oak and Non-negative Matrix Factorization on HPCC Systems
More about internship opportunities...