Setting up an HPCC Systems cluster on AWS EKS

Current development work on the HPCC Systems open source project is focused on providing a cloud native version of the platform. Our new platform will be able to run on any cloud provider, and we have been testing it on different cloud platforms as we work through the development process.

Our Going Cloud Native resources provide what you need to take our new platform for a test drive. You can contribute to the ongoing development process by providing feedback about your experience using our Community Issue Tracker.

Previous blogs in this series walk through how to set up a default system using a Helm file or using Microsoft Azure and Kubernetes, as well as how to persist your data, configure storage, and import data.

This blog focuses on using HPCC Systems on the AWS Elastic Kubernetes Service (EKS) and is the result of a collaboration between LexisNexis Risk Solutions Group and Amazon Web Services.

Photo of Lucas Varella

Lucas Varella is a student of Information Systems at the Federal University of Santa Catarina in Brazil. Lucas is working with Hugo Watanuki, completing a year-long internship in the LexisNexis Risk Solutions Group Brazil office. He is working on projects that support ongoing development of the HPCC Systems cloud native platform. See the poster Lucas submitted to the 2020 HPCC Systems Poster Contest, titled: A Cross Provider Assessment for HPCC Systems Container Orchestration.

Photo of Hugo Watanuki

Hugo Watanuki is a Tech Support Engineer and ECL Instructor for LexisNexis Risk Solutions. He supports the development and delivery of training programs for the HPCC Systems platform in the Brazil region. Hugo has worked for over 13 years in various technical roles in the IT industry with a focus on High Performance Computing. He is also a part-time researcher in Information Systems and a member of the UK Academy for Information Systems.

Photo of Xiaoming Wang

Xiaoming Wang (Ming) is a Senior Software Engineer at LexisNexis Risk Solutions. Ming is responsible for various HPCC Systems projects and builds, as well as installation and configuration of the HPCC Systems platform. He also works on several cloud solutions for HPCC Systems, such as Instant Cloud for AWS, the Cloud Formation Solution, Juju Charm, Docker/Kubernetes, and more.

Photo of Akash Gheewala

Akash Gheewala is a Solutions Architect at Amazon Web Services (AWS), responsible for helping global companies across the high tech vertical on their journey to the cloud. He does this through his passion for accelerating digital transformation for customers and building highly scalable and cost-effective solutions in the cloud. Akash also enjoys mental models, creating content and vagabonding about the world! 

********************************************************************************************

Unlike Azure’s azurefile, there is no out-of-the-box network file system storage class in AWS. The EBS volume storage class, gp2, can be used to deploy an HPCC Systems platform cluster on AWS, but the cluster cannot be scaled, since the “Deployment” type of Pod controller is used for all HPCC Systems Pods and this does not support dynamic volume creation. The Elastic File System (EFS) service, a simple, scalable, fully managed elastic NFS file system provided by AWS, can be used to create an NFS-type server and an NFS-type storage class on AWS.

The recently released Amazon EFS Container Storage Interface (CSI) driver provides an interface that allows Kubernetes clusters running on AWS to manage the lifecycle of Amazon EFS file systems. On top of that, the efs-provisioner allows you to mount EFS storage as PersistentVolumes in Kubernetes; it consists of a container that has access to an AWS EFS resource. The container reads a ConfigMap which contains the EFS file system ID, the AWS region, and the name you want to use for your efs-provisioner.
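
For reference, here is a minimal sketch of such a ConfigMap, with field names taken from the upstream efs-provisioner documentation and placeholder values:

apiVersion: v1
kind: ConfigMap
metadata:
  name: efs-provisioner
data:
  file.system.id: <EFS ID>
  aws.region: <REGION>
  provisioner.name: example.com/aws-efs

You will not need to write this by hand: when the provisioner is deployed with Helm later in this blog, the chart creates the ConfigMap from the values passed on the command line.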

These are the steps required to deploy an HPCC Systems cluster with EFS on EKS:

  1. Create an AWS account if you don’t have one.
  2. Create an IAM user and add the policies needed to create and manage EKS.
  3. Configure the AWS CLI, eksctl, Helm, and kubectl.
  4. Create the EFS service.
  5. Create an EKS cluster.
  6. Install the Amazon EFS CSI driver and the external provisioner for Amazon EFS.
  7. Deploy the HPCC Systems cluster on EKS.

Note: The IAM user, policies, and AWS configuration are mainly needed for operations through the AWS command-line client.

Creating an AWS free account and IAM user

If you do not already have an AWS account, create one using the AWS Management Console and then follow these steps:

Creating an IAM user

  1. Create an IAM user using the AWS IAM service.
  2. Make a note of the ACCESS_KEY and SECRET_KEY.
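
If you prefer the command line, an existing administrator can create the user and its access key with the AWS CLI (the user name below is just an example):

 aws iam create-user --user-name hpcc-admin
 aws iam create-access-key --user-name hpcc-admin

The output of the second command contains the AccessKeyId and SecretAccessKey values to note down.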

Creating an EKS policy

  1. From IAM in the left panel, select Policies.
  2. Click the Create Policy button.
  3. Click JSON and copy/paste the following, giving it a name, for example, AmazonEKSAdminPolicy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "eks:*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "eks.amazonaws.com"
                }
            }
        }
    ]
}
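
Alternatively, if you save the JSON above to a file, the policy can be created from the command line (the file name is just an example):

 aws iam create-policy --policy-name AmazonEKSAdminPolicy --policy-document file://eks-admin-policy.json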

Adding policies to the IAM user

  1. Add the EKS policy (shown above) to the IAM user.
  2. Other policies are also needed; here is the list of policies to add:
AmazonElasticFileSystemFullAccess
AWSCloudFormationFullAccess
AmazonEC2FullAccess
IAMFullAccess
AmazonEKSClusterPolicy
AmazonEKSWorkerNodePolicy
AmazonS3FullAccess
CloudFrontFullAccess
AmazonVPCFullAccess
AmazonEKSServicePolicy

Note: If you are using a free tier account in AWS, you may be limited to adding a maximum of 10 managed policies to your IAM user. In that case, note that you can still access the JSON content of the above managed policies and add them as inline policies.
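
For reference, a managed policy can also be attached from the command line; this sketch attaches one of the policies listed above (substitute your own IAM user name):

 aws iam attach-user-policy --user-name <user-name> --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess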

Preparing the AWS CLI, eksctl, Helm, and the Kubernetes client kubectl

Note: To proceed, you must already have an ACCESS KEY ID and SECRET ACCESS KEY (see above). You may also find it useful to have the AWS Getting Started with eksctl guide available for reference. We recommend executing all the steps going forward on a Linux shell.

The following tools are required for this blog: the AWS CLI, eksctl, Helm, and kubectl. Install any of them you don’t already have on your PC.

Configuring AWS 

Use the command shown below and provide the ACCESS KEY and SECRET ACCESS KEY:

 aws configure
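
The command prompts for the keys along with a default region and output format, roughly like this:

 AWS Access Key ID [None]: <ACCESS_KEY>
 AWS Secret Access Key [None]: <SECRET_KEY>
 Default region name [None]: us-east-1
 Default output format [None]: json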

Configuring AWS with a profile

If you already have an AWS configuration and want to configure it with your newly created IAM user, you can configure AWS with a profile:

 aws configure --profile <profile-name>

Create and export an environment variable:

 export AWS_PROFILE=<profile-name>

This is your current AWS profile. It is useful to know this when you have multiple AWS accounts and IAM users.

Note: You need to append the following to any eksctl command:

 --profile <profile-name> 

Also, you may need to switch the kubectl context, using the following commands:

 kubectl config get-contexts
 kubectl config use-context <context name>

Creating the Elastic File System (EFS) Server

This can be done using either the EFS service in the AWS console or the AWS CLI, and involves selecting a region and a VPC (subnets). Whichever method you choose, there are two steps to follow:

  • Create an EFS in a region
  • Make sure your mount targets are available in all availability zones of interest. 

Note: If you don’t know which zones are needed, create mount targets in all zones in the region.

Creating the EFS with AWS CLI

Follow these steps:

  1. Create an EFS in a region:
 aws efs create-file-system  --throughput-mode bursting  --tags "Key=Name,Value=<EFS NAME>" --region <REGION>
  2. Get the EFS ID:
 aws efs describe-file-systems --region <REGION>

The output shows the “Name” tag and the FileSystemId.

The EFS server FQDN will be:

 <EFS ID>.efs.<REGION>.amazonaws.com
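
For example, a hypothetical file system fs-12345678 in us-east-1 would have the FQDN:

 fs-12345678.efs.us-east-1.amazonaws.com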
  3. Find a VPC (which should have public subnets). If you don’t know which VPC to use, you can use the default:
 aws ec2 describe-vpcs --region <REGION>

The output shows the VPC ID.

  4. To get all subnets for all availability zones in the region, use the following command:
 aws ec2 describe-subnets --region <REGION> --filters "Name=vpc-id,Values=<VPC ID>"

The output shows the “AvailabilityZone” and “SubnetId” for each subnet.

Note: You need to use VPC public subnets; otherwise, the Pods will not be able to access the EFS server. The simplest way to achieve this is to use the default VPC, which has public subnets.

Creating the mount target

Follow these steps:

  1. To create a mount target, use the following command:
 aws efs create-mount-target --region <REGION>  --file-system-id <EFS ID> --subnet-id <Subnet id>

Usually an AWS EKS cluster needs at least two availability zones, so create two mount targets (one per AZ). If you are not sure, create mount targets for all AZs (see the loop sketch after this list).

  2. To display the mount targets you have just created, use the following command:
 aws efs describe-mount-targets --region <REGION> --file-system-id <EFS ID>
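
If you are creating mount targets in several subnets, a small shell loop like this sketch can save some typing (the subnet IDs shown are just examples):

for subnet in subnet-05a2f12b subnet-9e08ecd3; do
    aws efs create-mount-target --region <REGION> --file-system-id <EFS ID> --subnet-id $subnet
done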

Keep the following EFS information handy, as it will be needed going forward:

  • REGION
  • EFS ID
  • Subnet IDs
  • Mount target IDs
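
One convenient way to keep these handy is to export them as shell variables, for example:

 export REGION=us-east-1
 export EFS_ID=<EFS ID>

Later commands can then be reused as-is, for example: aws efs describe-mount-targets --region $REGION --file-system-id $EFS_ID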

Creating an EKS cluster with eksctl

Since we created the EFS on the selected VPC, we will use that VPC’s existing subnets.

If, for example, we created an EFS on region us-east-1 with mount targets for all available zones, the subnet id on zone us-east-1a would be subnet-05a2f12b and the subnet id on zone us-east-1b would be subnet-9e08ecd3.

For the purposes of this tutorial, we are using the following setup:

  • Node type “t3.medium”, with an initial 3 nodes and a maximum of 4 nodes.
  • Cluster name “hpcc-1”.
eksctl create cluster \
     --name hpcc-1 \
     --region us-east-1 \
     --nodegroup-name hpcc-workers \
     --node-type t3.medium \
     --nodes 3 \
     --nodes-min 1 \
     --nodes-max 4 \
     --managed \
     --vpc-public-subnets subnet-05a2f12b,subnet-9e08ecd3

Creating an EKS cluster usually takes 15 to 30 minutes. When it is done, run the following command:

 eksctl get cluster
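
eksctl also updates your kubeconfig when the cluster is created, so you can confirm that the worker nodes have joined:

 kubectl get nodes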

Installing the Amazon EFS CSI driver and external provisioner for Amazon EFS

This needs to be done using the AWS CLI and there are two main steps to follow:

  • Install Amazon EFS CSI driver
  • Install the external provisioner for Amazon EFS. 

Installing the Amazon EFS CSI driver

After successfully creating your cluster, it is time to install the EFS CSI driver on your EKS cluster. Follow these steps:

  1. Install the Amazon EFS CSI driver using the following command:
 kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/ecr/?ref=release-1.0"   
  2. The efs-csi pods should now be running. Check this using the following command:
 kubectl get pods -n kube-system | grep -i efs-csi

Installing the external provisioner for Amazon EFS

Now, deploy the external provisioner for Amazon EFS by following these steps:

  1. As a prerequisite, make sure your EKS cluster resources have inbound access to the mount targets. Start by obtaining the security group information from your EKS cluster resources:
 aws eks describe-cluster --name hpcc-1 \
--query cluster.resourcesVpcConfig.clusterSecurityGroupId
  2. Next, obtain the security group information from your mount targets:
 aws efs describe-mount-target-security-groups --mount-target-id <mount-target-id>
  3. Now authorize inbound access to the security group for the EFS mount target (if the permission is already granted, you will get an “already exists” exception; that is OK):
aws ec2 authorize-security-group-ingress \
--group-id <ID of the security group created for Amazon EFS mount target> \
--protocol tcp \
--port 2049 \
--source-group <ID of the security group created for the EKS cluster> \
--region <REGION>
  4. Add the efs-provisioner Helm chart repository by running:
helm repo add efs-provisioner https://charts.helm.sh/stable 
  5. Deploy an efs-provisioner using the following command:
helm install my-efs-provisioner \
efs-provisioner/efs-provisioner \
--set efsProvisioner.efsFileSystemId=<EFS ID> \
--set efsProvisioner.awsRegion=<REGION>
  6. You should now have an efs-provisioner pod running:
 kubectl get pods
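
The chart also creates a storage class, named aws-efs by default in this chart, which is the name we reference when deploying HPCC Systems below. You can confirm it exists with:

 kubectl get storageclass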

Deploying the HPCC Systems platform cluster

Use the steps below to fetch, modify and deploy the HPCC Systems charts.

  1. First, add the HPCC Systems Helm chart repository using the following command:
 helm repo add hpcc https://hpcc-systems.github.io/helm-chart/
  2. Deploy an HPCC Systems platform with the following command:
helm install mycluster hpcc/hpcc \
--set global.image.version=latest \
--set storage.dllStorage.storageClass=aws-efs \
--set storage.daliStorage.storageClass=aws-efs \
--set storage.dataStorage.storageClass=aws-efs

If successful, you should see an output like this:

NAME: mycluster
LAST DEPLOYED: Tue Sep  8 23:04:58 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
  3. Now validate your deployment. At this point, your HPCC Systems pods should be running. To verify this, use the following commands:
kubectl get pods
kubectl get pv
kubectl get pvc

Finally, you can get the ESP FQDN and check ECL Watch as follows:

 kubectl get services | grep eclwatch | awk '{print$4}'

You should get an output like this:

 a312c7c7d80af43a290dda74d205ebcf-1061652479.us-east-1.elb.amazonaws.com

Now, open a browser and go to:

 http://a312c7c7d80af43a290dda74d205ebcf-1061652479.us-east-1.elb.amazonaws.com:8010/

If everything is working as expected, the ECL Watch landing page will be displayed.
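
You can also probe the endpoint from the shell before opening a browser (the hostname here is the example output from above):

 curl -I http://a312c7c7d80af43a290dda74d205ebcf-1061652479.us-east-1.elb.amazonaws.com:8010/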

Deleting the HPCC Systems Platform Cluster and EFS Persistent Volumes

To delete the HPCC Systems deployment, use the following command:

 helm uninstall mycluster

To delete the EFS provisioner, use the following command:

 helm uninstall my-efs-provisioner

Note: EFS Persistent Volumes may still exist. You can either re-use them or delete them using the following command:

 kubectl delete pv <pv name>

Alternatively, to delete all persistent volumes, use the following command:

 kubectl delete pv --all

To delete the EFS-CSI, use the following command:

 kubectl delete -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/ecr/?ref=release-1.0"

Deleting the EKS Cluster

To delete the EKS cluster, use the following command:

 eksctl delete cluster <cluster name>

Note: Sometimes certain resources may fail to delete. You can access the CloudFormation service from the AWS console to clean up the remaining stacks.

Deleting the EFS

To delete the mount targets, use the following command for each mount target:

 aws efs delete-mount-target --mount-target-id <mount target ID>

To delete the EFS, use the following command:

 aws efs delete-file-system --file-system-id <EFS ID>