Massive computation workflows run on High Performance Computing Cluster (HPCC) Systems environments that require significant resources, but those resources are needed only while processing is underway. A commercial cloud service is well suited to a short-term need for computing capacity without a long-term commitment. It gives the user access to computing services, such as servers, storage, networking, and software, over the internet (“the cloud”). In this blog, we discuss how HPCC Systems was deployed on the Microsoft Azure cloud computing platform to test custom data.
Yash Mishra, a Graduate Research Assistant at Clemson University, introduced his HPCC Systems research project on “Deploying HPCC Systems on Microsoft Azure,” at HPCC Systems Tech Talk 31. The full recording of Yash’s Tech Talk is available on YouTube.
The goal of Yash’s research project is to provision HPCC Systems on Microsoft Azure, test a candidate data-intensive workload, and identify the best configuration options for that workload. This project is the latest in an ongoing effort to identify different configuration options for deploying HPCC Systems on a commercial cloud platform.
In this blog, we discuss:
- The need for HPCC Systems on a cloud computing platform
- Related Work
- Azure deployment strategy
- Architectural design choices
- Azure portal vs ARM templates
- Initial architecture
- Expected outcomes
Let’s begin by discussing the need for HPCC Systems on a cloud computing platform.
Need for HPCC Systems on the Cloud
For massive computation workflows, researchers and scientists need:
- Distributed computing architecture
- Resources only necessary at the time of actual processing
- Different options to provision resources while saving costs
- The ability to utilize scalability on demand
- The capacity to instantly create, manage, and terminate resources
The cloud provides services to meet these requirements, and provides a platform to effectively utilize resources safely, efficiently, and without incurring significant cost.
HPCC Systems currently offers an Instant Cloud for Amazon Web Services (AWS) that allows the user to:
- Provision and deploy Thor and ROXIE clusters on demand
- Easily monitor running instances
- Use different region options to deploy
- Terminate clusters with a single click
The accompanying guide provides details and guidance on running an HPCC Systems® Platform inside Amazon Web Services (AWS) Elastic Compute Cloud (EC2) using the Instant Cloud for AWS page.
Azure Deployment Strategy
HPCC Systems was deployed on the Microsoft Azure cloud computing platform in two phases:
1. Configuring a single-node HPCC Systems environment
2. Configuring multiple instances
The steps to configure HPCC Systems include:
- Identifying the type of compute instance based on cost and performance parameters
- Creating an HPCC Systems VM image
– Choosing an operating system
– Installing the appropriate HPCC software and version
– Generating a base image that has HPCC Systems software
- Creating a resource group (this hosts all associated Azure resources in one place)
- Creating a virtual network for the HPCC Systems environment
- Creating network security groups to be associated with the VMs
- Creating a virtual network interface card
- Creating VMs from the base image
- Identifying mount points
- Creating and mounting a disk drive
- Configuring data and log files on the mounted disk drive
- Creating symbolic links to the data and log files
- Cleaning log files
- Starting the cluster
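The disk-related steps near the end of this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts used in the project: the function name `relocate_to_mounted_disk`, the mount point, and the `HPCCSystems` data and log directory names are placeholders, and the demo runs inside a temporary directory rather than on a real VM.

```python
import shutil
import tempfile
from pathlib import Path


def relocate_to_mounted_disk(default_path: Path, target: Path) -> Path:
    """Store a directory on the mounted disk and leave a symbolic link behind."""
    target.mkdir(parents=True, exist_ok=True)
    if default_path.is_symlink():
        return target  # already linked on an earlier run
    if default_path.is_dir():
        # Move existing files first; shutil.move also works across filesystems.
        for entry in default_path.iterdir():
            shutil.move(str(entry), str(target / entry.name))
        default_path.rmdir()
    default_path.parent.mkdir(parents=True, exist_ok=True)
    default_path.symlink_to(target, target_is_directory=True)
    return target


if __name__ == "__main__":
    # A temporary sandbox stands in for the VM's filesystem, so nothing real is touched.
    with tempfile.TemporaryDirectory() as sandbox:
        root = Path(sandbox)
        mount_point = root / "mnt" / "hpcc_disk"         # hypothetical mount point
        log_dir = root / "var" / "log" / "HPCCSystems"   # assumed log location
        data_dir = root / "var" / "lib" / "HPCCSystems"  # assumed data location
        log_dir.mkdir(parents=True)
        (log_dir / "old.log").write_text("stale entry\n")

        relocate_to_mounted_disk(log_dir, mount_point / "log")
        relocate_to_mounted_disk(data_dir, mount_point / "data")

        # "Cleaning log files": remove leftovers before the cluster starts.
        for stale in (mount_point / "log").glob("*.log"):
            stale.unlink()
        print(log_dir, "->", log_dir.resolve())
```

Keeping the default paths as links means the platform configuration does not need to know that the files actually live on the attached disk.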
Architectural Design Choices
On a cloud platform, the user can try various configurations to determine the best architecture for a specific workload.
To design the architecture for the cloud-based deployment of HPCC Systems in Microsoft Azure, the following were considered:
Compute (hosting model for the computing resources on which applications run)
- A-series: Best suited for entry-level workloads
- Bs-series: Economical burstable VMs
- D-series: General-purpose compute
The choice of virtual machine depends on workload requirements such as the type of workload, cost constraints, and time.
VM size (CPU, memory, storage)
Some workloads require more storage with less memory, while others require less storage with more memory. Size the VMs according to the data you plan to test with. High storage and memory increase the cost of deployment, and users have various configuration options for keeping costs as low as possible while getting the maximum performance. It is a good idea to run a quick cost analysis of storage and memory using Azure’s Pricing Calculator to identify the most cost-effective configuration before deploying it.
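A quick cost analysis of this kind can be sketched as a small comparison script. Everything below is a hypothetical illustration: the `ClusterOption` class, the node counts, and especially the hourly prices are placeholder values, not real Azure prices. Actual figures should come from the Azure Pricing Calculator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClusterOption:
    label: str
    node_count: int
    memory_gib_per_node: int
    disk_gib_per_node: int
    hourly_price_per_node: float  # placeholder value, NOT a real Azure price

    @property
    def total_memory_gib(self) -> int:
        return self.node_count * self.memory_gib_per_node

    @property
    def total_disk_gib(self) -> int:
        return self.node_count * self.disk_gib_per_node

    def cost(self, hours: float) -> float:
        return self.node_count * self.hourly_price_per_node * hours


def cheapest_option(options, min_memory_gib, min_disk_gib, hours) -> Optional[ClusterOption]:
    """Return the lowest-cost option that meets both capacity requirements."""
    eligible = [
        option for option in options
        if option.total_memory_gib >= min_memory_gib
        and option.total_disk_gib >= min_disk_gib
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda option: option.cost(hours))


if __name__ == "__main__":
    options = [
        ClusterOption("many small nodes", 16, 8, 128, 0.10),
        ClusterOption("few large nodes", 4, 32, 256, 0.45),
    ]
    best = cheapest_option(options, min_memory_gib=128, min_disk_gib=1000, hours=10)
    for option in options:
        print(option.label, option.total_memory_gib, option.total_disk_gib, option.cost(10))
    print("cheapest eligible option:", best.label if best else "none")
```

A comparison like this only covers price and capacity. Which layout actually runs a given workload faster still has to be measured on the deployed cluster.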
Storage
- Blob: Massively scalable object storage for unstructured data
- Data Lake Storage: Limitless storage for analytics data (scalable)
- Distributed File System
- Latency considerations
The region where the cluster is deployed affects both performance and pricing. The closer the region is to the user, the better the performance. Costs for various services also vary by region. Microsoft Azure has a pricing calculator to help estimate cost.
HPCC Systems Resources to Map on Azure
The HPCC Systems resources to map onto Azure include:
- Thor cluster
- ROXIE cluster
- Dali | DFU Server | Landing Zone | Drop Zone | ECL Agent | ECL Server | ESP server | Sasha
Azure Portal vs ARM (Azure Resource Manager) Templates
The Azure Portal is a web-based, unified console that allows the user to build, manage, and monitor complex cloud deployments. The Azure Portal can be used to configure HPCC Systems.
The Azure Portal:
- Is ideal for small clusters
- May involve significant manual tasks if provisioning large scale clusters
- Provides easy monitoring of deployed resources
As the cluster size increases, choosing configuration options for each VM becomes more tedious and time consuming. To automate this, a better approach is to use Azure Resource Manager (ARM) templates together with automated configuration deployment and management tools such as Ansible.
ARM templates offer:
- Easy orchestration of resources
- A resource manager that converts the template into REST API operations
- Orchestrated deployment of interdependent resources, so they are created in the correct order
- Parallel deployment of resources where possible, so deployments finish faster than serial deployments
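To make the last two points concrete, here is a small sketch of dependency-ordered deployment with parallel batches, using Python's standard `graphlib` module (Python 3.9 or later). It is not how Resource Manager is implemented. The resource names and their dependencies are assumptions chosen to mirror the deployment steps listed earlier in this post.

```python
from graphlib import TopologicalSorter


def deployment_batches(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group resources into batches; items within one batch could deploy in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    # Hypothetical dependency map: each resource lists what must exist before it.
    depends_on = {
        "virtual_network": set(),
        "network_security_group": set(),
        "data_disk": set(),
        "network_interface": {"virtual_network", "network_security_group"},
        "virtual_machine": {"network_interface", "data_disk"},
    }
    for step, batch in enumerate(deployment_batches(depends_on), start=1):
        print(f"batch {step}: {', '.join(batch)}")
```

In this sketch the virtual network, network security group, and data disk have no dependencies, so they land in the same first batch, while the VM waits until its network interface and disk exist.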
ARM Template Example
In this example, Resource Manager converts the template into REST API operations. Instead of working through the Azure Portal by hand and then manually configuring resources and setting their order, the tasks are automated.
The resources on the left side are configured as a simple JSON template for a storage account. This allows the user to select the location and the properties associated with the resources.
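For readers who have not seen one, the sketch below builds a minimal storage-account template as a Python dictionary and prints it as JSON. It is an illustrative reconstruction rather than the exact template from the talk: the function name, account name, region, SKU, and API version are assumptions chosen for the example.

```python
import json


def storage_account_template(account_name: str, location: str) -> dict:
    """Build a minimal ARM template that declares a single storage account."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2019-06-01",  # assumed API version for this example
                "name": account_name,
                "location": location,
                "sku": {"name": "Standard_LRS"},  # assumed SKU
                "kind": "StorageV2",
                "properties": {"supportsHttpsTrafficOnly": True},
            }
        ],
    }


if __name__ == "__main__":
    # "hpccdemostorage" and "eastus" are placeholder values.
    template = storage_account_template("hpccdemostorage", "eastus")
    print(json.dumps(template, indent=2))
```

The printed JSON has the declarative shape that Resource Manager consumes: the template states what should exist, and Resource Manager works out the API calls needed to create it.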
Ansible is an open-source product that automates cloud provisioning, configuration management, and application deployments. Using Ansible, you can provision virtual machines, containers, networks, and complete cloud infrastructures. It also allows you to automate the deployment and configuration of resources in your environment.
The integration of Ansible with ARM templates provided the flexibility to configure the HPCC Systems cluster to match those resources.
Initial Architecture
The initial architecture for this project places everything within a private network, including the respective Thor and ROXIE components.
Expected Outcomes
The overall aim of this research is to:
- Configure a scalable HPCC Systems environment on the Azure cloud
- Identify the best configuration for an Azure deployment, given a workload
- Answer the following research questions:
1. Should an HPCC Systems environment on Azure have a large number of nodes with small memory or a small number of nodes with large memory?
2. What type of storage would be best to handle massive workload results?
3. What could be an optimal auto-scaling setting for Thor and ROXIE instances?
Future endeavors include work on Kubernetes support for HPCC Systems on Azure using Azure Kubernetes Service (AKS), and exploration of containerized environments.
About Yash Mishra
Yash Mishra is a graduate student in Computer Science at Clemson University. He is a Research Assistant at the Data Intensive Computing Ecosystems (DICE) Lab at Clemson University, working under the supervision of Dr. Amy Apon. He has a growing interest in cloud computing and in architecting cloud-based solutions for domain-specific workloads. He was introduced to HPCC Systems in the Cloud Computing Architecture class at Clemson and has been involved in identifying different configuration options for deploying HPCC Systems on a commercial cloud.
A special thank you to Yash Mishra for his amazing presentation, “Deploying HPCC Systems on Microsoft Azure.” A special acknowledgement also goes to Dr. Amy Apon, Professor and Chair of the Division of Computer Science, School of Computing, at Clemson University, for her guidance during this project, and to Dan Camper, Sr. Architect at LexisNexis Risk Solutions, for his mentorship.
Templates overview – Azure Resource Manager. https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/overview
Azure Kubernetes Service (AKS) | Microsoft Azure. https://azure.microsoft.com/en-us/services/kubernetes-service/
Using Ansible with Azure. https://docs.microsoft.com/en-us/azure/developer/ansible/overview