Getting up and running with HPCC Systems: Part 1
If you are a first-time user who is curious to learn what HPCC Systems can do, or have been using our bare metal version for some time and would like to test drive our Cloud Native Platform, why not take a look using our play cluster? This first blog in a series of four provides a step-by-step guide to using our play cluster with the ECL IDE or VS Code, as well as ECL Watch.
This blog provides a high-level overview designed to help new users get started with HPCC Systems and ECL (Enterprise Control Language). If you want to jump right in and set up your own HPCC Systems cloud native cluster, use our Cloud Native Platform wiki page, where you will find everything you need to get set up.
Other blogs in this series are coming soon, focusing on:
- Importing and working with new data
- Setting up your own local cluster via Docker Desktop and Kubernetes
- Comparing the ECL IDE with the ECL Extension for Visual Studio Code
What is HPCC Systems?
In case you have not heard of HPCC Systems or ECL, here is a little bit of background.
Development on HPCC Systems began in late 1999. The aim was to provide a robust and easy-to-use end-to-end data lake management platform, providing everything you need from data ingestion and data processing right through to data delivery to the end user. A more complete overview of the technology can be found here.
In 2011, HPCC Systems became an open-source platform and is now available for anyone to use for their own data analytics solutions (see us celebrating 10 years of open source in 2021 here). While it can provide you with everything you need for your end-to-end solutions, HPCC Systems is also flexible enough that you can choose to use it only to ingest and prepare your data for use elsewhere.
HPCC Systems can take vast amounts of data, break it apart, sort it, use the pertinent data to make accurate, significant connections, and then deliver the cleaned data to clients for their use. Our data libraries house petabytes of information, ready to be referenced at a moment’s notice by major insurance and legal entities running information checks. Our customers run HPCC Systems in data warehouses across the country, which house both the data and our bare metal clusters. However, relying only on on-site facilities has some vulnerabilities.
Protecting massive amounts of sensitive data from possible outages is a priority for every major company. One way to do that is to shift from using only bare metal clusters to a cloud native or hybrid platform. This means the data can always be accessed, queried, and delivered to customers with very little delay. A cloud native platform also provides a cost-effective solution for businesses that would otherwise not be able to build a data center. And if the data needs of a smaller business suddenly grow beyond its capacity, a cloud native platform can scale quickly to meet them!
Moving to a cloud native platform reduces downtime if your data center experiences an outage. Depending on the amount of data you are processing, you may also see substantial cost savings from not having a physical data center to manage. Data can be accessed from across the network with built-in failsafes, so you will always be able to get the information you need.
If you would like to learn more about the history of ECL, please read through the White Paper, which explores how ECL works, how it tackles data problems, and why it is easier to use than other solutions.
“Enterprise Control Language (ECL) is the query and control language developed to manage the HPCC (High Performance Cluster Computing) and truly differentiates it from other technologies in its ability to easily and efficiently provide flexible data analysis on a massive scale.” -David Bayliss (SVP and Chief Data Scientist)
ECL is a robust, high-level analytical language that performs complex operations while remaining approachable for the user. If you want to learn more about how to use ECL, visit our online ECL training courses, which provide resources for everyone from beginners to advanced users.
Since the creation of ECL, code has primarily been written using the ECL IDE (Integrated Development Environment); however, in the past few years alternative IDEs such as Eclipse and Visual Studio Code have emerged. The ECL IDE remains the in-house editor originally designed for ECL.
An ECL Extension is also available for Visual Studio Code, an editor that supports a wide variety of languages.
Only one of these applications is required to work with ECL code and the play cluster, but each has its own pros and cons, which will be covered in the last part of this series. As a new user, I would recommend learning both, since at this point neither is superior to the other.
Let’s first look at installing the ECL IDE and connecting it to a sample cluster, followed by the same process using the ECL Extension for VS Code.
The ECL IDE
HPCC Systems was designed with the average computer in mind. Any modern computer with a stable internet connection can be used to try out our product. When you are ready to dip your toe in the water, visit the Download page on our website.
Note: The ECL IDE only runs on the Windows operating system. If you are using a Mac, you will need to download Visual Studio Code and install the ECL Extension. VS Code with the ECL Extension is covered in more depth below.
Download the ECL IDE and the Client Tools. The Client Tools contain the essential compiler and the engine that generates the work units. More information on using the ECL IDE and Client Tools is available here.
Scroll down until you see the download steps and follow the instructions on the page.
Once you have the IDE installed, you will need to open it and connect to a cluster. The first time you open the ECL IDE, it will look something like the image below. Fill in the fields as shown.
Note: Do not click OK once the fields are completed; the Preferences must also be completed using the button provided.
A few things to note if you are new to using the ECL IDE:
- Leave Configuration set to default for now. If in the future you need to connect with other clusters you can add different configurations and easily switch between them.
- The Login ID will correspond to the work you do in ECL Watch. Typically, your first name followed by your last initial will be sufficient. If someone else working on the cluster happens to have the same name as you, I recommend using something more unique.
- A password is not necessary, so just enter the username you wish to work with. Click OK only after you have entered the server information using the Preferences button. (This only needs to be done the first time.)
- The Error Log window, located at the bottom right, can be closed once you are logged in by clicking the “x” in its top right-hand corner. The Syntax Errors window indicates any issues with your code.
Setting up the Server ID in Preferences
After you have entered your username, click the Preferences button.
The HPCC Systems play cluster is available for anyone to try out our platform and is designed to introduce new users to how the software works. It is strictly a sample of HPCC Systems, so the play cluster’s return times are not a good indicator of how powerful ECL is at full strength.
To access the play cluster using the ECL IDE, enter the following in the Server ID field and click OK:
Note: Make sure the server field does not include the prefix http://, which will not work. Sometimes, when using copy and paste, it auto-populates with http://play.HPCCSystems:8010.
You have already completed the next dialog, shown below, so click OK to continue:
Now the ECL IDE is connected to the play cluster. If you only wish to use the ECL IDE, skip the next section, which covers using the ECL Extension for VS Code.
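If you often paste server addresses copied from a browser, the cleanup described in the note is easy to automate. The Python sketch below is a hypothetical helper, not part of the ECL IDE or Client Tools: it assumes the only problem is a leading http:// or https:// (plus a stray trailing slash) and leaves everything else, including any port, untouched. Whether the Server ID field expects a port is not covered here, so check the value against the one shown in the dialog.

```python
def clean_server_id(value: str) -> str:
    """Strip a pasted URL scheme from a Server ID value (hypothetical helper).

    Assumption: the ECL IDE Server ID field must not start with http://
    or https://; any port or host text is left exactly as typed.
    """
    value = value.strip()
    # Remove the scheme case-insensitively, since pasted text may vary.
    for prefix in ("http://", "https://"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    # Drop a trailing slash left over from a copied browser address.
    return value.rstrip("/")


print(clean_server_id("http://play.HPCCSystems:8010"))
```

Running it on the auto-populated value from the note prints play.HPCCSystems:8010, which shows the prefix being removed while the rest of the text is preserved.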
Using the ECL Extension for VS Code
For those who wish to use VS Code to explore ECL, download it here and follow the setup wizard for the basic install.
Once you have Visual Studio Code open, you’ll need to add the ECL Extension by going to View on the top toolbar and selecting Extensions, or by using the Ctrl + Shift + X shortcut.
Then type ECL in the search bar, select the ECL Extension and install it.
Once you’ve installed the ECL Extension, you are ready to start working in the Explorer tab.
ECL Watch
ECL Watch is used to monitor what is happening on the clusters, but it is much more than that: it is a web-based query execution, monitoring, and file management tool, which includes an interface for importing and exporting files.
ECL Watch was recently updated, so any build prior to 8.6.4 will look quite different. The new modern version includes some major upgrades compared to the legacy version. A Transition Guide is available on the wiki to help walk you through the changes, and the main user documentation for ECL Watch is available here.
You are now officially connected to the play cluster! You can access ECL Watch here and explore the interface.
Alternatively, if you are a Windows PC user and want to run the HPCC Systems Platform, the simplest and most natural environment may be a Hyper-V virtual machine. Hyper-V is standard on many versions of Windows and tends to work better than most add-on virtualization systems. Please read Running HPCC Systems Platform on Microsoft Hyper-V, written by Lili Xu (Software Engineer III, LexisNexis Risk Solutions Group), for the full instructions.
Stay tuned for part 2 in this series, where we will use these tools in greater depth to sort freshly imported data sourced from the web.