The 2022 HPCC Systems Virtual Community Day Summit provided a three-hour workshop for community members who want to expand their knowledge of the HPCC Systems platform and ECL in three different areas:
- ETL with ECL
- Data Delivery with ROXIE
- Advanced ECL tips and tricks
The workshop takes participants through the new book Definitive HPCC Systems (Vol II): Data Transformation and Delivery, written by ECL expert Richard Taylor, and provides code examples of specific ECL techniques for accomplishing many relatively common tasks when working with huge amounts of data.
The workshop is split into three one-hour sessions, all of which were recorded, so you can take them in sequence to complete the full course.
Meet the Trainers
The workshops are presented by our trainers Bob Foreman and Hugo Watanuki.
Bob Foreman
Software Engineering Lead
LexisNexis Risk Solutions Group
Bob has worked with the HPCC Systems big data analytics platform and the ECL programming language for over 10 years and has been a technical trainer for over 25 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all our classroom and remote-based training.
Learn more about his experience of teaching ECL and HPCC Systems and how our training materials and lessons have evolved to meet the needs of our open source community, helping businesses solve real-world data problems.
Hugo Watanuki
Manager Community Tech Programs
LexisNexis Risk Solutions Group
Hugo has been supporting the development and delivery of training programs for the HPCC Systems platform in Brazil since 2019. He has worked for over 15 years in various technical roles in the IT industry with a focus on High Performance Computing and is currently responsible for the HPCC Systems internship program at LexisNexis Risk Solutions.
Learn more about how the Brazil team is engaging with universities that want to collaborate with industry experts to teach students big data analytics and data science skills and to engage in research projects.
Join a Workshop Session
The recordings of these sessions are available on the HPCC Systems YouTube Channel via the links shown below and are best completed in order, from Session 1 to Session 3.
If you want to exercise the code examples while watching the recordings, all the artifacts (such as ECL files, data source details and slide presentations) are available in the HPCC Systems Community Workshops GitHub Repository.
Please Note: To get the most out of these sessions, we highly recommend taking our Introduction to ECL – Part 1 and Part 2 courses beforehand. You will need basic familiarity with the ECL language, including query building and data handling, before you begin.
Session 1 – ETL with ECL
This workshop starts with an introduction to HPCC Systems and ECL core concepts, focusing specifically on common Big Data tasks such as raw data ingestion, profiling, hygiene, standardization and data export.
These tasks are presented in a logical and intuitive way, allowing you to learn and explore ECL features under scenarios that data engineers commonly face in the real world. The New York City Taxi & Limousine Commission public dataset containing taxi trip data sets the context for session 1, and you are invited to exercise tasks such as:
- Easily combining separate datasets during the ingestion process
- Efficiently exploring the data and discovering what’s what
- Optimizing strategies for data standardization and hygiene
- Understanding the different alternatives to get data to end-users
Session 2 – Data Delivery with ROXIE
In the second session of the workshop, the focus shifts to distilling useful information from the raw taxi trip data that was cleaned and standardized during the first session. ECL code examples and techniques show how to turn standardized taxi trip data into something that an end-user or customer would be willing to pay for: in this case, an online query service that returns the average fare amount, duration and distance for taxi trips for every combination of pickup and drop-off location, day of the week and hour of the day contained in the datasets. During session 2 you will be presented with topics such as:
- Understanding the ECL programming philosophy behind ROXIE
- Writing efficient code to extract the information you want from the data
- Optimizing data delivery with indexes and data dictionaries
- Easily testing and deploying the product to end-users
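As a rough illustration of the aggregation behind the query service described above, the sketch below groups a few made-up trip records by pickup location, drop-off location, day of the week and hour of the day, then averages fare, duration and distance per group. It is written in Python rather than ECL purely as a language-neutral illustration. It is not the workshop's code, and the field layout, function names and sample values are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up trip records (not taken from the real dataset):
# (pickup, dropoff, day_of_week, hour, fare, duration_minutes, distance_miles)
TRIPS = [
    (132, 48, "Mon", 8, 52.0, 45.0, 17.0),
    (132, 48, "Mon", 8, 48.0, 35.0, 16.0),
    (48, 132, "Fri", 17, 60.0, 50.0, 18.0),
]


def build_averages(trips):
    """Precompute average fare, duration and distance per key combination."""
    groups = defaultdict(list)
    for pickup, dropoff, day, hour, fare, duration, distance in trips:
        groups[(pickup, dropoff, day, hour)].append((fare, duration, distance))

    averages = {}
    for key, rows in groups.items():
        # Transpose the rows so each measure can be averaged on its own.
        fares, durations, distances = zip(*rows)
        averages[key] = (mean(fares), mean(durations), mean(distances))
    return averages


# Built once up front; each request is then a single keyed lookup.
AVERAGES = build_averages(TRIPS)


def lookup(pickup, dropoff, day, hour):
    """Return (avg_fare, avg_duration, avg_distance), or None if no trips match."""
    return AVERAGES.get((pickup, dropoff, day, hour))
```

Computing the averages once and answering each request with a keyed lookup is loosely analogous to the role the session assigns to indexes when delivering data to end-users.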
Session 3 – The ECL Cookbook
The ‘grand finale’ takes place in the third session of the workshop, where an ECL cookbook packed with tips and tricks sets the tone. The content of the ECL cookbook is a compilation of code examples reflecting the expertise Richard Taylor has acquired over more than two decades of supporting ECL developers worldwide. If you are an ECL developer (or planning to become one), you should enjoy this content and come away with a better grasp of ECL’s full capabilities. Some of the tips and tricks presented during this session include:
- Generating custom files and dealing with incremental file updates
- Understanding type casting and type transfer in ECL
- Strategies for dealing with strings, sets and dates
- Applying ECL to address common mathematical tasks
Other Training Opportunities
A full suite of online training courses is available on the HPCC Systems website, providing a range of learning opportunities.
Please Note: The advanced ECL and machine learning courses require the completion of our introductory ECL courses as a prerequisite.
ECL Core Classes
These online classes include our introductory and advanced ECL language classes, classes on the use of ROXIE queries and an applied course for those who want to extend their knowledge further:
- Introduction to ECL Part 1 – Concepts and Queries
- Introduction to ECL Part 2 – The Extract Transform and Load (ETL) Process
- Advanced ECL Part 1 – Working with Relational Data
- Advanced ECL Part 2 – Superfiles, working with XML and free form text parsing
- ROXIE ECL Part 1 – Indexes and Queries
- ROXIE ECL Part 2 – Complex Query Development
- Applied ECL – ECL Code Generation Tools
This online course explores the fundamentals of Machine Learning with ECL and leverages many of the supported open source Machine Learning bundles.
The Introduction to HPCC Systems for Managers provides basic familiarity with HPCC Systems and shows how the ECL language can be used to build powerful data queries.
This course provides a suite of sessions for those managing HPCC Systems environments. It starts with an architectural overview, routine maintenance and best practices to observe, then moves on to the specific needs of a Thor and/or ROXIE cluster.
Getting Help and Keeping in Touch
Many more learning opportunities are available on the HPCC Systems YouTube Channel, including presentations from our conferences on a variety of topics and use cases that may be relevant to your own project.
If you have questions or would like to connect with others working on HPCC Systems projects, use our Stack Overflow forum to post comments and questions.
If you are new to HPCC Systems, find out more about us here.