HPCC Systems Virtual Community Day Summit 2021

Expedition Tech - 2021 Community Day Visual

The 2021 HPCC Systems Virtual Community Day Summit was another busy and eventful day providing a large number of presentations covering a wide variety of topics. We also announced our 2021 Community Award winners including the Poster Contest Winners. Find out more about the 2021 HPCC Systems Poster Contest, including details of all 18 entries, our judges and, of course, the winners.

While our Community Day Event takes place on a single day, over the following three days we also provided a series of one-hour remote ECL training workshops, which are featured in more detail in our blog Cruising the ML World with HPCC Systems.

We also provided an Expo Hall where attendees could find out more about some of the featured topic areas, with the opportunity to set up meetings or chat with people who are involved in delivering HPCC Systems related features, enhancements, our Academic Program and other community related collaborations. Here is an example of how that looked:

Expo Hall Visual - Come to The Cloud

Our Poster Hall featured all 18 posters entered into this competition. Our Community Choice Award poster winner was selected by registered event attendees, who viewed all the posters and resources and then voted for their top three favourite posters. 

Poster Hall Visual

If you were not able to make it on the day, this is a great opportunity to look through the entire list of presentations and watch recordings of those that are of specific interest to you.

We started and ended the day with plenary sessions, and five tracks ran concurrently throughout the day. So if you did attend and had to make some difficult choices about which sessions to join, use the links provided to catch up on any you missed or rewatch those presentations that particularly piqued your interest. The full catalog from the start is available on the HPCC Systems YouTube Channel.

If you have questions for our colleagues or open source community, why not post in our Community Forums? Simply pick the one that best matches the nature of your query.

Plenary Sessions

We kicked off our 8th annual HPCC Systems Community Virtual Summit with keynotes from top industry leaders and technologists including Microsoft, BitPay and DataSeers.

In our closing plenary sessions, find out about the winners of our community awards, hear a panel discussion about preserving the symbiotic relationship between industry and academia, and listen to the final closing comments from Flavio Villanustre.

Data Lake

In this track, presenters shared efficient and secure ways of handling and analysing data as well as providing examples of using tools and extensions based on their own experience.

  • Data Visualisation with RealBI
    Dan Camper and Mahdi Kashani
    LexisNexis Risk Solutions Group

    RealBI is a new HPCC Systems business intelligence tool used to empower HPCC Systems developers to shape and visualize their data in real time, regardless of the size of that data. RealBI saves users time and cost by communicating directly with HPCC Systems clusters. This eliminates the need to further secure or transport the data since it remains entirely within the cluster. RealBI gives users direct access to logical files and ROXIE queries. It also enables users to write and execute custom ECL scripts from within the application if that is desired. Users don’t need programming skills to use RealBI. Charts, filters, sorting, and many more options are all available with a click of the mouse.
  • Data Lake Curation and Governance with Tombolo
    Jerry Jacob and Roger Dev
    LexisNexis Risk Solutions Group

    It is easy for a Data Lake to grow out of control if appropriate measures are not put in place. When this happens, Data Engineers’ productivity can suffer, resulting in delays in customer commitments. A Data Lake can become a Data Swamp suddenly and without warning. The critical threshold is reached when the complexity of the Data Lake exceeds the capability of key personnel to hold the pattern of the Data Lake in their head. The goal of Tombolo, a Data Lake Curation tool, is to prevent such an event and allow the data lake to continue evolving rapidly as its complexity increases and as more personnel begin to participate. Tombolo provides the central operating environment for a Data Lake. The Tombolo Data Lake Curation System 1.0 is the first open-source Data Lake Curation system for the HPCC Systems Platform. It allows documentation to be created alongside the data and analyses, providing a roadmap into all aspects (assets) of the Data Lake: Data Files, Data Providers and Consumers, Data Ingestion and Analytics, and User Queries. Its global find facility allows users to rapidly locate any asset, or browse hierarchically to get the lay of the land.
  • Design Considerations for Migrating Your HPCC Systems Data Lake to the Cloud
    Krishna Turlapathi and Michael Gardner
    LexisNexis Risk Solutions Group

    During this session, we share lessons learned and design best practices through our own cloud migration experience. The beginning of this presentation includes a simple installation of a cluster on Azure using the community helm charts. During this demo we cover topics such as how the HPCC Systems platform differs between the Kubernetes cluster that we are deploying and the bare metal installations with which community members may be more familiar. We will dive into helm for HPCC Systems, the value of .yaml files and a few different ways that the cluster can be configured, and explain storage in the cloud compared to bare metal. We will then talk about ROXIE and Thor usage in the cloud. Krishna covers some details about getting query lists, suspended queries, and doing package file deployments. Michael expands on basic security features that end users will want to enable in the cloud, including encryption in transit and at rest in a cloud environment such as Azure.
  • Terasort with HPCC Systems on Azure Kubernetes Service and High Performance Storage
    Shrikrishna Kose and Steve Griffith
    Microsoft

    This presentation provides a discussion of the challenges, AKS considerations and storage options, including a demo covering the setup and configuration of HPCC Systems on AKS with Blob NFS 3.0 and performing a Terasort.
  • Taming the Data Demon with the DataSeers HPCC Systems Appliance
    Gurjot Bandasha & Adwait Joshi
    DataSeers

    The core of any data solution lies in data management. What is needed is a solution that will integrate and coordinate compliance, reconciliation, fraud monitoring, and visualization. Hear from the DataSeers experts how they are helping companies in the FinTech and Banking industry to manage money, fight fraud and maintain compliance using a solution built from the ground up leveraging HPCC Systems.

Instructional Demos

In this track, our technical engineers demonstrate how to complete specific tasks for configuring and using the HPCC Systems platform.

Machine Learning

This track features presentations from ML experts and users about the latest machine learning libraries and algorithms now available for use either in or with the HPCC Systems Machine Learning Library.

  • Contributions to HPCC Systems – From Virtual Collaboration to Virtual Reality
    Dr G Shobha
    RV College of Engineering

    This talk focuses on the virtual collaborations between RV College of Engineering and LexisNexis Risk Solutions which produced recent contributions to the HPCC Systems Platform. These include plugins and extending Machine Learning bundles for HPCC Systems, followed by analysing the impact of skewed data distributions on most commonly used ECL operations. The talk concludes with case studies executed on HPCC Systems, including the implementation of a virtual reality application.
  • HSQL: An SQL-like Language for HPCC Systems
    Atreya Bain, RV College of Engineering
    Mahdi Kashani, LexisNexis Risk Solutions Group

    There is a steep learning curve to getting used to handling Big Data, especially in distributed systems, where the task of data processing is split amongst various nodes in clusters. HSQL is the new big-data query language of HPCC Systems and is an innovative and open-source solution to let users process their data at any scale. It is designed to work in conjunction with ECL, which is the primary programming language for HPCC Systems, and it should prove itself to be easy to work with and robust for general purpose analysis. Made to provide a compact and easy to comprehend SQL-like syntax for performing visualizations, general data analysis and training of Machine Learning models, HSQL allows a modular structure for such programs and can easily integrate with the VS Code IDE. In this presentation, learn why HSQL is important and how it adds more value for HPCC Systems users, explore its syntax, and see a couple of examples on different datasets along with its installation and setup instructions.
  • New Advancements to Logistic Regression and the ML Library
    Lili Xu
    LexisNexis Risk Solutions Group

    Logistic Regression is one of the most important analytic tools in the social and natural sciences, with applications in areas such as natural language processing and image recognition. One of our Machine Learning advancements is to renovate the current HPCC Systems Logistic Regression bundle and add the ability to handle both binary and multi-class prediction tasks. Another advancement is to improve the performance and remove the bottlenecks of the Preprocessing bundle. The improved version is more scalable and more efficient for Big Data preprocessing tasks.
  • The Causality Analytics Toolkit for HPCC Systems
    Roger Dev
    LexisNexis Risk Solutions Group

    Causal Reasoning is at the heart of most human thought and action, yet has only recently been formalized as a mathematical and scientific field of study. It is hard to conceive of achieving a true AI without such a capability. Although the science of Causality has not advanced to the threshold of AI, it can unlock capabilities that are beyond the realm of statistical observation. Current Machine Learning methods assess observational patterns, and learn to replicate the results of patterns previously detected. They make no effort to disentangle true causal effects from observed correlation. They lack the ability to respond to changes in the scenarios that generated the data, or to predict the effect of new actions on the outcome. Causal Science provides a path toward a deeper understanding of our data. It defines mechanisms that can separate causal influences from spurious correlation and infer causal effects from observational data. As these techniques evolve, they stand to revolutionize our understanding and uses of data. Causality 2021 is an HPCC Systems research and development program. The goal is to increase our understanding of the latest causal algorithms, assess and challenge the current state of the art, and develop a Causality Toolkit for the HPCC Systems Platform. This project encompasses all three levels of the “Ladder of Causality”: “Seeing”, “Doing”, and “Imagining”, as well as Causal Model Validation, and Causal Discovery.
  • The Forecast of COVID-19 Spread Risk at The County Level
    Murtadha Hssayeni
    Florida Atlantic University

    The early detection of the coronavirus disease 2019 (COVID-19) outbreak is important to save people’s lives and restart the economy quickly and safely. People’s social behavior, reflected in their mobility data, plays a major role in spreading the disease. Therefore, we used the daily mobility data aggregated at the county level alongside COVID-19 statistics and demographic information for short-term forecasting of COVID-19 outbreaks in the United States. The daily data are fed to a deep learning model based on Long Short-Term Memory (LSTM) to predict the accumulated number of COVID-19 cases in the next two weeks. A significant average correlation was achieved (r=0.83, p=0.005) between the model predicted and actual accumulated cases in the interval from August 1, 2020 until January 22, 2021. The model predictions had r > 0.7 for 87% of the counties across the United States. A lower correlation was reported for the counties with total cases of <1,000 during the test interval. The average mean absolute error (MAE) was 605.4 and decreased with a decrease in the total number of cases during the testing interval. The model was able to capture the effect of government responses on COVID-19 cases. Also, it was able to capture the effect of age demographics on the COVID-19 spread. It showed that the average daily cases decreased with a decrease in the retiree percentage and increased with an increase in the young percentage. Lessons learned from this study can help not only with managing the COVID-19 pandemic but also with early and effective management of possible future pandemics. The project used the HPCC Systems platform for collecting, hosting, and analyzing the data.
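
The COVID-19 forecasting abstract above reports two evaluation metrics: the Pearson correlation (r) between predicted and actual accumulated cases, and the mean absolute error (MAE). As a rough illustration of what those two numbers measure, here is a minimal Python sketch. It is not the code used in the study; the function names and the sample values are made up for illustration only.

```python
from math import sqrt


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_p * var_a)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and observations."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty, equal-length sequences")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical accumulated case counts for one county (illustrative values only).
actual_cases = [120, 150, 200, 260, 330]
predicted_cases = [110, 160, 190, 270, 320]

r = pearson_r(predicted_cases, actual_cases)
mae = mean_absolute_error(predicted_cases, actual_cases)
print(f"r = {r:.3f}, MAE = {mae:.1f}")
```

A high r means the predictions rise and fall with the actual counts, while MAE measures how far off they are in absolute case numbers. Because MAE is expressed in raw cases, it is naturally smaller for counties with fewer cases, which matches the trend the abstract describes.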

Platform Features

This track provides an opportunity to learn about the new features and enhancements in the latest HPCC Systems platform, including cloud native topics.

  • What’s New in HPCC Systems and the Cloud Native Roadmap
    Gavin Halliday
    LexisNexis Risk Solutions Group

    Gavin shares an update on the new features and enhancements included in the latest release with additional focus on data handling in our cloud native version.
  • What’s New in ECL Watch, IDEs and Visualization Framework
    Gordon Smith
    LexisNexis Risk Solutions Group

    Gordon provides an update on the latest features in our ECL related development tools, including “Modern” ECL Watch, Visualization updates and the VS Code ECL Extension.
  • Securing Your Cloud Native HPCC Systems with Service Mesh
    Manish Kumar Jaychand
    Infosys

    With the advent of Cloud and Kubernetes over the years, machines are no longer considered as attached to a data center. Machines are more ephemeral than ever before. The traditional architecture of the HPCC Systems environment harnessed the physical storage of each node and that in turn gave certain performance benefits. But with cloud, it is no longer necessary to have a fixed machine for a process. The true power of cloud can be harnessed only when we treat machines as ephemeral. Therefore, a Thor worker node which is always on in a traditional HPCC Systems environment is spun up only when it is required in a cloud environment. With the latest cloud native version of HPCC Systems, we now have the flexibility to spin up the clusters only when required. In this session, we will cover how the latest cloud native platform is different from the bare metal version, explain service mesh and how it fits into the HPCC Systems scheme of things, and compare the Istio and Linkerd service meshes.
  • Cloud Security & Authentication in HPCC Systems
    Russ Whitehead, Tony Fishbeck & Mark Kelly
    LexisNexis Risk Solutions Group

    This talk discusses some current and future technology enhancements around HPCC Systems platform security, with a primary focus on cloud deployments. This session includes a look at support for mutual transport layer security between internal components within an HPCC Systems environment, external facing TLS for securing access into the HPCC Systems environment, using cert-manager to generate TLS certificates for HPCC Systems services and components, installing externally created TLS certificates, secrets management, and a look at our plans for future support of OAuth2 based authentication and authorization, with an initial focus on support for OAuth2 integration with Azure Active Directory services.
  • ROXIE Troubleshooting
    Mark Kelly
    LexisNexis Risk Solutions Group

    ROXIE services on cloud/Troubleshooting: What changes will need to occur in the ROXIE code to run on the cloud native platform?

Proven Use Cases

Hear success stories on how HPCC Systems is being used in proven solutions both in industry and academic research projects.

  • Deploying Digital Human Readers Leveraging HPCC Systems
    David de Hilster
    LexisNexis Risk Solutions Group

    With the newly launched NLP-Plugin for HPCC Systems and VSCode NLP Language Extension, the community now has the ability to incorporate human-like “digital readers” into HPCC Systems to mine information from free text that has, up until now, been impossible to extract. Future projects will be discussed, including reading radiology reports, business reports, and real estate documents, the latter of which could open new markets across the industry. It is important for everyone to understand this new technology in order to spot potential applications for extracting unmined data that, until now, was impossible to obtain. Sharing our own use case, the end goal is to create an NLP Center of Excellence that will serve the entire company with digital readers, first in English and then in other languages, to open new streams of revenue.
  • HPCC Systems Thor Monitor – Using Workunit Services and Power BI to Monitor Thor Activity
    Jessica Skaggs
    LexisNexis Risk Solutions Group

    The ECL Workunit Services standard library functions can be used to capture details about workunits running on Thor including processing time, errors, current state, and more. Capturing these details allows for monitoring, trending, error analysis, degradation, and other data points that can help improve the efficiency of your Thor environments. We will look at how to use this information to monitor the system with visualizations in Power BI.
  • Cooperative actions between University of São Paulo and LexisNexis Risk Solutions
    Renato de Oliveira Moraes
    University of São Paulo

    Professor Renato discusses the successful joint initiatives between the University of São Paulo (USP) and LexisNexis Risk Solutions in Brazil for leveraging HPCC Systems for teaching and learning, research and extension activities in academia, including recent machine learning projects.
  • Processing Student Image Data with Kubernetes and HPCC Systems GNN on the Cloud
    Carina Wang
    American Heritage School

    In order to foster a safe learning environment, measures to bolster campus security have emerged as a top priority around the world. In this session, Carina shares how HPCC Systems was leveraged to process student images with Kubernetes running on the Cloud Native Platform while utilizing the Generalized Neural Network (GNN) bundle for image classification. The result is a trained model which can be implemented on the autonomous security robot we built to help campus security personnel identify visitors, students, and staff.
  • Athlete 360: Leveraging HPCC Systems and RealBI for Athlete Wellness and Performance
    Christopher Connelly
    North Carolina State University

    There is a lot that plays into an athlete being able to perform at their best when it matters most. Not only are there physical demands, but also factors that come from outside of their sport that affect their wellbeing and readiness to perform. In team sports, there are many external variables that cannot be controlled, which makes the process of gauging performance of individual athletes difficult. The better the understanding of what an athlete does and how their body responds, the better we can support them to be at their best. Within collegiate athletics, and sports in general, there is a struggle to be able to interpret data from different streams together in a single report. Furthermore, streamlined data collection can further aid our understanding of what an athlete does and how their body responds. This involves data from all aspects of an athlete’s day including wellness questionnaires, practice training loads, weight room training loads, and weight room assessments of strength, power, and fatigue. In the past we have shown the impact of using HPCC Systems with the NC State Men’s soccer team. Here you will see some solutions using HPCC Systems and RealBI to provide insight from data collected with the NC State Women’s basketball team as well as how this system can serve not only the Strength and Conditioning department, but the athletics department as a whole.

Community Award Winners 2021

We work with many talented people whose contributions are highly valued by our community members. Our award winners are also presented with digital badges for displaying on their social media profiles.

2021 Community Recognition Award

Image showing the Community Recognition Badge

This award recognises external community members for their innovative use of HPCC Systems in their research, solutions and open source projects and also for their contributions to the HPCC Systems open source community. 

Image showing Robert Kennedy

Congratulations to Robert Kennedy
PhD Candidate
Florida Atlantic University

Robert Kennedy has completed three HPCC Systems Internships focusing on Deep Learning and was one of the first to use HPCC Systems with TensorFlow. He has also contributed a bundle, allowing ECL developers to build, train and consume neural networks using GPU Acceleration and recently expanded on this work, adding GPU acceleration to our GNN bundle. He entered our poster contest three times, was a winner every time and was a judge for our 2021 Poster Contest. He has also spoken at our Tech Talks and Community Day event multiple times. Links to all the resources Robert has contributed over the years are available on our HPCC Systems Intern Contributions wiki page.

2021 David Kan Ambassador Award

Image showing the David Kan Ambassador Award Badge

This award recognises RELX and LexisNexis Risk Solutions Group colleagues who have significantly promoted and contributed to the growth of our community, serving as strong supporters, proud evangelists, subject matter experts and champions.

In 2016, this award was named in memory of David Kan, one of the first and very dedicated HPCC Systems Ambassadors.

Image showing Allan Wrobel

Congratulations to Allan Wrobel
Consulting Software Engineer
LexisNexis Risk Solutions Group

Allan Wrobel is a longstanding user of HPCC Systems and over the years has contributed greatly to our open source platform and community. As well as providing feature and enhancement ideas using our Community Issue Tracker, he has also helped to locate and diagnose issues, leading to fixes which have benefited many of our users. He has become a regular feature of our community outreach, speaking at our first ever Tech Talk Series in 2017 and contributing more presentations over the years, as well as taking part in our 5 Questions With Series. Allan’s YouTube channel features ECL Tip and Tricks, providing short videos tackling specific tasks such as using macros and various ECL functions. Many users have found Allan’s YouTube Channel a valuable resource for practical guidance on using the ECL Language.

Keep in Touch and Join us for HPCC Systems Community Day 2022

Thanks to all our speakers who gave their time to share their knowledge and experience of using HPCC Systems. Our Community Day Summit always provides a huge variety of learning opportunities as well as inspiring stories of achievements and successes by our colleagues and open source community users. Thanks also to our LexisNexis Risk Solutions colleagues, community collaborators and users who support the HPCC Systems Open Source Project.

There are a number of ways you can keep in touch with what is going on in the HPCC Systems Open Source Community. Here are just a few:

Before we know it, the 2022 conference season will be upon us. We wish you all a productive year ahead and look forward to hearing about and sharing your success stories at HPCC Systems Community Day 2022.