Our 9th annual HPCC Systems Community Day Summit was a day packed with content covering a variety of topics, with something for all members of our open source community. All sessions were recorded, so we’d like to give you this opportunity to catch up on any you may have missed, using the HPCC Systems YouTube Channel.
At this event, we announced our 2022 Community Award and Poster Contest winners in the Closing Plenary Session. We also ran a series of three Hands On Learning ECL Workshops which are available for you to watch and extend your knowledge of the ECL language.
This blog gives a guided tour of our Community Day presentations, providing you with everything you need to select the sessions you want to watch based on your specific areas of interest.
Join our opening plenary sessions to watch presentations featuring LexisNexis Risk Solutions Technology Leaders and Community Keynote Speakers.
- Flavio Villanustre – SVP Technology and CISO, LexisNexis Risk Solutions Group
Welcome, introductions and a reveal of the all new HPCC Systems ‘electric tech’ reboot
- Adwait Joshi – Chief Seer, DataSeers (Minute Marker: 3.32)
Learn about the successes and huge growth DataSeers has experienced in a very short space of time with the help of HPCC Systems
- Dr Shobha – RV College of Engineering, India (Minute Marker: 19.14)
The HPCC Systems Centre of Excellence on Cognitive Intelligent Systems for Sustainable Solutions was established by RVCE in June 2022. Dr Shobha shares how her faculty has leveraged HPCC Systems in the classroom for a number of applications on Big Data analysis.
- Rohan Maheshwari – Computer Science and Engineering Student, RV College of Engineering (Minute Marker: 28.49)
Rohan has been using HPCC Systems to investigate block data stored on blockchain to gain insight on transactions and underlying user behaviour to help find criminals using the bitcoin network for illicit activities.
- Gavin Halliday – Enterprise/Lead Architect, LexisNexis Risk Solutions Group (Minute Marker: 50.15)
Hear about the features that have been added to the HPCC Systems Platform to improve productivity and to support our journey to the cloud, as well as some changes you can expect to see in the near future.
Join our closing plenary sessions to find out how we have been taking part in Hackathons to attract young coders to engage with technology as a potential future career. We also announce the recipients of our 2022 Community Awards and Poster Contest winners, followed by closing comments from Flavio Villanustre.
- From Hack to Hire. How industry is using Hackathons to drive student interest in Big Data and recruit talent
Bahar Fardanian – Manager, Solutions Engineering, LexisNexis Risk Solutions Group
Tyler Menezes – Executive Director, CodeDay (Minute Marker: 2.11)
Most companies are familiar with the difficulty in hiring talented software engineers, particularly those experienced in Big Data. Bahar and Tyler discuss how they have been using Hackathons to drive student interest and build skills, while having fun along the way.
- 2022 Community Awards Ceremony
Trish McCall – Senior Director Program Management, LexisNexis Risk Solutions Group
Lorraine Chapman – Manager, Business Analyst, LexisNexis Risk Solutions Group (Minute Marker: 16.25)
Watch as we announce the recipients of the 2022 HPCC Systems Community Recognition and David Kan Ambassador awards. Find out who the judges selected to win the best poster awards in the categories of Data Analytics, Platform Enhancement and Use Case Research. We also reveal which poster presenter our Community Day attendees voted to be the winner of the 2022 Community Choice Award.
- Closing Comments
Flavio Villanustre, SVP Technology and CISO, LexisNexis Risk Solutions Group (Minute Marker: 35.33)
A review of the day’s events and thanks to our open source community for contributing to a wonderful day of quality content.
Machine Learning Sessions
These sessions include contributions to our Machine Learning Library, use cases of HPCC Systems and ML algorithms, and connectors that allow you to interact with HPCC Systems more effectively.
- Machine Learning and Analytics Update
Roger Dev – Senior Architect, LexisNexis Risk Solutions Group
This talk focuses on the latest developments in the Machine Learning and Analytics Library:
Gaussian Process Regression – A non-linear regression bundle, accelerated using Random Fourier Features. This is a flexible kernel-based regression method, with enhanced scalability.
HPCC Systems Causality Toolkit – Our official release of the Causality toolkit. Provides synthetic data generation and powerful probability analysis, as well as causal model analysis, and causal inference.
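To give a feel for the Random Fourier Features idea mentioned above, here is a minimal Python sketch. It is not the bundle's implementation, and the function names (`make_rff`, `rbf_kernel`) are hypothetical. It assumes a Gaussian (RBF) kernel and shows that the dot product of two randomised feature vectors approximates the exact kernel value, which is what lets a kernel method be replaced by a cheaper linear fit on a fixed number of features.

```python
import math
import random


def make_rff(dim, n_features, length_scale, seed=0):
    """Return a function mapping a point to its random Fourier features.

    Assumes an RBF kernel exp(-||x - y||^2 / (2 * length_scale^2)).
    """
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density (a Gaussian).
    weights = [
        [rng.gauss(0.0, 1.0 / length_scale) for _ in range(dim)]
        for _ in range(n_features)
    ]
    # Random phase offsets, uniform on [0, 2*pi).
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return features


def rbf_kernel(x, y, length_scale):
    """Exact RBF kernel value, used here only for comparison."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * length_scale ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    features = make_rff(dim=2, n_features=4000, length_scale=1.0, seed=0)
    x, y = (0.2, -0.1), (0.7, 0.4)
    print("approximate kernel:", dot(features(x), features(y)))
    print("exact kernel:      ", rbf_kernel(x, y, 1.0))
```

The approximation improves as `n_features` grows. The cost of a regression built on these features scales with the number of features rather than with the square of the number of training points, which is one common reason for using this technique.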
- Visual Analysis of Data Relationships (Minute Marker: 8.01)
Roger Dev – Senior Architect, LexisNexis Risk Solutions Group
The Joint Probability Space of a multi-variate dataset is a remarkably complicated object, encompassing everything that is knowable from the data alone. We use the powerful analytic and visualization capabilities of the HPCC Systems Causality Toolkit to examine the nature of probabilistic and causal relationships within datasets. We start by examining synthetic data with known relationships in order to recognize essential patterns. Then we move on to natural datasets to see how those patterns can be recognized, and how they differ from those observed in synthetic data.
- Analysis on Medical Images: Colorectal Cancer Diagnosis
Sarvesh Prabhu – 2022 HPCC Systems Intern and Lambert High School, Georgia USA
Sarvesh joined the HPCC Systems Intern Program in 2022 to complete this project which involves analysing images collected by a smart pill swallowed by patients who are experiencing gastro-intestinal symptoms that need more investigation. Getting a consistently accurate diagnosis of GI issues is a challenge and Sarvesh’s work aims to provide a solution to this problem. Using HPCC Systems and our Machine Learning Library, Sarvesh compared Neural Networks and Random Forest to discover which might provide the best training and inference approaches for pragmatic business use cases.
- Implementation of Local Outlier Factor Algorithm for Anomaly Detection (Minute Marker: 14.26)
Arya Adesh – 2022 HPCC Systems Intern and Bachelor of Computer Science and Engineering, RVCE, India
Arya Adesh joined the HPCC Systems Intern Program in 2022, producing an anomaly detection algorithm that is now available for you to use via our Machine Learning Library. It provides an unsupervised method that identifies anomalies without training a model, which can be very useful in cases where real time analysis is needed, such as bank fraud and data compromises. While other anomaly detection algorithms find global anomalies, they often do not find local outliers because they assume that the dataset has a uniform data distribution. LOF is suitable for uneven data distributions because it can identify both global outliers and local outliers.
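For readers new to Local Outlier Factor, the following is a small, self-contained Python sketch of the general algorithm as it is usually described. It is not the ECL implementation in the Machine Learning Library, and the function name `lof_scores` is hypothetical. It assumes there are no duplicate points and uses brute-force distances, so it is only suitable for tiny examples.

```python
import math


def lof_scores(points, k):
    """Return the Local Outlier Factor of each point (brute force).

    A score close to 1 means the point is about as dense as its neighbours.
    A score well above 1 means the point is sparser, i.e. a likely outlier.
    Assumes no duplicate points (which would give a zero distance).
    """
    n = len(points)
    k_distance = []   # distance from each point to its k-th nearest neighbour
    neighbours = []   # indices within that distance (ties included)
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        kd = ranked[k - 1][0]
        k_distance.append(kd)
        neighbours.append([j for d, j in ranked if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i from point j.
        return max(k_distance[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / len(neighbours[i])
        lrd.append(1.0 / mean_reach)

    # LOF: average neighbour density relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    for point, score in zip(data, lof_scores(data, k=2)):
        print(point, round(score, 2))
```

Because each score compares a point's density with that of its own neighbours, no model is trained and no single global threshold on distance is required, which is why the method copes with unevenly distributed data.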
Amila De Silva – Software Engineer III, LexisNexis Risk Solutions Group
PyHPCC is a Python package that allows users of HPCC Systems, particularly beginners, to interact with HPCC Systems quickly and easily without having to learn the ECL language. We are always looking for ways to diversify the capabilities of our platform and tools like PyHPCC are a great way to encourage users, particularly new users, to realise the potential of using HPCC Systems for their data analytics solutions.
Our Cloud Native platform is available and ready for you to use. These sessions provide you with information on getting up and running with some handy tips on recommended approaches and settings for your Cloud Native HPCC Systems clusters, by the developers who built the platform.
- HPCC Systems Platform Cloud Build and Deployment Pipeline
Godson Fortil – Software Engineer II, LexisNexis Risk Solutions Group
Michael Gardner – Software Engineer III, LexisNexis Risk Solutions Group
Xiaoming Wang – Senior Consulting Software Engineer, LexisNexis Risk Solutions Group
Join this session to learn about the full development pipeline: building the GitHub repository directly in Docker containers that are easily accessible to outside developers, optimising builds to take full advantage of ML, GNN and GNN+GPU stacks in cloud environments like Azure and, finally, deploying your builds onto your cloud infrastructure using HPCC Systems Terraform modules.
- Cost Savings on Cloud Native Systems based on ECL Metrics
Shamser Ahmed – Senior Consulting Software Engineer, LexisNexis Risk Solutions Group
Finding opportunities for cost savings starts with understanding where the costs are being incurred and the size of these costs. The HPCC Systems Cloud Native platform produces cost metrics that will be invaluable for developers to maximize cost savings. Find out about the cost information available through the Platform, how it may be used to minimize costs and the anticipated future enhancements planned for this feature.
- Enhancements in HPCC Systems Log Data Processing in the Cloud
Rodrigo Pastrana – Architect, LexisNexis Risk Solutions Group
Learn about the newly enhanced toolset designed to unlock the power of logging analytics for HPCC Systems containerized environments. Exciting new usability features allow HPCC Systems users and admins to plug-and-play powerful log processing platforms such as Elastic Stack and Azure Log Analytics.
This collection of presentations focuses on transitioning to our Cloud Native platform as an ECL developer, as well as providing a detailed look at some specific topics for Roxie users.
- Journey to the Cloud – What every ECL Developer should know and some Common Misconceptions about using HPCC Systems Cloud Native
Bob Foreman – Software Engineering Lead, LexisNexis Risk Solutions Group
Hugo Watanuki – Senior Software Engineer, LexisNexis Risk Solutions Group
The transition from a traditional bare metal environment to a containerized or cloud based HPCC Systems platform should be transparent. Once the platform is set up, configured, and properly aligned for the types of work being done, it should work the same as a bare metal installation. Learn what every ECL developer should know, discover answers to common questions and focus on some issues you may have overlooked, such as monitoring and cost control.
- ROXIE Migration tool
Harsh Dasai – Software Engineering Lead, LexisNexis Risk Solutions Group
Rajeev Rajvaidya – Consulting Software Engineer, LexisNexis Risk Solutions Group
Sathish Kumar Seenivasan – Senior Software Engineer, LexisNexis Risk Solutions Group
During any system or process migration, on any infrastructure (cloud or bare metal), the most important step is testing. Numerous processes need to be repeated until we achieve correct results. Testing and repeating these processes manually for a large project is tiresome and error prone. As a result, we have built the ROXIE Migration Tool (RMT) to assist with these tasks. It can be reused across any product and is not restricted to ROXIE migration alone. In essence, it can scale to any segment of regular ROXIE development. This tool not only reduces effort, but also reduces errors, helping to save revenue. With the help of RMT, repeated tasks can be accomplished in a few minutes, helping to improve overall lead time.
- Investigating ROXIE Queries
Krishna Turlapathi – Director Software Engineering, LexisNexis Risk Solutions Group
This talk focuses on ROXIE architecture and how to understand more about ROXIE query performance by looking at various fields in the ROXIE logs, including using some new stats that have recently been added. An example of collecting and analyzing stats for a completed query is provided. Some interesting information is shared about our own experiences with containers and an important ROXIE optimization about graph dependencies and how that affects when activities can start.
These sessions provide information about tools and tips to help you get the most out of your ECL Development.
- ECL Source Control with GIT
Greg Panagiotatos – Senior Software Engineer, LexisNexis Risk Solutions Group
For many years, HPCC Systems has used the eclccserver to compile ECL code from GitHub repositories, using the Githook mechanism. The latest version of the platform contains enhancements to improve the capabilities and make it easy to use GitHub. The new multiple repository feature allows teams to develop code in their own independent repositories but have versioned dependencies between those repositories. Come and learn how to use Git with HPCC Systems and see the recent changes to the platform that significantly improve support for compiling from the Git repositories.
- ECL Notebooks
Jim DeFabia – Senior Consulting Software Engineer, LexisNexis Risk Solutions Group
HPCC Systems now supports ECL Notebooks in VS-Code with the ECL Language Extension. ECL Notebooks are useful for interactive tutorials, demos, proofs-of-concept, and visualizations. ECL Notebooks allow you to create and share documents that contain narrative text and cells with “live” ECL code that other users can edit and run. ECL Notebooks are like Jupyter Notebooks, but they support ECL, and render and run inside of VS-Code (with the ECL extension). This presentation demonstrates how to create and use ECL Notebooks.
- ECLS Scanner Tool
Rahul Jain – Manager Software Engineer, LexisNexis Risk Solutions, India
What is the ECLS tool? The ECL Scanner tool is currently in Beta, but it is more than just a working model. It is an idea with some work already in progress. The tool is developed in C# .NET and connects to ECL Git repos to scan .ecl files for unwanted imports. It is delivered as an .exe file and so can be installed on any local system. The goal of this presentation is to leverage this idea and current working methodology to help the HPCC Systems ECL developer community save on code clean up time.
Beyond HPCC Systems
Presenters share and discuss tools and techniques that extend the power of HPCC Systems.
- HPCC Systems to PowerBI Automated Connectivity
Lee Saunders – Consulting Data Analyst, LexisNexis Risk Solutions Group
Are you currently manually de-spraying CSV files to keep your Power BI reports up-to-date? Once upon a time our team was doing just that, spending 5 days a month updating reports. Now it takes minutes. In this talk we show:
• A process to set up and publish ECL queries to generate a REST URL (JSON output).
• How to link your Power BI reports for direct data refresh.
• How gateways can allow a scheduled refresh of this data, removing the need for any manual intervention in report updates.
- Interfacing MongoDB into ECL (Minute Marker: 19.19)
Jack Del Vecchio – 2022 HPCC Systems Intern and Bachelor of Computer Engineering, Miami of Ohio University
Jack Del Vecchio joined the 2022 HPCC Systems Intern Program to work on this project. The goal was to develop a plugin that uses the HPCC Systems engine to query and return results from MongoDB. In this talk, Jack shares all the features that his plugin supports. MongoDB offers some pretty cool capabilities, and Jack demonstrates the most useful features with some example code. He shows how versatile his plugin is in what it supports and explains how it may be very useful for someone writing ECL code and trying to use MongoDB. Jack also shares his experience as an intern and what it was like to start developing with very limited experience of ECL and HPCC Systems.
- Tombolo and RealBI
Dan Camper – Enterprise Architect, LexisNexis Risk Solutions Group
Yadhap Dahal – Software Engineer III, LexisNexis Risk Solutions Group
Tombolo is a data lake curation system, designed to work closely with the HPCC Systems platform. Tombolo provides tracking, analysis, and documentation for all resources in your data lake. RealBI is a dashboarding/visualization tool built to support both data and queries living in an HPCC Systems cluster. In this talk, Dan Camper focuses on recent developments in these tools, both created and maintained by the HPCC Systems Solutions Lab group. Yadhap closes with a deep dive demo of Tombolo.
- Processing Radiology Reports
David Dehilster – Consulting Software Engineer, LexisNexis Risk Solutions Group
Dr Amy Apon – Professor, Computer Science, Clemson University
Ashton Williamson – Student, Clemson University
Dhruvisha Patel – Student, Clemson University
Radiology reports consist of observations of medical images by medical clinicians. In order to properly understand these reports, computers have to combine their knowledge of the human body with linguistic knowledge about how clinicians describe images of the body, so that they can determine the state of the body parts in the image. The ultimate goal is to standardize radiology descriptions so computers can detect what is normal and what is not from the medical dictations.
2022 Community Award Winners
In our closing plenary session, we announced the recipients of our David Kan Ambassador and Community Recognition awards. These awards are presented to people we feel have made a huge difference and contribution to the HPCC Systems Open Source Project. Every year, there are many successes achieved by our colleagues and community members. All contributions are valuable to our community and this is one way we can show our appreciation for the commitment and hard work achieved during the year.
The Community Recognition award recognises external community members for their innovative use of HPCC Systems in their research, solutions and open source projects and also for their contributions to the HPCC Systems open source community.
The David Kan Ambassador award recognises RELX and LexisNexis Risk Solutions Group colleagues who have significantly promoted and contributed to the growth of our community, serving as strong supporters, proud evangelists, subject matter experts and champions.
In 2016, this award was named in memory of David Kan, one of the first and very dedicated HPCC Systems Ambassadors.
Congratulations to the recipients of our 2022 Community Awards
2022 Community Recognition Award
Congratulations Tyler Menezes, Executive Director, CodeDay
CodeDay is a nonprofit providing welcoming and diverse opportunities for under-served students to explore a future in tech. Tyler has been instrumental in providing opportunities for young technologists to use HPCC Systems, working with the team to supply coding challenges. He also encourages participants to join the HPCC Systems Intern Program. In 2021, a quarter of the places on that program went to CodeDay students and one of those went on to become a LexisNexis Risk Solutions employee.
2022 David Kan Ambassador Award
Congratulations Ming Wang, Sr Consulting SWE, LexisNexis Risk Solutions Group
We present this award to Ming to say thank you for how freely he gives his time to support our academic program and community users. His investment in the learning of students via our intern program and the research projects carried out by our academic collaborators is invaluable. We also want to show our appreciation for the way he does all this. Our team and users respect his knowledge, trust his experience and value his support. We thank you, Ming, for all you do in supporting the HPCC Systems Open-Source Community.