Data Lake Curation and Automation with Tombolo

As Data Lakes become more complex, it can become difficult to locate information and to understand the inner workings. Curation is the process of documenting a Data Lake so that resources can be located, and its flows understood. Tombolo is an open-source Curation and Governance system for HPCC Systems Data Lakes. It provides visibility into the Data Lake and a central repository for documentation of all of its aspects. It is tightly integrated with the HPCC Systems Platform, automatically exchanging information to help automate the Curation and operation of the Data Lake.

ECL Extension for VS Code - New features available now

Throughout 2020, we have been adding new features and enhancements to our ECL Extension for VS Code, and the latest version is available from the Visual Studio Marketplace. In this blog, Gordon Smith details the feature highlights and enhancements recently added. These include simplified launch configurations, a new workunit history view, a results viewer, access to additional resources such as bundles and client tools, the ability to insert record definitions, and localisations that make the tool more accessible to those who speak a language other than English. He also notes that the previously added syntax colouring was adopted by GitHub earlier this year.

Introducing the Universal Workunit Scheduler

There are many scenarios where multiple dependent jobs need to run in a particular sequence to complete a task. Within HPCC Systems, these jobs are referred to as “workunits.” A workunit performs specific tasks on Thor (the Data Refinery Engine in HPCC Systems), such as building a keyfile, analyzing data, or spraying data onto Thor. The Universal Workunit Scheduler provides a way to schedule one or more streams of work within HPCC Systems. More information on this topic can be found in the ‘Tips & Tricks’ section of the Community HPCC Systems Forum.

HPCC Systems Thor Monitor - Using Workunit Services and Power BI to Monitor Thor Activity

Enterprise Control Language (ECL) Workunit Services standard library functions can be used to capture details about workunits running on Thor, including processing time, errors, current state, and more. Capturing these details supports monitoring, trend and error analysis, detection of performance degradation, and other insights that can help improve the efficiency of Thor environments. In this blog, we will look at how to use this information to monitor the system with visualizations in Power BI.

Leveraging the Spark-HPCC Ecosystem

James McMullan, Sr. Software Engineer at LexisNexis Risk Solutions, gave an overview of the Spark-HPCC Plugin & Connector in a breakout session at the 2019 HPCC Systems Community Day. This presentation also included an introduction to Apache Zeppelin, a demonstration of a random forest model created in Spark, and a discussion about the future of the Spark-HPCC Ecosystem.

New ECL IDE Features in 7.0

The ECL IDE is an integrated development environment for ECL programmers to create, edit, and execute Enterprise Control Language (ECL) code within the HPCC Systems platform. The latest 7.0 version includes new features and enhancements, such as a more comprehensive autocomplete, tooltips, and F12 capabilities. 