As Data Lakes become more complex, it can become difficult to locate information and to understand the inner workings. Curation is the process of documenting a Data Lake so that resources can be located, and its flows understood. Tombolo is an open-source Curation and Governance system for HPCC Systems Data Lakes. It provides visibility into the Data Lake and a central repository for documentation of all of its aspects. It is tightly integrated with the HPCC Systems Platform, automatically exchanging information to help automate the Curation and operation of the Data Lake.
HPCC Systems 8.0.0 is the first release that is feature complete enough for our Cloud Native platform to be ready for Cloud performance evaluation. Find out about the new features and enhancements and the resources available to help you get started using our Cloud Native platform.
While the main focus of HPCC Systems 8.0.0 Gold is the Cloud Native platform, there are many new features and improvements that users across both the Cloud Native and Bare Metal platforms will be pleased to see implemented. Find out more about the highlights, including features and improvements in performance, usability, security, the ECL Language and Dynamic ESDL.
Throughout 2020, we have been adding new features and enhancements to our ECL Extension for VS Code, and the latest version is available from the Visual Studio Marketplace. In this blog, Gordon Smith provides details of the feature highlights and enhancements recently added. These include simplified launch configurations, a new workunit history view, a results viewer, access to additional resources such as bundles and client tools, and the ability to insert record definitions. Localisations make the tool more accessible to those who speak a language other than English, and there is a note about the previously added Syntax Colouring that was adopted by GitHub earlier this year.
Many tasks require multiple dependent jobs to run in a particular sequence. Within HPCC Systems, these jobs are referred to as “workunits.” A workunit performs specific tasks on Thor (the Data Refinery Engine in HPCC Systems), such as building a keyfile, analyzing data, or spraying data onto Thor. The Universal Workunit Scheduler provides a way to schedule one or more streams of work within HPCC Systems. More information on this topic can be found in the ‘Tips & Tricks’ section of the Community HPCC Systems Forum.
Enterprise Control Language (ECL) Workunit Services standard library functions can be used to capture details about workunits running on Thor, including processing time, errors, current state, and more. Capturing these details supports monitoring, trending, error analysis, and detection of performance degradation, all of which can help improve the efficiency of Thor environments. In this blog, we will look at how to use this information to monitor the system with visualizations in Power BI.
James McMullan, Sr. Software Engineer at LexisNexis Risk Solutions, gave an overview of the Spark-HPCC Plugin & Connector in a breakout session at the 2019 HPCC Systems Community Day. This presentation also included an introduction to Apache Zeppelin, a demonstration of a random forest model created in Spark, and a discussion about the future of the Spark-HPCC Ecosystem.
The ECL IDE is an integrated development environment for ECL programmers to create, edit, and execute Enterprise Control Language (ECL) code within the HPCC Systems platform. The latest 7.0 version includes new features and enhancements, such as a more comprehensive autocomplete, tooltips, and F12 capabilities.