The Download: Tech Talks by the HPCC Systems Community, Episode 7

On September 14th, 2017, HPCC Systems hosted the latest edition of The Download: Tech Talks. These technically-focused talks are for the community, by the community. The Download: Tech Talks is intended to provide continuing education through high quality content and meaningful development insight throughout the year.

Watch the webinar

Links to resources mentioned in Tech Talk 7:

Presentation

Code Samples

Elsevier Big Data

Episode Guest Speakers and Subjects:

Xiaoming Wang (Ming), Consultant Software Engineer, LexisNexis Risk Solutions

Xiaoming Wang (Ming) joined LexisNexis in 2013 on the HPCC Systems core platform team. His main responsibilities include HPCC Systems Platform product builds, deployment and configuration tools, and deployment solutions including AWS AMI/Instant Cloud, Juju Charm, and more.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. In this talk, Ming will give an overview of plans for implementing a TypeScript-based ECL kernel that uses the HPCC Systems JavaScript libraries to submit ECL code and return Workunit results rendered in Jupyter Notebook cells.

Bob Foreman – Senior Software Engineer, LexisNexis Risk Solutions

Bob Foreman has worked with the HPCC Systems technology platform and the ECL programming language for over 5 years, and has been a technical trainer for over 25 years. He is the developer and designer of the HPCC Systems Online Training Courses, and is the Senior Instructor for all classroom and Webex/Lync based training.

Have you ever wanted to expand the power of your ECL ITERATE and ROLLUP statements? Bob Foreman discusses the next-level PROCESS and AGGREGATE transform functions, and illustrates them with practical examples drawn from the HPCC Systems forums.

Key Discussion Topics:

1:25- Flavio Villanustre provides community updates:

HPCC Systems Platform updates

What’s Coming in 7.0!

  • Improved stats and improved metadata for smart editing
  • Faster compilation and fast syntax checking
  • ECL IDE, Graphviewer and new ECL watch UI features
  • Remote projection/filtering of code
  • Spark integration
  • More machine learning bundles, along with improvements to spraying, session management in ECL Watch, and an improved Configuration Manager

 

15:25- Xiaoming (Ming) Wang – Initial HPCC Systems integration with Jupyter Notebook

Ming’s session focused on the Jupyter Notebook, an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Ming discusses the integration with HPCC Systems ECL, current TSECL features and limitations, as well as how to complete the installation. Ming follows this information with an instructive demo.

29:53 – Q&A

Q: Are all features in Jupyter supported through this interface with HPCC Systems? Are there future plans for enhancements?

A: Currently it does not support all Jupyter features in the JavaScript kernels, but we will continue to work to add more useful features as we go on with the project.

Q: In order to use Jupyter, do I need to have a full HPCC Systems cluster deployed, or can I just have the ECL compiler available running on standalone mode?

A: You need an HPCC Systems server running somewhere, but it does not have to be on your local machine. The HPCC Systems platform needs to be running, but the Jupyter Notebook itself does not contain an HPCC Systems component. It only has an interface for communicating with the HPCC Systems platform.

Q: Would the HPCC Systems visualizer bundle work on the Jupyter Notebook? My students would find this useful.

A: That is something we are working on, but it is not yet available.

Q: What makes the Jupyter Notebook more suitable for use with HPCC Systems over alternative notebooks?

A: The simple answer is that the Jupyter Notebook is more convenient, and it works very well for tutorials and articles. For example, a tutorial that teaches ECL can combine instructions with code that readers can run, modify, and experiment with. Similarly, a big data article can embed ECL code so that the results can be shared and other people can rerun or modify them.

 

33:50- Bob Foreman – ECL Tips: PROCESS and AGGREGATE functions, Upgrading your ITERATE and ROLLUP

Bob provides information on the PROCESS and AGGREGATE functions and showcases a detailed demo of each.

1:03:55 – Q&A

Q: Are both ITERATE and PROCESS equally efficient? Should I use one more than the other, if either one works for my use case?

A: ITERATE is easier to use than PROCESS. If either one works for your use case, you should use ITERATE. Remember, PROCESS is only needed when you have to initialize a row and carry that row's state through subsequent iterations. PROCESS gives you a little more “bang for your buck,” but both are equally efficient. Keep things as simple as possible and let the compiler do the work for you.
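
To make the distinction concrete, here is a minimal ECL sketch of a running total written both ways. It is not taken from Bob's demo: the record layouts, field names, inline data, and the opening balance of 100 are all hypothetical and chosen only for illustration.

```ecl
// Hypothetical layout and inline data, for illustration only.
SaleRec := RECORD
  UNSIGNED4 amount;
  UNSIGNED4 runningTotal;
END;

sales := DATASET([{10, 0}, {20, 0}, {30, 0}], SaleRec);

// ITERATE: LEFT is the previous output record (cleared for the
// first record), RIGHT is the current input record.
SaleRec AddRunning(SaleRec L, SaleRec R) := TRANSFORM
  SELF.runningTotal := L.runningTotal + R.amount;
  SELF := R;
END;

withIterate := ITERATE(sales, AddRunning(LEFT, RIGHT));

// PROCESS: a separate state row is initialized explicitly and
// carried from record to record.
StateRec := RECORD
  UNSIGNED4 total;
END;

// Builds each output record from the input record (LEFT)
// and the current state row (RIGHT).
SaleRec ApplyState(SaleRec L, StateRec R) := TRANSFORM
  SELF.runningTotal := R.total + L.amount;
  SELF := L;
END;

// Builds the state row that the next record will see.
StateRec NextState(SaleRec L, StateRec R) := TRANSFORM
  SELF.total := R.total + L.amount;
END;

withProcess := PROCESS(sales,
                       ROW({100}, StateRec),   // hypothetical opening balance
                       ApplyState(LEFT, RIGHT),
                       NextState(LEFT, RIGHT));

OUTPUT(withIterate, NAMED('WithIterate'));
OUTPUT(withProcess, NAMED('WithProcess'));
```

The ITERATE version needs one TRANSFORM and no extra record layout, which is why it is the simpler choice when no initial state is required. The PROCESS version only earns its extra TRANSFORM when the state has to start from something other than a cleared record, or has a different layout from the data.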

Q: Is there a difference in performance for TABLE vs AGGREGATE for doing crosstab aggregation?

A: TABLE is simpler to use. AGGREGATE comes into play when you need a crosstab and also need to roll up records.

Q: Couldn’t I achieve similar functionality to AGGREGATE, if I used a TABLE and a GROUP keyword?

A: For grouping fields, yes, but AGGREGATE does more: it performs the rollup as well. In effect, it is a crosstab report with a rollup. So there are applications where AGGREGATE is a better fit than a simple crosstab report.
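
The difference described in the two answers above can be sketched in ECL as follows. This is an illustrative example under stated assumptions, not the code shown in the talk: the layouts, field names, and inline data are hypothetical.

```ecl
// Hypothetical layout and inline data, for illustration only.
BoxRec := RECORD
  UNSIGNED1 box;
  STRING    item;
END;

items := DATASET([{1, 'Fred'}, {1, 'Freddy'}, {2, 'Freddi'},
                  {3, 'Fredrik'}, {1, 'FredJon'}], BoxRec);

// Crosstab with TABLE: one row per box with a simple count.
CountRec := RECORD
  items.box;
  UNSIGNED4 itemCount := COUNT(GROUP);
END;

byBoxTable := TABLE(items, CountRec, box);

// AGGREGATE: one row per box, with the count and a rolled-up,
// comma-separated list of the items in that box.
ListRec := RECORD
  UNSIGNED1 box;
  UNSIGNED4 itemCount;
  STRING    contents;
END;

// Main transform: LEFT is the input record, RIGHT is the
// result accumulated so far for the group.
ListRec AddItem(BoxRec L, ListRec R) := TRANSFORM
  SELF.box       := L.box;
  SELF.itemCount := R.itemCount + 1;
  SELF.contents  := R.contents + IF(R.contents <> '', ',', '') + L.item;
END;

// Merge transform: combines two partial results for the same group.
ListRec MergeLists(ListRec R1, ListRec R2) := TRANSFORM
  SELF.box       := R1.box;
  SELF.itemCount := R1.itemCount + R2.itemCount;
  SELF.contents  := R1.contents + ',' + R2.contents;
END;

byBoxAggregate := AGGREGATE(items, ListRec,
                            AddItem(LEFT, RIGHT),
                            MergeLists(RIGHT1, RIGHT2),
                            LEFT.box);

OUTPUT(byBoxTable, NAMED('ByBoxTable'));
OUTPUT(byBoxAggregate, NAMED('ByBoxAggregate'));
```

TABLE covers the count with a few lines of record definition. The concatenated `contents` field is the kind of rollup that the built-in aggregate functions available in a TABLE record do not express, which is where AGGREGATE and its TRANSFORM functions become useful.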

 

Have a new success story to share? We would welcome you to be a speaker at one of our upcoming The Download: Tech Talks episodes.

  • Want to pitch a new use case?
  • Have a new HPCC Systems application you want to demo?
  • Want to share some helpful ECL tips and sample code?
  • Have a new suggestion for the roadmap?

Be a featured speaker for an upcoming episode! Email your idea to Techtalks@hpccsystems.com

Visit The Download Tech Talks wiki for more information about previous speakers and topics: https://hpccsystems.atlassian.net/wiki/display/hpcc/HPCC+Systems+Tech+Talks

Watch past episodes of The Download:

The Download: Tech Talks by the HPCC Systems Community, Episode 4

The Download: Tech Talks by the HPCC Systems Community, Episode 5

The Download: Tech Talks by the HPCC Systems Community, Episode 6

 

More about HPCC Systems Community Day 2017

The HPCC Systems Summit Community Day and new training workshop are taking place October 3 & 4 in Buckhead, Atlanta, Georgia. Learn More and Register

  • Community Day will be held in Atlanta on October 4, 2017
  • Poster Competition held on October 3, submission instructions available on the Wiki
  • Thank you to our Sponsors! Datum, Infosys and Cognizant!

NEW THIS YEAR! Pre-Event Workshop on October 3 – Mastering Your Big Data with ECL

  • Registration is open to the public to attend
  • Details at https://hpccsystems.com/hpccsummit2017
  • This class is for attendees who want to understand the HPCC Systems platform and learn ECL to build powerful data queries. Anyone who needs a basic familiarity with ECL and wants to learn best practices should attend. The one-day class will take the student through the entire ETL cycle from Spray (Extract) to Transform (THOR) and finally to Load (ROXIE).

Topics include:

Part 1: Data Extraction and Transformation

  • Quick overview of THOR cluster, and the parallel distributed data processing concept, setting up a cluster, ECL Watch overview, spraying data, ECL IDE, ECL language essentials, and more.

Part 2: Prepare the Data Search Engine

  • Defining and building an INDEX, getting single and batch results, data indexing, filtering and normalization, searching, and more.

Part 3: Write and Publish a ROXIE Query

  • Call Search, Implicit function, publish in ECL Watch, test in WS-ECL, and more!