The Download: Tech Talks by the HPCC Systems Community, Episode 4

On April 20, 2017, HPCC Systems hosted the latest edition of The Download: Tech Talks.  These technically-focused talks are for the community, by the community.  The Download: Tech Talks is intended to provide continuing education through high quality content and meaningful development insight throughout the year.

Watch the webinar here

Episode Guest Speakers and Subjects:

​Gordon Smith, Enterprise/Lead Architect, LexisNexis Risk Solutions
Gordon is an Enterprise/Lead Architect and manager of the HPCC Systems supercomputer clients. He is a member of the HPCC Core Platform team and has been a LexisNexis employee for over 18 years. Gordon is the principal developer for ECL-related development and visualization tools, including the ECL IDE, ECL Plugin for Eclipse, ECL Watch, ECL Execution Graph Viewer and, more recently, the HPCC Visualization Framework.

Gordon is also involved in our HPCC Systems intern program and serves as a mentor on any projects relating to the HPCC Visualization Framework and/or a web-based debugging front end for ECL.

Gordon presents:

“Visualizer” – the ECL Bundle

The Visualizer bundle bridges the gap between ECL and the Visualization Framework (JavaScript/HTML), allowing ECL developers to embed visualizations and dashboards in their workunits with a few lines of ECL.

John Holt, Enterprise/Lead Architect, LexisNexis Risk Solutions
Dr. Holt is an Enterprise/Lead Architect for LexisNexis Risk Solutions. Dr. Holt directs various projects, such as the evolution of the Insurance Applications Systems, which leverages the HPCC Systems platform to help assess risk and detect fraud.

Dr. Holt has been with LexisNexis for 36 years. Prior positions have included system architecture for the Risk Solutions Fabrication Systems, system architecture for the LexisNexis online system, project management, product management, and product development. Dr. Holt holds a PhD and an MS in Computer Science from Wright State University, an MBA from Wright State University, and a BS in Data Processing from the University of Dayton.

John presents:

An Update of the Machine Learning Bundles

In this talk, John will discuss the two new bundles planned for the HPCC Systems 6.4 platform release and the bundles that are currently targeted for the 7.0 platform release.

David de Hilster, Consulting Software Engineer, LexisNexis Risk Solutions
David de Hilster is a consulting software engineer working on the development efforts of the ECL IDE component of the HPCC Systems platform. David has more than 20 years’ experience in research, design, programming, and bringing innovative ideas to the marketplace. David has developed numerous software designs including a resume processor, an online card room, and a Visual Studio-like environment for creating analyzers that process human language. He is known for rapid prototyping, enthusiasm, creativity, and an ability to communicate technical ideas to non-technical clients.

David presents:

The ECL IDE Goes Multi-Language – Computer Languages that Is!

The ECL IDE is so named because it is the best interface for programming ECL on the planet. But in the last few years, other languages that generate ECL have sprung up around it, and they are being used more and more by the HPCC community. In response to this shift, a new multi-language ECL IDE comes to the rescue! DUDE, HIPIE, SALT, ESDL, and KEL are now recognized by the new ECL IDE, which is starting to make juggling all these different file types much more visually intuitive and functionally seamless.

Jessica Lorti, Director, Technology Marketing, HPCC Systems, LexisNexis Risk Solutions
Jessica comes from an extensive background defining and implementing strategic programs across a variety of marketing disciplines for the technology, financial services, and energy industries. She has held senior marketing roles at GE, Intel, Compaq and Grant Thornton where she managed product marketing and brought new technologies to market, developed and launched social media and online marketing efforts, and developed new business models in conjunction with sales and key corporate partners.

Jessica holds a Bachelor of Science in International Economics from Texas Tech University and a Master's in International Management with a concentration in Marketing from Thunderbird, the Global School of International Management. She has also earned the Lean Six Sigma Green Belt certification.

Jessica presents:

HPCC Systems – New Website Preview

In this presentation, Jessica will share an update on our latest HPCC Systems Website initiatives and demo the upcoming hpccsystems.com redesign.

Key Discussion Topics:

1:25- Flavio Villanustre discusses:

  • The HPCC Systems Summer Intern program, deadline April 22
  • Call for presentations and poster abstracts for the 2017 HPCC Systems Community Day, to be held October 4th in Atlanta, Georgia

9:30- Gordon Smith: “Visualizer” – the ECL Bundle

10:22- Gordon discusses the Visualizer ECL bundle. A bundle is a self-contained package of functionality that may not have made it into the standard ECL library or may not be a good candidate for it. The Visualizer bundle lets you introduce visualization within ECL so that you can visualize data hosted in a logical file, workunit, or ROXIE query.

Topics include:

  • How to get and install the bundle
  • How to introduce visualization within ECL
  • Hello World code example
  • Self test included in the bundle
  • Screen share to see the visualizer bundle in action

27:15- Q&A

Q.  Is Visualizer distributed? Is it running in ROXIE or Thor?

A. The Visualizer bundle runs inside your local web browser. It fetches the data from either ROXIE or Thor, depending on how you specify those parameters in ECL. When you leave them blank, it defaults to the parent workunit that it is run in, but you can also specify a query published to ROXIE by its URL. Gordon provides more information on how this works.

Q. How is the Visualize tab found in the Output tab of ECL Watch used, or is this a different type of visualization?

A. The Visualize tab in ECL Watch predates the open source visualization framework. It was the genesis of it. The Visualize tab in ECL Watch will likely move to use a lot of the new visualizations and technologies. It could be that the Visualizer bundle replaces it, as the bundle has much more functionality.

Q. Can the visual output be used in a website outside of ECL Watch?

A. Yes. For example, a batch job or large production job can have visualizations embedded to report on the status of the job, and those visualizations and dashboards would come from a static dashboard. Using the download button, you can download those dashboards and publish them externally without access to the platform. You could also tweak the Visualizer code and the way it talks to the server so that, instead of talking directly to the platform, it talks to a proxy server, which relays the request to the platform and echoes the response back. This is a two-line code change. Gordon provides additional information on the benefits.
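
The relay idea in this answer can be pictured with a short Python sketch. It is not the Visualizer code or the two-line change itself, only an illustration of a proxy that forwards a request and echoes the response. The names `relay` and `PLATFORM_BASE`, the host names, and the `/results` path are all hypothetical, and the fetch function is injected so the sketch runs without a network.

```python
from typing import Callable
from urllib.parse import urlsplit

# Hypothetical platform address; in this sketch only the proxy can reach it.
PLATFORM_BASE = "http://platform.internal:8010"


def relay(request_url: str, fetch: Callable[[str], bytes]) -> bytes:
    """Forward a browser request to the platform and echo the response back.

    Only the path and query string of the incoming URL are reused, so the
    browser never needs to know the platform address.
    """
    parts = urlsplit(request_url)
    target = PLATFORM_BASE + parts.path
    if parts.query:
        target += "?" + parts.query
    return fetch(target)


if __name__ == "__main__":
    # Stand-in for a real HTTP client call; it only reports what was requested.
    def fake_fetch(url: str) -> bytes:
        return ("fetched " + url).encode()

    print(relay("https://proxy.example.com/results?id=W1", fake_fetch))
```

In a real deployment the injected `fetch` would be an HTTP client call made by the proxy, which is also a natural place to add authentication or caching.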

Q.  With what versions of the platform is the bundle compatible?

A. The lowest version I have tested with is 5.6. It is backwards compatible. ECL Watch has had support for embedded resources for a long time now, so the bundle should be compatible with most installations out there today.

Q.  Can I use the bundle in an “air gapped” environment?

A. Not today; however, you can take a copy of the Visualization framework and host it internally with a one-line change to the HTML file that is being embedded. Gordon provides additional details in his answer.

Questions not answered on air:

Q. Is it a good idea to use the Visualization library (JS library) within an application to show a related dashboard, or to use the Visualizer to generate a dashboard within ECL Watch?

A. Depends on your available resources – if you only have ECL developers then the Visualizer bundle is a quick win.  If you have some HTML/JS development experience and a web server to host your visualizations, then you will find that route offers more flexibility.  There are some examples of the latter here:  https://bl.ocks.org/GordonSmith

33:45- John Holt: An Update of the Machine Learning Bundles

John explains how the ML library is being restructured as the current ECL ML repository moves from beta to a production-ready set of supported features.

John discusses:

  • A short review of the restructure taking place
  • Prerequisites for machine learning bundles
  • Machine Learning bundles for 6.4 including Logistic Regression and Multiple Linear Regression
  • Validation testing
  • Machine Learning bundles for 7.0 including SVM, Stepwise Logistic Regression and Logistic Regression for multinomial case, Stepwise Multiple Linear Regression, and Descriptive Stats

45:15- Q&A

Q. What portion of the time do you have multiple independent datasets that need the same analysis and modeling process versus a single dataset (one big pile)?

A. You use the myriad interface when you are going to run logistic regression against data from different contributors: you run the same logistic regression and build a separate model for each contributor.
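
As a rough illustration of that idea, the Python sketch below fits the same logistic regression independently for each contributor in a single call. It is not the ECL ML bundle or its myriad interface; the function names, the row layout, and the gradient-descent settings are assumptions made for the example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (contributor id, feature values, 0/1 label).
Row = Tuple[str, List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: List[List[float]], ys: List[int],
                 steps: int = 2000, lr: float = 0.5) -> List[float]:
    """Fit one model by batch gradient descent; returns [bias, w1, w2, ...]."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = sigmoid(z) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def fit_per_contributor(rows: List[Row]) -> Dict[str, List[float]]:
    """Group rows by contributor and fit a separate model for each group."""
    groups: Dict[str, Tuple[List[List[float]], List[int]]] = defaultdict(lambda: ([], []))
    for contributor, x, y in rows:
        groups[contributor][0].append(x)
        groups[contributor][1].append(y)
    return {c: fit_logistic(xs, ys) for c, (xs, ys) in groups.items()}


if __name__ == "__main__":
    data: List[Row] = [
        ("A", [0.0], 0), ("A", [1.0], 0), ("A", [2.0], 1), ("A", [3.0], 1),
        ("B", [0.0], 1), ("B", [1.0], 1), ("B", [2.0], 0), ("B", [3.0], 0),
    ]
    for contributor, weights in fit_per_contributor(data).items():
        print(contributor, weights)
```

The two contributors in the demo have opposite relationships between the feature and the label, so pooling them into one big pile would hide both patterns, while per-contributor models recover each one.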

Q. Is decision tree supported already or will it be supported in 7.0?

A.  I am not sure if the decision trees or random forests will make 7.0 but they will be supported.  They are high on our list of things to cover and we will make every effort to get them in as soon as possible.

Q. I heard that there is support for Random Forests. Is that part of the ECL ML Bundle too?

A. Random Forests is included in ECL ML, and the implementation is quite good and very strong. It does not support the myriad interface, but that is what we will be working to include in a tree and random forest bundle, to be completed once the 7.0 bundles are finished.

Q.  Can you recommend any books or web sites to someone learning ML for the first time?

A. My favorite book is Machine Learning by Murphy published a few years ago.  It is a very good, worthwhile book but may require you to reacquaint yourself with some Mathematics concepts.

Q.  Can anyone contribute to the ECL-ML library?

A. Yes, we are looking for contributions from any member of the community. We have standards with regard to the degree of validation required, and some of the contributions will be in separate bundles rather than in the bundles supported by the platform team.

Q.  Are all the algorithms optimized to perform in parallel in a cluster?

A. Yes, all the ones we support in the platform team are optimized to use the cluster as much as possible.

Question not answered on air:

Q. Will there be a reference manual for ML in 6.4 or 7.0?

A. There will be a reference that is generated from the ECL definitions and comments.  We have modified the ECL compiler to generate the raw information, and we are developing a Java application that will generate HTML pages that are similar to the HTML pages generated by the javadoc program.

51:05- David de Hilster: The ECL IDE Goes Multi-Language – Computer Languages that Is!

David discusses new ECL IDE features coming in 6.4 and beyond. 

Listen to how the new interfaces can help you generate ECL code or do things easier with ECL. Features discussed include:

  • Colorized Languages
  • Language Specific Element Colors
  • Target Background Colors
  • General Colorization (Non-Language Specific)
  • Read Only Background
  • hthor Background
  • File Color Coding
  • Generating ECL for KEL

1:02:10- Q&A

Q:  What language capabilities are you planning to add?

A: We are planning to add batch files for some of the languages. ESDL will likely be included with this version. A lot of these languages will run offline in their own executables, generating ECL or doing other activities. Alongside the colorization available for these other languages in the ECL IDE, we will also have other types of functionality. Right now we have submitting, but in the future we will be adding additional parameters for other languages.

Q. Are you planning to add support for different embedded languages too, such as Python, C++ or Java?

A.  I believe in the future, that is something we can do.  We are working on the languages that are being developed here at LexisNexis first; however, that is something which could be easily added.

1:04:00- Jessica Lorti: HPCC Systems – New Website Preview

Jessica reviews the new HPCC Systems website, which will be going live in June. (This date is an update from that presented in the webcast.) 

The new website includes a responsive design for all types of devices and features a more streamlined navigation and improved download interface.

Have a new success story to share? We would welcome you to be a speaker at one of our upcoming The Download: Tech Talks episodes.

  • Want to pitch a new use case?   
  • Have a new HPCC Systems application you want to demo?   
  • Want to share some helpful ECL tips and sample code?   
  • Have a new suggestion for the roadmap?

Be a featured speaker for an upcoming episode! Email your idea to Techtalks@hpccsystems.com  

Visit The Download Tech Talks wiki for more information: https://hpccsystems.atlassian.net/wiki/display/hpcc/HPCC+Systems+Tech+Talks

Watch Past The Download: Tech Talks Webcasts:

The Download: Tech Talks by the HPCC Systems Community, Episode 3

  • Joselito (Joey) Chua, PhD, Manager Software Engineer, Optimal Decisions Group
    • Joey presents an overview of prescriptive techniques involving simulation and optimisation, the engineering challenges in building prescriptive tools, and HPCC solutions for those challenges.
  • Jill Luber, Senior Architect, LexisNexis Risk Solutions
    • Jill discusses a migration plan that moved ECL production code, production processes and developers out of MySQL/SVN and into a Git code management culture.  This includes migrating both Roxie and Thor processes to use Git branches across multiple HPCC Systems environments, all while continuing production data builds and releases.
  • Michael Gardner, Software Engineer II, LexisNexis Risk Solutions
    • Michael presents the Java API and tools released by the HPCC Systems Platform team. These projects include wsclient, rdf2hpcc, clienttools, and jdbc. These open source projects, which can be found in the hpcc-systems GitHub repositories, are designed to give downstream developers a consistent means of interfacing with the HPCC Systems Platform and to facilitate the workflow of common tasks a downstream developer might be concerned with.
  • Bob Foreman, Senior Software Engineer, HPCC Systems, LexisNexis Risk Solutions
    • Bob explores David Bayliss’ ECL Bible Tutorial, with particular focus on the GRAPH function and building the inverted index for the ROXIE search.  A recorded screen share helps you navigate and better understand how to use the functionality.

The Download: Tech Talks by the HPCC Systems Community, Episode 2

  • Fujio Turner, Solutions Architect, Couchbase – Mobile/IoT & HPCC Systems
    • Fujio discusses the challenges around IoT and addresses the following questions:
    • As there are more mobile and embedded devices all generating more data, what does that mean now and for the future?
    • What has to change in an organization’s infrastructure to keep up?
    • And how can I best take advantage of this new stream of information?
  • Jacob Pellock, Sr Director Software Engineering, LexisNexis Risk Solutions
    • Jacob presents Operationalizing jobs on Thor utilizing Python, Git and HPCC Systems client tools – Part I
  • Roger Dev, Sr Architect, LexisNexis Risk Solutions
    • Roger’s presentation addresses: Basic Linear Algebra Subsystem (BLAS) and Parallel Block BLAS (PBBlas) libraries.  Manipulation of matrix data via Linear Algebra operations lies at the heart of many data-mining and machine-learning techniques. New modules for HPCC provide highly scalable and performant implementations of these operations.
  • Richard Taylor, Chief Trainer, HPCC Systems
    • Richard provides an overview on HPCC Systems Training: Updates and Deep Dives on Cool Code as well as an update on what is going on with ECL/HPCC/SALT/KEL training courses.

The Download: Tech Talks by the HPCC Systems Community, Episode 1

  • Anirudh Shah, Co-Founder, 3Loq
    • How we use HPCC Systems to process more than 500 monthly marketing campaigns at the largest private bank in India across the bank’s entire portfolio.
    • Our experience with HPCC Systems in production
    • Automation and data sanity frameworks
  • Allan Wrobel, Senior Engineer, LexisNexis
    • Making full use of Superfiles to make order-of-magnitude improvements to build times on Thor (plus fringe benefits).
    • Thor is well known for making short work of processing billions of records, and this promotes the tendency to use brute force in its deployment. Watch how the UK managed to implement efficiency over brute force to reduce the processing time for a daily build of a billion-record ingest file from 12 hours to 2 hours, and enabled further speed increases in other processes.
  • Lorraine Chapman, Consulting Business Analyst, HPCC Systems
    • In 2015, HPCC Systems was an accepted organization for Google Summer of Code (GSoC) taking on 2 students involved in this program. However, we had the bandwidth to support more students and so the HPCC Systems summer internship program was born. Four students joined the program in 2015 and four more in 2016. We will apply for GSoC and run our intern program again in 2017. Hear how the programs work, how projects are identified and find out about student successes on these programs.

Please also check out our blog for more information on The Download: An HPCC Systems Podcast for in-depth conversations on HPCC Systems, use cases, and Big Data technology topics.