The Download: Tech Talks by the HPCC Systems Community, Episode 5

On May 25, 2017, HPCC Systems hosted the latest edition of The Download: Tech Talks.  These technically-focused talks are for the community, by the community.  The Download: Tech Talks is intended to provide continuing education through high quality content and meaningful development insight throughout the year.

Watch the webinar

Episode Guest Speakers and Subjects:

Jeff Bradshaw, CTO, Adaptris
Jeff Bradshaw is the founder of Adaptris and Group CTO of Adaptris/F4F/DBT within Reed Business Information. He has spent his career integrating data wherever it resides and in-flight across a number of industries including Agriculture, Airlines, Telecommunications, Healthcare, Government and Finance.

Jeff has worked with and contributed to a number of international standards bodies and continues to work with large enterprises to help them extract value from their data silos and share data seamlessly with their trading partners to achieve business benefit. For the last few years Jeff has been focusing on Big Data and how to gather that across a wide range of sources to help gain insight into the agri-food supply chain.

Jeff presents: Interlok Deep Dive

Interlok is a powerful integration framework from Adaptris designed to help architects rapidly connect different applications, data stores and communications protocols using pre-built components.  It facilitates real-time data ingestion and flexible stream processing. In this talk, I will explain how Interlok is used with the HPCC Systems platform, specifically the Thor component, and in developing entity models for delivering data insights.

Jon Burger, Sr Architect, LexisNexis Risk Solutions

Jon Burger is LexisNexis Risk’s head infrastructure architect with 20+ years in information technology and over 15 years’ experience with the HPCC platform.  He has worked in a variety of roles within technology including Director of Technology, Director of HPCC, and engineering roles in Network, Linux and Microsoft.  He currently works out of the Boca Raton office and is the father of two teenage boys.  He created Hive360 to aid AWS deployments for LexisNexis Risk products.

Jon presents: Hive360, Cloud Ported HPCC Systems Platform

HPCC Systems is excited to announce the creation of the Hive360 & Swarm360 stacks.  Hive360 and its companion Swarm360 are a set of AWS CloudFormation scripts designed to easily create a scalable, self-configuring, self-healing, on-demand HPCC platform within an existing AWS VPC. Taking advantage of native AWS services such as Auto Scaling groups, EFS, CloudWatch, CloudFormation and multiple AZs allows a regular user with little or no experience with HPCC to create a dynamic, production-ready big data processing platform that can easily be scaled for future growth.  This talk will introduce you to Hive360 and its components, give a brief demonstration of the process and answer any questions you have about this technology.

Rodrigo Pastrana, Software Architect, LexisNexis Risk Solutions

Rodrigo is an Architect with the HPCC Systems supercomputer focusing on platform integration and plug-in development. He has been a member of the HPCC core technology team for over five years and a member of the LexisNexis team for seven. Rodrigo is the principal developer of WsSQL, the HPCC JDBC connector, the HPCC Java APIs library and tools, and the Dynamic ESDL component. He has more than fifteen years of experience in design, research and development of state-of-the-art technology including IBM’s embedded text-to-speech and voice recognition products and Eclipse’s device development environment. Rodrigo holds an MS and BS in Computer Engineering from the University of Florida and during his professional career has filed more than ten patent disclosures through the USPTO.

Rodrigo presents: SQL on HPCC Systems

HPCC Systems has the powerful ECL language, but what if you want to execute SQL queries on HPCC based data? Or what if you want to integrate HPCC Systems into your favorite business intelligence product? HPCC Systems provides an SQL interface into its data files and published Roxie queries called WsSQL. As its name implies, this functionality is provided as a web service, which allows interactive and/or programmatic SQL based access to HPCC Systems.

Bob Foreman, Senior Software Engineer, HPCC Systems, LexisNexis Risk Solutions
Bob Foreman has worked with the HPCC Systems technology platform and the ECL programming language for over 5 years, and has been a technical trainer for over 25 years. He is the developer and designer of the HPCC Systems Online Training Courses, and is the Senior Instructor for all classroom and Webex/Lync based training.  

Bob presents: ECL Tip of the Month

This session will showcase an “ECL Tip of the Month”, presented by one of our ECL instructors, Bob Foreman. The tip will usually be something interesting that was posted on our HPCC Systems Support Forums, or a cool teaching example found in one of our many ECL classes.

Key Discussion Topics:

1:20- Flavio Villanustre provides community updates:

  • Welcome to the five summer interns starting this week!
    • 5 students in the program ranging from high school to PhD
    • Projects include machine learning, HPCC Systems integration, and extending the ECL standard library
    • Proposals for 2018 will open late September
  • Reminder: Call for Presentations and Poster Abstracts still open for the 2017 HPCC Systems Community Day!
    • Community Day will be held in Atlanta on October 4, 2017
    • Poster Competition held on October 3
    • This year’s theme is Smart Data
    • Submission deadline on June 30
    • Sponsorship opportunities still available. Thank you Datum Software!
    • Details at https://hpccsystems.com/hpccsummit2017
  • Machine Learning Update

9:15- Jeff Bradshaw- Interlok Deep Dive

Jeff discussed how Adaptris can help you achieve integration anywhere to feed your big data platform.  Adaptris provides an open source Interlok Integration Framework that includes over 300 pre-built Interlok Adapters.  Commercial support, SaaS hosting with iPaaS options, and consulting are also available to help you get up and running.

Topics include:

  • ProAgrica- a case study on how truly messy data from disparate locations comes together
  • Solution architecture
  • Interlok as an alternative to APIs
  • IoT applications
  • Integration options with HPCC Systems
  • Technology demonstration

31:10- Q&A

Q.  Should I use tags or branches to manage attribute versions and releases and if so, why?

A.  Tags are like a frozen point in time, so they fit well with a release, like a timestamp.  Branches can still be altered and can take hot fixes, and each change is recorded in the commit history, so there is a view into what changed.  It is a little trickier to do this with tags.  There are a lot of options in using branches and tags.  We used branches because of the option to alter what is in production and keep track of that.

Q. How does Git prevent your code from being overwritten if two people check out code one after the other?

A.  Merge Conflict functionality determines the head of the branch so it knows the branch you started with.  If two people start with the same branch with the same ancestry, the system knows the status of the code at the head of the branch. Git will accept the first pull request.  The subsequent submission will receive a merge conflict and allow for resolution before acceptance.  This functionality is one of the main reasons the team is moving to Git.

36:40- Jon Burger- Hive360, Cloud Ported HPCC Systems Platform

Jon discusses a new offering, Hive360, a cloud ported HPCC Systems platform.  Jon reviews:

  • Why it was created: to leverage the on-demand benefits of IaaS cloud technology
  • What it is: two AWS CloudFormation scripts
  • How the scripts help create an HPCC platform built to fully leverage cloud technology
  • Limitations, who the system is best suited for, and how to get started
  • Demo screen shots

53:20- Q&A

Q.  Is the very large number of virtual nodes running on a single server a performance burden for smaller clusters?

A.  There is a bit of contention that occurs, but we have not seen a dramatic performance decrease from the number of processes running on the servers.  We have tested this and we have seen a small performance degradation, but we feel this is offset by the flexibility of the system.

Q. Do you plan to develop Hive360 for other platforms such as Microsoft Azure, BareMetal, OpenStack?

A.  Yes, those are all in our scope.  We will likely begin with OpenStack, with Azure and BareMetal following.  This revolves around having a consistent distributed file system that has horizontal growth.

55:15- Rodrigo Pastrana- SQL on HPCC Systems

Rodrigo discusses how users can utilize WsSQL, an SQL interface into HPCC Systems data files and published ROXIE queries. As its name implies, this functionality is provided as a web service, which allows interactive and/or programmatic SQL based access to HPCC Systems.

Rodrigo covers:

  • What WsSQL can do
  • Setup procedures and the simplicity of getting started
  • Ease of JDBC driver connection as well as how WsSQL is seen from the JDBC client
  • How to query via Java in four lines of code
  • Live demo to see the system in action from creating a file, loading the file, viewing metadata and performing simple queries as well as executing a prepared query and viewing results.

1:20:20- Q&A

Q:  Which BI tools have been successful integrating with your driver?

A: We have tested a few: Logi Analytics, BIRT, some Eclipse-based tools, and a JDBC client called SQuirreL. I expect there to be many more.

Q. How does WsSQL support SQL injection prevention?

A.  The most common way to prevent SQL injection is by exercising the prepared query concept, which we saw in the demo.  We have full support for declaring prepared queries and precompiling them and issuing iterative execution requests for these queries.  The entity creating the query declares the variables to be used as placeholders and the system knows the type of those variables.  On user input, we can confirm that the user input does match the expected type.
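The placeholder idea described in this answer can be sketched in a few lines. This is only an illustration of the general technique and does not use WsSQL or the HPCC JDBC driver: it assumes Python's built-in sqlite3 module as a stand-in database, and the `people` table, its rows and the `find_people` helper are hypothetical names made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")],
    )
    return conn


def find_people(conn, last_name):
    """Look up first names by last name using a placeholder.

    The query text is fixed; last_name is bound as a value and is never
    spliced into the SQL string, so it cannot change the query's structure.
    """
    # Confirm the input matches the type the placeholder expects.
    if not isinstance(last_name, str):
        raise TypeError("last_name must be a string")
    cursor = conn.execute(
        "SELECT first_name FROM people WHERE last_name = ? ORDER BY first_name",
        (last_name,),
    )
    return [row[0] for row in cursor.fetchall()]


conn = build_demo_db()
print(find_people(conn, "Turing"))        # prints ['Alan']
print(find_people(conn, "x' OR '1'='1"))  # prints []
```

The second call passes a classic injection string. Because it is bound as a plain value, it is compared literally against `last_name` and matches no rows, instead of turning the WHERE clause into one that is always true.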

Q: What flavor of SQL syntax is supported?

A: We started using a MySQL 5.7 syntax.  Since then, in order to provide HPCC Systems functionality via SQL, we did add a couple of new entries into the grammar.  For the most part, we stay as true as possible to MySQL 5.7.

1:23:25- Bob Foreman: ECL Tip of the Month

Bob discusses and provides demos for two ECL Tips of the Month:

  • The secret Date/Time Functions of the Standard Library Reference
  • Fear no STRING – how to read and parse just about anything.

1:34:53- Q&A

Q. Is there a repository of useful ECL snippets?

A.  Good places to look are the tips and techniques forum area, the Language Reference manual and the online training.  We also have a wiki available.

Q. You mentioned in-person training.  What are the requirements for in-person training and where do you conduct training?

A: We require a minimum of six people in the United States and ten people internationally.  If there are enough people interested, we will go to sites to conduct training.  There are no prerequisites other than having an open mind and being ready to learn ECL!  The appropriate contact is Richard.Taylor@lexisnexisrisk.com.  Richard can provide more information on onsite classes both on and off LexisNexis properties.  There are also very robust online training options for people all over the world.

More information on HPCC Systems Training can be found here:  HPCC Systems Training and Current Class Schedule

Have a new success story to share?  We would welcome you to be a speaker at one of our upcoming The Download: Tech Talks episodes.

  • Want to pitch a new use case?   
  • Have a new HPCC Systems application you want to demo?   
  • Want to share some helpful ECL tips and sample code?   
  • Have a new suggestion for the roadmap?

Be a featured speaker for an upcoming episode! Email your idea to Techtalks@hpccsystems.com  

Visit The Download Tech Talks wiki for more information: https://hpccsystems.atlassian.net/wiki/display/hpcc/HPCC+Systems+Tech+Talks

Watch Past The Download: Tech Talks Webcasts:

The Download: Tech Talks by the HPCC Systems Community, Episode 2

  • Fujio Turner, Solutions Architect, Couchbase – Mobile/IoT & HPCC Systems
    • Fujio discusses the challenges around IoT and addresses the following questions:
      • As there are more mobile and embedded devices all generating more data, what does that mean now and for the future?
      • What has to change in an organization’s infrastructure to keep up?
      • And how can I best take advantage of this new stream of information?
  • Jacob Pellock, Sr Director Software Engineering, LexisNexis Risk Solutions
    • Jacob presents Operationalizing jobs on Thor utilizing Python, Git and HPCC Systems client tools – Part I
  • Roger Dev, Sr Architect, LexisNexis Risk Solutions
    • Roger’s presentation addresses: Basic Linear Algebra Subsystem (BLAS) and Parallel Block BLAS (PBBlas) libraries.  Manipulation of matrix data via Linear Algebra operations lies at the heart of many data-mining and machine-learning techniques. New modules for HPCC provide highly scalable and performant implementations of these operations.
  • Richard Taylor, Chief Trainer, HPCC Systems
    • Richard provides an overview on HPCC Systems Training: Updates and Deep Dives on Cool Code as well as an update on what is going on with ECL/HPCC/SALT/KEL training courses. 

The Download: Tech Talks by the HPCC Systems Community, Episode 3

  • Joselito (Joey) Chua, PhD, Manager Software Engineer, Optimal Decisions Group
    • Joey presents an overview of prescriptive techniques involving simulation and optimisation, the engineering challenges in building prescriptive tools, and HPCC solutions for those challenges.
  • Jill Luber, Senior Architect, LexisNexis Risk Solutions
    • Jill discusses a migration plan that moved ECL production code, production processes and developers out of MySQL/SVN and into a Git code management culture.  This includes migrating both Roxie and Thor processes to use Git branches across multiple HPCC Systems environments, all while continuing production data builds and releases.
  • Michael Gardner, Software Engineer II, LexisNexis Risk Solutions
    • Michael presents the Java API and tools released by the HPCC Systems Platform team.  These projects include wsclient, rdf2hpcc, clienttools, and jdbc.  These open source projects, which can be found in the hpcc-systems GitHub repositories, are designed to give downstream developers a consistent means of interfacing with the HPCC Systems Platform and to facilitate the workflow of common tasks a downstream developer might be concerned with.
  • Bob Foreman, Senior Software Engineer, HPCC Systems, LexisNexis Risk Solutions
    • Bob explores David Bayliss’ ECL Bible Tutorial, with particular focus on the GRAPH function and building the inverted index for the ROXIE search.  A recorded screen share helps you navigate and better understand how to use the functionality.

The Download: Tech Talks by the HPCC Systems Community, Episode 4

  • Gordon Smith, Enterprise/Lead Architect, LexisNexis Risk Solutions
    • Gordon presents: “Visualizer” – the ECL Bundle that bridges the gap between ECL and the Visualization Framework (JavaScript/html) to allow ECL developers to embed visualizations and dashboards in their workunits with a few lines of ECL.
  • John Holt, Enterprise/Lead Architect, LexisNexis Risk Solutions
    • John presents: An Update of the Machine Learning Bundles.  He discusses the two new Bundles planned for the HPCC Systems 6.4 platform release; and the bundles that are currently targeted for the 7.0 platform release.
  • David de Hilster, Consulting Software Engineer, LexisNexis Risk Solutions
    • David presents: The ECL IDE Goes Multi-Language – Computer Languages that Is! A new multi-language ECL IDE is to the rescue! DUDE, HIPIE, SALT, ESDL, and KEL are now recognized by the new ECL IDE, which is starting to make juggling all these different file types much more visually intuitive and functionally seamless.

Please also check out our blog for more information on The Download: An HPCC Systems Podcast for in depth conversations on HPCC Systems, use cases, and Big Data technology topics.