The Download: Tech Talks by the HPCC Systems Community, Episode 2

On February 16, 2017, HPCC Systems hosted the latest edition of The Download: Tech Talks by the HPCC Systems Community. These technically focused talks are for the community, by the community. The Download: Tech Talks is intended to provide continuing education through high-quality content and meaningful development insight throughout the year.

Watch the latest webcast here.

Slides from the webcast can be accessed here.

Episode Guest Speakers and Subjects:

Fujio Turner, Solutions Architect, Couchbase – Mobile/IoT & HPCC Systems

Fujio Turner is a Solutions Architect for Couchbase and he specializes in high-speed data platforms. He began his IT career as a LAMP stack developer and soon became a MySQL developer and DBA. His attention turned to the high availability NoSQL systems of CouchDB/Couchbase in 2010.

With his personal philosophy, “In the future, there will be more data, not less,” HPCC Systems was a perfect fit for him. In his spare time, Fujio evangelizes HPCC Systems in the Silicon Valley area with the Meetup group, “Exabyte Big Data – HPCC Systems – Silicon Valley.” His list of current and future projects includes 3DJSON and Virtual Reality and Big Data.

Fujio will discuss the challenges around IoT and address the following questions:

  • As there are more mobile and embedded devices all generating more data, what does that mean now and for the future? 
  • What has to change in an organization’s infrastructure to keep up? 
  • And how can I best take advantage of this new stream of information?

Jacob Pellock, Sr Director Software Engineering, LexisNexis Risk Solutions

Jacob Pellock is a Sr. Director with LexisNexis Risk Solutions where he is responsible for supporting cross-departmental Business Intelligence. He has been working at LexisNexis Risk Solutions for 14 years building solutions to support analytics across multiple industries. Jacob is particularly specialized in utilizing Big Data capabilities to support analysis and deployment of analytics capabilities into end user and system workflows.

Jacob will present:

Operationalizing jobs on Thor utilizing Python, Git and HPCC Systems client tools – Part I

So you’ve set up your HPCC Systems cluster and you’ve written your ECL code. Now you want to take the ECL you have written into production. Jacob will explain which technologies his team has leveraged in bringing the LexisNexis data warehouse into production.

Roger Dev, Sr Architect, LexisNexis Risk Solutions

Roger is a Senior Architect working on the Machine Learning team at LexisNexis Risk Solutions.  He recently joined HPCC Systems from CA Technologies.  Roger has been involved in the implementation and utilization of machine learning and AI techniques for many years, and he has over 20 patents in diverse areas of software technology.

Roger’s presentation addresses:

Basic Linear Algebra Subsystem (BLAS) and Parallel Block BLAS (PBBlas) libraries for HPCC Systems.

Manipulation of matrix data via Linear Algebra operations lies at the heart of many data-mining and machine-learning techniques. New modules for HPCC provide highly scalable and performant implementations of these operations. BLAS provides an industry-standardized set of highly-optimized linear algebra operations. PBBlas extends these operations to mega-scale, splitting the operations into parallelizable units that can be balanced across an HPCC cluster. This talk provides an introduction to BLAS, describes the techniques and features of PBBlas, and provides an overview of the PBBlas interface.

Richard Taylor, Chief Trainer, HPCC Systems

Richard Taylor has worked with the HPCC Systems technology platform and the ECL programming language for over 15 years. He is the original author of the ECL documentation, the developer and designer of the HPCC Systems Training Courses, and the Chief Instructor for all classroom and remote-based training.

Richard provides an overview of HPCC Systems Training: Updates and Deep Dives on Cool Code

Richard provides an update on what is going on with ECL/HPCC/SALT/KEL training courses. He also selects some interesting code snippets gathered from questions that come in by email and/or the Community Forums for an in-depth discussion of the techniques demonstrated by the code.

Key Discussion Topics:

4:25- Fujio Turner:  Mobile/IoT & HPCC Systems

Fujio reviews his experience with both Couchbase and HPCC Systems over the last several years. He discusses how these two systems can work together to deliver mobile/IoT solutions that overcome the data handling and processing challenges inherent in these types of solutions. Fujio discusses the consolidation and management advantages of a Couchbase solution, provides key code snippets, and shows a high-level diagram of a mobile solution connected to HPCC Systems ROXIE databases.

27:28- Q&A

Q.  How does Couchbase integrate with Thor and ROXIE?

A.  Fujio explains how he has been working with the HPCC Systems development team to establish two key areas of connection.  First, HPCC Systems can be used as a key value store inside a Couchbase solution through a piece of ECL code. Fujio and the HPCC Systems team are finalizing the details of the code and expect to have this available to the community in the near future.

Q. Can this setup be used for stream processing, for example to handle triggers to be executed when certain events happen?

A.  Yes, the system can be used as an Enterprise Service Bus action or a myriad of other actions to interact with users or groups of users.  Fujio gives a few examples of how companies are using streaming data.

Q. Can Couchbase be used as a DeltaBase from ROXIE? If so, is there a public ECL bundle for it?

A.  Yes, Couchbase can be used as a DeltaBase from ROXIE.  This is part of the effort Fujio is working on with the HPCC Systems development team, and it should be available to the community in the near future.  Fujio will be joining The Download: HPCC Systems Podcast in a few weeks, where he hopes to be able to provide additional details.

30:45- Jacob Pellock: Operationalizing Your HPCC Systems Environment, Part 1

Jacob discusses what you do after your HPCC Systems cluster is configured and you have some ECL code written.  He discusses how to transition to an automated operational system and which ecosystem tools his team has used to achieve its operational goals.

Jacob explains how his team utilizes the following tools as well as sample code from their implementation:

  • HPCC Systems & ECL – warehouse data integration/transformation/distribution
  • Git – source code repository
  • Python – glue
  • HPCC Systems Client Tools – remote job submission
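The "Python as glue" role in the list above can be pictured with a short sketch. This is not Jacob's code: the file name, target, server address and job name are hypothetical, and the `ecl run` flags shown (`--target`, `--server`, `--name`) are assumptions that should be checked against the client tools version installed locally.

```python
import subprocess
from typing import List, Optional


def build_ecl_run_command(ecl_file: str, target: str, server: str,
                          job_name: Optional[str] = None) -> List[str]:
    """Assemble an `ecl run` invocation as an argument list.

    The flag names are assumptions; confirm them with `ecl run --help`.
    """
    command = ["ecl", "run", f"--target={target}", f"--server={server}"]
    if job_name:
        command.append(f"--name={job_name}")
    command.append(ecl_file)
    return command


def current_commit(repo_dir: str) -> str:
    """Return the Git commit hash currently checked out in repo_dir."""
    result = subprocess.run(["git", "rev-parse", "HEAD"], cwd=repo_dir,
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def submit_job(command: List[str]) -> str:
    """Run the client tool and return its output; raises on a non-zero exit."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical values, for illustration only. Nothing is submitted here;
    # the command is only printed.
    cmd = build_ecl_run_command("build_warehouse.ecl", target="thor",
                                server="10.0.0.1", job_name="nightly_build")
    print(" ".join(cmd))
```

Keeping the command as a list avoids shell quoting problems, and logging the value of `current_commit` next to each submission is one simple way to tie a production run back to the exact source revision in Git.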

44:44-  Q&A

Q.  How do you integrate Git with your IDE (Integrated Development Environment)? Do you use ECL IDE or Eclipse?

A.  Jacob uses Eclipse with the standard Git add-on for Eclipse, but he believes the ECL IDE is more widely used within his team.  ECL IDE users rely on the Tortoise plug-in, which works with Windows Explorer.  People who use both the ECL IDE and Eclipse use the command line utility to integrate with Git.

Q. Do you use EMBED Python? If so, do you have some open source ECL Bundles available?

A.  We do use EMBED Python.  Jacob explains how his team utilizes Python.  While he does not have any bundles available or know of any bundles that might be available, he thinks this is a good idea.  He will discuss with his team to see what can be used within the community.

Q. Could I automate my jobs programmatically too through an API?

A.  Yes, there is a web-based API.  Most examples Jacob walked through can be used through a web-based API.  In these cases, the command line is a shell.  There is a host of functions available through the web-based API.
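As a rough illustration of the web-based route, the sketch below only builds the URL for a workunit status lookup and makes no network call. It assumes the ESP service is reachable on port 8010 and exposes `WsWorkunits/WUInfo` with a `Wuid` parameter and a `.json` suffix for JSON responses; the host and workunit ID are hypothetical, and all of these details should be verified against your own environment.

```python
from urllib.parse import urlencode


def workunit_info_url(host: str, wuid: str, port: int = 8010) -> str:
    """Build the URL for a workunit lookup.

    The service path, port, and parameter name are assumptions to verify
    against the ESP service list of the target cluster.
    """
    query = urlencode({"Wuid": wuid})
    return f"http://{host}:{port}/WsWorkunits/WUInfo.json?{query}"


if __name__ == "__main__":
    # Hypothetical host and workunit ID, for illustration only.
    print(workunit_info_url("10.0.0.1", "W20170216-101500"))
```

The resulting URL can be passed to any HTTP client, with whatever authentication the cluster requires.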

47:30- Roger Dev: Basic Linear Algebra Subsystem (BLAS) and Parallel Block BLAS (PBBlas) Libraries for HPCC Systems

Roger works in the Machine Learning group.  He discusses linear algebra subsystems and explains the difference between BLAS, an open source linear algebra system, and PBBlas, which is specific to HPCC Systems, scales to huge matrices, and is available on GitHub.
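The general idea of splitting a matrix operation into block-sized units can be sketched in a few lines. This is a single-machine illustration in plain Python, not the PBBlas implementation or its interface; the function names and the block size are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def block_ranges(size: int, block: int) -> List[Tuple[int, int]]:
    """Split range(size) into consecutive [start, stop) chunks of at most `block`."""
    return [(start, min(start + block, size)) for start in range(0, size, block)]


def blocked_multiply(a: Matrix, b: Matrix, block: int) -> Matrix:
    """Multiply a (n x k) by b (k x m) one block pair at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for r0, r1 in block_ranges(n, block):
        for c0, c1 in block_ranges(m, block):
            for k0, k1 in block_ranges(k, block):
                # One unit of work: a block of `a` times a block of `b`.
                # Each partial product can be computed independently; the
                # results for the same output block are then summed.
                for i in range(r0, r1):
                    for j in range(c0, c1):
                        c[i][j] += sum(a[i][p] * b[p][j] for p in range(k0, k1))
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_multiply(a, b, block=2))
```

In a distributed setting, each block pair could be sent to a different node and the partial results combined afterwards; here the loops simply run in sequence. The same code handles row and column vectors, since they are just matrices with a single row or column.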

1:05:18- Q&A

Q.  Are there particular optimizations for triangular matrices?

A.  Both BLAS and PBBlas provide a triangular matrix solver for systems of linear equations of the form ax=b or xa=b, where a is a triangular matrix.
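To show why triangular systems get their own solver, here is a minimal sketch of forward substitution for the ax=b case with a lower-triangular a. It is a plain Python illustration of the technique, not the BLAS or PBBlas routine, and the function name is hypothetical.

```python
from typing import List


def solve_lower_triangular(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by forward substitution, assuming a is lower triangular."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        if a[i][i] == 0:
            raise ValueError("matrix is singular: zero on the diagonal")
        # Everything left of the diagonal multiplies values already solved.
        known = sum(a[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - known) / a[i][i]
    return x


if __name__ == "__main__":
    a = [[2, 0, 0],
         [1, 3, 0],
         [4, 5, 6]]
    b = [2, 7, 32]
    print(solve_lower_triangular(a, b))
```

Because each unknown depends only on the ones before it, no elimination step is needed, which is what makes a dedicated triangular solver cheaper than a general one.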

Q. What are the OS/Library dependencies to install PBBlas? Do I just need to install the ECL bundle or is there anything else needed?

A.  The BLAS library is a standard platform library, so the platform brings it in automatically.  PBBlas installs as a bundle on top of that.  You also need the Machine Learning core bundles.  The process to install bundles is very easy and fast.

Q.  Is the geometry of the matrix limited in any way for PBBlas? For example, can a row vector be partitioned across multiple nodes? What about a column vector?

A.  Yes, it will automatically handle all of these cases.  In fact, the multiplication optimizations for row and column vectors provide even more efficiency.  It is flexible about the shape of your matrices and will try to optimize the partitioning regardless of the shape.

1:08:10- Richard Taylor: HPCC Systems Training: Updates and Deep Dives on Cool Code

Richard walks through the four training options offered by HPCC Systems, including in-person, remote, online, and mobile app training options to fit almost any need.  Richard provides the upcoming schedules for in-person and remote courses as well as how to register.

Richard demos LOOP functions (the LOOP function does not loop!).  He describes the five types of LOOP functions and shows the code for several different situational examples.

1:28:00- Q&A

Q. Do I need to be a developer or have any prior experience to take these classes?

A.  You do not need prior experience.  We have trained people with PhDs in Computer Science as well as others who have never held a mouse.  HPCC Systems for Managers is good for non-technical people.  Our Introduction to ECL courses are a very good place to start for those who have not worked with HPCC Systems before.  There are many options for people of all levels.

More information on HPCC Systems Training can be found here:  HPCC Systems Training and Current Class Schedule

Have a new success story to share? We would welcome you as a speaker at one of our upcoming The Download: Tech Talks episodes.

  • Want to pitch a new use case?   
  • Have a new HPCC Systems application you want to demo?   
  • Want to share some helpful ECL tips and sample code?   
  • Have a new suggestion for the roadmap?

Be a featured speaker for an upcoming episode! Email your idea to Techtalks@hpccsystems.com  

Visit The Download Tech Talks wiki for more information: https://hpccsystems.atlassian.net/wiki/display/hpcc/HPCC+Systems+Tech+Talks

Watch Past Episodes of The Download: Tech Talks Webcasts:

The Download: Tech Talks by the HPCC Systems Community, Episode 1

  • Anirudh Shah, Co-Founder, 3Loq
    • How we use HPCC Systems to process more than 500 monthly marketing campaigns at the largest private bank in India across the bank’s entire portfolio.
    • Our experience with HPCC Systems in production
    • Automation and data sanity frameworks
  • Allan Wrobel, Senior Engineer, LexisNexis
    • Making full use of Superfiles to make order of magnitude improvements to build times on THOR. (plus fringe benefits)
    • Thor is well known for making short work of processing billions of records, and this promotes the tendency to use brute force in its deployment. Watch how the UK team managed to implement efficiency over brute force to reduce the processing time for a daily build of a billion-record ingest file from 12 hours to 2 hours, and enabled further speed increases in other processes.
  • Lorraine Chapman, Consulting Business Analyst, HPCC Systems
    • In 2015, HPCC Systems was an accepted organization for Google Summer of Code (GSoC) taking on 2 students involved in this program. However, we had the bandwidth to support more students and so the HPCC Systems summer internship program was born. Four students joined the program in 2015 and four more in 2016. We will apply for GSoC and run our intern program again in 2017. Hear how the programs work, how projects are identified and find out about student successes on these programs.

Please also check out our blog for more information on The Download: An HPCC Systems Podcast for in depth conversations on HPCC Systems, use cases, and Big Data technology topics.