HPCC Systems blog contributors are engineers and data scientists who for years have enabled LexisNexis customers to use big data to fulfill critical missions, gain competitive advantage, or unearth new discoveries. Check this blog regularly for insights into how HPCC Systems technology can put big data to work for your own organization.
On January 12, 2017, HPCC Systems hosted the first-ever episode of The Download: Tech Talks. These technically focused talks are for the community, by the community. The Download: Tech Talks is intended to provide continuing education through high-quality content and meaningful development insight throughout the year.
Watch the webcast here: https://www.brighttalk.com/webcast/15091/240577
TensorFlow™ (see https://www.tensorflow.org) is a new open-source library from Google for performing linear algebra operations on tensors (multidimensional arrays that generalize matrices) and chaining multiple such operations together. It is particularly well suited to machine learning applications, and it supports execution on GPUs as well as distributed execution across multiple machines when the data is too large for a single machine to handle.
Lily Xu joined the team as part of the HPCC Systems intern program in the summer of 2016. Lily is a PhD student in Computer Science at Clemson University, where the program includes options in machine learning, data mining, and software architecture. Lily submitted a proposal to implement the Yinyang K-Means clustering algorithm in ECL as a new addition to the HPCC Systems machine learning library.
Sarthak Jain joined the HPCC Systems platform team as a student contributor during our involvement with the Google Summer of Code program in 2015. He added new statistics to the HPCC Systems Linear and Logistic Regression machine learning module. He returned to the team as part of the 2016 HPCC Systems summer intern program, again working on a machine-learning-related project: implementing Latent Semantic Analysis (LSA).
Following the HPCC Systems Summit, I took the opportunity to sit down with Raj Chandrasekaran, CTO and co-founder of ClearFunnel (www.clearfunnel.com). Raj has a background in technology consulting and has led practices in Technology Strategy, Platforms, and Architecture. Now on his third start-up, Raj shares how ClearFunnel is using the open source HPCC Systems big data platform to solve customer big data challenges as well as a
This is the second in a series of blogs featuring a student who worked on an HPCC Systems project in the summer of 2016. Syed Rahman is working towards a PhD in Statistics at the University of Florida and is a returning student intern. In 2015, he implemented a machine learning algorithm in ECL for the HPCC Systems open source project. Impressed with this work, we were delighted to welcome Syed back onto the team this year to work on another project.
One downside of embedding calls to databases such as MySQL or Cassandra in your ECL code was that specifying the fields to be returned (or passed in, when inserting rows) was a little clunky and potentially inefficient. In HPCC Systems 6.2.0, projecting fields into EMBEDs makes this process much easier and more efficient.
Let's take a step back and review the approach ECL developers may have been using to date, and then look at how to use this new feature.
Just because HPCC Systems comes with its own programming language (ECL) does not mean this is the only language you can use to query your data. You can embed a number of different languages within your ECL code. You can also process data from a variety of different sources on an HPCC Systems cluster, using the plugins and connectors we provide specifically to help you bridge the gap.