HPCC Systems blog contributors are engineers and data scientists who for years have enabled LexisNexis customers to use big data to fulfill critical missions, gain competitive advantage, or unearth new discoveries. Check this blog regularly for insights into how HPCC Systems technology can put big data to work for your own organization.
Arjuna Chala, Senior Director of Technology for HPCC Systems, and I sat down to discuss the Future Technologies Conference that took place in San Francisco on December 6-7, 2016. Arjuna was a keynote speaker at the event, and we discussed conference highlights and what he thinks these insights mean for the future of big data technology.
On January 12, 2017, HPCC Systems hosted the first-ever episode of The Download: Tech Talks. These technically focused talks are for the community, by the community. The Download: Tech Talks is intended to provide continuing education through high-quality content and meaningful development insight throughout the year.
Watch the webcast here: https://www.brighttalk.com/webcast/15091/240577
TensorFlow™ (see https://www.tensorflow.org) is a new open-source library from Google for performing linear algebra operations on tensors (multidimensional arrays, of which matrices are the two-dimensional case) and connecting multiple such operations together. It is particularly suited to machine learning applications, and supports operations on GPUs as well as cluster-based operations across multiple machines when the data is too large for a single machine to handle.
Lily Xu joined the team as part of the HPCC Systems intern program in the summer of 2016. Lily is a PhD student at Clemson University, studying Computer Science, which includes options in machine learning, data mining and software architecture. Lily submitted a proposal to implement the Yinyang K-Means clustering algorithm in ECL as a new feature to be included in the HPCC Systems machine learning library.
Sarthak Jain joined the HPCC Systems platform team as a student contributor during our involvement with the Google Summer of Code program in 2015. He added new statistics to the HPCC Systems Linear and Logistic Regression machine learning module. He returned to the team as part of the 2016 HPCC Systems summer intern program, again working on a machine learning-related project: implementing Latent Semantic Analysis (LSA).
Following the HPCC Systems Summit, I took the opportunity to sit down with Raj Chandrasekaran, CTO and co-founder of ClearFunnel (www.clearfunnel.com). Raj has a background in technology consulting and has led practices on Technology Strategy, Platforms, and Architecture. Currently on his third start-up, Raj shares how ClearFunnel is using the open source HPCC Systems big data platform to solve customer big data challenges as well as a
This is the second in a series of blog posts featuring a student who worked on an HPCC Systems project in the summer of 2016. Syed Rahman is working towards a PhD in Statistics at the University of Florida and is a returning student intern. In 2015, he implemented a machine learning algorithm in ECL for the HPCC Systems open source project. Having been impressed with this work, we were delighted to welcome Syed back to the team this year to work on another project.
One downside of embedding calls to databases such as MySQL or Cassandra in your ECL code has been that specifying the fields to be returned (or passed in, when inserting rows) was a little clunky and potentially inefficient. In HPCC Systems 6.2.0, projecting fields into EMBEDs makes this process much easier and more efficient.
Let's take a step back and review the approach ECL developers may have been using to date, and then take a look at how to use this new feature.