Scaling Data Science Capabilities: Leveraging a Homogeneous Big Data Ecosystem
With so many big data processing engines available, how does a start-up decide which one to use? For ClearFunnel, the answer was easy – HPCC Systems.
On February 15, Raj Chandrasekaran, CTO & Co-Founder of ClearFunnel, joined our panel for Tech Talk Episode 11. Raj provided a practitioner’s perspective on how his company has leveraged the advantages of HPCC Systems to build and operate high-end big data analytics solutions for multiple clients. He delved into some of the core capabilities and extensions in HPCC Systems that have allowed a single, homogeneous tech stack to power use cases across a diverse spectrum of big data and machine learning domains.
So what sets HPCC Systems apart from other big data options? For a SaaS business like ClearFunnel, simplicity of solution engineering and speed of deployment are key success factors. The key operational requirements are to execute complex big data and machine learning use cases on tight budgets, and to build and operate several advanced analytics use cases at big data scale for various clients at the same time.
The HPCC Systems core architecture is very easy to use. It does not have layer upon layer of complex ‘add-ons’ like ZooKeeper, MR, Hive, Impala, HBase, YARN, Mesos, RDD, Cluster Manager, DFS, GraphX, MLlib, and Spark SQL, which make the Hadoop and Spark technology stacks and their production deployments complex, time-consuming, and very expensive to maintain, not to mention the variety of talent needed to design, build, operate, and support such a complex cluster. In contrast, the HPCC Systems architecture is extremely nimble and offers the flexibility to perform the full spectrum of big data and complex machine learning operations without a separate add-on for each new functionality. The biggest advantage of its architecture is the speed and cost of operations that it provides for companies and use cases of all sizes.
And consider this: a fully functional, production-grade HPCC Systems cluster consisting of several hundred nodes can be spun up within a few minutes, ready to fire away on the most complex of data engineering tasks, all of which can be programmed and managed with just ECL.
To learn more about how ClearFunnel has leveraged HPCC Systems for commercial success and implemented a full spectrum of complex data engineering use cases, see the recap of Tech Talk Episode 11. There you will find a link to the webinar recording and the presentation slides, as well as Q&As from all the speakers.