More than 12 years ago, back in 2000, LexisNexis was pushing the envelope on what could be done to process and analyze large amounts of data with commercially available solutions at the time. The overall data size, combined with the large number of records and the complexity of the processing required made existing solutions non-viable. As a result, LexisNexis invented, from the ground up, a data-intensive supercomputer based on a parallel share-nothing architecture running on commodity hardware, which ultimately became the HPCC Systems platform.
To put this in a time perspective, it wasn’t until 2004 (several years later) that a pair of researchers from Google published a paper on the MapReduce processing model, which fueled Hadoop a few years later.
The HPCC Systems platform was originally designed, tested and refined to specifically address big data problems. It can perform complex processing of billions (or even trillions) of records, allowing users to run analytics in their entire data repository, without resorting to sampling and/or aggregates. Its real-time data delivery and analytics engine (Roxie) can handle thousands of simultaneous transactions, even on complex analytical models.
As part of the original design, the HPCC Systems platform can handle disparate data sources, with changing data formats, incomplete content, fuzzy matching and linking, etc., which are paramount to LexisNexis proprietary flagship linking technology known as LexID(sm).
But it is thanks to ECL, the high-level data-oriented declarative programming language powering the HPCC Systems platform, that this technology is truly unique. With advanced concepts such as data and code encapsulation, lazy evaluation, prevention of side effects, implicit parallelism and code reuse and extensibility, is that data scientists can focus on what needs to be done, rather on superfluous details around the specific implementation. These characteristics make the HPCC Systems platform significantly more efficient than anything else available in the marketplace.
Last June, almost a year ago, LexisNexis decided to release its supercomputing platform, under the HPCC Systems name, giving enterprises the benefit of an open source data intensive supercomputer that can solve large and complex data challenges. One year later, HPCC Systems has made a name for itself and built an impressive Community. Moreover, the HPCC Systems platform has been named one of the top five “start-ups” to watch and has been included in a recent Gartner 2012 Cool IT Vendors report.
LexisNexis has made an impact in the marketplace with its strategic decision to open source the HPCC Systems platform: a bold and innovative decision that can only arise from a Company which prides itself of being a thought leader, when it comes to Technology and Big Data analytics.