HPCC Frequently Asked Questions - General Information and Capabilities
The HPCC has been in active development and use for over 10 years.
This technology has been proven in the marketplace for the past ten years. Our HPCC technology powers the products and solutions of the LexisNexis Risk Solutions business unit, whose mission is to provide essential insights to advance and protect people, industry and society. LexisNexis Risk Solutions customers include top government agencies, insurance carriers, banks and financial institutions, health care organizations, credit card issuers, top retail card issuers, cell phone providers and a range of other industries. HPCC technology is also used to provide enhanced content to the new Lexis electronic products that serve legal, academic and research industries.
HPCC has performance advantages at several levels. At the lowest level, HPCC generates C++ rather than Java, which immediately gives it an efficiency advantage. HPCC has also run in critical production environments for over a decade, and the time and effort invested in individual components yield a tangible performance boost. In our analysis, ECL code translated directly from the PigMix benchmark suite showed an average performance improvement of 3.7x.
That said, the real performance of the HPCC begins to show when the ECL language is used to its fullest to express data problems in their most natural form. In the hands of a skilled coder, speed improvements in excess of an order of magnitude are common, and two orders of magnitude are not out of the question.
HPCC works over the internet, over a private network, or both. It also operates on either distributed or centralized systems.
HPCC is not a traditional transactional database.
HPCC is highly scalable, handling data sets regardless of size, and can be used for almost any data-centric task.
Queries deployed on HPCC can be called using SOAP or REST/JSON. A web form is also provided for testing.
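A REST/JSON call from Python might look like the following, using only the standard library. This is a minimal sketch under stated assumptions, not an official client: the host `esp.example.com`, port 8002, the `/WsEcl/json/query/<target>/<query>` URL layout, the query name `person_lookup`, and the convention of wrapping parameters in an object keyed by the query name are all assumptions for illustration. Check the URLs and request formats your own ESP server actually exposes.

```python
import json
import urllib.request

# ASSUMPTION: port and URL layout are illustrative; verify against your ESP setup.
DEFAULT_ESP_PORT = 8002


def build_json_query_request(host: str, target: str, query: str,
                             params: dict,
                             port: int = DEFAULT_ESP_PORT) -> urllib.request.Request:
    """Build (but do not send) a POST request for a deployed query."""
    # ASSUMPTION: hypothetical URL pattern for a JSON query endpoint.
    url = f"http://{host}:{port}/WsEcl/json/query/{target}/{query}"
    # ASSUMPTION: parameters are wrapped in an object keyed by the query name.
    body = json.dumps({query: params}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def call_query(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a live server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical host and query name, for illustration only.
req = build_json_query_request("esp.example.com", "roxie", "person_lookup",
                               {"lastname": "SMITH"})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The request is only built and printed here; passing it to `call_query` would actually send it and requires a reachable HPCC ESP server. Separating construction from sending keeps the request format easy to inspect and test offline.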
Although we do not currently test this configuration, our source code is available for developers who want to explore these possibilities. Currently, only the Client Tools are supported on Apple OS X.
HPCC Thor works well on Amazon AWS EC2. For details, see the Install Thor on AWS documentation.
The HPCC is built from the ground up to work as a single cohesive supercomputer. As a result, managing it and developing solutions for it is far simpler.
Historically, Beowulf clusters have been used mainly for computational analysis and mathematics. HPCC, by contrast, is designed specifically for data manipulation.
For example, in a Beowulf cluster the programmer explicitly controls inter-node communication through a facility such as MPI (Message Passing Interface) to perform a global data operation, whereas in an HPCC system inter-node communication is handled implicitly.
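To make the contrast concrete, here is a toy Python analogy; it is neither MPI nor HPCC code. Threads stand in for cluster nodes and a `queue.Queue` stands in for a message channel. In the explicit version each worker sends its partial result and the root receives every message itself; in the implicit version the caller only states the local operation and how to combine results, and a hypothetical `run_distributed` helper (invented for this sketch) hides the data movement.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Partition = List[int]


def explicit_global_sum(partitions: List[Partition]) -> int:
    """Beowulf/MPI style (simulated): the programmer writes sends and receives."""
    channel: queue.Queue = queue.Queue()  # stands in for links to the root node

    def worker(part: Partition) -> None:
        channel.put(sum(part))  # explicit "send" of the local partial sum

    threads = [threading.Thread(target=worker, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    # Explicit "receive": the root must know how many messages to expect.
    total = sum(channel.get() for _ in partitions)
    for t in threads:
        t.join()
    return total


def run_distributed(partitions: List[Partition],
                    local_op: Callable[[Partition], int],
                    combine: Callable[[List[int]], int]) -> int:
    """HYPOTHETICAL runtime helper: moving partial results is hidden in here."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_op, partitions))
    return combine(partials)


def implicit_global_sum(partitions: List[Partition]) -> int:
    """HPCC style (simulated): say what to compute, not how nodes talk."""
    return run_distributed(partitions, local_op=sum, combine=sum)


parts = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(explicit_global_sum(parts), implicit_global_sum(parts))  # 45 45
```

Both functions compute the same global sum; the difference is who is responsible for the communication code.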
ECL (Enterprise Control Language) is a programming language designed for use with the HPCC system. It is specifically designed for data management and query processing. ECL code is written using the ECL IDE development tool.
ECL is a transparent, implicitly parallel programming language that is both powerful and flexible. It is declarative, non-procedural and dataflow-oriented, and it is optimized for data-intensive operations. Its syntax is intuitive, modular, reusable and extensible, which makes it highly productive. ECL combines data representation and algorithm implementation in a single language.
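The declarative, non-procedural style can be illustrated with a small Python model. This is an analogy, not ECL: named definitions can be registered in any order, nothing runs when a definition is registered, and a definition is computed only when a requested result depends on it (and then only once). The `Definitions` class and the sample records are invented for this sketch.

```python
from typing import Any, Callable, Dict, List


class Definitions:
    """Toy model of declarative definitions evaluated on demand."""

    def __init__(self) -> None:
        self._exprs: Dict[str, Callable[["Definitions"], Any]] = {}
        self._cache: Dict[str, Any] = {}
        self.evaluated: List[str] = []  # order in which definitions actually ran

    def define(self, name: str, expr: Callable[["Definitions"], Any]) -> None:
        # Registering a definition does not evaluate it.
        self._exprs[name] = expr

    def get(self, name: str) -> Any:
        # Evaluate on first use, then reuse the cached result.
        if name not in self._cache:
            self._cache[name] = self._exprs[name](self)
            self.evaluated.append(name)
        return self._cache[name]


defs = Definitions()
# Definitions are deliberately declared "out of order".
defs.define("adult_count", lambda d: len(d.get("adults")))
defs.define("adults", lambda d: [p for p in d.get("people") if p["age"] >= 18])
defs.define("minors", lambda d: [p for p in d.get("people") if p["age"] < 18])
defs.define("people", lambda d: [{"name": "Ann", "age": 34},
                                 {"name": "Bob", "age": 17},
                                 {"name": "Cy", "age": 52}])

print(defs.get("adult_count"))  # 2
print(defs.evaluated)           # ['people', 'adults', 'adult_count']
```

Requesting `adult_count` pulls in only the definitions it depends on; `minors` is never computed because nothing asked for it.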
The ECL IDE is an integrated development environment for the ECL language designed to make ECL coding easy and programmer-friendly. Using the ECL IDE you can build, edit and execute ECL queries, and mix and match your data with any of the ECL built-in functions and/or definitions that you have created.
The ECL IDE offers a built-in Attribute Editor, Syntax Checking, and ECL Repository Access. You can execute queries and review your results interactively, making the ECL IDE a robust and powerful programming tool.
For a more detailed look at the ECL IDE, see the HPCC Data Tutorial, which walks through the development process from beginning to end using the ECL IDE.
Roxie (Rapid Online XML Inquiry Engine) is HPCC's data delivery engine. It serves data quickly and can support many thousands of requests per node per second.
Thor (the Data Refinery Cluster) consumes vast amounts of data and transforms, links and indexes it. It functions as a distributed file system with parallel processing power spread across multiple nodes. A cluster can scale from a single node to thousands of nodes.
Big Data refers to very large data sets (terabytes or petabytes, for example) and the secure storage facilities that hold them, which are created and manipulated by hardware and software tools, together with the processes and procedures behind those tools.
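A simplified model of how work can spread across nodes: records are hash-partitioned by key so that every record with the same key lands on the same node, each node aggregates its own slice in parallel, and the partial results merge without conflicts. ECL exposes hash partitioning through its `DISTRIBUTE` function, but the Python below is only an illustrative model under those assumptions, not Thor's implementation; threads stand in for nodes, and `zlib.crc32` is an arbitrary choice of deterministic hash.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (key, value)


def node_for(key: str, num_nodes: int) -> int:
    # Deterministic hash, so the same key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(records: List[Record], num_nodes: int) -> List[List[Record]]:
    """Assign each record to one node's partition by hashing its key."""
    parts: List[List[Record]] = [[] for _ in range(num_nodes)]
    for rec in records:
        parts[node_for(rec[0], num_nodes)].append(rec)
    return parts


def local_totals(part: List[Record]) -> Dict[str, int]:
    """Work done on a single node: aggregate only its own records."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in part:
        totals[key] += value
    return dict(totals)


def parallel_totals(records: List[Record], num_nodes: int) -> Dict[str, int]:
    parts = distribute(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(local_totals, parts))
    merged: Dict[str, int] = {}
    for r in results:
        merged.update(r)  # keys are disjoint across nodes, so nothing collides
    return merged


records = [("ann", 1), ("bob", 2), ("ann", 3), ("cy", 4), ("bob", 5)]
print(sorted(parallel_totals(records, 3).items()))  # [('ann', 4), ('bob', 7), ('cy', 4)]
```

Because partitioning by key co-locates related records, each node's local result is already complete for its keys, and adding nodes changes only how the work is divided, not the answer.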
As a leading information provider, LexisNexis has more than 35 years of experience managing big data, ranging from publicly available information (worldwide newspapers, magazines, articles, research, case law, legal regulations, periodicals and journals) to public records (bankruptcies, liens, judgments and real estate records) and other types of information.
To manage, sort, link and analyze billions of records in sub-second time, LexisNexis Risk Solutions designed a data-intensive supercomputer built on its own High Performance Computing Cluster (HPCC) platform, which has been proven over the past 10 years with customers who need to sort through billions of records. Customers such as leading banks, insurance companies, utilities, law enforcement and the Federal government depend on LexisNexis technology and information solutions to help them make better decisions faster.
LexisNexis Risk Solutions designed a data-intensive supercomputer to manage, sort, link and analyze billions of records within seconds, and it has been proven over the past 10 years with customers such as leading banks, insurance companies, utilities, law enforcement and the Federal government. LexisNexis has made this platform available as open source under the HPCC Systems name. LexisNexis Risk Solutions is a $1.5 billion business unit of LexisNexis, a $6 billion information solutions company. LexisNexis is owned by Reed Elsevier, which had revenues of $12 billion in 2010.
ESP (Enterprise Services Platform) provides an easy-to-use interface for accessing ECL queries using XML, HTTP, SOAP (Simple Object Access Protocol) and REST (Representational State Transfer).
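On the SOAP side, the standard-library sketch below builds a SOAP 1.1 envelope around a request for a deployed query. The envelope namespace is the standard SOAP 1.1 one; the request element name `person_lookupRequest` and its `lastname` field are hypothetical, and the real element names for a given query should come from the service's own WSDL or schema rather than from this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_ENV_NS)


def build_soap_request(request_element: str, params: Dict[str, object]) -> bytes:
    """Wrap query parameters in a SOAP 1.1 Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    # HYPOTHETICAL: the request element name depends on the deployed query.
    request = ET.SubElement(body, request_element)
    for name, value in params.items():
        ET.SubElement(request, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


print(build_soap_request("person_lookupRequest", {"lastname": "SMITH"}).decode("utf-8"))
```

To send it, the bytes would be POSTed to the service endpoint with a `Content-Type: text/xml` header, for example via `urllib.request`; building the envelope with `ElementTree` rather than string formatting keeps parameter values properly escaped.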