White Papers

Many organizations have collected large amounts of data and stored them in massive datasets that need to be processed and analyzed to provide business intelligence, improve products and services for customers, or meet other internal data processing requirements. For example, Internet companies need to process data collected by Web crawlers as well as logs, click data, and other information generated by Web services. This white paper provides an introduction to the HPCC Platform, which solves large data processing problems.

Enterprise Control Language (ECL) is the query and control language developed to manage all aspects of the massive data joins, sorts and builds that truly differentiate HPCC (High Performance Computing Cluster) from other technologies in its ability to provide flexible data analysis on a massive scale. This white paper explores how ECL works and how it tackles data problems. In addition, ECL's ease of use for building solutions is compared to other programming languages.

The principal performance driver of a Big Data application is the data model in which the Big Data resides. The aim of this paper is to discuss some of the principal data models that exist and are imposed, and then to argue that an industrial-strength Big Data solution needs to be able to move between these models with a minimum of effort.

As a result of the continuing information explosion, many organizations are drowning in data, and the resulting “data gap”, the inability to process this information and use it effectively, is increasing at an alarming rate. Data-intensive computing represents a new computing paradigm that can address the data gap using scalable parallel processing, allowing government and commercial organizations and research environments to process massive amounts of data and implement applications previously thought to be impractical or infeasible.

Today’s biggest cyber challenges, which include the emergence of the advanced persistent threat, take advantage of the data deluge to establish long-term footholds, exploit multiple vulnerabilities, and deliver malicious payloads, all while avoiding detection. This white paper focuses on HPCC Systems (High Performance Computing Cluster), the big data processing platform from LexisNexis, as a technology platform to ingest and analyze massive data that can offer meaningful indicators and warnings of malicious intent.

FUSE stands for Filesystem in Userspace. In basic terms, a FUSE driver enables a user to “mount” data sources that would not be “mountable” in the traditional sense. Where FUSE is concerned, “mountable” means we can interact with the data as if it were files in a standard file system. This white paper gives an overview of FUSE and walks through an experimental proof of concept.

Social network analytics provide a different kind of data mining, visualized with graphing analysis – a tool that makes significant connections among individuals and behaviors clearer and that correlates relationships between entities that would otherwise go unnoticed. This white paper shows an example of how social network analytics uncover hidden and complex fraud schemes.

This article expands upon what it means to think declaratively and helps the reader begin the process. Learn more about the declarative thinking process and how the extensible nature of ECL comes into play.

This white paper compares and contrasts the traditional Relational Database Management System (RDBMS) / Structured Query Language (SQL) solution with the one offered by the High Performance Computing Cluster (HPCC) / Enterprise Control Language (ECL) platform. It shows that ECL is not simply an adjunct to HPCC, but is actually a vital technological linchpin that ensures the HPCC offering achieves performance levels that an SQL system cannot reach even as a theoretical ideal. This paper also provides several case studies illustrating the new horizons that are opened by the HPCC and ECL combination.

The ECL performance advantage over Hadoop/PIG comes from two different areas: from the enhanced linguistic expressivity of ECL and from the advanced, mature engineering that underpins the platform. This white paper describes a comparison between ECL and PIG.

The HPCC Systems platform is built to be flexible and scalable at the same time. The underlying ECL programming language and built-in components provide the foundational tooling to build fully automated ETL jobs that can quickly ingest new data and adjust to changes in schema. This white paper describes the key components of the HPCC Systems ETL platform.

ECL has an excellent reference manual and many good 'starting from scratch' training resources that can teach ECL. This white paper does not aim to replace any of those, but rather to help those already experienced in Hadoop answer the question "how do I do X in ECL?", where X is a common Hadoop function.

The purpose of this white paper is to get a PIG programmer as productive in ECL as they would be in PIG, as quickly as possible.

How can a company profit from its product while being fair to its Open Source development partners? After decades of corporate participation in Open Source, this question is still debated.

This white paper by Bruce Perens explains how HPCC Systems is taking a new approach.

Bruce Perens is one of the founders of the Open Source movement in software, and a strategic consultant to companies and governments on the issues of Open Source. More information about him is available at perens.com.
