Hadoop Data Integration

The HDFS to HPCC (H2H) Connector allows read/write access to HDFS data repositories.

Please note that the H2H Connector is no longer actively supported, and the assets provided on this page are offered for archival purposes.

This connector provides access to HDFS data files from HPCC. It provides ECL macros that populate ECL datasets on the HPCC Systems platform with HDFS data, and it can create HDFS files from HPCC data. Two implementations are provided: one based on libhdfs (the native API provided by Hadoop) and one based on webhdfs (the web-based API provided by Hadoop).

H2H Connector library (libhdfs based)

Release                         Release Date  Platform      Size*  Version  MD5
HDFS Connector Centos5          02/22/2013    Centos 64bit  31KB   1.4.4-1  5ea5103919c2d08752dcce043028e6c0
HDFS Connector Centos6          02/22/2013    Centos 64bit  31KB   1.4.4-1  0c78c4eb94c3cba7d306db8361c44e8d
HDFS Connector Ubuntu 10.04     03/14/2014    Ubuntu 64bit  27KB   1.4.4-1  e53c6f2781a21475db25495dba0d3aed
HDFS Connector Ubuntu 12.04     03/14/2014    Ubuntu 64bit  27KB   1.4.4-1  cb1d95f208464d364d8a75396a68f2c9
HDFS Connector Ubuntu 12.10     03/14/2014    Ubuntu 64bit  28KB   1.4.4-1  04a19606165f580996db5618030ad946
HDFS Connector Ubuntu 13.04     03/14/2014    Ubuntu 64bit  28KB   1.4.4-1  ef6cd32a219fe914cda2769dd7d06430
HDFS Connector Ubuntu 13.10     03/14/2014    Ubuntu 64bit  28KB   1.4.4-1  be305d7cf8d25f2a88290d97eeb6df74

H2H Connector library (webhdfs based)

Release                         Release Date  Platform      Size*  Version  MD5
WebHDFS Connector Centos5       02/22/2013    Centos 64bit  27KB   1.4.4-1  d229a06032d8fda43e420092242dc836
WebHDFS Connector Centos6       02/22/2013    Centos 64bit  30KB   1.4.4-1  e6b6694f91278b654789cbc0ae5cd628
WebHDFS Connector Ubuntu 10.04  03/14/2014    Ubuntu 64bit  26KB   1.4.4-1  a05a5e78fcb61cc2bbd74800086e835e
WebHDFS Connector Ubuntu 12.04  03/14/2014    Ubuntu 64bit  26KB   1.4.4-1  895ccdf844c29cdd857ced966a6e3c6e
WebHDFS Connector Ubuntu 12.10  03/14/2014    Ubuntu 64bit  26KB   1.4.4-1  c233125bad952bfd4239d87cd757aa41
WebHDFS Connector Ubuntu 13.04  03/14/2014    Ubuntu 64bit  26KB   1.4.4-1  020ecbc43eb9d38b4a016c10df9ce351
WebHDFS Connector Ubuntu 13.10  03/14/2014    Ubuntu 64bit  26KB   1.4.4-1  89d242ca1df67136ded1f30dfaee4353

* sizes are approximate
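
The MD5 values listed above can be used to check that a downloaded package arrived intact. The following is a minimal sketch in Python, not part of the H2H distribution: the function names are hypothetical, and the package file name in the usage note is a placeholder for whatever file was actually downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare the file's digest with a published value.

    The comparison ignores letter case and surrounding whitespace,
    since values copied from a web page often carry both.
    """
    return md5_of_file(path) == published_md5.strip().lower()
```

For example, `matches_published_md5("downloaded-package.rpm", "5ea5103919c2d08752dcce043028e6c0")` would return True only if the (placeholder) file matches the value published for HDFS Connector Centos5. MD5 is adequate for detecting a corrupted download, but it should not be treated as proof of authenticity.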

*****************************************************************

Known Limitations and Release Notes for H2H 1.4.4-1 and Web H2H 1.4.4-1

  • Due to recent changes, you must uninstall any previous version of H2H before installing a 1.x release.
  • The libhdfs-based connector requires libhdfs.so, which in turn requires a local installation of Hadoop.
  • WebHDFS-based connector package notes:
    • Requires libcurl.
    • WebHDFS must be enabled on the target HDFS system.
    • Target HDFS datanode hostnames must be resolvable locally (this might require adding entries to the local hosts file).
    • PipeOutAndMerge is not supported, only PipeOut. The user is responsible for merging file parts on the Hadoop side.
  • When installing the RPM (CentOS and openSUSE), use the following command to install the plugin:
sudo rpm -Uvh --nodeps hpccsystems-.rpm
  • We have seen some issues with CSV text qualifiers (or escape characters). If field values in your data contain escape characters, in rare cases the data may not PipeIn correctly: the resulting dataset could show data corruption and/or record loss. Your original data is not affected.
  • Occasionally, we have seen instances where the LD library path (LD_LIBRARY_PATH) is not set up correctly, so that libjvm.so cannot be found. If you get the following error, follow the steps in the “Editing and distributing the Configuration file” section of the “HDFS to HPCC Connector” document:
…error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory
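
Two of the notes above (locally resolvable datanode hostnames, and a library path that does not reach libjvm.so) can be checked before running the connector. The sketch below is a hedged illustration in Python and is not part of H2H: both function names are hypothetical, and the search roots are whatever directories apply on the local system.

```python
import os
import socket


def unresolvable_hosts(hostnames):
    """Return the subset of `hostnames` that cannot be resolved locally."""
    missing = []
    for name in hostnames:
        try:
            socket.gethostbyname(name)
        except OSError:
            # socket.gaierror (failed lookup) is a subclass of OSError.
            missing.append(name)
    return missing


def find_shared_library(filename, search_roots):
    """Return every directory under `search_roots` that contains `filename`."""
    hits = []
    for root in search_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                hits.append(dirpath)
    return hits
```

A possible use is `find_shared_library("libjvm.so", [os.environ["JAVA_HOME"]])`, assuming JAVA_HOME is set on the node, followed by comparing the result with the directories listed in LD_LIBRARY_PATH. How the path should then be configured is described in the connector document referenced above, not by this sketch.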
============================================================

Comprehensive list of changes from H2H 1.4.2-1 to 1.4.4-1
============================================================

HH-84 Add support for hdfsuser on read operations
HH-86 Pull error messages from master branch to 1.4.4