Hadoop Data Integration


The HDFS to HPCC (H2H) Connector allows read/write access to HDFS data repositories.

This connector provides access to HDFS data files from HPCC. It supplies ECL macros that populate ECL datasets on the HPCC Systems platform with HDFS data, and that create HDFS files from HPCC data. Two implementations are provided: a connector based on libhdfs (the native API provided by Hadoop) and one based on webhdfs (the web based API provided by Hadoop).

H2H Connector library (libhdfs based)

Release                        Release Date  Platform        Size*  Version  MD5
HDFS Connector Centos5         02/22/2013    Centos 64bit    29KB   1.4.0-1  4a523e500a8953f744b85af0b3060548
HDFS Connector Centos6         02/22/2013    Centos 64bit    29KB   1.4.0-1  6ebeb87e3217095487908ce4896fc511
HDFS Connector OpenSuse 11.4   01/22/2013    OpenSuse 64bit  27KB   1.4.0-1  7ea331af90856527aed39d1fa0e6e371
HDFS Connector Ubuntu 10.04    01/22/2013    Ubuntu 64bit    25KB   1.4.0-1  03caebd306e56e8cf38ac151946aacc3
HDFS Connector Ubuntu 11.10    01/22/2013    Ubuntu 64bit    25KB   1.4.0-1  e4f34c9f1f21dbf30bb3d2dd6cd75bb7
HDFS Connector Ubuntu 12.04    01/22/2013    Ubuntu 64bit    25KB   1.4.0-1  305f375ee2c863809bc65f408f351174
HDFS Connector Debian Squeeze  01/22/2013    Ubuntu 64bit    25KB   1.4.0-1  a26cb63b06d1c467e1f06be1f22e2312

H2H Connector library (webhdfs based)

Release                           Release Date  Platform        Size*  Version    MD5
WebHDFS Connector Centos5         02/22/2013    Centos 64bit    25KB   1.0.0-1rc  d4db66f8bc7e5c2091f9709f95a6760a
WebHDFS Connector Centos6         02/22/2013    Centos 64bit    28KB   1.0.0-1rc  1e5163b878f3dac49c50b76ea5eebf0c
WebHDFS Connector OpenSuse 11.4   01/22/2013    OpenSuse 64bit  24KB   1.0.0-1rc  5203d314ce1e63cd702c123c4925b7e2
WebHDFS Connector Ubuntu 10.04    01/22/2013    Ubuntu 64bit    24KB   1.0.0-1rc  22dfad76a1c95cc88149a41341036314
WebHDFS Connector Ubuntu 11.10    01/22/2013    Ubuntu 64bit    24KB   1.0.0-1rc  580cb83b3ede75dde8585860f857c7e9
WebHDFS Connector Ubuntu 12.04    01/22/2013    Ubuntu 64bit    24KB   1.0.0-1rc  cf8c703704ac2c551f97942eeea4f1b8
WebHDFS Connector Debian Squeeze  01/22/2013    Ubuntu 64bit    24KB   1.0.0-1rc  c4c3221f1309a660cf1d6e69ea78789d
* sizes are approximate
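
The MD5 values published for each package can be used to check a download before installing it. The following is a minimal sketch in Python, not part of the H2H distribution; the function names and the file name in the usage note are illustrative assumptions.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large packages are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published):
    """Compare a file's MD5 with a published value, ignoring case and surrounding spaces."""
    return md5_of_file(path) == published.strip().lower()
```

For example, `matches_published_md5("downloaded-package.rpm", "4a523e500a8953f744b85af0b3060548")` would return True only if the (hypothetically named) downloaded file has the checksum listed for the HDFS Connector Centos5 package.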



*****************************************************************

Known Limitations and Release Notes for H2H 1.4.0-1 and Web H2H 1.0.0-1rc
  • Due to recent changes, you must uninstall any previous version of H2H before installing a 1.x release.
  • The libhdfs based connector requires libhdfs.so, which in turn requires a local installation of Hadoop.
  • WebHDFS based connector package notes:
    • Requires libcurl.
    • WebHDFS must be enabled on the target HDFS system.
    • Target HDFS datanode hostnames must be resolvable locally (this might require adding entries to the local hosts file).
    • PipeOutAndMerge is not supported, only PipeOut. The user is responsible for merging file parts on the Hadoop side.
  • When installing the rpm (CentOS and openSUSE), use the following command to install the plugin:
    sudo rpm -Uvh --nodeps hpccsystems-.rpm
  • We have seen some issues with CSV text qualifiers (or escape characters). If your data has field values containing escape characters, in rare cases your data may not PipeIn correctly (there could be data corruption and/or record loss in the resulting dataset). This does not affect your original data.
  • Occasionally, we have seen instances where the library path (LD_LIBRARY_PATH) is not set up correctly. This causes an error when libjvm.so cannot be found. If you get the following error, follow the steps in the “Editing and distributing the Configuration file” section of the “HDFS to HPCC Connector” document:
    …error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory
  • If you are running under openSUSE, you may get a "java.net.SocketException: Protocol not available" error when executing HDFSConnector.PipeOut or HDFSConnector.PipeOutAndMerge. This is a known Hadoop issue with OpenJDK 1.6. You can resolve it by installing the Oracle Java JDK and setting LD_LIBRARY_PATH to the Oracle libjvm in your H2H configuration.
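
Two of the limitations above, locally resolvable datanode hostnames and a library path that contains libjvm.so, can be checked before running the connector. The following Python sketch is an illustration under assumptions, not an H2H tool: the function names are hypothetical, and it only inspects LD_LIBRARY_PATH, whereas the dynamic loader may also search other locations.

```python
import os
import socket


def unresolvable_hosts(hostnames):
    """Return the hostnames that the local resolver cannot resolve."""
    failed = []
    for name in hostnames:
        try:
            socket.gethostbyname(name)
        except OSError:
            # socket.gaierror (a subclass of OSError) signals a failed lookup.
            failed.append(name)
    return failed


def find_libjvm(ld_library_path=None):
    """Return the first libjvm.so found on an LD_LIBRARY_PATH-style string, or None."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in ld_library_path.split(os.pathsep):
        candidate = os.path.join(directory, "libjvm.so")
        # Skip empty entries produced by leading, trailing, or doubled separators.
        if directory and os.path.isfile(candidate):
            return candidate
    return None
```

Passing the list of datanode hostnames of the target cluster to `unresolvable_hosts` gives the names that may need entries in the local hosts file. A `None` result from `find_libjvm()` suggests that the current LD_LIBRARY_PATH does not point at a directory containing libjvm.so.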
====================================================
Comprehensive List of changes from H2H 3.6.4-1rc to H2H 1.4.0-1
====================================================
HH-55    Set 1.0.0.1RC version for H2H
HH-54    Cross installs might contaminate the libhdfs connector conf file; create a webhdfs/libhdfs compliant configuration file.
HH-53    Maxredirects on CentOS seems to default to 0 even though it is defined to be 50; explicitly set libcurl maxredirects to 50.
HH-52    Add support for WebHDFS – implement webhdfs based H2H
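
As background for the WebHDFS items above: WebHDFS is a REST API in which requests are addressed to URLs of the form http://host:port/webhdfs/v1/path?op=OPERATION. For a read (op=OPEN), the namenode normally answers with an HTTP redirect to a datanode, which is consistent with both the redirect limit in HH-53 and the requirement that datanode hostnames resolve locally. The Python sketch below only builds such a URL; the helper name, the host, and the port (50070, the usual namenode HTTP port in Hadoop 1.x and 2.x) are assumptions for illustration, and the sketch does not show how the connector itself issues requests.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL: http://host:port/webhdfs/v1<path>?op=<OP>&..."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    # Put op first, then any extra parameters in a stable (sorted) order.
    query = urlencode([("op", op.upper())] + sorted(params.items()))
    return "http://{}:{}/webhdfs/v1{}?{}".format(host, port, quote(hdfs_path), query)
```

For instance, `webhdfs_url("namenode.example.com", 50070, "/user/hpcc/data.csv", "open")` yields `http://namenode.example.com:50070/webhdfs/v1/user/hpcc/data.csv?op=OPEN`. A client fetching that URL needs to follow the redirect to the datanode named in the response.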




Edition: 
Included in Community
