Hadoop Data Integration
The HDFS to HPCC (H2H) Connector allows read/write access to HDFS data repositories.
The connector gives HPCC access to HDFS data files. It provides ECL macros that populate ECL datasets on the HPCC Systems platform with HDFS data, and that create HDFS files from HPCC data. Two implementations are provided: one based on libhdfs (the native API provided by Hadoop) and one based on webhdfs (the web-based API provided by Hadoop).

H2H Connector library (libhdfs based)
| Release | Release Date | Platform | Size* | Version | MD5 |
|---|---|---|---|---|---|
| HDFS Connector Centos5 | 02/22/2013 | Centos 64bit | 29KB | 1.4.0-1 | 4a523e500a8953f744b85af0b3060548 |
| HDFS Connector Centos6 | 02/22/2013 | Centos 64bit | 29KB | 1.4.0-1 | 6ebeb87e3217095487908ce4896fc511 |
| HDFS Connector OpenSuse 11.4 | 01/22/2013 | OpenSuse 64bit | 27KB | 1.4.0-1 | 7ea331af90856527aed39d1fa0e6e371 |
| HDFS Connector Ubuntu 10.04 | 01/22/2013 | Ubuntu 64bit | 25KB | 1.4.0-1 | 03caebd306e56e8cf38ac151946aacc3 |
| HDFS Connector Ubuntu 11.10 | 01/22/2013 | Ubuntu 64bit | 25KB | 1.4.0-1 | e4f34c9f1f21dbf30bb3d2dd6cd75bb7 |
| HDFS Connector Ubuntu 12.04 | 01/22/2013 | Ubuntu 64bit | 25KB | 1.4.0-1 | 305f375ee2c863809bc65f408f351174 |
| HDFS Connector Debian Squeeze | 01/22/2013 | Ubuntu 64bit | 25KB | 1.4.0-1 | a26cb63b06d1c467e1f06be1f22e2312 |

H2H Connector library (webhdfs based)
| Release | Release Date | Platform | Size* | Version | MD5 |
|---|---|---|---|---|---|
| WebHDFS Connector Centos5 | 02/22/2013 | Centos 64bit | 25KB | 1.0.0-1rc | d4db66f8bc7e5c2091f9709f95a6760a |
| WebHDFS Connector Centos6 | 02/22/2013 | Centos 64bit | 28KB | 1.0.0-1rc | 1e5163b878f3dac49c50b76ea5eebf0c |
| WebHDFS Connector OpenSuse 11.4 | 01/22/2013 | OpenSuse 64bit | 24KB | 1.0.0-1rc | 5203d314ce1e63cd702c123c4925b7e2 |
| WebHDFS Connector Ubuntu 10.04 | 01/22/2013 | Ubuntu 64bit | 24KB | 1.0.0-1rc | 22dfad76a1c95cc88149a41341036314 |
| WebHDFS Connector Ubuntu 11.10 | 01/22/2013 | Ubuntu 64bit | 24KB | 1.0.0-1rc | 580cb83b3ede75dde8585860f857c7e9 |
| WebHDFS Connector Ubuntu 12.04 | 01/22/2013 | Ubuntu 64bit | 24KB | 1.0.0-1rc | cf8c703704ac2c551f97942eeea4f1b8 |
| WebHDFS Connector Debian Squeeze | 01/22/2013 | Ubuntu 64bit | 24KB | 1.0.0-1rc | c4c3221f1309a660cf1d6e69ea78789d |

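
The MD5 values listed for each package above can be compared against a downloaded file before installing it. The following is a minimal sketch, assuming the standard `md5sum` utility is available. The helper name `verify_md5` and the package file name in the example are hypothetical and not part of the connector distribution.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED_MD5
# Returns 0 when the MD5 digest of FILE matches EXPECTED_MD5, 1 otherwise.
# (verify_md5 is an illustrative helper, not something shipped with H2H.)
verify_md5() {
    file=$1
    # Normalise the expected digest to lower case, as md5sum prints lower case.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Example with the Centos6 libhdfs digest from the table above.
# The file name is a placeholder; substitute the file you downloaded.
# verify_md5 ./downloaded-connector-package.rpm 6ebeb87e3217095487908ce4896fc511 \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

A mismatch usually points to an incomplete or corrupted download, so the simplest response is to download the package again before installing.
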
- Download ECL Library for HDFS Connector (ECL File)
- HDFS Connector Library for IDE (ZIP)
- HDFS to HPCC Connector Doc (PDF)
- Listen to the libhdfs H2H Podcast
*****************************************************************
Known Limitations and Release Notes for H2H 1.4.0-1 and Web H2H 1.0.0-1rc
- Due to recent changes, you must uninstall any previous version of H2H before installing a 1.x release.
- The libhdfs based connector requires libhdfs.so, which in turn requires a local installation of Hadoop.
- WebHDFS based connector package notes:
  - Requires libcurl.
  - Webhdfs must be enabled on the target HDFS system.
  - Target HDFS datanode hostnames must be resolvable locally (this might require adding entries to the local hosts file).
  - PipeOutAndMerge is not supported, only PipeOut. The user is responsible for merging file parts on the Hadoop side.
- When installing the rpm (centos and opensuse), use the following command to install the plugin:
  sudo rpm -Uvh --nodeps hpccsystems-….rpm
- We have seen some issues with CSV text qualifiers (or escape characters). If your data has field values containing escape characters, in rare cases your data may not PipeIn correctly (there could be data corruption and/or record loss in the resulting dataset). This doesn’t affect your original data.
- Occasionally, we have seen instances where the LD library path is not set up correctly, so libjvm.so cannot be found and you get the following error:
  …error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory
  If this happens, follow the steps in the “Editing and distributing the Configuration file” section of the “HDFS to HPCC Connector” document.
- If you are running under openSUSE, you may get a "java.net.SocketException: Protocol not available" when executing HDFSConnector.PipeOut or HDFSConnector.PipeOutAndMerge. This is a known Hadoop issue with OpenJDK 1.6. You can resolve it by installing the Oracle Java JDK and setting LD_LIBRARY_PATH to the Oracle libjvm in your H2H configuration.
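
When diagnosing the libjvm.so error described in the notes above, it can help to find out which directories actually contain libjvm.so and whether any of them is already on the library path. The sketch below is only an aid for that check. The helper names `find_libjvm_dirs` and `in_ld_library_path` are hypothetical, `JAVA_HOME` is assumed to point at the local Java installation, and where the path is finally set is governed by the connector document, not by this script.

```shell
#!/bin/sh
# find_libjvm_dirs ROOT
# Prints each directory under ROOT that contains libjvm.so, one per line.
# (Illustrative helper, not part of the H2H packages.)
find_libjvm_dirs() {
    find "$1" -name libjvm.so 2>/dev/null | while IFS= read -r lib; do
        dirname "$lib"
    done | sort -u
}

# in_ld_library_path DIR
# Returns 0 when DIR is one of the colon-separated entries of
# LD_LIBRARY_PATH in the current environment, 1 otherwise.
in_ld_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: report each JVM library directory and whether it is on the path.
# Assumes JAVA_HOME is set and contains no whitespace.
# for dir in $(find_libjvm_dirs "$JAVA_HOME"); do
#     if in_ld_library_path "$dir"; then
#         echo "on path:     $dir"
#     else
#         echo "not on path: $dir"
#     fi
# done
```

A directory reported as not on the path is a candidate for the LD_LIBRARY_PATH setting in the H2H configuration file.
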

====================================================
Comprehensive List of changes from H2H 3.6.4-1rc to H2H 1.4.0-1
====================================================
- HH-55 Set 1.0.0.1RC version for H2H
- HH-54 Cross installs might contaminate libhdfs connector conf file, create webhdfs/libhdfs compliant configuration file.
- HH-53 Maxredirects in CENTOS seems to default to 0 even though it is defined to be 50, explicitly set libcurl maxredirects to 50.
- HH-52 Add support for WebHDFS – implement webhdfs based H2H

Edition:
Included in Community









