Installing Spark Plugin

Mon May 20, 2019 1:48 pm

I'm having some trouble installing the Spark plugin. I'm not sure I understand the documentation and would appreciate some guidance. I'm using the installation guide found here.

I have a 3-node cluster running HPCC CE 7.2.10-1 on Ubuntu 16.04 LTS. I'm able to access ECL Watch on port 8010 and upload and spray files without issue. When I point my browser to port 8080, I get a page not found error. Further, I don't see Sparkthor listed on the ECL Watch System Servers page.
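One quick way to narrow this down is to check whether anything is listening on the two ports at all. The following is only a minimal sketch, not part of the HPCC tooling: the function names are made up for illustration, and it assumes 8010 is ECL Watch and 8080 is where the Spark web UI is expected, as described above. Pass the hostname or IP of one of your nodes on the command line.

```python
import socket
import sys


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_services(host, ports):
    """Map each service label to True/False depending on whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Port numbers are taken from the post; the labels are just for display.
    services = {"ECL Watch": 8010, "Spark web UI": 8080}
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for name, is_open in check_services(host, services).items():
        print(f"{name}: {'listening' if is_open else 'not reachable'}")
```

If 8010 is reported as listening and 8080 is not, the Spark process most likely never started on that node, which points at the startup step rather than at the browser or network.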

I used the Configuration Manager to add the Sparkthor component and pushed the environment file to all machines. I've confirmed the following lines were added:

Code:
<SparkThorProcess build="_"
   <Instance computer="node001006"
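To repeat this check on every node without reading the file by eye, something like the sketch below could be used. It is an illustration under assumptions: the function name is hypothetical, and it assumes the environment file is at /etc/HPCCSystems/environment.xml, which is the usual location but may differ on your system. It only relies on the element and attribute names visible in the lines above (SparkThorProcess, Instance, computer).

```python
import sys
import xml.etree.ElementTree as ET
from typing import List


def sparkthor_instances(xml_text: str) -> List[str]:
    """Return the 'computer' attribute of every Instance under a SparkThorProcess element."""
    root = ET.fromstring(xml_text)
    computers = []
    for process in root.iter("SparkThorProcess"):
        for instance in process.iter("Instance"):
            computer = instance.get("computer")
            if computer is not None:
                computers.append(computer)
    return computers


if __name__ == "__main__":
    # Assumed default path; pass a different one as the first argument if needed.
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/HPCCSystems/environment.xml"
    with open(path, encoding="utf-8") as handle:
        found = sparkthor_instances(handle.read())
    if found:
        print("SparkThor instances on:", ", ".join(found))
    else:
        print("No SparkThorProcess instances found in", path)
```

An empty result on one node would suggest the environment file was not pushed there; identical output on all nodes confirms the configuration side is consistent.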

I've also confirmed that Java is installed on all machines with the following output from java -version:

Code:
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03)
OpenJDK 64-Bit Server VM (build 25.212-b03, mixed mode)

It wasn't clear to me whether I needed to manually install Spark, so I didn't at first. After the first failed installation attempt, however, I installed Spark, but the issue persists. This page mentions the following:

The HPCC Systems Spark Connector requires Spark 2.10 or 2.11 and the org.hpccsystems.wsclient library available from the Maven Repository, download now.

Find the source code and examples in the spark-hpccsystems repository
Get the 7.2.12-1 JAR file from Maven Central Repository or download now
Get the javadocs from Maven Central Repository or download now

It's also not clear to me whether these files are necessary except for specific applications, so I didn't download these files (I'm not sure how to anyway). Maybe that's where the problem lies.

Wed May 22, 2019 7:36 pm

It looks like this problem only occurs when using systemctl to start the cluster. I don't have this problem when I use the command

Code:
/etc/init.d/hpcc-init start

This may be a bug.

Thu May 23, 2019 1:19 pm


Yes, that would certainly be a bug. Please report it in JIRA (https://track.hpccsystems.com).


Community Advisory Board Member

Thu May 23, 2019 6:05 pm

Justin, definitely open an issue regarding the Sparkthor component not starting when the cluster is started with systemctl.

I'll talk with the documentation team about making the docs more explicit about the two components (server-side and client-side). On the server side, the Spark plugin installs an HPCC-controlled instance of Spark, so no separate Spark install is required. On the client side, spark-hpcc.jar depends on libraries provided by Spark. Most users will not use the client-side component directly; they'll use the Spark shell or some notebook-type interface. The jar is required if you plan to write a Java application that uses the features of the hpcc-spark component. Thanks.
