Installing Spark Plugin
I'm having some trouble with the Spark plugin installation. I'm not sure I understand the documentation and need some guidance. I'm following the installation guide found here.
I have a 3-node cluster running HPCC CE 7.2.10-1 on Ubuntu 16.04 LTS. I'm able to access ECL Watch on port 8010 and upload and spray files without issue. When I point my browser to port 8080, I get a page not found error. Further, I don't see Sparkthor listed on the ECL Watch System Servers page.
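In case it helps with diagnosis, a quick way to confirm whether anything is actually bound to the Spark master web UI port (ss and curl are just one way to probe it; flags may vary by distro):
Code:
# Check whether any process is listening on the Spark master web UI port
sudo ss -tlnp | grep ':8080'

# Probe it directly; "Connection refused" means nothing is listening at all
curl -v http://localhost:8080/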
I used the Configuration Manager to add the Sparkthor component and pushed the environment file to all machines. I've confirmed the following lines were added:
Code:
<SparkThorProcess build="_"
                  buildSet="sparkthor"
                  name="mysparkthor"
                  SPARK_EXECUTOR_CORES="1"
                  SPARK_EXECUTOR_MEMORY="1G"
                  SPARK_MASTER_PORT="7077"
                  SPARK_MASTER_WEBUI_PORT="8080"
                  SPARK_WORKER_CORES="1"
                  SPARK_WORKER_MEMORY="1G"
                  SPARK_WORKER_PORT="7071"
                  ThorClusterName="mythor">
  <Instance computer="node001006"
            directory="/var/lib/HPCCSystems/mysparkthor"
            name="s1"
            netAddress="192.168.1.6"/>
</SparkThorProcess>
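To rule out a bad push, one way to double-check that the environment file is identical on every node is to compare checksums; /etc/HPCCSystems/environment.xml is the standard location, and the hostnames below are placeholders for my three machines:
Code:
# Compare the environment file checksum across all nodes (hostnames are placeholders)
for h in node001004 node001005 node001006; do
    ssh "$h" md5sum /etc/HPCCSystems/environment.xml
done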
I've also confirmed that Java is installed on all machines with the following output from java -version:
Code:
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03)
OpenJDK 64-Bit Server VM (build 25.212-b03, mixed mode)
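For completeness, the same Java shows up on each node; a loop like this checks all three at once (hostnames are placeholders again, and note that java -version prints to stderr):
Code:
# Confirm the same Java is present on every node; java -version writes to stderr
for h in node001004 node001005 node001006; do
    ssh "$h" 'java -version' 2>&1
done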
It wasn't clear to me whether I needed to manually install Spark, so I didn't at first. After the first failed installation attempt, however, I installed Spark, but the issue persists. This page mentions the following:
The HPCC Systems Spark Connector requires Spark 2.10 or 2.11 and the org.hpccsystems.wsclient library, available from the Maven Repository.
Find the source code and examples in the spark-hpccsystems repository.
Get the 7.2.12-1 JAR file from the Maven Central Repository.
Get the javadocs from the Maven Central Repository.
It's also not clear to me whether these files are necessary except for specific applications, so I didn't download these files (I'm not sure how to anyway). Maybe that's where the problem lies.
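If logs would help, I believe the platform writes per-component logs under /var/log/HPCCSystems, so I can post whatever appears for the Sparkthor instance; the subdirectory name below is only my guess based on the component name:
Code:
# List the component log directories and tail the Sparkthor one (subdirectory name is a guess)
ls /var/log/HPCCSystems/
tail -n 50 /var/log/HPCCSystems/mysparkthor/*.log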
- jumou
It looks like this problem only occurs when using systemctl to start the cluster. I don't have this problem when I use the command:
Code:
/etc/init.d/hpcc-init start
This may be a bug.
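For the record, the systemd route that fails is roughly the following; the unit name is my best guess at what the platform package installs, so adjust if yours differs:
Code:
# Start the platform via systemd (unit name is a guess; check with: systemctl list-units 'hpcc*')
sudo systemctl start hpccsystems-platform.target

# Inspect what systemd recorded for the unit
systemctl status hpccsystems-platform.target
journalctl -u hpccsystems-platform.target --no-pager | tail -n 50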
- jumou
jumou,
Yes, that would certainly be a bug. Please report it in JIRA (https://track.hpccsystems.com).
HTH,
Richard
- rtaylor
- Community Advisory Board Member
Justin, definitely open an issue regarding the sparkthor component not starting when using systemctl.
I'll discuss with the doc team making the documentation more explicit about the two components (server-side and client-side). On the server side, the Spark plugin installs an HPCC-controlled instance of Spark, so no separate Spark install is required. On the client side, the spark-hpcc.jar has dependencies on Spark-provided libraries. Most users will not use the client-side component directly; they'll use the Spark shell or some notebook-type interface. The JAR is required if you plan to write a Java application that exploits the features provided by the HPCC-Spark component. Thanks.
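If you do end up on the client side, the usual pattern is just to put the JAR on the Spark shell's classpath, or pull it with Maven for a Java build; the path, filename, and coordinates below are illustrative rather than exact:
Code:
# Launch the Spark shell with the connector JAR on the classpath (path and filename illustrative)
spark-shell --jars /opt/HPCCSystems/jars/spark-hpcc-7.2.12-1-jar-with-dependencies.jar

# Or fetch the JAR from Maven Central for a Java build (coordinates are my best guess)
mvn dependency:get -Dartifact=org.hpccsystems:spark-hpcc:7.2.12-1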
- rodrigo.pastrana