Wed Mar 21, 2018 4:32 pm

mythor is not running in the cluster

Topics related to recommendations or questions on the design for HPCC Systems clusters

Tue Feb 03, 2015 6:47 am

I have three VM reservations with HPCC installed. I'm building a script that automates HPCC cluster formation when these three machines boot up.

The SSH key exchange and the distribution of the environment.xml files both succeed. But when I start hpcc-init for the first time using the script in /opt/HPCCSystems/sbin/, every service except mythor starts. When I then restart hpcc-init using the same script, all of the services run.

To get mythor running, I have to restart all of the HPCC services at least once. Why doesn't mythor start on the first attempt, and can this be resolved? Restarting the HPCC services on all machines takes time, which delays service availability for end users.

Below is the status of the services on each machine after the first start:
X.X.X.154 hpcc-init status :
mydafilesrv ( pid 2954 ) is running...
mydfuserver ( pid 3044 ) is running...
myeclscheduler ( pid 3143 ) is running...

X.X.X.153 hpcc-init status :
mydafilesrv ( pid 2391 ) is running...
mydali ( pid 2481 ) is running...
myeclccserver ( pid 2585 ) is running...

X.X.X.63 hpcc-init status :
mydafilesrv ( pid 3260 ) is running...
myeclagent ( pid 3354 ) is running...
myesp ( pid 3450 ) is running...
mysasha ( pid 3548 ) is running...
mythor is stopped
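A possible stopgap while the root cause is investigated, sketched under assumptions: on the node whose status lists mythor (X.X.X.63 in the listing above), start HPCC with the /etc/init.d/hpcc-init script that appears in the log, check the local status output for a running mythor line, and restart once if that line is missing. The function names and the HPCC_INIT override are hypothetical. This only automates the manual restart described above; it does not fix whatever keeps mythor from starting the first time.

```shell
#!/bin/sh
# Hypothetical workaround sketch: start HPCC, and restart once if mythor
# is not reported as running afterwards.

# Succeeds if the `hpcc-init status` text read on stdin contains a line
# such as "mythor ( pid 10721 ) is running...".
mythor_running() {
  grep -Eq '^mythor \( pid [0-9]+ \) is running'
}

# Starts HPCC on this node; if mythor is not running afterwards, restarts
# once and returns the result of the second check.
# HPCC_INIT is a hypothetical override for the init script path.
start_hpcc_with_retry() {
  init="${HPCC_INIT:-/etc/init.d/hpcc-init}"
  # The exit status of "start" is ignored; the status check below decides.
  sudo "$init" start
  if sudo "$init" status | mythor_running; then
    return 0
  fi
  echo "mythor not running after first start; restarting HPCC" >&2
  sudo "$init" restart
  sudo "$init" status | mythor_running
}
```

A provisioning script could call `start_hpcc_with_retry` on that node once the environment.xml files are in place; a non-zero return means mythor was still down after the restart.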

After the first start, when I check the status of the services, the script prints the following:
Error found during hpcc-init_status_3795 execution.
Reference following log for more information:

These are the last few lines of the log:
2015-02-03 01:21:12,385 - hpcc.cluster.ScriptTask.2 - ERROR - X.X.X.63: Host is alive.
X.X.X.63: Running sudo /etc/init.d/hpcc-init status

2015-02-03 01:21:12,385 - hpcc.cluster.ScriptTask.2 - INFO - result: FAILED
2015-02-03 01:21:14,128 - hpcc.cluster - INFO - script execution done.

Tue Feb 03, 2015 1:54 pm

The HPCC team took a look at your post, but we need some more information.

How are you configuring your Thor cluster with regard to the number of slave nodes?

Also, if you have the thormaster log, we would like to take a look at that as well.


Community Advisory Board Member

Tue Feb 03, 2015 7:12 pm

Hi Bob,
I have attached the thormaster log.
Below are the configuration parameters passed to the envgen script to generate environment.xml:
number of thor nodes: 1
number of thor slaves per node: 1
Attachment: Thormaster log (17.38 KiB)

Tue Feb 03, 2015 9:10 pm

Can you please attach a copy of your environment.xml and the thorslave log? I'll try to get to the bottom of this for you.


Tue Feb 03, 2015 11:30 pm

Hi Michael,
I didn't keep a backup of the environment.xml file for the set of IPs I posted earlier. I have now recreated the same scenario on a new set of machines, because the VMs I work with are temporary: whenever I request VMs, I get a new set of machines. The Thor cluster configuration is unchanged. I have attached the thormaster log, the thorslave log, and the environment.xml file.

Below are the services running on each machine:
X.X.X.240 hpcc-init status :
mydafilesrv ( pid 3175 ) is running...
mydfuserver ( pid 5924 ) is running...
myeclscheduler ( pid 6023 ) is running...

X.X.X.70 hpcc-init status :
mydafilesrv ( pid 3266 ) is running...
myeclagent ( pid 9561 ) is running...
myesp ( pid 9657 ) is running...
mysasha ( pid 9758 ) is running...
mythor ( pid 10721 ) is running...

X.X.X.167 hpcc-init status :
mydafilesrv ( pid 3099 ) is running...
mydali ( pid 6871 ) is running...
myeclccserver ( pid 6975 ) is running...

Thanks, Michael.
Attachment: thorslave log (1.16 KiB)
Attachment: thormaster log (12.12 KiB)
Attachment: environment.xml file (37.88 KiB)

Wed Feb 11, 2015 9:38 pm

Thank you for posting the files. The team is still reviewing and will circle back soon.
Site Admin

Thu Feb 12, 2015 3:03 pm


I may be looking at out-of-date log files, but I don't understand the IP addresses. The thormaster log shows:

0000000C 2015-02-03 18:05:33.968 5481 5481 "ThorMaster version 4.1, Started on X.X.X.69:20000"

which suggests its IP address is X.X.X.69.

And it is trying to connect with a thorslave on X.X.X.167:

00000012 2015-02-03 18:05:33.973 5481 5481 "verified connection with X.X.X.167:20100"

But the thorslave log shows:

00000002 2015-02-03 18:05:33.828 3850 3850 "registering X.X.X.68:20100 - master X.X.X.70:20000"

which suggests the slave is X.X.X.68 and the master is X.X.X.70. Can we verify all hosts and IPs again?
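The comparison above can be scripted. The sketch below assumes the two log lines have exactly the formats quoted in this post ("Started on <host>:<port>" in the thormaster log, "registering <host>:<port> - master <host>:<port>" in the thorslave log); the helper names are hypothetical. It reports whether the address thormaster says it started on matches the master address the slave registered with.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the master address seen by each side.

# From a thormaster line such as
#   "ThorMaster version 4.1, Started on X.X.X.69:20000"
# print the host part (X.X.X.69).
master_started_on() {
  sed -n 's/.*Started on \([^:"]*\):[0-9]*.*/\1/p'
}

# From a thorslave line such as
#   "registering X.X.X.68:20100 - master X.X.X.70:20000"
# print the master host part (X.X.X.70).
slave_sees_master() {
  sed -n 's/.*registering .* - master \([^:"]*\):[0-9]*.*/\1/p'
}

# $1: thormaster log file, $2: thorslave log file.
# Returns 0 if the addresses agree, 1 if they differ, 2 if one is missing.
check_master_ip() {
  m=$(master_started_on < "$1" | head -n 1)
  s=$(slave_sees_master < "$2" | head -n 1)
  if [ -z "$m" ] || [ -z "$s" ]; then
    echo "could not find both addresses in the logs" >&2
    return 2
  fi
  if [ "$m" = "$s" ]; then
    echo "OK: master address $m"
  else
    echo "MISMATCH: thormaster reports $m, thorslave registered with $s" >&2
    return 1
  fi
}
```

Run as `check_master_ip <thormaster log> <thorslave log>`. With the two lines quoted in this post it returns 1 and reports X.X.X.69 versus X.X.X.70.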


Mon Feb 16, 2015 4:10 pm

Hi Mark,
Each machine has two NICs: one public-facing and one internal. Below is the IP pair for each node:
Master - X.X.X.69/X.X.X.70
Slave - X.X.X.68/X.X.X.167
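With two NICs per node, it helps to confirm which address is bound to which interface on every machine. One common way on Linux, assumed here rather than taken from the thread, is `ip -o -4 addr show` from iproute2, which prints one line per IPv4 address. The hypothetical helper below reduces that output to interface/address pairs.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# "interface address" pairs, one per line, without the /prefix length.
ipv4_by_interface() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "inet") {
        addr = $(i + 1)
        sub(/\/.*/, "", addr)   # drop the /prefix length
        print $2, addr          # $2 is the interface name
      }
    }
  }'
}
```

Running `ip -o -4 addr show | ipv4_by_interface` on each node shows which interface carries the public address and which carries the internal one, for comparison with the pairs listed above.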

Lakshman Naresh

Tue Feb 17, 2015 2:48 pm


Can you send the output from


on all 3 machines? This information will help us configure which network interface to use on each of them.


Tue Feb 17, 2015 2:58 pm

Also, could you please run this command on .70 and .167 (the thormaster and thorslave) and post the output? I'm assuming that X.X.X.167 is the IP of your Dali node, based on the environment.xml you gave us earlier.

sudo /opt/HPCCSystems/bin/daliadmin X.X.X.167 dfsgroup mythor

