

Multinode cluster issue: thorCluster not attached

Topics related to recommendations or questions on the design for HPCC Systems clusters

Mon Jan 09, 2017 5:25 pm

How do I find the IP address of the thormaster? I don't see it in the environment.xml file.
vchinta
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 5:33 pm

It's in ECL Watch; from your screenshot I can see it:
192.168.5.23
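As for why the IP is hard to spot in environment.xml: as far as I know the Thor cluster entry refers to its master by computer name, and the IP address sits on the matching Computer entry in the Hardware section. The sketch below resolves that indirection. The element and attribute names (Computer, netAddress, ThorCluster, ThorMasterProcess, computer) are assumptions about the file layout, so check them against your own environment.xml before relying on the result.

```python
import xml.etree.ElementTree as ET

def thor_master_ips(root):
    """Return {thor cluster name: master IP} from a parsed environment.xml root.

    Assumed layout (verify against your own file):
      <Hardware><Computer name="..." netAddress="..."/></Hardware>
      <Software><ThorCluster name="..."><ThorMasterProcess computer="..."/></ThorCluster></Software>
    """
    # Index every Computer entry by name so the master's address can be looked up.
    ip_by_computer = {c.get("name"): c.get("netAddress") for c in root.iter("Computer")}
    masters = {}
    for cluster in root.iter("ThorCluster"):
        master = cluster.find("ThorMasterProcess")
        if master is not None:
            # Value is None if the named computer has no matching Computer entry.
            masters[cluster.get("name")] = ip_by_computer.get(master.get("computer"))
    return masters
```

Usage, on a node that has the file: thor_master_ips(ET.parse("/etc/HPCCSystems/environment.xml").getroot()).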
bforeman
Community Advisory Board Member
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 5:42 pm

Yes, I can ssh to all nodes from the thormaster and vice versa.
vchinta
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 6:18 pm

OK, I'm still checking with the team to see what else we can do. I still haven't seen a log from you with a specific ERROR in it, so you might want to browse the other logs to see if you can find anything. I checked our issue tracker, and there was an issue reported a while ago in 5.4.0 that was fixed in later releases, but I'm not sure it applies to your configuration or version.
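One way to browse the logs for ERROR entries is to scan everything under /var/log/HPCCSystems and print only lines whose message starts with "ERROR". This is a rough sketch, not an official parser: it assumes the line layout seen in the thormaster and thorslave logs in this thread (an 8-digit sequence number, date, time, two numeric IDs, then a quoted message). Continuation lines of multi-line messages are skipped.

```python
import re
from pathlib import Path

# Assumed layout: <seq> <date> <time> <id> <id> "<message>"
LOG_LINE = re.compile(
    r'^(?P<seq>[0-9A-Fa-f]{8}) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:.]+) '
    r'(?P<id1>\d+) (?P<id2>\d+) "(?P<msg>.*)$'
)

def error_lines(lines):
    """Yield (date, time, message) for entries whose message starts with ERROR."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("msg").startswith("ERROR"):
            # Drop the closing quote when the message ends on this line.
            yield m.group("date"), m.group("time"), m.group("msg").rstrip('"')

def scan_logs(log_dir="/var/log/HPCCSystems"):
    """Print every ERROR entry found in *.log files below log_dir."""
    for path in sorted(Path(log_dir).rglob("*.log")):
        with path.open(errors="replace") as fh:
            for date, time, msg in error_lines(fh):
                print(f"{path}: {date} {time} {msg}")
```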

Bob
bforeman
Community Advisory Board Member
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 6:27 pm

ubuntu@hpcc-3:~$ cat /var/log/HPCCSystems/mythor/thorslave.1.2017_01_09.log
00000000 2017-01-09 15:09:59.138 28652 28652 "Opened log file //192.168.5.26/var/log/HPCCSystems/mythor/thorslave.1.2017_01_09.log"
00000001 2017-01-09 15:09:59.138 28652 28652 "Build community_5.4.6-1"
00000002 2017-01-09 15:09:59.140 28652 28652 "ERROR: -7: /var/lib/jenkins/workspace/CE-Candidate-withplugins-5.4.6-1/CE/ubuntu-15.04-amd64/HPCC-Platform/thorlcr/slave/thslavemain.cpp(424) : ThorSlave : port in use
Target: S>192.168.5.26, port = 20100, Raised in: /var/lib/jenkins/workspace/CE-Candidate-withplugins-5.4.6-1/CE/ubuntu-15.04-amd64/HPCC-Platform/system/jlib/jsocket.cpp, line 912"
00000003 2017-01-09 15:09:59.140 28652 28652 "temp directory cleared"
00000004 2017-01-09 15:09:59.140 28652 28652 "Unregistering slave : 192.168.5.26:20100"
00000005 2017-01-09 15:09:59.140 28652 28652 "ERROR: Failed to unregister slave : 192.168.5.26:20100"


ubuntu@hpcc-master:~$ cat /var/log/HPCCSystems/mythor/thormaster.2017_01_09.log
00000001 2017-01-09 15:09:59.139 21007 21007 "Opened log file //192.168.5.23/var/log/HPCCSystems/mythor/thormaster.2017_01_09.log"
00000002 2017-01-09 15:09:59.139 21007 21007 "Build community_5.4.6-1"
00000003 2017-01-09 15:09:59.139 21007 21007 "calling initClientProcess Port 20000"
00000004 2017-01-09 15:09:59.142 21007 21007 "Found file 'thorgroup', using to form thor group"
00000005 2017-01-09 15:09:59.142 21007 21007 "Checking cluster replicate nodes"
00000006 2017-01-09 15:10:59.143 21007 21007 "multiConnect failed to 192.168.5.26:7100 with -1"
00000007 2017-01-09 15:10:59.144 21007 21007 "ERROR: /var/lib/jenkins/workspace/CE-Candidate-withplugins-5.4.6-1/CE/ubuntu-15.04-amd64/HPCC-Platform/thorlcr/master/thmastermain.cpp(393) : VALIDATE FAILED(1) 192.168.5.26 : Connect failure"
00000008 2017-01-09 15:10:59.144 21007 21007 "Cluster replicate nodes check completed in 60002ms"
00000009 2017-01-09 15:10:59.144 21007 21007 "ERROR: /var/lib/jenkins/workspace/CE-Candidate-withplugins-5.4.6-1/CE/ubuntu-15.04-amd64/HPCC-Platform/thorlcr/master/thmastermain.cpp(632) : ERROR: Validate failure(s) detected, exiting Thor"

I found some error logs.
I may have noticed something: in our environment I need to explicitly enable access to ports. I've added 8010 and 8015 for ECL Watch and the configmgr. Do I need to add any other ports?
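The thormaster log in this post reports "multiConnect failed to 192.168.5.26:7100", so a quick check is whether the master can open a TCP connection to the ports these logs mention: 7100 on the slave (I believe this is the dafilesrv default port), plus the Thor master and slave ports 20000 and 20100. A minimal sketch using only Python's standard socket module; the port list comes from these logs and is not a complete list of what HPCC needs. A closed port (nothing listening) and a firewalled port both show up as not reachable.

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, no route to host, ...
        return False

def check_ports(targets, timeout=3.0):
    """Print and return {(host, port): reachable} for each (host, port) target."""
    results = {}
    for host, port in targets:
        ok = tcp_reachable(host, port, timeout)
        results[(host, port)] = ok
        print(f"{host}:{port} {'open' if ok else 'NOT reachable'}")
    return results
```

For example, run check_ports([("192.168.5.26", 7100), ("192.168.5.26", 20100)]) on 192.168.5.23, and check_ports([("192.168.5.23", 20000)]) on 192.168.5.26.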
vchinta
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 6:53 pm

I don't think so. Find out what process is using that port on your system, and if it is thorslave, simply kill it.
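On the slave node, "sudo ss -ltnp | grep 20100" or "sudo lsof -i :20100" should show which process holds the port. The "port in use" error in the thorslave log is what a failed bind looks like, so after killing the stale process you can confirm the port is free with a bind test like the sketch below. It only illustrates the check; it is not how thorslave itself tests the port.

```python
import errno
import socket

def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other errors, e.g. permission denied for ports below 1024
    return False
```

For example, port_in_use(20100) should return False before mythor is restarted.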
bforeman
Community Advisory Board Member
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 7:06 pm

Port 20100 was held by thorslave. I killed it and ran hpcc-init start again, but mythor failed to start again.
vchinta
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 7:20 pm

Anything new in the log? Perhaps a different error?
bforeman
Community Advisory Board Member
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 7:26 pm

hpcc-init.log
2017-01-09T19:02:27: --------------------------
2017-01-09T19:02:27: --------------------------
2017-01-09T19:02:27: The following components have been located:
2017-01-09T19:02:27: ---> mydali
2017-01-09T19:02:27: ---> mydfuserver
2017-01-09T19:02:27: ---> myeclagent
2017-01-09T19:02:27: ---> myeclccserver
2017-01-09T19:02:27: ---> myeclscheduler
2017-01-09T19:02:27: ---> myesp
2017-01-09T19:02:27: ---> mysasha
2017-01-09T19:02:27: ---> mythor
2017-01-09T19:02:27: --------------------------
2017-01-09T19:02:27: Debug log written to /var/log/HPCCSystems/hpcc-init.debug
2017-01-09T19:02:27: Attempting to execute stop argument on specified components
2017-01-09T19:02:27: --------------------------
2017-01-09T19:02:27: mythor ---> stop
2017-01-09T19:02:27: Already stopped
2017-01-09T19:02:27: stop_component mythor ---> Exit status 0
2017-01-09T19:02:27: --------------------------
2017-01-09T19:02:27: mysasha ---> stop
2017-01-09T19:02:27: mysasha ---> Waiting on Sentinel
2017-01-09T19:02:27: /opt/HPCCSystems/bin/start-stop-daemon -K -p /var/run/HPCCSystems/init_mysasha.pid >> tmp.txt 2>&1
2017-01-09T19:02:27: mysasha ---> Waiting on Sentinel
2017-01-09T19:02:28: Lock file /var/lock/HPCCSystems/mysasha/mysasha.lock does not exist
2017-01-09T19:02:28: Pid file doesn't exist
2017-01-09T19:02:28: stop_component mysasha ---> Exit status 0
2017-01-09T19:02:28: --------------------------
2017-01-09T19:02:28: myesp ---> stop
2017-01-09T19:02:28: myesp ---> Waiting on Sentinel
2017-01-09T19:02:28: /opt/HPCCSystems/bin/start-stop-daemon -K -p /var/run/HPCCSystems/init_myesp.pid >> tmp.txt 2>&1
2017-01-09T19:02:28: myesp ---> Waiting on Sentinel
2017-01-09T19:02:29: Lock file /var/lock/HPCCSystems/myesp/myesp.lock does not exist
2017-01-09T19:02:29: Pid file doesn't exist
2017-01-09T19:02:29: stop_component myesp ---> Exit status 0
2017-01-09T19:02:29: --------------------------
2017-01-09T19:02:29: myeclscheduler ---> stop
2017-01-09T19:02:29: myeclscheduler ---> Waiting on Sentinel
2017-01-09T19:02:29: /opt/HPCCSystems/bin/start-stop-daemon -K -p /var/run/HPCCSystems/init_myeclscheduler.pid >> tmp.txt 2>&1
2017-01-09T19:02:29: myeclscheduler ---> Waiting on Sentinel
2017-01-09T19:02:30: Lock file /var/lock/HPCCSystems/myeclscheduler/myeclscheduler.lock does not exist
2017-01-09T19:02:30: Pid file doesn't exist
2017-01-09T19:02:30: stop_component myeclscheduler ---> Exit status 0
2017-01-09T19:02:30: --------------------------
2017-01-09T19:02:30: myeclccserver ---> stop
2017-01-09T19:02:30: myeclccserver ---> Waiting on Sentinel
2017-01-09T19:02:30: /opt/HPCCSystems/bin/start-stop-daemon -K -p /var/run/HPCCSystems/init_myeclccserver.pid >> tmp.txt 2>&1
2017-01-09T19:02:30: myeclccserver ---> Waiting on Sentinel
2017-01-09T19:02:31: Lock file /var/lock/HPCCSystems/myeclccserver/myeclccserver.lock does not exist
2017-01-09T19:02:31: Pid file doesn't exist
2017-01-09T19:02:31: stop_component myeclccserver ---> Exit status 0
2017-01-09T19:02:31: --------------------------
2017-01-09T19:02:31: myeclagent ---> stop
2017-01-09T19:02:31: myeclagent ---> Waiting on Sentinel
2017-01-09T19:02:31: /opt/HPCCSystems/bin/start-stop-daemon -K -p /var/run/HPCCSystems/init_myeclagent.pid >> tmp.txt 2>&1
2017-01-09T19:02:31: myeclagent ---> Waiting on Sentinel
2017-01-09T19:02:32: Lock file /var/lock/HPCCSystems/myeclagent/myeclagent.lock does not exist
2017-01-09T19:02:32: Pid file doesn't exist
2017-01-09T19:02:32: stop_component myeclagent ---> Exit status 0
2017-01-09T19:02:32: --------------------------
2017-01-09T19:02:32: mydfuserver ---> stop
2017-01-09T19:02:32: mydfuserver ---> Waiting on Sentinel
2017-01-09T19:02:32: /opt/HPCCSystems/bin/start-stop-daemon -K -p /var/run/HPCCSystems/init_mydfuserver.pid >> tmp.txt 2>&1
2017-01-09T19:02:32: mydfuserver ---> Waiting on Sentinel
2017-01-09T19:02:33: Lock file /var/lock/HPCCSystems/mydfuserver/mydfuserver.lock does not exist
2017-01-09T19:02:33: Pid file doesn't exist
2017-01-09T19:02:33: stop_component mydfuserver ---> Exit status 0
2017-01-09T19:02:33: --------------------------
2017-01-09T19:02:33: mydali ---> stop
2017-01-09T19:02:33: mydali ---> Waiting on Sentinel
2017-01-09T19:02:33: /opt/HPCCSystems/bin/start-stop-daemon -K -p /var/run/HPCCSystems/init_mydali.pid >> tmp.txt 2>&1
2017-01-09T19:02:33: mydali ---> Waiting on Sentinel
2017-01-09T19:02:49: Lock file /var/lock/HPCCSystems/mydali/mydali.lock does not exist
2017-01-09T19:02:49: Pid file doesn't exist
2017-01-09T19:02:49: stop_component mydali ---> Exit status 0
2017-01-09T19:02:49: mydafilesrv ---> Waiting on Sentinel
2017-01-09T19:02:49: mydafilesrv ---> Sentinel Up
2017-01-09T19:02:49: mydafilesrv ---> Running ( pid 14949 )
2017-01-09T19:02:49: Service dafilesrv, mydafilesrv is still running.
2017-01-09T19:03:40: --------------------------
2017-01-09T19:03:40: --------------------------
2017-01-09T19:03:40: The following components have been located:
2017-01-09T19:03:40: ---> mydali
2017-01-09T19:03:40: ---> mydfuserver
2017-01-09T19:03:40: ---> myeclagent
2017-01-09T19:03:40: ---> myeclccserver
2017-01-09T19:03:40: ---> myeclscheduler
2017-01-09T19:03:40: ---> myesp
2017-01-09T19:03:40: ---> mysasha
2017-01-09T19:03:40: ---> mythor
2017-01-09T19:03:40: --------------------------
2017-01-09T19:03:40: Debug log written to /var/log/HPCCSystems/hpcc-init.debug
2017-01-09T19:03:40: Attempting to execute status argument on specified components
2017-01-09T19:03:41: mydafilesrv ---> Waiting on Sentinel
2017-01-09T19:03:41: mydafilesrv ---> Sentinel Up
2017-01-09T19:03:41: mydafilesrv ---> Running ( pid 14949 )
2017-01-09T19:03:41: --------------------------
2017-01-09T19:03:41: mydali ---> status
2017-01-09T19:03:41: mydali ---> Sentinel Down
2017-01-09T19:03:41: mydali ---> Stopped
2017-01-09T19:03:41: status_component mydali ---> Exit status 1
2017-01-09T19:03:41: --------------------------
2017-01-09T19:03:41: mydfuserver ---> status
2017-01-09T19:03:41: mydfuserver ---> Sentinel Down
2017-01-09T19:03:41: mydfuserver ---> Stopped
2017-01-09T19:03:41: status_component mydfuserver ---> Exit status 1
2017-01-09T19:03:41: --------------------------
2017-01-09T19:03:41: myeclagent ---> status
2017-01-09T19:03:41: myeclagent ---> Sentinel Down
2017-01-09T19:03:41: myeclagent ---> Stopped
2017-01-09T19:03:41: status_component myeclagent ---> Exit status 1
2017-01-09T19:03:41: --------------------------
2017-01-09T19:03:41: myeclccserver ---> status
2017-01-09T19:03:41: myeclccserver ---> Sentinel Down
2017-01-09T19:03:41: myeclccserver ---> Stopped
2017-01-09T19:03:41: status_component myeclccserver ---> Exit status 1
2017-01-09T19:03:41: --------------------------
2017-01-09T19:03:41: myeclscheduler ---> status
2017-01-09T19:03:41: myeclscheduler ---> Sentinel Down
2017-01-09T19:03:41: myeclscheduler ---> Stopped
2017-01-09T19:03:41: status_component myeclscheduler ---> Exit status 1
2017-01-09T19:03:41: --------------------------
2017-01-09T19:03:41: myesp ---> status
2017-01-09T19:03:41: myesp ---> Sentinel Down
2017-01-09T19:03:41: myesp ---> Stopped
2017-01-09T19:03:41: status_component myesp ---> Exit status 1
2017-01-09T19:03:41: --------------------------
2017-01-09T19:03:41: mysasha ---> status
2017-01-09T19:03:41: mysasha ---> Sentinel Down
2017-01-09T19:03:41: mysasha ---> Stopped
2017-01-09T19:03:41: status_component mysasha ---> Exit status 1
2017-01-09T19:03:41: --------------------------
2017-01-09T19:03:41: mythor ---> status
2017-01-09T19:03:41: mythor ---> Sentinel Down
2017-01-09T19:03:41: mythor ---> Stopped
2017-01-09T19:03:41: status_component mythor ---> Exit status 1
2017-01-09T19:04:01: --------------------------
2017-01-09T19:04:01: --------------------------
2017-01-09T19:04:01: The following components have been located:
2017-01-09T19:04:01: ---> mydali
2017-01-09T19:04:01: ---> mydfuserver
2017-01-09T19:04:01: ---> myeclagent
2017-01-09T19:04:01: ---> myeclccserver
2017-01-09T19:04:01: ---> myeclscheduler
2017-01-09T19:04:01: ---> myesp
2017-01-09T19:04:01: ---> mysasha
2017-01-09T19:04:01: ---> mythor
2017-01-09T19:04:01: --------------------------
2017-01-09T19:04:01: Debug log written to /var/log/HPCCSystems/hpcc-init.debug
2017-01-09T19:04:01: Attempting to execute start argument on specified components
2017-01-09T19:04:01: Creating dropzone
2017-01-09T19:04:01: mydafilesrv ---> Waiting on Sentinel
2017-01-09T19:04:01: mydafilesrv ---> Sentinel Up
2017-01-09T19:04:01: mydafilesrv ---> Running ( pid 14949 )
2017-01-09T19:04:01: Dependent service dafilesrv, mydafilesrv is already running.
2017-01-09T19:04:01: --------------------------
2017-01-09T19:04:01: mydali ---> start
2017-01-09T19:04:01: /opt/HPCCSystems/sbin/configgen -env /etc/HPCCSystems/environment.xml -od /var/lib/HPCCSystems -id /opt/HPCCSystems/componentfiles/configxml -c mydali
2017-01-09T19:04:01: compType = dali
2017-01-09T19:04:01: mydali ---> Sentinel Down
2017-01-09T19:04:01: /opt/HPCCSystems/bin/start-stop-daemon -S -p /var/run/HPCCSystems/init_mydali.pid -c hpcc:hpcc -d /var/lib/HPCCSystems/mydali -m -x /opt/HPCCSystems/bin/init_dali -b
2017-01-09T19:04:02: mydali ---> Waiting on Sentinel
2017-01-09T19:04:02: mydali ---> Sentinel Up
2017-01-09T19:04:02: start_component mydali ---> Exit status 0
2017-01-09T19:04:02: --------------------------
2017-01-09T19:04:02: mydfuserver ---> start
2017-01-09T19:04:02: /opt/HPCCSystems/sbin/configgen -env /etc/HPCCSystems/environment.xml -od /var/lib/HPCCSystems -id /opt/HPCCSystems/componentfiles/configxml -c mydfuserver
2017-01-09T19:04:03: compType = dfuserver
2017-01-09T19:04:03: mydfuserver ---> Sentinel Down
2017-01-09T19:04:03: /opt/HPCCSystems/bin/start-stop-daemon -S -p /var/run/HPCCSystems/init_mydfuserver.pid -c hpcc:hpcc -d /var/lib/HPCCSystems/mydfuserver -m -x /opt/HPCCSystems/bin/init_dfuserver -b
2017-01-09T19:04:04: mydfuserver ---> Waiting on Sentinel
2017-01-09T19:04:04: mydfuserver ---> Sentinel Up
2017-01-09T19:04:04: start_component mydfuserver ---> Exit status 0
2017-01-09T19:04:04: --------------------------
2017-01-09T19:04:04: myeclagent ---> start
2017-01-09T19:04:04: /opt/HPCCSystems/sbin/configgen -env /etc/HPCCSystems/environment.xml -od /var/lib/HPCCSystems -id /opt/HPCCSystems/componentfiles/configxml -c myeclagent
2017-01-09T19:04:04: compType = eclagent
2017-01-09T19:04:04: myeclagent ---> Sentinel Down
2017-01-09T19:04:04: /opt/HPCCSystems/bin/start-stop-daemon -S -p /var/run/HPCCSystems/init_myeclagent.pid -c hpcc:hpcc -d /var/lib/HPCCSystems/myeclagent -m -x /opt/HPCCSystems/bin/init_eclagent -b
2017-01-09T19:04:05: myeclagent ---> Waiting on Sentinel
2017-01-09T19:04:05: myeclagent ---> Sentinel Up
2017-01-09T19:04:05: start_component myeclagent ---> Exit status 0
2017-01-09T19:04:05: --------------------------
2017-01-09T19:04:05: myeclccserver ---> start
2017-01-09T19:04:05: /opt/HPCCSystems/sbin/configgen -env /etc/HPCCSystems/environment.xml -od /var/lib/HPCCSystems -id /opt/HPCCSystems/componentfiles/configxml -c myeclccserver
2017-01-09T19:04:05: compType = eclccserver
2017-01-09T19:04:05: myeclccserver ---> Sentinel Down
2017-01-09T19:04:05: /opt/HPCCSystems/bin/start-stop-daemon -S -p /var/run/HPCCSystems/init_myeclccserver.pid -c hpcc:hpcc -d /var/lib/HPCCSystems/myeclccserver -m -x /opt/HPCCSystems/bin/init_eclccserver -b
2017-01-09T19:04:06: myeclccserver ---> Waiting on Sentinel
2017-01-09T19:04:06: myeclccserver ---> Sentinel Up
2017-01-09T19:04:06: start_component myeclccserver ---> Exit status 0
2017-01-09T19:04:06: --------------------------
2017-01-09T19:04:06: myeclscheduler ---> start
2017-01-09T19:04:06: /opt/HPCCSystems/sbin/configgen -env /etc/HPCCSystems/environment.xml -od /var/lib/HPCCSystems -id /opt/HPCCSystems/componentfiles/configxml -c myeclscheduler
2017-01-09T19:04:06: compType = eclscheduler
2017-01-09T19:04:06: myeclscheduler ---> Sentinel Down
2017-01-09T19:04:06: /opt/HPCCSystems/bin/start-stop-daemon -S -p /var/run/HPCCSystems/init_myeclscheduler.pid -c hpcc:hpcc -d /var/lib/HPCCSystems/myeclscheduler -m -x /opt/HPCCSystems/bin/init_eclscheduler -b
2017-01-09T19:04:07: myeclscheduler ---> Waiting on Sentinel
2017-01-09T19:04:07: myeclscheduler ---> Sentinel Up
2017-01-09T19:04:07: start_component myeclscheduler ---> Exit status 0
2017-01-09T19:04:07: --------------------------
2017-01-09T19:04:07: myesp ---> start
2017-01-09T19:04:07: /opt/HPCCSystems/sbin/configgen -env /etc/HPCCSystems/environment.xml -od /var/lib/HPCCSystems -id /opt/HPCCSystems/componentfiles/configxml -c myesp
2017-01-09T19:04:07: compType = esp
2017-01-09T19:04:07: myesp ---> Sentinel Down
2017-01-09T19:04:07: /opt/HPCCSystems/bin/start-stop-daemon -S -p /var/run/HPCCSystems/init_myesp.pid -c hpcc:hpcc -d /var/lib/HPCCSystems/myesp -m -x /opt/HPCCSystems/bin/init_esp -b
2017-01-09T19:04:08: myesp ---> Waiting on Sentinel
2017-01-09T19:04:08: myesp ---> Sentinel Up
2017-01-09T19:04:08: start_component myesp ---> Exit status 0
2017-01-09T19:04:08: --------------------------
2017-01-09T19:04:08: mysasha ---> start
2017-01-09T19:04:08: /opt/HPCCSystems/sbin/configgen -env /etc/HPCCSystems/environment.xml -od /var/lib/HPCCSystems -id /opt/HPCCSystems/componentfiles/configxml -c mysasha
2017-01-09T19:04:08: compType = sasha
2017-01-09T19:04:08: mysasha ---> Sentinel Down
2017-01-09T19:04:08: /opt/HPCCSystems/bin/start-stop-daemon -S -p /var/run/HPCCSystems/init_mysasha.pid -c hpcc:hpcc -d /var/lib/HPCCSystems/mysasha -m -x /opt/HPCCSystems/bin/init_sasha -b
2017-01-09T19:04:09: mysasha ---> Waiting on Sentinel
2017-01-09T19:04:09: mysasha ---> Sentinel Up
2017-01-09T19:04:09: start_component mysasha ---> Exit status 0
2017-01-09T19:04:09: --------------------------
2017-01-09T19:04:09: mythor ---> start
2017-01-09T19:04:10: /opt/HPCCSystems/sbin/configgen -env /etc/HPCCSystems/environment.xml -od /var/lib/HPCCSystems -id /opt/HPCCSystems/componentfiles/configxml -c mythor
2017-01-09T19:04:10: compType = thor
2017-01-09T19:04:10: mythor ---> Sentinel Down
2017-01-09T19:04:10: /opt/HPCCSystems/bin/start-stop-daemon -S -p /var/run/HPCCSystems/init_mythor.pid -c hpcc:hpcc -d /var/lib/HPCCSystems/mythor -m -x /opt/HPCCSystems/bin/init_thor -b
2017-01-09T19:04:12: mythor ---> Waiting on Sentinel
2017-01-09T19:04:12: mythor ---> Currently Unhealthy
[... the same "Waiting on Sentinel" / "Currently Unhealthy" pair repeats about once per second from 19:04:13 through 19:05:10 ...]
2017-01-09T19:05:11: mythor ---> Waiting on Sentinel
2017-01-09T19:05:11: mythor ---> Currently Unhealthy
2017-01-09T19:05:12: mythor failed to start cleanly
2017-01-09T19:05:12: Refer to the log file for the binary mythor for more information
2017-01-09T19:05:12: Pid file doesn't exist
2017-01-09T19:05:12: start_component mythor ---> Exit status 1
ubuntu@hpcc-master:~$

thormaster.log
00000001 2017-01-09 19:04:11.615 26123 26123 "Opened log file //192.168.5.23/var/log/HPCCSystems/mythor/thormaster.2017_01_09.log"
00000002 2017-01-09 19:04:11.615 26123 26123 "Build community_5.4.6-1"
00000003 2017-01-09 19:04:11.615 26123 26123 "calling initClientProcess Port 20000"
00000004 2017-01-09 19:04:11.618 26123 26123 "Found file 'thorgroup', using to form thor group"
00000005 2017-01-09 19:04:11.619 26123 26123 "Checking cluster replicate nodes"
00000006 2017-01-09 19:05:11.620 26123 26123 "multiConnect failed to 192.168.5.26:7100 with -1"
00000007 2017-01-09 19:05:11.621 26123 26123 "ERROR: /var/lib/jenkins/workspace/CE-Candidate-withplugins-5.4.6-1/CE/ubuntu-15.04-amd64/HPCC-Platform/thorlcr/master/thmastermain.cpp(393) : VALIDATE FAILED(1) 192.168.5.26 : Connect failure"
00000008 2017-01-09 19:05:11.621 26123 26123 "Cluster replicate nodes check completed in 60002ms"
00000009 2017-01-09 19:05:11.621 26123 26123 "ERROR: /var/lib/jenkins/workspace/CE-Candidate-withplugins-5.4.6-1/CE/ubuntu-15.04-amd64/HPCC-Platform/thorlcr/master/thmastermain.cpp(632) : ERROR: Validate failure(s) detected, exiting Thor"

init_thorslave_mythor.log
ubuntu@hpcc-3:~$ cat /var/log/HPCCSystems/mythor/init_thorslave_mythor_2017_01_09_19_04_10.log
2017-01-09T19:04:11: dependency dafilesrv started
2017-01-09T19:04:11: slave(192.168.5.26) init
2017-01-09T19:04:11: slave(s) starting
2017-01-09T19:04:11: rsync -e ssh -o LogLevel=QUIET -o StrictHostKeyChecking=no 192.168.5.23:/var/lib/HPCCSystems/mythor/thorgroup /var/lib/HPCCSystems/mythor/thorgroup.slave
2017-01-09T19:04:11: thorslave_mythor master=192.168.5.23:20000 slave=.:20100 slavenum=1 logDir=/var/log/HPCCSystems/mythor
2017-01-09T19:04:11: slave pid 30365 started

thorslave.log
ubuntu@hpcc-3:~$ cat /var/log/HPCCSystems/mythor/thorslave.1.2017_01_09.log
00000000 2017-01-09 15:09:59.138 28652 28652 "Opened log file //192.168.5.26/var/log/HPCCSystems/mythor/thorslave.1.2017_01_09.log"
00000001 2017-01-09 15:09:59.138 28652 28652 "Build community_5.4.6-1"
00000002 2017-01-09 15:09:59.140 28652 28652 "ERROR: -7: /var/lib/jenkins/workspace/CE-Candidate-withplugins-5.4.6-1/CE/ubuntu-15.04-amd64/HPCC-Platform/thorlcr/slave/thslavemain.cpp(424) : ThorSlave : port in use
Target: S>192.168.5.26, port = 20100, Raised in: /var/lib/jenkins/workspace/CE-Candidate-withplugins-5.4.6-1/CE/ubuntu-15.04-amd64/HPCC-Platform/system/jlib/jsocket.cpp, line 912"
00000003 2017-01-09 15:09:59.140 28652 28652 "temp directory cleared"
00000004 2017-01-09 15:09:59.140 28652 28652 "Unregistering slave : 192.168.5.26:20100"
00000005 2017-01-09 15:09:59.140 28652 28652 "ERROR: Failed to unregister slave : 192.168.5.26:20100"
00000000 2017-01-09 19:04:11.608 30365 30365 "Opened log file //192.168.5.26/var/log/HPCCSystems/mythor/thorslave.1.2017_01_09.log"
00000001 2017-01-09 19:04:11.608 30365 30365 "Build community_5.4.6-1"
00000002 2017-01-09 19:04:11.610 30365 30365 "registering 192.168.5.26:20100 - master 192.168.5.23:20000"
vchinta
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 7:32 pm

OK, it looks like the same error at the end.

Try:
killall -9 -u hpcc
or
kill -9 <PID>

Then please try a restart.
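Before sending SIGKILL to everything owned by hpcc, it can help to see exactly what would be killed ("pgrep -l -u hpcc" lists it from the shell). The sketch below is a rough, Linux-only Python equivalent of killall -9 -u hpcc that walks /proc and defaults to a dry run. It assumes /proc/PID is owned by the process's effective UID, which holds for ordinary processes but not for every setuid one, so treat the list as approximate.

```python
import os
import pwd
import signal

def pids_owned_by(uid):
    """Return sorted PIDs whose /proc/PID entry is owned by uid (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.stat(f"/proc/{entry}").st_uid == uid:
                pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return sorted(pids)

def kill_user_processes(user="hpcc", dry_run=True):
    """SIGKILL every process owned by user (never this script); only print if dry_run."""
    uid = pwd.getpwnam(user).pw_uid
    pids = [p for p in pids_owned_by(uid) if p != os.getpid()]
    for pid in pids:
        if dry_run:
            print(f"would send SIGKILL to {pid}")
            continue
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone
    return pids
```

Run kill_user_processes() first to review the list, then kill_user_processes(dry_run=False) as root.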
bforeman
Community Advisory Board Member
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm
