Multinode cluster issue: thorCluster not attached
killall worked on all nodes except 192.168.5.26, where it said "hpcc: no process found".
Killed the following processes running as user hpcc:
hpcc 30365 0.0 0.0 145644 14928 ? Sl 19:04 0:00 ./thorslave_mythor master=192.168.5.
hpcc 29451 0.0 0.0 119416 6680 ? Sl 18:59 0:00 dafilesrv -L /var/log/HPCCSystems -I
hpcc 25709 0.0 0.0 42620 4544 ? Ss Jan07 0:00 /lib/systemd/systemd --user
hpcc 25710 0.0 0.0 58356 1504 ? S Jan07 0:00 (sd-pam)
hpcc 26061 0.0 0.0 12196 4240 ? S Jan07 0:00 /bin/bash /opt/HPCCSystems/bin/init_
Thor still won't start. The logs look like the same error.
- vchinta
- Posts: 56
- Joined: Mon Oct 31, 2016 3:45 pm
Hmmm....
Something is holding that port...what is the output of
ps -ef|grep thor
and
netstat -plan|grep 20100
on the bad node?
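To read the holder of the port straight out of the netstat output, a small filter along these lines may help. This is only a sketch: port_listener is a made-up helper name, and it assumes the usual Linux `netstat -plan` column layout (local address in column 4, state in column 6, PID/program in column 7).

```shell
#!/usr/bin/env bash
# port_listener PORT
# Hypothetical helper: reads `netstat -plan` style output on stdin and
# prints the PID/program column for every TCP socket listening on PORT.
port_listener() {
    local port="$1"
    # $4 = local address (e.g. 0.0.0.0:20100), $6 = state,
    # $7 = PID/program name (the column added by netstat -p).
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }
    '
}

# Usage on the bad node (root is needed to see PIDs of other users):
#   sudo netstat -plan | port_listener 20100
```

If it prints nothing, no process is listening on that port.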
- bforeman
- Community Advisory Board Member
- Posts: 1006
- Joined: Wed Jun 29, 2011 7:13 pm
ubuntu@hpcc-3:~$ sudo netstat -plan|grep 20100
sudo: unable to resolve host hpcc-3
tcp 0 0 0.0.0.0:20100 0.0.0.0:* LISTEN 31264/thorslave_myt
ubuntu@hpcc-3:~$ ^C
ubuntu@hpcc-3:~$ sudo ps -ef|grep thor
sudo: unable to resolve host hpcc-3
ubuntu 1909 29212 0 20:00 pts/0 00:00:00 grep --color=auto thor
hpcc 31264 1 0 19:43 ? 00:00:00 ./thorslave_mythor master=192.168.5.23:20000 slave=.:20100 slavenum=1 logDir=/var/log/HPCCSystems/mythor
- vchinta
- Posts: 56
- Joined: Mon Oct 31, 2016 3:45 pm
It may be a delay issue here. Can you try this one more time please?
Kill the process again, wait a minute or two, and then run the same two commands again
ps -ef|grep thor
netstat -plan|grep 20100
on the bad node
Thanks!
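Instead of guessing how long to wait, the check can be polled. The following is a rough sketch, not anything shipped with HPCC: wait_until_gone is a hypothetical helper that keeps re-running a command until that command stops finding anything, or gives up after a timeout.

```shell
#!/usr/bin/env bash
# wait_until_gone TIMEOUT INTERVAL CMD [ARGS...]
# Hypothetical helper: re-runs CMD every INTERVAL seconds until it exits
# non-zero (meaning it no longer finds a match). Returns 1 if CMD is
# still succeeding after roughly TIMEOUT seconds.
wait_until_gone() {
    local timeout="$1" interval="$2"
    shift 2
    local waited=0
    while "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # still present after the timeout
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0            # CMD failed, so nothing matches any more
}

# Usage on the bad node: wait up to two minutes for the slave to exit,
# then look at the port again.
#   wait_until_gone 120 5 pgrep -u hpcc -f thorslave_mythor \
#       && sudo netstat -plan | grep 20100
```

pgrep exits non-zero once no process of user hpcc matches thorslave_mythor, which is what ends the loop.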
- bforeman
- Community Advisory Board Member
- Posts: 1006
- Joined: Wed Jun 29, 2011 7:13 pm
After killing the process
ubuntu@hpcc-3:~$ sudo ps -ef|grep thor
sudo: unable to resolve host hpcc-3
ubuntu 6864 6161 0 23:08 pts/0 00:00:00 grep --color=auto thor
ubuntu@hpcc-3:~$ sudo netstat -plan|grep 20100
sudo: unable to resolve host hpcc-3
ubuntu@hpcc-3:~$
- vchinta
- Posts: 56
- Joined: Mon Oct 31, 2016 3:45 pm
OK, so with that process stopped, please try restarting the thormaster again
- bforeman
- Community Advisory Board Member
- Posts: 1006
- Joined: Wed Jun 29, 2011 7:13 pm
Can you share the mythor attributes from the environment.xml?
- ultima_centauri
- Posts: 6
- Joined: Fri Nov 30, 2012 7:27 pm
I've attached the contents of the xml file as a text file.
- Attachments: environment.txt (46.83 KiB)
- vchinta
- Posts: 56
- Joined: Mon Oct 31, 2016 3:45 pm
Can you stop all the HPCC processes on the cluster with
sudo /opt/HPCCSystems/sbin/hpcc-run.sh -a hpcc-init stop
and
sudo /opt/HPCCSystems/sbin/hpcc-run.sh -a dafilesrv stop
If you have a Linux desktop, I suggest installing the Python module radssh (http://radssh.readthedocs.io/en/v1.1.0/) with
pip install radssh
and running it with
python -m radssh.shell --username=ubuntu 192.168.5.23-26
This will make troubleshooting easier.
Validate that all the processes are down with
ps aux|grep hpcc
If you installed radssh you can run this on all nodes in parallel; otherwise you will need to check each node individually or use a for loop.
Then change the slaveport to 21000 in the environment.xml on one of the nodes. Once you are done editing the file, push it from the node where you made the changes with
sudo /opt/HPCCSystems/sbin/hpcc-push.sh -s /etc/HPCCSystems/source/<edited file> -t /etc/HPCCSystems/environment.xml
If you installed radssh you can start the processes by running
service dafilesrv start
and
service hpcc-init start
Otherwise, run
sudo /opt/HPCCSystems/sbin/hpcc-run.sh -a hpcc-init start
on one of the nodes.
If it still fails, please include the latest log files from the thormaster (init log and thormaster log), the log files of the thorslave node, and the output from radssh (if installed).
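For anyone without radssh, the for loop and the slaveport edit mentioned above could look roughly like this. Treat it as a sketch under assumptions: expand_range, check_nodes and set_slaveport are made-up helper names, passwordless ssh as user ubuntu is assumed, and the port is assumed to appear in environment.xml as an attribute written slaveport="20100".

```shell
#!/usr/bin/env bash
# expand_range SPEC
# Turns "192.168.5.23-26" into one address per line (last octet only).
expand_range() {
    local spec="$1"
    local prefix="${spec%.*}"    # e.g. 192.168.5
    local last="${spec##*.}"     # e.g. 23-26
    local first="${last%-*}" end="${last#*-}"
    local i
    for ((i = first; i <= end; i++)); do
        printf '%s.%s\n' "$prefix" "$i"
    done
}

# check_nodes USER SPEC
# Runs the process check on each node in turn over ssh.
check_nodes() {
    local user="$1" spec="$2" ip
    for ip in $(expand_range "$spec"); do
        echo "== $ip =="
        # [h]pcc keeps grep from matching its own command line.
        ssh "$user@$ip" 'ps aux | grep [h]pcc' \
            || echo "no match (or ssh failed)"
    done
}

# set_slaveport FILE OLD NEW
# Prints FILE with slaveport="OLD" rewritten to slaveport="NEW".
# Assumes that attribute spelling; the input file is left untouched.
set_slaveport() {
    local file="$1" old="$2" new="$3"
    sed "s/slaveport=\"$old\"/slaveport=\"$new\"/" "$file"
}

# Usage:
#   check_nodes ubuntu 192.168.5.23-26
#   set_slaveport /etc/HPCCSystems/environment.xml 20100 21000 > edited.xml
```

Writing the result to a separate file keeps the original environment.xml intact until the edited copy has been reviewed and pushed with hpcc-push.sh.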
- ultima_centauri
- Posts: 6
- Joined: Fri Nov 30, 2012 7:27 pm