Multinode cluster issue: thorCluster not attached

Topics related to recommendations or questions on the design for HPCC Systems clusters

Mon Jan 09, 2017 7:45 pm

killall worked on all nodes except 192.168.5.26, where it said "hpcc: no process found".
Killed the following processes running as user hpcc:
hpcc 30365 0.0 0.0 145644 14928 ? Sl 19:04 0:00 ./thorslave_mythor master=192.168.5.

hpcc 29451 0.0 0.0 119416 6680 ? Sl 18:59 0:00 dafilesrv -L /var/log/HPCCSystems -I


hpcc 25709 0.0 0.0 42620 4544 ? Ss Jan07 0:00 /lib/systemd/systemd --user
hpcc 25710 0.0 0.0 58356 1504 ? S Jan07 0:00 (sd-pam)
hpcc 26061 0.0 0.0 12196 4240 ? S Jan07 0:00 /bin/bash /opt/HPCCSystems/bin/init_

Thor still won't start. The logs appear to show the same errors.
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 7:57 pm

Hmmm....
Something is holding that port...what is the output of
ps -ef|grep thor
and
netstat -plan|grep 20100
on the bad node?
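
As a rough illustration of what the netstat check is looking for, the sketch below filters netstat output down to the process that owns a listening port. The function name listeners_on_port is hypothetical, and the column positions (local address in column 4, state in column 6, PID/program in column 7) are an assumption based on the usual Linux net-tools layout.

```shell
# Hypothetical helper: print the PID/program column for every TCP socket
# listening on the given port. Reads `netstat -plan` style lines on stdin.
listeners_on_port() {
    local port
    port="$1"
    awk -v port="$port" '
        # Assumed layout: $4 = local address (e.g. 0.0.0.0:20100),
        # $6 = socket state, $7 = PID/program name.
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] == port) print $7
        }
    '
}

# Usage on the bad node (root is needed to see the owning process):
#   sudo netstat -plan | listeners_on_port 20100
```

If this prints anything, that process is the one still holding the slave port.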
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 8:00 pm

ubuntu@hpcc-3:~$ sudo netstat -plan|grep 20100
sudo: unable to resolve host hpcc-3
tcp 0 0 0.0.0.0:20100 0.0.0.0:* LISTEN 31264/thorslave_myt
ubuntu@hpcc-3:~$ ^C
ubuntu@hpcc-3:~$ sudo ps -ef|grep thor
sudo: unable to resolve host hpcc-3
ubuntu 1909 29212 0 20:00 pts/0 00:00:00 grep --color=auto thor
hpcc 31264 1 0 19:43 ? 00:00:00 ./thorslave_mythor master=192.168.5.23:20000 slave=.:20100 slavenum=1 logDir=/var/log/HPCCSystems/mythor
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 8:27 pm

It may be a delay issue here. Can you try this one more time please?

Kill the process again, wait a minute or two, and run the same two commands again

ps -ef|grep thor
netstat -plan|grep 20100

on the bad node
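
If waiting by hand is tedious, the kill-wait-recheck step could be scripted roughly as below. This is only a sketch: wait_until and port_20100_free are hypothetical names, the 120-second timeout and 5-second interval are arbitrary, and the grep pattern assumes the netstat line format shown earlier in this thread.

```shell
# Hypothetical helper: re-run a check every INTERVAL seconds until it
# succeeds or TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout_s interval_s waited
    timeout_s="$1"
    interval_s="$2"
    shift 2
    waited=0
    while ! "$@"; do
        [ "$waited" -ge "$timeout_s" ] && return 1
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
    return 0
}

# Succeeds when netstat shows nothing listening on port 20100.
# Note: it also succeeds if netstat is missing or prints nothing.
port_20100_free() {
    ! netstat -plan 2>/dev/null | grep -q ':20100 .*LISTEN'
}

# Usage on the bad node, after killing the stray thorslave:
#   wait_until 120 5 port_20100_free && echo "port 20100 is free"
```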

Thanks!
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 11:09 pm

After killing the process:

ubuntu@hpcc-3:~$ sudo ps -ef|grep thor
sudo: unable to resolve host hpcc-3
ubuntu 6864 6161 0 23:08 pts/0 00:00:00 grep --color=auto thor
ubuntu@hpcc-3:~$ sudo netstat -plan|grep 20100
sudo: unable to resolve host hpcc-3
ubuntu@hpcc-3:~$
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Tue Jan 10, 2017 1:48 pm

OK, so with that process stopped, please try restarting the thormaster again.
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Tue Jan 10, 2017 5:12 pm

Tried it, same error as before.
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Tue Jan 10, 2017 5:45 pm

Can you share the mythor attributes from the environment.xml?
ultima_centauri
 
Posts: 6
Joined: Fri Nov 30, 2012 7:27 pm

Tue Jan 10, 2017 5:51 pm

I've attached the contents of the XML file as a text file.
Attachments
environment.txt
(46.83 KiB) Downloaded 117 times
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Tue Jan 10, 2017 8:03 pm

Can you stop all the hpcc processes on the cluster with
sudo /opt/HPCCSystems/sbin/hpcc-run.sh -a hpcc-init stop
and
sudo /opt/HPCCSystems/sbin/hpcc-run.sh -a dafilesrv stop

If you have a Linux desktop, I suggest installing the Python module radssh (http://radssh.readthedocs.io/en/v1.1.0/) with
pip install radssh
and running it with
python -m radssh.shell --username=ubuntu 192.168.5.23-26
This will make troubleshooting easier.

Validate that all the processes are down with
ps aux|grep hpcc
If you installed radssh you can run it in parallel; otherwise you will need to check each node individually or use a for loop.

Then change the slaveport to 21000 in the environment.xml on one of the nodes. Once you are done editing the file, push it from the node where you made the changes with
sudo /opt/HPCCSystems/sbin/hpcc-push.sh -s /etc/HPCCSystems/source/<edited file> -t /etc/HPCCSystems/environment.xml
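
For anyone without radssh, the per-node check and the port edit might look something like the sketch below. The helper names node_ips and set_slaveport are hypothetical. The sed pattern assumes the attribute appears literally as slaveport="NNNN" in environment.xml, which should be confirmed against the actual file. SSH access as user ubuntu and the output file name environment.edited.xml are also assumptions.

```shell
# Hypothetical helper: print each address from PREFIX.FIRST to PREFIX.LAST.
node_ips() {
    local prefix i last
    prefix="$1"
    i="$2"
    last="$3"
    while [ "$i" -le "$last" ]; do
        echo "$prefix.$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: rewrite the slaveport attribute in XML read from stdin.
# Assumes the attribute is spelled slaveport="NNNN"; check your file first.
set_slaveport() {
    sed 's/slaveport="[0-9]*"/slaveport="'"$1"'"/'
}

# Check every node in turn for leftover hpcc processes:
#   for ip in $(node_ips 192.168.5 23 26); do
#       echo "== $ip"
#       ssh "ubuntu@$ip" 'ps aux | grep hpcc'
#   done
#
# Write an edited copy with the new port, to be reviewed before pushing:
#   set_slaveport 21000 < environment.xml > environment.edited.xml
```

Writing to a separate copy rather than editing in place leaves the original environment.xml untouched until the change has been checked.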

If you installed radssh you can start the processes by running
service dafilesrv start
and
service hpcc-init start
Otherwise run
sudo /opt/HPCCSystems/sbin/hpcc-run.sh -a hpcc-init start
on one of the nodes.

If it still fails, please include the latest log files from the thormaster (init log and thormaster log), the log files of the thorslave node, and the output from radssh (if installed).
ultima_centauri
 
Posts: 6
Joined: Fri Nov 30, 2012 7:27 pm
