

Multinode cluster issue: thorCluster not attached

Topics related to recommendations or questions on the design for HPCC Systems clusters

Tue Dec 02, 2014 7:20 pm

Hi,

I am trying to create a 2-node cluster on Ubuntu as described in the installation guide.
1) Installed HPCC 5.0.2.1 on both machines (master A and slave B).
2) Created an environment.xml using configmgr and copied it over to the second machine (machine B) using the hpcc user's credentials.
3) Copied the .ssh folder (id_rsa, id_rsa.pub, authorized_keys) from the hpcc user's home on machine A to machine B.
4) Started HPCC using "sudo service hpcc-init start". But ECL Watch says "thorCluster is not attached".
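When a .ssh folder is copied between machines (step 3), its permissions often end up looser than OpenSSH accepts: the ssh client ignores a private key that other users can read, and sshd (with its default StrictModes setting) rejects an authorized_keys file that others can write. The following is a minimal sketch of a helper to tighten them, assuming the folder holds exactly the three files listed above; `fix_ssh_perms` is a hypothetical name, not an HPCC tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HPCC): tighten permissions on a copied
# .ssh directory so OpenSSH will accept the keys in it.
# Assumption: the directory holds id_rsa, id_rsa.pub and authorized_keys.

fix_ssh_perms() {
    local dir="${1:-$HOME/.ssh}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Only the owner may list or enter the directory.
    chmod 700 "$dir" || return 1
    # Private key and authorized_keys: owner read/write only.
    local f
    for f in id_rsa authorized_keys; do
        if [ -e "$dir/$f" ]; then chmod 600 "$dir/$f" || return 1; fi
    done
    # The public key can stay world-readable.
    if [ -e "$dir/id_rsa.pub" ]; then chmod 644 "$dir/id_rsa.pub" || return 1; fi
    return 0
}
```

On machine B, after `su - hpcc`, calling `fix_ssh_perms ~/.ssh` would apply this. The files must also be owned by the hpcc user; a copy made through sudo can leave them owned by root, which this helper does not change.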

Am I missing something?
soniaghanekar
 
Posts: 1
Joined: Thu Nov 06, 2014 2:56 pm

Thu Dec 04, 2014 2:42 pm

The first place to look is the Thor logs:

Code:
/var/log/HPCCSystems/<name of thor>


Check the thormaster log to see what the error is. It's possible the error is in the environment.xml file. As you already know, environment.xml needs to be present on all machines.
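A minimal sketch for pulling up the most recent log in a Thor log directory and any lines in it that look like errors. It assumes the Thor instance is named mythor (the name used later in this thread) and that error lines contain the word "error" or "exception"; both the pattern and the function names `newest_log` and `show_log_errors` are hypothetical choices, not HPCC conventions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: find the newest *.log file in a Thor log directory
# and print the lines in it that look like errors.
# Assumption: default directory is for a Thor instance named "mythor".

newest_log() {
    local dir="${1:-/var/log/HPCCSystems/mythor}"
    # ls -t sorts by modification time, newest first; no match prints nothing.
    ls -t "$dir"/*.log 2>/dev/null | head -n 1
}

show_log_errors() {
    local log
    log="$(newest_log "$1")"
    if [ -z "$log" ]; then
        echo "no .log files found in ${1:-/var/log/HPCCSystems/mythor}" >&2
        return 1
    fi
    echo "== $log"
    # Case-insensitive, with line numbers; adjust the pattern to your logs.
    grep -inE 'error|exception' "$log"
}
```

Note that the newest file may be an init log (like the init_mythor_... file quoted further down) rather than the thormaster log itself, so it is worth listing the directory with `ls -lt` as well.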

When you edit the environment file, the copy you are editing is located in:

Code:
/etc/HPCCSystems/source/environment.xml


To "push" it out, the file needs to be placed in:

Code:
/etc/HPCCSystems/environment.xml
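A minimal sketch of that promotion step on a single node, keeping a timestamped backup of the current live file. `promote_env` is a hypothetical helper, and writing to /etc/HPCCSystems normally requires sudo or the hpcc user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the edited environment file to the live location
# on this node, keeping a timestamped backup of the previous live file.
# Defaults are the two paths given in the post above.

promote_env() {
    local src="${1:-/etc/HPCCSystems/source/environment.xml}"
    local dest="${2:-/etc/HPCCSystems/environment.xml}"
    [ -f "$src" ] || { echo "missing source file: $src" >&2; return 1; }
    # Back up the current live file, if there is one.
    if [ -e "$dest" ]; then
        cp -p "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp "$src" "$dest"
}
```

Called with no arguments it uses the two paths above. The same file still has to reach /etc/HPCCSystems/environment.xml on every other node (for example via scp), which is what the md5sum check below verifies.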



In addition, check on both machines that the md5sums match.

For example:

On the first node (assuming the edit was made on that node):

Code:
md5sum /etc/HPCCSystems/environment.xml /etc/HPCCSystems/source/*.xml


Next, go to the second node and run:

Code:
md5sum /etc/HPCCSystems/environment.xml
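With more than two nodes, running md5sum by hand on each one gets tedious. This is a minimal sketch that collects the hash of the live environment file from each host over ssh and reports whether they all agree. It assumes passwordless ssh from the current node to every host listed. `env_md5_report`, `env_md5_match`, and the `SSH_CMD`/`ENV_FILE` overrides are hypothetical names, not HPCC settings.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the md5sum of environment.xml across nodes.
# Assumption: passwordless ssh to every host for the current user.
# SSH_CMD and ENV_FILE are illustrative override hooks (e.g. for testing).

env_md5_report() {
    local file="${ENV_FILE:-/etc/HPCCSystems/environment.xml}"
    local ssh_cmd="${SSH_CMD:-ssh}"
    local host sum
    for host in "$@"; do
        # md5sum prints "<hash>  <path>"; keep only the hash.
        sum=$($ssh_cmd "$host" md5sum "$file" | awk '{print $1}')
        printf '%s %s\n' "$host" "${sum:-MISSING}"
    done
}

# Prints one "host hash" line per host; succeeds only if every host
# reported the same hash and none was missing.
env_md5_match() {
    [ "$#" -ge 1 ] || { echo "usage: env_md5_match host..." >&2; return 2; }
    local report
    report=$(env_md5_report "$@")
    echo "$report"
    if echo "$report" | grep -q ' MISSING$'; then
        return 1
    fi
    [ "$(echo "$report" | awk '{print $2}' | sort -u | wc -l)" -eq 1 ]
}
```

With the addresses that appear later in this thread, the call would be `env_md5_match 192.168.5.23 192.168.5.24 192.168.5.25 192.168.5.26`.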


HTH,

Bob and the HPCC team
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Sat Jan 07, 2017 8:38 pm

Hello,

I get the same error running HPCC 5.4.6-1 on Ubuntu 15.04. I checked environment.xml on all nodes, and the checksum is the same on all of them.
Below is the generated log file:
ubuntu@hpcc-master:~$cat /var/log/HPCCSystems/mythor/init_mythor_2017_01_07_20_34_50.log
2017-01-07T20:34:50: Starting mythor
2017-01-07T20:34:50: removing any previous sentinel file
2017-01-07T20:34:50: Ensuring a clean working environment ...
2017-01-07T20:34:50: Killing slaves
2017-01-07T20:34:50: --------------------------
2017-01-07T20:34:50: starting thorslaves ...
2017-01-07T20:34:51: thormaster cmd : /var/lib/HPCCSystems/mythor/thormaster_mythor MASTER=192.168.5.23:20000
2017-01-07T20:34:51: thormaster_lcr process started pid = 23790
2017-01-07T20:35:51: Thormaster (23790) Exited cleanly
ubuntu@hpcc-master:~$

Any idea what is wrong? Thanks for your help.

Vishnu
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 2:04 pm

Does the cluster have the mydafilesrv process up and running? You might want to run a preflight check on the system:

http://cdn.hpccsystems.com/releases/CE-Candidate-6.2.0/docs/The_ECL_Watch_Manual-6.2.0-1.pdf

See pages 101 and 104.
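As a quick shell-side complement to the ECL Watch preflight, this minimal sketch reports whether processes with given names are running on the current node. It assumes the mydafilesrv component runs as a process named dafilesrv. `check_procs` is a hypothetical helper. On Linux, `pgrep -x` matches the kernel's process name, which is truncated to 15 characters, so long names such as thormaster_mythor will not match this way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether processes with the given exact names
# are running on this node. Returns non-zero if any of them is missing.
# Note: pgrep -x compares against the process name, which Linux truncates
# to 15 characters, so use short names such as "dafilesrv".

check_procs() {
    local name status=0
    for name in "$@"; do
        if pgrep -x "$name" >/dev/null; then
            echo "$name: running"
        else
            echo "$name: NOT running"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_procs dafilesrv` on each node (for example over ssh) would show any node where the file server is down.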
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 3:18 pm

Hello,
I ran the preflight check; mydafilesrv is running without any issues. Screenshots of the preflight:
[Screenshots: Error2.PNG, Error1.PNG]


Here is the output when I attempt to start the cluster. I now get the same error for Roxie as well.

ubuntu@hpcc-master:~$ sudo -u hpcc /opt/HPCCSystems/sbin/hpcc-run.sh -a hpcc-init start
192.168.5.23: Running sudo /etc/init.d/hpcc-init start
sudo: unable to resolve host hpcc-master
Dependent service dafilesrv, mydafilesrv is already running.
Starting mydali ... [ OK ]
Starting mydfuserver ... [ OK ]
Starting myeclagent ... [ OK ]
Starting myeclccserver ... [ OK ]
Starting myeclscheduler ... [ OK ]
Starting myesp ... [ OK ]
Starting mysasha ... [ OK ]
Starting mythor ... [ FAILED ]

hpcc-init start in the cluster ...

Total hosts to process: 3

Execution progress: 100%, running: 0, in queue: 0, succeed: 3, failed: 0

hpcc-init_start_20004 run successfully on all hosts in the cluster


192.168.5.24 hpcc-init start :
Starting myroxie ... [ OK ]

192.168.5.25 hpcc-init start :

192.168.5.26 hpcc-init start :
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 4:14 pm

What is in the log files on 192.168.5.26 where you are having issues?
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 4:18 pm

ubuntu@hpcc-3:~$ cat /var/log/HPCCSystems/mythor/init_thorslave_mythor_2017_01_09_15_09_57.log
2017-01-09T15:09:58: dependency dafilesrv started
2017-01-09T15:09:58: slave(192.168.5.26) init
2017-01-09T15:09:58: slave(s) starting
2017-01-09T15:09:58: rsync -e ssh -o LogLevel=QUIET -o StrictHostKeyChecking=no 192.168.5.23:/var/lib/HPCCSystems/mythor/thorgroup /var/lib/HPCCSystems/mythor/thorgroup.slave
2017-01-09T15:09:59: thorslave_mythor master=192.168.5.23:20000 slave=.:20100 slavenum=1 logDir=/var/log/HPCCSystems/mythor
2017-01-09T15:09:59: slave pid 28652 started
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 4:53 pm

Are you able to become the hpcc user and ssh into the thormaster, and vice versa?
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jan 09, 2017 4:57 pm

How would I do that?
vchinta
 
Posts: 56
Joined: Mon Oct 31, 2016 3:45 pm

Mon Jan 09, 2017 5:20 pm

On the suspected bad node:
su - hpcc, then ssh <thormaster>

On the thormaster:
su - hpcc, then ssh <suspected bad node>
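A minimal sketch of the same check done non-interactively, so that a missing or rejected key shows up as a clear failure instead of a password prompt. It relies on the standard OpenSSH options BatchMode=yes (never prompt) and ConnectTimeout. `check_ssh` and the `SSH_CMD` override are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that passwordless ssh works from this node
# to each given host. Run it as the hpcc user (su - hpcc).
# BatchMode=yes makes ssh fail rather than prompt for a password.
# SSH_CMD is an illustrative override hook (e.g. for testing).

check_ssh() {
    local ssh_cmd="${SSH_CMD:-ssh}"
    local host status=0
    for host in "$@"; do
        if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            status=1
        fi
    done
    return "$status"
}
```

With the addresses from this thread, that would be `check_ssh 192.168.5.24 192.168.5.25 192.168.5.26` on the thormaster (192.168.5.23), and `check_ssh 192.168.5.23` on each slave. Because BatchMode also suppresses the "accept this host key?" question, a host whose key is not yet in known_hosts will report FAILED until it has been connected to once interactively.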
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm
