Sat Aug 18, 2018 2:25 pm


Mythor failing to start

Topics related to recommendations or questions on the design for HPCC Systems clusters

Mon Nov 28, 2016 4:05 pm

I recently had to move a local storage location on my compute slaves to a remote NFS share because of space issues. Since then, mythor fails to start no matter what I do.
The mythor init log looks like this:
# cat init_mythor_2016_11_28_10_59_29.log
2016-11-28T15:59:29: Starting mythor
2016-11-28T15:59:29: removing any previous sentinel file
2016-11-28T15:59:29: Ensuring a clean working environment ...
2016-11-28T15:59:29: Killing slaves
2016-11-28T15:59:29: Error 255 in frunssh
2016-11-28T15:59:29: Please check /var/log/HPCCSystems/frunssh for more details
2016-11-28T15:59:29: Stopping mythor

Short of rebuilding my cluster, I'm not sure what to do at this point. The location that was remapped to an NFS share is /var/lib/HPCCSystems/hpcc-data.
bbrown57
 
Posts: 4
Joined: Mon Jun 13, 2016 2:51 pm

Sat Apr 14, 2018 12:58 am

Hi,
I have the same problem. Is there any solution for this?

Thanks
eprado22
 
Posts: 3
Joined: Fri Feb 02, 2018 12:03 am

Sat Apr 14, 2018 1:33 am

My Logs

[hpcc@nodoa mythor]$ vi init_mythor_2018_04_13_20_11_31.log
2018_04_13_20_11_31: Starting mythor
2018_04_13_20_11_31: removing any previous sentinel file
2018_04_13_20_11_31: Ensuring a clean working environment ...
2018_04_13_20_11_31: Killing slaves
2018_04_13_20_11_31: --------------------------
2018_04_13_20_11_31: starting thorslaves ...
2018_04_13_20_11_32: Error 255 in frunssh
2018_04_13_20_11_32: Please check /var/log/HPCCSystems/frunssh for more details
2018_04_13_20_11_33: Stopping mythor
2018_04_13_20_11_33: mythor Stopped
2018_04_13_20_11_33: Killing slaves
2018_04_13_20_11_34: Frunssh successful
2018_04_13_20_11_34: removing init.pid file and slaves file


vi frunssh.2018_04_13.log

1: ssh(0): STDERR: cat: /etc/HPCCSystems/environment.conf: No such file or directory
cat: /etc/HPCCSystems/environment.conf: No such file or directory
cat: /etc/HPCCSystems/environment.conf: No such file or directory
/opt/HPCCSystems/etc/init.d/hpcc_common: line 266: cfg.section.DEFAULT: command not found
unable to write to /var/log/HPCCSystems/mythor/init_thorslave_mythor_2018_04_13_20_11_31.log
2: ssh(0): STDERR: cat: /etc/HPCCSystems/environment.conf: No such file or directory
cat: /etc/HPCCSystems/environment.conf: No such file or directory
cat: /etc/HPCCSystems/environment.conf: No such file or directory
/opt/HPCCSystems/etc/init.d/hpcc_common: line 266: cfg.section.DEFAULT: command not found
unable to write to /var/log/HPCCSystems/mythor/init_thorslave_mythor_2018_04_13_20_11_31.log
eprado22
 
Posts: 3
Joined: Fri Feb 02, 2018 12:03 am

Tue Apr 17, 2018 4:03 pm

It shouldn't have anything to do with moving the storage directories. It sounds like you've lost (or moved) some of the installation files. Was /etc/HPCCSystems relocated?

Specifically, /etc/HPCCSystems/environment.conf must be present, and its owner and group should be 'hpcc'.

From the error you've pasted, it looks like the file is now missing on the master and/or some of the slave machines. Alternatively, the scripts running as user hpcc may no longer have permission to access it.

I would first check the master and all slave nodes to see whether this file is present everywhere and whether user hpcc can access it.
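The check described above could be scripted roughly as follows. This is only a sketch: the function name check_env_conf is made up for illustration, and the default path and the 'hpcc' owner/group are simply the values mentioned in this thread. The readability test applies to whoever runs it, so it would need to be run as user hpcc to reflect what the init scripts see.

```shell
#!/bin/sh
# check_env_conf [FILE] [OWNER] [GROUP]   (hypothetical helper, not part of HPCC)
# Returns 0 if FILE exists, is owned by OWNER:GROUP and is readable
# by the current user; 1 if missing, 2 if wrong owner/group, 3 if unreadable.
check_env_conf() {
    conf="${1:-/etc/HPCCSystems/environment.conf}"
    want_owner="${2:-hpcc}"
    want_group="${3:-hpcc}"

    if [ ! -e "$conf" ]; then
        echo "MISSING: $conf"
        return 1
    fi

    # Fields 3 and 4 of 'ls -ld' are the owner and group names.
    owner=$(ls -ld "$conf" | awk '{print $3}')
    group=$(ls -ld "$conf" | awk '{print $4}')
    if [ "$owner" != "$want_owner" ] || [ "$group" != "$want_group" ]; then
        echo "WRONG OWNER: $conf is $owner:$group, expected $want_owner:$want_group"
        return 2
    fi

    # -r tests readability for the user running this script.
    if [ ! -r "$conf" ]; then
        echo "UNREADABLE: $conf"
        return 3
    fi

    echo "OK: $conf"
    return 0
}
```

Calling check_env_conf with no arguments on the master and on each slave, while logged in as hpcc, should show which node (if any) has lost the file or its permissions.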
jsmith
Community Advisory Board Member
 
Posts: 70
Joined: Tue Jul 19, 2011 12:58 pm

