

Thor Slave won't start


Fri Jul 23, 2021 4:56 pm

Hello!

I've been trying to install a new version of the HPCCSystems Platform on Ubuntu 20.x, but I'm facing an issue where the Thor slave just won't start.
I tried HPCCSystems Platform 8.2, 8.0 and 7.12 (the latest release of each) on Ubuntu 20.04 and 20.10, and I get the same behavior every time.
Each time, I simply run:
dpkg -i hpccsystems-platform....
apt install -f
systemctl start hpccsystems-platform.service

When I run the preflight certification, it shows that the Thor slave is not ready.
Running a simple ps auxwww | grep hpcc, I get the following (there is a thormaster process but no thorslave process):
hpcc       50730  0.0  0.0 130820  6532 ?        Ssl  16:20   0:00 /opt/HPCCSystems/bin/dafilesrv -L /var/log/HPCCSystems -I mydafilesrv -D
hpcc       50745  0.0  0.0 577052  9004 ?        Ssl  16:20   0:00 /opt/HPCCSystems/bin/eclccserver --daemon myeclccserver
hpcc       50748  0.0  0.0 355816  8732 ?        Ssl  16:20   0:00 /opt/HPCCSystems/bin/agentexec --daemon myeclagent
hpcc       50754  0.0  0.2 2108300 38356 ?       Ssl  16:20   0:00 /opt/HPCCSystems/bin/daserver --daemon mydali
hpcc       50755  0.0  0.1 540264 19944 ?        Ssl  16:20   0:00 /opt/HPCCSystems/bin/dfuserver --daemon mydfuserver
hpcc       50757  0.0  0.2 2279816 39692 ?       Ssl  16:20   0:00 /opt/HPCCSystems/bin/roxie --topology=RoxieTopology.xml --logfile --restarts=2 --stdlog=0 --daemon myroxie
hpcc       50760  0.0  0.0 536128  8444 ?        Ssl  16:20   0:00 /opt/HPCCSystems/bin/eclscheduler --daemon myeclscheduler
hpcc       50769  0.0  0.0  86972  3352 ?        Ssl  16:20   0:00 /opt/HPCCSystems/bin/toposerver --daemon mytoposerver
hpcc       50775  0.0  0.2 993604 46228 ?        Ssl  16:20   0:00 /opt/HPCCSystems/bin/esp --daemon myesp
hpcc       51046  0.0  0.1 4292344 22896 ?       Ssl  16:20   0:00 /opt/HPCCSystems/bin/thormaster_lcr --daemon mythor MASTER=172.32.5.210:20000
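To make the "is the slave running?" check above repeatable after each install attempt, here is a minimal sketch in the same shell the thread already uses. It assumes, based only on the listings in this thread, that the Thor command lines contain thormaster and thorslave (thormaster_lcr, thorslave_mythor, and so on); the function name check_thor_processes is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether Thor master and slave processes show up in ps output.
# Assumption: their command lines contain "thormaster" / "thorslave", as in the
# listings in this thread (thormaster_lcr, thorslave_mythor, ...).

# Read "ps auxwww" output on stdin; print one status line per role and
# return non-zero if any role is missing.
check_thor_processes() {
  local ps_out role status=0
  ps_out=$(cat)
  for role in thormaster thorslave; do
    if printf '%s\n' "$ps_out" | grep -q "$role"; then
      echo "$role: running"
    else
      echo "$role: NOT running"
      status=1
    fi
  done
  return "$status"
}
```

Usage: ps auxwww | check_thor_processes. Fed the listing above, it would print "thormaster: running" followed by "thorslave: NOT running" and return 1.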


The contents of /var/log/HPCCSystems/mythor/thorslaves-launch.debug look like this:
+ [[ -z mythor ]]
+ [[ -z start ]]
++ pwd
+ cwd=/var/lib/HPCCSystems/mythor
+ [[ /var/lib/HPCCSystems/mythor != \/\v\a\r\/\l\i\b\/\H\P\C\C\S\y\s\t\e\m\s\/\m\y\t\h\o\r ]]
+ source mythor.cfg
++ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/opt/HPCCSystems/bin:/opt/HPCCSystems/sbin:/var/lib/HPCCSystems/mythor
++ THORNAME=mythor
++ THORMASTER=172.32.5.210
++ THORMASTERPORT=20000
++ THORSLAVEPORT=20100
++ localthorportinc=20
++ slavespernode=1
++ channelsperslave=1
++ DALISERVER=172.32.5.210:7070
++ localthor=true
++ breakoutlimit=3600
++ refreshrate=3
++ autoSwapNode=false
++ SSHidentityfile=/home/hpcc/.ssh/id_rsa
++ SSHusername=hpcc
++ SSHpassword=
++ SSHtimeout=0
++ SSHretries=3
++ SSHsudomount=
+ slaveIps=($(/opt/HPCCSystems/bin/daliadmin server=$DALISERVER clusternodes ${THORNAME} slaves timeout=2 1>/dev/null 2>&1; uniq slaves))
++ /opt/HPCCSystems/bin/daliadmin server=172.32.5.210:7070 clusternodes mythor slaves timeout=2
++ uniq slaves
+ [[ -z 172.32.5.210 ]]
+ [[ -z 172.32.5.210 ]]
+ numOfNodes=1
+ (( i=0 ))
+ (( i<1 ))
+ (( c=0 ))
+ (( c<1 ))
+ __slavePort=20100
+ __slaveNum=1
+ ssh -o LogLevel=QUIET -o StrictHostKeyChecking=no -o BatchMode=yes -i /home/hpcc/.ssh/id_rsa hpcc@172.32.5.210 '/bin/bash -c '\''/opt/HPCCSystems/sbin/thorslaves-exec.sh start thorslave_mythor_1 20100 1 mythor 172.32.5.210 20000'\'''
(...)
+ exit 0

(I had to remove some lines from it to be able to submit this post.)
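To re-run the launcher's remote step by hand with exactly the traced values, the "++ KEY=VALUE" lines in the debug log can be turned back into the thorslaves-exec.sh command. This is a sketch under stated assumptions: that those xtrace lines mirror mythor.cfg, and that with slavespernode=1 slave 1 uses THORSLAVEPORT unchanged, as the trace shows (__slavePort=20100, __slaveNum=1). The argument order is copied from the ssh line in the trace; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the thorslaves-exec.sh command that the launcher runs over
# ssh, using the "++ KEY=VALUE" lines from thorslaves-launch.debug.
# Assumptions (not confirmed against the HPCC scripts): those xtrace lines
# mirror mythor.cfg, and with slavespernode=1 slave N listens on THORSLAVEPORT.

# Read xtrace output on stdin; print the "KEY=VALUE" assignments it traced.
extract_cfg() {
  sed -n 's/^++ \([A-Za-z_][A-Za-z0-9_]*\)=\(.*\)$/\1=\2/p'
}

# Look up KEY ($2) in the "KEY=VALUE" text passed as $1.
cfg_value() {
  printf '%s\n' "$1" | grep "^$2=" | head -n 1 | cut -d= -f2-
}

# Print the remote command for slave number $2 (default 1), following the
# argument order seen in the trace:
#   start <slave name> <slave port> <slave num> <thor name> <master ip> <master port>
build_slave_cmd() {
  local cfg="$1" num="${2:-1}" name
  name=$(cfg_value "$cfg" THORNAME)
  echo "/opt/HPCCSystems/sbin/thorslaves-exec.sh start thorslave_${name}_${num}" \
    "$(cfg_value "$cfg" THORSLAVEPORT) ${num} ${name}" \
    "$(cfg_value "$cfg" THORMASTER) $(cfg_value "$cfg" THORMASTERPORT)"
}
```

Usage, assuming the log path from this thread: cfg=$(extract_cfg < /var/log/HPCCSystems/mythor/thorslaves-launch.debug), then build_slave_cmd "$cfg" 1. With the values above it prints the same command as the traced ssh line: /opt/HPCCSystems/sbin/thorslaves-exec.sh start thorslave_mythor_1 20100 1 mythor 172.32.5.210 20000.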

I can manually run the ssh command above, or even run thorslaves-exec.sh directly (with all the right values), but nothing shows up (no errors, no output). I also ran the command that thorslaves-exec.sh runs, systemctl start thorslave@thorslave_mythor_1.service, and here is its status:
● thorslave@thorslave_mythor_1.service - thorslave_mythor_1
     Loaded: loaded (/etc/systemd/system/thorslave@.service; static)
     Active: failed (Result: exit-code) since Fri 2021-07-23 16:30:48 UTC; 10min ago
    Process: 53104 ExecStart=/opt/HPCCSystems/bin/thorslave_lcr --daemon thorslave_mythor_1 master=${THORMASTER}:${THORMASTERPORT} slave=.:${SLAVEPORT} slavenum=${SLAVENUM} logDir=/var/log/HPCCSystems/${THORNAME} (code=exited, status=1/FAILURE)
   Main PID: 53104 (code=exited, status=1/FAILURE)

Jul 23 16:30:48 ip-172-32-5-210 systemd[1]: Started thorslave_mythor_1.
Jul 23 16:30:48 ip-172-32-5-210 systemd[1]: thorslave@thorslave_mythor_1.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 16:30:48 ip-172-32-5-210 systemd[1]: thorslave@thorslave_mythor_1.service: Failed with result 'exit-code'.
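The ExecStart line in that status output depends on ${THORMASTER}, ${THORMASTERPORT}, ${SLAVEPORT}, ${SLAVENUM} and ${THORNAME}, while the launch trace only shows THORMASTER, THORMASTERPORT, THORSLAVEPORT and THORNAME being set from mythor.cfg. Whether something supplies the other names to the unit is not visible in this thread, so it seems worth checking, but this is not a confirmed cause. The standard tools for that are systemctl cat thorslave@thorslave_mythor_1.service (the unit file), systemctl show -p Environment thorslave@thorslave_mythor_1.service (its Environment= setting) and journalctl -u thorslave@thorslave_mythor_1.service (its log). The small sketch below, with hypothetical function names, only lists the placeholders and checks them against your current shell, not against the unit's environment.

```shell
#!/usr/bin/env bash
# Sketch: list the ${VAR} placeholders a systemd ExecStart line relies on, then
# report which of them are not exported in the current shell.
# Assumption: the thorslave@ unit gets these values from its environment;
# this only checks your shell's environment, not the unit's.

# Read an ExecStart line on stdin; print each ${NAME} it references, sorted.
list_unit_vars() {
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' | sed 's/^\${\(.*\)}$/\1/' | LC_ALL=C sort -u
}

# Read variable names on stdin; print the ones that are not exported.
report_unset_vars() {
  local name
  while read -r name; do
    printenv "$name" >/dev/null || echo "$name is not set"
  done
}
```

For the ExecStart line above, list_unit_vars prints SLAVENUM, SLAVEPORT, THORMASTER, THORMASTERPORT and THORNAME, one per line; piping that into report_unset_vars after sourcing the relevant config shows which names are still missing.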


Any idea why the slave won't start? And how could I get more logs to understand what's going on?

Thanks!
lpezet
 
Posts: 81
Joined: Wed Sep 10, 2014 3:14 am

Fri Jul 23, 2021 7:14 pm

I've now gone all the way down to HPCCSystems 7.8 on Ubuntu 20.04, and I'm still getting the same behavior WHEN USING systemctl (as described in the doc: https://cdn.hpccsystems.com/releases/CE ... .2.2-1.pdf).
I then went back to HPCCSystems 8.2 on Ubuntu 20.04, but this time used the old-school /etc/init.d/hpcc-init start, and it worked!
Here are the processes I get for the hpcc user:
hpcc       25704  0.0  0.0   9672  4372 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_dafilesrv
hpcc       25743  0.0  0.1 138536 16424 pts/0    Sl   19:06   0:00 dafilesrv -L /var/log/HPCCSystems -I mydafilesrv
hpcc       25887  0.0  0.0   9672  4388 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_dali
hpcc       25924  0.0  0.2 764380 47800 pts/0    Sl   19:06   0:00 daserver
hpcc       26085  0.0  0.0   9672  4392 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_dfuserver
hpcc       26122  0.0  0.1 604864 24360 pts/0    Sl   19:06   0:00 dfuserver
hpcc       26283  0.0  0.0   9672  4464 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_eclagent
hpcc       26323  0.0  0.0 421156 14400 pts/0    Sl   19:06   0:00 agentexec
hpcc       26472  0.0  0.0   9672  4432 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_eclccserver
hpcc       26509  0.0  0.0 576856 14688 pts/0    Sl   19:06   0:00 eclccserver
hpcc       26674  0.0  0.0   9672  4444 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_eclscheduler
hpcc       26711  0.0  0.0 601496 14356 pts/0    Sl   19:06   0:00 eclscheduler
hpcc       26866  0.0  0.0   9672  4228 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_esp
hpcc       26903  0.0  0.3 756612 57512 pts/0    Sl   19:06   0:00 esp snmpid=26866
hpcc       27392  0.0  0.0   9672  4292 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_roxie
hpcc       27434  0.0  0.2 1771348 44128 pts/0   Sl   19:06   0:00 roxie --topology=RoxieTopology.xml --logfile --restarts=0 --stdlog=0
hpcc       27608  0.0  0.0   9672  4340 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_sasha
hpcc       27645  0.0  0.0 617888 14940 pts/0    Sl   19:06   0:00 saserver
hpcc       27807  0.0  0.0   9672  4396 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_thor
hpcc       27952  0.0  0.1 8577428 27952 pts/0   Sl   19:06   0:00 ./thorslave_mythor --master=172.32.5.233:20000 --slave=.:20100 --slavenum=1 --slaveprocessnum=0 --logDir=/var/log/HPCCSystems/mythor
hpcc       27957  0.0  0.1 4701252 28336 pts/0   Sl   19:06   0:00 /var/lib/HPCCSystems/mythor/thormaster_mythor --master=172.32.5.233:20000
hpcc       28135  0.0  0.0   9672  4364 pts/0    S    19:06   0:00 /bin/bash /opt/HPCCSystems/bin/init_toposerver
hpcc       28172  0.0  0.0  87080  8876 pts/0    Sl   19:06   0:00 toposerver
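Since hpcc-init starts a working slave and the systemd unit does not, one mechanical comparison is to line up the option names each path passes to the slave: the unit's ExecStart from the earlier status output versus the thorslave_mythor command line in the listing above. The sketch below does only that. It treats master= and --master= as the same option, which is an assumption about how the slave parses its arguments, and a difference it reports is a lead, not a diagnosis. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the option names passed to the Thor slave by two launch paths.
# Assumption: "master=" and "--master=" are treated as the same option; this
# sketch strips the leading "--" but does not know how thorslave parses them.

# Read a command line on stdin; print each NAME from NAME=... or --NAME=...
option_names() {
  tr ' ' '\n' | grep '=' | sed 's/^--//; s/=.*$//' | LC_ALL=C sort -u
}

# Print option names present in command line $2 but missing from $1.
only_in_second() {
  local first opt
  first=$(printf '%s\n' "$1" | option_names)
  printf '%s\n' "$2" | option_names | while read -r opt; do
    printf '%s\n' "$first" | grep -qx -- "$opt" || echo "$opt"
  done
}
```

With the unit's command as the first argument and the init.d thorslave_mythor command as the second, it prints slaveprocessnum; with the arguments swapped it prints nothing. The --daemon flag in the unit has no "=", so it is not listed.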


Preflight/certification is all good too.
Why, oh why?
lpezet
 

Fri Jul 23, 2021 10:42 pm

lpezet, thanks for bringing this up. I've opened a JIRA ticket and will be investigating the issue: https://track.hpccsystems.com/browse/HPCC-26258

From the info you provided, it looks like the ssh call itself goes through fine, so is the error in the thorslaves-exec.sh script?
mgardner
 
Posts: 17
Joined: Tue Jan 20, 2015 9:30 pm

Mon Jul 26, 2021 6:28 am

Hello!

I would say thorslaves-exec.sh runs fine and the problem is with /opt/HPCCSystems/bin/thorslave_lcr. When I run /opt/HPCCSystems/bin/thorslave_lcr manually, I can't get much out of it (besides its usage message if I don't pass the right parameters): the exit code is always 0 and there is no stdout or stderr output.
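Here a manual run of thorslave_lcr reports exit code 0 with no output, while systemd reports status=1/FAILURE for the same binary. A small wrapper that captures the exit code, stdout and stderr separately makes those two observations easier to compare. This is a generic sketch, and run_and_report is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: run a command and report its exit code, stdout and stderr
# separately, so a silent failure is easier to spot. Returns the command's
# own exit code.
run_and_report() {
  local out_file err_file rc
  out_file=$(mktemp)
  err_file=$(mktemp)
  if "$@" >"$out_file" 2>"$err_file"; then
    rc=0
  else
    rc=$?
  fi
  printf 'exit code: %s\n' "$rc"
  printf 'stdout: %s\n' "$(cat "$out_file")"
  printf 'stderr: %s\n' "$(cat "$err_file")"
  rm -f "$out_file" "$err_file"
  return "$rc"
}
```

For example, with the traced values substituted into the unit's ExecStart arguments (an assumption about how to reproduce the unit's invocation): run_and_report sudo -u hpcc /opt/HPCCSystems/bin/thorslave_lcr master=172.32.5.210:20000 slave=.:20100 slavenum=1 logDir=/var/log/HPCCSystems/mythor. A manual run may still not match the unit exactly, since systemd supplies its own environment.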
lpezet
 

Tue Oct 26, 2021 5:14 pm

Could I get access to that JIRA ticket?
https://track.hpccsystems.com/browse/HPCC-26258

I'm at it again, trying to run version 8.4 on Ubuntu 20.04 LTS, and I'm still having issues with the new "systemctl" way of doing things. I'd like to check that ticket to see if there's anything I could try to make it work.

Thanks!
lpezet
 

Wed Oct 27, 2021 1:09 pm

lpezet,

You should be able to simply click on that link to get to the ticket. JIRA will ask you to log in, so if you do not have a JIRA account yet, you can just sign up for one. Remember, this is Open Source, so the JIRA tickets are visible to everybody.

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1604
Joined: Wed Oct 26, 2011 7:40 pm

