Thor Slave won't start
Hello!
I've been trying to install a new version of HPCCSystems Platform on Ubuntu 20.x, but the Thor Slave just won't start.
I tried HPCCSystems Platform 8.2, 8.0 and 7.12 (latest for each) on Ubuntu 20.04 and 20.10 but I get the same behavior.
Every time I simply do:
Code:
dpkg -i hpccsystems-platform....
apt install -f
systemctl start hpccsystems-platform.service
When I run preflight certification, it shows Thor Slave is not ready.
Doing a simple ps auxwww | grep hpcc, I get the following:
Code:
hpcc 50730 0.0 0.0 130820 6532 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/dafilesrv -L /var/log/HPCCSystems -I mydafilesrv -D
hpcc 50745 0.0 0.0 577052 9004 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/eclccserver --daemon myeclccserver
hpcc 50748 0.0 0.0 355816 8732 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/agentexec --daemon myeclagent
hpcc 50754 0.0 0.2 2108300 38356 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/daserver --daemon mydali
hpcc 50755 0.0 0.1 540264 19944 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/dfuserver --daemon mydfuserver
hpcc 50757 0.0 0.2 2279816 39692 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/roxie --topology=RoxieTopology.xml --logfile --restarts=2 --stdlog=0 --daemon myroxie
hpcc 50760 0.0 0.0 536128 8444 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/eclscheduler --daemon myeclscheduler
hpcc 50769 0.0 0.0 86972 3352 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/toposerver --daemon mytoposerver
hpcc 50775 0.0 0.2 993604 46228 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/esp --daemon myesp
hpcc 51046 0.0 0.1 4292344 22896 ? Ssl 16:20 0:00 /opt/HPCCSystems/bin/thormaster_lcr --daemon mythor MASTER=172.32.5.210:20000
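The listing above has a thormaster_lcr process but no thorslave process. A minimal sketch for checking that mechanically is below. The function name `missing_hpcc_procs` is hypothetical, and the names to look for are taken from the listings in this thread rather than from any official list.

```shell
#!/usr/bin/env bash
# Print each expected process name that does NOT appear in a ps listing.
# The listing is read from stdin; the expected names are the arguments.
# Names are matched as fixed substrings, so "thorslave" also matches
# "thorslave_lcr" or "thorslave_mythor".
missing_hpcc_procs() {
    local listing name
    listing="$(cat)"
    for name in "$@"; do
        if ! grep -qF -- "$name" <<<"$listing"; then
            printf '%s\n' "$name"
        fi
    done
}

# Possible usage on a live node (assumes the platform runs as user "hpcc"):
#   ps -u hpcc -o args= | missing_hpcc_procs thormaster thorslave daserver esp
```

An empty result means every name was found; with the systemctl listing above, only thorslave would be reported.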
The content of the /var/log/HPCCSystems/mythor/thorslaves-launch.debug is like this:
Code:
+ [[ -z mythor ]]
+ [[ -z start ]]
++ pwd
+ cwd=/var/lib/HPCCSystems/mythor
+ [[ /var/lib/HPCCSystems/mythor != \/\v\a\r\/\l\i\b\/\H\P\C\C\S\y\s\t\e\m\s\/\m\y\t\h\o\r ]]
+ source mythor.cfg
++ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/opt/HPCCSystems/bin:/opt/HPCCSystems/sbin:/var/lib/HPCCSystems/mythor
++ THORNAME=mythor
++ THORMASTER=172.32.5.210
++ THORMASTERPORT=20000
++ THORSLAVEPORT=20100
++ localthorportinc=20
++ slavespernode=1
++ channelsperslave=1
++ DALISERVER=172.32.5.210:7070
++ localthor=true
++ breakoutlimit=3600
++ refreshrate=3
++ autoSwapNode=false
++ SSHidentityfile=/home/hpcc/.ssh/id_rsa
++ SSHusername=hpcc
++ SSHpassword=
++ SSHtimeout=0
++ SSHretries=3
++ SSHsudomount=
+ slaveIps=($(/opt/HPCCSystems/bin/daliadmin server=$DALISERVER clusternodes ${THORNAME} slaves timeout=2 1>/dev/null 2>&1; uniq slaves))
++ /opt/HPCCSystems/bin/daliadmin server=172.32.5.210:7070 clusternodes mythor slaves timeout=2
++ uniq slaves
+ [[ -z 172.32.5.210 ]]
+ [[ -z 172.32.5.210 ]]
+ numOfNodes=1
+ (( i=0 ))
+ (( i<1 ))
+ (( c=0 ))
+ (( c<1 ))
+ __slavePort=20100
+ __slaveNum=1
+ ssh -o LogLevel=QUIET -o StrictHostKeyChecking=no -o BatchMode=yes -i /home/hpcc/.ssh/id_rsa hpcc@172.32.5.210 '/bin/bash -c '\''/opt/HPCCSystems/sbin/thorslaves-exec.sh start thorslave_mythor_1 20100 1 mythor 172.32.5.210 20000'\'''
(...)
+ exit 0
(had to remove some lines from it to be able to submit this post).
I can manually run the ssh command from above, or even thorslaves-exec.sh directly (with all the right values), but nothing shows up (no errors, no output). I ran the command that thorslaves-exec.sh runs, systemctl start [email protected]horslave_mythor_1.service, and here is its status:
Code:
● [email protected]_mythor_1.service - thorslave_mythor_1
Loaded: loaded (/etc/systemd/system/[email protected]; static)
Active: failed (Result: exit-code) since Fri 2021-07-23 16:30:48 UTC; 10min ago
Process: 53104 ExecStart=/opt/HPCCSystems/bin/thorslave_lcr --daemon thorslave_mythor_1 master=${THORMASTER}:${THORMASTERPORT} slave=.:${SLAVEPORT} slavenum=${SLAVENUM} logDir=/var/log/HPCCSystems/${THORNAME} (code=exited, status=1/FAILURE)
Main PID: 53104 (code=exited, status=1/FAILURE)
Jul 23 16:30:48 ip-172-32-5-210 systemd[1]: Started thorslave_mythor_1.
Jul 23 16:30:48 ip-172-32-5-210 systemd[1]: [email protected]_mythor_1.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 16:30:48 ip-172-32-5-210 systemd[1]: [email protected]_mythor_1.service: Failed with result 'exit-code'.
Any idea why the slave would not start? Any idea how I could get more logs here to understand what's going on?
Thanks!
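On the question of getting more logs: the status output above only shows the last few journal lines. A small sketch that prints the usual systemd and journal commands for a failed unit is below. The function name `diag_commands` is hypothetical, the unit name has to be supplied by the reader (copy it from the systemctl status output), and whether the slave writes anything to its log directory before exiting is not confirmed here.

```shell
#!/usr/bin/env bash
# Print (do not run) standard commands for inspecting a failed systemd unit.
#   $1 = unit name, as shown by `systemctl status`
#   $2 = component log directory, e.g. the logDir value from ExecStart
diag_commands() {
    local unit="$1" logdir="$2"
    printf 'systemctl status %q --no-pager --full\n' "$unit"   # full status lines
    printf 'systemctl cat %q\n' "$unit"                        # unit file as loaded
    printf 'journalctl -u %q --no-pager -n 200\n' "$unit"      # recent journal entries
    printf 'ls -lt %q\n' "$logdir"                             # newest log files first
}

# Possible usage (UNIT is a placeholder to be set by the reader):
#   diag_commands "$UNIT" /var/log/HPCCSystems/mythor | bash
```

Printing the commands first makes it easy to review them before piping them to a shell.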
- lpezet
- Posts: 85
- Joined: Wed Sep 10, 2014 3:14 am
I've now gone all the way down to HPCCSystems 7.8 on Ubuntu 20.04 and I'm still getting the same behavior WHEN USING systemctl (as mentioned in the doc: https://cdn.hpccsystems.com/releases/CE ... .2.2-1.pdf).
Now I went back to HPCCSystems 8.2/Ubuntu 20.04, but this time using the old school /etc/init.d/hpcc-init start and it worked!
Here are the processes I get for hpcc user:
Code:
hpcc 25704 0.0 0.0 9672 4372 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_dafilesrv
hpcc 25743 0.0 0.1 138536 16424 pts/0 Sl 19:06 0:00 dafilesrv -L /var/log/HPCCSystems -I mydafilesrv
hpcc 25887 0.0 0.0 9672 4388 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_dali
hpcc 25924 0.0 0.2 764380 47800 pts/0 Sl 19:06 0:00 daserver
hpcc 26085 0.0 0.0 9672 4392 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_dfuserver
hpcc 26122 0.0 0.1 604864 24360 pts/0 Sl 19:06 0:00 dfuserver
hpcc 26283 0.0 0.0 9672 4464 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_eclagent
hpcc 26323 0.0 0.0 421156 14400 pts/0 Sl 19:06 0:00 agentexec
hpcc 26472 0.0 0.0 9672 4432 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_eclccserver
hpcc 26509 0.0 0.0 576856 14688 pts/0 Sl 19:06 0:00 eclccserver
hpcc 26674 0.0 0.0 9672 4444 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_eclscheduler
hpcc 26711 0.0 0.0 601496 14356 pts/0 Sl 19:06 0:00 eclscheduler
hpcc 26866 0.0 0.0 9672 4228 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_esp
hpcc 26903 0.0 0.3 756612 57512 pts/0 Sl 19:06 0:00 esp snmpid=26866
hpcc 27392 0.0 0.0 9672 4292 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_roxie
hpcc 27434 0.0 0.2 1771348 44128 pts/0 Sl 19:06 0:00 roxie --topology=RoxieTopology.xml --logfile --restarts=0 --stdlog=0
hpcc 27608 0.0 0.0 9672 4340 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_sasha
hpcc 27645 0.0 0.0 617888 14940 pts/0 Sl 19:06 0:00 saserver
hpcc 27807 0.0 0.0 9672 4396 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_thor
hpcc 27952 0.0 0.1 8577428 27952 pts/0 Sl 19:06 0:00 ./thorslave_mythor --master=172.32.5.233:20000 --slave=.:20100 --slavenum=1 --slaveprocessnum=0 --logDir=/var/log/HPCCSystems/mythor
hpcc 27957 0.0 0.1 4701252 28336 pts/0 Sl 19:06 0:00 /var/lib/HPCCSystems/mythor/thormaster_mythor --master=172.32.5.233:20000
hpcc 28135 0.0 0.0 9672 4364 pts/0 S 19:06 0:00 /bin/bash /opt/HPCCSystems/bin/init_toposerver
hpcc 28172 0.0 0.0 87080 8876 pts/0 Sl 19:06 0:00 toposerver
Preflight/certification is all good too.
Why, oh why?
- lpezet
- Posts: 85
- Joined: Wed Sep 10, 2014 3:14 am
lpezet, thanks for bringing this up. I've opened a Jira ticket and I'll be investigating the issue. https://track.hpccsystems.com/browse/HPCC-26258
Reading the info you provided, it looks like there isn't actually an issue with the ssh call going through. Is the error in the thorslaves-exec.sh script?
- mgardner
- Posts: 17
- Joined: Tue Jan 20, 2015 9:30 pm
Hello!
I would say thorslaves-exec.sh runs fine, and the problem is something in /opt/HPCCSystems/bin/thorslave_lcr. When I run /opt/HPCCSystems/bin/thorslave_lcr manually I can't get much out of it (besides its usage message if I don't pass the right parameters): the exit code is always 0 and there is no output on stdout or stderr.
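For anyone repeating the manual run, a generic wrapper like the sketch below records the exit status and how much was written to stdout and stderr, so "no output, exit code 0" can be confirmed rather than eyeballed. The function name `run_and_report` is hypothetical and nothing in it is specific to HPCC. If the binary detaches when started with --daemon, the status seen here would be the launcher's, not the daemon's; that is an assumption, not something established in this thread.

```shell
#!/usr/bin/env bash
# Run a command, capture stdout and stderr separately, and report
# the exit status plus the number of bytes written to each stream.
run_and_report() {
    local out_file err_file status
    out_file="$(mktemp)"
    err_file="$(mktemp)"
    # The && / || form keeps the status even if the caller uses `set -e`.
    "$@" >"$out_file" 2>"$err_file" && status=0 || status=$?
    printf 'exit=%d stdout_bytes=%d stderr_bytes=%d\n' \
        "$status" "$(wc -c <"$out_file")" "$(wc -c <"$err_file")"
    rm -f "$out_file" "$err_file"
    return "$status"
}

# Possible usage, with the arguments taken from the unit's ExecStart line:
#   run_and_report /opt/HPCCSystems/bin/thorslave_lcr <arguments>
```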
- lpezet
- Posts: 85
- Joined: Wed Sep 10, 2014 3:14 am
Could I get access to that JIRA ticket?
https://track.hpccsystems.com/browse/HPCC-26258
I'm at it again trying to run version 8.4 on Ubuntu 20.04 LTS and I'm still having issues with that new "systemctl" way of things. I'd like to check on that ticket if there's anything I could try to make it work.
Thanks!
- lpezet
- Posts: 85
- Joined: Wed Sep 10, 2014 3:14 am
lpezet,
You should be able to simply click on that link and get to the ticket. JIRA will ask you to log in, so if you do not yet have a JIRA account you can just sign up to get one. Remember, this is Open Source, so the JIRA tickets are visible to everybody.
HTH,
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm