From VM to separate Linux box

Post questions specific to installation or configuration for the HPCC Systems platform

Wed Oct 05, 2011 6:13 pm

OK, so I got the processes killed. All of them, even the ones with no TCP sockets open.

Startup looks normal:

Code:
root@LAB-HPCC-01:/var/log/HPCCSystems/mydali/server# time /etc/init.d/hpcc-init start
Starting mydafilesrv....       [  OK  ]
Starting mydali....            [  OK  ]
Starting mydfuserver....       [  OK  ]
Starting myeclagent....        [  OK  ]
Starting myeclccserver....     [  OK  ]
Starting myeclscheduler....    [  OK  ]
Starting myesp....             [  OK  ]
Starting myroxie....           [  OK  ]
Starting mysasha....           [  OK  ]
Starting mythor....            [  OK  ]


Then the ports look normal:

Code:
root@LAB-HPCC-01:/var/log/HPCCSystems/mydali/server# netstat -tanp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:6500            0.0.0.0:*               LISTEN      12636/thormaster_65
tcp        0      0 0.0.0.0:7365            0.0.0.0:*               LISTEN      12086/eclscheduler
tcp        0      0 0.0.0.0:6600            0.0.0.0:*               LISTEN      12633/thorslave_660
tcp        0      0 0.0.0.0:7368            0.0.0.0:*               LISTEN      11929/agentexec
tcp        0      0 0.0.0.0:8877            0.0.0.0:*               LISTEN      12329/saserver
tcp        0      0 0.0.0.0:7315            0.0.0.0:*               LISTEN      12164/esp
tcp        0      0 0.0.0.0:9876            0.0.0.0:*               LISTEN      12243/roxie
tcp        0      0 0.0.0.0:7444            0.0.0.0:*               LISTEN      12008/eclccserver
tcp        0      0 0.0.0.0:7221            0.0.0.0:*               LISTEN      11850/dfuserver
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      849/sshd
tcp        0      0 0.0.0.0:7353            0.0.0.0:*               LISTEN      12243/roxie
tcp        0      0 0.0.0.0:7100            0.0.0.0:*               LISTEN      11684/dafilesrv
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      11761/daserver
tcp        0      1 138.12.249.27:59729     138.12.249.27:7070      SYN_SENT    12636/thormaster_65
tcp        0      1 138.12.249.27:59735     138.12.249.27:7070      SYN_SENT    12164/esp
tcp        0      0 138.12.249.27:22        138.12.248.168:3802     ESTABLISHED 1075/sshd: kovacsvx
tcp        0      1 138.12.249.27:59734     138.12.249.27:7070      SYN_SENT    12086/eclscheduler
tcp        0      1 138.12.249.27:59730     138.12.249.27:7070      SYN_SENT    12633/thorslave_660
tcp        0      1 138.12.249.27:59731     138.12.249.27:7070      SYN_SENT    11850/dfuserver
tcp        0      1 138.12.249.27:59733     138.12.249.27:7070      SYN_SENT    12008/eclccserver
tcp        0      1 138.12.249.27:59732     138.12.249.27:7070      SYN_SENT    11929/agentexec
tcp        0      1 138.12.249.27:59736     138.12.249.27:7070      SYN_SENT    12329/saserver
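
The `SYN_SENT` rows are the notable part of this listing: each component has sent a SYN to 138.12.249.27:7070 and has not yet received a reply, even though daserver is listening on that port. A minimal sketch for pulling those rows out of `netstat -tanp` output follows. The function name `stuck_syn` is hypothetical, and it assumes the column layout shown above (state in column 6, PID/program in column 7).

```shell
# Hypothetical helper: read `netstat -tanp` output on stdin and print
# the remote endpoint and owning process of every TCP connection that
# is still waiting for a reply to its SYN.
stuck_syn() {
  awk '$1 ~ /^tcp/ && $6 == "SYN_SENT" { print $5, $7 }'
}

# Usage (needs root to see other users' process names):
#   netstat -tanp | stuck_syn
```

If every line it prints points at the same local port, something between the client and that listener is probably dropping packets.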


Then I get the SOAP error at login:
[Attachment: 03--Another error logging in after full restart.png]


Then I note the time/date on the box:
Code:
root@LAB-HPCC-01:/var/log/HPCCSystems/mydali/server# # Now I get the error logging in.
root@LAB-HPCC-01:/var/log/HPCCSystems/mydali/server# date
Wed Oct  5 10:31:28 EDT 2011
Then I check which logs changed around the time I got the login error:
Code:
root@LAB-HPCC-01:/var/log/HPCCSystems# find -iname "*.log" | xargs grep -EHni "2011-10-05 10:(28|29|30)" *.log

./mydali/server/DaServer.log:18:00000011 2011-10-05 10:28:52 11761 11765 "SYS: PU=  0% MU=  2% MAL=29619200 MMP=29364224 SBK=254976 TOT=30004K RAM=306820K SWP=0K"
./mydali/server/DaServer.log:19:00000012 2011-10-05 10:28:52 11761 11765 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.6 kw/s=2.6 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=1.9 rxk/s=0.2 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=0 iow=0 idle=99"
./mydali/server/DaServer.log:20:00000013 2011-10-05 10:29:52 11761 11765 "SYS: PU=  0% MU=  2% MAL=29619200 MMP=29364224 SBK=254976 TOT=30004K RAM=303460K SWP=0K"
./mydali/server/DaServer.log:21:00000014 2011-10-05 10:29:52 11761 11765 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.5 kw/s=1.9 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=1.8 rxk/s=0.2 txp/s=0.1 txk/s=0.0 CPU: usr=0 sys=0 iow=0 idle=99"
./mydali/server/DaServer.log:22:00000015 2011-10-05 10:30:52 11761 11765 "SYS: PU=  0% MU=  2% MAL=29619200 MMP=29364224 SBK=254976 TOT=30004K RAM=303452K SWP=0K"
./mydali/server/DaServer.log:23:00000016 2011-10-05 10:30:52 11761 11765 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.4 kw/s=1.8 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=1.7 rxk/s=0.2 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=0 iow=0 idle=99"

./mysasha/saserver.log:6:00000005 2011-10-05 10:29:50 12329 12329 "Failed to connect to Dali Server 138.12.249.27:7070. Retrying..."

./myeclscheduler/eclscheduler.log:5:00000004 2011-10-05 10:29:46 12086 12086 "Failed to connect to Dali Server 138.12.249.27:7070. Retrying..."

./myroxie/roxie.log:63:0000003F 2011-10-05 10:28:09 12243 12754 "PING: 1 replies received, average delay 104"
./myroxie/roxie.log:64:00000040 2011-10-05 10:28:59 12243 12246 "SYS: PU=  0% MU=  2% MAL=1075227168 MMP=1074794496 SBK=432672 TOT=1050160K RAM=306824K SWP=0K"
./myroxie/roxie.log:65:00000041 2011-10-05 10:28:59 12243 12246 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.5 kw/s=1.8 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=2.1 rxk/s=0.2 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=0 iow=0 idle=99"
./myroxie/roxie.log:66:00000042 2011-10-05 10:29:09 12243 12754 "PING: 0 replies received, average delay 0"
./myroxie/roxie.log:67:00000043 2011-10-05 10:29:59 12243 12246 "SYS: PU=  0% MU=  2% MAL=1075227424 MMP=1074794496 SBK=432928 TOT=1050160K RAM=303464K SWP=0K"
./myroxie/roxie.log:68:00000044 2011-10-05 10:29:59 12243 12246 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.6 kw/s=2.7 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=1.7 rxk/s=0.2 txp/s=0.1 txk/s=0.0 CPU: usr=0 sys=0 iow=0 idle=99"
./myroxie/roxie.log:69:00000045 2011-10-05 10:30:09 12243 12754 "PING: 0 replies received, average delay 0"
./myroxie/roxie.log:70:00000046 2011-10-05 10:30:59 12243 12246 "SYS: PU=  0% MU=  2% MAL=1075227648 MMP=1074794496 SBK=433152 TOT=1050160K RAM=303456K SWP=0K"
./myroxie/roxie.log:71:00000047 2011-10-05 10:30:59 12243 12246 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.3 kw/s=1.0 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=1.7 rxk/s=0.2 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=0 iow=0 idle=99"

./myesp/esp.log:9:00000009 2011-10-05 10:29:48 12164 12164 "Failed to connect to Dali Server 138.12.249.27:7070. Retrying..."
root@LAB-HPCC-01:/var/log/HPCCSystems#
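
A note on the search command above: the trailing `*.log` makes grep also look for matching files in the current directory, on top of the ones `find` hands to `xargs`, and piping plain `find` output into `xargs` splits file names on whitespace. A somewhat more robust variant might look like the sketch below. The function name `logs_matching` is hypothetical; it relies on `-iname` and `grep -H`, which GNU and BSD tools provide.

```shell
# Hypothetical helper: search every *.log file under a directory for an
# extended-regex pattern, printing file name and line number per match.
# find passes the file names straight to grep, so spaces in paths are safe.
logs_matching() {
  dir=$1
  pattern=$2
  find "$dir" -type f -iname '*.log' -exec grep -EHn -- "$pattern" {} +
}

# Usage:
#   logs_matching /var/log/HPCCSystems '2011-10-05 10:(28|29|30)'
```

The case-insensitive `-i` flag from the original command is dropped here because the timestamp pattern contains only digits and punctuation.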


Obviously, sasha, eclscheduler, roxie, et al. cannot connect to mydali.

Then I go to the dali log, and everything looks fine!

I look for dali processes with ps -ef | grep dali and there it is:

Code:
hpcc     11743     1  0 10:25 pts/0    00:00:00 /bin/bash /opt/HPCCSystems/bin/init_dali


That's where I am now.
kovacsbv
 
Posts: 33
Joined: Fri Aug 05, 2011 12:53 pm

Wed Oct 05, 2011 6:23 pm

Got it.

The iptables rules were still blocking connections from the components to dali. I opened iptables wide open and it works now. Of course, I need to find a way to tighten it back down, but at least I know what's wrong.
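
As a starting point for tightening things back down, here is a rough sketch that only prints candidate rules and applies nothing. The function name `hpcc_rules` is hypothetical, and the source address and port list passed to it are assumptions that would need checking against the actual environment. The loopback rule is there because, on Linux, traffic a host sends to its own address normally travels over the `lo` interface, which is the path the components use to reach dali on a single box.

```shell
# Hypothetical helper: print (not run) iptables commands that accept
# loopback traffic, replies on established connections, and new TCP
# connections from one source address to each listed port.
hpcc_rules() {
  src=$1
  shift
  echo "iptables -A INPUT -i lo -j ACCEPT"
  echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
  for port in "$@"; do
    echo "iptables -A INPUT -p tcp -s $src --dport $port -j ACCEPT"
  done
}

# Usage: review the output before running any of it as root.
#   hpcc_rules 138.12.249.27 7070 7100 7221
```

The default policy and any final DROP rule are left out on purpose, since those depend on what else the box has to serve.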

Thanks for all your time, Flavio.

Vic
kovacsbv
