Deploying Indexes via Roxie Package
Hi
I'm having several issues when deploying packagemaps to my roxies.
Problem 1
When deploying each hour throughout the day, I sometimes get this message on a roxie:
Code:
Exception
Reported by: Roxie
Message: Query roxiewarmup.2 is suspended because Could not open file /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_multiuserid_multiusersandmultipremium_202102161000._37_of_37
Code:
sudo service hpcc-init -c myroxie restart
Code:
0000599D PRG 2021-02-16 11:05:21.822 2846 2853 "Background copying //192.168.24.124:7100/var/lib/HPCCSystems/hpcc-data/thor/globex/key_airesourcescontentngrams_202102161000._4_of_145 to /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_airesourcescontentngrams_202102161000._4_of_145"
0000599E PRG 2021-02-16 11:05:22.307 2846 2853 "Background copy to /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_airesourcescontentngrams_202102161000._4_of_145 complete in 485 ms (32.7 MB/sec)"
0000599F PRG 2021-02-16 11:05:22.412 2846 2853 "Background copying //192.168.24.123:7100/var/lib/HPCCSystems/hpcc-data/thor/globex/key_airesourcescontentngrams_202102161000._3_of_145 to /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_airesourcescontentngrams_202102161000._3_of_145"
000059A0 PRG 2021-02-16 11:05:22.688 2846 2853 "Background copy to /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_airesourcescontentngrams_202102161000._3_of_145 complete in 276 ms (51.4 MB/sec)"
000059A1 PRG 2021-02-16 11:05:22.795 2846 2853 "Background copying //192.168.24.122:7100/var/lib/HPCCSystems/hpcc-data/thor/globex/key_airesourcescontentngrams_202102161000._2_of_145 to /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_airesourcescontentngrams_202102161000._2_of_145"
000059A2 PRG 2021-02-16 11:05:23.436 2846 2853 "Background copy to /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_airesourcescontentngrams_202102161000._2_of_145 complete in 642 ms (34.2 MB/sec)"
000059A3 PRG 2021-02-16 11:05:23.538 2846 2853 "Background copying //192.168.24.121:7100/var/lib/HPCCSystems/hpcc-data/thor/globex/key_airesourcescontentngrams_202102161000._1_of_145 to /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_airesourcescontentngrams_202102161000._1_of_145"
000059A4 PRG 2021-02-16 11:05:24.277 2846 2853 "Background copy to /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_airesourcescontentngrams_202102161000._1_of_145 complete in 739 ms (31.7 MB/sec)"
000059A5 PRG 2021-02-16 11:05:24.328 2846 2853 "No more data files to copy"
000059A6 PRG 2021-02-16 11:05:32.803 2846 2852 "SYS: LPT=15862 APT=316692 PU= 2% MU= 10% MAL=2362552320 MMP=2048638976 SBK=313913344 TOT=2311136K RAM=5312660K SWP=2528K RMU= 1% RMX=1023M"
000059A7 PRG 2021-02-16 11:05:32.804 2846 2852 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=164.3 kw/s=23817.0 bsy=54 NIC: [bond0] rxp/s=17978.0 rxk/s=25549.4 txp/s=1578.3 txk/s=110.4 rxerrs=0 rxdrps=166 txerrs=0 txdrps=0 CPU: usr=0 sys=1 iow=1 idle=97"
000059A8 PRG 2021-02-16 11:05:44.953 2846 8078 "PING: 1 replies received, average delay 781us"
000059A9 PRG 2021-02-16 11:06:32.825 2846 2852 "SYS: LPT=15862 APT=316692 PU= 0% MU= 10% MAL=2362552320 MMP=2048638976 SBK=313913344 TOT=2311136K RAM=5312004K SWP=2528K RMU= 1% RMX=1023M"
000059AA PRG 2021-02-16 11:06:32.826 2846 2852 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.8 kw/s=8.9 bsy=0 NIC: [bond0] rxp/s=13.2 rxk/s=4.1 txp/s=1.7 txk/s=0.7 rxerrs=0 rxdrps=162 txerrs=0 txdrps=0 CPU: usr=0 sys=0 iow=0 idle=99"
000059AB PRG 2021-02-16 11:06:44.954 2846 8078 "PING: 1 replies received, average delay 236us"
000059AC PRG 2021-02-16 11:07:32.847 2846 2852 "SYS: LPT=15862 APT=316692 PU= 0% MU= 10% MAL=2362552320 MMP=2048638976 SBK=313913344 TOT=2311136K RAM=5314436K SWP=2528K RMU= 1% RMX=1023M"
000059AD PRG 2021-02-16 11:07:32.847 2846 2852 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.4 kw/s=2.7 bsy=0 NIC: [bond0] rxp/s=16.0 rxk/s=4.2 txp/s=2.1 txk/s=0.7 rxerrs=0 rxdrps=161 txerrs=0 txdrps=0 CPU: usr=0 sys=0 iow=0 idle=99"
000059AE PRG 2021-02-16 11:07:44.954 2846 8078 "PING: 1 replies received, average delay 235us"
000059AF PRG 2021-02-16 11:08:32.866 2846 2852 "SYS: LPT=15862 APT=316692 PU= 0% MU= 10% MAL=2362552320 MMP=2048638976 SBK=313913344 TOT=2311136K RAM=5314436K SWP=2528K RMU= 1% RMX=1023M"
000059B0 PRG 2021-02-16 11:08:32.867 2846 2852 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=5.1 kw/s=28.0 bsy=1 NIC: [bond0] rxp/s=28.2 rxk/s=6.2 txp/s=12.8 txk/s=5.9 rxerrs=0 rxdrps=161 txerrs=0 txdrps=0 CPU: usr=0 sys=0 iow=0 idle=99"
000059B1 PRG 2021-02-16 11:08:36.234 2846 9084 "[192.168.20.25:9876{2}] FAILED: "
000059B2 PRG 2021-02-16 11:08:36.234 2846 9084 "[192.168.20.25:9876{2}] EXCEPTION: Query roxiewarmup.2 is suspended because Could not open file /var/lib/HPCCSystems/hpcc-data/roxie/globex/key_multiuserid_multiusersandmultipremium_202102161000._37_of_37"
000059B3 PRG 2021-02-16 11:08:44.954 2846 8078 "PING: 1 replies received, average delay 160us"
000059B4 PRG 2021-02-16 11:09:32.889 2846 2852 "SYS: LPT=15862 APT=316692 PU= 0% MU= 10% MAL=2362552320 MMP=2048638976 SBK=313913344 TOT=2311136K RAM=5254920K SWP=2528K RMU= 1% RMX=1023M"
000059B5 PRG 2021-02-16 11:09:32.889 2846 2852 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.8 kw/s=6.9 bsy=0 NIC: [bond0] rxp/s=17.4 rxk/s=4.9 txp/s=5.8 txk/s=2.3 rxerrs=0 rxdrps=160 txerrs=0 txdrps=0 CPU: usr=0 sys=0 iow=0 idle=99"
000059B6 PRG 2021-02-16 11:09:44.955 2846 8078 "PING: 1 replies received, average delay 256us"
000059B7 PRG 2021-02-16 11:10:32.910 2846 2852 "SYS: LPT=15862 APT=316692 PU= 0% MU= 10% MAL=2362552320 MMP=2048638976 SBK=313913344 TOT=2311136K RAM=5255156K SWP=2528K RMU= 1% RMX=1023M"
000059B8 PRG 2021-02-16 11:10:32.910 2846 2852 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.5 kw/s=3.1 bsy=0 NIC: [bond0] rxp/s=12.2 rxk/s=4.0 txp/s=1.4 txk/s=0.6 rxerrs=0 rxdrps=162 txerrs=0 txdrps=0 CPU: usr=0 sys=0 iow=0 idle=99"
000059B9 PRG 2021-02-16 11:10:40.757 2846 9084 "connectChild connecting to 192.168.20.25:9876"
000059BA PRG 2021-02-16 11:10:40.757 2846 9084 "connectChild connected to 192.168.20.25:9876"
000059BB PRG 2021-02-16 11:10:40.758 2846 23600 "[192.168.20.25:9876{4}] doControlMessage - control:state"
000059BC PRG 2021-02-16 11:10:44.955 2846 8078 "PING: 1 replies received, average delay 232us"
000059BD PRG 2021-02-16 11:11:32.931 2846 2852 "SYS: LPT=15862 APT=316692 PU= 0% MU= 10% MAL=2362552320 MMP=2048638976 SBK=313913344 TOT=2311136K RAM=5258284K SWP=2528K RMU= 1% RMX=1023M"
000059BE PRG 2021-02-16 11:11:32.931 2846 2852 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=2.3 kw/s=13.9 bsy=0 NIC: [bond0] rxp/s=13.0 rxk/s=4.0 txp/s=1.2 txk/s=0.3 rxerrs=0 rxdrps=167 txerrs=0 txdrps=0 CPU: usr=0 sys=0 iow=0 idle=99"
000059BF PRG 2021-02-16 11:11:44.956 2846 8078 "PING: 1 replies received, average delay 246us"
000059C0 PRG 2021-02-16 11:11:47.464 2846 9084 "[192.168.20.25:9876{5}] doControlMessage - control:queries"
000059C1 PRG 2021-02-16 11:12:27.734 2846 9084 "RoxieMemMgr: Heap size 4096 pages, 4095 free, largest block 4095, heapLWM 0, heapHWM 128, dataBuffersActive=0, dataBufferPages=0"
000059C2 PRG 2021-02-16 11:12:32.952 2846 2852 "SYS: LPT=15862 APT=316692 PU= 0% MU= 10% MAL=2363887616 MMP=2049974272 SBK=313913344 TOT=2312440K RAM=5258076K SWP=2528K RMU= 1% RMX=1023M"
000059C3 PRG 2021-02-16 11:12:32.953 2846 2852 "DSK: [sda] r/s=1.1 kr/s=11.1 w/s=0.4 kw/s=3.7 bsy=0 NIC: [bond0] rxp/s=15.2 rxk/s=4.3 txp/s=1.9 txk/s=0.8 rxerrs=0 rxdrps=168 txerrs=0 txdrps=0 CPU: usr=0 sys=0 iow=0 idle=99"
000059C4 PRG 2021-02-16 11:12:44.956 2846 8078 "PING: 1 replies received, average delay 265us"
000059C5 PRG 2021-02-16 11:13:32.974 2846 2852 "SYS: LPT=15862 APT=316692 PU= 0% MU= 10% MAL=2363887616 MMP=2049974272 SBK=313913344 TOT=2312440K RAM=5258880K SWP=2528K RMU= 1% RMX=1023M"
000059C6 PRG 2021-02-16 11:13:32.975 2846 2852 "DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.7 kw/s=4.7 bsy=0 NIC: [bond0] rxp/s=13.4 rxk/s=4.0 txp/s=0.9 txk/s=0.2 rxerrs=0 rxdrps=162 txerrs=0 txdrps=0 CPU: usr=0 sys=0 iow=0 idle=99"
000059C7 PRG 2021-02-16 11:13:44.957 2846 8078 "PING: 1 replies received, average delay 217us"
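For anyone triaging a similar log, here is a minimal Python sketch (not part of the original post) that cross-checks the "Could not open file" exceptions against the "Background copy ... complete" entries. The regular expressions are assumptions based only on the log lines quoted above, and `find_uncopied_files` is a hypothetical helper name; other Roxie versions may word these messages differently.

```python
import re

# Patterns modelled on the log lines quoted in this thread (an assumption,
# not an official description of the Roxie log format).
COPY_DONE = re.compile(r'"Background copy to (\S+) complete')
SUSPENDED = re.compile(
    r'EXCEPTION: Query (\S+) is suspended because Could not open file (\S+?)"?$'
)


def find_uncopied_files(log_lines):
    """Return (query, path) pairs for files a query could not open and for
    which no completed background copy was seen earlier in log_lines."""
    copied = set()
    missing = []
    for line in log_lines:
        line = line.strip()
        done = COPY_DONE.search(line)
        if done:
            copied.add(done.group(1))
            continue
        failed = SUSPENDED.search(line)
        if failed and failed.group(2) not in copied:
            missing.append((failed.group(1), failed.group(2)))
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_copies.py < roxie.log
    for query, path in find_uncopied_files(sys.stdin):
        print(f"{query}: no completed copy logged for {path}")
```

A file flagged this way only means that no completed copy appears in the lines supplied; the copy may have been logged before the excerpt starts, so treat the result as a pointer for where to look rather than proof.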
Problem 2
Our second issue occurs when deploying our Roxie package to 3 roxies: one in every 3 deploys fails, and the roxies fail to accept the SOAP request to replace the current package.
We managed to get some information from our logs:
Code:
0000C6D4 PRG 2021-02-15 07:49:02.701 41665 42734 "MP: Possible clash between 192.168.24.120:7070->192.168.20.25:7339 0(0)"
0000DA3D PRG 2021-02-15 10:50:26.156 41665 42734 "MP: Possible clash between 192.168.24.120:7070->192.168.20.26:7166 0(0)"
0000D4A2 PRG 2021-02-15 10:49:09.333 41665 42734 "MP: Possible clash between 192.168.24.120:7070->192.168.20.27:7475 0(0)"
0000C5F3 PRG 2021-02-15 06:50:23.516 41665 42734 "MP: Possible clash between 192.168.24.120:7070->192.168.20.26:7156 0(0)"
0000C5F4 PRG 2021-02-15 06:50:23.516 41665 42734 "Message Passing - removing stale socket to 192.168.20.26:7156"
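To see whether the clashes follow a pattern per host or per time of day, the entries can be grouped by remote address. The following is a rough Python sketch, not part of the original post; the regular expression assumes the exact "MP: Possible clash between" wording shown in the log above, and `clashes_by_remote_host` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from datetime import datetime

# Timestamp, two numeric ID columns, then the clash message as quoted above.
CLASH = re.compile(
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \d+ \d+ '
    r'"MP: Possible clash between (\S+)->(\S+?):(\d+) '
)


def clashes_by_remote_host(log_text):
    """Map each remote host to the sorted list of times a clash was logged.

    finditer is used so the function also copes with several log entries
    run together on one physical line.
    """
    by_host = defaultdict(list)
    for m in CLASH.finditer(log_text):
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        by_host[m.group(3)].append(when)
    return {host: sorted(times) for host, times in by_host.items()}


if __name__ == "__main__":
    import sys

    # Usage: python clashes.py < dali_or_esp.log
    for host, times in clashes_by_remote_host(sys.stdin.read()).items():
        stamps = ", ".join(t.strftime("%H:%M:%S") for t in times)
        print(f"{host}: {len(times)} clash(es) at {stamps}")
```

Comparing the printed times with the deploy schedule may help show whether the failures line up with a fixed interval since the previous deploy.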
If I clone the failed job to force the package in, I start getting the issues described in Problem 1.
Can anyone please shed any light or push us in the right direction?
We are using version 7.8.46-1, but we are upgrading to 7.12.24.
Thanks
David
- daviddasher
- Posts: 14
- Joined: Fri Dec 08, 2017 12:39 pm
Hi David,
Sorry for the delay in replying! Did anyone reach out to you yet with a resolution?
If you haven't already done so, this looks like something that needs to be reported to our Issue Tracker.
https://track.hpccsystems.com/secure/Dashboard.jspa
Thank you!
Bob
- bforeman
- Community Advisory Board Member
- Posts: 1006
- Joined: Wed Jun 29, 2011 7:13 pm
Hi Bob
No worries at all.
It turns out we had an issue with a firewall that would terminate the connection between Dali and roxie after an hour. Initially we created a new set of roxies in the same subnet, which eliminated the issue, and we then tracked it back to the firewall rule on the original roxies.
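As a general illustration of this failure mode (and not the fix used here, which was the firewall rule), idle TCP connections that cross a firewall with an idle timeout are often kept alive with TCP keepalive probes sent more often than the timeout. The sketch below uses plain Python sockets; it does not show any HPCC Systems setting, and `enable_keepalive` and its default values are hypothetical.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive so an otherwise idle connection still carries
    periodic probes. idle_s should be well below the firewall's idle timeout."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are platform-specific (available on Linux);
    # any that this platform lacks are skipped and the OS defaults apply.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With a one-hour firewall timeout, a ten-minute idle threshold leaves a wide margin; whether a given service exposes such an option is a separate question for its own documentation.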
I do need to report via tracker so I'll chase our firewall team on all the details.
Thanks for checking and I hope you are well.
Thanks
David
- daviddasher
- Posts: 14
- Joined: Fri Dec 08, 2017 12:39 pm