Fails first, succeeds thereafter
Hello!
I'm trying to troubleshoot some weird behavior. I'm on 8.2.6-1 Server.
I have some ECL code like the following:
Code:
IMPORT STD;
layout := RECORD
...
END;
ds := DATASET(std.File.ExternalLogicalFilename('192.168.0.1', '/var/lib/HPCCSystems/mydropzone/somefile.csv'), layout, CSV(HEADING(1)));
OUTPUT(ds,,'~test::weird::somefile',OVERWRITE);
I run it like so:
Code:
ecl run thor my.ecl [other options like -ssl, -I=, etc.]
I have 6 to 8 ECL files like this one: the layout, the CSV file, and the logical filename differ in each, but the logic is the same overall.
Now here's what's odd.
I run one of the ECL files:
Code:
$ ecl run thor my.ecl [other options like -ssl, -I=, etc.]
Using eclcc path /opt/HPCCSystems/bin/eclcc
EXEC: Creating PIPE program process : '/opt/HPCCSystems/bin/eclcc -E "-I/home/hpcc/git/master/" "/var/lib/HPCCSystems/.../my.ecl"' - hasinput=0, hasoutput=1 stderrbufsize=0
EXEC: Pipe: process 13422 complete 0
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Deploying ECL Archive /var/lib/HPCCSystems/.../my.ecl
Deployed
wuid: W20211025-233226
state: compiled
Running deployed workunit W20211025-233226
<Result>
<Dataset name='Result 1'>
</Dataset>
</Result>
All good.
I run a different one:
Code:
$ ecl run thor myother.ecl [other options like -ssl, -I=, etc.]
Using eclcc path /opt/HPCCSystems/bin/eclcc
EXEC: Creating PIPE program process : '/opt/HPCCSystems/bin/eclcc -E "-I/home/hpcc/git/master/" "/var/lib/HPCCSystems/.../myother.ecl"' - hasinput=0, hasoutput=1 stderrbufsize=0
EXEC: Pipe: process 13545 complete 0
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Deploying ECL Archive /var/lib/HPCCSystems/.../myother.ecl
Deployed
wuid: W20211025-233234
state: compiled
Running deployed workunit W20211025-233234
W20211025-233234 failed
<Result>
<Exception><Code>4294967295</Code><Source>eclagent</Source><Message>System error: -1: Failed to receive reply from thor 192.168.0.1:20000; (-1, Failed to receive reply from thor 192.168.0.1:20000)</Message></Exception>
</Result>
It fails. BUT, if I run it again right after:
Code:
$ ecl run thor myother.ecl [other options like -ssl, -I=, etc.]
Using eclcc path /opt/HPCCSystems/bin/eclcc
EXEC: Creating PIPE program process : '/opt/HPCCSystems/bin/eclcc -E "-I/home/hpcc/git/master" "/var/lib/HPCCSystems/.../myother.ecl"' - hasinput=0, hasoutput=1 stderrbufsize=0
EXEC: Pipe: process 15504 complete 0
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Deploying ECL Archive /var/lib/HPCCSystems/.../myother.ecl
Deployed
wuid: W20211025-233809
state: compiled
Running deployed workunit W20211025-233809
<Result>
<Dataset name='Result 1'>
</Dataset>
</Result>
I know there are other ways to accomplish this, but I'm trying to understand why a run fails the first time I switch to a different ECL file and then succeeds, no matter how often I re-run the file that just failed.
I did notice 2 distinct errors showing up (sometimes it's one, other times the other):
Code:
<Result>
<Exception><Code>4</Code><Source>eclagent</Source><Message>System error: 4: Unexpected process termination (ep:192.168.0.1:20100)</Message></Exception>
</Result>
Code:
<Result>
<Exception><Code>4294967295</Code><Source>eclagent</Source><Message>System error: -1: Failed to receive reply from thor 192.168.0.1:20000; (-1, Failed to receive reply from thor 192.168.0.1:20000)</Message></Exception>
</Result>
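Side note on the codes: 4294967295 is just -1 stored as an unsigned 32-bit integer, which matches the "System error: -1" in the message text. Here is a minimal Python sketch for pulling the fields out of a <Result> block like the ones above. The helper names (to_signed32, parse_exceptions) are made up for illustration and are not part of any HPCC tooling; the sample input is the output pasted above.

```python
import xml.etree.ElementTree as ET

def to_signed32(code):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    return code - 2**32 if code >= 2**31 else code

def parse_exceptions(result_xml):
    """Return (signed_code, source, message) for each <Exception> in a <Result> block."""
    root = ET.fromstring(result_xml)
    found = []
    for exc in root.iter("Exception"):
        code = to_signed32(int(exc.findtext("Code")))
        found.append((code, exc.findtext("Source"), exc.findtext("Message")))
    return found

# Sample copied from the failing run shown in this post.
sample = """<Result>
<Exception><Code>4294967295</Code><Source>eclagent</Source><Message>System error: -1: Failed to receive reply from thor 192.168.0.1:20000; (-1, Failed to receive reply from thor 192.168.0.1:20000)</Message></Exception>
</Result>"""

for code, source, message in parse_exceptions(sample):
    print(code, source, message)
```

A successful run (a <Result> with only a <Dataset>) yields an empty list, so the same helper can be used to tell the two outcomes apart.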
Other pieces of ECL code run just fine.
I changed the code in each .ecl file to NOT OUTPUT to a logical file (simply "OUTPUT(ds);") and everything works.
As soon as I want to (over)write a logical file, I get this weird behavior.
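To pin the pattern down, the runs can be scripted rather than typed by hand. Below is a rough Python sketch, assuming the `ecl` client is on the PATH and that a failure always shows up as an <Exception> element in the output, as it does in the runs above. The function names and the `runner` parameter are hypothetical, and any extra options (-ssl, -I=, etc.) would have to be passed in through `extra_args`.

```python
import subprocess

def run_ecl(path, extra_args=()):
    """Run one ECL file on thor and return the combined stdout/stderr text."""
    cmd = ["ecl", "run", "thor", path, *extra_args]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr

def alternate(paths, rounds=2, runner=run_ecl):
    """Run each file `rounds` times back to back.

    Returns a list of (path, attempt, failed) tuples, where `failed` is True
    when the output contains an <Exception> element.
    """
    log = []
    for path in paths:
        for attempt in range(1, rounds + 1):
            output = runner(path)
            log.append((path, attempt, "<Exception>" in output))
    return log

# Example call (needs a working ecl client and cluster):
# for entry in alternate(["my.ecl", "myother.ecl"]):
#     print(entry)
```

If the behavior described here holds, attempt 1 of each newly selected file would be flagged as failed and attempt 2 would not.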
Any deja vu?
Any idea how I can troubleshoot that further?
Thanks!
- lpezet
- Posts: 85
- Joined: Wed Sep 10, 2014 3:14 am
Looks like a segfault on the Thor slave(s).
I guess I need to re-install things (although preflight certification went fine the first time).
Sigh...
Attachment: thorslave_logs.txt (5.86 KiB)
- lpezet
- Posts: 85
- Joined: Wed Sep 10, 2014 3:14 am