

Fails first, succeeds thereafter

Comments and questions related to the Enterprise Control Language

Mon Oct 25, 2021 11:49 pm

Hello!

I'm trying to troubleshoot some weird behavior. I'm on 8.2.6-1 Server.

I have some ECL code like the following:
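Code:
IMPORT STD;
layout := RECORD
...
END;
// Read the CSV directly from the drop zone, skipping the single header line.
ds := DATASET(STD.File.ExternalLogicalFileName('192.168.0.1', '/var/lib/HPCCSystems/mydropzone/somefile.csv'), layout, CSV(HEADING(1)));
// Write the rows to a logical file, replacing any existing copy.
OUTPUT(ds,,'~test::weird::somefile',OVERWRITE);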

I run it like so:
Code:
ecl run thor my.ecl [other options like -ssl, -I=, etc.]

I have six to eight ECL files like this one. Each has its own layout, CSV file, and logical filename, but the logic is otherwise the same.

Now here's what's odd.
I run one of the ECL files:
Code:
$ ecl run thor my.ecl [other options like -ssl, -I=, etc.]
Using eclcc path /opt/HPCCSystems/bin/eclcc
EXEC: Creating PIPE program process : '/opt/HPCCSystems/bin/eclcc -E "-I/home/hpcc/git/master/" "/var/lib/HPCCSystems/.../my.ecl"' - hasinput=0, hasoutput=1 stderrbufsize=0
EXEC: Pipe: process 13422 complete 0
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]


Deploying ECL Archive /var/lib/HPCCSystems/.../my.ecl

Deployed
   wuid: W20211025-233226
   state: compiled

Running deployed workunit W20211025-233226
<Result>
<Dataset name='Result 1'>
</Dataset>
</Result>

All good.
I run a different one:
Code:
$ ecl run thor myother.ecl [other options like -ssl, -I=, etc.]
Using eclcc path /opt/HPCCSystems/bin/eclcc
EXEC: Creating PIPE program process : '/opt/HPCCSystems/bin/eclcc -E "-I/home/hpcc/git/master/" "/var/lib/HPCCSystems/.../myother.ecl"' - hasinput=0, hasoutput=1 stderrbufsize=0
EXEC: Pipe: process 13545 complete 0
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]


Deploying ECL Archive /var/lib/HPCCSystems/.../myother.ecl

Deployed
   wuid: W20211025-233234
   state: compiled

Running deployed workunit W20211025-233234
W20211025-233234 failed
<Result>
<Exception><Code>4294967295</Code><Source>eclagent</Source><Message>System error: -1: Failed to receive reply from thor 192.168.0.1:20000; (-1, Failed to receive reply from thor 192.168.0.1:20000)</Message></Exception>
</Result>


It fails. BUT, if I run it again right after:
Code:
$ ecl run thor myother.ecl [other options like -ssl, -I=, etc.]
Using eclcc path /opt/HPCCSystems/bin/eclcc
EXEC: Creating PIPE program process : '/opt/HPCCSystems/bin/eclcc -E "-I/home/hpcc/git/master" "/var/lib/HPCCSystems/.../myother.ecl"' - hasinput=0, hasoutput=1 stderrbufsize=0
EXEC: Pipe: process 15504 complete 0
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]


Deploying ECL Archive /var/lib/HPCCSystems/.../myother.ecl

Deployed
   wuid: W20211025-233809
   state: compiled

Running deployed workunit W20211025-233809

<Result>
<Dataset name='Result 1'>
</Dataset>
</Result>

I know there are other ways to accomplish this, but I'm trying to understand why a run fails the first time I switch to a different ECL file and then succeeds no matter how often I re-run the file that just failed.
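
To make the pattern easier to reproduce, here is a minimal sketch that runs the same command twice in a row and reports whether each attempt's output contained an `<Exception>` element, as the failing run above did. `run_twice` is a hypothetical helper name, not part of the HPCC client tools, and it assumes a failure always surfaces as `<Exception>` in the command's output.

```shell
# Run the given command twice and classify each attempt by its output.
# Assumption: a failed workunit prints an <Exception> element, as in the logs above.
run_twice() {
  local attempt out
  for attempt in 1 2; do
    # Capture stdout and stderr together so nothing is missed.
    out=$("$@" 2>&1)
    if printf '%s\n' "$out" | grep -q '<Exception>'; then
      echo "attempt $attempt: failed"
    else
      echo "attempt $attempt: ok"
    fi
  done
}
```

It would be called as `run_twice ecl run thor myother.ecl` followed by the same extra options as above, once for each .ecl file in turn. If the behavior described here holds, the first file after a switch should report "failed" then "ok".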

I did notice two distinct errors (sometimes it's one, sometimes the other):
Code:
<Result>
<Exception><Code>4</Code><Source>eclagent</Source><Message>System error: 4: Unexpected process termination (ep:192.168.0.1:20100)</Message></Exception>
</Result>

Code:
<Result>
<Exception><Code>4294967295</Code><Source>eclagent</Source><Message>System error: -1: Failed to receive reply from thor 192.168.0.1:20000; (-1, Failed to receive reply from thor 192.168.0.1:20000)</Message></Exception>
</Result>
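
A side note on the second error: the code 4294967295 is 2^32 - 1, which is what -1 looks like when printed as an unsigned 32-bit integer, so it agrees with the "System error: -1" in the message. The sketch below pulls the code out of the result XML and converts it. Both function names are hypothetical helpers, and the extraction assumes `<Code>` holds a plain decimal number on one line, as in the output above.

```shell
# Reinterpret an unsigned 32-bit value as a signed 32-bit integer.
to_signed32() {
  local n=$1
  if [ "$n" -ge 2147483648 ]; then
    # Values at or above 2^31 wrap around to negative numbers.
    echo $(( n - 4294967296 ))
  else
    echo "$n"
  fi
}

# Print the first <Code>...</Code> value found in result XML read from stdin.
exception_code() {
  sed -n 's:.*<Code>\([0-9]*\)</Code>.*:\1:p' | head -n 1
}
```

For example, piping the output of a failed run through `exception_code` and passing the result to `to_signed32` turns 4294967295 into -1, while the other code, 4, is left unchanged.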


Other pieces of ECL code run just fine.
I changed the code in each .ecl file to NOT OUTPUT to a logical file (simply "OUTPUT(ds);") and everything works.
As soon as I want to (over)write a logical file, I get this weird behavior.

Any deja vu?
Any idea how I can troubleshoot that further?

Thanks!
lpezet
 

Tue Oct 26, 2021 12:06 am

Looks like a segfault on the Thor slave(s).
I guess I need to reinstall things (though preflight certification went fine the first time).
Sigh...
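
For anyone checking their own slave logs for the same thing, a minimal sketch of the search follows. `find_crash_lines` is a hypothetical helper, and the patterns are guesses at typical crash wording rather than a list taken from HPCC documentation, so adjust them to whatever the logs actually contain.

```shell
# Print lines that hint at a crash, prefixed with their line numbers.
# Arguments: one or more log files to search.
find_crash_lines() {
  grep -n -i -E 'segmentation fault|sigsegv|signal: *11|core dump' "$@"
}
```

The log file paths are passed as arguments. Where the thorslave logs live depends on the installation; a directory under /var/log/HPCCSystems is an assumption here, not something confirmed in this thread.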
Attachment: thorslave_logs.txt (5.86 KiB)
lpezet
 

