

SOAP rpc error when using "ecl run" command in Linux

Questions around writing code and queries

Fri Oct 21, 2016 5:46 pm

Hi,

I use the "ecl run" command to execute ECL jobs in an AWS Linux environment, with the option "--wait=86400000" (a 24-hour, or 86,400,000 ms, wait time).

Whenever an ECL job started with "ecl run" runs for more than 2 hours, the "ecl run" bash command returns the following error, although the ECL job itself continues running on the cluster:
SOAP rpc error[errorCode = -6    message = timeout expired
Target: C!111.11.11.111, Raised in: /var/lib/jenkins/workspace/CE-Candidate-5.4.6-1/CE/centos-7.0-x86_64/HPCC-Platform/system/jlib/jsocket.cpp, line 1600 ]


Any suggestions to avoid this error in this procedure call?

(I see the Jira ticket https://track.hpccsystems.com/browse/HPCC-13971 mentioning this issue, but that ticket is currently unresolved.)

Thanks
Rohit
Community Advisory Board Member

Mon Sep 18, 2017 11:37 am

Hi,

Is there a workaround for this error for computations that take more than 2 hours?

Thank you very much
davidefanchini
 

Tue Sep 19, 2017 6:24 pm

Hi,

Since this issue still persists in 'ecl run', you may consider the following workaround:

1. Immediately after the 'ecl run' command in your bash script, find the Workunit ID of your job using the 'ecl getwuid' command.

2. Remember that 'ecl getwuid' will execute only after 'ecl run' has returned, either because the job completed (or errored out) within 2 hours or because 'ecl run' exited with the SOAP rpc timeout error after 2 hours. In either case, you will get the Workunit ID for your job.

3. Then use a bash 'while true' loop to check the status of the workunit every 'n' seconds (using the 'ecl status' command). This lets you control the next sequence of events based on the status of the job (for example: running / failed / aborted / completed / compiling).
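
The three steps above could be sketched roughly as follows. This is only an outline, not a tested production script: the helper names (get_wuid, get_wu_state, wait_for_workunit) are made up for illustration, and the exact flags for 'ecl getwuid' and 'ecl status', as well as the exact text that 'ecl status' prints, are assumptions that you should check against 'ecl --help' on your installation.

```shell
#!/usr/bin/env bash
# Sketch of the polling workaround. All helper names are hypothetical.
# ASSUMPTION: the flags passed to 'ecl getwuid' and 'ecl status' below, and
# the idea that 'ecl status' prints a single state word, may not match your
# client tools version. Adjust the two small wrappers if they differ.

# Print the Workunit ID for a job name (assumed flags).
get_wuid() {
    ecl getwuid -n "$1" --limit=1
}

# Print the current state of a workunit (assumed flag).
get_wu_state() {
    ecl status -wu "$1"
}

# Poll a workunit until it reaches a terminal state.
# Args: workunit ID, seconds between polls (default 60).
# Returns 0 if the workunit completed, 1 if it failed or was aborted.
wait_for_workunit() {
    local wuid=$1 interval=${2:-60} state
    while true; do
        # Normalize case and whitespace before comparing.
        state=$(get_wu_state "$wuid" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
        case "$state" in
            completed)
                return 0 ;;
            failed|aborted)
                echo "Workunit $wuid ended as: $state" >&2
                return 1 ;;
            *)
                # running, compiling, or an empty/unknown reply: try again later
                sleep "$interval" ;;
        esac
    done
}

# Example usage (job name, target and file are placeholders; the --target and
# --name flags are assumptions, --wait is the option from the original post):
#   ecl run --target=thor --name=my_job --wait=86400000 my_job.ecl || true
#   wuid=$(get_wuid my_job)
#   wait_for_workunit "$wuid" 60 && echo "job finished"
```

The '|| true' after 'ecl run' is there so that the script carries on to the polling step when 'ecl run' exits with the SOAP timeout. Keeping the 'ecl' calls in two small wrappers means only those need to change if the flags or output format differ on your version.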

Hope this helps.
Rohit

Tue Sep 19, 2017 6:39 pm

David,

No resolution that I'm aware of.
(I see the Jira ticket https://track.hpccsystems.com/browse/HPCC-13971 mentioning this issue, but that ticket is currently unresolved.)
I'd suggest you add a comment to the above-named JIRA ticket.

HTH,

Richard
rtaylor
Community Advisory Board Member

