Thu Dec 09, 2021 12:21 am


Query about zipping files on unix landing zone from HPCC.

Comments and questions related to the Enterprise Control Language

Fri Jul 09, 2021 8:52 am

Hello Everyone,

Is there a way to zip files on the Unix landing zone from HPCC using ECL? An example of how to do it would help.

Thanks and regards,
Akhilesh Badhri.
akhileshbadhri
 
Posts: 26
Joined: Thu Sep 22, 2016 12:15 pm

Fri Jul 09, 2021 1:01 pm

Akhilesh,

I am unaware of any way to do that from ECL. You could try using the first form of the PIPE() function to call the Linux ZIP command. I have not tried it, nor do I know of anyone who has, so I have no example to provide.

Since you wouldn't want this to run n times (where n is the number of Thor nodes you're running on), I'd suggest running it only on hThor (or possibly using the NOTHOR() action) to ensure it only runs once.

If you get it working, perhaps you could post an example of how you did it here.

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1606
Joined: Wed Oct 26, 2011 7:40 pm

Mon Jul 12, 2021 8:55 am

Hello Richard,

I tried the following code -

IMPORT STD;

rec := RECORD
string name;
END;

IP := 'landing_zone_ip';

Despray := Std.File.Despray('~thor::4xx_res_resp.xml',
IP,
'/home/bxxxxx01/transfer/4xx_res_resp.xml',,,,true);

MyDropZone := '/home/bxxxxx01/transfer/';
RawFilename := MyDropZone + '4xx_res_resp.xml';

ZipCmdRaw := '" gzip -f ' + RawFilename + '";';

ZipCmd := 'bash -c \'' + ZipCmdRaw + '\'';
ZippedDS := PIPE(ZipCmd,rec);

ORDERED(Despray,OUTPUT(ZippedDS));


It gives me the following error -

System error: 2: Error piping from (bash -c '"gzip -f /home/bxxxxx01/transfer/4xx_res_resp.xml";'): process failed with code 127, stderr: 'bash: gzip -f /home/bxxxxx01/transfer/4xx_res_resp.xml: No such file or directory '

I also tried prefixing the path with the landing zone IP, like this:
ZipCmdRaw := '" gzip -f landing_zone_IP:' + RawFilename + '";';

But still I get the same error.

Am I missing something here?

Thanks and regards,
Akhilesh Badhri.
akhileshbadhri
 

Mon Jul 12, 2021 1:56 pm

There's no clean way, as it's not something built into the platform.
From the error, it sounds like bash is treating the whole quoted string ("gzip -f /home/bxxxxx01/transfer/4xx_res_resp.xml") as the command name. The same error would be issued if the quoted "gzip -f /home/bxxxxx01/transfer/4xx_res_resp.xml" were run from bash directly on the command line; exit code 127 means bash could not find a command by that name.

I think it would get further if you removed the quotes and ran, e.g.:
ZipCmd := 'gzip -f ' + RawFilename;
ZippedDS := PIPE(ZipCmd,rec);
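The diagnosis above can be reproduced outside HPCC with a small bash sketch. It assumes only that bash and gzip are installed; a scratch file created with `mktemp` stands in for the desprayed XML, so the paths are illustrative rather than the real landing-zone ones.

```bash
#!/usr/bin/env bash
# Reproduces the quoting problem behind "process failed with code 127".
# ASSUMPTION: a temporary file stands in for the desprayed XML on the LZ.
workdir=$(mktemp -d)
file="$workdir/4xx_res_resp.xml"
echo '<resp/>' > "$file"

# What the original ECL built: bash -c '"gzip -f <path>";'
# The inner double quotes make the whole string ONE word, so bash looks for an
# executable literally named "gzip -f <path>". Because that word contains '/',
# bash treats it as a path, reports "No such file or directory", and exits 127.
bash -c "\"gzip -f $file\";" 2>/dev/null
quoted_status=$?

# Without the inner quotes, gzip is the command and -f and the path are its
# arguments, so the file is compressed in place (FILE -> FILE.gz).
bash -c "gzip -f $file"
unquoted_status=$?

echo "quoted: $quoted_status, unquoted: $unquoted_status"
# prints: quoted: 127, unquoted: 0
```

This only shows the quoting issue; it does not address the separate problem of the hThor node and the LZ being different machines.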



However, this approach won't work if the hThor node and the LZ are on different hosts, because gzip runs on the hThor node and has no direct access to the LZ's filesystem (prefixing the path with the LZ IP doesn't help; gzip just treats "IP:path" as a local filename).
A possible workaround is to use an OUTPUT statement in hThor to write the file out as a single part, instead of the Despray, then get the path to the [single] physical part belonging to the new file and zip it.
I haven't tried it, but something like this may work:

IMPORT STD.System.Thorlib;

// Placeholder layout: replace with the actual record definition of the input file
inRecDef := RECORD
  STRING name;
END;
inLogicalFilename := '~thor::4xx_res_resp.xml';

MyDropZone := '/home/bxxxxx01/transfer/';
RawFilename := MyDropZone + '4xx_res_resp.tgz';

inDs := DATASET(inLogicalFilename, inRecDef, FLAT);

// Rewrite the data as a new logical file; run on hThor, it is written as a single part
outputFilename := '~transfer::4xx_res_resp.xml';
writeStep := OUTPUT(inDs, , outputFilename, OVERWRITE);

// tar/gzip the physical file behind the new logical file into RawFilename
ZipCmd := 'tar cfz ' + RawFilename + ' ' + thorlib.logicalToPhysical(outputFilename);

// PIPE needs a record layout, but tar writes nothing to stdout, so a dummy one is enough
dummyRec := RECORD
  string1 unused;
END;
ZippedDS := PIPE(ZipCmd, dummyRec);

ORDERED(
  writeStep,
  OUTPUT(ZippedDS) // NB: the compressed tar file is created on the hThor node (which may not be the same as the LZ node)
);


In general, since things like this aren't natively supported within ECL and this is related to extracting data out of the platform, it would be best, if possible, to run the workflow this is part of as an external script, with the ECL job(s) as steps in that process. For example, step 1 uses 'ecl run' to run the ECL job that performs the task and desprays the result; step 2 then uses bash/ssh to gzip the output.
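Step 2 of that workflow could look something like the bash sketch below. It is an illustration under assumptions, not a tested recipe: it assumes the script runs on the landing-zone host itself once the `ecl run` job from step 1 has finished its Despray, and `LZ_DIR`, `gzip_lz_file` and `gzip_lz_dir` are hypothetical names. The default directory simply reuses the path from this thread.

```bash
#!/usr/bin/env bash
# Step 2 sketch: gzip files that step 1 (the ECL job run with 'ecl run')
# desprayed to the landing zone.
# ASSUMPTIONS: runs on the landing-zone host itself; LZ_DIR, gzip_lz_file and
# gzip_lz_dir are hypothetical names; the default path is the one from this thread.
LZ_DIR="${LZ_DIR:-/home/bxxxxx01/transfer}"

# Compress one file in place (FILE -> FILE.gz); -f overwrites an existing .gz,
# as in the 'gzip -f' used earlier in the thread.
gzip_lz_file() {
  local path="$1"
  if [[ ! -f "$path" ]]; then
    echo "gzip_lz_file: no such file: $path" >&2
    return 1
  fi
  gzip -f -- "$path"
}

# Compress every *.xml file in a directory (default: LZ_DIR).
gzip_lz_dir() {
  local dir="${1:-$LZ_DIR}" f
  for f in "$dir"/*.xml; do
    [[ -e "$f" ]] || continue   # no matches: the glob stays literal, so skip it
    gzip_lz_file "$f" || return 1
  done
}

# Run only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  gzip_lz_dir "$@"
fi
```

If the workflow is driven from another machine, the same compression can be triggered remotely with plain ssh, e.g. `ssh user@landing_zone_ip 'gzip -f /home/bxxxxx01/transfer/4xx_res_resp.xml'`, where `user` and `landing_zone_ip` are placeholders.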
jsmith
Community Advisory Board Member
 
Posts: 81
Joined: Tue Jul 19, 2011 12:58 pm

Tue Aug 10, 2021 8:43 am

Thanks a lot, jsmith.
I separated the gzip step from the ECL and am now doing it in Unix. I think this is a better approach.

Thanks and regards,
Akhilesh Badhri.
akhileshbadhri
 

