

Despray into compressed archive

Comments and questions related to the Enterprise Control Language

Tue May 12, 2020 8:20 pm

Hi everybody,

I need to despray and archive a bunch of very large files (around 20 TB each), and I am looking for an option to compress them on the fly to save both time and space.

Is there any option to despray and compress files?

Alternatively, maybe there is a Linux feature that can help? (I think named pipes could solve this, but I haven't used that feature for ages.)
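
For reference, the named-pipe idea looks roughly like the sketch below: a reader compresses whatever arrives on a FIFO, so the uncompressed bytes never land on disk. This only illustrates the Linux mechanism. The file names are hypothetical, printf stands in for the real producer, and whether despray itself can write to a FIFO is an assumption nobody in this thread has tested.

```shell
set -eu

workdir="$(mktemp -d)"            # scratch directory for the demo
fifo="$workdir/despray_file"      # hypothetical target path, created as a FIFO
mkfifo "$fifo"

# Reader: started first, in the background. It blocks until a writer opens the FIFO.
gzip -c < "$fifo" > "$workdir/despray_file.gz" &
reader_pid=$!

# Writer: stand-in for whatever process produces the data.
printf 'line 1\nline 2\n' > "$fifo"

# The reader sees EOF once the writer closes the FIFO.
wait "$reader_pid"
rm "$fifo"

restored="$(gzip -dc "$workdir/despray_file.gz")"
printf '%s\n' "$restored"
```

The reader has to be running before the writer starts, otherwise the writer blocks on opening the FIFO.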
oleg
 
Posts: 43
Joined: Fri Sep 21, 2012 9:41 am

Wed May 13, 2020 3:49 pm

Hi Oleg,

Long time no hear from you.

Have you investigated PIPE? You could run an external compression command (7za, say) on each node and somehow direct its stdout to your target box.

Just a guess, I've not tried it myself.

Cheers, all the best.

Allan
Allan
 
Posts: 431
Joined: Sat Oct 01, 2011 7:26 pm

Wed May 13, 2020 5:06 pm

Hi,

The platform does not currently support despraying to compressed formats.

Some Linux filesystem types support on-the-fly compression: you can configure a folder as compressed, and everything written to it is compressed as it is written.
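
As one possible illustration, here is a minimal sketch for Btrfs. It assumes the landing zone already lives on a Btrfs volume with zstd support; the directory name is hypothetical, and other filesystems have their own equivalents. By default the script only prints the command it would run.

```shell
set -eu

# Hypothetical directory on an existing Btrfs volume (an assumption, not from this thread).
TARGET_DIR="${TARGET_DIR:-/mnt/archive/despray}"
# Default is a dry run that prints the command; set RUN=sudo to actually execute it.
RUN="${RUN:-echo}"

enable_compression() {
    # Files created under the directory afterwards are compressed transparently with zstd.
    $RUN btrfs property set "$1" compression zstd
}

enable_compression "$TARGET_DIR"
```

Mounting the whole volume with the compress=zstd option is the coarser alternative if everything on it should be compressed.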

Alternatively, as Allan suggested, you could use PIPE to feed the data to a command-line tool that reads from stdin. If this is a distributed file, you would end up with N output parts, but you could ensure the first part holds all the data by using DISTRIBUTE(theds, 0);
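
The command-line side of that can be sketched in shell. This is only an illustration with made-up rows and scratch paths, not the platform's behaviour: each producer streams into gzip's stdin, and because a gzip file may consist of several members, the N compressed parts can also be joined afterwards with a plain cat.

```shell
set -eu

workdir="$(mktemp -d)"   # scratch directory standing in for the target box

# Each "part" streams its rows into gzip's stdin; nothing uncompressed is written to disk.
printf 'row from part 1\n' | gzip -c > "$workdir/part1.gz"
printf 'row from part 2\n' | gzip -c > "$workdir/part2.gz"

# Gzip members concatenated byte for byte still form one valid gzip file.
cat "$workdir/part1.gz" "$workdir/part2.gz" > "$workdir/all.gz"

combined="$(gzip -dc "$workdir/all.gz")"
printf '%s\n' "$combined"
```

Joining the parts this way keeps the compression work spread across the producers, whereas sending everything to the first part means one process compresses the whole file.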

Hope that helps.
jsmith
Community Advisory Board Member
 
Posts: 77
Joined: Tue Jul 19, 2011 12:58 pm

Wed May 13, 2020 7:21 pm

Hello Oleg,

Even though it is not precisely "on-the-fly", the code below is an example with PIPE that can be run on the playground and may be useful to accomplish your end goal in a "semi-automated" way.

HTH,
HugoW

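IMPORT STD;

// Layout for anything the piped command writes to stdout
// (gzip -f prints nothing on success, so the dataset is normally empty)
rec := RECORD
  STRING name;
END;

// Step 1: despray the logical file, uncompressed, to the landing zone
Despray := STD.File.DeSpray('~test::hmw::despray_compress',
                            '10.0.0.90',                                     // destination IP
                            '/var/lib/HPCCSystems/mydropzone/despray_file',  // destination path
                            -1,                                              // timeout: wait for completion
                            'https://10.0.0.90:18010/FileSpray',             // ESP server
                            1,                                               // max connections
                            TRUE);                                           // allow overwrite

MyDropZone  := '/var/lib/HPCCSystems/mydropzone/';
RawFilename := MyDropZone + 'despray_file';
//ZipFilename := MyDropZone + 'compressed_file'; //Optional, to maintain the original file

// Step 2: gzip the desprayed file in place on the landing zone, over ssh
ZipCmdRaw   := 'ssh 10.0.0.90 "' +
             // 'cp ' + RawFileName + ' ' + ZipFilename + ' && gzip -f ' + ZipFilename + '";';  //Optional, to maintain the original file
               'gzip -f ' + RawFilename + '";';

ZipCmd    := 'bash -c \'' + ZipCmdRaw + '\'';
ZippedDS  := PIPE(ZipCmd,rec);

// Run the despray first, then the compression
ORDERED(Despray,OUTPUT(ZippedDS));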
hwatanuki
 
Posts: 18
Joined: Mon Apr 15, 2019 1:22 am

Thu May 14, 2020 6:15 am

But as you say, hwatanuki,

it's not 'on-the-fly'.
Oleg is trying to avoid despraying uncompressed data.

Oleg,

There are compression attributes in the repo; you could compress every field before despraying, and perhaps also apply STD.Str.EncodeBase64 on top if the output cannot be binary?

Yours

Allan
Allan
 
Posts: 431
Joined: Sat Oct 01, 2011 7:26 pm

Thu May 28, 2020 3:24 pm

Thank you very much, guys!
FYI: we decided that the simplest way is to go with the standard UNIX compressed drive feature. Hopefully the compression ratio on it will be good enough.
oleg
 
Posts: 43
Joined: Fri Sep 21, 2012 9:41 am

