

Compressing uncompressed files

Questions around writing code and queries

Mon Dec 17, 2018 8:23 pm

Hi,
Is there a way to compress previously uncompressed logical files?
sajish
 
Posts: 2
Joined: Mon Dec 17, 2018 8:06 pm

Tue Dec 18, 2018 2:07 pm

sajish,

The short answer to that is: yes and no. :)

One of the foundational principles of the HPCC platform is "never throw anything away (because you might need it later)," so you will find that you cannot overwrite a file on output if that same workunit used it as input. So no, you cannot compress a previously uncompressed logical file in place, but you CAN read that uncompressed data and write it to a new compressed logical file, like this:
Code:
OUTPUT(uncompressedDataset,,'NewCompressedfilename',COMPRESSED);
Once you've done that, THEN you can delete the original uncompressed file if you need/want to.
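
For a fuller picture, here is a minimal sketch of the same idea with the dataset declaration included. The record layout and both logical filenames are hypothetical placeholders, not anything from the original question; substitute the real layout of your file.

```ecl
// Hypothetical record layout, for illustration only
PersonRec := RECORD
  UNSIGNED4 id;
  STRING25  firstName;
  STRING25  lastName;
END;

// Read the existing uncompressed logical file (hypothetical name)
uncompressedDataset := DATASET('~demo::people::uncompressed', PersonRec, THOR);

// Write the same records to a NEW logical file with the COMPRESSED option
OUTPUT(uncompressedDataset,, '~demo::people::compressed', COMPRESSED);

// Declaration that later code could use to read the new file back,
// using the __COMPRESSED__ option on DATASET
compressedDataset :=
  DATASET('~demo::people::compressed', PersonRec, THOR, __COMPRESSED__);
```

Deleting the original is deliberately left out of this sketch, so that it can be done as a separate step after the new file has been checked.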

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1404
Joined: Wed Oct 26, 2011 7:40 pm

Tue Dec 18, 2018 2:53 pm


Thanks Richard. I had already zeroed in on this solution, but I wanted to find out whether this process is readily available as a built-in function. I need to compress all the uncompressed files in the cluster, so I will have to get the list of uncompressed files and deal with them one by one: figure out the record structure, read the file, and then output it as a compressed file. :roll:
sajish
 

Wed Dec 19, 2018 7:47 pm

sajish,
sajish wrote: "I will have to get the list of uncompressed files and need to deal with the list one by one, figuring out the record structure and reading the file and then output as a [compressed] file"
OK, this is not a complete solution, but it should help you get a long way down the road:
Code:
IMPORT Std;
//Get list of all non-superfiles:
AllFiles := STD.File.LogicalFileList();
// NOTHOR(AllFiles);

//Then filter out all the already-compressed files:
UnCompressed :=
  AllFiles(STD.File.GetLogicalFileAttribute('~'+name,'blockCompressed')='');
// NOTHOR(UnCompressed);

//Then filter out all the sub-files of superfiles
// (which can't be deleted without first removing them from their superfiles)
NonSF := UnCompressed(NOT EXISTS(STD.File.LogicalFileSuperowners('~'+name)));
// NOTHOR(NonSF);

//Then get just the filenames, Record Structures, and file types:
NameStruct :=
  TABLE(NonSF,{name,
               STRING RecStruct := STD.File.GetLogicalFileAttribute('~'+name,'ECL'),
               STRING FileType  := STD.File.GetLogicalFileAttribute('~'+name,'kind')
         });
NOTHOR(NameStruct);
That result gets you to the point of actually writing the code to do an OUTPUT with the COMPRESSED option on each of these files and then delete the uncompressed originals. Of course, once you've done that, you'll also need to find all the ECL DATASET declarations for the deleted files and update them with the new filenames and the __COMPRESSED__ option.
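
One possible way to take the next step is to turn each row of that result into ECL source text that can be reviewed and then run as its own workunit. This is only a sketch under several assumptions: the 'ECL' attribute is assumed to hold a single RECORD ... END; definition, the 'kind' attribute is assumed to be 'flat' for ordinary Thor files, and the sample row, the "::compressed" suffix, and the names NameStructRec, SampleRows, GenRec, MakeCode and FileLayout are all hypothetical.

```ecl
// Same shape as the NameStruct result above
NameStructRec := RECORD
  STRING name;
  STRING RecStruct;
  STRING FileType;
END;

// Hypothetical sample row standing in for the real NameStruct result
SampleRows := DATASET([{'demo::people',
                        'RECORD UNSIGNED4 id; STRING25 fullName; END;',
                        'flat'}], NameStructRec);

GenRec := RECORD
  STRING name;
  STRING eclText;  // generated ECL source for this file
END;

// Build the text of a small ECL job that rewrites one file as compressed
GenRec MakeCode(NameStructRec L) := TRANSFORM
  SELF.name    := L.name;
  SELF.eclText := 'FileLayout := ' + L.RecStruct + '\n' +
                  'OUTPUT(DATASET(\'~' + L.name + '\', FileLayout, THOR),, ' +
                  '\'~' + L.name + '::compressed\', COMPRESSED);';
END;

// Only flat files are handled; CSV or XML files would need
// different DATASET syntax
Generated := PROJECT(SampleRows(FileType = 'flat'), MakeCode(LEFT));
OUTPUT(Generated, NAMED('GeneratedECL'));
```

Generating text rather than running the OUTPUT directly keeps each file's record structure visible for review before anything is written or deleted.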

HTH,

Richard
rtaylor

