Directory of Zipped Logfiles to HPCC?

Thu Jun 30, 2011 4:14 pm

Hey,

I have a directory containing thousands of zipped weblog files. Each zip contains one named file - that file contains thousands of lines of weblog info (in a slightly hickey CSV format).

What is the easiest way to get this (unzipped and) 'sprayed' into the VM? I don't mind if the unzipping happens pre or post spray and I don't mind when the concat happens - just looking for the easiest way to get the data in there ....

David
dabayliss
Community Advisory Board Member
 
Posts: 109
Joined: Fri Apr 29, 2011 1:35 pm

Mon Jul 11, 2011 8:03 pm

Hi David,

I would definitely get the files unzipped and concatenated prior to the spray. That's just me, I go with what I know, but perhaps someone else out here knows of a more elegant way.

Cheers,

Bob
bforeman
Community Advisory Board Member
 
Posts: 1006
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jul 11, 2011 8:18 pm

David,

Richard Taylor also added:

The ProgGuide article “Working with BLOBs” tells how to spray multiple files to a single file in Thor, but I have no idea how to unzip them all easily – it would have to be done PRE-spray.

Hope this helps!

Bob
bforeman
Community Advisory Board Member
 
Posts: 1006
Joined: Wed Jun 29, 2011 7:13 pm

Mon Aug 01, 2011 3:55 pm

Well - I finally found the time to solve my own problem - recorded here for others with the same issue:

1) 7-Zip is a program that will batch-unzip files for you - it also handles .gz files, which is useful for people dealing with weblogs coming from a Linux-based Apache. I extracted all my zips into a data directory - which gave me a gazillion little logs

2) copy *.log all.xlog then produced one concatenated CSV file

3) Uploaded to the landing zone

4) Sprayed using the normal CRLF stuff as a separator
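
For anyone who would rather script steps 1 and 2 than do them by hand, here is a minimal sketch in Python. It is not what I actually ran (that was 7-Zip plus copy); the function name concat_compressed_logs and the paths are made up for illustration, and it assumes every .zip or .gz in the directory holds plain-text log data that can simply be appended end to end.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def concat_compressed_logs(src_dir, out_path):
    """Append the decompressed contents of every .zip/.gz in src_dir to out_path.

    Hypothetical helper: returns the number of archives processed.
    Files are appended byte for byte, so a log that does not end with a
    line terminator will run into the first line of the next one.
    """
    src = Path(src_dir)
    count = 0
    with open(out_path, "wb") as out:
        # Sort so the output order is predictable from run to run.
        for archive in sorted(src.iterdir()):
            suffix = archive.suffix.lower()
            if suffix == ".zip":
                with zipfile.ZipFile(archive) as zf:
                    for info in zf.infolist():
                        if info.is_dir():
                            continue
                        # Stream each member straight into the output file.
                        with zf.open(info) as member:
                            shutil.copyfileobj(member, out)
            elif suffix == ".gz":
                with gzip.open(archive, "rb") as member:
                    shutil.copyfileobj(member, out)
            else:
                # Anything else (including the output file itself) is skipped.
                continue
            count += 1
    return count
```

Calling concat_compressed_logs("data", "all.xlog") would give the same kind of single file as step 2, ready to upload to the landing zone and spray as in steps 3 and 4.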

David
dabayliss
Community Advisory Board Member
 
Posts: 109
Joined: Fri Apr 29, 2011 1:35 pm
