

Directory of Zipped Logfiles to HPCC?


Thu Jun 30, 2011 4:14 pm

Hey,

I have a directory containing thousands of zipped weblog files. Each zip contains one named file, and that file contains thousands of lines of weblog info (in a slightly hokey CSV format).

What is the easiest way to get this (unzipped and) 'sprayed' into the VM? I don't mind whether the unzipping happens pre- or post-spray, and I don't mind when the concatenation happens; I'm just looking for the easiest way to get the data in there.

David
dabayliss
Community Advisory Board Member
Posts: 109
Joined: Fri Apr 29, 2011 1:35 pm

Mon Jul 11, 2011 8:03 pm

Hi David,

I would definitely get the files unzipped and concatenated prior to the spray. That's just me; I go with what I know, but perhaps someone else out here knows of a more elegant way.
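
Unzipping and concatenating before the spray can be scripted in a single pass. Below is a minimal sketch in Python, assuming all the archives sit in one directory with a .zip extension and that the single combined file is what gets uploaded to the landing zone. The function name unzip_and_concat and its eol parameter are hypothetical, not part of any HPCC tool. Files are processed in filename order, and a line terminator is added after any log whose last line is unterminated, so lines from adjacent logs cannot run together.

```python
import zipfile
from pathlib import Path


def unzip_and_concat(zip_dir, out_path, eol=b"\n"):
    """Write every member of every *.zip in zip_dir (filename order) into out_path.

    Hypothetical helper. Returns the number of archives read. If a member's
    last line has no terminator, `eol` is appended (pass b"\r\n" to match
    CRLF logs) so it cannot merge with the next file's first line.
    """
    archives = sorted(Path(zip_dir).glob("*.zip"))
    with open(out_path, "wb") as out:  # binary mode keeps the original CR/LF bytes
        for zip_path in archives:
            with zipfile.ZipFile(zip_path) as zf:
                for name in zf.namelist():
                    data = zf.read(name)  # each log is small, so read it whole
                    out.write(data)
                    if data and not data.endswith(b"\n"):
                        out.write(eol)
    return len(archives)
```

For example, unzip_and_concat("weblogs", "all.xlog") would leave one file ready to upload. Giving the output a different extension from the inputs keeps it out of the glob if it is written into the same directory.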

Cheers,

Bob
bforeman
Community Advisory Board Member
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Mon Jul 11, 2011 8:18 pm

David,

Richard Taylor also added:

The ProgGuide article “Working with BLOBs” tells how to spray multiple files to a single file in Thor, but I have no idea how to unzip them all easily – it would have to be done PRE-spray.

Hope this helps!

Bob
bforeman

Mon Aug 01, 2011 3:55 pm

Well - I finally found the time to solve my own problem - recorded here for others with the same issue:

1) 7-Zip is a program that will batch-unzip files for you; it also handles .gz files, which is useful for people dealing with weblogs coming from a Linux-based Apache server. I extracted all my zips into a data directory, which gave me a gazillion little logs.

2) copy *.log all.xlog (at the Windows command prompt) then produced one concatenated CSV file.

3) Uploaded it to the landing zone.

4) Sprayed it using the usual CRLF line terminator as the separator.
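
Steps 1 and 2 can also be done without 7-Zip and copy. Below is a minimal sketch in Python for the gzipped Apache case mentioned in step 1, assuming the .gz logs sit in one directory. The name gunzip_and_concat is hypothetical. Files are taken in filename order (an assumption; rotated logs may need a different ordering). Unlike a plain byte-for-byte concatenation, it adds a line break after any log whose last line lacks one, so the final line of one file cannot run into the first line of the next before the spray splits records on line terminators.

```python
import gzip
from pathlib import Path


def gunzip_and_concat(log_dir, out_path, eol=b"\n"):
    """Decompress every *.gz in log_dir (filename order) into one file.

    Hypothetical helper. Returns the number of files read. `eol` is appended
    after any log whose last line is unterminated (b"\r\n" for CRLF data).
    """
    sources = sorted(Path(log_dir).glob("*.gz"))
    with open(out_path, "wb") as out:  # binary mode: bytes pass through unchanged
        for src in sources:
            with gzip.open(src, "rb") as f:
                data = f.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(eol)
    return len(sources)
```

The resulting file is then uploaded and sprayed exactly as in steps 3 and 4.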

David
dabayliss

