Sat Aug 18, 2018 2:35 pm
Login Register Lost Password? Contact Us


Spraying Delimited CSV

Post questions or comments on how best to manage your big data problem

Thu Apr 03, 2014 7:41 pm Change Time Zone

I'm trying to spray a delimited file with integer values where null values equal to 0.
The "Spray Delimited" function on ECL Watch seems to be ignoring the delimiter and it seems to reading the integers as base 256 instead of base 10 and is having issues with variable length data values. I tried using a comma delimited file as well as a tab delimited file and had the same issues with both.
System Specs:
  • Server
    • Nodes: 1
    • Processor: Intel Core i5 vPro
    • RAM: 4gb
    • Hard Drive: 64gb SSD
    • OS: Ubuntu 13.10 Server 64 bit
    • Server Platform: Community 4.2.2-1 for Ubuntu 13.10 64 bit
    • Client Tools: Not Installed
    • Graph Control: Not Installed
    • Ganglia Monitoring: Not Installed
  • Client
    • Manufacturer: Hewlett-Packard
    • Processor: Intel Core i7
    • RAM: 8gb
    • Hard Drive: 64gb SSD & 500gb HDD
    • OS: Windows 7 64 bit
    • ECL IDE: 4.2.2-1 for Windows x86
    • Internet Browser: Google Chrome x86 Version 33.0.1750.154 m
Data File: Transactions_April2014_HeaderData.csv
Code: Select all
35500555,2014,4,1,10,5,302009,0,02197881679,308,956
35904820,2014,4,1,9,1,352043,0,10152186187,404,0
10078085,2014,4,1,16,3,472017,54661600,10055782374,542,920
18701205,2014,4,1,14,1,235872,0,09440308519,748,0
32303558,2014,4,1,8,1,188447,0,0,523,0

Spray Delimited Options:
  • Source
    • Machine/dropzone: localhost/mydropzone
    • IP Address: ***.***.***.***
    • Local Path: /var/lib/HPCCSystems/mydropzone/Transactions_April2014_HeaderData.csv
    • Format: ASCII
    • Max Record Length: 8192
    • Separators: \,
    • No Separator: {FALSE}
    • Escape: {NULL}
    • Line Terminators: \n,\r\n
    • Quote: '
  • Destination
    • Group: mythor
    • Label: Transactions::April2014::HeaderData
    • Mask: Transactions::April2014::HeaderData._$P$_of_$N$
    • Prefix: {NULL}
  • Options
    • Overwrite: {TRUE}
    • No Split: {FALSE}
    • Compress: {FALSE}
DFU Workunit Details
  • ID: D20140403-073627
  • ClusterName: thor
  • DFUServerName: mydfuserver
  • Queue: dfuserver_queue
  • Protected: {FALSE}
  • Command: Spray
  • TimeStarted: 2014-04-03 12:36:27
  • TimeStopped: 2014-04-03 12:36:30
  • PercentDone: View Progress
  • ProgressMessage: 100% Done, 0 secs left (252/252bytes) [1/1nodes]
  • SummaryMessage: Total time taken 2 secs
  • State: finished
  • SourceIP: ***.***.***.***
  • SourceFilePath: /var/lib/HPCCSystems/mydropzone/Transactions.April2014.HeaderData.csv
  • SourceFormat: Delimited
  • SourceNumParts: 1
  • SourceDirectory: /var/lib/HPCCSystems/mydropzone
  • SourceSeparators: \,
  • SourceTerminators: \n,\r\n
  • SourceQuote: '
  • DestLogicalName: transactions::april2014::headerdata
  • DestGroupName: mythor
  • DestDirectory: /var/lib/HPCCSystems/hpcc-data/thor/transactions/april2014/
  • DestNumParts: 1
  • MonitorSub: 0
  • Overwrite: 1
  • Replicate: 0
  • Compress: 0
  • AutoRefresh: 0
ViewData.jpg
ViewData.jpg (154.03 KiB) Viewed 3893 times

HeaderFile/Layout_Header
Code: Select all
EXPORT Layout_Header := RECORD
   INTEGER4 TransID;
   INTEGER2 tYear;
   INTEGER1 tMonth;
   INTEGER1 tDay;
   INTEGER1 StoreID;
   INTEGER1 RegID;
   INTEGER3 TrxID;
   INTEGER4 AcctID;
   INTEGER5 CustID;
   INTEGER2 Cashier;
   INTEGER2 Manager;
END;

HeaderFile/File_HeaderFile
Code: Select all
IMPORT HeaderFile;
EXPORT File_HeaderFile :=
DATASET('~transactions::april2014::headerdata',HeaderFile.Layout_Header,THOR);

Execute HeaderFile/File_HeaderFile
  • Message:
    • Error:
    • System error: 1:
    • File /var/lib/HPCCSystems/hpcc-data/thor/transactions/april2014/headerdata.csv._1_of_1
    • size is 252 which is not a multiple of 26
    • (0, 0)
  • Code: 1
  • Location: {NULL}
Added 8 spaces to end of Transactions_April2014_HeaderData.csv to bypass error.
Re-uploaded and re-sprayed the file.
Re-executed HeaderFile/File_HeaderFile.
Output:
DataOutput.jpg
DataOutput.jpg (139.13 KiB) Viewed 3893 times

All values are incorrect and number of records has doubled.
Based on the values it seems as if each character is being imported as an integer based on its ASCII value (base 256).
Is this the correct interpretation of what is going on?
If so, is there a way to import integers without converting to base 256?
kmier
 
Posts: 5
Joined: Fri Mar 28, 2014 6:32 pm

Fri Apr 04, 2014 8:49 am Change Time Zone

Hi,

you have sprayed a delimited file and while reading it you are using 'THOR'
This is the reason for that Error

change the code in HeaderFile/File_HeaderFile as shown below
Code: Select all
IMPORT HeaderFile;
EXPORT File_HeaderFile :=
DATASET('~transactions::april2014::headerdata',HeaderFile.Layout_Header, csv(separator(',')));


Regards,
Sameer
sameermsc
 
Posts: 66
Joined: Wed Oct 05, 2011 10:09 am

Fri Apr 04, 2014 3:18 pm Change Time Zone

That fixed it. Thanks a ton.
kmier
 
Posts: 5
Joined: Fri Mar 28, 2014 6:32 pm

Thu Dec 31, 2015 12:24 am Change Time Zone

Thank you. worked for me too!
wroales
 
Posts: 1
Joined: Tue Dec 29, 2015 3:03 am


Return to Managing Big Data

Who is online

Users browsing this forum: No registered users and 1 guest

cron