Thu Sep 19, 2019 2:50 pm
Login Register Lost Password? Contact Us


Missing Records reading a Sprayed File

Comments and questions related to the Enterprise Control Language

Mon Sep 02, 2019 2:17 pm Change Time Zone

Hi,

I've sprayed a file and have tried to read it using a basic definition:

ds_in := DATASET(inputFilename, in_layout, CSV(MAXLENGTH(2000000), SEPARATOR(','),terminator(['\r\n']),quote('"')),OPT);


When I run a count on this it is short of 131 records compared to the file I started with.

If I change the Separator to something random (such as *) then it returns the correct record count. I've carried out a JOIN to look for records that are missed, when I check them in the raw file there is nothing peculiar about them, or the preceding records.

There are no 'random' characters, no extra-line breaks, no stray Quote's (single or double)

This seems to be a bug with the CSV definition? Does anyone have any further suggestions I could follow to pinpoint the root cause of these records being missed from the Dataset?
SChatman85
 
Posts: 2
Joined: Mon Sep 02, 2019 2:13 pm

Tue Sep 03, 2019 2:51 pm Change Time Zone

I have carried out some further testing on this, and found that if I put the separator as it should be, and override QUOTE then I can get the correct record count.

Code: Select all
ds_in := DATASET(inputFilename, in_layout, CSV(MAXLENGTH(2000000), SEPARATOR(','),terminator(['\r\n']),quote('')),OPT);


(If you Omit QUOTE, as per the documentation it will default to what was used during Spray)

The adjacent records in the file when viewed outside of HPCC did not contain any quotes - so not 100% certain why the records were lost but assume it is somehow related to the 4 ' marks within the data file I was using.
SChatman85
 
Posts: 2
Joined: Mon Sep 02, 2019 2:13 pm

Tue Sep 03, 2019 3:22 pm Change Time Zone

SChatman85,

I agree that it is probably the single quote characters in your data causing the missing 131 records. The most likely reason is that the "missing" records are between the two "pairs" of single quotes.

Using the QUOTE('') option is your workaround, but I suggest you report this issue in JIRA (http://track.hpccsystems.com) and attach your data file to the report (if you can legally do so) to make it easy for the developers to duplicate the problem.

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1475
Joined: Wed Oct 26, 2011 7:40 pm


Return to ECL

Who is online

Users browsing this forum: Bing [Bot] and 1 guest