Missing Records reading a Sprayed File

I've sprayed a file and have tried to read it using a basic definition:

ds_in := DATASET(inputFilename, in_layout, CSV(MAXLENGTH(2000000), SEPARATOR(','),terminator(['\r\n']),quote('"')),OPT);

When I run a count on this, it is 131 records short of the file I started with.

If I change the separator to something arbitrary (such as *), it returns the correct record count. I ran a JOIN to find the records that are missed, but when I check them in the raw file there is nothing peculiar about them or about the preceding records.

There are no 'random' characters, no extra line breaks and no stray quotes (single or double).

This seems to be a bug in the CSV definition. Does anyone have any further suggestions I could follow to pinpoint the root cause of these records being missed from the dataset?

I have carried out some further testing on this and found that if I keep the separator as it should be and override QUOTE, I get the correct record count.

ds_in := DATASET(inputFilename, in_layout, CSV(MAXLENGTH(2000000), SEPARATOR(','),terminator(['\r\n']),quote('')),OPT);

(If you omit QUOTE, then as per the documentation it defaults to whatever was used during the spray.)
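
The comparison described in this thread can be sketched by running both definitions side by side and joining them to list the records that only the unquoted read returns. This is a rough sketch rather than the poster's actual code: the logical file name, the layout and its id key field are hypothetical stand-ins for the real inputFilename and in_layout.

```ecl
// Hypothetical stand-ins for the poster's own definitions.
inputFilename := '~example::sprayed::file';
in_layout := RECORD
  STRING id;       // assumed unique key, for illustration only
  STRING payload;
END;

// Same file read twice: once with double-quote handling, once with none.
ds_quoted  := DATASET(inputFilename, in_layout,
                      CSV(MAXLENGTH(2000000), SEPARATOR(','), TERMINATOR(['\r\n']), QUOTE('"')), OPT);
ds_noquote := DATASET(inputFilename, in_layout,
                      CSV(MAXLENGTH(2000000), SEPARATOR(','), TERMINATOR(['\r\n']), QUOTE('')), OPT);

// Records present in the unquoted read but absent from the quoted one.
missing := JOIN(ds_noquote, ds_quoted,
                LEFT.id = RIGHT.id,
                TRANSFORM(LEFT),
                LEFT ONLY);

OUTPUT(COUNT(ds_quoted),  NAMED('count_quoted'));
OUTPUT(COUNT(ds_noquote), NAMED('count_noquote'));
OUTPUT(missing,           NAMED('missing_records'));
```

If the two counts differ by the number of rows in missing_records, the quote handling is what drops them.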

The adjacent records in the file, when viewed outside of HPCC, did not contain any quotes, so I am not 100% certain why the records were lost, but I assume it is somehow related to the four single-quote (') characters within the data file I was using.

I agree that the single quote characters in your data are probably causing the 131 missing records. The most likely reason is that the "missing" records lie between the two "pairs" of single quotes, which would mean the text between each pair, line terminators included, is being treated as quoted.

Using the QUOTE('') option is your workaround, but I suggest you report this issue in JIRA and attach your data file to the report (if you can legally do so) to make it easy for the developers to reproduce the problem.
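
One way to check the single-quote theory before filing the report is to read the file as raw lines and list those that contain a single quote. The sketch below assumes that an empty SEPARATOR together with QUOTE('') makes each line arrive as one unsplit field. The file name and the raw_layout record are hypothetical and not part of the original code.

```ecl
IMPORT STD;

// Hypothetical logical file name; substitute the sprayed file's real name.
inputFilename := '~example::sprayed::file';

// One field per line: no field splitting and no quote handling (assumed),
// so each record is simply the text up to the next \r\n.
raw_layout := RECORD
  STRING line;
END;

ds_raw := DATASET(inputFilename, raw_layout,
                  CSV(MAXLENGTH(2000000), SEPARATOR(''), TERMINATOR(['\r\n']), QUOTE('')), OPT);

// Lines containing at least one single quote character.
with_quote := ds_raw(STD.Str.Find(line, '\'', 1) > 0);

OUTPUT(COUNT(ds_raw), NAMED('raw_line_count'));
OUTPUT(with_quote,    NAMED('lines_with_single_quote'));
```

If the missing records all fall between the lines this returns, that would support the explanation above and give the developers a small test case to attach.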


