File contained a line of length greater than 10485760 bytes

Comments and questions related to the Enterprise Control Language

Mon Oct 17, 2011 8:22 pm

Hi there

I have the following code:

ResourceRecord := RECORD, MAXLENGTH(8192)
  STRING  ip;
  INTEGER rid;
  STRING  dns;
  INTEGER volume;
END;

rdata := DATASET(rr_in_dir+'::'+rr_file, ResourceRecord,
                 CSV(MAXLENGTH(8192), SEPARATOR(['\t', ' '])));
OUTPUT(COUNT(rdata));

When I run it, I get:

<Error><source>eclagent</source><code>0</code><message>System error: 0: Graph[1], csvread[2]: SLAVE 10.92.xxx.xxx:6600: File ~rr_files::rrsets_20110801 contained a line of length greater than 10485760 bytes.</message></Error>

I have run Python scripts to make sure the file does not contain lines longer than 8000 characters. All lines are tab separated, and they all contain 4 attributes. I tested the file before spraying and again after spraying, on the node. Everything looks OK.
The file details from ECL Watch are:

Logical Name: rr_files::rrsets_20110801
Description:

Modification Time: 2011-10-17 16:17:02 (UTC/GMT)
Directory: /mnt/HPCCSystems/hpcc-data/thor/rr_files
Pathmask: rrsets_20110801._$P$_of_$N$
Workunit: D20111017-161701
Job Name: rrsets_20110801
Size: 4,972,697,527
Format: csv
MaxRecordSize: 8192
CsvSeparate: \t
CsvQuote: '
CsvTerminate: \n,\r\n


File Parts:

Number  IP            Size
1       10.xxx.xx.xx  2,486,348,712
2       10.xxx.xx.xx  2,486,348,815


This file belongs to following superfile(s):

rr_files::super1


Any ideas what might be wrong?
nvasil
Community Advisory Board Member
Posts: 105
Joined: Mon Oct 17, 2011 6:48 pm

Mon Oct 17, 2011 8:52 pm

Hi.

Without knowing the dataset contents, it's conjecture, but see this response to a similar question: viewtopic.php?f=10&t=102&sid=af6f55cb58fd5d3a84df9e46ea98aee0#p307.

Tony
Tony Kirk
 
Posts: 17
Joined: Thu Jun 23, 2011 5:01 pm

Mon Oct 17, 2011 9:08 pm

Thanks a lot.

I did scan my file, and it turns out there are quotes, so this is probably what is causing the problem. Since I have no control over which characters each line will contain, is it possible to read every line as a string and then do the parsing/cleaning in ECL? What kind of spray can I use for that?
nvasil

Mon Oct 17, 2011 9:13 pm

Before resorting to hand-parsing your data, you could try the suggestion of the empty QUOTE set to see if the file is readable as sprayed.
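
As a rough sketch of that suggestion, the DATASET declaration from the first post could be given an explicit empty quote. The logical file name below is taken from the error message rather than from the original code, and QUOTE('') is assumed here to be an acceptable way of writing the empty QUOTE set. The idea is that an unmatched ' in the data would otherwise open a quoted field that swallows the following line terminators until the 10485760-byte limit is reached.

```ecl
// Same record layout as in the first post.
ResourceRecord := RECORD, MAXLENGTH(8192)
  STRING  ip;
  INTEGER rid;
  STRING  dns;
  INTEGER volume;
END;

// Assumption: QUOTE('') leaves no character acting as a field quote,
// so a stray ' in a field is read as ordinary data.
// The file name is copied from the error message in the first post.
rdata := DATASET('~rr_files::rrsets_20110801', ResourceRecord,
                 CSV(MAXLENGTH(8192), SEPARATOR(['\t', ' ']), QUOTE('')));
OUTPUT(COUNT(rdata));
```

If the count comes back without the line-length error, the quote characters in the data were the likely cause.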
Tony Kirk
 

Mon Oct 17, 2011 9:22 pm

It did work.

Maybe it would be better for the default quote to be empty.

Thanks a lot.
nvasil

Mon Oct 17, 2011 9:29 pm

Excellent.
Tony Kirk
 

Fri Oct 21, 2011 7:23 pm

Here is an interesting fact.

I have sprayed my files with the quote set to nothing, and each file can be read as a dataset without any problem. When I combine them into a superfile, I get the error "File ~rr::super1 contained a line of length greater than 10485760 bytes". This is exactly the error I was getting before I set the quote to nothing. It seems to me that something goes wrong when the files are combined into a superfile. Most likely the quote for the superfile is set to the default, '. Is there a way to change it?

Nick
nvasil

