

Record Length in Data Spraying

Comments and questions related to the Enterprise Control Language

Fri Oct 05, 2018 4:15 pm

Hello,

How can we spray an external file (JSON or CSV) without knowing the record length of that file? Is there any way to find out the record length of such a file?

Thanks,
Shayan
sh.shmss
 
Posts: 4
Joined: Fri Aug 24, 2018 3:24 pm

Fri Oct 05, 2018 6:41 pm

Shayan,

JSON and CSV files are inherently variable-length. That means you don't need to know the record length; you only need an upper bound on the length of the longest record. If you don't know that and the spray fails, just increase the maximum record length (the default max is 8K) to whatever value you need. The largest max length I've seen used successfully was 10 million bytes.
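
For anyone spraying from code rather than through ECL Watch, here is a minimal sketch of where the maximum record length goes, assuming the Standard Library function STD.File.SprayDelimited is available on your platform. The landing zone IP, source path, and cluster name are placeholders and do not come from this thread; only the logical file name does.

```ecl
IMPORT STD;

// Hypothetical values: replace with your own landing zone, path, and cluster.
LandingZoneIP := '10.0.0.1';
SourcePath    := '/var/lib/HPCCSystems/mydropzone/hospitals.csv';
TargetCluster := 'mythor';

// Upper bound on the longest record, in bytes. It only has to be
// large enough; it does not have to match any actual record length.
MaxRecLen := 100000;

// The three skipped arguments (separator, terminator, quote) keep their defaults.
STD.File.SprayDelimited(LandingZoneIP,
                        SourcePath,
                        MaxRecLen,
                        ,
                        ,
                        ,
                        TargetCluster,
                        '~online::ss::project::hospitals');
```

If the spray still fails on an over-long record, raising MaxRecLen and spraying again is the approach described above.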

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1606
Joined: Wed Oct 26, 2011 7:40 pm

Wed Oct 10, 2018 7:03 pm

Thank you for your reply, Richard. I managed to spray the file. Now I have a problem defining and reading the data with the DATASET declaration. If I don't specify the record length in my layout, I get the following error:

Error: System error: 1301: Pool memory exhausted: pool id 4194304 exhausted, requested 6473 heap(1/4294967295) global(1/1216) (in Disk Read G1 E2) (0, 0), 1301,

If I do define the record length for my records, I get the following error:

Error: System error: 1: File /var/lib/HPCCSystems/hpcc-data/thor/online/ss/project/hospitals._1_of_1 size is 313263 which is not a multiple of 177 (0, 0), 1,

Needless to say, I can't find a total record size that is a factor of 313263 (its only factors are 1, 3, 9, 34807, 104421, and 313263).

I'd really appreciate it if you could help me define my dataset.

Thanks,
Shayan

Thu Oct 11, 2018 4:40 pm

Shayan,

Have you taken the online Intro to ECL (Part 1) course (https://learn.lexisnexis.com/hpcc)? Spraying and defining files is covered there.

Please include your code along with the error message you get so I can see what syntax you tried. Because you sprayed a variable-length file, you need a RECORD structure that does not specify the length of the records. The second error message is telling you that the fixed record length you specified (177 bytes) does not match the file.
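
To illustrate the difference, here is a small sketch contrasting a fixed-length and a variable-length RECORD structure. The field names are made up for the example. The numbers 313263 and 177 are taken from the error message quoted earlier in the thread.

```ecl
// Fixed-length layout: every field has a declared width, so each record
// occupies exactly 30 bytes and a flat file's size must be a multiple of 30.
Layout_Fixed := RECORD
  STRING20 name;
  STRING10 code;
END;

// Variable-length layout: no widths are declared, so no fixed record
// size is implied and no "multiple of N" check can apply.
Layout_Variable := RECORD
  STRING name;
  STRING code;
END;

OUTPUT(SIZEOF(Layout_Fixed), NAMED('FixedRecordSize')); // 30
OUTPUT(313263 % 177, NAMED('Remainder'));               // 150, not 0
```

The non-zero remainder is what the "not a multiple of 177" error is reporting: the file cannot be cut into whole 177-byte records.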


HTH,

Richard

Thu Oct 11, 2018 6:50 pm

Actually, I'm using the code that is introduced for defining data in that course:

Code:

Layout_Hospitals := record
  string hospital_name;
  string provider_number;
  string state;
  string measure_name;
  string number_of_discharges;
  string footnote;
  string excess_readmission_ratio;
  string predicted_readmission_rate;
  string expected_readmission_rate;
  string number_of_readmissions;
  string start_date;
  string end_date;
end;

EXPORT Hospitals := dataset('~online::ss::project::hospitals',Layout_Hospitals,thor);



And this is the dataset I've already sprayed:
https://data.medicare.gov/Hospital-Comp ... /9n3s-kdb3

As always, I appreciate your help.

Shayan

Fri Oct 12, 2018 3:12 pm

Shayan,

The problem is that your DATASET definition tells the compiler the file is a "thor" (flat) file, but it's not. You sprayed it as a CSV file, so your DATASET should be:
Code:
EXPORT Hospitals := dataset('~online::ss::project::hospitals',Layout_Hospitals,CSV);
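
A slightly fuller sketch of the same definition, runnable from a builder window (hence no EXPORT). The layout is the one posted earlier in the thread. The CSV options are assumptions about the downloaded file, which the thread does not confirm: one header row, comma separators, double-quote quoting, and records no longer than the MAXLENGTH value. Adjust or drop them to match your file.

```ecl
Layout_Hospitals := RECORD
  STRING hospital_name;
  STRING provider_number;
  STRING state;
  STRING measure_name;
  STRING number_of_discharges;
  STRING footnote;
  STRING excess_readmission_ratio;
  STRING predicted_readmission_rate;
  STRING expected_readmission_rate;
  STRING number_of_readmissions;
  STRING start_date;
  STRING end_date;
END;

// HEADING(1) skips one header line (assumed to be present in the file).
// MAXLENGTH is an upper bound on record length, not an exact size.
Hospitals := DATASET('~online::ss::project::hospitals',
                     Layout_Hospitals,
                     CSV(HEADING(1), SEPARATOR(','), QUOTE('"'), MAXLENGTH(100000)));

// Look at the first few records to confirm the fields line up.
OUTPUT(CHOOSEN(Hospitals, 10), NAMED('Sample'));
```

Reading every field as STRING first and casting to numeric types later is a cautious starting point when the contents of columns such as footnote are not yet known.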

HTH,

Richard

