ECL Watch Spray delimited -- how does it work?

Comments and questions related to the Enterprise Control Language

Sat Mar 07, 2020 3:37 pm

I am trying to upload and spray a CSV file.

In ECL Watch, in the "files>>landing zone" section, there is an option to spray delimited. The drop-down associated with spray-delimited has several fields, one of which is "Separators". Regardless of what separator I specify in that field, my file always loads with one record per line. That is, it ignores the separator field.

I have tried this with comma, tab, and even letter delimiters, without any effect. I understand there is a CSV option available in ECL: https://hpccsystems.com/training/documentation/ecl-language-reference/html/CSV_Files.html. I will try that next.

However, my questions about spray-delimited still remain.
    What does it do?
    How do I use it?
vin
 
Posts: 28
Joined: Tue Feb 10, 2015 8:12 pm

Sat Mar 07, 2020 8:19 pm

Hello Vin,

The behavior you describe seems to be correct. Here are my two cents...

Overall, the goal of the spray operation is to partition the original data file you'd uploaded to the landing zone into as many "pieces" as there are Thor nodes in your target cluster, and to put each "piece" of the partitioned file on the disk of the corresponding Thor node.

To perform the delimited spray operation, the DFU ideally needs to know the size of your original data file and the character(s) used as the line terminator. Based on this information, the original data file can be partitioned more uniformly across the cluster nodes without "breaking" any records. You can also provide the separators at this point, and that information can be used later when you define your DATASET in ECL code.
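To make the partitioning idea concrete, here is a minimal Python sketch of cutting a file into pieces that always end on a line terminator. It is only an illustration of the principle, not the DFU's actual algorithm; the function name `partition_ranges` and the sample data are made up for this example.

```python
def partition_ranges(data: bytes, parts: int, terminator: bytes = b"\n") -> list[tuple[int, int]]:
    """Return up to `parts` (start, end) byte ranges, each ending on a terminator.

    Hypothetical illustration only: aim for equal-sized pieces, then push each
    cut forward to the end of the record it would otherwise split.
    """
    size = len(data)
    target = size // parts          # ideal piece size if records could be split
    ranges = []
    start = 0
    for i in range(1, parts):
        if start >= size:
            break
        guess = max(start, i * target)          # ideal cut point for piece i
        hit = data.find(terminator, guess)      # next record boundary at or after it
        if hit == -1:
            break                               # no more boundaries; rest is one piece
        end = hit + len(terminator)
        ranges.append((start, end))
        start = end
    if start < size:
        ranges.append((start, size))            # whatever remains goes to the last piece
    return ranges


# Four records, three target nodes (sample data is invented).
sample = b"a,1\nbb,22\nccc,333\ndddd,4444\n"
ranges = partition_ranges(sample, 3)
pieces = [sample[s:e] for s, e in ranges]
```

Note that the separator (the comma here) plays no role in where the cuts go; only the file size and the terminator matter. With uneven record lengths, the sketch may also produce fewer pieces than requested, since a cut is never placed inside a record.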

After the delimited spray operation is performed (with or without the separator information), if you look at the logical file contents in ECL Watch, you will still see each record on a single line, as you describe. However, once you define the DATASET and its RECORD structure in ECL code and OUTPUT its contents, you will see the fields properly separated. At that point, if you don't specify the separator in your DATASET definition, the separator information you may have provided during the spray operation will be used automatically.
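As a rough mental model of this read-time behavior, the following Python sketch treats the sprayed file as raw lines plus a remembered separator, and splits fields only when the data is read. This is a conceptual stand-in, not HPCC code: the `sprayed_file` dictionary, its keys, and `read_fields` are all hypothetical names invented for the illustration.

```python
import csv

# Hypothetical stand-in for a sprayed logical file: the records are stored as
# whole lines, and the separator given at spray time is kept alongside as metadata.
sprayed_file = {
    "lines": ["alice\t30", "bob\t25"],
    "separator": "\t",
}


def read_fields(logical_file, separator=None):
    """Split each stored line into fields at read time.

    If the caller gives no separator, fall back to the one remembered
    from the spray, mirroring the default described above.
    """
    sep = separator if separator is not None else logical_file["separator"]
    return list(csv.reader(logical_file["lines"], delimiter=sep))


raw_view = sprayed_file["lines"]                          # one record per line, unsplit
parsed_default = read_fields(sprayed_file)                # uses the spray-time separator
parsed_explicit = read_fields(sprayed_file, separator=",")  # explicit separator overrides it
```

Here `raw_view` corresponds to what you see when browsing the logical file, while `parsed_default` corresponds to what you get once a record layout is applied. Passing the wrong separator explicitly, as in `parsed_explicit`, leaves each line as a single field.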

HTH
Hugo W.
hwatanuki
 
Posts: 28
Joined: Mon Apr 15, 2019 1:22 am

