



Performance with Dfuserver


Tue Feb 18, 2014 4:57 pm

Hi -
It seems like dfuserver takes a while at the very beginning to scan large files to identify the offsets for each node. Of course, this involves I/O and network time and can naturally take a while.

I'm wondering whether searching for terminators (and separators) inside quoted strings forces dfuserver to do a full scan of the file.
If so, does dfuserver have an option to indicate that quoted terminators don't exist in the incoming file? That would allow dfuserver to generate offsets in a more streamlined way (for a 10-node cluster: seek to 10%, find the next terminator, record the offset, repeat).
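
To make the idea concrete, here is a rough Python sketch of the seek-and-scan approach described above. It is only an illustration of the technique, not dfuserver's actual code; the function names, the chunk size, and the assumption that a terminator never appears inside a quoted field are all mine.

```python
import os


def _next_record_start(stream, start, size, terminator, chunk_size):
    """Scan forward from `start` and return the offset just past the next terminator."""
    stream.seek(start)
    pos = start   # absolute offset of the chunk about to be read
    tail = b""    # trailing bytes kept in case a terminator straddles two chunks
    while pos < size:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        idx = buf.find(terminator)
        if idx != -1:
            # buf begins at absolute offset pos - len(tail)
            return pos - len(tail) + idx + len(terminator)
        keep = len(terminator) - 1
        tail = buf[-keep:] if keep else b""
        pos += len(chunk)
    return size  # no terminator found: the part boundary is end of file


def quick_partition_offsets(stream, num_parts, terminator=b"\n", chunk_size=64 * 1024):
    """Return the start offset of each part.

    ASSUMPTION: terminators never occur inside quoted fields, so no
    quote tracking (and therefore no full-file scan) is needed.
    """
    size = stream.seek(0, os.SEEK_END)
    offsets = [0]
    for part in range(1, num_parts):
        guess = size * part // num_parts
        # Never search before the previous boundary.
        start = max(guess, offsets[-1])
        offsets.append(_next_record_start(stream, start, size, terminator, chunk_size))
    return offsets
```

Each part needs only one seek and a short forward read, so the cost depends on the number of nodes and the record length rather than on the file size. If quoted terminators can occur, a boundary found this way may land inside a quoted field, which is presumably why a full scan is done by default.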

Thanks.
jwilt
 

Fri Sep 19, 2014 6:02 pm

Hi,

This was implemented in early May 2014, in HPCC 5.0.

Attila
AttilaV
 

Thu May 28, 2015 8:51 pm

Bumping this thread...

I see in the source code for the DFU server that there is something called "QuickPartitioner", which seems like the implementation asked about in the OP. How do I take advantage of this? Is there an argument to STD.File.SprayVariable or something? A flag to dfuplus?
alex
 

Fri May 29, 2015 2:36 am

Is the fix referred to above the "quotedTerminator" option in dfuplus?

The dfuplus usage statement shows:

spray options:
...
options for csv/delimited:
...
quotedTerminator=1|0 -- optional, default is 1 (quoted terminators in rows)
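
If that option is the right one, a spray that turns it off might be assembled roughly like this. This is a hedged sketch: only quotedTerminator comes from the usage text above. The other key=value options reflect my understanding of the dfuplus command line, the helper name is made up, and every value in the example call is a placeholder. Whether quotedTerminator=0 actually selects the quick partitioner is still the open question here.

```python
def build_dfuplus_spray_args(server, src_ip, src_file, dst_name, dst_cluster,
                             quoted_terminator=False):
    """Build a dfuplus argument list for a CSV spray (hypothetical helper).

    quoted_terminator=False emits quotedTerminator=0, declaring that the
    input has no terminators inside quoted fields.
    """
    return [
        "dfuplus",
        "action=spray",
        f"server={server}",
        f"srcip={src_ip}",
        f"srcfile={src_file}",
        f"dstname={dst_name}",
        f"dstcluster={dst_cluster}",
        "format=csv",
        f"quotedTerminator={1 if quoted_terminator else 0}",
    ]


# Placeholder values only; pass the resulting list to subprocess.run() to execute it.
args = build_dfuplus_spray_args(
    server="http://esp.example:8010",
    src_ip="10.0.0.5",
    src_file="/data/in/big.csv",
    dst_name="demo::big_csv",
    dst_cluster="mythor",
)
```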

Thanks again.
jwilt
 

