

Record size restriction by RAM?

Topics related to recommendations or questions on the design for HPCC Systems clusters

Thu Feb 05, 2015 1:44 am

Hi -
A question about parsing very large text files in Thor:
If a file is loaded as a single record (1 file/record), does the node require sufficient RAM to hold the entire record at once? Or would a record larger than RAM fail the workunit?

Naturally, if splitting the raw file into smaller parts across multiple records is possible, that would presumably avoid this issue (though possibly still spilling to disk).
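For illustration, a minimal ECL sketch of that splitting approach; the filename and chunk size are hypothetical:

Code:
    // Read the raw file as fixed-length chunks so that no single
    // record approaches node RAM (filename and 64KB chunk size assumed).
    ChunkRec := RECORD
      DATA65536 chunk;   // one 64KB slice of the raw file per record
    END;
    rawChunks := DATASET('~thor::in::bigfile', ChunkRec, FLAT);
    OUTPUT(COUNT(rawChunks));  // how many chunks landed across the cluster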

Thanks.
jwilt
 
Posts: 50
Joined: Wed Feb 27, 2013 7:46 pm

Thu Feb 05, 2015 1:14 pm

Hi James,

If a record's size exceeds the available RAM, you get a disk spill; in other words, part of the disk drive is used as temporary RAM. You can see this by looking at the graph, which will clearly show "disk spill" in the process. Of course, this means that your job may slow down due to the additional I/O.

The rule in HPCC is that a single record never spans nodes.
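For illustration, a variable-length record layout can declare its upper bound with MAXLENGTH; a minimal sketch, assuming a hypothetical filename and a 100MB bound:

Code:
    // A variable-length record with an explicit upper bound (100MB assumed).
    // Per the above, records that exceed available memory show up in the
    // graph as a disk spill rather than being split across nodes.
    BigRec := RECORD, MAXLENGTH(100000000)
      STRING line;
    END;
    big := DATASET('~thor::in::biglines', BigRec, CSV(MAXLENGTH(100000000)));
    OUTPUT(COUNT(big));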

HTH,

Bob
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Thu Feb 05, 2015 4:55 pm

Jim,

If you have individual records that are larger than the amount of RAM you have on each node, then I would strongly suggest adding more RAM per node until you can at least fit one record in RAM (and more, if possible).

With 64-bit Linux you can put a lot more RAM on each box than previously possible. I've seen boxes that handle up to 256 GB of RAM, and I'm probably out of date on that figure. :)

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1368
Joined: Wed Oct 26, 2011 7:40 pm

