Resource limit spill: Heavyweight (2>1)

Questions around writing code and queries

Fri Sep 08, 2017 2:37 pm

Hello,

I am looking through a workunit that is taking a long time to run. It processes large files, and I keep seeing the message "Resource limit spill: Heavyweight (2>1)".

What does it mean?

Thanks.
georgeb2d
 
Posts: 93
Joined: Wed Dec 24, 2014 3:36 pm

Mon Sep 11, 2017 4:08 pm

georgeb2d,

The term "spill" indicates that there's too much data at that point in the process to maintain it all in memory, so a "spill to disk" is happening. Disk I/O being the slowest part of computing, this is guaranteed to slow things down.

The general rule for avoiding as much of this as possible is to make sure that, at every step in your process, you are working with only the data that step actually needs (vertical slice TABLEs can help with that). Another option would be to increase the number of nodes in your Thor cluster. A third possibility, if this is caused by heavily skewed data, would be to change your process logic so the skew doesn't affect the processing.
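
As a rough illustration of the vertical slice idea, here is a minimal ECL sketch. The record layout, field names, and logical file name are all hypothetical and made up for this example; the only point is that TABLE with a reduced record structure carries forward just the fields the later steps need.

```ecl
// Hypothetical layout and logical file name, for illustration only.
PersonRec := RECORD
  UNSIGNED8 id;
  STRING30  lastName;
  STRING30  firstName;
  STRING200 notes;      // wide field that the later steps never use
END;

persons := DATASET('~example::persons', PersonRec, THOR);

// Vertical slice: keep only the fields the later steps actually need.
slim := TABLE(persons, {persons.id, persons.lastName});

OUTPUT(COUNT(slim));
```

Whether this reduces spilling in a given workunit depends on the data and on the rest of the graph, so treat it as a starting point rather than a fix.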

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1370
Joined: Wed Oct 26, 2011 7:40 pm

Tue Sep 12, 2017 8:51 pm

Thanks. That is what I suspected, but I could not find any documentation to support it.
georgeb2d
 

Thu Sep 14, 2017 6:45 pm

I am watching a graph as it runs. Well before most of it has executed, it already shows disk spills, along with "Resource limit spill: Heavyweight (2>1)", in parts that have not run yet. This is before the program even knows how many records will result from a join, etc.

That makes me think there is something else going on. It looks like the compiler is assuming there will be too much data, whether there is or is not.

I am working with a system where the CPU is the bottleneck when it does a disk spill, etc. Currently it is cost-prohibitive to upgrade the CPU, so I am wondering if there is a way to limit these disk spills to cases where these are really needed.
georgeb2d
 

Thu Sep 14, 2017 9:15 pm

georgeb2d wrote: "I am wondering if there is a way to limit these disk spills to cases where these are really needed."

I would expect the developers to answer this with, "They are limited to only those cases where they are really needed." In other words, this is a situation to report through JIRA so the developers can take a direct look at what you're doing.

HTH,

Richard
rtaylor

