Tue Aug 21, 2018 11:58 pm


Cluster not updating with new environment.xml

Topics related to recommendations or questions on the design for HPCC Systems clusters

Mon Jun 09, 2014 11:55 pm

clo wrote: Hi, I was wondering what version of the platform you're currently running as well.


Sorry, I must have missed this! I am running "community_4.2.4-3".
fmorstatter
 
Posts: 10
Joined: Thu Jun 05, 2014 8:28 pm

Tue Jun 10, 2014 8:00 am

0000001C 2014-06-09 14:51:38.188 8000 8000 "Listening for graph"
0000001D 2014-06-09 14:51:38.191 8000 8013 "WARNING: /var/lib/jenkins/workspace/CE-Candidate-with-plugins-4.2.4-3/CE/ubuntu-12.04-amd64/HPCC-Platform/system/mp/mpcomm.cpp(2225) : CInterCommunicator: ignoring closed endpoint: 128.2.219.77:20100"


I think that suggests that the slave started, registered and then immediately exited/crashed.

Can you attach some slave logging around this time frame?
jsmith
Community Advisory Board Member
Posts: 70
Joined: Tue Jul 19, 2011 12:58 pm

Tue Jun 10, 2014 1:30 pm

You are right! Here is the error in the log:

0000000A 2014-06-10 09:28:33.871 23215 23215 "Disk space: /var/lib/HPCCSystems/hpcc-data/thor = 871987 MB, /var/lib/HPCCSystems/hpcc-mirror/thor = 871987 MB, /var/lib/HPCCSystems/mythor/temp = 871987 MB"
0000000B 2014-06-10 09:28:33.871 23215 23215 "ThorSlave Version LCR - 4.1 started"
0000000C 2014-06-10 09:28:33.871 23215 23215 "Slave 128.2.219.77:20100 - temporary dir set to : /var/lib/HPCCSystems/mythor/temp/"
0000000D 2014-06-10 09:28:33.871 23215 23215 "Using querySo directory: /var/lib/HPCCSystems/queries/mythor_20100"
0000000E 2014-06-10 09:28:33.871 23215 23215 "WARNING: Slave has less memory than master node"
0000000F 2014-06-10 09:28:33.871 23215 23215 "RoxieMemMgr: Setting memory limit to 9445572608 bytes (9008 pages)"
00000010 2014-06-10 09:28:33.872 23215 23215 "RoxieMemMgr: posix_memalign (alignment=1048576, size=9462349824) failed - ret=12 (ENOMEM There was insufficient memory to fulfill the allocation request.)"
00000011 2014-06-10 09:28:33.872 23215 23215 "ERROR: 1303: /var/lib/jenkins/workspace/CE-Candidate-with-plugins-4.2.4-3/CE/ubuntu-12.04-amd64/HPCC-Platform/thorlcr/slave/thslavemain.cpp(417) : ThorSlave : RoxieMemMgr: Unable to create heap"
00000012 2014-06-10 09:28:33.872 23215 23215 "temp directory cleared"
fmorstatter

Tue Jun 10, 2014 4:44 pm

Right, the handling of that error could certainly be improved (I've opened a new JIRA issue, HPCC-11651, to track it).

Thor automatically configures the amount of memory the slaves use by examining the amount of physical memory and dedicating 75% of it to itself.
It assumes that the master and slaves are all homogeneous.

It looks like your master has 12GB in this case (Thor decided to use ~9GB of it), which seems to be more than your slaves have.
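To make the numbers in the slave log concrete, here is a rough Python sketch of the 75% heuristic described above. It assumes Thor's heap is counted in 1 MiB pages (inferred from `alignment=1048576` and the "9008 pages" figure in the log); the function names are hypothetical and this is only an approximation, not Thor's actual sizing code:

```python
PAGE_SIZE = 1 << 20  # 1 MiB per page; assumption inferred from alignment=1048576 in the log


def thor_heap_pages(physical_bytes: int, fraction: float = 0.75) -> int:
    """Approximate heap size: `fraction` of physical memory, in whole 1 MiB pages (hypothetical helper)."""
    return int(physical_bytes * fraction) // PAGE_SIZE


def implied_physical_mb(heap_pages: int, fraction: float = 0.75) -> float:
    """Invert the heuristic: physical memory (MiB) that would yield `heap_pages` (hypothetical helper)."""
    return heap_pages / fraction


# Values copied from the slave log above.
limit_bytes = 9445572608                     # "RoxieMemMgr: Setting memory limit to ..."
pages = limit_bytes // PAGE_SIZE             # 9008, matches "(9008 pages)"
requested_pages = 9462349824 // PAGE_SIZE    # 9024, size of the failed posix_memalign call

print(pages)                                 # 9008
print(round(implied_physical_mb(pages)))     # 12011 MiB, i.e. roughly 12 GB on the master
```

Working backwards from the 9008-page limit gives roughly 12 GB of physical memory, so a slave with noticeably less than that cannot satisfy the ~9 GB `posix_memalign` request and fails with ENOMEM.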

You can manually configure how much memory the nodes use by setting 'globalMemorySize' in the environment. You probably want to set it to 75% of the physical memory of your slaves.
The master will use the same property if set, or you can override it for the master by defining 'masterMemorySize'.
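The config manager is the usual way to change these settings, but if you want to see what the change amounts to in environment.xml, the sketch below uses Python's `xml.etree.ElementTree`. The environment fragment is hypothetical and heavily trimmed, placing the attributes on the `ThorCluster` element is an assumption, and the values are assumed to be in MB; check your own environment.xml (or the config manager) before relying on any of it:

```python
import xml.etree.ElementTree as ET


def recommended_memory_mb(physical_mb: int, fraction: float = 0.75) -> int:
    """Hypothetical helper: 75% of a node's physical memory, in MB (units are an assumption)."""
    return int(physical_mb * fraction)


# Hypothetical, heavily trimmed environment.xml fragment; a real file has many more elements/attributes.
ENV = '<Environment><Software><ThorCluster name="mythor"/></Software></Environment>'

root = ET.fromstring(ENV)
thor = root.find("./Software/ThorCluster[@name='mythor']")

# Size the shared setting from the smaller (slave) nodes, e.g. slaves with 8 GB.
thor.set("globalMemorySize", str(recommended_memory_mb(8192)))
# Optional override for a master with more memory, e.g. 12 GB.
thor.set("masterMemorySize", str(recommended_memory_mb(12288)))

print(ET.tostring(root, encoding="unicode"))
```

Sizing 'globalMemorySize' from the slaves keeps every slave's heap allocation within its physical memory, while 'masterMemorySize' lets the larger master keep using more.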

Hope that helps.
jsmith

Tue Jun 10, 2014 4:59 pm

Thank you for the reply! When you say "set the property" do you mean that it is an environment variable in the system, or do I set it using the configuration manager?
fmorstatter

Tue Jun 10, 2014 5:02 pm

It's an HPCC environment.xml setting, which you can configure using the config manager.
jsmith

Tue Jun 10, 2014 9:51 pm

Thank you! This did the trick. The cluster is up and running.
fmorstatter