Thor shutting down

Questions related to node architecture, redundancy and system monitoring

Fri May 29, 2015 8:00 pm

Hi

We are having issues with our Thor cluster: the system seems to shut down periodically. I have attached our thormaster log. If somebody could point me in the right direction, it would be a great help.

Kind regards

David
Attachments
thormaster.log
(19.25 KiB)
David Dasher
 
Posts: 56
Joined: Tue Feb 18, 2014 9:17 am

Fri May 29, 2015 9:25 pm

Hello

Can someone tell us what might be causing the segfault? The slave log is attached.

Also, how do we replace a node with a new one?

Are there instructions?

Kind regards

David
Attachments
thorslave.1.2015_05_29.log
(52.25 KiB)
David Dasher
 

Mon Jun 01, 2015 1:51 pm

David,

Our Operations guys took a look at your logs and told me: "both logs point to 10.12.0.24 having problems... I would try and reboot that node... if that does not correct the issue, re-install the software"

And replacing a node is covered in our Systems Administrators Guide (PDF downloadable here: http://hpccsystems.com/download/docs/installation-and-administration). I found it on page 90.
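If you want to check for yourself which node the logs keep pointing at, one rough approach is to count the IP addresses that show up on lines mentioning an error. The following is only a minimal sketch: the keyword list, the function name count_suspect_nodes, and the sample lines are all made up for illustration and are not taken from the attached logs or from any HPCC tool, so adjust them to what your logs actually print.

```python
import re
from collections import Counter

# Matches dotted-quad IPv4 addresses such as 10.12.0.24.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical keywords; change these to match your own log messages.
KEYWORDS = ("error", "exception", "segfault", "sigsegv", "fail")


def count_suspect_nodes(lines, keywords=KEYWORDS):
    """Count IP addresses that appear on lines containing any keyword."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            counts.update(IP_PATTERN.findall(line))
    return counts


# Invented sample lines for illustration only (not from the real logs).
sample = [
    "00000001 2015-05-29 19:58:01 ERROR: connection lost to 10.12.0.24:20100",
    "00000002 2015-05-29 19:58:02 heartbeat ok from 10.12.0.21:20100",
    "00000003 2015-05-29 19:58:05 SIGSEGV reported by slave 10.12.0.24",
    "00000004 2015-05-29 19:58:09 failed to reach 10.12.0.22:20100",
]

# Print the addresses mentioned on error lines, most frequent first.
for ip, hits in count_suspect_nodes(sample).most_common():
    print(ip, hits)
```

To run it against a real file, pass an open file object instead of the sample list, for example the thormaster.log or thorslave log from the node in question. A node that dominates the count is a reasonable first candidate for the reboot suggested above, though a high count alone does not prove that node is the cause.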

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1606
Joined: Wed Oct 26, 2011 7:40 pm

