Thor swap failback

Topics related to recommendations or questions on the design for HPCC Systems clusters

Mon Aug 05, 2013 5:14 pm

I have successfully swapped a failed Thor node for my backup node. Now the hard drive is repaired in the failed Thor node. How do I put it back in service?
JSJ
 
Posts: 5
Joined: Wed Jul 31, 2013 1:54 pm

Tue Aug 06, 2013 6:01 pm

Hi,

We need to clarify a couple of things before I can offer a proper response.

How did you run the swap node?
Does your system work now that you've swapped in the standby node for the failed node?
If it's working, then you can just add the repaired node back in as a standby node.
clo
 
Posts: 51
Joined: Thu May 12, 2011 11:57 am

Wed Aug 07, 2013 1:09 am

- I shut down Thor and forced the swap manually from ESP.
- Yes, the system was working on the spare.
- I tried to configure the original server as a spare, but that did not seem to work. I received an error like the one below when I submitted a job that accessed the existing logical files. Do I need to force the cluster to copy the mirrored data back to this node somehow? (This also raises the question of why the mirror is not working.)

10004: System error: 10004: Graph[1], SLAVE xxx.xxx.xxx.xxx:20100: Graph[1], csvread[2]: No physical file part for logical file
JSJ
 
Posts: 5
Joined: Wed Jul 31, 2013 1:54 pm
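
When an error like the one quoted above shows up, the slave address and the activity name are the parts that say which node and which step to inspect. A minimal sketch for pulling them out of the message, assuming other Thor errors follow the same layout as this single example (the function name `parse_thor_error` and the pattern are hypothetical, not part of HPCC):

```python
import re

# Pattern derived only from the one error line quoted in this thread;
# other Thor messages may be laid out differently (assumption).
ERROR_RE = re.compile(
    r"SLAVE (?P<host>[^:\s]+):(?P<port>\d+): "
    r"Graph\[(?P<graph>\d+)\], "
    r"(?P<activity>\w+)\[(?P<activity_id>\d+)\]: "
    r"(?P<message>.*)"
)


def parse_thor_error(line):
    """Return the slave, graph, activity and message fields, or None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    for key in ("port", "graph", "activity_id"):
        info[key] = int(info[key])
    return info
```

Feeding it the full error text returns a dictionary whose `host` entry is the slave that could not find its file part.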

Wed Aug 07, 2013 3:00 pm

Is/was replication turned on in the Thor cluster?
(replicateOutputs and replicateAsync would need to be set to true in configmgr.)

The error ('No physical file part for logical file') should list the location where it looked for the part. If replication is on, it should state both the path of the primary part and the path to the replica part on the buddy node.
jsmith
Community Advisory Board Member
 
Posts: 70
Joined: Tue Jul 19, 2011 12:58 pm
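
The two settings named in the reply above can also be checked directly in the generated environment file rather than through configmgr. A minimal sketch, assuming the settings are stored as attributes on a `ThorCluster` element in environment.xml (the element name and layout are assumptions here, and `thor_replication_settings` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

# Attribute names come from the reply above; that they live on a
# "ThorCluster" element is an assumption about environment.xml.
REPLICATION_ATTRS = ("replicateOutputs", "replicateAsync")


def thor_replication_settings(xml_text):
    """Map each ThorCluster name to its replication attribute values.

    A value of None means the attribute is absent from the element.
    """
    root = ET.fromstring(xml_text)
    settings = {}
    for cluster in root.iter("ThorCluster"):
        name = cluster.get("name", "(unnamed)")
        settings[name] = {attr: cluster.get(attr) for attr in REPLICATION_ATTRS}
    return settings
```

Pass it the contents of your environment file (often found at /etc/HPCCSystems/environment.xml, though your installation may differ) and look for any cluster where either value is missing or not "true".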

Thu Aug 08, 2013 4:01 am

Yes, replication is enabled, but I think I found the problem. It looks like replication was disabled in the DFU jobs that placed the files into the thor cluster. I will have the DFU jobs run again with replication enabled and try again.
JSJ
 
Posts: 5
Joined: Wed Jul 31, 2013 1:54 pm

Mon Aug 19, 2013 10:35 am

Shouldn't the backupnode process have replicated the files anyway?

How and when do we run backupnode on these systems? Is it something the user has to do manually (or set up via cron), or is it automatic? Jake?
richardkchapman
Community Advisory Board Member
 
Posts: 108
Joined: Fri Jun 17, 2011 8:59 am

