
Thor swap failback

Topics related to recommendations or questions on the design for HPCC Systems clusters

Mon Aug 05, 2013 5:14 pm

I have successfully swapped a failed Thor node for my backup node. Now the hard drive is repaired in the failed Thor node. How do I put it back in service?
Posts: 5
Joined: Wed Jul 31, 2013 1:54 pm

Tue Aug 06, 2013 6:01 pm


We need to clarify a couple of things before I can offer a proper response.

How did you run the swap node?
Does your system work now that you've swapped in the standby node for the failed node?
If it's working, then you can just add the repaired node back in as a standby node.
Posts: 51
Joined: Thu May 12, 2011 11:57 am

Wed Aug 07, 2013 1:09 am

- I shut down Thor and forced the swap manually from ESP.
- Yes, the system was working on the spare.
- I tried to configure the original server as a spare, but that did not seem to work. I received an error like the one below when I submitted a job that accessed the existing logical files. Do I need to force the cluster to copy the mirrored data back to this node somehow? (This also raises the question of why the mirror is not working.)

10004: System error: 10004: Graph[1], SLAVE Graph[1], csvread[2]: No physical file part for logical file
Posts: 5
Joined: Wed Jul 31, 2013 1:54 pm

Wed Aug 07, 2013 3:00 pm

Is/was replication turned on in the Thor Cluster?
(replicateOutputs and replicateAsync would need to be set to true in configmgr.)

The error ('No physical file part for logical file') should list the location where it looked for the part. If replication is on, it should state both the path of the primary part and the path of the replicate part on the buddy node.
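
As a quick way to double-check those two settings outside configmgr, here is a minimal Python sketch. It assumes that the generated environment file stores them as `replicateOutputs` and `replicateAsync` attributes on a `ThorCluster` element; that layout, the helper name `thor_replication_settings`, and the sample XML are illustrative assumptions, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET


def thor_replication_settings(environment_xml):
    """Map each Thor cluster name to its replication flags.

    Assumption: each cluster is a <ThorCluster> element carrying
    replicateOutputs / replicateAsync as attributes.
    A flag is None when the attribute is absent.
    """
    root = ET.fromstring(environment_xml)
    settings = {}
    for thor in root.iter("ThorCluster"):
        name = thor.get("name", "(unnamed)")
        flags = {}
        for key in ("replicateOutputs", "replicateAsync"):
            value = thor.get(key)
            flags[key] = None if value is None else value.strip().lower() == "true"
        settings[name] = flags
    return settings


# Hypothetical sample; a real environment file has many more elements.
sample = """<Environment>
  <Software>
    <ThorCluster name="mythor" replicateOutputs="true" replicateAsync="false"/>
  </Software>
</Environment>"""

print(thor_replication_settings(sample))
```

To run it against a real system, read the contents of your environment file into a string and pass that in place of `sample`. Any flag that comes back as `False` or `None` is worth reviewing in configmgr.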
Community Advisory Board Member
Posts: 68
Joined: Tue Jul 19, 2011 12:58 pm

Thu Aug 08, 2013 4:01 am

Yes, replication is enabled, but I think I found the problem. It looks like replication was disabled in the DFU jobs that placed the files into the Thor cluster. I will have the DFU jobs run again with replication enabled and try again.
Posts: 5
Joined: Wed Jul 31, 2013 1:54 pm

Mon Aug 19, 2013 10:35 am

Shouldn't the backupnode process have replicated the files anyway?

How/when do we run backupnode on these systems - is it something the user has to do manually (or set up via cron), or is it automatic? Jake?
Community Advisory Board Member
Posts: 105
Joined: Fri Jun 17, 2011 8:59 am
