Swapping Back in a Failed Node

Topics related to recommendations or questions on the design for HPCC Systems clusters

Fri Oct 16, 2015 9:05 am

Hi There,

We recently had an issue on one of the Thor nodes and had to use swap node to get the cluster back in action, which worked perfectly.

We have rectified the issue on the original node and we want to swap it back again.

When we look in ECLWatch at Cluster Processes -> mythor, it does not have the original box in the list of processes.

It is in the list of machines (.29), but it isn't a spare, so we can't select it as an option to swap back to.

Does anyone know the steps to swap a node back in (that is, back to the original node that was swapped out)? It has the same IP as before.

See the attached screenshot.

Thanks in advance

[Attachment: image.png]
amillar
 
Posts: 14
Joined: Fri Oct 16, 2015 7:32 am

Fri Oct 16, 2015 2:21 pm

Hi, I'm trying to verify proper procedure at the moment. In the meantime, would it be possible for you to respond with the version of the platform that you're running as well as the topology of the cluster?

To get the topology of the thor, please do the following:

1. Click on the Operations button in the top row of icons
2. Click on Target Clusters on the second row of options that appears.
3. Click on the name of the thor that you're trying to investigate and it should expand the list of components that are attached to that Thor.

Thanks.
clo
 
Posts: 51
Joined: Thu May 12, 2011 11:57 am

Mon Oct 19, 2015 10:22 am

Hi There,

Thanks for getting back to me. I have followed your instructions to get the information. If you need anything else, please let me know.

Thanks in advance

Platform Version: 5.2.0-1

Topology:

Name        Component    Process       IP address     Domain       OS
node020021  Thor Master  -             192.168.20.21  localdomain  Linux
node020024  Thor Slave   [mythor, 1]   192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 2]   192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 3]   192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 4]   192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 5]   192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 6]   192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 7]   192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 8]   192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 9]   192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 10]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 11]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 12]  192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 13]  192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 14]  192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 15]  192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 16]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 17]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 18]  192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 19]  192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 20]  192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 21]  192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 22]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 23]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 24]  192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 25]  192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 26]  192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 27]  192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 28]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 29]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 30]  192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 31]  192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 32]  192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 33]  192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 34]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 35]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 36]  192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 37]  192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 38]  192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 39]  192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 40]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 41]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 42]  192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 43]  192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 44]  192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 45]  192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 46]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 47]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 48]  192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 49]  192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 50]  192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 51]  192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 52]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 53]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 54]  192.168.20.23  localdomain  Linux
node020024  Thor Slave   [mythor, 55]  192.168.20.24  localdomain  Linux
node020025  Thor Slave   [mythor, 56]  192.168.20.25  localdomain  Linux
node020026  Thor Slave   [mythor, 57]  192.168.20.26  localdomain  Linux
node020027  Thor Slave   [mythor, 58]  192.168.20.27  localdomain  Linux
node020028  Thor Slave   [mythor, 59]  192.168.20.28  localdomain  Linux
node020023  Thor Slave   [mythor, 60]  192.168.20.23  localdomain  Linux
node020023  Thor Spare   -             192.168.20.23  localdomain  Linux
node020030  Thor Spare   -             192.168.20.30  localdomain  Linux
amillar
 
Posts: 14
Joined: Fri Oct 16, 2015 7:32 am

Tue Oct 20, 2015 12:02 pm

From our HPCC Systems team, here is the process:

1. Use the configmgr tool to set up the node that was swapped out as a spare node.

2. Push out the change (copy the updated environment.xml to all the nodes).

3. Restart the components so that they pick up the change.

4. You may be able to run the "updtdalienv" command-line tool to avoid restarting Dali.


/opt/HPCCSystems/bin/updtdalienv <path to the environment-xml-file> [-i <dali-ip>]

Assuming that the updated environment.xml file has been copied to /etc/HPCCSystems/environment.xml on all the nodes, the command should look like this:

[fernanux@node010241012201 ~]$ sudo /opt/HPCCSystems/bin/updtdalienv /etc/HPCCSystems/environment.xml -i 10.nnn.nnn.nnn
00000000 2015-10-20 07:05:09.710 52484 52484 "Environment and node groups updated in dali at 10.nnn.nnn.nnn:7070"
00000001 2015-10-20 07:05:09.710 52484 52484 "WARNING: New cluster layout for cluster thorxxx_spares
New cluster layout for cluster thorxxx_spares
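
Step 2 (copying the updated environment.xml to every node) could be sketched in shell roughly as follows. This is only an illustration: the `push_env` function and the `ENV_FILE` and `DRY_RUN` variables are hypothetical names, not part of the HPCC Systems tooling, and it assumes scp access with permission to write under /etc/HPCCSystems on each node. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper, not an HPCC Systems tool.
# ENV_FILE: path of environment.xml (same path on source and target nodes).
# DRY_RUN:  1 (default) prints the scp commands; 0 actually runs them.
ENV_FILE="${ENV_FILE:-/etc/HPCCSystems/environment.xml}"
DRY_RUN="${DRY_RUN:-1}"

push_env() {
    # Arguments: the IP address or host name of each node to update.
    for node in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "scp $ENV_FILE $node:$ENV_FILE"
        else
            # Stop at the first node that cannot be updated.
            scp "$ENV_FILE" "$node:$ENV_FILE" || return 1
        fi
    done
}
```

For example, `push_env 192.168.20.23 192.168.20.24` prints one scp command per node; setting `DRY_RUN=0` would perform the copies.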


HTH,

Bob (for Fernando)
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Tue Oct 20, 2015 3:39 pm

Make sure that after running "swap node", you sync up your data by running the backupnode utility from the thormaster node.

/opt/HPCCSystems/bin/start_backupnode
usage: /opt/HPCCSystems/bin/start_backupnode thor_cluster_name

In this example, the name of the Thor cluster is thor200_100:

/opt/HPCCSystems/bin/start_backupnode thor200_100
------------------------------
starting backupnode ...
Using backupnode directory /var/lib/HPCCSystems/hpcc-data/backupnode/last_backup
Reading slaves file /var/lib/HPCCSystems/thor200_100/backupnode.slaves
Scanning files from dali ...
------------------------------
------------------------------
Waiting for backup to complete
✔ complete at 11:30:54


Note: This process can take some time to complete, depending on the amount of data to be restored. It does two things:

1- The data in hpcc-mirror on the replicate node is copied to the primary location on the "new node".
2- The hpcc-mirror directory on the "new node" is populated with replicate data from the appropriate node.



Additionally, you should make sure the "fixed" node has:
1- The OS installed along with the matching HPCCSystems software build.
2- Empty Thor data directories:

    /var/lib/HPCCSystems/hpcc-data
    /var/lib/HPCCSystems/hpcc-mirror
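
The empty-directory requirement above could be checked on the repaired node before it is swapped back in, with something like the sketch below. The function names `dir_is_empty` and `check_thor_dirs` are made up for illustration and are not HPCC Systems commands; the sketch only reports what it finds and does not delete anything.

```shell
#!/bin/sh
# Hypothetical pre-check, not an HPCC Systems tool.

dir_is_empty() {
    # Succeeds only when $1 is an existing directory with no entries
    # (ls -A also lists dotfiles, so hidden files count as content).
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

check_thor_dirs() {
    # Arguments: the directories that must be empty.
    # Returns non-zero if any of them is missing or contains files.
    rc=0
    for d in "$@"; do
        if dir_is_empty "$d"; then
            echo "OK: $d is empty"
        else
            echo "NOT EMPTY (or missing): $d"
            rc=1
        fi
    done
    return $rc
}
```

It would be run on the repaired node as `check_thor_dirs /var/lib/HPCCSystems/hpcc-data /var/lib/HPCCSystems/hpcc-mirror`.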
fernando
 
Posts: 5
Joined: Thu Jun 19, 2014 1:29 pm

Thu Oct 22, 2015 8:56 am

Hi Bob,

Thanks for the quick reply, it's very much appreciated.

I have one more question, though, if you don't mind.

Should we clear all the hpcc-data off the node before we put it back in?

Thanks in advance
amillar
 
Posts: 14
Joined: Fri Oct 16, 2015 7:32 am

Thu Oct 29, 2015 11:19 am

Yes!
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

