Recovering from an unplanned failure in a Metro Mirror or Global Mirror environment
About this task
This scenario is based on a PowerHA environment with external storage that uses Metro Mirror or Global Mirror. The example environment in this scenario uses Global Mirror, although the same sequence of steps applies to a Metro Mirror environment:
In this scenario:
Node A is the Primary cluster node with the production copy of the IASP.
Node B is the Backup cluster node, with the mirror copy of the IASP.
Replication is occurring in the direction from Node A to Node B.
A site failure in Data Center A makes it necessary to fail over production to Node B, which then becomes the Primary cluster node.
Procedure
Did you know?
All of the steps below leave the data at Data Center A as-is. There is no automatic reverse replication from the new primary back to Data Center A.
It is recommended to follow these steps as soon as the failure at Data Center A occurs. The decision about where to restart the production workload (at Data Center B, or at Data Center A once it is back online) can be made later. Performing these steps keeps both options open and reduces your recovery time objective (RTO) if the decision is made to restart the production workload at Data Center B.
Begin with the environment pictured above.
When an unplanned failure of Node A and/or External Storage 1 in Data Center A occurs, the following set of procedures should be followed to fail over to Data Center B:
Possible Failure Scenarios:
Automatic Failover from Node A to Node B
In many instances, Node A will send out a distress message as the node is going down. This indicates to Node B that it should take over and causes automatic failover processing to start.
Requirements for Automatic Failover:
The Cluster Resource Group (CRG) must have had a status of active prior to the failure.
The target node in the CRG recovery domain must have had a status of active in the CRG recovery domain prior to the failure.
The Metro Mirror or Global Mirror replication must have a copy status of active at the time of the failover.
The failing node must have an opportunity to send out a distress message.
There cannot be a QCST_CRG_CANCEL_FAILOVER policy disabling automatic failover for the type of failure event that occurred.
In the event that the requirements for an automatic failover are not met, the following procedure demonstrates the appropriate manual failover steps.
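The requirements listed above can be expressed as a simple checklist. The following is a minimal sketch in Python, not part of PowerHA: the class, field, and function names are hypothetical, and the values would have to be gathered by an operator from the cluster displays. It only illustrates that every requirement must hold for an automatic failover to be expected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreFailureState:
    """Hypothetical record of the conditions observed around the failure."""
    crg_was_active: bool                  # CRG status was Active before the failure
    target_node_was_active: bool          # target node was Active in the recovery domain
    copy_status_active: bool              # Metro Mirror or Global Mirror copy status was active
    distress_message_sent: bool           # failing node had a chance to send a distress message
    cancel_failover_policy_applies: bool  # a QCST_CRG_CANCEL_FAILOVER policy covers this event


def unmet_requirements(state: PreFailureState) -> List[str]:
    """Return a description of each automatic failover requirement that is not met."""
    unmet = []
    if not state.crg_was_active:
        unmet.append("CRG was not active prior to the failure")
    if not state.target_node_was_active:
        unmet.append("target node was not active in the CRG recovery domain")
    if not state.copy_status_active:
        unmet.append("replication copy status was not active")
    if not state.distress_message_sent:
        unmet.append("failing node could not send a distress message")
    if state.cancel_failover_policy_applies:
        unmet.append("a QCST_CRG_CANCEL_FAILOVER policy disables automatic failover")
    return unmet


def automatic_failover_expected(state: PreFailureState) -> bool:
    """Automatic failover is expected only when no requirement is unmet."""
    return not unmet_requirements(state)
```

If `unmet_requirements` returns a non-empty list, the manual procedures described in the following sections apply.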
Automatic Failover Procedure
1. On the node that will become the new primary node (Node B), use either WRKCLU, option 6 (Work with Cluster Nodes), or the DSPCLUINF command to display the current cluster node status.
2. Verify that the status of the original primary node (Node A) is Inactive P, Inactive, or Failed.
Note: If the status of the node shows as Partition, continue to Resolving a Cluster Partition below.
3. If the status is Inactive P, wait until the status changes to either Inactive or Failed, pressing F5=Refresh to refresh the panel.
4. If a failover message queue is defined for either the CRG or the Cluster, a message will be present in the failover message queue asking whether to proceed with or cancel the failover.
5. Wait for the PowerHA failover processing to complete.
6. Vary on the IASP on the new primary node (NODEB).
7. Once the vary on of the IASP is complete, data on Node B can be accessed.
8. For information on choosing a direction to restart replication see Restoring the Environment below.
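The node status checks in the procedure above amount to a small decision table. The sketch below, in Python, is an illustration only: the function name and the returned action labels are hypothetical, and the status strings are assumed to match the values named in this procedure.

```python
def next_action_for_node_status(status: str) -> str:
    """Map the displayed status of the original primary node to the next action.

    The action labels are hypothetical names for the branches described in
    the Automatic Failover Procedure.
    """
    normalized = " ".join(status.split()).lower()
    if normalized == "partition":
        # The node may be down or merely unreachable.
        return "resolve-cluster-partition"
    if normalized == "inactive p":
        # Failover is still pending; refresh until Inactive or Failed.
        return "wait-and-refresh"
    if normalized in ("inactive", "failed"):
        # Failover processing can proceed; continue with the remaining steps.
        return "continue-failover"
    # Any other value is not covered by this procedure.
    return "not-covered"
```

For example, `next_action_for_node_status("Partition")` selects the Resolving a Cluster Partition branch, while a status of `Inactive P` means waiting and refreshing the panel.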
Resolving a Cluster Partition
In some instances, Node A does not have the opportunity to send out a distress message before the node fails. In these instances, PowerHA is unable to automatically determine if the failure is a true failure of a system, or a temporary communication failure that will automatically resolve itself.
Resolving a Cluster Partition Procedure
1. On the node that will become the new primary node (Node B), use either WRKCLU, option 6 (Work with Cluster Nodes), or the DSPCLUINF command to display the current cluster node status.
2. Verify that the status of the original primary node (Node A) is Partition.
3. Use the Change Cluster Node Entry (CHGCLUNODE) command with the *CHGSTS option to change the status of the node from Partition to Failed. This indicates to PowerHA that the node is actually down, and that the partition condition is not the result of a temporary network communication issue. In this example, the command is:
CHGCLUNODE CLUSTER(MYCLU) NODE(NODEA) OPTION(*CHGSTS)
4. Display the cluster resource groups using the Display CRG Information (DSPCRGINF) command by typing DSPCRGINF with no parameters.
5. Follow the appropriate steps below, depending on the CRG status and primary node:
If the CRG status is Inactive and the primary node is the new primary node (NODEB), continue to step 6 to vary on the IASP.
If the CRG status is Inactive and the primary node is the original primary node (NODEA), continue to Detaching Replication below.
6. Vary on the IASP on the new primary node (NODEB).
7. Once the vary on of the IASP is complete, data on Node B can be accessed.
8. For information on choosing a direction to restart replication see Restoring the Environment below.
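The branch on CRG status and primary node in the procedure above can be sketched as follows. This is a hypothetical Python helper, not a PowerHA interface: the function name and returned labels are assumptions, and the node names default to the NODEA and NODEB names used in this scenario.

```python
from typing import Optional


def next_step_after_partition_resolved(
    crg_status: str,
    crg_primary: str,
    new_primary: str = "NODEB",
    original_primary: str = "NODEA",
) -> Optional[str]:
    """Choose the next step once the partitioned node has been marked Failed.

    Returns a hypothetical label for the branch to follow, or None when the
    combination is not one of the two cases described in the procedure.
    """
    if crg_status.strip().lower() != "inactive":
        return None  # the procedure only describes cases where the CRG is Inactive
    if crg_primary == new_primary:
        # Failover completed: the IASP can be varied on at the new primary.
        return "vary-on-iasp"
    if crg_primary == original_primary:
        # Failover did not complete: replication must be detached first.
        return "detach-replication"
    return None
```

A return value of `None` signals a state that this procedure does not describe, which is a cue to stop and investigate rather than to guess.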
Detaching Replication
These steps should be followed only if there is still no access to the IASP on Node B after following the steps in Automatic Failover and Resolving a Cluster Partition.
The steps under Automatic Failover and Resolving a Cluster Partition only enable access to data on Node B when replication is active at the time of the failover processing. In instances where replication is not active at the time of failover processing, additional procedures are required because the data at Data Center B may be back-level.
Detaching Replication Procedure
1. Display the cluster resource groups using the Display CRG Information (DSPCRGINF) command by typing DSPCRGINF with no parameters.
2. If the status of the CRG is anything other than Inactive, end the cluster resource group with the ENDCRG command. In this scenario, the command is:
ENDCRG CLUSTER(MYCLU) CRG(MYCRG)
3. Detach replication by using the *DETACH option on the Change Session command. See the appropriate procedure depending on the type of replication:
4. Vary on the IASP on Node B:
5. Once the vary on of the IASP is complete, data on Node B can be accessed.
6. For information on choosing a direction to restart replication see Restoring the Environment below.
Restoring the Environment
1. On the new primary cluster node (Node B), use either WRKCLU, option 6 (Work with Cluster Nodes), or the DSPCLUINF command to display the current cluster node status.
2. Verify that the status of all nodes is Active.
3. If one or more nodes have a status of either Inactive or Failed, start clustering by using the STRCLUNOD command. For example:
STRCLUNOD CLUSTER(MYCLU) NODE(NODEA)
4. End the cluster resource group if it currently has a status of Active or Indoubt by using the ENDCRG command. For example:
ENDCRG CLUSTER(MYCLU) CRG(MYCRG)
5. Display the PowerHA replication session using the display session command. See the appropriate procedure depending on the type of replication:
6. Verify that the copy status is detached.
7. Verify that the source node in the PowerHA session is the node that contains the copy of the data to keep. Data at the target node will be overwritten. If the source and target nodes are reversed, use the CHGCRG command to correct them. For example, if Node B is currently the primary and Node A is the backup node, but the copy of the IASP to keep is the copy on Node A, a command similar to the following would be used:
CHGCRG CLUSTER(MYCLU) CRG(MYCRG) CRGTYPE(*DEV) RCYDMNACN(*CHGCUR)
       RCYDMN((NODEB *BACKUP 1 DATACTRB *SAME *NONE)
              (NODEA *PRIMARY *LAST DATACTRA *SAME *NONE))
8. Verify the IASP is varied off on the target node.
9. Reattach the PowerHA session using the Change Session command. See the appropriate procedure depending on the type of replication:
10. A confirmation panel confirming the reattach is displayed. Verify that the source and target node are correct and press F16 to confirm.
11. Start the Cluster Resource Group using the STRCRG command. For example:
STRCRG CLUSTER(MYCLU) CRG(MYCRG)
12. If the current primary node is not the desired primary node, perform a switchover using the CHGCRGPRI command. For example:
CHGCRGPRI CLUSTER(MYCLU) CRG(MYCRG)
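Because reattaching overwrites the data at the target node, the checks in steps 6 through 8 are worth treating as hard preconditions. The following Python sketch shows one way to model them. It is an assumption-laden illustration: `SessionView`, its fields, and `reattach_blockers` are hypothetical names, and the values would come from the session display and the IASP status as observed by an operator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SessionView:
    """Hypothetical snapshot of what the operator sees before reattaching."""
    copy_status: str              # copy status shown for the replication session
    source_node: str              # node the session currently treats as source
    target_node: str              # node whose data will be overwritten
    target_iasp_varied_off: bool  # True when the IASP is varied off on the target


def reattach_blockers(session: SessionView, node_with_data_to_keep: str) -> List[str]:
    """Return the reasons a reattach should not be confirmed yet."""
    blockers = []
    if session.copy_status.strip().lower() != "detached":
        blockers.append("copy status is not detached")
    if session.source_node != node_with_data_to_keep:
        blockers.append("source node does not hold the data to keep; correct it with CHGCRG")
    if not session.target_iasp_varied_off:
        blockers.append("IASP is not varied off on the target node")
    return blockers
```

An empty list means the three checks pass. Any entry in the list is a reason to stop before pressing F16 on the confirmation panel.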