About this task
This scenario is based on a PowerHA environment that uses external storage Metro Mirror or Global Mirror. Below is an example of the environment used in this scenario. It uses Global Mirror, but the same sequence of steps applies to a Metro Mirror environment:
In this scenario:
Node A is the primary cluster node, with the production copy of the IASP.
Node B is the backup cluster node, with the mirror copy of the IASP.
Replication flows from Node A to Node B.
A site failure in Data Center A requires production to fail over to Node B, making Node B the primary cluster node.
Procedure
Begin with the environment pictured above.
When an unplanned failure of Node A, External Storage 1, or both occurs in Data Center A, follow the procedures below to fail over to Data Center B:
Possible Failure Scenarios:
Automatic Failover from Node A to Node B
In many cases, Node A sends a distress message as it goes down. The message tells Node B to take over and starts automatic failover processing.
Requirements for Automatic Failover:
The cluster resource group (CRG) must have had a status of Active before the failure.
The target node must have had a status of Active in the CRG recovery domain before the failure.
The Metro Mirror or Global Mirror replication must have a copy status of Active at the time of the failover.
The failing node must have had an opportunity to send a distress message.
A QCST_CRG_CANCEL_FAILOVER policy must not disable automatic failover for the type of failure event that occurred.
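The five requirements above form a simple checklist: automatic failover happens only if every item holds. The following Python sketch models that checklist for illustration only. `FailoverState`, `unmet_requirements`, and `automatic_failover_possible` are hypothetical names, not part of PowerHA, and the status strings are assumed to match what the cluster panels display.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailoverState:
    """Hypothetical snapshot of the conditions at the moment of failure."""
    crg_status: str              # CRG status before the failure, e.g. "Active"
    target_node_status: str      # target node status in the CRG recovery domain
    copy_status: str             # Metro Mirror / Global Mirror copy status
    distress_message_sent: bool  # did the failing node send a distress message?
    cancel_policy_applies: bool  # does QCST_CRG_CANCEL_FAILOVER cover this event?


def unmet_requirements(state: FailoverState) -> List[str]:
    """Return the automatic-failover requirements that are NOT satisfied."""
    problems = []
    if state.crg_status != "Active":
        problems.append("CRG was not Active before the failure")
    if state.target_node_status != "Active":
        problems.append("target node was not Active in the recovery domain")
    if state.copy_status != "Active":
        problems.append("replication copy status is not Active")
    if not state.distress_message_sent:
        problems.append("failing node did not send a distress message")
    if state.cancel_policy_applies:
        problems.append("QCST_CRG_CANCEL_FAILOVER policy disables automatic failover")
    return problems


def automatic_failover_possible(state: FailoverState) -> bool:
    """Automatic failover can occur only when every requirement is met."""
    return not unmet_requirements(state)


if __name__ == "__main__":
    # Node A failed without sending a distress message.
    state = FailoverState("Active", "Active", "Active", False, False)
    print(automatic_failover_possible(state))  # False
    print(unmet_requirements(state))
```

Returning the list of unmet items, rather than only a yes/no answer, shows which condition sends you to the manual steps.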
If the requirements for automatic failover are not met, follow the manual failover steps below.
Automatic Failover Procedure
1. On the node that will become the new primary node (Node B), use either WRKCLU option 6 (Work with Cluster Nodes) or the DSPCLUINF command to display the current cluster node status.
2. Verify that the status of the original primary node (Node A) is Inactive P, Inactive, or Failed.
Note: If the status of Node A is Partition, continue to Resolving a Cluster Partition below.
3. If the status is Inactive P, wait until it changes to Inactive or Failed, pressing F5=Refresh to refresh the panel.
4. If a failover message queue is defined for the CRG or the cluster, a message in that queue asks whether to proceed with or cancel the failover.
5. Wait for the PowerHA failover processing to complete.
6. Vary on the IASP on the new primary node (NODEB).
Work with the configuration status of the independent ASP (IASP) to check whether it is VARIED OFF or AVAILABLE. If the IASP has a status of AVAILABLE, it is already varied on and data on Node B can be accessed.
7. Once the IASP vary on is complete, data on Node B can be accessed.
8. For information on choosing the direction in which to restart replication, see Restoring the Environment below.
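Steps 2 and 3 of the procedure above amount to a small polling loop on Node A's status. The following Python sketch models that logic for illustration only. `read_status` is a hypothetical callable standing in for reading the status from WRKCLU option 6 or DSPCLUINF, and the action names returned by `next_action` are made up for this example; PowerHA does not provide these functions.

```python
import time
from typing import Callable


def next_action(node_a_status: str) -> str:
    """Map Node A's cluster node status to the next action in the procedure."""
    if node_a_status == "Inactive P":
        return "wait"               # step 3: refresh until Inactive or Failed
    if node_a_status in ("Inactive", "Failed"):
        return "continue"           # go on to step 4
    if node_a_status == "Partition":
        return "resolve partition"  # see Resolving a Cluster Partition
    return "not covered"            # any other status is outside this procedure


def wait_for_settled_status(read_status: Callable[[], str],
                            poll_seconds: float = 5.0,
                            max_polls: int = 60) -> str:
    """Re-read the status, like pressing F5=Refresh, while it is Inactive P.
    Gives up after max_polls refreshes and returns the last status seen."""
    status = read_status()
    polls = 0
    while status == "Inactive P" and polls < max_polls:
        time.sleep(poll_seconds)
        status = read_status()
        polls += 1
    return status


if __name__ == "__main__":
    # Simulated status readings: Inactive P twice, then Failed.
    statuses = iter(["Inactive P", "Inactive P", "Failed"])
    settled = wait_for_settled_status(statuses.__next__, poll_seconds=0)
    print(next_action(settled))  # continue
```

The `max_polls` limit is a design choice for the sketch: an operator refreshing the panel would likewise stop waiting at some point and investigate.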
Resolving a Cluster Partition
In some cases, Node A fails before it has the opportunity to send a distress message. PowerHA then cannot determine automatically whether the failure is a true system failure or a temporary communication failure that will resolve itself, and Node A is reported with a status of Partition.
Tip: Did you know? You can improve PowerHA's ability to detect failures by using PowerHA's HMC Advanced Node Failure Detection. See Advanced Node Failure Detection for more information.
Resolving a Cluster Partition Procedure
1. On the node that will become the new primary node (Node B), use either WRKCLU option 6 (Work with Cluster Nodes) or the DSPCLUINF command to display the current cluster node status.
2. Verify that the status of the original primary node (Node A) is Partition.
3. Use the Change Cluster Node Entry (CHGCLUNODE) command with the *CHGSTS option to change the status of the node from Partition to Failed. This tells PowerHA that the node is actually down and that the partition condition is not the result of a temporary network communication problem. In this example, the command is:
CHGCLUNODE CLUSTER(MYCLU) NODE(NODEA) OPTION(*CHGSTS)
4. Display the cluster resource groups by typing the Display CRG Information (DSPCRGINF) command with no parameters.
5. Follow the appropriate steps below, depending on the CRG status and primary node:
If the CRG status is Inactive and the primary node is the new primary node (NODEB), continue to step 6 to vary on the IASP.
If the CRG status is Inactive and the primary node is the original primary node (NODEA), continue to Detaching Replication below.
6. Vary on the IASP on the new primary node (NODEB).
Work with the configuration status of the independent ASP (IASP) to vary it on.
7. Once the IASP vary on is complete, data on Node B can be accessed.
8. For information on choosing the direction in which to restart replication, see Restoring the Environment below.
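Steps 3 to 5 of the procedure above can be summarized in two small functions. The following Python sketch is illustrative only. `chgclunode_command` reproduces the form of the CHGCLUNODE example, while `partition_next_step` is a hypothetical helper that encodes the two branches of step 5. The default node names NODEA and NODEB are taken from this scenario.

```python
def chgclunode_command(cluster: str, node: str) -> str:
    """Step 3: the command that changes a partitioned node's status to Failed."""
    return f"CHGCLUNODE CLUSTER({cluster}) NODE({node}) OPTION(*CHGSTS)"


def partition_next_step(crg_status: str, crg_primary: str,
                        new_primary: str = "NODEB",
                        original_primary: str = "NODEA") -> str:
    """Step 5: choose the next action from the CRG status and primary node
    reported by DSPCRGINF."""
    if crg_status == "Inactive" and crg_primary == new_primary:
        return "vary on the IASP (step 6)"
    if crg_status == "Inactive" and crg_primary == original_primary:
        return "Detaching Replication"
    # Step 5 lists only the two Inactive cases above.
    return "not covered by step 5"


if __name__ == "__main__":
    print(chgclunode_command("MYCLU", "NODEA"))
    print(partition_next_step("Inactive", "NODEA"))  # Detaching Replication
```

Any CRG status other than Inactive falls through to "not covered by step 5", because the procedure does not describe those cases here.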
Detaching Replication
Follow these steps only if the IASP on Node B is still not accessible after you complete the steps in Automatic Failover and Resolving a Cluster Partition.
The steps in Automatic Failover and Resolving a Cluster Partition enable access to data on Node B only when replication is active during failover processing. If replication is not active during failover processing, the data at Data Center B may be back-level, so additional steps are required.
Warning: Because Global Mirror is asynchronous, in Global Mirror environments some data might not have been received by the storage at Data Center B when you follow these procedures. This potential data loss is the Recovery Point Objective (RPO) trade-off of allowing Global Mirror to span the globe.
Detaching Replication Procedure
1. Display the cluster resource groups by typing the Display CRG Information (DSPCRGINF) command with no parameters.
2. If the status of the CRG is anything other than Inactive, end the cluster resource group with the ENDCRG command. In this scenario, the command is:
ENDCRG CLUSTER(MYCLU) CRG(MYCRG)
3. Detach replication by using the *DETACH option on the Change Session command for your type of replication:
To detach SVC/Storwize Metro Mirror or Global Mirror sessions, use the CHGSVCSSN command.
To detach DS8000 CSM Metro Mirror or Global Mirror sessions, use the CHGCSMSSN command.
To detach DS8000 ASP Metro Mirror or Global Mirror sessions, use the CHGASPSSN command.
4. Vary on the IASP on Node B.
5. Once the IASP vary on is complete, data on Node B can be accessed.
6. For information on choosing the direction in which to restart replication, see Restoring the Environment below.
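Steps 2 and 3 of the procedure above can be sketched as a small planning function that only assembles text for review. The following Python example is illustrative. `detach_plan`, `endcrg_if_needed`, and the replication-type keys are hypothetical names. The ENDCRG form matches the example in step 2. The detach action returns only the command name and the *DETACH option, because this procedure does not show the session parameters of the Change Session commands; prompt the command with F4 to supply them.

```python
from typing import List, Optional

# Change Session command for each replication type, as listed in step 3.
CHANGE_SESSION_COMMAND = {
    "SVC/Storwize": "CHGSVCSSN",
    "DS8000 CSM": "CHGCSMSSN",
    "DS8000 ASP": "CHGASPSSN",
}


def endcrg_if_needed(cluster: str, crg: str, crg_status: str) -> Optional[str]:
    """Step 2: end the CRG unless its status is already Inactive."""
    if crg_status == "Inactive":
        return None
    return f"ENDCRG CLUSTER({cluster}) CRG({crg})"


def detach_plan(cluster: str, crg: str, crg_status: str,
                replication_type: str) -> List[str]:
    """Steps 2 and 3: return the actions to perform, in order.
    Raises KeyError for a replication type not listed in step 3."""
    plan = []
    endcrg = endcrg_if_needed(cluster, crg, crg_status)
    if endcrg is not None:
        plan.append(endcrg)
    # Only the command name and option are produced; the session
    # parameters are not shown in this procedure.
    plan.append(f"{CHANGE_SESSION_COMMAND[replication_type]} with option *DETACH")
    return plan


if __name__ == "__main__":
    for action in detach_plan("MYCLU", "MYCRG", "Indoubt", "SVC/Storwize"):
        print(action)
```

Looking up the command in a table, rather than branching on each replication type, makes an unsupported type fail loudly with a KeyError.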
Restoring the Environment
1. On the new primary cluster node (Node B), use either WRKCLU option 6 (Work with Cluster Nodes) or the DSPCLUINF command to display the current cluster node status.
2. Verify that the status of all nodes is Active.
3. If one or more nodes have a status of Inactive or Failed, start clustering on those nodes by using the STRCLUNOD command. For example:
STRCLUNOD CLUSTER(MYCLU) NODE(NODEA)
4. If the cluster resource group has a status of Active or Indoubt, end it by using the ENDCRG command. For example:
ENDCRG CLUSTER(MYCLU) CRG(MYCRG)
5. Display the PowerHA replication session by using the Display Session command for your type of replication:
To display SVC/Storwize Metro Mirror or Global Mirror sessions, use the DSPSVCSSN command.
To display DS8000 CSM Metro Mirror or Global Mirror sessions, use the DSPCSMSSN command.
To display DS8000 ASP Metro Mirror or Global Mirror sessions, use the DSPASPSSN command.
6. Verify that the copy status is detached.
7. Verify that the source node in the PowerHA session is the node that contains the copy of the data to keep, because data at the target node will be overwritten. If the source and target nodes are reversed, use the CHGCRG command to correct them. For example, if Node B is currently the primary node and Node A is the backup node, but the copy of the IASP to keep is the one on Node A, use a command similar to the following:
CHGCRG CLUSTER(MYCLU)
       CRG(MYCRG)
       CRGTYPE(*DEV)
       RCYDMNACN(*CHGCUR)
       RCYDMN((NODEB *BACKUP 1 DATACTRB *SAME *NONE)
              (NODEA *PRIMARY *LAST DATACTRA *SAME *NONE))
8. Verify that the IASP is varied off on the target node.
9. Reattach the PowerHA session by using the Change Session command for your type of replication:
To reattach SVC/Storwize Metro Mirror or Global Mirror sessions, use the CHGSVCSSN command.
To reattach DS8000 CSM Metro Mirror or Global Mirror sessions, use the CHGCSMSSN command.
To reattach DS8000 ASP Metro Mirror or Global Mirror sessions, use the CHGASPSSN command.
10. A confirmation panel for the reattach is displayed. Verify that the source and target nodes are correct, and then press F16 to confirm.
Important: The data on the node listed as the target node on the confirmation panel will be overwritten by the data on the node listed as the source node. If the nodes are incorrect, use …
11. Start the cluster resource group by using the STRCRG command. For example:
STRCRG CLUSTER(MYCLU) CRG(MYCRG)
12. If the current primary node is not the desired primary node, perform a switchover by using the CHGCRGPRI command. For example:
CHGCRGPRI CLUSTER(MYCLU) CRG(MYCRG)