High availability (HA) matters for any application, and it is strongly recommended for a monitoring application. An HA design keeps monitoring running and the service available without interruption.
Since System Center 2012, HA has been easier to set up thanks to resource pools: each member of the pool synchronizes the SQL data and takes over when another member fails. The same principle applies in System Center 2016.
Scenarios of HA in System Center Operations Manager
- Agent Server failover to a Management Server in a Resource Pool
- Gateway Server failover to a Management Server
- Gateway Agent (domain-joined) failover
- Gateway Agent (workgroup) failover
To test this failover functionality, I configured the following servers in my lab:
- Domain: Kartik.com
- SCOM Primary Management Server : SCOM2016.kartik.com
- SCOM Secondary Management Server: SCOM2.kartik.com
- Gateway Server 1 : Server1.Kartik.com
- Gateway Server 2 : Node2.kartik.com
- Domain joined Client Server : Client2.kartik.com
- Workgroup Computer : Client
1. Agent Server failover to a Management Server in a Resource Pool
In this scenario, the agents report to the Management Server resource pool. When one management server goes down, the agents that were reporting to it fail over to another available management server in the pool.
Test Fail-over
Scenario:
Primary Management Server: SCOM2.kartik.com
Failover Management Server : SCOM2016.kartik.com
Client Server: Client2.kartik.com
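Before shutting anything down, it can help to confirm which management server the agent currently reports to. The following is a minimal sketch, assuming the OperationsManager module is installed on the machine where it runs and that a connection through SCOM2016.kartik.com is possible. The server and agent names come from the scenario above.

```powershell
# Assumption: run from a machine with the OperationsManager module installed.
Import-Module OperationsManager
New-SCOMManagementGroupConnection -ComputerName 'SCOM2016.kartik.com'

# Look up the agent used in this test
$agent = Get-SCOMAgent -DNSHostName 'Client2.kartik.com'

if ($agent) {
    # Show the management server the agent currently reports to
    'Agent      :: ' + $agent.Name
    'Primary MS :: ' + $agent.PrimaryManagementServerName

    # List any explicitly assigned failover servers (the list may be empty)
    foreach ($ms in $agent.GetFailoverManagementServers()) {
        'Failover MS :: ' + $ms.ComputerName
    }
}
else {
    Write-Warning 'Agent Client2.kartik.com was not found in the management group.'
}
```

Running the same check again after the failover should show whether the reported primary has changed.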
Shut down the Management Server SCOM2.kartik.com to test the agent failover.
SCOM2 showing grey in SCOM console
Event Logs from SCOM2016.kartik.com
Logs from SCOM2016.kartik.com
Logs from Client2.kartik.com
Here, we see that Client2.kartik.com successfully failed over to SCOM2016.kartik.com.
Client2.kartik.com showing healthy in SCOM console
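The same evidence can be pulled from the agent's event log without opening Event Viewer. This is a rough sketch, assuming it is run locally on Client2.kartik.com in an elevated session. It simply filters recent entries that mention the failover server by name and does not rely on any particular event ID.

```powershell
# Assumption: run locally on Client2.kartik.com in an elevated session.
# 'Operations Manager' is the event log the SCOM agent writes to.
Get-WinEvent -LogName 'Operations Manager' -MaxEvents 50 |
    Where-Object { $_.Message -match 'SCOM2016' } |
    Select-Object TimeCreated, Id, ProviderName, Message |
    Format-List
```

Increase `-MaxEvents` if the failover happened some time ago and the relevant entries have scrolled out of the most recent 50.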
2. Gateway Server Fail-over
Gateway Server: Server1.kartik.com
Primary Management Server: SCOM2.kartik.com
Failover Management Server: SCOM2016.kartik.com
PowerShell commands to configure Gateway Server failover
# Look up the two management servers and the gateway
$primaryMS = Get-SCOMManagementServer -Name "SCOM2.kartik.com"
$failoverMS = Get-SCOMManagementServer -Name "SCOM2016.kartik.com"
$gatewayMS = Get-SCOMGatewayManagementServer -Name "Server1.kartik.com"
# Assign the primary first, then the failover server
Set-SCOMParentManagementServer -Gateway $gatewayMS -PrimaryServer $primaryMS
Set-SCOMParentManagementServer -Gateway $gatewayMS -FailoverServer $failoverMS
PowerShell commands to verify Gateway Server failover
# List every gateway with its primary and failover management servers
$GWs = Get-SCOMManagementServer | Where-Object { $_.IsGateway -eq $true }
$GWs | Sort-Object Name | ForEach-Object {
    Write-Host ""
    "Gateway MS :: " + $_.Name
    "--Primary MS :: " + ($_.GetPrimaryManagementServer()).ComputerName
    $failoverServers = $_.GetFailoverManagementServers()
    foreach ($managementServer in $failoverServers) {
        "--Failover MS :: " + $managementServer.ComputerName
    }
}
Write-Host ""
Verify Gateway Server Fail-Over
Shut down the primary Management Server SCOM2.kartik.com
Logs from SCOM2016.kartik.com
Event generated in SCOM console for SCOM2.kartik.com
Logs from Server1.kartik.com showing that it successfully failed over to SCOM2016.kartik.com
Server1.kartik.com showing healthy in SCOM console
3. Gateway Agent ( domain-joined ) failover
Client: Client2.kartik.com
Primary Gateway Management Server: Server1.kartik.com
Failover Gateway Management Server: Node2.kartik.com
Client2.kartik.com reporting to Gateway Server1.kartik.com
PowerShell commands to configure Gateway Agent failover
# Both parents are gateway servers in this scenario
$primaryMS = Get-SCOMManagementServer | Where-Object { $_.Name -eq 'Server1.kartik.com' }
$failoverMS = Get-SCOMManagementServer | Where-Object { $_.Name -eq 'Node2.kartik.com' }
# Select every agent that currently reports to Server1
$agent = Get-SCOMAgent | Where-Object { $_.PrimaryManagementServerName -eq 'Server1.kartik.com' }
Set-SCOMParentManagementServer -Agent $agent -PrimaryServer $primaryMS
Set-SCOMParentManagementServer -Agent $agent -FailoverServer $failoverMS
PowerShell commands to verify Gateway Agent failover
# List every agent reporting to Server1 with its primary and failover servers
$Agents = Get-SCOMAgent | Where-Object { $_.PrimaryManagementServerName -eq 'Server1.kartik.com' }
$Agents | Sort-Object Name | ForEach-Object {
    Write-Host ""
    "Agent :: " + $_.Name
    "--Primary MS :: " + ($_.GetPrimaryManagementServer()).ComputerName
    $failoverServers = $_.GetFailoverManagementServers()
    foreach ($managementServer in $failoverServers) {
        "--Failover MS :: " + $managementServer.ComputerName
    }
}
Write-Host ""
Shut down Server1.kartik.com
Event generated in SCOM console for Server1.kartik.com
Event Log from Management Server SCOM2016.kartik.com
Client2.kartik.com successfully failed over to the other gateway server, Node2.kartik.com
Event log generated on Client2.kartik.com
Client2.kartik.com showing healthy in SCOM console
4. Gateway Agent ( workgroup ) failover
Workgroup computer: Client.kartik.com
Primary Gateway Management Server: Server1.kartik.com
Failover Gateway Management Server: Node2.kartik.com
Note: For the workgroup computer to fail over, the certificate used for client authentication must also be imported into the personal store of the failover Gateway Management Server.
Workgroup client reporting to the gateway Server1.kartik.com
Certificates imported in personal store of both the Gateway Servers Server1.kartik.com and Node2.kartik.com
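One way to double-check the certificates from the command line is to list the local computer's Personal store on each gateway. This is a small sketch, assuming it is run locally on Server1.kartik.com and again on Node2.kartik.com. It only lists what is present, and deciding which entry is the relevant certificate is left to the reader.

```powershell
# Assumption: run locally on each gateway server (Server1 and Node2).
# Cert:\LocalMachine\My is the local computer's Personal certificate store.
Get-ChildItem -Path Cert:\LocalMachine\My |
    Select-Object Subject, Thumbprint, NotAfter, HasPrivateKey |
    Format-Table -AutoSize
```

Comparing the thumbprints from both gateways shows whether the same certificate is present on each.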
PowerShell commands to verify Gateway Agent failover
Shut down Server1.kartik.com
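A minimal sketch of such a check, adapted from the verification commands in the domain-joined scenario. It assumes the workgroup agent is registered in the management group as Client.kartik.com, the name given in this scenario. Adjust the name if the console shows it differently.

```powershell
# Assumption: the workgroup agent appears as 'Client.kartik.com' in SCOM.
$agentName = 'Client.kartik.com'
$agent = Get-SCOMAgent | Where-Object { $_.Name -eq $agentName }

if ($agent) {
    # Show the agent's primary gateway and any assigned failover gateways
    'Agent :: ' + $agent.Name
    '--Primary MS :: ' + ($agent.GetPrimaryManagementServer()).ComputerName
    foreach ($managementServer in $agent.GetFailoverManagementServers()) {
        '--Failover MS :: ' + $managementServer.ComputerName
    }
}
else {
    Write-Warning "Agent $agentName was not found in the management group."
}
```

If no failover server is listed, it can be assigned with the same `Set-SCOMParentManagementServer` calls used in the domain-joined scenario.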
Event logs generated from Management Server SCOM2016.kartik.com
Event log generated on the workgroup computer confirming the successful failover