Configure Distributed Cloud System Controller GEO Redundancy¶
About this task
You can configure distributed cloud System Controller GEO Redundancy using dcmanager CLI commands.
System administrators can follow the procedures below to enable and disable the GEO Redundancy feature.
Note
In this release, the GEO Redundancy feature supports only two distributed clouds in one protection group.
Enable GEO Redundancy¶
Set up a protection group for two distributed clouds, making these two distributed clouds operational in 1+1 active GEO Redundancy mode.
For example, let us assume we have two distributed clouds, site A and site B. When the operation is performed on site A, the local site is site A and the peer site is site B. When the operation is performed on site B, the local site is site B and the peer site is site A.
Prerequisites
The peer system controllers' OAM networks must be reachable from each other, and each system controller must be able to access the subclouds over both the OAM and management networks.
For the security of a production system, it is important to ensure that peer site queries are secure and authenticated. To meet this objective, an HTTPS-based system API is required, which in turn requires a well-known, trusted CA to enable secure HTTPS communication between peers. If you are using an internally trusted CA, ensure that the system trusts the CA by installing its certificate with the following command.
~(keystone_admin)]$ system certificate-install --mode ssl_ca <trusted-ca-bundle-pem-file>
where:
<trusted-ca-bundle-pem-file>
is the path to the intermediate or Root CA certificate associated with the StarlingX REST API’s Intermediate or Root CA-signed certificate.
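Before installing an internal CA certificate, it can help to sanity-check the file first. The following is a minimal sketch, assuming openssl is available on the controller; the helper name check_ca_bundle is illustrative, not a platform command.

```shell
# Illustrative pre-check: confirm the file parses as an X.509 certificate in
# PEM format and has not already expired ('-checkend 0' fails on expiry).
check_ca_bundle() {
    openssl x509 -in "$1" -noout -checkend 0
}

# Example usage (paths are placeholders):
#   check_ca_bundle <trusted-ca-bundle-pem-file> && \
#       system certificate-install --mode ssl_ca <trusted-ca-bundle-pem-file>
```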
Procedure
You can enable the GEO Redundancy feature between site A and site B from the command line. In this procedure, the subclouds managed by site A are configured to be managed by a GEO Redundancy protection group that consists of site A and site B. When site A goes offline for any reason, an alarm notifies the administrator, who then initiates the group-based batch migration to rehome the subclouds of site A to site B for centralized management.
Similarly, you can configure the subclouds managed by site B to be taken over by site A when site B is offline by following the same procedure, with site B as the local site and site A as the peer site.
Log in to the active controller node of site B and collect the following information about site B, which is required to create a protection group:
Unique UUID of the central cloud of the peer system controller
URI of Keystone endpoint of peer system controller
Gateway IP address of the management network of peer system controller
For example:
# On site B
sysadmin@controller-0:~$ source /etc/platform/openrc
~(keystone_admin)]$ system show | grep -i uuid
| uuid | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a |
~(keystone_admin)]$ openstack endpoint list --service keystone \
--interface public --region RegionOne -c URL
+------------------------+
| URL                    |
+------------------------+
| http://10.10.10.2:5000 |
+------------------------+
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
gateway
10.10.27.1
Log in to the active controller node of the central cloud of site A. Create a System Peer instance of site B on site A so that site A can access information of site B.
# On site A
~(keystone_admin)]$ dcmanager system-peer add \
--peer-uuid 223fcb30-909d-4edf-8c36-1aebc8e9bd4a \
--peer-name siteB \
--manager-endpoint http://10.10.10.2:5000 \
--peer-controller-gateway-address 10.10.27.1
Enter the admin password for the system peer:
Re-enter admin password to confirm:
+----+--------------------------------------+-----------+------------------------+----------------------------+
| id | peer uuid                            | peer name | manager endpoint       | controller gateway address |
+----+--------------------------------------+-----------+------------------------+----------------------------+
| 2  | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a | siteB     | http://10.10.10.2:5000 | 10.10.27.1                 |
+----+--------------------------------------+-----------+------------------------+----------------------------+
Collect the information from site A.
# On site A
sysadmin@controller-0:~$ source /etc/platform/openrc
~(keystone_admin)]$ system show | grep -i uuid
~(keystone_admin)]$ openstack endpoint list --service keystone --interface public --region RegionOne -c URL
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
Log in to the active controller node of the central cloud of site B. Create a System Peer instance of site A on site B so that site B has information about site A.
# On site B
~(keystone_admin)]$ dcmanager system-peer add \
--peer-uuid 3963cb21-c01a-49cc-85dd-ebc1d142a41d \
--peer-name siteA \
--manager-endpoint http://10.10.11.2:5000 \
--peer-controller-gateway-address 10.10.25.1
Enter the admin password for the system peer:
Re-enter admin password to confirm:
Create a subcloud peer group (SPG) for site A.
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group add --peer-group-name group1
Add the subclouds needed for redundancy protection on site A.
Ensure that the subclouds' bootstrap data is up to date. The bootstrap data is the data used to bootstrap the subcloud, including the OAM and management network information, the system controller gateway information, and the Docker registry information needed to pull the images required to bootstrap the system.
For an example of a typical bootstrap file, see Install and Provision a Subcloud.
Update the subcloud information with the bootstrap values.
~(keystone_admin)]$ dcmanager subcloud update subcloud1 \
--bootstrap-address <Subcloud_OAM_IP_Address> \
--bootstrap-values <Path_of_Bootstrap-Value-File>
Update the subcloud information with the SPG created locally.
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud1-Name> \
--peer-group <SiteA-Subcloud-Peer-Group-ID-or-Name>
For example,
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group group1
If you want to remove one subcloud from the SPG, run the following command:
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud-Name> --peer-group none
For example,
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group none
Check the subclouds that are under the SPG.
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-ID-or-Name>
Create an association between the System Peer and SPG.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association add \
--system-peer-id <SiteB-System-Peer-ID> \
--peer-group-id <SiteA-System-Peer-Group1> \
--peer-group-priority <priority>
The peer-group-priority parameter accepts an integer value greater than 0. It sets the priority of the SPG that is created on the peer site, via the peer site's dcmanager API, during association synchronization. The default priority of an SPG is 0 when it is created on the local site. The smallest integer has the highest priority.
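As a quick illustration of the priority rule (the numeric values below are hypothetical):

```shell
# The SPG copy with the smallest peer_group_priority value has the highest
# priority. Hypothetical example: the local copy keeps the default 0 and the
# peer copy was created with --peer-group-priority 2.
local_priority=0
peer_priority=2
if [ "$local_priority" -lt "$peer_priority" ]; then
    echo "local site SPG has the higher priority"
fi
```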
During association creation, the SPG in the association, along with the subclouds belonging to it, is synchronized from the local site to the peer site.
Confirm that the local SPG and its subclouds have been synchronized into site B with the same name.
Show the association information just created on site A and ensure that sync_status is in-sync.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
+----+---------------+----------------+---------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type    | sync_status | peer_group_priority |
+----+---------------+----------------+---------+-------------+---------------------+
| 1  | 1             | 2              | primary | in-sync     | 2                   |
+----+---------------+----------------+---------+-------------+---------------------+
Show the subcloud-peer-group on site B and ensure that it has been created. List the subclouds in the subcloud-peer-group on site B and ensure that all the subclouds have been synchronized as secondary subclouds.
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group show <SiteA-Subcloud-Peer-Group-Name>
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-Name>
When you create the primary association on site A, a non-primary association on site B will automatically be created to associate the synchronized SPG from site A and the system peer pointing to site A.
You can check the association list to confirm if the non-primary association was created on site B.
# On site B
~(keystone_admin)]$ dcmanager peer-group-association list
+----+---------------+----------------+-------------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type        | sync_status | peer_group_priority |
+----+---------------+----------------+-------------+-------------+---------------------+
| 2  | 26            | 1              | non-primary | in-sync     | None                |
+----+---------------+----------------+-------------+-------------+---------------------+
(Optional) Update the protection group related configuration.
After the peer group association has been created, you can still update the related resources configured in the protection group:
Update subcloud with bootstrap values
Add subcloud(s) into the SPG
Remove subcloud(s) from the SPG
After any of the above operations, sync_status changes to out-of-sync. After the update is complete, use the sync command to push the SPG changes to the peer site so that the SPG remains in the same state on both sites.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association sync <SiteA-Peer-Group-Association1-ID>
Warning
The dcmanager peer-group-association sync command must be run after any of the following changes:
A subcloud is removed from the SPG for a subcloud name change.
A subcloud is removed from the SPG for a subcloud management network reconfiguration.
A subcloud is updated with one or both of the --bootstrap-address and --bootstrap-values parameters.
Similarly, verify that the information has been synchronized by showing the association just created on site A and ensuring that sync_status is in-sync.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
+----+---------------+----------------+---------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type    | sync_status | peer_group_priority |
+----+---------------+----------------+---------+-------------+---------------------+
| 1  | 1             | 2              | primary | in-sync     | 2                   |
+----+---------------+----------------+---------+-------------+---------------------+
Results
You have configured a GEO Redundancy protection group between site A and site B. If site A goes offline, the subclouds configured in the SPG can be manually migrated as a batch to site B for centralized management.
Health Monitor and Migration¶
Peer monitoring and alarming¶
After the peer protection group is formed, if site A cannot be reached from site B, an alarm is raised on site B.
For example:
# On site B
~(keystone_admin)]$ fm alarm-list
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
| 280.004 | Peer siteA is in disconnected state. Following subcloud peer groups are impacted: group1. | peer=223fcb30-909d-4edf- | major | 2023-08-18T10:25:29. |
| | | 8c36-1aebc8e9bd4a | | 670977 |
| | | | | |
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
The administrator can suppress the alarm with the following command:
# On site B
~(keystone_admin)]$ fm event-suppress --alarm_id 280.004
+----------+------------+
| Event ID | Status |
+----------+------------+
| 280.004 | suppressed |
+----------+------------+
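The presence of the peer-disconnect alarm can also be detected in a script. The following is a minimal sketch; the helper name is illustrative, and it simply searches the fm alarm-list output for alarm ID 280.004.

```shell
# Illustrative check: read 'fm alarm-list' output on stdin and succeed if the
# peer-disconnect alarm (280.004) is present.
has_peer_disconnect_alarm() {
    grep -q '280\.004'
}

# Example usage (requires a live controller):
#   fm alarm-list | has_peer_disconnect_alarm && echo "peer site disconnected"
```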
Migration¶
If site A is down, after receiving the alarm the administrator can choose to perform the migration on site B, which migrates the subclouds in the SPG from site A to site B.
Note
Before initiating the migration, ensure that the sync_status of the peer group association is in-sync, so that the latest updates from site A have been successfully synchronized to site B. If sync_status is not in-sync, the migration may fail.
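As a sketch of that pre-check, the sync_status column can be extracted from the association table and verified before migrating. The helper below is illustrative (it parses the '|'-separated table on stdin) and is not part of the dcmanager CLI.

```shell
# Illustrative parser: print the sync_status column (6th '|'-separated field)
# of the first data row of a dcmanager peer-group-association table, skipping
# the header row.
get_sync_status() {
    awk -F'|' '/^\|/ && $6 !~ /sync_status/ { gsub(/ /, "", $6); print $6; exit }'
}

# Example usage (requires a live controller):
#   status=$(dcmanager peer-group-association list | get_sync_status)
#   [ "$status" = "in-sync" ] || echo "WARNING: association is ${status:-unknown}"
```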
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
# For example:
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
During the batch migration, you can check the status of the migration of each subcloud in the SPG by showing the details of the SPG being migrated.
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group status <Subcloud-Peer-Group-ID-or-Name>
After a successful migration, the subcloud(s) should be in managed/online/complete status on site B.
For example:
# On site B
~(keystone_admin)]$ dcmanager subcloud list
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
| 45 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
| 46 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
Post Migration¶
If site A is restored, the subcloud(s) on site A are adjusted to unmanaged/secondary status. The administrator receives an alarm on site A notifying that the SPG is managed by a peer site (site B), because this SPG on site A has the higher priority.
~(keystone_admin)]$ fm alarm-list
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
| 280.005 | Subcloud peer group (peer_group_name=group1) is managed by remote system | subcloud_peer_group=7 | warning | 2023-09-04T04:51:58. |
| | (peer_uuid=223fcb30-909d-4edf-8c36-1aebc8e9bd4a) with lower priority. | | | 435539 |
| | | | | |
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
Then, the administrator can decide if and when to migrate the subcloud(s) back.
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
# For example:
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
After a successful migration, the subcloud status should be back to managed/online/complete.
For example:
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
| 33 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
| 34 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
Also, the alarm mentioned above will be cleared after migrating back.
~(keystone_admin)]$ fm alarm-list
Disable GEO Redundancy¶
You can disable the GEO Redundancy feature from the command line.
Ensure that the environment is stable and that the subclouds are managed by the expected site before disabling the GEO Redundancy feature.
Procedure
Delete the primary association on both sites.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association delete <SiteA-Peer-Group-Association1-ID>
Delete the SPG.
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group delete group1
Delete the system peer.
# On site A
~(keystone_admin)]$ dcmanager system-peer delete siteB
# On site B
~(keystone_admin)]$ dcmanager system-peer delete siteA
Results
You have torn down the protection group between site A and site B.
Backup and Restore Subcloud¶
You can back up and restore a subcloud in a distributed cloud environment. However, GEO Redundancy does not support replicating subcloud backup files from one site to another.
A subcloud backup is valid only for the current system controller. When a subcloud is migrated from site A to site B, the existing backup becomes unavailable. In this case, you can create a new backup of that subcloud on site B. Subsequently, you can restore the subcloud from this newly created backup when it is managed under site B.
For information on how to back up and restore a subcloud, see Backup a Subcloud/Group of Subclouds using DCManager CLI and Restore a Subcloud/Group of Subclouds from Backup Data Using DCManager CLI.
Operations Performed by Protected Subclouds¶
The table below lists the operations that can/cannot be performed on the protected subclouds.
Primary site: The site where the SPG was created.
Secondary site: The peer site where the subclouds in the SPG can be migrated to.
Protected subcloud: The subcloud that belongs to a SPG.
Local/Unprotected subcloud: The subcloud that does not belong to any SPG.
| Operation | Allowed (Y/N/Maybe) | Note |
|---|---|---|
| Unmanage | N | The subcloud must be removed from the SPG before it can be manually unmanaged. |
| Manage | N | The subcloud must be removed from the SPG before it can be manually managed. |
| Delete | N | The subcloud must be removed from the SPG before it can be manually unmanaged and deleted. |
| Update | Maybe | The subcloud can only be updated while it is managed in the primary site, because the sync command can only be issued from the system controller where the SPG was created. Warning: The subcloud network cannot be reconfigured while the subcloud is managed by the secondary site; if this operation is necessary, additional steps are required. |
| Rename | Y | |
| Patch | Y | Warning: A patch out-of-sync alarm may be raised when the subcloud is migrated to another site. |
| Upgrade | Y | All the system controllers in the protection group must be upgraded before any of the subclouds are upgraded. |
| Rehome | N | The subcloud cannot be manually rehomed while it is part of the SPG. |
| Backup | Y | |
| Restore | Maybe | |
| Prestage | Y | Warning: The prestage data may be overwritten, because it is not guaranteed that both system controllers always run the same patch level (ostree repo) and/or have the same images list. |
| Reinstall | Maybe | If the subcloud in the primary site is already part of an SPG, remove it from the SPG, unmanage and reinstall the subcloud, then add it back to the SPG and perform the sync operation. If the subcloud is in the secondary site, additional steps are required. |
| Remove from SPG | Maybe | The subcloud can be removed from the SPG in the primary site. It can only be removed from the SPG in the secondary site if the primary site is currently down. |
| Add to SPG | Maybe | The subcloud can only be added to the SPG in the primary site, as a manual sync is required. |
Note
After migrating the subcloud, kube-rootca_sync_status may become out-of-sync if the subcloud is not synchronized with the new system controller. To update the root CA certificate of the subcloud, run the dcmanager kube-rootca-update-strategy command and pass the kube-root CA certificate from the new system controller. However, if you update the certificate and then migrate the subcloud back to the primary site, the certificate needs to be updated again.