Restore a Subcloud/Group of Subclouds from Backup Data Using DCManager CLI
A subcloud can be restored from backup data previously stored either centrally on the System Controller or locally on the subcloud, using the dcmanager command line interface (CLI). The subcloud install data must be available for this operation to proceed. The subcloud must support Redfish Virtual Media Service (version 1.2 or higher) if remote installation is required.
About this task
The CLI command dcmanager subcloud-backup restore can be used to restore a subcloud or a group of subclouds. By default, the restore is done from the subcloud backup data on the central System Controller. The command accepts the following options:
--with-install
Perform remote installation of the subcloud prior to execution of the restore procedure. The subcloud must support Redfish Virtual Media Service (version 1.2 or higher) to use this option.
--local-only
Use the local backup archive (default local storage /opt/platform-backup/backups/<release-version>). If not specified, the subcloud backup archive on the central System Controller will be used.
--registry-images
Restore saved container images to registry.local as part of the restore procedure (local storage only).
--subcloud <subcloud-name>
The subcloud to restore.
--group <subcloud-group-name>
The group of subclouds to restore.
--release <release>
Software release used to install, bootstrap, and/or deploy the subcloud. If not specified, the current software release of the System Controller will be used.
--auto
Triggers the restore process locally on the subcloud using only BMC connectivity. It is used to perform a fully autonomous restore when the OAM network connectivity between the System Controller and subcloud is not available before the restore. It can be combined with --local-only and/or --with-install. The subcloud must have software and the container images prestaged before auto restore.
--factory
Performs factory default restore via remote installation and automatic restore using prestaged software, container images, and factory backup from the subcloud’s platform-backup partition. This option overrides the --auto, --with-install, --registry-images, and --local-only options.
--restore-values <yaml-file>
The yaml file containing the customization parameters.
wipe_ceph_osds: false: To keep the Ceph cluster data intact.
wipe_ceph_osds: true: To wipe the Ceph cluster entirely.
on_box_data: true: To indicate that the backup data file is under /opt/platform-backup directory on the local machine.
ipmi_sel_event_monitoring: true|false: To enable or disable IPMI SEL (System Event Log) event monitoring for auto and factory-restore operations (defaults to true). Set to false for BMC systems that do not support custom SEL events (example: certain variants of OpenBMC).
bootstrap_address: List of subclouds and their corresponding bootstrap addresses for connectivity.
bootstrap_address:
  <subcloud_name1>: <subcloud_bootstrap_address1>
  <subcloud_name2>: <subcloud_bootstrap_address2>
add_docker_prefix: For more details, see Install a Subcloud Using Redfish Platform Management Service.
Note
The bootstrap_address key is only necessary for the restore of manually installed subclouds. For the subclouds installed via Redfish, the bootstrap_address is already available in the install values.
See Run Restore Playbook Locally on the Controller for the list of configurable system restore parameters.
--sysadmin-password <sysadmin-password>
If not specified, the user is prompted for the password. This option is recommended ONLY for automation. For interactive use, omit the option and enter the password when prompted, so that the sysadmin password does not end up in log files. For factory-restore operations with ipmi_sel_event_monitoring set to false, provide the factory default sysadmin password.
Either --subcloud or --group must be specified.
When the --registry-images option is applied, the entire registry filesystem,
which contains both platform and user container images, will be restored.
After the subcloud has been re-installed with the desired release version, the
backup archive for that release will be transferred to the subcloud for the
restore operation by default. If --local-only option is specified, the local
backup archive for the release will be used instead.
It is possible to specify a custom location for a backup file that resides on
the subcloud by using the --restore-values option and setting
initial_backup_dir and backup_filename in the provided
restore_values yaml file. Ensure that this custom backup file is not
corrupted and is compatible with the software release the subcloud was
installed with.
To restore images from a custom backup file on the subcloud using
--restore-values <yaml-file> option, the registry_backup_filename parameter
must be set in restore_values yaml file.
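The option rules above can be checked before the command is run. The following is a minimal sketch, not part of dcmanager: check_restore_opts is a hypothetical helper name, and it encodes only the rules stated in this section (a target is mandatory, --registry-images needs --local-only, and --factory overrides the other mode options).

```shell
# Hypothetical pre-flight check for a planned "dcmanager subcloud-backup restore"
# option list. It inspects the options only; it never calls dcmanager.
check_restore_opts() {
    local target=0 local_only=0 registry=0 factory=0 others=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --subcloud|--group)
                if [ "$#" -lt 2 ]; then
                    echo "error: $1 requires a name" >&2
                    return 1
                fi
                target=$((target + 1))
                shift   # consume the name that follows the option
                ;;
            --local-only)          local_only=1 ;;
            --registry-images)     registry=1 ;;
            --factory)             factory=1 ;;
            --auto|--with-install) others=1 ;;
        esac
        shift
    done

    if [ "$target" -eq 0 ]; then
        echo "error: --subcloud or --group is mandatory" >&2
        return 1
    fi
    if [ "$factory" -eq 1 ]; then
        # --factory overrides the other mode options, so only warn.
        if [ $((local_only + registry + others)) -gt 0 ]; then
            echo "warning: --factory overrides --auto, --with-install, --registry-images and --local-only" >&2
        fi
        return 0
    fi
    if [ "$registry" -eq 1 ] && [ "$local_only" -eq 0 ]; then
        echo "error: --registry-images can only be used with --local-only" >&2
        return 1
    fi
    return 0
}
```

For example, check_restore_opts --subcloud subcloud1 --registry-images returns a non-zero status because --local-only is missing.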
Restore a Single Subcloud
Prerequisites
The System Controller is healthy and ready to accept dcmanager related commands.
The subcloud is unmanaged and is in a valid state for restore operation (i.e. not being restored, installed, bootstrapped, deployed or rehomed).
The subcloud install data is available.
The backup files exist and are compatible with the software release the subcloud is being restored to.
Note
When a vCSR application running on the subcloud provides the network
routing, the standard restore operation (without --auto or
--factory) is not supported. For this use case, see the
Auto-Restore a Subcloud or Factory-Restore a Subcloud sections below.
Procedure
To restore a subcloud, including remote installation, from system backup data in central storage:
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --with-install --sysadmin-password <sysadmin-password>
To restore a pre-installed subcloud from system backup data in central storage:
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --sysadmin-password <sysadmin-password>
To restore a subcloud, including remote installation, from system backup data stored in default local storage:
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --with-install --local-only --sysadmin-password <sysadmin-password>
To restore a pre-installed subcloud from system backup and images backup data in default local storage:
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --local-only --registry-images --sysadmin-password <sysadmin-password>
Note
The --registry-images option can only be used with --local-only
option.
To restore a pre-installed subcloud from system and images backup data stored at custom location on the subcloud:
Create a yaml file, e.g. restore_overrides.yaml, with the following content:
initial_backup_dir: /home/sysadmin/mybackup_dir
backup_filename: test_platform_backup.tgz
registry_backup_filename: test_images_backup.tgz
Then, run the command:
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud subcloud1 --local-only --registry-images --restore-values restore_overrides.yaml --sysadmin-password <sysadmin-password>
Sample response to a single subcloud restore:
+-----------------------------+----------------------------+
| Field                       | Value                      |
+-----------------------------+----------------------------+
| id                          | 8                          |
| name                        | subcloud1                  |
| description                 | None                       |
| location                    | None                       |
| software_version            | 22.12                      |
| management                  | unmanaged                  |
| availability                | offline                    |
| deploy_status               | restore-failed             |
| management_subnet           | fd01:15::0/64              |
| management_start_ip         | fd01:15::2                 |
| management_end_ip           | fd01:15::11                |
| management_gateway_ip       | fd01:15::1                 |
| systemcontroller_gateway_ip | fd01:1::1                  |
| group_id                    | 2                          |
| created_at                  | 2022-12-12 05:29:23.807243 |
| updated_at                  | 2022-12-13 16:39:48.904037 |
| backup_status               | unknown                    |
| backup_datetime             | None                       |
+-----------------------------+----------------------------+
Note
The subcloud can be restored or restored again while in a failed deploy state such as:
data-migration-failed (upgrade failure)
restore-failed (previous restore attempt failed due to a bad backup file)
rehome-failed
To view the progress of a subcloud restore, use the dcmanager subcloud show or dcmanager subcloud list command:
~(keystone_admin)]$ dcmanager subcloud show subcloud2
+-----------------------------+----------------------------+
| Field                       | Value                      |
+-----------------------------+----------------------------+
| id                          | 9                          |
| name                        | subcloud2                  |
| description                 | None                       |
| location                    | None                       |
| software_version            | 22.12                      |
| management                  | unmanaged                  |
| availability                | offline                    |
| deploy_status               | restoring                  |
| management_subnet           | fd01:176::0/64             |
| management_start_ip         | fd01:176::2                |
| management_end_ip           | fd01:176::11               |
| management_gateway_ip       | fd01:176::1                |
| systemcontroller_gateway_ip | fd01:1::1                  |
| group_id                    | 2                          |
| created_at                  | 2022-12-13 00:09:44.543494 |
| updated_at                  | 2022-12-13 18:23:20.659138 |
| backup_status               | unknown                    |
| backup_datetime             | None                       |
| dc-cert_sync_status         | unknown                    |
| firmware_sync_status        | unknown                    |
| identity_sync_status        | unknown                    |
| kubernetes_sync_status      | unknown                    |
| kube-rootca_sync_status     | unknown                    |
| load_sync_status            | unknown                    |
| patching_sync_status        | unknown                    |
| platform_sync_status        | unknown                    |
+-----------------------------+----------------------------+
If the restore operation completes successfully, the subcloud will become
online and the deploy_status will be set to ‘complete’.
Please continue with Post restore procedure.
If the restore operation fails, use the dcmanager subcloud errors command to view the error.
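Progress checking can be scripted by reading the deploy_status row from the dcmanager subcloud show table. The following is a minimal sketch under stated assumptions: get_deploy_status and wait_for_restore are hypothetical helper names, the default poll interval and attempt count are arbitrary, and passing the subcloud name to dcmanager subcloud errors is assumed. Start polling only once the status shows restoring, because a stale failed status from an earlier attempt would otherwise be reported as a failure.

```shell
# Print the deploy_status value from the "dcmanager subcloud show" table.
get_deploy_status() {
    dcmanager subcloud show "$1" |
        awk -F'|' '$2 ~ /^ *deploy_status *$/ { gsub(/ /, "", $3); print $3 }'
}

# Poll until the restore reaches a final state.
# Usage: wait_for_restore <subcloud-name> [interval-seconds] [max-attempts]
wait_for_restore() {
    local name=$1 interval=${2:-60} attempts=${3:-120} status=""
    while [ "$attempts" -gt 0 ]; do
        status=$(get_deploy_status "$name")
        case "$status" in
            complete|factory-restore-complete)
                echo "restore complete"
                return 0
                ;;
            *-failed)
                echo "restore failed: $status" >&2
                # Assumed invocation: show the recorded error for this subcloud.
                dcmanager subcloud errors "$name" >&2
                return 1
                ;;
        esac
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    echo "timed out waiting for $name; last status: $status" >&2
    return 2
}
```

A typical call would be wait_for_restore subcloud1 60 120, which checks once a minute for up to two hours.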
Auto-Restore a Subcloud
The auto-restore feature enables fully autonomous subcloud restoration using only BMC connectivity, without requiring network communication between the System Controller and subcloud during the restore process. This is particularly useful when the subcloud network is unavailable until the restore completes (example: when a vCSR application provides network routing).
Note
This feature is available only to the AIO-SX subclouds.
Note
For non-vCSR systems, use the standard restore operation (without
--auto). Standard restore runs remotely through Ansible and provides
more detailed progress information than monitoring via BMC SEL events.
Prerequisites
Before performing an auto-restore operation, ensure that the following conditions are met.
The System Controller is healthy and ready to accept dcmanager commands.
The subcloud is unmanaged and is in a valid state for restore operation.
The subcloud is an AIO-SX.
The subcloud supports Redfish Virtual Media Service (version 1.2 or higher).
BMC access is available throughout the auto-restore operation.
For --local-only auto-restore, the specified backup file exists in the subcloud’s platform-backup partition and the subcloud has been prestaged with software (ostree repo) and container images.
For central storage auto-restore, the specified backup file exists on the System Controller and the subcloud has been prestaged with software (ostree repo) and container images.
Note
Auto-restore is only supported on subclouds running release r12 or later.
Procedure
You can auto-restore a subcloud using one of the following methods:
Auto-restore with remote installation using backup data from central storage.
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --auto --with-install --sysadmin-password <sysadmin-password>
This command performs the following operations:
Creates a miniboot ISO with the backup data embedded
Triggers remote installation via Redfish
Automatically triggers the restore process locally on the subcloud after the installation completes
Monitors the restore progress via IPMI SEL events (if supported)
Note
Use auto-restore only when the subcloud network is unavailable until the restore completes (for example, when a vCSR application provides network routing). Otherwise, use the standard restore operation.
Auto-restore with remote installation using local backup data.
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --auto --with-install --local-only --sysadmin-password <sysadmin-password>
This command uses prestaged ostree repository and backup data already present on the subcloud’s platform-backup partition.
Auto-restore to a specific release.
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --auto --local-only --release 26.03 --sysadmin-password <sysadmin-password>
Auto-restore without remote installation (pre-installed subcloud).
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --auto --sysadmin-password <sysadmin-password>
When --with-install is not specified, the backup data is transferred to the subcloud via a cloud-init seed ISO, and the restore is triggered automatically upon boot. The cloud-init service must be enabled on the subcloud. This is done automatically if the subcloud is installed with a prestaged ISO.
Note
For BMC systems that do not support custom IPMI SEL events (example: OpenBMC), set ipmi_sel_event_monitoring to false in the restore values yaml file.
~(keystone_admin)]$ cat restore_overrides.yaml
ipmi_sel_event_monitoring: false
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --auto --with-install --restore-values restore_overrides.yaml --sysadmin-password <sysadmin-password>
When IPMI SEL event monitoring is disabled, the System Controller waits for the subcloud to become reachable via the OAM network to validate restore completion.
Factory-Restore a Subcloud
The factory-restore feature restores a subcloud to its factory default state using prestaged software and factory backup data stored locally on the subcloud. This is useful for disaster recovery scenarios where the subcloud needs to be completely reset to its initial factory-installed state.
Note
This feature is available only to the AIO-SX subclouds.
Prerequisites
Before performing a factory-restore operation, ensure that the following conditions are met:
The System Controller is healthy and ready to accept dcmanager commands.
The subcloud is unmanaged and is in a valid state for restore operation.
The subcloud is an AIO-SX.
The subcloud supports Redfish Virtual Media Service (version 1.2 or higher).
BMC access is available throughout the factory-restore operation.
The factory backup data and prestaged software must exist on the subcloud’s platform-backup partition in the following structure:
/opt/platform-backup/factory/<sw-version>/
    ostree_repo/
    local_registry_filesystem.tgz (and/or container image tarballs)
    factory_backup.tgz
    miniboot.cfg
These files are created automatically during the factory install process described in Enroll a Factory Installed Non Distributed Standalone System as a Subcloud.
Note
Factory-restore only supports systems that were factory-installed with the r12 release or later.
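The expected layout of the factory data can be checked on the subcloud before a factory-restore is attempted. The following is a minimal sketch: check_factory_layout is a hypothetical helper name, it assumes that all listed entries sit directly under the release directory, and it only warns about local_registry_filesystem.tgz because container image tarballs may be present instead.

```shell
# Hypothetical check of the factory data layout on the subcloud.
# Usage: check_factory_layout <sw-version> [root-dir]
check_factory_layout() {
    local sw_version=$1 root=${2:-/opt/platform-backup/factory}
    local dir="$root/$sw_version" missing=0 f

    # The prestaged ostree repository is a directory.
    [ -d "$dir/ostree_repo" ] || { echo "missing: $dir/ostree_repo/" >&2; missing=1; }

    # The factory backup archive and the miniboot configuration are files.
    for f in factory_backup.tgz miniboot.cfg; do
        [ -f "$dir/$f" ] || { echo "missing: $dir/$f" >&2; missing=1; }
    done

    # Container images may come as other tarballs, so this is only a note.
    [ -f "$dir/local_registry_filesystem.tgz" ] ||
        echo "note: $dir/local_registry_filesystem.tgz not found; container image tarballs are expected instead" >&2

    return "$missing"
}
```

The function returns 0 when the required entries are present and 1 otherwise, listing each missing path on standard error.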
Procedure
To factory-restore a subcloud, use the following command:
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --factory --sysadmin-password <factory-sysadmin-password>
This command performs the following operations:
Creates a miniboot ISO configured to use the local kickstart from the platform-backup partition.
Triggers remote installation via Redfish using prestaged ostree repository.
Automatically triggers the factory-restore process using the factory backup data after installation completes.
Monitors the restore progress via IPMI SEL events (if supported).
Note
When using --factory with ipmi_sel_event_monitoring as false,
provide the factory default sysadmin password (the password that was set
during the original factory installation).
Note
The --factory option overrides the --auto, --with-install,
--registry-images, and --local-only options. If any of them is specified
together with --factory, it is ignored.
Note
For factory-restore, the --release option must specify the same release
version in which the subcloud system was factory installed.
Factory-Restore Completion State
Upon successful factory-restore, the subcloud deploy_status will be set to
factory-restore-complete.
In this state, the subcloud can be enrolled or re-enrolled using the following commands:
~(keystone_admin)]$ dcmanager subcloud delete <subcloud-name>
~(keystone_admin)]$ dcmanager subcloud add --enroll ...
For more information, see Enroll a Factory Installed Non Distributed Standalone System as a Subcloud.
BMC Systems Without IPMI SEL Event Support
For BMC systems that do not support custom IPMI SEL events (example: OpenBMC), factory-restore completion cannot be automatically detected when a vCSR application is involved (as the subcloud network is not available until after restore).
In this scenario:
The monitoring playbook will time out and set the subcloud to the install-failed state, even if the restore was successful.
Manual verification of restore success is required via BMC serial console. This can be done by checking /var/log/auto-restore.log inside the subcloud. If it contains the System restore-complete executed successfully message, the restore completed successfully.
If the restore was successful, delete the subcloud on the System Controller and re-enroll it.
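The manual log check above can be reduced to a single search for the success message. The following is a minimal sketch to run on the subcloud through the BMC serial console. restore_log_ok is a hypothetical helper name, and the message text is taken from this section.

```shell
# Return 0 if the auto-restore log contains the success message.
# Usage: restore_log_ok [log-file]   (defaults to /var/log/auto-restore.log)
restore_log_ok() {
    local log=${1:-/var/log/auto-restore.log}
    # -F matches the message as a fixed string, -q suppresses output.
    grep -Fq 'System restore-complete executed successfully' "$log"
}
```

For example, restore_log_ok && echo "restore completed successfully" prints the confirmation only when the message is present.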
For factory-restore of a non-vCSR system, always set
ipmi_sel_event_monitoring to false. This enables remote factory-restore
execution through Ansible, which provides more detailed progress information
than monitoring via BMC SEL events.
~(keystone_admin)]$ cat restore_overrides.yaml
ipmi_sel_event_monitoring: false
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud <subcloud-name> --factory --restore-values restore_overrides.yaml --sysadmin-password <factory-sysadmin-password>
Note
The ipmi_sel_event_monitoring restore value is ignored for the standard
restore operations (without --auto or --factory), which do not monitor
progress through SEL events.
Restore a Group of Subclouds
The above subcloud-backup restore operations can be performed for a group of
subclouds simultaneously by replacing --subcloud option with --group
option.
For example, to restore a group of subclouds with remote installation from their system data in central storage, run the following command:
~(keystone_admin)]$ dcmanager subcloud-backup restore --group <group> --with-install --sysadmin-password <sysadmin-password>
To auto-restore a group of subclouds, run the following command:
~(keystone_admin)]$ dcmanager subcloud-backup restore --group <group> --auto --with-install --sysadmin-password <sysadmin-password>
To factory-restore a group of subclouds, run the following command:
~(keystone_admin)]$ dcmanager subcloud-backup restore --group <group> --factory --sysadmin-password <factory-sysadmin-password>
If none of the subclouds in the group is in a valid state for restore, an error message will be displayed. If at least some of the subclouds in the group meet the restore operation criteria, a list of subclouds will be displayed.
Sample group restore response:
+----+-----------+-------------+----------+------------------+------------+--------------+---------------+-------------------+---------------------+-------------------+-----------------------+-----------------------------+----------+----------------------------+----------------------------+----------------+----------------------------+
| id | name      | description | location | software_version | management | availability | deploy_status | management_subnet | management_start_ip | management_end_ip | management_gateway_ip | systemcontroller_gateway_ip | group_id | created_at                 | updated_at                 | backup_status  | backup_datetime            |
+----+-----------+-------------+----------+------------------+------------+--------------+---------------+-------------------+---------------------+-------------------+-----------------------+-----------------------------+----------+----------------------------+----------------------------+----------------+----------------------------+
| 8  | subcloud6 | None        | None     | 22.12            | unmanaged  | online       | complete      | fd01:15::0/64     | fd01:15::2          | fd01:15::11       | fd01:15::1            | fd01:1::1                   | 2        | 2022-12-13 18:23:03.883068 | 2022-12-13 22:14:39.331199 | complete-local | 2022-12-13 22:04:06.232043 |
| 9  | subcloud8 | None        | None     | 22.12            | unmanaged  | online       | complete      | fd01:176::0/64    | fd01:176::2         | fd01:176::11      | fd01:176::1           | fd01:1::1                   | 2        | 2022-12-13 19:27:55.115604 | 2022-12-13 22:15:09.287665 | complete-local | 2022-12-13 22:05:03.785280 |
+----+-----------+-------------+----------+------------------+------------+--------------+---------------+-------------------+---------------------+-------------------+-----------------------+-----------------------------+----------+----------------------------+----------------------------+----------------+----------------------------+
After group restore is complete, continue with Post restore procedure for each subcloud in the group.
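Because the group response is a wide table, it can help to reduce it to one line per subcloud. The following is a minimal sketch: summarize_group_response is a hypothetical helper name, and it assumes the response arrives on standard input in the table format of the sample above, with name and deploy_status columns.

```shell
# Read a dcmanager result table on stdin and print "<name> <deploy_status>"
# for every subcloud row. Columns are located by their header text.
# Possible use: dcmanager subcloud-backup restore --group <group> ... | summarize_group_response
summarize_group_response() {
    awk -F'|' '
        /^[|]/ {
            # Trim the padding around every cell of this row.
            for (i = 2; i < NF; i++) gsub(/^ +| +$/, "", $i)
            if (!header) {
                # The first row that starts with "|" is the header row.
                for (i = 2; i < NF; i++) col[$i] = i
                header = 1
                next
            }
            print $col["name"], $col["deploy_status"]
        }'
}
```

The resulting names can then be used to run the post-restore steps for each subcloud in turn.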
Post Factory-Restore
After a successful factory-restore, the subcloud deploy_status will be set
to factory-restore-complete.
Delete the subcloud and re-enroll it.
Delete the subcloud using the dcmanager subcloud delete <subcloud-name> command.
Re-add and enroll the subcloud using the dcmanager subcloud add --enroll command.
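The delete and re-enroll steps can be guarded so that the subcloud is deleted only after the factory-restore has actually completed. The following is a minimal sketch: reenroll_after_factory_restore is a hypothetical helper name, and the arguments for dcmanager subcloud add --enroll are not chosen here but passed through from the caller unchanged.

```shell
# Hypothetical guard: delete and re-enroll only from factory-restore-complete.
# Usage: reenroll_after_factory_restore <subcloud-name> <enroll arguments...>
reenroll_after_factory_restore() {
    local name=$1 status
    shift   # the remaining arguments belong to "subcloud add --enroll"

    # Read the deploy_status row from the "dcmanager subcloud show" table.
    status=$(dcmanager subcloud show "$name" |
        awk -F'|' '$2 ~ /^ *deploy_status *$/ { gsub(/ /, "", $3); print $3 }')

    if [ "$status" != "factory-restore-complete" ]; then
        echo "refusing to delete $name: deploy_status is '$status'" >&2
        return 1
    fi

    # Enroll only if the delete succeeded.
    dcmanager subcloud delete "$name" &&
        dcmanager subcloud add --enroll "$@"
}
```

If the subcloud is in any other state, the function stops before the delete and reports the current deploy_status.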
Post Restore
AIO-SX subcloud
Resume subcloud audit with the command:
~(keystone_admin)]$ dcmanager subcloud manage <subcloud-name>
AIO-DX/Standard subcloud
If the restore playbook completes successfully, the subcloud will be online and
deploy_status will be set to complete. Only controller-0 will be in
unlocked and online state. To complete the restore operation, follow the
procedure available in Restore Platform System Data and Storage for
restoring the remaining subcloud nodes.
Resume subcloud audit with the command:
~(keystone_admin)]$ dcmanager subcloud manage <subcloud-name>
Troubleshooting Auto-Restore and Factory-Restore
Diagnosing install failures
If the auto-restore or factory-restore fails during installation (detected via install timeout), perform the following:
Enable serial logs by setting rvmc_debug_level in the install values.
rvmc_debug_level: 1
Note
Some BMCs require specific cipher suites to allow serial console log capture. Use the bmc_ciphersuite parameter in the install values YAML file to configure the cipher suites.
Update the subcloud install values.
~(keystone_admin)]$ dcmanager subcloud update --install-values <install-values> --sysadmin-password <sysadmin-password> --bmc-password <bmc-password> <subcloud-name>
Next time the restore is executed, check the logs in /var/log/dcmanager/ansible on the System Controller.
Alternatively, access /root/install.log via BMC serial console on the subcloud.
Common failure scenarios
Missing prestaged ostree data: The install will abort with the message Installation Failed: ERROR: ostree_repo must be prestaged for auto-restore operation in /root/install.log.
Missing backup file: Detected via IPMI SEL event (if supported) or during restore execution. Check that the backup file exists at the expected location.
Missing container images: Detected via IPMI SEL event (if supported) or during restore execution. Ensure that container images are prestaged or backed up.
Viewing restore logs
For auto-restore and factory-restore operations, restore logs are captured in /var/log/auto-restore.log on the subcloud and can be viewed via BMC serial console.