Failure During the Installation or Data Migration of N+1 Load on a Subcloud

You may encounter errors during installation or data migration of the N+1 load on a subcloud. This section explains these errors and the steps required to fix them.

Errors can occur due to one of the following:

  • One or more invalid install values

  • A network error that results in the subcloud being temporarily unreachable

To get detailed information about errors during installation or data migration, run the dcmanager subcloud errors <subcloud_name/subcloud_id> command.
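
When several subclouds have failed, the same command can be run once per subcloud. The following is a minimal sketch, not part of the product: the function name show_subcloud_errors and the subcloud names in the example are hypothetical, and it assumes dcmanager is on the PATH with the keystone admin credentials already sourced.

```shell
# Hypothetical helper: print the `dcmanager subcloud errors` output for each
# subcloud name or ID passed as an argument.
show_subcloud_errors() {
    for sse_subcloud in "$@"; do
        echo "=== ${sse_subcloud} ==="
        # Keep going if one lookup fails so the remaining subclouds are still reported.
        dcmanager subcloud errors "${sse_subcloud}" \
            || echo "could not retrieve errors for ${sse_subcloud}" >&2
    done
}

# Example (subcloud names are placeholders):
#   show_subcloud_errors subcloud1 subcloud2
```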

If you are using the web interface, error details are available in the corresponding strategy step and in the details of the affected subcloud. To view them, navigate to the subcloud view by clicking Distributed Cloud Admin > Cloud Overview, then expand the subcloud in the list. Any error messages appear at the end of the subcloud details.

Failure Caused by Install Values

If the subcloud install values contain an incorrect value, correct the install values file and update the subcloud with the following command:

~(keystone_admin)]$ dcmanager subcloud update <subcloud-name> --install-values <subcloud-install-values-yaml>
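
A mistyped file path is an easy error to make here, so it can help to check that the install values file is readable before calling dcmanager. The following is a minimal sketch under that assumption; the function name update_install_values is hypothetical, and only the dcmanager command shown above is used.

```shell
# Hypothetical helper: update a subcloud's install values after checking
# that the YAML file exists and is readable.
update_install_values() {
    if [ "$#" -ne 2 ]; then
        echo "usage: update_install_values <subcloud-name> <install-values-yaml>" >&2
        return 2
    fi
    if [ ! -r "$2" ]; then
        echo "install values file not readable: $2" >&2
        return 1
    fi
    # Same command as shown above, with both arguments quoted.
    dcmanager subcloud update "$1" --install-values "$2"
}
```

The check only covers the file path; it does not validate the values inside the file.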

This type of failure is recoverable. Retry the orchestrated upgrade for each failed subcloud using the following procedure:

Procedure

  1. Delete the failed upgrade strategy.

    ~(keystone_admin)]$ dcmanager upgrade-strategy delete
    
  2. Create a new upgrade strategy for the failed subcloud.

    ~(keystone_admin)]$ dcmanager upgrade-strategy create <subcloud-name> --force <additional options>
    

    Note

    If the upgrade failed during the AIO-SX upgrade or data migration, the subcloud availability status is displayed as ‘offline’. In that case, use the --force option when creating the new strategy.

  3. Apply the new upgrade strategy.

    ~(keystone_admin)]$ dcmanager upgrade-strategy apply
    
  4. Verify the upgrade strategy status.

    ~(keystone_admin)]$ dcmanager strategy-step list
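
The four steps above can be chained for a single failed subcloud. The following is a minimal sketch: the function name retry_subcloud_upgrade is hypothetical, and it uses only the commands shown in the procedure. It stops at the first command that fails and does not wait or poll between steps, which a real system may require before the next step is accepted.

```shell
# Hypothetical helper: run the four retry steps in order for one subcloud and
# stop at the first command that fails.
retry_subcloud_upgrade() {
    if [ "$#" -lt 1 ]; then
        echo "usage: retry_subcloud_upgrade <subcloud-name> [additional options]" >&2
        return 2
    fi
    rsu_subcloud="$1"
    shift

    # 1. Delete the failed upgrade strategy.
    dcmanager upgrade-strategy delete || return 1
    # 2. Create a new strategy for the failed subcloud; --force covers the
    #    case where the subcloud is reported as offline. Any remaining
    #    arguments are passed through as additional options.
    dcmanager upgrade-strategy create "${rsu_subcloud}" --force "$@" || return 1
    # 3. Apply the new strategy.
    dcmanager upgrade-strategy apply || return 1
    # 4. Show the status of each strategy step.
    dcmanager strategy-step list
}
```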
    

Failure Post Data Migration on a Subcloud

Once data migration on the subcloud is complete, the upgrade is activated and finalized. If a failure occurs at this stage:

Procedure

  • Get detailed information about errors during the activation step by running the dcmanager subcloud errors <subcloud_name/subcloud_id> command.

  • If you are using the web interface, error details are available in the corresponding strategy step and in the details of the affected subcloud. To view them, navigate to the subcloud view by clicking Distributed Cloud Admin > Cloud Overview, then expand the subcloud in the list. Any error messages appear at the end of the subcloud details.

  • Check the specified log files.

  • Follow the recovery procedure. See Failure Prior to the Installation of N+1 Load on a Subcloud.