Upgrade All-in-One Duplex / Standard

You can upgrade the StarlingX Duplex or Standard configurations with a new release of StarlingX software.

Prerequisites

  • Perform a full backup to allow recovery.

    Note

    Back up files in the /home/sysadmin and /root directories prior to doing an upgrade. Home directories are not preserved during backup or restore operations, blade replacement, or upgrades.
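
    For example, a minimal sketch of backing up the home directories to a remote host before the upgrade (the backup user, host, and destination path are placeholders for your own backup location):

    # Archive the home directories, which are not preserved across upgrades.
    $ sudo tar -czf /tmp/home-backup.tgz /home/sysadmin /root

    # Copy the archive off-box before starting the upgrade.
    $ scp /tmp/home-backup.tgz <user>@<backup-host>:<backup-path>/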

  • The system must be ‘patch current’. All updates available for the current release running on the system must be applied, and all patches must be committed. To find and download applicable updates, visit a StarlingX mirror.
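
    For example, you can check the patch status of the running release with the sw-patch CLI; every patch should report as applied and committed:

    $ sudo sw-patch query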

  • Transfer the new release software load to controller-0 (or onto a USB stick); controller-0 must be active.

    Note

    Make sure that the /home/sysadmin directory has at least 2GB of free space, otherwise the upgrade may fail. If more space is needed, it is recommended to delete the previously imported .iso bootimage after the load-import command completes.
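
    For example, a hedged sketch of copying the load files to controller-0 and confirming free space (the file names and the OAM floating IP address are placeholders):

    # From your workstation, copy the ISO and its signature to controller-0.
    $ scp <bootimage>.iso <bootimage>.sig sysadmin@<oam-floating-ip>:/home/sysadmin/

    # On controller-0, confirm at least 2GB of free space in /home/sysadmin.
    $ df -h /home/sysadmin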

  • Transfer the new release software license file to controller-0 (or onto a USB stick).

  • Transfer the new release software signature to controller-0 (or onto a USB stick).

  • Unlock all hosts.

    • All nodes must be unlocked; the upgrade health check prevents the upgrade from starting if any nodes are locked.
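
    For example, confirm that every host reports an administrative state of unlocked before starting:

    ~(keystone_admin)]$ system host-list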

Note

The upgrade procedure includes steps to resolve system health issues.

Note

Upgrading hosts should be completed within 24 hours to avoid a kubeadm token timeout.

Procedure

  1. Ensure that controller-0 is the active controller.
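
    For example, one way to confirm which controller is active is to list the service groups; the controller hosting the service groups in the active state is the active controller:

    ~(keystone_admin)]$ system servicegroup-list

    If controller-1 is currently active, swact to controller-0 (system host-swact controller-1) before continuing.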

  2. Install the license file for the release you are upgrading.

    ~(keystone_admin)]$ system license-install <license_file>
    

    For example,

    ~(keystone_admin)]$ system license-install license.lic
    
  3. Import the new release.

    1. Run the load-import command on controller-0 to import the new release.

      Source /etc/platform/openrc, and specify the exact paths to the *.iso bootimage file and the *.sig bootimage signature file.

      $ source /etc/platform/openrc
      ~(keystone_admin)]$ system load-import /home/sysadmin/<bootimage>.iso \
      <bootimage>.sig
      +--------------------+-----------+
      | Property           | Value     |
      +--------------------+-----------+
      | id                 | 2         |
      | state              | importing |
      | software_version   | nn.nn     |
      | compatible_version | nn.nn     |
      | required_patches   |           |
      +--------------------+-----------+
      

      The load-import command must be run on controller-0; relative paths are also accepted.

      Note

      This will take a few minutes to complete.

    2. Check to ensure the load was successfully imported.

      ~(keystone_admin)]$ system load-list
      +----+----------+------------------+
      | id | state    | software_version |
      +----+----------+------------------+
      | 1  | active   | nn.nn            |
      | 2  | imported | nn.nn            |
      +----+----------+------------------+
      
  4. Apply any required software updates.

    The system must be ‘patch current’. All software updates related to your current StarlingX software release must be uploaded, applied, and installed.

    Software updates for the new StarlingX release only need to be uploaded and applied; they are installed automatically during the software upgrade procedure as each host is reinstalled with the new release.

    To find and download applicable updates, visit a StarlingX mirror.

    For more information, see Manage Software Updates.
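
    For example, a hedged sketch of uploading and applying an update with the sw-patch CLI (the patch file name is a placeholder; see Manage Software Updates for the complete procedure, including installing updates on each host):

    $ sudo sw-patch upload /home/sysadmin/patches/<patch-id>.patch
    $ sudo sw-patch apply <patch-id>
    $ sudo sw-patch query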

  5. Confirm that the system is healthy.

    Check the current system health status, resolve any alarms and other issues reported by the system health-query-upgrade command, then recheck the system health status to confirm that all System Health fields are set to OK. For example:

    ~(keystone_admin)]$ system health-query-upgrade
    
    System Health:
    All hosts are provisioned: [OK]
    All hosts are unlocked/enabled: [OK]
    All hosts have current configurations: [OK]
    All hosts are patch current: [OK]
    Ceph Storage Healthy: [OK]
    No alarms: [OK]
    All kubernetes nodes are ready: [OK]
    All kubernetes control plane pods are ready: [OK]
    Required patches are applied: [OK]
    License valid for upgrade: [OK]
    No instances running on controller-1: [OK]
    All kubernetes applications are in a valid state: [OK]
    Active controller is controller-0: [OK]
    

    By default, the upgrade process cannot be run with active alarms present. Use the command system upgrade-start --force to force the upgrade process to start and ignore non-management-affecting alarms.

    Note

    It is strongly recommended that you clear your system of any and all alarms before doing an upgrade. While the --force option is available to run the upgrade, it is a best practice to clear any alarms.

  6. Start the upgrade from controller-0.

    Make sure that controller-0 is the active controller, that you are logged in to controller-0 as sysadmin, and that your present working directory is your home directory.

    ~(keystone_admin)]$ system upgrade-start
    +--------------+--------------------------------------+
    | Property     | Value                                |
    +--------------+--------------------------------------+
    | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
    | state        | starting                             |
    | from_release | nn.nn                                |
    | to_release   | nn.nn                                |
    +--------------+--------------------------------------+
    

    This will make a copy of the upgrade data onto a DRBD file system to be used in the upgrade. Configuration changes are not allowed after this point until the swact to controller-1 is completed.

    The following upgrade state applies once this command is executed:

    • started:

      • State entered after system upgrade-start completes.

      • Release <nn>.<nn> system data (for example, postgres databases) has been exported to be used in the upgrade.

      • Configuration changes must not be made after this point, until the upgrade is completed.

    As part of the upgrade, the upgrade process checks the health of the system and validates that the system is ready for an upgrade.

    The upgrade process checks that no alarms are active before starting an upgrade.

    Note

    Use the command system upgrade-start --force to force the upgrade process to start and ignore non-management-affecting alarms. This should ONLY be done if you have ascertained that these alarms will not interfere with the upgrade process.

    On systems with Ceph storage, the process also checks that the Ceph cluster is healthy.

  7. Upgrade controller-1.

    1. Lock controller-1.

      ~(keystone_admin)]$ system host-lock controller-1
      
    2. Upgrade controller-1.

      Controller-1 installs the update and reboots, then performs data migration.

      ~(keystone_admin)]$ system host-upgrade controller-1
      

      Wait for controller-1 to reinstall with the N+1 load and reach the locked-disabled-online state.

      The following data migration states apply when this command is executed:

      • data-migration:

        • State entered when system host-upgrade controller-1 is executed.

        • System data is being migrated from release N to release N+1.

        Note

        The upgrade process will take a minimum of 20 to 30 minutes to complete.

        You can view the upgrade progress on controller-1 using the serial console.

      • data-migration-complete or upgrading-controllers:

        • State entered when controller-1 upgrade is complete.

        • System data has been successfully migrated from release <nn>.<nn> to the newer version.

      • data-migration-failed:

        • State entered if data migration on controller-1 fails.

        • Upgrade must be aborted.

        Note

        Review the /var/log/sysinv.log on the active controller for more details on data migration failure.

    3. Check the upgrade state.

      ~(keystone_admin)]$ system upgrade-show
      +--------------+--------------------------------------+
      | Property     | Value                                |
      +--------------+--------------------------------------+
      | uuid         | e7c8f6bc-518c-46d4-ab81-7a59f8f8e64b |
      | state        | data-migration-complete              |
      | from_release | nn.nn                                |
      | to_release   | nn.nn                                |
      +--------------+--------------------------------------+
      

      If the upgrade-show status indicates data-migration-failed, then there is an issue with the data migration. Check the issue before proceeding to the next step.

    4. Unlock controller-1.

      ~(keystone_admin)]$ system host-unlock controller-1
      

      Wait for controller-1 to enter the state unlocked-enabled. Wait for the DRBD sync 400.001 Services-related alarm to be raised and then cleared.

      The upgrading-controllers state applies when this command is executed. This state is entered after controller-1 has been upgraded to release nn.nn and data migration is successfully completed.

      If the controller transitions to unlocked-disabled-failed, check the issue before proceeding to the next step. The alarms may indicate a configuration error. Check the configuration logs on controller-1 (for example, error logs in controller-1:/var/log/puppet).
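
      For example, a sketch of monitoring the unlock and the 400.001 alarm from the active controller:

      # Watch for controller-1 to reach unlocked-enabled.
      ~(keystone_admin)]$ system host-show controller-1

      # Watch for the 400.001 Services-related alarm to be raised and then cleared.
      ~(keystone_admin)]$ fm alarm-list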

  8. Set controller-1 as the active controller. Swact to controller-1.

    ~(keystone_admin)]$ system host-swact controller-0
    

    Wait until services have become active on the new active controller-1 before proceeding to the next step. The swact is complete when all services on controller-1 are in the state enabled-active. Use the command system servicegroup-list to monitor progress.
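
    For example, repeat the following until every service group reports as enabled-active on controller-1:

    ~(keystone_admin)]$ system servicegroup-list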

  9. Upgrade controller-0.

    1. Lock controller-0.

      ~(keystone_admin)]$ system host-lock controller-0
      
    2. Upgrade controller-0.

      ~(keystone_admin)]$ system host-upgrade controller-0
      
    3. Unlock controller-0.

      ~(keystone_admin)]$ system host-unlock controller-0
      

      Wait until the DRBD sync 400.001 Services-related alarm is raised and then cleared before proceeding to the next step.

      • upgrading-hosts:

        • State entered when both controllers are running release nn.nn software.

      Note

      AIO-DX controllers, and the controllers of Standard configurations, can be upgraded using steps 1-9 above.

  10. Check the system health to ensure that there are no unexpected alarms.

    ~(keystone_admin)]$ fm alarm-list
    

    Clear all alarms unrelated to the upgrade process.

  11. If using a Ceph storage backend, upgrade the storage nodes one at a time.

    Note

    Proceed to step 13 if no storage/worker node is present.

    The storage node must be locked and all OSDs must be down in order to do the upgrade.

    1. Lock storage-0.

      ~(keystone_admin)]$ system host-lock storage-0
      
    2. Verify that the OSDs are down after the storage node is locked.

      In the Horizon interface, navigate to Admin > Platform > Storage Overview to view the status of the OSDs.
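
      Alternatively, assuming the Ceph CLI is available on the active controller, you can check from the command line; the OSDs hosted on the locked storage node should report as down:

      ~(keystone_admin)]$ ceph osd tree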

    3. Upgrade storage-0.

      ~(keystone_admin)]$ system host-upgrade storage-0
      

      The upgrade is complete when the node comes online. At that point you can safely unlock the node.

      After a storage node is upgraded but before it is unlocked, Ceph synchronization alarms are present (these show synchronization making progress), along with infrastructure network interface alarms (the infrastructure network interface configuration is not applied to the storage node until it is unlocked).

      Unlock the node as soon as the upgraded storage node comes online.

    4. Unlock storage-0.

      ~(keystone_admin)]$ system host-unlock storage-0
      

      Wait for all alarms to clear after the unlock before proceeding to upgrade the next storage host.

    5. Repeat the above steps for each storage host.

      Note

      After upgrading the first storage node you can expect alarm 800.003. The alarm is cleared after all storage nodes are upgraded.

  12. Upgrade worker hosts, if any, one at a time.

    1. Lock worker-0.

      ~(keystone_admin)]$ system host-lock worker-0
      
    2. Upgrade worker-0.

      ~(keystone_admin)]$ system host-upgrade worker-0
      

      Wait for the host to run the installer, reboot, and go online before unlocking it in the next step.

    3. Unlock worker-0.

      ~(keystone_admin)]$ system host-unlock worker-0
      

      After the unlock wait for all alarms to clear before proceeding to the next worker host.

    4. Repeat the above steps for each worker host.
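
      The lock/upgrade/unlock sequence can also be scripted. The following is an illustrative sketch only, assuming worker host names of the form worker-<n>; it still relies on you confirming that all alarms have cleared before moving to the next host, as described above:

      #!/bin/bash
      # Illustrative sketch: upgrade worker hosts one at a time, pausing for operator checks.
      source /etc/platform/openrc

      for HOST in worker-0 worker-1; do          # adjust to your worker host names
          system host-lock "$HOST"
          system host-upgrade "$HOST"

          # Wait until the host has reinstalled, rebooted, and reports online.
          until system host-show "$HOST" | grep -q '| availability .*| online'; do
              sleep 60
          done

          system host-unlock "$HOST"
          read -rp "Confirm all alarms have cleared for $HOST, then press Enter: "
      done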

  13. Set controller-0 as the active controller. Swact to controller-0.

    ~(keystone_admin)]$ system host-swact controller-1
    

    Wait until services have become available on the active controller-0 before proceeding to the next step. When all services on controller-0 are in the enabled-active state, the swact is complete.

  14. Activate the upgrade.

    ~(keystone_admin)]$ system upgrade-activate
    +--------------+--------------------------------------+
    | Property     | Value                                |
    +--------------+--------------------------------------+
    | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
    | state        | activating                           |
    | from_release | nn.nn                                |
    | to_release   | nn.nn                                |
    +--------------+--------------------------------------+
    

    When running the upgrade-activate command, new configurations are applied to the controller. 250.001 (hostname Configuration is out-of-date) alarms are raised and are cleared as the configuration is applied. The upgrade state goes from activating to activation-complete once this is done.

    The following states apply when this command is executed.

    activation-requested

    State entered when system upgrade-activate is executed.

    activating

    State entered when the system has started activating the upgrade by applying new configurations to the controller and compute hosts.

    activating-hosts

    State entered when applying host-specific configurations. This state is entered only if needed.

    activation-complete

    State entered when new configurations have been applied to all controller and compute hosts.

    1. Check the status of the upgrade again to confirm that it has reached activation-complete.

      ~(keystone_admin)]$ system upgrade-show
      +--------------+--------------------------------------+
      | Property     | Value                                |
      +--------------+--------------------------------------+
      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
      | state        | activation-complete                  |
      | from_release | nn.nn                                |
      | to_release   | nn.nn                                |
      +--------------+--------------------------------------+
      

    Note

    This can take more than half an hour to complete.

    activation-failed

    State entered if the upgrade activation fails. Check /var/log/sysinv.log for further information.
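
    Because activation can take more than half an hour, you can poll the state from the shell, for example:

    ~(keystone_admin)]$ watch -n 60 system upgrade-show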

  15. Complete the upgrade.

    ~(keystone_admin)]$ system upgrade-complete
    +--------------+--------------------------------------+
    | Property     | Value                                |
    +--------------+--------------------------------------+
    | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
    | state        | completing                           |
    | from_release | nn.nn                                |
    | to_release   | nn.nn                                |
    +--------------+--------------------------------------+
    
  16. Delete the imported load.

    ~(keystone_admin)]$ system load-list
    +----+----------+------------------+
    | id | state    | software_version |
    +----+----------+------------------+
    | 1  | imported | nn.nn            |
    | 2  | active   | nn.nn            |
    +----+----------+------------------+
    
    ~(keystone_admin)]$ system load-delete 1
    Deleted load: load 1