Orchestrated Deployment Host Software Deployment
Software deployment orchestration automates the process of upversioning the StarlingX software to a new major release or new patch release (In-Service or Reboot Required (RR)). It automates the execution of all software deploy steps across all the hosts in a cluster, based on the configured policies.
Note
Software deployment orchestration also covers orchestrated upversioning to a new patched major release. All the comments in this section that are specific to a major release also apply to a patched major release.
Software deployment orchestration supports all standalone configurations: AIO-SX, AIO-DX, and standard configuration.
Note
Orchestrating the software deployment of subclouds in a DC system is different from orchestrating the software deployment of standalone StarlingX configurations. See Distributed Software Deploy Orchestration Process using the CLI.
Software deployment orchestration automatically iterates through all the hosts and deploys the new software load on each host: first the controller hosts, then the storage hosts, and lastly the worker hosts. If a reboot of the host is required, pods or VMs on a worker host (or duplex AIO controller) are automatically moved to alternate worker hosts during the software deployment on that host. After the new software has been deployed on all hosts, software deployment orchestration activates the new software deployment and, if the --delete option was selected, completes and deletes it.
Note
Software deployment orchestration completes and deletes the new software deployment only when the user selects the --delete option while creating the strategy. In the case of a major release, once the software deployment is deleted, it can no longer be rolled back.
To perform a software deployment orchestration, first create an upgrade orchestration strategy for the automated software deployment procedure. The strategy provides the policies for the software deployment orchestration through the following parameters:
The host types to be software deployed.
Whether to deploy the software to hosts serially or in parallel.
The maximum number of hosts to deploy in parallel.
Maintenance action (stop-start or migrate) for hosted OpenStack VMs on a host that is about to have its software updated.
Alarm restrictions, that is, options to specify how the orchestration behaves when alarms occur.
Based on these parameters and the state of the hosts, software deployment orchestration creates a number of stages for the overall software deployment strategy. Each stage generally consists of deploying software on a subset of the hosts on the system. In the case of a reboot required (RR) software release, each stage consists of moving pods or VMs, locking hosts, deploying software on hosts, and unlocking hosts for that subset of hosts. After creating the software deployment orchestration strategy, you can either apply the entire strategy automatically or apply individual stages to control and monitor their progress manually.
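To make the idea of stages more concrete, the following minimal Python sketch groups hosts into ordered stages: controllers first, then storage hosts, then workers, with workers batched when a parallel apply type is chosen. It is an illustration only. The function name plan_stages, the host names, and the batching rule are assumptions made for this example, and the real orchestration also takes host state, hosted guests, and storage redundancy groups into account.

```python
def plan_stages(hosts, worker_apply_type="serial", max_parallel_worker_hosts=2):
    """Group hosts into ordered stages (hypothetical helper, not the actual algorithm).

    hosts is a list of (hostname, personality) tuples, where personality is
    "controller", "storage", or "worker".
    """
    stages = []
    # Assumption: controllers and storage hosts get one host per stage (serial).
    for personality in ("controller", "storage"):
        for name, kind in hosts:
            if kind == personality:
                stages.append([name])
    if worker_apply_type == "ignore":
        return stages
    workers = [name for name, kind in hosts if kind == "worker"]
    # Workers are batched only when the parallel apply type is requested.
    batch = max_parallel_worker_hosts if worker_apply_type == "parallel" else 1
    for start in range(0, len(workers), batch):
        stages.append(workers[start:start + batch])
    return stages


example_hosts = [
    ("controller-0", "controller"),
    ("controller-1", "controller"),
    ("worker-0", "worker"),
    ("worker-1", "worker"),
    ("worker-2", "worker"),
]
stages = plan_stages(example_hosts, worker_apply_type="parallel",
                     max_parallel_worker_hosts=2)
for index, stage in enumerate(stages):
    print(index, stage)
```

With three workers and a maximum of two hosts in parallel, the sketch produces two controller stages followed by two worker stages.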
Prerequisites
No other orchestration strategy exists. Firmware-upgrade, kubernetes-version-upgrade, system-config-update-strategy, and kube-rootca-update are other types of orchestration. A software deployment cannot be orchestrated while another orchestration is in progress.
You have the administrator role privileges.
The system is clear of alarms except the software deployment in progress alarm.
All the hosts are unlocked, enabled, and available.
For Duplex systems, the system should be fully redundant. Two controller nodes should be available and, for systems with a Ceph backend, at least one complete storage replication group should be available.
Sufficient free capacity or unused worker resources must be available across the cluster. A rough calculation is:
Required spare capacity (%) = (<Number-of-hosts-to-upgrade-in-parallel> / <total-number-of-hosts>) * 100
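As a worked example of the rough calculation above, the short Python sketch below evaluates the formula for a given cluster size. The function name and the input validation are assumptions added for illustration.

```python
def required_spare_capacity_percent(hosts_in_parallel, total_hosts):
    """Rough spare capacity estimate, in percent, from the formula above."""
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    if not 0 <= hosts_in_parallel <= total_hosts:
        raise ValueError("hosts_in_parallel must be between 0 and total_hosts")
    return 100 * hosts_in_parallel / total_hosts


# Deploying to 2 hosts at a time on an 8-host cluster.
print(required_spare_capacity_percent(2, 8))  # 25.0
```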
For a major release deployment, the license for the new release has been installed using system license-install <license-for-new-major-release>.
The software release to be deployed has been uploaded.
For a major release:
~(keystone_admin)]$ software upload [ --local ] <new-release>.iso <new-release>.sig
<new-release-id> is now uploaded
+-------------------------------+-------------------+
| Uploaded File                 | Release           |
+-------------------------------+-------------------+
| <new-release>.iso             | <new-release-id>  |
+-------------------------------+-------------------+
This command may take 5 to 10 minutes, depending on the hardware.
The --local option can be used when running this command in an SSH session on the active controller to optimize performance. With this option, the system reads files directly from the local disk rather than transferring them over the REST APIs backing the CLI.
For a patch release:
~(keystone_admin)]$ software upload <filename>.patch
<release-id> is now uploaded
+-------------------------------+-------------------+
| Uploaded File                 | Release           |
+-------------------------------+-------------------+
| <new-release>.patch           | <new-release-id>  |
+-------------------------------+-------------------+
Ensure that the new software release was successfully uploaded.
~(keystone_admin)]$ software list
+--------------------------+-------+-----------+
| Release                  | RR    | State     |
+--------------------------+-------+-----------+
| starlingx-10.0.0         | True  | deployed  |
| <new-release-id>         | True  | available |
+--------------------------+-------+-----------+
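If you want to script this check, the following Python sketch parses table output in the layout shown above and reports the state of each release. It assumes the three-column table format from the example; the function name parse_release_table and the sample text are illustrative, and the sketch does not call the CLI itself.

```python
def parse_release_table(text):
    """Return {release: state} parsed from `software list` style table output."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and any non-table lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Release":
            continue  # skip the header row and malformed rows
        release, _reboot_required, state = cells
        states[release] = state
    return states


sample = """\
+--------------------------+-------+-----------+
| Release                  | RR    | State     |
+--------------------------+-------+-----------+
| starlingx-10.0.0         | True  | deployed  |
| <new-release-id>         | True  | available |
+--------------------------+-------+-----------+
"""
states = parse_release_table(sample)
print(states["<new-release-id>"] == "available")
```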
For a major release deployment, the platform issuer (system-local-ca) must be configured beforehand with an RSA certificate/private key. If system-local-ca was configured with a different type of certificate/private key, use the Update system-local-ca or Migrate Platform Certificates to use Cert Manager procedure to reconfigure it with an RSA certificate/private key.
Procedure
Create a software deployment orchestration strategy for a specified software release with desired policies.
~(keystone_admin)]$ sw-manager sw-deploy-strategy create [--controller-apply-type {serial,ignore}]
                                                         [--storage-apply-type {serial,parallel,ignore}]
                                                         [--worker-apply-type {serial,parallel,ignore}]
                                                         [--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10}]
                                                         [--instance-action {stop-start,migrate}]
                                                         [--alarm-restrictions {strict,relaxed}]
                                                         [--delete]
                                                         <software-release-id>
strategy-uuid: 5435e049-7002-4403-acfb-7886f6da14af
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: build
current-phase-completion: 0%
state: building
inprogress: true
where,
<software-release-id>
Specifies the specific software release to deploy. This can be a patch release or a major release.
[--controller-apply-type {serial,ignore}]
(Optional) Specifies whether software should be deployed to controller hosts in serial or ignored. By default, it is serial.
ignore should be used only when re-creating and applying a strategy after an abort or failure.
[--storage-apply-type {serial,parallel,ignore}]
(Optional) Specifies whether software should be deployed to storage hosts in serial, in parallel, or ignored. By default, it is serial. In parallel mode, software is deployed to one storage host from each storage redundancy group.
ignore should be used only when re-creating and applying a strategy after an abort or failure.
Note
If parallel apply for storage is used, it will be automatically replaced with the serial apply for --storage-apply-type.
[--worker-apply-type {serial,parallel,ignore}]
(Optional) Specifies whether software should be deployed to worker hosts in serial, in parallel, or ignored. By default, it is serial. The number of worker hosts to which software is deployed in parallel is specified by [--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10}]. The default is 2.
ignore should be used only when re-creating and applying a strategy after an abort or failure.
[--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10}]
Specifies the maximum number of worker hosts to which software is deployed in parallel. The default is 2.
[--instance-action {stop-start,migrate}]
Applies only to OpenStack VM hosted guests. It specifies the action performed on hosted OpenStack VMs on a worker host (or AIO controller) prior to deploying the new software to the host. The default is stop-start.
stop-start
Before deploying the software release to the host, all the hosted OpenStack VMs are stopped or shut down.
After deploying the software release to the host, all the hosted OpenStack VMs are restarted.
migrate
Before deploying the software release to the host, all the hosted OpenStack VMs are migrated to another host that is capable of hosting them and that is not part of the current stage.
Hosts whose software is already updated are preferred over the hosts whose software has not been updated yet.
Live migration is attempted first. If live migration is not possible for the OpenStack VM, cold migration is performed.
[--alarm-restrictions {strict,relaxed}]
Lets you determine how to handle alarm restrictions based on the management affecting statuses of any existing alarms, which takes into account the alarm type as well as the alarm’s current severity. Default is strict. If set to relaxed, orchestration will be allowed to proceed if there are no management affecting alarms present.
Performing management actions without specifically relaxing the alarm checks will still fail if there are any alarms present in the system (except for a small list of basic alarms for the orchestration actions, such as an upgrade operation in progress alarm not impeding upgrade orchestration). You can use the CLI command fm alarm-list --mgmt_affecting to view the alarms that are management affecting.
Strict maintains alarm restrictions.
Relaxed relaxes the usual alarm restrictions and allows the action to proceed if there are no alarms present in the system with a severity equal to or greater than its management affecting severity. That is, it will use the -f (force) option on the precheck or start of the deployment.
[--delete]
(Optional) Specifies if the software deployment needs to be deleted or not.
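The option values listed above can be checked before the command is run. The following Python sketch assembles the argument list for sw-manager sw-deploy-strategy create and rejects values outside the documented choices. The helper name build_create_command and its keyword-argument convention are assumptions made for this example; the sketch only builds the argument list and does not execute anything.

```python
# Allowed values, copied from the option descriptions above.
ALLOWED = {
    "--controller-apply-type": {"serial", "ignore"},
    "--storage-apply-type": {"serial", "parallel", "ignore"},
    "--worker-apply-type": {"serial", "parallel", "ignore"},
    "--max-parallel-worker-hosts": {str(n) for n in range(2, 11)},
    "--instance-action": {"stop-start", "migrate"},
    "--alarm-restrictions": {"strict", "relaxed"},
}


def build_create_command(release_id, delete=False, **options):
    """Build the argv list for the create command (hypothetical helper)."""
    argv = ["sw-manager", "sw-deploy-strategy", "create"]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if flag not in ALLOWED:
            raise ValueError(f"unknown option: {flag}")
        if str(value) not in ALLOWED[flag]:
            raise ValueError(f"invalid value {value!r} for {flag}")
        argv += [flag, str(value)]
    if delete:
        argv.append("--delete")
    argv.append(release_id)
    return argv


command = build_create_command(
    "<software-release-id>",
    worker_apply_type="parallel",
    max_parallel_worker_hosts=4,
    delete=True,
)
print(" ".join(command))
```

The resulting list could be passed to subprocess.run in an environment where the CLI and the required credentials are available.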
Wait for the build phase of the software deployment orchestration strategy create to be 100% complete and its state to be ready-to-apply.
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
  strategy-uuid: 6282f049-bb9e-46f0-9ca8-97bf626884e0
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: build
  current-phase-completion: 100%
  state: ready-to-apply
  build-result: success
  build-reason:
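The readiness check described in this step can be sketched in Python as follows. The sketch assumes that the show command prints one "key: value" field per line, as in the example output; the function names and the sample text are illustrative, and repeated keys in nested sections would overwrite one another in this simple parser.

```python
def parse_strategy_fields(text):
    """Parse "key: value" lines into a dict (flat view; later duplicates win)."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.strip().partition(":")
        if separator and key:
            fields[key.strip()] = value.strip()
    return fields


def build_is_ready(fields):
    """True when the build phase is 100% complete and the state is ready-to-apply."""
    return (
        fields.get("current-phase") == "build"
        and fields.get("current-phase-completion") == "100%"
        and fields.get("state") == "ready-to-apply"
    )


sample_show_output = """\
Strategy Software Deploy Strategy:
  strategy-uuid: 6282f049-bb9e-46f0-9ca8-97bf626884e0
  current-phase: build
  current-phase-completion: 100%
  state: ready-to-apply
  build-result: success
  build-reason:
"""
fields = parse_strategy_fields(sample_show_output)
print(build_is_ready(fields))
```

A polling loop could call the show command periodically and stop once build_is_ready returns True or build-result reports a failure.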
Note
If the build phase fails (build-result: failed appears in the output of the show command), determine the issue from the build error reason (build-reason: <Error information> in the output of the show command) and/or from /var/log/nfv-vim*.log on the active controller, address the issues, delete the strategy, and retry the create.
(Optional) Display the --details (phases and steps) of the build strategy.
The software deploy strategy consists of one or more stages, each of which consists of one or more hosts to have the new software deployed at the same time.
Each stage will be split into steps (for example, query-alarms, lock-hosts, upgrade-hosts).
The new software is deployed on the controller hosts first, followed by the storage hosts, and then the worker hosts.
The new software is deployed on the worker hosts with no hosted guests (Kubernetes pods or OpenStack VMs) before the worker hosts with hosted guests (Kubernetes pods or OpenStack VMs).
Before the new software is deployed to a worker host (AIO-Controller), hosted Kubernetes pods are relocated off that host if another worker host capable of hosting the Kubernetes pods is available.
Hosted OpenStack VMs will be managed according to the requested --instance-action on each worker host (AIO-Controller) before the new software is deployed to the worker host (AIO-Controller).
The final step in each stage is one of the following:
system-stabilize
This waits for a period of time (up to several minutes) and ensures that the system is free of alarms.
This ensures that we do not continue to deploy the new software to more hosts if the software deployment has caused an issue resulting in an alarm.
wait-data-sync
This waits for a period of time and ensures that data synchronization has completed after the upgrade of a controller or storage node.
~(keystone_admin)]$ sw-manager sw-deploy-strategy show --details
Strategy Software Deploy Strategy:
  strategy-uuid: 6282f049-bb9e-46f0-9ca8-97bf626884e0
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: build
  current-phase-completion: 100%
  state: ready-to-apply
  build-phase:
    ...
    stages:
      ...
      steps:
        ...
  apply-phase:
    ...
    stages:
      ...
      steps:
        ...
Apply and monitor the software deployment orchestration.
You can either apply the entire strategy automatically or apply the individual stages to control and monitor their progress manually.
Apply the entire strategy automatically and monitor its progress:
~(keystone_admin)]$ sw-manager sw-deploy-strategy apply
Strategy Software Deploy Strategy:
  strategy-uuid: 52873771-fc1a-48cd-b322-ab921d34d01c
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: apply
  current-phase-completion: 0%
  state: applying
  inprogress: true
Show high-level status of apply.
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
  strategy-uuid: 35b48793-66f8-46be-8972-cc22117a93ff
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: apply
  current-phase-completion: 7%
  state: applying
  inprogress: true
Show details of active stage or step of apply.
~(keystone_admin)]$ sw-manager sw-deploy-strategy show --active
Strategy Software Deploy Strategy:
  strategy-uuid: 52873771-fc1a-48cd-b322-ab921d34d01c
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: apply
  current-phase-completion: 7%
  state: applying
  apply-phase:
    total-stages: 3
    current-stage: 0
    stop-at-stage: 3
    timeout: 12019 seconds
    completion-percentage: 7%
    start-date-time: 2024-06-11 12:19:51
    inprogress: true
    stages:
      stage-id: 0
      stage-name: sw-upgrade-start
      total-steps: 3
      current-step: 1
      timeout: 1321 seconds
      start-date-time: 2024-06-11 12:19:51
      inprogress: true
      steps:
        step-id: 1
        step-name: start-upgrade
        timeout: 1200 seconds
        start-date-time: 2024-06-11 12:19:51
        result: wait
        reason:
Apply individual stages.
~(keystone_admin)]$ sw-manager sw-deploy-strategy apply --stage-id <STAGE-ID>
Strategy Software Deploy Strategy:
  strategy-uuid: a0277e08-93cc-4964-ba39-ebab367a547c
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: apply
  current-phase-completion: 0%
  state: applying
  inprogress: true
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
  strategy-uuid: a0277e08-93cc-4964-ba39-ebab367a547c
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: apply
  current-phase-completion: 7%
  state: applying
  inprogress: true
~(keystone_admin)]$ sw-manager sw-deploy-strategy show --active
Strategy Software Deploy Strategy:
  strategy-uuid: a0277e08-93cc-4964-ba39-ebab367a547c
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: apply
  current-phase-completion: 7%
  state: applying
  apply-phase:
    total-stages: 3
    current-stage: 0
    stop-at-stage: 1
    timeout: 1322 seconds
    completion-percentage: 7%
    start-date-time: 2024-06-11 14:40:23
    inprogress: true
    stages:
      stage-id: 0
      stage-name: sw-upgrade-start
      total-steps: 3
      current-step: 1
      timeout: 1321 seconds
      start-date-time: 2024-06-11 14:40:23
      inprogress: true
      steps:
        step-id: 1
        step-name: start-upgrade
        timeout: 1200 seconds
        start-date-time: 2024-06-11 14:40:23
        result: wait
        reason:
While a software deployment orchestration strategy is being applied, it can be aborted.
The current step will be allowed to complete and if necessary, an abort phase will be created and applied, which will attempt to unlock any hosts that were locked.
~(keystone_admin)]$ sw-manager sw-deploy-strategy abort
Strategy Software Deploy Strategy:
  strategy-uuid: 63f48dfc-f833-479b-b597-d11f9219baf5
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: apply
  current-phase-completion: 7%
  state: aborting
  inprogress: true
Wait for the abort to complete.
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
  strategy-uuid: 63f48dfc-f833-479b-b597-d11f9219baf5
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: abort
  current-phase-completion: 100%
  state: aborted
  apply-result: failed
  apply-reason:
  abort-result: success
  abort-reason:
Note
To view detailed errors, run the following commands:
~(keystone_admin)]$ sw-manager sw-deploy-strategy show --error-details
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
  strategy-uuid: <>
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: abort
  current-phase-completion: 100%
  state: aborted
  apply-result: failed
  apply-error-response:
  abort-result: success
  abort-reason:
  abort-error-response:
Note
After a software deployment strategy has been applied (or aborted), it must be deleted before another software deployment strategy can be created.
Otherwise, wait for all the steps of all stages of the software deployment orchestration strategy to complete.
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
  strategy-uuid: 6282f049-bb9e-46f0-9ca8-97bf626884e0
  release-id: <software-release-id>
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: applied
  current-phase-completion: 100%
  state: applied
  apply-result: success
  apply-reason:
If a software deployment strategy apply fails, you must address the issue that caused the failure, then delete and re-create the strategy before attempting to apply it again.
For additional details, run the sw-manager sw-deploy-strategy show --error-details command.
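The outcomes described in this procedure can be summarized as a small decision function. The Python sketch below maps the state and apply-result fields from the show output to a suggested next step. The function name next_action and the returned labels are assumptions introduced for this example; only the field names and state values come from the outputs shown in this section.

```python
def next_action(fields):
    """Suggest a next step from the strategy fields (hypothetical labels)."""
    state = fields.get("state")
    if state in ("building", "applying", "aborting"):
        return "wait"
    if state == "ready-to-apply":
        return "apply"
    if state == "applied" and fields.get("apply-result") == "success":
        # A completed strategy must be deleted before another can be created.
        return "delete-strategy"
    if state == "aborted" or fields.get("apply-result") == "failed":
        # Address the cause first, then delete and re-create the strategy.
        return "fix-then-delete-and-recreate"
    return "inspect-error-details"


print(next_action({"state": "applied", "apply-result": "success"}))
print(next_action({"state": "aborted", "apply-result": "failed"}))
```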
Delete the completed software deployment strategy.
~(keystone_admin)]$ sw-manager sw-deploy-strategy delete
Strategy deleted
Postrequisites
After a successful software deployment orchestration:
The Kubernetes Version Upgrade procedure can be executed, if desired, to upversion to a new Kubernetes version available in the new software release.
You should also validate that the system and hosted applications are healthy.
In the case of a major release software deployment:
If you do not need to roll back the major release software deployment, then delete the software deployment that was used by the software deployment orchestration.
~(keystone_admin)]$ software deploy delete
Deployment has been deleted
~(keystone_admin)]$ software deploy show
No deploy in progress
Remove the old major release to reclaim disk space.
Note
If this is a System Controller, the old major release should NOT be deleted until all the subclouds have moved to the new major release.
~(keystone_admin)]$ software list
+--------------------------+-------+-------------+
| Release                  | RR    | State       |
+--------------------------+-------+-------------+
| starlingx-10.0.0         | True  | unavailable |
| <new-major-release-id>   | True  | deployed    |
+--------------------------+-------+-------------+
~(keystone_admin)]$ software delete starlingx-10.0.0
starlingx-10.0.0 has been deleted.
~(keystone_admin)]$ software list
+--------------------------+-------+-------------+
| Release                  | RR    | State       |
+--------------------------+-------+-------------+
| <new-major-release-id>   | True  | deployed    |
+--------------------------+-------+-------------+