Unified Software Management

Storyboard: https://storyboard.openstack.org/#!/story/2010676

This feature replaces the StarlingX Patching Framework and the StarlingX Upgrade Framework with a single StarlingX Unified Software Management Framework. Externally, it provides a single REST API / CLI and a single procedure for updating the StarlingX software on a Standalone Cloud or Distributed Cloud. Internally, it provides a single framework, with a superset of capabilities, to maintain and enhance for updating the StarlingX software on a Standalone Cloud or Distributed Cloud.

Problem Description

Today, StarlingX software can be patched with a new patch release or upgraded to a new major/minor release. Although many of the external procedures and internal mechanisms are similar, StarlingX Patching and StarlingX Upgrades are handled by two completely different frameworks: two different REST APIs / CLIs with different procedures and concepts, and two different internal implementations with both similar and different capabilities.

Externally, having two different CLIs and procedures for updating StarlingX software requires more operator/user training, learning, documentation, etc.

Internally, having two frameworks for updating software, with both similar and different capabilities means more effort to maintain and/or a resulting difference in capabilities. For example, StarlingX Upgrades supports data model migrations and StarlingX Patching does not. StarlingX Patching supports efficient ‘ostree’-based in-place/in-service updates to software, while StarlingX Upgrades wipes the disk and installs the system/root disk from scratch.

Use Cases

  • Manual (command-by-command) StarlingX Patching and Manual StarlingX Upgrades for a Standalone Cloud system.

  • Orchestrated StarlingX Patching and Orchestrated StarlingX Upgrades for a Standalone Cloud system.

  • Orchestrated StarlingX Patching and Orchestrated StarlingX Upgrades for Subclouds of a Distributed Cloud system.

REST API, CLIs and Horizon Web Pages associated with StarlingX Patching and StarlingX Upgrades will be addressed.

Proposed Change

Introduction

A new framework will be introduced, rather than modifying an existing framework, so that the new framework can be patched back to a previous release, if required, and be used alongside the existing frameworks.

The new framework will incorporate the best capabilities of each of the existing frameworks. Not all capabilities will initially be available for both patching and upgrades, but it will be much easier to enable and test enhanced capabilities in the future.

The new framework will initially be based on a copy/clone of the existing StarlingX Patching Framework, because that framework’s architecture can update (patch) software with very few dependencies, which enables it to patch a broken system. The ability to patch a broken system is a key requirement to bring forward into the new framework.

Versioning Changes

The format for software release versioning used by StarlingX build tools, for building StarlingX Major Releases and StarlingX Patch Releases, will change. It is changing to align with industry-standard release versioning and to provide a versioning mechanism that incorporates both Major/Minor release versions and Patch release versions.

The new format for software release versioning will be:

MM.mm.pp

where:

  • MM = Major Release number

  • mm = Minor Release number

    • Note that the StarlingX Community currently only releases Major releases, so ‘MM.0’ is used for a Major StarlingX Release

  • pp = the Patch Release of StarlingX (off of a Major Release)

    • Where pp=0 represents a Major Release with no patches

    • Note that the StarlingX Community currently does NOT provide Patch Releases of Community built Major Releases. However users of StarlingX can build Patch Releases with the StarlingX build tools.
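As an illustration of the MM.mm.pp format, the following is a minimal Python sketch of parsing and comparing release versions. The names ReleaseVersion and parse_release are hypothetical and not part of the framework; the sketch also assumes each field is a plain decimal integer and that a release may be written with the ‘starlingx-’ prefix used by the ‘software’ commands.

```python
import re
from typing import NamedTuple

# Hypothetical helper: only the MM.mm.pp layout and the "starlingx-" prefix
# come from the spec; everything else is an assumption for illustration.
_VERSION_RE = re.compile(r"^(?:starlingx-)?(\d+)\.(\d+)\.(\d+)$")


class ReleaseVersion(NamedTuple):
    major: int  # MM
    minor: int  # mm
    patch: int  # pp

    @property
    def is_major_release(self) -> bool:
        # pp == 0 represents a Major Release with no patches.
        return self.patch == 0


def parse_release(text: str) -> ReleaseVersion:
    """Parse 'MM.mm.pp' or 'starlingx-MM.mm.pp' into a comparable tuple."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not an MM.mm.pp version: {text!r}")
    return ReleaseVersion(*(int(part) for part in match.groups()))
```

Because ReleaseVersion is a tuple, ordinary comparison orders releases by Major, then Minor, then Patch number.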

Build Changes

  • Install ISO

    • The main build output for a Major Release.

    • There will be no major changes to the ISO structure, i.e.

      • It will still generally consist of installation code and an OSTree repo with a single branch, with a single commit, containing all of the host software for the Major Release

    • There will be minor changes to meta data delivered in the ISO, for various purposes, such as specifying the new release/version-id, a list of the names of packages (and their versions) in the ostree commit, minor enhancements to data model prechecks and data model migrations, etc.

  • Patch Module

    • The main build output for a Patch Release.

    • This will change significantly.

    • The current approach of creating OSTree Repo DIFFs for Patches has several drawbacks:

      • the patch build times are extremely long; basically have to do a full build and then diff the patch build’s OSTree Repo with the GA (or patch-current) OSTree Repo,

      • the content of the patch is NOT easily inspectable by scripts or humans,

      • the catastrophic scenario of losing your build machine (and the patch-current OSTree Repo which is the basis of your DIFFs) would be difficult to recover from,

      • portability of patches between systems at differing patch levels is limited (e.g. portability that may otherwise be possible when a system does not have the same patch dependencies or optional patches).

    • We will go back to delivering the new or changed DEB packages associated with the software changes of the PATCH.

    • Again, there will be minor changes to meta data delivered in the Patch module, for various purposes, such as specifying the new release/version-id, etc.

  • Patched Install ISO

    • An optional build output for a Patch Release is a patched Install ISO. This is required for scenarios where new hardware drivers are needed for new installs on particular hardware.

    • The Patched Install ISO structure will generally be the same as the Install ISO structure of a Major Release

      • NOTE that, for ISO size reasons, the Patched Install ISO will contain an OSTree repo, with a single branch and ONLY a single OSTree commit representing the patch-current software. I.e. it will NOT contain the OSTree commits for all the individual patches up to and including the patch-current state of the ISO.

      • Therefore, when uploading a Patched Install ISO, all of the Patch Release(s) included in the patched Major Release Install ISO will be in a ‘committed patch’ state; i.e. you will NOT be able to un-deploy these Patch Releases and move back to a previous Patch Release.

    • There will also be minor changes to meta data delivered in the Patched ISO, for various purposes, such as specifying the new release/version-id, a list of the Patch Release(s)’ meta data, a list of the names of packages (and their versions) in the ostree commit, minor enhancements to data model prechecks and data model migrations, etc.

OSTree Repo Strategy

In general, the OSTree Repo strategy on StarlingX will remain the same. However it will be extended now to support multiple Major Releases, upgrades between Major Releases, and dynamically built OSTree commits for Patch Releases.

Centralized ‘feed’ OSTree Repos will be managed on the controllers; one for each Major Release. These centralized ‘feed’ OSTree Repos will be used to prepare and hold the OSTree commits to be deployed on each of the hosts in the StarlingX cloud (i.e. controller(s) [, workers] [, storage hosts] ).

The main OSTree repo for EACH / EVERY host will be located at /sysroot/ostree/repo . This repo contains the software that is and can be deployed on the host. This repo is updated during software deployment procedures by pulling an OSTree commit from a ‘feed’ OSTree Repo on the active controller.

When uploading a new Major Release of StarlingX, the OSTree Repo from the new Major Release’s Install ISO will be used to create a new ‘feed’ OSTree Repo. This new Major Release ‘feed’ OSTree Repo will hold the new Major Release’s OSTree commit (software) and will also be used for managing future Patch Releases’ OSTree commits on this Major Release.

OSTree commits for Patch Releases will be built on-target using a new hybrid package manager (apt-ostree). New Patch Release’s OSTree commits will be added to the end of the Major Release’s ‘feed’ OSTree Repo’s branch.
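The ‘feed’ repo strategy described above can be pictured with a small conceptual model in Python. This is only a sketch under stated assumptions: FeedRepo and Controller are hypothetical names, version strings stand in for real OSTree commit checksums, and the ‘MM.mm’ key used to select a feed is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeedRepo:
    """Conceptual stand-in for one Major Release's 'feed' OSTree repo."""
    major_release: str                            # e.g. "10.0"
    commits: list = field(default_factory=list)   # single branch, oldest first

    def head(self) -> str:
        # The most recently added commit on the branch.
        return self.commits[-1]


class Controller:
    """Holds one feed repo per uploaded Major Release."""

    def __init__(self) -> None:
        self.feeds = {}

    def upload_major_release(self, version: str) -> None:
        # Uploading a Major Release creates a new feed holding its one commit.
        major = version.rsplit(".", 1)[0]
        if major in self.feeds:
            raise ValueError(f"feed for {major} already exists")
        self.feeds[major] = FeedRepo(major, [version])

    def add_patch_release(self, version: str) -> None:
        # A Patch Release commit is appended to the end of the feed's branch.
        major = version.rsplit(".", 1)[0]
        self.feeds[major].commits.append(version)
```

In this model, adding ‘10.0.1’ and then ‘10.0.2’ after uploading ‘10.0.0’ leaves the 10.0 feed with three commits in order, while a later upload of ‘11.0.0’ creates a separate feed.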

New Unified Software Management Commands

This section goes through all of the new Unified Software Management commands, specifying the syntax and actions performed by each command.

Cloud (VIM) Orchestration Changes

Patching and Upgrading Cloud Orchestration, which executes software patching and software upgrades across all hosts within a Cloud, can now be unified as well.

The VIM Upgrade Orchestration will be used for the Cloud Orchestration of Software Release Deployments; i.e. for both Patch Release deployments and Major Release deployments. The implementation will be modified to use the new ‘software …’ REST API. In the short term, the REST API and the CLIs (‘sw-manager upgrade-strategy create/apply/show/abort/delete’) will remain the same as they are today.

The VIM Patch Orchestration (‘sw-manager patch-strategy create/apply/show/abort/delete’ REST API and CLIs) will no longer be supported, as it is no longer needed.

Distributed Cloud (DC) Orchestration Changes

Patching and Upgrading DC Orchestration, which executes software patching and software upgrades across all Subclouds within a Distributed Cloud system, can now be unified as well.

The DC Upgrade Orchestration will be used for the Distributed Cloud Orchestration of Software Release Deployments; i.e. for both Patch Release deployments and Major Release deployments. The implementation will be modified to use the new ‘software …’ REST API and the existing (unchanged) VIM Upgrade Orchestration REST API. In the short term, the REST API and the CLIs (‘dcmanager upgrade-strategy create/apply/show/abort/delete’) will remain the same as they are today.

The DC Patch Orchestration (‘dcmanager patch-strategy create/apply/show/abort/delete’ REST API and CLIs) will no longer be supported, as it is no longer needed.

Alternatives

Alternatives that were considered included:

  • Continue with externally 2x CLIs but leveraging one internal framework.

    • The existing patch and upgrade CLIs could have been kept the same and only the underlying internal framework be unified.

    • However, it would have been difficult to map the two different CLIs and procedures to a single set of internal steps.

    • And having two different CLIs and procedures for updating StarlingX software requires more operator/user training, learning, documentation, etc.

  • Continue with use of ostree-diffs for patches.

    • The APIs/CLIs and underlying internal framework could have been unified without modifying the structure of the patch module.

    • However, the ostree-diff-style patch module had several drawbacks:

      • the patch build times are extremely long; basically have to do a full build and then diff the patch build’s OSTree Repo with the GA (or patch-current) OSTree Repo,

      • the content of the patch is NOT easily inspectable by scripts or humans,

      • the catastrophic scenario of losing your build machine (and the patch-current OSTree Repo which is the basis of your DIFFs) would be difficult to recover from,

  • Go back to a Package-based (now DEB) ISO

    • For consistency with the packaging of the Patch module, we considered going back to a Package-based (now DEB) ISO and having the Install ISO build the OSTree commit for the Major Release using apt-ostree.

    • This would have impacted/degraded the ISO Install performance.

    • Instead we left it that the Install ISO contained an OSTree Repo with a single OSTree branch, with a single OSTree commit (built at build time), and added a list of DEB Package Names and Versions in the Major Release OSTree commit, for readability, as part of the ISO meta data.

Data Model Impact

At a high level, the data model used by the previous Patching Framework to track patches will be used (and enhanced) by the new Unified Software Management Framework in order to track software releases. The framework used by this data model uses a simple filesystem structure for its database, in order to reduce dependencies and align with the requirement to be able to patch a broken system. A number of parameters will be added to this internal data model in order to support patch modules with DEB Packages and to support deployments of Major Releases.

This feature also introduces major data model changes around the REST API / CLIs and command sequences for executing software deployments.

The following NEW software REST API / CLIs are introduced:

  • software upload [--local] <filename> [<ISO-signature-file>]

  • software upload-dir <directory>

  • software list

  • software show starlingx-<MM.mm.pp>

  • software upload

  • software deploy precheck [--force] starlingx-<MM.mm.pp>

  • software deploy start [--force] starlingx-<MM.mm.pp>

  • software deploy show

  • software deploy host <hostname>

  • software deploy host list

  • software deploy abort

  • software deploy rollback host <hostname>

  • software deploy activate

  • software deploy complete

  • software commit-patch { --all | starlingx-<MM.mm.pp> }
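To show how these commands might chain together, here is a minimal Python sketch that builds the argument lists for deploying one release across a set of hosts. The step order is an assumption inferred from the order of the list above, and deployment_commands as well as the file and host names in the usage note are hypothetical.

```python
import shlex


def deployment_commands(release_file, release_id, hostnames):
    """Return the assumed 'software' command sequence as argv lists."""
    commands = [
        ["software", "upload", release_file],
        ["software", "deploy", "precheck", release_id],
        ["software", "deploy", "start", release_id],
    ]
    # One 'deploy host' step per host in the cloud.
    commands += [["software", "deploy", "host", name] for name in hostnames]
    commands += [
        ["software", "deploy", "activate"],
        ["software", "deploy", "complete"],
    ]
    return commands


def render(commands):
    """Render argv lists as shell-quoted command lines for display."""
    return [shlex.join(argv) for argv in commands]
```

For example, render(deployment_commands("example.patch", "starlingx-10.0.1", ["controller-0", "controller-1"])) yields seven command lines, starting with the upload and ending with ‘software deploy complete’.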

The following REST API / CLIs are DELETED and no longer supported:

  • Patching REST APIs / CLIs

    • sw-patch upload <patch-filename>

    • sw-patch upload-dir <directory-with-patch-files>

    • sw-patch query

    • sw-patch show <patch-id>

    • sw-patch apply { <patch-id> | --all }

    • sw-patch query-hosts

    • sw-patch host-install <hostname>

  • Upgrades REST APIs / CLIs

    • system load-import [--local] <iso-filename> <signature-filename>

    • system load-list

    • system health-query-upgrade

    • system upgrade-start

    • system host-upgrade <hostname>

    • system upgrade-show

    • system upgrade-activate

    • system upgrade-complete

  • Cloud (VIM) Patching Orchestration REST APIs / CLIs

    • sw-manager patch-strategy create …

    • sw-manager patch-strategy apply

    • sw-manager patch-strategy show

    • sw-manager patch-strategy abort

    • sw-manager patch-strategy delete

  • Distributed Cloud (DC) Patching Orchestration REST APIs / CLIs

    • dcmanager patch-strategy create …

    • dcmanager patch-strategy apply

    • dcmanager patch-strategy show

    • dcmanager patch-strategy abort

    • dcmanager patch-strategy delete

REST API Impact

A new set of REST APIs will be added to implement commands of the new unified software management framework.

At a high-level this will be along the lines of:

  • POST /v1/software/upload

  • GET /v1/software/query

  • GET /v1/software/show/<release>

  • GET /v1/software/commit_patch/<release>

  • GET /v1/software/deploy_show

  • POST /v1/software/install_local

  • POST /v1/software/init_release

  • POST /v1/software/del_release (currently implemented as POST due to params)

  • GET /v1/software/query_app_dependencies

  • GET /v1/software/report_app_dependencies

  • GET /v1/software/query_dependencies

  • POST /v1/software/is_available

  • POST /v1/software/is_deployed

  • POST /v1/software/is_committed

  • POST /v1/software/delete/<release>

  • POST /v1/software/deploy_precheck/<release>

  • POST /v1/software/deploy_start/<release>

  • POST /v1/software/deploy_host

  • POST /v1/software/host_list

  • POST /v1/software/deploy_activate/<release>

  • POST /v1/software/deploy_complete/<release>
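As a rough illustration of how a client could address a few of these endpoints, the Python sketch below builds (but does not send) requests with the standard library. The base URL, the build_request helper and the use of a Keystone token in an ‘X-Auth-Token’ header are assumptions made for the example; only the methods and paths are taken from the list above.

```python
from urllib.parse import quote
from urllib.request import Request

# Subset of the endpoints listed above, keyed by a hypothetical operation name.
ENDPOINTS = {
    "show": ("GET", "/v1/software/show/{release}"),
    "deploy_precheck": ("POST", "/v1/software/deploy_precheck/{release}"),
    "deploy_start": ("POST", "/v1/software/deploy_start/{release}"),
    "deploy_activate": ("POST", "/v1/software/deploy_activate/{release}"),
    "deploy_complete": ("POST", "/v1/software/deploy_complete/{release}"),
}


def build_request(base_url, operation, release, token):
    """Build an unsent urllib Request for one per-release operation."""
    method, template = ENDPOINTS[operation]
    path = template.format(release=quote(release, safe=""))
    return Request(
        base_url.rstrip("/") + path,
        method=method,
        headers={"X-Auth-Token": token},  # assumed Keystone token header
    )
```

A call such as build_request("https://controller.example", "deploy_start", "starlingx-10.0.1", token) produces a POST request for /v1/software/deploy_start/starlingx-10.0.1; sending it is left to the caller.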

REST APIs associated with DELETED commands will be removed.

Security Impact

The new ‘software’ commands will be authenticated with Keystone, and all commands will require Keystone ‘admin’ role. Time permitting (or in future), passive commands (e.g. get, list) will only require Keystone ‘reader’ role.

For ‘software’ CLI commands executed locally, if the commands are run under ‘sudo’, Keystone authentication and authorization will be bypassed as part of the requirement to support patching a broken system; i.e. the commands must work even if Keystone authentication/authorization is broken.
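The authorization rules above can be summarized in a small Python sketch. The function is_authorized, the PASSIVE_ACTIONS set and the reader_for_passive switch are hypothetical; the switch models the ‘reader’ role behaviour that the spec describes as time permitting or future work.

```python
# Hypothetical set of passive (read-only) actions; the spec only gives
# "get, list" as examples.
PASSIVE_ACTIONS = frozenset({"list", "show"})


def is_authorized(action, roles, *, local_sudo=False, reader_for_passive=False):
    """Decide whether a 'software' command may run.

    local_sudo: command is run locally under sudo, so Keystone is bypassed.
    reader_for_passive: enables the possible future rule that passive
    commands only need the Keystone 'reader' role.
    """
    if local_sudo:
        return True  # must work even if Keystone itself is broken
    if "admin" in roles:
        return True  # all commands require (and accept) the 'admin' role
    return reader_for_passive and action in PASSIVE_ACTIONS and "reader" in roles
```

With the default settings, a user holding only the ‘reader’ role is rejected for every command, which matches the initial behaviour described above.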

Other End User Impact

Other than the REST API, CLI and resulting procedural changes for managing the deployment of Patch Release and Major Release software on a StarlingX system, there are no other end-user impacts.

Performance Impact

The performance of the initial deployment of a Major Release should be unchanged due to the decision to keep the format of the Install ISO the same.

The major performance impact, actually an improvement, is related to the deployment of a new Major Release (i.e. an upgrade in old terminology).

Prior to this feature, upgrading to a new Major Release involved wiping the StarlingX hosts’ disks and re-installing the new Install ISO’s OSTree Repo to disk. In the case of All-In-One (AIO) Simplex (SX) configurations, this meant that a backup of configuration data, a disk wipe, a re-install to disk and a restore of configuration data were required, all (except the initial backup) within the out-of-service window.

By leveraging OSTree to manage and deploy the root filesystem of the new Major Release in parallel with running the existing root filesystem, the deployment of a Major Release on an AIO-SX system does not require a backup of configuration data, does not require a wipe disk, does not require a re-install to disk and does not require a restore of configuration data. An install of the new root filesystem for the new Major Release is required, but can be done outside of the out-of-service window. Because of all this, the elapsed time and the out-of-service time for a Major Release deployment on an AIO-SX system will be significantly reduced.

Other configurations, i.e. AIO-DX [+ workers] and Standard, will also benefit from not requiring a wipe disk and install to disk for each host. The elapsed time for a Major Release deployment will be reduced for these configurations.

Other Deployer Impact

The REST APIs / CLIs and procedures for managing both Patch Release deployments and Major Release deployments will now be the same. This will reduce deployer training, learning and documentation.

There are likely to be additional filesystem and memory requirements to realize this feature. However, they are not expected to be significant. Additional filesystem and memory requirements will be measured and, if significant, documented in system engineering information.

Developer Impact

After this feature is submitted to StarlingX Master Branch, all developers will have to manage future deployments of Patch Releases and Major Releases, within their development systems, with the new CLIs and new procedures.

The amount of time to build a ‘designer’ Patch Release for test purposes will be much shorter than it is today.

Upgrade Impact

It is important to note that this feature is targeted for STX 10.0. Therefore the upgrade from STX 9.0 to STX 10.0 will not benefit from this feature’s changes. Subsequent management of Patch Releases at STX 10.0 will benefit from this feature, and the upgrade from STX 10.0 to STX 11.0 will benefit from this feature.

Implementation

Assignee(s)

Primary assignee:

Bin Qian, John Kung

Other contributors:

Jessica Castelino, Charles Short, Junfeng (Shawn) Li, Heitor Vieira Matsui, Luiz (Boni) Bonatti, Dostoievski Albino Batista, Vanathi Selvaraju, Hugo Nicodemos, Christopher De Oliveira Souza, Tee Ngo, Jim Beacom, Mike Matteson, Michel Desjardins, Ashish Jaywant, Imad Iqbal

Repos Impacted

Repositories in StarlingX that are impacted by this spec:

  • update

  • config

  • puppet

  • nfv

  • dcmanager

  • metal

  • dcorch/api (proxy)

  • integ

  • apt-ostree (new)

  • ansible

  • docs

  • tools

  • root

Work Items

The major work items are:

  • Build Changes

    • Update Patch Build with DEB Packages, variety of minor meta data changes, deploy-prechecks-script, pre-deploy scripts and post-deploy scripts

    • Update ISO Build with variety of minor meta data changes, deploy-prechecks-script and list of DEB package names & versions

    • Update Patched ISO Build to use apt-ostree to build OSTree commit with Major Release’s software plus all Patch Releases’ software

  • REST API & CLI Client & Server Framework for new ‘software …’ commands

    • includes mechanisms to isolate accesses to sysinv from minimal Patch Release deployment flow in order to maintain ability to patch a broken system

    • includes mechanisms to improve error reporting and diagnostic information on deployment failures.

    • includes authentication and authorization changes

      • changes to support Keystone authentication & authorization for Remote CLI and non-sudo Local CLI

      • changes to support NO Keystone authentication & authorization when running Local CLI under sudo

  • new apt-ostree hybrid package manager

  • software upload

    • for Major Releases,

      • support for signature validation

      • support for --local

      • creation of Major Release’s ‘feed’ directory with ‘feed’ OSTree Repo from ISO

    • for Patch Releases,

      • new apt-repository for management of DEB Packages from Patch modules; done within apt-ostree

    • support for responding with Software Releases loaded from file(s).

  • software list / show implementation

  • software deploy precheck

    • calling and reporting status of system health checks

    • calling and reporting status of StarlingX-Service-specific deploy prechecks

  • software deploy start

    • for Major Releases,

      • calling and reporting status of StarlingX-Service-specific central data migration scripts; done in chroot of checkout of OSTree commit to be deployed (apt-ostree used here)

    • for Patch Releases,

      • use of apt-ostree to create new OSTree commit in ‘feed’ OSTree Repo for Patch Release from DEB Packages in Patch module; i.e. OSTree checkout, chroot to install DEB Packages and commit back to OSTree repo

  • software deploy host

    • support calling of pre-deploy scripts and post-deploy scripts on both In-Service Patch Release deployment and Reboot-Required Patch Release deployment

    • support for calling of local host data migration scripts on host reboot for Major Release deployment

    • support for Reboot-Required Patch Release deployment and Major Release deployment by using apt-ostree to pull OSTree branch from active controller’s ‘feed’ OSTree Repo into the local host’s /sysroot/ostree/repo and execute ‘ostree deploy …’ to prepare the host to reboot with the new root filesystem from the OSTree commit being deployed

    • support for In-Service Patch Release deployment by using apt-ostree to pull OSTree branch from active controller’s ‘feed’ OSTree Repo into the local host’s /sysroot/ostree/repo and then manipulating OSTree deployment hard links in order to immediately switch to the new root filesystem from the OSTree commit being deployed

  • ‘software deploy activate’ implementation

  • ‘software deploy complete’ implementation

  • support for committing a patch

  • Changes to Cloud (VIM) Orchestration

  • Changes to Distributed Cloud (DC) Orchestration

  • Changes to Subcloud installation, patch and upgrade prestaging due to changes introduced by this feature.

  • Changes to Horizon pages/tabs due to the changes introduced by this feature.
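The ‘software deploy start’ work item for Patch Releases describes an OSTree checkout, a chroot to install DEB Packages, and a commit back to the OSTree repo. The Python sketch below spells out those three steps as plain ostree, chroot and apt-get command lines. It is not apt-ostree’s actual implementation or CLI; the function name, all paths and the branch name are hypothetical, and the sketch assumes the chroot is already configured with an apt repository that holds the patch’s packages. It only builds the commands and does not run them.

```python
def patch_commit_steps(feed_repo, branch, workdir, packages, release_id):
    """Return the three conceptual steps as argv lists (nothing is executed)."""
    return [
        # 1. Check out the current head of the feed branch into a work tree.
        ["ostree", f"--repo={feed_repo}", "checkout", branch, workdir],
        # 2. Install the new or changed DEB packages inside the work tree.
        ["chroot", workdir, "apt-get", "install", "--yes", *packages],
        # 3. Commit the modified tree back to the end of the same branch.
        [
            "ostree", f"--repo={feed_repo}", "commit",
            f"--branch={branch}",
            f"--tree=dir={workdir}",
            f"--subject={release_id}",
        ],
    ]
```

Keeping the steps as data makes it easy to log or review them before anything touches the ‘feed’ repo.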

Dependencies

None.

Testing

Required testing for this feature will be extensive.

Testing should be done for all configurations; standalone AIO-SX, AIO-DX [+workers], Standard, DC SystemController and DC Subclouds.

Testing should be done primarily with the Local CLI. A subset of tests can be done with Horizon and the Remote CLI to verify their operation.

Testing needs to include:

  • software upload/list/show/delete testing

  • initial install testing for all configurations

  • deployment of in-service patch release for all configurations

  • deployment of reboot-required patch release for all configurations

  • deployment of major release for all configurations

  • deployment of patched major release for all configurations

  • validation of calling pre-deploy and post-deploy scripts for both in-service and reboot-required patch release deployment

  • validation of calling deploy-precheck, data migration and other upgrade scripts for major release deployment

  • system backup after software patch release and restore

  • cloud (vim) orchestrated deployment of Patch Release for all configurations

  • cloud (vim) orchestrated deployment of Major Release for all configurations

  • distributed cloud (DC) orchestrated deployment of Patch Release for all configurations

    • with & without subcloud prestaging

  • distributed cloud (DC) orchestrated deployment of Major Release for all configurations

    • with & without subcloud prestaging

  • removal of a patch release

  • commit of patch releases

  • rollback of in-progress Major Release deployment

Documentation Impact

There will be major changes to documentation as well for this feature.

The documentation for updating the software on a StarlingX system should become simpler due to the unified CLI and unified procedure for deploying a Patch Release and a Major Release.

The following sections

… should collapse into a single set of procedures for manually updating software.

Similarly the following sections

… should collapse into a single set of procedures for doing an orchestrated update of software.

And similar collapsing of patching/update and upgrade sections will occur in the distributed cloud documentation sections: https://docs.starlingx.io/dist_cloud/index-dist-cloud-f5dbeb16b976.html

References

None.

History

Revisions

Release Name    Description

STX 10.0        Initial draft