Kubernetes root CA certificate update

Storyboard: https://storyboard.openstack.org/#!/story/2008675

This feature introduces CLI/REST APIs and execution orchestration for updating the Kubernetes root CA certificate, and the certificates issued by the root CA, in a rolling fashion so that the impact on the system is minimized.

This is the updated version of the approved spec security-2008675-kubernetes-rootca-update.rst. This version reflects the adjustments from implementation.

Problem description

In a deployed Kubernetes cluster, the root CA certificate signs all the other serving and client certificates used by various components for various purposes. This root CA certificate may need to be updated for security or administrative reasons while the cluster is still running.

An update mechanism is needed to update the root CA certificate and all the certificates signed by it in a rolling fashion (i.e., with minimal impact on the applications and services running in the cluster).

Currently Kubernetes doesn’t provide such a mechanism out of the box. A manual update procedure [1] is possible, but it’s lengthy and error-prone. This feature introduces a set of CLI/REST APIs and execution orchestration to simplify the procedure.

Use Cases

  • The cluster’s root CA certificate is approaching its expiry date, and the cloud admin needs to update the root CA certificate so that the cluster continues to function.

  • The cloud admin decides to replace the root CA certificate with a new one for security reasons.

Proposed change

Enhance sysinv to support root CA certificate rolling update

A rolling update procedure roughly based on [1] has been investigated. The procedure consists of three phases. The first phase is to update kubernetes components and pods to trust the new root CA certificate along with the old one (trust both). The second phase is to update kubernetes components’ server and client certificates with new ones signed by the new root CA certificate. The third phase is to remove the old root CA certificate from components’ and pods’ trusted CA bundle so that only the new root CA certificate is trusted.
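The reason for ordering the phases this way can be illustrated with a small model. This is only a sketch: the names `OLD_CA`, `NEW_CA`, `PHASES` and `is_safe` are hypothetical and not part of sysinv. It treats each component as a trusted-CA set plus the CA that issued its certificates, and checks that components on either side of a phase transition can still verify each other.

```python
# Illustrative model only: every name here is hypothetical, not sysinv code.
OLD_CA = "old-root-ca"
NEW_CA = "new-root-ca"

# For each phase: the CAs trusted once the phase is done, and the CA that
# issued the component's server/client certificates at that point.
PHASES = [
    ("trust-both-cas", {"trusted": {OLD_CA, NEW_CA}, "issuer": OLD_CA}),
    ("update-certs", {"trusted": {OLD_CA, NEW_CA}, "issuer": NEW_CA}),
    ("trust-new-ca", {"trusted": {NEW_CA}, "issuer": NEW_CA}),
]


def is_safe(before, after):
    """True if a peer still in `before` and a peer already in `after`
    can verify each other's certificates."""
    return (before["issuer"] in after["trusted"]
            and after["issuer"] in before["trusted"])


initial = {"trusted": {OLD_CA}, "issuer": OLD_CA}

state = initial
for name, nxt in PHASES:
    print(f"{name}: safe={is_safe(state, nxt)}")
    state = nxt

# Jumping directly from the initial state to the final one is not safe.
print(f"direct swap: safe={is_safe(initial, PHASES[-1][1])}")
```

In this model every step of the three-phase sequence keeps mixed old/new components interoperable, whereas replacing the CA in a single step does not, which is what makes the update "rolling".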

We will wrap this update procedure in sysinv CLI commands and supporting APIs. VIM and DC orchestration of the procedure will follow in the future. This hides the complexity of the underlying procedure, adds semantic checks, and provides a simpler, less error-prone procedure overall, analogous to the approach taken for other complex multi-host procedures such as kubernetes upgrade, patching and system upgrades.

The overall feature will have multiple layers. The sysinv REST APIs and CLI are the first layer, providing the fundamental implementation of the certificate update. VIM orchestration is the second layer, executing the update across all hosts in a cluster by utilizing support from sysinv. DC orchestration is the third layer, executing VIM orchestration across all subclouds of a DC system.

There will also be a fourth layer in the future, where cert-manager will manage the kubernetes root CA certificate and key. cert-mon will monitor the certificate and raise an alarm when it needs to be updated, so that the user can schedule the orchestration of the update during a maintenance window.

The initial version of the spec will cover only the first layer, the sysinv support for root CA certificate update. Changes include adding new system CLI commands and sysinv REST APIs to the existing framework, adding logic to sysinv conductor to generate the required puppet hieradata, and adding new puppet runtime manifests to be applied by sysinv agent to make the actual certificate update on hosts.

Sysinv operations for root CA certificate update

A new set of sysinv CLI commands will be introduced to simplify the update procedure. The procedure is similar to a software upgrade, with a start, execute and complete cycle. The user can retry a step if it fails. An “abort” is also supported, allowing the user to exit an ongoing update. After an abort, the user is expected to restart the update procedure by either uploading or regenerating a root CA certificate, and to run the update through to completion. This also provides a mechanism to restore the original CA certificate, if the user chooses to upload the original CA certificate.

The following is a summary of the CLI commands and the steps to perform kubernetes root CA certificate update.

1. system kube-rootca-update-start

  • Pre-check to validate the update, initialize the procedure and mark update progress as update-started.

2. system kube-rootca-certificate-generate

  • Generates a new kubernetes root CA certificate

  • Change progress state to update-new-rootca-cert-generated

2. system kube-rootca-certificate-upload

  • User can choose to use this command to upload a new kubernetes root CA certificate and private key from a file instead of generating one

  • Change progress state to update-new-rootca-cert-uploaded

3. system kube-rootca-host-update <hostname> --phase=trust-both-cas

  • Update apiserver’s trusted CAs to include the new CA cert

  • Update scheduler’s trusted CAs to include the new CA cert

  • Update controller-manager’s trusted CAs to include the new CA cert

  • Update kubelet’s trusted CAs to include the new CA cert

  • Update admin.conf’s trusted CAs to include the new CA cert

  • Change progress state to updated-host-trust-both-cas on success

  • Change progress state to updating-host-trust-both-cas-failed on failure

4. system kube-rootca-pods-update --phase=trust-both-cas

  • Annotate Daemonsets and Deployments to trigger pod replacement in a safer rolling fashion, so that pods pick up the new root CA cert as a trusted CA alongside the old root CA certificate

  • Change progress state to updated-pods-trust-both-cas on success

  • Change progress state to updating-pods-trust-both-cas-failed on failure

5. system kube-rootca-host-update <hostname> --phase=update-certs

  • Update admin.conf’s client cert/key data with new ones signed by the new root CA

  • Update apiserver’s server and client certs/keys with new ones signed by the new root CA

  • Update scheduler’s client cert/key with new one signed by the new root CA

  • Update controller-manager’s client cert/key with new one signed by the new root CA

  • Update kubelet’s client cert/key with new one signed by the new root CA

  • Change progress state to updated-host-update-certs on success

  • Change progress state to updating-host-update-certs-failed on failure

6. system kube-rootca-host-update <hostname> --phase=trust-new-ca

  • Update admin.conf’s trusted CAs to remove the old root CA

  • Update apiserver’s trusted CAs to remove the old root CA

  • Update controller-manager’s trusted CAs to remove the old root CA

  • Update scheduler’s trusted CAs to remove the old root CA

  • Update kubelet’s trusted CAs to remove the old root CA

  • Change progress state to updated-host-trust-new-ca on success

  • Change progress state to updating-host-trust-new-ca-failed on failure

7. system kube-rootca-pods-update --phase=trust-new-ca

  • Annotate Daemonsets and Deployments to trigger pod replacement in a safer rolling fashion, to remove the old root CA from the pods’ trusted CA list

  • Change progress state to updated-pods-trust-new-ca on success

  • Change progress state to updating-pods-trust-new-ca-failed on failure

8. system kube-rootca-update-complete

  • Post-check to verify the update

  • Change the progress state to update-completed

9. system kube-rootca-host-update-list

  • Run this command anytime to show the update status of all hosts in the cluster

10. system kube-rootca-update-show

  • Run this command anytime to show the overall update status

11. system kube-rootca-update-abort

  • Run this command to abort the update at any step
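The ordering of the steps above can be sketched as a small helper that lists the commands for a given set of hosts. This is an illustration rather than part of the feature: the function name `rootca_update_commands` is hypothetical, the double-hyphen `--phase=` spelling is assumed, the completion command name follows the Work Items list, and any arguments the upload or generate commands take are left out.

```python
def rootca_update_commands(hosts, upload=False):
    """Return the ordered `system` commands for one complete update.

    hosts:  hostnames, in the order they should be updated.
    upload: use the upload command instead of generating a certificate
            (command arguments are intentionally omitted in this sketch).
    """
    def host_phase(phase):
        # One host-update command per host for the given phase.
        return [f"system kube-rootca-host-update {h} --phase={phase}"
                for h in hosts]

    cert_cmd = ("system kube-rootca-certificate-upload" if upload
                else "system kube-rootca-certificate-generate")
    return (
        ["system kube-rootca-update-start", cert_cmd]
        + host_phase("trust-both-cas")
        + ["system kube-rootca-pods-update --phase=trust-both-cas"]
        + host_phase("update-certs")
        + host_phase("trust-new-ca")
        + ["system kube-rootca-pods-update --phase=trust-new-ca"]
        + ["system kube-rootca-update-complete"]
    )


for cmd in rootca_update_commands(["controller-0", "controller-1"]):
    print(cmd)
```

Each host phase is finished on every host before the next phase begins, which mirrors the numbered steps; the show, list and abort commands can be run at any point and are therefore not part of the sequence.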

VIM Orchestration Operations

Refer to future spec

DC Orchestration Operations

Refer to future spec

cert-mon monitoring and alarm raising

Refer to future spec

Fault Handling

After the update has started, the user can retry a step that fails. At any step before the update is completed, the user can choose to re-upload or regenerate a root CA certificate and start the update procedure again. This provides a mechanism to recover from a step that fails multiple times, as well as a mechanism to restore the original root CA certificate.
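As a minimal sketch of the retry rule, the failed progress states listed earlier all follow the pattern `updating-<host|pods>-<phase>-failed`, so the step to retry can be derived from the state name. The helper `step_to_retry` below is hypothetical and only illustrates that naming convention; it is not the sysinv implementation.

```python
import re

# Matches the failed states named in this spec, for example
# "updating-host-update-certs-failed" or "updating-pods-trust-new-ca-failed".
_FAILED_STATE = re.compile(r"^updating-(host|pods)-(.+)-failed$")


def step_to_retry(state):
    """Return (scope, phase) for a failed state, or None if nothing failed."""
    match = _FAILED_STATE.match(state or "")
    if match is None:
        return None
    return match.group(1), match.group(2)


print(step_to_retry("updating-host-update-certs-failed"))
print(step_to_retry("updating-pods-trust-new-ca-failed"))
print(step_to_retry("updated-host-trust-both-cas"))
```

A caller could map the returned scope to the host or pods update command and rerun it with the returned phase.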

CLI Clients

We will extend the existing system clients to add the new commands.

Web GUI

If we want to allow the update to be handled entirely through the GUI we’d need to add support in the GUI for all the operations from sysinv.

This will not be implemented in the initial release.

Alternatives

kubernetes v1.18.1 supports renewing certificates via the “kubeadm alpha certs renew” command [2]. The certificates that kubeadm can renew include admin.conf, apiserver, apiserver-kubelet-client, controller-manager.conf and scheduler.conf. It doesn’t support renewal of the root CA certificate or of kubelet client certificates.

We could update /etc/kubernetes/pki/ca.crt and /etc/kubernetes/pki/ca.key with a new root CA cert and use kubeadm to update the supported certificates, but this procedure wouldn’t be a rolling update and would cause a service outage. We would still have to handle kubelet client certificates, as they are not managed by kubeadm.

Notably, this alternative would be a lengthy, manual and error-prone procedure.

Data model impact

In order to track the progress of the update, the following tables in sysinv database are required.

  • kube_rootca_update

    • created_at/updated_at/deleted_at: as per other tables

    • id: as per other tables

    • uuid: as per other tables

    • from_rootca_cert: character (255), the id of the old root CA cert

    • to_rootca_cert: character (255), the id of the new root CA cert

    • state: character (255), the state of the update

  • kube_rootca_host_update

    • created_at/updated_at/deleted_at: as per other tables

    • id: as per other tables

    • uuid: as per other tables

    • target_rootca_cert: character (255), the id of the new root CA cert

    • effective_rootca_cert: character (255), the id of the current root CA cert

    • state: character (255), the state of the update

    • host_id: foreign key (i_host.id)
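For illustration only, the two tables can be pictured as the following Python dataclasses. The class names are hypothetical, the timestamp columns are omitted for brevity, and the real sysinv models may differ in structure. The sample values come from the REST examples in this spec, except `host_id=1`, which is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KubeRootCAUpdate:
    """One row of kube_rootca_update (timestamps omitted)."""
    id: int
    uuid: str
    from_rootca_cert: Optional[str] = None  # id of the old root CA cert
    to_rootca_cert: Optional[str] = None    # id of the new root CA cert
    state: Optional[str] = None             # overall update state


@dataclass
class KubeRootCAHostUpdate:
    """One row of kube_rootca_host_update (timestamps omitted)."""
    id: int
    uuid: str
    host_id: int                                  # refers to i_host.id
    target_rootca_cert: Optional[str] = None      # id of the new root CA cert
    effective_rootca_cert: Optional[str] = None   # id of the current cert
    state: Optional[str] = None                   # per-host update state


update = KubeRootCAUpdate(
    id=1,
    uuid="47dff2b6-17ba-45a2-b3d3-8b2a85a5dba9",
    from_rootca_cert="d70efa2daaee06f8-91764",
    state="update-started",
)
host = KubeRootCAHostUpdate(
    id=7,
    uuid="7d7d05dd-900f-4004-951d-d92536faac8e",
    host_id=1,  # made-up value for the example
    effective_rootca_cert="d70efa2daaee06f8-91764",
)
print(update)
print(host)
```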

REST API impact

New sysinv REST APIs will be added to implement the certificate update logic on top of the existing sysinv API framework. Behind these APIs, the actual certificate update is carried out by sysinv-agent applying runtime puppet manifests on each host.

The following is the list of REST resources and APIs to be added:

The new resource /kube_rootca_update is added

  • URLs:

    • /v1/kube_rootca_update

  • Request Methods:

    • POST /v1/kube_rootca_update

      • Creates (starts) a new root CA cert update

      • Response body example:

        {"uuid": "47dff2b6-17ba-45a2-b3d3-8b2a85a5dba9",
         "to_rootca_cert": null,
         "created_at": "2021-08-25T14:57:13.006034+00:00",
         "from_rootca_cert": "d70efa2daaee06f8-91764",
         "updated_at": null,
         "state": "update-started",
         "id": 1}
        
    • GET /v1/kube_rootca_update

      • Return the current root CA update

      • Response body example:

        {"uuid": "47dff2b6-17ba-45a2-b3d3-8b2a85a5dba9",
         "to_rootca_cert": null,
         "created_at": "2021-08-25T14:57:13.006034+00:00",
         "from_rootca_cert": "d70efa2daaee06f8-91764",
         "updated_at": null,
         "state": "update-started",
         "id": 1}
        
    • PATCH /v1/kube_rootca_update

      • Modifies the current rootca_update. Used to update the state of the update (e.g. to update-completed or update-aborted).

      • Request body example:

        [{"path": "/state",
         "value": "update-completed",
         "op": "replace"}]
        
        [{"path": "/state",
         "value": "update-aborted",
         "op": "replace"}]
        
      • Response body example:

        {"uuid": "fb882423-ea26-42bf-b645-fd9de4248fd4",
         "to_rootca_cert": "d70efa2daaee06f8-176046114160516196064588947858918572907",
         "created_at": "2021-08-24T13:40:13.318822+00:00",
         "from_rootca_cert": "d70efa2daaee06f8-199590289735612744821302170157251522966",
         "updated_at": "2021-08-24T13:52:21.547899+00:00",
         "state": "update-completed",
         "id": 20}
        
        {"uuid": "7d07e384-f06d-4213-8e61-5e300aeb9d1c",
         "to_rootca_cert": null,
         "created_at": "2021-08-24T13:38:55.376395+00:00",
         "from_rootca_cert": "d70efa2daaee06f8-199590289735612744821302170157251522966",
         "updated_at": "2021-08-24T13:39:47.108582+00:00",
         "state": "update-aborted",
         "id": 19}
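A client could build the PATCH request bodies shown above as follows. This is a sketch: the helper name `state_patch` is hypothetical, and restricting the value to the two states used in the examples is an assumption about what the API accepts.

```python
import json

# The two target states that appear in the request body examples.
ALLOWED_STATES = {"update-completed", "update-aborted"}


def state_patch(new_state):
    """Return the JSON body that replaces /state with new_state."""
    if new_state not in ALLOWED_STATES:
        raise ValueError(f"unsupported state: {new_state}")
    return json.dumps(
        [{"path": "/state", "value": new_state, "op": "replace"}]
    )


print(state_patch("update-completed"))
print(state_patch("update-aborted"))
```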
        

The new resource /kube_rootca_update/upload_cert is added

  • URLs:

    • /v1/kube_rootca_update/upload_cert

  • Request Methods:

    • POST /v1/kube_rootca_update/upload_cert

      • Upload a root CA cert and key from a file

      • Request body example (the body is the contents of a file containing both the private key and the certificate):

        {"-----BEGIN PRIVATE KEY----- ...... -----END PRIVATE KEY----- ...... -----BEGIN CERTIFICATE----- ...... -----END CERTIFICATE-----"}
        
      • Return body example:

        {"success": "8503e172a63b23e6-12808492498813125379",
         "error": ""}
        

The new resource /kube_rootca_update/generate_cert is added

  • URLs:

    • /v1/kube_rootca_update/generate_cert

  • Request Methods:

    • POST /v1/kube_rootca_update/generate_cert

      • Tell sysinv to generate a new root CA cert and key pair

      • Request body example:

        {"expiry_date": "2022-08-25",
         "subject": "C=CA O=Company CN=kubernetes"}
        
      • Return body example:

        {"success": "a8942428863f292b-253592702972967198587817983178843995169",
         "error": ""}
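The generate_cert request body above could be assembled like this. It is a sketch under assumptions: the helper `generate_cert_body` is hypothetical, and the check that the expiry date lies in the future is a guess at a sensible client-side validation, not a rule stated in this spec.

```python
import json
from datetime import date


def generate_cert_body(expiry_date, subject, today=None):
    """Return the JSON body for POST /v1/kube_rootca_update/generate_cert.

    expiry_date: datetime.date for the new certificate's expiry.
    subject:     subject string, e.g. "C=CA O=Company CN=kubernetes".
    today:       overridable reference date, for testing.
    """
    today = today or date.today()
    if expiry_date <= today:
        # Assumed client-side check, not taken from the spec.
        raise ValueError("expiry_date must be in the future")
    return json.dumps(
        {"expiry_date": expiry_date.isoformat(), "subject": subject}
    )


print(generate_cert_body(date(2022, 8, 25),
                         "C=CA O=Company CN=kubernetes",
                         today=date(2021, 8, 25)))
```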
        

The existing resource /ihosts is modified to add new actions

  • URLs:

    • /v1/ihosts/<hostid>

  • Request Methods:

    • POST /v1/ihosts/<hostid>/kube_update_ca

      • Update root CA cert on the specified host

      • Request body example:

        {"phase": "trust-both-cas"}
        
      • Response body example:

        {"target_rootca_cert": "8503e172a63b23e6-12808492498813125379",
         "created_at": "2021-08-25T17:13:22.571151+00:00",
         "hostname": "controller-1",
         "updated_at": "2021-08-25T17:58:59.809264+00:00",
         "state": "updating-host-trust-both-cas",
         "personality": "controller",
         "id": 8,
         "effective_rootca_cert": "d70efa2daaee06f8-91764",
         "uuid": "a597c090-731f-48f8-9f3f-344997c41317"}
        

The new resource /kube_rootca_update/hosts is added

  • URLs:

    • /v1/kube_rootca_update/hosts

  • Request Methods:

    • GET /v1/kube_rootca_update/hosts

      • Returns the update details of all hosts

      • Response body example:

        {
         "kube_host_updates": [
           {"target_rootca_cert": null,
           "created_at": "2021-08-25T17:13:22.558411+00:00",
           "hostname": "controller-0",
           "updated_at": null,
           "state": null,
           "personality": "controller",
           "id": 7,
           "effective_rootca_cert": "d70efa2daaee06f8-91764",
           "uuid": "7d7d05dd-900f-4004-951d-d92536faac8e"
           },
           {"target_rootca_cert": "8503e172a63b23e6-12808492498813125379",
           "created_at": "2021-08-25T17:13:22.571151+00:00",
           "hostname": "controller-1",
           "updated_at": "2021-08-25T17:59:16.097029+00:00",
           "state": "updated-host-trust-both-cas",
           "personality": "controller",
           "id": 8,
           "effective_rootca_cert": "d70efa2daaee06f8-91764",
           "uuid": "a597c090-731f-48f8-9f3f-344997c41317"
           },
           {"target_rootca_cert": null,
           "created_at": "2021-08-25T17:13:22.584500+00:00",
           "hostname": "worker-0",
           "updated_at": null,
           "state": null,
           "personality": "worker",
           "id": 9,
           "effective_rootca_cert": "d70efa2daaee06f8-91764",
           "uuid": "a4ca4eed-9b2f-4b4c-8ee7-45bbc573a55f"
           }
         ]
        }
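As a usage sketch, a client can use this response to find the hosts that have not yet finished a given phase. The helper `hosts_pending` is hypothetical, and the sample data is a trimmed copy of the response above with only the fields the helper reads.

```python
import json


def hosts_pending(response_text, done_state):
    """Return hostnames whose state is not yet done_state."""
    data = json.loads(response_text)
    return [h["hostname"] for h in data["kube_host_updates"]
            if h["state"] != done_state]


# Trimmed copy of the response body example above.
example = json.dumps({
    "kube_host_updates": [
        {"hostname": "controller-0", "state": None},
        {"hostname": "controller-1", "state": "updated-host-trust-both-cas"},
        {"hostname": "worker-0", "state": None},
    ]
})

print(hosts_pending(example, "updated-host-trust-both-cas"))
# prints ['controller-0', 'worker-0']
```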
        

The new resource /kube_rootca_update/pods is added

  • URLs:

    • /v1/kube_rootca_update/pods

  • Request Methods:

    • POST /v1/kube_rootca_update/pods

      • Update root CA cert for pods

      • Request body example:

        {"phase": "trust-both-cas"}
        
      • Response body example:

        {"uuid": "6cf4157b-75ff-4e86-bc96-8b08e4c9836d",
         "to_rootca_cert": "8503e172a63b23e6-12808492498813125379",
         "created_at": "2021-08-25T17:13:22.535798+00:00",
         "from_rootca_cert": "d70efa2daaee06f8-91764",
         "updated_at": "2021-08-25T18:37:02.574836+00:00",
         "state": "updating-pods-trust-both-cas",
         "id": 3}
        

Security impact

The new sysinv APIs will be added within the existing framework, so there are no changes to the existing security model.

The feature provides a mechanism to update kubernetes certificates. Frequent or routine certificate updates will enhance cluster security.

Other end user impact

End users will typically perform kubernetes root CA certificate update using the sysinv (i.e. system) CLI. The new CLI commands are shown in the Proposed change section above.

Performance Impact

When a root CA certificate update is in progress, kubernetes components (apiserver, scheduler, controller-manager, kubelet) and application pods will be restarted. Since the update is a rolling update, the system will keep functioning as usual, but there will be a small performance impact during the update. The user should update the hosts sequentially so that the impact is minimized.

Other deployer impact

Deployers will now be able to update the root CA certificate on a running system in a rolling fashion.

Developer impact

Developers working on the StarlingX components that manage container applications may need to be aware that certain operations should be prevented when a root CA update is in progress, since these components will be restarted during the update.

Developers working on application pods may also need to be aware that certain operations should be prevented when a root CA update is in progress as pods will be restarted during the update.

Generally speaking, there shouldn’t be any deployment or development activities on the system when an update is in progress. A maintenance window is a good time to do the update.

Upgrade impact

The newly added root CA update tables in the sysinv database need to be created during an upgrade from a release without this feature to a release with this feature. The tables will initially be empty.

Implementation

Assignee(s)

Primary assignee:

  • Andy Ning (andy.wrs)

Other contributors:

  • Soubihe, Joao Paulo (jsoubihe)

Repos Impacted

Repos impacted by this spec:

  • config

  • stx-puppet

  • fault

Work Items

Sysinv

  • New DB tables and APIs to access them

  • kube-rootca-update-start CLI/API

    • basic infrastructure

    • semantic and system health checks for update start

    • raise alarm to prevent upgrade, patching, etc.

  • kube-rootca-certificate-upload CLI/API

    • basic infrastructure

    • semantic checks

    • root CA issuer creation in cert-manager

    • calculate the ID of the new root certificate

  • kube-rootca-certificate-generate CLI/API

    • basic infrastructure

    • root CA certificate and issuer creation in cert-manager

    • calculate the ID of the new root certificate

  • kube-rootca-host-update <hostname> --phase=trust-both-cas CLI/API

    • basic infrastructure

    • semantic checks

    • conductor RPC/implementation (generate hieradata, call agent to apply puppet manifests, handle apply result, update host state etc…)

    • agent RPC/implementation (apply puppet manifest, report back config status, etc…)

  • kube-rootca-pods-update --phase=trust-both-cas CLI/API

    • basic infrastructure

    • semantic checks

    • conductor implementation (generate hieradata, trigger puppet manifests apply, handle apply result, update progress state etc…)

  • kube-rootca-host-update <hostname> --phase=update-certs CLI/API

    • basic infrastructure

    • semantic checks

    • conductor RPC/implementation (generate certificates and hieradata, call agent to apply puppet manifests, handle apply result, update host state etc…)

    • agent RPC/implementation (apply puppet manifest, report back config status, etc…)

  • kube-rootca-host-update <hostname> --phase=trust-new-ca CLI/API

    • basic infrastructure

    • semantic checks

    • conductor RPC/implementation (generate hieradata, call agent to apply puppet manifests, handle apply result, update host state etc…)

    • agent RPC/implementation (apply puppet manifest, report back config status, etc…)

  • kube-rootca-pods-update --phase=trust-new-ca CLI/API

    • basic infrastructure

    • semantic checks

    • conductor implementation (generate hieradata, trigger puppet manifests apply, handle apply result, update progress state etc…)

  • kube-rootca-update-complete CLI/API

    • basic infrastructure

    • semantic checks

    • clear the update in progress alarm

    • system health checks for update complete

  • kube-rootca-update-show CLI/API

    • basic infrastructure

    • conductor database query

  • kube-rootca-host-update-list CLI/API

    • basic infrastructure

    • conductor database query

  • kube-rootca-update-abort CLI/API

    • basic infrastructure

    • semantic checks

    • system health checks for update abort

    • clear ‘kube root CA update in progress’ alarm

    • raise ‘kube root CA update aborted’ alarm

Puppet

  • runtime manifest for host update trust-both-cas phase

  • runtime manifest for host update update-certs phase

  • runtime manifest for host update trust-new-ca phase

  • runtime manifest for pods update trust-both-cas phase

  • runtime manifest for pods update trust-new-ca phase

System Upgrade

  • Upgrade script to create the new tables in the sysinv database when upgrading from a release without this feature. The tables will initially be empty.

Dependencies

None

Testing

The feature must be tested in the following StarlingX configurations:

  • AIO-SX

  • AIO-DX

  • Standard with at least one kubernetes worker node

The test can be performed on hardware or virtual environments.

Documentation Impact

New end user documentation will be required to describe how kubernetes root CA certificate update should be done. The config API reference will also need updates.

References

History

Revisions

Release Name    Description

stx-6.0         Introduced