StarlingX: Debian Builds on Kubernetes

Storyboard Story: https://storyboard.openstack.org/#!/story/2009812

The new Debian build system [1], used in conjunction with Minikube, lacks support for multiple projects, branches & users within the same environment. We propose a Kubernetes infrastructure to remedy these shortcomings: a dedicated multi-node build cluster with shared services, as well as the necessary tooling changes.

Problem Description

The current implementation relies on Minikube – a version of Kubernetes optimized for single-node, single-user operation – making it difficult to share computing resources between multiple projects, branches, and users within the same environment, particularly on a dedicated “daily” build server. The Debian package repository service cannot be shared, which results in excessive download times and disk usage.

There is no explicit support for CI environments, requiring additional scripting in Jenkins or similar tools. Jenkins’s approach to k8s integration is not compatible with the current tooling, as it requires the top-level scripts to be written in the “pipeline” domain-specific language. The best we can do in Jenkins is call the StarlingX build scripts directly, bypassing Jenkins’ POD & node scheduling & management mechanisms.

Use Cases

This change would support infrastructure configurations covering the common use cases described below.

Isolated single-user builds

An individual contributor wants to build individual packages, the installation ISO, or docker images in an isolated, autonomous environment. This use case is already supported by the build tool using Minikube; any further changes must remain compatible with this type of environment.

Daily build server

An organization wishes to maintain a server cluster for building multiple projects or branches daily or on demand (this is the case with the current StarlingX official build system). Tooling must support:

  • Kubernetes clusters. Motivation: some organizations already have Kubernetes clusters.

  • StarlingX clusters. Motivation: “eat our own dog food.”

  • Multiple worker nodes. Motivation: allow for expanding computing resources available to the build system.

  • Ideally, clusters without a shared file system. Motivation: shared redundant file systems are slow and difficult to implement and may not be available in the target environment.

Build server open to individuals

This is a variation of the above, but with the option for individual contributors to generate private builds based on their patches before pushing them to source control. Motivation: this allows users to benefit from the more powerful, centralized build server.

This use case is not addressed by the current spec. We believe the proposed changes are sufficient to add this functionality in the future.

Proposed changes

We propose a build system that can run in any environment based on Kubernetes, and a matching installation to drive daily builds on CENGN.

Change the build scripts to support vanilla k8s multi-user environments. This includes making sure POD and directory names do not clash between users or multiple projects/branches. Motivation: allow multiple users & projects in the same environment.
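
As an illustration, here is a sketch of a naming scheme that would keep PODs and work directories from clashing. The function, names and path layout below are hypothetical, not part of the current tools:

    import hashlib
    import re

    def build_id(user: str, project: str, branch: str) -> str:
        """Return a DNS-1123-safe identifier unique to (user, project, branch)."""
        raw = f"{user}/{project}/{branch}"
        # A short hash guards against collisions after sanitization,
        # e.g. "feat/x" and "feat-x" both sanitize to "feat-x".
        digest = hashlib.sha256(raw.encode()).hexdigest()[:8]
        safe = re.sub(r"[^a-z0-9-]+", "-", raw.lower()).strip("-")
        return f"{safe[:50]}-{digest}"

    # POD and work-directory names share the same unique suffix:
    bid = build_id("jsmith", "starlingx", "feature/debian-builds")
    pod_name = f"builder-{bid}"           # unique per user/project/branch
    workdir = f"/localdisk/builds/{bid}"  # hypothetical per-build directory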

Update helm charts to isolate the common parts between minikube and other k8s environments.

Update the stx tool, which may be of limited or no use in full k8s environments.

Replace Aptly with Pulp (package repository service). Motivation: Pulp supports file types other than Debian packages, such as source archives used as build inputs.
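
For illustration, a minimal sketch of creating a shared package repository through Pulp’s REST API, assuming Pulp 3 with the pulp_deb plugin; the host, credentials and repository name are placeholders (consult the Pulp documentation for the authoritative endpoints):

    import requests

    PULP = "http://pulp.example.com"  # placeholder service URL
    AUTH = ("admin", "password")      # placeholder credentials

    # Create a Debian (APT) repository that all builds of a branch can share.
    resp = requests.post(
        f"{PULP}/pulp/api/v3/repositories/deb/apt/",
        auth=AUTH,
        json={"name": "starlingx-master-binary"},
    )
    resp.raise_for_status()
    print("created repository:", resp.json()["pulp_href"])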

Update the package repository service container so that it can be shared among multiple builds. Motivation: avoid unnecessary duplication of package files that can be shared among different users on the same system.

Update other build containers to allow transient use (single command execution). Motivation: efficient memory/CPU sharing among multiple builds.
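
A sketch of what transient use could look like, assuming the build scripts have kubectl access to the cluster; the image, namespace and build command are placeholders:

    import subprocess
    import uuid

    def run_in_transient_pod(command: list, image: str, namespace: str) -> None:
        """Run one build command in a fresh POD deleted as soon as it exits."""
        pod = f"stx-build-{uuid.uuid4().hex[:8]}"
        subprocess.run(
            ["kubectl", "run", pod,
             f"--image={image}",
             f"--namespace={namespace}",
             "--restart=Never",  # one-shot POD, no restarts
             "--rm", "-i",       # stay attached, delete the POD on exit
             "--"] + command,
            check=True,
        )

    run_in_transient_pod(["build-pkgs", "--parallel", "2"],
                         image="stx-builder:latest",
                         namespace="user-jsmith")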

xxxxxx Kubernetes or Minikube xxxxxxxxxxxxxxxxxxxxxxxx
x                                 ┌──────────────┐   x
x                                 │ User builder │   x
x   ┌──────────────┐           ┌──┤    PODs      │   x   User 1
x   │  Pulp        │◄─────┐    │  └──────────────┘   x
x   └──────────────┘      │    │                     x
x                         │    │  ┌──────────────┐   x
x   ┌──────────────┐      │    │  │ User builder │   x
x   │  Other repos │◄─────┼────┼──┤    PODs      │   x   User 2
x   └──────────────┘      │    │  └──────────────┘   x
x                         │    │                     x
x   ┌──────────────┐      │    │  ┌──────────────┐   x
x   │ Docker reg   │◄─────┘    │  │ User builder │   x
x   └──────────────┘           └──┤    PODs      │   x   User 3
x                                 └──────────────┘   x
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Additional repository services may be deployed in the cluster to support specific types of data. Whether the build will require additional repository services remains to be seen.

A docker registry may be deployed for managing intermediate containers used by the build. Some environments may have a docker registry available outside of the cluster, so this is optional. In particular, CENGN already has this service (Docker Hub) available.

We propose installing Kubernetes on a single server to drive daily builds (CENGN). Kubernetes will be configured to allow more worker nodes to be added later. Jenkins will be installed to trigger Tekton [2] builds and for reporting.

Tekton is a CI pipeline engine designed specifically for Kubernetes. It is command-line driven and may be used by the build tools directly to schedule build jobs within the k8s cluster. Whether such direct usage is feasible or useful is unclear at this point.
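
As a sketch of such direct usage, a build tool could submit a Tekton TaskRun as a Kubernetes custom resource. The task and namespace names below are placeholders, and the Tekton API version may differ by release:

    from kubernetes import client, config

    config.load_kube_config()
    api = client.CustomObjectsApi()

    task_run = {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "TaskRun",
        "metadata": {"generateName": "stx-build-pkgs-"},
        "spec": {"taskRef": {"name": "stx-build-pkgs"}},  # placeholder Task
    }

    created = api.create_namespaced_custom_object(
        group="tekton.dev", version="v1beta1",
        namespace="stx-builds", plural="taskruns",
        body=task_run,
    )
    print("submitted:", created["metadata"]["name"])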

Outputs of released or otherwise important builds would need to be saved indefinitely and backed up in case of hardware failures. On CENGN, the availability of backup storage is to be determined.

Outputs of old non-released builds would be deleted regularly (builds older than 2 weeks or similar). This includes all artifacts (log files, deb files, ISOs).

Mirrors of 3rd-party files (tars, deb files) would be saved indefinitely.

Docker images would be built using kaniko [4] – a tool to build container images from a Dockerfile, inside a container or Kubernetes cluster. It allows one to run “docker build” inside a docker container. This method is appropriate for building Debian build tools images.
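
A minimal sketch of a one-shot kaniko POD, assuming the kubernetes Python client; the build context and destination registry are placeholders:

    from kubernetes import client, config

    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(generate_name="kaniko-build-"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(
                name="kaniko",
                image="gcr.io/kaniko-project/executor:latest",
                args=[
                    "--dockerfile=Dockerfile",
                    # placeholder git build context and destination image
                    "--context=git://opendev.org/starlingx/tools.git",
                    "--destination=registry.example.com/stx-builder:latest",
                ],
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="stx-builds", body=pod)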

For the more complicated cases that need to access docker in other ways, we would use sysbox [5] – a tool for running system software, including Docker, inside docker containers. This method is appropriate for building application images, such as OpenStack containers.
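
Under sysbox, the main POD-level change is selecting its runtime class, after which a full Docker daemon can run inside the POD. A sketch, assuming a standard sysbox installation (the image and build command are placeholders; POD creation follows the same pattern as the kaniko example above):

    from kubernetes import client

    pod_spec = client.V1PodSpec(
        runtime_class_name="sysbox-runc",  # installed by the sysbox deployment
        restart_policy="Never",
        containers=[client.V1Container(
            name="app-image-builder",
            image="nestybox/ubuntu-focal-systemd-docker",  # ships dockerd
            command=["sh", "-c", "docker build -t app:latest /src"],
        )],
    )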

Alternatives

Tekton

We do not have to use Tekton – we could simply run build commands directly in k8s PODs controlled by the build scripts (Python), with Jenkins on top to manage build schedules and artifact archiving. This would require us to maintain a sizable chunk of the pipeline logic in Jenkins. Jenkins is hard to install and automate, making the testing of pipeline updates a challenge. Jenkins’ automation API is somewhat unstable and uses an obscure pipeline definition language. We expect a Tekton-based approach to be largely free of these shortcomings.

On the other hand, Tekton is not as mature as Jenkins.

Docker image builds

To build docker images in k8s, we could use docker-in-docker [6] instead of kaniko & sysbox. This method has multiple problems linked to kernel security and I/O performance [7].

We could also mount the host’s docker daemon socket inside any containers/PODs that need to interact with docker. This would leave container instances behind on the host and would require additional scripting to clean them up.

Impact on build tools installations

Individual contributors will be able to continue using Minikube, as they do now.

Installing and configuring Kubernetes itself is beyond the scope of this document. The services & POD definitions used by the build tools shall be reusable (as Helm charts) no matter what the surrounding infrastructure looks like.

Open questions

Persistent storage

The builds would need to persist these types of files:

  • Debian packages and other files (tarballs, etc) used as build inputs. This will be handled by Pulp, whose underlying storage facility is to be determined.

  • Debian packages produced by the build. This will be handled by Pulp as well.

  • Debian package mirror. This may be handled by Pulp as well. It is currently implemented as a custom script on CENGN, outside of k8s.

  • Other files produced by the build (ISO files, docker image list files). We expect to use Pulp for this as well.

  • Log files are normally stored within k8s itself, as well as in individual POD containers. We would probably need to export them for ease of access; see the sketch after this list. CENGN users would expect log files as simple downloadable files, since we are not proposing to make any k8s GUIs available to the public at this point. ElasticSearch may be helpful (searchable database of logs, among other things), but it needs a lot of CPU & RAM.

  • Docker images. Official images (ie build outputs) are to be published to an external Docker registry (Docker Hub).
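
A sketch of the log export mentioned above, assuming logs are pulled from k8s at the end of each build and published as plain files; the namespace, POD name and output directory are placeholders:

    from pathlib import Path
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    def export_logs(namespace: str, pod: str, out_dir: str) -> Path:
        """Save a finished build POD's log as a plain downloadable file."""
        text = core.read_namespaced_pod_log(name=pod, namespace=namespace)
        path = Path(out_dir) / f"{pod}.log"
        path.write_text(text)
        return path

    export_logs("stx-builds", "builder-jsmith-abc123", "/var/www/build-logs")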

It is not clear whether the build would require a shared persistent file system (eg for passing build artifacts between build steps). Such a file system is difficult to implement, and target k8s installations may not have one available for our use. Without a shared file system, builds will take longer to complete due to having to download and copy many files. Contrast this with the older CentOS build system, which relies on a shared file system and uses symbolic links for file sharing.
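
For reference, a “shared file system” in k8s terms is a persistent volume claim with ReadWriteMany access, mountable by PODs on different nodes; whether a storage class providing it exists in the target cluster is exactly the open question. A sketch (the storage class name and size are placeholders):

    from kubernetes import client, config

    config.load_kube_config()

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="stx-build-shared"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteMany"],   # shared across nodes
            storage_class_name="nfs-client",  # placeholder storage class
            resources=client.V1ResourceRequirements(
                requests={"storage": "500Gi"},
            ),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="stx-builds", body=pvc,
    )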

If a file system can’t be shared, as a workaround all of a build’s PODs would have to be scheduled to run on the same node. Downside: PODs can’t be spread across multiple nodes.

An object storage service (non-shared, artifacts to be copied, no symlinks, etc, such as MinIO[3]) may be used for artifacts archiving, as well as for passing artifacts between build stages. Downside: slow.
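
A sketch of passing an artifact between two build stages through MinIO, using the minio Python client; the endpoint, credentials, bucket and paths are placeholders:

    from minio import Minio

    mc = Minio("minio.stx-builds.svc:9000",
               access_key="builder", secret_key="secret", secure=False)

    bucket = "build-artifacts"
    if not mc.bucket_exists(bucket):
        mc.make_bucket(bucket)

    # Stage 1 uploads its output; stage 2 fetches it, possibly on another node.
    obj = "jsmith/starlingx/master/starlingx.iso"
    mc.fput_object(bucket, obj, "/localdisk/deploy/starlingx.iso")
    mc.fget_object(bucket, obj, "/localdisk/deploy/starlingx.iso")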

NFS could be used as a shared file system. Downside: slow.

Ceph could also provide a shared file system. Downside: seems complicated.

Artifact retention & backups

CENGN: it is not clear whether independent, isolated backup storage is available. We could save backups on one of the 2 build servers, making sure important files (released builds etc) are saved on 2 physical machines.

StarlingX vs Kubernetes

  • Once StarlingX switches to Debian, the build server would have to be re-imaged, which will cause disruption in daily builds.

  • We do not need many of the functions that StarlingX provides; k8s is sufficient.

  • StarlingX is not optimized for running build jobs.

  • If we use k8s we should pick a stable base OS with a long shelf life, so that OS upgrades are infrequent while k8s itself can be upgraded at will.

  • If we use StarlingX we should pick the latest official release (6.0).

Data model impact

None

REST API impact

None

Security impact

None for StarlingX deployments. Kubernetes clusters used for builds have security implications that will have to be considered.

Other end user impact

None

Performance impact

None

Other deployer impact

None

Developer impact

The current workflow based on Minikube will continue to be supported. Organizations will gain the ability to take advantage of full Kubernetes installations for centralized builds.

Upgrade impact

None for StarlingX. Kubernetes upgrades are covered in [8].

Implementation

Assignee(s)

  • Davlet Panech - dpanech

  • Luis Sampaio - lsampaio

Repos impacted

starlingx/tools

Work Items

See storyboard.

Dependencies

None

Testing

As the scope of this spec is restricted to the building of StarlingX, it does not introduce any additional runtime testing requirements. As this change is proposed to take place alongside the move to Debian, full runtime testing is expected as part of that spec.

Building under full Kubernetes will require validation to ensure outcomes equivalent to those of builds in a Minikube environment.

Documentation Impact

StarlingX Build Guide https://docs.starlingx.io/developer_resources/build_guide.html - add instructions for full Kubernetes environments.

References

[1] StarlingX: Debian Build Spec – https://docs.starlingx.io/specs/specs/stx-6.0/approved/starlingx_2008846_debian_build.html

[2] Tekton, a CI pipeline engine for k8s – https://tekton.dev/

[3] MinIO, an Amazon S3-compatible object storage system – https://min.io/

[4] Kaniko, a tool to build container images from a Dockerfile, inside a container or Kubernetes cluster – https://github.com/GoogleContainerTools/kaniko

[5] Sysbox, a container runtime that sits below Docker – https://github.com/nestybox/sysbox

[6] Docker in Docker – https://hub.docker.com/_/docker/

[7] Using Docker-in-Docker for your CI or testing environment? Think twice. – https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/

[8] Kubernetes – https://kubernetes.io/docs/home/

History

Revisions

Release Name    Description
STX-7.0         Introduced