Containerized Ceph deployment and provisioning

Storyboard: https://storyboard.openstack.org/#!/story/2005527

Slides: https://docs.google.com/presentation/d/1VcolrSux-sEBUYcQA06yrEeYx4KM4Ne5wryeYBmSy_o/edit#slide=id.p

Design doc: https://docs.google.com/document/d/1lnAZSu4vAD4EB62Mk18mCgM7aAItl26sJa1GBf9DJz4/edit?usp=sharing

Ceph is the standard persistent storage backend for StarlingX; this story implements Ceph containerization.

The implementation of containerized Ceph includes:

  • Ceph distributed cluster deployment on containers

  • Persistent volume provisioners: RBD (RWO: ReadWriteOnce, ROX: ReadOnlyMany) and CephFS (RWX: ReadWriteMany)

Containerized Ceph brings several benefits:

  • Ceph version upgrades: the Ceph image can be built independently from a Dockerfile, or an upstream image can be used directly, so there is no need to adapt Ceph to the StarlingX build system and moving to a new release is simple.

  • Deployment: Ceph services (Ceph-mon, Ceph-osd, Ceph-mgr, etc.) are managed autonomously by the container orchestrator (CO); container namespaces provide isolation and avoid resource conflicts.

  • Autoscaling: flexible and elastic expansion of the Ceph cluster.

Problem description

Kubernetes applications (e.g. OpenStack) require access to persistent storage. The current solution is a helm chart that leverages the rbd-provisioner from the external-storage incubator project, but it has some problems.

There are several provisioners for containerized Ceph:

  • In-tree: implemented in upstream Kubernetes, but it is no longer used and the code is frozen.

  • external-storage: https://github.com/kubernetes-incubator/external-storage, the project that extended the in-tree provisioners before CSI existed; it is slowly being deprecated.

  • Ceph-CSI: https://github.com/Ceph/Ceph-CSI, CSI (Container Storage Interface) is the standard interface that container orchestration systems (COs) use to expose arbitrary storage systems; it also works with other COs such as Docker Swarm and Mesos.

Therefore the best approach is to implement the Ceph-CSI RBD provisioner as a replacement for the current rbd-provisioner, and to add the Ceph-CSI CephFS provisioner to support ReadWriteMany persistent volumes.
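
To illustrate how the Ceph-CSI provisioners would be consumed, the following is a minimal sketch (using the kubernetes Python client) that creates a StorageClass backed by the Ceph-CSI RBD driver and a PVC requesting a ReadWriteOnce volume from it. The driver name rbd.csi.ceph.com comes from upstream Ceph-CSI; the pool, clusterID and class names are placeholders for values the StarlingX deployment would supply:

    # Minimal sketch, not the final implementation: a StorageClass backed by
    # the Ceph-CSI RBD driver and a PVC consuming it. The clusterID, pool and
    # class names are placeholders.
    from kubernetes import client, config

    config.load_kube_config()

    sc = client.V1StorageClass(
        api_version="storage.k8s.io/v1",
        kind="StorageClass",
        metadata=client.V1ObjectMeta(name="csi-rbd"),
        provisioner="rbd.csi.ceph.com",   # Ceph-CSI RBD driver; CephFS uses cephfs.csi.ceph.com
        parameters={
            "clusterID": "rook-ceph",     # placeholder cluster ID
            "pool": "kube-rbd",           # placeholder RBD pool
        },
        reclaim_policy="Delete",
    )
    client.StorageV1Api().create_storage_class(sc)

    pvc = client.V1PersistentVolumeClaim(
        api_version="v1",
        kind="PersistentVolumeClaim",
        metadata=client.V1ObjectMeta(name="app-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],   # RWO via RBD; RWX would use a CephFS class
            storage_class_name="csi-rbd",
            resources=client.V1ResourceRequirements(requests={"storage": "1Gi"}),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim("default", pvc)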

Without Ceph containerization we face several difficulties:

1. A stable version of Ceph is released every 9 months. For example, the StarlingX community plans to move to Ceph N+1 in the next release, but the Ceph version upgrade process is very complicated: the Ceph upstream codebase has to be adapted to the StarlingX build environment, including several submodules pinned to specific commit IDs, and build issues have to be handled. These efforts take up most of the upgrade time.

2. The Ceph-related plugin modules in StarlingX need to be refactored to accommodate new Ceph features, which couples the Ceph deployment to the StarlingX environment and makes troubleshooting more difficult.

Use Cases

The deployer/developer needs to manage Ceph deployment and provisioning:

  • Build Ceph/Ceph-CSI docker images at the required version.

  • Bring up the Ceph cluster when controller nodes are available.

  • Dynamically adjust configuration after the Ceph deployment completes.

  • Kubernetes applications need to access persistent volumes with RWO/RWX/ROX modes.

  • Set up Ceph client libraries for block, object and filesystem access by external clients (OpenStack Cinder/Nova, etc.) in languages such as Python and Go.
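
For the last use case, here is a minimal sketch of external client access through the upstream python-rados/python-rbd bindings; the configuration path and pool name below are the usual Ceph defaults and placeholders, not StarlingX-specific values:

    # Minimal sketch of cluster/block access via the Ceph python bindings.
    # The conffile path and pool name are placeholders.
    import rados
    import rbd

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        print("cluster fsid:", cluster.get_fsid())
        print("pools:", cluster.list_pools())

        # Create and list RBD images in a (placeholder) pool used for block storage.
        ioctx = cluster.open_ioctx("kube-rbd")
        try:
            rbd_inst = rbd.RBD()
            rbd_inst.create(ioctx, "demo-image", 1 * 1024 ** 3)  # 1 GiB image
            print("images:", rbd_inst.list(ioctx))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()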

Proposed change

This story proposes a solution for Ceph containerization.

Solution

There are two candidate solutions for Ceph containerization:

1. OpenStack-Helm-Infra: simpler code organization and easier Ceph upgrades.

2. Rook: more complicated, but it supports more features (including Ceph-CSI) and scales better. Rook also has traction in the Ceph community, although a Ceph upgrade currently requires more manual work.

This proposal selects Rook; since v1.0, which supports Ceph-CSI, it has become the better choice.

Advantages:

  • Rook supports the Ceph storage backend natively and turns the storage software into self-managing, self-scaling and self-healing storage services via the Rook operator, so no extra work is needed for high-availability support.

  • Good scalability; other storage backends (e.g. NFS, EdgeFS) can also be supported.

  • The Rook community has more forks/stars/watchers than the OpenStack-Helm-Infra project, the Ceph support in Rook is stable, and Rook also provides a Ceph operator helm chart.

Disadvantages:

  • For Ceph upgrades, the cluster status currently has to be checked manually beforehand; the operator is expected to handle upgrades in the future, which will be much simpler.

  • The Rook framework is popular, but its codebase is in Go, which increases the maintenance cost, and some Rook sub-projects are not yet stable.

Implementation:

Rook community:

  • Rook's current release is v1.0, released in early June 2019.

  • Rook's Ceph support is stable; v1.0 adds experimental support for Ceph Nautilus and the Ceph-CSI plugin, and stable CSI plugin support is planned for v1.1, due by August 16th, 2019.

  • Rook also provides a helm chart, but only for the operator and without CSI support; the Ceph cluster still has to be created with kubectl as shown in the Rook Ceph quickstart (a Python sketch of this step follows this list).

  • Rook plans more complete upgrade automation in the future.
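
Because the chart only covers the operator, the cluster itself is declared as a CephCluster custom resource. The following is a hedged Python sketch of that step (equivalent to the kubectl command in the Rook quickstart); the image tag, monitor count and data path are placeholders for values derived from the system configuration:

    # Minimal sketch: create the CephCluster custom resource, equivalent to
    # `kubectl create -f cluster.yaml` in the Rook quickstart. Image tag,
    # monitor count and data path are placeholders.
    from kubernetes import client, config

    config.load_kube_config()

    ceph_cluster = {
        "apiVersion": "ceph.rook.io/v1",
        "kind": "CephCluster",
        "metadata": {"name": "rook-ceph", "namespace": "rook-ceph"},
        "spec": {
            "cephVersion": {"image": "ceph/ceph:v14.2"},  # placeholder Nautilus image
            "dataDirHostPath": "/var/lib/rook",
            "mon": {"count": 3, "allowMultiplePerNode": False},
            "storage": {"useAllNodes": True, "useAllDevices": False},
        },
    }

    client.CustomObjectsApi().create_namespaced_custom_object(
        group="ceph.rook.io", version="v1", namespace="rook-ceph",
        plural="cephclusters", body=ceph_cluster)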

Code changes:

  • Remove the current helm chart that leverages the rbd-provisioner from the external-storage incubator project mentioned above.

  • Remove the service manager functionality that provides high-availability support for Ceph services.

  • Remove the native Ceph cluster bootstrap from the apply-bootstrap-manifest step of the ansible configuration.

  • Introduce the upstream project https://github.com/Rook/Rook, the cloud-native storage orchestrator for Kubernetes, with Ceph-CSI support.

  • Create new Dockerfiles to build container images for the Rook operator (including the Rook operator and agent) and the Ceph daemon (including Ceph-mon, Ceph-osd, Ceph-mgr, etc.).

  • Add provisioning of the Rook operator and Ceph cluster after ansible-apply, and consider implementing it in the platform application once the rbd-provisioner chart is removed from that application.

  • Add two Rook helm charts: Rook-operator (with Ceph-CSI support) and Rook-Ceph (the cluster), and consider delivering them as a platform app after ansible-apply. Additional work is needed because, first, the Rook 1.0 helm chart currently has no CSI support, and second, the current Rook helm chart only covers the Rook operator and does not bring up the Ceph cluster.

  • Change the Rook & Ceph plugins and the affected sysinv code.

The Rook & Ceph plugin and sysinv changes include:

  • Remove the puppet module that encapsulates puppet operations for Ceph storage configuration.

  • Remove the python-cephclient module; Ceph monitor and OSD operations will be managed by the Rook operator, and the module is replaced by python-rookclient.

  • Add support for configuring Rook and Ceph under different system configurations, similar to get_overrides() in the OpenStack-Helm charts.

  • Add python-rookclient to apply deployment options by overriding yaml files or helm charts, for example Ceph monitor replication: 3. Since the operator will create a new monitor if one is deleted manually, this cannot be implemented through the RESTful API alone. In the current implementation the system-mode configuration is passed to the service manager to bring up the native Ceph cluster and drives options such as Ceph monitor replication; the Rook configuration has corresponding parameters. The Rook operator can refresh the Ceph cluster when it detects that the overridden yaml/helm charts changed, so there is no need to update sysinv code for RESTful commands.
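
The class and method names below are hypothetical; they only sketch the idea that python-rookclient pushes system configuration (here the monitor count) into the CephCluster resource and lets the operator reconcile the cluster, rather than driving Ceph through RESTful commands:

    # Hypothetical python-rookclient sketch: deployment options such as
    # monitor replication are applied by patching the CephCluster custom
    # resource; the Rook operator then reconciles the running cluster.
    from kubernetes import client, config


    class RookClient(object):
        """Thin wrapper used by sysinv in place of python-cephclient."""

        GROUP, VERSION, PLURAL = "ceph.rook.io", "v1", "cephclusters"

        def __init__(self, namespace="rook-ceph", name="rook-ceph"):
            config.load_kube_config()
            self.api = client.CustomObjectsApi()
            self.namespace = namespace
            self.name = name

        def set_mon_count(self, count):
            """Override monitor replication (e.g. 1 for AIO-SX, 3 for duplex)."""
            patch = {"spec": {"mon": {"count": count}}}
            return self.api.patch_namespaced_custom_object(
                self.GROUP, self.VERSION, self.namespace, self.PLURAL,
                self.name, patch)


    # Example: raise monitor replication to 3 once a second controller exists.
    # RookClient().set_mon_count(3)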

Alternatives

Solution: OpenStack-Helm-Infra

  • Introduce the project https://github.com/openstack/openstack-helm-infra with the Ceph-related helm charts: Ceph-client, Ceph-mon, Ceph-osd, etc.

  • Helm/Armada has been widely accepted and used by the community; this solution follows the helm architecture and requires fewer changes to related code (e.g. the helm plugin for the Ceph manifest).

  • Ceph version upgrades are easy via helm install/upgrade, but the Ceph-CSI project would need to be ported and the new Ceph provisioners reworked.

  • Additional work is needed for Ceph high-availability support, similar to the service manager function in StarlingX.

Data model impact

1. In bootstrap, an init script deploys the Rook operator; when the Rook operator brings up the Ceph cluster, a Rook plugin overrides the Rook-Ceph yaml with the system configuration (monitor and OSD settings, etc.).

2. Rookclient provides an interface to change deployment options by overriding the Rook yaml files (in the future, helm charts); it also includes show & dump wrapper interfaces used by sysinv.
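
A hedged sketch of the show & dump wrappers (function names are hypothetical): show returns the live CephCluster resource and dump serializes it so sysinv can inspect or archive the applied deployment options:

    # Hypothetical show/dump helpers for the rookclient wrapper.
    import yaml
    from kubernetes import client, config


    def show_cluster(namespace="rook-ceph", name="rook-ceph"):
        """Return the live CephCluster resource as a dict."""
        config.load_kube_config()
        return client.CustomObjectsApi().get_namespaced_custom_object(
            "ceph.rook.io", "v1", namespace, "cephclusters", name)


    def dump_cluster(path, namespace="rook-ceph", name="rook-ceph"):
        """Serialize the CephCluster resource to a yaml file for sysinv."""
        with open(path, "w") as f:
            yaml.safe_dump(show_cluster(namespace, name), f, default_flow_style=False)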

REST API impact

None

Security impact

None

Other end user impact

None

Performance Impact

No impact is expected. For networking, Ceph and the related containers use the host's native network.

Other deployer impact

  • Containerized Ceph deployment should be used instead of the native Ceph deployed by puppet.

  • Containerized Ceph-CSI provisioning should be used instead of ‘rbd-provisioner’.

Developer impact

None

Upgrade impact

  1. Upgrade after Rook is adopted:

Upgrade work includes two parts: the Rook-Ceph operator and the Ceph cluster. There is an upgrade manual on the Rook website (https://Rook.io/docs/Rook/v1.0/Ceph-upgrade.html), but in practice additional work is needed.

Although the Rook community plans more complete upgrade automation, currently we have to follow the manual to upgrade.

  2. Upgrade from the current implementation to Rook:

There is a big gap when upgrading from the current native Ceph cluster to Rook: the deployment model changes completely, it is hard to follow the official upgrade manual to replace the Ceph services (mon, osd, etc.) step by step, and the Rook operator does not support a cluster that mixes native and containerized services.

Although this upgrade is not recommended, if it is necessary the key checkpoints are listed below, and a script will be created for these actions (a sketch of the storage-consistency step follows the list).

  • Tear down the old native Ceph cluster, keeping the data on the storage (OSDs).

  • A Ceph OSD can be backed by the same device or file in both Rook and the native deployment; keep the storage settings consistent.

  • After that, re-deploy StarlingX with Rook using an updated ceph-cluster.yaml, and bring up the new cluster.
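
A hedged sketch of the storage-consistency checkpoint: render the CephCluster storage section from the OSD layout of the old native cluster so the Rook OSDs reuse the same devices; the host and device names below are placeholders:

    # Hypothetical helper: render the storage section of ceph-cluster.yaml so
    # that the Rook OSDs reuse the devices the native cluster used.
    import yaml

    NATIVE_OSD_LAYOUT = {              # placeholder: collected from the old cluster
        "controller-0": ["sdb"],
        "controller-1": ["sdb"],
    }


    def render_storage_spec(layout):
        """Build the CephCluster spec.storage section from the native OSD layout."""
        return {
            "useAllNodes": False,
            "useAllDevices": False,
            "nodes": [
                {"name": host, "devices": [{"name": dev} for dev in devices]}
                for host, devices in sorted(layout.items())
            ],
        }


    print(yaml.safe_dump({"storage": render_storage_spec(NATIVE_OSD_LAYOUT)}))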

Upgrading a Rook cluster is not without risk, especially when upgrading from native Ceph to Rook. There may be unexpected issues or obstacles that damage the integrity and health of the storage cluster, including data loss.

Implementation

Assignee(s)

Primary assignee:

Tingjie Chen <tingjie.chen@intel.com>

Other contributors (provided key comments that were integrated into this spec):

Brent Rowsell <brent.rowsell@windriver.com>, Bob Church <robert.church@windriver.com>

Repos Impacted

stx-manifest, stx-tools, stx-config, stx-integ, stx-upstream, stx-ansible-playbooks

Work Items

We can treat containerized Ceph as another storage backend used by OpenStack; a Rook Ceph cluster can also be brought up and co-exist with the current native Ceph cluster.

For the development model, add this as a parallel capability and keep the existing Ceph implementation. The default would be the existing implementation, with the ability to override at deployment time to use this implementation. The old implementation would be removed in STX 4.0, together with an upgrade strategy from the legacy implementation.

The main implementation can be merged in advance since it does not break current functionality. In the meantime we can prepare a patch that switches the default Ceph backend to the containerized Ceph implementation (including some configuration changes), enable it in a certain branch (the change is tiny), and then merge it into master once it is ready.

The implementation can be split into the following milestones:

MS1 (end of July): Bring up Rook operator and Ceph cluster in StarlingX.

MS2 (Sep 30th): Finish the Rook plugins and affected sysinv code; deploy according to the system configuration policy.

MS3 (20th, Oct): Ceph-CSI (for Kubernetes app) and OpenStack service support.

Once MS3 is achieved, we have the basic functionality needed to cover all operational scenarios triggered through the sysinv API and to get a feel for the system impacts.

These include, but are not limited to:

  • Monitor assignment/reassignment.

  • Adding/removing storage tiers (impacts Ceph crushmap)

  • Defining the Kubernetes default storage class (currently handled in rbd-provisioner); see the sketch after this list.

  • Host-delete/host-add for hosts that do/will deploy Rook resources.

  • Replication factor updates for minimum data redundancy vs. H/A data availability (AIO-SX disk-based vs. host-based replication).
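
For the default storage class scenario above, a minimal sketch of the operation; the class name csi-rbd is a placeholder, while the annotation is the standard Kubernetes default-class marker:

    # Minimal sketch: mark the Ceph-CSI RBD class as the cluster default,
    # which the rbd-provisioner chart handled previously. The class name is
    # a placeholder.
    from kubernetes import client, config

    config.load_kube_config()

    patch = {"metadata": {"annotations": {
        "storageclass.kubernetes.io/is-default-class": "true"}}}
    client.StorageV1Api().patch_storage_class("csi-rbd", patch)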

There are also test cases defined for Ceph: https://ethercalc.openstack.org/orb83xruwmo8

Dependencies

Story: [Feature] Kubernetes Platform Support at https://storyboard.openstack.org/#!/story/2002843

Story: [Feature] Ceph persistent storage backend for Kubernetes https://storyboard.openstack.org/#!/story/2002844

This requires existing functionality from some projects that are not currently used by StarlingX:

  • docker

  • kubernetes

  • Rook

  • Openstack-Helm-Infra

  • Ceph-CSI

Testing

None

Documentation Impact

None

References

Rook: https://github.com/Rook/Rook
Ceph-CSI: https://github.com/Ceph/Ceph-CSI
Openstack-Helm: https://github.com/openstack/openstack-helm
Openstack-Helm-Infra: https://github.com/openstack/openstack-helm-infra
Build wiki: https://wiki.openstack.org/wiki/StarlingX/Containers/BuildingImages

History

None