Rook Migration Prerequisites

Common Prerequisites for All Migrations

Application and System State

  • All StarlingX applications must be in either the uploaded or applied state (a spot-check sketch follows this list).

  • The platform-integ-apps application must be in the applied state.

  • The platform-deployment-manager application must be applied.

  • The StarlingX OpenStack application must not be applied.

  • An upgrade operation must not be in progress.

  • Backup and Restore operations must not be running.
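
The application-state items above can be spot-checked from the active controller. The following is a minimal sketch; it only lists the applications and greps for the ones called out above, so adjust the patterns to your deployment:

source /etc/platform/openrc

# Review the "status" column: every application must be "uploaded" or "applied".
system application-list

# platform-integ-apps and platform-deployment-manager must be applied,
# and the OpenStack application must not be applied.
system application-list | grep -E 'platform-integ-apps|platform-deployment-manager|openstack'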

Ceph and Storage Requirements

  • The Ceph cluster health must be HEALTH_OK.

  • All OSDs must be in the configured state.

  • All Ceph monitors must be in the configured state.

  • Only the following Ceph OSD pools are allowed; verify with ceph osd pool ls (see the spot-check sketch after this list):

    • .mgr

    • kube-rbd

    • kube-cephfs-data

    • kube-cephfs-metadata

    • images

    • cinder.backups

    • cinder-volumes

    • ephemeral

    • kube-rbdkube-system

  • Ceph PVCs must not be in use by any pod, since user applications must be scaled down before the migration.

  • Ceph PVCs must be in the Bound state. Check the STATUS column in the output of kubectl get pvc.

  • Ceph VolumeSnapshots must be in the ready state. Check the READYTOUSE column in the output of kubectl get volumesnapshot.
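
The storage items above can be verified with the commands they reference. The following is a minimal sketch; the cluster-wide kubectl flags and grep filters are illustrative only:

# The pool list must contain only the allowed pool names from the list above.
ceph osd pool ls

# Any PVC not in the Bound state is reported here.
kubectl get pvc --all-namespaces | grep -v Bound

# Any VolumeSnapshot that is not ready (READYTOUSE=false) is reported here.
kubectl get volumesnapshot --all-namespaces | grep -i false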

Host and Platform Health

  • All hosts must be unlocked/enabled/available (see the sketch after this list).

  • No host named None may be present.

  • All hosts must be in-sync and reconciled:

    kubectl get hosts -n deployment
    
  • The system must be in-sync and reconciled:

    kubectl get system -n deployment
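
A minimal sketch for confirming host health alongside the deployment-manager checks above; the expected values are those listed in this section:

source /etc/platform/openrc

# Every host must report administrative=unlocked, operational=enabled,
# and availability=available.
system host-list

# Both resources must report in-sync and reconciled.
kubectl get hosts -n deployment
kubectl get system -n deployment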
    

Disk Space Requirements

  • The directory used to back up the final deployment manager file must have at least 256 MiB of free space.

    • A custom path can be provided through the storage_backend_migration_backup_dir variable.

    • Ensure the parent directory has sufficient free space. By default, /opt/platform-backup/storage-backend-migration is created during the migration.

  • The /var directory on the active controller must have at least 30% free space (a combined check sketch follows this list):

    ~(keystone_admin)$ df -h /var
    
  • The cgts-vg volume group must have at least 20 GiB free on every host running Ceph monitors:

    ~(keystone_admin)$ system host-lvg-list <hostname>
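
The two free-space thresholds above can also be checked non-interactively. This is a minimal sketch; the 30% and 20 GiB thresholds are the ones stated in this list:

# /var: df reports the used percentage, so the free percentage is 100 minus that value.
VAR_USED_PCT=$(df --output=pcent /var | tail -1 | tr -d ' %')
echo "/var free: $((100 - VAR_USED_PCT))% (must be at least 30%)"

# cgts-vg: free space in bytes, to be checked on every host running a Ceph monitor.
CGTS_VG_FREE_BYTES=$(sudo vgs --units b --no-suffix -o vg_free cgts-vg --noheadings | tr -d '[:space:]')
echo "cgts-vg free: ${CGTS_VG_FREE_BYTES} bytes (must be at least $((20 * 1024 * 1024 * 1024)) bytes)"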
    

DC Prerequisites

  • All System Controllers and subclouds included in the migration must be running the latest version of StarlingX.

  • System Controllers can be migrated independently of their subclouds.

  • If the System Controller migration occurs first, no additional steps are required prior to migrating any subcloud.

  • If subcloud migration occurs before System Controller migration, the required container images must be manually uploaded to registry.central on the System Controller before starting the subcloud migration. The list of required images is provided in the Docker Images section.

Run the following commands on the System Controller to upload the required images to registry.central:

REGISTRY_PREFIX="server:port/path"
REGISTRY_USERNAME="admin"
REGISTRY_PASSWORD="password"

source /etc/platform/openrc
sudo docker login registry.local:9001 --username ${REGISTRY_USERNAME} --password ${REGISTRY_PASSWORD}

for image in \
    registry.k8s.io/sig-storage/csi-attacher:v4.8.1 \
    quay.io/cephcsi/cephcsi:v3.14.2 \
    registry.k8s.io/sig-storage/csi-provisioner:v5.2.0 \
    registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.13.0 \
    registry.k8s.io/sig-storage/csi-resizer:v1.13.2 \
    registry.k8s.io/sig-storage/csi-snapshotter:v8.2.1 \
    docker.io/rook/ceph:v1.12.11 \
    docker.io/rook/ceph:v1.17.8 \
    quay.io/ceph/ceph:v16.2.15 \
    quay.io/ceph/ceph:v18.2.7 \
    docker.io/openstackhelm/ceph-config-helper:ubuntu_jammy_18.2.2-1-20241210 \
    docker.io/bitnamilegacy/kubectl:1.33.4 \
    docker.io/starlingx/stx-ceph-manager:stx.12.0-v18.2.2-0 \
    docker.io/starlingx/stx-debian-dev:stx.9.0-v1.0.0
do
    # Pull the image from the source registry, retag it for the local registry, and push it.
    sudo docker pull ${REGISTRY_PREFIX}/${image}
    sudo docker tag ${REGISTRY_PREFIX}/${image} registry.local:9001/${image}
    sudo docker push registry.local:9001/${image}

    # Confirm the image tags are now present in the local registry.
    IMAGE_NAME_NO_VERSION=$(echo "${image}" | cut -d':' -f1)
    system registry-image-tags ${IMAGE_NAME_NO_VERSION}

    # Verify the image can be pulled through containerd, then remove the local Docker copies.
    sudo crictl pull --creds "${REGISTRY_USERNAME}:${REGISTRY_PASSWORD}" registry.local:9001/${image}
    sudo docker rmi ${REGISTRY_PREFIX}/${image} registry.local:9001/${image}
done

Docker Images

To migrate from Bare Metal Ceph to Rook Ceph, the following container images are pulled and stored in the local registry (registry.local:9001) during the migration process:

  • registry.k8s.io/sig-storage/csi-attacher:v4.8.1

  • quay.io/cephcsi/cephcsi:v3.14.2

  • registry.k8s.io/sig-storage/csi-provisioner:v5.2.0

  • registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.13.0

  • registry.k8s.io/sig-storage/csi-resizer:v1.13.2

  • registry.k8s.io/sig-storage/csi-snapshotter:v8.2.1

  • docker.io/rook/ceph:v1.12.11

  • docker.io/rook/ceph:v1.17.8

  • quay.io/ceph/ceph:v16.2.15

  • quay.io/ceph/ceph:v18.2.7

  • docker.io/openstackhelm/ceph-config-helper:ubuntu_jammy_18.2.2-1-20241210

  • docker.io/bitnamilegacy/kubectl:1.33.4

  • docker.io/starlingx/stx-ceph-manager:stx.12.0-v18.2.2-0

  • docker.io/starlingx/stx-debian-dev:stx.9.0-v1.0.0

These images are retrieved from the container registries defined in the service parameters, so those registries must contain all of the required images. To verify the configured registries, run:

~(keystone_admin)$ system service-parameter-list | grep '\-registry'

For DC systems, all required images must be preloaded into the System Controller’s local registry so that they can be synchronized and made available to all subclouds.
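
To confirm that required images are already present in registry.local (for example, after a manual upload), the following is a minimal sketch using the system registry-image-tags command from the script above; only a few image names are shown, so extend the list as needed:

source /etc/platform/openrc

for image in \
    registry.k8s.io/sig-storage/csi-attacher \
    quay.io/cephcsi/cephcsi \
    docker.io/rook/ceph \
    quay.io/ceph/ceph
do
    echo "=== ${image} ==="
    # Lists the tags stored in the local registry for this image name.
    system registry-image-tags ${image}
done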

In-Service Migration Prerequisites

The goal of an in-service migration is to preserve user data while converting Ceph OSDs from Filestore to Bluestore. This conversion is performed while the system is still running on the Bare Metal Ceph backend. After Bare Metal Ceph is removed, the Bluestore OSDs are handed over to Rook. The final step in the process is to recreate the persistent volumes.

Users must scale down their applications before starting the migration and scale them back up afterward. As a result, an outage is expected, as with all migration methods.
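
How applications are scaled down depends on how they were deployed. The following is a minimal sketch for a Deployment-based workload; the my-app name, namespace, and replica count are placeholders:

# Scale the application down before starting the migration ...
kubectl scale deployment my-app --replicas=0 -n my-namespace

# ... and scale it back up once the migration has completed.
kubectl scale deployment my-app --replicas=3 -n my-namespace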

  • A minimum of 2 OSDs per chassis in the OSD tree must be up and running.

  • AIO-SX requires enough free space to mark the largest OSD as out/down. This is a per-chassis check.

  • AIO-SX systems configured with replica 1 have limitations on how much data can be used in the cluster before migration can proceed, as detailed below.

    • These limitations do not apply to AIO-SX systems using replica 2 or higher.

    • They also do not apply to AIO-DX or standard systems, which always operate with at least replica 2 and therefore allow wiping and migrating an entire host at a time.

  • Verify that an AIO-SX system has sufficient available space to mark one OSD as out (per chassis); a consolidated scripted sketch of these steps is provided after this list:

    Procedure

    1. Retrieve the available space information using ceph df and note the total_avail_bytes value (referred to below as $CEPH_AVAIL_BYTES):

      ceph df --format json-pretty
      {
          "stats": {
              "total_bytes": 18800926720,
              "total_avail_bytes": 18444136448, # This is $CEPH_AVAIL_BYTES
              "total_used_bytes": 356790272,
              "total_used_raw_bytes": 356790272,
              "total_used_raw_ratio": 0.018977271392941475,
              "num_osds": 2,
              "num_per_pool_osds": 0
          },
          "stats_by_class": {
              "hdd": {
                  "total_bytes": 18800926720,
                  "total_avail_bytes": 18444136448,
                  "total_used_bytes": 356790272,
                  "total_used_raw_bytes": 356790272,
                  "total_used_raw_ratio": 0.018977271392941475
              }
          },
          "pools": [
              {
                  "name": "kube-rbd",
                  "id": 1,
                  "stats": {
                      "stored": 0,
                      "objects": 0,
                      "kb_used": 0,
                      "bytes_used": 0,
                      "percent_used": 0,
                      "max_avail": 17503635456
                  }
              },
              {
                  "name": "kube-cephfs-data",
                  "id": 2,
                  "stats": {
                      "stored": 0,
                      "objects": 0,
                      "kb_used": 0,
                      "bytes_used": 0,
                      "percent_used": 0,
                      "max_avail": 17503635456
                  }
              },
              {
                  "name": "kube-cephfs-metadata",
                  "id": 3,
                  "stats": {
                      "stored": 12363,
                      "objects": 22,
                      "kb_used": 13,
                      "bytes_used": 12363,
                      "percent_used": 7.0630989057463012e-07,
                      "max_avail": 17503635456
                  }
              }
          ]
      }
      
    2. Get all OSD stats from ceph osd df and note the kb_used and kb_avail values of the OSD with the biggest kb_used under nodes (referred to below as $WORST_CASE_OSD_USED_KB and $WORST_CASE_OSD_AVAIL_KB):

      ceph osd df --format json-pretty
      {
        "nodes": [
            {
                "id": 0,
                "device_class": "hdd",
                "name": "osd.0",
                "type": "osd",
                "type_id": 0,
                "crush_weight": 0.0084991455078125,
                "depth": 3,
                "pool_weights": {},
                "reweight": 1,
                "kb": 9180140,
                "kb_used": 174436,
                "kb_used_data": 9005704,
                "kb_used_omap": 2022,
                "kb_used_meta": 0,
                "kb_avail": 9005704,
                "utilization": 1.9001453136880266,
                "var": 1.001274294832792,
                "pgs": 91,
                "status": "up"
            },
            {
                "id": 1,
                "device_class": "hdd",
                "name": "osd.1",
                "type": "osd",
                "type_id": 0,
                "crush_weight": 0.0084991455078125,
                "depth": 3,
                "pool_weights": {},
                "reweight": 1,
                "kb": 9180140,
                "kb_used": 173992, # This is $WORST_CASE_OSD_USED_KB
                "kb_used_data": 9006148,
                "kb_used_omap": 631,
                "kb_used_meta": 0,
                "kb_avail": 9006148, # This is $WORST_CASE_OSD_AVAIL_KB
                "utilization": 1.8953087861405165,
                "var": 0.99872570516720816,
                "pgs": 101,
                "status": "up"
            }
        ],
        "stray": [],
        "summary": {
            "total_kb": 18360280,
            "total_kb_used": 348428,
            "total_kb_used_data": 18011852,
            "total_kb_used_omap": 2654,
            "total_kb_used_meta": 0,
            "total_kb_avail": 18011852,
            "average_utilization": 1.8977270499142715,
            "min_var": 0.99872570516720816,
            "max_var": 1.001274294832792,
            "dev": 0.0024182637737550916
        }
      }
      
    3. Calculate the worst-case OSD data in bytes, adding a safety margin to the used space (the default is +25%):

      WORST_CASE_OSD_USED_BYTES=$(printf "%.0f" "$(echo "$WORST_CASE_OSD_USED_KB * 1024 * 1.25" | bc -l)")
      echo $WORST_CASE_OSD_USED_BYTES
      
      WORST_CASE_OSD_AVAIL_BYTES=$((WORST_CASE_OSD_AVAIL_KB * 1024))
      echo $WORST_CASE_OSD_AVAIL_BYTES
      
    4. Calculate the free cluster space after removing the worst case OSD:

      FREE_AFTER_REMOVAL_BYTES=$((CEPH_AVAIL_BYTES - WORST_CASE_OSD_AVAIL_BYTES))
      FREE_AFTER_REMOVAL_BYTES=$(( FREE_AFTER_REMOVAL_BYTES < 0 ? 0 : FREE_AFTER_REMOVAL_BYTES ))
      echo $FREE_AFTER_REMOVAL_BYTES
      
    5. Check if the free space after removal ($FREE_AFTER_REMOVAL_BYTES) is bigger than the used data on that worst case OSD ($WORST_CASE_OSD_USED_BYTES):

      if (( FREE_AFTER_REMOVAL_BYTES > WORST_CASE_OSD_USED_BYTES )); then
         echo "SAFE TO REMOVE:"
         echo "free after = $FREE_AFTER_REMOVAL_BYTES > used = $WORST_CASE_OSD_USED_BYTES (+$((FREE_AFTER_REMOVAL_BYTES - WORST_CASE_OSD_USED_BYTES)) spare)"
      else
         echo "NOT SAFE:"
         echo "free after = $FREE_AFTER_REMOVAL_BYTES <= used = $WORST_CASE_OSD_USED_BYTES"
      fi
      
  • If the deployment includes an incorrectly named RBD pool (kube-rbdkube-system), the migration process will move the data from that pool into the correctly named kube-rbd pool. This operation temporarily requires additional space on the active controller’s root disk (cgts-vg). For details on how to determine the necessary space, see the Incorrect RBD Pool section.

    Note

    Re-balancing the cluster after the OSDs are marked as “in” takes some time, depending on how much data there is in the cluster.
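
The manual calculation in steps 1 through 5 of the procedure above can also be scripted. The following is a consolidated sketch of those steps; it assumes the jq utility is available on the controller and otherwise uses only the values shown in the example outputs above:

# Total available bytes reported by the cluster ($CEPH_AVAIL_BYTES).
CEPH_AVAIL_BYTES=$(ceph df --format json | jq '.stats.total_avail_bytes')

# The OSD with the largest kb_used under "nodes" is the worst case.
WORST_CASE_OSD_USED_KB=$(ceph osd df --format json | jq '[.nodes[].kb_used] | max')
WORST_CASE_OSD_AVAIL_KB=$(ceph osd df --format json | jq '.nodes | max_by(.kb_used) | .kb_avail')

# Worst-case used data in bytes, with the default +25% safety margin.
WORST_CASE_OSD_USED_BYTES=$(printf "%.0f" "$(echo "$WORST_CASE_OSD_USED_KB * 1024 * 1.25" | bc -l)")
WORST_CASE_OSD_AVAIL_BYTES=$((WORST_CASE_OSD_AVAIL_KB * 1024))

# Free cluster space after removing the worst-case OSD (floored at zero).
FREE_AFTER_REMOVAL_BYTES=$((CEPH_AVAIL_BYTES - WORST_CASE_OSD_AVAIL_BYTES))
FREE_AFTER_REMOVAL_BYTES=$(( FREE_AFTER_REMOVAL_BYTES < 0 ? 0 : FREE_AFTER_REMOVAL_BYTES ))

if (( FREE_AFTER_REMOVAL_BYTES > WORST_CASE_OSD_USED_BYTES )); then
   echo "SAFE TO REMOVE"
else
   echo "NOT SAFE"
fi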

Export/Import Prerequisites

The goal of this type of migration is to preserve user data by first backing up the cluster data while still running on Bare Metal Ceph, then removing Bare Metal Ceph and installing Rook Ceph so the data can be imported back. The last step is to recreate the PVs. Users must scale down their applications before starting the migration and scale them back up afterward. As a result, an outage is expected, as with all migration methods.

  • The cluster must have enough free space to store compressed backups containing all cluster data. When calculating the required capacity, assume a worst-case scenario of no compression (1:1) and add a safety margin to the estimated data size. The final backup size depends on the type and amount of data stored in the cluster.

  • RBD snapshots are not supported for this migration, and snapshot references are lost during the import process. For this reason, the migration is blocked by default if RBD snapshots are present. Set the following variable to bypass the check and delete existing RBD snapshots (a sketch for listing existing snapshots follows this list):

    ~(keystone_admin)$ erase_rbd_snapshots=true
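
To check whether any RBD snapshots exist before deciding on this variable, the following is a minimal sketch that walks the images in the kube-rbd pool; adjust the pool name if additional RBD pools are in use:

# List snapshots for every RBD image in the kube-rbd pool.
for img in $(rbd ls -p kube-rbd); do
    echo "=== kube-rbd/${img} ==="
    rbd snap ls kube-rbd/${img}
done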
    

Check for Available Backup Space

Follow the steps below to confirm whether sufficient space is available for the cluster backups. A consolidated scripted sketch is provided after the procedure.

Procedure

  1. Check the free space in cgts-vg, in bytes, and store it in a variable:

    ~(keystone_admin)$ CGTS_VG_FREE_BYTES=$(sudo vgs --units b --no-suffix -o vg_free cgts-vg --noheadings --rows | tr -d '[:space:]')
    ~(keystone_admin)$ echo $CGTS_VG_FREE_BYTES
    
  2. Determine the total amount of data stored in the cluster, without considering replicas, by summing the stored field of all relevant pools (the pools[].stats.stored values in the output below).

    sysadmin@controller-0:~$ ceph df --format json-pretty
    {
        "stats": {
            "total_bytes": 9400463360,
            "total_avail_bytes": 8144887808,
            "total_used_bytes": 1255559168,
            "total_used_raw_bytes": 1255575552,
            "total_used_raw_ratio": 0.1335652768611908,
            "num_osds": 1,
            "num_per_pool_osds": 0
        },
        "stats_by_class": {
            "hdd": {
                "total_bytes": 9400463360,
                "total_avail_bytes": 8144887808,
                "total_used_bytes": 1255559168,
                "total_used_raw_bytes": 1255575552,
                "total_used_raw_ratio": 0.1335652768611908
            }
        },
        "pools": [
            {
                "name": "kube-rbd",
                "id": 1,
                "stats": {
                    "stored": 540864547,
                    "objects": 143,
                    "kb_used": 528189,
                    "bytes_used": 540864547,
                    "percent_used": 0.065832816064357758,
                    "max_avail": 7674864640
                }
            },
            {
                "name": "kube-cephfs-data",
                "id": 2,
                "stats": {
                    "stored": 536871074,
                    "objects": 129,
                    "kb_used": 524289,
                    "bytes_used": 536871074,
                    "percent_used": 0.065378516912460327,
                    "max_avail": 7674864640
                }
            },
            {
                "name": "kube-cephfs-metadata",
                "id": 3,
                "stats": {
                    "stored": 148821,
                    "objects": 25,
                    "kb_used": 146,
                    "bytes_used": 148821,
                    "percent_used": 1.9390323359402828e-05,
                    "max_avail": 7674864640
                }
            }
        ]
    }
    
  3. Add 1024 MiB (1073741824 bytes) or 5% of the cluster data, whichever is larger, as a safety margin on top of the used cluster data. In this example 1024 MiB is the larger value:

    KUBE_RBD_BYTES=540864547
    KUBE_CEPHFS_DATA_BYTES=536871074
    KUBE_CEPHFS_METADATA_BYTES=148821
    SAFETY_MARGIN_BYTES=1073741824
    BACKUPS_REQUIRED_BYTES=$((KUBE_RBD_BYTES + KUBE_CEPHFS_DATA_BYTES + KUBE_CEPHFS_METADATA_BYTES + SAFETY_MARGIN_BYTES))
    echo $BACKUPS_REQUIRED_BYTES
    
  4. Compare the required backup space against the available space in cgts-vg:

    if (( CGTS_VG_FREE_BYTES > BACKUPS_REQUIRED_BYTES )); then
       echo "BACKUPS FIT"
    else
       echo "DO NOT FIT"
    fi
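
Steps 1 through 4 above can be combined into a single check. The following is a consolidated sketch, assuming the jq utility is available on the controller; the safety margin is the larger of 1024 MiB or 5% of the stored data, as described in step 3:

# Free space in cgts-vg, in bytes.
CGTS_VG_FREE_BYTES=$(sudo vgs --units b --no-suffix -o vg_free cgts-vg --noheadings --rows | tr -d '[:space:]')

# Sum the "stored" bytes of all pools (data only, replicas excluded).
CLUSTER_STORED_BYTES=$(ceph df --format json | jq '[.pools[].stats.stored] | add')

# Safety margin: the larger of 1024 MiB or 5% of the stored data.
FIVE_PERCENT_BYTES=$((CLUSTER_STORED_BYTES / 20))
SAFETY_MARGIN_BYTES=$(( FIVE_PERCENT_BYTES > 1073741824 ? FIVE_PERCENT_BYTES : 1073741824 ))

BACKUPS_REQUIRED_BYTES=$((CLUSTER_STORED_BYTES + SAFETY_MARGIN_BYTES))

if (( CGTS_VG_FREE_BYTES > BACKUPS_REQUIRED_BYTES )); then
   echo "BACKUPS FIT (${CGTS_VG_FREE_BYTES} bytes free > ${BACKUPS_REQUIRED_BYTES} bytes required)"
else
   echo "DO NOT FIT (${CGTS_VG_FREE_BYTES} bytes free <= ${BACKUPS_REQUIRED_BYTES} bytes required)"
fi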
    

Incorrect RBD Pool

If the system is using an RBD pool incorrectly named kube-rbdkube-system, the migration process will automatically transfer the data from this pool to the correctly named kube-rbd pool. This transfer requires temporary free space on the active controller’s root disk (cgts-vg). The fix applies to both in-service and export-import migrations.

The space requirement follows the same rules as the backup space calculation described in the Check for Available Backup Space section. However, in this case, only the bytes “stored” in the kube-rbdkube-system pool are considered, plus the required safety margin, i.e., the larger of 1024 MiB or 5% of the stored data.

  • For export/import migrations with this “kube-rbdkube-system” pool present, the required space on the active controller is the total backup space calculated for the cluster, since that value will always be larger than the space required to transfer the pool data to the correctly named pool.

  • For in-service migrations, the required space is the stored bytes of the kube-rbdkube-system pool plus the safety margin (see the sketch below).
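
A minimal sketch of the in-service calculation, assuming the jq utility is available and the incorrectly named pool exists on the system:

# Bytes stored in the incorrectly named pool.
POOL_STORED_BYTES=$(ceph df --format json | jq '.pools[] | select(.name == "kube-rbdkube-system") | .stats.stored')

# Safety margin: the larger of 1024 MiB or 5% of the stored data.
FIVE_PERCENT_BYTES=$((POOL_STORED_BYTES / 20))
SAFETY_MARGIN_BYTES=$(( FIVE_PERCENT_BYTES > 1073741824 ? FIVE_PERCENT_BYTES : 1073741824 ))

# Compare this value against the free space in cgts-vg on the active controller.
REQUIRED_BYTES=$((POOL_STORED_BYTES + SAFETY_MARGIN_BYTES))
echo $REQUIRED_BYTES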

Note

Compressing the data during the backup phase and decompressing it in the import phase takes some time, depending on the type and the amount of data present in the cluster.

Cluster Redeploy Prerequisites

A cluster redeploy migration provides a fast, clean transition from Bare Metal Ceph to Rook Ceph, without preserving any existing user data. During this process, Bare Metal Ceph is removed, and a new Rook Ceph deployment is created. The disks selected for Rook will be completely wiped.

Although the Kubernetes PV and PVC objects are preserved, they contain no data. As a result, users must redeploy any applications that rely on persistent storage and restore their data from backups after the migration completes.

Note

Before starting a cluster redeploy migration, be aware that existing cluster data will be permanently deleted during the migration and no data will be preserved.