Replace OSDs on an AIO-SX Multi-Disk System

You can replace OSDs in an AIO-SX system to increase capacity or to replace faulty disks, without reinstalling the host.

Procedure

Replication factor > 1

  1. Make sure more than one OSD is installed; otherwise, data loss can occur.

    ~(keystone_admin)$ ceph osd tree
    
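The check in step 1 can be made explicit in a script; a minimal sketch that succeeds only when more than one OSD is present (`CEPH_CMD` is an illustrative override used here so the command can be stubbed; the procedure itself just runs `ceph`):

```shell
# Illustrative helper: `ceph osd ls` prints one OSD id per line, so counting
# lines tells us whether replacing one disk still leaves a replica.
CEPH_CMD="${CEPH_CMD:-ceph}"

more_than_one_osd() {
    [ "$($CEPH_CMD osd ls | wc -l)" -gt 1 ]
}
```

If the check fails, stop here: replacing the only OSD would lose its data.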
  2. Verify that all Ceph pools are present.

    ~(keystone_admin)$ ceph osd lspools
    
  3. For each pool, make sure its size attribute is larger than 1; otherwise, data loss can occur.

    ~(keystone_admin)$ ceph osd pool get <pool-name> size
    
  4. Disable pool size change during the procedure. This must be run for all pools.

    ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange true
    
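Step 4 must be repeated for every pool. One way to cover all pools in a single pass (a sketch, assuming `ceph osd pool ls` prints one pool name per line; the `CEPH_CMD` indirection is illustrative so the command can be stubbed, the procedure itself just runs `ceph`):

```shell
# Illustrative helper: apply nosizechange=true|false to every pool.
CEPH_CMD="${CEPH_CMD:-ceph}"

set_nosizechange_all() {
    local pool
    for pool in $($CEPH_CMD osd pool ls); do
        $CEPH_CMD osd pool set "$pool" nosizechange "$1"
    done
}
```

The same helper undoes the setting at the end of the procedure with `set_nosizechange_all false`.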
  5. Verify that the Ceph cluster is healthy.

    ~(keystone_admin)$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_OK
    
  6. Lock the controller.

    ~(keystone_admin)$ system host-lock controller-0
    
  7. Power down the controller.

  8. Replace the disk.

  9. Power on the controller.

  10. Unlock the controller.

    ~(keystone_admin)$ system host-unlock controller-0
    
  11. Wait for the recovery process in the Ceph cluster to start and finish.
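
The wait in step 11 can be scripted as a simple poll; a sketch that blocks until `ceph health` reports HEALTH_OK again (the 30-second interval and the `CEPH_CMD` override are illustrative choices, not part of the procedure):

```shell
# Illustrative helper: poll cluster health until recovery completes.
CEPH_CMD="${CEPH_CMD:-ceph}"
SLEEP_SECS="${SLEEP_SECS:-30}"

wait_for_health_ok() {
    until $CEPH_CMD health | grep -q HEALTH_OK; do
        sleep "$SLEEP_SECS"
    done
}
```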

  12. Ensure that the Ceph cluster is healthy.

    ~(keystone_admin)$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_OK
    
  13. Enable pool size changes.

    ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange false
    

Replication factor 1 with space to back up

  1. Make sure more than one OSD is installed; otherwise, data loss can occur.

    ~(keystone_admin)$ ceph osd tree
    
  2. Verify that all Ceph pools are present.

    ~(keystone_admin)$ ceph osd lspools
    
  3. For each pool, check its size attribute; if it is 1, temporarily increase it (for example, to 2) so that the data is replicated to another OSD before the disk is replaced; otherwise, data loss can occur.

    ~(keystone_admin)$ ceph osd pool get <pool-name> size
    ~(keystone_admin)$ ceph osd pool set <pool-name> size 2
    
  4. Disable pool size change during the procedure. This must be run for all pools.

    ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange true
    
  5. Verify that the Ceph cluster is healthy.

    ~(keystone_admin)$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_OK
    
  6. Lock the controller.

    ~(keystone_admin)$ system host-lock controller-0
    
  7. Power down the controller.

  8. Replace the disk.

  9. Power on the controller.

  10. Unlock the controller.

    ~(keystone_admin)$ system host-unlock controller-0
    
  11. Wait for the recovery process in the Ceph cluster to start and finish.

  12. Ensure that the Ceph cluster is healthy.

    ~(keystone_admin)$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_OK
    
  13. Enable pool size changes.

    ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange false
    
  14. Restore the replication factor to 1 for all pools.

    ~(keystone_admin)$ ceph osd pool set <pool-name> size 1
    
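Steps 13 and 14 both iterate over every pool, so the two restores can be combined into one pass (a sketch, assuming `ceph osd pool ls` prints one pool name per line; `CEPH_CMD` is an illustrative override so the command can be stubbed):

```shell
# Illustrative helper: re-enable size changes and restore replication
# factor 1 on every pool after the disk has been replaced.
CEPH_CMD="${CEPH_CMD:-ceph}"

restore_pools_to_size_one() {
    local pool
    for pool in $($CEPH_CMD osd pool ls); do
        $CEPH_CMD osd pool set "$pool" nosizechange false
        $CEPH_CMD osd pool set "$pool" size 1
    done
}
```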

Replication factor 1 without space to back up

  1. Lock the controller.

    ~(keystone_admin)$ system host-lock controller-0
    
  2. Back up the file /etc/pmon.d/ceph.conf, then remove it.
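
Step 2 in command form; the helper is kept generic so the paths stay explicit (a sketch: in the procedure it runs against /etc/pmon.d/ceph.conf under sudo, and the backup destination is an assumption to adjust):

```shell
# Illustrative helper: copy a file into a backup directory as <name>.bak,
# then remove the original.
backup_and_remove() {
    # $1: file to back up   $2: directory that receives the backup
    cp "$1" "$2/$(basename "$1").bak" && rm "$1"
}

# In the procedure (sudo required; /home/sysadmin is an assumed location):
#   backup_and_remove /etc/pmon.d/ceph.conf /home/sysadmin
```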

  3. Mark the OSD as out and down, stop it, and destroy it.

    ~(keystone_admin)$ ceph osd out osd.<id>
    ~(keystone_admin)$ ceph osd down osd.<id>
    ~(keystone_admin)$ sudo /etc/init.d/ceph stop osd.<id>
    ~(keystone_admin)$ ceph osd destroy osd.<id> --yes-i-really-mean-it
    
  4. Shut down the machine, replace the disk, power it back on, and wait for the boot to finish.

  5. Unlock the controller.

    ~(keystone_admin)$ system host-unlock controller-0
    
  6. Copy the backed-up ceph.conf back to /etc/pmon.d/.

  7. Verify that the Ceph cluster is healthy.

    ~(keystone_admin)$ ceph -s