Replace OSDs on an AIO-SX Multi-Disk System
You can replace OSDs in an AIO-SX system to increase capacity, or replace faulty disks on the host without reinstalling the host.
Procedure
Replication factor > 1
- Make sure there is more than one OSD installed, otherwise there could be data loss.
    ~(keystone_admin)$ ceph osd tree
- Verify that all Ceph pools are present.
    ~(keystone_admin)$ ceph osd lspools
- For each pool, make sure its size attribute is larger than 1, otherwise there could be data loss.
    ~(keystone_admin)$ ceph osd pool get <pool-name> size
- Disable pool size change during the procedure. This must be run for all pools.
    ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange true
- Verify that the Ceph cluster is healthy.
    ~(keystone_admin)$ ceph -s
      cluster:
        id:     50ce952f-bd16-4864-9487-6c7e959be95e
        health: HEALTH_OK
- Lock the controller.
    ~(keystone_admin)$ system host-lock controller-0
- Power down the controller.
- Replace the disk.
- Power on the controller.
- Unlock the controller.
    ~(keystone_admin)$ system host-unlock controller-0
- Wait for the recovery process in the Ceph cluster to start and finish.
- Ensure that the Ceph cluster is healthy.
    ~(keystone_admin)$ ceph -s
      cluster:
        id:     50ce952f-bd16-4864-9487-6c7e959be95e
        health: HEALTH_OK
- Enable pool size changes.
    ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange false
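The per-pool checks above have to be repeated for every pool, which is easy to get wrong by hand. The following is a minimal sketch of how that loop might be scripted. It assumes that `ceph osd pool ls` prints one pool name per line and that `ceph osd pool get <pool-name> size` prints a line of the form `size: <n>`. The function names `pools_at_risk` and `set_nosizechange_all` are hypothetical helpers, not part of any StarlingX or Ceph tooling.

```shell
#!/bin/sh
# Sketch only: helper functions for the per-pool steps of the procedure.
# Assumes `ceph osd pool ls` prints one pool name per line and
# `ceph osd pool get <pool> size` prints "size: <n>".

# Print the name of every pool whose size is not larger than 1.
pools_at_risk() {
    ceph osd pool ls | while read -r pool; do
        # </dev/null keeps the inner command from consuming the pool list.
        size=$(ceph osd pool get "$pool" size </dev/null | awk '{print $2}')
        if [ "${size:-0}" -le 1 ]; then
            echo "$pool"
        fi
    done
}

# Set nosizechange to the given value (true or false) on every pool.
set_nosizechange_all() {
    ceph osd pool ls | while read -r pool; do
        ceph osd pool set "$pool" nosizechange "$1" </dev/null
    done
}
```

With these definitions loaded, `pools_at_risk` should print nothing before the controller is locked, `set_nosizechange_all true` covers the disable step, and `set_nosizechange_all false` covers the final enable step.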
Replication factor 1 with space to back up
- Make sure there is more than one OSD installed, otherwise there could be data loss.
    ~(keystone_admin)$ ceph osd tree
- Verify that all Ceph pools are present.
    ~(keystone_admin)$ ceph osd lspools
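- For each pool, make sure its size attribute is larger than 1, otherwise there could be data loss. Because this scenario starts from a replication factor of 1, a pool that still reports size 1 must first have its size raised (for example, to 2) and its data fully replicated before you continue.
    ~(keystone_admin)$ ceph osd pool get <pool-name> size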
- Disable pool size change during the procedure. This must be run for all pools.
    ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange true
- Verify that the Ceph cluster is healthy.
    ~(keystone_admin)$ ceph -s
      cluster:
        id:     50ce952f-bd16-4864-9487-6c7e959be95e
        health: HEALTH_OK
- Lock the controller.
    ~(keystone_admin)$ system host-lock controller-0
- Power down the controller.
- Replace the disk.
- Power on the controller.
- Unlock the controller.
    ~(keystone_admin)$ system host-unlock controller-0
- Wait for the recovery process in the Ceph cluster to start and finish.
- Ensure that the Ceph cluster is healthy.
    ~(keystone_admin)$ ceph -s
      cluster:
        id:     50ce952f-bd16-4864-9487-6c7e959be95e
        health: HEALTH_OK
- Enable pool size changes.
    ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange false
- Set the replication factor back to 1 for all pools.
    ~(keystone_admin)$ ceph osd pool set <pool-name> size 1
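The "wait for the recovery process" step in the procedures above has no command attached. One possible way to automate the wait is to poll `ceph health`, which prints a status string beginning with `HEALTH_OK`, `HEALTH_WARN`, or `HEALTH_ERR`. The sketch below is an assumption about how that could be done, not the documented method. The function name `wait_for_health_ok` and its default interval and attempt count are hypothetical.

```shell
#!/bin/sh
# Sketch only: poll `ceph health` until it reports HEALTH_OK.
# $1: seconds between polls (default 10)
# $2: maximum number of polls (default 360)
wait_for_health_ok() {
    interval=${1:-10}
    attempts=${2:-360}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        status=$(ceph health 2>/dev/null)
        case "$status" in
            HEALTH_OK*) return 0 ;;
        esac
        i=$((i + 1))
        sleep "$interval"
    done
    echo "Ceph did not reach HEALTH_OK; last status: $status" >&2
    return 1
}
```

A call such as `wait_for_health_ok 30 120` would poll every 30 seconds for up to an hour. A HEALTH_OK reading taken immediately after the unlock may predate the start of recovery, so it is still worth confirming the final state with `ceph -s` as the procedure describes.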
Replication factor 1 without space to back up
- Lock the controller.
    ~(keystone_admin)$ system host-lock controller-0
- Back up the file /etc/pmon.d/ceph.conf, then remove it.
- Mark the OSD as out and down, stop it, and destroy it.
    ~(keystone_admin)$ ceph osd out osd.<id>
    ~(keystone_admin)$ ceph osd down osd.<id>
    ~(keystone_admin)$ sudo /etc/init.d/ceph stop osd.<id>
    ~(keystone_admin)$ ceph osd destroy osd.<id>
- Shut down the machine, replace the disk, power it on, and wait for boot to finish.
- Unlock the controller.
    ~(keystone_admin)$ system host-unlock controller-0
- Copy the backed-up ceph.conf back to /etc/pmon.d/.
- Verify that the Ceph cluster is healthy.
    ~(keystone_admin)$ ceph -s
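The four OSD commands in the procedure above must run in order against the same OSD, and a later command should not run if an earlier one fails. The following is a minimal sketch of how that sequence might be wrapped, using the commands exactly as the procedure lists them. The function name `retire_osd` is a hypothetical helper. Some Ceph releases may require an additional confirmation flag for `ceph osd destroy`, which this sketch does not add.

```shell
#!/bin/sh
# Sketch only: mark an OSD out and down, stop its daemon, and destroy it.
# $1: numeric OSD id (the <id> in osd.<id>)
retire_osd() {
    id=$1
    # Reject empty or non-numeric ids before touching the cluster.
    case "$id" in
        ''|*[!0-9]*)
            echo "usage: retire_osd <numeric-id>" >&2
            return 2
            ;;
    esac
    # Each step runs only if the previous one succeeded.
    ceph osd out "osd.$id" &&
    ceph osd down "osd.$id" &&
    sudo /etc/init.d/ceph stop "osd.$id" &&
    ceph osd destroy "osd.$id"
}
```

For example, `retire_osd 1` would act on osd.1. The OSD id can be read from the output of `ceph osd tree`.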
