Ceph Placement Group Number Dimensioning for Storage Cluster

Ceph pools are created automatically by StarlingX, StarlingX applications, or StarlingX-supported optional applications. By default, no pools exist after the Ceph cluster is provisioned (monitor(s) enabled and OSDs defined) until an application creates them or the Rados Gateway (RADOS GW) is configured.

The following table lists the pools created by the StarlingX Platform Integration, OpenStack, and Rados Gateway applications.

Table 1. List of Pools

| Service/Application | Pool Name | Role | PG Count | Created |
|---------------------|-----------|------|----------|---------|
| Platform Integration Application | kube-rbd | Kubernetes RBD provisioned PVCs | 64 | When the platform automatically uploads/applies the application after the Ceph cluster is provisioned |
| OpenStack | images | glance image file storage; used for VM boot disk images | 256 | When the user applies the application for the first time |
| OpenStack | ephemeral | ephemeral object storage; used for VM ephemeral disks | 256 | When the user applies the application for the first time |
| OpenStack | cinder-volumes | persistent block storage; used for VM boot disk volumes; used as additional disk volumes for VMs booted from images; snapshots and persistent backups for volumes | 512 | When the user applies the application for the first time |
| OpenStack | cinder.backups | backup of cinder volumes | 256 | When the user applies the application for the first time |
| Rados Gateway | rgw.root | Ceph Object Gateway data | 64 | When the user enables the RADOS GW through the system service-parameter CLI |
| Rados Gateway | default.rgw.control | Ceph Object Gateway control | 64 | When the user enables the RADOS GW through the system service-parameter CLI |
| Rados Gateway | default.rgw.meta | Ceph Object Gateway metadata | 64 | When the user enables the RADOS GW through the system service-parameter CLI |
| Rados Gateway | default.rgw.log | Ceph Object Gateway log | 64 | When the user enables the RADOS GW through the system service-parameter CLI |

Note

Because the number of PGs per OSD has to be less than 2048, the default PG values are calculated based on a setup with one storage replication group and up to five OSDs per node.

Recommendations

For more information on how placement group numbers (pg_num) can be set based on the number of OSDs in the cluster, see the Ceph PGs per pool calculator: https://old.ceph.com/pgcalc/.

Collect the current pool information (replicated size, number of OSDs in the cluster), enter it into the calculator, and calculate the placement group numbers (pg_num) required based on the pg_calc algorithm, the estimated OSD growth, and the data percentage of each pool, so that Ceph remains balanced as the number of OSDs scales.
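
As an illustration only, the pgcalc-style estimate can be sketched in shell; the target of 100 PGs per OSD, the OSD count, the replication size, and the data percentage below are assumed values for this example, not StarlingX defaults:

# Sketch of a pgcalc-style estimate for a single pool (assumed inputs).
TARGET_PGS_PER_OSD=100   # common pgcalc target; adjust to your growth estimate
OSD_COUNT=10             # number of OSDs in the cluster (assumed)
REPLICA_SIZE=2           # replicated size of the pool (assumed)
DATA_PERCENT=40          # expected share of cluster data in this pool (assumed)
RAW=$(( TARGET_PGS_PER_OSD * OSD_COUNT * DATA_PERCENT / 100 / REPLICA_SIZE ))
# Round up to the next power of two, as the calculator does.
PG_NUM=1
while [ "${PG_NUM}" -lt "${RAW}" ]; do PG_NUM=$(( PG_NUM * 2 )); done
echo "${PG_NUM}"         # prints 256 for these inputs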

When balancing placement groups for each individual pool, consider the following:

  • pgs per osd

  • pgs per pool

  • pools per osd

  • replication

  • the crush map (Ceph OSD tree)
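
Each of these factors can be inspected with standard Ceph commands, for example (cinder-volumes is used here only as an illustration):

~(keystone_admin)$ ceph osd tree                              # CRUSH map and OSD layout
~(keystone_admin)$ ceph df                                    # per-pool data distribution
~(keystone_admin)$ ceph osd pool get cinder-volumes size      # replication size of a pool
~(keystone_admin)$ ceph osd pool get cinder-volumes pg_num    # PGs in a pool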

When placement groups are not dimensioned correctly, running the ceph -s command displays one of the following HEALTH_WARN messages:

  • too few pgs per osd

  • too few pgs per pool

  • too many pgs per osd

Each of the health warning messages requires manual adjustment of placement groups for individual pools:

  • To list all the pools in the cluster, use the following command, ceph osd lspools.

  • To list all the pools with their pg_num values, use the following command, ceph osd dump.

  • To get only the pg_num / pgp_num value of a pool, use the following command: ceph osd pool get <pool-name> pg_num.
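
For example, the pg_num of every pool can be printed in one pass; this is a sketch that assumes ceph osd pool ls prints one pool name per line:

for pool in $(ceph osd pool ls); do
    # Prints, for example, "kube-rbd: pg_num: 64" for each pool.
    echo -n "${pool}: "
    ceph osd pool get "${pool}" pg_num
done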

Too few PGs per OSD

This warning typically occurs when a new disk is added to the cluster. For more information on how to add a disk as an OSD, see StarlingX Storage Configuration and Management: Provisioning Storage on a Storage Host Using the CLI.
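
Before adjusting anything, the current ratio can be checked; ceph osd df reports the number of PGs hosted by each OSD (the PGS column in upstream Ceph output):

~(keystone_admin)$ ceph -s        # shows the HEALTH_WARN detail
~(keystone_admin)$ ceph osd df    # per-OSD usage, including the PGS column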

To fix this warning, increase the number of placement groups using the following commands:

~(keystone_admin)$ ceph osd pool set <pool-name> pg_num <new_pg_num>
~(keystone_admin)$ ceph osd pool set <pool-name> pgp_num <new_pg_num>

Note

Increasing the pg_num of a pool has to be done in increments of 64 per OSD; otherwise, the above commands are rejected. If this happens, decrease the pg_num value, retry, and wait for the cluster to be HEALTH_OK before proceeding to the next step. Multiple incremental steps may be required to achieve the targeted values.
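
For example, the following sketch grows the pg_num of the cinder-volumes pool from 512 to 1024 in four steps, waiting for the cluster to return to HEALTH_OK between steps; the step size and target are illustrative and must respect the 64/OSD increment rule for your OSD count:

for pgs in 640 768 896 1024; do
    ceph osd pool set cinder-volumes pg_num  "${pgs}"
    ceph osd pool set cinder-volumes pgp_num "${pgs}"
    # Wait for PG creation and rebalancing to finish before the next step.
    while ! ceph health | grep -q HEALTH_OK; do sleep 30; done
done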

Too few PGs per Pool

This indicates that the pool has many more objects per PG than average (too few PGs allocated). This warning is addressed by increasing the pg_num of that pool, using the following commands:

~(keystone_admin)$ ceph osd pool set <pool-name> pg_num <new_pg_num>
~(keystone_admin)$ ceph osd pool set <pool-name> pgp_num <new_pg_num>

Note

pgp_num should be equal to pg_num.

Otherwise, Ceph will issue a warning:

~(keystone_admin)$ ceph -s
  cluster:
    id:     92bfd149-37c2-43aa-8651-eec2b3e36c17
    health: HEALTH_WARN
            1 pools have pg_num > pgp_num
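
To clear this particular warning, set the pool's pgp_num equal to its current pg_num, for example (the awk parsing assumes ceph osd pool get prints output in the form pg_num: <value>):

~(keystone_admin)$ PGS=$(ceph osd pool get <pool-name> pg_num | awk '{print $2}')
~(keystone_admin)$ ceph osd pool set <pool-name> pgp_num "${PGS}"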

Too many PGs per OSD

This warning indicates that the maximum number of 300 PGs per OSD is exceeded. The number of PGs cannot be reduced after the pool is created. Pools that do not contain any data can safely be deleted and then recreated with a lower number of PGs. Where pools already contain data, the only solution is to add OSDs to the cluster so that the ratio of PGs per OSD becomes lower.

Caution

Pools have to be recreated with exactly the same properties as the original.

To get these properties, use ceph osd dump, or use the following commands:

~(keystone_admin)$ ceph osd pool get cinder-volumes crush_rule
crush_rule: storage_tier_ruleset
~(keystone_admin)$ ceph osd pool get cinder-volumes pg_num
pg_num: 512
~(keystone_admin)$ ceph osd pool get cinder-volumes pgp_num
pgp_num: 512

Before you delete a pool, record the following properties so you can recreate it: pg_num, pgp_num, and crush_rule.

To delete a pool, use the following command:

~(keystone_admin)$ ceph osd pool delete <pool-name> <pool-name> --yes-i-really-really-mean-it

To create a pool, use the parameters from ceph osd dump, and run the following command:

~(keystone_admin)$ ceph osd pool create <pool-name> <pg-num> <pgp-num> replicated <crush-ruleset-name>
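
Putting these steps together, the following sketch recreates an empty pool with the same properties; it is illustrative only, assumes the pool holds no data, and deleting a pool may also require the monitor option mon_allow_pool_delete to be enabled:

~(keystone_admin)$ POOL=<pool-name>
~(keystone_admin)$ RULE=$(ceph osd pool get ${POOL} crush_rule | awk '{print $2}')
~(keystone_admin)$ PGS=$(ceph osd pool get ${POOL} pg_num | awk '{print $2}')
~(keystone_admin)$ PGPS=$(ceph osd pool get ${POOL} pgp_num | awk '{print $2}')
~(keystone_admin)$ ceph osd pool delete ${POOL} ${POOL} --yes-i-really-really-mean-it
~(keystone_admin)$ ceph osd pool create ${POOL} ${PGS} ${PGPS} replicated ${RULE}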