800 Series Alarm Messages

The system inventory and maintenance service reports system changes with different degrees of severity. Use the reported alarms to monitor the overall health of the system.

Alarm messages are numerically coded by the type of alarm.

For more information, see Fault Management Overview.
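
For example, the currently active alarms in this series can be listed with the fault management CLI. The following is a minimal sketch, assuming the fm client is run on the active controller with the platform credentials sourced in the usual way; the alarm ID and UUID shown are placeholders:

    # Load the platform admin credentials.
    source /etc/platform/openrc

    # List all active alarms, then narrow to a specific 800-series alarm ID.
    fm alarm-list
    fm alarm-list --query alarm_id=800.001

    # Show the full details (entity instance, severity, repair action) of one alarm.
    fm alarm-show <alarm-uuid>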

In the alarm description tables, the severity of the alarms is represented by one or more letters, as follows:

  • C: Critical

  • M: Major

  • m: Minor

  • W: Warning

A slash-separated list of letters is used when the alarm can be triggered with one of several severity levels.

An asterisk (*) indicates the management-affecting severity, if any. A management-affecting alarm is one that cannot be ignored at the indicated severity level or higher by using relaxed alarm rules during an orchestrated patch or upgrade operation.

Note

Degrade Affecting Severity: Critical indicates a node will be degraded if the alarm reaches a Critical level.

Alarm ID: 800.001

Storage Alarm Condition:

1 mons down, quorum 1,2 controller-1,storage-0

Entity Instance

cluster=<dist-fs-uuid>

Degrade Affecting Severity:

None

Severity:

C/M*

Proposed Repair Action

If the problem persists, contact the next level of support and provide the output of the following commands (a collection example follows the list):

  • ceph status

  • ceph fs status

  • system host-list

  • system cluster-list

  • system cluster-show <cluster-name>
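
A minimal collection sketch, assuming the commands are run on the active controller with the platform credentials sourced; the output file path is illustrative only:

    source /etc/platform/openrc

    # Capture the requested outputs into one file to attach to the support case.
    {
        ceph status
        ceph fs status
        system host-list
        system cluster-list
        system cluster-show <cluster-name>
    } > /tmp/alarm-800-001-support.txt 2>&1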


Alarm ID: 800.003

Storage Alarm Condition: Quota/Space mismatch for the <tiername> tier. The sum of Ceph pool quotas does not match the tier size.

Entity Instance

cluster=<dist-fs-uuid>.tier=<tiername>

Degrade Affecting Severity:

None

Severity:

m

Proposed Repair Action

Update the Ceph storage pool quotas to use all available tier space, as shown in the example after this list, and provide the output of the following commands:

  • ceph status

  • ceph fs status

  • system host-fs-list <hostname>

  • system controllerfs-list
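
The quota comparison and update can be done with standard Ceph commands. A minimal sketch, assuming a pool named kube-rbd on the affected tier; the pool name and quota value are illustrative only:

    # Compare pool usage and configured quotas against the tier size.
    ceph df
    ceph osd pool get-quota kube-rbd

    # Adjust the pool quota so that the sum of pool quotas matches the tier size.
    ceph osd pool set-quota kube-rbd max_bytes <size-in-bytes>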


Alarm ID: 800.010

Potential data loss. No available OSDs in storage replication group.

Entity Instance

cluster=<dist-fs-uuid>.peergroup=<group-x>

Degrade Affecting Severity:

None

Severity:

C*

Proposed Repair Action

Ensure that the storage hosts in the replication group are unlocked and available. Check that the OSDs of each storage host are up and running (see the example after this list). If the problem persists, contact the next level of support and provide the output of the following commands:

  • ceph status

  • ceph fs status

  • system host-list

  • system cluster-list

  • system cluster-show <cluster-name>
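
Checking that the OSDs of each storage host are up and running can be done with standard Ceph commands; a minimal sketch:

    # Confirm that the storage hosts are unlocked, enabled, and available.
    system host-list

    # Verify that every OSD in the replication group is up and in.
    ceph osd tree
    ceph osd stat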


Alarm ID: 800.011

Loss of replication in peergroup.

Entity Instance

cluster=<dist-fs-uuid>.peergroup=<group-x>

Degrade Affecting Severity:

None

Severity:

M*

Proposed Repair Action

Ensure that the storage hosts in the replication group are unlocked and available. Check that the OSDs of each storage host are up and running. If the problem persists, contact the next level of support and provide the output of the following commands:

  • ceph status

  • ceph fs status

  • system host-list

  • system cluster-list

  • system cluster-show <cluster-name>


Alarm ID: 800.102

Storage Alarm Condition:

PV configuration <error/failed to apply> on <hostname>. Reason: <detailed reason>.

Entity Instance

pv=<pv_uuid>

Degrade Affecting Severity:

None

Severity:

C/M*

Proposed Repair Action

Remove the failed PV and the associated storage device, then recreate them (see the example after this list), and provide the output of the following commands:

  • ceph status

  • ceph fs status

  • system helm-override-show platform-integ-apps rbd-provisioner kube-system

    AND/OR

  • system helm-override-show platform-integ-apps cephfs-provisioner kube-system
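
Locating and recreating the failed PV can be done with the system host storage commands. A minimal sketch; the hostname, volume group, and UUIDs are placeholders, and the exact argument form of the delete and add commands may vary by release (check system help host-pv-delete and system help host-pv-add):

    # Locate the failed physical volume and the disk it was created on.
    system host-pv-list <hostname>
    system host-disk-list <hostname>

    # Remove the failed PV, then recreate it on the same disk.
    system host-pv-delete <pv-uuid>
    system host-pv-add <hostname> <volume-group> <disk-uuid>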


Alarm ID: 800.103

Storage Alarm Condition:

[ Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded threshold and automatic extension failed.

Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded threshold ]; threshold x%, actual y%.

Entity Instance

<hostname>.lvmthinpool=<VG name>/<Pool name>

Degrade Affecting Severity:

None

Severity:

C*

Proposed Repair Action

Increase the storage space allotment for Cinder on the ‘lvm’ backend (a standard LVM inspection example follows the command list below). Consult the user documentation for more details. If the problem persists, contact the next level of support and provide the output of the following commands:

  • ceph status

  • ceph fs status

  • system host-fs-list <hostname>

  • system controllerfs-list
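
The current data and metadata usage of the thin pool can be inspected with standard LVM commands, and the metadata LV can be grown manually if automatic extension failed. This is a generic LVM sketch, not the platform's Cinder allotment procedure; the extension size is illustrative only:

    # Show data and metadata usage for the thin pool named in the alarm.
    lvs -a -o lv_name,vg_name,data_percent,metadata_percent <VG name>

    # Manually extend the thin pool metadata if automatic extension failed.
    lvextend --poolmetadatasize +1G <VG name>/<Pool name>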


Alarm ID: 800.104

Storage Alarm Condition:

<storage-backend-name> configuration failed to apply on host: <host-uuid>.

Degrade Affecting Severity:

None

Severity:

C*

Proposed Repair Action

Update the backend setting to reapply the configuration (see the example after this list). Consult the user documentation for more details. If the problem persists, contact the next level of support and provide the output of the following commands:

  • ceph status

  • ceph fs status

  • system storage-backend-list

  • system storage-backend-show <storage-backend name>
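
Triggering a re-apply of the backend configuration is done by updating a backend setting. A minimal sketch, assuming the platform credentials are sourced; the parameter shown is a placeholder for the setting that failed to apply:

    # Inspect the failing backend and its current settings.
    system storage-backend-list
    system storage-backend-show <storage-backend name>

    # Modify the backend to trigger the configuration to be reapplied.
    system storage-backend-modify <storage-backend name> <parameter>=<value>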