800 Series Alarm Messages

The system inventory and maintenance service reports system changes with different degrees of severity. Use the reported alarms to monitor the overall health of the system.

Alarm messages are numerically coded by the type of alarm.

For more information, see Fault Management Overview.

In the alarm description tables, the severity of the alarms is represented by one or more letters, as follows:

C: Critical

Indicates that a platform service-affecting condition has occurred and immediate corrective action is required. (A mandatory platform service has become totally out of service and its capability must be restored.)

M: Major

Indicates that a platform service-affecting condition has developed and urgent corrective action is required. (A mandatory platform service has developed a severe degradation and its full capability must be restored.)
- or -
An optional platform service has become totally out of service and its capability should be restored.

m: Minor

Indicates that a platform non-service-affecting fault condition has developed and corrective action should be taken to prevent a more serious fault. (The fault condition is not currently impacting or degrading the capability of the platform service.)

W: Warning

Indicates the detection of a potential or impending service-affecting fault. Action should be taken to further diagnose and correct the problem to prevent it from becoming a more serious service-affecting fault.

A slash-separated list of letters is used when the alarm can be triggered with one of several severity levels.

An asterisk (*) indicates the management-affecting severity, if any. A management-affecting alarm is one that cannot be ignored at the indicated severity level or higher by using relaxed alarm rules during an orchestrated patch or upgrade operation.
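
The severity notation described above (slash-separated letters, with an asterisk marking the management-affecting level) can be read mechanically. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the ordering W < m < M < C is an assumption taken from the order of the legend.

```python
# Severity letters from the legend above.
# Assumption: ordered here from least to most severe (W < m < M < C).
SEVERITY_ORDER = ["W", "m", "M", "C"]
SEVERITY_NAMES = {"C": "Critical", "M": "Major", "m": "Minor", "W": "Warning"}


def parse_severity_field(field):
    """Split a field such as 'C/M*' into its levels and the starred level.

    Returns (levels, management_affecting), where management_affecting is
    the letter carrying the asterisk, or None if no letter is starred.
    """
    levels = []
    management_affecting = None
    for token in field.split("/"):
        token = token.strip()
        letter = token.rstrip("*")
        if letter not in SEVERITY_NAMES:
            raise ValueError(f"unknown severity letter: {token!r}")
        levels.append(letter)
        if token.endswith("*"):
            management_affecting = letter
    return levels, management_affecting


def is_management_affecting(field, raised_level):
    """True if an alarm raised at raised_level is at or above the starred level."""
    _, starred = parse_severity_field(field)
    if starred is None:
        return False
    return SEVERITY_ORDER.index(raised_level) >= SEVERITY_ORDER.index(starred)
```

Under these assumptions, a field of C/M* means the alarm can be raised as Critical or Major and is management-affecting at Major or higher, while a field of m carries no management-affecting level at all.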

Note

Degrade Affecting Severity: Critical indicates a node will be degraded if the alarm reaches a Critical level.

Alarm ID: 800.001	Storage Alarm Condition: Possible data loss. Any mds, mon or osd is unavailable in storage replication group.
Entity Instance	cluster=<dist-fs-uuid>
Degrade Affecting Severity:	None
Severity:	C/M*
Proposed Repair Action	Check the state of the Ceph cluster with ceph -s. If problem persists, contact next level of support.

Alarm ID: 800.003	Storage Alarm Condition: Quota/Space mismatch for the <tiername> tier. The sum of Ceph pool quotas does not match the tier size.
Entity Instance	cluster=<dist-fs-uuid>.tier=<tiername>
Degrade Affecting Severity:	None
Severity:	m
Proposed Repair Action	Update Ceph storage pool quotas to use all available tier space.
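
The Entity Instance values in these tables are dot-separated components, most of them in key=value form. As an illustration only, the sketch below splits such a string into its parts. The function name is hypothetical, and it assumes that no value contains a dot, which may not hold for every hostname or tier name.

```python
def parse_entity_instance(entity):
    """Split 'cluster=<uuid>.tier=<name>' into an ordered list of (key, value).

    Components without '=' (such as a leading hostname) are returned
    with a key of None.
    Assumption: values never contain a '.' character.
    """
    parts = []
    for component in entity.split("."):
        key, sep, value = component.partition("=")
        if sep:
            parts.append((key, value))
        else:
            parts.append((None, component))
    return parts
```

For example, with made-up values, parse_entity_instance("cluster=4f5a.tier=storage") yields the pairs ("cluster", "4f5a") and ("tier", "storage").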

Alarm ID: 800.010	Potential data loss. No available OSDs in storage replication group.
Entity Instance	cluster=<dist-fs-uuid>.peergroup=<group-x>
Degrade Affecting Severity:	None
Severity:	C*
Proposed Repair Action	Ensure storage hosts from replication group are unlocked and available. Check replication group state with system host-list. Check if OSDs of each storage host are up and running. Check the state of the Ceph OSDs with ceph osd stat OR ceph osd tree. If problem persists, contact next level of support.

Alarm ID: 800.011	Loss of replication in peergroup.
Entity Instance	cluster=<dist-fs-uuid>.peergroup=<group-x>
Degrade Affecting Severity:	None
Severity:	M*
Proposed Repair Action	Ensure storage hosts from replication group are unlocked and available. Check replication group state with system host-list. Check if OSDs of each storage host are up and running. Check the state of the Ceph OSDs with ceph osd stat AND/OR ceph osd tree. If problem persists, contact next level of support.
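
The repair actions for alarms 800.001, 800.010 and 800.011 name the same few diagnostic commands. The sketch below simply runs them in sequence and collects their output for review. The wrapper function is hypothetical, only the command strings come from this page, and it assumes the commands are run on a host where the system and ceph CLIs are installed and any credentials they need are already loaded in the environment.

```python
import shlex
import shutil
import subprocess

# Commands named in the repair actions for alarms 800.001, 800.010 and 800.011.
DIAGNOSTIC_COMMANDS = [
    "system host-list",
    "ceph -s",
    "ceph osd stat",
    "ceph osd tree",
]


def run_diagnostics(commands=DIAGNOSTIC_COMMANDS, runner=subprocess.run):
    """Run each command and return {command: output or an explanatory note}."""
    results = {}
    for command in commands:
        argv = shlex.split(command)
        # Skip commands whose executable is not installed on this host.
        if shutil.which(argv[0]) is None:
            results[command] = f"skipped: {argv[0]} not found on PATH"
            continue
        completed = runner(argv, capture_output=True, text=True, timeout=60)
        # Keep stdout on success, stderr on failure.
        results[command] = (
            completed.stdout if completed.returncode == 0 else completed.stderr
        )
    return results
```

Calling run_diagnostics() returns a dictionary keyed by command string, which can be printed or attached to a support request if the problem persists.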

Alarm ID: 800.102	Storage Alarm Condition: PV configuration <error/failed to apply> on <hostname>. Reason: <detailed reason>.
Entity Instance	pv=<pv_uuid>
Degrade Affecting Severity:	None
Severity:	C/M*
Proposed Repair Action	Remove failed PV and associated Storage Device then recreate them.

Alarm ID: 800.103	Storage Alarm Condition: [ Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded threshold and automatic extension failed. | Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded threshold ]; threshold x%, actual y%.
Entity Instance	<hostname>.lvmthinpool=<VG name>/<Pool name>
Degrade Affecting Severity:	None
Severity:	C*
Proposed Repair Action	Increase Storage Space Allotment for Cinder on the ‘lvm’ backend. Try the following commands: vgextend <VG name> <PV name> or vgextend -L +<size extension> <PV name>. Check status with vgdisplay. Consult the System Administration Manual for more details. If problem persists, contact next level of support.

Alarm ID: 800.104	Storage Alarm Condition: <storage-backend-name> configuration failed to apply on host: <host-uuid>.
Degrade Affecting Severity:	None
Severity:	C*
Proposed Repair Action	Update backend setting to reapply configuration. Use the following commands to try again: system storage-backend-delete <storage-backend-name> AND system storage-backend-add <storage-backend-name>. Consult the user documentation for more details. If problem persists, contact next level of support.


StarlingX R6.0