800 Series Alarm Messages
Alarm Severities
One or more of the following severity levels are associated with each alarm.
Critical
Indicates that a platform service-affecting condition has occurred and immediate corrective action is required. (A mandatory platform service has become totally out of service and its capability must be restored.)
Major
Indicates that a platform service-affecting condition has developed and urgent corrective action is required. (A mandatory platform service has developed a severe degradation and its full capability must be restored.)
- or -
An optional platform service has become totally out of service and its capability should be restored.
Minor
Indicates that a non-service-affecting platform fault condition has developed and corrective action should be taken to prevent a more serious fault. (The fault condition is not currently impacting or degrading the capability of the platform service.)
Warning
Indicates the detection of a potential or impending service-affecting fault. Action should be taken to further diagnose and correct the problem to prevent it from becoming a more serious service-affecting fault.
Alarm ID: 800.001
Possible data loss. Any mds, mon, or osd is unavailable in the storage replication group.
Entity Instance: cluster=<dist-fs-uuid>
Degrade Affecting Severity: none
Severity: critical, major
Proposed Repair Action: Manually restart the Ceph processes and check the state of the Ceph cluster with ‘ceph -s’. If the problem persists, contact the next level of support.
Management Affecting Severity: warning
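For reference, the check from the repair action might look like the following on a host with Ceph client access. The restart command is only a sketch assuming a systemd-managed Ceph deployment (ceph-mon@<hostname> is a standard Ceph unit name, but StarlingX may manage these processes differently):

  ceph -s                                      # overall cluster health, monitor quorum, OSD counts
  ceph health detail                           # identifies which mds, mon, or osd is unavailable
  sudo systemctl restart ceph-mon@<hostname>   # hypothetical example: restart one failed monitor
  ceph -s                                      # confirm the cluster returns to HEALTH_OK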
Alarm ID: 800.010
Potential data loss. No available OSDs in the storage replication group.
Entity Instance: cluster=<dist-fs-uuid>.peergroup=<group-x>
Degrade Affecting Severity: none
Severity: critical
Proposed Repair Action: Ensure that the storage hosts from the replication group are unlocked and available. Check the replication group state with ‘system host-list’. Check that the OSDs of each storage host are up and running. Manually restart the Ceph processes and check the state of the Ceph OSDs with ‘ceph osd stat’ or ‘ceph osd tree’. If the problem persists, contact the next level of support.
Management Affecting Severity: warning
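As an illustration of the diagnostic sequence above (all three commands appear in the repair action; the expected-output notes in the comments are assumptions about a typical deployment):

  system host-list    # storage hosts should show unlocked / enabled / available
  ceph osd stat       # e.g. ‘x osds: y up, z in’; y < x means some OSDs are down
  ceph osd tree       # per-host layout; unavailable OSDs are flagged ‘down’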
Alarm ID: 800.011
Loss of replication in peergroup.
Entity Instance: cluster=<dist-fs-uuid>.peergroup=<group-x>
Degrade Affecting Severity: none
Severity: major
Proposed Repair Action: Ensure that the storage hosts from the replication group are unlocked and available. Check the replication group state with ‘system host-list’. Check that the OSDs of each storage host are up and running. Manually restart the Ceph processes and check the state of the Ceph OSDs with ‘ceph osd stat’ and/or ‘ceph osd tree’. If the problem persists, contact the next level of support.
Management Affecting Severity: warning
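Because this alarm indicates lost redundancy rather than lost data, the degraded peer group might be confirmed as sketched below (‘ceph health detail’ is a standard Ceph command, though it is not named in the repair action above):

  ceph health detail   # degraded or undersized placement groups indicate missing replicas
  ceph osd tree        # shows which host’s OSDs in peergroup <group-x> are down
  ceph osd stat        # quick up/in counts for the whole cluster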
Alarm ID: 800.104
Storage Alarm Condition: <storage-backend-name> configuration failed to apply on host: <host-uuid>.
Entity Instance: storage_backend=<storage-backend-name>
Degrade Affecting Severity: none
Severity: critical
Proposed Repair Action: Update the backend settings to reapply the configuration. Use the following commands to try again: ‘system storage-backend-delete <storage-backend-name>’ and then ‘system storage-backend-add <storage-backend-name>’. See the StarlingX documentation at https://docs.starlingx.io/ for more details. If the problem persists, contact the next level of support.
Management Affecting Severity: major
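A minimal sketch of the reapply sequence, assuming the backend is in a failed state that permits deletion (the delete and add commands are taken from the repair action above; ‘system storage-backend-list’ and the ‘configured’ state are assumptions used here for verification):

  system storage-backend-list                            # check the backend state before retrying
  system storage-backend-delete <storage-backend-name>   # remove the failed configuration
  system storage-backend-add <storage-backend-name>      # re-add the backend to trigger a reapply
  system storage-backend-list                            # verify the backend reaches ‘configured’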