SNMPv3 Support

Storyboard: https://storyboard.openstack.org/#!/story/2008132

This story upgrades Net-SNMP to version 5.8 in the StarlingX solution in order to support SNMP v2c and v3, and provides a containerized Net-SNMP solution.

Problem description

Users want the ability to manage the StarlingX solution with SNMP v2c and v3. StarlingX currently does not support SNMPv3. The infrastructure management shall include the following requirements:

Net-SNMP’s features cover all of the mentioned requirements. Net-SNMP is an open source project. More information is available at http://www.Net-SNMP.org/docs/readmefiles.html.

In addition to providing SNMPv3 support, this story will also containerize the StarlingX SNMP solution. This is consistent with the long-term direction of StarlingX, which is to containerize more of the StarlingX flock components.

Use Cases

  • End user wants to monitor StarlingX infrastructure’s Alarms and Logs via SNMP v2c and/or v3 from their SNMP Manager.

  • End user wants to use SNMP v2c and/or v3 GET/GETNEXT to get the contents of the ActiveAlarmTable and the EventLogTable in the wrsAlarmMib.

  • End user wants to receive SNMP v2c and/or v3 traps defined in the wrsAlarmMib.
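
The GET/GETNEXT use case relies on the lexicographic ordering of OIDs: GET returns the value at an exact OID, while GETNEXT returns the first OID after the one supplied, which is how a manager steps through table rows. The following is a minimal Python sketch of that behaviour over an in-memory table. The OIDs and values are placeholders chosen for illustration, not the real wrsAlarmMib definitions, and the helper names are hypothetical.

```python
from bisect import bisect_right

# Placeholder OIDs and values, NOT the real wrsAlarmMib layout.
SAMPLE_MIB = {
    (1, 3, 6, 1, 4, 1, 99999, 1, 1, 1): "alarm row 1",
    (1, 3, 6, 1, 4, 1, 99999, 1, 1, 2): "alarm row 2",
    (1, 3, 6, 1, 4, 1, 99999, 2, 1, 1): "event row 1",
}


def snmp_get(mib, oid):
    """Return the value stored at exactly this OID, or None."""
    return mib.get(tuple(oid))


def snmp_getnext(mib, oid):
    """Return (next_oid, value) for the first OID after `oid`, or None."""
    ordered = sorted(mib)  # tuple comparison matches OID ordering
    pos = bisect_right(ordered, tuple(oid))
    if pos == len(ordered):
        return None  # end of the MIB view
    return ordered[pos], mib[ordered[pos]]


def snmp_walk(mib, root):
    """Collect every (oid, value) under `root` by repeated GETNEXT."""
    root = tuple(root)
    rows, current = [], root
    while True:
        step = snmp_getnext(mib, current)
        if step is None or step[0][:len(root)] != root:
            return rows
        rows.append(step)
        current = step[0]


if __name__ == "__main__":
    alarm_table = (1, 3, 6, 1, 4, 1, 99999, 1)
    for oid, value in snmp_walk(SAMPLE_MIB, alarm_table):
        print(".".join(map(str, oid)), "=", value)
```

A walk stops as soon as GETNEXT returns an OID outside the requested subtree, so walking the placeholder alarm subtree never returns the event row.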

Proposed change

SNMP integration

The StarlingX platform currently supports SNMP v2c in a non-containerized solution, running on the host of the controller/master nodes. It uses the dynamic-loading snmpd plugin approach to bind the host-based FM get methods to the appropriate nodes of the OID tree in the host-based Net-SNMP process. It uses the snmptrap CLI, invoked from the host-based FM alarm/log collection code, to generate SNMP Traps. Finally, it uses StarlingX REST APIs / CLIs to configure V2C Communities and V2 Trap Destinations.

The StarlingX SNMP solution will change to use Net-SNMP’s extended MasterAgent/SubAgent integration. This accommodates Net-SNMP being containerized while the FM application, which supports the wrsAlarmMib, is either host-based (current) or containerized (future). Specifically, Net-SNMP will run in a container as the MasterAgent, and a containerized FM-SubAgent will be implemented to interact with the host-based FM application’s postgres DB tables. The (containerized) FM-SubAgent will internally use the existing cgtsAgentplugin logic (through fmcommon.so) to bind the existing host-based FM query methods to the appropriate local OID trees (alarms and events) within the SubAgent code, and will trigger the SubAgent to register those OID subtrees with the Net-SNMP MasterAgent.
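
The registration step can be pictured as a routing table: each SubAgent registers the OID subtrees it serves, and the MasterAgent hands an incoming request to the registration with the longest matching prefix. The Python sketch below models only that routing decision. In Net-SNMP this exchange is typically carried over the AgentX protocol, which is not modelled here; the class name, handler functions and OIDs are all hypothetical.

```python
class MasterAgentRouter:
    """Toy model of subtree registration and request dispatch."""

    def __init__(self):
        self._registrations = {}  # subtree (tuple of ints) -> handler

    def register(self, subtree, handler):
        """Called on behalf of a SubAgent to claim an OID subtree."""
        self._registrations[tuple(subtree)] = handler

    def dispatch_get(self, oid):
        """Route a GET to the most specific registered subtree, if any."""
        oid = tuple(oid)
        matches = [s for s in self._registrations if oid[:len(s)] == s]
        if not matches:
            return None  # no SubAgent serves this OID
        best = max(matches, key=len)  # longest prefix wins
        return self._registrations[best](oid)


# Hypothetical FM-SubAgent handlers standing in for the FM query methods.
def query_active_alarms(oid):
    return f"alarm data for {oid}"


def query_event_log(oid):
    return f"event data for {oid}"


if __name__ == "__main__":
    router = MasterAgentRouter()
    # Placeholder subtrees, not the real wrsAlarmMib OIDs.
    router.register((1, 3, 6, 1, 4, 1, 99999, 1), query_active_alarms)
    router.register((1, 3, 6, 1, 4, 1, 99999, 2), query_event_log)
    print(router.dispatch_get((1, 3, 6, 1, 4, 1, 99999, 2, 1, 1)))
```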

A containerized FM-Trap-SubAgent will be implemented to interact with the host-based FM application’s log handling and the Net-SNMP MasterAgent. Specifically, the host-based FM-Mgr trap handling code will forward the alarm/log data to the FM-Trap-SubAgent (if configured), and the FM-Trap-SubAgent will leverage Net-SNMP subagent APIs for generating traps and sending to the Net-SNMP MasterAgent for distribution to the configured trap destinations.
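
The trap path described above has three hops: FM-Mgr forwards alarm/log data only when a trap SubAgent is configured, the SubAgent turns it into a trap, and the MasterAgent distributes the trap to every configured destination. The following Python sketch models those hops with in-memory stubs. The class names, the alarm fields and the trap structure are hypothetical, and no real Net-SNMP API is called.

```python
from dataclasses import dataclass, field


@dataclass
class MasterAgentStub:
    """Stands in for the Net-SNMP MasterAgent's trap distribution."""
    destinations: list
    sent: list = field(default_factory=list)

    def distribute(self, trap):
        # One copy of the trap per configured trap destination.
        for dest in self.destinations:
            self.sent.append((dest, trap))


@dataclass
class TrapSubAgentStub:
    """Stands in for the FM-Trap-SubAgent."""
    master: MasterAgentStub

    def handle_alarm(self, alarm):
        # Hypothetical trap layout: the alarm fields become varbinds.
        trap = {"kind": alarm["state"], "varbinds": dict(alarm)}
        self.master.distribute(trap)


def forward_from_fm_mgr(alarm, trap_subagent=None):
    """FM-Mgr side: forward only if a trap SubAgent is configured."""
    if trap_subagent is None:
        return False
    trap_subagent.handle_alarm(alarm)
    return True


if __name__ == "__main__":
    master = MasterAgentStub(destinations=["10.10.10.1", "10.10.10.2"])
    subagent = TrapSubAgentStub(master)
    forward_from_fm_mgr({"alarm_id": "100.101", "state": "set"}, subagent)
    print(len(master.sent), "traps queued")
```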

V2C Communities, V3 users and Trap Destinations will be configured through override values in the Net-SNMP helm chart, which will be part of the new Net-SNMP system application. The existing StarlingX REST APIs / CLIs for SNMP configuration will be removed.
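
As an illustration of how override values could translate into agent configuration, the Python sketch below renders snmpd.conf lines from a dictionary of overrides. The directives used (rocommunity, createUser, rouser, trap2sink) are standard snmpd.conf directives, but the override key names, the sample values and the function name are assumptions; this spec does not define the chart's actual override schema.

```python
def render_snmpd_conf(overrides):
    """Render snmpd.conf text from a (hypothetical) overrides dict."""
    lines = []
    # v2c read-only communities.
    for community in overrides.get("communities", []):
        lines.append(f"rocommunity {community}")
    # v3 users: create the user, then grant read-only access that
    # requires both authentication and privacy ("priv").
    for user in overrides.get("v3_users", []):
        lines.append(
            "createUser {name} {auth_proto} {auth_pass} "
            "{priv_proto} {priv_pass}".format(**user)
        )
        lines.append(f"rouser {user['name']} priv")
    # v2c trap destinations.
    for dest in overrides.get("trap2sinks", []):
        lines.append(f"trap2sink {dest['host']} {dest['community']}")
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Sample values only; none of these come from the spec.
    example = {
        "communities": ["public"],
        "v3_users": [{
            "name": "monitor",
            "auth_proto": "SHA",
            "auth_pass": "example-auth-pass",
            "priv_proto": "AES",
            "priv_pass": "example-priv-pass",
        }],
        "trap2sinks": [{"host": "10.10.10.1", "community": "public"}],
    }
    print(render_snmpd_conf(example), end="")
```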

The Net-SNMP Helm chart will use a Kubernetes deployment and liveness/readiness probes. Net-SNMP does not support an active/active deployment; therefore, the Kubernetes deployment will be limited to a single replica and will rely on Kubernetes dead host detection and dead container detection (through liveness/readiness probes) to restart failed SNMP containers.
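
A rough picture of such a deployment is sketched below as a Python function that builds the manifest as a dictionary. The field names follow the Kubernetes apps/v1 Deployment API, but the resource names, labels, probe command and probe timings are placeholders; the real chart's templates may differ.

```python
import json


def build_snmp_deployment(image, probe_command):
    """Build a single-replica Deployment manifest with exec probes."""
    probe = {
        "exec": {"command": list(probe_command)},
        "periodSeconds": 10,    # placeholder timing
        "failureThreshold": 3,  # placeholder threshold
    }
    labels = {"app": "snmp"}  # placeholder label
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "snmp"},
        "spec": {
            "replicas": 1,  # Net-SNMP is not run active/active
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "snmp-master-agent",
                        "image": image,
                        "livenessProbe": dict(probe),
                        "readinessProbe": dict(probe),
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Image name and probe command are illustrative assumptions.
    manifest = build_snmp_deployment(
        "example.org/snmp-master-agent:latest",
        ["/bin/sh", "-c", "pgrep snmpd"],
    )
    print(json.dumps(manifest, indent=2))
```

An exec-style probe is used in the sketch because the SNMP agent listens on UDP, which a TCP socket probe cannot check; the actual probe type is a design choice this spec leaves open.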

For networking, the nginx-ingress-controller in the platform will be used to direct ingress traffic from UDP port 161 to the internal Net-SNMP ClusterIP kubernetes service.
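
In the upstream ingress-nginx controller, UDP forwarding is expressed as a map entry of the form "external port" to "namespace/service:port". The Python sketch below builds such an entry for port 161. The namespace and service names are placeholders, and how the StarlingX platform ingress controller is actually configured is not specified in this document.

```python
def udp_service_entry(external_port, namespace, service, service_port):
    """Return a one-entry mapping in ingress-nginx udp-services form."""
    for port in (external_port, service_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    # Key: externally exposed UDP port; value: target ClusterIP service.
    return {str(external_port): f"{namespace}/{service}:{service_port}"}


if __name__ == "__main__":
    # "kube-system" and "snmpd-service" are placeholder names.
    print(udp_service_entry(161, "kube-system", "snmpd-service", 161))
```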

For Distributed Cloud configuration, the syncing of SNMP trap destination and community configuration across subclouds will be removed. Each subcloud will need to be configured for SNMP independently, through the SNMP Helm chart / Armada application.

Packaging & installation

A new optional ‘SNMP’ system application (Armada manifest and Helm chart) will be developed. This will include:

  • The building of a Net-SNMP MasterAgent container image within StarlingX, delivered in the Docker Hub StarlingX repo,

  • The building of an FM-SubAgent container image (for handling SNMP GETs, etc.) within StarlingX, delivered in the Docker Hub StarlingX repo,

  • The building of an FM-Trap-SubAgent container image (for handling SNMP Traps) within StarlingX, delivered in the Docker Hub StarlingX repo,

  • An Armada manifest containing a reference to a single Helm chart for Net-SNMP MasterAgent container, FM-SubAgent container and the FM-Trap-SubAgent container, and

  • A helm chart for the Net-SNMP MasterAgent container, FM-SubAgent container and the FM-Trap-SubAgent container.

The Net-SNMP Armada application tarball will be packaged as an RPM in the StarlingX ISO such that the application tarball is installed (but not uploaded or applied) as part of the StarlingX install.

Alternatives

The existing Net-SNMP integration in StarlingX could have been extended to support SNMPv3, by adding new V3 Users and V3 Trap Destinations to the StarlingX REST APIs / CLIs. However, given the long-term direction for StarlingX to containerize its flock components and given that the SNMP solution is relatively isolated, it was decided to containerize the SNMP solution and leverage Helm for deployment and configuration of Net-SNMP.

For High Availability, to improve switchover times on failure, we may look at leveraging Kubernetes leader election to run Net-SNMP active/standby within a deployment of 2 replicas.

There are other commercial and open source alternatives to Net-SNMP. However, Net-SNMP is the SNMP tool installed in the current StarlingX implementation. It is a mature open source project with more than 20 years in the market and many releases, and it has been integrated with StarlingX successfully. Net-SNMP also has an active user and developer community.

Data model impact

The existing StarlingX Data Model of SNMP configuration will be removed, specifically the postgres DB tables and sysinv CLI / REST APIs for the SNMP V2C Community table and the SNMP V2C Trap Destination table. SNMP configuration will now be done through Helm Chart overrides of the Net-SNMP system application.

Since SNMP support is already provided by Net-SNMP 5.7.2 in StarlingX, there are no changes to the internal Net-SNMP data model. The changes focus on containerizing Net-SNMP 5.8 inside the StarlingX solution. Additionally, since SNMP support will be provided by this new optional Armada application, SNMP will not be active in a fresh install until the application is uploaded and applied.

REST API impact

The following REST APIs for configuring SNMP will be removed:

SNMP Configuration will now be done through Helm Chart override of the Net-SNMP system application.

Security impact

Support for SNMPv3 provides improved security over the current SNMPv2C support. SNMPv3 provides both secure user/password authentication and encryption of SNMP PDUs. SNMPv2C provides only a clear text password/community-string check and no encryption.

Net-SNMP is already in use in the StarlingX solution. The changes to upgrade the Net-SNMP version and add SNMP v3 support do not impact security, since they do not expose a new API for configuration or usage.

Other end user impact

Ability to optionally use SNMPv3 instead of SNMPv2 for monitoring StarlingX Alarms and Logs.

Performance Impact

The solution containerizes Net-SNMP and modifies the trap-sending code to support SNMP v3 traps in addition to v2c traps; no impact on performance is expected.

Other deployer impact

Configuration of SNMP will be done through Helm Chart overrides as opposed to StarlingX REST APIs / CLIs.

Developer impact

This may impact the work currently being done to containerize portions of FM code. This work is covered by a different Storyboard Story and has yet to be merged.

Upgrade impact

The SNMP solution does not cover the upgrade scenario from STX 4.0 (old StarlingX implementation) to STX 5.0 (new StarlingX implementation). The rationale is that SNMP is not a system-critical service and the amount of SNMP configuration that would need to be re-entered is extremely small.

The resulting behaviour for a software upgrade from STX 4.0 to STX 5.0 will be that any existing SNMP configuration from the STX 4.0 deployment will be lost. After finishing the software upgrade to STX 5.0, the new SNMP Armada application will need to be installed and the old SNMP configuration re-entered as Helm overrides for this new SNMP Armada application.

Software upgrades from STX 5.0 to future releases will be supported with no configuration loss.

Implementation

Assignee(s)

Primary assignee:

  • Gustavo Dobro (PL)

  • Jose Infanzon (TL)

Repos Impacted

  • Net-SNMP-armada-app (new repo)

  • config

  • config-files

  • distcloud

  • fault

Work Items

  • Create new repo for the new application ‘SNMP’,

  • Create SNMP helm chart, containing Net-SNMP MasterAgent container, FM-SubAgent container and the FM-Trap-SubAgent container,

  • With helm chart override values for configuring Net-SNMP and adding additional mibs,

  • Define required armada manifest,

  • Build new SNMP armada tarball and package in RPM,

  • Build and deliver Net-SNMP MasterAgent container image,

  • Implement system override plugin for the SNMP armada application in order to determine FM DB connection values from current system configuration and pass those details to the Net-SNMP MasterAgent container through a helm chart override,

  • Only required depending on # of replicas supported,

  • Remove existing StarlingX REST API and CLI commands related to SNMP configuration,

  • Implement FM SubAgent container image and support for SNMP GET/GETNEXT,

  • Implement FM-Trap-SubAgent container image for generating traps within the context of a SubAgent,

  • Implement changes to the host-based FM-Mgr’s handling of asynchronously generated alarms/logs to send alarm/log data to the FM-Trap-SubAgent, if configured,

  • Remove existing host-based Net-SNMP implementation,

  • Update existing documentation.

Dependencies

None

Testing

  • SNMP pods should return to a ready state after being restarted as indicated by ‘kubectl get pods’.

  • User overrides should be available for various parameters including SNMP configuration.

  • Users should be able to perform SNMP GET/BULK/WALK operations with SNMP v2c and v3.

  • Configure SNMP trap destination and check if SNMP v2c and v3 traps are sent.

  • Validate that coldStart and warmStart traps are being sent.

  • Validate all existing StarlingX REST API / CLI commands related to SNMP are removed and documentation is updated.

  • Validate documentation on configuring SNMP.

  • Verify that on StarlingX Install, the new SNMP application is installed but NOT uploaded and NOT applied,

  • Verify system behaviour (e.g. log/alarm handling) with SNMP application NOT applied,

  • Verify system behaviour with SNMP application applied, and v2c communities and V3 users and trap destinations defined,

  • Verify system behaviour after removing SNMP application.

  • Test system behaviour when incorrect snmpd.conf data is specified in Helm chart overrides, and document the procedure for a user to verify that the SNMP application applied without error and, if an error occurred, how to find information about it.

  • Test on all system configurations (AIO-SX, AIO-DX, Standard and DC)

  • Test controller switchovers (failures and manual) on dual controller systems

  • Test Dead-Office-Recovery

  • Test that the upgrade from STX 4.0 to STX 5.0 removes the STX 4.0 SNMP configuration and that the SNMP Armada application can be installed and configured on STX 5.0 after the upgrade.
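
The pod readiness check in the first test item can be automated by inspecting the JSON form of the pod list, as produced by ‘kubectl get pods -o json’. The Python sketch below reports pods whose Ready condition is not True. It works on already-parsed JSON so it can be run without a cluster; the pod names in the sample are invented, and the function name is hypothetical.

```python
import json

# Invented sample in the shape of `kubectl get pods -o json` output.
SAMPLE_POD_LIST = json.loads("""
{
  "items": [
    {"metadata": {"name": "snmp-abc"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "snmp-def"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}}
  ]
}
""")


def not_ready_pods(pod_list):
    """Return the names of pods that do not report Ready=True."""
    pending = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            pending.append(pod["metadata"]["name"])
    return pending


if __name__ == "__main__":
    print("not ready:", not_ready_pods(SAMPLE_POD_LIST))
```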

Documentation Impact

Documentation is to be updated with the user override configuration parameters and the availability of SNMP v3 in StarlingX.

References

Feature storyboard: https://storyboard.openstack.org/#!/story/2008132

Net-SNMP: http://www.Net-SNMP.org/

History

Revisions

Release Name    Description

STX 5.0         Introduced