C-state Management Application on StarlingX

Storyboard: #2011105

The objective of this spec is to introduce the C-state Management Application to the StarlingX platform.

Problem description

StarlingX currently offers a comprehensive set of power management features. Users and applications can control the acceptable frequency range (minimum and maximum frequency) per core, the behavior of cores within that range (governor), which idle sleep states (C-states) a given core can enter, and how the system behaves under workloads with known intervals or demands. Kubernetes Power Manager controls these features on targeted CPUs/cores, allowing individualized configurations.

Containerized applications often require finer granularity, controlling the CPU idle states (C-states) of their cores at run time. The C-state Management Application offers a set of endpoints that enable pods to dynamically query and adjust their C-states. It therefore allows users to save energy through fine-grained control of the C-states of the cores assigned to their applications.

Use Cases

With the introduction of these new capabilities for C-state management, StarlingX end users and deployers gain enhanced control over the CPU core configurations. These new features are beneficial for optimizing power consumption and performance.

We identify the following potential impacts on StarlingX stakeholders from this dynamic C-state management integration:

  • End users: Pods can adjust the maximum C-state level of their assigned CPU cores through REST API requests, which adds flexibility without disrupting existing workflows. The feature integrates with applications already running on StarlingX.

  • Deployers: Dynamic C-state management may require minor adjustments, primarily ensuring that the assigned CPU cores are configured as application-isolated or are exclusively allocated to the pods. Deployers may also need to ensure that REST API requests for C-state adjustments originate from the same node where the application’s pods are deployed, which preserves security and efficiency.

  • Developers: Dynamic C-state management gives developers more granular control over CPU core configurations, allowing finer optimization of power usage and system performance.

Proposed change

The new C-state Management Application will be introduced to StarlingX, adding a REST API that lets pods dynamically control their C-states. When disabled, the application does not change StarlingX’s standard behavior. When enabled, Kubernetes pods can programmatically manage their C-states.

The C-state Management Application provides endpoints for the following functions:

  • Change the maximum C-state level of CPU cores.

    • The application, via its REST API, initiates a request to modify the maximum C-state level of the CPU cores allocated to its pods.

    • The assigned CPU cores must either be application-isolated or be exclusively assigned to the pods.

    • The request originates from the node on which the application’s pods are deployed.

  • Query the maximum available C-state levels.

    • The application, through its REST API, sends a request to inquire about the maximum C-state levels available for modification.

  • Query the maximum C-state configuration.

    • The application, utilizing its REST API, requests information regarding the configured maximum C-state from the node where its pods are currently deployed.
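The three operations above could be exercised from inside a pod roughly as follows. This is a minimal sketch only: the spec does not define the endpoint paths, port, or payload shape, so the base URL, the `/cstate/...` paths, and the JSON field names are hypothetical placeholders chosen for illustration.

```python
import json
import urllib.request

# Hypothetical base URL. The spec only says that requests originate from the
# node hosting the pod; the host, port, and paths below are assumptions.
BASE_URL = "http://localhost:8080"


def build_set_max_cstate(cpus, max_cstate):
    """Build (but do not send) a request to change the maximum C-state."""
    # Hypothetical payload: the field names are not defined by the spec.
    body = json.dumps({"cpus": cpus, "max_cstate": max_cstate}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/cstate/max",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_query_available():
    """Build a request asking which maximum C-state levels can be selected."""
    return urllib.request.Request(f"{BASE_URL}/cstate/available", method="GET")


def build_query_configured():
    """Build a request asking for the currently configured maximum C-state."""
    return urllib.request.Request(f"{BASE_URL}/cstate/max", method="GET")


def send(request, timeout=5.0):
    """Send a prepared request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from `send` keeps the sketch inspectable without a running service; a real client would follow the paths and payloads published in the application's usage guide.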

This specification also requires that the cloud platform be able to:

  • Process C-state level requests (change/query) and respond with either confirmation that the change occurred or the current maximum C-state level.

  • Process the maximum C-state level requests (change/query) on the platform cores; in other words, the API producer shall run on the platform cores.

  • Fulfill a request to change the maximum C-state within a granularity of seconds.
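On Linux, one mechanism an API producer could use to enforce a maximum C-state is the kernel's cpuidle sysfs interface, in which every idle state of every CPU has a `disable` file. The spec does not state which mechanism the application uses, so the following is an illustrative sketch under that assumption; the function name and the `sysfs_root` parameter are hypothetical, and writing to the real sysfs tree requires root privileges.

```python
from pathlib import Path

# Standard location of per-CPU entries in the Linux sysfs tree.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def limit_max_cstate(cpu, max_state_index, sysfs_root=SYSFS_CPU_ROOT):
    """Disable every idle state deeper than max_state_index on one CPU.

    max_state_index is the cpuidle state index (state0, state1, ...), which
    is not necessarily equal to the C-state number: state0 is often a
    polling state. Returns a mapping of state name to whether it stays
    enabled; the mapping is empty if the CPU exposes no cpuidle directory.
    """
    cpuidle_dir = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    state_dirs = sorted(
        cpuidle_dir.glob("state[0-9]*"),
        key=lambda path: int(path.name[len("state"):]),
    )
    result = {}
    for state_dir in state_dirs:
        index = int(state_dir.name[len("state"):])
        enabled = index <= max_state_index
        # Writing "1" disables the state for this CPU; "0" re-enables it.
        (state_dir / "disable").write_text("0" if enabled else "1")
        name = (state_dir / "name").read_text().strip()
        result[name] = enabled
    return result
```

Passing `sysfs_root` explicitly allows the logic to be exercised against a temporary directory tree instead of the live system.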

Alternatives

None

Data model impact

None

REST API impact

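None for existing StarlingX REST APIs. The new endpoints are provided by the C-state Management Application itself, as described in the Proposed change section.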

Security impact

None

Other end user impact

A new REST API will be available, which changes the procedure for dynamically managing C-states on StarlingX. Users should be aware that the C-state Management Application is not designed to work in tandem with Kubernetes Power Manager. We therefore recommend using only one of these applications at a time.

C-state availability might be conditional on the presence of a label such as power-management. The C-state Management Application can manage the available C-states independently of the applied labels.

Performance Impact

Given the nature of dynamic C-state management, the impacts on power consumption and latency depend on how the C-state Management Application is used. The following should be considered:

  • Power Consumption: By actively monitoring and controlling the C-states, applications can optimize power consumption based on workload demands, reducing the overall energy consumption in the cluster. On the other hand, an incorrect or inconsistent configuration might lead to performance degradation.

  • Latency: C-states range from C0 to Cn. C0 is the active state. All other C-states (C1 to Cn) are idle sleep states with different parts of the processor powered down. As C-states get deeper, the power savings become greater, but the exit latency (the time to transition back to C0) becomes longer. Depending on the configured maximum C-state, this can increase the time required to process varying workloads.
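The latency trade-off can be made concrete with a small selection routine that picks the deepest idle state whose exit latency still fits a workload's latency budget. The state names and latency figures below are illustrative values only, not measurements from any specific processor, and the function name is hypothetical.

```python
from typing import NamedTuple, Sequence


class CState(NamedTuple):
    name: str
    exit_latency_us: int  # time to return to C0, in microseconds


# Illustrative values only, ordered from shallowest to deepest.
# Real values vary by processor and are reported by the kernel.
EXAMPLE_STATES = (
    CState("C0", 0),
    CState("C1", 2),
    CState("C1E", 10),
    CState("C6", 130),
)


def deepest_state_within(states: Sequence[CState], budget_us: int) -> CState:
    """Return the deepest state whose exit latency fits the budget.

    Assumes states are ordered from shallowest to deepest, so the last
    state that fits is the one with the greatest power savings.
    """
    chosen = None
    for state in states:
        if state.exit_latency_us <= budget_us:
            chosen = state
    if chosen is None:
        raise ValueError(f"no state has exit latency <= {budget_us} us")
    return chosen
```

With these example figures, a 50 microsecond budget selects C1E, while a workload that tolerates no wake-up delay is held at C0.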

Other deployer impact

None

Developer impact

Please see the Use Cases section.

Upgrade impact

None

Implementation

Assignee(s)

Primary assignee:

  • Guilherme Batista Leite (guilhermebatista)

Other contributors:

  • Alyson Deives Pereira (adeivesp)

  • Eduardo Juliano Alberti (ealberti)

  • Fabio Studyny Higa (fstudyny)

  • Guilherme Henrique Pereira dos Santos (gsantos1)

  • Vinicius Fernando Rocha Lobo (vrochalo)

Repos Impacted

  • starlingx/docs

  • starlingx/config

  • starlingx/app-cstate-management (new)

Work Items

The following work items are expected to be carried out, with the understanding that the storyboard will be updated as more work items are found to be necessary.

Spikes and Design

  • Basic testing of per-CPU latency specification.

  • Review of the proposed design.

  • Evaluation of options to reduce latency and expected latency reduction.

Development Work Items

  • Merge the proof of concept into the StarlingX codebase.

  • Create FluxCD manifest for C-state DaemonSet.

  • Create StarlingX application to wrap the FluxCD manifest.

  • Enhance C-state application to support IPv6 addresses.

  • Enhance C-state application to prevent modification of CPUs allocated to other Pods.

  • Installation via system application.
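The work item on preventing modification of CPUs allocated to other pods implies an ownership check before any change is applied. A minimal sketch, assuming the caller's allowed cores are available in the Linux CPU list format (for example `2-3,8`); how the application actually obtains that set is not specified here, and both function names are hypothetical.

```python
def parse_cpu_list(cpu_list):
    """Parse a Linux CPU list string such as "2-3,8" into a set of ints."""
    cpus = set()
    for part in cpu_list.strip().split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input and stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid CPU range: {part!r}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def unauthorized_cpus(requested, allowed_cpu_list):
    """Return the requested CPUs that are outside the caller's allowed set.

    An empty result means the request only touches CPUs the caller owns.
    """
    return sorted(set(requested) - parse_cpu_list(allowed_cpu_list))
```

A request handler could reject any change for which `unauthorized_cpus` returns a non-empty list, so that a pod can never alter the C-states of cores it does not own.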

Customer Documentation

  • Publish a usage guide describing the available functionality and how to use it.

  • Sample code showing how to make use of the functionality.

Dependencies

None

Testing

System configuration

The tests will be conducted in the following system configurations:

  • AIO-SX

  • AIO-DX

  • Standard

Test Scenarios

  • Functional tests for C-state Management Application and its customizations.

  • Unit testing the impacted code areas.

  • Performance testing to identify and address any performance impacts.

  • Backup and restore tests.

Documentation Impact

End-user documentation must be created, including a guide to deploying, configuring, and customizing the C-state Management Application.

References

  1. Kubernetes Power Manager

History

Revisions

Release Name    Description
stx-10.0        Introduced