Kubernetes Power Manager integration on StarlingX

Storyboard: #2010737

The objective of this spec is to introduce configurable power management in the StarlingX platform.

Problem description

Current versions of StarlingX do not offer a comprehensive set of power management features. Control over the maximum frequency that the processors can reach is limited. It is also generalized: it applies to all CPUs/cores alike and does not allow them to be controlled individually.

Users, however, have power management needs with greater scope and finer granularity, focused on containerized applications that use power profiles per core and/or per application. Among these needs are control of the acceptable frequency range (minimum and maximum frequency) per core, the behavior of the core within that range (governor), the power levels (C-states) a given core may enter, and the behavior of the system under workloads with known intervals or demands.

Kubernetes Power Manager is a Kubernetes Operator designed to expose and utilize the power control technologies present in some processors in a Kubernetes cluster. Its main applications are power control for workloads that run in known periods and power optimization even for high-performance workloads.

By controlling CPU performance states (P-states) and CPU idle states (C-states), the tool allows each core to be controlled individually according to the needs of each application’s workload. Because of this feature and its fit with the StarlingX proposal, this spec evaluates the applicability, requirements, and quality of operation of the Kubernetes Power Manager, as well as its integration into the StarlingX platform.

An overview of the Kubernetes Power Manager follows. The changes required in the StarlingX platform are described in Proposed change and Work Items.

Kubernetes Power Manager Components

  • Power Manager: controls the nodes and serves as the manager, or source of truth, gathering information on the power profiles applied to each node

    • Power Config Controller: part of the Power Manager, responsible for checking whether a default power configuration exists for the node and, when it does, starting the Power Node Agent

    • Power Config: describes the power configuration of a given node indicating one or more profiles that can be used

  • Power Node Agent: per-node pod managed by a DaemonSet, responsible for managing the power profiles applied. Communicates with Power Manager to establish power policies to be applied to the node

    • Power Profile: establishes CPU operating frequency ranges, Energy Performance Preference (EPP), and governor. A profile is generic: it only describes a possible style of power control, and applying it is the responsibility of a Power Workload. The pods’ deployment files, or Power Config files, can indicate the profiles they want to use by including the device power.intel.com/<POWERPROFILE>. Note also that all CPUs not assigned to a specific power profile are pooled in a profile known as “Shared”. This profile must be created manually by the user

    • Power Workload: responsible for applying a Power Profile. Its scope is set automatically when a pod requests a default power profile, or via a configuration file that describes the affected CPUs. Preset profiles have workloads created automatically. The Shared Power Profile and other custom profiles do not have an automatically assigned workload (one needs to be created manually)
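
The manual steps for the Shared profile can be sketched as follows. This is a minimal example, not a definitive configuration: the node name worker-0, the reserved cores 0 and 1, and the frequency values are illustrative assumptions, and the field names follow the upstream Kubernetes Power Manager examples, which may differ between releases.

    # Shared profile: applied to cores not assigned to another profile.
    apiVersion: power.intel.com/v1
    kind: PowerProfile
    metadata:
      name: shared
      namespace: intel-power
    spec:
      name: shared
      max: 1500          # MHz, illustrative value
      min: 1000          # MHz, illustrative value
      epp: power
      governor: powersave
    ---
    # Workload that applies the Shared profile on one node (assumed: worker-0).
    apiVersion: power.intel.com/v1
    kind: PowerWorkload
    metadata:
      name: shared-worker-0-workload
      namespace: intel-power
    spec:
      name: shared-worker-0-workload
      allCores: true
      reservedCPUs:      # assumed: the cores reserved for the system
      - 0
      - 1
      powerNodeSelector:
        kubernetes.io/hostname: worker-0
      powerProfile: shared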

Use Cases

After installing and enabling the Kubernetes Power Manager, the user will be able to indicate which power settings are assigned to the cores of a given application. The three examples below describe common usage scenarios. Further details can be found in the Kubernetes Power Manager documentation.

  • Example A: The user wants the application to have high performance. Fragment of pod_spec.yaml to be deployed:

    # (...)
    resources:
      requests:
        cpu: "4"
        memory: "1G"
        power.intel.com/performance: "4"
      limits:
        cpu: "4"
        memory: "1G"
        power.intel.com/performance: "4"
    # (...)
    
  • Example B: On a server with only two C-state levels available (C3 and C4), the user wants cores with the high-performance profile to be limited to the shallower level (C3), while idle cores are allowed to enter the deepest level (C4). Fragment of the c-state.yaml profile configuration file:

    apiVersion: power.intel.com/v1
    kind: CStates
    metadata:
      name: worker-0
    spec:
      # (...)
      sharedPoolCStates:
        C3: false
        C4: true
      exclusivePoolCStates:
        performance:
          C3: true
          C4: false
      # (...)
    
  • Example C: The user wants to create a profile specific to their needs. First step: deploy custom-profile.yaml. In this case the profile is named “one-profile”. The min, max, epp, and governor values can be set by the user.

    apiVersion: power.intel.com/v1
    kind: PowerProfile
    metadata:
      name: one-profile
      namespace: intel-power
    spec:
      name: one-profile
      max: 2200
      min: 2000
      epp: power
      governor: powersave
    

    Second step: deploy the pod_spec.yaml. Fragment to be deployed:

    # (...)
    resources:
      requests:
        cpu: "1"
        memory: "1G"
        power.intel.com/one-profile: "1"
      limits:
        cpu: "1"
        memory: "1G"
        power.intel.com/one-profile: "1"
    # (...)
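
For context, the fragments above could sit inside a complete Pod manifest such as the sketch below. The pod name, container name, and image are hypothetical placeholders. Requests and limits are kept identical, as in the fragments, so that the pod falls into the Guaranteed QoS class, and the number of power.intel.com/one-profile devices matches the number of CPUs.

    apiVersion: v1
    kind: Pod
    metadata:
      name: one-profile-demo                    # hypothetical pod name
    spec:
      containers:
      - name: app                               # hypothetical container name
        image: registry.example.com/app:latest  # hypothetical image
        resources:
          requests:
            cpu: "1"
            memory: "1G"
            power.intel.com/one-profile: "1"
          limits:
            cpu: "1"
            memory: "1G"
            power.intel.com/one-profile: "1"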
    

Proposed change

When disabled, the Kubernetes Power Manager will not change the standard StarlingX behavior (the system keeps running at maximum performance at all times). When enabled, the power management system will allow the user to apply power settings as needed, under the conditions described below.

The power management system is based on four standard power profiles plus any user-defined profiles. Whenever an application needs high performance, for example, the power profile “performance” must be declared in its deployment file. The power manager, in turn, will configure the profile on the CPU(s) assigned by Kubernetes to the Pod.

The default profiles “performance”, “balanced-performance”, “balanced-power” and “power” will be configured automatically during the installation process. It will be up to the user to create new profiles as needed. All cores that are idle or not assigned to a Pod will have their power profile set to the full frequency range (minimum equal to the minimum supported by the processor and maximum equal to the maximum supported), as is currently the case in the system without power control.

A default power level (C-state) will also be assigned to the cores. The user will be free to change the C-states individually, indicating which states a certain core can enter, or by group, indicating which states the cores of a certain power profile can enter.
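
Both forms of C-state control can be expressed in a single CStates object. The sketch below assumes a node named worker-0 on which C1 and C6 are available, and uses the individualCoreCStates field as it appears in the upstream Kubernetes Power Manager examples; the choice of core 3 is arbitrary.

    apiVersion: power.intel.com/v1
    kind: CStates
    metadata:
      name: worker-0               # assumed node name
    spec:
      # By group: cores running the "performance" profile stay out of C6.
      exclusivePoolCStates:
        performance:
          C1: true
          C6: false
      # Individually: core 3 (arbitrary choice) may also enter C6.
      individualCoreCStates:
        "3":
          C1: true
          C6: true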

All the functionality available to the user can be controlled by applying the appropriate YAML files.

It is important to note that the user will be free to modify the energy settings of the cores intended for system support (platform cores), but all these settings will be overwritten during the lock/unlock process, to maintain the integrity of the system.

After installing the Kubernetes Power Manager, it must be enabled on the desired hosts by setting the label “power-management=enabled”. Setting the label lifts the restriction to C-state C0 on nodes where the worker function is present.
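
For illustration, the host label could be consumed by a PowerConfig node selector, as sketched below. The object name and the use of the label as the selector are assumptions, not confirmed details of the StarlingX application. The layout follows the upstream PowerConfig examples, and the listed profiles are the four defaults named above.

    apiVersion: power.intel.com/v1
    kind: PowerConfig
    metadata:
      name: power-config              # hypothetical name
      namespace: intel-power
    spec:
      powerNodeSelector:
        power-management: "enabled"   # assumed to match the host label
      powerProfiles:
      - performance
      - balanced-performance
      - balanced-power
      - power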

Alternatives

None

Data model impact

None

REST API impact

None

Security impact

None

Other end-user impact

Some SysInv/Horizon commands may be deprecated with the Kubernetes Power Manager integration (see all user-configurable parameters in Host CPU MHz Parameters Configuration).

If the user tries to use these deprecated parameters while Kubernetes Power Manager is enabled, the system should reject the action and notify the user.

Performance Impact

Enabling Kubernetes Power Manager on StarlingX can affect power consumption, latency, and throughput. Here are some considerations for each aspect:

  • Power Consumption: By actively monitoring and controlling power usage through policies, Kubernetes Power Manager can optimize power consumption based on workload demands, potentially reducing overall power consumption in the cluster. On the other hand, incorrect or inconsistent configuration can lead to degraded performance or increased power consumption.

  • Latency: C-states range from C0 to Cn. C0 indicates an active state. All other C-states (C1-Cn) represent idle sleep states with different parts of the processor powered down. As the C-states get deeper, the exit latency (the time to transition back to C0) becomes longer and the power savings become greater. This could slightly increase the time required for resource management operations, such as scaling and scheduling, as well as for platform and end-user tasks. However, an appropriate configuration of Power Manager can reduce the magnitude of this impact.

  • Throughput: The impact on throughput depends on how well Kubernetes Power Manager is configured to handle resource allocation while considering power constraints, potentially optimizing the cluster’s performance and increasing throughput. However, if Power Manager makes suboptimal decisions, it may impact throughput negatively.

The exact performance impact will depend on several factors such as workload characteristics, cluster configuration, and the specific configuration of Power Manager. Conducting thorough testing in the end-user environment is recommended to understand the precise effects on power consumption, latency, throughput, and other aspects.

Other deployer impact

None

Developer impact

None

Upgrade impact

None

Implementation

Assignee(s)

Primary assignee:

  • Guilherme Batista Leite (guilhermebatista)

Other contributors:

  • Davi Frossard (dbarrosf)

  • Eduardo Alberti (ealberti)

  • Fabio Studyny Higa (fstudyny)

  • Pedro Antônio de Souza Silva (pdesouza)

  • Reynaldo Patrone Gomes Filho (rpatrone)

  • Romão Martines (rmartine)

  • Thiago Antonio Miranda (tamiranda)

Repos Impacted

  • starlingx/docs

  • starlingx/config

  • starlingx/stx-puppet

  • starlingx/app-kubernetes-power-manager (new)

Work Items

Investigations and design

  • Investigation and evaluation of CPU Power Manager architecture and requirements

  • Evaluation of platform and application control for p-states and c-states

  • Design proposal and review for p-state and c-state control

  • Minor customizations to Kubernetes Power Manager may also be introduced, for instance, a modification to accept the use of isolated CPUs.

Kubernetes Power Manager Integration

  • Installation via system application

  • Default policy configuration:

    • Default p-state configuration and policy for platform cores (p-states enabled with full frequency range)

    • Default c-state configuration and policy for platform cores (c-states enabled with maximum idle state, limited to C6)

    • Default p-state configuration and policy for application cores (p-states enabled with full frequency range)

    • Default c-state configuration and policy for application cores (c-states enabled with maximum idle state, limited to C1)

Dependencies

Testing

System configuration

The system configurations that we are assuming for testing are:

  • AIO-SX

  • Standard

Test Scenarios

  • Functional tests for Kubernetes Power Manager and its possible customizations.

  • The usual unit testing in the impacted code areas.

  • Performance testing to identify and address any performance impacts.

  • Backup and restore tests.

  • Upgrade test to verify behavior of deprecated Host CPU MHz parameters.

Documentation Impact

The end-user documentation will need to be changed, adding Kubernetes Power Manager application deployment and configuration, as well as the customization of default and new policies.

References

  1. Kubernetes Power Manager

History

Revisions

  Release Name    Description
  stx-9.0         Introduced