Install Power Metrics Application

The Power Metrics app deploys two containers, cAdvisor and Telegraf, that collect metrics about hardware usage. This document describes the technical preview of the Power Metrics functionality.

Prerequisites

To run power-metrics, your system must have the following drivers:

cpufreq kernel module

exposes per-CPU frequency over sysfs (/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq)

msr kernel module

provides access to processor model-specific registers over devfs (/dev/cpu/cpu%d/msr)

intel-rapl module

exposes Intel Runtime Power Limiting metrics over sysfs (/sys/devices/virtual/powercap/intel-rapl)

intel-uncore-frequency module

exposes Intel uncore frequency metrics over sysfs (/sys/devices/system/cpu/intel_uncore_frequency)
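
The prerequisites above can be checked before installing the application. The following is a minimal sketch, not part of the product: it only tests whether the paths listed above exist, with cpu%d filled in for CPU 0. The helper name missing_modules and the REQUIRED_PATHS mapping are illustrative assumptions.

```python
import os

# Paths copied from the list above, with cpu%d filled in for CPU 0.
# NOTE: the msr path is taken verbatim from the text; on many systems the
# device node is /dev/cpu/<n>/msr instead, so adjust it if needed.
REQUIRED_PATHS = {
    "cpufreq": "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq",
    "msr": "/dev/cpu/cpu0/msr",
    "intel-rapl": "/sys/devices/virtual/powercap/intel-rapl",
    "intel-uncore-frequency": "/sys/devices/system/cpu/intel_uncore_frequency",
}


def missing_modules(paths=REQUIRED_PATHS, root="/"):
    """Return the module names whose expected path does not exist under root."""
    missing = []
    for module, path in paths.items():
        # Re-root the absolute path so the check can also run against a copy
        # of the filesystem tree (hypothetical use case, useful for testing).
        full = os.path.join(root, path.lstrip("/"))
        if not os.path.exists(full):
            missing.append(module)
    return missing
```

Calling missing_modules() on the target host returns an empty list when every path is present. A missing path suggests, but does not prove, that the corresponding module is not loaded.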

Uncore events can only be loaded on the following CPU models:

Model number    Processor name
0x55            Intel Skylake-X
0x6A            Intel IceLake-X
0x6C            Intel IceLake-D
0x47            Intel Broadwell-G
0x4F            Intel Broadwell-X
0x56            Intel Broadwell-D
0x8F            Intel Sapphire Rapids X
0xCF            Intel Emerald Rapids X

Source: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md#supported-cpu-models
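
To see whether a host falls into the supported list, its CPU model number can be compared with the table above. The sketch below assumes the model is read from the "model" field of /proc/cpuinfo, which Linux prints in decimal. The function names are illustrative and not part of Power Metrics or Telegraf.

```python
# Model numbers copied from the table above.
UNCORE_MODELS = {
    0x55: "Intel Skylake-X",
    0x6A: "Intel IceLake-X",
    0x6C: "Intel IceLake-D",
    0x47: "Intel Broadwell-G",
    0x4F: "Intel Broadwell-X",
    0x56: "Intel Broadwell-D",
    0x8F: "Intel Sapphire Rapids X",
    0xCF: "Intel Emerald Rapids X",
}


def cpu_model_from_cpuinfo(text):
    """Return the first 'model' value in /proc/cpuinfo-style text, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Match the 'model' field exactly, not 'model name'.
        if sep and key.strip() == "model":
            return int(value.strip())
    return None


def uncore_processor_name(model):
    """Return the processor name for a supported model number, else None."""
    return UNCORE_MODELS.get(model)
```

For example, passing the contents of /proc/cpuinfo to cpu_model_from_cpuinfo and the result to uncore_processor_name yields a processor name on a listed model and None otherwise.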

Procedure

  1. Upload the application.

    [sysadmin@controller-0 (keystone_admin)]$ system application-upload /usr/local/share/applications/helm/power-metrics-[version].tgz
    
  2. Apply the application.

    [sysadmin@controller-0 (keystone_admin)]$ system application-apply power-metrics
    
  3. Wait until the application reaches the applied state.

    [sysadmin@controller-0 (keystone_admin)]$ system application-show power-metrics
    
  4. Assign a label to the node.

    Note

    A node must carry the power-metrics=enabled label for Power Metrics to be enabled on it.
    
    [sysadmin@controller-0 (keystone_admin)]$ system host-label-assign <node-name> power-metrics=enabled
    

Results

Power Metrics is now installed, and both the cAdvisor and Telegraf pods should be up and running.

sysadmin@controller-0:~$ kubectl get pods -n power-metrics

NAME                              READY   STATUS    RESTARTS   AGE
cadvisor-v76zx                    1/1     Running   0          26h
telegraf-mc6vd                    1/1     Running   0          4d7h

You can change some configuration settings through Helm overrides.

Telegraf

Enable and Disable Intel PMU Metrics

You can activate the Intel PMU plugin with the following command. To deactivate it, set pmu_enabled=false instead.

[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --set pmu_enabled=true
+----------------+-------------------+
| Property       | Value             |
+----------------+-------------------+
| name           | telegraf          |
| namespace      | power-metrics     |
| user_overrides | pmu_enabled: true |
|                |                   |
+----------------+-------------------+

Override Input Plugins

You can change the default input plugin parameters with an override.

The default plugin parameters include CPU and package metrics.

The list of available options for both CPU and package metrics can be found in the powerstat documentation: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/intel_powerstat/README.md#configuration

Note

When overriding, you must specify both metric parameters (CPU and package); otherwise, the plugin stops collecting the metrics of the parameter you left out.

Example of overriding the powerstat plugin:

Procedure

  1. Update the input parameters.

    [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
    config:
      inputs:
        # Default plugins to collect power-metrics data
        - intel_powerstat:
            cpu_metrics:
              - "cpu_frequency"
              - "cpu_busy_frequency"
              - "cpu_temperature"
              - "cpu_c0_state_residency"
              - "cpu_c1_state_residency"
              - "cpu_c6_state_residency"
              - "cpu_busy_cycles"
            package_metrics:
              - "current_power_consumption"
              - "current_dram_power_consumption"
              - "thermal_design_power"
              - "cpu_base_frequency"
              - "uncore_frequency"
        - intel_pmu:
            event_definitions:
              - "/etc/telegraf/events_definition.json"
            core_events:
              - events:
                  - INST_RETIRED.ANY
        - linux_cpu:
            metrics: ["cpufreq"]
    
  2. Apply the override.

    [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
    
    +----------------+------------------------------------------------+
    | Property       | Value                                          |
    +----------------+------------------------------------------------+
    | name           | telegraf                                       |
    | namespace      | power-metrics                                  |
    | user_overrides | config:                                        |
    |                |   inputs:                                      |
    |                |     - intel_powerstat:                         |
    |                |         cpu_metrics:                           |
    |                |         - cpu_frequency                        |
    |                |         - cpu_busy_frequency                   |
    |                |         - cpu_temperature                      |
    |                |         - cpu_c0_state_residency               |
    |                |         - cpu_c1_state_residency               |
    |                |         - cpu_c6_state_residency               |
    |                |         - cpu_busy_cycles                      |
    |                |         package_metrics:                       |
    |                |         - current_power_consumption            |
    |                |         - current_dram_power_consumption       |
    |                |         - thermal_design_power                 |
    |                |         - cpu_base_frequency                   |
    |                |         - uncore_frequency                     |
    |                |     - intel_pmu:                               |
    |                |       event_definitions:                       |
    |                |       - "/etc/telegraf/events_definition.json" |
    |                |       core_events:                             |
    |                |       - events:                                |
    |                |         - INST_RETIRED.ANY                     |
    |                |     - linux_cpu:                               |
    |                |         metrics: ["cpufreq"]                   |
    |                |                                                |
    +----------------+------------------------------------------------+
    
  3. Re-apply the application.

    [sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
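
The note about specifying both metric parameters can be checked before an override is applied. The following is a minimal sketch under stated assumptions: the override has already been loaded into Python data that mirrors the config.inputs list of the file above (a list of single-key dictionaries), and the helper name missing_powerstat_parameters is hypothetical.

```python
REQUIRED_KEYS = ("cpu_metrics", "package_metrics")


def missing_powerstat_parameters(inputs):
    """Return the intel_powerstat parameters that an override leaves out.

    `inputs` mirrors the config.inputs list of the override file, for example
    [{"intel_powerstat": {"cpu_metrics": [...], "package_metrics": [...]}}].
    """
    for plugin in inputs:
        if "intel_powerstat" in plugin:
            settings = plugin["intel_powerstat"] or {}
            # A parameter counts as missing when it is absent or empty.
            return [key for key in REQUIRED_KEYS if not settings.get(key)]
    # intel_powerstat is not part of this override, so there is nothing to check.
    return []


# Illustrative override that forgets package_metrics.
incomplete = [{"intel_powerstat": {"cpu_metrics": ["cpu_frequency"]}}]
print(missing_powerstat_parameters(incomplete))
```

A non-empty result means the override would stop the collection of the listed metric group.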
    

Note

Power Metrics may increase scheduling latency because of perf and MSR readings. An average latency impact of around 3 µs was observed, along with spikes that significantly increased the maximum latency values. Kernel processing time was also affected. Applications that run at priority 50 or higher on isolated CPUs of a real-time kernel should leave time for kernel services to run, to avoid unexpected system behavior.

Configuration Requirement for Power Metrics and linux_cpu

If the BIOS is not configured to delegate control to the operating system, the linux_cpu metrics may not function as expected. Remove linux_cpu to ensure that power-metrics operates correctly. In this case, metrics generated by linux_cpu will not be available.

When the BIOS is properly configured, a frequency driver is loaded in Linux. You can check this by running the cpupower frequency-info command.

Example:

sysadmin@controller-0:~$ cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 3.60 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 2.50 GHz.
                 The governor "performance" may decide which speed to use
                 within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.50 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

If the BIOS does not delegate control to the operating system, cpupower reports that no cpufreq driver is active. In this scenario, remove the linux_cpu plugin so that power-metrics can run; the metrics generated by linux_cpu will not be available.

Example:

sysadmin@compute-0:~$ cpupower frequency-info
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
  CPUs which run at the same hardware frequency: Not Available
  CPUs which need to have their frequency coordinated by software: Not Available
  maximum transition latency:  Cannot determine or is not supported.
Not Available
  available cpufreq governors: Not Available
  Unable to determine current policy
  current CPU frequency: Unable to call hardware
  current CPU frequency:  Unable to call to kernel
  boost state support:
    Supported: yes
    Active: yes
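
The two outputs above differ in the "driver:" line, which can be checked programmatically. The sketch below is an illustration only: it assumes the text of cpupower frequency-info has already been captured as a string, and the function name active_cpufreq_driver is hypothetical.

```python
def active_cpufreq_driver(cpupower_output):
    """Return the driver named in `cpupower frequency-info` output, or None."""
    for line in cpupower_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("driver:"):
            # Everything after the first colon is the driver name.
            return stripped.split(":", 1)[1].strip() or None
    # No "driver:" line, as in the "no or unknown cpufreq driver" output.
    return None


# Excerpts of the two example outputs shown above.
configured = "analyzing CPU 0:\n  driver: intel_pstate\n"
unconfigured = "analyzing CPU 0:\n  no or unknown cpufreq driver is active on this CPU\n"
print(active_cpufreq_driver(configured))
print(active_cpufreq_driver(unconfigured))
```

When the function returns None, the linux_cpu plugin is a candidate for removal as described above.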

Intel Power Stat Configuration Behavior

This section describes the expected behavior of the [[inputs.intel_powerstat]] configuration in different scenarios:

  • Empty configuration

    When the platform_metrics parameter is set to an empty array, as shown below, no metrics are returned.

    [[inputs.intel_powerstat]]
      platform_metrics = []

  • Default configuration

    With either the default configuration or when the [[inputs.intel_powerstat]] input is used without specifying platform_metrics, only the following metrics should be enabled:

    current_power_consumption
    current_dram_power_consumption
    thermal_design_power

    This default behavior ensures that only the essential power consumption metrics are collected.

  • Specific platform metrics

    If specific metrics are listed in the platform_metrics parameter, as shown below, only those metrics are returned. No metrics beyond the explicitly listed ones are included.

    [[inputs.intel_powerstat]]
      platform_metrics = ["cpu_base_frequency", …]

Add Input Plugins

You can add new plugins by overriding the inputs parameter.

Example of overriding the powerstat plugin:

  1. Add the cpu_c3_state_residency metric to the intel_powerstat/cpu_metrics plugin.

    [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
    config:
      inputs:
        # Default plugins to collect power-metrics data
        - intel_powerstat:
            cpu_metrics:
              - "cpu_frequency"
              - "cpu_busy_frequency"
              - "cpu_temperature"
              - "cpu_c0_state_residency"
              - "cpu_c1_state_residency"
              - "cpu_c3_state_residency"
              - "cpu_c6_state_residency"
              - "cpu_busy_cycles"
            package_metrics:
              - "current_power_consumption"
              - "current_dram_power_consumption"
              - "thermal_design_power"
              - "cpu_base_frequency"
              - "uncore_frequency"
        - intel_pmu:
            event_definitions:
              - "/etc/telegraf/events_definition.json"
            core_events:
              - events:
                  - INST_RETIRED.ANY
        - linux_cpu:
            metrics: ["cpufreq"]
    
  2. Apply the override.

    [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
    
    +----------------+------------------------------------------------+
    | Property       | Value                                          |
    +----------------+------------------------------------------------+
    | name           | telegraf                                       |
    | namespace      | power-metrics                                  |
    | user_overrides | config:                                        |
    |                |   inputs:                                      |
    |                |     - intel_powerstat:                         |
    |                |         cpu_metrics:                           |
    |                |         - cpu_frequency                        |
    |                |         - cpu_busy_frequency                   |
    |                |         - cpu_temperature                      |
    |                |         - cpu_c0_state_residency               |
    |                |         - cpu_c1_state_residency               |
    |                |         - cpu_c3_state_residency               |
    |                |         - cpu_c6_state_residency               |
    |                |         - cpu_busy_cycles                      |
    |                |         package_metrics:                       |
    |                |         - current_power_consumption            |
    |                |         - current_dram_power_consumption       |
    |                |         - thermal_design_power                 |
    |                |         - cpu_base_frequency                   |
    |                |         - uncore_frequency                     |
    |                |     - intel_pmu:                               |
    |                |       event_definitions:                       |
    |                |       - "/etc/telegraf/events_definition.json" |
    |                |       core_events:                             |
    |                |       - events:                                |
    |                |         - INST_RETIRED.ANY                     |
    |                |     - linux_cpu:                               |
    |                |         metrics: ["cpufreq"]                   |
    |                |                                                |
    +----------------+------------------------------------------------+
    
  3. Re-apply the application.

    [sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
    

Remove Input Plugins

You can remove plugins by overriding the inputs parameter.

  1. Remove the linux_cpu plugin.

    [sysadmin@controller-0 ~(keystone_admin)]$ cat telegraf-powerstat.yaml
    config:
      inputs:
        # Default plugins to collect power-metrics data
        - intel_powerstat:
            cpu_metrics:
              - "cpu_frequency"
              - "cpu_busy_frequency"
              - "cpu_temperature"
              - "cpu_c0_state_residency"
              - "cpu_c1_state_residency"
              - "cpu_c3_state_residency"
              - "cpu_c6_state_residency"
              - "cpu_busy_cycles"
            package_metrics:
              - "current_power_consumption"
              - "current_dram_power_consumption"
              - "thermal_design_power"
              - "cpu_base_frequency"
              - "uncore_frequency"
        - intel_pmu:
            event_definitions:
              - "/etc/telegraf/events_definition.json"
            core_events:
              - events:
                  - INST_RETIRED.ANY
    
  2. Apply the override.

    [sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics telegraf power-metrics --values telegraf-powerstat.yaml
    
    +----------------+------------------------------------------------+
    | Property       | Value                                          |
    +----------------+------------------------------------------------+
    | name           | telegraf                                       |
    | namespace      | power-metrics                                  |
    | user_overrides | config:                                        |
    |                |   inputs:                                      |
    |                |     - intel_powerstat:                         |
    |                |         cpu_metrics:                           |
    |                |         - cpu_frequency                        |
    |                |         - cpu_busy_frequency                   |
    |                |         - cpu_temperature                      |
    |                |         - cpu_c0_state_residency               |
    |                |         - cpu_c1_state_residency               |
    |                |         - cpu_c3_state_residency               |
    |                |         - cpu_c6_state_residency               |
    |                |         - cpu_busy_cycles                      |
    |                |         package_metrics:                       |
    |                |         - current_power_consumption            |
    |                |         - current_dram_power_consumption       |
    |                |         - thermal_design_power                 |
    |                |         - cpu_base_frequency                   |
    |                |         - uncore_frequency                     |
    |                |     - intel_pmu:                               |
    |                |       event_definitions:                       |
    |                |       - "/etc/telegraf/events_definition.json" |
    |                |       core_events:                             |
    |                |       - events:                                |
    |                |         - INST_RETIRED.ANY                     |
    |                |                                                |
    +----------------+------------------------------------------------+
    
  3. Re-apply the application.

    [sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics
    

Modify Telegraf Data Collection Interval

Telegraf reports its metrics every 10 seconds, but you can change this interval with the following command:

system helm-override-update power-metrics telegraf power-metrics --set config.agent.interval=<time-interval>

cAdvisor

Enable or Disable cAdvisor

To enable or disable cAdvisor, use the following command:

[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics cadvisor power-metrics --set cadvisor_enabled=true
+----------------+------------------------+
| Property       | Value                  |
+----------------+------------------------+
| name           | cadvisor               |
| namespace      | power-metrics          |
| user_overrides | cadvisor_enabled: true |
|                |                        |
+----------------+------------------------+

Reapply the power-metrics application and wait for the pod to restart.

[sysadmin@controller-0 ~(keystone_admin)]$ system application-apply power-metrics

Enable and Disable Perf Events on cAdvisor

To enable or disable Perf Events on cAdvisor, use the following command:

[sysadmin@controller-0 ~(keystone_admin)]$ system helm-override-update power-metrics cadvisor power-metrics --set perf_events=true
+----------------+-------------------+
| Property       | Value             |
+----------------+-------------------+
| name           | cadvisor          |
| namespace      | power-metrics     |
| user_overrides | perf_events: true |
|                |                   |
+----------------+-------------------+

Finally, re-apply the power-metrics app, and wait until the pod restarts.

system application-apply power-metrics

Remove the Power Metrics App

To remove the Power Metrics app, use the following command. The application returns to the uploaded state:

system application-remove power-metrics

Then, use the following command to delete the application from the system:

system application-delete power-metrics

Available Metrics

The Power Metrics application provides access to raw system-level and hardware-level data, which makes it possible to visualize power usage.

By default, Power Metrics exposes the data collected from both cAdvisor and Telegraf in the OpenMetrics format.
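
Because the data is exposed as text, it can be read with a few lines of code. The following is a minimal sketch of a parser for simple sample lines of the form name{label="value"} number. It is not a complete OpenMetrics parser: timestamps and escaped label values are not interpreted. The sample values are invented for illustration, and the label names are taken from the formulas later in this document.

```python
import re

# name, optional {labels}, then the sample value.
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair such as package_id="0".
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse simple exposition lines into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE, # EOF) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Invented sample text, for illustration only.
text = '''# TYPE powerstat_package_thermal_design_power_watts gauge
powerstat_package_thermal_design_power_watts{package_id="0"} 125
powerstat_core_cpu_frequency_mhz{cpu_id="2",package_id="0"} 2500.5
# EOF
'''
for name, labels, value in parse_samples(text):
    print(name, labels, value)
```

For production use, an existing OpenMetrics or Prometheus client library is likely a better choice than a hand-written parser.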

Thermal Design Power

The Thermal Design Power, or TDP, is the maximum power, in watts, available to the processor. The metric name for checking the TDP is powerstat_package_thermal_design_power_watts.

Current Power Consumption

The current power usage of the system in watts. The metric name for checking power consumption is powerstat_package_current_power_consumption_watts.

Current DRAM Power Consumption

The current power usage of DRAM in the system, in watts. The metric name for checking DRAM power consumption is powerstat_package_current_dram_power_consumption_watts.

Current CPU Frequency

The current CPU frequency of the processor. The metric name for checking the CPU frequency is powerstat_core_cpu_frequency_mhz.

CPU Base Frequency

The base (non-turbo) frequency of the processor, which is its default speed. The metric name for checking the CPU base frequency is powerstat_package_cpu_base_frequency_mhz.

Uncore Frequency

The application reports the current, maximum, and minimum uncore frequency. The uncore frequency is the frequency of the parts of a processor that are not part of its cores, such as the memory controller and the cache controller.

The current uncore frequency is exposed as powerstat_package_uncore_frequency_mhz_cur, the maximum as powerstat_package_uncore_frequency_limit_mhz_max, and the minimum as powerstat_package_uncore_frequency_limit_mhz_min.

Per-CPU minimum and maximum frequency

The application reports the minimum and maximum frequency that each core of the processor can achieve. The minimum frequency is exposed as linux_cpu_cpuinfo_min_freq or linux_cpu_scaling_min_freq, and the maximum as linux_cpu_cpuinfo_max_freq or linux_cpu_scaling_max_freq.

Per-CPU busy frequency

Busy frequency is the frequency of a core that has a high utilization. It is exposed as powerstat_core_cpu_busy_frequency_mhz.

Per-CPU percentage in C-State

The application can report the percentage of time that a core of the processor spent in each C-state. A C-state is a state in which the core can reduce its power consumption; the higher the C-state, the deeper the sleep state of the core. Power Metrics reports the following C-states:

  • C0 state: the core is executing normally. It is exposed as powerstat_core_cpu_c0_state_residency_percent.

  • C1 state: the core is active but is not processing any instructions, and it can quickly go back to the C0 state. It is exposed as powerstat_core_cpu_c1_state_residency_percent.

  • C6 state: the core has its voltage reduced (or is powered off). This is the highest of these states. Returning to the C0 state takes longer, but the power saving is higher. It is exposed as powerstat_core_cpu_c6_state_residency_percent.

Per-CPU current temperature

The application reports the current temperature of each individual core of the processor. It is exposed as powerstat_core_cpu_temperature_celsius.

Container perf events total

cAdvisor reports the number of performance events that occurred in a container. It is exposed as container_perf_events_total.

Container perf events scaling ratio

cAdvisor also reports the scaling ratio of the performance events counted in a container. It is exposed as container_perf_events_scaling_ratio.

Per Core CPU Power usage

By combining the frequency of each core (the powerstat_core_cpu_frequency_mhz metric) with the power usage of the processor (the powerstat_package_current_power_consumption_watts metric), it is possible to estimate the power, in watts, that each core is using.

Example of formula:

per_cpu_consumption = ((0.6 * powerstat_core_cpu_frequency_mhz{cpu_id=x, package_id=y})/ ∑ powerstat_core_cpu_frequency_mhz{package_id=y}) * powerstat_package_current_power_consumption_watts{package_id=y}
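
The formula above can be written as a small function. This is a sketch under stated assumptions: the metric values have already been fetched and are passed in as plain numbers, the 0.6 factor is taken as-is from the formula, and the zero-frequency guard is an addition for safety rather than part of the original formula.

```python
def per_cpu_consumption(core_freq_mhz, package_core_freqs_mhz, package_watts, factor=0.6):
    """Estimate the power, in watts, used by one core.

    core_freq_mhz:          powerstat_core_cpu_frequency_mhz for the core (cpu_id=x)
    package_core_freqs_mhz: the same metric for every core in the package (package_id=y)
    package_watts:          powerstat_package_current_power_consumption_watts (package_id=y)
    factor:                 the 0.6 constant from the formula above
    """
    total_freq = sum(package_core_freqs_mhz)
    if total_freq == 0:
        # Guard added for this sketch: avoid dividing by zero.
        return 0.0
    return (factor * core_freq_mhz / total_freq) * package_watts


# Illustrative numbers only: four cores in one package drawing 100 W.
freqs = [2000, 2000, 1000, 1000]
print(per_cpu_consumption(freqs[0], freqs, 100.0))
```

With these illustrative inputs, the first core accounts for one third of the summed frequency, so it is assigned 0.6 × 1/3 of the package power.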

Container CPU Power usage

The power consumed by each container can be estimated from three values: the number of instructions executed by the container on a core (from the container_perf_events_total metric), the total number of instructions executed on that core (also from container_perf_events_total), and the per-core CPU power usage described above.

Example of formula to calculate the power consumption of a container on a core:

container_per_cpu_consumption = (container_perf_events_total{cpu=x, container=z} / container_perf_events_total{cpu=x}) * per_cpu_consumption{cpu=x}

Where x is the core_id of the CPU, y is the package_id or physical_id of the processor, and z is the container name.
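
The container formula can be sketched in the same way. The function below assumes that the event counts and the per-core power estimate have already been obtained and are passed in as numbers. The guard against a zero event count is an addition for this sketch.

```python
def container_per_cpu_consumption(container_events, total_events_on_cpu, cpu_watts):
    """Estimate the power, in watts, that a container consumes on one core.

    container_events:    container_perf_events_total for cpu=x, container=z
    total_events_on_cpu: container_perf_events_total summed over cpu=x
    cpu_watts:           per_cpu_consumption for cpu=x
    """
    if total_events_on_cpu == 0:
        # Guard added for this sketch: no events were counted on the core.
        return 0.0
    return (container_events / total_events_on_cpu) * cpu_watts


# Illustrative numbers only: the container ran a quarter of the core's events.
print(container_per_cpu_consumption(250, 1000, 20.0))
```

Summing the result over every core that a container runs on gives an estimate of the container's total power consumption.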