Integration of Intel Ethernet Operator to StarlingX Platform

Storyboard: https://storyboard.openstack.org/#!/story/2010562

In a cloud environment, network interface adapters require a cloud-based management system. These adapters often provide advanced functionality that is best managed by a single operator.

Problem description

Firmware on network adapters may need management, as may network adapter personalization. Activation of flow rules to steer traffic from interfaces to pods is also required.

Use Cases

  • Update of Firmware on interface adapters

  • Update of Dynamic Device Personalization on interface adapters

  • Update of Flow Configuration on interface adapters to allow steering

Dynamic Device Personalization (DDP) is the on-chip programmable pipeline which allows deep and diverse protocol header processing. Flow configuration allows the steering of traffic to particular VFs on a node.

Proposed change

The Intel Ethernet Operator (IEO) allows the firmware of Intel E810 Series NICs to be updated in a container environment. Nodes are drained, taken out of service and restarted as required by the update. Firmware and DDP packages can be downloaded from a suitable HTTP server (configurable in the EthernetNodeConfig Custom Resource).

Intel Ethernet Operator also requires some other plugins and operators:

The Intel Ethernet Operator requires the SR-IOV Device Plugin, which makes SR-IOV resources available in Kubernetes. For ease of configuration the SR-IOV Network Operator is also required, and it in turn requires Node Feature Discovery. Both the SR-IOV Network Operator and Node Feature Discovery are installed as dependencies alongside IEO in the intel-ethernet-operator namespace.

Flow rules require the inclusion of the Unified Flow Tool (UFT) server application. UFT applies the flow rules and is driven through the DPDK rte_flow API; the Switch Filter rules supported by rte_flow are available. UFT is included as part of the IEO installation.

Common features

Within the Ethernet operator on the control node, the controller deploys the first asset, ethernet-discovery (the labeler), as a pod on each node; the labeler marks a node if a supported device is connected. The controller also deploys a compatibility map, a config file specifying which FW/DDP/kernel versions can work together.
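
For illustration, a node with a supported device would carry a label like the one below. The label key is an assumption based on the upstream IEO documentation and may differ between releases:

```yaml
# Hypothetical node metadata after the labeler has run;
# the label key is taken from upstream IEO docs and is an assumption.
apiVersion: v1
kind: Node
metadata:
  name: worker-01
  labels:
    ethernet.intel.com/intel-ethernet-present: ""
```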

The controller deploys the Ethernet daemon (FW/DDP daemon) as a DaemonSet; only nodes with the appropriate label get a pod deployed on them. The Ethernet daemon checks for a node configuration and creates one if none is found. The daemon reconciles in a loop, gathers the status of the required components (found devices, PCI address, MAC, FW and DDP versions, etc.) and updates the node configuration with that status. The user can then get the status of all node configs, or of a specific node config.
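
As a sketch of what this status looks like, an EthernetNodeConfig might report something like the following (field names follow the upstream IEO documentation; the device and version values are illustrative):

```yaml
# Illustrative EthernetNodeConfig status, as maintained by the Ethernet daemon.
apiVersion: ethernet.intel.com/v1
kind: EthernetNodeConfig
metadata:
  name: worker-01
  namespace: intel-ethernet-operator
status:
  devices:
    - PCIAddress: "0000:18:00.0"        # discovered device location
      name: Ethernet Controller E810-C for QSFP
      firmware:
        MAC: 40:a6:b7:aa:bb:cc          # illustrative MAC
        version: 4.00 0x80011845 1.3236.0
      DDP:
        packageName: ICE OS Default Package
        version: 1.3.30.0
```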

Firmware and DDP upgrade

The user uploads the desired DDP package and/or nvmupdate package to an HTTP server accessible by the cluster (the HTTP server and the mechanism to upload are out of scope for the operator). The user can then apply a new cluster configuration with the preferred settings; the Ethernet controller breaks this down into smaller node configurations, which are updated accordingly. The Ethernet daemon reconciles in a loop, watching for updates. If the condition (the fields in the applied EthernetClusterConfig CRD) is unchanged, or if the new conditions target other nodes, the daemon ignores them. When a condition change is detected for a particular daemon, it acts on it: it verifies the condition and denies the change if it cannot be met. If the condition can be met, the daemon runs the appropriate actions to bring the node to the desired condition (i.e. a DDP/FW update): it downloads the packages from the specified HTTP server address, elects a leader to act as a controller, cordons off and drains the node, proceeds with the updates, reboots the node, uncordons it and releases the leadership. Once any configuration update is done, the daemon updates the node configuration status. When the update is finished, the user can get the status of the update and of the node.

Flow Configuration

To allow the Flow Configuration feature to compose the flow rules for the network card's traffic, the deployment must use a trusted virtual function (VF) from each physical function (PF). Usually the first VF (VF0) of each PF has trust mode enabled and is then bound to the vfio-pci driver. This VF pool must be created by the user and be allocatable as a Kubernetes resource.
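
A sketch of how such a trusted VF pool could be created with the SR-IOV Network Operator follows. The policy and interface names are illustrative assumptions; the `#0-0` range selects only VF0 of the PF, and the resource name matches the DCFVfPoolName value used in the FlowConfigNodeAgentDeployment example later in this document:

```yaml
# Illustrative SriovNetworkNodePolicy creating the trusted (DCF) VF pool.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: uft-admin-policy          # illustrative name
  namespace: intel-ethernet-operator
spec:
  deviceType: vfio-pci            # trusted VF bound to the vfio-pci driver
  nicSelector:
    pfNames:
      - ens785f0#0-0              # select only VF0 (interface name is an assumption)
    vendor: "8086"
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  priority: 99
  resourceName: cvl_uft_admin     # exposed as openshift.io/cvl_uft_admin
```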

Rules are written in an rte_flow-like form and allow deep matching of packet-type flows to interfaces associated with pods on the cluster. Rules can be written at cluster scope or pod scope. During pod scheduling they are instantiated on a node to configure the flow-offload hardware on the interface to target a pod attached via a particular VF.

Alternatives

It is possible to connect to each node manually and untar and install the firmware and DDP profiles. Similarly, flow offloads could be configured individually on each node.

Data model impact

IEO introduces the following CRDs on the cluster:

  • EthernetClusterConfig

  • FlowConfigNodeAgentDeployment

  • NodeFlowConfig

  • ClusterFlowConfig

  • EthernetNodeConfig (NIC configuration status, not created by the user)

EthernetClusterConfig

apiVersion: ethernet.intel.com/v1
kind: EthernetClusterConfig
metadata:
  name: config
spec:
  nodeSelectors:
    kubernetes.io/hostname: <hostname>
  deviceSelector:
    pciAddress: "<pci-address>"
  deviceConfig:
    fwURL: "<URL_to_firmware>"
    fwChecksum: "<file_checksum_SHA-1_hash>"
    ddpURL: "<URL_to_DDP>"
    ddpChecksum: "<file_checksum_SHA-1_hash>"

Parameters

  • name: Name of the specific config

  • kubernetes.io/hostname: Hostname containing cards to be updated

  • fwURL: Accessible URL for the file. Proxy may be needed

  • fwChecksum: Expected checksum of the firmware file

  • ddpURL: Accessible URL for the DDP file. Proxy may be needed

  • ddpChecksum: Expected checksum of the DDP file

FlowConfigNodeAgentDeployment

apiVersion: flowconfig.intel.com/v1
kind: FlowConfigNodeAgentDeployment
metadata:
  labels:
    control-plane: flowconfig-daemon
  name: flowconfig-daemon-deployment
  namespace: intel-ethernet-operator
spec:
  DCFVfPoolName: openshift.io/cvl_uft_admin
  NADAnnotation: sriov-cvl-dcf

Parameters

  • name: Name of the FlowConfigNodeAgentDeployment

  • DCFVfPoolName: Name of the SriovNetworkNodePolicy resource (VF pool) to use

  • NADAnnotation: Name of the SriovNetwork to use
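
For illustration, the NADAnnotation above could refer to a SriovNetwork such as the following. The names are assumptions consistent with the sample values above; note that trust must be enabled on the DCF VF:

```yaml
# Illustrative SriovNetwork referenced by NADAnnotation (names are assumptions).
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-cvl-dcf
  namespace: intel-ethernet-operator
spec:
  trust: "on"                          # the DCF VF must be trusted
  networkNamespace: intel-ethernet-operator
  resourceName: cvl_uft_admin          # the DCF VF pool from the node policy
```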

NodeFlowConfig

apiVersion: flowconfig.intel.com/v1
kind: NodeFlowConfig
metadata:
  name: worker-01
spec:
  rules:
    - pattern:
        - type: RTE_FLOW_ITEM_TYPE_ETH
        - type: RTE_FLOW_ITEM_TYPE_IPV4
          spec:
            hdr:
              src_addr: 10.56.217.9
          mask:
            hdr:
              src_addr: 255.255.255.255
        - type: RTE_FLOW_ITEM_TYPE_END
      action:
        - type: RTE_FLOW_ACTION_TYPE_DROP
        - type: RTE_FLOW_ACTION_TYPE_END
      portId: 0
      attr:

Parameters

  • name: Name of the config - needs to match node name

  • pattern: type: Header part to match on

  • pattern: spec & mask: Addresses to match for the rules

  • action: Alter the fate of matching traffic, its contents or properties

  • attr: Flow rule priority level

  • portId: Information to identify the port on a node

ClusterFlowConfig

apiVersion: flowconfig.intel.com/v1
kind: ClusterFlowConfig
metadata:
  name: pppoes-sample
spec:
  rules:
    - pattern:
        - type: RTE_FLOW_ITEM_TYPE_ETH
        - type: RTE_FLOW_ITEM_TYPE_IPV4
          spec:
            hdr:
              src_addr: 10.56.217.9
          mask:
            hdr:
              src_addr: 255.255.255.255
        - type: RTE_FLOW_ITEM_TYPE_END
      action:
        - type: to-pod-interface
          conf:
            podInterface: net1
      attr:
        ingress: 1
        priority: 0
  podSelector:
    matchLabels:
      app: vagf
      role: controlplane

Parameters

  • name: Name of the config

  • pattern: type: Header part to match on

  • pattern: spec & mask: Addresses to match for the rules

  • action: Alter the fate of matching traffic, its contents or properties

  • attr: Flow rule priority level

  • podSelector: Labels associated with the particular pod

NOTE: Most of the object parameter names are consistent with the names used in the official DPDK rte_flow documentation. For a full description of the generic flow API see https://doc.dpdk.org/guides/prog_guide/rte_flow.html.

During execution, ClusterFlowConfig rules are broken down into NodeFlowConfig rules. NodeFlowConfig rules can also be written manually.

REST API impact

Standard extension of the Kubernetes APIs through the introduction of the above CRDs.

Security impact

Existing Kubernetes authentication and authorization apply to the standard extension of the Kubernetes APIs introduced by the IEO CRDs.

Other end user impact

The end user will have the capability to:

  • control firmware and DDP packages

  • configure flow rules

  • display the configuration status of Intel Ethernet devices

Performance Impact

With the Intel Ethernet Operator, service pods run continuously on master and worker nodes, consuming some CPU and memory from cluster housekeeping; this is believed to be negligible. Periodic reconciliation between the controller-manager and the node daemons also consumes some network resources, likewise assumed to be negligible.

Other deployer impact

None.

Developer impact

In StarlingX 8.0 and future releases the /lib/firmware directory is read-only. This creates a problem for any customer who wants to use a DDP profile other than the preinstalled one. The Intel ice driver looks for a DDP package named intel/ice/ddp/stx-ice.pkg in the default firmware search paths, /lib/firmware and /lib/firmware/updates. Both of these paths are immutable, so currently there is no way to change the DDP package in use. The solution is the alternate firmware search path already present in the kernel (https://docs.kernel.org/driver-api/firmware/fw_search_path.html). This feature can be enabled by adding a suitable boot parameter; a contribution that adds this to StarlingX has already been made (in the stx-puppet repository).
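
As a sketch, the kernel's alternate firmware search path is set through the firmware_class.path module parameter on the boot command line; the writable path below is illustrative, not the one chosen by the StarlingX contribution:

```
# Illustrative kernel boot parameter (the exact path is an assumption):
firmware_class.path=/var/lib/firmware
```

With this parameter set, the ice driver's firmware lookup consults the given directory before the immutable default paths, so an alternate DDP package can be placed there.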

Upgrade impact

None. This is an optional operator.

Implementation

Assignee(s)

Primary assignee:

Rafal Lal

Other contributors:

Kevin Clarke

Repos Impacted

A new system-application repo will be created for the definition and building of the intel-ethernet-operator application.

Work Items

Create intel-ethernet-operator application package

Integrate intel-ethernet-operator application to FluxCD. Add application upload/apply/remove/delete commands.

Update docs.starlingx.io with instructions on how to use intel-ethernet-operator to configure Ethernet cards.

Building images

The Intel Ethernet Operator team would like to redirect the building of the UFT container image to StarlingX. The source code of the image is publicly available, and build scripts would be provided. Images of the other components would be built by Intel and made ready to pull.

Dependencies

None specific.

Testing

Testing will be done on a multi node cluster configuration.

  • Testing of packages across several package revisions

  • Validating firmware installs and DDP package installs

  • Testing that traffic flows are steered to the correct pods

  • Verifying that the CRDs for particular functionality effect the change on the cluster

  • Manually deleting / changing the configuration to validate that the controllers make the changes

  • Rebooting nodes to validate that the new configuration remains

  • Reloading drivers to validate that the new configuration remains

Documentation Impact

docs.starlingx.io will be updated for:

  • How to use the intel-ethernet-operator application

  • How to perform enhanced configuration of Ethernet devices with the CRDs supplied by the Ethernet Operator

References

Intel® Ethernet Operator - Overview Solution Brief https://networkbuilders.intel.com/solutionslibrary/intel-ethernet-operator-overview-solution-brief

Intel Ethernet Operator https://github.com/intel/intel-ethernet-operator

Unified Flow Tool (UFT) https://github.com/intel/UFT/tree/main

Intel Ethernet 810 series features https://www.intel.com/content/www/us/en/products/details/ethernet/800-controllers/e810-controllers/docs.html

Node Feature Discovery https://github.com/kubernetes-sigs/node-feature-discovery

SR-IOV Network Operator https://github.com/k8snetworkplumbingwg/sriov-network-operator

SR-IOV Network Device Plugin for Kubernetes https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin

History

Revisions

02-Feb-2023

Introducing Ethernet operator

02-Feb-2023

Updated with comments from StarlingX Sub-Project Meeting

03-Mar-2023

Submission

29-Mar-2023

Updated with comments from StarlingX Sub-Project Meeting

22-Jun-2023

Updated with comments from code reviews