Wireless FEC Operator integration

Integration of the Intel Wireless FEC Operator into the StarlingX platform

Storyboard: https://storyboard.openstack.org/#!/story/2009749

In a distributed cloud environment for vRAN workloads, there may be hundreds of sub-clouds managed by a System Controller, each with one or more worker nodes. Some of these sub-clouds have worker nodes with Intel accelerator devices that offload 4G/LTE and 5G FEC (Forward Error Correction) operations.

These FEC devices allow their hardware resources to be configured on a per-vRAN-workload basis for optimal performance. In a typical scenario, individual vRAN workload requirements vary with the deployment location.

For an admin to manage and configure these Intel FEC accelerator devices in a containerized environment, additional functionality is required. The current configuration method in StarlingX cannot set all the parameters of the FEC h/w accelerator; it offers only predefined, static configuration options for typical workloads.

Problem description

Today in StarlingX, FEC devices are configured through a user application, “pf-bb-config”, which statically sets the configurable parameters from a config file. The current version of StarlingX supports configuring only a few parameters (1 or 2) of FEC devices through “system” commands, which in turn trigger Puppet to run the “pf-bb-config” application when the system is unlocked.

The current configuration option uses predefined, static config files that configure FEC devices for the most common vRAN workload requirements. Supporting other combinations of parameters, or changing the configuration on different nodes in a cluster, requires adding and maintaining this configuration file in a somewhat unsupported fashion.

In addition, support for next-generation FEC devices (e.g. ACC101, ACC200, …) may need enhancements to the existing configuration method.

The Intel-supported FEC Operator is an SRO (Special Resource Operator) for K8s which provides:

  • Detection and labeling of the nodes which have FEC h/w accelerators installed

  • Configuration of FEC devices through standard K8s APIs (in JSON format)

  • Validation of FEC device configuration parameters

  • Configuration that can be applied at cluster level, node level, or device level

  • Deployment through Kustomize/Helm deployment models

  • Seamless support for next-generation FEC devices

Use Cases

FEC Operator is an optional system application for vRAN deployments that need to fine-tune Intel FEC h/w accelerator resources (i.e. number of VFs, queues, queue groups, etc.) based on deployment workloads.

The parameters that can be configured through the FEC Operator are:

  • Number of VF interfaces (VF bundles)

  • PF/VF mode

  • Enabling 4G only, 5G only, or both 4G and 5G

  • For each direction (uplink/downlink), configuration of:

    • Number of queue groups, aqsPerGroup and aqDepth

Users can apply this configuration per device, per node in a cluster, using the native kubectl API interface.

Proposed change

The current method of configuring FEC devices remains the default for existing vRAN deployments and will not be changed.

FEC Operator will be added as an optional system application (sriov-fec-operator), which is disabled by default (i.e. not uploaded or applied). The FEC Operator is deployed through Helm charts packaged in the new system application manifest. Users can enable, deploy and configure the FEC Operator on demand by updating and applying Helm overrides for the new system application.

FEC Operator functionality is distributed across several PODs:

  • sriov-fec-controller-manager

    • Runs on all master nodes in the cluster and provides K8s Custom Resource API services for FEC device configuration.

    • Communicates with the FEC operator service running on each node to configure the FEC devices and reconcile their state.

  • sriov-fec-daemonset

    • Runs on each node in the cluster and receives configuration from controller-manager.

    • Detects the FEC devices on the platform/node.

    • Based on the data configured in the SriovFecClusterConfig CRD, it:

      • Binds the PF (Physical Function) interface to the required driver, i.e. igb_uio or pci-pf-stub.

      • Creates the required number of VF interfaces.

      • Binds the VF interfaces to the driver (igb_uio or vfio-pci).

      • Configures the FEC device using the pf-bb-config tool.

  • sriov-device-plugin

    • Runs on each node and manages the FEC device SR-IOV VF (Virtual Function) resources made available to user application PODs.

  • accelerator-discovery

    • Runs on each node to detect the FEC devices on each node

    • Labels the nodes that have FEC devices.

There are two methods of FEC device configuration:

  • method-1: the default, existing method

  • method-2: using the FEC Operator

Method-1 (the existing method) is the default and is applied on node startup. If a SriovFecClusterConfig CRD is applied, the sriov-fec-daemonset on the node overwrites the existing configuration for that particular device on the node.

To switch back to the default static method, the admin deletes the SriovFecClusterConfig CRD and reconfigures the device through method-1.

NOTE:

Reconfiguring a device or switching between configuration methods will impact FEC device usage by the vRAN application PODs. The following steps are recommended during reconfiguration or when switching configuration methods:

  • Stop the vRAN application PODs that are using the FEC devices and terminate them.

  • Perform reconfiguration of device or switch the method and reconfigure.

  • Redeploy the vRAN application PODs to use the FEC device.
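The ordering of the recommended steps can be sketched in Python as follows. This is only an illustration of the sequence: the function name and the three callables are hypothetical placeholders, and how each step is actually carried out (for example, which commands stop or redeploy the PODs) is not defined by this spec.

```python
from typing import Callable, List


def run_reconfiguration(
    stop_workloads: Callable[[], None],
    reconfigure_device: Callable[[], None],
    redeploy_workloads: Callable[[], None],
) -> List[str]:
    """Run the three recommended steps strictly in order.

    All three callables are hypothetical hooks supplied by the caller.
    If a step raises, the remaining steps are not executed, so workloads
    are never redeployed onto a device whose reconfiguration failed.
    """
    steps = [
        ("stop vRAN application PODs", stop_workloads),
        ("reconfigure FEC device", reconfigure_device),
        ("redeploy vRAN application PODs", redeploy_workloads),
    ]
    completed: List[str] = []
    for name, action in steps:
        action()  # an exception here aborts the remaining steps
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Stand-in actions that only print; real ones would talk to the cluster.
    done = run_reconfiguration(
        lambda: print("stopping PODs"),
        lambda: print("reconfiguring device"),
        lambda: print("redeploying PODs"),
    )
    print(done)
```

Aborting on the first failure is a design choice of this sketch, not a requirement stated in the spec.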

FEC devices supported through the FEC Operator in STX 7 are:

ACC100 (Mt. Bryce) and N3000 FPGA

Alternatives

The current method of configuring FEC devices is the default method and is enabled by default.

Configuration through FEC Operator is an optional alternative method.

Data model impact

The sriov-fec-operator application introduces the new SriovFecClusterConfig CRD to the cluster.

Sample Cluster configuration:

apiVersion: sriovfec.intel.com/v2
kind: SriovFecClusterConfig
metadata:
  name: config
  namespace: sriov-fec-system
spec:
  priority: 1
  nodeSelector:
    kubernetes.io/hostname: <node-label>
  acceleratorSelector:
    pciAddress: 0000:17:00.0
  physicalFunction:
    pfDriver: "pci-pf-stub"
    vfDriver: "vfio-pci"
    vfAmount: 16
    bbDevConfig:
      acc100:
        # Programming mode: false = VF programming, true = PF programming
        pfMode: false
        numVfBundles: 16
        maxQueueSize: 1024
        uplink4G:
          numQueueGroups: 0
          numAqsPerGroups: 16
          aqDepthLog2: 4
        downlink4G:
          numQueueGroups: 0
          numAqsPerGroups: 16
          aqDepthLog2: 4
        uplink5G:
          numQueueGroups: 4
          numAqsPerGroups: 16
          aqDepthLog2: 4
        downlink5G:
          numQueueGroups: 4
          numAqsPerGroups: 16
          aqDepthLog2: 4
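To make the numbers in the sample concrete, the following Python sketch derives per-direction totals from the bbDevConfig values. It assumes that aqDepthLog2 is the base-2 logarithm of the AQ depth (as the field name suggests) and that the number of AQs in a direction is numQueueGroups multiplied by numAqsPerGroups. The dictionary and function names are illustrative only.

```python
# Values copied from the acc100 section of the sample configuration.
SAMPLE_ACC100 = {
    "uplink4G": {"numQueueGroups": 0, "numAqsPerGroups": 16, "aqDepthLog2": 4},
    "downlink4G": {"numQueueGroups": 0, "numAqsPerGroups": 16, "aqDepthLog2": 4},
    "uplink5G": {"numQueueGroups": 4, "numAqsPerGroups": 16, "aqDepthLog2": 4},
    "downlink5G": {"numQueueGroups": 4, "numAqsPerGroups": 16, "aqDepthLog2": 4},
}


def summarize(bbdev_config):
    """Return the total AQ count and AQ depth for each direction."""
    summary = {}
    for direction, cfg in bbdev_config.items():
        summary[direction] = {
            # Assumption: AQs per direction = queue groups x AQs per group.
            "total_aqs": cfg["numQueueGroups"] * cfg["numAqsPerGroups"],
            # Assumption: aqDepthLog2 is log2 of the depth, so depth = 2 ** value.
            "aq_depth": 2 ** cfg["aqDepthLog2"],
        }
    return summary


if __name__ == "__main__":
    for direction, values in summarize(SAMPLE_ACC100).items():
        print(direction, values)
```

Under these assumptions the sample assigns no queue groups to 4G and four queue groups each to 5G uplink and downlink, which corresponds to a 5G-only configuration.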

sriov_fec_cluster_config parameters description:

  • name: Name of the specific config.

  • cluster_config_name: Name of the cluster config.

  • priority: Priority of deployment (lower number higher priority).

  • drainskip: Allows for skipping the draining of the node after config application.

  • selected_node: (Optional) Field that can be used to target only a specific node.

  • pf_driver: The PF driver to be used: igb_uio or pci-pf-stub.

  • vf_driver: The VF driver to be used: vfio-pci or igb_uio.

  • vf_amount: The number of VFs to be created for the device.

  • bbdevconfig:

    • pf_mode: The mode in which the accelerator will be programmed. It is expected that VFs will be used, so this is set to false.

    • num_vf_bundles: Number of VF bundles. This should correspond to the vf_amount field.

    • max_queue_size: Maximum queue size. This field is not expected to change in most deployments.

    • ul4g_num_queue_groups: Number of 4G Uplink queue groups. There are 8 queue groups in total that can be distributed between 4G/5G Uplink/Downlink.

    • ul4g_num_aqs_per_groups: Number of AQs per group. Not expected to change for most deployments.

    • ul4g_aq_depth_log2: Log2 of the AQ depth.

    • dl4g_num_queue_groups: Number of 4G Downlink queue groups. There are 8 queue groups in total that can be distributed between 4G/5G Uplink/Downlink.

    • dl4g_num_aqs_per_groups: Number of AQs per group. Not expected to change for most deployments.

    • dl4g_aq_depth_log2: Log2 of the AQ depth.

    • ul5g_num_queue_groups: Number of 5G Uplink queue groups. There are 8 queue groups in total that can be distributed between 4G/5G Uplink/Downlink; in the sample, 4 queue groups are used for 5G Uplink.

    • ul5g_num_aqs_per_groups: Number of AQs per group. Not expected to change for most deployments.

    • ul5g_aq_depth_log2: Log2 of the AQ depth.

    • dl5g_num_queue_groups: Number of 5G Downlink queue groups. There are 8 queue groups in total that can be distributed between 4G/5G Uplink/Downlink; in the sample, 4 queue groups are used for 5G Downlink.

    • dl5g_num_aqs_per_groups: Number of AQs per group. Not expected to change for most deployments.

    • dl5g_aq_depth_log2: Log2 of the AQ depth.
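As an illustration of how the flat parameters listed above could relate to the nested SriovFecClusterConfig spec, the Python sketch below builds the spec and applies two checks taken from the descriptions: the queue groups across all four directions must not exceed 8, and num_vf_bundles should match vf_amount. The name-to-field mapping is inferred by comparing the parameter names with the sample configuration and is an assumption, as are the function and constant names. The drainskip parameter and the accelerator selector are left out because their mapping is not shown in this spec.

```python
MAX_QUEUE_GROUPS = 8  # total stated in the parameter descriptions

# Assumed mapping from parameter prefix to the bbDevConfig section name.
DIRECTIONS = {
    "ul4g": "uplink4G",
    "dl4g": "downlink4G",
    "ul5g": "uplink5G",
    "dl5g": "downlink5G",
}


def build_spec(params):
    """Validate flat parameters and return a nested spec dictionary."""
    total_groups = sum(params[f"{p}_num_queue_groups"] for p in DIRECTIONS)
    if total_groups > MAX_QUEUE_GROUPS:
        raise ValueError(
            f"{total_groups} queue groups requested, at most {MAX_QUEUE_GROUPS} available"
        )
    if params["num_vf_bundles"] != params["vf_amount"]:
        raise ValueError("num_vf_bundles should correspond to vf_amount")

    acc100 = {
        "pfMode": params["pf_mode"],
        "numVfBundles": params["num_vf_bundles"],
        "maxQueueSize": params["max_queue_size"],
    }
    for prefix, section in DIRECTIONS.items():
        acc100[section] = {
            "numQueueGroups": params[f"{prefix}_num_queue_groups"],
            "numAqsPerGroups": params[f"{prefix}_num_aqs_per_groups"],
            "aqDepthLog2": params[f"{prefix}_aq_depth_log2"],
        }

    spec = {
        "priority": params["priority"],
        "physicalFunction": {
            "pfDriver": params["pf_driver"],
            "vfDriver": params["vf_driver"],
            "vfAmount": params["vf_amount"],
            "bbDevConfig": {"acc100": acc100},
        },
    }
    if params.get("selected_node"):  # optional: target a single node
        spec["nodeSelector"] = {"kubernetes.io/hostname": params["selected_node"]}
    return spec


# Parameters equivalent to the sample cluster configuration.
SAMPLE_PARAMS = {
    "priority": 1, "pf_driver": "pci-pf-stub", "vf_driver": "vfio-pci",
    "vf_amount": 16, "pf_mode": False, "num_vf_bundles": 16,
    "max_queue_size": 1024,
}
for _prefix, _groups in (("ul4g", 0), ("dl4g", 0), ("ul5g", 4), ("dl5g", 4)):
    SAMPLE_PARAMS[f"{_prefix}_num_queue_groups"] = _groups
    SAMPLE_PARAMS[f"{_prefix}_num_aqs_per_groups"] = 16
    SAMPLE_PARAMS[f"{_prefix}_aq_depth_log2"] = 4

if __name__ == "__main__":
    import json
    print(json.dumps(build_spec(SAMPLE_PARAMS), indent=2))
```

Rejecting an invalid combination before it is submitted mirrors the parameter validation that the operator itself is described as performing. The checks here are only the two that this spec states explicitly.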

REST API impact

Standard extension of K8s APIs based on introduction of SriovFecClusterConfig CRD.

Security impact

The existing K8s authentication and authorization mechanisms apply to the standard extension of K8s APIs introduced by the SriovFecClusterConfig CRD.

Other end user impact

End users will be able to configure FEC devices in more detail.

Performance Impact

  • With the existing method (method-1), resources (CPU and memory) are consumed only while the configuration is being applied.

  • With the FEC Operator method, service PODs run on master and worker nodes all the time and consume some CPU and memory from cluster housekeeping resources. We believe this consumption to be negligible.

  • Periodic reconciliation between controller-manager and fec-daemon may also consume network resources; this is assumed to be negligible.

Other deployer impact

None.

Upgrade impact

None. The sriov-fec-operator application is optional.

Implementation

Assignee(s)

Primary assignee:

  • Balendu Mouli Burla (balendu)

Other contributors:

  • Nidhi Shivashankara Belur (nshivash)

Repos Impacted

A new system-application repo will be created for the definition and building of the new sriov-fec-operator application.

Work Items

  • Create sriov-fec-operator application package

  • Integrate the sriov-fec-operator application with FluxCD. Add application upload/apply/remove/delete commands.

  • Update docs.starlingx.io with instructions on how to configure FEC devices using the FEC Operator application.

Dependencies

None

Testing

  • Testing will be performed on both Simplex and Duplex deployment configurations.

  • The following functional validations will be performed:

    • Check that the FEC Operator is disabled by default when the node starts up for the first time.

    • Check the static configuration of FEC devices to make sure existing functionality is unaffected.

    • Check the enable/disable functionality of the FEC Operator in the cluster.

    • Configure the FEC device with the FEC Operator, make sure it overrides the default configuration, and verify the FEC functionality.

    • Delete the CRD configuration, reconfigure the device through static configuration, and verify the FEC functionality.

    • Configure the device through the FEC Operator, reboot the node, and check that the node comes up with the new configuration applied through the FEC Operator.

Documentation Impact

docs.starlingx.io will be updated for:

  • How to upload and apply the sriov-fec-operator application.

  • How to perform enhanced configuration of FEC devices with SriovFecClusterConfig CRD.

References

Intel FEC Operator: https://github.com/smart-edge-open/openshift-operator/blob/main/spec/openshift-sriov-fec-operator.md

Acronyms

  • FEC : Forward Error Correction

  • LTE : Long Term Evolution

  • vRAN : Virtual Radio Access Network

  • SR-IOV : Single Root Input/Output Virtualization

  • PF : Physical Function

  • VF : Virtual Function

  • CRD : Custom Resource Definition

History

Initial Version.