Configure NVIDIA GPU Operator for PCI Passthrough

This is a pre-release feature and may not function as described in StarlingX 5 documentation.

This section provides instructions for configuring NVIDIA GPU Operator.

About this task


NVIDIA GPU Operator is supported only for the standard performance kernel profile. The low-latency performance kernel profile is not supported.

NVIDIA GPU Operator automates the installation, maintenance, and management of the NVIDIA software needed to provision NVIDIA GPUs, and the provisioning of pods that require GPU resources.

NVIDIA GPU Operator is delivered as a Helm chart that installs a number of services and pods. These automate the provisioning of NVIDIA GPUs with the required NVIDIA software components, which include:

  • NVIDIA drivers (to enable CUDA, a parallel computing platform)

  • Kubernetes device plugin for GPUs

  • NVIDIA Container Runtime

  • Automatic Node labelling

  • DCGM (NVIDIA Data Center GPU Manager) based monitoring


Download the gpu-operator-v3- file at

Use the following steps to configure the GPU Operator container:


  1. Lock the host(s).

    ~(keystone_admin)]$ system host-lock <hostname>
  2. Set the container runtime host path to the NVIDIA runtime, which will be installed by the GPU Operator Helm deployment.

    ~(keystone_admin)]$ system service-parameter-add platform container_runtime custom_container_runtime=nvidia:/usr/local/nvidia/toolkit/nvidia-container-runtime
  3. Unlock the host(s). After a host is unlocked, it reboots automatically.

    ~(keystone_admin)]$ system host-unlock <hostname>
  4. Create the RuntimeClass resource definition and apply it to the system.

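    cat > nvidia.yml << EOF
    kind: RuntimeClass
    # Assumed API version; Kubernetes 1.20 and later also serve node.k8s.io/v1
    apiVersion: node.k8s.io/v1beta1
    metadata:
      name: nvidia
    handler: nvidia
    EOF
    ~(keystone_admin)]$ kubectl apply -f nvidia.yml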
  5. Install the GPU Operator Helm charts.

    ~(keystone_admin)]$ helm install gpu-operator /path/to/gpu-operator-
  6. Check if the GPU Operator is deployed using the following command.

    ~(keystone_admin)]$ kubectl get pods -A
    NAMESPACE                     NAME                                                          READY   STATUS      RESTARTS   AGE
    default                       gpu-operator-596c49cb9b-2tdlw                                 1/1     Running     1          24h
    default                       gpu-operator-node-feature-discovery-master-7f87b4d6bb-wsbn4   1/1     Running     2          24h
    default                       gpu-operator-node-feature-discovery-worker-hqzvw              1/1     Running     4          24h
    gpu-operator-resources        nvidia-container-toolkit-daemonset-8f7nl                      1/1     Running     0          14h
    gpu-operator-resources        nvidia-device-plugin-daemonset-g9lmk                          1/1     Running     0          14h
    gpu-operator-resources        nvidia-device-plugin-validation                               0/1     Pending     0          24h
    gpu-operator-resources        nvidia-driver-daemonset-9mnwr                                 1/1     Running     0          14h

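    The nvidia-device-plugin-validation pod is expected to reach the Completed status once validation finishes; in the sample output above it is still Pending.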

  7. Check if the resources are available using the following command.

    ~(keystone_admin)]$ kubectl describe nodes <hostname> | grep nvidia
  8. Define a pod that uses the NVIDIA RuntimeClass and requests a GPU resource. Create the nvidia-usage-example-pod.yml file to launch a pod that uses an NVIDIA GPU. For example:

    cat <<EOF > nvidia-usage-example-pod.yml
    apiVersion: v1
    kind: Pod
    metadata:
      name: nvidia-usage-example-pod
    spec:
      runtimeClassName: nvidia
      containers:
        - name: nvidia-usage-example-pod
          image: nvidia/samples:cuda10.2-vectorAdd
          imagePullPolicy: IfNotPresent
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 300000; done;" ]
          # Assumed GPU request, using the device plugin resource name
          resources:
            limits:
              nvidia.com/gpu: 1
    EOF
  9. Create the pod using the following command.

    ~(keystone_admin)]$ kubectl create -f nvidia-usage-example-pod.yml
  10. Check that the pod has been set up correctly. The status of the NVIDIA device is displayed in the table.

    ~(keystone_admin)]$ kubectl exec -it nvidia-usage-example-pod -- nvidia-smi
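    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
    | N/A   28C    P8    14W /  70W |      0MiB / 15109MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+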

    For information on deleting the GPU Operator, see Delete the GPU Operator.
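
The manual checks in steps 6 and 7 can also be scripted. The following is a minimal sketch, not part of StarlingX or the GPU Operator: the function names gpu_allocatable and gpu_pods_healthy are hypothetical, and it assumes that the device plugin advertises GPUs under the nvidia.com/gpu resource name and that kubectl is already configured for the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not StarlingX or GPU Operator tooling.

# Print the number of nvidia.com/gpu devices a node reports as allocatable.
# Prints 0 when the node does not expose the resource.
gpu_allocatable() {
    local node="$1" count
    # The dot in the resource name is escaped so that JSONPath treats
    # "nvidia.com/gpu" as a single key.
    count=$(kubectl get node "$node" \
        -o jsonpath='{.status.allocatable.nvidia\.com/gpu}')
    echo "${count:-0}"
}

# Succeed only if every pod in the namespace is Running or Completed.
# Pods in any other state are listed on standard output.
gpu_pods_healthy() {
    local ns="$1"
    kubectl get pods -n "$ns" --no-headers |
        awk '$3 != "Running" && $3 != "Completed" {
                 bad = 1
                 print "not ready: " $1 " (" $3 ")"
             }
             END { exit (bad ? 1 : 0) }'
}
```

For example, after sourcing the file, `gpu_allocatable <hostname>` prints the GPU count for that host, and `gpu_pods_healthy gpu-operator-resources` returns a non-zero exit status while any pod in that namespace is still Pending.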