Node Assignment Using NodeSelector, Affinity and Anti-Affinity, Taints, and Tolerations

Setting spec.nodeSelector constrains the scheduler to schedule VMs only on nodes that carry the specified labels. In the following example, the VMI's nodeSelector requires the label app: kload, which in this case is present on controller-1.

Example of a VM manifest using nodeSelector:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-nocloud
spec:
  nodeSelector:
    app: kload
  terminationGracePeriodSeconds: 30
  domain:
    resources:
      requests:
        memory: 1024M
    devices:
      disks:
      - name: containerdisk
        disk:
          bus: virtio
      - name: emptydisk
        disk:
          bus: virtio
      - name: cloudinitdisk
        disk:
          bus: virtio
  volumes:
  - name: containerdisk
    containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
  - name: emptydisk
    emptyDisk:
      capacity: "2Gi"
  - name: cloudinitdisk
    cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
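
For the nodeSelector to match, the target node must actually carry the label. A quick way to apply it, assuming the node name controller-1 mentioned above:

kubectl label nodes controller-1 app=kload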

The spec.affinity field allows specifying hard and soft affinity for VMs. It is possible to write matching rules against workloads (VMs and pods) and against nodes. Since VMs are a workload type based on pods, pod affinity affects VMs as well.
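
Node affinity follows the same pattern but matches node labels rather than workload labels. A minimal sketch of a nodeAffinity rule under spec.affinity; the zone value zone-a is illustrative and not taken from the examples in this section:

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - zone-a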

Pod affinity allows you to specify which pods/VMs should be scheduled together (on the same node or in the same topology domain, such as a zone).

In the example below, requiredDuringSchedulingIgnoredDuringExecution means that the scheduler must place the pod/VM on a node that matches the affinity rules at scheduling time; once the pod is running, the rule is no longer enforced, even if conditions change (e.g., pods with the matching label are removed).

The rule here says that the pod/VM must be scheduled into a topology domain that already contains pods/VMs with the following characteristics: Label selector: it looks for pods/VMs with the label security and the value S1; this is defined in the matchExpressions part of the configuration.

TopologyKey: failure-domain.beta.kubernetes.io/zone means that the pod/VM being scheduled must end up in the same zone as the pods with the security: S1 label. Essentially, the pod is scheduled into the same failure domain (zone) as other pods carrying security: S1. (On current Kubernetes versions, the equivalent label is topology.kubernetes.io/zone.)
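
For this selector to match anything, at least one running pod/VMI in the target zone must carry the security: S1 label. A minimal sketch of such a label on a peer VMI's metadata (the name peer-vmi is illustrative):

metadata:
  name: peer-vmi
  labels:
    security: S1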

Pod anti-affinity specifies that certain pods/VMs should not be scheduled together (on the same node or in the same topology domain).

preferredDuringSchedulingIgnoredDuringExecution, by contrast, is a “soft” rule: the scheduler prefers to place the pod according to the anti-affinity rule, but it is not required to. If the rule cannot be met, the pod/VM is still scheduled; the scheduler simply follows the rule when possible.

The rule here says that the scheduler prefers to place the pod/VM on a node that does not already run pods/VMs with the following characteristics: Label selector: pods/VMs with the label security and the value S2.

TopologyKey: kubernetes.io/hostname means that the anti-affinity applies per node (i.e., the pod should preferably not be placed on the same node as pods/VMs with the security: S2 label).

Weight: The weight of 100 indicates the strength of the preference (valid weights range from 1 to 100). A higher weight means the scheduler tries harder to respect the rule, but it is still not guaranteed.

Example of a VM manifest using affinity and anti-affinity:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-affinity  # a name is required for a valid manifest; this one is illustrative
spec:
  nodeSelector:
    cpu: slow
    storage: fast
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        lun: {}
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  volumes:
    - name: mypvcdisk
      persistentVolumeClaim:
        claimName: mypvc

Affinity, as described above, is a property of VMs that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite - they allow a node to repel a set of VMs.

Taints and tolerations work together to ensure that VMs are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks the node as not accepting any VMs that do not tolerate the taints. Tolerations are applied to VMs and allow (but do not require) the VMs to schedule onto nodes with matching taints.

Example of a VM manifest using a taint and a toleration.

You add a taint to a node using kubectl taint. For example, kubectl taint nodes node1 key=value:NoSchedule.

Below is an example of adding a toleration of this taint to a VM:

metadata:
  name: testvmi-ephemeral
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  nodeSelector:
    cpu: slow
    storage: fast
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        lun: {}
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"