Node Assignment Using NodeSelector, Affinity and Anti-Affinity, Taints, and Tolerations
Setting `spec.nodeSelector` requirements constrains the scheduler to schedule VMs only on nodes that carry the specified labels. In the following example the VMI requires the label `app: kload`, which in this case is present on controller-1.
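For the selector to match, the target node must actually carry the label; assuming the node name controller-1 from above, it can be added with, for example, `kubectl label nodes controller-1 app=kload`.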
Example of a VM manifest using `nodeSelector`:
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-nocloud
spec:
  nodeSelector:
    app: kload
  terminationGracePeriodSeconds: 30
  domain:
    resources:
      requests:
        memory: 1024M
    devices:
      disks:
      - name: containerdisk
        disk:
          bus: virtio
      - name: emptydisk
        disk:
          bus: virtio
      - disk:
          bus: virtio
        name: cloudinitdisk
  volumes:
  - name: containerdisk
    containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
  - name: emptydisk
    emptyDisk:
      capacity: "2Gi"
  - name: cloudinitdisk
    cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
```
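Note that if no schedulable node carries the requested label, the VMI's virt-launcher pod remains Pending, so the VMI does not start until a matching node is available.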
The `spec.affinity` field allows specifying hard and soft affinity for VMs. It is possible to write matching rules against workloads (VMs and pods) and against nodes. Since VMs are a workload type based on pods, pod affinity affects VMs as well.
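Matching against node labels is expressed with `nodeAffinity` under the same `spec.affinity` field. A minimal sketch, assuming the standard node label `kubernetes.io/arch` (the VMI name and the `amd64` value are illustrative):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-nodeaffinity   # illustrative name
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only schedule on nodes whose labels match.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch   # standard node label
            operator: In
            values:
            - amd64                   # assumed value for this sketch
  domain:
    resources:
      requests:
        memory: 64M
    devices: {}
```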
Pod affinity allows you to specify which pods/VMs should be scheduled together (on the same node or in the same topology, such as a zone).
Here `requiredDuringSchedulingIgnoredDuringExecution` means that the scheduler must place the pod/VM on a node that matches the affinity rules during scheduling; once the pod is running, the rule is ignored even if circumstances change (e.g., pods with the matching label are removed).
The affinity rule in the example below says that the pod must be scheduled on a node that already hosts pods/VMs with the following characteristics:

- Label selector: it looks for pods/VMs with the label key `security` and the value `S1`. This is defined in the `matchExpressions` part of the configuration.
- TopologyKey: `failure-domain.beta.kubernetes.io/zone` means that pods with the `security: S1` label should be in the same zone as the pod/VM being scheduled. Essentially, the pod should be scheduled in the same failure domain (zone) as other pods with the `security: S1` label. (On current Kubernetes versions the equivalent label is `topology.kubernetes.io/zone`.)
Pod anti-affinity specifies that certain pods/VMs should not be scheduled together (on the same node or in the same topology).
Here `preferredDuringSchedulingIgnoredDuringExecution` is a "soft" rule: the scheduler prefers to place the pod/VM according to the anti-affinity rule, but this is not required. If the rule cannot be met, the pod/VM is still scheduled; the scheduler merely tries to follow the rule when possible.
The anti-affinity rule in the example below says that the scheduler prefers to place the pod/VM on a node that avoids other pods with the following characteristics:

- Label selector: pods/VMs with the label key `security` and the value `S2`.
- TopologyKey: `kubernetes.io/hostname` means that the anti-affinity applies per node (i.e., the pod should prefer not to be placed on the same node as pods/VMs with the `security: S2` label).
- Weight: the weight of 100 indicates the strength of the preference. A higher weight means the scheduler tries harder to respect the rule, but it is still not guaranteed.
Example of a VM manifest using affinity and anti-affinity:
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  nodeSelector:
    cpu: slow
    storage: fast
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        lun: {}
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  volumes:
  - name: mypvcdisk
    persistentVolumeClaim:
      claimName: mypvc
```
Affinity, as described above, is a property of VMs that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite: they allow a node to repel a set of VMs.
Taints and tolerations work together to ensure that VMs are not scheduled onto inappropriate nodes. One or more taints are applied to a node, marking that the node should not accept any VMs that do not tolerate those taints. Tolerations are applied to VMs and allow (but do not require) the VMs to schedule onto nodes with matching taints.
Example of a VM manifest using a taint and a toleration.
You add a taint to a node using `kubectl taint`. For example: `kubectl taint nodes node1 key=value:NoSchedule`.
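Equivalently, the taint shows up in the Node object's spec; a minimal sketch of what `node1` looks like after the command above (only the relevant fields shown):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node1
spec:
  taints:
  # Added by `kubectl taint nodes node1 key=value:NoSchedule`
  - key: "key"
    value: "value"
    effect: "NoSchedule"
```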
Below is an example of adding a toleration of this taint to a VM:
```yaml
metadata:
  name: testvmi-ephemeral
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  nodeSelector:
    cpu: slow
    storage: fast
  domain:
    resources:
      requests:
        memory: 64M
    devices:
      disks:
      - name: mypvcdisk
        lun: {}
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
```
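A toleration can also use the `Exists` operator instead of `Equal`, in which case it matches the taint regardless of its value. A minimal sketch of the alternative toleration block (same hypothetical key as above):

```yaml
tolerations:
- key: "key"
  operator: "Exists"   # no value needed; tolerates any value of this key
  effect: "NoSchedule"
```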