OVS-DPDK containerization

Storyboard: https://storyboard.openstack.org/#!/story/2005496

As StarlingX moves to containerization, most OpenStack components have been containerized. That includes OVS, but OVS-DPDK still runs on the host. This story implements OVS-DPDK containerization.

Problem description

Currently, StarlingX supports OVS and OVS-DPDK. OVS is managed by openstack-helm and runs in a container, whereas OVS-DPDK is managed by puppet and runs directly on the host. Containerizing OVS-DPDK brings the usual benefits of containerization. It also removes the cost of maintaining two implementations and keeping them consistent.

Use Cases

Without OVS-DPDK containerization:

  • If we want to change OVS (upgrade the OVS version, enable some features), we need to make the change in two places.

  • If we want to support another host OS distribution (e.g. Ubuntu), we need to build the OVS/DPDK packages for that distribution, because OVS-DPDK runs on the host.

Proposed change

This story includes changes to StarlingX and to upstream openstack-helm. The upstream openstack-helm patches are already in review.

‘ovs-dpdk’ and ‘none’ are the vswitch types we support for now. ‘ovs-dpdk’ means running OVS-DPDK on the host; ‘none’ means running OVS (without DPDK) in a container. For containerized OVS-DPDK we do not create a new vswitch type; instead we enhance the ‘none’ type to support DPDK, so that ‘none’ covers both OVS and containerized OVS-DPDK. A new Kubernetes node label (openvswitch-dpdk=enabled) controls whether DPDK is enabled. Once this story is completed, the ‘ovs-dpdk’ type will no longer be maintained.

Hugepages need to be reserved for DPDK. Currently, the reservation is done by sysinv/puppet, and in this story it will still be handled by sysinv/puppet; openstack-helm just uses the reserved hugepages. StarlingX reserves hugepages for DPDK and nova-compute, and we can run ‘system host-memory-show controller-0’ to show the hugepages info. StarlingX has a default policy for hugepages allocation; users can override the default with ‘system host-memory-modify’. As k8s doesn’t support multiple hugepage sizes, we can only reserve hugepages of a single size.

[wrsroot@controller-0 ~(keystone_admin)]$  system host-memory-show controller-0 0
+-------------------------------------+--------------------------------------+
| Property                            | Value                                |
+-------------------------------------+--------------------------------------+
| Memory: Usable Total (MiB)          | 9181                                 |
|         Platform     (MiB)          | 7600                                 |
|         Available    (MiB)          | 9181                                 |
| Huge Pages Configured               | True                                 |
| vSwitch Huge Pages: Size (MiB)      | 2                                    |
|                     Total           | 512                                  |
|                     Available       | 0                                    |
|                     Required        | None                                 |
| Application  Pages (4K): Total      | 1826048                              |
| Application  Huge Pages (2M): Total | 1024                                 |
|                 Available           | 1024                                 |
| Application  Huge Pages (1G): Total | 0                                    |
|                 Available           | None                                 |
| uuid                                | 56be1dc6-dc10-4318-88e3-953f75eb6684 |
| ihost_uuid                          | 3fc748fa-a831-42f0-8c67-d15786806d6b |
| inode_uuid                          | c4ee7258-fd13-4520-80f5-62c93e2e2b20 |
| created_at                          | 2019-04-28T06:08:42.884178+00:00     |
| updated_at                          | 2019-05-05T06:21:04.987518+00:00     |
+-------------------------------------+--------------------------------------+

From the above output, we can see that 512 hugepages of 2M each are reserved for OVS-DPDK. In this story, the openvswitch helm plugin will be updated to generate the memory configuration (dpdk-socket-mem) for the openvswitch chart according to the reserved hugepages info. If multiple NUMA nodes exist on the compute node, hugepages should be allocated on every NUMA node.
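
As a minimal sketch of that calculation, assume the plugin can read the vSwitch hugepage size and count for each NUMA node from sysinv. The `NumaHugepages` record and the `socket_mem` helper are hypothetical names used only for illustration. The resulting `dpdk-socket-mem` value is a comma-separated list of MiB per NUMA node, with 0 for a node that has no reservation:

```python
from collections import namedtuple

# Hypothetical record, one per NUMA node, mirroring the
# "vSwitch Huge Pages" fields shown by 'system host-memory-show'.
NumaHugepages = namedtuple("NumaHugepages", ["numa_node", "size_mib", "count"])


def socket_mem(nodes):
    """Return a dpdk-socket-mem string: MiB per NUMA node, comma separated."""
    if not nodes:
        return ""
    by_node = {n.numa_node: n.size_mib * n.count for n in nodes}
    # Emit one entry per NUMA node index so that positions match node ids.
    return ",".join(str(by_node.get(i, 0)) for i in range(max(by_node) + 1))


# Single NUMA node with 512 pages of 2 MiB, as in the output above.
print(socket_mem([NumaHugepages(0, 2, 512)]))          # 1024
# Two NUMA nodes, each with the same reservation.
print(socket_mem([NumaHugepages(1, 2, 512),
                  NumaHugepages(0, 2, 512)]))          # 1024,1024
```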

To run OVS-DPDK in a container, we need to enable the Kubernetes hugepages feature. Currently Kubernetes doesn’t support multiple hugepage sizes on a single node. I have opened the multiple size issue to track it.
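
Kubernetes exposes pre-allocated hugepages as a resource named after the page size (for example hugepages-2Mi), and a container's hugepage request must equal its limit. The sketch below shows how a resources stanza for the OVS-DPDK container could be derived from the reserved page size and count. The `hugepage_resources` helper is a hypothetical name, and where this stanza would sit in the chart values is not specified here:

```python
def hugepage_resources(size_mib, count):
    """Build a container 'resources' stanza for pre-allocated hugepages."""
    # The resource name carries the page size: hugepages-2Mi, hugepages-1Gi.
    if size_mib % 1024 == 0:
        resource = "hugepages-%dGi" % (size_mib // 1024)
    else:
        resource = "hugepages-%dMi" % size_mib
    amount = "%dMi" % (size_mib * count)
    # Hugepage requests must equal limits, so the same amount is used twice.
    return {"requests": {resource: amount}, "limits": {resource: amount}}


# 512 pages of 2 MiB, matching the vSwitch reservation shown earlier.
print(hugepage_resources(2, 512))
```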

The OVS-DPDK process contains two types of threads: control path threads and data path threads. The control path threads run on Platform cores just like all other pods. The data path threads, known as PMD threads, need to run on one or more dedicated cores, so StarlingX needs to reserve CPU cores for them. Currently StarlingX reserves CPU cores for non-containerized OVS-DPDK through sysinv, which generates the kernel parameter ‘isolcpus’. For containerized OVS-DPDK, CPU cores will be reserved in the same way. We can run ‘system host-cpu-list controller-0’ to show the CPU info. StarlingX has a default policy for CPU allocation; users can override the default with ‘system host-cpu-modify’.

[wrsroot@controller-0 ~(keystone_admin)]$ system host-cpu-list controller-0
+--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
| uuid                                 | log_c | processor | phy_c | thread | processor_model                           | assigned_function |
|                                      | ore   |           | ore   |        |                                           |                   |
+--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+
| a6189494-a2da-4f26-8a18-658d3fa5ad4f | 0     | 0         | 0     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
| c7d0de01-7c95-4b90-a423-d19d777e5b86 | 1     | 0         | 1     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Platform          |
| 0e644162-ee11-486d-8249-94099d34a160 | 2     | 0         | 2     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | vSwitch           |
| 3b13943e-5d8e-49ab-b63e-17311e314f32 | 3     | 0         | 3     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
| a36e8842-2f55-4697-bd89-f074b2e0c567 | 4     | 0         | 4     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
| a74c066b-5a9a-48bd-aeec-9e803e395f7f | 5     | 0         | 5     | 0      | Intel Core i7 9xx (Nehalem Class Core i7) | Applications      |
+--------------------------------------+-------+-----------+-------+--------+-------------------------------------------+-------------------+

From the above output, we can see that core 2 is allocated to the OVS-DPDK PMD threads. In this story, the openvswitch helm plugin will be updated to generate the CPU configuration (dpdk-lcore-mask, pmd-cpu-mask). ‘pmd-cpu-mask’ is the OVS parameter that specifies which CPU cores the PMD threads run on. The underlying mechanism for ‘pmd-cpu-mask’ is the cpuset cgroup. By default, all pods can only see the platform cores, so we need to change the cgroup of OVS at launch time. StarlingX also reserves CPU cores for nova-compute (assigned_function of Applications), which are finally rendered as ‘vcpu_pin_set’ in nova.conf.
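
Both masks are hexadecimal bitmasks in which bit N stands for logical core N. The following sketch derives them from the core assignment listed above. The `cpu_mask` helper is a hypothetical name, and placing the lcore (non-PMD) threads on the Platform cores is an assumption made for illustration:

```python
def cpu_mask(cores):
    """Return a hex bitmask with one bit set per logical core id."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return hex(mask)


# (log_core, assigned_function) pairs from the host-cpu-list output above.
cpus = [(0, "Platform"), (1, "Platform"), (2, "vSwitch"),
        (3, "Applications"), (4, "Applications"), (5, "Applications")]

pmd_cpu_mask = cpu_mask(c for c, func in cpus if func == "vSwitch")
# Assumption: non-PMD (lcore) threads stay on the Platform cores.
dpdk_lcore_mask = cpu_mask(c for c, func in cpus if func == "Platform")

print(pmd_cpu_mask)     # 0x4
print(dpdk_lcore_mask)  # 0x3
```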

When a compute node is unlocked, vswitch.pp does some OVS-related work: 1) bind the data network NICs to a Linux kernel module (vfio-pci by default in StarlingX); 2) create the OVS bridges; 3) add the NICs to the bridges. In this story, the first item can be covered by puppet, by openstack-helm, or by NetworkDeviceAttachment, which leverages the existing SRIOV CNI. The second and third items will be covered by openstack-helm. To create OVS bridges and add NICs to them, openstack-helm needs to know the bridge names and the NIC pci_id values. These parameters will be generated by the neutron helm plugin according to the info in sysinv.
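
To illustrate the kind of data the neutron helm plugin would need to produce, the sketch below turns (bridge name, pci_id) pairs into a nested override structure. The `dpdk_overrides` helper, the key names (`ovs_dpdk`, `bridges`, `nics`, `driver`), the `dpdkN` port naming, and the sample bridge names and PCI addresses are all assumptions made for illustration; the actual layout is defined by the openstack-helm chart patches under review:

```python
def dpdk_overrides(nics, driver="vfio-pci"):
    """Build illustrative chart overrides from (bridge, pci_id) pairs."""
    bridges = []
    nic_entries = []
    for index, (bridge, pci_id) in enumerate(nics):
        # Record each bridge once, keeping first-seen order.
        if bridge not in bridges:
            bridges.append(bridge)
        nic_entries.append({
            "name": "dpdk%d" % index,  # hypothetical port naming scheme
            "pci_id": pci_id,
            "bridge": bridge,
        })
    return {
        "ovs_dpdk": {
            "enabled": True,
            "driver": driver,
            "bridges": [{"name": name} for name in bridges],
            "nics": nic_entries,
        }
    }


# Hypothetical sysinv data: two data network NICs on two bridges.
overrides = dpdk_overrides([("br-phy0", "0000:05:00.0"),
                            ("br-phy1", "0000:05:00.1")])
print(overrides["ovs_dpdk"]["bridges"])
print(overrides["ovs_dpdk"]["nics"])
```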

Alternatives

None

Data model impact

None

REST API impact

None

Security impact

None

Other end user impact

As the k8s hugepage feature doesn’t support multiple hugepage sizes for now, we can allocate hugepages of only a single size. That means we can only create VMs with a single hugepage size. The limitation is described in the hugepage spec commit.

Performance Impact

No impact is expected.

For networking, the OVS-DPDK container uses the host’s native network.

For CPU/memory, although container resources are limited, the resources used by OVS are configured by OVS parameters rather than by container limits.

Other deployer impact

The ‘openvswitch-dpdk=enabled’ label is required on compute nodes to enable OVS-DPDK.

Developer impact

Once this feature is implemented, OVS-DPDK will no longer run on the host. The vswitch.pp file will therefore be removed, and openstack-helm will take over OVS-DPDK configuration.

Upgrade impact

None

Implementation

Assignee(s)

Primary assignee:

chengli3 <cheng1.li@intel.com>

Other contributors:

<launchpad-id or None>

Repos Impacted

starlingx/config, starlingx/integ

Work Items

  • Improve the OVS docker image to support DPDK (starlingx/integ). To support DPDK, DPDK should be installed in the OVS image and OVS should be built/installed with the DPDK configure option (--with-dpdk). The community OVS image already supports DPDK through an image patch. To build our own OVS image, we can author our OVS Dockerfile in the starlingx/integ project. The OVS/DPDK version will be the same as on the host. The docker image OS may need to be CentOS as well, as the OVS container mounts the host /lib/modules.

  • Make the OVS chart support DPDK (openstack-helm-infra). To support DPDK, OVS needs to be set up with the DPDK setup options. The OVS patch is in review.

  • Make the neutron chart support DPDK (openstack-helm).

  • Reserve hugepages for OVS-DPDK and enable the k8s hugepage feature (starlingx/config). Hugepages should be reserved for containerized OVS-DPDK in the same way as they are reserved for vswitch_type ‘ovs-dpdk’.

  • Generate DPDK-related configuration for the openstack deployment (starlingx/config). The openvswitch helm plugin needs to be updated to add the DPDK configuration. The neutron helm plugin should be updated as well.

  • Docs update (starlingx/docs). Update the installation guide.

Dependencies

  • Needs OVS version >=2.6 to support vhost-user reconnect.

Testing

The host NICs that are planned for data networks must support DPDK. Multiple hosts are needed to test connectivity across hosts.

The following cases are needed:

  • Create VMs and test the network connectivity between VMs as well as the external connectivity.

  • Check for any issues after a host reboot.

Documentation Impact

The installation guides on the wiki need to be updated. There will be a small difference for deployers in the vswitch type setting.

References

History

Revisions

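+--------------+-------------+
| Release Name | Description |
+--------------+-------------+
| Stein        | Introduced  |
+--------------+-------------+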