EdgeWorker Management Phase One¶
Storyboard: https://storyboard.openstack.org/#!/story/2008129
This story will introduce a new node personality ‘edgeworker’ to StarlingX.
The biggest difference between an ‘edgeworker’ node and a ‘worker’ node is that the OS of an ‘edgeworker’ node is not installed or configured by the StarlingX controller and may vary from case to case, for example Ubuntu, Debian, or Fedora. The basic idea is to deploy the containerd and kubelet services to the ‘edgeworker’ nodes, so that the StarlingX Kubernetes platform is extended to them.
The second difference is that ‘edgeworker’ nodes are usually deployed close to edge devices, while ‘worker’ nodes are usually servers deployed in the server room. The ‘edgeworker’ personality is suitable for nodes on which users want to install a customized OS and which may require deployment physically close to the data producer or consumer devices.
The way to leverage StarlingX functionality on these nodes is to containerize most flock agents and enable them on edgeworker nodes. This is also aligned with the long-term strategy of flock service containerization.
The whole topic is broken down into four approximate phases:
Phase One
Add edgeworker personality
Add ansible-playbook to join edgeworker node to STX K8S cluster
Support Ubuntu and CentOS as target OS
Phase Two
Containerize a set of flock agents to get edgeworker node inventoried
Enhance multiple Ceph cluster operation
Phase Three
Support Openstack running on edgeworker nodes
Support L3/Tunnel mgmt. network
Containerize rest of flock agents
Phase Four
Enable software management on edgeworker nodes
Enable optional authentication for new nodes
Extend target OS support
This spec focuses on Phase One.
Problem description¶
In a typical IoT or industrial use case, StarlingX is usually used to facilitate the setup and management of the whole edge cluster. But there are types of nodes in the cluster that are not in the current StarlingX management scope. Various reasons, from software to hardware, hinder administrators from deploying these nodes as ‘worker’ nodes. In particular, the common setbacks are:
The OS of the nodes cannot, or is not intended to, be installed by StarlingX.
The nodes are running a Type I hypervisor.
The hardware resources do not meet StarlingX worker node’s minimum requirement.
The nodes are connected to StarlingX controllers over a L3 network.
In this story, these nodes are categorized into a new personality to distinguish them from ‘worker’ nodes. The new personality is called ‘edgeworker’ since these nodes are usually deployed close to the edge device side. An edge device could be an I/O device, a camera, a servo motor, or a sensor.
The first three setbacks will be addressed in phase one, while the network requirement and manageability enhancements will be addressed in the following phases. Separate specs for the later phases will be submitted across releases.
Use Cases¶
Administrator wants to have all the ‘edgeworker’ nodes managed by StarlingX
List ‘edgeworker’ nodes in the host list (Phase one)
Check/Lock/Unlock ‘edgeworker’ node state (Phase two)
Query ‘edgeworker’ hardware resources info (Phase two)
Configure ‘edgeworker’ resources for specific usage (Phase two and later)
Manage alarms generated by ‘edgeworker’ (Phase three)
Update ‘edgeworker’ packages (Phase four)
Administrator does not want StarlingX to install OS on the ‘edgeworker’ nodes
User wants to orchestrate container workloads to ‘edgeworker’ nodes
User wants to orchestrate VM workloads to ‘edgeworker’ nodes as an option
Proposed change¶
Edgeworker personality
Adding a new personality will require changes in sysinv db, sysinv api and sysinv conductor, as well as cgts-client.
sysinv db
In order to get ‘edgeworker’ nodes into sysinv, the ‘edgeworker’ value will be added to the enum type invPersonalityEnum in the sysinv db. Accordingly, adding ‘edgeworker’ to the db models is required as well. After this change, a host can be assigned the edgeworker personality from the sysinv db perspective.
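The enum change above can be sketched as follows. The constant and list names here are illustrative assumptions, not the actual sysinv source:

```python
# A minimal sketch of the personality enum change, assuming illustrative
# names; the real sysinv constants and column definitions may differ.
EDGEWORKER = 'edgeworker'

# Personalities recognized by the host model, now including edgeworker.
PERSONALITIES = ['controller', 'worker', 'storage', EDGEWORKER]


def validate_personality(personality):
    """Reject any personality value outside the known enum."""
    if personality not in PERSONALITIES:
        raise ValueError("unrecognized personality: %s" % personality)
    return personality
```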
sysinv api
Changes mainly focus on the host API, adding checks during host add for ‘edgeworker’ hosts. Possible checks:
mgmt IP check if the mgmt network is not dynamic
host name validation
personality check
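The checks above could be sketched as below. The function name, signature, and the RFC 1123 hostname pattern are hypothetical illustrations, not the actual sysinv host API code:

```python
import re

# RFC 1123 hostname label pattern; a hypothetical stand-in for the
# hostname validation the host API could apply.
_HOSTNAME_RE = re.compile(r'^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$')


def check_edgeworker_host(hostname, mgmt_ip, dynamic_mgmt):
    """Sketch of semantic checks for 'system host-add' of an edgeworker."""
    if not _HOSTNAME_RE.match(hostname):
        raise ValueError("invalid hostname: %s" % hostname)
    # When the mgmt network is not dynamic, a static mgmt IP must be given.
    if not dynamic_mgmt and not mgmt_ip:
        raise ValueError("mgmt IP required when mgmt network is not dynamic")
```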
sysinv conductor
The sysinv conductor is responsible for mgmt IP allocation when the mgmt network is dynamic.
cgts client
Add the ‘edgeworker’ choice for the ‘personality’ argument of the host-add and host-update commands.
After the underlying changes are applied, the administrator is able to add an edgeworker node to the inventory with either
# system host-add -n <hostname> -p edgeworker
or
# system host-update <id> hostname=<hostname> personality=edgeworker
When an edgeworker node is added to the inventory, sysinv could provide the following services:
DHCP service (Phase one)
Host lock/unlock (Phase two)
Host interface modification and assignment (Phase two)
Host hardware resource query (Phase two)
Label assignment (Phase two)
Functions that will not be supported on edgeworker nodes:
host-upgrade
bmc integration
An edgeworker node is not a server, but a commodity machine such as an industrial PC, NUC, or workstation. BMC is not a required feature for those nodes. Node life cycle management is done in-band or by the maintainer manually. The use cases that involve edgeworker nodes do not expect out-of-band management for these nodes.
Additional semantic checks will be added for these unsupported functions.
Other functions will be described in detail in each phase’s spec.
ansible playbook for provisioning edgeworker nodes
The main steps for provisioning an edgeworker node are installing the kubelet, kubeadm, and containerd packages appropriate to the node’s Linux distribution, then joining the node to the StarlingX Kubernetes platform. Besides these steps, system configurations such as NTP setup, interface configuration, and DNS setup are needed as well.
The first two Linux distributions we propose to support for edgeworker are Ubuntu and CentOS.
The versions of all Kubernetes packages on edgeworker nodes must exactly match the packages on the controllers. If they do not, the playbook will reinstall the packages at the proper version.
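The reinstall decision amounts to a per-package version comparison, which can be sketched as below. The function name and the dict-based interface are illustrative assumptions, not playbook code:

```python
def packages_to_reinstall(node_pkgs, controller_pkgs):
    """Return the names of Kubernetes packages whose versions differ
    from the controller's, i.e. those the playbook must reinstall.

    Both arguments map package name -> version string, for example
    {'kubelet': '1.18.1', 'kubeadm': '1.18.1'}.
    """
    return sorted(name for name, ver in node_pkgs.items()
                  if controller_pkgs.get(name) != ver)
```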
The playbook sequence to provision an edgeworker node:
Preparations on controller
Send containerd config and cert to edgeworker
Generate K8S bootstrap token and calculate certificate hash
Preparations on edgeworker
Config network (interface and dns)
Setup proxy if needed
Install essential packages
Setup ntp
Add edgeworker node to STX Kubernetes
Install containerd, kubelet, kubeadm packages (based on OS)
Config sysctl and swap
Join k8s cluster
Download images
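The certificate hash computed on the controller is the standard kubeadm discovery hash: a SHA-256 digest of the DER-encoded Subject Public Key Info of the cluster CA certificate, passed to the join step as --discovery-token-ca-cert-hash. A minimal sketch of the formatting step, assuming the SPKI bytes have already been extracted from the certificate (e.g. with openssl):

```python
import hashlib


def discovery_ca_cert_hash(spki_der):
    """Format the kubeadm discovery hash ('sha256:<hex>') from the
    DER-encoded Subject Public Key Info of the cluster CA certificate.

    Extracting the SPKI bytes from the CA certificate is assumed to be
    done elsewhere; this only covers the digest and formatting.
    """
    return 'sha256:' + hashlib.sha256(spki_der).hexdigest()
```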
There will be one playbook with different roles included.
Alternatives¶
There are several open source projects that can provision a Kubernetes node.
Kubespray
Kubespray [1] is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes clusters configuration management tasks. Kubespray performs generic OS configuration as well as Kubernetes cluster bootstrapping.
Kubespray provides the whole functionality of provisioning a Kubernetes node, just like the edgeworker provisioning playbook does. However, Kubespray supports multiple container runtimes, multiple CNI plugins, and control plane bootstrap, which is far more functionality than is needed to provision an edgeworker.
What edgeworker needs is a playbook that targets a specific container runtime and CNI plugin and provisions a Kubernetes worker node only.
KubeEdge
KubeEdge [2] is an open source system for extending native containerized application orchestration capabilities to hosts at the edge. KubeEdge can run on top of an existing Kubernetes cluster and deploys a customized kubelet service called ‘edged’ to the edge node. Between the apiserver and edged, the EdgeController is the bridge that manages edge node and pod metadata so that data can be targeted to a specific edge node.
KubeEdge is able to provision edge nodes from the cloud. But its kubelet is customized to fulfill the specific requirement that the administrator can manage pods running on edge nodes from a public cloud platform. The customized kubelet (edged) brings compatibility issues whenever Kubernetes is upgraded to a newer release, which means extra effort to test and upgrade KubeEdge during each Kubernetes upgrade, since edgeworker provisioning is a key step to enable these nodes.
Besides, KubeEdge has a whole edge device management logic that is not in current StarlingX platform scope.
Data model impact¶
The only data model change is to insert ‘edgeworker’ to ‘invPersonalityEnum’ in sysinv db model.
REST API impact¶
None
Security impact¶
The potential security threats and their mitigations are:
Malicious node
It must be guaranteed by the administrator that no unauthorized node can physically connect to the management network. Authentication for edgeworker node onboarding will be introduced in later phases.
Malicious packages in edgeworker node
It must be guaranteed by the administrator that the packages running in edgeworker nodes are secure since the OS is managed by the administrator.
Other end user impact¶
None
Performance Impact¶
None
Other deployer impact¶
The deployer is required to run the edgeworker provisioning playbook after adding or updating a node with the edgeworker personality.
Developer impact¶
None
Upgrade impact¶
The kubelet needs to be upgraded during the Kubernetes upgrade process. The upgrade process will trigger an additional script/playbook to check the version of the packages on edgeworker nodes, and upgrade them according to their own distribution.
The distribution’s repo may not have the corresponding packages at the newest version. Due to the Kubernetes version skew support policy [3], kubelet and kube-proxy may be up to two minor versions older than the apiserver.
SW patching/updating will be addressed in phase four. It could be either a third-party solution or a plugin to the current SW management, because the current SW management can only patch/update RPMs, while the OSes of edgeworker nodes may use different package formats.
Implementation¶
Assignee(s)¶
- Primary assignee:
Mingyuan Qi
Repos Impacted¶
config
ansible-playbook
Work Items¶
The work items are already introduced in section Proposed change above.
Dependencies¶
None
Testing¶
Sysinv unit test
Sysinv host operation test
Adding edgeworker nodes in different deploy mode test
Simplex
Duplex
Standard
Ansible-playbook test for each target OS
Host configuration
Package installation
Edgeworker node join to the Kubernetes cluster
Documentation Impact¶
Add a new page to describe the edgeworker nodes’ requirements, limitations, and use cases.
Add new pages to describe the following deployments:
Duplex + edgeworker
Standard + edgeworker
Modify all deployment docs to insert an option to deploy edgeworker nodes and link to the corresponding deployment with edgeworker nodes.
References¶
History¶
Release Name | Description
---|---
stx.5.0 | Edgeworker management phase one introduced