Install a Subcloud Without Redfish Platform Management Service

For subclouds with servers that do not support Redfish Virtual Media Service, the ISO is installed locally at the subcloud. You can use the Central Cloud’s CLI to bootstrap subclouds from the Central Cloud.

About this task

After the hardware and network connectivity of a subcloud have been physically installed, the subcloud installation process has two phases:

  • Installing the ISO on controller-0; this is done locally at the subcloud by using either a bootable USB device or a local PXE boot server

  • Executing the dcmanager subcloud add command in the Central Cloud, which uses Ansible to bootstrap StarlingX on controller-0 in the subcloud

Note

After a successful remote installation of a subcloud in a Distributed Cloud system, a subsequent remote reinstallation fails because of an existing SSH host key entry in /root/.ssh/known_hosts on the System Controller. In this case, delete the host key entry, if present, from /root/.ssh/known_hosts on the System Controller before reinstalling.
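
As a minimal sketch of the cleanup described in the note, the stale entry can be removed with ssh-keygen -R. The function name forget_host_key is hypothetical, and the sketch assumes the entry is keyed by the address used to reach the subcloud (typically its bootstrap OAM address).

```shell
# Remove any stale host key for a subcloud from a known_hosts file.
# ssh-keygen -R also matches hashed host names and keeps the previous
# file contents in a backup with an ".old" suffix.
# forget_host_key is a hypothetical helper name, not a StarlingX command.
forget_host_key() {
    host=$1
    known_hosts=${2:-/root/.ssh/known_hosts}
    [ -f "$known_hosts" ] || return 0   # no file, nothing to clean up
    ssh-keygen -R "$host" -f "$known_hosts" >/dev/null 2>&1
}

# Example (run as root on the System Controller before a reinstallation):
#   forget_host_key <subcloud bootstrap address>
```

The second argument is only there so the helper can be tried against a copy of the file first.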

Prerequisites

  • You must have downloaded update-iso.sh from a StarlingX mirror.

  • In order to deploy subclouds from either controller, all local files that are referenced in the subcloud-bootstrap-values.yaml file must exist on both controllers (for example, /home/sysadmin/docker-registry-ca-cert.pem).
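
The second prerequisite can be checked with a small script run on each controller. The sketch below is a heuristic, not part of StarlingX: it treats every whitespace-delimited token that starts with "/" in the bootstrap values file as a local file reference, and the function names referenced_files and missing_files are hypothetical.

```shell
# Print every absolute path mentioned in a YAML file (heuristic: any
# whitespace-delimited token that starts with "/", with quotes removed).
referenced_files() {
    tr -s ' \t' '\n\n' < "$1" | tr -d "\"'" | grep '^/' | sort -u
}

# Print the referenced paths that do not exist on this controller.
missing_files() {
    referenced_files "$1" | while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Example (run on controller-0 and again on controller-1):
#   missing_files /home/sysadmin/subcloud1-bootstrap-values.yaml
```

Empty output on both controllers suggests that the referenced files are present; values such as subnets (192.168.101.0/24) and registry URLs are ignored because they do not start with "/".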

Procedure

  1. At the subcloud location, physically install the servers and network connectivity required for the subcloud.

    Note

    The servers require connectivity to a gateway router that provides IP routing between the subcloud management subnet and the System Controller management subnet, and between the subcloud OAM subnet and the System Controller subnet.

  2. Update the ISO image to modify installation boot parameters (if required), automatically select boot menu options, and add a kickstart file that automatically performs configuration such as setting up the initial IP Interface for bootstrapping.

    For subclouds, the initial IP Interface should be the planned OAM IP Interface for the subcloud.

    Use the update-iso.sh script from a StarlingX mirror. The script is used as follows:

    update-iso.sh --initial-password <password> -i <input bootimage.iso> -o <output bootimage.iso>
                    [ -a <ks-addon.cfg> ] [ -p param=value ]
                    [ -d <default menu option> ] [ -t <menu timeout> ]
         -i <file>: Specify input ISO file
         -o <file>: Specify output ISO file
         -a <file>: Specify ks-addon.cfg file
         --initial-password <password>: Specify the initial login password for sysadmin user
         --no-force-password: Do not force password change on initial login (insecure)
    
         -p <p=v>:  Specify boot parameter
                    Examples:
                    -p instdev=/dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
    
         -d <default menu option>:
                    Specify default boot menu option:
                    0 - Standard Controller, Serial Console
                    1 - Standard Controller, Graphical Console
                    2 - AIO, Serial Console
                    3 - AIO, Graphical Console
                    4 - AIO Low-latency, Serial Console
                    5 - AIO Low-latency, Graphical Console
                    NULL - Clear default selection
         -t <menu timeout>:
                    Specify boot menu timeout, in seconds
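
    To make the usage text concrete, the following sketch validates the -d and -t values against the ranges listed above and prints, rather than runs, a complete command line. The function name update_iso_cmd and the file names in the example are hypothetical; only the options come from the usage text.

    ```shell
    # Print an update-iso.sh command line after checking the menu option
    # (0-5 or NULL) and the timeout (whole seconds).
    # Arguments: <input iso> <output iso> <ks-addon file> <menu option> <timeout>
    update_iso_cmd() {
        in_iso=$1
        out_iso=$2
        addon=$3
        menu=$4
        timeout=$5
        case "$menu" in
            [0-5]|NULL) ;;
            *) echo "menu option must be 0-5 or NULL: $menu" >&2; return 1 ;;
        esac
        case "$timeout" in
            ''|*[!0-9]*) echo "timeout must be a number of seconds: $timeout" >&2; return 1 ;;
        esac
        # <password> is left as a placeholder to be filled in by hand.
        printf 'update-iso.sh --initial-password <password> -i %s -o %s -a %s -d %s -t %s\n' \
            "$in_iso" "$out_iso" "$addon" "$menu" "$timeout"
    }

    # Example: AIO, Serial Console (option 2) with a 10 second menu timeout.
    #   update_iso_cmd bootimage.iso subcloud1-bootimage.iso ks-addon.cfg 2 10
    ```

    Printing the command first gives a chance to review the options before the ISO is rebuilt.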
    

    The following example ks-addon.cfg file, used with the -a option, sets up an initial IP interface at boot time by defining a VLAN on an Ethernet interface and assigning it a static IPv6 address.

    In Debian, by default the ks-addon.cfg script is executed outside of the installing subcloud runtime (outside the chroot environment). As a result, the script does not have access to the kernel runtime command shell. Instead, the file system must be accessed via the provided $IMAGE_ROOTFS environment variable.

    If required, a chroot can be manually entered, allowing full access to the installing subcloud’s execution environment. See the ks-addon.cfg given below for an example.

    #### start ks-addon.cfg
    
    DEVICE=enp0s3
    OAM_VLAN=1234
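    OAM_ADDR="xxxx:xxxx:x:xxxx:xx:x:x:x"
    # Gateway used by the vlan${OAM_VLAN} stanza below; it must be set here,
    # otherwise the gateway line is written out empty.
    OAM_GW_ADDR="xxxx:xxxx:x:xxxx::x"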
    
    # This section is run outside of the subcloud target runtime.
    # The IMAGE_ROOTFS environment variable is set to the root of the target filesystem
    
    cat << EOF > ${IMAGE_ROOTFS}/etc/network/interfaces.d/ifcfg-${DEVICE}
    auto ${DEVICE}
    iface ${DEVICE} inet6 manual
    mtu 9000
    post-up echo 0 > /proc/sys/net/ipv6/conf/${DEVICE}/autoconf;\
    echo 0 > /proc/sys/net/ipv6/conf/${DEVICE}/accept_ra;\
    echo 0 > /proc/sys/net/ipv6/conf/${DEVICE}/accept_redirects
    EOF
    
    cat << EOF > ${IMAGE_ROOTFS}/etc/network/interfaces.d/ifcfg-vlan${OAM_VLAN}
    auto vlan${OAM_VLAN}
    iface vlan${OAM_VLAN} inet6 static
    vlan-raw-device ${DEVICE}
    address ${OAM_ADDR}
    netmask 64
    gateway ${OAM_GW_ADDR}
    mtu 1500
    post-up /usr/sbin/ip link set dev vlan${OAM_VLAN} mtu 1500;\
    echo 0 > /proc/sys/net/ipv6/conf/vlan${OAM_VLAN}/autoconf;\
    echo 0 > /proc/sys/net/ipv6/conf/vlan${OAM_VLAN}/accept_ra;\
    echo 0 > /proc/sys/net/ipv6/conf/vlan${OAM_VLAN}/accept_redirects
    EOF
    
    # If execution is required inside the chroot environment, you can manually enter the
    # chroot and run commands. Note: quotes around EOF are required:
    cat << "EOF" | chroot "${IMAGE_ROOTFS}" /bin/bash -s
      echo "ks-addon.cfg: inside chroot"
    
      # chrooted commands go here.
      # Commands are executed in the context of the installing subcloud.
    
    EOF
    
    #### end ks-addon.cfg
    

    After updating the ISO image, create a bootable USB with the ISO or put the ISO on a PXE boot server.

  3. At the subcloud location, install the StarlingX software from a USB device or a PXE boot server on the server designated as controller-0.

  4. At the subcloud location, verify that the OAM interface on the subcloud controller has been properly configured by the kickstart file added to the ISO.
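
    One way to perform this check is to look for the planned OAM address in the one-line output of the ip command. The sketch below assumes the VLAN interface name from the example ks-addon.cfg (vlan1234); the function name has_address is hypothetical.

    ```shell
    # Succeed if the given address appears in `ip -o addr show` output
    # read from stdin. The comparison is textual, so pass the address in
    # the compressed, lower-case form that the ip command prints.
    has_address() {
        awk -v want="$1" '
            $3 == "inet" || $3 == "inet6" {
                split($4, parts, "/")      # drop the /prefix length
                if (parts[1] == want) found = 1
            }
            END { exit !found }'
    }

    # Example (on the subcloud controller-0 console):
    #   ip -o addr show dev vlan1234 | has_address <planned OAM address> \
    #       && echo "OAM address configured"
    ```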

  5. Log in to the subcloud’s controller-0 and ping the Central Cloud’s floating OAM IP Address.

  6. At the System Controller, create a /home/sysadmin/subcloud1-bootstrap-values.yaml overrides file for the subcloud.

    For example:

    system_mode: simplex
    name: "subcloud1"
    
    description: "test"
    location: "loc"
    
    management_subnet: 192.168.101.0/24
    management_start_address: 192.168.101.2
    management_end_address: 192.168.101.50
    management_gateway_address: 192.168.101.1
    
    external_oam_subnet: 10.10.10.0/24
    external_oam_gateway_address: 10.10.10.1
    external_oam_floating_address: 10.10.10.12
    
    systemcontroller_gateway_address: 192.168.204.101
    
    docker_registries:
      k8s.gcr.io:
        url: registry.central:9001/k8s.gcr.io
      gcr.io:
        url: registry.central:9001/gcr.io
      ghcr.io:
        url: registry.central:9001/ghcr.io
      quay.io:
        url: registry.central:9001/quay.io
      docker.io:
        url: registry.central:9001/docker.io
      docker.elastic.co:
        url: registry.central:9001/docker.elastic.co
      defaults:
        username: sysinv
        password: <sysinv_password>
        type: docker
    

    Where <sysinv_password> can be found by running the following command as ‘sysadmin’ on the Central Cloud:

    $ keyring get sysinv services
    

    This configuration uses the local registry on the Central Cloud. If you prefer to use the default external registries, make the following substitutions for the docker_registries and additional_local_registry_images sections of the file.

    docker_registries:
      defaults:
        username: <your_wrs-aws.io_username>
        password: <your_wrs-aws.io_password>
    

    Note

    If you have a reason not to use the Central Cloud’s local registry, you can pull the images from another local private docker registry.

  7. You can use the Central Cloud’s local registry to pull images on subclouds. The HTTPS certificate of the Central Cloud’s local registry must have the Central Cloud’s OAM IP, registry.local, and registry.central in its SAN list. For example, a valid certificate contains the SAN list "DNS.1: registry.local DNS.2: registry.central IP.1: <floating management> IP.2: <floating OAM>".
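
    Before bootstrapping, the SAN list of a certificate file can be inspected with openssl. The sketch below is one possible check, assuming OpenSSL 1.1.1 or later for the -ext option; the function names san_entries and has_san are hypothetical.

    ```shell
    # Split the text printed by `openssl x509 -noout -ext subjectAltName`
    # into one trimmed entry per line (for example "DNS:registry.local").
    san_entries() {
        tr ',' '\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
    }

    # Succeed if the exact entry given as $1 is in the SAN text on stdin.
    has_san() {
        san_entries | grep -Fxq -- "$1"
    }

    # Example:
    #   san=$(openssl x509 -in <path_to_cert> -noout -ext subjectAltName)
    #   for entry in DNS:registry.local DNS:registry.central; do
    #       printf '%s\n' "$san" | has_san "$entry" || echo "missing: $entry"
    #   done
    ```

    IP entries can be checked the same way with an "IP Address:" prefix, but the match is textual, so the address must be written exactly as openssl prints it.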

    If required, run the following command on the Central Cloud prior to bootstrapping the subcloud to install the new certificate for the Central Cloud with the updated SAN list:

    ~(keystone_admin)]$ system certificate-install -m docker_registry path_to_cert
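
    The second installation phase named in About this task, running dcmanager subcloud add, is the point at which the bootstrap values file is consumed. As a hedged sketch, the helper below checks that the file is readable and prints a command line using the --bootstrap-address and --bootstrap-values options; the function name subcloud_add_cmd is hypothetical, and the exact set of options to use should be confirmed with dcmanager help subcloud add on your release.

    ```shell
    # Print a `dcmanager subcloud add` command line for review.
    # Arguments: <OAM IP of the subcloud controller-0> <bootstrap values file>
    subcloud_add_cmd() {
        bootstrap_address=$1
        values_file=$2
        if [ ! -r "$values_file" ]; then
            echo "bootstrap values file not readable: $values_file" >&2
            return 1
        fi
        printf 'dcmanager subcloud add --bootstrap-address %s --bootstrap-values %s\n' \
            "$bootstrap_address" "$values_file"
    }

    # Example (on the System Controller, in a keystone_admin shell):
    #   subcloud_add_cmd <subcloud controller-0 OAM IP> \
    #       /home/sysadmin/subcloud1-bootstrap-values.yaml
    ```

    The printed command is meant to be reviewed and then run by hand.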
    
  8. At the Central Cloud / System Controller, monitor the progress of the subcloud bootstrapping and deployment by using the deploy status field of the dcmanager subcloud list command.

    For example:

    ~(keystone_admin)]$ dcmanager subcloud list
    +----+-----------+------------+--------------+---------------+---------+
    | id | name      | management | availability | deploy status | sync    |
    +----+-----------+------------+--------------+---------------+---------+
    |  1 | subcloud1 | unmanaged  | online       | complete      | unknown |
    +----+-----------+------------+--------------+---------------+---------+
    

    If the deploy status field shows an installation, bootstrap, or deployment failure state, use the dcmanager subcloud errors command to get more detailed information about the failure.

    For example:

    sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud errors 1
    FAILED bootstrapping playbook of (subcloud1).
    
    detail: fatal: [subcloud1]: FAILED! => changed=true
      failed_when_result: true
      msg: non-zero return code
        500 Server Error: Internal Server Error ("manifest unknown: manifest unknown")
         Image download failed: admin-2.cumulus.mss.com: 30093/wind-river/cloud-platform-deployment-manager: WRCP_22.06 500 Server Error: Internal Server Error ("Get https://admin-2.cumulus .mss.com: 30093/v2/: dial tcp: lookup admin-2.cumulus.mss.com on 10.41.0.1:53: read udp 10.41.1.3:40251->10.41.0.1:53: i/o timeout")
         Image download failed: gcd.io/kubebuilder/kube-rdac-proxy:v0.11.0 500 Server Error: Internal Server Error ("Get https://gcd.io/v2/: dial tcp: lookup gcd.io on 10.41.0.1:53: read udp 10.41.1.3:52485->10.41.0.1:53: i/o timeout")
        raise Exception("Failed to download images %s" % failed_downloads)
         Exception: Failed to download images ["admin-2.cumulus.mss.com: 30093/wind-river/cloud-platform-deployment-manager: WRCP_22.06", "gcd.io kubebuilder/kube-rdac-proxy:v0.11.0"]
    FAILED TASK: TASK [common/push-docker-images Download images and push to local registry] Wednesday 12 October 2022 12:27:31 +0000 (0:00:00.042)
    0:16:34.495
    

    The deploy status field has the following values:

    Pre-Install

    This status indicates that the ISO for the subcloud is being updated by the Central Cloud with the boot menu parameters and kickstart configuration specified in the install-values.yaml file.

    Installing

    This status indicates that the subcloud’s ISO is being installed from the Central Cloud to the subcloud using the Redfish Virtual Media service on the subcloud’s BMC.

    Bootstrapping

    This status indicates that the Ansible bootstrap of StarlingX software on the subcloud’s controller-0 is in progress.

    Complete

    This status indicates that subcloud deployment is complete.

    The subcloud bootstrapping and deployment can take up to 30 minutes.
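
    Because deployment can take a while, it may be convenient to extract the deploy status field in a script instead of reading the table by eye. The sketch below locates the name and deploy status columns from the table header shown in the example above; the function name deploy_status is hypothetical.

    ```shell
    # Read `dcmanager subcloud list` table output on stdin and print the
    # "deploy status" value for the subcloud named in $1.
    deploy_status() {
        awk -F'|' -v name="$1" '
            function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
            # The first row containing "|" is the header: find the columns.
            !hdr && /\|/ {
                for (i = 1; i <= NF; i++) {
                    h = trim($i)
                    if (h == "name") n = i
                    if (h == "deploy status") d = i
                }
                hdr = 1
                next
            }
            hdr && n && d && trim($n) == name { print trim($d) }'
    }

    # Example: poll once a minute until the status is "complete".
    #   until [ "$(dcmanager subcloud list | deploy_status subcloud1)" = "complete" ]; do
    #       sleep 60
    #   done
    ```

    The example loop has no timeout or failure handling; a real script would also stop when the status reports a failure state.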

  9. You can also follow detailed logs of the subcloud bootstrapping and deployment in the following log file on the active controller in the Central Cloud.

    /var/log/dcmanager/ansible/<subcloud_name>_playbook.output.log

    For example:

    controller-0:/home/sysadmin# tail /var/log/dcmanager/ansible/subcloud1_playbook.output.log
    k8s.gcr.io: {password: secret, url: null}
    quay.io: {password: secret, url: null}
    )
    
    TASK [bootstrap/bringup-essential-services : Mark the bootstrap as completed] ***
    changed: [subcloud1]
    
    PLAY RECAP *********************************************************************
    subcloud1                  : ok=230  changed=137  unreachable=0    failed=0
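
    The PLAY RECAP line at the end of the log summarizes the result. As a minimal sketch, the helper below reads the log on stdin and reports success only if the last recap line for the subcloud shows failed=0 and unreachable=0; the function name recap_ok is hypothetical.

    ```shell
    # Exit 0 if the last recap line for host $1 shows failed=0 and
    # unreachable=0, 1 if it shows failures, 2 if no recap line is found.
    recap_ok() {
        awk -v host="$1" '
            $1 == host && $2 == ":" {
                seen = 1
                failed_ok = 0
                unreachable_ok = 0
                for (i = 3; i <= NF; i++) {
                    if ($i == "failed=0") failed_ok = 1
                    if ($i == "unreachable=0") unreachable_ok = 1
                }
            }
            END {
                if (!seen) exit 2
                if (failed_ok && unreachable_ok) exit 0
                exit 1
            }'
    }

    # Example:
    #   recap_ok subcloud1 < /var/log/dcmanager/ansible/subcloud1_playbook.output.log \
    #       && echo "bootstrap playbook finished without failures"
    ```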
    

Postrequisites

  • Provision the newly installed and bootstrapped subcloud. For detailed StarlingX deployment procedures for the desired deployment configuration of the subcloud, see the post-bootstrap steps of the Installation guide.

  • Check and update docker registry credentials on the subcloud:

    REGISTRY="docker-registry"
    SECRET_UUID=$(system service-parameter-list | grep -F "$REGISTRY" | grep -F auth-secret | awk '{print $10}')
    SECRET_REF=$(openstack secret list | grep -F "${SECRET_UUID}" | awk '{print $2}')
    openstack secret get "${SECRET_REF}" --payload -f value
    

    The secret payload should be username: sysinv password:<password>. If the secret payload is username: admin password:<password>, see Updating Docker Registry Credentials on a Subcloud for more information.

  • For more information on bootstrapping and deploying, see the procedures listed under Installation.

  • Add a static route so that nodes in the subcloud can access the openldap service.

    In a Distributed Cloud system, the openldap service runs on the Central Cloud. For nodes in the subclouds to access the openldap service (for example, to ssh to the nodes as openldap users), a static route to the System Controller must be added on these nodes. This applies to controller, worker, and storage nodes (nodes that have sssd running).

    The static route can be added on each node in the subcloud using the system CLI.

    The following examples show how to add the static route on a controller node and a worker node:

    system host-route-add controller-0 mgmt0 <Central Cloud mgmt subnet> 64 <Gateway IP address>
    system host-route-add compute-0 mgmt0 <Central Cloud mgmt subnet> 64 <Gateway IP address>
    

    The static route can also be added using Deployment Manager by adding the route to its configuration file.

    The following examples show the route configuration in the controller and worker host profiles of the Deployment Manager configuration file:

    Controller node:
    ---
    apiVersion: starlingx.windriver.com/v1
    kind: HostProfile
    metadata:
      labels:
        controller-tools.k8s.io: "1.0"
      name: controller-0-profile
      namespace: deployment
    spec:
      administrativeState: unlocked
      bootDevice: /dev/disk/by-path/pci-0000:c3:00.0-nvme-1
      console: ttyS0,115200n8
      installOutput: text
      ......
      routes:
        - gateway: <Gateway IP address>
          interface: mgmt0
          metric: 1
          prefix: 64
          subnet: <Central Cloud mgmt subnet>
    
    Worker node:
    ---
    apiVersion: starlingx.windriver.com/v1
    kind: HostProfile
    metadata:
      labels:
        controller-tools.k8s.io: "1.0"
      name: compute-0-profile
      namespace: deployment
    spec:
      administrativeState: unlocked
      boardManagement:
        credentials:
          password:
            secret: bmc-secret
        type: dynamic
      bootDevice: /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0
      clockSynchronization: ntp
      console: ttyS0,115200n8
      installOutput: text
      ......
      routes:
        - gateway: <Gateway IP address>
          interface: mgmt0
          metric: 1
          prefix: 64
          subnet: <Central Cloud mgmt subnet>