Manual

Overview

Because there are many ways to install a Kubernetes cluster, this document does not cover cluster setup itself. Instead, it lists the requirements your cluster must meet and gives guidance on meeting them.

Nodes

Which nodes you need depends on whether you want to use GPUs.

Nodes that are always required:

1. "main": The nodes that run the control plane. The Composabl controller does not interact with these nodes, so provision them as recommended by the Kubernetes distribution you use.
2. "composabl": The node or nodes where the Composabl controller and Historian software are scheduled.
3. "envrunners": These nodes handle training workloads. If you're not using GPUs, all training is done on these nodes. If you are using GPUs, these nodes manage the communication with the simulators and can be reduced in size.
4. "simscpu": The nodes where the simulators are scheduled. Sizing depends on the simulator.

If you want to use GPU training, you need the following node pool:

5. "learners": These nodes have GPUs and accelerate the learning step of the training process.

If your simulator can be accelerated using a GPU, you can add the final node pool:

6. "simsgpu": These nodes run simulators, assigning a GPU to them.

A note on GPUs: Currently, only Nvidia GPUs are supported. The cluster must have the nvidia-gpu-operator installed for training on GPU to be enabled.

1. Sizing

Whether or not you use autoscaling with cluster-autoscaler, each node type must be sized as follows.

main: As required by your Kubernetes distribution.
composabl: 16 GB of memory and 4 CPUs in total, with at least one node having 8 GB of memory or more.
envrunners: If not using GPUs, we recommend 8 CPUs and 8 or 16 GB of memory. In any case, the number of simulators that each envrunner instance can manage depends on the number of CPUs.
simscpu: The sizing of these nodes depends on the resource requirements of your simulator.
learners: These nodes should have 1 Nvidia GPU. Other resources can be limited: 2 CPUs and 8 GB of memory are sufficient.
simsgpu: As with simscpu, sizing depends on the simulator requirements.
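
To compare the nodes you provisioned against the sizes above, you can list what each node reports as allocatable. The following is a minimal sketch and not part of the Composabl tooling: show_node_sizes is a hypothetical helper name, and the GPU column assumes the nvidia.com/gpu resource name that Nvidia's device plugin advertises.

```shell
# Hypothetical helper: list each node's pool label and allocatable resources.
# A column prints <none> where a value is missing, for example GPU on
# CPU-only nodes or POOL on nodes that have no agentpool label yet.
show_node_sizes() {
  kubectl get nodes -o custom-columns='NODE:.metadata.name,POOL:.metadata.labels.agentpool,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Call show_node_sizes after defining it. Memory is reported in Kubernetes quantity notation (typically Ki), and allocatable values are slightly lower than the raw machine size because the node reserves some resources for itself.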

2. Labels

Every group of nodes must be labeled with its pool. Use the name given in the sizing guide above as the value of the agentpool label.

You may be able to define this during your cluster setup, but if not, you can use the following commands:

kubectl label node <my-composabl-node> agentpool=composabl --overwrite
kubectl label node <my-envrunners-node> agentpool=envrunners --overwrite
kubectl label node <my-simulator-node> agentpool=simscpu --overwrite
kubectl label node <my-learners-node> agentpool=learners --overwrite
kubectl label node <my-simulator-gpu-node> agentpool=simsgpu --overwrite

Replace each value between <> with the name of the node you'd like to assign to that pool.
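
After labeling, it can help to confirm that every pool you need has at least one node. The sketch below is one way to do that, assuming only the agentpool label described above; check_pools is a hypothetical helper name, not a Composabl command.

```shell
# Hypothetical helper: report how many nodes carry each agentpool label.
# Returns a non-zero status if any requested pool has no nodes.
check_pools() {
  missing=0
  for pool in "$@"; do
    # -l filters nodes by label; -o name prints one line per matching node
    count=$(kubectl get nodes -l "agentpool=$pool" -o name | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
      echo "no nodes labeled agentpool=$pool"
      missing=1
    else
      echo "agentpool=$pool: $count node(s)"
    fi
  done
  return "$missing"
}
```

For a CPU-only cluster you would run check_pools composabl envrunners simscpu, adding learners and simsgpu if you use the GPU pools. For a quick visual check, kubectl get nodes -L agentpool prints the label as an extra column.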

Storage

The components also need access to (semi)persistent, shared storage. This section will detail the types and amount of storage needed.

The following PersistentVolumeClaims are required in the composabl-train namespace:

pvc-controller-data with a size of about 1Gi and an accessMode of ReadWriteOnce (or better). When using Azure, you will need to set the nobrl mountOption for this PVC, as this is required for the Composabl controller to function.
pvc-training-results with a suitable size. This is where your final agent system data will be stored before it is uploaded to the No-code application. It needs an accessMode of ReadWriteMany (RWX). A good initial size is one that matches historian-tmp.
historian-tmp is used as temporary storage for historian data. It needs an accessMode of ReadWriteOnce, and its size will depend on the length of your training sessions. We recommend starting with 5Gi.

The sizes of pvc-training-results and historian-tmp depend on the number and size of the training jobs you want to run simultaneously on your cluster. If you plan on running long-lived training sessions with many cycles, you may want to increase the capacity of both.
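
The claims described above could be written as a manifest like the one below. This is a sketch under stated assumptions: the file name composabl-pvcs.yaml is arbitrary, the 5Gi sizes are only the suggested starting point, and no storageClassName is set, so the cluster's default StorageClass is used. That default must support ReadWriteMany for pvc-training-results, so you may need to add a storageClassName suited to your platform.

```shell
# Write the three claims to a manifest file (the file name is arbitrary).
cat > composabl-pvcs.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-controller-data
  namespace: composabl-train
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-training-results
  namespace: composabl-train
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi   # assumption: matches historian-tmp as a starting size
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: historian-tmp
  namespace: composabl-train
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi   # recommended starting size
EOF
```

Once the composabl-train namespace exists, the file can be applied with kubectl apply -f composabl-pvcs.yaml. Note that a PersistentVolumeClaim has no mountOptions field of its own, so on Azure the nobrl option is configured on the StorageClass or PersistentVolume that backs pvc-controller-data.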

Private image registry

If you want to use a private registry for simulator images, you will need to set up this private registry yourself, and make sure the cluster is able to pull images from this registry.
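
One common way to let a cluster pull from a private registry is an image pull secret. The sketch below is an assumption about your setup, not a documented Composabl requirement: the helper name create_pull_secret and its arguments are hypothetical, and the composabl-train namespace is assumed only because the storage claims live there. Some managed platforms grant registry access at the node or cloud-identity level instead, in which case no secret is needed.

```shell
# Hypothetical helper: create a docker-registry secret for pulling images.
# Arguments: secret name, registry server, username, password.
create_pull_secret() {
  kubectl create secret docker-registry "$1" \
    --namespace composabl-train \
    --docker-server="$2" \
    --docker-username="$3" \
    --docker-password="$4"
}
```

Pods only use such a secret if they reference it through imagePullSecrets, either directly or via their service account. How the simulator pods are configured to do so is not covered here.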

Next steps

Once your cluster is running and you have verified that your setup is working, you can continue to Installing Composabl.

