Manual
Overview
Due to the many options available to you for installing Kubernetes clusters, this document will not go into the specifics of setting up the cluster. Rather, it will provide you with guidance and requirements for your cluster.
Nodes
Depending on whether you want to use GPUs or not, you need the following nodes:
Nodes that are always required:
1. "main": These nodes run the control plane. The Composabl controller does not interact with these nodes, so they should be provisioned as recommended by the Kubernetes distribution you use.
2. "composabl": This node or these nodes are where the Composabl controller and Historian software are scheduled.
3. "envrunners": These nodes handle training workloads. If you're not using GPUs, all training is done on these nodes. If you are, these nodes manage the communication with the simulators and can be reduced in size.
4. "simscpu": These nodes are where the simulators are scheduled. Sizing depends on the simulator.
If you want to use GPU training, you need the following node pool:
5. "learners": These nodes with GPUs accelerate the learning step of the training process.
If your simulator can be accelerated using a GPU, you can add the final node pool:
6. "simsgpu": These nodes run simulators, assigning a GPU to them.
A note on GPUs: Currently, only Nvidia GPUs are supported. The cluster must have the nvidia-gpu-operator installed for training on GPU to be enabled.
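One way to confirm that GPU nodes are usable is to list the GPUs each node advertises. The sketch below assumes GPUs are exposed as the nvidia.com/gpu resource and that nodes carry the agentpool label. The KUBECTL override and the show_gpus function name are hypothetical and used only for illustration.

```shell
# KUBECTL may be overridden (for example KUBECTL="echo kubectl") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# List each node with its agentpool label and its allocatable NVIDIA GPU count.
# Nodes that advertise no GPU show <none> in the GPU column.
show_gpus() {
  $KUBECTL get nodes \
    -o 'custom-columns=NAME:.metadata.name,POOL:.metadata.labels.agentpool,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Calling show_gpus should show a GPU count for every node in the learners pool (and simsgpu, if used) once the operator is running.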
1. Sizing
Whether or not you autoscale with cluster-autoscaler, each node type must be sized as follows.
main: As required by your Kubernetes distribution.
composabl: In total, 16GB of memory and 4 CPU, with at least one node having 8GB of memory.
envrunners: If not using GPUs, we recommend 8 CPU and 8 or 16 GB of memory. In any case, the number of simulators that can be managed by each envrunner instance depends on the number of CPUs.
simscpu: The sizing of these nodes depends on the resource requirements of your simulator.
learners: These nodes should have 1 Nvidia GPU. Other resources can be limited: 2 CPU and 8GB of memory is sufficient.
simsgpu: As with simscpu, sizing depends on the simulator requirements.
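To compare a pool against these figures, the allocatable CPU and memory of its nodes can be listed. This is a minimal sketch that assumes the nodes already carry the agentpool label described under Labels. The KUBECTL override and the show_pool_capacity function name are hypothetical.

```shell
# KUBECTL may be overridden (for example KUBECTL="echo kubectl") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Print allocatable CPU and memory for every node in one pool.
# Usage: show_pool_capacity <pool-name>
show_pool_capacity() {
  pool="$1"
  $KUBECTL get nodes -l "agentpool=${pool}" \
    -o 'custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```

For example, show_pool_capacity composabl lists the nodes whose combined resources should reach 16GB of memory and 4 CPU.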
2. Labels
Every group of nodes must be labeled. Use the name given in the sizing guide as the value of the agentpool label.
You may be able to define this during your cluster setup, but if not, you can use the following commands:
Replace the values in between <> with the names of the nodes you'd like to assign to a specific pool.
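As a minimal sketch of such labeling commands, the helper below sets the agentpool label with kubectl label. It assumes kubectl is configured for your cluster. The KUBECTL override and the function names label_pool and show_pools are hypothetical and not part of Composabl.

```shell
# KUBECTL may be overridden (for example KUBECTL="echo kubectl") to preview commands.
KUBECTL="${KUBECTL:-kubectl}"

# Assign one or more nodes to a pool by setting the agentpool label.
# Usage: label_pool <pool-name> <node-name> [<node-name> ...]
label_pool() {
  pool="$1"
  shift
  for node in "$@"; do
    # --overwrite replaces an agentpool label that is already set on the node.
    $KUBECTL label nodes "$node" "agentpool=${pool}" --overwrite
  done
}

# Show every node with its agentpool label so the assignment can be verified.
show_pools() {
  $KUBECTL get nodes -L agentpool
}
```

For example, label_pool envrunners <node-name> assigns one node to the envrunners pool, and show_pools confirms the result. Repeat for main, composabl, simscpu and, if used, learners and simsgpu.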
Storage
The components also need access to (semi)persistent, shared storage. This section will detail the types and amount of storage needed.
Composabl needs the following PersistentVolumeClaims in the composabl-train namespace:
pvc-controller-data: A size of approximately 1Gi and an accessMode of ReadWriteOnce (or better). When using Azure, you will need to set the nobrl mountOption for this PVC, as this is required for the Composabl controller to function.
pvc-training-results: A suitable size, as this is where your final agent data will be stored before it is uploaded to the No-code application. Its accessMode needs to be ReadWriteMany (RWX). A good initial size is to match historian-tmp.
historian-tmp: Used as temporary storage for historian data. It needs an accessMode of ReadWriteOnce, and the size will depend on the length of your training sessions. We recommend starting with 5Gi.
The size of pvc-training-results and historian-tmp depends on the number and size of training jobs you want to run simultaneously on your cluster. If you plan on running long-lived training sessions with many cycles, you may want to increase the capacity of both.
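The claims described in this section could be created along the following lines. This is a sketch under assumptions, not the official manifests: it assumes the composabl-train namespace already exists and relies on the cluster's default StorageClass, which must support ReadWriteMany for pvc-training-results. On Azure, the nobrl mount option has to be configured on the StorageClass or PersistentVolume backing pvc-controller-data, because a claim itself has no mountOptions field. The KUBECTL override and the function names are hypothetical.

```shell
# KUBECTL may be overridden (for example KUBECTL="echo kubectl") to preview commands.
KUBECTL="${KUBECTL:-kubectl}"

# Print a PersistentVolumeClaim manifest for the composabl-train namespace.
# Usage: pvc_manifest <name> <size> <access-mode>
pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: composabl-train
spec:
  accessModes:
    - $3
  resources:
    requests:
      storage: $2
EOF
}

# Create the three claims using the starting sizes suggested in this section.
create_pvcs() {
  pvc_manifest pvc-controller-data 1Gi ReadWriteOnce | $KUBECTL apply -f -
  pvc_manifest pvc-training-results 5Gi ReadWriteMany | $KUBECTL apply -f -
  pvc_manifest historian-tmp 5Gi ReadWriteOnce | $KUBECTL apply -f -
}
```

Running pvc_manifest on its own prints a manifest for review, so you can add a storageClassName suited to your platform before applying it.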
Private image registry
If you want to use a private registry for simulator images, you will need to set up this private registry yourself, and make sure the cluster is able to pull images from this registry.
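If your registry requires credentials, one common Kubernetes approach is an image pull secret. The sketch below assumes that approach and assumes the secret belongs in the composabl-train namespace. How Composabl references the secret is not covered here, and the KUBECTL override and the create_pull_secret function name are hypothetical.

```shell
# KUBECTL may be overridden (for example KUBECTL="echo kubectl") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Create a docker-registry secret holding the pull credentials.
# Usage: create_pull_secret <secret-name> <registry-server> <username> <password>
create_pull_secret() {
  $KUBECTL create secret docker-registry "$1" \
    --namespace composabl-train \
    --docker-server="$2" \
    --docker-username="$3" \
    --docker-password="$4"
}
```

Managed clusters often offer an alternative, such as granting the node identity pull access to the registry, in which case no secret is needed.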
Next steps
Once your cluster is running and you have verified that your setup works, you can continue to Installing Composabl.