Create Skills with Goals Using the UI
You can use the Composabl UI to create skills that learn with deep reinforcement learning. When you create a skill in the UI, you select goals. Composabl then turns these goals into reward functions to train the skill.
To create a skill from the UI, click New skill from the skills panel and name your skill in the modal that pops up. You will then be prompted to configure your skill.
If you want your skill to be taught with deep reinforcement learning, select Teacher. You’ll then be prompted to add goals to your skill for training.
Each skill in your agent succeeds as it approaches a specific goal. The goals of each skill should be clean and simple. If your agent is designed well, based on a good breakdown of the task into skills, each skill will have a clear goal.
Goals apply to one of the sensor variables, and are defined using one of five possible directives:
Avoid: Keep the variable from reaching a specified value
Maximize: Maximize the value of the variable
Minimize: Minimize the value of the variable
Approach: Get the value to the target range as quickly as possible
Maintain: Keep the variable at a specified value
For example, for the industrial mixer, we want to maximize the concentration of the product, Ca. We also want to avoid the temperature, T, rising above 400 K.
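To build intuition for how directives become reward signals, here is an illustrative sketch (not the Composabl API, which handles this for you) of per-step reward functions that one could imagine behind each directive:

```python
# Hypothetical per-step reward shapes for the five goal directives.
# Composabl generates its own reward functions; these are for intuition only.

def maximize(value):
    """Reward grows with the variable's value (e.g. concentration Ca)."""
    return value

def minimize(value):
    """Reward grows as the variable's value shrinks."""
    return -value

def avoid(value, limit):
    """Penalize crossing a limit (e.g. temperature T reaching 400 K)."""
    return -1.0 if value >= limit else 0.0

def approach(value, target):
    """Reward closing the distance to the target as quickly as possible."""
    return -abs(value - target)

def maintain(value, target, tolerance=0.0):
    """Reward holding the variable within a band around the target."""
    return 1.0 if abs(value - target) <= tolerance else -abs(value - target)
```

For the mixer, `avoid(405, 400)` would return a penalty while `avoid(350, 400)` would not, nudging the agent away from unsafe temperatures.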
You can also use advanced goal settings to fine-tune your goals. Access the advanced settings by clicking on the settings icon associated with one of the goals.
You'll then see additional options to configure your goal.
Tolerance only applies to the three objective types that include a target value: Avoid, Approach, and Maintain. This setting allows you to tell the agent to accept a range of values around the target as successful performance. You might use this to prevent the agent from spending too much compute power trying to get from good enough to perfect.
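As a minimal sketch (assuming a simple symmetric band, which may differ from Composabl's internal handling), tolerance turns an exact target into an acceptance range:

```python
def within_tolerance(value, target, tolerance):
    """Treat any value in [target - tolerance, target + tolerance] as
    hitting the target, so the agent stops chasing marginal gains."""
    return abs(value - target) <= tolerance

# With a target of 400 and a tolerance of 5, any value from 395 to 405
# counts as success.
```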
Stop value allows you to tell the agent to end the training episode when the variable reaches the target value. This could be because the agent has succeeded and the process is complete, or because it has failed and needs to try again.
Stop steps allows you to tell the agent to end the training episode after a certain number of iterations.
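The two stop settings can be pictured as a single episode-termination check. This is an illustrative sketch (the function name and the "reaches" comparison are assumptions for the example, not Composabl's implementation):

```python
def episode_done(value, step, stop_value=None, stop_steps=None):
    """End the training episode when the variable reaches the stop value,
    or when the iteration count hits the stop-steps limit."""
    if stop_value is not None and value >= stop_value:
        return True  # target reached: success, or a failure state to reset from
    if stop_steps is not None and step >= stop_steps:
        return True  # out of iterations for this episode
    return False
```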
Boundaries are for normalizing rewards. This is useful when the problem has variables that are very different, which can otherwise make it difficult for the trainer to calculate reward. For example, one sensor variable in your agent might have very large values (in the thousands), but another variable might have small values (in the tenths), so you might use boundaries to allow the agent to better compare the two variables.
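A simple way to see why boundaries help: min-max normalization maps each variable into the same 0-to-1 range, so rewards from very different scales become comparable. This sketch assumes plain min-max scaling, which may not be exactly what the trainer does internally:

```python
def normalize(value, low, high):
    """Min-max normalize a sensor value into [0, 1] using its boundaries."""
    return (value - low) / (high - low)

# A temperature of 350 with boundaries [300, 500] and a concentration of
# 0.5 with boundaries [0, 1] now live on the same scale:
# normalize(350, 300, 500) -> 0.25
# normalize(0.5, 0.0, 1.0) -> 0.5
```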
Scale allows you to provide relative weights to goals to account for goals that have different levels of importance or priority. This is very difficult to get right with Machine Teaching and should be handled with care.
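Conceptually, scale acts as a weight when the per-goal rewards are combined. The sketch below is an assumption for illustration (a plain weighted sum), not the trainer's actual formula, but it shows why a small change in weights can shift what the agent prioritizes:

```python
def combined_reward(goal_rewards, scales):
    """Weighted sum of per-goal rewards; scales express relative priority."""
    return sum(s * r for r, s in zip(goal_rewards, scales))

# Weighting an Avoid goal 2x relative to a Maximize goal:
# combined_reward([0.8, -1.0], [1.0, 2.0]) -> 0.8 - 2.0 = -1.2
```

Because the weighted goals trade off against each other in every training step, even modest scale values can dominate the total reward, which is why this setting should be handled with care.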