Create Skill Agents with Rewards Using the SDK


The Composabl SDK offers a suite of advanced tools to train skills using deep reinforcement learning. Using the Python teacher class, you can fine-tune the rewards for your skills. Once you have configured a skill with the SDK, you can publish it to the UI to use in agent system designs.

Create a New Skill

To create a skill in the Python SDK, begin by logging in to the SDK by typing composabl login from the CLI.

Then type composabl skill new.

Give the skill a name and a description in response to the prompts that follow. Choose whether your skill should be a teacher (learned with AI) or a controller (a programmed module like an optimization algorithm or MPC controller).

Specify the folder where you’d like to create the skill.

The Composabl SDK will create a folder and a Python teacher file from the template.

The Python Teacher Class

The Python teacher class offers several functions that you can use to fine-tune the training of your skills.
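
For orientation, here is a minimal sketch of a teacher class with the functions described below. The import path and the exact signatures are assumptions based on the examples on this page; your SDK version may differ (for example, by expecting async methods), so treat this as an illustration rather than a definitive template.

python
# Minimal teacher sketch (illustrative only). The import path is an assumption;
# adjust it to match your installed SDK version.
from composabl import Teacher

class ExampleTeacher(Teacher):
    def __init__(self):
        self.counter = 0          # counts decisions within an episode
        self.past_sensors = None  # stores the previous sensor values

    def compute_reward(self, transformed_sensors, action, sim_reward):
        return 0

    def compute_termination(self, transformed_sensors, action):
        return False

    def compute_success_criteria(self, transformed_sensors, action):
        return False

    def compute_action_mask(self, transformed_sensors, action):
        return None  # placeholder; see the action mask section below

    def transform_sensors(self, sensors, action):
        return sensors

    def transform_action(self, transformed_sensors, action):
        return action

    def filtered_sensor_space(self):
        return ["state1"]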

Functions for Training

Train with Rewards: the compute_reward Function

The compute_reward function provides the bulk of the feedback after each agent system action, indicating how much that action contributed to the success of the skill. This function returns a number that represents the reward signal the agent system will receive for its last decision. Reward functions, as they are called in reinforcement learning, can be tricky to craft. Learn more about how to write good reward functions.

python
def compute_reward(self, transformed_sensors, action, sim_reward):
    self.counter += 1
    # On the first decision there is no previous state to compare against
    if self.past_sensors is None:
        self.past_sensors = transformed_sensors
        return 0
    # Reward the skill when "state1" increased since the previous decision
    if self.past_sensors["state1"] < transformed_sensors["state1"]:
        reward = 1
    else:
        reward = -1
    self.past_sensors = transformed_sensors
    return reward

End Training: the compute_termination Function

The compute_termination function tells the Composabl platform when to terminate a practice episode and start over with a new practice scenario (episode). From a teaching perspective, it makes the most sense to terminate an episode when the agent system succeeds, fails, or is pursuing a course of action that you do not think is likely to succeed. This function returns a Boolean flag (True or False) indicating whether to terminate the episode. You can calculate this criterion however works best for your use case.

python
def compute_termination(self, transformed_sensors, action):
    return False

Define Success: the compute_success_criteria Function

The compute_success_criteria function provides a definition of skill success and a proxy for how completely the agent system has learned the skill. The platform uses the output of this function (True or False) to decide when to stop training one skill and move on to training the next. It also determines when to advance in a fixed-order sequence: the agent system cannot move from one skill in the sequence to the next until the success criteria for the current skill are met.

python
def compute_success_criteria(self, transformed_sensors, action):
    return self.counter > 100

Here are some examples of success criteria definitions:

  • A simple but naive success criterion might return True if the average reward for an episode or scenario crosses a threshold, and False if it does not (see the sketch after this list).

  • A more complex success criterion might calculate the root mean squared error (RMSE) for key variables across the episode and return True if the error is less than a customer-specified benchmark, and False otherwise.

  • A sophisticated success criterion might compare the agent system against a benchmark controller or another agent system across many key variables and trials, returning True if the agent system beats the benchmark and False otherwise.
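
As an illustration of the first approach, here is a minimal sketch that reports success once the average reward for the episode crosses a threshold. The episode_rewards attribute and the 0.8 threshold are assumptions made for this example, not part of the SDK.

python
def compute_success_criteria(self, transformed_sensors, action):
    # self.episode_rewards is assumed to be a list that compute_reward appends to
    if not self.episode_rewards:
        return False
    average_reward = sum(self.episode_rewards) / len(self.episode_rewards)
    return average_reward > 0.8  # example threshold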

Train with Goals

Training with goals lets you use a predefined reward structure rather than configuring the rewards individually. When you use a goal, your agent system will inherit the compute_reward, compute_termination, and compute_success_criteria functions from the goal. (You will still have the option to further customize those functions as needed.)

The five goal types you can use are:

  • AvoidGoal

  • MaximizeGoal

  • MinimizeGoal

  • ApproachGoal

  • MaintainGoal

These have the same parameters and work the same way as the goal types in the UI. Goals are added using specialized teacher classes rather than the general teacher class that you would otherwise use to teach skills. For example, for a skill named Balance that you wanted to train with a goal to maintain a specific orientation, you would use the MaintainGoal teacher class.

python
class BalanceTeacher(MaintainGoal):
    def __init__(self, *args, **kwargs):
        super().__init__("pole_theta", "Maintain pole to upright", target=0, stop_distance=0.418)

The parameters you can use for goals are:

You can also use more than one goal for a single skill using the CoordinatedGoal teacher class. This is useful when your agent system needs to balance two goals that are both important.

Functions to Guide Agent System Behavior with Rules

Just as rules guide training and behavior for humans, providing rules for the agent system to follow can guide its decision-making toward success more quickly. Rules guide the behavior of an agent system based on expertise and constraints.

Add Rules: the compute_action_mask Function

The compute_action_mask teaching function expresses rules that trainable agent systems must follow.

python
# The action mask provides rules at each step about which actions the agent system is allowed to take.
def compute_action_mask(self, transformed_sensors, action):
    return [0, 1, 1]

The compute_action_mask teaching function works only for discrete action spaces (where the actions are integers or categories), not for continuous action spaces (where decision actions are decimal numbers). If you specify a mask for a skill whose actions are continuous, the platform will ignore the action mask.

The function returns a list of 0 and 1 values. Zero means that the action is forbidden by the rule. One means that the action is allowed by the rule. The function may change the returned value after each decision. This allows complex logic to express nuanced rules.

In the example above, the first action is forbidden for the next decision, but the second and third actions are allowed. The logic in the skill itself (whether learned or programmed) will choose between the allowed second and third actions.
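
For example, the mask can depend on the current sensor values. In the sketch below, the sensor name "state1" and the limit of 90 are illustrative assumptions; the rule forbids the first action whenever the sensor exceeds the limit and allows all actions otherwise.

python
def compute_action_mask(self, transformed_sensors, action):
    # Forbid action 0 when "state1" is above an example safety limit;
    # otherwise allow all three discrete actions.
    if transformed_sensors["state1"] > 90:
        return [0, 1, 1]
    return [1, 1, 1]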

All selectors have a discrete action space (they choose which child skill to activate), so you can always apply the compute_action_mask function to teach them.

Functions to Manage Information Inside Agent Systems

As information passes through perceptors, skills, and selectors in the agent system, sometimes it needs to change format along the way. You can use three teaching functions to transform sensor and action variables inside agent systems: transform_sensors, transform_action, and filtered_sensor_space.

Transform Sensor Variables: the transform_sensors function

To transform sensor variables, use the transform_sensors function to calculate changes to specific sensors, then return the complete set of sensor variables (the observation space).

python
def transform_sensors(self, sensors, action):
    return sensors

Two of the most common reasons for transforming sensor variables are conversion and normalization. For example, if a simulator reports temperature values in Fahrenheit, but the agent system expects temperature values in Celsius, use the transform_sensors function to convert between the two.

Normalization means transforming variables into different ranges. For example, one sensor variable in your agent system might have very large values (in the thousands), while another might have small values (in the tenths). You might use the transform_sensors function to map these disparate sensor values into a range from 0 to 1 so that they can be compared and used more effectively in the agent system.
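
A sketch combining both ideas is shown below. The sensor names ("temperature", "pressure") and the ranges are assumptions for illustration; the key point is that the function returns the complete, transformed set of sensor variables.

python
def transform_sensors(self, sensors, action):
    # Convert an example Fahrenheit reading to Celsius
    sensors["temperature"] = (sensors["temperature"] - 32) * 5 / 9
    # Normalize an example large-valued sensor into the 0 to 1 range
    sensors["pressure"] = sensors["pressure"] / 10000
    # Return the complete set of sensor variables (the observation space)
    return sensors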

Transform Decisions within the Agent System: the transform_action function

You may want to transform action variables for the same reasons as sensor variables.

python
def transform_action(self, transformed_sensors, action):
    return action
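
For example, if the learned skill produces actions in a normalized 0 to 1 range but the simulator or equipment expects engineering units, you might rescale the action before it is passed along. The valve range below is an assumption for illustration.

python
def transform_action(self, transformed_sensors, action):
    # Rescale an example normalized action (0 to 1) to a valve position (0 to 100)
    return action * 100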

Filter the Sensor List: the filtered_sensor_space function

Use the filtered_sensor_space function to pare down the list of sensor variables you need for a particular skill. Pass only the information that a skill or module needs in order to learn or perform well.

python
def filtered_sensor_space(self):
    return ["state1"]

Return a list of all the sensor variables that you want passed to the skill by this teacher.
