Analyze Data

Benchmark Testing and Data Generation

After you train a multi-agent system in Composabl, the platform automatically runs a series of standardized tests to evaluate its performance. This benchmarking process:

  1. Places the system in controlled testing environments

  2. Records detailed metrics at each step of operation

  3. Aggregates results to provide comprehensive performance statistics

The output of this testing process is compiled into a structured benchmark.json file containing rich performance data that you can analyze to assess effectiveness, identify improvement opportunities, and compare different design approaches. The file serves both as a performance record and as an analytics resource for optimizing your agent systems.

Downloading Benchmark Artifacts

To download benchmark data for further analysis:

  1. Navigate to the "Training Sessions" page

  2. Click the artifacts dropdown at the top right of the page for a trained system

  3. Select "Benchmark"

  4. The benchmark.json file will be saved to your local machine

Understanding the Benchmark.json File

The benchmark.json file contains structured data about the performance of a trained agent system. Here's how to interpret this file:

File Structure

{
  "skill-name": {
    "scenario-0": {
      "scenario_data": { ... },
      "episode-0": [ ... ],
      "aggregate": { ... }
    }
  }
}
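
To get a quick inventory of what a benchmark file contains before digging into individual metrics, you can load it with standard Python tooling. The sketch below is illustrative only: it assumes the file was downloaded as benchmark.json into the working directory and follows the structure shown above.

import json

# Load the downloaded benchmark file (assumed to be in the working directory).
with open("benchmark.json") as f:
    benchmark = json.load(f)

# Each top-level key is a skill; each skill holds one entry per scenario.
for skill_name, scenarios in benchmark.items():
    for scenario_name, scenario in scenarios.items():
        # Episode entries are keyed "episode-0", "episode-1", and so on;
        # "scenario_data" and "aggregate" hold reference values and summary stats.
        episodes = [key for key in scenario if key.startswith("episode-")]
        print(f"{skill_name} / {scenario_name}: {len(episodes)} episode(s)")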

Key Components

Scenario Data: Contains reference values for the scenario:

"scenario_data": {
  "sensor_one": {"data": 8.57, "type": "is_equal"}, 
  "sensor_two": {"data": 373, "type": "is_equal"}
}
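
To check an agent's behavior against these reference values in your own analysis code, you can read them straight out of the loaded dictionary. A short sketch, reusing the benchmark dictionary from the loading example above (the skill-name, scenario-0, and sensor_one keys are the placeholder names used in the examples on this page):

# Look up the reference value and comparison type for one sensor.
scenario = benchmark["skill-name"]["scenario-0"]
reference = scenario["scenario_data"]["sensor_one"]
print(reference["data"], reference["type"])  # e.g. 8.57 is_equal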

Episode Data: An array of per-step records showing the state the agent observed, the action it took, and the teacher feedback it received at each step:

[
  {
    "state": "{'sensor_one': array([311.2639], dtype=float32), ...}",
    "action": "[-1.253192]",
    "teacher_reward": 1.0,
    "teacher_success": false,
    "teacher_terminal": null
  },
  ...
]
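
Two practical notes about working with episode data: the per-step rewards can be summed to score an episode, and the state field is stored as a string representation of the observation dictionary (including numpy array wrappers) rather than as nested JSON, so it has to be parsed before the sensor values can be used numerically. The sketch below continues from the loading example above; the regular expression is an assumption based on the state format shown here.

import re

episode = benchmark["skill-name"]["scenario-0"]["episode-0"]

# Total teacher reward earned over the episode.
total_reward = sum(step["teacher_reward"] for step in episode)
print(f"{len(episode)} steps, total teacher reward {total_reward}")

# The "state" string looks like "{'sensor_one': array([311.2639], dtype=float32), ...}".
# Pull each sensor name and its first value out with a regular expression.
pattern = re.compile(r"'(\w+)': array\(\[([-+\d.eE]+)")
first_state = {name: float(value) for name, value in pattern.findall(episode[0]["state"])}
print(first_state)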

Aggregate Statistics: Summary statistics computed across all episodes in the scenario:

"aggregate": {
  "mean": { ... },
  "medians": { ... },
  "std_dev": { ... },
  "max": { ... },
  "min": { ... }
}
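
Because every scenario carries the same set of summary blocks, the aggregates are convenient for comparing scenarios or training runs side by side. A final sketch, again reusing the benchmark dictionary loaded above (the metric names inside mean, std_dev, and the other blocks depend on your agent system, so none are hard-coded here):

# Print the mean and standard deviation of every aggregated metric, per scenario.
for skill_name, scenarios in benchmark.items():
    for scenario_name, scenario in scenarios.items():
        aggregate = scenario["aggregate"]
        print(f"{skill_name} / {scenario_name}")
        for metric, mean_value in aggregate["mean"].items():
            std_value = aggregate["std_dev"].get(metric)
            print(f"  {metric}: mean {mean_value}, std dev {std_value}")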