Analyze Data in Detail with the Historian
In this tutorial, we will explore how to use the historian to validate a trained AI agent in Composabl against its training logs. The historian stores historical time-series data in an optimized format, Parquet (https://www.databricks.com/glossary/what-is-parquet), which helps in evaluating how the agent performs over the course of training.
Step 1: Accessing the Historian Data
The historian file stores time-series data essential for validating agent training. There are several ways to access and store the historian data, but the recommended format is a Delta file (Parquet-based).
Understanding the Format:
The historian data is typically large, around 500 MB for standard operations. It is stored in the Delta Lake file format, which is optimized for time-series data and supports efficient queries.
The file can be downloaded as XML or another format (e.g., CSV or XLS), but Delta Lake is the most efficient for handling larger datasets.
Downloading the Historian File:
From the Composabl UI, download the historian file. It will likely come in a compressed format (e.g., `.gz`). After extracting it, you should see the delta file containing time-series data.
Step 2: Setting Up for Validation
Unpacking the Historian File:
If the historian file is compressed (e.g., `.gz`), unpack it using a tool like `gzip`. Once unzipped, you'll see a 10 MB+ delta file with historical time-series data.
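If you prefer to stay in Python, the same unpacking can be done with the standard library. This is a minimal sketch; the file names in the usage comment are placeholders for whatever your download is actually called:

```python
import gzip
import shutil

def extract_gz(src_path: str, dst_path: str) -> None:
    """Decompress a .gz download (equivalent to `gzip -d` on the command line)."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Stream the decompressed bytes to disk without loading the
        # whole file into memory.
        shutil.copyfileobj(src, dst)

# Usage (placeholder names -- substitute your actual download):
# extract_gz("historian.gz", "historian")
```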
Understanding the Delta File:
The delta file is optimized for fast reads and writes of time-series data.
It supports an append-only structure, which ensures that each new piece of data can be added efficiently without modifying the existing data.
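To make the append-only idea concrete, here is a toy sketch of the pattern. It is illustrative only: real Delta Lake writes Parquet data files plus JSON transaction entries in a `_delta_log` directory, while this sketch uses JSON for both. The point is that each write adds new files and a log entry, and existing files are never touched:

```python
import json
import os
import tempfile

def append(table_dir: str, records: list, version: int) -> None:
    """Append records as a new data file plus a transaction-log entry."""
    data_name = f"part-{version:05d}.json"
    with open(os.path.join(table_dir, data_name), "w") as f:
        json.dump(records, f)
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    # One log entry per commit, named so entries sort in commit order.
    with open(os.path.join(log_dir, f"{version:020d}.json"), "w") as f:
        json.dump({"add": data_name}, f)

def read_table(table_dir: str) -> list:
    """Replay the log in order and concatenate the referenced data files."""
    log_dir = os.path.join(table_dir, "_delta_log")
    rows = []
    for entry in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, entry)) as f:
            added = json.load(f)["add"]
        with open(os.path.join(table_dir, added)) as f:
            rows.extend(json.load(f))
    return rows

table = tempfile.mkdtemp()
append(table, [{"t": 0, "reward": 0.1}], version=0)
append(table, [{"t": 1, "reward": 0.3}], version=1)
print(read_table(table))  # both appends visible, nothing overwritten
```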
Step 3: Querying the Historian Data
Setting Up a Query Environment:
To validate your agent’s training, you’ll need to set up an environment that allows you to query the delta file. Delta Lake integrates well with systems like Apache Spark, but for simple querying, you can use tools like pandas in Python.
Querying for Agent Training Logs:
Extract and analyze relevant historical data from the delta file. Here's a simple Python example for querying the delta file using pandas:
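A minimal sketch of such a query. The `deltalake` loading snippet in the comment assumes you have run `pip install deltalake`, and the column names (`episode`, `reward`) are illustrative assumptions about what your historian contains:

```python
import pandas as pd

# Loading the delta file: with the `deltalake` package installed,
# the table loads straight into pandas:
#
#   from deltalake import DeltaTable
#   df = DeltaTable("path/to/historian").to_pandas()
#
# For illustration, a tiny frame with columns the historian might
# plausibly contain (column names here are assumptions):
df = pd.DataFrame({
    "episode": [1, 1, 2, 2],
    "reward":  [0.1, 0.3, 0.5, 0.7],
})

# Mean reward per training episode -- rising values across episodes
# are a quick sanity check that the agent is learning:
per_episode = df.groupby("episode")["reward"].mean()
print(per_episode)
```

From here, the same `groupby`/filter pattern extends to any column the historian records, such as slicing by timestamp range to inspect a specific stretch of training.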
Key Benefits of Using the Historian for Validation:
Optimized Data Handling: The Delta Lake format is designed for fast querying, making it ideal for time-series data.
Efficient Storage: The append-only nature ensures that new data can be added without overwriting or modifying existing data, making it easy to track data over time.
Continuous Monitoring: By continuously adding data to the historian, you can validate your agent's long-term impact on machine performance, uptime, and safety.