Regression testing tool for AI agents

Snapshots and compares tool calls to catch regressions in AI agents, compatible with LangGraph, CrewAI, OpenAI, and Anthropic.


AI Agent Regression Testing Tool

This tool helps keep the behavior of your AI agents stable and predictable. It captures the tool calls your agents make and compares them across runs, so you can identify and fix regressions introduced by code changes, model updates, or prompt modifications. This matters most for complex, multi-step agent systems, such as those built with frameworks like LangGraph and CrewAI.

What it Does

The tool creates snapshots of the tool calls an AI agent makes during its execution. These snapshots serve as a baseline, and subsequent runs of the agent are compared against it. Any discrepancy in the tool calls (a different tool being invoked, different arguments being passed, or a change in the order of operations) is flagged as a regression. Comparing each run directly against a baseline makes agent behavior easier to debug and validate.
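The comparison described above can be illustrated with a minimal sketch. This is not the tool's actual API or snapshot format: the `ToolCall` class, the `diff_tool_calls` function, and the message wording are all hypothetical names chosen for this example. It only shows the three kinds of discrepancy mentioned above (different tool, different arguments, different order).

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical shape, for illustration only)."""
    name: str
    args: dict[str, Any]


def diff_tool_calls(baseline: list[ToolCall], current: list[ToolCall]) -> list[str]:
    """Return human-readable differences between a baseline run and a new run."""
    old_names = [call.name for call in baseline]
    new_names = [call.name for call in current]

    # Same set of tools, but invoked in a different order.
    if old_names != new_names and sorted(old_names) == sorted(new_names):
        return [f"call order changed: {old_names} -> {new_names}"]

    findings: list[str] = []
    # Compare the runs step by step over their common length.
    for step, (old, new) in enumerate(zip(baseline, current)):
        if old.name != new.name:
            findings.append(f"step {step}: tool changed from {old.name} to {new.name}")
        elif old.args != new.args:
            findings.append(f"step {step}: arguments to {old.name} changed")

    # Report calls that exist in only one of the two runs.
    for step in range(len(current), len(baseline)):
        findings.append(f"step {step}: missing call to {baseline[step].name}")
    for step in range(len(baseline), len(current)):
        findings.append(f"step {step}: unexpected call to {current[step].name}")
    return findings


baseline = [
    ToolCall("search", {"query": "weather in Paris"}),
    ToolCall("summarize", {"max_words": 50}),
]
current = [
    ToolCall("search", {"query": "weather Paris"}),
    ToolCall("summarize", {"max_words": 50}),
]

for finding in diff_tool_calls(baseline, current):
    print(finding)
```

An empty result means the run matches the baseline. A real implementation would likely need to tolerate arguments that legitimately vary between runs, such as timestamps or generated IDs; this sketch treats any difference as a regression.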

Key Features

Snapshotting: Captures the exact tool calls made by your AI agent during a test run.
Comparison: Compares current tool call sequences against previously saved snapshots.
Regression Detection: Automatically identifies and reports deviations in tool usage.
Framework Compatibility: Designed to work with AI agent development frameworks such as LangGraph and CrewAI.
Model Agnostic: Supports agents that use OpenAI and Anthropic models.
Open Source: Hosted on GitHub, allowing for transparency and community contributions.
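As a rough idea of how snapshotting and comparison might fit into a test workflow, the sketch below stores the first run as a JSON baseline and checks later runs against it. Everything here is an assumption made for illustration: the `check_against_snapshot` function, the file name, and the dictionary layout of a tool call are hypothetical and do not describe the tool's real interface or storage format.

```python
import json
import tempfile
from pathlib import Path


def check_against_snapshot(calls: list[dict], snapshot_path: Path) -> bool:
    """Return True if `calls` match the stored snapshot.

    If no snapshot exists yet, the current run is saved as the baseline.
    """
    if not snapshot_path.exists():
        # First run: record the baseline instead of comparing.
        snapshot_path.write_text(json.dumps(calls, indent=2, sort_keys=True))
        return True
    baseline = json.loads(snapshot_path.read_text())
    return baseline == calls


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "agent_snapshot.json"  # hypothetical file name
    run = [{"tool": "search", "args": {"query": "refund policy"}}]

    first = check_against_snapshot(run, path)   # creates the baseline
    second = check_against_snapshot(run, path)  # identical run, matches
    changed = check_against_snapshot(
        [{"tool": "lookup", "args": {"query": "refund policy"}}], path
    )  # a different tool was invoked, so this does not match
    print(first, second, changed)
```

Storing the baseline as sorted, indented JSON keeps snapshot files readable in code review, so an intentional change in agent behavior shows up as a small diff when the baseline is regenerated. Note that this only works for JSON-serializable arguments.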

Who it's For

This tool is for AI developers and engineers who build, deploy, and maintain AI agents. It is a good fit if you work with frameworks like LangGraph or CrewAI, or integrate with the OpenAI or Anthropic APIs, and need to confirm that your agents keep performing their intended actions after updates, without unintended side effects. It suits teams that prioritize stability, reliability, and efficient debugging in their AI agent development workflows.