What it does
Ibras AI Agent is a Claude Code plugin marketplace that simplifies building and evaluating AI agents. It bundles plugins that automate the creation of complete evaluation systems for Mastra projects, from scorers and golden datasets to offline experiments and online sampling.
How it works
Install the marketplace once in Claude Code, then add individual plugins as needed. The flagship plugin, mastra-evals, profiles any Mastra agent or workflow and generates:
- Scorers (code-based, LLM-judge, and rubric-based)
- Golden datasets for benchmarking
- Offline experiments for batch testing
- CI regression tests via runEvals
- Online sampling for production monitoring
- Brainstorming entry points for defining eval goals
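To make the pieces in the list above concrete, here is a minimal sketch of how a code-based scorer, a golden dataset, an offline experiment, and a CI-style regression gate can fit together. It is written in plain TypeScript and does not use Mastra's API or show what mastra-evals actually generates. Every name in it (GoldenCase, containsExpected, runExperiment, stubAgent, regressionGate) is a hypothetical stand-in chosen for illustration.

```typescript
// Hypothetical types for illustration only; these are not Mastra's API.
type GoldenCase = { input: string; expected: string };
type Agent = (input: string) => Promise<string>;
type Scorer = (output: string, expected: string) => number; // score in [0, 1]

// Code-based scorer: 1 when the output contains the expected text
// (case-insensitive), otherwise 0.
const containsExpected: Scorer = (output, expected) =>
  output.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;

// A tiny golden dataset: fixed inputs paired with expected answers.
const goldenDataset: GoldenCase[] = [
  { input: "capital of France", expected: "Paris" },
  { input: "2 + 2", expected: "4" },
];

// Offline experiment: run every golden case through the agent
// and return the mean score.
async function runExperiment(
  agent: Agent,
  dataset: GoldenCase[],
  scorer: Scorer,
): Promise<number> {
  if (dataset.length === 0) throw new Error("dataset must not be empty");
  let total = 0;
  for (const c of dataset) {
    total += scorer(await agent(c.input), c.expected);
  }
  return total / dataset.length;
}

// Stand-in agent so the sketch runs without a model call (assumption:
// a real agent would be an async function from prompt to text).
const stubAgent: Agent = async (input) =>
  input.includes("France") ? "The capital is Paris." : "I am not sure.";

// CI-style regression gate: passes only when the mean score
// meets or exceeds the threshold.
async function regressionGate(threshold: number): Promise<boolean> {
  const mean = await runExperiment(stubAgent, goldenDataset, containsExpected);
  return mean >= threshold;
}
```

In a CI job, a false result from a gate like regressionGate would fail the build. An LLM-judge or rubric-based scorer would fit the same Scorer shape, provided it were allowed to be asynchronous.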
Use cases
Product teams use this to validate AI agent performance before shipping. Research teams run offline experiments to compare agent behaviors. DevOps teams integrate CI regression tests into their deployment pipelines. Startups rapidly prototype eval frameworks without writing boilerplate.
Who benefits
AI product managers, design systems teams, and developers building with Mastra who need evaluation rigor without manual setup overhead.