Evaluating Agents With Braintrust - Detailed Analysis & Overview


Evaluating Agents with Braintrust

How to evaluate AI agents with Braintrust

Evaluating Agents and Assistants: The AI Conference

Jason Lopatecki, Co-Founder and CEO of Arize AI, dives into the world of ...

Intro to Evals with Braintrust

In this video, we walk through the complete eval workflow, including creating datasets, prompts, and scorers.
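The three pieces named in this snippet (datasets, prompts/tasks, and scorers) can be sketched as a minimal eval harness in plain Python. This is an illustrative stand-in, not the Braintrust SDK; every name below is hypothetical, and the "model" is a toy function so the sketch runs offline.

```python
# Minimal eval-workflow sketch: a dataset of cases, a task under test,
# and scorers that grade each output. Illustrative only -- not the
# Braintrust SDK API; all names here are hypothetical.

def exact_match(output: str, expected: str) -> float:
    """Scorer: 1.0 if the output matches the expected answer exactly."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def task(prompt: str) -> str:
    """Stand-in for the model call under evaluation."""
    return prompt.upper()  # toy "model" so the sketch runs offline

dataset = [
    {"input": "hello", "expected": "HELLO"},
    {"input": "world", "expected": "world"},
]

def run_eval(dataset, task, scorers):
    """Run every case through the task and grade it with every scorer."""
    results = []
    for case in dataset:
        output = task(case["input"])
        scores = {fn.__name__: fn(output, case["expected"]) for fn in scorers}
        results.append({"input": case["input"], "output": output, "scores": scores})
    return results

results = run_eval(dataset, task, [exact_match])
print(sum(r["scores"]["exact_match"] for r in results) / len(results))  # → 0.5
```

The real SDK adds persistence, diffing between runs, and a UI on top of this loop, but the data/task/scores triple is the core shape of an eval.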

Why building eval platforms is hard — Phil Hetzel, Braintrust

An eval platform is not just a test runner. You are building shared definitions of "good," reliable data pipelines, labelling workflows, ...
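One way to make the "shared definition of good" concrete is to store a human label next to each automated score and surface the cases where they disagree, since those are the examples that force the team to refine its rubric. The sketch below is a hypothetical illustration of that labelling-workflow idea, not any platform's actual schema.

```python
# Labelling-workflow sketch: each example carries an automated score and
# (eventually) a human label; disagreements between the two drive the
# shared definition of "good". All field names are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LabelledExample:
    input: str
    output: str
    auto_score: float                  # from an automated scorer, in [0, 1]
    human_label: Optional[str] = None  # "good" / "bad", filled in by review

def disagreements(examples):
    """Return reviewed examples where the scorer and the human disagree."""
    flagged = []
    for ex in examples:
        if ex.human_label is None:
            continue  # still awaiting human review
        auto_says_good = ex.auto_score >= 0.5
        human_says_good = ex.human_label == "good"
        if auto_says_good != human_says_good:
            flagged.append(ex)
    return flagged

examples = [
    LabelledExample("q1", "a1", auto_score=0.9, human_label="good"),
    LabelledExample("q2", "a2", auto_score=0.8, human_label="bad"),  # disagreement
    LabelledExample("q3", "a3", auto_score=0.2),                     # unreviewed
]
print(len(disagreements(examples)))  # → 1
```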

Evals 101 — Doug Guthrie, Braintrust

This hands-on workshop guides participants through the full AI ...

Evaluating agents: how we built Loop, the AI assistant for evals

Join Doug Guthrie, Solutions Engineer at ...

Intro to Loop: the AI agent for evals and observability in Braintrust

Learn how you can use Loop in ...

Intro to Remote Agent Evals with Braintrust

In this video, we walk through how you can connect an ...

Intro to Braintrust: AI Observability and Evals

An end-to-end walkthrough of ...

Langfuse Launch Week Day 3: Agent Tracing and Evaluation

We're introducing a set of upgrades to make complex ...

Braintrust and Box on AI agents and the future of AI observability

Evaluating and Debugging Non-Deterministic AI Agents

How to evaluate agents in practice

LLM as a Judge: Scaling AI Evaluation Strategies

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...
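LLM-as-a-judge, the technique this entry's title names, typically means prompting a second model to grade a candidate answer against a rubric and mapping its verdict to a score. The sketch below stubs the judge with a keyword check so it runs offline; in practice `stub_judge` would be an LLM API call, and all names here are hypothetical.

```python
# LLM-as-a-judge sketch: a "judge" grades a candidate answer against a
# rubric and its verdict is mapped to a numeric score. The judge is
# stubbed so the example runs offline; in practice it would be an LLM
# API call. All names are hypothetical.

RUBRIC = "Answer must mention both 'datasets' and 'scorers'."

def stub_judge(prompt: str) -> str:
    """Stand-in for a judge-model call: returns 'PASS' or 'FAIL'."""
    answer = prompt.split("ANSWER:", 1)[1]  # grade only the answer portion
    ok = "datasets" in answer and "scorers" in answer
    return "PASS" if ok else "FAIL"

def llm_judge_score(answer: str, rubric: str = RUBRIC) -> float:
    """Build the judge prompt, get a verdict, map it to a score."""
    prompt = f"RUBRIC: {rubric}\nANSWER: {answer}\nVerdict (PASS/FAIL):"
    verdict = stub_judge(prompt)
    return 1.0 if verdict == "PASS" else 0.0

print(llm_judge_score("Evals need datasets, prompts, and scorers."))  # → 1.0
print(llm_judge_score("Just vibe-check the outputs."))                # → 0.0
```

The key design point is that the rubric, not the judge model, carries the definition of "good": swapping judges should not silently change the criteria.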

[Evals Workshop] Mastering AI Evaluation: From Playground to Production

This hands-on workshop will guide participants through the complete AI

Agentic Evals by Shishir Patil

He cited "MLGym," a framework from Meta and collaborators, as a concrete example of a system for ...