Similar to Prompts and No-code agents, we’ve introduced Session history to Evaluators, giving teams a complete record of all updates made to any evaluator (AI, statistical, human, etc.). Each session captures every change, from LLM-as-a-judge prompt edits and pass criteria updates to code changes in Programmatic evals, creating a clear, chronological view of how an evaluator evolves over time. This provides greater transparency and traceability when collaboratively building and experimenting with evals.

[Image: Evaluator Session History]
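To make the idea concrete, here is a minimal Python sketch of what one entry in such a chronological change record might look like. All class, field, and value names here are hypothetical illustrations, not the product's actual API or data model.

```python
# Hypothetical sketch of an evaluator session-history entry.
# Names and fields are assumptions for illustration only.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EvaluatorSessionEntry:
    """One recorded change to an evaluator within a session."""
    evaluator_id: str   # which evaluator was edited
    author: str         # who made the change
    change_type: str    # e.g. "prompt_edit", "pass_criteria", "code_change"
    summary: str        # human-readable description of the change
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# A session history is simply an ordered, append-only list of entries.
history: list[EvaluatorSessionEntry] = [
    EvaluatorSessionEntry("eval_123", "alice", "prompt_edit",
                          "Reworded LLM-as-a-judge rubric"),
    EvaluatorSessionEntry("eval_123", "bob", "pass_criteria",
                          "Raised pass threshold"),
]

# Replaying entries in timestamp order reconstructs how the
# evaluator evolved over time.
for entry in sorted(history, key=lambda e: e.timestamp):
    print(f"{entry.timestamp:%Y-%m-%d %H:%M} {entry.author}: {entry.summary}")
```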