Input
- Required Inputs:
  - session: Complete interaction log between user and agent showing all steps taken
  - expected_steps: Ordered list of required steps to be verified in sequence
Output
- Result: Binary score (0 or 1)
- Reasoning: Detailed explanation of step completion
Interpretation
- 1: All steps completed in exact order
- 0: Missing steps or wrong order
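
The scoring rule above can be illustrated with a small deterministic sketch. It is not the evaluator's actual implementation, which is not shown here and may rely on a judge model reading the raw session. The sketch assumes the session has already been reduced to a list of step labels (`observed_steps`), and the names `check_steps` and `StepCheckResult` are hypothetical. It also reads "exact order" as "the expected steps appear in the given relative order, with other steps allowed in between", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class StepCheckResult:
    result: int     # 1 if every expected step was found in order, otherwise 0
    reasoning: str  # short explanation of the score


def check_steps(observed_steps: list[str], expected_steps: list[str]) -> StepCheckResult:
    """Score 1 only if expected_steps occur in observed_steps in the given order.

    Assumption: steps are compared as exact string labels; a real evaluator
    would need some way to match free-form session text to expected steps.
    """
    position = 0  # index in observed_steps from which the next step is searched
    for step in expected_steps:
        try:
            found_at = observed_steps.index(step, position)
        except ValueError:
            if step in observed_steps:
                reason = f"Step '{step}' does not occur after the preceding expected steps (wrong order)."
            else:
                reason = f"Step '{step}' is missing."
            return StepCheckResult(0, reason)
        position = found_at + 1
    return StepCheckResult(1, "All expected steps were completed in order.")


if __name__ == "__main__":
    observed = ["login", "search", "add_to_cart", "checkout"]
    print(check_steps(observed, ["login", "add_to_cart", "checkout"]))
    print(check_steps(observed, ["checkout", "login"]))
```

In this sketch the first call scores 1 because the expected steps appear in order, and the second scores 0 because `login` does not occur after `checkout`.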