output (list): The model-generated list of tool calls.
expectedOutput (list): The reference list of expected tool calls (JSON-formatted objects with tool name and arguments).
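As an illustration of the two inputs, the sketch below builds a reference list and a model-generated list. The field names `name` and `arguments`, and the tool names themselves, are assumptions made for this example; the exact schema is not specified here.

```python
import json

# Hypothetical reference calls; "name" and "arguments" are assumed field names.
expected_json = """
[
  {"name": "search_flights", "arguments": {"origin": "SFO", "destination": "JFK"}},
  {"name": "book_flight", "arguments": {"flight_id": "UA123"}}
]
"""
expected_output = json.loads(expected_json)

# Hypothetical model-generated calls: the second call has a different argument value.
output = [
    {"name": "search_flights", "arguments": {"origin": "SFO", "destination": "JFK"}},
    {"name": "book_flight", "arguments": {"flight_id": "UA999"}},
]
```

Both lists keep the calls in the order they were made, since order is part of what the score reflects.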
Output
Result (float): A score between 0 and 1.
Reasoning (str): Optional detailed feedback on the matching process.
Interpretation
- Higher scores (closer to 1): Most expected tool calls were made correctly with proper parameters and order
- Lower scores (closer to 0): Few expected tool calls were matched correctly
$$\text{Tool Call Accuracy} = \frac{\text{Number of correct tool calls}}{\text{Total expected tool calls}}$$
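The ratio above can be sketched in Python as follows. This is a minimal sketch, not the evaluator's actual implementation. It assumes a call is correct when its `name` and `arguments` equal those of an expected call, and it respects order by counting the longest in-order run of matches (a longest common subsequence). The helper names `calls_match` and `tool_call_accuracy` are hypothetical.

```python
from typing import Any

ToolCall = dict[str, Any]  # assumed shape: {"name": str, "arguments": dict}


def calls_match(actual: ToolCall, expected: ToolCall) -> bool:
    """Assumed rule: a call is correct when name and arguments are equal."""
    return (
        actual.get("name") == expected.get("name")
        and actual.get("arguments") == expected.get("arguments")
    )


def tool_call_accuracy(output: list[ToolCall], expected_output: list[ToolCall]) -> float:
    """Correct calls divided by total expected calls, honoring call order."""
    if not expected_output:
        # Assumption: with nothing expected, score 1.0 only if nothing was called.
        return 1.0 if not output else 0.0

    rows, cols = len(output), len(expected_output)
    # lcs[i][j] = most in-order matches between output[:i] and expected_output[:j]
    lcs = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if calls_match(output[i - 1], expected_output[j - 1]):
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return lcs[rows][cols] / cols


# Hypothetical example: three expected calls, one made with a wrong argument.
expected = [
    {"name": "search", "arguments": {"query": "weather"}},
    {"name": "get_forecast", "arguments": {"city": "Paris"}},
    {"name": "send_email", "arguments": {"to": "a@example.com"}},
]
actual = [
    {"name": "search", "arguments": {"query": "weather"}},
    {"name": "get_forecast", "arguments": {"city": "London"}},
    {"name": "send_email", "arguments": {"to": "a@example.com"}},
]
score = tool_call_accuracy(actual, expected)  # 2 of 3 expected calls matched
```

Exact equality on arguments is the strictest possible matching rule; a real evaluator may tolerate differences such as key order or type coercion, so treat scores from this sketch as a lower bound on what a more lenient matcher would report.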
Use Cases
- Evaluating agent compliance with required tool sequences
- Assessing function-calling tasks that require specific arguments
- Measuring multi-step tool-use workflows end-to-end