Why Human Evaluation Matters

Automated evaluators are useful for consistent, large-scale assessment, but they often miss nuance such as context, intent, and tone that calls for human judgment. Human evaluation complements automation by adding:
  • Qualitative feedback and detailed comments
  • Context-aware judgments
  • Rewritten outputs that demonstrate better responses
Human evaluators configured for logs are the same evaluators used in test runs, ensuring consistency across your evaluation workflow.

Ways to Annotate Logs

Maxim supports two approaches to human annotation:

Internal annotation (in-app)

Team members annotate logs directly within the Maxim UI after human evaluators are configured on a log repository.

External annotation (via email)

Invite external raters (outside your organization) to annotate selected logs through an email-based workflow.

Before you start

  • Ensure logging is set up to capture interactions between your LLM and users.
    Integrate the Maxim SDK into your application (see the sketch after this list).
  • Make sure at least one Human Evaluator exists in your workspace.
    Create one from the Evaluators tab in the sidebar.
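
If logging is not set up yet, the snippet below is a minimal sketch of what SDK-based logging can look like in Python. The class and method names (`Maxim`, `Config`, `LoggerConfig`, `TraceConfig`, `set_input`, `set_output`) are assumptions based on typical Maxim SDK usage; check the SDK reference for the exact API, and substitute your own API key and log repository ID.

```python
# Minimal sketch: capture one user/LLM interaction as a trace in a Maxim
# log repository. Names below are illustrative; verify against the SDK docs.
from maxim import Maxim, Config
from maxim.logger import LoggerConfig, TraceConfig

maxim = Maxim(Config(api_key="YOUR_MAXIM_API_KEY"))               # authenticate the SDK
logger = maxim.logger(LoggerConfig(id="YOUR_LOG_REPOSITORY_ID"))  # target log repository

# One trace per single-response interaction; use sessions for multi-turn chats.
trace = logger.trace(TraceConfig(id="trace-001", name="support-chat-turn"))
trace.set_input("How do I reset my password?")                    # what the user asked
trace.set_output("You can reset it from Settings > Security.")    # what the LLM answered
trace.end()                                                       # flush so it appears in the logs table
```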

Setting Up Self Annotation

1. Navigate to the repository

Open the log repository where you want to enable human evaluation.

2. Open evaluation configuration

Click Configure evaluation in the top-right corner of the page. This opens the evaluation configuration sheet.

3. Select human evaluators

In the Human evaluation section, choose the evaluators you want to enable:
  • Session evaluators – For multi-turn interactions
  • Trace evaluators – For single-response evaluations
Select evaluators from the dropdown. The same evaluators used in test runs are available here.

4. Save the configuration

Click Save configurations at the bottom of the sheet.
Once configured, the logs table automatically adds a column for each human evaluator. Team members can start annotating logs immediately.

Annotating Logs

You can annotate logs from two locations.

From the logs table

When human evaluators are enabled, corresponding columns appear in the logs table:
  1. Click a cell in a human evaluator column
  2. Provide a rating in the annotation form
  3. Optionally add comments or a rewritten output
  4. Click Save
The column displays the average score across all annotations for that log.

[Screenshot: logs table with human evaluator columns]

From trace details

  1. Open a trace from the logs table
  2. Click Annotate in the top-right corner of the trace details panel
  3. Provide ratings for all configured evaluators
  4. Optionally add comments or rewritten outputs
  5. Save your annotations
[Screenshot: trace details sheet with the Annotate button highlighted]

Using Saved Views as Annotation Queues

Saved views help organize annotation work by creating filtered queues of logs:
  1. Apply filters (e.g. unannotated logs, time ranges, specific attributes)
  2. Save the filtered view
  3. Share the view with raters to work through annotations systematically
This ensures evaluators focus on the most relevant logs and keeps annotation workflows organized.

Inviting External Raters

1. Navigate to the repository

Open the log repository where you want to trigger external human evaluation.

2. Select traces or sessions

Select one or more traces or sessions from the table, then use the floating action panel to click Add evaluators.

Tips:
  • Select individual logs using row checkboxes
  • Use the top checkbox to select all logs within the current filters and time range

[Screenshot: selecting traces for annotation]

3. Choose evaluators

Select one or more Human Evaluators from the dropdown. Optionally, include other evaluators if you want to retroactively evaluate existing logs.

[Screenshot: Add evaluators button]

4. Invite external raters via email

Click Trigger to open the Human Evaluation dialog. In the dialog:
  • Enter external rater email addresses
  • Add instructions for raters
  • Choose what data they can access

Visibility options:
  • Only trace-level data (input, output, tags, metadata)
  • Entire trace tree, including nested steps

Click Add evaluators to send invitations.

[Screenshot: Human Evaluation dialog]

5. Start annotation

Invited external raters receive an email with a link to the annotation dashboard.

[Screenshot: invitation email with a link to the annotation dashboard]

6. Annotation dashboard

External raters use the dashboard to review data and submit annotations.

[Screenshot: annotation dashboard]

Viewing Annotations

Annotations are visible in two places.

Logs table

  • Human evaluator scores appear as columns
  • Scores are averaged across all annotators
  • Click any cell to add or edit an annotation
Internal and external annotations are treated identically; the only difference is the annotation source.

Trace details (Evaluation tab)

The Evaluation tab shows:
  • Average scores per evaluator
  • Individual annotations, including scores, comments, and rewritten outputs
  • Pass/fail status, based on evaluator criteria

Understanding Annotation Scores

  • Average scores – Mean score across all annotators for a log
  • Individual breakdown – View each annotator’s scores, comments, and rewrites
  • Pass/fail evaluation – Determined by evaluator configuration (see the sketch below)
  • Rewritten outputs – Multiple rewritten versions may exist and are all preserved
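
To make the relationship between individual annotations, the average score, and pass/fail status concrete, here is a minimal sketch in Python. The threshold value and the rule of comparing the average against it are assumptions for illustration only; the actual pass/fail logic follows whatever criteria are set in your evaluator's configuration in Maxim.

```python
# Illustrative sketch of how an average score and pass/fail status could be
# derived from individual annotations. The threshold-based pass rule is an
# assumption; the real logic comes from your evaluator's configuration.
from statistics import mean

annotations = [
    {"annotator": "alice@example.com", "score": 4, "comment": "Accurate, slightly verbose"},
    {"annotator": "bob@example.com", "score": 5, "comment": "Good answer"},
    {"annotator": "carol@example.com", "score": 3, "comment": "Missed one edge case"},
]

average_score = mean(a["score"] for a in annotations)  # shown in the logs table column
passed = average_score >= 3.5                          # hypothetical pass threshold

print(f"Average score: {average_score:.2f} -> {'Pass' if passed else 'Fail'}")
```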
Human annotation provides insights that automated systems cannot capture, enabling continuous improvement of your AI applications based on real-world usage.