Input
- output (str): The generated text to be evaluated.
- expectedOutput (str): The reference or ground truth text.
Output
- Result (float): A score between 0 and 1.
Interpretation
- Higher scores (closer to 1) indicate a higher degree of overlap between the generated text and the ground truth, suggesting better output quality
- Lower scores (closer to 0) indicate a lower degree of overlap between the generated text and the ground truth, suggesting worse output quality
Formula
The BLEU score is calculated as:

$$\text{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$

For a simplified version with bigrams (N=2):

$$\text{BLEU} = BP \cdot \sqrt{p_1 \cdot p_2}$$

where:
- $p_1$ (precision 1) is the unigram precision: $p_1 = \frac{\text{number of candidate unigrams found in the reference}}{\text{total number of unigrams in the candidate}}$
- $p_2$ is the bigram precision (the same calculation applied to bigrams)
- $BP$ is the Brevity Penalty: $BP = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}$
- $r$ is the reference length, and $c$ is the candidate length.
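Below is a minimal sketch of this simplified bigram BLEU in TypeScript. It assumes lowercased, whitespace-separated tokens and clipped n-gram counts; the names `bleu2`, `clippedPrecision`, and `ngrams` are illustrative, not part of any particular library:

```typescript
// Build the list of n-grams for a token sequence.
function ngrams(tokens: string[], n: number): string[] {
  const grams: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.push(tokens.slice(i, i + n).join(" "));
  }
  return grams;
}

// Clipped n-gram precision: each candidate n-gram is counted at most as
// many times as it occurs in the reference.
function clippedPrecision(candidate: string[], reference: string[], n: number): number {
  const candGrams = ngrams(candidate, n);
  if (candGrams.length === 0) return 0;
  const refCounts = new Map<string, number>();
  for (const g of ngrams(reference, n)) {
    refCounts.set(g, (refCounts.get(g) ?? 0) + 1);
  }
  let matches = 0;
  for (const g of candGrams) {
    const remaining = refCounts.get(g) ?? 0;
    if (remaining > 0) {
      matches += 1;
      refCounts.set(g, remaining - 1);
    }
  }
  return matches / candGrams.length;
}

// Simplified BLEU with N = 2: the brevity penalty times the geometric
// mean of unigram and bigram precision.
function bleu2(output: string, expectedOutput: string): number {
  const cand = output.toLowerCase().split(/\s+/).filter(Boolean);
  const ref = expectedOutput.toLowerCase().split(/\s+/).filter(Boolean);
  if (cand.length === 0) return 0;
  const p1 = clippedPrecision(cand, ref, 1);
  const p2 = clippedPrecision(cand, ref, 2);
  // BP = 1 when the candidate is longer than the reference, else exp(1 - r/c).
  const bp = cand.length > ref.length ? 1 : Math.exp(1 - ref.length / cand.length);
  return bp * Math.sqrt(p1 * p2);
}
```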
Example Calculation:
- Reference: “The cat sat on the mat”
- Candidate: “A cat is sitting on the mat”
1. Count unigrams:
   - Reference: 6 words
   - Candidate: 7 words
   - Matching: “cat”, “on”, “the”, “mat” (4 words)
2. Calculate BP:
   - $r = 6$ (reference length)
   - $c = 7$ (candidate length)
   - Since $c > r$, $BP = 1$
3. For simplicity (assuming only unigram precision):

$$\text{BLEU} \approx BP \cdot p_1 = 1 \cdot \frac{4}{7} \approx 0.571$$
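Using the hypothetical helpers from the sketch above, the same numbers fall out directly:

```typescript
// Unigram precision only, as in the worked example (BP = 1):
const p1 = clippedPrecision(
  "a cat is sitting on the mat".split(" "),
  "the cat sat on the mat".split(" "),
  1
);
console.log(p1.toFixed(3)); // "0.571" (4 matching unigrams out of 7)

// The full bigram version also folds in p2 = 2/6, so the score drops:
console.log(
  bleu2("A cat is sitting on the mat", "The cat sat on the mat").toFixed(3)
); // "0.436"
```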
This is a Similarity Metric
Use Cases
- Evaluating machine translation systems.
- Assessing the quality of text summarization.
- Measuring performance in dialogue generation.