Input
- output (str): The generated text to be evaluated.
- expectedOutput (str): The reference or ground truth text.
Output
- Result (float): A score between 0 and 1.
Interpretation
- Higher scores (closer to 1) indicate a higher degree of overlap between the generated text and the ground truth, suggesting better output quality
- Lower scores (closer to 0) indicate a lower degree of overlap between the generated text and the ground truth, suggesting worse output quality
Formula
The BLEU score is calculated as:

$$\text{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$

For a simplified version with bigrams (N=2):

$$\text{BLEU} = BP \cdot \sqrt{p_1 \cdot p_2}$$

where:
- $p_1$ (precision 1) is the unigram precision: $p_1 = \frac{\text{number of candidate unigrams found in the reference}}{\text{total number of unigrams in the candidate}}$
- $p_2$ is the bigram precision (the same calculation applied to bigrams)
- $BP$ is the Brevity Penalty: $BP = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}$
- $r$ is the reference length, and $c$ is the candidate length.
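Below is a minimal sketch of this simplified bigram BLEU in TypeScript. It assumes lowercased, whitespace-separated tokens and clipped n-gram counts; the names `bleu2`, `clippedPrecision`, and `ngrams` are illustrative, not part of any particular library:

```typescript
// Build the list of n-grams for a token sequence.
function ngrams(tokens: string[], n: number): string[] {
  const grams: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.push(tokens.slice(i, i + n).join(" "));
  }
  return grams;
}

// Clipped n-gram precision: each candidate n-gram is counted at most as
// many times as it occurs in the reference.
function clippedPrecision(candidate: string[], reference: string[], n: number): number {
  const candGrams = ngrams(candidate, n);
  if (candGrams.length === 0) return 0;
  const refCounts = new Map<string, number>();
  for (const g of ngrams(reference, n)) {
    refCounts.set(g, (refCounts.get(g) ?? 0) + 1);
  }
  let matches = 0;
  for (const g of candGrams) {
    const remaining = refCounts.get(g) ?? 0;
    if (remaining > 0) {
      matches += 1;
      refCounts.set(g, remaining - 1);
    }
  }
  return matches / candGrams.length;
}

// Simplified BLEU with N = 2: the brevity penalty times the geometric
// mean of unigram and bigram precision.
function bleu2(output: string, expectedOutput: string): number {
  const cand = output.toLowerCase().split(/\s+/).filter(Boolean);
  const ref = expectedOutput.toLowerCase().split(/\s+/).filter(Boolean);
  if (cand.length === 0) return 0;
  const p1 = clippedPrecision(cand, ref, 1);
  const p2 = clippedPrecision(cand, ref, 2);
  // BP = 1 when the candidate is longer than the reference, else exp(1 - r/c).
  const bp = cand.length > ref.length ? 1 : Math.exp(1 - ref.length / cand.length);
  return bp * Math.sqrt(p1 * p2);
}
```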
Example Calculation:
- Reference: “The cat sat on the mat”
- Candidate: “A cat is sitting on the mat”
1. Count unigrams:
   - Reference: 6 words
   - Candidate: 7 words
   - Matching: “cat”, “on”, “the”, “mat” (4 words)
2. Calculate BP:
   - $r = 6$ (reference length)
   - $c = 7$ (candidate length)
   - Since $c > r$, $BP = 1$
3. For simplicity (assuming only unigram precision):

$$\text{BLEU} \approx BP \cdot p_1 = 1 \cdot \frac{4}{7} \approx 0.571$$
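Using the hypothetical helpers from the sketch above, the same numbers fall out directly:

```typescript
// Unigram precision only, as in the worked example (BP = 1):
const p1 = clippedPrecision(
  "a cat is sitting on the mat".split(" "),
  "the cat sat on the mat".split(" "),
  1
);
console.log(p1.toFixed(3)); // "0.571" (4 matching unigrams out of 7)

// The full bigram version also folds in p2 = 2/6, so the score drops:
console.log(
  bleu2("A cat is sitting on the mat", "The cat sat on the mat").toFixed(3)
); // "0.436"
```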
This is a Similarity Metric
Use Cases
- Evaluating machine translation systems.
- Assessing the quality of text summarization.
- Measuring performance in dialogue generation.