Understanding Output Format

This guide explains how to interpret WiMarka’s evaluation output and understand the metrics it provides.

Output Structure

For each evaluated sentence pair, WiMarka provides:

Line X:
  Source: <source sentence>
  Target: <target sentence>
  Errors: <list of detected errors>
  Fluency Score: <0-100>/100
  Adequacy Score: <0-100>/100
  Overall Score: <0-100>/100
  Explanation: <human-readable explanation>
  Suggested Correction: <corrected translation>

Example Output

Line 1:
  Source: Good morning!
  Target: Maayong buntag!
  Errors: []
  Fluency Score: 100/100
  Adequacy Score: 100/100
  Overall Score: 100/100
  Explanation: Perfect translation with correct meaning and natural grammar.
  Suggested Correction: Maayong buntag!
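
If the printed output is captured as text, a record in the layout above can be split into fields. The following is a minimal sketch, not part of WiMarka: `parse_record` is a hypothetical helper, and it assumes the labels appear exactly as shown, with any unlabeled line being a wrapped continuation of the previous field.

```python
import re

# Field labels as they appear in the output layout shown above.
FIELDS = ("Source", "Target", "Errors", "Fluency Score", "Adequacy Score",
          "Overall Score", "Explanation", "Suggested Correction")


def parse_record(text):
    """Parse one 'Line X:' block into a dict (hypothetical helper)."""
    record = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"Line (\d+):", line)
        if header:
            record["line"] = int(header.group(1))
            continue
        label, sep, value = line.partition(":")
        if sep and label in FIELDS:
            current = label
            record[current] = value.strip()
        elif current is not None:
            # A line without a known label continues the previous field.
            record[current] += " " + line
    # Scores are printed as "<n>/100"; keep only the numeric part.
    for key in ("Fluency Score", "Adequacy Score", "Overall Score"):
        if key in record:
            record[key] = float(record[key].split("/")[0])
    return record


sample = """Line 1:
  Source: Good morning!
  Target: Maayong buntag!
  Errors: []
  Fluency Score: 100/100
  Adequacy Score: 100/100
  Overall Score: 100/100
  Explanation: Perfect translation with correct meaning and natural grammar.
  Suggested Correction: Maayong buntag!
"""
record = parse_record(sample)
print(record["line"], record["Target"], record["Overall Score"])
```

For anything beyond quick inspection, the `results` object described under Programmatic Access is likely the more robust route, since it avoids depending on the printed layout.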

Evaluation Metrics

WiMarka provides three primary metrics for each translation:

Fluency Score

Range: 0-100

Definition: Measures how natural, grammatical, and readable the translation is in the target language.

What It Evaluates:
  • Grammatical correctness

  • Natural word order

  • Proper use of particles and markers

  • Idiomatic expression

  • Overall readability

Interpretation:

Score Range    Interpretation
90-100         Excellent fluency. Translation reads like native text with natural grammar and word choice.
75-89          Good fluency. Minor grammatical issues or slightly awkward phrasing, but generally understandable.
60-74          Acceptable fluency. Noticeable grammatical errors or unnatural constructions, but meaning is clear.
40-59          Poor fluency. Significant grammatical problems making the text difficult to read.
0-39           Very poor fluency. Severe grammatical errors; text may be incomprehensible.

Example High Fluency (Score: 95):

EN:  The weather is beautiful today.
CEB: Nindot kaayo ang panahon karon.
# Natural Cebuano with proper word order and particles

Example Low Fluency (Score: 45):

EN:  The weather is beautiful today.
CEB: Ang panahon nindot ka sa karon.
# Awkward word order, incorrect particle usage

Adequacy Score

Range: 0-100

Definition: Measures how completely and accurately the translation conveys the meaning of the source text.

What It Evaluates:
  • Semantic completeness

  • Preservation of meaning

  • No critical omissions

  • No added information

  • Correct interpretation

Interpretation:

Score Range    Interpretation
90-100         Excellent adequacy. All meaning fully preserved with correct interpretation.
75-89          Good adequacy. Most meaning conveyed; minor details may be slightly different.
60-74          Acceptable adequacy. Core meaning present but some information loss or distortion.
40-59          Poor adequacy. Significant meaning loss; important information missing or wrong.
0-39           Very poor adequacy. Most meaning lost or severely distorted.

Example High Adequacy (Score: 98):

EN:  I bought three red apples at the market.
TGT: Bumili ako ng tatlong pulang mansanas sa palengke.
# All information preserved: quantity, color, item, location

Example Low Adequacy (Score: 50):

EN:  I bought three red apples at the market.
TGT: Bumili ako ng mansanas.
# Missing: quantity, color, location

Overall Score

Range: 0-100

Definition: Combined metric representing overall translation quality.

Calculation:

Overall Score = (Fluency Score + Adequacy Score) / 2

Interpretation:

Score Range    Quality Level
90-100         Excellent - Publication-ready, professional quality
75-89          Good - Minor improvements possible, generally acceptable
60-74          Fair - Usable but needs revision
40-59          Poor - Significant revision required
0-39           Unacceptable - Major rework needed
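
The formula and quality levels above can be sketched in a few lines of Python. Both function names are hypothetical helpers for illustration, not part of the WiMarka API. Because the average can be fractional (for example 89.5), the sketch assumes each band starts at its lower bound and runs up to the next one.

```python
def overall_score(fluency, adequacy):
    """Arithmetic mean of the two scores, per the formula above."""
    return (fluency + adequacy) / 2


# Lower bounds of the quality levels listed above, highest first.
BANDS = [(90, "Excellent"), (75, "Good"), (60, "Fair"),
         (40, "Poor"), (0, "Unacceptable")]


def quality_level(score):
    """Map a 0-100 score to its quality level (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label


score = overall_score(70, 75)
print(score, quality_level(score))  # 72.5 Fair
```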

Trade-offs:

Fluency and adequacy are scored independently, so a translation can score high on one and low on the other while the overall score stays the same:

High Fluency, Low Adequacy:

EN:  I need to finish this report by Friday.
CEB: Kinahanglan nakong human kini nga semana.
     (I need to finish this week)

Fluency: 90 (grammatically correct Cebuano)
Adequacy: 60 (loses specificity - "Friday" vs "this week")
Overall: 75

Low Fluency, High Adequacy:

EN:  I need to finish this report by Friday.
CEB: Ako kinahanglan finish kini report by Biyernes.

Fluency: 55 (code-switching, unnatural)
Adequacy: 95 (all meaning preserved)
Overall: 75
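
Both examples above end up with the same overall score, so the average alone hides which dimension is weak. One way to surface this is to compare the two scores directly. This is a rough sketch: `diagnose` is a hypothetical helper, and the 20-point threshold is an arbitrary assumption, not a WiMarka setting.

```python
def diagnose(fluency, adequacy, threshold=20):
    """Describe which dimension lags when the two scores diverge.

    `threshold` is an illustrative cut-off, not a WiMarka parameter.
    """
    gap = fluency - adequacy
    if gap >= threshold:
        return "fluency exceeds adequacy: check for lost or altered meaning"
    if -gap >= threshold:
        return "adequacy exceeds fluency: check grammar and phrasing"
    return "balanced"


# The two trade-off examples above, both with an overall score of 75.
print(diagnose(90, 60))
print(diagnose(55, 95))
```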

Error Detection

Error Types

WiMarka detects various error categories:

Error Type              Description
Lexical errors          Wrong word choice, mistranslation
Syntactic errors        Incorrect grammar, word order issues
Semantic errors         Meaning distortion or loss
Morphological errors    Wrong affixes, inflections
Pragmatic errors        Incorrect formality, register
Omissions               Missing information
Additions               Unnecessary extra information

Error Format

Errors are printed as a Python list of strings. In the examples in this guide, each string has the form 'Category: detail':

# No errors
Errors: []

# Single error
Errors: ['Semantic mismatch: time of day']

# Multiple errors
Errors: ['Lexical error: wrong verb choice',
         'Omission: quantity not specified']
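
Since the list is printed as a Python literal, a captured `Errors:` line can be read back with the standard library's `ast.literal_eval`. The sketch below uses a hypothetical helper, `parse_errors`, and assumes every entry follows the 'Category: detail' pattern seen in the examples; an entry without a colon yields an empty detail.

```python
import ast


def parse_errors(line):
    """Turn an 'Errors: [...]' line into (category, detail) pairs."""
    _, _, literal = line.partition(":")
    items = ast.literal_eval(literal.strip())
    pairs = []
    for item in items:
        category, _, detail = item.partition(":")
        pairs.append((category.strip(), detail.strip()))
    return pairs


multi = """Errors: ['Lexical error: wrong verb choice',
         'Omission: quantity not specified']"""
for category, detail in parse_errors(multi):
    print(f"{category} -> {detail}")
```

`ast.literal_eval` only evaluates literals, so it is a safer choice here than `eval` for text read from a log or file.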

Understanding Errors

Example 1: Semantic Mismatch

Source: Good morning!
Target: Maayong gabii!  (Good evening!)
Errors: ['Semantic mismatch: time of day']
Explanation: Incorrect time reference - 'morning' vs 'evening'

Example 2: Omission

Source: I bought three apples.
Target: Bumili ako ng mansanas.  (I bought apples.)
Errors: ['Omission: quantity not specified']
Explanation: The number 'three' was not translated

Example 3: Syntactic Error

Source: The book is on the table.
Target: Ang libro sa lamesa.  (The book of table.)
Errors: ['Syntactic error: missing verb']
Explanation: Location verb 'nasa' missing

Explanations

Natural Language Explanations

WiMarka generates human-readable explanations for the evaluation:

Perfect Translation:

Explanation: Excellent translation with accurate meaning and
             natural grammar. No errors detected.

Minor Issues:

Explanation: Good translation overall. Minor fluency issue with
             word order, but meaning is fully preserved.

Significant Problems:

Explanation: Translation has semantic error - wrong time of day.
             'Morning' incorrectly translated as 'evening'.
             Grammar is correct but meaning is incorrect.

Contextual Information

Explanations may include:

  • Specific error locations

  • Linguistic reasoning

  • Cultural or pragmatic considerations

  • Alternative phrasings

Suggested Corrections

How Corrections Work

WiMarka provides improved translation suggestions:

Target: Maayong gabii!
Errors: ['Semantic mismatch: time of day']
Suggested Correction: Maayong buntag!

Correction Quality:
  • Addresses detected errors

  • Maintains semantic accuracy

  • Improves fluency when possible

  • Preserves general style

When to Use Corrections

Note

Suggested corrections are recommendations, not definitive answers. For production use, always have a native speaker review them.

Good Use Cases:
  • Quick fixes for obvious errors

  • Learning from mistakes

  • Identifying problem patterns

Exercise Caution:
  • Critical translations (legal, medical)

  • Cultural/contextual nuances

  • Creative or literary content

Output Examples by Quality

Excellent Translation (90-100)

Line 1:
  Source: Thank you for your help.
  Target: Salamat sa imong tabang.
  Errors: []
  Fluency Score: 98/100
  Adequacy Score: 100/100
  Overall Score: 99/100
  Explanation: Perfect translation with natural phrasing and
               complete meaning preservation.
  Suggested Correction: Salamat sa imong tabang.

Good Translation (75-89)

Line 1:
  Source: I will call you tomorrow.
  Target: Tawagan ko ikaw ugma.
  Errors: []
  Fluency Score: 80/100
  Adequacy Score: 98/100
  Overall Score: 89/100
  Explanation: Good translation. Slightly more natural would be
               'Tawagan ta ka ugma' but current form is acceptable.
  Suggested Correction: Tawagan ta ka ugma.

Fair Translation (60-74)

Line 1:
  Source: The meeting starts at 2 PM.
  Target: Ang meeting magsugod sa 2.
  Errors: ['Code-switching: English word "meeting"',
           'Missing: PM specification']
  Fluency Score: 70/100
  Adequacy Score: 75/100
  Overall Score: 72.5/100
  Explanation: Acceptable but could be improved. Use 'miting'
               instead of 'meeting'. Add 'sa hapon' for PM.
  Suggested Correction: Ang miting magsugod sa alas 2 sa hapon.

Poor Translation (Below 60)

Line 1:
  Source: Good morning!
  Target: Magandang gabi!
  Errors: ['Wrong language: Tagalog instead of Cebuano',
           'Semantic error: evening instead of morning']
  Fluency Score: 40/100
  Adequacy Score: 30/100
  Overall Score: 35/100
  Explanation: Major errors. Wrong language used (Tagalog not
               Cebuano) and wrong time of day (evening not morning).
  Suggested Correction: Maayong buntag!

Best Practices for Interpretation

  1. Consider Both Scores

    Don’t rely only on the overall score. Check both fluency and adequacy separately.

  2. Read Explanations

    The explanation provides crucial context for understanding scores.

  3. Review Errors

    Pay attention to the types of errors detected.

  4. Context Matters

    Scores should be interpreted based on your use case (casual vs. professional).

  5. Use Corrections Wisely

    Treat suggestions as guidance, not absolute fixes.

  6. Batch Analysis

    For multiple sentences, look at average scores and error patterns.
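
As a rough illustration of batch analysis, the sketch below averages the overall scores and counts error categories. It uses stand-in data shaped like the `results` dict shown under Programmatic Access, with values copied from the examples in this guide. It assumes each entry of `results['errors']` is a list of 'Category: detail' strings; `summarize` is a hypothetical helper, not a WiMarka function.

```python
from collections import Counter
from statistics import mean

# Stand-in data; in practice this would come from WiMarka's `results`.
results = {
    "overall_score": [99, 72.5, 35],
    "errors": [
        [],
        ['Code-switching: English word "meeting"',
         "Missing: PM specification"],
        ["Wrong language: Tagalog instead of Cebuano",
         "Semantic error: evening instead of morning"],
    ],
}


def summarize(results):
    """Return the mean overall score and a count of error categories."""
    categories = Counter(
        error.split(":", 1)[0].strip()
        for errors in results["errors"]
        for error in errors
    )
    return mean(results["overall_score"]), categories


average, categories = summarize(results)
print(f"Average overall score: {average:.2f}")
for category, count in categories.most_common():
    print(f"{category}: {count}")
```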

Programmatic Access

Accessing Results in Python

from wimarka.main import wmk_eval, results

# Run the evaluation; the loop below reads its output from `results`
wmk_eval('source.txt', 'EN', 'target.txt', 'CEB')

# Report every sentence pair whose overall score is below 70
for line_no, (score, errors) in enumerate(
        zip(results['overall_score'], results['errors']), start=1):
    if score < 70:
        print(f"Low quality at line {line_no}:")
        print(f"  Score: {score}")
        print(f"  Errors: {errors}")

Exporting for Analysis

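import pandas as pd
from wimarka.main import results

# Run wmk_eval() first (see above); otherwise `results` has nothing to export

# Create a DataFrame with one row per evaluated sentence pair
df = pd.DataFrame(results)

# Calculate summary statistics for the overall score
print(f"Average Overall Score: {df['overall_score'].mean():.2f}")
print(f"Std Dev: {df['overall_score'].std():.2f}")

# Export all columns to CSV without the index column
df.to_csv('detailed_results.csv', index=False)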

Next Steps