Understanding Output Format

This guide explains how to interpret WiMarka’s evaluation output and understand the metrics it provides.

Output Structure

For each evaluated sentence pair, WiMarka provides:

Line X:
  Source: <source sentence>
  Target: <target sentence>
  Errors: <list of detected errors>
  Fluency Score: <0-100>/100
  Adequacy Score: <0-100>/100
  Overall Score: <0-100>/100
  Explanation: <human-readable explanation>
  Suggested Correction: <corrected translation>

Example Output

Line 1:
  Source: Good morning!
  Target: Maayong buntag!
  Errors: []
  Fluency Score: 100/100
  Adequacy Score: 100/100
  Overall Score: 100/100
  Explanation: Perfect translation with correct meaning and natural grammar.
  Suggested Correction: Maayong buntag!
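
If you only have the printed text (for example, redirected to a file), a block like the one above can be turned back into structured data. The following is a minimal sketch, assuming the layout is exactly as shown in this guide; `parse_block` is a hypothetical helper, not part of WiMarka.

```python
import ast
import re

# Field labels taken from the output structure shown in this guide.
FIELDS = ("Source", "Target", "Errors", "Fluency Score", "Adequacy Score",
          "Overall Score", "Explanation", "Suggested Correction")


def parse_block(text):
    """Parse one 'Line X:' block of printed output into a dict (hypothetical helper)."""
    record = {}
    current = None
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"Line (\d+):", line)
        if header:
            record["line"] = int(header.group(1))
            continue
        label, sep, value = line.partition(": ")
        if sep and label in FIELDS:
            current = label
            record[current] = value
        elif current is not None:
            # Continuation of a wrapped field, such as a long explanation.
            record[current] += " " + line
    # Convert the list literal and the "NN/100" scores into Python values.
    record["Errors"] = ast.literal_eval(record.get("Errors", "[]"))
    for key in ("Fluency Score", "Adequacy Score", "Overall Score"):
        if key in record:
            record[key] = float(record[key].split("/")[0])
    return record


sample = """Line 1:
  Source: Good morning!
  Target: Maayong buntag!
  Errors: []
  Fluency Score: 100/100
  Adequacy Score: 100/100
  Overall Score: 100/100
  Explanation: Perfect translation with correct meaning and natural grammar.
  Suggested Correction: Maayong buntag!
"""
record = parse_block(sample)
print(record["line"], record["Overall Score"], record["Errors"])
```

When you control the Python process, reading the `results` object described under Programmatic Access is likely to be more robust than parsing text.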

Evaluation Metrics

WiMarka provides three primary metrics for each translation:

Fluency Score

Range: 0-100

Definition: Measures how natural, grammatical, and readable the translation is in the target language.

What It Evaluates:

Grammatical correctness
Natural word order
Proper use of particles and markers
Idiomatic expression
Overall readability

Interpretation:

Score Range	Interpretation
90-100	Excellent fluency. Translation reads like native text with natural grammar and word choice.
75-89	Good fluency. Minor grammatical issues or slightly awkward phrasing, but generally understandable.
60-74	Acceptable fluency. Noticeable grammatical errors or unnatural constructions, but meaning is clear.
40-59	Poor fluency. Significant grammatical problems making the text difficult to read.
0-39	Very poor fluency. Severe grammatical errors; text may be incomprehensible.

Example High Fluency (Score: 95):

EN:  The weather is beautiful today.
CEB: Nindot kaayo ang panahon karon.
# Natural Cebuano with proper word order and particles

Example Low Fluency (Score: 45):

EN:  The weather is beautiful today.
CEB: Ang panahon nindot ka sa karon.
# Awkward word order, incorrect particle usage

Adequacy Score

Range: 0-100

Definition: Measures how completely and accurately the translation conveys the meaning of the source text.

What It Evaluates:

Semantic completeness
Preservation of meaning
No critical omissions
No added information
Correct interpretation

Interpretation:

Score Range	Interpretation
90-100	Excellent adequacy. All meaning fully preserved with correct interpretation.
75-89	Good adequacy. Most meaning conveyed; minor details may be slightly different.
60-74	Acceptable adequacy. Core meaning present but some information loss or distortion.
40-59	Poor adequacy. Significant meaning loss; important information missing or wrong.
0-39	Very poor adequacy. Most meaning lost or severely distorted.

Example High Adequacy (Score: 98):

EN:  I bought three red apples at the market.
TGT: Bumili ako ng tatlong pulang mansanas sa palengke.
# All information preserved: quantity, color, item, location

Example Low Adequacy (Score: 50):

EN:  I bought three red apples at the market.
TGT: Bumili ako ng mansanas.
# Missing: quantity, color, location

Overall Score

Range: 0-100

Definition: Combined metric representing overall translation quality, computed as the unweighted mean of the fluency and adequacy scores.

Calculation:

Overall Score = (Fluency Score + Adequacy Score) / 2
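
As a quick check of the formula, here is a minimal sketch. The function name `overall_score` and the range validation are illustrative additions, not part of WiMarka; the input values come from the examples later in this guide.

```python
def overall_score(fluency, adequacy):
    """Unweighted mean of the two component scores, per the formula above."""
    for name, value in (("fluency", fluency), ("adequacy", adequacy)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in 0-100, got {value}")
    return (fluency + adequacy) / 2


print(overall_score(70, 75))   # 72.5
print(overall_score(98, 100))  # 99.0
```

Because the mean weights both components equally, a high score on one can mask a low score on the other.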

Interpretation:

Score Range	Quality Level
90-100	Excellent - Publication-ready, professional quality
75-89	Good - Minor improvements possible, generally acceptable
60-74	Fair - Usable but needs revision
40-59	Poor - Significant revision required
0-39	Unacceptable - Major rework needed
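
The table can be applied in code with a simple threshold lookup. This is a sketch only: `quality_level` is a hypothetical helper, and it assumes that fractional scores such as 89.5 fall into the lower band, which the table does not specify.

```python
# Lower bounds and labels copied from the Overall Score table above.
BANDS = [
    (90, "Excellent"),
    (75, "Good"),
    (60, "Fair"),
    (40, "Poor"),
    (0, "Unacceptable"),
]


def quality_level(score):
    """Return the quality level for an overall score (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label


print(quality_level(99))    # Excellent
print(quality_level(72.5))  # Fair
print(quality_level(35))    # Unacceptable
```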

Trade-offs:

Fluency and adequacy are scored independently, so two translations with the same overall score can fail in opposite ways. Both examples below score 75 overall:

High Fluency, Low Adequacy:

EN:  I need to finish this report by Friday.
CEB: Kinahanglan nakong human kini nga semana.
     (I need to finish this week)

Fluency: 90 (grammatically correct Cebuano)
Adequacy: 60 (loses specificity - "Friday" vs "this week")
Overall: 75

Low Fluency, High Adequacy:

EN:  I need to finish this report by Friday.
CEB: Ako kinahanglan finish kini report by Biyernes.

Fluency: 55 (code-switching, unnatural)
Adequacy: 95 (all meaning preserved)
Overall: 75
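
Since the overall score hides this difference, it can help to flag translations whose two components diverge. The sketch below is one possible approach; the function name, the labels, and the 20-point `gap` threshold are all assumptions chosen for illustration, not WiMarka behavior.

```python
def score_profile(fluency, adequacy, gap=20):
    """Label a translation by which component lags (hypothetical helper).

    `gap` is an arbitrary illustrative threshold; tune it for your use case.
    """
    difference = fluency - adequacy
    if difference >= gap:
        return "fluent but inadequate"
    if difference <= -gap:
        return "adequate but disfluent"
    return "balanced"


print(score_profile(90, 60))  # fluent but inadequate
print(score_profile(55, 95))  # adequate but disfluent
print(score_profile(70, 75))  # balanced
```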

Error Detection

Error Types

WiMarka detects various error categories:

Error Type	Description
Lexical errors	Wrong word choice, mistranslation
Syntactic errors	Incorrect grammar, word order issues
Semantic errors	Meaning distortion or loss
Morphological errors	Wrong affixes, inflections
Pragmatic errors	Incorrect formality, register
Omissions	Missing information
Additions	Unnecessary extra information

Error Format

Errors are listed as a Python list of strings, each typically written as 'Category: detail':

# No errors
Errors: []

# Single error
Errors: ['Semantic mismatch: time of day']

# Multiple errors
Errors: ['Lexical error: wrong verb choice',
         'Omission: quantity not specified']
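
Because the list uses Python literal syntax, the text after `Errors:` can be read back with `ast.literal_eval`. The sketch below also splits each string into a category and a detail, assuming every entry follows the 'Category: detail' pattern seen in the examples; `parse_errors` is a hypothetical helper.

```python
import ast


def parse_errors(field):
    """Turn the text after 'Errors:' into (category, detail) pairs."""
    pairs = []
    for item in ast.literal_eval(field):
        # Split on the first ': ' only, so details may contain colons.
        category, _, detail = item.partition(": ")
        pairs.append((category, detail))
    return pairs


print(parse_errors("[]"))
# []
print(parse_errors("['Lexical error: wrong verb choice', "
                   "'Omission: quantity not specified']"))
# [('Lexical error', 'wrong verb choice'), ('Omission', 'quantity not specified')]
```

An entry without a colon comes back with the whole string as the category and an empty detail.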

Understanding Errors

Example 1: Semantic Mismatch

Source: Good morning!
Target: Maayong gabii!  (Good evening!)
Errors: ['Semantic mismatch: time of day']
Explanation: Incorrect time reference - 'morning' vs 'evening'

Example 2: Omission

Source: I bought three apples.
Target: Bumili ako ng mansanas.  (I bought apples.)
Errors: ['Omission: quantity not specified']
Explanation: The number 'three' was not translated

Example 3: Syntactic Error

Source: The book is on the table.
Target: Ang libro sa lamesa.  (The book of table.)
Errors: ['Syntactic error: missing verb']
Explanation: Location verb 'nasa' missing

Explanations

Natural Language Explanations

WiMarka generates human-readable explanations for the evaluation:

Perfect Translation:

Explanation: Excellent translation with accurate meaning and
             natural grammar. No errors detected.

Minor Issues:

Explanation: Good translation overall. Minor fluency issue with
             word order, but meaning is fully preserved.

Significant Problems:

Explanation: Translation has semantic error - wrong time of day.
             'Morning' incorrectly translated as 'evening'.
             Grammar is correct but meaning is incorrect.

Contextual Information

Explanations may include:

Specific error locations
Linguistic reasoning
Cultural or pragmatic considerations
Alternative phrasings

Suggested Corrections

How Corrections Work

WiMarka provides improved translation suggestions:

Target: Maayong gabii!
Errors: ['Semantic mismatch: time of day']
Suggested Correction: Maayong buntag!

Correction Quality:

Addresses detected errors
Maintains semantic accuracy
Improves fluency when possible
Preserves general style
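
To see exactly what a suggestion changes, you can diff the target against the correction word by word. This is a sketch using Python's standard `difflib`; `changed_words` is a hypothetical helper, and whitespace tokenization is a simplifying assumption (punctuation stays attached to words).

```python
import difflib


def changed_words(target, correction):
    """List word-level edits that turn the target into the suggested correction."""
    old, new = target.split(), correction.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, " ".join(old[i1:i2]), " ".join(new[j1:j2])))
    return edits


print(changed_words("Maayong gabii!", "Maayong buntag!"))
# [('replace', 'gabii!', 'buntag!')]
print(changed_words("Maayong buntag!", "Maayong buntag!"))
# []
```

An empty result means the suggestion is identical to the target, which is what the error-free examples in this guide show.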

When to Use Corrections

Note

Suggested corrections are recommendations, not guaranteed fixes. Have a native speaker review them before using them in production.

Good Use Cases:

Quick fixes for obvious errors
Learning from mistakes
Identifying problem patterns

Exercise Caution:

Critical translations (legal, medical)
Cultural/contextual nuances
Creative or literary content

Output Examples by Quality

Excellent Translation (90-100)

Line 1:
  Source: Thank you for your help.
  Target: Salamat sa imong tabang.
  Errors: []
  Fluency Score: 98/100
  Adequacy Score: 100/100
  Overall Score: 99/100
  Explanation: Perfect translation with natural phrasing and
               complete meaning preservation.
  Suggested Correction: Salamat sa imong tabang.

Good Translation (75-89)

Line 1:
  Source: I will call you tomorrow.
  Target: Tawagan ko ikaw ugma.
  Errors: []
  Fluency Score: 80/100
  Adequacy Score: 98/100
  Overall Score: 89/100
  Explanation: Good translation. Slightly more natural would be
               'Tawagan ta ka ugma' but current form is acceptable.
  Suggested Correction: Tawagan ta ka ugma.

Fair Translation (60-74)

Line 1:
  Source: The meeting starts at 2 PM.
  Target: Ang meeting magsugod sa 2.
  Errors: ['Code-switching: English word "meeting"',
           'Missing: PM specification']
  Fluency Score: 70/100
  Adequacy Score: 75/100
  Overall Score: 72.5/100
  Explanation: Acceptable but could be improved. Use 'miting'
               instead of 'meeting'. Add 'sa hapon' for PM.
  Suggested Correction: Ang miting magsugod sa alas 2 sa hapon.

Poor Translation (Below 60)

Line 1:
  Source: Good morning!
  Target: Magandang gabi!
  Errors: ['Wrong language: Tagalog instead of Cebuano',
           'Semantic error: evening instead of morning']
  Fluency Score: 40/100
  Adequacy Score: 30/100
  Overall Score: 35/100
  Explanation: Major errors. Wrong language used (Tagalog not
               Cebuano) and wrong time of day (evening not morning).
  Suggested Correction: Maayong buntag!

Best Practices for Interpretation

Consider Both Scores

Don’t rely only on the overall score. Check both fluency and adequacy separately.

Read Explanations

The explanation provides crucial context for understanding scores.

Review Errors

Pay attention to the types of errors detected.

Context Matters

Scores should be interpreted based on your use case (casual vs. professional).

Use Corrections Wisely

Treat suggestions as guidance, not absolute fixes.

Batch Analysis

For multiple sentences, look at average scores and error patterns.
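
A batch summary of this kind can be sketched with the standard library. The example assumes a dict of parallel lists like the `results` object used under Programmatic Access; the `overall_score` and `errors` keys appear in this guide, while `fluency_score` and `adequacy_score` are assumed key names. The demo data is taken from the examples above.

```python
from collections import Counter
from statistics import mean


def summarize(results):
    """Average each score column and count error categories across a batch."""
    averages = {
        key: mean(results[key])
        for key in ("fluency_score", "adequacy_score", "overall_score")
    }
    # The category is the text before the first colon of each error string.
    categories = Counter(
        error.partition(":")[0]
        for errors in results["errors"]
        for error in errors
    )
    return averages, categories


demo = {
    "fluency_score": [98, 70, 40],      # key name is an assumption
    "adequacy_score": [100, 75, 30],    # key name is an assumption
    "overall_score": [99, 72.5, 35],
    "errors": [
        [],
        ['Code-switching: English word "meeting"', "Missing: PM specification"],
        ["Wrong language: Tagalog instead of Cebuano",
         "Semantic error: evening instead of morning"],
    ],
}
averages, categories = summarize(demo)
print(round(averages["overall_score"], 2))
print(categories.most_common())
```

Recurring categories in the counter point to systematic problems, which are usually more informative than any single low score.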

Programmatic Access

Accessing Results in Python

from wimarka.main import wmk_eval, results

# Run evaluation
wmk_eval('source.txt', 'EN', 'target.txt', 'CEB')

# Report every line whose overall score falls below 70
pairs = zip(results['overall_score'], results['errors'])
for line_no, (score, errors) in enumerate(pairs, start=1):
    if score < 70:
        print(f"Low quality at line {line_no}:")
        print(f"  Score: {score}")
        print(f"  Errors: {errors}")

Exporting for Analysis

import pandas as pd
from wimarka.main import results

# Create DataFrame
df = pd.DataFrame(results)

# Calculate statistics
print(f"Average Overall Score: {df['overall_score'].mean():.2f}")
print(f"Std Dev: {df['overall_score'].std():.2f}")

# Export
df.to_csv('detailed_results.csv', index=False)

Next Steps

See Examples for complete evaluation workflows
See Python Library Usage for programmatic result processing
See Command-Line Interface (CLI) Usage for output redirection techniques