Understanding Output Format
This guide explains how to interpret WiMarka’s evaluation output and understand the metrics it provides.
Output Structure
For each evaluated sentence pair, WiMarka prints one block in the following format:
Line X:
Source: <source sentence>
Target: <target sentence>
Errors: <list of detected errors>
Fluency Score: <0-100>
Adequacy Score: <0-100>
Overall Score: <0-100>
Explanation: <human-readable explanation>
Suggested Correction: <corrected translation>
Example Output
Line 1:
Source: Good morning!
Target: Maayong buntag!
Errors: []
Fluency Score: 100/100
Adequacy Score: 100/100
Overall Score: 100/100
Explanation: Perfect translation with correct meaning and natural grammar.
Suggested Correction: Maayong buntag!
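If you only have the text output (for example, redirected to a file), it can be turned back into structured data. The following is a minimal sketch, not part of WiMarka itself: `parse_block` is a hypothetical helper, and it assumes the field labels and the `N/100` score format shown in the example above.

```python
import re

# Field labels as they appear in the text output shown above.
FIELDS = ("Source", "Target", "Errors", "Fluency Score",
          "Adequacy Score", "Overall Score", "Explanation",
          "Suggested Correction")

def parse_block(text):
    """Parse one 'Line X:' block into a dict (hypothetical helper)."""
    record = {}
    current = None
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"Line (\d+):", line)
        if header:
            record["line"] = int(header.group(1))
            current = None
            continue
        label, sep, value = line.partition(":")
        if sep and label in FIELDS:
            current = label
            record[current] = value.strip()
        elif current is not None:
            # Continuation of a wrapped field, such as a long Explanation.
            record[current] += " " + line
    # Scores are printed as "100/100"; keep the part before the slash.
    for key in ("Fluency Score", "Adequacy Score", "Overall Score"):
        if key in record:
            record[key] = float(record[key].split("/")[0])
    return record

sample = """Line 1:
Source: Good morning!
Target: Maayong buntag!
Errors: []
Fluency Score: 100/100
Adequacy Score: 100/100
Overall Score: 100/100
Explanation: Perfect translation with correct meaning and natural grammar.
Suggested Correction: Maayong buntag!"""

record = parse_block(sample)
print(record["line"], record["Overall Score"], record["Target"])
```

The `Errors` field is left as a raw string here so that the parser stays independent of how the list is formatted.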
Evaluation Metrics
WiMarka provides three primary metrics for each translation:
Fluency Score
Range: 0-100
Definition: Measures how natural, grammatical, and readable the translation is in the target language.
- What It Evaluates:
  - Grammatical correctness
  - Natural word order
  - Proper use of particles and markers
  - Idiomatic expression
  - Overall readability
Interpretation:
| Score Range | Interpretation |
|---|---|
| 90-100 | Excellent fluency. Translation reads like native text with natural grammar and word choice. |
| 75-89 | Good fluency. Minor grammatical issues or slightly awkward phrasing, but generally understandable. |
| 60-74 | Acceptable fluency. Noticeable grammatical errors or unnatural constructions, but meaning is clear. |
| 40-59 | Poor fluency. Significant grammatical problems making the text difficult to read. |
| 0-39 | Very poor fluency. Severe grammatical errors; text may be incomprehensible. |
Example High Fluency (Score: 95):
EN: The weather is beautiful today.
CEB: Nindot kaayo ang panahon karon.
# Natural Cebuano with proper word order and particles
Example Low Fluency (Score: 45):
EN: The weather is beautiful today.
CEB: Ang panahon nindot ka sa karon.
# Awkward word order, incorrect particle usage
Adequacy Score
Range: 0-100
Definition: Measures how completely and accurately the translation conveys the meaning of the source text.
- What It Evaluates:
  - Semantic completeness
  - Preservation of meaning
  - No critical omissions
  - No added information
  - Correct interpretation
Interpretation:
| Score Range | Interpretation |
|---|---|
| 90-100 | Excellent adequacy. All meaning fully preserved with correct interpretation. |
| 75-89 | Good adequacy. Most meaning conveyed; minor details may be slightly different. |
| 60-74 | Acceptable adequacy. Core meaning present but some information loss or distortion. |
| 40-59 | Poor adequacy. Significant meaning loss; important information missing or wrong. |
| 0-39 | Very poor adequacy. Most meaning lost or severely distorted. |
Example High Adequacy (Score: 98):
EN: I bought three red apples at the market.
TGT: Bumili ako ng tatlong pulang mansanas sa palengke.
# All information preserved: quantity, color, item, location
Example Low Adequacy (Score: 50):
EN: I bought three red apples at the market.
TGT: Bumili ako ng mansanas.
# Missing: quantity, color, location
Overall Score
Range: 0-100
Definition: Combined metric representing overall translation quality.
Calculation:
Overall Score = (Fluency Score + Adequacy Score) / 2
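As a quick check of the formula, here is a minimal sketch that recomputes the overall score from the two component scores. The function name `overall_score` and the range check are illustrative additions, not part of the WiMarka API.

```python
def overall_score(fluency, adequacy):
    """Return the mean of the two component scores, as in the formula above."""
    for name, value in (("fluency", fluency), ("adequacy", adequacy)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in the range 0-100, got {value}")
    return (fluency + adequacy) / 2

print(overall_score(98, 100))  # 99.0
print(overall_score(70, 75))   # 72.5
```

Because the result is a plain average, it can be fractional even when both inputs are whole numbers.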
Interpretation:
| Score Range | Quality Level |
|---|---|
| 90-100 | Excellent - Publication-ready, professional quality |
| 75-89 | Good - Minor improvements possible, generally acceptable |
| 60-74 | Fair - Usable but needs revision |
| 40-59 | Poor - Significant revision required |
| 0-39 | Unacceptable - Major rework needed |
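When processing many results, it can help to map each overall score to its quality level automatically. The sketch below is one way to do that; `quality_level` is a hypothetical helper. The table lists whole-number ranges, so treating each band's lower bound as a threshold (which places a fractional score such as 89.5 in the lower band) is an assumption.

```python
# Lower bound of each band from the table above, highest first.
QUALITY_BANDS = [
    (90, "Excellent"),
    (75, "Good"),
    (60, "Fair"),
    (40, "Poor"),
    (0, "Unacceptable"),
]

def quality_level(score):
    """Return the quality level for an overall score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in the range 0-100, got {score}")
    for lower_bound, label in QUALITY_BANDS:
        if score >= lower_bound:
            return label

print(quality_level(99))    # Excellent
print(quality_level(72.5))  # Fair
print(quality_level(35))    # Unacceptable
```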
Trade-offs:
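Because the overall score is an average, two translations with very different strengths can receive the same overall score. Both examples below score 75 overall, but for opposite reasons: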
High Fluency, Low Adequacy:
EN: I need to finish this report by Friday.
CEB: Kinahanglan nakong human kini nga semana.
(I need to finish this week)
Fluency: 90 (grammatically correct Cebuano)
Adequacy: 60 (loses specificity - "Friday" vs "this week")
Overall: 75
Low Fluency, High Adequacy:
EN: I need to finish this report by Friday.
CEB: Ako kinahanglan finish kini report by Biyernes.
Fluency: 55 (code-switching, unnatural)
Adequacy: 95 (all meaning preserved)
Overall: 75
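Since identical overall scores can hide opposite problems, it may be useful to flag results whose two component scores diverge. The following sketch illustrates the idea; `describe_tradeoff` and its default `gap` of 20 points are hypothetical choices, not WiMarka behavior.

```python
def describe_tradeoff(fluency, adequacy, gap=20):
    """Classify a result by how far fluency and adequacy diverge.

    `gap` is an illustrative threshold; tune it for your own data.
    """
    difference = fluency - adequacy
    if difference >= gap:
        return "fluent but inaccurate: check meaning against the source"
    if difference <= -gap:
        return "accurate but unnatural: check grammar and phrasing"
    return "balanced"

# The two examples above both have an overall score of 75.
print(describe_tradeoff(90, 60))
print(describe_tradeoff(55, 95))
```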
Error Detection
Error Types
WiMarka detects various error categories:
| Error Type | Description |
|---|---|
| Lexical errors | Wrong word choice, mistranslation |
| Syntactic errors | Incorrect grammar, word order issues |
| Semantic errors | Meaning distortion or loss |
| Morphological errors | Wrong affixes, inflections |
| Pragmatic errors | Incorrect formality, register |
| Omissions | Missing information |
| Additions | Unnecessary extra information |
Error Format
Errors are listed as a Python list:
# No errors
Errors: []
# Single error
Errors: ['Semantic mismatch: time of day']
# Multiple errors
Errors: ['Lexical error: wrong verb choice',
'Omission: quantity not specified']
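Because the list uses Python literal syntax, an `Errors:` line from the text output can be read back with the standard library's `ast.literal_eval`. The sketch below assumes each entry follows the `Category: detail` pattern seen in the examples; `parse_errors` is a hypothetical helper.

```python
import ast

def parse_errors(line):
    """Turn an 'Errors: [...]' line into a list of (category, detail) pairs."""
    # Everything after the first colon is the list literal.
    _, _, literal = line.partition(":")
    items = ast.literal_eval(literal.strip())
    pairs = []
    for item in items:
        category, sep, detail = item.partition(":")
        # Entries without a colon are kept whole as the category.
        pairs.append((category.strip(), detail.strip() if sep else ""))
    return pairs

print(parse_errors("Errors: []"))
print(parse_errors("Errors: ['Semantic mismatch: time of day']"))
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text read from a file.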
Understanding Errors
Example 1: Semantic Mismatch
Source: Good morning!
Target: Maayong gabii! (Good evening!)
Errors: ['Semantic mismatch: time of day']
Explanation: Incorrect time reference - 'morning' vs 'evening'
Example 2: Omission
Source: I bought three apples.
Target: Bumili ako ng mansanas. (I bought apples.)
Errors: ['Omission: quantity not specified']
Explanation: The number 'three' was not translated
Example 3: Syntactic Error
Source: The book is on the table.
Target: Ang libro sa lamesa. (The book of table.)
Errors: ['Syntactic error: missing verb']
Explanation: Location verb 'nasa' missing
Explanations
Natural Language Explanations
WiMarka generates human-readable explanations for the evaluation:
Perfect Translation:
Explanation: Excellent translation with accurate meaning and
natural grammar. No errors detected.
Minor Issues:
Explanation: Good translation overall. Minor fluency issue with
word order, but meaning is fully preserved.
Significant Problems:
Explanation: Translation has semantic error - wrong time of day.
'Morning' incorrectly translated as 'evening'.
Grammar is correct but meaning is incorrect.
Contextual Information
Explanations may include:
- Specific error locations
- Linguistic reasoning
- Cultural or pragmatic considerations
- Alternative phrasings
Suggested Corrections
How Corrections Work
WiMarka provides improved translation suggestions:
Target: Maayong gabii!
Errors: ['Semantic mismatch: time of day']
Suggested Correction: Maayong buntag!
- Correction Quality:
  - Addresses detected errors
  - Maintains semantic accuracy
  - Improves fluency when possible
  - Preserves general style
When to Use Corrections
Note: Suggested corrections are recommendations, not authoritative answers. For production use, always have them reviewed by a native speaker.
- Good Use Cases:
  - Quick fixes for obvious errors
  - Learning from mistakes
  - Identifying problem patterns
- Exercise Caution:
  - Critical translations (legal, medical)
  - Cultural/contextual nuances
  - Creative or literary content
Output Examples by Quality
Excellent Translation (90-100)
Line 1:
Source: Thank you for your help.
Target: Salamat sa imong tabang.
Errors: []
Fluency Score: 98/100
Adequacy Score: 100/100
Overall Score: 99/100
Explanation: Perfect translation with natural phrasing and
complete meaning preservation.
Suggested Correction: Salamat sa imong tabang.
Good Translation (75-89)
Line 1:
Source: I will call you tomorrow.
Target: Tawagan ko ikaw ugma.
Errors: []
Fluency Score: 80/100
Adequacy Score: 98/100
Overall Score: 89/100
Explanation: Good translation. Slightly more natural would be
'Tawagan ta ka ugma' but current form is acceptable.
Suggested Correction: Tawagan ta ka ugma.
Fair Translation (60-74)
Line 1:
Source: The meeting starts at 2 PM.
Target: Ang meeting magsugod sa 2.
Errors: ['Code-switching: English word "meeting"',
'Missing: PM specification']
Fluency Score: 70/100
Adequacy Score: 75/100
Overall Score: 72.5/100
Explanation: Acceptable but could be improved. Use 'miting'
instead of 'meeting'. Add 'sa hapon' for PM.
Suggested Correction: Ang miting magsugod sa alas 2 sa hapon.
Poor Translation (Below 60)
Line 1:
Source: Good morning!
Target: Magandang gabi!
Errors: ['Wrong language: Tagalog instead of Cebuano',
'Semantic error: evening instead of morning']
Fluency Score: 40/100
Adequacy Score: 30/100
Overall Score: 35/100
Explanation: Major errors. Wrong language used (Tagalog not
Cebuano) and wrong time of day (evening not morning).
Suggested Correction: Maayong buntag!
Best Practices for Interpretation
Consider Both Scores
Don’t rely only on the overall score. Check both fluency and adequacy separately.
Read Explanations
The explanation provides crucial context for understanding scores.
Review Errors
Pay attention to the types of errors detected.
Context Matters
Scores should be interpreted based on your use case (casual vs. professional).
Use Corrections Wisely
Treat suggestions as guidance, not absolute fixes.
Batch Analysis
For multiple sentences, look at average scores and error patterns.
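A batch summary along these lines can be sketched with the standard library alone. The example assumes results are stored as a dictionary of parallel lists with `overall_score` and `errors` keys, that each `errors` entry is a list of `Category: detail` strings, and uses made-up sample data; `summarize` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean

def summarize(results):
    """Return the average overall score and a count of error categories."""
    categories = Counter(
        error.partition(":")[0].strip()
        for errors in results["errors"]
        for error in errors
    )
    return {
        "average_overall": mean(results["overall_score"]),
        "error_counts": categories,
    }

# Illustrative data only, shaped like the assumed results dictionary.
sample = {
    "overall_score": [99, 72.5, 35],
    "errors": [
        [],
        ['Code-switching: English word "meeting"', "Missing: PM specification"],
        ["Wrong language: Tagalog instead of Cebuano",
         "Semantic error: evening instead of morning"],
    ],
}

summary = summarize(sample)
print(f"Average overall score: {summary['average_overall']:.2f}")
for category, count in summary["error_counts"].most_common():
    print(f"{category}: {count}")
```

Counting by category rather than by full message groups recurring problems together even when the details differ.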
Programmatic Access
Accessing Results in Python
from wimarka.main import wmk_eval, results
# Run evaluation
wmk_eval('source.txt', 'EN', 'target.txt', 'CEB')
# Access individual results and report lines scoring below 70
for i in range(len(results['source'])):
    if results['overall_score'][i] < 70:
        print(f"Low quality at line {i+1}:")
        print(f" Score: {results['overall_score'][i]}")
        print(f" Errors: {results['errors'][i]}")
Exporting for Analysis
import pandas as pd
from wimarka.main import results
# Create DataFrame
df = pd.DataFrame(results)
# Calculate statistics
print(f"Average Overall Score: {df['overall_score'].mean():.2f}")
print(f"Std Dev: {df['overall_score'].std():.2f}")
# Export
df.to_csv('detailed_results.csv', index=False)
Next Steps
- See Examples for complete evaluation workflows
- See Python Library Usage for programmatic result processing
- See Command-Line Interface (CLI) Usage for output redirection techniques