Quick Start Guide
Get up and running with WiMarka in just a few minutes! This guide will walk you through your first translation evaluation.
Prerequisites
Before starting, make sure you have:
✅ Installed WiMarka (see Installation)
✅ Python 3.12 or higher
✅ Two text files: the source text and its translation
Your First Evaluation
Step 1: Prepare Your Input Files
Create two text files with parallel sentences:
source_file.txt (English):
Good morning!
How are you today?
Thank you for your help.
target_file.txt (Cebuano translation):
Maayong buntag!
Kumusta ka karon?
Salamat sa imong tabang.
Important
File Format Requirements:
One sentence per line
UTF-8 encoding
Same number of lines in both files
Lines correspond (line 1 in source matches line 1 in target)
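The requirements above can be checked before an evaluation starts. The following is a minimal sketch, not part of WiMarka: `check_parallel_files` is a hypothetical helper name, and treating an empty line as a sign of misalignment is an assumption rather than a documented rule.

```python
from pathlib import Path


def check_parallel_files(src_path: str, tgt_path: str) -> int:
    """Return the number of sentence pairs, or raise ValueError on a format problem.

    Hypothetical helper for illustration; it is not part of the WiMarka API.
    """
    # read_text raises UnicodeDecodeError (a ValueError subclass)
    # if a file is not valid UTF-8.
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(tgt_path).read_text(encoding="utf-8").splitlines()

    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"Line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )

    # Assumption: an empty line on either side suggests the files have
    # drifted out of alignment, so it is reported as an error.
    for number, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        if not src.strip() or not tgt.strip():
            raise ValueError(f"Empty sentence on line {number}")

    return len(src_lines)
```

Calling `check_parallel_files('source_file.txt', 'target_file.txt')` on the files from Step 1 would return 3 if both are well formed.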
Step 2: Run WiMarka
You can use either the Python library or the command-line interface.
Option A: Using Python
Create a Python script (evaluate.py):
from wimarka.main import wmk_eval

# Evaluate the English source file against its Cebuano translation.
wmk_eval(
    src_file_path='source_file.txt',
    src_lang='EN',
    tgt_file_path='target_file.txt',
    tgt_lang='CEB'
)
Run the script:
python evaluate.py
Option B: Using CLI
Run directly from the command line:
wimarka --src_file_path source_file.txt \
    --src_lang EN \
    --tgt_file_path target_file.txt \
    --tgt_lang CEB
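If you prefer to drive the CLI from a script, the command above can be assembled as an argument list and passed to `subprocess.run`. This is a sketch under assumptions: `build_wimarka_command` and `run_wimarka` are hypothetical helper names, only the four flags shown in this guide are used, and whether WiMarka reports failures through its exit status is not confirmed here.

```python
import subprocess


def build_wimarka_command(src_file, src_lang, tgt_file, tgt_lang):
    """Build the argument list for the CLI call shown in Option B."""
    return [
        "wimarka",
        "--src_file_path", src_file,
        "--src_lang", src_lang,
        "--tgt_file_path", tgt_file,
        "--tgt_lang", tgt_lang,
    ]


def run_wimarka(src_file, src_lang, tgt_file, tgt_lang):
    """Run the CLI; raises CalledProcessError on a non-zero exit status.

    Assumption: a failed evaluation exits with a non-zero status.
    """
    command = build_wimarka_command(src_file, src_lang, tgt_file, tgt_lang)
    # Passing a list instead of a shell string avoids quoting problems
    # with file names that contain spaces.
    return subprocess.run(command, check=True)
```

Keeping the command builder separate from the call makes it easy to log or inspect the exact command before running it.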
Step 3: Understanding the Output
WiMarka will process each sentence pair and display progress:
INFO - Starting evaluation...
INFO - Evaluating line 1/3
INFO - Detecting errors...
INFO - Scoring translation...
INFO - Generating explanation...
INFO - Correcting translation...
INFO - Evaluating line 2/3
...
After processing all sentences, you’ll see detailed results:
=== Evaluation Results ===
----------------------------------------
Line 1:
Source: Good morning!
Target: Maayong buntag!
Errors: []
Fluency Score: 100/100
Adequacy Score: 100/100
Overall Score: 100/100
Explanation: Excellent translation with no errors.
Suggested Correction: Maayong buntag!
----------------------------------------
Line 2:
Source: How are you today?
Target: Kumusta ka karon?
Errors: []
Fluency Score: 98/100
Adequacy Score: 95/100
Overall Score: 96.5/100
Explanation: Very good translation, minor fluency variation.
Suggested Correction: Kumusta ka karon?
----------------------------------------
Evaluation completed.
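If you have captured the report as text, it can be turned into records for further processing. The sketch below assumes the report has exactly the layout shown above (a `Line N:` header, `Key: value` fields, and dashed separators); `parse_report` and `score_value` are hypothetical helpers, not part of WiMarka, and the real output may differ.

```python
import re

# Field names as they appear in the sample report in this guide.
FIELDS = (
    "Source", "Target", "Errors", "Fluency Score", "Adequacy Score",
    "Overall Score", "Explanation", "Suggested Correction",
)
LINE_HEADER = re.compile(r"^Line (\d+):$")


def parse_report(text: str) -> list[dict]:
    """Parse a captured report into one dict per evaluated line."""
    records = []
    current = None      # record being filled, or None between records
    last_field = None   # last field seen, used for wrapped values
    for raw in text.splitlines():
        line = raw.strip()
        header = LINE_HEADER.match(line)
        if header:
            current = {"Line": int(header.group(1))}
            records.append(current)
            last_field = None
            continue
        if line and set(line) == {"-"}:
            # A dashed separator closes the current record.
            current = None
            last_field = None
            continue
        if current is None or not line:
            continue
        key, sep, value = line.partition(": ")
        if sep and key in FIELDS:
            current[key] = value
            last_field = key
        elif last_field is not None:
            # Continuation of a value that wrapped onto the next line.
            current[last_field] += " " + line
    return records


def score_value(field: str) -> float:
    """Turn a score field such as '96.5/100' into the number 96.5."""
    return float(field.split("/")[0])
```

With the records in hand, a filter such as `[r for r in records if score_value(r["Adequacy Score"]) < 50]` would list the lines that most need review.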
Understanding the Scores
WiMarka provides three types of scores for each translation:
Fluency Score (0-100)
Measures how natural and grammatically correct the translation reads in the target language.
90-100: Excellent, native-like fluency
70-89: Good fluency with minor issues
50-69: Acceptable but noticeable problems
Below 50: Poor fluency, difficult to understand
Adequacy Score (0-100)
Evaluates how well the translation preserves the meaning of the source text.
90-100: Complete meaning preservation
70-89: Most meaning preserved, minor omissions
50-69: Partial meaning loss
Below 50: Significant meaning loss
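The score bands above can be expressed as a simple lookup. This is an illustrative sketch: `describe_score` and the two band tables are hypothetical names, and the handling of fractional scores (for example, 89.5 falling in the 70-89 band) is an assumption, since the guide lists only whole-number ranges.

```python
# (lower bound, label) pairs, highest band first, copied from the lists above.
FLUENCY_BANDS = [
    (90, "Excellent, native-like fluency"),
    (70, "Good fluency with minor issues"),
    (50, "Acceptable but noticeable problems"),
    (0, "Poor fluency, difficult to understand"),
]
ADEQUACY_BANDS = [
    (90, "Complete meaning preservation"),
    (70, "Most meaning preserved, minor omissions"),
    (50, "Partial meaning loss"),
    (0, "Significant meaning loss"),
]


def describe_score(score: float, bands: list) -> str:
    """Return the label of the first band whose lower bound the score reaches."""
    if not 0 <= score <= 100:
        raise ValueError(f"Score out of range: {score}")
    for lower, label in bands:
        if score >= lower:
            return label
    raise AssertionError("unreachable: the last band starts at 0")
```

For example, `describe_score(95, FLUENCY_BANDS)` returns the "Excellent, native-like fluency" label, while `describe_score(40, ADEQUACY_BANDS)` returns "Significant meaning loss".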
Overall Score (0-100)
Combines fluency and adequacy into a single quality metric.
Calculated as: (Fluency + Adequacy) / 2
Provides a quick quality assessment
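As a quick check of the arithmetic, the averaging formula can be written as a one-line function and applied to the scores shown in this guide's sample outputs. `overall_score` is a hypothetical name used only for illustration; it is not a WiMarka function.

```python
def overall_score(fluency: float, adequacy: float) -> float:
    """Average the two component scores, as described in this guide."""
    return (fluency + adequacy) / 2


# Scores taken from the sample outputs in this guide.
print(overall_score(98, 95))  # 96.5
print(overall_score(95, 40))  # 67.5
```

Because the two components are weighted equally, a fluent but inaccurate translation can still land in the middle of the range, which is why the individual scores are worth reading alongside the overall one.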
Common Language Codes
Use these codes when specifying source and target languages:
| Code | Language | Usage Example |
|---|---|---|
| EN | English | |
| CEB | Cebuano | |
| | Ilocano | |
| | Tagalog | |
See Supported Languages for complete language information.
Example with Errors
Let’s try an evaluation with translation errors:
source_error.txt:
Good morning!
target_error.txt:
Maayong gabii!
Note
“Gabii” means “evening” in Cebuano, so it is an incorrect translation of “morning”.
Run the evaluation:
wimarka --src_file_path source_error.txt \
    --src_lang EN \
    --tgt_file_path target_error.txt \
    --tgt_lang CEB
Expected output:
Line 1:
Source: Good morning!
Target: Maayong gabii!
Errors: [Semantic mismatch: time of day]
Fluency Score: 95/100
Adequacy Score: 40/100
Overall Score: 67.5/100
Explanation: The translation has incorrect time reference.
"Morning" was translated as "gabii" (evening).
Suggested Correction: Maayong buntag!
Best Practices
- 📝 File Preparation
Use UTF-8 encoding for all text files
Keep sentences reasonably short (< 100 words)
Ensure proper sentence alignment
- 🎯 Choosing Languages
English is typically used as the source language
Select the appropriate Philippine language code for target
- ⚡ Performance
First run downloads models (may take time)
Subsequent runs are faster (models are cached)
For large files, consider batch processing
- 💾 Storage
Models are cached in ~/.cache/huggingface/
Ensure adequate disk space (5-10 GB recommended)
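For the batch-processing suggestion above, one possible approach is to split a large pair of files into aligned chunks and evaluate each chunk separately. The sketch below is an assumption about how you might do this, not a WiMarka feature: `split_parallel_files`, the chunk file names, and the default batch size of 100 are all hypothetical.

```python
from itertools import islice
from pathlib import Path


def split_parallel_files(src_path, tgt_path, out_dir, batch_size=100):
    """Write aligned chunk files; return a list of (src_chunk, tgt_chunk) paths.

    Hypothetical helper for illustration; it is not part of the WiMarka API.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pairs = []
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt:
        index = 0
        while True:
            # Read the next batch_size lines from each file in lockstep.
            src_lines = list(islice(src, batch_size))
            tgt_lines = list(islice(tgt, batch_size))
            if not src_lines and not tgt_lines:
                break
            if len(src_lines) != len(tgt_lines):
                raise ValueError("Source and target files have different line counts")
            index += 1
            src_chunk = out / f"source_{index:04d}.txt"
            tgt_chunk = out / f"target_{index:04d}.txt"
            src_chunk.write_text("".join(src_lines), encoding="utf-8")
            tgt_chunk.write_text("".join(tgt_lines), encoding="utf-8")
            pairs.append((src_chunk, tgt_chunk))
    return pairs
```

Each returned pair could then be passed to `wmk_eval` in a loop, using `str(src_chunk)` and `str(tgt_chunk)` as `src_file_path` and `tgt_file_path`, so that a failure partway through does not discard the chunks already evaluated.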
Next Steps
Now that you’ve completed your first evaluation:
For Python developers: See Python Library Usage for advanced programming examples
For CLI users: See Command-Line Interface (CLI) Usage for complete command options
For more examples: See Examples for real-world scenarios
To understand output: See Understanding Output Format for detailed result interpretation