Quick Start Guide

Get up and running with WiMarka in just a few minutes! This guide will walk you through your first translation evaluation.

Prerequisites

Before starting, make sure you have:

✅ Installed WiMarka (see Installation)

✅ Python 3.12 or higher

✅ Two text files: source text and its translation

Your First Evaluation

Step 1: Prepare Your Input Files

Create two text files with parallel sentences:

source_file.txt (English):

Good morning!
How are you today?
Thank you for your help.

target_file.txt (Cebuano translation):

Maayong buntag!
Kumusta ka karon?
Salamat sa imong tabang.
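If you would rather create the sample files from Python, the following sketch writes them with explicit UTF-8 encoding and one sentence per line. The helper name `write_parallel_files` is hypothetical and is not part of WiMarka.

```python
from pathlib import Path


def write_parallel_files(src_path, tgt_path, src_lines, tgt_lines):
    """Write aligned source/target sentences, one per line, as UTF-8 text.

    Hypothetical helper for this guide; it is not part of the WiMarka API.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    Path(src_path).write_text("\n".join(src_lines) + "\n", encoding="utf-8")
    Path(tgt_path).write_text("\n".join(tgt_lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    write_parallel_files(
        "source_file.txt",
        "target_file.txt",
        ["Good morning!", "How are you today?", "Thank you for your help."],
        ["Maayong buntag!", "Kumusta ka karon?", "Salamat sa imong tabang."],
    )
```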

Important

File Format Requirements:

One sentence per line
UTF-8 encoding
Same number of lines in both files
Lines correspond (line 1 in source matches line 1 in target)
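Before running an evaluation, you can catch most formatting problems with a quick pre-flight check. This is a sketch, not a WiMarka feature: `check_parallel_files` is a hypothetical helper that verifies the files decode as UTF-8, have the same number of lines, and contain no empty lines. The 100-word limit mirrors the Best Practices guideline later in this guide. It cannot confirm that line 1 of the source really matches line 1 of the target; that still needs a human look.

```python
from pathlib import Path


def check_parallel_files(src_path, tgt_path, max_words=100):
    """Check two files against the format rules; return a list of problems.

    Hypothetical pre-flight check, not part of WiMarka. Raises
    UnicodeDecodeError if either file is not valid UTF-8.
    """
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt = Path(tgt_path).read_text(encoding="utf-8").splitlines()
    problems = []
    if len(src) != len(tgt):
        problems.append(f"line count mismatch: {len(src)} source vs {len(tgt)} target")
    # zip stops at the shorter file, so only aligned pairs are inspected here
    for n, (s, t) in enumerate(zip(src, tgt), start=1):
        if not s.strip() or not t.strip():
            problems.append(f"line {n}: empty sentence")
        if len(s.split()) > max_words:
            problems.append(f"line {n}: source longer than {max_words} words")
    return problems


if __name__ == "__main__":
    issues = check_parallel_files("source_file.txt", "target_file.txt")
    print("\n".join(issues) or "Files look OK")
```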

Step 2: Run WiMarka

You can use either the Python library or the command-line interface.

Option A: Using Python

Create a Python script (evaluate.py):

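from wimarka.main import wmk_eval

# Evaluate each line of the target file against the matching source line
wmk_eval(
    src_file_path='source_file.txt',  # source text, one sentence per line
    src_lang='EN',                    # source language code
    tgt_file_path='target_file.txt',  # translation, aligned line by line
    tgt_lang='CEB'                    # target language code
)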

Run the script:

python evaluate.py

Option B: Using CLI

Run directly from the command line:

wimarka --src_file_path source_file.txt \
        --src_lang EN \
        --tgt_file_path target_file.txt \
        --tgt_lang CEB
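To launch the CLI from another Python program (for example, a build or CI script), one option is `subprocess.run` with the same four flags. This is a sketch: `build_wimarka_cmd` is a hypothetical helper, and it uses only the flags documented above. Passing the arguments as a list avoids shell quoting issues, so no line-continuation backslashes are needed.

```python
import subprocess


def build_wimarka_cmd(src_file, src_lang, tgt_file, tgt_lang):
    """Build the argument list for the `wimarka` command shown above."""
    return [
        "wimarka",
        "--src_file_path", src_file,
        "--src_lang", src_lang,
        "--tgt_file_path", tgt_file,
        "--tgt_lang", tgt_lang,
    ]


if __name__ == "__main__":
    # check=True raises CalledProcessError if the command exits with a non-zero status
    subprocess.run(
        build_wimarka_cmd("source_file.txt", "EN", "target_file.txt", "CEB"),
        check=True,
    )
```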

Step 3: Understanding the Output

WiMarka will process each sentence pair and display progress:

INFO - Starting evaluation...
INFO - Evaluating line 1/3
INFO - Detecting errors...
INFO - Scoring translation...
INFO - Generating explanation...
INFO - Correcting translation...
INFO - Evaluating line 2/3
...

After processing all sentences, you’ll see detailed results:

=== Evaluation Results ===
----------------------------------------
Line 1:
  Source: Good morning!
  Target: Maayong buntag!
  Errors: []
  Fluency Score: 100/100
  Adequacy Score: 100/100
  Overall Score: 100/100
  Explanation: Excellent translation with no errors.
  Suggested Correction: Maayong buntag!
----------------------------------------
Line 2:
  Source: How are you today?
  Target: Kumusta ka karon?
  Errors: []
  Fluency Score: 98/100
  Adequacy Score: 95/100
  Overall Score: 96.5/100
  Explanation: Very good translation, minor fluency variation.
  Suggested Correction: Kumusta ka karon?
----------------------------------------

Evaluation completed.

Understanding the Scores

WiMarka provides three types of scores for each translation:

Fluency Score (0-100)

Measures how natural and grammatically correct the translation reads in the target language.

90-100: Excellent, native-like fluency
70-89: Good fluency with minor issues
50-69: Acceptable but noticeable problems
Below 50: Poor fluency, difficult to understand

Adequacy Score (0-100)

Evaluates how well the translation preserves the meaning of the source text.

90-100: Complete meaning preservation
70-89: Most meaning preserved, minor omissions
50-69: Partial meaning loss
Below 50: Significant meaning loss
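The fluency and adequacy bands can be applied programmatically with a small lookup table. This is a sketch for interpreting scores yourself, not a WiMarka function. The guide lists integer ranges, so how a fractional score such as 89.5 is classified is an assumption here: it falls into the lower band.

```python
FLUENCY_BANDS = [
    (90, "Excellent, native-like fluency"),
    (70, "Good fluency with minor issues"),
    (50, "Acceptable but noticeable problems"),
    (0, "Poor fluency, difficult to understand"),
]

ADEQUACY_BANDS = [
    (90, "Complete meaning preservation"),
    (70, "Most meaning preserved, minor omissions"),
    (50, "Partial meaning loss"),
    (0, "Significant meaning loss"),
]


def band(score, bands):
    """Return the label of the first band whose lower bound the score reaches."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # bands are ordered from highest to lowest lower bound
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
```

For instance, the "gabii" example later in this guide (fluency 95, adequacy 40) maps to "Excellent, native-like fluency" and "Significant meaning loss": the sentence reads naturally but says the wrong thing.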

Overall Score (0-100)

Combines fluency and adequacy into a single quality metric.

Calculated as: (Fluency + Adequacy) / 2
Provides a quick quality assessment
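As a worked check of the formula, the sketch below reproduces the overall scores that appear in this guide. It is a minimal restatement of the formula above, not code taken from WiMarka, so the library itself may round differently.

```python
def overall_score(fluency, adequacy):
    """Overall score as described in this guide: the mean of fluency and adequacy."""
    return (fluency + adequacy) / 2


# Scores shown in this guide
print(overall_score(98, 95))  # 96.5 (line 2 of the first evaluation)
print(overall_score(95, 40))  # 67.5 (the "gabii" error example)
```

Because both components are weighted equally, a fluent translation with a serious meaning error can still score in the 60s, so it is worth checking the adequacy score on its own.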

Common Language Codes

Use these codes when specifying source and target languages:

Code	Language	Usage Example
`EN`	English	`--src_lang EN`
`CEB`	Cebuano	`--tgt_lang CEB`
`ILO`	Ilocano	`--tgt_lang ILO`
`TGT`	Tagalog	`--tgt_lang TGT`

See Supported Languages for complete language information.

Example with Errors

Let’s try an evaluation with translation errors:

source_error.txt:

Good morning!

target_error.txt:

Maayong gabii!

Note

“Gabii” means “evening” in Cebuano, so it is an incorrect translation of “morning”.

Run the evaluation:

wimarka --src_file_path source_error.txt \
        --src_lang EN \
        --tgt_file_path target_error.txt \
        --tgt_lang CEB

Example output (exact scores and wording may vary):

Line 1:
  Source: Good morning!
  Target: Maayong gabii!
  Errors: [Semantic mismatch: time of day]
  Fluency Score: 95/100
  Adequacy Score: 40/100
  Overall Score: 67.5/100
  Explanation: The translation has incorrect time reference.
               "Morning" was translated as "gabii" (evening).
  Suggested Correction: Maayong buntag!

Best Practices

📝 File Preparation

Use UTF-8 encoding for all text files
Keep sentences reasonably short (< 100 words)
Ensure proper sentence alignment

🎯 Choosing Languages

English is typically used as the source language
Select the appropriate Philippine language code for the target language

⚡ Performance

The first run downloads the models, which may take some time
Subsequent runs are faster because the models are cached
For large files, consider processing the input in smaller batches
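One way to batch a large job is to split the parallel files into smaller aligned chunks and evaluate each pair separately with `wmk_eval` as in Step 2. This is a sketch under assumptions: `split_parallel_files` and the `chunks` output directory are hypothetical, and this guide does not say whether WiMarka reloads models on each call. If it does, many small chunks add overhead, so a larger `batch_size` may work better. A practical benefit of chunking is that a failed run only needs to repeat one chunk.

```python
from pathlib import Path


def split_parallel_files(src_path, tgt_path, out_dir, batch_size=500):
    """Split aligned source/target files into numbered chunk pairs.

    Hypothetical helper: each chunk pair keeps line i of the source aligned
    with line i of the target. Returns a list of (source_chunk, target_chunk).
    """
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt = Path(tgt_path).read_text(encoding="utf-8").splitlines()
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of lines")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pairs = []
    for i, start in enumerate(range(0, len(src), batch_size), start=1):
        s_chunk = out / f"source_{i:04d}.txt"
        t_chunk = out / f"target_{i:04d}.txt"
        s_chunk.write_text("\n".join(src[start:start + batch_size]) + "\n", encoding="utf-8")
        t_chunk.write_text("\n".join(tgt[start:start + batch_size]) + "\n", encoding="utf-8")
        pairs.append((s_chunk, t_chunk))
    return pairs


if __name__ == "__main__":
    from wimarka.main import wmk_eval

    # Evaluate each chunk pair with the same call shown in Step 2
    for src_chunk, tgt_chunk in split_parallel_files("source_file.txt", "target_file.txt", "chunks"):
        wmk_eval(src_file_path=str(src_chunk), src_lang="EN",
                 tgt_file_path=str(tgt_chunk), tgt_lang="CEB")
```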

💾 Storage

Models are cached in ~/.cache/huggingface/
Ensure adequate disk space (5-10 GB recommended)
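To see how much space the model cache uses and how much is still free, a standard-library sketch is below. The default path `~/.cache/huggingface/` comes from the note above; the Hugging Face libraries can be pointed elsewhere (for example with the `HF_HOME` environment variable), in which case pass that directory instead. `dir_size_gb` and `free_space_gb` are hypothetical helpers.

```python
import shutil
from pathlib import Path


def dir_size_gb(path):
    """Total size of regular files under `path`, in gigabytes (10**9 bytes)."""
    root = Path(path).expanduser()
    if not root.exists():
        return 0.0
    # skip symlinks so files linked from several places are not counted twice
    total = sum(p.stat().st_size for p in root.rglob("*")
                if p.is_file() and not p.is_symlink())
    return total / 1e9


def free_space_gb(path):
    """Free disk space on the filesystem containing `path`, in gigabytes."""
    # disk_usage needs an existing path, so walk up to the nearest existing parent
    p = Path(path).expanduser()
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / 1e9


if __name__ == "__main__":
    cache = Path.home() / ".cache" / "huggingface"
    print(f"Model cache: {dir_size_gb(cache):.2f} GB")
    print(f"Free space:  {free_space_gb(cache):.2f} GB")
```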

Next Steps

Now that you’ve completed your first evaluation:

For Python developers: See Python Library Usage for advanced programming examples
For CLI users: See Command-Line Interface (CLI) Usage for complete command options
For more examples: See Examples for real-world scenarios
To understand output: See Understanding Output Format for detailed result interpretation