Quick Start Guide

Get up and running with WiMarka in just a few minutes! This guide will walk you through your first translation evaluation.

Prerequisites

Before starting, make sure you have:

✅ Installed WiMarka (see Installation)

✅ Python 3.12 or higher

✅ Two text files: source text and its translation

Your First Evaluation

Step 1: Prepare Your Input Files

Create two text files with parallel sentences:

source_file.txt (English):

Good morning!
How are you today?
Thank you for your help.

target_file.txt (Cebuano translation):

Maayong buntag!
Kumusta ka karon?
Salamat sa imong tabang.

Important

File Format Requirements:

  • One sentence per line

  • UTF-8 encoding

  • Same number of lines in both files

  • Lines correspond (line 1 in source matches line 1 in target)
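Checking these requirements before an evaluation can save a failed run. The sketch below is not part of WiMarka: check_parallel_files is a hypothetical helper that uses only the Python standard library. Treating an empty line as a problem is an assumption drawn from the "one sentence per line" rule.

```python
from pathlib import Path


def check_parallel_files(src_path, tgt_path):
    """Return a list of problems found in a source/target file pair.

    Hypothetical helper, not a WiMarka function.
    """
    problems = []
    lines = {}
    for label, path in (("source", src_path), ("target", tgt_path)):
        try:
            # Strict decoding raises UnicodeDecodeError on non-UTF-8 bytes.
            text = Path(path).read_text(encoding="utf-8")
        except UnicodeDecodeError:
            problems.append(f"{label} file is not valid UTF-8")
            return problems
        lines[label] = text.splitlines()
        for number, line in enumerate(lines[label], start=1):
            # Assumption: an empty line breaks "one sentence per line".
            if not line.strip():
                problems.append(f"{label} line {number} is empty")
    if len(lines["source"]) != len(lines["target"]):
        problems.append(
            f"line counts differ: {len(lines['source'])} source vs "
            f"{len(lines['target'])} target"
        )
    return problems


if __name__ == "__main__":
    for problem in check_parallel_files("source_file.txt", "target_file.txt"):
        print(problem)
```

An empty result means the pair passed all three mechanical checks. Whether line 1 of the source really corresponds to line 1 of the target still has to be verified by eye.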

Step 2: Run WiMarka

You can use either the Python library or the command-line interface.

Option A: Using Python

Create a Python script (evaluate.py):

from wimarka.main import wmk_eval

wmk_eval(
    src_file_path='source_file.txt',
    src_lang='EN',
    tgt_file_path='target_file.txt',
    tgt_lang='CEB'
)

Run the script:

python evaluate.py

Option B: Using CLI

Run directly from the command line:

wimarka --src_file_path source_file.txt \
        --src_lang EN \
        --tgt_file_path target_file.txt \
        --tgt_lang CEB

Step 3: Understanding the Output

WiMarka will process each sentence pair and display progress:

INFO - Starting evaluation...
INFO - Evaluating line 1/3
INFO - Detecting errors...
INFO - Scoring translation...
INFO - Generating explanation...
INFO - Correcting translation...
INFO - Evaluating line 2/3
...

After processing all sentences, you’ll see detailed results:

=== Evaluation Results ===
----------------------------------------
Line 1:
  Source: Good morning!
  Target: Maayong buntag!
  Errors: []
  Fluency Score: 100/100
  Adequacy Score: 100/100
  Overall Score: 100/100
  Explanation: Excellent translation with no errors.
  Suggested Correction: Maayong buntag!
----------------------------------------
Line 2:
  Source: How are you today?
  Target: Kumusta ka karon?
  Errors: []
  Fluency Score: 98/100
  Adequacy Score: 95/100
  Overall Score: 96.5/100
  Explanation: Very good translation, minor fluency variation.
  Suggested Correction: Kumusta ka karon?
----------------------------------------

Evaluation completed.

Understanding the Scores

WiMarka provides three types of scores for each translation:

Fluency Score (0-100)

Measures how natural and grammatically correct the translation reads in the target language.

  • 90-100: Excellent, native-like fluency

  • 70-89: Good fluency with minor issues

  • 50-69: Acceptable but noticeable problems

  • Below 50: Poor fluency, difficult to understand

Adequacy Score (0-100)

Evaluates how well the translation preserves the meaning of the source text.

  • 90-100: Complete meaning preservation

  • 70-89: Most meaning preserved, minor omissions

  • 50-69: Partial meaning loss

  • Below 50: Significant meaning loss

Overall Score (0-100)

Combines fluency and adequacy into a single quality metric.

  • Calculated as: (Fluency + Adequacy) / 2

  • Provides a quick quality assessment
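As a worked example of the formula and the score bands above, the following sketch recomputes the overall score and looks up the band descriptions. The function and constant names are illustrative and are not part of the WiMarka API. The thresholds are copied from the lists in this section.

```python
# Band thresholds as listed in this guide: (lower bound, description).
FLUENCY_BANDS = [
    (90, "Excellent, native-like fluency"),
    (70, "Good fluency with minor issues"),
    (50, "Acceptable but noticeable problems"),
    (0, "Poor fluency, difficult to understand"),
]
ADEQUACY_BANDS = [
    (90, "Complete meaning preservation"),
    (70, "Most meaning preserved, minor omissions"),
    (50, "Partial meaning loss"),
    (0, "Significant meaning loss"),
]


def overall_score(fluency, adequacy):
    """Average the two scores: (Fluency + Adequacy) / 2."""
    return (fluency + adequacy) / 2


def describe(score, bands):
    """Return the description of the first band whose lower bound is met."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, description in bands:
        if score >= lower_bound:
            return description


print(overall_score(98, 95))         # 96.5
print(describe(98, FLUENCY_BANDS))   # Excellent, native-like fluency
print(describe(40, ADEQUACY_BANDS))  # Significant meaning loss
```

The first call reproduces the overall score shown for line 2 of the sample output in Step 3.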

Common Language Codes

Use these codes when specifying source and target languages:

Code   Language   Usage Example
EN     English    --src_lang EN
CEB    Cebuano    --tgt_lang CEB
ILO    Ilocano    --tgt_lang ILO
TGT    Tagalog    --tgt_lang TGT

See Supported Languages for complete language information.

Example with Errors

Let’s try an evaluation with translation errors:

source_error.txt:

Good morning!

target_error.txt:

Maayong gabii!

Note

“Gabii” means “evening” in Cebuano, so it is an incorrect translation of “morning”.

Run the evaluation:

wimarka --src_file_path source_error.txt \
        --src_lang EN \
        --tgt_file_path target_error.txt \
        --tgt_lang CEB

Expected output:

Line 1:
  Source: Good morning!
  Target: Maayong gabii!
  Errors: [Semantic mismatch: time of day]
  Fluency Score: 95/100
  Adequacy Score: 40/100
  Overall Score: 67.5/100
  Explanation: The translation has incorrect time reference.
               "Morning" was translated as "gabii" (evening).
  Suggested Correction: Maayong buntag!

Best Practices

📝 File Preparation
  • Use UTF-8 encoding for all text files

  • Keep sentences reasonably short (< 100 words)

  • Ensure proper sentence alignment

🎯 Choosing Languages
  • English is typically used as the source language

  • Select the appropriate Philippine language code for target

Performance
  • First run downloads models (may take time)

  • Subsequent runs are faster (models are cached)

  • For large files, consider batch processing
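One way to approach batch processing is to split a large file pair into smaller aligned chunks and evaluate each chunk separately. The sketch below is an assumption about how that might be done, not a WiMarka feature: split_parallel_files, the chunk file names, and the default batch size of 100 lines are all hypothetical.

```python
from pathlib import Path


def split_parallel_files(src_path, tgt_path, out_dir, lines_per_batch=100):
    """Write aligned source/target chunk files and return their paths.

    Hypothetical helper, not a WiMarka function.
    """
    if lines_per_batch < 1:
        raise ValueError("lines_per_batch must be at least 1")
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(tgt_path).read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    batches = []
    starts = range(0, len(src_lines), lines_per_batch)
    for index, start in enumerate(starts, start=1):
        stop = start + lines_per_batch
        # Same slice for both files, so line alignment is preserved.
        src_chunk = out_dir / f"source_{index:04d}.txt"
        tgt_chunk = out_dir / f"target_{index:04d}.txt"
        src_chunk.write_text("\n".join(src_lines[start:stop]) + "\n", encoding="utf-8")
        tgt_chunk.write_text("\n".join(tgt_lines[start:stop]) + "\n", encoding="utf-8")
        batches.append((src_chunk, tgt_chunk))
    return batches
```

Each returned pair can then be passed to wimarka (or wmk_eval) in the same way as source_file.txt and target_file.txt in Step 2.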

💾 Storage
  • Models are cached in ~/.cache/huggingface/

  • Ensure adequate disk space (5-10 GB recommended)
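To check the available disk space before the first run, a short standard-library sketch such as the following may help. free_space_gb is a hypothetical helper, and the 10 GB threshold is simply the upper end of the recommendation above.

```python
import shutil
from pathlib import Path


def free_space_gb(path):
    """Return free disk space, in GB, on the filesystem that holds path."""
    path = Path(path).expanduser()
    # The cache directory may not exist yet, so walk up to the nearest
    # directory that does.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    free = free_space_gb("~/.cache/huggingface/")
    print(f"Free space: {free:.1f} GB")
    if free < 10:
        print("Less than 10 GB free; the model download may not fit.")
```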

Next Steps

Now that you’ve completed your first evaluation: