Command-Line Interface (CLI) Usage

This guide covers using WiMarka from the command line for quick and efficient translation evaluation.

Basic Command

The basic syntax for the WiMarka CLI is:

wimarka --src_file_path <source_file> \
        --src_lang <source_language> \
        --tgt_file_path <target_file> \
        --tgt_lang <target_language>

Example:

wimarka --src_file_path english.txt \
        --src_lang EN \
        --tgt_file_path cebuano.txt \
        --tgt_lang CEB

Command Options

Required Options

Option            Type    Description
--src_file_path   String  Path to the source text file
--src_lang        String  Source language code (EN, CEB, ILO, TGT)
--tgt_file_path   String  Path to the target translation file
--tgt_lang        String  Target language code (CEB, ILO, TGT)

Optional Options

Option            Type    Description
-h, --help        Flag    Show help message and exit

Getting Help

Display the help message:

wimarka --help

Output:

Usage: wimarka [OPTIONS]

Evaluate machine translation quality using WiMarka.

Options:
  --src_file_path TEXT  Path to source text file  [required]
  --src_lang TEXT       Source language code (EN, CEB, ILO, TGT)  [required]
  --tgt_file_path TEXT  Path to target text file  [required]
  --tgt_lang TEXT       Target language code (CEB, ILO, TGT)  [required]
  -h, --help            Show this message and exit.

CLI Examples

Example 1: English to Cebuano

wimarka --src_file_path data/english.txt \
        --src_lang EN \
        --tgt_file_path data/cebuano.txt \
        --tgt_lang CEB

Example 2: English to Ilocano

wimarka --src_file_path sources/en_sentences.txt \
        --src_lang EN \
        --tgt_file_path translations/ilo_sentences.txt \
        --tgt_lang ILO

Example 3: English to Tagalog

wimarka --src_file_path ~/documents/english.txt \
        --src_lang EN \
        --tgt_file_path ~/documents/tagalog.txt \
        --tgt_lang TGT

Example 4: Relative Paths

# Using relative paths
wimarka --src_file_path ./test/source.txt \
        --src_lang EN \
        --tgt_file_path ./test/target.txt \
        --tgt_lang CEB

Example 5: Absolute Paths

# Using absolute paths (recommended for scripts)
wimarka --src_file_path /home/user/data/source.txt \
        --src_lang EN \
        --tgt_file_path /home/user/data/translation.txt \
        --tgt_lang CEB

Working with Output

Console Output

WiMarka prints evaluation progress and results to the console:

INFO - Starting evaluation...
INFO - Evaluating line 1/3
INFO - Detecting errors...
INFO - Scoring translation...
INFO - Generating explanation...
INFO - Correcting translation...

=== Evaluation Results ===
----------------------------------------
Line 1:
  Source: Good morning!
  Target: Maayong buntag!
  Errors: []
  Fluency Score: 100/100
  Adequacy Score: 100/100
  Overall Score: 100/100
  Explanation: Perfect translation with correct meaning and grammar.
  Suggested Correction: Maayong buntag!
----------------------------------------

Redirecting Output to File

Save evaluation results to a file:

wimarka --src_file_path source.txt \
        --src_lang EN \
        --tgt_file_path target.txt \
        --tgt_lang CEB > results.txt

Append to an existing file:

wimarka --src_file_path source.txt \
        --src_lang EN \
        --tgt_file_path target.txt \
        --tgt_lang CEB >> all_results.txt

Suppressing Progress Messages

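The progress (INFO) messages go to standard error. Discarding that stream therefore saves only the results. Note that it also hides any error messages:

wimarka --src_file_path source.txt \
        --src_lang EN \
        --tgt_file_path target.txt \
        --tgt_lang CEB 2>/dev/null > results.txt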
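You can apply the same redirection from Python, where subprocess.DEVNULL plays the role of 2>/dev/null. This is a minimal sketch, and run_quietly is a hypothetical helper name. It returns the exit code because discarding stderr also hides error messages.

```python
import subprocess
from pathlib import Path


def run_quietly(cmd, out_path):
    """Run cmd, writing stdout to out_path and discarding stderr.

    Equivalent to `cmd 2>/dev/null > out_path` in a shell. Returns the
    exit code so that failures are not lost along with stderr.
    """
    with Path(out_path).open("w", encoding="utf-8") as out:
        completed = subprocess.run(cmd, stdout=out, stderr=subprocess.DEVNULL)
    return completed.returncode
```

For example: run_quietly(["wimarka", "--src_file_path", "source.txt", "--src_lang", "EN", "--tgt_file_path", "target.txt", "--tgt_lang", "CEB"], "results.txt").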

Batch Processing

Process Multiple File Pairs (Bash)

#!/bin/bash

# List of file pairs
pairs=(
    "file1_en.txt:file1_ceb.txt:CEB"
    "file2_en.txt:file2_ilo.txt:ILO"
    "file3_en.txt:file3_tgt.txt:TGT"
)

# Process each pair
for pair in "${pairs[@]}"; do
    IFS=':' read -r src_file tgt_file tgt_lang <<< "$pair"

    echo "Evaluating $src_file -> $tgt_file"

    wimarka --src_file_path "$src_file" \
            --src_lang EN \
            --tgt_file_path "$tgt_file" \
            --tgt_lang "$tgt_lang"

    echo "---"
done
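Here is the same loop sketched in Python. build_commands is a hypothetical helper. It splits each entry on colons, like IFS=':' in the Bash version, and returns argument lists ready for subprocess.run, so paths must not contain colons. Bash read folds extra fields into the last variable, but the unpacking below raises ValueError on a malformed entry.

```python
def build_commands(pairs, src_lang="EN"):
    """Turn "src:tgt:LANG" strings into wimarka argument lists."""
    commands = []
    for pair in pairs:
        # Exactly three colon-separated fields, otherwise ValueError
        src_file, tgt_file, tgt_lang = pair.split(":")
        commands.append([
            "wimarka",
            "--src_file_path", src_file,
            "--src_lang", src_lang,
            "--tgt_file_path", tgt_file,
            "--tgt_lang", tgt_lang,
        ])
    return commands
```

To reproduce the Bash output, run each command with subprocess.run(cmd) and print "---" between runs.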

Process All Files in Directory

#!/bin/bash

# Process all English files and their Cebuano translations
for src_file in data/en/*.txt; do
    # Get base filename
    base=$(basename "$src_file" .txt)
    tgt_file="data/ceb/${base}.txt"

    if [ -f "$tgt_file" ]; then
        echo "Evaluating: $base"
        wimarka --src_file_path "$src_file" \
                --src_lang EN \
                --tgt_file_path "$tgt_file" \
                --tgt_lang CEB
    else
        echo "Warning: Translation not found for $base"
    fi
done
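This is a pathlib version of the same pairing logic, sketched in Python with a hypothetical find_pairs helper. The two versions behave differently when data/en contains no .txt files. There, glob yields nothing, while the Bash loop (without nullglob) runs once with the literal pattern and prints a warning for "*".

```python
from pathlib import Path


def find_pairs(src_dir, tgt_dir):
    """Pair each src_dir/*.txt with the same-named file in tgt_dir.

    Returns (pairs, missing): pairs is a sorted list of (src, tgt) Paths,
    missing lists base names that have no matching translation file.
    """
    pairs, missing = [], []
    for src in sorted(Path(src_dir).glob("*.txt")):
        tgt = Path(tgt_dir) / src.name
        if tgt.is_file():
            pairs.append((src, tgt))
        else:
            missing.append(src.stem)
    return pairs, missing
```

For example, find_pairs("data/en", "data/ceb") returns the pairs to evaluate and the names to warn about.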

Parallel Processing (GNU Parallel)

For faster processing of multiple file pairs:

# Create a file list
cat > filelist.txt <<EOF
src1.txt tgt1.txt CEB
src2.txt tgt2.txt ILO
src3.txt tgt3.txt TGT
EOF

# Process in parallel (4 jobs at once)
parallel -j 4 --colsep ' ' \
    wimarka --src_file_path {1} --src_lang EN \
            --tgt_file_path {2} --tgt_lang {3} \
    :::: filelist.txt
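If GNU Parallel is not available, Python's concurrent.futures can run the same file list with a bounded number of jobs. This is a sketch. read_filelist, evaluate and run_parallel are hypothetical names. Rows are split on any whitespace, not on exactly one space as --colsep ' ' does. Threads are sufficient because each job's work happens in a separate wimarka process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_filelist(path):
    """Read "src tgt LANG" rows from a file list, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                src, tgt, lang = line.split()  # ValueError if not 3 fields
                rows.append((src, tgt, lang))
    return rows


def evaluate(row, src_lang="EN"):
    """Run wimarka for one row; return (target file, exit code, stdout)."""
    src, tgt, tgt_lang = row
    cmd = ["wimarka", "--src_file_path", src, "--src_lang", src_lang,
           "--tgt_file_path", tgt, "--tgt_lang", tgt_lang]
    done = subprocess.run(cmd, capture_output=True, text=True)
    return tgt, done.returncode, done.stdout


def run_parallel(rows, jobs=4, worker=evaluate):
    """Apply worker to rows with at most `jobs` running at once.

    Threads are enough here because each job waits on a child process.
    Results are returned in the same order as rows.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(worker, rows))
```

For example, run_parallel(read_filelist("filelist.txt"), jobs=4) returns (target file, exit code, stdout) tuples in file-list order.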

Integration Examples

Integration with Make

Makefile:

.PHONY: evaluate clean

SRC_DIR = data/source
TGT_DIR = data/translations
RESULTS_DIR = results

evaluate:
	@mkdir -p $(RESULTS_DIR)
	@for src in $(SRC_DIR)/*.txt; do \
		base=$$(basename $$src .txt); \
		tgt=$(TGT_DIR)/$$base.txt; \
		if [ -f $$tgt ]; then \
			echo "Evaluating $$base..."; \
			wimarka --src_file_path $$src --src_lang EN \
			        --tgt_file_path $$tgt --tgt_lang CEB \
			        > $(RESULTS_DIR)/$$base.txt; \
		fi; \
	done

clean:
	rm -rf $(RESULTS_DIR)

Usage:

make evaluate

Integration with Python Scripts

Call WiMarka CLI from Python:

import subprocess
import sys

def run_wimarka(src_file, tgt_file, src_lang='EN', tgt_lang='CEB'):
    """Run WiMarka CLI from Python."""
    cmd = [
        'wimarka',
        '--src_file_path', src_file,
        '--src_lang', src_lang,
        '--tgt_file_path', tgt_file,
        '--tgt_lang', tgt_lang
    ]

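    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=True  # raise CalledProcessError on a non-zero exit code
        )
    except FileNotFoundError:
        # wimarka is not installed or not on PATH
        print("Error: wimarka command not found", file=sys.stderr)
        return False
    except subprocess.CalledProcessError as e:
        print(f"Error: {e.stderr}", file=sys.stderr)
        return False

    print(result.stdout)
    return True  # check=True guarantees the exit code was 0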

# Usage
success = run_wimarka('source.txt', 'translation.txt')
if success:
    print("Evaluation completed")

Error Handling

Common Errors and Solutions

File Not Found:

Error: [Errno 2] No such file or directory: 'source.txt'

Solution: Check file paths and ensure files exist

ls -la source.txt target.txt

Line Count Mismatch:

ValueError: Source and target files must have the same number of lines.

Solution: Verify both files have equal line counts

wc -l source.txt target.txt
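wc -l counts newline characters, so it does not count a final line that lacks a trailing newline. The two numbers can therefore disagree even when the files match. A Python pre-check that counts lines the way Python reads them avoids this. This is a sketch with hypothetical helpers count_lines and check_line_counts, and this guide does not say exactly how WiMarka itself counts lines.

```python
def count_lines(path):
    """Count lines as Python iterates a text file.

    Unlike `wc -l`, a last line without a trailing newline is counted.
    """
    with open(path, encoding="utf-8") as fh:
        return sum(1 for _ in fh)


def check_line_counts(src_path, tgt_path):
    """Return the shared line count, or raise ValueError if they differ."""
    src_n, tgt_n = count_lines(src_path), count_lines(tgt_path)
    if src_n != tgt_n:
        raise ValueError(f"{src_path} has {src_n} lines but {tgt_path} has {tgt_n}")
    return src_n
```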

Invalid Language Code:

Error: Invalid language code

Solution: Use valid codes (EN, CEB, ILO, TGT)
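To catch a bad code before an evaluation starts, you can check it against the codes listed in this guide. This is a minimal Python sketch. validate_langs is hypothetical, and it compares case-sensitively because the guide does not say whether wimarka accepts lowercase codes.

```python
# Codes as listed in this guide's options table (TGT is Tagalog).
SOURCE_LANGS = {"EN", "CEB", "ILO", "TGT"}
TARGET_LANGS = {"CEB", "ILO", "TGT"}


def validate_langs(src_lang, tgt_lang):
    """Raise ValueError if either code is outside the documented sets."""
    if src_lang not in SOURCE_LANGS:
        raise ValueError(f"Invalid source language code: {src_lang!r}")
    if tgt_lang not in TARGET_LANGS:
        raise ValueError(f"Invalid target language code: {tgt_lang!r}")
```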

Exit Codes

  • 0: Success

  • 1: Error (file not found, invalid arguments, etc.)

  • 2: Command line usage error

Check the exit code in scripts:

wimarka --src_file_path source.txt \
        --src_lang EN \
        --tgt_file_path target.txt \
        --tgt_lang CEB

if [ $? -eq 0 ]; then
    echo "Success"
else
    echo "Failed"
    exit 1
fi
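From Python, the exit code is available as the returncode of the object returned by subprocess.run. The sketch below maps it to the meanings listed above. run_and_report is a hypothetical helper.

```python
import subprocess

# Meanings as listed under "Exit Codes" in this guide.
EXIT_MEANINGS = {
    0: "success",
    1: "error (file not found, invalid arguments, etc.)",
    2: "command line usage error",
}


def run_and_report(cmd):
    """Run cmd and return (exit code, human-readable meaning)."""
    code = subprocess.run(cmd).returncode
    return code, EXIT_MEANINGS.get(code, f"unexpected exit code {code}")
```

For example, run_and_report(["wimarka", "--help"]) should return (0, "success").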

Best Practices

  1. Use Absolute Paths in Scripts

    # Good
    wimarka --src_file_path /home/user/data/source.txt ...
    
    # Avoid in scripts (relative paths resolve against the current working directory)
    wimarka --src_file_path ../data/source.txt ...
    
  2. Validate Inputs Before Running

    if [ ! -f "$src_file" ]; then
        echo "Error: Source file not found"
        exit 1
    fi
    
  3. Log Results for Reproducibility

    timestamp=$(date +%Y%m%d_%H%M%S)
    wimarka ... > "results_${timestamp}.txt"
    
  4. Use Meaningful File Names

    # Good
    wimarka --src_file_path en_news_articles.txt \
            --tgt_file_path ceb_news_articles.txt ...
    
    # Avoid
    wimarka --src_file_path file1.txt --tgt_file_path file2.txt ...
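Practices 1 and 3 can be combined in Python. The sketch below resolves input paths to absolute form and names each results file with the same timestamp pattern as date +%Y%m%d_%H%M%S. results_path and build_command are hypothetical helper names.

```python
from datetime import datetime
from pathlib import Path


def results_path(out_dir="results", now=None):
    """Absolute path like results/results_20240102_030405.txt.

    The format matches `date +%Y%m%d_%H%M%S` in the shell example.
    """
    if now is None:
        now = datetime.now()
    return (Path(out_dir) / f"results_{now:%Y%m%d_%H%M%S}.txt").resolve()


def build_command(src_file, tgt_file, src_lang="EN", tgt_lang="CEB"):
    """wimarka argument list with both input paths made absolute."""
    return [
        "wimarka",
        "--src_file_path", str(Path(src_file).resolve()),
        "--src_lang", src_lang,
        "--tgt_file_path", str(Path(tgt_file).resolve()),
        "--tgt_lang", tgt_lang,
    ]
```

Create the output directory first, for example with results_path().parent.mkdir(parents=True, exist_ok=True). Then run the command from build_command with its stdout directed to that file.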
    

Tips and Tricks

Quick Evaluation of Single Sentence

# Create temporary files
echo "Good morning!" > /tmp/src.txt
echo "Maayong buntag!" > /tmp/tgt.txt

# Evaluate
wimarka --src_file_path /tmp/src.txt \
        --src_lang EN \
        --tgt_file_path /tmp/tgt.txt \
        --tgt_lang CEB
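This Python variant writes the two one-line files into a fresh temporary directory, so concurrent runs do not overwrite each other's /tmp/src.txt. It is a sketch. write_sentence_pair is a hypothetical helper, and the caller should delete the directory when finished, for example with shutil.rmtree.

```python
import tempfile
from pathlib import Path


def write_sentence_pair(source, target):
    """Write one source and one target sentence to a fresh temp directory.

    Returns (src_path, tgt_path). Each file ends with a newline, like echo.
    """
    if "\n" in source or "\n" in target:
        raise ValueError("each argument must be a single line")
    tmp_dir = Path(tempfile.mkdtemp(prefix="wimarka_"))
    src_path = tmp_dir / "src.txt"
    tgt_path = tmp_dir / "tgt.txt"
    src_path.write_text(source + "\n", encoding="utf-8")
    tgt_path.write_text(target + "\n", encoding="utf-8")
    return src_path, tgt_path
```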

Comparing Translation Systems

# Evaluate System A
wimarka --src_file_path source.txt \
        --src_lang EN \
        --tgt_file_path system_a.txt \
        --tgt_lang CEB > results_a.txt

# Evaluate System B
wimarka --src_file_path source.txt \
        --src_lang EN \
        --tgt_file_path system_b.txt \
        --tgt_lang CEB > results_b.txt

# Compare
diff results_a.txt results_b.txt
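diff shows every line that changed, including explanations. To compare the systems numerically, you can extract only the Overall Score lines. This Python sketch assumes the "Overall Score: N/100" format from the sample output. overall_scores and compare_systems are hypothetical helpers.

```python
import re
from statistics import mean

# One match per evaluated line, e.g. "  Overall Score: 100/100"
OVERALL_RE = re.compile(r"^[ \t]*Overall Score:[ \t]*(\d+)/100[ \t]*$", re.MULTILINE)


def overall_scores(text):
    """Per-line overall scores, in order, from a saved results file's text."""
    return [int(score) for score in OVERALL_RE.findall(text)]


def compare_systems(text_a, text_b):
    """Return (mean_a, mean_b, changed).

    changed lists the 1-based positions whose overall score differs.
    """
    a, b = overall_scores(text_a), overall_scores(text_b)
    if not a or len(a) != len(b):
        raise ValueError("results must contain the same, non-zero number of scores")
    changed = [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]
    return mean(a), mean(b), changed
```

For example, read results_a.txt and results_b.txt and pass their contents to compare_systems.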

Next Steps