Command-Line Interface (CLI) Usage
This guide covers using WiMarka from the command line for quick and efficient translation evaluation.
Basic Command
The basic syntax for the WiMarka CLI is:
wimarka --src_file_path <source_file> \
    --src_lang <source_language> \
    --tgt_file_path <target_file> \
    --tgt_lang <target_language>
Example:
wimarka --src_file_path english.txt \
    --src_lang EN \
    --tgt_file_path cebuano.txt \
    --tgt_lang CEB
Command Options
Required Options
| Option | Type | Description |
|---|---|---|
| --src_file_path | String | Path to the source text file |
| --src_lang | String | Source language code (EN, CEB, ILO, TGT) |
| --tgt_file_path | String | Path to the target translation file |
| --tgt_lang | String | Target language code (CEB, ILO, TGT) |
Optional Options
| Option | Type | Description |
|---|---|---|
| -h, --help | Flag | Show help message and exit |
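The option tables translate directly into a small pre-check before invoking the CLI. The following is a minimal Python sketch, not part of WiMarka itself: `build_wimarka_command`, `SOURCE_LANGS`, and `TARGET_LANGS` are hypothetical names, and the accepted codes are simply copied from the tables above.

```python
# Language codes as listed in the options tables (assumed to be complete).
SOURCE_LANGS = {"EN", "CEB", "ILO", "TGT"}
TARGET_LANGS = {"CEB", "ILO", "TGT"}


def build_wimarka_command(src_file, src_lang, tgt_file, tgt_lang):
    """Return the wimarka argument list, or raise ValueError on a bad code."""
    if src_lang not in SOURCE_LANGS:
        raise ValueError(f"Unsupported source language code: {src_lang!r}")
    if tgt_lang not in TARGET_LANGS:
        raise ValueError(f"Unsupported target language code: {tgt_lang!r}")
    # All four options are required, so they are always emitted.
    return [
        "wimarka",
        "--src_file_path", str(src_file),
        "--src_lang", src_lang,
        "--tgt_file_path", str(tgt_file),
        "--tgt_lang", tgt_lang,
    ]
```

The returned list can be passed as-is to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.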
Getting Help
Display the help message:
wimarka --help
Output:
Usage: wimarka [OPTIONS]
  Evaluate machine translation quality using WiMarka.
Options:
  --src_file_path TEXT  Path to source text file  [required]
  --src_lang TEXT       Source language code (EN, CEB, ILO, TGT)  [required]
  --tgt_file_path TEXT  Path to target text file  [required]
  --tgt_lang TEXT       Target language code (CEB, ILO, TGT)  [required]
  -h, --help            Show this message and exit.
CLI Examples
Example 1: English to Cebuano
wimarka --src_file_path data/english.txt \
    --src_lang EN \
    --tgt_file_path data/cebuano.txt \
    --tgt_lang CEB
Example 2: English to Ilocano
wimarka --src_file_path sources/en_sentences.txt \
    --src_lang EN \
    --tgt_file_path translations/ilo_sentences.txt \
    --tgt_lang ILO
Example 3: English to Tagalog
wimarka --src_file_path ~/documents/english.txt \
    --src_lang EN \
    --tgt_file_path ~/documents/tagalog.txt \
    --tgt_lang TGT
Example 4: Relative Paths
# Using relative paths
wimarka --src_file_path ./test/source.txt \
    --src_lang EN \
    --tgt_file_path ./test/target.txt \
    --tgt_lang CEB
Example 5: Absolute Paths
# Using absolute paths (recommended for scripts)
wimarka --src_file_path /home/user/data/source.txt \
    --src_lang EN \
    --tgt_file_path /home/user/data/translation.txt \
    --tgt_lang CEB
Working with Output
Console Output
WiMarka prints evaluation progress and results to the console:
INFO - Starting evaluation...
INFO - Evaluating line 1/3
INFO - Detecting errors...
INFO - Scoring translation...
INFO - Generating explanation...
INFO - Correcting translation...
=== Evaluation Results ===
----------------------------------------
Line 1:
Source: Good morning!
Target: Maayong buntag!
Errors: []
Fluency Score: 100/100
Adequacy Score: 100/100
Overall Score: 100/100
Explanation: Perfect translation with correct meaning and grammar.
Suggested Correction: Maayong buntag!
----------------------------------------
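When results need to be processed further, the console text can be turned into structured records. The sketch below assumes the output looks exactly like the sample above (a `Line N:` header followed by `Key: value` lines); `parse_results` is a hypothetical helper, and the real format may differ, so treat it as a starting point rather than a supported parser.

```python
import re

LINE_HEADER = re.compile(r"^Line (\d+):$")
SCORE = re.compile(r"^(\d+)/100$")


def parse_results(text):
    """Parse console results into a list of dicts, one per evaluated line."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = LINE_HEADER.match(line)
        if header:
            # A new "Line N:" header starts a new record.
            current = {"line": int(header.group(1))}
            records.append(current)
            continue
        if current is None or ": " not in line:
            continue  # banner, separators, and progress messages
        key, value = line.split(": ", 1)
        score = SCORE.match(value)
        # Scores such as "100/100" become integers; other values stay strings.
        current[key] = int(score.group(1)) if score else value
    return records
```

Each record is keyed by the labels printed in the output, for example `record["Overall Score"]` or `record["Suggested Correction"]`.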
Redirecting Output to File
Save evaluation results to a file:
wimarka --src_file_path source.txt \
    --src_lang EN \
    --tgt_file_path target.txt \
    --tgt_lang CEB > results.txt
Append to existing file:
wimarka --src_file_path source.txt \
    --src_lang EN \
    --tgt_file_path target.txt \
    --tgt_lang CEB >> all_results.txt
Suppressing Progress Messages
To save only results without progress messages:
wimarka --src_file_path source.txt \
    --src_lang EN \
    --tgt_file_path target.txt \
    --tgt_lang CEB 2>/dev/null > results.txt
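The same separation of streams can be done from Python. This is a sketch under the assumption, implied by the redirect above, that progress messages go to stderr and results go to stdout; `run_and_save` is a hypothetical helper that works for any command list.

```python
import subprocess


def run_and_save(cmd, results_path, keep_progress=False):
    """Run cmd, write its stdout to results_path, and return the exit code."""
    # Discard stderr unless the caller wants progress shown on the console.
    stderr = None if keep_progress else subprocess.DEVNULL
    with open(results_path, "w", encoding="utf-8") as out:
        completed = subprocess.run(cmd, stdout=out, stderr=stderr)
    return completed.returncode
```

Passing `keep_progress=True` leaves stderr attached to the terminal, which is useful for long runs where the progress lines are the only sign of activity.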
Batch Processing
Process Multiple File Pairs (Bash)
#!/bin/bash
# List of file pairs (source:target:target language)
pairs=(
    "file1_en.txt:file1_ceb.txt:CEB"
    "file2_en.txt:file2_ilo.txt:ILO"
    "file3_en.txt:file3_tgt.txt:TGT"
)
# Process each pair
for pair in "${pairs[@]}"; do
    IFS=':' read -r src_file tgt_file tgt_lang <<< "$pair"
    echo "Evaluating $src_file -> $tgt_file"
    wimarka --src_file_path "$src_file" \
        --src_lang EN \
        --tgt_file_path "$tgt_file" \
        --tgt_lang "$tgt_lang"
    echo "---"
done
Process All Files in Directory
#!/bin/bash
# Process all English files and their Cebuano translations
for src_file in data/en/*.txt; do
    # Get base filename
    base=$(basename "$src_file" .txt)
    tgt_file="data/ceb/${base}.txt"
    if [ -f "$tgt_file" ]; then
        echo "Evaluating: $base"
        wimarka --src_file_path "$src_file" \
            --src_lang EN \
            --tgt_file_path "$tgt_file" \
            --tgt_lang CEB
    else
        echo "Warning: Translation not found for $base"
    fi
done
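The pairing logic in the script above (match each source file to a translation with the same file name) can also be expressed in Python, which makes it easy to collect the unmatched files instead of only printing a warning. `pair_files` is a hypothetical helper, and the directory layout is the one assumed by the script.

```python
from pathlib import Path


def pair_files(src_dir, tgt_dir, pattern="*.txt"):
    """Return (pairs, missing): matched (src, tgt) paths and unmatched sources."""
    pairs, missing = [], []
    # Sort so that the processing order is stable across runs.
    for src in sorted(Path(src_dir).glob(pattern)):
        tgt = Path(tgt_dir) / src.name
        if tgt.is_file():
            pairs.append((src, tgt))
        else:
            missing.append(src)
    return pairs, missing
```

A caller would typically loop over `pairs` to run the evaluation and report `missing` once at the end.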
Parallel Processing (GNU Parallel)
For faster processing of multiple file pairs:
# Create a file list
cat > filelist.txt <<EOF
src1.txt tgt1.txt CEB
src2.txt tgt2.txt ILO
src3.txt tgt3.txt TGT
EOF
# Process in parallel (4 jobs at once)
parallel -j 4 --colsep ' ' \
    wimarka --src_file_path {1} --src_lang EN \
    --tgt_file_path {2} --tgt_lang {3} \
    :::: filelist.txt
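Where GNU Parallel is not installed, a thread pool from the Python standard library gives a comparable effect, since each job is a separate process and the threads only wait on it. The sketch below is generic: `run_jobs` is a hypothetical helper that accepts any list of command lists. Whether running several evaluations at once is appropriate depends on how WiMarka is deployed, which this guide does not specify.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, max_workers=4):
    """Run command lists concurrently; return (exit_code, stdout) in input order."""
    def run_one(cmd):
        done = subprocess.run(cmd, capture_output=True, text=True)
        return done.returncode, done.stdout

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input commands.
        return list(pool.map(run_one, commands))
```

Capturing stdout per job keeps the results of different file pairs from interleaving, which can happen when parallel jobs all write to the same terminal.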
Integration Examples
Integration with Make
Makefile:
.PHONY: evaluate clean
SRC_DIR = data/source
TGT_DIR = data/translations
RESULTS_DIR = results
evaluate:
	@mkdir -p $(RESULTS_DIR)
	@for src in $(SRC_DIR)/*.txt; do \
		base=$$(basename $$src .txt); \
		tgt=$(TGT_DIR)/$$base.txt; \
		if [ -f $$tgt ]; then \
			echo "Evaluating $$base..."; \
			wimarka --src_file_path $$src --src_lang EN \
			    --tgt_file_path $$tgt --tgt_lang CEB \
			    > $(RESULTS_DIR)/$$base.txt; \
		fi; \
	done
clean:
	rm -rf $(RESULTS_DIR)
Usage:
make evaluate
Integration with Python Scripts
Call WiMarka CLI from Python:
import subprocess
import sys

def run_wimarka(src_file, tgt_file, src_lang='EN', tgt_lang='CEB'):
    """Run WiMarka CLI from Python."""
    cmd = [
        'wimarka',
        '--src_file_path', src_file,
        '--src_lang', src_lang,
        '--tgt_file_path', tgt_file,
        '--tgt_lang', tgt_lang
    ]
    try:
        # check=True raises CalledProcessError on a non-zero exit code
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=True
        )
        print(result.stdout)
        return result.returncode == 0
    except subprocess.CalledProcessError as e:
        print(f"Error: {e.stderr}", file=sys.stderr)
        return False

# Usage
success = run_wimarka('source.txt', 'translation.txt')
if success:
    print("Evaluation completed")
Error Handling
Common Errors and Solutions
File Not Found:
Error: [Errno 2] No such file or directory: 'source.txt'
Solution: Check file paths and ensure files exist
ls -la source.txt target.txt
Line Count Mismatch:
ValueError: Source and target files must have the same number of lines.
Solution: Verify both files have equal line counts
wc -l source.txt target.txt
Invalid Language Code:
Error: Invalid language code
Solution: Use valid codes (EN, CEB, ILO, TGT)
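The first two errors can be caught before WiMarka starts. The following sketch bundles the `ls` and `wc -l` checks into one hypothetical function, `preflight`. It counts lines the way Python iterates a file, so a final line without a trailing newline still counts; how WiMarka itself counts lines is an assumption here.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a UTF-8 text file without loading it all into memory."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def preflight(src_file, tgt_file):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    for label, path in (("Source", src_file), ("Target", tgt_file)):
        if not Path(path).is_file():
            problems.append(f"{label} file not found: {path}")
    if problems:
        return problems  # line counts are meaningless if a file is missing
    src_lines, tgt_lines = count_lines(src_file), count_lines(tgt_file)
    if src_lines != tgt_lines:
        problems.append(
            f"Line count mismatch: {src_lines} source vs {tgt_lines} target"
        )
    return problems
```

Returning a list rather than raising on the first problem lets a batch script report every issue in one pass.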
Exit Codes
0: Success
1: Error (file not found, invalid arguments, etc.)
2: Command line usage error
Check exit code in scripts:
wimarka --src_file_path source.txt \
    --src_lang EN \
    --tgt_file_path target.txt \
    --tgt_lang CEB
if [ $? -eq 0 ]; then
    echo "Success"
else
    echo "Failed"
    exit 1
fi
Best Practices
Use Absolute Paths in Scripts
# Good
wimarka --src_file_path /home/user/data/source.txt ...
# Avoid in scripts (relative paths can be ambiguous)
wimarka --src_file_path ../data/source.txt ...
Validate Inputs Before Running
if [ ! -f "$src_file" ]; then
    echo "Error: Source file not found"
    exit 1
fi
Log Results for Reproducibility
timestamp=$(date +%Y%m%d_%H%M%S)
wimarka ... > "results_${timestamp}.txt"
Use Meaningful File Names
# Good
wimarka --src_file_path en_news_articles.txt \
    --tgt_file_path ceb_news_articles.txt ...
# Avoid
wimarka --src_file_path file1.txt --tgt_file_path file2.txt ...
Tips and Tricks
Quick Evaluation of Single Sentence
# Create temporary files
echo "Good morning!" > /tmp/src.txt
echo "Maayong buntag!" > /tmp/tgt.txt
# Evaluate
wimarka --src_file_path /tmp/src.txt \
    --src_lang EN \
    --tgt_file_path /tmp/tgt.txt \
    --tgt_lang CEB
Comparing Translation Systems
# Evaluate System A
wimarka --src_file_path source.txt \
    --src_lang EN \
    --tgt_file_path system_a.txt \
    --tgt_lang CEB > results_a.txt
# Evaluate System B
wimarka --src_file_path source.txt \
    --src_lang EN \
    --tgt_file_path system_b.txt \
    --tgt_lang CEB > results_b.txt
# Compare
diff results_a.txt results_b.txt
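A plain `diff` flags every line that differs, including explanations and corrections, which can bury the score differences. As a complement, the sketch below pulls only the `Overall Score` values out of two result files and summarizes them. It assumes the `Overall Score: N/100` line format shown in the console output section; `overall_scores` and `compare_systems` are hypothetical helpers.

```python
import re
from statistics import mean

OVERALL = re.compile(r"Overall Score:\s*(\d+)/100")


def overall_scores(results_text):
    """Extract every overall score, in order of appearance."""
    return [int(value) for value in OVERALL.findall(results_text)]


def compare_systems(text_a, text_b):
    """Summarize two result texts produced from the same source file."""
    a, b = overall_scores(text_a), overall_scores(text_b)
    if not a or not b:
        raise ValueError("No 'Overall Score' lines found in one of the inputs")
    return {
        "mean_a": mean(a),
        "mean_b": mean(b),
        # Positive values mean system A scored higher on that line.
        "per_line_diff": [x - y for x, y in zip(a, b)],
    }
```

The inputs would be the contents of `results_a.txt` and `results_b.txt`, read with `Path(...).read_text()`.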
Next Steps
See Understanding Output Format to understand the evaluation output
See Examples for more real-world scenarios
See Python Library Usage for calling WiMarka directly from Python code