WiMarka Documentation
WiMarka is a Python library and CLI tool for evaluating machine translations of Philippine languages. It combines syntactic and semantic analysis with detailed, interpretable output.
Overview
WiMarka addresses the critical need for accurate machine translation evaluation in Philippine languages. It goes beyond simple metrics by providing:
Error Detection: Identifies specific translation errors between source and target texts
Multi-dimensional Scoring: Evaluates translations across fluency, adequacy, and overall quality
Explainability: Generates human-readable explanations for detected errors
Correction Suggestions: Provides corrected translation alternatives
Philippine Language Focus: Specialized support for Cebuano (CEB), Ilocano (ILO), and Tagalog (TGT)
Key Features
- ✨ Advanced Error Detection
Sophisticated algorithms identify translation inconsistencies and errors
- 📊 Multi-dimensional Scoring
Fluency Score: Measures how natural the translation reads
Adequacy Score: Evaluates semantic completeness and accuracy
Overall Quality Score: Comprehensive translation quality assessment
- 💡 Explainable Results
Detailed explanations for each detected error
- 🔧 Correction Suggestions
AI-powered suggestions for improving translations
- 🖥️ Dual Interface
Both Python library and CLI for flexible integration
- 🌏 Philippine Language Support
Specialized models for CEB, ILO, and TGT
Quick Start
Installation
pip install git+https://github.com/wimarka-uic/WiMarka.git
Basic Usage
Python Library:
from wimarka.main import wmk_eval

wmk_eval(
    src_file_path='source_file.txt',
    src_lang='EN',
    tgt_file_path='target_file.txt',
    tgt_lang='CEB'
)
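Before running an evaluation, it can help to sanity-check the two input files. The sketch below assumes the files are line-aligned, with one segment per line and line i of the source corresponding to line i of the target. This page does not confirm that layout, so treat it as an assumption. The helpers `count_segments`, `check_parallel_files`, and `evaluate_checked` are hypothetical names and not part of WiMarka; only `wmk_eval` and its four keyword arguments come from the usage shown above.

```python
from pathlib import Path


def count_segments(path):
    """Return the number of non-empty lines in a UTF-8 text file."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_parallel_files(src_file_path, tgt_file_path):
    """Return the shared segment count, or raise ValueError on a mismatch.

    Assumption (not confirmed by the WiMarka docs): one segment per line,
    with source and target files aligned line by line.
    """
    src_count = count_segments(src_file_path)
    tgt_count = count_segments(tgt_file_path)
    if src_count != tgt_count:
        raise ValueError(
            f"Segment count mismatch: {src_count} source vs {tgt_count} target"
        )
    return src_count


def evaluate_checked(src_file_path, src_lang, tgt_file_path, tgt_lang):
    """Hypothetical wrapper: validate the inputs, then call wmk_eval."""
    check_parallel_files(src_file_path, tgt_file_path)
    # Imported lazily so the check above still works without WiMarka installed.
    from wimarka.main import wmk_eval

    # Whatever wmk_eval returns is passed through unchanged.
    return wmk_eval(
        src_file_path=str(src_file_path),
        src_lang=src_lang,
        tgt_file_path=str(tgt_file_path),
        tgt_lang=tgt_lang,
    )
```

Calling `evaluate_checked('source_file.txt', 'EN', 'target_file.txt', 'CEB')` would then fail early with a readable message if the files differ in length, instead of partway through an evaluation run.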
Command Line:
wimarka --src_file_path source_file.txt \
    --src_lang EN \
    --tgt_file_path target_file.txt \
    --tgt_lang CEB
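When the CLI needs to be driven from a script, for example to evaluate several language pairs in a batch, the same command can be assembled in Python. This is a minimal sketch that uses only the four flags shown above. `build_wimarka_command` and `run_wimarka` are hypothetical helper names, and the sketch assumes the `wimarka` executable is on `PATH` after installation.

```python
import subprocess


def build_wimarka_command(src_file_path, src_lang, tgt_file_path, tgt_lang):
    """Build the argument list for the wimarka CLI call shown above."""
    return [
        "wimarka",
        "--src_file_path", str(src_file_path),
        "--src_lang", src_lang,
        "--tgt_file_path", str(tgt_file_path),
        "--tgt_lang", tgt_lang,
    ]


def run_wimarka(src_file_path, src_lang, tgt_file_path, tgt_lang):
    """Run the CLI and return the completed process.

    Assumes the `wimarka` executable is on PATH. check=True raises
    subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    command = build_wimarka_command(
        src_file_path, src_lang, tgt_file_path, tgt_lang
    )
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.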
Documentation Structure
This documentation is organized into two main sections:
- User Manual
Complete guide for using WiMarka, including installation, usage examples, and best practices.
- Technical Manual
In-depth technical documentation covering architecture, API reference, and development guidelines.
Support & Contributing
For questions, issues, or suggestions:
Issues: GitHub Issues
Discussions: GitHub Discussions
We welcome contributions! See the Development Guide for details.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Citation
If you use WiMarka in your research, please cite:
@software{wimarka2025,
  title={WiMarka: A Reference-free Evaluation Metric for Machine Translation of Philippine Languages},
  author={{University of the Immaculate Conception}},
  year={2025},
  url={https://github.com/wimarka-uic/WiMarka}
}