Technical Manual
Welcome to the WiMarka Technical Manual. This section provides comprehensive technical documentation for developers, contributors, and researchers who want to understand WiMarka’s internals or extend its functionality.
What You’ll Find Here
This Technical Manual covers:
Architecture: System design, components, and data flow
API Reference: Complete API documentation with function signatures and parameters
Task Modules: Detailed documentation of evaluation pipeline components
Utility Modules: Helper functions, model management, and caching
Models: Information about the underlying language models
Development: Setting up a development environment and contributing
Extending: Guide for adding new features and languages
Who This Manual Is For
This manual is intended for:
Developers: Building on top of WiMarka or integrating it into systems
Contributors: Adding features, fixing bugs, or improving documentation
Researchers: Understanding the evaluation methodology and models
Advanced Users: Customizing WiMarka for specific needs
Prerequisites
To work with WiMarka’s internals, you should be familiar with:
Python programming (>= 3.12)
Machine learning basics
Natural language processing concepts
PyTorch fundamentals
Git and version control
Documentation Structure
The Technical Manual is organized into the following sections:
System Overview
Start here to understand WiMarka’s high-level architecture:
Architecture - System design and evaluation pipeline
Language Models - The underlying models and their characteristics
API Documentation
Detailed API reference for all modules:
API Reference - Main functions and classes
Task Modules - Evaluation tasks (error detection, scoring, explanation, correction)
Utility Modules - Helpers, logging, model management, and caching
Development & Extension
Guides for contributing and extending:
Development Guide - Development environment setup and workflows
Extending WiMarka - Adding new features and languages
Getting Started with Development
Quick setup for developers:
# Clone the repository
git clone https://github.com/wimarka-uic/WiMarka.git
cd WiMarka
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Run tests
cd test
python main.py
Key Concepts
Evaluation Pipeline
WiMarka uses a four-stage pipeline:
Error Detection: Identifies translation errors
Scoring: Calculates fluency, adequacy, and overall scores
Explanation: Generates human-readable error explanations
Correction: Suggests improved translations
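The four stages above can be sketched as a chain of plain functions. This is a minimal illustration, not WiMarka's actual implementation: every name here (EvaluationResult, detect_errors, score, explain, correct, run_pipeline) is hypothetical, and the stage bodies are placeholder rules standing in for model calls.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationResult:
    """Hypothetical container for the output of the four stages."""
    errors: list[str] = field(default_factory=list)
    scores: dict[str, float] = field(default_factory=dict)
    explanations: list[str] = field(default_factory=list)
    correction: str = ""


def detect_errors(source: str, translation: str) -> list[str]:
    # Stage 1 (placeholder rule): flag an empty translation.
    # A real detector would call a model instead.
    return ["empty translation"] if not translation.strip() else []


def score(errors: list[str]) -> dict[str, float]:
    # Stage 2 (placeholder): each error costs one point on an assumed 0-5 scale.
    value = max(0.0, 5.0 - len(errors))
    return {"fluency": value, "adequacy": value, "overall": value}


def explain(errors: list[str]) -> list[str]:
    # Stage 3: turn each detected error into a human-readable sentence.
    return [f"Detected issue: {error}" for error in errors]


def correct(translation: str, errors: list[str]) -> str:
    # Stage 4 (placeholder): return the translation unchanged.
    return translation


def run_pipeline(source: str, translation: str) -> EvaluationResult:
    # Each stage consumes the output of the error-detection stage.
    errors = detect_errors(source, translation)
    return EvaluationResult(
        errors=errors,
        scores=score(errors),
        explanations=explain(errors),
        correction=correct(translation, errors),
    )
```

Keeping each stage as a separate function with plain inputs and outputs means any one stage can be tested or replaced without touching the others.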
Language Models
WiMarka leverages:
Transformer-based models for semantic understanding
LLM-based evaluation using llama-cpp-python
Cached inference for performance optimization
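To illustrate what cached inference buys, the sketch below memoizes a stand-in for an expensive model call using the standard library's functools.lru_cache. It is an assumption-laden example: expensive_model_call and cached_inference are hypothetical names, and WiMarka's own cache module may work differently (for example, by persisting results to disk).

```python
from functools import lru_cache

# Counts how often the underlying "model" is actually invoked.
CALLS = {"count": 0}


def expensive_model_call(prompt: str) -> str:
    # Stand-in for a slow LLM call; here it only upper-cases the prompt.
    CALLS["count"] += 1
    return prompt.upper()


@lru_cache(maxsize=1024)
def cached_inference(prompt: str) -> str:
    # Identical prompts are answered from memory after the first call.
    return expensive_model_call(prompt)
```

Calling cached_inference twice with the same prompt invokes the underlying function only once. An in-memory cache like this is lost when the process exits and only works for hashable arguments.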
Design Principles
Modularity: Each task is an independent module
Extensibility: Easy to add new languages and tasks
Performance: Caching and optimization for speed
Interpretability: Explainable results at every stage
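One common way to achieve this kind of modularity and extensibility is a task registry, where new tasks are added without editing the code that runs them. The following is a sketch of that general pattern only; TASKS, register_task, length_ratio, and run_all are hypothetical names and do not describe WiMarka's actual extension mechanism.

```python
from typing import Callable

# Hypothetical registry mapping a task name to a callable taking
# (source, translation) and returning any result object.
TASKS: dict[str, Callable[[str, str], object]] = {}


def register_task(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func: Callable[[str, str], object]):
        if name in TASKS:
            raise ValueError(f"task {name!r} is already registered")
        TASKS[name] = func
        return func
    return decorator


@register_task("length_ratio")
def length_ratio(source: str, translation: str) -> float:
    # Toy task: ratio of translation length to source length.
    return len(translation) / max(len(source), 1)


def run_all(source: str, translation: str) -> dict[str, object]:
    # Runs every registered task independently and collects the results.
    return {name: task(source, translation) for name, task in TASKS.items()}
```

Because run_all only iterates over the registry, adding a task is a matter of defining one decorated function.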
Architecture Diagram
High-level system architecture:
┌─────────────────────┐
│     wmk_eval()      │  Main Entry Point
└──────────┬──────────┘
           │
           ├───► Error Detection ──► Identify translation errors
           │
           ├───► Scoring ──► Calculate quality metrics
           │
           ├───► Explanation ──► Generate human explanations
           │
           └───► Correction ──► Suggest improvements
                      │
                      ▼
              ┌──────────────┐
              │   Results    │
              └──────────────┘
Module Organization
wimarka/
├── main.py # Core evaluation logic
├── cli.py # Command-line interface
├── config.py # Configuration settings
├── tasks/ # Evaluation tasks
│ ├── error_detection.py
│ ├── scoring.py
│ ├── explanation.py
│ └── correction.py
└── utils/ # Utilities
    ├── helper.py
    ├── logger.py
    ├── model.py
    ├── cache.py
    └── torch.py
Contributing
We welcome contributions! See the Development Guide for:
Setting up your development environment
Code style guidelines
Testing procedures
Submitting pull requests
Support
For technical questions:
GitHub Issues: Report bugs or request features
GitHub Discussions: Ask questions or share ideas
Note: For user-oriented documentation, see the User Manual.