Architecture
============

This document describes WiMarka's system architecture, design decisions, and data flow.

System Overview
---------------

WiMarka is designed as a modular machine translation evaluation system with a four-stage pipeline that processes sentence pairs to generate quality assessments.

High-Level Architecture
-----------------------

.. code-block:: text

    ┌──────────────────────────────────────────────────────────┐
    │                      User Interface                      │
    ├───────────────────┬──────────────────────────────────────┤
    │  Python Library   │     Command-Line Interface (CLI)     │
    └────────┬──────────┴──────────────┬───────────────────────┘
             │                         │
             └──────────┬──────────────┘
                        │
                   ┌────▼────┐
                   │ wmk_eval│  Main Entry Point
                   └────┬────┘
                        │
       ┌────────────────┼────────────────────────┐
       │           Evaluation Pipeline           │
       │  ┌──────────────────────────────────┐   │
       │  │  1. Error Detection              │   │
       │  └─────────────┬────────────────────┘   │
       │                │                        │
       │  ┌─────────────▼────────────────────┐   │
       │  │  2. Scoring                      │   │
       │  └─────────────┬────────────────────┘   │
       │                │                        │
       │  ┌─────────────▼────────────────────┐   │
       │  │  3. Explanation Generation       │   │
       │  └─────────────┬────────────────────┘   │
       │                │                        │
       │  ┌─────────────▼────────────────────┐   │
       │  │  4. Correction Suggestion        │   │
       │  └─────────────┬────────────────────┘   │
       └────────────────┼────────────────────────┘
                        │
             ┌──────────▼───────────┐
             │   Utilities Layer    │
             ├──────────────────────┤
             │  • Model Management  │
             │  • Caching           │
             │  • Logging           │
             │  • Helper Functions  │
             └──────────┬───────────┘
                        │
             ┌──────────▼──────────┐
             │   Language Models   │
             │  • Transformer LMs  │
             │  • LLM (llama-cpp)  │
             └─────────────────────┘

Core Components
---------------

Main Module (``main.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Responsibility**: Orchestrates the evaluation pipeline

**Key Functions**:

* ``wmk_eval()``: Main entry point for evaluation
* Loads source and target files
* Manages results dictionary
* Coordinates task modules

**Design Decisions**:

* Sequential processing ensures deterministic results
* Global ``results`` dictionary for easy access
* File-based input for batch processing

CLI Module (``cli.py``)
~~~~~~~~~~~~~~~~~~~~~~~

**Responsibility**: Command-line interface

**Features**:

* Argument parsing with Click
* Input validation
* Error handling and user feedback

**Integration**: Wraps ``wmk_eval()`` with CLI argument handling

Task Modules
~~~~~~~~~~~~

Four independent task modules implement the evaluation pipeline:

1. **error_detection.py**: Identifies translation errors
2. **scoring.py**: Calculates quality metrics
3. **explanation.py**: Generates explanations
4. **correction.py**: Suggests corrections

See :doc:`tasks` for detailed documentation.

Utility Modules
~~~~~~~~~~~~~~~

Support modules provide shared functionality:

* **helper.py**: File I/O, language tag management
* **logger.py**: Logging configuration
* **model.py**: Model loading and management
* **cache.py**: Response caching
* **torch.py**: PyTorch device management

See :doc:`utils` for detailed documentation.

Evaluation Pipeline
-------------------

Stage 1: Error Detection
~~~~~~~~~~~~~~~~~~~~~~~~~

**Input**: Tagged source and target sentences

.. code-block:: python

    src_line = "[EN] Good morning!"
    tgt_line = "[CEB] Maayong gabii!"

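The tagged form shown above can be produced by prefixing each sentence with its language code in square brackets. The following is a minimal sketch only: ``add_language_tag`` and ``strip_language_tag`` are hypothetical names introduced for illustration, and the actual tag handling in ``helper.py`` may differ.

.. code-block:: python

    import re
    from typing import Tuple

    # Hypothetical helpers; the real tag management lives in helper.py
    # and may use different names, tag formats, or validation rules.
    _TAG_PATTERN = re.compile(r"^\[([A-Z]{2,3})\]\s*")


    def add_language_tag(line: str, lang_code: str) -> str:
        """Prefix a sentence with a bracketed, upper-cased language code."""
        return f"[{lang_code.upper()}] {line.strip()}"


    def strip_language_tag(tagged_line: str) -> Tuple[str, str]:
        """Split a tagged sentence into (language code, sentence text)."""
        match = _TAG_PATTERN.match(tagged_line)
        if match is None:
            raise ValueError(f"Missing language tag: {tagged_line!r}")
        return match.group(1), tagged_line[match.end():]


    src_line = add_language_tag("Good morning!", "en")
    tgt_line = add_language_tag("Maayong gabii!", "ceb")

Keeping the tag inside the sentence string, rather than in a separate argument, means each pipeline stage can receive a single string per side and still know which language it is reading.
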
**Process**:

1. Feeds sentences to error detection model
2. Analyzes syntactic and semantic differences
3. Identifies specific error types

**Output**: List of detected errors

.. code-block:: python

    errors = ['Semantic mismatch: time of day']

**Implementation Details**:

* Uses LLM-based analysis
* Prompt engineering for error identification
* Language-specific error patterns

Stage 2: Scoring
~~~~~~~~~~~~~~~~

**Input**: Source sentence, target sentence, detected errors

**Process**:

1. Evaluates fluency (grammatical correctness)
2. Evaluates adequacy (meaning preservation)
3. Calculates overall score

**Output**: Three numerical scores (0-100)

.. code-block:: python

    fluency_score = 95
    adequacy_score = 40
    overall_score = 67.5  # Average

**Scoring Algorithm**:

* LLM-based scoring with structured prompts
* Error count influences scores
* Language-specific quality criteria

Stage 3: Explanation Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Input**: All previous stage outputs

**Process**:

1. Analyzes detected errors
2. Considers score levels
3. Generates human-readable explanation

**Output**: Natural language explanation

.. code-block:: python

    explanation = "The translation has incorrect time reference. 'Morning' was translated as 'gabii' (evening)."

**Design**:

* Context-aware explanation generation
* References specific errors
* Educational tone for clarity

Stage 4: Correction Suggestion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Input**: All previous stage outputs

**Process**:

1. Analyzes errors and explanations
2. Generates improved translation
3. Validates correction quality

**Output**: Suggested corrected translation

.. code-block:: python

    corrected_translation = "Maayong buntag!"

**Approach**:

* Error-informed correction
* Preserves correct portions
* Maintains semantic equivalence

Data Flow
---------

Detailed data flow through the system:

.. code-block:: text

    ┌─────────────┐
    │ Input Files │
    └──────┬──────┘
           │
           ▼
    ┌─────────────────────┐
    │ wmk_eval()          │
    │ - Load files        │
    │ - Validate counts   │
    │ - Add language tags │
    └──────┬──────────────┘
           │
           ▼ (For each sentence pair)
    ┌────────────────────────┐
    │ error_detection()      │
    │ Input: src, tgt        │
    │ Output: errors[]       │
    └──────┬─────────────────┘
           │
           ▼
    ┌────────────────────────┐
    │ scoring()              │
    │ Input: src, tgt, errors│
    │ Output: 3 scores       │
    └──────┬─────────────────┘
           │
           ▼
    ┌────────────────────────┐
    │ generate_explanation() │
    │ Input: all above       │
    │ Output: explanation    │
    └──────┬─────────────────┘
           │
           ▼
    ┌────────────────────────┐
    │ generate_correction()  │
    │ Input: all above       │
    │ Output: correction     │
    └──────┬─────────────────┘
           │
           ▼
    ┌────────────────────────┐
    │ results{}              │
    │ - Append all outputs   │
    └──────┬─────────────────┘
           │
           ▼ (After all sentences)
    ┌────────────────────────┐
    │ printEvaluationResults │
    └────────────────────────┘

Model Management
----------------

Model Loading Strategy
~~~~~~~~~~~~~~~~~~~~~~

WiMarka uses lazy loading and caching:

1. **First Request**: Model downloaded from HuggingFace Hub
2. **Subsequent Requests**: Loaded from local cache
3. **Memory Management**: Models loaded once per session

**Cache Location**:

* Windows: ``C:\Users\\.cache\huggingface\``
* macOS/Linux: ``~/.cache/huggingface/``

Model Types
~~~~~~~~~~~

WiMarka utilizes two types of models:

1. **Transformer Models** (via ``transformers`` library)

   * Used for: Text embeddings, classification
   * Format: PyTorch checkpoints

2. **LLM Models** (via ``llama-cpp-python``)

   * Used for: Error detection, scoring, explanation, correction
   * Format: GGUF quantized models
   * Benefits: Efficient CPU inference

Device Management
~~~~~~~~~~~~~~~~~

Automatic device selection:

.. code-block:: python

    # From torch.py
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

* CUDA GPU if available
* Falls back to CPU
* No manual configuration needed

Performance Considerations
--------------------------

Optimization Strategies
~~~~~~~~~~~~~~~~~~~~~~~

1. **Model Caching**

   * Models loaded once per session
   * Inference results cached when possible
   * Reduces redundant computation

2. **Sequential Processing**

   * Sentences processed one at a time
   * Prevents memory overflow
   * Ensures deterministic results

3. **Efficient File I/O**

   * Streaming file reading
   * UTF-8 encoding handled properly
   * Minimal memory footprint

Bottlenecks
~~~~~~~~~~~

Identified performance bottlenecks:

* **Model Download**: First-time model download can be slow
* **LLM Inference**: CPU inference slower than GPU
* **Large Files**: Processing time scales linearly with file size

Scalability
~~~~~~~~~~~

Current limitations and future improvements:

**Current**:

* Single-threaded processing
* File-based input/output
* In-memory results storage

**Future Improvements**:

* Parallel sentence processing
* Streaming API
* Database integration for large-scale evaluation

Error Handling
--------------

Error Handling Strategy
~~~~~~~~~~~~~~~~~~~~~~~

WiMarka implements defensive programming:

1. **Input Validation**

   * File existence checks
   * Line count validation
   * Language code verification

2. **Graceful Degradation**

   * Model loading failures logged
   * Fallback mechanisms where possible

3. **Informative Errors**

   * Clear error messages
   * Actionable suggestions

Exception Hierarchy
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Common exceptions
    FileNotFoundError  # Input files missing
    ValueError         # Invalid arguments, mismatched line counts
    RuntimeError       # Model loading failures

Logging
-------

Logging Architecture
~~~~~~~~~~~~~~~~~~~~

Structured logging at multiple levels:

.. code-block:: python

    # From logger.py
    logger.info("Starting evaluation...")     # Progress
    logger.warning("Model cache miss")        # Warnings
    logger.error("Failed to load model")      # Errors
    logger.debug("Intermediate result: ...")  # Debugging

**Log Levels**:

* ``INFO``: Progress and status updates
* ``WARNING``: Non-critical issues
* ``ERROR``: Failures and exceptions
* ``DEBUG``: Detailed debugging information (disabled by default)

Configuration Management
------------------------

Configuration Strategy
~~~~~~~~~~~~~~~~~~~~~~

WiMarka uses ``config.py`` for centralized configuration:

* Model paths and identifiers
* Hyperparameters
* Default settings
* API endpoints

**Design Principle**: Configuration separate from code allows easy customization without modifying source.

Extensibility Points
--------------------

WiMarka is designed for extension:

1. **New Languages**

   * Add language codes to ``config.py``
   * Update helper functions for new tags
   * Train/add language-specific models

2. **New Tasks**

   * Create new module in ``tasks/``
   * Integrate into ``main.py`` pipeline
   * Update results dictionary structure

3. **New Models**

   * Add model identifiers to ``config.py``
   * Update ``model.py`` loading logic
   * Ensure compatibility with existing interfaces

4. **Alternative Interfaces**

   * Web API wrapper
   * GUI application
   * Integration with other tools

See :doc:`extending` for detailed guides on extending WiMarka.

Design Patterns
---------------

Patterns Used in WiMarka
~~~~~~~~~~~~~~~~~~~~~~~~

1. **Pipeline Pattern**

   * Sequential task execution
   * Each stage processes and passes data
   * Clear separation of concerns

2. **Lazy Initialization**

   * Models loaded on first use
   * Reduces startup time
   * Efficient resource usage

3. **Facade Pattern**

   * ``wmk_eval()`` provides simple interface
   * Complex pipeline hidden from users
   * Easy to use, hard to misuse

4. **Singleton Pattern**

   * Global results dictionary
   * Logger instance
   * Model cache

Trade-offs
~~~~~~~~~~

**Simplicity vs. Flexibility**:

* Current: Simple API, less configuration
* Trade-off: Limited customization options

**Speed vs. Accuracy**:

* Current: CPU inference for accessibility
* Trade-off: Slower than GPU-optimized solutions

**Memory vs. Speed**:

* Current: Sequential processing
* Trade-off: Slower but memory-efficient

Future Architecture Improvements
---------------------------------

Planned Enhancements
~~~~~~~~~~~~~~~~~~~~

1. **Asynchronous Processing**

   * Non-blocking I/O
   * Parallel sentence evaluation
   * Progress callbacks

2. **Streaming API**

   * Process large files efficiently
   * Real-time results
   * Lower memory usage

3. **Plugin System**

   * Third-party task modules
   * Custom scoring algorithms
   * Community extensions

4. **Distributed Evaluation**

   * Multi-machine processing
   * Cloud deployment
   * Horizontal scaling

References
----------

* HuggingFace Transformers: https://huggingface.co/docs/transformers/
* llama.cpp: https://github.com/ggerganov/llama.cpp
* Click CLI Framework: https://click.palletsprojects.com/

Next Steps
----------

* See :doc:`api_reference` for detailed API documentation
* See :doc:`tasks` for task module internals
* See :doc:`extending` for customization guides