Architecture
============

This document describes WiMarka's system architecture, design decisions, and data flow.

System Overview
---------------

WiMarka is a modular machine translation evaluation system. A four-stage pipeline processes each source/target sentence pair and produces a quality assessment.

High-Level Architecture
-----------------------

.. code-block:: text

    ┌──────────────────────────────────────────────────────────┐
    │                     User Interface                        │
    ├───────────────────┬──────────────────────────────────────┤
    │  Python Library   │     Command-Line Interface (CLI)     │
    └────────┬──────────┴──────────────┬───────────────────────┘
             │                          │
             └──────────┬──────────────┘
                        │
                   ┌────▼────┐
                   │ wmk_eval│  Main Entry Point
                   └────┬────┘
                        │
    ┌───────────────────┼───────────────────┐
    │              Evaluation Pipeline       │
    │  ┌──────────────────────────────────┐  │
    │  │  1. Error Detection              │  │
    │  └─────────────┬────────────────────┘  │
    │                │                        │
    │  ┌─────────────▼────────────────────┐  │
    │  │  2. Scoring                      │  │
    │  └─────────────┬────────────────────┘  │
    │                │                        │
    │  ┌─────────────▼────────────────────┐  │
    │  │  3. Explanation Generation       │  │
    │  └─────────────┬────────────────────┘  │
    │                │                        │
    │  ┌─────────────▼────────────────────┐  │
    │  │  4. Correction Suggestion        │  │
    │  └─────────────┬────────────────────┘  │
    └────────────────┼────────────────────────┘
                     │
          ┌──────────▼──────────┐
          │  Utilities Layer     │
          ├──────────────────────┤
          │  • Model Management  │
          │  • Caching           │
          │  • Logging           │
          │  • Helper Functions  │
          └──────────┬───────────┘
                     │
          ┌──────────▼──────────┐
          │   Language Models    │
          │  • Transformer LMs   │
          │  • LLM (llama-cpp)   │
          └─────────────────────┘

Core Components
---------------

Main Module (``main.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Responsibility**: Orchestrates the evaluation pipeline

**Key Functions**:

* ``wmk_eval()``: Main entry point for evaluation. It loads the source and target files, manages the results dictionary, and coordinates the task modules.

**Design Decisions**:

* Sequential processing keeps results in input order and makes runs easier to reproduce
* Global ``results`` dictionary for easy access
* File-based input for batch processing

CLI Module (``cli.py``)
~~~~~~~~~~~~~~~~~~~~~~~

**Responsibility**: Command-line interface

**Features**:

* Argument parsing with Click
* Input validation
* Error handling and user feedback

**Integration**: Wraps ``wmk_eval()`` with CLI argument handling

Task Modules
~~~~~~~~~~~~

Four task modules, one per pipeline stage, implement the evaluation pipeline:

1. **error_detection.py**: Identifies translation errors
2. **scoring.py**: Calculates quality metrics
3. **explanation.py**: Generates explanations
4. **correction.py**: Suggests corrections

See :doc:`tasks` for detailed documentation.

Utility Modules
~~~~~~~~~~~~~~~

Support modules provide shared functionality:

* **helper.py**: File I/O, language tag management
* **logger.py**: Logging configuration
* **model.py**: Model loading and management
* **cache.py**: Response caching
* **torch.py**: PyTorch device management

See :doc:`utils` for detailed documentation.
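
As an illustration of the response caching that ``cache.py`` provides, the following is a minimal sketch of an in-memory cache keyed by a hash of the model name and prompt. The ``ResponseCache`` class, its method names, and the ``fake_llm`` stand-in are assumptions made for this example and may not match the actual module.

.. code-block:: python

   import hashlib
   import json


   class ResponseCache:
       """In-memory cache keyed by a hash of the model name and prompt."""

       def __init__(self):
           self._store = {}
           self.hits = 0
           self.misses = 0

       @staticmethod
       def _key(model_name, prompt):
           # Serialize both parts so that different (model, prompt) pairs
           # cannot collide by simple string concatenation.
           payload = json.dumps([model_name, prompt], ensure_ascii=False)
           return hashlib.sha256(payload.encode("utf-8")).hexdigest()

       def get_or_compute(self, model_name, prompt, compute):
           """Return the cached response, calling ``compute`` only on a miss."""
           key = self._key(model_name, prompt)
           if key in self._store:
               self.hits += 1
               return self._store[key]
           self.misses += 1
           response = compute(prompt)
           self._store[key] = response
           return response


   def fake_llm(prompt):
       # Stand-in for a real model call (hypothetical).
       return prompt.upper()


   cache = ResponseCache()
   first = cache.get_or_compute("demo-model", "hello", fake_llm)
   second = cache.get_or_compute("demo-model", "hello", fake_llm)

The second call is served from the cache, so the stand-in model runs only once.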

Evaluation Pipeline
-------------------

Stage 1: Error Detection
~~~~~~~~~~~~~~~~~~~~~~~~~

**Input**: Tagged source and target sentences

.. code-block:: python

   src_line = "[EN] Good morning!"
   tgt_line = "[CEB] Maayong gabii!"

**Process**:

1. Feeds sentences to error detection model
2. Analyzes syntactic and semantic differences
3. Identifies specific error types

**Output**: List of detected errors

.. code-block:: python

   errors = ['Semantic mismatch: time of day']

**Implementation Details**:

* Uses LLM-based analysis
* Prompt engineering for error identification
* Language-specific error patterns
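
The prompt-based approach described above could look roughly like the following sketch. The prompt wording and the helper names ``build_error_prompt`` and ``parse_errors`` are hypothetical; the actual prompts and response format used by ``error_detection.py`` are not shown in this document.

.. code-block:: python

   def build_error_prompt(src_line, tgt_line):
       """Build a plain-text prompt asking the model to list errors (illustrative wording)."""
       return (
           "List the translation errors in the target sentence, one per line.\n"
           "Write NONE if the translation is correct.\n"
           f"Source: {src_line}\n"
           f"Target: {tgt_line}\n"
           "Errors:"
       )


   def parse_errors(response):
       """Turn a line-per-error model response into a list of error strings."""
       errors = []
       for line in response.splitlines():
           # Drop common bullet characters and surrounding whitespace.
           item = line.strip().lstrip("-*• ").strip()
           if item and item.upper() != "NONE":
               errors.append(item)
       return errors


   prompt = build_error_prompt("[EN] Good morning!", "[CEB] Maayong gabii!")
   errors = parse_errors("- Semantic mismatch: time of day\n")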

Stage 2: Scoring
~~~~~~~~~~~~~~~~

**Input**: Source sentence, target sentence, detected errors

**Process**:

1. Evaluates fluency (grammatical correctness)
2. Evaluates adequacy (meaning preservation)
3. Calculates overall score

**Output**: Three numerical scores (0-100)

.. code-block:: python

   fluency_score = 95
   adequacy_score = 40
   overall_score = 67.5  # Average

**Scoring Algorithm**:

* LLM-based scoring with structured prompts
* Error count influences scores
* Language-specific quality criteria
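
The example output above takes the overall score as the average of fluency and adequacy. A minimal sketch of that final step follows, assuming the two component scores have already been produced by the LLM. The helper names and the clamping to the 0-100 range are assumptions for illustration, not confirmed behavior of ``scoring.py``.

.. code-block:: python

   def clamp_score(value):
       """Keep a score inside the documented 0-100 range."""
       return max(0.0, min(100.0, float(value)))


   def overall_score(fluency, adequacy):
       """Average the two component scores after clamping them."""
       return (clamp_score(fluency) + clamp_score(adequacy)) / 2


   scores = {
       "fluency": 95,
       "adequacy": 40,
       "overall": overall_score(95, 40),
   }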

Stage 3: Explanation Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Input**: All previous stage outputs

**Process**:

1. Analyzes detected errors
2. Considers score levels
3. Generates human-readable explanation

**Output**: Natural language explanation

.. code-block:: python

   explanation = "The translation has an incorrect time reference. 'Morning' was translated as 'gabii' (evening)."

**Design**:

* Context-aware explanation generation
* References specific errors
* Educational tone for clarity

Stage 4: Correction Suggestion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Input**: All previous stage outputs

**Process**:

1. Analyzes errors and explanations
2. Generates improved translation
3. Validates correction quality

**Output**: Suggested corrected translation

.. code-block:: python

   corrected_translation = "Maayong buntag!"

**Approach**:

* Error-informed correction
* Preserves correct portions
* Maintains semantic equivalence

Data Flow
---------

Detailed data flow through the system:

.. code-block:: text

   ┌─────────────┐
   │ Input Files │
   └──────┬──────┘
          │
          ▼
   ┌─────────────────────┐
   │ wmk_eval()          │
   │ - Load files        │
   │ - Validate counts   │
   │ - Add language tags │
   └──────┬──────────────┘
          │
          ▼  (For each sentence pair)
   ┌────────────────────────┐
   │ error_detection()      │
   │ Input: src, tgt        │
   │ Output: errors[]       │
   └──────┬─────────────────┘
          │
          ▼
   ┌────────────────────────┐
   │ scoring()              │
   │ Input: src, tgt, errors│
   │ Output: 3 scores       │
   └──────┬─────────────────┘
          │
          ▼
   ┌────────────────────────┐
   │ generate_explanation() │
   │ Input: all above       │
   │ Output: explanation    │
   └──────┬─────────────────┘
          │
          ▼
   ┌────────────────────────┐
   │ generate_correction()  │
   │ Input: all above       │
   │ Output: correction     │
   └──────┬─────────────────┘
          │
          ▼
   ┌────────────────────────┐
   │ results{}              │
   │ - Append all outputs   │
   └──────┬─────────────────┘
          │
          ▼  (After all sentences)
   ┌────────────────────────┐
   │ printEvaluationResults │
   └────────────────────────┘
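
The flow in the diagram can be sketched as a single loop over sentence pairs. This is a simplified illustration, not the actual ``main.py``: the ``run_pipeline`` function, the layout of the ``results`` dictionary, and the four stand-in stage functions are all assumptions.

.. code-block:: python

   def run_pipeline(src_lines, tgt_lines, detect, score, explain, correct):
       """Run the four stages on each sentence pair and collect the outputs."""
       if len(src_lines) != len(tgt_lines):
           raise ValueError(
               f"Line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
           )
       results = {"errors": [], "scores": [], "explanations": [], "corrections": []}
       for src, tgt in zip(src_lines, tgt_lines):
           # Each stage receives the outputs of the stages before it.
           errors = detect(src, tgt)
           scores = score(src, tgt, errors)
           explanation = explain(src, tgt, errors, scores)
           correction = correct(src, tgt, errors, scores, explanation)
           results["errors"].append(errors)
           results["scores"].append(scores)
           results["explanations"].append(explanation)
           results["corrections"].append(correction)
       return results


   # Stand-in stages that return fixed values instead of calling a model.
   def detect(src, tgt):
       return ["Semantic mismatch: time of day"]


   def score(src, tgt, errors):
       return {"fluency": 95, "adequacy": 40, "overall": 67.5}


   def explain(src, tgt, errors, scores):
       return f"Found {len(errors)} error(s)."


   def correct(src, tgt, errors, scores, explanation):
       return "[CEB] Maayong buntag!"


   results = run_pipeline(
       ["[EN] Good morning!"], ["[CEB] Maayong gabii!"],
       detect, score, explain, correct,
   )

Passing the stages in as arguments keeps the sketch self-contained; the real pipeline calls its task modules directly.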

Model Management
----------------

Model Loading Strategy
~~~~~~~~~~~~~~~~~~~~~~

WiMarka uses lazy loading and caching:

1. **First Request**: The model is downloaded from the Hugging Face Hub
2. **Subsequent Requests**: The model is loaded from the local cache
3. **Memory Management**: Each model is loaded once per session

**Cache Location**:

* Windows: ``C:\Users\<username>\.cache\huggingface\``
* macOS/Linux: ``~/.cache/huggingface/``
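
The "loaded once per session" behavior can be approximated with memoization. The sketch below uses ``functools.lru_cache`` and a placeholder loader; ``get_model``, ``load_count``, and the model identifier are hypothetical, and the real ``model.py`` may manage its cache differently. The cache directory lookup relies on the ``HF_HOME`` environment variable, which Hugging Face libraries use to override the default location.

.. code-block:: python

   import os
   from functools import lru_cache
   from pathlib import Path

   # Default Hugging Face cache directory, unless HF_HOME overrides it.
   HF_CACHE_DIR = Path(
       os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface")
   )

   load_count = 0  # counts how many times the loader body actually runs


   @lru_cache(maxsize=None)
   def get_model(model_id):
       """Load a model on first use and reuse it for the rest of the session."""
       global load_count
       load_count += 1
       # Placeholder for the real loading call (assumption).
       return {"model_id": model_id}


   first = get_model("example/model")
   second = get_model("example/model")

Both calls return the same object, and the loader body runs only once.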

Model Types
~~~~~~~~~~~

WiMarka uses two types of models:

1. **Transformer Models** (via ``transformers`` library)
   
   * Used for: Text embeddings, classification
   * Format: PyTorch checkpoints

2. **LLMs** (via ``llama-cpp-python``)
   
   * Used for: Error detection, scoring, explanation, correction
   * Format: GGUF quantized models
   * Benefits: Efficient CPU inference

Device Management
~~~~~~~~~~~~~~~~~

Automatic device selection:

.. code-block:: python

   # From torch.py
   import torch

   device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

* Uses a CUDA GPU if one is available
* Falls back to CPU otherwise
* No manual configuration needed

Performance Considerations
--------------------------

Optimization Strategies
~~~~~~~~~~~~~~~~~~~~~~~

1. **Model Caching**
   
   * Models loaded once per session
   * Inference results cached when possible
   * Reduces redundant computation

2. **Sequential Processing**
   
   * Sentences processed one at a time
   * Prevents memory overflow
   * Keeps results in input order

3. **Efficient File I/O**
   
   * Streaming file reading
   * UTF-8 encoding handled properly
   * Minimal memory footprint
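
Streaming file reading can be sketched as a generator that walks both files in lockstep, so only one sentence pair is held in memory at a time. The function name ``iter_sentence_pairs`` and the file names are assumptions for illustration; ``helper.py`` may read files differently.

.. code-block:: python

   import tempfile
   from itertools import zip_longest
   from pathlib import Path


   def iter_sentence_pairs(src_path, tgt_path):
       """Yield (source, target) pairs line by line without loading whole files."""
       with open(src_path, encoding="utf-8") as src_file, \
               open(tgt_path, encoding="utf-8") as tgt_file:
           pairs = zip_longest(src_file, tgt_file)
           for line_no, (src, tgt) in enumerate(pairs, start=1):
               # zip_longest pads the shorter file with None.
               if src is None or tgt is None:
                   raise ValueError(f"Line count mismatch at line {line_no}")
               yield src.rstrip("\n"), tgt.rstrip("\n")


   with tempfile.TemporaryDirectory() as tmp:
       src_path = Path(tmp) / "source.txt"
       tgt_path = Path(tmp) / "target.txt"
       src_path.write_text("Good morning!\nThank you.\n", encoding="utf-8")
       tgt_path.write_text("Maayong buntag!\nSalamat.\n", encoding="utf-8")
       pairs = list(iter_sentence_pairs(src_path, tgt_path))

Because the generator checks for a mismatch as it goes, an uneven pair of files is reported at the first missing line rather than after both files are read.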

Bottlenecks
~~~~~~~~~~~

Identified performance bottlenecks:

* **Model Download**: First-time model download can be slow
* **LLM Inference**: CPU inference is slower than GPU inference
* **Large Files**: Processing time scales linearly with file size

Scalability
~~~~~~~~~~~

Current limitations and future improvements:

**Current**:

* Single-threaded processing
* File-based input/output
* In-memory results storage

**Future Improvements**:

* Parallel sentence processing
* Streaming API
* Database integration for large-scale evaluation

Error Handling
--------------

Error Handling Strategy
~~~~~~~~~~~~~~~~~~~~~~~

WiMarka implements defensive programming:

1. **Input Validation**
   
   * File existence checks
   * Line count validation
   * Language code verification

2. **Graceful Degradation**
   
   * Model loading failures logged
   * Fallback mechanisms where possible

3. **Informative Errors**
   
   * Clear error messages
   * Actionable suggestions

Exception Hierarchy
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Common exceptions
   FileNotFoundError  # Input files missing
   ValueError         # Invalid arguments, mismatched line counts
   RuntimeError       # Model loading failures
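
The input validation steps and the exceptions listed above could be combined as in the following sketch. The ``validate_inputs`` function and the ``SUPPORTED_LANGUAGES`` set are hypothetical; the set contains only the two language codes that appear in this document, and the real list lives in the project's configuration.

.. code-block:: python

   from pathlib import Path

   # Assumption: only the codes used in this document's examples.
   SUPPORTED_LANGUAGES = {"EN", "CEB"}


   def count_lines(path):
       """Count lines without loading the whole file into memory."""
       with open(path, encoding="utf-8") as handle:
           return sum(1 for _ in handle)


   def validate_inputs(src_path, tgt_path, src_lang, tgt_lang):
       """Raise a descriptive exception if the inputs cannot be evaluated."""
       for path in (src_path, tgt_path):
           if not Path(path).is_file():
               raise FileNotFoundError(f"Input file not found: {path}")
       for code in (src_lang, tgt_lang):
           if code.upper() not in SUPPORTED_LANGUAGES:
               raise ValueError(
                   f"Unsupported language code {code!r}; "
                   f"expected one of {sorted(SUPPORTED_LANGUAGES)}"
               )
       src_count = count_lines(src_path)
       tgt_count = count_lines(tgt_path)
       if src_count != tgt_count:
           raise ValueError(
               f"Line count mismatch: {src_count} source vs {tgt_count} target lines"
           )

Each message names the offending file, code, or count so that the user can correct the input directly.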

Logging
-------

Logging Architecture
~~~~~~~~~~~~~~~~~~~~

Log messages are emitted at several severity levels:

.. code-block:: python

   # From logger.py
   import logging

   logger = logging.getLogger(__name__)  # assumed setup; logger.py may differ

   logger.info("Starting evaluation...")      # Progress
   logger.warning("Model cache miss")         # Warnings
   logger.error("Failed to load model")       # Errors
   logger.debug("Intermediate result: ...")   # Debugging

**Log Levels**:

* ``INFO``: Progress and status updates
* ``WARNING``: Non-critical issues
* ``ERROR``: Failures and exceptions
* ``DEBUG``: Detailed debugging information (disabled by default)
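
A configuration along these lines would keep ``DEBUG`` output off unless it is requested. This is a sketch using Python's standard ``logging`` module; the ``configure_logging`` function, the ``"wimarka"`` logger name, and the message format are assumptions rather than the actual contents of ``logger.py``.

.. code-block:: python

   import logging


   def configure_logging(debug=False):
       """Return a logger that shows INFO and above, or DEBUG when requested."""
       logger = logging.getLogger("wimarka")  # hypothetical logger name
       logger.setLevel(logging.DEBUG if debug else logging.INFO)
       # Attach the handler only once so repeated calls do not duplicate output.
       if not logger.handlers:
           handler = logging.StreamHandler()
           handler.setFormatter(
               logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
           )
           logger.addHandler(handler)
       return logger


   logger = configure_logging()
   logger.info("Starting evaluation...")     # emitted
   logger.debug("Intermediate result: ...")  # suppressed at the default level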

Configuration Management
------------------------

Configuration Strategy
~~~~~~~~~~~~~~~~~~~~~~

WiMarka uses ``config.py`` for centralized configuration:

* Model paths and identifiers
* Hyperparameters
* Default settings
* API endpoints

**Design Principle**: Keeping settings in one place, separate from the pipeline logic, allows customization without modifying the task modules.
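
One way to picture a centralized configuration is a frozen dataclass holding defaults that callers can override. Every field name and value below is a placeholder chosen for illustration; none of them is taken from the real ``config.py``.

.. code-block:: python

   from dataclasses import dataclass, replace


   @dataclass(frozen=True)
   class WiMarkaConfig:
       """Illustrative settings container; all values are placeholders."""

       llm_model_id: str = "<llm-model-id>"        # placeholder, not a real identifier
       supported_languages: tuple = ("EN", "CEB")  # codes that appear in this document
       max_tokens: int = 256                       # placeholder hyperparameter
       temperature: float = 0.0                    # placeholder hyperparameter


   DEFAULT_CONFIG = WiMarkaConfig()

   # Derive a customized copy without mutating the defaults.
   custom_config = replace(DEFAULT_CONFIG, max_tokens=512)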

Extensibility Points
--------------------

WiMarka is designed for extension:

1. **New Languages**
   
   * Add language codes to ``config.py``
   * Update helper functions for new tags
   * Train/add language-specific models

2. **New Tasks**
   
   * Create new module in ``tasks/``
   * Integrate into ``main.py`` pipeline
   * Update results dictionary structure

3. **New Models**
   
   * Add model identifiers to ``config.py``
   * Update ``model.py`` loading logic
   * Ensure compatibility with existing interfaces

4. **Alternative Interfaces**
   
   * Web API wrapper
   * GUI application
   * Integration with other tools

See :doc:`extending` for detailed guides on extending WiMarka.

Design Patterns
---------------

Patterns Used in WiMarka
~~~~~~~~~~~~~~~~~~~~~~~~

1. **Pipeline Pattern**
   
   * Sequential task execution
   * Each stage processes and passes data
   * Clear separation of concerns

2. **Lazy Initialization**
   
   * Models loaded on first use
   * Reduces startup time
   * Efficient resource usage

3. **Facade Pattern**
   
   * ``wmk_eval()`` provides simple interface
   * Complex pipeline hidden from users
   * Easy to use, hard to misuse

4. **Singleton Pattern**
   
   * Global results dictionary
   * Logger instance
   * Model cache

Trade-offs
~~~~~~~~~~

**Simplicity vs. Flexibility**:

* Current: Simple API, less configuration
* Trade-off: Limited customization options

**Accessibility vs. Speed**:

* Current: CPU inference is supported so that no GPU is required
* Trade-off: Slower than GPU-optimized solutions

**Memory vs. Speed**:

* Current: Sequential processing
* Trade-off: Slower but memory-efficient

Future Architecture Improvements
---------------------------------

Planned Enhancements
~~~~~~~~~~~~~~~~~~~~

1. **Asynchronous Processing**
   
   * Non-blocking I/O
   * Parallel sentence evaluation
   * Progress callbacks

2. **Streaming API**
   
   * Process large files efficiently
   * Real-time results
   * Lower memory usage

3. **Plugin System**
   
   * Third-party task modules
   * Custom scoring algorithms
   * Community extensions

4. **Distributed Evaluation**
   
   * Multi-machine processing
   * Cloud deployment
   * Horizontal scaling

References
----------

* Hugging Face Transformers: https://huggingface.co/docs/transformers/
* llama.cpp: https://github.com/ggerganov/llama.cpp
* Click CLI Framework: https://click.palletsprojects.com/

Next Steps
----------

* See :doc:`api_reference` for detailed API documentation
* See :doc:`tasks` for task module internals
* See :doc:`extending` for customization guides