Architecture

This document describes WiMarka’s system architecture, design decisions, and data flow.

System Overview

WiMarka is a modular machine translation evaluation system. A four-stage pipeline processes each source-target sentence pair and produces a quality assessment.

High-Level Architecture

┌──────────────────────────────────────────────────────────┐
│                     User Interface                        │
├───────────────────┬──────────────────────────────────────┤
│  Python Library   │     Command-Line Interface (CLI)     │
└────────┬──────────┴──────────────┬───────────────────────┘
         │                          │
         └──────────┬──────────────┘
                    │
               ┌────▼────┐
               │ wmk_eval│  Main Entry Point
               └────┬────┘
                    │
┌───────────────────┼───────────────────┐
│              Evaluation Pipeline       │
│  ┌──────────────────────────────────┐  │
│  │  1. Error Detection              │  │
│  └─────────────┬────────────────────┘  │
│                │                        │
│  ┌─────────────▼────────────────────┐  │
│  │  2. Scoring                      │  │
│  └─────────────┬────────────────────┘  │
│                │                        │
│  ┌─────────────▼────────────────────┐  │
│  │  3. Explanation Generation       │  │
│  └─────────────┬────────────────────┘  │
│                │                        │
│  ┌─────────────▼────────────────────┐  │
│  │  4. Correction Suggestion        │  │
│  └─────────────┬────────────────────┘  │
└────────────────┼────────────────────────┘
                 │
      ┌──────────▼──────────┐
      │  Utilities Layer     │
      ├──────────────────────┤
      │  • Model Management  │
      │  • Caching           │
      │  • Logging           │
      │  • Helper Functions  │
      └──────────┬───────────┘
                 │
      ┌──────────▼──────────┐
      │   Language Models    │
      │  • Transformer LMs   │
      │  • LLM (llama-cpp)   │
      └─────────────────────┘

Core Components

Main Module (`main.py`)

Responsibility: Orchestrates the evaluation pipeline

Key Functions:

wmk_eval(): Main entry point for evaluation
- Loads source and target files
- Manages results dictionary
- Coordinates task modules

Design Decisions:

Sequential processing keeps results in input order and makes runs easier to reproduce
Global results dictionary for easy access
File-based input for batch processing

CLI Module (`cli.py`)

Responsibility: Command-line interface

Features:

Argument parsing with Click
Input validation
Error handling and user feedback

Integration: Wraps wmk_eval() with CLI argument handling

Task Modules

Four independent task modules implement the evaluation pipeline:

error_detection.py: Identifies translation errors
scoring.py: Calculates quality metrics
explanation.py: Generates explanations
correction.py: Suggests corrections

See Task Modules for detailed documentation.

Utility Modules

Support modules provide shared functionality:

helper.py: File I/O, language tag management
logger.py: Logging configuration
model.py: Model loading and management
cache.py: Response caching
torch.py: PyTorch device management

See Utility Modules for detailed documentation.

Evaluation Pipeline

Stage 1: Error Detection

Input: Tagged source and target sentences

src_line = "[EN] Good morning!"
tgt_line = "[CEB] Maayong gabii!"
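
The tagged form shown above can be produced by a small helper. The following is a minimal sketch only: `add_language_tag` and its validation rule are hypothetical names chosen for illustration, not the actual API of helper.py.

```python
def add_language_tag(line: str, lang_code: str) -> str:
    """Prefix a sentence with a bracketed, upper-case language tag."""
    code = lang_code.strip().upper()
    if not code.isalpha():
        # Assumption: language codes are purely alphabetic (e.g. EN, CEB).
        raise ValueError(f"Invalid language code: {lang_code!r}")
    return f"[{code}] {line.strip()}"


src_line = add_language_tag("Good morning!", "en")
tgt_line = add_language_tag("Maayong gabii!", "ceb")
print(src_line)  # [EN] Good morning!
print(tgt_line)  # [CEB] Maayong gabii!
```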

Process:

Feeds sentences to error detection model
Analyzes syntactic and semantic differences
Identifies specific error types

Output: List of detected errors

errors = ['Semantic mismatch: time of day']

Implementation Details:

Uses LLM-based analysis
Prompt engineering for error identification
Language-specific error patterns

Stage 2: Scoring

Input: Source sentence, target sentence, detected errors

Process:

Evaluates fluency (grammatical correctness)
Evaluates adequacy (meaning preservation)
Calculates overall score

Output: Three numerical scores (0-100)

fluency_score = 95
adequacy_score = 40
overall_score = 67.5  # Average of fluency and adequacy
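
The example scores are consistent with an unweighted mean of fluency and adequacy. A minimal sketch of that calculation follows, assuming equal weights and a 0-100 range check; the function name is hypothetical and the real scoring.py may combine the scores differently.

```python
def overall_score(fluency: float, adequacy: float) -> float:
    """Return the unweighted mean of the two component scores."""
    for name, value in (("fluency", fluency), ("adequacy", adequacy)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be within 0-100, got {value}")
    return (fluency + adequacy) / 2


print(overall_score(95, 40))  # 67.5
```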

Scoring Algorithm:

LLM-based scoring with structured prompts
Error count influences scores
Language-specific quality criteria

Stage 3: Explanation Generation

Input: All previous stage outputs

Process:

Analyzes detected errors
Considers score levels
Generates human-readable explanation

Output: Natural language explanation

explanation = "The translation has an incorrect time reference. 'Morning' was translated as 'gabii' (evening)."

Design:

Context-aware explanation generation
References specific errors
Educational tone for clarity

Stage 4: Correction Suggestion

Input: All previous stage outputs

Process:

Analyzes errors and explanations
Generates improved translation
Validates correction quality

Output: Suggested corrected translation

corrected_translation = "Maayong buntag!"

Approach:

Error-informed correction
Preserves correct portions
Maintains semantic equivalence

Data Flow

Detailed data flow through the system:

┌─────────────┐
│ Input Files │
└──────┬──────┘
       │
       ▼
┌─────────────────────┐
│ wmk_eval()          │
│ - Load files        │
│ - Validate counts   │
│ - Add language tags │
└──────┬──────────────┘
       │
       ▼  (For each sentence pair)
┌────────────────────────┐
│ error_detection()      │
│ Input: src, tgt        │
│ Output: errors[]       │
└──────┬─────────────────┘
       │
       ▼
┌────────────────────────┐
│ scoring()              │
│ Input: src, tgt, errors│
│ Output: 3 scores       │
└──────┬─────────────────┘
       │
       ▼
┌────────────────────────┐
│ generate_explanation() │
│ Input: all above       │
│ Output: explanation    │
└──────┬─────────────────┘
       │
       ▼
┌────────────────────────┐
│ generate_correction()  │
│ Input: all above       │
│ Output: correction     │
└──────┬─────────────────┘
       │
       ▼
┌────────────────────────┐
│ results{}              │
│ - Append all outputs   │
└──────┬─────────────────┘
       │
       ▼  (After all sentences)
┌────────────────────────┐
│ printEvaluationResults │
└────────────────────────┘
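
The flow in the diagram can be sketched as a single loop over sentence pairs. This is an illustrative outline under stated assumptions: the stage names come from the diagram, but their signatures, the placeholder bodies, the `evaluate_pairs` wrapper, and the keys of the results dictionary are all hypothetical.

```python
# Placeholder stages: each stands in for a model call in the real task modules.
def error_detection(src, tgt):
    return []


def scoring(src, tgt, errors):
    fluency, adequacy = 100.0, 100.0
    return fluency, adequacy, (fluency + adequacy) / 2


def generate_explanation(src, tgt, errors, scores):
    return "; ".join(errors) if errors else "No errors detected."


def generate_correction(src, tgt, errors, explanation):
    return tgt  # nothing to fix when no errors were found


def evaluate_pairs(src_lines, tgt_lines):
    """Run the four stages on each pair and collect the outputs."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("Source and target must have the same number of lines")
    results = {"errors": [], "scores": [], "explanations": [], "corrections": []}
    for src, tgt in zip(src_lines, tgt_lines):
        errors = error_detection(src, tgt)
        scores = scoring(src, tgt, errors)
        explanation = generate_explanation(src, tgt, errors, scores)
        correction = generate_correction(src, tgt, errors, explanation)
        results["errors"].append(errors)
        results["scores"].append(scores)
        results["explanations"].append(explanation)
        results["corrections"].append(correction)
    return results


results = evaluate_pairs(["[EN] Good morning!"], ["[CEB] Maayong buntag!"])
print(results["scores"])  # [(100.0, 100.0, 100.0)]
```

Each stage receives the outputs of the stages before it, which is why the loop body has to run in this fixed order.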

Model Management

Model Loading Strategy

WiMarka uses lazy loading and caching:

First Request: Model downloaded from HuggingFace Hub
Subsequent Requests: Loaded from local cache
Memory Management: Models loaded once per session

Cache Location:

Windows: C:\Users\<username>\.cache\huggingface\
macOS/Linux: ~/.cache/huggingface/
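
The default location can be resolved programmatically. This sketch assumes only the `HF_HOME` environment variable overrides the default; the function name is hypothetical, and the Hugging Face libraries apply additional rules that are not modeled here.

```python
import os
from pathlib import Path


def huggingface_cache_dir(env=None) -> Path:
    """Return HF_HOME if set, otherwise the default ~/.cache/huggingface."""
    env = os.environ if env is None else env
    override = env.get("HF_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "huggingface"


print(huggingface_cache_dir({}))
```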

Model Types

WiMarka utilizes two types of models:

Transformer Models (via transformers library)
- Used for: Text embeddings, classification
- Format: PyTorch checkpoints
LLM Models (via llama-cpp-python)
- Used for: Error detection, scoring, explanation, correction
- Format: GGUF quantized models
- Benefits: Efficient CPU inference

Device Management

Automatic device selection:

# From torch.py
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

CUDA GPU if available
Falls back to CPU
No manual configuration needed

Performance Considerations

Optimization Strategies

Model Caching
- Models loaded once per session
- Inference results cached when possible
- Reduces redundant computation
Sequential Processing
- Sentences processed one at a time
- Prevents memory overflow
- Keeps results in input order
Efficient File I/O
- Streaming file reading
- UTF-8 encoding handled properly
- Minimal memory footprint
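
Caching inference results can be sketched as a dictionary keyed by a hash of the task name and its inputs. The `ResponseCache` class below is a hypothetical illustration of the idea, not the interface of cache.py.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by a hash of the task name and its inputs."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(task: str, src: str, tgt: str) -> str:
        payload = json.dumps([task, src, tgt], ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, task, src, tgt, compute):
        key = self._key(task, src, tgt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute(src, tgt)  # the expensive model call
        self._store[key] = value
        return value


def detect(src, tgt):
    return ["Semantic mismatch: time of day"]  # stand-in for an LLM call


cache = ResponseCache()
cache.get_or_compute("error_detection", "[EN] Good morning!", "[CEB] Maayong gabii!", detect)
cache.get_or_compute("error_detection", "[EN] Good morning!", "[CEB] Maayong gabii!", detect)
print(cache.hits, cache.misses)  # 1 1
```

Including the task name in the key keeps outputs of different stages from colliding when they share the same sentence pair.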

Bottlenecks

Identified performance bottlenecks:

Model Download: First-time model download can be slow
LLM Inference: CPU inference slower than GPU
Large Files: Processing time scales linearly with file size

Scalability

Current limitations and future improvements:

Current:

Single-threaded processing
File-based input/output
In-memory results storage

Future Improvements:

Parallel sentence processing
Streaming API
Database integration for large-scale evaluation
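
As one possible direction for parallel sentence processing, the sketch below uses a thread pool from the standard library. It is not part of WiMarka today: `evaluate_pair` is a placeholder for the four-stage pipeline, and `evaluate_parallel` and `max_workers` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_pair(pair):
    """Placeholder for the four-stage pipeline applied to one sentence pair."""
    src, tgt = pair
    return {"src": src, "tgt": tgt}


def evaluate_parallel(src_lines, tgt_lines, max_workers=4):
    """Evaluate sentence pairs concurrently while preserving input order."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("Source and target must have the same number of lines")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of its inputs,
        # so results stay aligned with the input files.
        return list(pool.map(evaluate_pair, zip(src_lines, tgt_lines)))


results = evaluate_parallel(["one", "two", "three"], ["usa", "duha", "tulo"])
print([r["tgt"] for r in results])  # ['usa', 'duha', 'tulo']
```

Threads only help when the model call releases the interpreter lock or waits on I/O; otherwise a process pool or batched inference may be a better fit.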

Error Handling

Error Handling Strategy

WiMarka follows a defensive programming approach:

Input Validation
- File existence checks
- Line count validation
- Language code verification
Graceful Degradation
- Model loading failures logged
- Fallback mechanisms where possible
Informative Errors
- Clear error messages
- Actionable suggestions

Exception Hierarchy

# Common exceptions
FileNotFoundError  # Input files missing
ValueError         # Invalid arguments, mismatched line counts
RuntimeError       # Model loading failures
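
The input checks map onto these exceptions roughly as follows. This is a sketch under assumptions: `load_sentence_pairs` is a hypothetical helper, and the exact messages and file handling in WiMarka may differ.

```python
import tempfile
from pathlib import Path


def load_sentence_pairs(src_path, tgt_path):
    """Read two parallel files and return a list of (source, target) pairs."""
    for path in (src_path, tgt_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Input file not found: {path}")
    with open(src_path, encoding="utf-8") as f:
        src_lines = [line.rstrip("\n") for line in f]
    with open(tgt_path, encoding="utf-8") as f:
        tgt_lines = [line.rstrip("\n") for line in f]
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"Line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )
    return list(zip(src_lines, tgt_lines))


# Usage with temporary files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.txt"
    tgt = Path(tmp) / "target.txt"
    src.write_text("Good morning!\n", encoding="utf-8")
    tgt.write_text("Maayong buntag!\n", encoding="utf-8")
    pairs = load_sentence_pairs(src, tgt)

print(pairs)  # [('Good morning!', 'Maayong buntag!')]
```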

Logging

Logging Architecture

Log messages are emitted at multiple severity levels:

# From logger.py
logger.info("Starting evaluation...")      # Progress
logger.warning("Model cache miss")         # Warnings
logger.error("Failed to load model")       # Errors
logger.debug("Intermediate result: ...")   # Debugging

Log Levels:

INFO: Progress and status updates
WARNING: Non-critical issues
ERROR: Failures and exceptions
DEBUG: Detailed debugging information (disabled by default)
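
A configuration matching these defaults could look like the sketch below, built on the standard `logging` module. The logger name `wimarka`, the `configure_logging` function, and the message format are assumptions, not taken from logger.py.

```python
import logging


def configure_logging(debug: bool = False) -> logging.Logger:
    """Return a logger that shows INFO and above, or DEBUG when requested."""
    logger = logging.getLogger("wimarka")  # logger name is an assumption
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = configure_logging()
logger.info("Starting evaluation...")
logger.debug("Intermediate result: ...")  # suppressed unless debug=True
```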

Configuration Management

Configuration Strategy

WiMarka uses config.py for centralized configuration:

Model paths and identifiers
Hyperparameters
Default settings
API endpoints

Design Principle: Keeping configuration in one module, separate from pipeline logic, allows customization without modifying the pipeline code.

Extensibility Points

WiMarka is designed for extension:

New Languages
- Add language codes to config.py
- Update helper functions for new tags
- Train/add language-specific models
New Tasks
- Create new module in tasks/
- Integrate into main.py pipeline
- Update results dictionary structure
New Models
- Add model identifiers to config.py
- Update model.py loading logic
- Ensure compatibility with existing interfaces
Alternative Interfaces
- Web API wrapper
- GUI application
- Integration with other tools
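
Adding a new task can be pictured as appending a callable to an ordered list of stages. The sketch below is a single-file illustration only: the `PIPELINE` list, the shared `context` dictionary, and the `length_check` task are hypothetical and do not reflect how main.py is actually organized.

```python
# Each task is a callable taking (src, tgt, context) and returning its output.
def error_detection(src, tgt, context):
    return []  # placeholder for the existing stage


def length_check(src, tgt, context):
    """Hypothetical new task: compare word counts of source and target."""
    return {"src_words": len(src.split()), "tgt_words": len(tgt.split())}


PIPELINE = [
    ("errors", error_detection),
    ("length_check", length_check),  # new task appended to the pipeline
]


def run_pipeline(src, tgt):
    context = {}
    for key, task in PIPELINE:
        # Later tasks can read earlier outputs through the shared context.
        context[key] = task(src, tgt, context)
    return context


output = run_pipeline("Good morning!", "Maayong buntag!")
print(output)  # {'errors': [], 'length_check': {'src_words': 2, 'tgt_words': 2}}
```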

See Extending WiMarka for detailed guides.

Design Patterns

Patterns Used in WiMarka

Pipeline Pattern
- Sequential task execution
- Each stage processes and passes data
- Clear separation of concerns
Lazy Initialization
- Models loaded on first use
- Reduces startup time
- Efficient resource usage
Facade Pattern
- wmk_eval() provides simple interface
- Complex pipeline hidden from users
- Easy to use, hard to misuse
Singleton Pattern
- Global results dictionary
- Logger instance
- Model cache

Trade-offs

Simplicity vs. Flexibility:

Current: Simple API, less configuration
Trade-off: Limited customization options

Speed vs. Accessibility:

Current: CPU inference is supported, so no GPU is required
Trade-off: Slower than GPU-optimized solutions

Memory vs. Speed:

Current: Sequential processing
Trade-off: Slower but memory-efficient

Future Architecture Improvements

Planned Enhancements

Asynchronous Processing
- Non-blocking I/O
- Parallel sentence evaluation
- Progress callbacks
Streaming API
- Process large files efficiently
- Real-time results
- Lower memory usage
Plugin System
- Third-party task modules
- Custom scoring algorithms
- Community extensions
Distributed Evaluation
- Multi-machine processing
- Cloud deployment
- Horizontal scaling

References

HuggingFace Transformers: https://huggingface.co/docs/transformers/
llama.cpp: https://github.com/ggerganov/llama.cpp
Click CLI Framework: https://click.palletsprojects.com/

Next Steps

See API Reference for detailed API documentation
See Task Modules for task module internals
See Extending WiMarka for customization guides
