Architecture
This document describes WiMarka’s system architecture, design decisions, and data flow.
System Overview
WiMarka is designed as a modular machine translation evaluation system with a four-stage pipeline that processes sentence pairs to generate quality assessments.
High-Level Architecture
┌──────────────────────────────────────────────────────────┐
│                      User Interface                      │
├───────────────────┬──────────────────────────────────────┤
│  Python Library   │     Command-Line Interface (CLI)     │
└─────────┬─────────┴───────────────────┬──────────────────┘
          │                             │
          └──────────────┬──────────────┘
                         │
                    ┌────▼─────┐
                    │ wmk_eval │  Main Entry Point
                    └────┬─────┘
                         │
     ┌───────────────────┼───────────────────┐
     │          Evaluation Pipeline          │
     │  ┌─────────────────────────────────┐  │
     │  │ 1. Error Detection              │  │
     │  └────────────────┬────────────────┘  │
     │                   │                   │
     │  ┌────────────────▼────────────────┐  │
     │  │ 2. Scoring                      │  │
     │  └────────────────┬────────────────┘  │
     │                   │                   │
     │  ┌────────────────▼────────────────┐  │
     │  │ 3. Explanation Generation       │  │
     │  └────────────────┬────────────────┘  │
     │                   │                   │
     │  ┌────────────────▼────────────────┐  │
     │  │ 4. Correction Suggestion        │  │
     │  └────────────────┬────────────────┘  │
     └───────────────────┼───────────────────┘
                         │
              ┌──────────▼──────────┐
              │   Utilities Layer   │
              ├─────────────────────┤
              │ • Model Management  │
              │ • Caching           │
              │ • Logging           │
              │ • Helper Functions  │
              └──────────┬──────────┘
                         │
              ┌──────────▼──────────┐
              │   Language Models   │
              │ • Transformer LMs   │
              │ • LLM (llama-cpp)   │
              └─────────────────────┘
Core Components
Main Module (main.py)
Responsibility: Orchestrates the evaluation pipeline
Key Functions:
wmk_eval(): Main entry point for evaluation
Loads source and target files
Manages results dictionary
Coordinates task modules
Design Decisions:
Sequential processing ensures deterministic results
Global results dictionary for easy access
File-based input for batch processing
CLI Module (cli.py)
Responsibility: Command-line interface
Features:
Argument parsing with Click
Input validation
Error handling and user feedback
Integration: Wraps wmk_eval() with CLI argument handling
Task Modules
Four independent task modules implement the evaluation pipeline:
error_detection.py: Identifies translation errors
scoring.py: Calculates quality metrics
explanation.py: Generates explanations
correction.py: Suggests corrections
See Task Modules for detailed documentation.
Utility Modules
Support modules provide shared functionality:
helper.py: File I/O, language tag management
logger.py: Logging configuration
model.py: Model loading and management
cache.py: Response caching
torch.py: PyTorch device management
See Utility Modules for detailed documentation.
Evaluation Pipeline
Stage 1: Error Detection
Input: Tagged source and target sentences
src_line = "[EN] Good morning!"
tgt_line = "[CEB] Maayong gabii!"
Process:
Feeds sentences to error detection model
Analyzes syntactic and semantic differences
Identifies specific error types
Output: List of detected errors
errors = ['Semantic mismatch: time of day']
Implementation Details:
Uses LLM-based analysis
Prompt engineering for error identification
Language-specific error patterns
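The tagged input format shown for Stage 1 could be produced by a small helper along the following lines. This is only a sketch: the name `add_language_tag` and the uppercase-bracket convention are assumptions inferred from the example input, not the actual contents of helper.py.

```python
def add_language_tag(line: str, lang_code: str) -> str:
    """Prefix a sentence with a bracketed language tag, e.g. "[EN] ..."."""
    # Hypothetical helper: the real tagging logic lives in helper.py.
    code = lang_code.strip().upper()
    if not code:
        raise ValueError("language code must not be empty")
    return f"[{code}] {line.strip()}"


src_line = add_language_tag("Good morning!", "en")
tgt_line = add_language_tag("Maayong gabii!", "ceb")
print(src_line)  # [EN] Good morning!
print(tgt_line)  # [CEB] Maayong gabii!
```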
Stage 2: Scoring
Input: Source sentence, target sentence, detected errors
Process:
Evaluates fluency (grammatical correctness)
Evaluates adequacy (meaning preservation)
Calculates overall score
Output: Three numerical scores (0-100)
fluency_score = 95
adequacy_score = 40
overall_score = 67.5 # Average
Scoring Algorithm:
LLM-based scoring with structured prompts
Error count influences scores
Language-specific quality criteria
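The `# Average` comment in the example output suggests that the overall score is the unweighted mean of fluency and adequacy. Under that assumption, the combination step might look like the sketch below; the function name and the range check are illustrative and not taken from scoring.py.

```python
def overall_score(fluency: float, adequacy: float) -> float:
    """Combine two 0-100 scores into one by taking their unweighted mean."""
    for name, value in (("fluency", fluency), ("adequacy", adequacy)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be in [0, 100], got {value}")
    return (fluency + adequacy) / 2


print(overall_score(95, 40))  # 67.5
```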
Stage 3: Explanation Generation
Input: All previous stage outputs
Process:
Analyzes detected errors
Considers score levels
Generates human-readable explanation
Output: Natural language explanation
explanation = "The translation has incorrect time reference. 'Morning' was translated as 'gabii' (evening)."
Design:
Context-aware explanation generation
References specific errors
Educational tone for clarity
Stage 4: Correction Suggestion
Input: All previous stage outputs
Process:
Analyzes errors and explanations
Generates improved translation
Validates correction quality
Output: Suggested corrected translation
corrected_translation = "Maayong buntag!"
Approach:
Error-informed correction
Preserves correct portions
Maintains semantic equivalence
Data Flow
Detailed data flow through the system:
┌─────────────┐
│ Input Files │
└──────┬──────┘
       │
       ▼
┌─────────────────────┐
│ wmk_eval()          │
│ - Load files        │
│ - Validate counts   │
│ - Add language tags │
└──────┬──────────────┘
       │
       ▼ (For each sentence pair)
┌────────────────────────┐
│ error_detection()      │
│ Input: src, tgt        │
│ Output: errors[]       │
└──────┬─────────────────┘
       │
       ▼
┌────────────────────────┐
│ scoring()              │
│ Input: src, tgt, errors│
│ Output: 3 scores       │
└──────┬─────────────────┘
       │
       ▼
┌────────────────────────┐
│ generate_explanation() │
│ Input: all above       │
│ Output: explanation    │
└──────┬─────────────────┘
       │
       ▼
┌────────────────────────┐
│ generate_correction()  │
│ Input: all above       │
│ Output: correction     │
└──────┬─────────────────┘
       │
       ▼
┌────────────────────────┐
│ results{}              │
│ - Append all outputs   │
└──────┬─────────────────┘
       │
       ▼ (After all sentences)
┌────────────────────────┐
│ printEvaluationResults │
└────────────────────────┘
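The flow in the diagram can be summarised as a loop that threads each stage's output into the next and appends everything to a results dictionary. The sketch below uses the stage names from the diagram, but the stage bodies are trivial stubs (the real stages call language models), and the signatures and dictionary keys are assumptions made for illustration.

```python
# Stub stages: they only mimic the shape of the data passed between stages.
def error_detection(src, tgt):
    return ["Empty translation"] if not tgt.strip() else []


def scoring(src, tgt, errors):
    fluency = adequacy = 0.0 if errors else 100.0
    return fluency, adequacy, (fluency + adequacy) / 2


def generate_explanation(src, tgt, errors, scores):
    return "No errors found." if not errors else "; ".join(errors)


def generate_correction(src, tgt, errors, explanation):
    return tgt  # a real stage would propose an improved translation


def run_pipeline(pairs):
    """Process sentence pairs one at a time, in input order."""
    # Hypothetical keys; the real results dictionary may be laid out differently.
    results = {"errors": [], "scores": [], "explanations": [], "corrections": []}
    for src, tgt in pairs:
        errors = error_detection(src, tgt)
        scores = scoring(src, tgt, errors)
        explanation = generate_explanation(src, tgt, errors, scores)
        correction = generate_correction(src, tgt, errors, explanation)
        results["errors"].append(errors)
        results["scores"].append(scores)
        results["explanations"].append(explanation)
        results["corrections"].append(correction)
    return results


results = run_pipeline([("[EN] Good morning!", "[CEB] Maayong buntag!")])
print(results["scores"])  # [(100.0, 100.0, 100.0)]
```

Because every list in the dictionary grows by exactly one entry per sentence pair, index i in each list always refers to the same input pair.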
Model Management
Model Loading Strategy
WiMarka uses lazy loading and caching:
First Request: Model downloaded from HuggingFace Hub
Subsequent Requests: Loaded from local cache
Memory Management: Models loaded once per session
Cache Location:
Windows: C:\Users\<username>\.cache\huggingface\
macOS/Linux: ~/.cache/huggingface/
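To locate the cache programmatically, a helper like the one below may be useful. It is a sketch rather than WiMarka code: the function name is made up, and it assumes the standard Hugging Face behaviour in which the `HF_HOME` environment variable, when set, overrides the default location.

```python
import os
from pathlib import Path


def huggingface_cache_dir() -> Path:
    """Return the Hugging Face cache root, honouring HF_HOME if it is set."""
    override = os.environ.get("HF_HOME")
    if override:
        return Path(override).expanduser()
    # Path.home() is the user profile folder on Windows and ~ on macOS/Linux.
    return Path.home() / ".cache" / "huggingface"


print(huggingface_cache_dir())
```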
Model Types
WiMarka utilizes two types of models:
Transformer Models (via transformers library)
Used for: Text embeddings, classification
Format: PyTorch checkpoints
LLM Models (via llama-cpp-python)
Used for: Error detection, scoring, explanation, correction
Format: GGUF quantized models
Benefits: Efficient CPU inference
Device Management
Automatic device selection:
# From torch.py
import torch

# Prefer a CUDA GPU when one is visible to PyTorch; otherwise use the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
CUDA GPU if available
Falls back to CPU
No manual configuration needed
Performance Considerations
Optimization Strategies
Model Caching
Models loaded once per session
Inference results cached when possible
Reduces redundant computation
Sequential Processing
Sentences processed one at a time
Prevents memory overflow
Ensures deterministic results
Efficient File I/O
Streaming file reading
UTF-8 encoding handled properly
Minimal memory footprint
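Caching inference results, as listed under Model Caching, usually means keying each response by the model and the exact prompt. The class below is a minimal in-memory sketch of that idea; `ResponseCache`, its methods, and the hashing scheme are assumptions and do not describe what cache.py actually does.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by a hash of the model identifier and prompt."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model_id: str, prompt: str) -> str:
        # JSON-encode the pair so ("ab", "c") and ("a", "bc") cannot collide.
        payload = json.dumps([model_id, prompt], ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, model_id, prompt, compute):
        """Return the cached response, calling compute(prompt) only on a miss."""
        key = self._key(model_id, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute(prompt)
        self._store[key] = value
        return value


cache = ResponseCache()
fake_llm = str.upper  # stand-in for an expensive model call
cache.get_or_compute("demo-model", "good morning", fake_llm)
cache.get_or_compute("demo-model", "good morning", fake_llm)
print(cache.hits, cache.misses)  # 1 1
```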
Bottlenecks
Identified performance bottlenecks:
Model Download: First-time model download can be slow
LLM Inference: CPU inference slower than GPU
Large Files: Processing time scales linearly with file size
Scalability
Current limitations and future improvements:
Current:
Single-threaded processing
File-based input/output
In-memory results storage
Future Improvements:
Parallel sentence processing
Streaming API
Database integration for large-scale evaluation
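As a rough illustration of the parallel sentence processing listed under Future Improvements, the sketch below fans sentence pairs out to a thread pool while keeping results in input order. It is not part of WiMarka: `evaluate_pair` is a placeholder for the four-stage pipeline, and whether threads help in practice depends on the inference backend being thread-safe, which is assumed here.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_pair(pair):
    """Placeholder for running the full pipeline on one sentence pair."""
    src, tgt = pair
    return {"src": src, "tgt": tgt, "target_length": len(tgt)}


def evaluate_parallel(pairs, max_workers=4):
    # Executor.map returns results in the order of the inputs, so the output
    # stays aligned with the input files even if tasks finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate_pair, pairs))


pairs = [("[EN] Good morning!", "[CEB] Maayong buntag!"), ("[EN] Yes", "[CEB] Oo")]
for result in evaluate_parallel(pairs):
    print(result["src"], "->", result["tgt"])
```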
Error Handling
Error Handling Strategy
WiMarka implements defensive programming:
Input Validation
File existence checks
Line count validation
Language code verification
Graceful Degradation
Model loading failures logged
Fallback mechanisms where possible
Informative Errors
Clear error messages
Actionable suggestions
Exception Hierarchy
# Common exceptions
FileNotFoundError # Input files missing
ValueError # Invalid arguments, mismatched line counts
RuntimeError # Model loading failures
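The input checks described above (file existence and line-count validation) map naturally onto the first two exceptions. The following sketch shows one way to write them; `load_sentence_pairs` is a hypothetical name and the error messages are illustrative.

```python
from pathlib import Path


def load_sentence_pairs(src_path, tgt_path):
    """Read two parallel UTF-8 files and return a list of (source, target) pairs."""
    columns = []
    for path in (Path(src_path), Path(tgt_path)):
        if not path.is_file():
            raise FileNotFoundError(f"Input file not found: {path}")
        with path.open(encoding="utf-8") as handle:
            columns.append([line.rstrip("\n") for line in handle])
    src_lines, tgt_lines = columns
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"Line count mismatch: {len(src_lines)} source vs "
            f"{len(tgt_lines)} target lines"
        )
    return list(zip(src_lines, tgt_lines))
```

Failing before any model is loaded keeps the error message close to its cause and avoids paying the model start-up cost for an input that cannot be evaluated.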
Logging
Logging Architecture
Structured logging at multiple levels:
# From logger.py
logger.info("Starting evaluation...") # Progress
logger.warning("Model cache miss") # Warnings
logger.error("Failed to load model") # Errors
logger.debug("Intermediate result: ...") # Debugging
Log Levels:
INFO: Progress and status updates
WARNING: Non-critical issues
ERROR: Failures and exceptions
DEBUG: Detailed debugging information (disabled by default)
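A logger with these levels, and with DEBUG output off unless requested, can be set up with the standard `logging` module roughly as follows. The function name, the logger name "wimarka", and the message format are assumptions; logger.py may configure things differently.

```python
import logging


def configure_logger(name="wimarka", debug=False):
    """Return a logger that emits INFO and above, or DEBUG when requested."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = configure_logger()
logger.info("Starting evaluation...")
logger.debug("Intermediate result: ...")  # suppressed at the default level
```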
Configuration Management
Configuration Strategy
WiMarka uses config.py for centralized configuration:
Model paths and identifiers
Hyperparameters
Default settings
API endpoints
Design Principle: Configuration separate from code allows easy customization without modifying source.
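One common way to keep such settings in a single place is an immutable dataclass with defaults that callers can override. The sketch below illustrates the principle only: every field name and default value is a hypothetical placeholder, not a setting read from config.py.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # All names and values here are illustrative placeholders.
    llm_model_id: str = "example-org/example-model-gguf"
    supported_languages: tuple = ("EN", "CEB")
    temperature: float = 0.0
    max_tokens: int = 512


DEFAULT_CONFIG = Config()

# Customise without touching the defaults: replace() returns a new object.
custom = replace(DEFAULT_CONFIG, max_tokens=256)
print(DEFAULT_CONFIG.max_tokens, custom.max_tokens)  # 512 256
```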
Extensibility Points
WiMarka is designed for extension:
New Languages
Add language codes to config.py
Update helper functions for new tags
Train/add language-specific models
New Tasks
Create new module in tasks/
Integrate into main.py pipeline
Update results dictionary structure
New Models
Add model identifiers to config.py
Update model.py loading logic
Ensure compatibility with existing interfaces
Alternative Interfaces
Web API wrapper
GUI application
Integration with other tools
See Extending WiMarka for detailed guides on extending WiMarka.
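To make the "New Tasks" steps more concrete, the sketch below shows one possible shape for a pluggable task: a function that receives the sentence pair plus earlier outputs and returns its own result. The registry, the decorator, and the `length_check` task are all hypothetical; WiMarka as described here integrates tasks by editing main.py directly.

```python
TASKS = {}  # insertion order doubles as execution order


def register_task(name):
    """Decorator that adds a task function to the registry under a unique name."""
    def decorator(func):
        if name in TASKS:
            raise ValueError(f"task {name!r} is already registered")
        TASKS[name] = func
        return func
    return decorator


@register_task("length_check")
def length_check(src, tgt, context):
    """Hypothetical task: flag targets far longer or shorter than the source."""
    ratio = len(tgt) / max(len(src), 1)
    return {"length_ratio": round(ratio, 2), "suspicious": not 0.5 <= ratio <= 2.0}


def run_tasks(src, tgt):
    context = {}  # each task can read the outputs of the tasks before it
    for name, task in TASKS.items():
        context[name] = task(src, tgt, context)
    return context


print(run_tasks("Good morning!", "Maayong buntag!"))
```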
Design Patterns
Patterns Used in WiMarka
Pipeline Pattern
Sequential task execution
Each stage processes and passes data
Clear separation of concerns
Lazy Initialization
Models loaded on first use
Reduces startup time
Efficient resource usage
Facade Pattern
wmk_eval() provides simple interface
Complex pipeline hidden from users
Easy to use, hard to misuse
Singleton Pattern
Global results dictionary
Logger instance
Model cache
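Lazy initialization and a session-wide model cache often come down to memoizing the loader function. The sketch below shows the idea with `functools.lru_cache`; `get_model` is a hypothetical name and the dictionary it returns stands in for a real, expensive model load.

```python
from functools import lru_cache

LOAD_CALLS = []  # records actual loads so the caching effect is visible


@lru_cache(maxsize=None)
def get_model(model_id: str):
    """Load a model on first request and reuse the same object afterwards."""
    LOAD_CALLS.append(model_id)
    # Stand-in for an expensive load from disk or the network.
    return {"id": model_id, "loaded": True}


first = get_model("example-model")
second = get_model("example-model")
print(first is second)  # True
print(len(LOAD_CALLS))  # 1
```

Nothing is loaded at import time, so startup stays fast, and every caller in the session shares one instance per model identifier.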
Trade-offs
Simplicity vs. Flexibility:
Current: Simple API, less configuration
Trade-off: Limited customization options
Speed vs. Accessibility:
Current: CPU inference for accessibility
Trade-off: Slower than GPU-optimized solutions
Memory vs. Speed:
Current: Sequential processing
Trade-off: Slower but memory-efficient
Future Architecture Improvements
Planned Enhancements
Asynchronous Processing
Non-blocking I/O
Parallel sentence evaluation
Progress callbacks
Streaming API
Process large files efficiently
Real-time results
Lower memory usage
Plugin System
Third-party task modules
Custom scoring algorithms
Community extensions
Distributed Evaluation
Multi-machine processing
Cloud deployment
Horizontal scaling
References
HuggingFace Transformers: https://huggingface.co/docs/transformers/
llama.cpp: https://github.com/ggerganov/llama.cpp
Click CLI Framework: https://click.palletsprojects.com/
Next Steps
See API Reference for detailed API documentation
See Task Modules for task module internals
See Extending WiMarka for customization guides