Language Models
Information about the language models used by WiMarka for translation evaluation.
Overview
WiMarka utilizes large language models (LLMs) for:
Error detection
Quality scoring
Explanation generation
Correction suggestions
Model Architecture
LLM Backend
WiMarka uses llama-cpp-python for efficient LLM inference:
Format: GGUF quantized models
Inference: CPU-optimized (GPU optional)
Quantization: Reduces memory footprint
Performance: Fast inference with reasonable quality
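The sketch below shows how these settings might be wired up, assuming llama-cpp-python's `Llama` class is used directly. `build_llama_kwargs` and `load_model` are hypothetical helper names. The context size and thread count are illustrative defaults, not WiMarka's actual configuration.

```python
import os
from typing import Any, Dict, Optional


def build_llama_kwargs(model_path: str, use_gpu: bool = False,
                       n_ctx: int = 2048,
                       n_threads: Optional[int] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for llama_cpp.Llama (values are illustrative)."""
    return {
        "model_path": model_path,  # path to a quantized .gguf file
        "n_ctx": n_ctx,            # context window, in tokens
        # Default to all logical CPUs; os.cpu_count() may return None.
        "n_threads": n_threads or os.cpu_count() or 1,
        # 0 keeps every layer on the CPU; -1 offloads all layers to the GPU
        # (only effective if llama-cpp-python was built with GPU support).
        "n_gpu_layers": -1 if use_gpu else 0,
        "verbose": False,
    }


def load_model(model_path: str, use_gpu: bool = False):
    # Imported lazily so the helper above works without llama-cpp-python installed.
    from llama_cpp import Llama
    return Llama(**build_llama_kwargs(model_path, use_gpu=use_gpu))
```

Setting `n_gpu_layers` to 0 keeps inference entirely on the CPU, which matches the CPU-optimized default. The GPU is used only when layers are offloaded and the library was compiled with GPU support.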
Model Selection
Models are selected based on:
Language pair support
Model size vs. performance
Inference speed
Memory requirements
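One way to combine these criteria is a simple filter-then-rank step. The sketch below is hypothetical: `ModelSpec`, `select_model`, and the fields on each spec are assumptions for illustration and are not part of WiMarka.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple

LangPair = Tuple[str, str]  # (source, target) language codes


@dataclass(frozen=True)
class ModelSpec:
    name: str
    language_pairs: FrozenSet[LangPair]
    ram_gb: float          # approximate RAM needed once loaded
    tokens_per_sec: float  # measured inference speed on the target machine


def select_model(candidates: Iterable[ModelSpec], pair: LangPair,
                 available_ram_gb: float,
                 min_tokens_per_sec: float = 0.0) -> Optional[ModelSpec]:
    """Return the largest candidate that supports `pair`, fits in RAM, and is fast enough."""
    eligible = [
        m for m in candidates
        if pair in m.language_pairs                 # language pair support
        and m.ram_gb <= available_ram_gb            # memory requirements
        and m.tokens_per_sec >= min_tokens_per_sec  # inference speed
    ]
    # Use RAM footprint as a stand-in for model size, preferring bigger models
    # on the assumption that they evaluate translations more reliably.
    return max(eligible, key=lambda m: m.ram_gb, default=None)
```

For example, on a machine with 8 GB of free RAM this skips a 12 GB model and falls back to a smaller one that still supports the requested pair.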
Model Management
Download and Caching
Models are automatically downloaded on first use:
Check the local cache
If not found, download from the Hugging Face Hub
Cache the model for future use
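The check-cache-then-download flow can be sketched in plain Python. This is an illustration only: `get_model_path` and the injected `download` callable are hypothetical. In practice, `huggingface_hub.hf_hub_download` already provides this caching behavior for files hosted on the Hugging Face Hub.

```python
from pathlib import Path
from typing import Callable


def get_model_path(repo_id: str, filename: str, cache_dir: Path,
                   download: Callable[[str, str, Path], None]) -> Path:
    """Return a cached model file, downloading it only on a cache miss."""
    # 1. Check the local cache ("/" in a repo id is not valid in a directory name).
    target = cache_dir / repo_id.replace("/", "--") / filename
    if target.exists():
        return target
    # 2. Not found: download into a temporary ".part" file first, so an
    #    interrupted download never leaves a truncated model in the cache.
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".part")
    download(repo_id, filename, tmp)
    # 3. Cache for future use by atomically moving the file into place.
    tmp.replace(target)
    return target
```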
Cache Location:
Windows: C:\Users\<username>\.cache\huggingface\
macOS/Linux: ~/.cache/huggingface/
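The sketch below locates the cache programmatically. It mirrors the default lookup order used by `huggingface_hub`, assuming no custom cache directory is passed: `HF_HOME` takes precedence, then `XDG_CACHE_HOME`, then `~/.cache`. `hf_cache_root` is a hypothetical helper name. Downloaded model repositories are typically stored in the `hub` subdirectory of this root.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hf_cache_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the Hugging Face cache root from environment variables."""
    env = os.environ if env is None else env
    # An explicit HF_HOME overrides everything else.
    if "HF_HOME" in env:
        return Path(env["HF_HOME"]).expanduser()
    # Otherwise use XDG_CACHE_HOME if set, else ~/.cache (also on Windows).
    base = Path(env["XDG_CACHE_HOME"]) if "XDG_CACHE_HOME" in env else Path.home() / ".cache"
    return base.expanduser() / "huggingface"
```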
Model Loading
Models are loaded lazily:
The first task execution triggers the model load
Models remain in memory for the duration of the session
Subsequent calls reuse the loaded model
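The lazy-loading behavior described above amounts to a load-once cache. This minimal sketch assumes the real loader is passed in as a zero-argument function, for example one that constructs a llama-cpp-python `Llama`. `LazyModel` is a hypothetical name.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Load a model on first use and reuse it for the rest of the session."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        if self._model is None:              # fast path once the model is loaded
            with self._lock:                 # prevent two threads loading at once
                if self._model is None:      # re-check after acquiring the lock
                    self._model = self._loader()
        return self._model
```

The lock only matters if tasks may run on several threads. In a single-threaded session, decorating a loader function with `functools.cache` gives the same load-once behavior.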
Performance
Inference Speed
Factors affecting speed:
Model size
CPU/GPU availability
Sentence length
Batch size
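To see how these factors play out on a given machine, one can time the same workload under different configurations. The sketch below assumes a hypothetical `generate` callable that runs one sentence through the model and returns the number of tokens produced. `measure_throughput` is likewise an illustrative name.

```python
import time
from typing import Callable, Dict, List


def measure_throughput(generate: Callable[[str], int],
                       sentences: List[str]) -> Dict[str, float]:
    """Time `generate` over a list of sentences and report tokens per second."""
    start = time.perf_counter()
    total_tokens = sum(generate(s) for s in sentences)
    elapsed = time.perf_counter() - start
    return {
        "sentences": len(sentences),
        "tokens": total_tokens,
        "seconds": elapsed,
        # Guard against a zero reading from very fast fake generators.
        "tokens_per_sec": total_tokens / elapsed if elapsed > 0 else float("inf"),
    }
```

To isolate one factor's effect, re-run the measurement while changing only that factor (model file, GPU offload, sentence length, or batch size).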
Memory Usage
Typical memory requirements:
Small models: 2-4 GB RAM
Medium models: 4-8 GB RAM
Large models: 8-16 GB RAM
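The size of the quantized weights gives a rough idea of where such figures come from: parameters × bits per weight ÷ 8 gives bytes. The sketch below adds a flat `overhead_gb` allowance for the context (KV) cache and runtime buffers. That allowance is an assumed placeholder, because the real overhead grows with context length. `estimate_memory_gb` is a hypothetical helper.

```python
def estimate_memory_gb(n_params_billion: float, bits_per_weight: float,
                       overhead_gb: float = 0.5) -> float:
    """Rough RAM estimate, in decimal GB, for a quantized model."""
    # Each parameter occupies bits_per_weight / 8 bytes.
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    # Convert to decimal gigabytes and add the flat allowance for cache and buffers.
    return weight_bytes / 1e9 + overhead_gb
```

For example, a 7-billion-parameter model at 4 bits per weight needs about 3.5 GB for its weights, or roughly 4 GB with the default allowance. At 8 bits, the weights alone take about 7 GB. Mixed quantization schemes often average slightly more than their nominal bit width, so real model files tend to be a little larger than this estimate.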
Future Improvements
Potential enhancements:
GPU acceleration support
Model quantization options
Multi-model ensemble
Fine-tuned Philippine language models
See Also
Architecture - System architecture
Installation - Installation guide