Language Models
===============

Information about the language models used by WiMarka for translation evaluation.

Overview
--------

WiMarka utilizes large language models (LLMs) for:

* Error detection
* Quality scoring
* Explanation generation
* Correction suggestions

Model Architecture
------------------

LLM Backend
~~~~~~~~~~~

WiMarka uses ``llama-cpp-python`` for efficient LLM inference:

* **Format**: GGUF quantized models
* **Inference**: CPU-optimized (GPU optional)
* **Quantization**: Reduces memory footprint
* **Performance**: Fast inference with reasonable quality

Model Selection
~~~~~~~~~~~~~~~

Models are selected based on:

* Language pair support
* Model size vs. performance
* Inference speed
* Memory requirements

Model Management
----------------

Download and Caching
~~~~~~~~~~~~~~~~~~~~

Models are automatically downloaded on first use:

1. Check local cache
2. If not found, download from HuggingFace Hub
3. Cache for future use

**Cache Location**:

* Windows: ``C:\\Users\\\\.cache\\huggingface\\``
* macOS/Linux: ``~/.cache/huggingface/``

Model Loading
~~~~~~~~~~~~~

Models are loaded lazily:

* First task execution triggers model load
* Models remain in memory for session duration
* Subsequent calls reuse loaded model

Performance
-----------

Inference Speed
~~~~~~~~~~~~~~~

Factors affecting speed:

* Model size
* CPU/GPU availability
* Sentence length
* Batch size

Memory Usage
~~~~~~~~~~~~

Typical memory requirements:

* Small models: 2-4 GB RAM
* Medium models: 4-8 GB RAM
* Large models: 8-16 GB RAM

Future Improvements
-------------------

Potential enhancements:

* GPU acceleration support
* Model quantization options
* Multi-model ensemble
* Fine-tuned Philippine language models

See Also
--------

* :doc:`architecture` - System architecture
* :doc:`../user/installation` - Installation guide
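
Example: Lazy Loading Sketch
----------------------------

The download, caching, and lazy-loading behaviour described on this page can be
sketched roughly as follows. This is a minimal illustration, not WiMarka's
actual implementation: the ``LazyModel`` class and the ``load_gguf`` helper are
hypothetical names introduced here, and the repository and file names in the
usage note are placeholders. The only third-party calls used are
``huggingface_hub.hf_hub_download`` (which checks the local cache before
downloading) and the ``llama_cpp.Llama`` constructor.

.. code-block:: python

    from typing import Any, Callable


    class LazyModel:
        """Hypothetical wrapper: defer an expensive model load until first use."""

        def __init__(self, loader: Callable[[], Any]) -> None:
            self._loader = loader
            self._model: Any = None
            self._loaded = False
            self.load_count = 0  # exposed only to make the behaviour observable

        def get(self) -> Any:
            # The first call triggers the load; later calls reuse the same object.
            if not self._loaded:
                self._model = self._loader()
                self._loaded = True
                self.load_count += 1
            return self._model


    def load_gguf(repo_id: str, filename: str, n_gpu_layers: int = 0) -> Any:
        """Hypothetical helper: resolve a GGUF file via the Hugging Face cache
        and open it with llama-cpp-python."""
        # Imported inside the function so this module can be imported
        # even when the heavy dependencies are not installed.
        from huggingface_hub import hf_hub_download
        from llama_cpp import Llama

        # Returns the cached path if the file is already present locally,
        # otherwise downloads it from the Hub first.
        model_path = hf_hub_download(repo_id=repo_id, filename=filename)
        # n_gpu_layers=0 keeps inference on the CPU.
        return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers)

A caller might then write
``model = LazyModel(lambda: load_gguf("some-org/some-model-GGUF", "model.gguf"))``
(placeholder names) and call ``model.get()`` inside each task. Nothing is
downloaded or loaded until the first ``get()``, and the loaded model stays in
memory for as long as the ``LazyModel`` object lives.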