Language Models
===============

Information about the language models used by WiMarka for translation evaluation.

Overview
--------

WiMarka uses large language models (LLMs) for:

* Error detection
* Quality scoring
* Explanation generation
* Correction suggestions

Model Architecture
------------------

LLM Backend
~~~~~~~~~~~

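WiMarka uses ``llama-cpp-python`` (Python bindings for ``llama.cpp``) for efficient LLM inference:

* **Format**: Models are stored as quantized GGUF files
* **Inference**: Runs on the CPU by default; model layers can optionally be offloaded to a GPU
* **Quantization**: Storing weights at reduced precision (for example, 4 bits instead of 16) shrinks the memory footprint
* **Performance**: Quantized models run faster and fit on modest hardware, at the cost of a small loss in output quality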
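
A minimal sketch of how a GGUF model might be loaded with ``llama-cpp-python``. The keyword arguments ``model_path``, ``n_ctx``, ``n_threads``, ``n_gpu_layers``, and ``verbose`` belong to the library's ``Llama`` constructor; the helpers ``build_llama_kwargs`` and ``load_model`` and their defaults are hypothetical and are not WiMarka's actual configuration. Setting ``n_gpu_layers=0`` keeps inference on the CPU, while ``-1`` offloads all layers to a GPU if the library was built with GPU support.

.. code-block:: python

   import os
   from typing import Any, Dict


   def build_llama_kwargs(model_path: str, use_gpu: bool = False,
                          n_ctx: int = 2048) -> Dict[str, Any]:
       """Collect constructor arguments for llama_cpp.Llama (hypothetical helper)."""
       return {
           "model_path": model_path,              # path to a local .gguf file
           "n_ctx": n_ctx,                        # context window in tokens
           "n_threads": os.cpu_count() or 1,      # CPU threads used for inference
           "n_gpu_layers": -1 if use_gpu else 0,  # -1 = offload all layers, 0 = CPU only
           "verbose": False,                      # silence llama.cpp log output
       }


   def load_model(model_path: str, use_gpu: bool = False):
       """Load a GGUF model; requires ``pip install llama-cpp-python``."""
       # Imported lazily so build_llama_kwargs works without the package installed.
       from llama_cpp import Llama
       return Llama(**build_llama_kwargs(model_path, use_gpu=use_gpu))

The returned object can be called with a prompt, for example ``llm(prompt, max_tokens=64)``, and the generated text is found under ``["choices"][0]["text"]`` in the result.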

Model Selection
~~~~~~~~~~~~~~~

Models are selected based on:

* Language pair support
* Model size vs. performance
* Inference speed
* Memory requirements
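
These criteria can be combined into a filter-then-rank step. The sketch below is hypothetical: the ``ModelSpec`` fields and the rule "pick the largest model that supports the language pair and fits the memory budget, breaking ties by speed" are illustrative assumptions, not WiMarka's actual selection logic.

.. code-block:: python

   from dataclasses import dataclass
   from typing import FrozenSet, List, Optional, Tuple


   @dataclass(frozen=True)
   class ModelSpec:
       name: str
       language_pairs: FrozenSet[Tuple[str, str]]  # (source, target) language codes
       size_gb: float                              # approximate RAM needed
       tokens_per_second: float                    # measured on the target machine


   def select_model(candidates: List[ModelSpec], pair: Tuple[str, str],
                    ram_budget_gb: float) -> Optional[ModelSpec]:
       """Return the largest model that supports ``pair`` and fits in RAM.

       Ties on size are broken by inference speed; returns None if nothing fits.
       """
       eligible = [m for m in candidates
                   if pair in m.language_pairs and m.size_gb <= ram_budget_gb]
       if not eligible:
           return None
       return max(eligible, key=lambda m: (m.size_gb, m.tokens_per_second))

Treating size as a proxy for quality is itself an assumption; a real selector might instead rank by a measured evaluation score.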

Model Management
----------------

Download and Caching
~~~~~~~~~~~~~~~~~~~~

Models are automatically downloaded on first use:

1. Check the local cache
2. If the model is not found, download it from the Hugging Face Hub
3. Store it in the cache for future use
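
The three steps can be sketched as a check-then-fetch function. The ``download`` callable is injected so the logic can be exercised offline; ``ensure_model``, its flat ``<cache_dir>/<filename>`` layout, and the downloader signature are hypothetical and used only for illustration.

.. code-block:: python

   from pathlib import Path
   from typing import Callable


   def ensure_model(filename: str, cache_dir: Path,
                    download: Callable[[str, Path], None]) -> Path:
       """Return a local path to ``filename``, downloading it only if missing."""
       local_path = cache_dir / filename
       if not local_path.is_file():                    # 1. check the local cache
           cache_dir.mkdir(parents=True, exist_ok=True)
           download(filename, local_path)              # 2. fetch, 3. write into the cache
       return local_path                               # later calls skip the download

In practice, ``huggingface_hub.hf_hub_download(repo_id, filename)`` already performs this check, download, and cache cycle itself and returns the local file path, so a real implementation may simply call it.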

**Cache Location**:

* Windows: ``C:\Users\<username>\.cache\huggingface``
* macOS/Linux: ``~/.cache/huggingface/``
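
Both locations above are the ``.cache/huggingface`` folder under the user's home directory. The sketch below resolves it with ``pathlib`` and approximately mirrors the defaults of ``huggingface_hub``, which checks the ``HF_HOME`` environment variable first, then ``XDG_CACHE_HOME``, before falling back to ``~/.cache``. Whether WiMarka overrides this location is not stated here, so treat ``huggingface_home`` as a hypothetical helper.

.. code-block:: python

   import os
   from pathlib import Path
   from typing import Mapping, Optional


   def huggingface_home(env: Optional[Mapping[str, str]] = None,
                        home: Optional[Path] = None) -> Path:
       """Resolve the Hugging Face cache root (env and home are injectable for testing)."""
       env = os.environ if env is None else env
       home = Path.home() if home is None else home   # C:\Users\<username> on Windows
       if env.get("HF_HOME"):                         # explicit override wins
           return Path(env["HF_HOME"]).expanduser()
       xdg = env.get("XDG_CACHE_HOME")
       base = Path(xdg) if xdg else home / ".cache"
       return base / "huggingface"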

Model Loading
~~~~~~~~~~~~~

Models are loaded lazily:

* The first task execution triggers the model load
* The model remains in memory for the rest of the session
* Subsequent calls reuse the loaded model
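
The lazy-loading behaviour above amounts to a load-once cache. One way to implement it is sketched below, assuming a zero-argument ``loader`` callable (for example, one that constructs a ``llama_cpp.Llama``); the ``LazyModel`` class is hypothetical. A lock guards the first load so that two concurrent tasks do not load the model twice.

.. code-block:: python

   import threading
   from typing import Callable, Generic, Optional, TypeVar

   T = TypeVar("T")


   class LazyModel(Generic[T]):
       """Load a model on first use and keep it for the rest of the session."""

       def __init__(self, loader: Callable[[], T]) -> None:
           self._loader = loader
           self._model: Optional[T] = None
           self._lock = threading.Lock()

       def get(self) -> T:
           if self._model is None:                  # fast path once loaded
               with self._lock:
                   if self._model is None:          # re-check while holding the lock
                       self._model = self._loader() # first call triggers the load
           return self._model                       # later calls reuse the same object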

Performance
-----------

Inference Speed
~~~~~~~~~~~~~~~

Factors affecting speed:

* Model size
* CPU/GPU availability
* Sentence length
* Batch size
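
Because these factors interact, it is usually simplest to measure throughput on the target machine. A minimal sketch, assuming a hypothetical ``generate`` callable that evaluates one batch of sentences per call. Here "batch" means sentences per call; ``llama-cpp-python``'s own ``n_batch`` constructor argument, which sets how many prompt tokens are processed together, is a separate setting.

.. code-block:: python

   import time
   from typing import Callable, List, Sequence


   def measure_throughput(generate: Callable[[Sequence[str]], List[str]],
                          sentences: Sequence[str], batch_size: int = 1) -> float:
       """Return sentences processed per second for the given batch size."""
       if batch_size < 1:
           raise ValueError("batch_size must be >= 1")
       start = time.perf_counter()
       for i in range(0, len(sentences), batch_size):
           generate(sentences[i:i + batch_size])    # one model call per batch
       elapsed = time.perf_counter() - start
       return len(sentences) / elapsed if elapsed > 0 else float("inf")

Running it with a few batch sizes on representative sentences shows which setting is fastest for a given model and machine.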

Memory Usage
~~~~~~~~~~~~

Typical memory requirements:

* Small models: 2-4 GB RAM
* Medium models: 4-8 GB RAM
* Large models: 8-16 GB RAM
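
A rough way to see where a given model falls in these ranges is to estimate the size of its weights from the parameter count and the quantization bit width: bytes ≈ parameters × bits per weight / 8. The sketch below applies that formula; it ignores the context (KV) cache and runtime overhead, so actual usage is higher. For GGUF quantization types the effective bits per weight is usually a little above the nominal figure, and the 7-billion-parameter example is illustrative only.

.. code-block:: python

   def estimate_weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
       """Approximate size of the model weights in gigabytes (10**9 bytes).

       Excludes the KV cache, activations, and runtime overhead.
       """
       if n_params <= 0 or bits_per_weight <= 0:
           raise ValueError("n_params and bits_per_weight must be positive")
       return n_params * bits_per_weight / 8 / 1e9


   # A 7-billion-parameter model at 4 bits per weight needs about 3.5 GB for weights.
   print(estimate_weight_memory_gb(7e9, 4))  # prints 3.5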

Future Improvements
-------------------

Potential enhancements:

* Broader GPU acceleration support
* Model quantization options
* Multi-model ensemble
* Fine-tuned Philippine language models

See Also
--------

* :doc:`architecture` - System architecture
* :doc:`../user/installation` - Installation guide