Language Models

This page describes the language models WiMarka uses for translation evaluation.

Overview

WiMarka uses large language models (LLMs) for:

Error detection
Quality scoring
Explanation generation
Correction suggestions

Model Architecture

LLM Backend

WiMarka uses llama-cpp-python for efficient LLM inference:

Format: GGUF quantized models
Inference: CPU-optimized (GPU optional)
Quantization: Lower-precision weights reduce the memory footprint
Performance: Faster inference than full-precision weights, at some cost in output quality
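
As an illustration of the backend described above, the following is a minimal sketch of loading a GGUF model with llama-cpp-python. The helper names (llama_kwargs, load_model), the context size, and the use_gpu switch are assumptions made for this example and are not taken from WiMarka's code. Llama, model_path, n_ctx, n_gpu_layers, and verbose are part of the llama-cpp-python API.

```python
from typing import Any, Dict


def llama_kwargs(model_path: str, use_gpu: bool = False, n_ctx: int = 2048) -> Dict[str, Any]:
    """Build constructor arguments for llama_cpp.Llama (hypothetical helper)."""
    return {
        "model_path": model_path,              # path to a local .gguf file
        "n_ctx": n_ctx,                        # context window size (assumed value)
        "n_gpu_layers": -1 if use_gpu else 0,  # -1 offloads all layers, 0 keeps inference on CPU
        "verbose": False,
    }


def load_model(model_path: str, use_gpu: bool = False):
    """Load a quantized GGUF model; llama_cpp is imported only when needed."""
    from llama_cpp import Llama  # requires the llama-cpp-python package

    return Llama(**llama_kwargs(model_path, use_gpu))
```

GPU offloading only takes effect when llama-cpp-python was built with a GPU backend; otherwise inference stays on the CPU.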

Model Selection

Models are selected based on:

Language pair support
Model size vs. performance
Inference speed
Memory requirements

Model Management

Download and Caching

Models are automatically downloaded on first use:

Check local cache
If not found, download from HuggingFace Hub
Cache for future use
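
The three steps above can be sketched as a small helper. This is an illustration only: ensure_model, the flat cache layout, and the injected download callable are assumptions for the example, not WiMarka's actual implementation. In practice, huggingface_hub's hf_hub_download(repo_id=..., filename=...) performs the same check-then-download logic itself and uses its own directory layout inside the cache.

```python
from pathlib import Path
from typing import Callable


def ensure_model(filename: str, cache_dir: Path,
                 download: Callable[[str, Path], None]) -> Path:
    """Return the cached model path, downloading the file only if it is missing."""
    target = cache_dir / filename

    # Step 1: check the local cache.
    if target.exists():
        return target

    # Step 2: not found, so download to a temporary name first.
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    download(filename, partial)

    # Step 3: move the finished file into the cache for future use.
    partial.replace(target)
    return target
```

Writing to a temporary name and renaming afterwards keeps an interrupted download from being mistaken for a complete cached model.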

Cache Location:

Windows: C:\Users\<username>\.cache\huggingface\
macOS/Linux: ~/.cache/huggingface/
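
To find the cache directory programmatically, a sketch like the following may help. It assumes WiMarka relies on the huggingface_hub defaults, which this page does not confirm. The function name huggingface_cache_root is hypothetical. HF_HOME is the environment variable huggingface_hub uses to relocate its cache root.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def huggingface_cache_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the Hugging Face cache root (hypothetical helper)."""
    env = os.environ if env is None else env

    # An explicit HF_HOME takes precedence over the default location.
    override = env.get("HF_HOME")
    if override:
        return Path(override).expanduser()

    # Default: .cache/huggingface under the user's home directory,
    # which matches the locations listed above on every platform.
    return Path.home() / ".cache" / "huggingface"
```

Calling huggingface_cache_root() with no arguments reads the current process environment.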

Model Loading

Models are loaded lazily:

First task execution triggers model load
Models remain in memory for session duration
Subsequent calls reuse loaded model
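
The lazy-loading behavior described above can be sketched as a small wrapper. LazyModel and its loader argument are names invented for this example. WiMarka's own loading code may be structured differently.

```python
from typing import Any, Callable


class LazyModel:
    """Defer an expensive model load until the first call to get()."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Any = None
        self._loaded = False

    def get(self) -> Any:
        # The first call triggers the load; later calls reuse the same object.
        if not self._loaded:
            self._model = self._loader()
            self._loaded = True
        return self._model
```

The model stays in memory for as long as the LazyModel instance is alive. This sketch is not thread-safe; concurrent first calls would need a lock around the load.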

Performance

Inference Speed

Factors affecting speed:

Model size
CPU/GPU availability
Sentence length
Batch size

Memory Usage

Typical memory requirements:

Small models: 2-4 GB RAM
Medium models: 4-8 GB RAM
Large models: 8-16 GB RAM
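
As a rough illustration of how these figures could guide model choice, the sketch below picks the largest tier whose upper memory bound fits in the available RAM. The selection rule and the names TIER_MAX_RAM_GB and pick_tier are assumptions for this example. Only the tier sizes come from the list above.

```python
from typing import Optional

# Upper bound of each tier's typical RAM requirement, in GB, from the list above.
TIER_MAX_RAM_GB = {"small": 4, "medium": 8, "large": 16}


def pick_tier(available_gb: float) -> Optional[str]:
    """Return the largest tier that fits in available_gb, or None if none fits."""
    best = None
    # Walk the tiers from smallest to largest requirement.
    for name, needed in sorted(TIER_MAX_RAM_GB.items(), key=lambda item: item[1]):
        if needed <= available_gb:
            best = name
    return best
```

Using the upper bound of each range is a conservative choice: a machine with 6 GB free is steered to a small model even though some medium models might fit.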

Future Improvements

Potential enhancements:

GPU acceleration support
Model quantization options
Multi-model ensemble
Fine-tuned Philippine language models

See Also

Architecture - System architecture
Installation - Installation guide