Language Models

Information about the language models used by WiMarka for translation evaluation.

Overview

WiMarka utilizes large language models (LLMs) for:

  • Error detection

  • Quality scoring

  • Explanation generation

  • Correction suggestions

Model Architecture

LLM Backend

WiMarka uses llama-cpp-python for efficient LLM inference:

  • Format: GGUF quantized models

  • Inference: CPU-optimized (GPU optional)

  • Quantization: Reduces memory footprint

  • Performance: Faster inference than full-precision models, at a small cost in output quality
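The sketch below shows one way these backend settings could map onto llama-cpp-python. The `BackendConfig` class and `load_model` helper are hypothetical and not part of WiMarka. `model_path`, `n_ctx`, `n_gpu_layers`, and `n_threads` are keyword arguments of llama-cpp-python's `Llama` constructor: `n_gpu_layers=0` keeps inference on the CPU, and `-1` offloads every layer to the GPU.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class BackendConfig:
    """Hypothetical settings for a GGUF model run through llama-cpp-python."""

    model_path: str                  # path to a local .gguf file
    n_ctx: int = 2048                # context window in tokens
    use_gpu: bool = False            # CPU by default; GPU offload is optional
    n_threads: Optional[int] = None  # None leaves the choice to llama-cpp-python

    def to_llama_kwargs(self) -> Dict[str, Any]:
        """Translate the settings into keyword arguments for llama_cpp.Llama."""
        kwargs: Dict[str, Any] = {
            "model_path": self.model_path,
            "n_ctx": self.n_ctx,
            # 0 = CPU only, -1 = offload every layer to the GPU
            "n_gpu_layers": -1 if self.use_gpu else 0,
        }
        if self.n_threads is not None:
            kwargs["n_threads"] = self.n_threads
        return kwargs


def load_model(config: BackendConfig):
    """Load the model. llama_cpp is imported here, so configs can be built without it."""
    from llama_cpp import Llama

    return Llama(**config.to_llama_kwargs())
```

Keeping the configuration separate from the load call means settings can be validated or logged without the native library installed.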

Model Selection

Models are selected based on:

  • Language pair support

  • Model size vs. performance

  • Inference speed

  • Memory requirements
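As a minimal sketch, assuming a catalog of candidate models with known language pairs, memory needs, and measured speeds, the criteria above could be combined into a single selection rule. `ModelInfo`, `select_model`, and the "largest model that fits" heuristic are illustrative assumptions, not WiMarka's actual selection logic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical catalog entry; the fields mirror the selection criteria."""

    name: str
    language_pairs: frozenset  # e.g. frozenset({("en", "fil")})
    ram_gb: float              # approximate memory needed to load the model
    tokens_per_sec: float      # measured inference speed on the target machine


def select_model(
    catalog: Iterable[ModelInfo],
    pair: Tuple[str, str],
    ram_budget_gb: float,
    min_tokens_per_sec: float = 0.0,
) -> Optional[ModelInfo]:
    """Return the largest model that supports `pair` and meets the memory and speed limits."""
    candidates = [
        m
        for m in catalog
        if pair in m.language_pairs
        and m.ram_gb <= ram_budget_gb
        and m.tokens_per_sec >= min_tokens_per_sec
    ]
    # Memory footprint is used here as a rough proxy for model size.
    return max(candidates, key=lambda m: m.ram_gb, default=None)
```

With an 8 GB budget, for example, a 12 GB model is filtered out even if it supports the language pair, and the function returns `None` when nothing qualifies.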

Model Management

Download and Caching

Models are automatically downloaded on first use:

  1. Check local cache

  2. If not found, download from the Hugging Face Hub

  3. Cache for future use
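The three steps can be sketched with the standard library alone. In practice `huggingface_hub.hf_hub_download` handles caching itself; here the download is replaced by a hypothetical `fetch` callable so the check-download-cache flow is visible. Writing to a temporary `.part` file first is a design choice of this sketch, so that an interrupted download is never mistaken for a cached model.

```python
from pathlib import Path
from typing import Callable


def get_model_file(cache_dir: Path, filename: str, fetch: Callable[[Path], None]) -> Path:
    """Return a local path to `filename`, downloading it only on a cache miss.

    `fetch` is a hypothetical stand-in for the real download; it must write
    the file to the path it is given.
    """
    target = cache_dir / filename
    # 1. Check the local cache.
    if target.is_file():
        return target
    # 2. Not found: download to a temporary name first.
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    fetch(partial)
    # 3. Move the finished file into place so later calls find it in the cache.
    partial.replace(target)
    return target
```

Calling `get_model_file` twice with the same arguments invokes `fetch` only once; the second call returns the cached path directly.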

Cache Location:

  • Windows: C:\Users\<username>\.cache\huggingface\

  • macOS/Linux: ~/.cache/huggingface/
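Both locations resolve from the user's home directory, so a single expression covers every platform. The helper below is a hypothetical sketch that, as far as we know, mirrors the lookup `huggingface_hub` performs: `HF_HOME` takes precedence if set, otherwise `XDG_CACHE_HOME` (defaulting to `~/.cache`) plus `huggingface`. The `env` parameter exists only to make the function easy to test.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def huggingface_cache_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the Hugging Face cache root for the current user (hypothetical helper)."""
    env = os.environ if env is None else env
    # An explicit HF_HOME overrides the default location.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser()
    # Path.home() is C:\Users\<username> on Windows and ~ on macOS/Linux.
    cache_base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(cache_base).expanduser() / "huggingface"
```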

Model Loading

Models are loaded lazily:

  • First task execution triggers model load

  • Models remain in memory for session duration

  • Subsequent calls reuse loaded model
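A minimal sketch of this lazy-loading pattern, assuming a hypothetical zero-argument `loader` (for example, one that constructs a `llama_cpp.Llama`). The lock is an extra safeguard in this sketch so that two tasks starting at the same time do not both load the model.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Load a model on first use and keep it in memory for the session."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader  # hypothetical function that loads the model
        self._model: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._model is None:
            with self._lock:
                # Re-check inside the lock in case another thread loaded it first.
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

Creating a `LazyModel` costs nothing; the loader runs on the first `get()` call, and every later call returns the same object.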

Performance

Inference Speed

Factors affecting speed:

  • Model size

  • CPU/GPU availability

  • Sentence length

  • Batch size
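To see how batch size and sentence length affect speed on a given machine, one option is to time the same sentences at different batch sizes. The sketch below assumes a hypothetical `evaluate_batch` function that returns one score per sentence; it is a measurement harness, not WiMarka's evaluation code.

```python
import time
from typing import Callable, List, Sequence, Tuple


def evaluate_in_batches(
    sentences: Sequence[str],
    batch_size: int,
    evaluate_batch: Callable[[List[str]], List[float]],
) -> Tuple[List[float], float]:
    """Score sentences batch by batch; return the scores and sentences per second."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    scores: List[float] = []
    start = time.perf_counter()
    for i in range(0, len(sentences), batch_size):
        scores.extend(evaluate_batch(list(sentences[i:i + batch_size])))
    elapsed = time.perf_counter() - start
    throughput = len(sentences) / elapsed if elapsed > 0 else float("inf")
    return scores, throughput
```

Running this with batch sizes such as 1, 4, and 16 on the same input gives a direct, machine-specific comparison.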

Memory Usage

Typical memory requirements (actual usage varies with quantization level and context size):

  • Small models: 2-4 GB RAM

  • Medium models: 4-8 GB RAM

  • Large models: 8-16 GB RAM
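A rough estimate of weight memory is parameter count times bits per weight, divided by 8. The function below is an illustrative sketch; it ignores the KV cache and runtime overhead, so real RAM use is higher. The 7-billion-parameter figure in the usage note is a hypothetical example, not a statement about which models WiMarka ships.

```python
def estimate_weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes (10**9 bytes).

    Excludes the KV cache, context buffers, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9
```

For a hypothetical 7-billion-parameter model, 16-bit weights need about 14 GB, while GGUF's Q4_0 quantization (4.5 bits per weight) brings that down to roughly 3.9 GB. This is why quantization reduces the memory footprint.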

Future Improvements

Potential enhancements:

  • GPU acceleration support

  • Model quantization options

  • Multi-model ensemble

  • Fine-tuned Philippine language models

See Also