API Reference

This page provides complete API documentation for WiMarka’s public and internal interfaces.

Main Module (`wimarka.main`)

wmk_eval

wimarka.main.wmk_eval(src_file_path: str, src_lang: str, tgt_file_path: str, tgt_lang: str)

Main evaluation function that orchestrates the entire pipeline.

Function Signature:

def wmk_eval(src_file_path: str, src_lang: str,
             tgt_file_path: str, tgt_lang: str) -> None

Parameters:

src_file_path (str): Absolute or relative path to source text file
src_lang (str): Source language code (EN, CEB, ILO, TGT)
tgt_file_path (str): Absolute or relative path to target translation file
tgt_lang (str): Target language code (CEB, ILO, TGT)

Returns: None (results stored in global results dictionary and printed)

Raises:

ValueError: If source and target files have different line counts
FileNotFoundError: If input files don’t exist

Example:

from wimarka.main import wmk_eval

wmk_eval(
    src_file_path='data/english.txt',
    src_lang='EN',
    tgt_file_path='data/cebuano.txt',
    tgt_lang='CEB'
)
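The two documented failure modes can be checked up front, before any model is loaded. The following is a minimal sketch, not WiMarka's own validation code: `check_parallel_files` is a hypothetical helper, and reading the files as UTF-8 is an assumption.

```python
from pathlib import Path


def check_parallel_files(src_file_path: str, tgt_file_path: str) -> int:
    """Return the shared line count, or raise the same error types wmk_eval documents.

    Hypothetical pre-flight helper; not part of the WiMarka API.
    """
    src, tgt = Path(src_file_path), Path(tgt_file_path)
    for path in (src, tgt):
        if not path.is_file():
            raise FileNotFoundError(f"Input file not found: {path}")

    # Assumption: inputs are UTF-8 text with one sentence per line.
    src_lines = src.read_text(encoding="utf-8").splitlines()
    tgt_lines = tgt.read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"Line count mismatch: {len(src_lines)} source vs "
            f"{len(tgt_lines)} target"
        )
    return len(src_lines)
```

Calling `check_parallel_files('data/english.txt', 'data/cebuano.txt')` before `wmk_eval` surfaces a mismatch immediately instead of partway through an evaluation run.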

results Dictionary

Global dictionary storing evaluation results.

Structure:

results = {
    'source': List[str],                  # Source sentences (with tags)
    'target': List[str],                  # Target sentences (with tags)
    'errors': List[List[str]],            # Detected errors per sentence
    'fluency_score': List[float],         # Fluency scores (0-100)
    'adequacy_score': List[float],        # Adequacy scores (0-100)
    'overall_score': List[float],         # Overall scores (0-100)
    'explanation': List[str],             # Human-readable explanations
    'corrected_translation': List[str]    # Suggested corrections
}

Access Pattern:

from wimarka.main import wmk_eval, results

wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')

# Access results
for i in range(len(results['source'])):
    print(f"Score: {results['overall_score'][i]}")
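Because every key in the dictionary holds a list aligned by sentence index, it is often convenient to view the results as one record per sentence. The sketch below shows one way to do that; `iter_rows` is a hypothetical helper, and the `sample` values (including the wording of the error description) are illustrative stand-ins, not real WiMarka output.

```python
from typing import Dict, List


def iter_rows(results: Dict[str, list]) -> List[dict]:
    """Turn the column-oriented results dict into one dict per sentence."""
    keys = list(results)
    columns = (results[key] for key in keys)
    return [dict(zip(keys, values)) for values in zip(*columns)]


# Illustrative stand-in data shaped like the documented structure.
sample = {
    'source': ['[EN] Good morning.'],
    'target': ['[CEB] Maayong buntag.'],
    'errors': [['example error description']],
    'fluency_score': [80.0],
    'adequacy_score': [90.0],
    'overall_score': [85.0],
    'explanation': ['example explanation'],
    'corrected_translation': ['[CEB] Maayong buntag.'],
}

for row in iter_rows(sample):
    print(row['overall_score'], row['explanation'])
```

Note that `zip` stops at the shortest list, so a dictionary whose lists have drifted out of alignment yields fewer rows rather than an error.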

CLI Module (`wimarka.cli`)

main

import click

@click.command()
@click.option('--src_file_path', required=True,
              help='Path to source text file')
@click.option('--src_lang', required=True,
              help='Source language code')
@click.option('--tgt_file_path', required=True,
              help='Path to target text file')
@click.option('--tgt_lang', required=True,
              help='Target language code')
def main(src_file_path, src_lang, tgt_file_path, tgt_lang):
    """Command-line interface for WiMarka evaluation."""

Entry point for CLI execution. Wraps wmk_eval() with Click decorators.

Task Modules

error_detection Module

wimarka.tasks.error_detection.split_words(text: str)

wimarka.tasks.error_detection.tokenize_with_spans(text: str)

wimarka.tasks.error_detection.format_tagged_sentence_using_spans(original_text: str, token_spans, labels)

wimarka.tasks.error_detection.error_detection(source_sentence, target_sentence)

Main Function:

def error_detection(source_sentence: str, target_sentence: str) -> List[str]

Detects translation errors between source and target sentences.

Parameters:

source_sentence: Source sentence with language tag
target_sentence: Target sentence with language tag

Returns: List of detected error descriptions

Error Types:

Lexical errors (wrong word choice)
Syntactic errors (grammar issues)
Semantic errors (meaning loss)
Morphological errors (wrong affixes)
Omissions/additions

scoring Module

wimarka.tasks.scoring.classify(score)

wimarka.tasks.scoring.scoring(source, target, errors)

Main Function:

def scoring(source: str, target: str, errors: List[str]) -> Tuple[float, float, float]

Calculates quality scores for the translation.

Parameters:

source: Source sentence with language tag
target: Target sentence with language tag
errors: List of detected errors from error_detection

Returns: Tuple of (fluency_score, adequacy_score, overall_score)

All scores are floats in range [0, 100]
Overall score = (fluency + adequacy) / 2
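As a worked instance of the formula above, the sketch below computes the overall score from the two component scores. `overall_score` is a hypothetical helper written for illustration; the range check is an assumption based on the documented [0, 100] range, not a description of what `scoring` does internally.

```python
def overall_score(fluency: float, adequacy: float) -> float:
    """Average fluency and adequacy, as in the documented formula."""
    for name, value in (("fluency", fluency), ("adequacy", adequacy)):
        # Assumption: reject inputs outside the documented [0, 100] range.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
    return (fluency + adequacy) / 2


print(overall_score(80.0, 90.0))  # 85.0
```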

explanation Module

wimarka.tasks.explanation.generate_explanation(src, tgt, errors, fluency, adequacy, overall)

Main Function:

def generate_explanation(src: str, tgt: str, errors: List[str],
                         fluency: float, adequacy: float, overall: float) -> str

Generates a human-readable explanation of the evaluation.

Parameters:

src: Source sentence
tgt: Target sentence
errors: Detected errors
fluency: Fluency score
adequacy: Adequacy score
overall: Overall score

Returns: Natural language explanation string

correction Module

wimarka.tasks.correction.generate_correction(src, tgt, errors, comments)

Main Function:

def generate_correction(src: str, tgt: str,
                        errors: List[str], comments: str) -> str

Generates a suggested corrected translation.

Parameters:

src: Source sentence
tgt: Target sentence
errors: Detected errors
comments: Explanation from the explanation module

Returns: Suggested corrected translation
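The four task functions feed one another: detected errors go into scoring, the scores go into the explanation, and the explanation goes into the correction as `comments`. The sketch below chains them for a single sentence pair under that reading of the signatures. `evaluate_pair` is a hypothetical helper, and the task functions are passed in as callables so the data flow can be followed (and tested with stand-ins) without loading any model; it is not a copy of how `wmk_eval` is implemented.

```python
from typing import Callable, Dict, List, Tuple


def evaluate_pair(
    src_line: str,
    tgt_line: str,
    detect: Callable[[str, str], List[str]],
    score: Callable[[str, str, List[str]], Tuple[float, float, float]],
    explain: Callable[[str, str, List[str], float, float, float], str],
    correct: Callable[[str, str, List[str], str], str],
) -> Dict[str, object]:
    """Run the four documented stages in order for one tagged sentence pair."""
    errors = detect(src_line, tgt_line)
    fluency, adequacy, overall = score(src_line, tgt_line, errors)
    explanation = explain(src_line, tgt_line, errors, fluency, adequacy, overall)
    corrected = correct(src_line, tgt_line, errors, explanation)
    # Keys mirror the per-sentence fields of the documented results dictionary.
    return {
        'errors': errors,
        'fluency_score': fluency,
        'adequacy_score': adequacy,
        'overall_score': overall,
        'explanation': explanation,
        'corrected_translation': corrected,
    }
```

With the real package installed, the callables would be `error_detection`, `scoring`, `generate_explanation`, and `generate_correction`, passed positionally in that order.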

Utility Modules

helper Module

wimarka.utils.helper.add_tag(line, tag: str)

wimarka.utils.helper.check_tag(src_tag: str, tgt_tag: str)

wimarka.utils.helper.get_column(X)

wimarka.utils.helper.printEvaluationResults(results)

Key Functions:

def check_tag(src_tag: str, tgt_tag: str) -> None:
    """Validate language codes."""

def add_tag(line: str, tag: str) -> str:
    """Add language tag to sentence."""

def printEvaluationResults(results: dict) -> None:
    """Print formatted evaluation results."""

logger Module

wimarka.utils.logger.setup_logger(log_dir: str = 'logs', name: str = 'wimarka', level=20)

Main Function:

def setup_logger(log_dir: str = 'logs', name: str = 'wimarka',
                 level: int = 20) -> logging.Logger:  # 20 == logging.INFO
    """Configure and return logger instance."""

Usage:

from wimarka.utils.logger import setup_logger

logger = setup_logger()
logger.info("Processing started")
logger.error("An error occurred")
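For readers who want to see what a logger factory with these three parameters might do, here is a minimal sketch built on the standard `logging` module. `make_logger` is a hypothetical stand-in, not WiMarka's implementation: the log file name, the message format, and the single file handler are all assumptions. The default `level` of 20 in the real signature corresponds to `logging.INFO`.

```python
import logging
from pathlib import Path


def make_logger(log_dir: str = 'logs', name: str = 'wimarka',
                level: int = logging.INFO) -> logging.Logger:
    """Create (or reuse) a named logger that writes to <log_dir>/<name>.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach the handler only once so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.FileHandler(Path(log_dir) / f"{name}.log",
                                      encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Guarding on `logger.handlers` matters because `logging.getLogger(name)` returns the same object on every call; without the guard, each call would add another handler and every message would be written multiple times.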

model Module

wimarka.utils.model.load_model(model_name: str)

Key Functions:

def load_model(model_name: str):
    """Load model from HuggingFace Hub or cache."""

def get_model_path(model_name: str) -> str:
    """Get local path to cached model."""

cache Module

Caching utilities for model responses and intermediate results.

torch Module

wimarka.utils.torch.get_device()

wimarka.utils.torch.move_model_to_device(model, device=None)

Device Management:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Type Hints

Common type hints used throughout WiMarka:

from typing import List, Tuple, Dict, Optional

# Sentence with language tag
TaggedSentence = str  # Format: "[LANG] sentence text"

# Error list
ErrorList = List[str]

# Scores tuple
Scores = Tuple[float, float, float]  # (fluency, adequacy, overall)

# Results structure
ResultsDict = Dict[str, List]

Constants

Language Codes

SUPPORTED_LANGUAGES = ['EN', 'CEB', 'ILO', 'TGT']

LANGUAGE_NAMES = {
    'EN': 'English',
    'CEB': 'Cebuano',
    'ILO': 'Ilocano',
    'TGT': 'Tagalog'
}
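The language codes combine with the `"[LANG] sentence text"` format described under Type Hints. The sketch below shows how a code could be validated and then applied as a tag. `validate_codes` and `tag_sentence` are hypothetical names chosen to avoid confusion with the real `check_tag` and `add_tag` helpers, whose exact behavior is not documented here; raising `ValueError` for an unknown code is an assumption.

```python
SUPPORTED_LANGUAGES = ['EN', 'CEB', 'ILO', 'TGT']


def validate_codes(src_lang: str, tgt_lang: str) -> None:
    """Raise ValueError if either code is not in SUPPORTED_LANGUAGES."""
    for role, code in (("source", src_lang), ("target", tgt_lang)):
        if code not in SUPPORTED_LANGUAGES:
            raise ValueError(f"Unsupported {role} language code: {code!r}")


def tag_sentence(sentence: str, lang: str) -> str:
    """Prefix a sentence with its language tag: '[LANG] sentence text'."""
    return f"[{lang}] {sentence.strip()}"


validate_codes('EN', 'CEB')
print(tag_sentence('Good morning.', 'EN'))  # [EN] Good morning.
```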

Score Ranges

MIN_SCORE = 0
MAX_SCORE = 100

# Quality thresholds
EXCELLENT_THRESHOLD = 90
GOOD_THRESHOLD = 75
FAIR_THRESHOLD = 60
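One plausible way to apply these thresholds is to map a score to a quality label. The sketch below is illustrative only: `quality_label` is a hypothetical function, the label strings and the fallback `'poor'` bucket are assumptions, and so is treating each threshold as an inclusive lower bound. It is not a description of what `wimarka.tasks.scoring.classify` returns.

```python
EXCELLENT_THRESHOLD = 90
GOOD_THRESHOLD = 75
FAIR_THRESHOLD = 60


def quality_label(score: float) -> str:
    """Map a 0-100 score to a label, checking the highest threshold first."""
    if score >= EXCELLENT_THRESHOLD:
        return 'excellent'
    if score >= GOOD_THRESHOLD:
        return 'good'
    if score >= FAIR_THRESHOLD:
        return 'fair'
    return 'poor'  # Assumed label for scores below FAIR_THRESHOLD.


print(quality_label(85.0))  # good
```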

Configuration

Model Identifiers

Models are identified by HuggingFace repository names or local paths.

See wimarka/config.py for complete configuration.

Best Practices

Using the API

Import Correctly:

from wimarka.main import wmk_eval, results  # Preferred: binds the names directly
import wimarka.main  # Also valid, but every name needs the full wimarka.main prefix

Handle Results:

# Copy results if needed for multiple evaluations
from copy import deepcopy

from wimarka.main import wmk_eval, results

wmk_eval('src1.txt', 'EN', 'tgt1.txt', 'CEB')
results1 = deepcopy(results)

wmk_eval('src2.txt', 'EN', 'tgt2.txt', 'CEB')
results2 = deepcopy(results)
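The copy matters because a plain assignment only creates a second name for the same global dictionary. The sketch below demonstrates the effect with stand-ins: `fake_eval` is a hypothetical replacement for `wmk_eval`, and it assumes the evaluation refills the same dictionary in place, which this page implies but does not state.

```python
from copy import deepcopy

# Stand-in for a module-level results dictionary.
results = {'overall_score': []}


def fake_eval(scores):
    """Stand-in for wmk_eval: refills the shared dict in place."""
    results['overall_score'].clear()
    results['overall_score'].extend(scores)


fake_eval([80.0])
alias = results               # Second name for the same object.
snapshot = deepcopy(results)  # Independent copy, including the inner lists.

fake_eval([55.0])
print(alias['overall_score'])     # [55.0]
print(snapshot['overall_score'])  # [80.0]
```

A shallow copy such as `dict(results)` would not be enough under the same assumption, since the new dictionary would still share the inner lists.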

Error Handling:

from wimarka.main import wmk_eval

try:
    wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')
except ValueError as e:
    print(f"Validation error: {e}")
except FileNotFoundError as e:
    print(f"File not found: {e}")

Extending the API

To add custom processing:

from wimarka.main import wmk_eval, results

def custom_evaluation(src, tgt, lang, custom_metric):
    """Custom evaluation with additional metric."""
    # Run standard evaluation
    wmk_eval(src, 'EN', tgt, lang)

    # Add custom processing
    custom_scores = []
    for i in range(len(results['source'])):
        score = custom_metric(
            results['source'][i],
            results['target'][i]
        )
        custom_scores.append(score)

    # Add to results
    results['custom_score'] = custom_scores

    return results
