API Reference

This page provides complete API documentation for WiMarka’s public and internal interfaces.

Main Module (wimarka.main)

wmk_eval

wimarka.main.wmk_eval(src_file_path: str, src_lang: str, tgt_file_path: str, tgt_lang: str)

Main evaluation function that orchestrates the entire pipeline.

Function Signature:

def wmk_eval(src_file_path: str, src_lang: str,
             tgt_file_path: str, tgt_lang: str) -> None

Parameters:

  • src_file_path (str): Absolute or relative path to source text file

  • src_lang (str): Source language code (EN, CEB, ILO, TGT)

  • tgt_file_path (str): Absolute or relative path to target translation file

  • tgt_lang (str): Target language code (CEB, ILO, TGT)

Returns: None. Results are stored in the global results dictionary and printed.

Raises:

  • ValueError: If source and target files have different line counts

  • FileNotFoundError: If input files don’t exist
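The two failure modes above can be caught before a long evaluation run starts. The following is a minimal sketch of such a pre-flight check; check_parallel_files is a hypothetical helper, not part of WiMarka, and it assumes one sentence per line in UTF-8 files.

```python
from pathlib import Path


def check_parallel_files(src_file_path: str, tgt_file_path: str) -> int:
    """Hypothetical pre-flight check (not part of WiMarka's API).

    Returns the shared line count, or raises the same exception types
    that wmk_eval is documented to raise.
    """
    src, tgt = Path(src_file_path), Path(tgt_file_path)
    for path in (src, tgt):
        if not path.is_file():
            raise FileNotFoundError(f"Input file not found: {path}")

    # Assumption: one sentence per line, UTF-8 encoded.
    src_lines = src.read_text(encoding="utf-8").splitlines()
    tgt_lines = tgt.read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"Line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )
    return len(src_lines)
```

Calling this before wmk_eval gives an early, descriptive error without loading any models.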

Example:

from wimarka.main import wmk_eval

wmk_eval(
    src_file_path='data/english.txt',
    src_lang='EN',
    tgt_file_path='data/cebuano.txt',
    tgt_lang='CEB'
)

results Dictionary

Global dictionary storing evaluation results.

Structure:

results = {
    'source': List[str],                  # Source sentences (with tags)
    'target': List[str],                  # Target sentences (with tags)
    'errors': List[List[str]],            # Detected errors per sentence
    'fluency_score': List[float],         # Fluency scores (0-100)
    'adequacy_score': List[float],        # Adequacy scores (0-100)
    'overall_score': List[float],         # Overall scores (0-100)
    'explanation': List[str],             # Human-readable explanations
    'corrected_translation': List[str]    # Suggested corrections
}
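Because the dictionary is column-oriented (one list per field, aligned by sentence index), it is often convenient to view it as one record per sentence. The sketch below shows one way to do that; results_to_rows is a hypothetical helper and works on any dictionary with the structure shown above.

```python
from typing import Any, Dict, List


def results_to_rows(results: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn the column-oriented results dict into one dict per sentence.

    Hypothetical helper, not part of WiMarka.
    """
    keys = list(results)
    lengths = {len(results[key]) for key in keys}
    if len(lengths) > 1:
        raise ValueError("Result lists have different lengths")
    n_rows = lengths.pop() if lengths else 0
    return [{key: results[key][i] for key in keys} for i in range(n_rows)]
```

Each returned row carries the same keys as the results dictionary, so a row can be passed directly to csv.DictWriter or json.dumps.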

Access Pattern:

from wimarka.main import wmk_eval, results

wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')

# Access results
for score in results['overall_score']:
    print(f"Score: {score}")

CLI Module (wimarka.cli)

main

@click.command()
@click.option('--src_file_path', required=True,
              help='Path to source text file')
@click.option('--src_lang', required=True,
              help='Source language code')
@click.option('--tgt_file_path', required=True,
              help='Path to target text file')
@click.option('--tgt_lang', required=True,
              help='Target language code')
def main(src_file_path, src_lang, tgt_file_path, tgt_lang):
    """Command-line interface for WiMarka evaluation."""

Entry point for CLI execution. Wraps wmk_eval() with Click decorators.
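The four options map one-to-one onto the wmk_eval() parameters. As a rough sketch, the command line can be assembled from Python as shown below. The option names come from the decorators above; invoking the module with "python -m wimarka.cli" is an assumption (it requires the module to call main() when run as a script), and build_cli_command is a hypothetical helper.

```python
import sys
from typing import List


def build_cli_command(src_file_path: str, src_lang: str,
                      tgt_file_path: str, tgt_lang: str) -> List[str]:
    """Build an argument list for the WiMarka CLI (hypothetical helper).

    Assumption: the CLI is runnable as `python -m wimarka.cli`.
    """
    return [
        sys.executable, "-m", "wimarka.cli",
        "--src_file_path", src_file_path,
        "--src_lang", src_lang,
        "--tgt_file_path", tgt_file_path,
        "--tgt_lang", tgt_lang,
    ]


cmd = build_cli_command("data/english.txt", "EN", "data/cebuano.txt", "CEB")
# To execute it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.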

Task Modules

error_detection Module

wimarka.tasks.error_detection.split_words(text: str)
wimarka.tasks.error_detection.tokenize_with_spans(text: str)
wimarka.tasks.error_detection.format_tagged_sentence_using_spans(original_text: str, token_spans, labels)
wimarka.tasks.error_detection.error_detection(source_sentence, target_sentence)

Main Function:

def error_detection(source_sentence: str, target_sentence: str) -> List[str]

Detects translation errors between source and target sentences.

Parameters:

  • source_sentence: Source sentence with language tag

  • target_sentence: Target sentence with language tag

Returns: List of detected error descriptions

Error Types:

  • Lexical errors (wrong word choice)

  • Syntactic errors (grammar issues)

  • Semantic errors (meaning loss)

  • Morphological errors (wrong affixes)

  • Omissions/additions

scoring Module

wimarka.tasks.scoring.classify(score)
wimarka.tasks.scoring.scoring(source, target, errors)

Main Function:

def scoring(source: str, target: str, errors: List[str]) -> Tuple[float, float, float]

Calculates quality scores for the translation.

Parameters:

  • source: Source sentence with language tag

  • target: Target sentence with language tag

  • errors: List of detected errors from error_detection

Returns: Tuple of (fluency_score, adequacy_score, overall_score)

  • All scores are floats in range [0, 100]

  • Overall score = (fluency + adequacy) / 2
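As a worked instance of the formula above, the overall score is the arithmetic mean of the two component scores. The function name overall_from_components is hypothetical and only illustrates the arithmetic; it is not WiMarka's implementation.

```python
def overall_from_components(fluency: float, adequacy: float) -> float:
    """Mean of fluency and adequacy, both expected in [0, 100]."""
    for name, value in (("fluency", fluency), ("adequacy", adequacy)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
    return (fluency + adequacy) / 2


# Fluency 80 and adequacy 70 give (80 + 70) / 2 = 75.
example = overall_from_components(80.0, 70.0)
```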

explanation Module

wimarka.tasks.explanation.generate_explanation(src, tgt, errors, fluency, adequacy, overall)

Main Function:

def generate_explanation(src: str, tgt: str, errors: List[str],
                         fluency: float, adequacy: float, overall: float) -> str

Generates a human-readable explanation of the evaluation.

Parameters:

  • src: Source sentence

  • tgt: Target sentence

  • errors: Detected errors

  • fluency: Fluency score

  • adequacy: Adequacy score

  • overall: Overall score

Returns: Natural language explanation string

correction Module

wimarka.tasks.correction.generate_correction(src, tgt, errors, comments)

Main Function:

def generate_correction(src: str, tgt: str,
                        errors: List[str], comments: str) -> str

Generates a suggested corrected translation.

Parameters:

  • src: Source sentence

  • tgt: Target sentence

  • errors: Detected errors

  • comments: Explanation produced by generate_explanation

Returns: Suggested corrected translation

Utility Modules

helper Module

wimarka.utils.helper.add_tag(line, tag: str)
wimarka.utils.helper.check_tag(src_tag: str, tgt_tag: str)
wimarka.utils.helper.get_column(X)
wimarka.utils.helper.printEvaluationResults(results)

Key Functions:

def check_tag(src_tag: str, tgt_tag: str) -> None:
    """Validate language codes."""

def add_tag(line: str, tag: str) -> str:
    """Add language tag to sentence."""

def printEvaluationResults(results: dict) -> None:
    """Print formatted evaluation results."""

logger Module

wimarka.utils.logger.setup_logger(log_dir: str = 'logs', name: str = 'wimarka', level=20)

Main Function:

def setup_logger(log_dir: str = 'logs', name: str = 'wimarka',
                 level=20) -> logging.Logger:
    """Configure and return logger instance."""

Usage:

from wimarka.utils.logger import setup_logger

logger = setup_logger()
logger.info("Processing started")
logger.error("An error occurred")
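For readers who want a picture of what a function with this signature typically does, here is a minimal sketch built on the standard logging module. It is not WiMarka's implementation: the log file name, the message format, and the name setup_logger_sketch are all assumptions. The default level of 20 in the documented signature corresponds to logging.INFO.

```python
import logging
from pathlib import Path


def setup_logger_sketch(log_dir: str = "logs", name: str = "wimarka",
                        level: int = logging.INFO) -> logging.Logger:
    """Illustrative stand-in for setup_logger (assumed behavior)."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Attach the file handler only once so repeated calls do not
    # duplicate every log line.
    if not logger.handlers:
        # Assumption: one log file per logger name inside log_dir.
        handler = logging.FileHandler(Path(log_dir) / f"{name}.log",
                                      encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With level set to logging.INFO, calls to logger.debug() are dropped while info, warning, and error messages are written.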

model Module

wimarka.utils.model.load_model(model_name: str)

Key Functions:

def load_model(model_name: str):
    """Load model from HuggingFace Hub or cache."""

def get_model_path(model_name: str) -> str:
    """Get local path to cached model."""

cache Module

Caching utilities for model responses and intermediate results.

torch Module

wimarka.utils.torch.get_device()
wimarka.utils.torch.move_model_to_device(model, device=None)

Device Management:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Type Hints

Common type hints used throughout WiMarka:

from typing import List, Tuple, Dict, Optional

# Sentence with language tag
TaggedSentence = str  # Format: "[LANG] sentence text"

# Error list
ErrorList = List[str]

# Scores tuple
Scores = Tuple[float, float, float]  # (fluency, adequacy, overall)

# Results structure
ResultsDict = Dict[str, List]
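The TaggedSentence format can be illustrated with a small sketch of tagging and tag validation. The functions below are stand-ins written for this page, not the code in wimarka.utils.helper: the exact output of add_tag and the exception raised by check_tag are assumptions. The allowed codes follow the wmk_eval parameter documentation, where EN is listed as a source language only.

```python
SOURCE_LANGS = {"EN", "CEB", "ILO", "TGT"}
TARGET_LANGS = {"CEB", "ILO", "TGT"}  # EN is documented as source-only


def add_tag_sketch(line: str, tag: str) -> str:
    """Prefix a sentence with its language tag: "[LANG] sentence text"."""
    return f"[{tag}] {line.strip()}"


def check_tag_sketch(src_tag: str, tgt_tag: str) -> None:
    """Reject unknown language codes (raising ValueError is an assumption)."""
    if src_tag not in SOURCE_LANGS:
        raise ValueError(f"Unsupported source language: {src_tag}")
    if tgt_tag not in TARGET_LANGS:
        raise ValueError(f"Unsupported target language: {tgt_tag}")


check_tag_sketch("EN", "CEB")
tagged = add_tag_sketch("Good morning.\n", "EN")
```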

Constants

Language Codes

SUPPORTED_LANGUAGES = ['EN', 'CEB', 'ILO', 'TGT']

LANGUAGE_NAMES = {
    'EN': 'English',
    'CEB': 'Cebuano',
    'ILO': 'Ilocano',
    'TGT': 'Tagalog'
}

Score Ranges

MIN_SCORE = 0
MAX_SCORE = 100

# Quality thresholds
EXCELLENT_THRESHOLD = 90
GOOD_THRESHOLD = 75
FAIR_THRESHOLD = 60
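One plausible way to apply these thresholds is to bucket a score into a quality label. The sketch below is an assumption about how such a mapping could look, not the behavior of wimarka.tasks.scoring.classify: the label strings, the "poor" bucket, and the inclusive lower bounds are all hypothetical.

```python
EXCELLENT_THRESHOLD = 90
GOOD_THRESHOLD = 75
FAIR_THRESHOLD = 60


def classify_sketch(score: float) -> str:
    """Map a 0-100 score to a label (labels and bounds are assumed)."""
    if not 0 <= score <= 100:
        raise ValueError(f"Score out of range: {score}")
    if score >= EXCELLENT_THRESHOLD:
        return "excellent"
    if score >= GOOD_THRESHOLD:
        return "good"
    if score >= FAIR_THRESHOLD:
        return "fair"
    return "poor"


labels = [classify_sketch(s) for s in (95, 75, 60, 59.9)]
```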

Configuration

Model Identifiers

Models are identified by HuggingFace repository names or local paths.

See wimarka/config.py for complete configuration.

Best Practices

Using the API

  1. Import Correctly:

    from wimarka.main import wmk_eval, results  # Preferred: import the names you use
    import wimarka  # Less convenient: names must be fully qualified
    
  2. Handle Results:

    # Every call writes to the same global results dictionary,
    # so take a deep copy after each run you want to keep
    from copy import deepcopy
    from wimarka.main import wmk_eval, results

    wmk_eval('src1.txt', 'EN', 'tgt1.txt', 'CEB')
    results1 = deepcopy(results)

    wmk_eval('src2.txt', 'EN', 'tgt2.txt', 'CEB')
    results2 = deepcopy(results)
    
  3. Error Handling:

    from wimarka.main import wmk_eval

    try:
        wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')
    except ValueError as e:
        print(f"Validation error: {e}")
    except FileNotFoundError as e:
        print(f"File not found: {e}")
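The result-handling practice above can be wrapped in a small helper so that each run returns its own independent copy. This is a sketch only: evaluate_and_snapshot is a hypothetical name, and the evaluation function and shared dictionary are passed in as arguments so the helper makes no assumption about how WiMarka resets the dictionary between runs.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, List


def evaluate_and_snapshot(eval_fn: Callable[..., None],
                          shared_results: Dict[str, List[Any]],
                          *args: str) -> Dict[str, List[Any]]:
    """Run eval_fn(*args), then return a deep copy of shared_results.

    Hypothetical helper: in practice eval_fn would be wmk_eval and
    shared_results the global results dictionary.
    """
    eval_fn(*args)
    # The copy is detached, so later runs cannot change it.
    return deepcopy(shared_results)
```

A call would look like evaluate_and_snapshot(wmk_eval, results, 'src1.txt', 'EN', 'tgt1.txt', 'CEB'). Note that a name imported with "from wimarka.main import results" only tracks in-place changes; if the module ever rebinds results to a new dictionary, the imported name would keep pointing at the old one.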
    

Extending the API

To add custom processing:

from wimarka.main import wmk_eval, results

def custom_evaluation(src, tgt, lang, custom_metric):
    """Run the standard evaluation, then score each pair with custom_metric."""
    # Run standard evaluation (the source language is fixed to English here)
    wmk_eval(src, 'EN', tgt, lang)

    # Apply the custom metric to each tagged source/target sentence pair
    results['custom_score'] = [
        custom_metric(source, target)
        for source, target in zip(results['source'], results['target'])
    ]

    return results

See Also