API Reference
This page provides complete API documentation for WiMarka’s public and internal interfaces.
Main Module (wimarka.main)
wmk_eval
wimarka.main.wmk_eval(src_file_path: str, src_lang: str, tgt_file_path: str, tgt_lang: str)
Main evaluation function that orchestrates the entire pipeline.
Function Signature:
def wmk_eval(src_file_path: str, src_lang: str,
             tgt_file_path: str, tgt_lang: str) -> None
Parameters:
src_file_path (str): Absolute or relative path to the source text file
src_lang (str): Source language code (EN, CEB, ILO, TGT)
tgt_file_path (str): Absolute or relative path to the target translation file
tgt_lang (str): Target language code (CEB, ILO, TGT)
Returns: None. Results are stored in the module-level results dictionary and printed.
Raises:
ValueError: If the source and target files have different line counts
FileNotFoundError: If an input file does not exist
Example:
from wimarka.main import wmk_eval
wmk_eval(
    src_file_path='data/english.txt',
    src_lang='EN',
    tgt_file_path='data/cebuano.txt',
    tgt_lang='CEB'
)
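Because wmk_eval is documented to raise FileNotFoundError and ValueError, it can be useful to run the same checks before starting a long evaluation. The following is a minimal sketch, not part of WiMarka: check_parallel_files is a hypothetical helper, and it assumes the inputs are UTF-8 text with one sentence per line.

```python
from pathlib import Path


def check_parallel_files(src_file_path: str, tgt_file_path: str) -> int:
    """Hypothetical pre-flight check; not part of the WiMarka API.

    Returns the shared line count, or raises the same exception types
    that wmk_eval is documented to raise.
    """
    # Both inputs must exist before any counting is attempted.
    for path in (src_file_path, tgt_file_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Input file not found: {path}")

    # Assumption: files are UTF-8 with one sentence per line.
    with open(src_file_path, encoding="utf-8") as src:
        src_count = sum(1 for _ in src)
    with open(tgt_file_path, encoding="utf-8") as tgt:
        tgt_count = sum(1 for _ in tgt)

    if src_count != tgt_count:
        raise ValueError(
            f"Line count mismatch: {src_count} source vs {tgt_count} target"
        )
    return src_count
```

Calling check_parallel_files('data/english.txt', 'data/cebuano.txt') immediately before wmk_eval would surface a mismatch before any model is loaded.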
results Dictionary
Global dictionary storing evaluation results.
Structure:
results = {
    'source': List[str],                # Source sentences (with tags)
    'target': List[str],                # Target sentences (with tags)
    'errors': List[List[str]],          # Detected errors per sentence
    'fluency_score': List[float],       # Fluency scores (0-100)
    'adequacy_score': List[float],      # Adequacy scores (0-100)
    'overall_score': List[float],       # Overall scores (0-100)
    'explanation': List[str],           # Human-readable explanations
    'corrected_translation': List[str]  # Suggested corrections
}
Access Pattern:
from wimarka.main import wmk_eval, results
wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')
# Access results
for i in range(len(results['source'])):
    print(f"Score: {results['overall_score'][i]}")
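Since results is column-oriented (one list per field, aligned by sentence index), it is often convenient to view it as one record per sentence. The sketch below shows one way to do that and to save the records as CSV. iter_rows and write_results_csv are hypothetical helpers, not WiMarka functions, and joining the per-sentence error list with "; " is an arbitrary choice made for this example.

```python
import csv
from typing import Dict, Iterator, List


def iter_rows(results: Dict[str, List]) -> Iterator[dict]:
    """Yield one dict per sentence from the column-oriented results."""
    keys = list(results)
    # zip() aligns the i-th entry of every column into one record.
    for values in zip(*(results[key] for key in keys)):
        yield dict(zip(keys, values))


def write_results_csv(results: Dict[str, List], path: str) -> None:
    """Write one CSV row per sentence (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(results))
        writer.writeheader()
        for row in iter_rows(results):
            # Flatten the per-sentence error list into a single cell.
            if isinstance(row.get("errors"), list):
                row["errors"] = "; ".join(row["errors"])
            writer.writerow(row)
```

After a call to wmk_eval, write_results_csv(results, 'evaluation.csv') would produce one row per sentence pair; the output filename here is only an example.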
CLI Module (wimarka.cli)
main
@click.command()
@click.option('--src_file_path', required=True,
              help='Path to source text file')
@click.option('--src_lang', required=True,
              help='Source language code')
@click.option('--tgt_file_path', required=True,
              help='Path to target text file')
@click.option('--tgt_lang', required=True,
              help='Target language code')
def main(src_file_path, src_lang, tgt_file_path, tgt_lang):
    """Command-line interface for WiMarka evaluation."""
Entry point for CLI execution. Wraps wmk_eval() with Click decorators.
Task Modules
error_detection Module
wimarka.tasks.error_detection.format_tagged_sentence_using_spans(original_text: str, token_spans, labels)
Main Function:
def error_detection(src_line: str, tgt_line: str) -> List[str]
Detects translation errors between source and target sentences.
Parameters:
src_line: Source sentence with language tag
tgt_line: Target sentence with language tag
Returns: List of detected error descriptions
Error Types:
Lexical errors (wrong word choice)
Syntactic errors (grammar issues)
Semantic errors (meaning loss)
Morphological errors (wrong affixes)
Omissions/additions
scoring Module
Main Function:
def scoring(src_line: str, tgt_line: str, errors: List[str]) -> Tuple[float, float, float]
Calculates quality scores for the translation.
Parameters:
src_line: Source sentence with tag
tgt_line: Target sentence with tag
errors: List of detected errors from error_detection
Returns: Tuple of (fluency_score, adequacy_score, overall_score)
All scores are floats in range [0, 100]
Overall score = (fluency + adequacy) / 2
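The averaging rule above can be written out directly. This is a minimal sketch of the stated formula only; how the scoring module actually derives the fluency and adequacy values is not covered here, and the range check is an added safeguard based on the documented [0, 100] range.

```python
def overall_score(fluency: float, adequacy: float) -> float:
    """Mean of fluency and adequacy, per the documented formula."""
    # Guard against inputs outside the documented [0, 100] range.
    for name, value in (("fluency", fluency), ("adequacy", adequacy)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
    return (fluency + adequacy) / 2
```

For example, a fluency of 80 and an adequacy of 90 give an overall score of 85.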
explanation Module
wimarka.tasks.explanation.generate_explanation(src, tgt, errors, fluency, adequacy, overall)
Main Function:
def generate_explanation(src_line: str, tgt_line: str, errors: List[str],
                         fluency: float, adequacy: float, overall: float) -> str
Generates human-readable explanation of the evaluation.
Parameters:
src_line: Source sentence
tgt_line: Target sentence
errors: Detected errors
fluency: Fluency score
adequacy: Adequacy score
overall: Overall score
Returns: Natural language explanation string
correction Module
Main Function:
def generate_correction(src_line: str, tgt_line: str,
                        errors: List[str], comments: str) -> str
Generates corrected translation suggestion.
Parameters:
src_line: Source sentence
tgt_line: Target sentence
errors: Detected errors
comments: Explanation from explanation module
Returns: Suggested corrected translation
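The signatures above imply a data flow: detected errors feed scoring, errors and scores feed the explanation, and the explanation feeds the correction. The sketch below illustrates that chaining for a single sentence pair. evaluate_pair is a hypothetical function, the stage functions are passed in as callables so the example does not depend on WiMarka being installed, and the exact orchestration inside wmk_eval may differ.

```python
from typing import Callable, Dict, List, Tuple


def evaluate_pair(
    src_line: str,
    tgt_line: str,
    detect: Callable[[str, str], List[str]],
    score: Callable[[str, str, List[str]], Tuple[float, float, float]],
    explain: Callable[..., str],
    correct: Callable[[str, str, List[str], str], str],
) -> Dict[str, object]:
    """Chain the four task stages for one sentence pair (illustrative)."""
    # Stage 1: detect errors between source and target.
    errors = detect(src_line, tgt_line)
    # Stage 2: score the pair, taking the detected errors into account.
    fluency, adequacy, overall = score(src_line, tgt_line, errors)
    # Stage 3: explain the evaluation using errors and scores.
    explanation = explain(src_line, tgt_line, errors, fluency, adequacy, overall)
    # Stage 4: suggest a correction, passing the explanation as comments.
    corrected = correct(src_line, tgt_line, errors, explanation)
    return {
        "errors": errors,
        "fluency_score": fluency,
        "adequacy_score": adequacy,
        "overall_score": overall,
        "explanation": explanation,
        "corrected_translation": corrected,
    }
```

Passing the stages as arguments also makes it straightforward to substitute stubs when testing code that consumes the results.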
Utility Modules
helper Module
Key Functions:
def check_tag(src_lang: str, tgt_lang: str) -> None:
    """Validate language codes."""

def add_tag(sentence: str, lang: str) -> str:
    """Add language tag to sentence."""

def printEvaluationResults(results: dict) -> None:
    """Print formatted evaluation results."""
logger Module
Main Function:
def setup_logger() -> logging.Logger:
    """Configure and return logger instance."""
Usage:
from wimarka.utils.logger import setup_logger
logger = setup_logger()
logger.info("Processing started")
logger.error("An error occurred")
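For readers unfamiliar with this pattern, the following sketch shows what a "configure and return" logger function typically looks like with the standard logging module. It is not WiMarka's implementation: setup_logger_sketch, the "wimarka" logger name, and the message format are all assumptions made for illustration.

```python
import logging


def setup_logger_sketch(name: str = "wimarka",
                        level: int = logging.INFO) -> logging.Logger:
    """Illustrative stand-in for setup_logger(); not the real implementation."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

The handler check matters because logging.getLogger returns the same object for the same name, so calling a setup function from several modules would otherwise print each message multiple times.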
model Module
Key Functions:
def load_model(model_name: str):
    """Load model from HuggingFace Hub or cache."""

def get_model_path(model_name: str) -> str:
    """Get local path to cached model."""
cache Module
Caching utilities for model responses and intermediate results.
torch Module
Device Management:
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Type Hints
Common type hints used throughout WiMarka:
from typing import List, Tuple, Dict, Optional
# Sentence with language tag
TaggedSentence = str # Format: "[LANG] sentence text"
# Error list
ErrorList = List[str]
# Scores tuple
Scores = Tuple[float, float, float] # (fluency, adequacy, overall)
# Results structure
ResultsDict = Dict[str, List]
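The TaggedSentence format noted above ("[LANG] sentence text") can be split back into its parts with a small regular expression. This is a sketch under that stated format only: split_tagged_sentence is a hypothetical helper, not a WiMarka function, and it assumes the tag consists of uppercase ASCII letters followed by whitespace.

```python
import re
from typing import Tuple

# Assumed format, taken from the TaggedSentence comment: "[LANG] sentence text".
_TAG_PATTERN = re.compile(r"^\[([A-Z]+)\]\s+(.*)$", re.DOTALL)


def split_tagged_sentence(tagged: str) -> Tuple[str, str]:
    """Split "[LANG] text" into (lang, text); hypothetical helper."""
    match = _TAG_PATTERN.match(tagged)
    if match is None:
        raise ValueError(f"Not a tagged sentence: {tagged!r}")
    return match.group(1), match.group(2)
```

This can be handy when displaying entries of results['source'] or results['target'] without their language tags.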
Constants
Language Codes
SUPPORTED_LANGUAGES = ['EN', 'CEB', 'ILO', 'TGT']
LANGUAGE_NAMES = {
    'EN': 'English',
    'CEB': 'Cebuano',
    'ILO': 'Ilocano',
    'TGT': 'Tagalog'
}
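These constants can back a simple language-pair check before evaluation. The sketch below is an assumption-laden illustration rather than the behaviour of check_tag, which is not specified on this page. validate_language_pair and TARGET_LANGUAGES are hypothetical names, and treating EN as source-only is inferred from the wmk_eval parameter list, which gives (CEB, ILO, TGT) for tgt_lang.

```python
SUPPORTED_LANGUAGES = ['EN', 'CEB', 'ILO', 'TGT']
# Assumption: EN is accepted only as a source language, inferred from the
# wmk_eval parameter documentation. TARGET_LANGUAGES is a hypothetical name.
TARGET_LANGUAGES = ['CEB', 'ILO', 'TGT']


def validate_language_pair(src_lang: str, tgt_lang: str) -> None:
    """Raise ValueError for unsupported codes (illustrative only)."""
    if src_lang not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported source language {src_lang!r}; "
            f"expected one of {SUPPORTED_LANGUAGES}"
        )
    if tgt_lang not in TARGET_LANGUAGES:
        raise ValueError(
            f"Unsupported target language {tgt_lang!r}; "
            f"expected one of {TARGET_LANGUAGES}"
        )
```

The check is case-sensitive, matching the uppercase codes listed above.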
Score Ranges
MIN_SCORE = 0
MAX_SCORE = 100
# Quality thresholds
EXCELLENT_THRESHOLD = 90
GOOD_THRESHOLD = 75
FAIR_THRESHOLD = 60
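One plausible use of these thresholds is mapping a numeric score to a quality band. The sketch below assumes each threshold is an inclusive lower bound and uses band names derived from the constant names; quality_label and the "below fair" band are hypothetical and not taken from WiMarka.

```python
EXCELLENT_THRESHOLD = 90
GOOD_THRESHOLD = 75
FAIR_THRESHOLD = 60


def quality_label(score: float) -> str:
    """Map a 0-100 score to a band; thresholds assumed inclusive."""
    if score >= EXCELLENT_THRESHOLD:
        return "excellent"
    if score >= GOOD_THRESHOLD:
        return "good"
    if score >= FAIR_THRESHOLD:
        return "fair"
    # Hypothetical name for scores under the lowest threshold.
    return "below fair"
```

Applied to results['overall_score'], this gives a quick per-sentence summary alongside the raw numbers.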
Configuration
Model Identifiers
Models are identified by HuggingFace repository names or local paths.
See wimarka/config.py for complete configuration.
Best Practices
Using the API
Import Correctly:
from wimarka.main import wmk_eval, results  # Correct
import wimarka  # Less efficient
Handle Results:
# Copy results if needed for multiple evaluations
from copy import deepcopy
from wimarka.main import wmk_eval, results

wmk_eval('src1.txt', 'EN', 'tgt1.txt', 'CEB')
results1 = deepcopy(results)  # Snapshot before the next run changes the global

wmk_eval('src2.txt', 'EN', 'tgt2.txt', 'CEB')
results2 = deepcopy(results)
Error Handling:
try:
    wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')
except ValueError as e:
    print(f"Validation error: {e}")
except FileNotFoundError as e:
    print(f"File not found: {e}")
Extending the API
To add custom processing:
from wimarka.main import wmk_eval, results
def custom_evaluation(src, tgt, lang, custom_metric):
    """Custom evaluation with additional metric."""
    # Run standard evaluation
    wmk_eval(src, 'EN', tgt, lang)
    # Add custom processing
    custom_scores = []
    for i in range(len(results['source'])):
        score = custom_metric(
            results['source'][i],
            results['target'][i]
        )
        custom_scores.append(score)
    # Add to results
    results['custom_score'] = custom_scores
    return results
See Also
Task Modules - Detailed task module documentation
Utility Modules - Utility module internals
Architecture - System architecture overview
Extending WiMarka - Extension guides