Task Modules
This document describes the four core task modules that implement WiMarka’s evaluation pipeline.
Overview
The task modules are located in wimarka/tasks/ and each implements a specific stage of the evaluation:
error_detection.py - Identifies translation errors
scoring.py - Calculates quality metrics
explanation.py - Generates human-readable explanations
correction.py - Suggests improvements
All task modules are designed to be:
Independent: Can function standalone
Composable: Each module's output feeds directly into the next stage
Extensible: Easy to modify or replace
error_detection Module
File: wimarka/tasks/error_detection.py
Purpose
Identifies specific translation errors by comparing source and target sentences.
Implementation
Uses LLM-based analysis to detect:
Lexical errors (wrong words)
Syntactic errors (grammar problems)
Semantic errors (meaning loss/distortion)
Morphological errors (incorrect word forms)
Omissions (missing information)
Additions (extra information)
Function Signature
def error_detection(src_line: str, tgt_line: str) -> List[str]
Parameters:
src_line: Source sentence with language tag (e.g., “[EN] Good morning!”)
tgt_line: Target sentence with language tag (e.g., “[CEB] Maayong buntag!”)
Returns: List of error descriptions
# No errors
[]
# With errors
['Semantic mismatch: time of day', 'Wrong verb tense']
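The tagged input format described under Parameters can be split into a language code and the sentence text. The helper below is a minimal sketch of that idea; split_language_tag is a hypothetical name and is not part of the documented WiMarka API, and the assumption that a tag is a run of letters in square brackets comes only from the examples on this page.

```python
import re
from typing import Tuple

# Hypothetical helper, not part of the documented WiMarka API.
# Assumes a tag looks like "[EN]" or "[CEB]" at the start of the line.
_TAG_PATTERN = re.compile(r"^\[([A-Za-z]+)\]\s*(.*)$", re.DOTALL)


def split_language_tag(line: str) -> Tuple[str, str]:
    """Split '[EN] Good morning!' into ('EN', 'Good morning!')."""
    match = _TAG_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Missing language tag: {line!r}")
    # Normalize the language code to upper case; keep the text untouched.
    return match.group(1).upper(), match.group(2)


print(split_language_tag("[EN] Good morning!"))     # ('EN', 'Good morning!')
print(split_language_tag("[CEB] Maayong buntag!"))  # ('CEB', 'Maayong buntag!')
```

Raising ValueError on an untagged line is a design choice of this sketch; the real modules may handle missing tags differently.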
Algorithm
Construct prompt with source and target sentences
Query LLM for error analysis
Parse response into error list
Return structured error descriptions
Prompt Template (simplified):
Analyze the following translation and identify any errors:
Source: {src_line}
Target: {tgt_line}
List all errors found, one per line.
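The prompt construction and response parsing steps can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: build_prompt and parse_error_list are hypothetical names, the LLM call itself is left out, and the handling of bullet markers and "no errors" replies is a guess at what a model might return.

```python
import re
from typing import List

# Simplified template taken from the text above.
PROMPT_TEMPLATE = (
    "Analyze the following translation and identify any errors:\n"
    "Source: {src_line}\n"
    "Target: {tgt_line}\n"
    "List all errors found, one per line."
)

# Assumption: replies like these mean the model found nothing.
NO_ERROR_MARKERS = {"none", "no errors", "no errors found"}

# Leading bullet ("-", "*", "•") or numbering ("1.", "2)") on a line.
_LIST_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")


def build_prompt(src_line: str, tgt_line: str) -> str:
    """Fill the template with the tagged source and target sentences."""
    return PROMPT_TEMPLATE.format(src_line=src_line, tgt_line=tgt_line)


def parse_error_list(response: str) -> List[str]:
    """Turn a one-error-per-line response into a list of descriptions."""
    errors = []
    for raw in response.splitlines():
        line = _LIST_PREFIX.sub("", raw).strip()
        if not line:
            continue  # skip blank lines
        if line.rstrip(".!").lower() in NO_ERROR_MARKERS:
            continue  # skip "No errors found." style replies
        errors.append(line)
    return errors


print(build_prompt("[EN] Good morning!", "[CEB] Maayong gabii!"))
print(parse_error_list("1. Semantic mismatch: time of day\n- Wrong verb tense\n"))
# ['Semantic mismatch: time of day', 'Wrong verb tense']
```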
Example Usage
from wimarka.tasks.error_detection import error_detection
src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!" # Wrong: gabii = evening
errors = error_detection(src, tgt)
print(errors)
# Output: ['Semantic mismatch: time of day (morning vs evening)']
scoring Module
File: wimarka/tasks/scoring.py
Purpose
Calculates three quality metrics: fluency, adequacy, and overall score.
Implementation
Uses error count and LLM-based assessment to score translations on a 0-100 scale.
Function Signature
def scoring(src_line: str, tgt_line: str, errors: List[str]) -> Tuple[float, float, float]
Parameters:
src_line: Source sentence with tag
tgt_line: Target sentence with tag
errors: List of errors from error_detection
Returns: Tuple of (fluency, adequacy, overall)
(95.0, 98.0, 96.5) # All scores 0-100
Scoring Methodology
Fluency Score:
Evaluates grammatical correctness
Considers naturalness in target language
Affected by syntactic and morphological errors
Adequacy Score:
Evaluates meaning preservation
Checks for omissions and additions
Affected by semantic errors
Overall Score:
Simple average: (fluency + adequacy) / 2
Provides quick quality assessment
Algorithm
Analyze error list for severity
Query LLM for fluency assessment
Query LLM for adequacy assessment
Calculate overall score as average
Return all three scores
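The steps above can be sketched with the two LLM assessments passed in as plain callables. Everything here is illustrative: combine_scores, clamp, and the Assessor type are hypothetical names, and clamping to the 0-100 range is an assumption added for safety, not documented behavior. Only the averaging rule comes from the text.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a function that returns a 0-100 score for a sentence pair.
Assessor = Callable[[str, str, List[str]], float]


def clamp(score: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep a score inside the documented 0-100 range (assumed safeguard)."""
    return max(low, min(high, score))


def combine_scores(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    assess_fluency: Assessor,
    assess_adequacy: Assessor,
) -> Tuple[float, float, float]:
    """Return (fluency, adequacy, overall), with overall as the simple average."""
    fluency = clamp(assess_fluency(src_line, tgt_line, errors))
    adequacy = clamp(assess_adequacy(src_line, tgt_line, errors))
    overall = (fluency + adequacy) / 2
    return fluency, adequacy, overall


# Stub assessors stand in for the two LLM queries.
scores = combine_scores(
    "[EN] Good morning!",
    "[CEB] Maayong gabii!",
    ["Semantic mismatch: time of day"],
    assess_fluency=lambda src, tgt, errs: 95.0,
    assess_adequacy=lambda src, tgt, errs: 40.0,
)
print(scores)  # (95.0, 40.0, 67.5)
```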
Example Usage
from wimarka.tasks.scoring import scoring
src = "[EN] Good morning!"
tgt = "[CEB] Maayong buntag!"
errors = [] # No errors
fluency, adequacy, overall = scoring(src, tgt, errors)
print(f"Scores: F={fluency}, A={adequacy}, O={overall}")
# Output: Scores: F=100.0, A=100.0, O=100.0
explanation Module
File: wimarka/tasks/explanation.py
Purpose
Generates human-readable explanations of the evaluation results.
Implementation
Synthesizes information from all previous stages into coherent natural language.
Function Signature
def generate_explanation(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    fluency: float,
    adequacy: float,
    overall: float
) -> str
Parameters:
src_line: Source sentence
tgt_line: Target sentence
errors: Detected errors
fluency: Fluency score
adequacy: Adequacy score
overall: Overall score
Returns: Natural language explanation string
Explanation Components
A good explanation includes:
Overall Assessment: Quality judgment
Specific Issues: References to detected errors
Score Context: Why scores are high/low
Constructive Feedback: Actionable insights
Algorithm
Construct context from all inputs
Query LLM for explanation generation
Format and clean response
Return explanation string
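As a rough illustration of the first and third steps, the sketch below assembles the context that would be sent to the LLM and tidies up a raw reply. The function names build_explanation_context and clean_response are hypothetical, and the exact layout of the context string is an assumption; the LLM query itself is omitted.

```python
from typing import List


def build_explanation_context(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    fluency: float,
    adequacy: float,
    overall: float,
) -> str:
    """Gather the outputs of the earlier stages into one prompt context."""
    if errors:
        error_block = "\n".join(f"- {error}" for error in errors)
    else:
        error_block = "- (none detected)"
    return (
        f"Source: {src_line}\n"
        f"Target: {tgt_line}\n"
        f"Errors:\n{error_block}\n"
        f"Fluency: {fluency:.1f}\n"
        f"Adequacy: {adequacy:.1f}\n"
        f"Overall: {overall:.1f}"
    )


def clean_response(response: str) -> str:
    """Collapse whitespace and drop one pair of wrapping quotes, if present."""
    text = " ".join(response.split())
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    return text


print(build_explanation_context(
    "[EN] Good morning!", "[CEB] Maayong gabii!",
    ["Semantic mismatch: time of day"], 95.0, 40.0, 67.5,
))
print(clean_response('  "The translation\n is fluent."  '))
# The translation is fluent.
```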
Example Usage
from wimarka.tasks.explanation import generate_explanation
src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!"
errors = ['Semantic mismatch: time of day']
explanation = generate_explanation(
    src, tgt, errors,
    fluency=95.0,
    adequacy=40.0,
    overall=67.5
)
print(explanation)
# Output: "The translation has high fluency but low adequacy due to
# incorrect time reference. 'Morning' was translated as
# 'gabii' (evening)."
correction Module
File: wimarka/tasks/correction.py
Purpose
Generates improved translation suggestions based on detected errors.
Implementation
Uses error information and explanations to suggest corrections that address identified issues.
Function Signature
def generate_correction(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    comments: str
) -> str
Parameters:
src_line: Source sentence
tgt_line: Target sentence
errors: Detected errors
comments: Explanation from explanation module
Returns: Suggested corrected translation
Correction Strategy
Error-Focused: Addresses specific detected errors
Conservative: Changes only what’s needed
Validated: Ensures correction is better than original
Algorithm
Analyze errors and explanation
Identify parts requiring correction
Generate improved translation via LLM
Validate correction quality
Return corrected sentence
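One possible reading of the validation step is to re-score the candidate and keep it only if it beats the original. The sketch below illustrates that idea; it is an assumption about how validation could work, not a description of the actual module. choose_translation and ScoreFn are hypothetical names, and the toy scorer stands in for a real call to the scoring stage.

```python
from typing import Callable

# Hypothetical type: returns an overall 0-100 score for a sentence pair.
ScoreFn = Callable[[str, str], float]


def choose_translation(
    src_line: str,
    tgt_line: str,
    candidate: str,
    score_overall: ScoreFn,
) -> str:
    """Return the candidate only if it scores strictly higher than the original."""
    candidate = candidate.strip()
    if not candidate or candidate == tgt_line.strip():
        return tgt_line  # nothing new to validate
    if score_overall(src_line, candidate) > score_overall(src_line, tgt_line):
        return candidate
    return tgt_line  # conservative: keep the original on ties or regressions


# Toy scorer with fixed values, used only to make the example runnable.
TOY_SCORES = {"[CEB] Maayong gabii!": 67.5, "[CEB] Maayong buntag!": 100.0}


def toy_score(src_line: str, tgt_line: str) -> float:
    return TOY_SCORES.get(tgt_line, 0.0)


print(choose_translation(
    "[EN] Good morning!", "[CEB] Maayong gabii!", "[CEB] Maayong buntag!", toy_score
))
# [CEB] Maayong buntag!
```

Keeping the original on a tie matches the conservative strategy listed above: a change is made only when it is measurably better.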
Example Usage
from wimarka.tasks.correction import generate_correction
src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!"
errors = ['Semantic mismatch: time of day']
comments = "Wrong time of day: evening instead of morning"
correction = generate_correction(src, tgt, errors, comments)
print(correction)
# Output: "Maayong buntag!"
Task Module Integration
Complete Pipeline Example
from wimarka.tasks import (
    error_detection, scoring,
    explanation, correction
)
# Input
src = "[EN] How are you today?"
tgt = "[CEB] Kumusta ka karon?"
# Stage 1: Error Detection
errors = error_detection.error_detection(src, tgt)
print(f"Errors: {errors}")
# Stage 2: Scoring
fluency, adequacy, overall = scoring.scoring(src, tgt, errors)
print(f"Scores: F={fluency}, A={adequacy}, O={overall}")
# Stage 3: Explanation
explanation_text = explanation.generate_explanation(
    src, tgt, errors, fluency, adequacy, overall
)
print(f"Explanation: {explanation_text}")
# Stage 4: Correction
corrected = correction.generate_correction(
    src, tgt, errors, explanation_text
)
print(f"Corrected: {corrected}")
Data Flow Between Modules
 src_line, tgt_line
         │
         ▼
┌──────────────────┐
│ error_detection  │
└────────┬─────────┘
         │ errors[]
         ▼
┌──────────────────┐
│     scoring      │
└────────┬─────────┘
         │ fluency, adequacy, overall
         ▼
┌──────────────────┐
│   explanation    │
└────────┬─────────┘
         │ explanation_text
         ▼
┌──────────────────┐
│    correction    │
└────────┬─────────┘
         │ corrected_translation
         ▼
      Results
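The data flow above can also be expressed as a single function that threads each stage's output into the next. This is a sketch with the four stages injected as callables so that it runs without WiMarka installed; run_pipeline and EvaluationResult are hypothetical names, and the stub stages return fixed values purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvaluationResult:
    """Hypothetical container for everything the four stages produce."""
    errors: List[str]
    fluency: float
    adequacy: float
    overall: float
    explanation: str
    correction: str


def run_pipeline(
    src_line: str,
    tgt_line: str,
    detect: Callable[[str, str], List[str]],
    score: Callable[[str, str, List[str]], Tuple[float, float, float]],
    explain: Callable[[str, str, List[str], float, float, float], str],
    correct: Callable[[str, str, List[str], str], str],
) -> EvaluationResult:
    """Run the stages in order, passing each output to the next stage."""
    errors = detect(src_line, tgt_line)
    fluency, adequacy, overall = score(src_line, tgt_line, errors)
    explanation = explain(src_line, tgt_line, errors, fluency, adequacy, overall)
    correction = correct(src_line, tgt_line, errors, explanation)
    return EvaluationResult(errors, fluency, adequacy, overall, explanation, correction)


# Stub stages with fixed outputs, standing in for the real task modules.
result = run_pipeline(
    "[EN] Good morning!",
    "[CEB] Maayong gabii!",
    detect=lambda src, tgt: ["Semantic mismatch: time of day"],
    score=lambda src, tgt, errs: (95.0, 40.0, 67.5),
    explain=lambda src, tgt, errs, f, a, o: "Fluent, but the time of day is wrong.",
    correct=lambda src, tgt, errs, comments: "Maayong buntag!",
)
print(result)
```

Passing the stages as arguments keeps the sketch consistent with the independence goal stated in the overview: any stage can be swapped without touching the others.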
Customization
Replacing a Task Module
To replace a module with custom logic:
# Custom error detection
def my_error_detection(src, tgt):
    """Custom implementation."""
    errors = []
    # Your logic here
    return errors
# Use in evaluation
from wimarka import main
# Replace function
main.tasks.error_detection.error_detection = my_error_detection
# Run evaluation
main.wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')
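As an example of what custom logic might look like, the sketch below is a small rule-based detector with the same (src, tgt) signature. The two rules and the error message wording are illustrative assumptions, and _strip_tag is a hypothetical helper; a real replacement would likely need language-aware checks.

```python
import re
from typing import List


def _strip_tag(line: str) -> str:
    """Remove a leading '[XX]' language tag (hypothetical helper)."""
    return re.sub(r"^\s*\[[^\]]+\]\s*", "", line).strip()


def my_error_detection(src: str, tgt: str) -> List[str]:
    """Rule-based detector: flags empty or untranslated targets only."""
    src_text = _strip_tag(src)
    tgt_text = _strip_tag(tgt)
    errors = []
    if not tgt_text:
        errors.append("Omission: target sentence is empty")
    elif tgt_text.casefold() == src_text.casefold():
        errors.append("Untranslated: target is identical to source")
    return errors


print(my_error_detection("[EN] Good morning!", "[CEB] Maayong buntag!"))  # []
print(my_error_detection("[EN] Good morning!", "[CEB] Good morning!"))
# ['Untranslated: target is identical to source']
```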
Adding a New Task
To add a new evaluation task:
Create new file in wimarka/tasks/
Implement main function
Import in main.py
Integrate into pipeline
Example:
# wimarka/tasks/style_analysis.py
def analyze_style(src_line: str, tgt_line: str) -> str:
    """Analyze translation style."""
    style_report = ""  # Placeholder: implementation goes here
    return style_report
# In main.py
from wimarka.tasks import style_analysis
# Add to pipeline
style_report = style_analysis.analyze_style(src_line, tgt_line)
results['style'].append(style_report)
Performance Considerations
Optimization Opportunities
Batch Processing: Process multiple sentences in one LLM call
Caching: Cache LLM responses for identical inputs
Parallel Execution: Run independent tasks in parallel
Current Implementation: Sequential for simplicity and determinism
Future: Async/parallel processing for speed
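Two of these ideas, caching and parallel execution, can be sketched with the standard library alone. The example is hedged accordingly: query_llm is a hypothetical stand-in for whatever function actually calls the model, and the prompt format is invented for the demonstration. It only shows the mechanics of functools.lru_cache and ThreadPoolExecutor.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List, Tuple


def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real (slow, costly) LLM call."""
    return f"response to: {prompt}"


@lru_cache(maxsize=1024)
def cached_query(prompt: str) -> str:
    """Identical prompts are answered from the cache after the first call."""
    return query_llm(prompt)


def evaluate_pairs(pairs: List[Tuple[str, str]], max_workers: int = 4) -> List[str]:
    """Query sentence pairs concurrently; results keep the input order."""
    prompts = [f"Source: {src}\nTarget: {tgt}" for src, tgt in pairs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(cached_query, prompts))


pairs = [
    ("[EN] Good morning!", "[CEB] Maayong buntag!"),
    ("[EN] How are you today?", "[CEB] Kumusta ka karon?"),
]
for response in evaluate_pairs(pairs):
    print(response)
```

Caching only helps when the model is queried deterministically; with sampling enabled, a cached reply hides the variation a fresh call would produce. Threads suit this workload because the time is spent waiting on the network rather than computing.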
Best Practices
For Task Module Development
Keep Modules Independent: Minimal dependencies between tasks
Clear Interfaces: Well-defined input/output types
Error Handling: Graceful failure, informative errors
Documentation: Docstrings for all functions
Testing: Unit tests for each module
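The testing practice can be illustrated with a unit test that replaces the LLM with a fake, so the test is fast and deterministic. This is a sketch, not the project's test suite: detect_with is a hypothetical function that takes the LLM call as a parameter, which is one way to keep a task module testable in isolation.

```python
import unittest
from typing import Callable, List


def detect_with(query_llm: Callable[[str], str], src_line: str, tgt_line: str) -> List[str]:
    """Hypothetical detector that receives its LLM call as an argument."""
    response = query_llm(f"Source: {src_line}\nTarget: {tgt_line}")
    return [line.strip() for line in response.splitlines() if line.strip()]


class DetectWithTests(unittest.TestCase):
    def test_one_error_per_line(self):
        fake = lambda prompt: "Wrong verb tense\n\nSemantic mismatch\n"
        self.assertEqual(
            detect_with(fake, "[EN] a", "[CEB] b"),
            ["Wrong verb tense", "Semantic mismatch"],
        )

    def test_blank_response_means_no_errors(self):
        self.assertEqual(detect_with(lambda prompt: "  \n", "[EN] a", "[CEB] b"), [])

    def test_prompt_contains_both_sentences(self):
        seen = []

        def fake(prompt: str) -> str:
            seen.append(prompt)  # record what the detector sent
            return ""

        detect_with(fake, "[EN] a", "[CEB] b")
        self.assertIn("[EN] a", seen[0])
        self.assertIn("[CEB] b", seen[0])


# Run the tests programmatically so the example works in any context.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DetectWithTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```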
See Also
API Reference - Complete API documentation
Architecture - System architecture details
Extending WiMarka - Guide to extending WiMarka