Task Modules

This document describes the four core task modules that implement WiMarka’s evaluation pipeline.

Overview

The task modules are located in wimarka/tasks/ and each implements a specific stage of the evaluation:

error_detection.py - Identifies translation errors
scoring.py - Calculates quality metrics
explanation.py - Generates human-readable explanations
correction.py - Suggests improvements

All task modules are designed to be:

Independent: Can function standalone
Composable: Output feeds naturally into next stage
Extensible: Easy to modify or replace

error_detection Module

File: wimarka/tasks/error_detection.py

Purpose

Identifies specific translation errors by comparing source and target sentences.

Implementation

Uses LLM-based analysis to detect:

Lexical errors (wrong words)
Syntactic errors (grammar problems)
Semantic errors (meaning loss/distortion)
Morphological errors (incorrect word forms)
Omissions (missing information)
Additions (extra information)

Function Signature

def error_detection(src_line: str, tgt_line: str) -> List[str]

Parameters:

src_line: Source sentence with language tag (e.g., "[EN] Good morning!")
tgt_line: Target sentence with language tag (e.g., "[CEB] Maayong buntag!")

Returns: List of error descriptions

# No errors
[]

# With errors
['Semantic mismatch: time of day', 'Wrong verb tense']

Algorithm

Construct prompt with source and target sentences
Query LLM for error analysis
Parse response into error list
Return structured error descriptions

Prompt Template (simplified):

Analyze the following translation and identify any errors:

Source: {src_line}
Target: {tgt_line}

List all errors found, one per line.
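The four algorithm steps above can be sketched as follows. This is a minimal illustration, not WiMarka's actual code: `query_llm` is a hypothetical callable standing in for whatever LLM client the project uses, `detect_errors` and `parse_error_lines` are illustrative names, and the parsing rules (strip bullet prefixes, treat "none" as an empty result) are assumptions.

```python
from typing import Callable, List

PROMPT_TEMPLATE = (
    "Analyze the following translation and identify any errors:\n\n"
    "Source: {src_line}\n"
    "Target: {tgt_line}\n\n"
    "List all errors found, one per line."
)


def parse_error_lines(response: str) -> List[str]:
    """Turn a raw LLM response into a list of error descriptions."""
    errors = []
    for line in response.splitlines():
        # Drop surrounding whitespace and common bullet prefixes.
        cleaned = line.strip().lstrip("-* ").strip()
        # Assumption: the model answers "none" or "no errors" for a clean translation.
        if cleaned and cleaned.lower().rstrip(".") not in {"none", "no errors"}:
            errors.append(cleaned)
    return errors


def detect_errors(src_line: str, tgt_line: str,
                  query_llm: Callable[[str], str]) -> List[str]:
    """Build the prompt, query the (hypothetical) LLM, and parse the reply."""
    prompt = PROMPT_TEMPLATE.format(src_line=src_line, tgt_line=tgt_line)
    return parse_error_lines(query_llm(prompt))


def fake_llm(prompt: str) -> str:
    """Stub LLM so the sketch runs without network access."""
    return "- Semantic mismatch: time of day\n- Wrong verb tense\n"


print(detect_errors("[EN] Good morning!", "[CEB] Maayong gabii!", fake_llm))
# ['Semantic mismatch: time of day', 'Wrong verb tense']
```

Passing the LLM client in as a parameter keeps the parsing logic testable without a live model.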

Example Usage

from wimarka.tasks.error_detection import error_detection

src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!"  # Wrong: gabii = evening

errors = error_detection(src, tgt)
print(errors)
# Output: ['Semantic mismatch: time of day (morning vs evening)']

scoring Module

File: wimarka/tasks/scoring.py

Purpose

Calculates three quality metrics: fluency, adequacy, and overall score.

Implementation

Combines the detected error list with LLM-based assessment to score translations on a 0-100 scale.

Function Signature

def scoring(src_line: str, tgt_line: str, errors: List[str]) -> Tuple[float, float, float]

Parameters:

src_line: Source sentence with tag
tgt_line: Target sentence with tag
errors: List of errors from error_detection

Returns: Tuple of (fluency, adequacy, overall)

(95.0, 98.0, 96.5)  # All scores 0-100

Scoring Methodology

Fluency Score:

Evaluates grammatical correctness
Considers naturalness in target language
Affected by syntactic and morphological errors

Adequacy Score:

Evaluates meaning preservation
Checks for omissions and additions
Affected by semantic errors

Overall Score:

Simple average: (fluency + adequacy) / 2
Provides quick quality assessment

Algorithm

Analyze error list for severity
Query LLM for fluency assessment
Query LLM for adequacy assessment
Calculate overall score as average
Return all three scores
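The final two steps reduce to simple arithmetic once the fluency and adequacy assessments have been parsed into numbers. The sketch below assumes that parsing has already happened; `clamp_score` and `combine_scores` are illustrative names, and clamping to the 0-100 range is an added assumption rather than documented behavior.

```python
from typing import Tuple


def clamp_score(value: float) -> float:
    """Keep a score inside the documented 0-100 range."""
    return max(0.0, min(100.0, float(value)))


def combine_scores(fluency: float, adequacy: float) -> Tuple[float, float, float]:
    """Return (fluency, adequacy, overall), with overall as the simple average."""
    fluency = clamp_score(fluency)
    adequacy = clamp_score(adequacy)
    overall = (fluency + adequacy) / 2
    return fluency, adequacy, overall


print(combine_scores(95.0, 98.0))  # (95.0, 98.0, 96.5)
print(combine_scores(95.0, 40.0))  # (95.0, 40.0, 67.5)
```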

Example Usage

from wimarka.tasks.scoring import scoring

src = "[EN] Good morning!"
tgt = "[CEB] Maayong buntag!"
errors = []  # No errors

fluency, adequacy, overall = scoring(src, tgt, errors)
print(f"Scores: F={fluency}, A={adequacy}, O={overall}")
# Output: Scores: F=100.0, A=100.0, O=100.0

explanation Module

File: wimarka/tasks/explanation.py

Purpose

Generates human-readable explanations of the evaluation results.

Implementation

Synthesizes information from all previous stages into coherent natural language.

Function Signature

def generate_explanation(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    fluency: float,
    adequacy: float,
    overall: float
) -> str

Parameters:

src_line: Source sentence
tgt_line: Target sentence
errors: Detected errors
fluency: Fluency score
adequacy: Adequacy score
overall: Overall score

Returns: Natural language explanation string

Explanation Components

A good explanation includes:

Overall Assessment: Quality judgment
Specific Issues: References to detected errors
Score Context: Why scores are high/low
Constructive Feedback: Actionable insights

Algorithm

Construct context from all inputs
Query LLM for explanation generation
Format and clean response
Return explanation string
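Steps 1 and 3 of this algorithm (building the context and cleaning the response) can be sketched without an LLM. The prompt wording and the helper names `build_explanation_prompt` and `clean_response` are assumptions made for illustration; the actual prompt used by WiMarka is not shown in this document.

```python
from typing import List


def build_explanation_prompt(src_line: str, tgt_line: str, errors: List[str],
                             fluency: float, adequacy: float,
                             overall: float) -> str:
    """Assemble the outputs of the earlier stages into one prompt string."""
    if errors:
        error_block = "\n".join(f"- {error}" for error in errors)
    else:
        error_block = "- (no errors detected)"
    return (
        f"Source: {src_line}\n"
        f"Target: {tgt_line}\n"
        f"Detected errors:\n{error_block}\n"
        f"Scores (0-100): fluency={fluency}, adequacy={adequacy}, "
        f"overall={overall}\n"
        "Explain the quality of this translation in two or three sentences."
    )


def clean_response(text: str) -> str:
    """Collapse whitespace and strip wrapping double quotes from the reply."""
    return " ".join(text.split()).strip('"')


prompt = build_explanation_prompt(
    "[EN] Good morning!", "[CEB] Maayong gabii!",
    ["Semantic mismatch: time of day"], 95.0, 40.0, 67.5,
)
print(prompt)
print(clean_response('  "The translation is fluent\n but inaccurate."  '))
# The translation is fluent but inaccurate.
```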

Example Usage

from wimarka.tasks.explanation import generate_explanation

src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!"
errors = ['Semantic mismatch: time of day']

explanation = generate_explanation(
    src, tgt, errors,
    fluency=95.0,
    adequacy=40.0,
    overall=67.5
)
print(explanation)
# Output: "The translation has high fluency but low adequacy due to
#          incorrect time reference. 'Morning' was translated as
#          'gabii' (evening)."

correction Module

File: wimarka/tasks/correction.py

Purpose

Generates improved translation suggestions based on detected errors.

Implementation

Uses error information and explanations to suggest corrections that address identified issues.

Function Signature

def generate_correction(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    comments: str
) -> str

Parameters:

src_line: Source sentence
tgt_line: Target sentence
errors: Detected errors
comments: Explanation from explanation module

Returns: Suggested corrected translation

Correction Strategy

Error-Focused: Addresses specific detected errors
Conservative: Changes only what’s needed
Validated: Ensures correction is better than original

Algorithm

Analyze errors and explanation
Identify parts requiring correction
Generate improved translation via LLM
Validate correction quality
Return corrected sentence
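The validation step could take several forms; one plausible approach is to re-score the candidate and keep it only if it beats the original. The sketch below assumes a hypothetical `score_fn` callable that returns an overall score for a target sentence. How WiMarka actually validates corrections is not specified here.

```python
from typing import Callable


def choose_translation(original: str, candidate: str,
                       score_fn: Callable[[str], float]) -> str:
    """Return the candidate only if it scores strictly higher than the original."""
    candidate = candidate.strip()
    if not candidate:
        # An empty suggestion is never an improvement.
        return original
    if score_fn(candidate) > score_fn(original):
        return candidate
    return original


def toy_score(tgt_line: str) -> float:
    """Toy scorer for demonstration only: rewards the expected greeting."""
    return 100.0 if "buntag" in tgt_line else 40.0


print(choose_translation("[CEB] Maayong gabii!", "[CEB] Maayong buntag!", toy_score))
# [CEB] Maayong buntag!
```

Falling back to the original keeps the stage conservative: a correction that does not improve the score is discarded.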

Example Usage

from wimarka.tasks.correction import generate_correction

src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!"
errors = ['Semantic mismatch: time of day']
comments = "Wrong time of day: evening instead of morning"

correction = generate_correction(src, tgt, errors, comments)
print(correction)
# Output: "Maayong buntag!"

Task Module Integration

Complete Pipeline Example

from wimarka.tasks import (
    error_detection, scoring,
    explanation, correction
)

# Input
src = "[EN] How are you today?"
tgt = "[CEB] Kumusta ka karon?"

# Stage 1: Error Detection
errors = error_detection.error_detection(src, tgt)
print(f"Errors: {errors}")

# Stage 2: Scoring
fluency, adequacy, overall = scoring.scoring(src, tgt, errors)
print(f"Scores: F={fluency}, A={adequacy}, O={overall}")

# Stage 3: Explanation
explanation_text = explanation.generate_explanation(
    src, tgt, errors, fluency, adequacy, overall
)
print(f"Explanation: {explanation_text}")

# Stage 4: Correction
corrected = correction.generate_correction(
    src, tgt, errors, explanation_text
)
print(f"Corrected: {corrected}")

Data Flow Between Modules

src_line, tgt_line
       │
       ▼
┌──────────────────┐
│ error_detection  │
└────────┬─────────┘
         │ errors[]
         ▼
┌──────────────────┐
│     scoring      │
└────────┬─────────┘
         │ fluency, adequacy, overall
         ▼
┌──────────────────┐
│   explanation    │
└────────┬─────────┘
         │ explanation_text
         ▼
┌──────────────────┐
│   correction     │
└────────┬─────────┘
         │ corrected_translation
         ▼
       Results
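The diagram shows the main chain. According to the function signatures above, `errors` is also passed directly to the explanation and correction stages. The sketch below makes that wiring explicit by taking the four stages as parameters. `EvaluationResult` and `run_pipeline` are hypothetical names introduced for illustration, and the lambdas are stubs standing in for the real task functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvaluationResult:
    errors: List[str]
    fluency: float
    adequacy: float
    overall: float
    explanation: str
    correction: str


def run_pipeline(
    src: str,
    tgt: str,
    detect: Callable[[str, str], List[str]],
    score: Callable[[str, str, List[str]], Tuple[float, float, float]],
    explain: Callable[[str, str, List[str], float, float, float], str],
    correct: Callable[[str, str, List[str], str], str],
) -> EvaluationResult:
    """Run the four stages in order, threading each output into the next."""
    errors = detect(src, tgt)
    fluency, adequacy, overall = score(src, tgt, errors)
    explanation = explain(src, tgt, errors, fluency, adequacy, overall)
    correction = correct(src, tgt, errors, explanation)
    return EvaluationResult(errors, fluency, adequacy, overall,
                            explanation, correction)


# Stub stages so the sketch runs without an LLM.
result = run_pipeline(
    "[EN] Good morning!",
    "[CEB] Maayong gabii!",
    detect=lambda s, t: ["Semantic mismatch: time of day"],
    score=lambda s, t, errs: (95.0, 40.0, 67.5),
    explain=lambda s, t, errs, f, a, o: f"{len(errs)} error(s); overall {o}",
    correct=lambda s, t, errs, comments: "[CEB] Maayong buntag!",
)
print(result)
```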

Customization

Replacing a Task Module

To replace a module with custom logic:

# Custom error detection
def my_error_detection(src, tgt):
    """Custom implementation."""
    errors = []
    # Your logic here
    return errors

# Use in evaluation
from wimarka import main

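# Replace function. This takes effect only if the pipeline looks up
# error_detection through the module attribute at call time; a name bound
# earlier with "from ... import error_detection" keeps the original function.
main.tasks.error_detection.error_detection = my_error_detection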

# Run evaluation
main.wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')
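A direct assignment like the one above stays in effect for the rest of the process. For a temporary replacement (for example in tests), the standard library's `unittest.mock.patch.object` restores the original on exit. The sketch below uses a `SimpleNamespace` as a stand-in for the `wimarka.tasks.error_detection` module so that it runs on its own; with the real package you would pass the imported module instead.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the wimarka.tasks.error_detection module (assumption for the demo).
fake_module = SimpleNamespace(error_detection=lambda src, tgt: ["original"])


def my_error_detection(src, tgt):
    """Custom implementation that reports no errors."""
    return []


# Inside the block the custom function is used; afterwards the original is restored.
with mock.patch.object(fake_module, "error_detection", my_error_detection):
    inside = fake_module.error_detection("[EN] Hi", "[CEB] Hi")

outside = fake_module.error_detection("[EN] Hi", "[CEB] Hi")
print(inside, outside)  # [] ['original']
```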

Adding a New Task

To add a new evaluation task:

Create new file in wimarka/tasks/
Implement main function
Import in main.py
Integrate into pipeline

Example:

# wimarka/tasks/style_analysis.py
def analyze_style(src_line: str, tgt_line: str) -> str:
    """Analyze translation style."""
    style_report = ""  # Build the report here
    return style_report

# In main.py
from wimarka.tasks import style_analysis

# Add to pipeline
style_report = style_analysis.analyze_style(src_line, tgt_line)
results['style'].append(style_report)

Performance Considerations

Optimization Opportunities

Batch Processing: Process multiple sentences in one LLM call
Caching: Cache LLM responses for identical inputs
Parallel Execution: Run independent tasks in parallel

Current Implementation: Sequential for simplicity and determinism

Future: Async/parallel processing for speed
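Two of these opportunities, caching and parallel execution, can be sketched with the standard library alone. `cached_detect` is a hypothetical stand-in for an LLM-backed task function; it returns a tuple instead of a list so that cached values cannot be mutated by callers. This is not part of the current sequential implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_detect(src_line: str, tgt_line: str) -> tuple:
    """Stand-in for an expensive LLM call; results are memoized per input pair."""
    if "gabii" in tgt_line:
        return ("Semantic mismatch: time of day",)
    return ()


# Caching: the second identical call is served from the cache.
cached_detect("[EN] Good morning!", "[CEB] Maayong gabii!")
cached_detect("[EN] Good morning!", "[CEB] Maayong gabii!")
hits_after_repeat = cached_detect.cache_info().hits

# Parallel execution: sentence pairs are independent, so they can run
# concurrently. pool.map returns results in input order.
pairs = [
    ("[EN] Good morning!", "[CEB] Maayong buntag!"),
    ("[EN] Good morning!", "[CEB] Maayong gabii!"),
    ("[EN] How are you today?", "[CEB] Kumusta ka karon?"),
]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda pair: cached_detect(*pair), pairs))

print(hits_after_repeat)  # 1
print(results)  # [(), ('Semantic mismatch: time of day',), ()]
```

Threads suit this workload because LLM calls are typically I/O-bound. Within a single sentence the four stages still depend on each other and must run in order.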

Best Practices

For Task Module Development

Keep Modules Independent: Minimal dependencies between tasks
Clear Interfaces: Well-defined input/output types
Error Handling: Graceful failure, informative errors
Documentation: Docstrings for all functions
Testing: Unit tests for each module
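As one illustration of clear interfaces and informative errors, a task module could validate the language tag before calling the LLM. The tag format below (two or three uppercase letters in square brackets) is inferred from the examples in this document, and `split_tagged_line` is a hypothetical helper, not an existing WiMarka function.

```python
import re
from typing import Tuple

# Assumption: tags look like "[EN]" or "[CEB]", followed by the sentence text.
TAG_PATTERN = re.compile(r"^\[([A-Z]{2,3})\]\s+(\S.*)$")


def split_tagged_line(line: str) -> Tuple[str, str]:
    """Split '[EN] Good morning!' into ('EN', 'Good morning!')."""
    match = TAG_PATTERN.match(line.strip())
    if match is None:
        # Fail early with a message that shows the offending input.
        raise ValueError(f"Expected a line like '[EN] text', got: {line!r}")
    return match.group(1), match.group(2)


print(split_tagged_line("[CEB] Maayong buntag!"))  # ('CEB', 'Maayong buntag!')

try:
    split_tagged_line("Good morning!")
except ValueError as exc:
    print(exc)
```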
