Task Modules

This document describes the four core task modules that implement WiMarka’s evaluation pipeline.

Overview

The task modules are located in wimarka/tasks/ and each implements a specific stage of the evaluation:

  1. error_detection.py - Identifies translation errors

  2. scoring.py - Calculates quality metrics

  3. explanation.py - Generates human-readable explanations

  4. correction.py - Suggests improvements

All task modules are designed to be:

  • Independent: Can function standalone

  • Composable: Each stage’s output feeds into the next stage

  • Extensible: Easy to modify or replace

error_detection Module

File: wimarka/tasks/error_detection.py

Purpose

Identifies specific translation errors by comparing source and target sentences.

Implementation

Uses LLM-based analysis to detect:

  • Lexical errors (wrong words)

  • Syntactic errors (grammar problems)

  • Semantic errors (meaning loss/distortion)

  • Morphological errors (incorrect word forms)

  • Omissions (missing information)

  • Additions (extra information)

Function Signature

def error_detection(src_line: str, tgt_line: str) -> List[str]

Parameters:

  • src_line: Source sentence with language tag (e.g., “[EN] Good morning!”)

  • tgt_line: Target sentence with language tag (e.g., “[CEB] Maayong buntag!”)

Returns: List of error descriptions

# No errors
[]

# With errors
['Semantic mismatch: time of day', 'Wrong verb tense']

Algorithm

  1. Construct prompt with source and target sentences

  2. Query LLM for error analysis

  3. Parse response into error list

  4. Return structured error descriptions

Prompt Template (simplified):

Analyze the following translation and identify any errors:

Source: {src_line}
Target: {tgt_line}

List all errors found, one per line.
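The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not WiMarka's actual code: `query_llm` is a hypothetical callable standing in for whatever LLM client the module uses, and the parsing rules (stripping bullets and numbering, treating a reply such as "No errors found." as an empty list) are guesses at what a reasonable parser might do.

```python
import re
from typing import Callable, List

PROMPT_TEMPLATE = (
    "Analyze the following translation and identify any errors:\n\n"
    "Source: {src_line}\n"
    "Target: {tgt_line}\n\n"
    "List all errors found, one per line."
)

# Replies that are assumed to mean "nothing to report".
NO_ERROR_REPLIES = {"none", "no errors", "no errors found"}


def build_prompt(src_line: str, tgt_line: str) -> str:
    """Step 1: fill the simplified template with the tagged sentences."""
    return PROMPT_TEMPLATE.format(src_line=src_line, tgt_line=tgt_line)


def parse_errors(response: str) -> List[str]:
    """Step 3: turn a one-error-per-line reply into a list of strings."""
    errors = []
    for raw in response.splitlines():
        # Drop a leading bullet ("-", "*", "•") or numbering ("1.", "2)").
        line = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", raw).strip()
        if line and line.lower().rstrip(".") not in NO_ERROR_REPLIES:
            errors.append(line)
    return errors


def detect_errors(
    src_line: str, tgt_line: str, query_llm: Callable[[str], str]
) -> List[str]:
    """Steps 1-4 combined; query_llm is a hypothetical LLM client."""
    return parse_errors(query_llm(build_prompt(src_line, tgt_line)))
```

Passing the LLM client in as an argument keeps the parsing logic testable without network access: a test can supply a function that returns a canned reply.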

Example Usage

from wimarka.tasks.error_detection import error_detection

src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!"  # Wrong: gabii = evening

errors = error_detection(src, tgt)
print(errors)
# Output: ['Semantic mismatch: time of day (morning vs evening)']

scoring Module

File: wimarka/tasks/scoring.py

Purpose

Calculates three quality metrics: fluency, adequacy, and overall score.

Implementation

Uses error count and LLM-based assessment to score translations on a 0-100 scale.

Function Signature

def scoring(src_line: str, tgt_line: str, errors: List[str]) -> Tuple[float, float, float]

Parameters:

  • src_line: Source sentence with tag

  • tgt_line: Target sentence with tag

  • errors: List of errors from error_detection

Returns: Tuple of (fluency, adequacy, overall)

(95.0, 98.0, 96.5)  # All scores 0-100

Scoring Methodology

Fluency Score:

  • Evaluates grammatical correctness

  • Considers naturalness in target language

  • Affected by syntactic and morphological errors

Adequacy Score:

  • Evaluates meaning preservation

  • Checks for omissions and additions

  • Affected by semantic errors

Overall Score:

  • Simple average: (fluency + adequacy) / 2

  • Provides quick quality assessment

Algorithm

  1. Analyze error list for severity

  2. Query LLM for fluency assessment

  3. Query LLM for adequacy assessment

  4. Calculate overall score as average

  5. Return all three scores
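A minimal sketch of these steps, assuming the two LLM assessments are exposed through a single hypothetical callable `assess(metric, src_line, tgt_line, errors)` that returns a number. Clamping to the 0-100 range is an added safeguard, not something the module is documented to do; the overall score follows the simple average described above.

```python
from typing import Callable, List, Tuple

# Hypothetical assessor type: (metric name, source, target, errors) -> score.
Assessor = Callable[[str, str, str, List[str]], float]


def clamp(score: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep a score inside the documented 0-100 range."""
    return max(low, min(high, score))


def score_translation(
    src_line: str, tgt_line: str, errors: List[str], assess: Assessor
) -> Tuple[float, float, float]:
    """Return (fluency, adequacy, overall) for one sentence pair."""
    fluency = clamp(assess("fluency", src_line, tgt_line, errors))
    adequacy = clamp(assess("adequacy", src_line, tgt_line, errors))
    overall = (fluency + adequacy) / 2  # simple average of the two metrics
    return fluency, adequacy, overall
```

With fluency 95.0 and adequacy 40.0, the average gives an overall score of 67.5, matching the values used in the explanation example later in this document.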

Example Usage

from wimarka.tasks.scoring import scoring

src = "[EN] Good morning!"
tgt = "[CEB] Maayong buntag!"
errors = []  # No errors

fluency, adequacy, overall = scoring(src, tgt, errors)
print(f"Scores: F={fluency}, A={adequacy}, O={overall}")
# Example output (actual values depend on the LLM): Scores: F=100.0, A=100.0, O=100.0

explanation Module

File: wimarka/tasks/explanation.py

Purpose

Generates human-readable explanations of the evaluation results.

Implementation

Synthesizes information from all previous stages into coherent natural language.

Function Signature

def generate_explanation(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    fluency: float,
    adequacy: float,
    overall: float
) -> str

Parameters:

  • src_line: Source sentence

  • tgt_line: Target sentence

  • errors: Detected errors

  • fluency: Fluency score

  • adequacy: Adequacy score

  • overall: Overall score

Returns: Natural language explanation string

Explanation Components

A good explanation includes:

  1. Overall Assessment: Quality judgment

  2. Specific Issues: References to detected errors

  3. Score Context: Why scores are high/low

  4. Constructive Feedback: Actionable insights

Algorithm

  1. Construct context from all inputs

  2. Query LLM for explanation generation

  3. Format and clean response

  4. Return explanation string
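The steps above might look like the sketch below. The prompt wording, the `query_llm` callable, and the cleanup rules (collapsing whitespace and removing wrapping quotes) are all assumptions made for illustration; the real module may format its context differently.

```python
from typing import Callable, List


def build_context(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    fluency: float,
    adequacy: float,
    overall: float,
) -> str:
    """Step 1: gather every input into one prompt string."""
    error_text = "\n".join(f"- {e}" for e in errors) if errors else "- none detected"
    return (
        f"Source: {src_line}\n"
        f"Target: {tgt_line}\n"
        f"Detected errors:\n{error_text}\n"
        f"Scores (0-100): fluency={fluency:.1f}, "
        f"adequacy={adequacy:.1f}, overall={overall:.1f}\n"
        "Explain the quality of this translation in two or three sentences."
    )


def clean_response(response: str) -> str:
    """Step 3: collapse whitespace and strip quotes wrapping the whole reply."""
    text = " ".join(response.split())
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    return text


def explain(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    fluency: float,
    adequacy: float,
    overall: float,
    query_llm: Callable[[str], str],
) -> str:
    """Steps 1-4 combined; query_llm is a hypothetical LLM client."""
    prompt = build_context(src_line, tgt_line, errors, fluency, adequacy, overall)
    return clean_response(query_llm(prompt))
```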

Example Usage

from wimarka.tasks.explanation import generate_explanation

src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!"
errors = ['Semantic mismatch: time of day']

explanation = generate_explanation(
    src, tgt, errors,
    fluency=95.0,
    adequacy=40.0,
    overall=67.5
)
print(explanation)
# Example output (wording varies with the LLM):
#   The translation has high fluency but low adequacy due to an
#   incorrect time reference. 'Morning' was translated as
#   'gabii' (evening).

correction Module

File: wimarka/tasks/correction.py

Purpose

Generates improved translation suggestions based on detected errors.

Implementation

Uses error information and explanations to suggest corrections that address identified issues.

Function Signature

def generate_correction(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    comments: str
) -> str

Parameters:

  • src_line: Source sentence

  • tgt_line: Target sentence

  • errors: Detected errors

  • comments: Explanation from explanation module

Returns: Suggested corrected translation

Correction Strategy

  1. Error-Focused: Addresses specific detected errors

  2. Conservative: Changes only what’s needed

  3. Validated: Ensures correction is better than original

Algorithm

  1. Analyze errors and explanation

  2. Identify parts requiring correction

  3. Generate improved translation via LLM

  4. Validate correction quality

  5. Return corrected sentence
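The source does not say how the validation step works, so the sketch below substitutes a deliberately simple rule: an empty or unchanged candidate is rejected and the original translation is returned. The `query_llm` callable, the prompt wording, and the choice to return the sentence without its language tag are likewise assumptions. A stronger check could re-run error detection and scoring on the candidate and compare the results.

```python
import re
from typing import Callable, List


def strip_tag(line: str) -> str:
    """Remove a leading language tag such as "[CEB] "."""
    return re.sub(r"^\s*\[[A-Za-z_-]+\]\s*", "", line).strip()


def correct(
    src_line: str,
    tgt_line: str,
    errors: List[str],
    comments: str,
    query_llm: Callable[[str], str],
) -> str:
    """Suggest a correction; fall back to the original when validation fails."""
    original = strip_tag(tgt_line)
    if not errors:
        # Conservative: nothing was flagged, so nothing is changed.
        return original

    error_text = "\n".join(f"- {e}" for e in errors)
    prompt = (
        f"Source: {src_line}\n"
        f"Translation: {tgt_line}\n"
        f"Errors:\n{error_text}\n"
        f"Reviewer comments: {comments}\n"
        "Rewrite the translation, changing only what is needed to fix "
        "these errors. Reply with the corrected sentence only."
    )
    candidate = strip_tag(query_llm(prompt))

    # Assumed validation rule: reject empty or unchanged candidates.
    if not candidate or candidate == original:
        return original
    return candidate
```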

Example Usage

from wimarka.tasks.correction import generate_correction

src = "[EN] Good morning!"
tgt = "[CEB] Maayong gabii!"
errors = ['Semantic mismatch: time of day']
comments = "Wrong time of day: evening instead of morning"

correction = generate_correction(src, tgt, errors, comments)
print(correction)
# Example output: Maayong buntag!

Task Module Integration

Complete Pipeline Example

from wimarka.tasks import (
    error_detection, scoring,
    explanation, correction
)

# Input
src = "[EN] How are you today?"
tgt = "[CEB] Kumusta ka karon?"

# Stage 1: Error Detection
errors = error_detection.error_detection(src, tgt)
print(f"Errors: {errors}")

# Stage 2: Scoring
fluency, adequacy, overall = scoring.scoring(src, tgt, errors)
print(f"Scores: F={fluency}, A={adequacy}, O={overall}")

# Stage 3: Explanation
explanation_text = explanation.generate_explanation(
    src, tgt, errors, fluency, adequacy, overall
)
print(f"Explanation: {explanation_text}")

# Stage 4: Correction
corrected = correction.generate_correction(
    src, tgt, errors, explanation_text
)
print(f"Corrected: {corrected}")

Data Flow Between Modules

src_line, tgt_line
       │
       ▼
┌──────────────────┐
│ error_detection  │
└────────┬─────────┘
         │ errors[]
         ▼
┌──────────────────┐
│     scoring      │
└────────┬─────────┘
         │ fluency, adequacy, overall
         ▼
┌──────────────────┐
│   explanation    │
└────────┬─────────┘
         │ explanation_text
         ▼
┌──────────────────┐
│   correction     │
└────────┬─────────┘
         │ corrected_translation
         ▼
       Results
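The diagram shows only the newest value produced at each stage. Going by the function signatures documented earlier, later stages also receive earlier outputs: the explanation stage takes the errors and all three scores, and the correction stage takes the errors and the explanation text. The sketch below makes that flow explicit. `EvaluationResult` and `run_pipeline` are hypothetical names, and the four stages are passed in as callables so the example runs without WiMarka installed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvaluationResult:
    """Hypothetical container for everything the four stages produce."""
    errors: List[str]
    fluency: float
    adequacy: float
    overall: float
    explanation: str
    correction: str


def run_pipeline(
    src_line: str,
    tgt_line: str,
    detect: Callable[[str, str], List[str]],
    score: Callable[[str, str, List[str]], Tuple[float, float, float]],
    explain: Callable[[str, str, List[str], float, float, float], str],
    correct: Callable[[str, str, List[str], str], str],
) -> EvaluationResult:
    """Run the four stages in order, threading each output forward."""
    errors = detect(src_line, tgt_line)
    fluency, adequacy, overall = score(src_line, tgt_line, errors)
    explanation = explain(src_line, tgt_line, errors, fluency, adequacy, overall)
    correction = correct(src_line, tgt_line, errors, explanation)
    return EvaluationResult(errors, fluency, adequacy, overall, explanation, correction)
```

Because each stage is just a callable with the documented signature, any one of them can be swapped for a custom implementation without touching the others.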

Customization

Replacing a Task Module

To replace a module with custom logic:

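from typing import List

from wimarka import main
from wimarka.tasks import error_detection as error_detection_module

# Custom error detection with the same signature as the built-in function
def my_error_detection(src_line: str, tgt_line: str) -> List[str]:
    """Custom implementation."""
    errors: List[str] = []
    # Your logic here
    return errors

# Replace the function on its module. This takes effect as long as the
# pipeline calls error_detection.error_detection(...) through the module.
error_detection_module.error_detection = my_error_detection

# Run evaluation
main.wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')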
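Assigning to a module attribute replaces the function for the rest of the process. For a temporary swap, for example inside a test, the standard library's `unittest.mock.patch.object` restores the original on exit. The sketch below uses a stand-in module object (`fake_module`, hypothetical) instead of `wimarka.tasks.error_detection` so that it runs without WiMarka installed; `run_stage` is also hypothetical and mimics a pipeline that looks the function up on the module at call time.

```python
import types
from typing import List
from unittest import mock

# Stand-in for wimarka.tasks.error_detection (hypothetical, for illustration).
fake_module = types.ModuleType("error_detection")
fake_module.error_detection = lambda src, tgt: ["original detector"]


def run_stage(src: str, tgt: str) -> List[str]:
    """Mimics a pipeline that resolves the function via the module each call."""
    return fake_module.error_detection(src, tgt)


def my_error_detection(src: str, tgt: str) -> List[str]:
    """Custom implementation with the same signature."""
    return ["custom detector"]


# Inside the with-block the custom function is used; afterwards the
# original is restored automatically.
with mock.patch.object(fake_module, "error_detection", my_error_detection):
    patched = run_stage("[EN] Hi", "[CEB] Kumusta")
restored = run_stage("[EN] Hi", "[CEB] Kumusta")
```

Note that patching only affects callers that resolve the name through the module. Code that did `from ... import error_detection` on the function itself holds its own reference and would not see the replacement.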

Adding a New Task

To add a new evaluation task:

  1. Create new file in wimarka/tasks/

  2. Implement main function

  3. Import in main.py

  4. Integrate into pipeline

Example:

# wimarka/tasks/style_analysis.py
def analyze_style(src_line: str, tgt_line: str) -> str:
    """Analyze translation style."""
    style_report = ""  # Placeholder: build the report here
    return style_report

# In main.py
from wimarka.tasks import style_analysis

# Add to pipeline
style_report = style_analysis.analyze_style(src_line, tgt_line)
results.setdefault('style', []).append(style_report)  # create the list on first use

Performance Considerations

Optimization Opportunities

  1. Batch Processing: Process multiple sentences in one LLM call

  2. Caching: Cache LLM responses for identical inputs

  3. Parallel Execution: Run independent tasks in parallel

Current Implementation: Tasks run sequentially, which keeps the pipeline simple and its execution order deterministic

Future: Asynchronous or parallel processing to reduce run time
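Two of the optimizations listed above, caching and parallel execution, can be sketched with the standard library alone. Everything here is hypothetical: `detect_uncached` stands in for an expensive LLM-backed task, and the cache size and worker count are arbitrary. Caching is only sound when the LLM returns the same answer for identical inputs (for example, with deterministic decoding settings).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List, Sequence, Tuple


def detect_uncached(src_line: str, tgt_line: str) -> List[str]:
    """Hypothetical stand-in for a slow, LLM-backed task."""
    return [f"stub error for {tgt_line}"]


@lru_cache(maxsize=1024)
def detect_cached(src_line: str, tgt_line: str) -> Tuple[str, ...]:
    """Cache results per (source, target) pair.

    A tuple is returned so that callers cannot mutate the cached value.
    """
    return tuple(detect_uncached(src_line, tgt_line))


def detect_many(
    pairs: Sequence[Tuple[str, str]], max_workers: int = 4
) -> List[Tuple[str, ...]]:
    """Run independent sentence pairs in parallel threads.

    Threads suit this workload because LLM calls are I/O-bound.
    Executor.map returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda pair: detect_cached(*pair), pairs))
```

Only sentence pairs are parallelized here. Within one pair the four stages still depend on each other's output, so they remain sequential.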

Best Practices

For Task Module Development

  1. Keep Modules Independent: Minimal dependencies between tasks

  2. Clear Interfaces: Well-defined input/output types

  3. Error Handling: Graceful failure, informative errors

  4. Documentation: Docstrings for all functions

  5. Testing: Unit tests for each module
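As one possible reading of the error-handling guideline, a task failure can be wrapped in an exception that names the task and the input that triggered it. `TaskError` and `run_task` are hypothetical helpers, not part of WiMarka.

```python
from typing import Any, Callable


class TaskError(RuntimeError):
    """Raised when a task module fails; records which task and input."""

    def __init__(self, task: str, src_line: str, cause: Exception) -> None:
        super().__init__(f"{task} failed for source {src_line!r}: {cause}")
        self.task = task
        self.cause = cause


def run_task(task: str, func: Callable[..., Any], src_line: str, *args: Any) -> Any:
    """Call a task function and re-raise any failure with added context."""
    try:
        return func(src_line, *args)
    except Exception as exc:
        # "from exc" keeps the original traceback attached for debugging.
        raise TaskError(task, src_line, exc) from exc
```

Raising a descriptive exception, instead of silently returning an empty error list, avoids reporting a failed analysis as a clean translation.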

See Also