Task Modules
============

This document describes the four core task modules that implement WiMarka's evaluation pipeline.

Overview
--------

The task modules are located in ``wimarka/tasks/``, and each implements one stage of the evaluation:

1. **error_detection.py** - Identifies translation errors
2. **scoring.py** - Calculates quality metrics
3. **explanation.py** - Generates human-readable explanations
4. **correction.py** - Suggests improved translations

All task modules are designed to be:

* **Independent**: Each can be called on its own
* **Composable**: The output of one stage feeds into the next
* **Extensible**: Each can be modified or replaced

error_detection Module
----------------------

**File**: ``wimarka/tasks/error_detection.py``

Purpose
~~~~~~~

Identifies specific translation errors by comparing the source and target sentences.

Implementation
~~~~~~~~~~~~~~

Uses LLM-based analysis to detect:

* Lexical errors (wrong words)
* Syntactic errors (grammar problems)
* Semantic errors (meaning loss or distortion)
* Morphological errors (incorrect word forms)
* Omissions (missing information)
* Additions (extra information)

Function Signature
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def error_detection(src_line: str, tgt_line: str) -> List[str]

**Parameters**:

* ``src_line``: Source sentence with language tag (e.g., ``"[EN] Good morning!"``)
* ``tgt_line``: Target sentence with language tag (e.g., ``"[CEB] Maayong buntag!"``)

**Returns**: List of error descriptions (empty if no errors are found)

.. code-block:: python

   # No errors
   []

   # With errors
   ['Semantic mismatch: time of day', 'Wrong verb tense']

Algorithm
~~~~~~~~~

1. Construct a prompt containing the source and target sentences
2. Query the LLM for error analysis
3. Parse the response into a list of errors
4. Return the structured error descriptions

**Prompt Template** (simplified):

.. code-block:: text

   Analyze the following translation and identify any errors:

   Source: {src_line}
   Target: {tgt_line}

   List all errors found, one per line.

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks.error_detection import error_detection

   src = "[EN] Good morning!"
   tgt = "[CEB] Maayong gabii!"  # Wrong: gabii = evening

   errors = error_detection(src, tgt)
   print(errors)
   # Example output (exact wording depends on the LLM):
   # ['Semantic mismatch: time of day (morning vs evening)']

scoring Module
--------------

**File**: ``wimarka/tasks/scoring.py``

Purpose
~~~~~~~

Calculates three quality metrics: fluency, adequacy, and an overall score.

Implementation
~~~~~~~~~~~~~~

Uses the error count and an LLM-based assessment to score translations on a 0-100 scale.

Function Signature
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def scoring(src_line: str, tgt_line: str, errors: List[str]) -> Tuple[float, float, float]

**Parameters**:

* ``src_line``: Source sentence with language tag
* ``tgt_line``: Target sentence with language tag
* ``errors``: List of errors returned by ``error_detection``

**Returns**: Tuple of ``(fluency, adequacy, overall)``, each on a 0-100 scale

.. code-block:: python

   (95.0, 98.0, 96.5)  # overall = (95.0 + 98.0) / 2

Scoring Methodology
~~~~~~~~~~~~~~~~~~~

**Fluency Score**:

* Evaluates grammatical correctness
* Considers naturalness in the target language
* Affected by syntactic and morphological errors

**Adequacy Score**:

* Evaluates meaning preservation
* Checks for omissions and additions
* Affected by semantic errors

**Overall Score**:

* Simple average: ``(fluency + adequacy) / 2``
* Provides a quick, single-number quality assessment

Algorithm
~~~~~~~~~

1. Analyze the error list for severity
2. Query the LLM for a fluency assessment
3. Query the LLM for an adequacy assessment
4. Calculate the overall score as the average of the two
5. Return all three scores

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks.scoring import scoring

   src = "[EN] Good morning!"
   tgt = "[CEB] Maayong buntag!"
   errors = []  # No errors

   fluency, adequacy, overall = scoring(src, tgt, errors)
   print(f"Scores: F={fluency}, A={adequacy}, O={overall}")
   # Example output (scores depend on the LLM):
   # Scores: F=100.0, A=100.0, O=100.0

explanation Module
------------------

**File**: ``wimarka/tasks/explanation.py``

Purpose
~~~~~~~

Generates a human-readable explanation of the evaluation results.

Implementation
~~~~~~~~~~~~~~

Synthesizes the outputs of the previous stages (errors and scores) into coherent natural language.

Function Signature
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def generate_explanation(
       src_line: str,
       tgt_line: str,
       errors: List[str],
       fluency: float,
       adequacy: float,
       overall: float
   ) -> str

**Parameters**:

* ``src_line``: Source sentence
* ``tgt_line``: Target sentence
* ``errors``: Detected errors
* ``fluency``: Fluency score
* ``adequacy``: Adequacy score
* ``overall``: Overall score

**Returns**: Natural-language explanation string

Explanation Components
~~~~~~~~~~~~~~~~~~~~~~

A good explanation includes:

1. **Overall Assessment**: A judgment of translation quality
2. **Specific Issues**: References to the detected errors
3. **Score Context**: Why the scores are high or low
4. **Constructive Feedback**: Actionable suggestions

Algorithm
~~~~~~~~~

1. Build context from all inputs
2. Query the LLM to generate the explanation
3. Format and clean the response
4. Return the explanation string

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks.explanation import generate_explanation

   src = "[EN] Good morning!"
   tgt = "[CEB] Maayong gabii!"
   errors = ['Semantic mismatch: time of day']

   explanation = generate_explanation(
       src, tgt, errors,
       fluency=95.0, adequacy=40.0, overall=67.5
   )
   print(explanation)
   # Example output (wording depends on the LLM):
   # "The translation has high fluency but low adequacy due to
   #  incorrect time reference. 'Morning' was translated as
   #  'gabii' (evening)."

correction Module
-----------------

**File**: ``wimarka/tasks/correction.py``

Purpose
~~~~~~~

Generates improved translation suggestions based on the detected errors.

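
To make the inputs of this stage concrete, the following is a minimal sketch of how the detected errors and the explanation might be combined into a single LLM request, and how a leading language tag could be removed from the reply (the documented example returns ``"Maayong buntag!"`` without its ``[CEB]`` tag). Both ``build_correction_prompt`` and ``strip_language_tag`` are hypothetical helpers, and the prompt wording is an assumption; this page does not document the actual correction prompt.

.. code-block:: python

   import re
   from typing import List


   def build_correction_prompt(
       src_line: str, tgt_line: str, errors: List[str], comments: str
   ) -> str:
       """Hypothetical helper: combine every correction input into one LLM prompt."""
       # One bullet per detected error; state explicitly when the list is empty.
       error_block = "\n".join(f"- {e}" for e in errors) if errors else "- (none detected)"
       return (
           "Correct the following translation, changing only what is needed "
           "to fix the listed errors.\n"
           f"Source: {src_line}\n"
           f"Target: {tgt_line}\n"
           f"Errors:\n{error_block}\n"
           f"Explanation: {comments}\n"
           "Return only the corrected target sentence."
       )


   def strip_language_tag(text: str) -> str:
       """Hypothetical post-processing: drop a leading tag such as "[CEB] "."""
       return re.sub(r"^\[[A-Za-z-]+\]\s*", "", text.strip())


   prompt = build_correction_prompt(
       "[EN] Good morning!",
       "[CEB] Maayong gabii!",
       ["Semantic mismatch: time of day"],
       "Wrong time of day: evening instead of morning",
   )
   print(prompt)
   print(strip_language_tag("[CEB] Maayong buntag!"))  # Maayong buntag!

Listing the errors and the explanation as separate labeled sections lets the model address each error directly, which fits the error-focused, conservative correction strategy described for this module.
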

Implementation
~~~~~~~~~~~~~~

Uses the error list and the explanation to suggest corrections that address the identified issues.

Function Signature
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def generate_correction(
       src_line: str,
       tgt_line: str,
       errors: List[str],
       comments: str
   ) -> str

**Parameters**:

* ``src_line``: Source sentence
* ``tgt_line``: Target sentence
* ``errors``: Detected errors
* ``comments``: Explanation returned by the explanation module

**Returns**: Suggested corrected translation

Correction Strategy
~~~~~~~~~~~~~~~~~~~

1. **Error-Focused**: Addresses the specific detected errors
2. **Conservative**: Changes only what is needed
3. **Validated**: Checks that the correction improves on the original

Algorithm
~~~~~~~~~

1. Analyze the errors and explanation
2. Identify the parts that require correction
3. Generate an improved translation via the LLM
4. Validate the quality of the correction
5. Return the corrected sentence

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks.correction import generate_correction

   src = "[EN] Good morning!"
   tgt = "[CEB] Maayong gabii!"
   errors = ['Semantic mismatch: time of day']
   comments = "Wrong time of day: evening instead of morning"

   correction = generate_correction(src, tgt, errors, comments)
   print(correction)
   # Example output:
   # "Maayong buntag!"

Task Module Integration
-----------------------

Complete Pipeline Example
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks import (
       error_detection,
       scoring,
       explanation,
       correction
   )

   # Input
   src = "[EN] How are you today?"
   tgt = "[CEB] Kumusta ka karon?"

   # Stage 1: Error Detection
   errors = error_detection.error_detection(src, tgt)
   print(f"Errors: {errors}")

   # Stage 2: Scoring
   fluency, adequacy, overall = scoring.scoring(src, tgt, errors)
   print(f"Scores: F={fluency}, A={adequacy}, O={overall}")

   # Stage 3: Explanation (named explanation_text to avoid shadowing the module)
   explanation_text = explanation.generate_explanation(
       src, tgt, errors, fluency, adequacy, overall
   )
   print(f"Explanation: {explanation_text}")

   # Stage 4: Correction
   corrected = correction.generate_correction(
       src, tgt, errors, explanation_text
   )
   print(f"Corrected: {corrected}")

Data Flow Between Modules
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

          src_line, tgt_line
                  │
                  ▼
         ┌──────────────────┐
         │ error_detection  │
         └────────┬─────────┘
                  │ errors[]
                  ▼
         ┌──────────────────┐
         │     scoring      │
         └────────┬─────────┘
                  │ fluency, adequacy, overall
                  ▼
         ┌──────────────────┐
         │   explanation    │
         └────────┬─────────┘
                  │ explanation_text
                  ▼
         ┌──────────────────┐
         │    correction    │
         └────────┬─────────┘
                  │ corrected_translation
                  ▼
               Results

Customization
-------------

Replacing a Task Module
~~~~~~~~~~~~~~~~~~~~~~~

To replace a module with custom logic, define a function with the same signature and assign it to the corresponding module attribute before running the evaluation:

.. code-block:: python

   from typing import List

   from wimarka import main
   from wimarka.tasks import error_detection as error_detection_module

   # Custom error detection with the same signature as the built-in task
   def my_error_detection(src_line: str, tgt_line: str) -> List[str]:
       """Custom implementation."""
       errors: List[str] = []
       # Your logic here
       return errors

   # Replace the function on its module. This takes effect only if main.py
   # looks the function up through the module at call time, i.e.
   # error_detection.error_detection(...), as in the pipeline example above.
   error_detection_module.error_detection = my_error_detection

   # Run the evaluation
   main.wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')

Adding a New Task
~~~~~~~~~~~~~~~~~

To add a new evaluation task:

1. Create a new file in ``wimarka/tasks/``
2. Implement the task's main function
3. Import the module in ``main.py``
4. Call it from the pipeline and store its output in the results

Example:

.. code-block:: python

   # wimarka/tasks/style_analysis.py
   def analyze_style(src_line: str, tgt_line: str) -> str:
       """Analyze translation style."""
       style_report = ""  # Replace with the actual analysis
       return style_report

   # In main.py
   from wimarka.tasks import style_analysis

   # Add to the pipeline (inside the per-sentence loop; assumes
   # results['style'] was initialized to an empty list)
   style_report = style_analysis.analyze_style(src_line, tgt_line)
   results['style'].append(style_report)

Performance Considerations
--------------------------

Optimization Opportunities
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Batch Processing**: Process multiple sentences in one LLM call
2. **Caching**: Cache LLM responses for identical inputs
3. **Parallel Execution**: Evaluate independent sentence pairs in parallel (the four stages for a single pair depend on each other's outputs and must run in order)

**Current Implementation**: Sequential, for simplicity and a predictable execution order

**Future**: Async/parallel processing for speed

Best Practices
--------------

For Task Module Development
~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Keep Modules Independent**: Minimize dependencies between tasks
2. **Clear Interfaces**: Define input and output types explicitly
3. **Error Handling**: Fail gracefully with informative error messages
4. **Documentation**: Write docstrings for all functions
5. **Testing**: Write unit tests for each module

See Also
--------

* :doc:`api_reference` - Complete API documentation
* :doc:`architecture` - System architecture details
* :doc:`extending` - Guide to extending WiMarka