Task Modules
============

This document describes the four core task modules that implement WiMarka's evaluation pipeline.

Overview
--------

The task modules are located in ``wimarka/tasks/`` and each implements a specific stage of the evaluation:

1. **error_detection.py** - Identifies translation errors
2. **scoring.py** - Calculates quality metrics
3. **explanation.py** - Generates human-readable explanations
4. **correction.py** - Suggests improvements

All task modules are designed to be:

* **Independent**: Can function standalone
* **Composable**: Each module's output feeds directly into the next stage
* **Extensible**: Easy to modify or replace

error_detection Module
----------------------

**File**: ``wimarka/tasks/error_detection.py``

Purpose
~~~~~~~

Identifies specific translation errors by comparing source and target sentences.

Implementation
~~~~~~~~~~~~~~

Uses LLM-based analysis to detect:

* Lexical errors (wrong words)
* Syntactic errors (grammar problems)
* Semantic errors (meaning loss/distortion)
* Morphological errors (incorrect word forms)
* Omissions (missing information)
* Additions (extra information)

Function Signature
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def error_detection(src_line: str, tgt_line: str) -> List[str]

**Parameters**:

* ``src_line``: Source sentence with language tag (e.g., ``"[EN] Good morning!"``)
* ``tgt_line``: Target sentence with language tag (e.g., ``"[CEB] Maayong buntag!"``)

**Returns**: List of error descriptions

.. code-block:: python

   # No errors
   []
   
   # With errors
   ['Semantic mismatch: time of day', 'Wrong verb tense']
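
Both inputs carry a leading language tag in square brackets. Where custom code needs the tag and the text separately, a helper along the following lines could split them. This is a minimal sketch: ``split_language_tag`` is a hypothetical name that is not part of WiMarka, and the tag grammar (letters inside one pair of brackets) is an assumption based on the examples above.

.. code-block:: python

   import re
   from typing import Tuple

   # Assumed tag shape: "[EN] text". Not taken from the WiMarka source.
   _TAG_PATTERN = re.compile(r"^\[([A-Za-z]+)\]\s*(.*)$", re.DOTALL)


   def split_language_tag(line: str) -> Tuple[str, str]:
       """Split '[EN] Good morning!' into ('EN', 'Good morning!')."""
       match = _TAG_PATTERN.match(line.strip())
       if match is None:
           raise ValueError(f"Missing language tag: {line!r}")
       return match.group(1).upper(), match.group(2)


   print(split_language_tag("[CEB] Maayong buntag!"))
   # ('CEB', 'Maayong buntag!')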

Algorithm
~~~~~~~~~

1. Construct prompt with source and target sentences
2. Query LLM for error analysis
3. Parse response into error list
4. Return structured error descriptions

**Prompt Template** (simplified):

.. code-block:: text

   Analyze the following translation and identify any errors:
   
   Source: {src_line}
   Target: {tgt_line}
   
   List all errors found, one per line.
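
Steps 1 and 3 of the algorithm can be sketched as two small functions. This is an illustration under stated assumptions, not the module's actual code: ``build_prompt`` and ``parse_error_lines`` are hypothetical names, and the parser assumes the LLM replies with one error per line, optionally prefixed by a bullet or a number. How the real module recognises a "no errors" reply is not documented here.

.. code-block:: python

   import re
   from typing import List

   # Mirrors the simplified prompt template shown above.
   PROMPT_TEMPLATE = (
       "Analyze the following translation and identify any errors:\n\n"
       "Source: {src_line}\n"
       "Target: {tgt_line}\n\n"
       "List all errors found, one per line."
   )


   def build_prompt(src_line: str, tgt_line: str) -> str:
       """Fill the template with the tagged source and target sentences."""
       return PROMPT_TEMPLATE.format(src_line=src_line, tgt_line=tgt_line)


   def parse_error_lines(response: str) -> List[str]:
       """Turn a one-error-per-line reply into a list of descriptions."""
       errors = []
       for raw in response.splitlines():
           # Strip a leading bullet ("-", "*") or numbering ("1.", "2)").
           line = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", raw).strip()
           if line:
               errors.append(line)
       return errors


   reply = "1. Wrong verb tense\n\n- Semantic mismatch: time of day\n"
   print(parse_error_lines(reply))
   # ['Wrong verb tense', 'Semantic mismatch: time of day']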

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks.error_detection import error_detection

   src = "[EN] Good morning!"
   tgt = "[CEB] Maayong gabii!"  # Wrong: gabii = evening
   
   errors = error_detection(src, tgt)
   print(errors)
   # Example output (exact wording depends on the LLM):
   # ['Semantic mismatch: time of day (morning vs evening)']

scoring Module
--------------

**File**: ``wimarka/tasks/scoring.py``

Purpose
~~~~~~~

Calculates three quality metrics: fluency, adequacy, and overall score.

Implementation
~~~~~~~~~~~~~~

Uses error count and LLM-based assessment to score translations on a 0-100 scale.

Function Signature
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def scoring(src_line: str, tgt_line: str, errors: List[str]) -> Tuple[float, float, float]

**Parameters**:

* ``src_line``: Source sentence with tag
* ``tgt_line``: Target sentence with tag
* ``errors``: List of errors returned by ``error_detection``

**Returns**: Tuple of (fluency, adequacy, overall)

.. code-block:: python

   (95.0, 98.0, 96.5)  # All scores 0-100

Scoring Methodology
~~~~~~~~~~~~~~~~~~~

**Fluency Score**:

* Evaluates grammatical correctness
* Considers naturalness in target language
* Affected by syntactic and morphological errors

**Adequacy Score**:

* Evaluates meaning preservation
* Checks for omissions and additions
* Affected by semantic errors

**Overall Score**:

* Simple average: ``(fluency + adequacy) / 2``
* Provides quick quality assessment
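
The averaging rule can be checked with a few lines of Python. ``overall_score`` is a hypothetical helper written for this illustration, and the range check is an added assumption rather than documented behaviour.

.. code-block:: python

   def overall_score(fluency: float, adequacy: float) -> float:
       """Return the simple average of the two component scores."""
       for name, value in (("fluency", fluency), ("adequacy", adequacy)):
           # Assumed guard: both inputs are expected on the 0-100 scale.
           if not 0.0 <= value <= 100.0:
               raise ValueError(f"{name} must be within 0-100, got {value}")
       return (fluency + adequacy) / 2


   print(overall_score(95.0, 98.0))  # 96.5
   print(overall_score(95.0, 40.0))  # 67.5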

Algorithm
~~~~~~~~~

1. Analyze error list for severity
2. Query LLM for fluency assessment
3. Query LLM for adequacy assessment
4. Calculate overall score as average
5. Return all three scores
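
One way these steps might fit together is sketched below. The LLM client is passed in as a plain callable (``ask_llm``) so the example runs without any network access; that callable, ``parse_score``, ``score_with_llm``, and the prompt wording are all assumptions made for illustration. The sketch passes the error list to the LLM as context instead of computing a separate severity measure, which may differ from the real module.

.. code-block:: python

   import re
   from typing import Callable, List, Tuple


   def parse_score(response: str) -> float:
       """Extract the first number in the reply and clamp it to 0-100."""
       match = re.search(r"\d+(?:\.\d+)?", response)
       if match is None:
           raise ValueError(f"No numeric score in reply: {response!r}")
       return max(0.0, min(100.0, float(match.group())))


   def score_with_llm(
       src_line: str,
       tgt_line: str,
       errors: List[str],
       ask_llm: Callable[[str], str],
   ) -> Tuple[float, float, float]:
       """Query fluency and adequacy separately, then average them."""
       error_text = "; ".join(errors) if errors else "none"
       context = (
           f"Source: {src_line}\n"
           f"Target: {tgt_line}\n"
           f"Known errors: {error_text}\n"
       )
       fluency = parse_score(ask_llm(context + "Rate fluency from 0 to 100."))
       adequacy = parse_score(ask_llm(context + "Rate adequacy from 0 to 100."))
       return fluency, adequacy, (fluency + adequacy) / 2


   def fake_llm(prompt: str) -> str:
       """Canned replies standing in for a real LLM client."""
       return "90" if "fluency" in prompt else "Adequacy: 40.0"


   print(score_with_llm("[EN] Good morning!", "[CEB] Maayong gabii!",
                        ["Semantic mismatch: time of day"], fake_llm))
   # (90.0, 40.0, 65.0)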

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks.scoring import scoring

   src = "[EN] Good morning!"
   tgt = "[CEB] Maayong buntag!"
   errors = []  # No errors
   
   fluency, adequacy, overall = scoring(src, tgt, errors)
   print(f"Scores: F={fluency}, A={adequacy}, O={overall}")
   # Example output (actual scores depend on the LLM):
   # Scores: F=100.0, A=100.0, O=100.0

explanation Module
------------------

**File**: ``wimarka/tasks/explanation.py``

Purpose
~~~~~~~

Generates human-readable explanations of the evaluation results.

Implementation
~~~~~~~~~~~~~~

Synthesizes information from all previous stages into coherent natural language.

Function Signature
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def generate_explanation(
       src_line: str,
       tgt_line: str,
       errors: List[str],
       fluency: float,
       adequacy: float,
       overall: float
   ) -> str

**Parameters**:

* ``src_line``: Source sentence
* ``tgt_line``: Target sentence
* ``errors``: Detected errors
* ``fluency``: Fluency score
* ``adequacy``: Adequacy score
* ``overall``: Overall score

**Returns**: Natural language explanation string

Explanation Components
~~~~~~~~~~~~~~~~~~~~~~

A good explanation includes:

1. **Overall Assessment**: Quality judgment
2. **Specific Issues**: References to detected errors
3. **Score Context**: Why scores are high/low
4. **Constructive Feedback**: Actionable insights

Algorithm
~~~~~~~~~

1. Construct context from all inputs
2. Query LLM for explanation generation
3. Format and clean response
4. Return explanation string
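
Step 1, constructing the context, might look like the sketch below. ``build_explanation_prompt`` is a hypothetical helper and the prompt wording is an assumption; only the six inputs come from the function signature above.

.. code-block:: python

   from typing import List


   def build_explanation_prompt(
       src_line: str,
       tgt_line: str,
       errors: List[str],
       fluency: float,
       adequacy: float,
       overall: float,
   ) -> str:
       """Combine the outputs of the earlier stages into one LLM prompt."""
       if errors:
           error_block = "\n".join(f"- {error}" for error in errors)
       else:
           error_block = "- none detected"
       return (
           "Explain the quality of this translation for a human reader.\n"
           f"Source: {src_line}\n"
           f"Target: {tgt_line}\n"
           f"Detected errors:\n{error_block}\n"
           f"Scores (0-100): fluency={fluency:.1f}, "
           f"adequacy={adequacy:.1f}, overall={overall:.1f}\n"
           "Cover the overall assessment, the specific issues, "
           "the reasons for the scores, and actionable feedback."
       )


   print(build_explanation_prompt(
       "[EN] Good morning!", "[CEB] Maayong gabii!",
       ["Semantic mismatch: time of day"], 95.0, 40.0, 67.5,
   ))

The closing instruction in the prompt mirrors the four explanation components listed earlier, so the reply can be checked against them.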

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks.explanation import generate_explanation

   src = "[EN] Good morning!"
   tgt = "[CEB] Maayong gabii!"
   errors = ['Semantic mismatch: time of day']
   
   explanation = generate_explanation(
       src, tgt, errors,
       fluency=95.0,
       adequacy=40.0,
       overall=67.5
   )
   print(explanation)
   # Example output (wording varies between LLM runs):
   # "The translation has high fluency but low adequacy due to an
   #  incorrect time reference. 'Morning' was translated as
   #  'gabii' (evening)."

correction Module
-----------------

**File**: ``wimarka/tasks/correction.py``

Purpose
~~~~~~~

Generates improved translation suggestions based on detected errors.

Implementation
~~~~~~~~~~~~~~

Uses error information and explanations to suggest corrections that address identified issues.

Function Signature
~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def generate_correction(
       src_line: str,
       tgt_line: str,
       errors: List[str],
       comments: str
   ) -> str

**Parameters**:

* ``src_line``: Source sentence
* ``tgt_line``: Target sentence
* ``errors``: Detected errors
* ``comments``: Explanation text produced by ``generate_explanation``

**Returns**: Suggested corrected translation

Correction Strategy
~~~~~~~~~~~~~~~~~~~

1. **Error-Focused**: Addresses specific detected errors
2. **Conservative**: Changes only what's needed
3. **Validated**: Ensures correction is better than original

Algorithm
~~~~~~~~~

1. Analyze errors and explanation
2. Identify parts requiring correction
3. Generate improved translation via LLM
4. Validate correction quality
5. Return corrected sentence
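
How step 4 validates a correction is not specified here. One plausible check, sketched below, is to re-run error detection on the candidate and keep it only if fewer errors are found than in the original. ``accept_if_better`` and the injected ``detect`` callable are hypothetical, the candidate is assumed to carry the same language tag as the original target, and error count is only a rough proxy for quality.

.. code-block:: python

   from typing import Callable, List


   def accept_if_better(
       src_line: str,
       tgt_line: str,
       candidate: str,
       detect: Callable[[str, str], List[str]],
   ) -> str:
       """Return the candidate only if it has fewer detected errors."""
       original_errors = detect(src_line, tgt_line)
       candidate_errors = detect(src_line, candidate)
       if len(candidate_errors) < len(original_errors):
           return candidate
       # Conservative fallback: keep the original translation.
       return tgt_line


   def fake_detect(src: str, tgt: str) -> List[str]:
       """Stand-in detector used only for this demonstration."""
       return ["Semantic mismatch: time of day"] if "gabii" in tgt else []


   print(accept_if_better("[EN] Good morning!", "[CEB] Maayong gabii!",
                          "[CEB] Maayong buntag!", fake_detect))
   # [CEB] Maayong buntag!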

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks.correction import generate_correction

   src = "[EN] Good morning!"
   tgt = "[CEB] Maayong gabii!"
   errors = ['Semantic mismatch: time of day']
   comments = "Wrong time of day: evening instead of morning"
   
   correction = generate_correction(src, tgt, errors, comments)
   print(correction)
   # Example output (the LLM's suggestion may vary): "Maayong buntag!"

Task Module Integration
-----------------------

Complete Pipeline Example
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.tasks import (
       error_detection, scoring,
       explanation, correction
   )

   # Input
   src = "[EN] How are you today?"
   tgt = "[CEB] Kumusta ka karon?"

   # Stage 1: Error Detection
   errors = error_detection.error_detection(src, tgt)
   print(f"Errors: {errors}")

   # Stage 2: Scoring
   fluency, adequacy, overall = scoring.scoring(src, tgt, errors)
   print(f"Scores: F={fluency}, A={adequacy}, O={overall}")

   # Stage 3: Explanation
   explanation_text = explanation.generate_explanation(
       src, tgt, errors, fluency, adequacy, overall
   )
   print(f"Explanation: {explanation_text}")

   # Stage 4: Correction
   corrected = correction.generate_correction(
       src, tgt, errors, explanation_text
   )
   print(f"Corrected: {corrected}")

Data Flow Between Modules
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   src_line, tgt_line
          │
          ▼
   ┌──────────────────┐
   │ error_detection  │
   └────────┬─────────┘
            │ errors[]
            ▼
   ┌──────────────────┐
   │     scoring      │
   └────────┬─────────┘
            │ fluency, adequacy, overall
            ▼
   ┌──────────────────┐
   │   explanation    │
   └────────┬─────────┘
            │ explanation_text
            ▼
   ┌──────────────────┐
   │   correction     │
   └────────┬─────────┘
            │ corrected_translation
            ▼
          Results

Customization
-------------

Replacing a Task Module
~~~~~~~~~~~~~~~~~~~~~~~

To replace a module with custom logic:

.. code-block:: python

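   # Custom error detection
   def my_error_detection(src, tgt):
       """Custom implementation."""
       errors = []
       # Your logic here
       return errors

   # Use in evaluation
   import wimarka.tasks.error_detection as error_detection_module
   from wimarka import main

   # Replace the function on the task module itself. This takes effect
   # only if the pipeline looks the function up through the module
   # (error_detection.error_detection) at call time.
   error_detection_module.error_detection = my_error_detection

   # Run evaluation
   main.wmk_eval('src.txt', 'EN', 'tgt.txt', 'CEB')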
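
As an illustration of what could replace the ``# Your logic here`` placeholder, the sketch below uses a few rule-based checks instead of an LLM. The rules and their messages are invented for this example; only the function signature, the list-of-strings return type, and the bracketed language tag follow the interface described above.

.. code-block:: python

   import re
   from typing import List

   # Assumed tag shape, e.g. "[EN] " or "[CEB] ".
   _TAG = re.compile(r"^\[[^\]]+\]\s*")


   def my_error_detection(src: str, tgt: str) -> List[str]:
       """Rule-based detector: no LLM call, purely string checks."""
       src_text = _TAG.sub("", src).strip()
       tgt_text = _TAG.sub("", tgt).strip()
       errors = []
       if not tgt_text:
           errors.append("Omission: target sentence is empty")
       elif tgt_text.casefold() == src_text.casefold():
           errors.append("Untranslated: target is identical to source")
       if src_text.endswith("?") and tgt_text and not tgt_text.endswith("?"):
           errors.append("Punctuation mismatch: question mark missing in target")
       return errors


   print(my_error_detection("[EN] Good morning!", "[CEB] Good morning!"))
   # ['Untranslated: target is identical to source']

Because the function returns the same type as the built-in detector, the later stages can consume its output unchanged.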

Adding a New Task
~~~~~~~~~~~~~~~~~

To add a new evaluation task:

1. Create a new file in ``wimarka/tasks/``
2. Implement the task's main function
3. Import the new module in ``main.py``
4. Call the function from the pipeline and store its result

Example:

.. code-block:: python

   # wimarka/tasks/style_analysis.py
   def analyze_style(src_line: str, tgt_line: str) -> str:
       """Analyze translation style."""
       style_report = ""  # Build the report here
       return style_report

   # In main.py
   from wimarka.tasks import style_analysis

   # Add to pipeline (inside the per-sentence loop; assumes ``results``
   # already contains a 'style' list)
   style_report = style_analysis.analyze_style(src_line, tgt_line)
   results['style'].append(style_report)

Performance Considerations
--------------------------

Optimization Opportunities
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Batch Processing**: Process multiple sentences in one LLM call
2. **Caching**: Cache LLM responses for identical inputs
3. **Parallel Execution**: Run independent tasks in parallel

**Current Implementation**: Tasks run sequentially, which keeps the pipeline simple and its execution order deterministic.

**Future**: Asynchronous or parallel processing to reduce total evaluation time.
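
Caching and parallel execution can both be prototyped with the standard library, as the sketch below suggests. It is not part of WiMarka: ``slow_error_detection`` is a stand-in for an LLM-backed stage, and ``cached_error_detection`` and ``detect_many`` are hypothetical names. Threads are assumed to be a reasonable fit because LLM calls are typically I/O-bound, and caching only makes sense where reusing an earlier reply for identical inputs is acceptable.

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor
   from functools import lru_cache
   from typing import List, Sequence, Tuple


   def slow_error_detection(src_line: str, tgt_line: str) -> List[str]:
       """Stand-in for an LLM-backed stage; replace with the real call."""
       return [] if "buntag" in tgt_line else ["Semantic mismatch: time of day"]


   @lru_cache(maxsize=1024)
   def cached_error_detection(src_line: str, tgt_line: str) -> Tuple[str, ...]:
       # Return a tuple so callers cannot mutate the cached value.
       return tuple(slow_error_detection(src_line, tgt_line))


   def detect_many(
       pairs: Sequence[Tuple[str, str]], max_workers: int = 4
   ) -> List[Tuple[str, ...]]:
       """Run detection for independent sentence pairs, preserving order."""
       with ThreadPoolExecutor(max_workers=max_workers) as pool:
           return list(pool.map(lambda pair: cached_error_detection(*pair), pairs))


   pairs = [
       ("[EN] Good morning!", "[CEB] Maayong buntag!"),
       ("[EN] Good morning!", "[CEB] Maayong gabii!"),
   ]
   print(detect_many(pairs))
   # [(), ('Semantic mismatch: time of day',)]

``ThreadPoolExecutor.map`` returns results in input order, so the output still lines up with the source file line by line.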

Best Practices
--------------

For Task Module Development
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Keep Modules Independent**: Minimal dependencies between tasks
2. **Clear Interfaces**: Well-defined input/output types
3. **Error Handling**: Graceful failure, informative errors
4. **Documentation**: Docstrings for all functions
5. **Testing**: Unit tests for each module

See Also
--------

* :doc:`api_reference` - Complete API documentation
* :doc:`architecture` - System architecture details
* :doc:`extending` - Guide to extending WiMarka