Understanding Output Format
===========================

This guide explains how to interpret WiMarka's evaluation output and the metrics it reports.

Output Structure
----------------

For each evaluated sentence pair, WiMarka provides:

.. code-block:: text

   Line X:
     Source:
     Target:
     Errors:
     Fluency Score: <0-100>
     Adequacy Score: <0-100>
     Overall Score: <0-100>
     Explanation:
     Suggested Correction:

Example Output
~~~~~~~~~~~~~~

.. code-block:: text

   Line 1:
     Source: Good morning!
     Target: Maayong buntag!
     Errors: []
     Fluency Score: 100/100
     Adequacy Score: 100/100
     Overall Score: 100/100
     Explanation: Perfect translation with correct meaning and natural grammar.
     Suggested Correction: Maayong buntag!

Evaluation Metrics
------------------

WiMarka provides three primary metrics for each translation:

Fluency Score
~~~~~~~~~~~~~

**Range**: 0-100

**Definition**: Measures how natural, grammatical, and readable the translation is in the target language.

**What It Evaluates**:

* Grammatical correctness
* Natural word order
* Proper use of particles and markers
* Idiomatic expression
* Overall readability

**Interpretation**:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Score Range
     - Interpretation
   * - **90-100**
     - Excellent fluency. Translation reads like native text with natural grammar and word choice.
   * - **75-89**
     - Good fluency. Minor grammatical issues or slightly awkward phrasing, but generally understandable.
   * - **60-74**
     - Acceptable fluency. Noticeable grammatical errors or unnatural constructions, but meaning is clear.
   * - **40-59**
     - Poor fluency. Significant grammatical problems making the text difficult to read.
   * - **0-39**
     - Very poor fluency. Severe grammatical errors; text may be incomprehensible.

**Example High Fluency** (Score: 95):

.. code-block:: text

   EN:  The weather is beautiful today.
   CEB: Nindot kaayo ang panahon karon.
   # Natural Cebuano with proper word order and particles

**Example Low Fluency** (Score: 45):

.. code-block:: text

   EN:  The weather is beautiful today.
   CEB: Ang panahon nindot ka sa karon.
   # Awkward word order, incorrect particle usage

Adequacy Score
~~~~~~~~~~~~~~

**Range**: 0-100

**Definition**: Measures how completely and accurately the translation conveys the meaning of the source text.

**What It Evaluates**:

* Semantic completeness
* Preservation of meaning
* No critical omissions
* No added information
* Correct interpretation

**Interpretation**:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Score Range
     - Interpretation
   * - **90-100**
     - Excellent adequacy. All meaning fully preserved with correct interpretation.
   * - **75-89**
     - Good adequacy. Most meaning conveyed; minor details may be slightly different.
   * - **60-74**
     - Acceptable adequacy. Core meaning present but some information loss or distortion.
   * - **40-59**
     - Poor adequacy. Significant meaning loss; important information missing or wrong.
   * - **0-39**
     - Very poor adequacy. Most meaning lost or severely distorted.

**Example High Adequacy** (Score: 98):

.. code-block:: text

   EN:  I bought three red apples at the market.
   TGT: Bumili ako ng tatlong pulang mansanas sa palengke.
   # All information preserved: quantity, color, item, location

**Example Low Adequacy** (Score: 50):

.. code-block:: text

   EN:  I bought three red apples at the market.
   TGT: Bumili ako ng mansanas.
   # Missing: quantity, color, location

Overall Score
~~~~~~~~~~~~~

**Range**: 0-100

**Definition**: Combined metric representing overall translation quality.

**Calculation**: The overall score is the arithmetic mean of the fluency and adequacy scores.

.. code-block:: python

   overall_score = (fluency_score + adequacy_score) / 2

**Interpretation**:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Score Range
     - Quality Level
   * - **90-100**
     - Excellent - Publication-ready, professional quality
   * - **75-89**
     - Good - Minor improvements possible, generally acceptable
   * - **60-74**
     - Fair - Usable but needs revision
   * - **40-59**
     - Poor - Significant revision required
   * - **0-39**
     - Unacceptable - Major rework needed

**Trade-offs**:

Fluency and adequacy are scored independently, so two translations with the same overall score can fail in very different ways:

**High Fluency, Low Adequacy**:

.. code-block:: text

   EN:  I need to finish this report by Friday.
   CEB: Kinahanglan nakong human kini nga semana.
        (I need to finish this week)

   Fluency: 90 (grammatically correct Cebuano)
   Adequacy: 60 (loses specificity - "Friday" vs "this week")
   Overall: 75

**Low Fluency, High Adequacy**:

.. code-block:: text

   EN:  I need to finish this report by Friday.
   CEB: Ako kinahanglan finish kini report by Biyernes.

   Fluency: 55 (code-switching, unnatural)
   Adequacy: 95 (all meaning preserved)
   Overall: 75

Error Detection
---------------

Error Types
~~~~~~~~~~~

WiMarka detects various error categories:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Error Type
     - Description
   * - **Lexical errors**
     - Wrong word choice, mistranslation
   * - **Syntactic errors**
     - Incorrect grammar, word order issues
   * - **Semantic errors**
     - Meaning distortion or loss
   * - **Morphological errors**
     - Wrong affixes, inflections
   * - **Pragmatic errors**
     - Incorrect formality, register
   * - **Omissions**
     - Missing information
   * - **Additions**
     - Unnecessary extra information

Error Format
~~~~~~~~~~~~

Errors are printed as a Python-style list of strings:

.. code-block:: text

   # No errors
   Errors: []

   # Single error
   Errors: ['Semantic mismatch: time of day']

   # Multiple errors
   Errors: ['Lexical error: wrong verb choice', 'Omission: quantity not specified']

Understanding Errors
~~~~~~~~~~~~~~~~~~~~

**Example 1: Semantic Mismatch**

.. code-block:: text

   Source: Good morning!
   Target: Maayong gabii! (Good evening!)
   Errors: ['Semantic mismatch: time of day']
   Explanation: Incorrect time reference - 'morning' vs 'evening'

**Example 2: Omission**

.. code-block:: text

   Source: I bought three apples.
   Target: Bumili ako ng mansanas. (I bought apples.)
   Errors: ['Omission: quantity not specified']
   Explanation: The number 'three' was not translated

**Example 3: Syntactic Error**

.. code-block:: text

   Source: The book is on the table.
   Target: Ang libro sa lamesa. (The book of table.)
   Errors: ['Syntactic error: missing verb']
   Explanation: Location verb 'nasa' missing

Explanations
------------

Natural Language Explanations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

WiMarka generates human-readable explanations for the evaluation:

**Perfect Translation**:

.. code-block:: text

   Explanation: Excellent translation with accurate meaning and natural grammar.
   No errors detected.

**Minor Issues**:

.. code-block:: text

   Explanation: Good translation overall. Minor fluency issue with word order,
   but meaning is fully preserved.

**Significant Problems**:

.. code-block:: text

   Explanation: Translation has semantic error - wrong time of day.
   'Morning' incorrectly translated as 'evening'. Grammar is correct
   but meaning is incorrect.

Contextual Information
~~~~~~~~~~~~~~~~~~~~~~

Explanations may include:

* Specific error locations
* Linguistic reasoning
* Cultural or pragmatic considerations
* Alternative phrasings

Suggested Corrections
---------------------

How Corrections Work
~~~~~~~~~~~~~~~~~~~~

WiMarka provides improved translation suggestions:

.. code-block:: text

   Target: Maayong gabii!
   Errors: ['Semantic mismatch: time of day']
   Suggested Correction: Maayong buntag!

**Correction Quality**:

* Addresses detected errors
* Maintains semantic accuracy
* Improves fluency when possible
* Preserves general style

When to Use Corrections
~~~~~~~~~~~~~~~~~~~~~~~

.. note::

   Suggested corrections are **recommendations**, not absolute truth.
   Always have corrections reviewed by native speakers for production use.

**Good Use Cases**:

* Quick fixes for obvious errors
* Learning from mistakes
* Identifying problem patterns

**Exercise Caution**:

* Critical translations (legal, medical)
* Cultural/contextual nuances
* Creative or literary content

Output Examples by Quality
--------------------------

Excellent Translation (90-100)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Line 1:
     Source: Thank you for your help.
     Target: Salamat sa imong tabang.
     Errors: []
     Fluency Score: 98/100
     Adequacy Score: 100/100
     Overall Score: 99/100
     Explanation: Perfect translation with natural phrasing and complete meaning preservation.
     Suggested Correction: Salamat sa imong tabang.

Good Translation (75-89)
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Line 1:
     Source: I will call you tomorrow.
     Target: Tawagan ko ikaw ugma.
     Errors: []
     Fluency Score: 85/100
     Adequacy Score: 98/100
     Overall Score: 91.5/100
     Explanation: Good translation. Slightly more natural would be 'Tawagan ta ka ugma' but current form is acceptable.
     Suggested Correction: Tawagan ta ka ugma.

Fair Translation (60-74)
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Line 1:
     Source: The meeting starts at 2 PM.
     Target: Ang meeting magsugod sa 2.
     Errors: ['Code-switching: English word "meeting"', 'Missing: PM specification']
     Fluency Score: 70/100
     Adequacy Score: 75/100
     Overall Score: 72.5/100
     Explanation: Acceptable but could be improved. Use 'miting' instead of 'meeting'. Add 'sa hapon' for PM.
     Suggested Correction: Ang miting magsugod sa alas 2 sa hapon.

Poor Translation (Below 60)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Line 1:
     Source: Good morning!
     Target: Magandang gabi!
     Errors: ['Wrong language: Tagalog instead of Cebuano', 'Semantic error: evening instead of morning']
     Fluency Score: 40/100
     Adequacy Score: 30/100
     Overall Score: 35/100
     Explanation: Major errors. Wrong language used (Tagalog not Cebuano) and wrong time of day (evening not morning).
     Suggested Correction: Maayong buntag!

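The score bands used in this guide can also be applied mechanically. The following is a minimal sketch and not part of WiMarka: the helper names ``overall_score`` and ``quality_level`` are hypothetical, and the band boundaries are copied from the Overall Score table. It assumes that fractional scores which fall between two integer bands (for example 89.5) belong to the lower band, which the guide does not specify.

```python
# Hypothetical helpers for illustration; they are not part of the WiMarka API.

# Lower bound of each band, copied from the Overall Score table in this guide.
QUALITY_BANDS = [
    (90, "Excellent"),
    (75, "Good"),
    (60, "Fair"),
    (40, "Poor"),
    (0, "Unacceptable"),
]


def overall_score(fluency: float, adequacy: float) -> float:
    """Average fluency and adequacy, as in the Overall Score calculation."""
    return (fluency + adequacy) / 2


def quality_level(score: float) -> str:
    """Return the quality level whose band contains ``score``."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Bands are ordered from highest to lowest, so the first match wins.
    # Assumption: a fractional score such as 89.5 counts as the lower band.
    for lower_bound, label in QUALITY_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the bands cover 0-100")


if __name__ == "__main__":
    # Fluency/adequacy pairs taken from the example outputs in this section.
    for fluency, adequacy in [(98, 100), (70, 75), (40, 30)]:
        score = overall_score(fluency, adequacy)
        print(f"{score:.1f} -> {quality_level(score)}")
```

Keeping the bands in a single ordered list means the thresholds live in one place if your project adopts different cut-offs.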
Best Practices for Interpretation
---------------------------------

1. **Consider Both Scores**

   Don't rely only on the overall score. Check both fluency and adequacy separately.

2. **Read Explanations**

   The explanation provides crucial context for understanding scores.

3. **Review Errors**

   Pay attention to the types of errors detected.

4. **Context Matters**

   Scores should be interpreted based on your use case (casual vs. professional).

5. **Use Corrections Wisely**

   Treat suggestions as guidance, not absolute fixes.

6. **Batch Analysis**

   For multiple sentences, look at average scores and error patterns.

Programmatic Access
-------------------

Accessing Results in Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from wimarka.main import wmk_eval, results

   # Run evaluation
   wmk_eval('source.txt', 'EN', 'target.txt', 'CEB')

   # Access individual results and report lines scoring below 70
   for i in range(len(results['source'])):
       if results['overall_score'][i] < 70:
           print(f"Low quality at line {i+1}:")
           print(f"  Score: {results['overall_score'][i]}")
           print(f"  Errors: {results['errors'][i]}")

Exporting for Analysis
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import pandas as pd
   from wimarka.main import results

   # Create DataFrame
   df = pd.DataFrame(results)

   # Calculate statistics
   print(f"Average Overall Score: {df['overall_score'].mean():.2f}")
   print(f"Std Dev: {df['overall_score'].std():.2f}")

   # Export
   df.to_csv('detailed_results.csv', index=False)

Next Steps
----------

* See :doc:`examples` for complete evaluation workflows
* See :doc:`usage_library` for programmatic result processing
* See :doc:`usage_cli` for output redirection techniques
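
The batch analysis suggested under Best Practices (average scores and error patterns) could be sketched as follows. This is an illustration under stated assumptions, not WiMarka functionality: ``sample_results`` is hand-written data shaped like the ``results`` dictionary used in Programmatic Access, each entry of ``errors`` is assumed to be a list of ``'Category: detail'`` strings as in the Error Format examples, and the helpers ``error_category`` and ``summarize`` are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hand-written sample, assumed to mirror the shape of WiMarka's ``results``
# dictionary (one list entry per evaluated line).
sample_results = {
    "overall_score": [99.0, 72.5, 35.0],
    "errors": [
        [],
        ['Code-switching: English word "meeting"', "Missing: PM specification"],
        [
            "Wrong language: Tagalog instead of Cebuano",
            "Semantic error: evening instead of morning",
        ],
    ],
}


def error_category(error: str) -> str:
    """Return the text before the first colon, e.g. 'Semantic error'."""
    return error.split(":", 1)[0].strip()


def summarize(results: dict) -> dict:
    """Compute the average overall score and a count of error categories."""
    categories = Counter(
        error_category(error)
        for line_errors in results["errors"]
        for error in line_errors
    )
    return {
        "average_overall": mean(results["overall_score"]),
        "error_counts": categories,
    }


if __name__ == "__main__":
    summary = summarize(sample_results)
    print(f"Average overall score: {summary['average_overall']:.2f}")
    for category, count in summary["error_counts"].most_common():
        print(f"{category}: {count}")
```

Grouping by the text before the first colon keeps the sketch independent of any fixed list of error types; if the error strings in your output follow a different pattern, adjust ``error_category`` accordingly.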