Quick Start Guide
=================

Get up and running with WiMarka in just a few minutes! This guide walks you through your first translation evaluation.

Prerequisites
-------------

Before starting, make sure you have:

* ✅ Installed WiMarka (see :doc:`installation`)
* ✅ Python 3.12 or higher
* ✅ Two text files: a source text and its translation

Your First Evaluation
---------------------

Step 1: Prepare Your Input Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create two text files with parallel sentences:

**source_file.txt** (English):

.. code-block:: text

   Good morning!
   How are you today?
   Thank you for your help.

**target_file.txt** (Cebuano translation):

.. code-block:: text

   Maayong buntag!
   Kumusta ka karon?
   Salamat sa imong tabang.

.. important::

   **File Format Requirements:**

   * One sentence per line
   * UTF-8 encoding
   * Same number of lines in both files
   * Lines correspond (line 1 in source matches line 1 in target)

Step 2: Run WiMarka
~~~~~~~~~~~~~~~~~~~

You can use either the Python library or the command-line interface.

Option A: Using Python
^^^^^^^^^^^^^^^^^^^^^^^

Create a Python script (``evaluate.py``):

.. code-block:: python

   from wimarka.main import wmk_eval

   # Evaluate the English source against its Cebuano translation
   wmk_eval(
       src_file_path='source_file.txt',
       src_lang='EN',
       tgt_file_path='target_file.txt',
       tgt_lang='CEB'
   )

Run the script:

.. code-block:: bash

   python evaluate.py

Option B: Using CLI
^^^^^^^^^^^^^^^^^^^

Run directly from the command line:

.. code-block:: bash

   wimarka --src_file_path source_file.txt \
           --src_lang EN \
           --tgt_file_path target_file.txt \
           --tgt_lang CEB

Step 3: Understanding the Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

WiMarka processes each sentence pair and displays its progress:

.. code-block:: text

   INFO - Starting evaluation...
   INFO - Evaluating line 1/3
   INFO - Detecting errors...
   INFO - Scoring translation...
   INFO - Generating explanation...
   INFO - Correcting translation...
   INFO - Evaluating line 2/3
   ...

After processing all sentences, you'll see detailed results:

.. code-block:: text

   === Evaluation Results ===
   ----------------------------------------
   Line 1:
   Source: Good morning!
   Target: Maayong buntag!
   Errors: []
   Fluency Score: 100/100
   Adequacy Score: 100/100
   Overall Score: 100/100
   Explanation: Excellent translation with no errors.
   Suggested Correction: Maayong buntag!
   ----------------------------------------
   Line 2:
   Source: How are you today?
   Target: Kumusta ka karon?
   Errors: []
   Fluency Score: 98/100
   Adequacy Score: 95/100
   Overall Score: 96.5/100
   Explanation: Very good translation, minor fluency variation.
   Suggested Correction: Kumusta ka karon?
   ----------------------------------------
   Evaluation completed.

Understanding the Scores
------------------------

WiMarka provides three scores for each translation:

Fluency Score (0-100)
~~~~~~~~~~~~~~~~~~~~~

Measures how natural and grammatically correct the translation reads in the target language.

* **90-100**: Excellent, native-like fluency
* **70-89**: Good fluency with minor issues
* **50-69**: Acceptable but with noticeable problems
* **Below 50**: Poor fluency, difficult to understand

Adequacy Score (0-100)
~~~~~~~~~~~~~~~~~~~~~~

Evaluates how well the translation preserves the meaning of the source text.

* **90-100**: Complete meaning preservation
* **70-89**: Most meaning preserved, minor omissions
* **50-69**: Partial meaning loss
* **Below 50**: Significant meaning loss

Overall Score (0-100)
~~~~~~~~~~~~~~~~~~~~~

Combines fluency and adequacy into a single quality metric.

* Calculated as: ``(Fluency + Adequacy) / 2``
* Provides a quick quality assessment

Common Language Codes
----------------------

Use these codes when specifying source and target languages:

.. list-table::
   :header-rows: 1
   :widths: 15 25 60

   * - Code
     - Language
     - Usage Example
   * - ``EN``
     - English
     - ``--src_lang EN``
   * - ``CEB``
     - Cebuano
     - ``--tgt_lang CEB``
   * - ``ILO``
     - Ilocano
     - ``--tgt_lang ILO``
   * - ``TGT``
     - Tagalog
     - ``--tgt_lang TGT``

See :doc:`supported_languages` for complete language information.

Example with Errors
-------------------

Let's try an evaluation that contains a translation error:

**source_error.txt**:

.. code-block:: text

   Good morning!

**target_error.txt**:

.. code-block:: text

   Maayong gabii!

.. note::

   "Gabii" means "evening" in Cebuano, which is incorrect for "morning".

Run the evaluation:

.. code-block:: bash

   wimarka --src_file_path source_error.txt \
           --src_lang EN \
           --tgt_file_path target_error.txt \
           --tgt_lang CEB

Expected output:

.. code-block:: text

   Line 1:
   Source: Good morning!
   Target: Maayong gabii!
   Errors: [Semantic mismatch: time of day]
   Fluency Score: 95/100
   Adequacy Score: 40/100
   Overall Score: 67.5/100
   Explanation: The translation has incorrect time reference. "Morning" was translated as "gabii" (evening).
   Suggested Correction: Maayong buntag!

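To see where the overall score in this output comes from, the averaging formula ``(Fluency + Adequacy) / 2`` can be reproduced by hand. The snippet below is a minimal sketch for checking the arithmetic only: ``overall_score`` is a hypothetical helper written for this guide, not a function exported by WiMarka, and the range check is an assumption based on the documented 0-100 scale.

.. code-block:: python

   def overall_score(fluency: float, adequacy: float) -> float:
       """Average the two component scores: (Fluency + Adequacy) / 2.

       Hypothetical helper for illustration; not part of the WiMarka API.
       """
       for name, value in (("fluency", fluency), ("adequacy", adequacy)):
           # Assumed guard: both scores are documented as lying in 0-100.
           if not 0 <= value <= 100:
               raise ValueError(f"{name} must be between 0 and 100, got {value}")
       return (fluency + adequacy) / 2


   if __name__ == "__main__":
       # "Maayong gabii!" example: fluent, but the meaning is wrong.
       print(overall_score(95, 40))   # 67.5
       # "Kumusta ka karon?" example from the first evaluation.
       print(overall_score(98, 95))   # 96.5

A high fluency score cannot compensate for lost meaning: the mistranslated greeting reads naturally in Cebuano, yet the low adequacy score pulls the overall result down to 67.5.
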
Best Practices
--------------

📝 **File Preparation**

* Use UTF-8 encoding for all text files
* Keep sentences reasonably short (< 100 words)
* Ensure proper sentence alignment

🎯 **Choosing Languages**

* English is typically used as the source language
* Select the appropriate Philippine language code for the target

⚡ **Performance**

* The first run downloads models (may take time)
* Subsequent runs are faster (models are cached)
* For large files, consider batch processing

💾 **Storage**

* Models are cached in ``~/.cache/huggingface/``
* Ensure adequate disk space (5-10 GB recommended)

Next Steps
----------

Now that you've completed your first evaluation:

* **For Python developers**: See :doc:`usage_library` for advanced programming examples
* **For CLI users**: See :doc:`usage_cli` for complete command options
* **For more examples**: See :doc:`examples` for real-world scenarios
* **To understand output**: See :doc:`output_format` for detailed result interpretation