Quick Start Guide
=================

Get up and running with WiMarka in just a few minutes! This guide will walk you through your first translation evaluation.

Prerequisites
-------------

Before starting, make sure you have:

✅ Installed WiMarka (see :doc:`installation`)

✅ Python 3.12 or higher

✅ Two text files: a source text and its translation
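
As a quick check of the Python prerequisite, the following minimal sketch compares the running interpreter against version 3.12. The helper name ``meets_python_requirement`` is hypothetical and is not part of WiMarka.

.. code-block:: python

   import sys

   REQUIRED = (3, 12)  # minimum version stated in the prerequisites above


   def meets_python_requirement(version_info=sys.version_info, required=REQUIRED):
       """Return True if the (major, minor) version is at least `required`."""
       return tuple(version_info[:2]) >= required


   # Print a hint only when the current interpreter is too old.
   if not meets_python_requirement():
       found = sys.version.split()[0]
       print(f"Python {REQUIRED[0]}.{REQUIRED[1]}+ is required; found {found}")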

Your First Evaluation
---------------------

Step 1: Prepare Your Input Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create two text files with parallel sentences:

**source_file.txt** (English):

.. code-block:: text

   Good morning!
   How are you today?
   Thank you for your help.

**target_file.txt** (Cebuano translation):

.. code-block:: text

   Maayong buntag!
   Kumusta ka karon?
   Salamat sa imong tabang.

.. important::

   **File Format Requirements:**
   
   * One sentence per line
   * UTF-8 encoding
   * Same number of lines in both files
   * Lines correspond (line 1 in source matches line 1 in target)
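
The format requirements above can be checked before running an evaluation. The sketch below is one possible pre-flight check, not a WiMarka feature: the function name ``check_parallel_files`` is hypothetical, and treating blank lines as a problem is an assumption drawn from the one-sentence-per-line rule. Alignment of meaning cannot be verified automatically, so only the line counts are compared.

.. code-block:: python

   from pathlib import Path


   def check_parallel_files(src_path, tgt_path):
       """Return a list of problems; an empty list means the files look usable."""
       problems = []
       lines = {}
       for label, path in (("source", src_path), ("target", tgt_path)):
           try:
               # read_text raises UnicodeDecodeError on bytes that are not valid UTF-8.
               text = Path(path).read_text(encoding="utf-8")
           except UnicodeDecodeError:
               problems.append(f"{label} file is not valid UTF-8: {path}")
               return problems
           lines[label] = text.splitlines()

       # Both files must contain the same number of lines.
       if len(lines["source"]) != len(lines["target"]):
           problems.append(
               f"line count mismatch: {len(lines['source'])} source vs "
               f"{len(lines['target'])} target"
           )

       # Assumption: a blank line breaks the one-sentence-per-line rule.
       for label, content in lines.items():
           for number, line in enumerate(content, start=1):
               if not line.strip():
                   problems.append(f"{label} line {number} is empty")
       return problems

Calling ``check_parallel_files('source_file.txt', 'target_file.txt')`` on the two example files should return an empty list; any returned strings describe what to fix first.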

Step 2: Run WiMarka
~~~~~~~~~~~~~~~~~~~

You can use either the Python library or the command-line interface.

Option A: Using Python
^^^^^^^^^^^^^^^^^^^^^^^

Create a Python script (``evaluate.py``):

.. code-block:: python

   from wimarka.main import wmk_eval

   wmk_eval(
       src_file_path='source_file.txt',
       src_lang='EN',
       tgt_file_path='target_file.txt',
       tgt_lang='CEB'
   )

Run the script:

.. code-block:: bash

   python evaluate.py

Option B: Using CLI
^^^^^^^^^^^^^^^^^^^

Run directly from the command line:

.. code-block:: bash

   wimarka --src_file_path source_file.txt \
           --src_lang EN \
           --tgt_file_path target_file.txt \
           --tgt_lang CEB

Step 3: Understanding the Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

WiMarka will process each sentence pair and display progress:

.. code-block:: text

   INFO - Starting evaluation...
   INFO - Evaluating line 1/3
   INFO - Detecting errors...
   INFO - Scoring translation...
   INFO - Generating explanation...
   INFO - Correcting translation...
   INFO - Evaluating line 2/3
   ...

After processing all sentences, you'll see detailed results:

.. code-block:: text

   === Evaluation Results ===
   ----------------------------------------
   Line 1:
     Source: Good morning!
     Target: Maayong buntag!
     Errors: []
     Fluency Score: 100/100
     Adequacy Score: 100/100
     Overall Score: 100/100
     Explanation: Excellent translation with no errors.
     Suggested Correction: Maayong buntag!
   ----------------------------------------
   Line 2:
     Source: How are you today?
     Target: Kumusta ka karon?
     Errors: []
     Fluency Score: 98/100
     Adequacy Score: 95/100
     Overall Score: 96.5/100
     Explanation: Very good translation, minor fluency variation.
     Suggested Correction: Kumusta ka karon?
   ----------------------------------------

   Evaluation completed.

Understanding the Scores
------------------------

WiMarka provides three types of scores for each translation:

Fluency Score (0-100)
~~~~~~~~~~~~~~~~~~~~~

Measures how natural and grammatically correct the translation reads in the target language.

* **90-100**: Excellent, native-like fluency
* **70-89**: Good fluency with minor issues
* **50-69**: Acceptable but noticeable problems
* **Below 50**: Poor fluency, difficult to understand

Adequacy Score (0-100)
~~~~~~~~~~~~~~~~~~~~~~

Evaluates how well the translation preserves the meaning of the source text.

* **90-100**: Complete meaning preservation
* **70-89**: Most meaning preserved, minor omissions
* **50-69**: Partial meaning loss
* **Below 50**: Significant meaning loss
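
The fluency and adequacy bands listed above share the same thresholds, so they can be expressed as a small lookup. This is an illustrative sketch only: ``BANDS`` and ``score_band`` are hypothetical names, the short labels paraphrase the descriptions above, and placing fractional scores such as 89.5 in the lower band is an assumption.

.. code-block:: python

   # Lower bound of each band, highest first, with a short label per metric.
   BANDS = {
       "fluency": [(90, "excellent"), (70, "good"), (50, "acceptable"), (0, "poor")],
       "adequacy": [
           (90, "complete"),
           (70, "mostly preserved"),
           (50, "partial loss"),
           (0, "significant loss"),
       ],
   }


   def score_band(metric, score):
       """Return the band label for a 0-100 score of the given metric."""
       if not 0 <= score <= 100:
           raise ValueError(f"score must be between 0 and 100, got {score}")
       for lower_bound, label in BANDS[metric]:
           if score >= lower_bound:
               return label


   print(score_band("fluency", 95))   # excellent
   print(score_band("adequacy", 40))  # significant loss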

Overall Score (0-100)
~~~~~~~~~~~~~~~~~~~~~

Combines fluency and adequacy into a single quality metric.

* Calculated as: ``(Fluency + Adequacy) / 2``
* Provides a quick quality assessment
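
As a worked example of the formula, the sketch below recomputes the overall scores that appear in this guide's sample outputs. The function name ``overall_score`` is hypothetical; it only mirrors the arithmetic stated above.

.. code-block:: python

   def overall_score(fluency, adequacy):
       """Average the fluency and adequacy scores, as in (Fluency + Adequacy) / 2."""
       return (fluency + adequacy) / 2


   print(overall_score(98, 95))  # 96.5
   print(overall_score(95, 40))  # 67.5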

Common Language Codes
----------------------

Use these codes when specifying source and target languages:

.. list-table::
   :header-rows: 1
   :widths: 15 25 60

   * - Code
     - Language
     - Usage Example
   * - ``EN``
     - English
     - ``--src_lang EN``
   * - ``CEB``
     - Cebuano
     - ``--tgt_lang CEB``
   * - ``ILO``
     - Ilocano
     - ``--tgt_lang ILO``
   * - ``TGT``
     - Tagalog
     - ``--tgt_lang TGT``

See :doc:`supported_languages` for complete language information.

Example with Errors
-------------------

Let's try an evaluation with translation errors:

**source_error.txt**:

.. code-block:: text

   Good morning!

**target_error.txt**:

.. code-block:: text

   Maayong gabii!

.. note::
   "Gabii" means "evening" in Cebuano, so it mistranslates "morning".

Run the evaluation:

.. code-block:: bash

   wimarka --src_file_path source_error.txt \
           --src_lang EN \
           --tgt_file_path target_error.txt \
           --tgt_lang CEB

Expected output:

.. code-block:: text

   Line 1:
     Source: Good morning!
     Target: Maayong gabii!
     Errors: [Semantic mismatch: time of day]
     Fluency Score: 95/100
     Adequacy Score: 40/100
     Overall Score: 67.5/100
     Explanation: The translation has incorrect time reference.
                  "Morning" was translated as "gabii" (evening).
     Suggested Correction: Maayong buntag!

Best Practices
--------------

📝 **File Preparation**
   * Use UTF-8 encoding for all text files
   * Keep sentences reasonably short (< 100 words)
   * Ensure proper sentence alignment

🎯 **Choosing Languages**
   * English is typically used as the source language
   * Select the appropriate Philippine language code for target

⚡ **Performance**
   * The first run downloads the models, which may take some time
   * Subsequent runs are faster because the models are cached
   * For large files, consider processing the input in batches

💾 **Storage**
   * Models are cached in ``~/.cache/huggingface/``
   * Ensure adequate disk space (5-10 GB recommended)
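
One way to read the batch processing advice above is to split a large pair of aligned files into smaller pairs and evaluate them one at a time. The sketch below assumes that interpretation. ``split_parallel_files``, the output file names, and the default batch size of 100 lines are all hypothetical choices, not WiMarka features.

.. code-block:: python

   from pathlib import Path


   def split_parallel_files(src_path, tgt_path, out_dir, batch_size=100):
       """Split two aligned files into numbered batches of `batch_size` lines.

       Returns a list of (source_batch_path, target_batch_path) tuples.
       """
       out_dir = Path(out_dir)
       out_dir.mkdir(parents=True, exist_ok=True)
       src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
       tgt_lines = Path(tgt_path).read_text(encoding="utf-8").splitlines()
       if len(src_lines) != len(tgt_lines):
           raise ValueError("source and target must have the same number of lines")

       batches = []
       starts = range(0, len(src_lines), batch_size)
       for index, start in enumerate(starts, start=1):
           end = start + batch_size
           src_batch = out_dir / f"source_{index:04d}.txt"
           tgt_batch = out_dir / f"target_{index:04d}.txt"
           # Write the same slice of both files so the batches stay aligned.
           src_batch.write_text("\n".join(src_lines[start:end]) + "\n", encoding="utf-8")
           tgt_batch.write_text("\n".join(tgt_lines[start:end]) + "\n", encoding="utf-8")
           batches.append((src_batch, tgt_batch))
       return batches

Each returned pair of paths can then be passed to ``wmk_eval`` as ``src_file_path`` and ``tgt_file_path``, in the same way as in Step 2.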

Next Steps
----------

Now that you've completed your first evaluation:

* **For Python developers**: See :doc:`usage_library` for advanced programming examples
* **For CLI users**: See :doc:`usage_cli` for complete command options
* **For more examples**: See :doc:`examples` for real-world scenarios
* **To understand output**: See :doc:`output_format` for detailed result interpretation