This guide explains how you can manually compare different types of translations in Lokalise:
Pro AI translations using Custom AI profiles (with RAG context)
vs Pro AI translations without RAG
vs standard AI/MT engines (Google, DeepL)
This evaluation setup helps you understand which option delivers the best quality for your content.
Create an evaluation project: Start with a clean workspace
We recommend creating a brand-new project that you will use only for evaluation.
You need to:
Create a new Web and mobile or Ad hoc documents project (e.g., AI Quality Evaluation).
Add or import the source text you want to test (for example, in English). You can:
Copy/move keys from another project using bulk actions
Add keys manually
Add comparison languages: Use multiple variants for testing
To compare different translation outputs of the same language, you need several "variants" of that language. Because a project cannot contain the same locale twice, we recommend adding custom locale codes to the evaluation project.
Note that we recommend using Pro AI with RAG for a "real" language variant — the language that already has previous translations and translation memory entries.
Pro AI and Standard AI/MT can be used for "fake" (custom) language variants.
Example:
Variant | Locale code | Purpose
Latvian A | lv | Pro AI with Custom AI Profile (RAG)
Latvian B | lv-test1 | Pro AI (no RAG)
Latvian C | lv-v02 | Standard AI/MT engine (e.g., Google)
You need to:
Add as many language variants as you want to compare.
Use the correct ISO code as the prefix (e.g., lv), and add any suffix you like (-test1, -v02, etc.).
Rename the languages to neutral labels such as Latvian A, Latvian B, Latvian C so reviewers cannot guess which engine produced which translation.
This helps prevent bias during evaluation.
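If you prefer to script this step, the variants can also be created via the Lokalise API. The sketch below builds the request body for the "Create languages" endpoint (POST /projects/{project_id}/languages); the lang_iso, custom_iso, and custom_name field names follow the public API v2 docs, but verify them against the current API reference before relying on this.

```python
# Sketch, assuming the Lokalise API v2 "Create languages" payload shape.
# Field names (lang_iso, custom_iso, custom_name) are taken from the public
# API docs and should be double-checked before use.

def build_variant_payload(base_iso, suffixes, label_prefix):
    """Build one language entry per variant. An empty suffix keeps the real
    locale (for the RAG variant); the others get a custom code such as
    'lv-test1'. Display names are neutral labels (Latvian A, B, C, ...)."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    languages = []
    for i, suffix in enumerate(suffixes):
        entry = {
            "lang_iso": base_iso,
            "custom_name": f"{label_prefix} {letters[i]}",  # neutral label
        }
        if suffix:  # custom locale code for the "fake" variants
            entry["custom_iso"] = f"{base_iso}{suffix}"
        languages.append(entry)
    return {"languages": languages}

payload = build_variant_payload("lv", ["", "-test1", "-v02"], "Latvian")
# POST this payload to /projects/{project_id}/languages with your API token.
```

The neutral custom_name values keep the blind-review setup intact: reviewers see only "Latvian A/B/C" in the editor.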
Optional: Create custom statuses for linguistic scoring
If you want a structured way to mark quality, you can create custom statuses such as:
Acceptable
Not acceptable
Reviewers will later use these statuses to evaluate each translation.
Reviewers don’t need to fix translations they think are wrong — they can simply mark them by using a custom status. For example, if something looks incorrect, they can just apply the “Not acceptable” status.
Set up your Custom AI profile: Apply RAG to one language only
If you want to evaluate the impact of Custom AI profiles, you need to apply the profile to only one language variant.
You need to:
Create or open a Custom AI profile.
Add your high-quality past translations as examples (for RAG context).
Assign this profile to one target language within the evaluation project (e.g., Latvian A).
Do not assign it to the other variants.
This ensures that only one variant uses customized RAG-powered AI.
Generate translations: Use automations for consistent output
Once your languages are ready, you can fill all translations in bulk.
You need to:
Go to Automations within the evaluation project.
Create an automation that fills each target language:
Use Pro AI for Latvian A and Latvian B. As long as you've assigned a Custom AI profile only to Latvian A, the B variant won't use RAG.
Use Standard AI/MT for Latvian C (and additional variants if needed).
Automations allow you to generate translations consistently across all keys and languages. Automations also won't trigger translation scoring, which is important for avoiding bias.
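After the automations have run, it's worth spot-checking that every key received a translation in every variant. A minimal sketch, assuming you have fetched the project's translations (e.g. via GET /projects/{project_id}/translations in the Lokalise API v2) into dicts with key_id, language_iso, and translation fields; treat these field names as assumptions to verify against the API reference.

```python
# Sketch: find keys the automation left untranslated. The dict field names
# (key_id, language_iso, translation) mirror the Lokalise API v2 translation
# object as documented, but verify them before use.

def untranslated(translations):
    """Return (language_iso, key_id) pairs whose translation is still empty."""
    return [
        (t["language_iso"], t["key_id"])
        for t in translations
        if not t["translation"].strip()
    ]

sample = [
    {"key_id": 1, "language_iso": "lv", "translation": "Sveiki"},
    {"key_id": 1, "language_iso": "lv-test1", "translation": ""},
]
missing = untranslated(sample)  # [("lv-test1", 1)]
```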
Invite reviewers: Evaluate quality blindly
Once translations are generated, you can invite your linguists or reviewers.
We recommend:
Asking reviewers to evaluate translations without knowing which engine produced them.
Using custom statuses (e.g., Acceptable / Not acceptable) to mark quality; reviewers are not required to fix incorrect translations.
Adding comments for suggested improvements.
This ensures an unbiased assessment.
Analyze results: Compare the quality of each translation variant
After the evaluation is complete, you can compare the results across all languages.
You need to:
Filter translations by:
Language
Status (Acceptable / Not acceptable)
Count how many translations were marked as acceptable for each variant.
Review comments to understand typical errors or strengths.
If needed, you can rename the languages to reveal which engine produced each result (e.g., Latvian A → Custom AI, Latvian B → Standard AI, Latvian C → MT).
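The counting step can be automated once the review data is exported. A small sketch, assuming you have the reviewed translations as (variant, status) records, for example from a CSV export or an API fetch; the record shape here is illustrative, not a Lokalise export format.

```python
# Sketch: compute the share of "Acceptable" translations per variant from
# exported review records. Record shape (variant_name, status) is an
# assumption for illustration.
from collections import Counter

def acceptance_rates(records):
    """records: iterable of (variant_name, status) tuples, where status is
    'Acceptable' or 'Not acceptable'. Returns {variant: acceptable_share}."""
    totals, accepted = Counter(), Counter()
    for variant, status in records:
        totals[variant] += 1
        if status == "Acceptable":
            accepted[variant] += 1
    return {v: accepted[v] / totals[v] for v in totals}

reviews = [
    ("Latvian A", "Acceptable"), ("Latvian A", "Acceptable"),
    ("Latvian B", "Acceptable"), ("Latvian B", "Not acceptable"),
    ("Latvian C", "Not acceptable"), ("Latvian C", "Not acceptable"),
]
rates = acceptance_rates(reviews)
# {"Latvian A": 1.0, "Latvian B": 0.5, "Latvian C": 0.0}
```

Comparing these per-variant shares gives you the headline result of the evaluation before you dig into reviewer comments.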
What you learn: Understanding the impact of customization
By following this workflow, you will see:
How much Custom AI profiles (with RAG) improve quality
How Pro AI performs without past translation context
How Standard AI/MT engines compare to both AI options
How close your customized AI results get to previous human translations
This evaluation helps you choose the best translation approach for your product and content domain.
