This feature is currently in beta.
The AI-powered translation scoring feature lets you quickly check the accuracy of target translations. Each string in the editor receives a score from 0 to 100 based on Multidimensional Quality Metrics (MQM).
High score (≥ 80): Translation can usually be auto-approved or lightly reviewed.
Low score (< 80): Human review is strongly recommended.
This helps avoid the common trade-off between manually reviewing everything (slow and expensive) and skipping reviews entirely (risky).
What translation scoring offers
With translation scoring, you can:
Focus human effort only where it matters, cutting review time and cost
Get instant, data-driven QA directly in the editor
Improve reviewer productivity with specific issue guidance
Confidently handle low-priority languages that were previously cost-prohibitive
Unlike generic scoring tools, Lokalise scoring is:
Integrated – built into the Lokalise editor and workflows
Context-aware – tailored to your project’s content and quality needs
Real-time – scores appear instantly per string
Cost-efficient – reduces unnecessary reviews and budget spend
Can we really trust an AI score to determine translation quality?
The goal of translation scoring isn’t to replace human reviewers; it’s to help them focus on the content that truly needs attention. Instead of reviewing every string manually (slow and expensive), teams can:
Route only low-quality translations (<80) for human review
Skip or lightly edit high-scoring translations
Spend review time more strategically, without sacrificing quality
In practice, this approach can reduce post-editing effort by up to 80% while maintaining quality.
Why do you use MQM to evaluate translation quality?
We use MQM (Multidimensional Quality Metrics) because it’s the most widely recognized standard for evaluating translation quality with AI.
Unlike automatic metrics such as BLEU, COMET, or METEOR, which compare output against a single "reference" translation, MQM evaluates the translation itself. It highlights errors, explains their severity, and produces interpretable scores that reflect real-world usability, not just a number.
How to view scores for translations
To see translation scores, open your Lokalise project editor and look for a target translation value. You’ll notice a small lens icon next to it:
This icon only appears for target translations, not source values.
Click the lens to trigger scoring. AI will check the translation quality and show a score. Click the score to get more details:
If the translation has issues, points will be deducted, and you’ll see a list of detected problems:
Scoring does not consume any extra AI words from your team’s quota.
Scoring and AI tasks: Generating scores in bulk
You can generate scores for many translations at once. To do this, select one or more keys in the editor, then choose Create a task from the bulk actions menu.
In the task settings, select the Automatic translation type (AI-powered task). This task will automatically score all added keys. Scoring doesn’t use your AI words quota: AI words are counted only for the translation itself, not for scoring.
Scoring won’t apply to 100% matches from the translation memory: if a translation comes straight from your TM, it won’t be scored. Lokalise only scores translations generated by AI within tasks.
Using scoring in workflows
You can also set up scoring as part of your workflows. To do this, go to the Workflows page and create a new workflow that includes the Review task with AI scoring step. This step comes after the AI translation step, which in turn may be preceded by a translation memory (TM) step.
When setting up the AI scoring step, you’ll see these options:
You can set a quality score threshold (80 by default). If a translation’s score falls below this threshold, the translation is automatically added to the review task created during this step. Translations that could not be scored for any reason are also added to the task. Keep in mind that scoring ignores all 100% matches from translation memory; Lokalise only scores AI-generated translations within tasks. A minimal sketch of this routing logic follows below.
You can also customize the review task: give it a name and description, set a due date, and assign team members to handle the translations flagged for review. For more details on review tasks, check the Tasks documentation.
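To make the routing decision concrete, here is a minimal, hypothetical sketch in Python, assuming each string leaves the scoring step with either a numeric score or no score at all. The names and data shapes are illustrative, not part of the Lokalise product or API. (100% TM matches are never scored, so they sit outside this sketch.)

```python
from typing import Optional

THRESHOLD = 80  # default quality score threshold in the workflow step

def needs_review(score: Optional[int], threshold: int = THRESHOLD) -> bool:
    """A string is routed to the review task if it scored below the
    threshold or could not be scored at all."""
    return score is None or score < threshold

# Example: scores produced by the AI scoring step (None = not scored)
scores = {"home.title": 95, "home.subtitle": 62, "footer.note": None}
flagged = [key for key, score in scores.items() if needs_review(score)]
print(flagged)  # ['home.subtitle', 'footer.note']
```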
How scoring works
Translation scoring is powered by the MQM (Multidimensional Quality Metrics) framework — the industry standard for evaluating translation quality. MQM breaks down issues into categories such as grammar, spelling, fluency, terminology, and meaning errors. Each issue type is assigned a penalty weight based on severity:
Critical (-75 points): Errors that change or break the meaning (e.g., missing words, wrong translation, character limit violations).
Major (-25 points): Significant issues with grammar, spelling, terminology, or readability.
Minor (-5 points): Small flaws such as unnatural phrasing or extra spaces.
The final score is calculated as:
100 – (total penalties)
Examples:
One major (-25) + one minor (-5) → Score: 70
Two criticals (-75 each) → Score: 0 (scores cannot go below 0)
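To illustrate the arithmetic, here is a small, hypothetical Python sketch. The penalty weights mirror the values listed above; the function and variable names are illustrative and not part of any Lokalise API.

```python
# Penalty weights per detected issue, by severity (see the list above).
PENALTIES = {
    "critical": 75,  # meaning-changing errors, character limit violations
    "major": 25,     # significant grammar, spelling, or terminology issues
    "minor": 5,      # small flaws such as unnatural phrasing or extra spaces
}

def mqm_score(issues: list[str]) -> int:
    """Return 100 minus the summed penalties, floored at 0."""
    total_penalty = sum(PENALTIES[severity] for severity in issues)
    return max(0, 100 - total_penalty)

# The examples above:
print(mqm_score(["major", "minor"]))        # 70
print(mqm_score(["critical", "critical"]))  # 0 (floored at 0)
```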
Score thresholds:
100: Perfect translation, ready to publish.
80–99: Good quality, minor improvements possible but safe to release.
Below 80: Review strongly recommended.
In workflows, thresholds can be configured to auto-route low scores to post-editing, while high scores can be published with confidence.
Translation scoring vs. AI LQA
Both translation scoring and AI LQA use the MQM framework to evaluate translation quality, but they serve different purposes:
Translation scoring
Built directly into the editor with real-time scores for each string
Integrated into workflows, enabling automation (e.g., routing low scores to human review)
Provides a faster, more user-friendly experience for day-to-day translation work
AI LQA
Generates quality reports in batch format
Useful for large-scale assessments and audits
Offers less interactivity compared to scoring
Translation scoring is the natural evolution of AI LQA, bringing the same evaluation logic into a more scalable, real-time, and workflow-integrated experience. Over time, it may fully take the place of AI LQA for most use cases.
Frequently asked questions
💡 Looking for more?
Find all answers in AI: Frequently asked questions!