
Scoring translation quality

AI scoring in Lokalise shows translation quality and flags issues using the MQM standard.

Written by Ilya Krukowski

This feature is currently in beta.

The AI-powered translation scoring feature lets you quickly check the accuracy of target translations. Each string in the editor receives a score from 0 to 100 based on Multidimensional Quality Metrics (MQM).

  • High score (≥ 80): Translation can usually be auto-approved or lightly reviewed.

  • Low score (< 80): Human review is strongly recommended.

This helps avoid the common trade-off between manually reviewing everything (slow and expensive) and skipping reviews entirely (risky).

What does translation scoring offer?

With translation scoring, you can:

  • Focus human effort only where it matters, cutting review time and cost

  • Get instant, data-driven QA directly in the editor

  • Improve reviewer productivity with specific issue guidance

  • Confidently handle low-priority languages that were previously cost-prohibitive

Unlike generic scoring tools, Lokalise scoring is:

  • Integrated – built into the Lokalise editor and workflows

  • Context-aware – tailored to your project’s content and quality needs

  • Real-time – scores appear instantly per string

  • Cost-efficient – reduces unnecessary reviews and budget spend

Can we really trust an AI score to determine translation quality?

The goal of translation scoring isn’t to replace human reviewers; it’s to help them focus on the content that truly needs attention. Instead of reviewing every string manually (slow and expensive), teams can:

  • Route only low-quality translations (< 80) for human review

  • Skip or lightly edit high-scoring translations

  • Spend review time more strategically, without sacrificing quality

In practice, teams can skip or lightly edit high-scoring translations and only review lower ones, reducing post-editing effort by up to 80% while maintaining quality.

Why do you use MQM to evaluate translation quality?

We use MQM (Multidimensional Quality Metrics) because it’s the most widely recognized standard for evaluating translation quality with AI.

Unlike automatic metrics such as BLEU, COMET, or METEOR that compare output against a single "reference" translation, MQM evaluates the translation itself. It highlights errors, explains their severity, and produces interpretable scores that reflect real-world usability, not just a number.


How to view scores for translations

To see translation scores, open your Lokalise project editor and look for a target translation value. You’ll notice a small lens icon next to it:

This icon only appears for target translations, not source values.

Click the lens to trigger scoring. AI will check the translation quality and show a score. Click the score to get more details:

If the translation has issues, points will be deducted, and you’ll see a list of detected problems:

View scoring issues

Scoring does not consume any extra AI words from your team’s quota.


Scoring and AI tasks: Generating scores in bulk

You can generate scores for many translations at once. To do this, select one or more keys in the editor, then choose Create a task from the bulk actions menu.

Bulk actions menu

In the task settings, select the Automatic translation type (AI-powered task). This task will automatically score all added keys. Scoring doesn’t use your AI words quota. AI words are only counted for the translation itself, not for scoring.

Scoring won’t apply to 100% matches from the translation memory: if a translation comes straight from your translation memory, it won’t be scored. Lokalise only scores translations generated by AI within tasks.


Using scoring in workflows

You can also set up scoring as part of your workflows. To do this, go to the Workflows page and create a new workflow that includes the Review task with AI scoring step. This step comes after the AI translation step, which in turn may be preceded by the TM step.

When setting up the AI scoring step, you’ll see these options:

View AI scoring step options

You can set a quality score threshold; by default, it’s 80. If a translation’s score falls below this number, the translation is automatically added to a review task created during this step. Any translations that could not be scored are also added to the task. Keep in mind that scoring ignores 100% matches from the translation memory; Lokalise only scores AI-generated translations within tasks.
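
To make the routing rule concrete, here is a minimal illustrative sketch in Python (not Lokalise’s actual implementation) of how a translation ends up in the review task, assuming hypothetical fields for the score and the translation memory match status:

```python
# Illustrative sketch only -- not Lokalise's internal code.
# `score` is the 0-100 AI score, or None if the string could not be scored;
# `is_exact_tm_match` is True for 100% translation memory matches (never scored).

DEFAULT_THRESHOLD = 80  # default quality score threshold in the workflow step

def needs_human_review(score, is_exact_tm_match, threshold=DEFAULT_THRESHOLD):
    """Return True if the translation should be added to the review task."""
    if is_exact_tm_match:
        return False          # 100% TM matches are skipped by scoring
    if score is None:
        return True           # anything that wasn't scored goes to review
    return score < threshold  # below the threshold -> human review

# Example: a translation scored 72 is routed to the review task.
print(needs_human_review(72, is_exact_tm_match=False))  # True
```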

You can also customize the review task: give it a name and description, set a due date, and assign team members to handle the translations flagged for review. For more details on review tasks, check the Tasks documentation.


How scoring works

Translation scoring is powered by the MQM (Multidimensional Quality Metrics) framework — the industry standard for evaluating translation quality. MQM breaks down issues into categories such as grammar, spelling, fluency, terminology, and meaning errors. Each issue type is assigned a penalty weight based on severity:

  • Critical (-75 points): Errors that change or break the meaning (e.g., missing words, wrong translation, character limit violations).

  • Major (-25 points): Significant issues with grammar, spelling, terminology, or readability.

  • Minor (-5 points): Small flaws such as unnatural phrasing or extra spaces.

The final score is calculated as:

100 – (total penalties)

Examples:

  • One major (-25) + one minor (-5) → Score: 70

  • Two criticals (-75 each) → Score: 0 (scores cannot go below 0)
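
As a rough sketch of the arithmetic above (issue detection itself is done by the AI and isn’t shown), the penalty weights and the clamping at 0 can be expressed like this:

```python
# Rough sketch of the MQM-style arithmetic described above; the penalty
# weights mirror the list in this article.

PENALTIES = {"critical": 75, "major": 25, "minor": 5}

def mqm_score(issue_severities):
    """Compute a 0-100 score from a list of detected issue severities."""
    total_penalty = sum(PENALTIES[s] for s in issue_severities)
    return max(0, 100 - total_penalty)  # the score cannot go below 0

print(mqm_score(["major", "minor"]))        # 100 - 25 - 5 = 70
print(mqm_score(["critical", "critical"]))  # 100 - 150, clamped to 0
```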

Score thresholds:

  • 100: Perfect translation, ready to publish.

  • 80–99: Good quality, minor improvements possible but safe to release.

  • Below 80: Review strongly recommended.

In workflows, thresholds can be configured to auto-route low scores to post-editing, while high scores can be published with confidence.


Translation scoring vs. AI LQA

Both translation scoring and AI LQA use the MQM framework to evaluate translation quality, but they serve different purposes:

  • Translation scoring

    • Built directly into the editor with real-time scores for each string

    • Integrated into workflows, enabling automation (e.g., routing low scores to human review)

    • Provides a faster, more user-friendly experience for day-to-day translation work

  • AI LQA

    • Generates quality reports in batch format

    • Useful for large-scale assessments and audits

    • Offers less interactivity compared to scoring

Translation scoring is the natural evolution of AI LQA, bringing the same evaluation logic into a more scalable, real-time, and workflow-integrated experience. Over time, it may fully take the place of AI LQA for most use cases.


Frequently asked questions

💡 Looking for more?
Find all answers in AI: Frequently asked questions!
