AI LQA is a new task type that allows you to perform localization quality assurance on the provided content in a fully automated way. It uses the Lokalise AI assistant, built on OpenAI's GPT API, to automatically identify linguistic issues, categorize them according to the DQF-MQM framework, and deliver detailed reports with comments and suggested corrections. AI LQA helps to improve translation quality without increasing costs.
AI LQA is a new task type built on top of Lokalise AI. The AI LQA task provides the following capabilities:
Create a task to evaluate translation quality in 30 languages (see full list below)
The task will be automatically assigned to Lokalise AI assistant
While a task is in progress, all translation keys will be automatically locked until Lokalise AI assistant finishes the job for the respective language
Receive notifications upon language evaluation completion
Generate a quality report that contains:
Scorecard for each language
Detailed report with suggested corrections and comments from Lokalise AI assistant
Perform glossary adherence checks to identify translations that do not respect your glossary terms
On average, AI LQA takes just a few minutes to complete, but this heavily depends on the amount of content and the number of languages you run it on. You will see the estimated time of completion once you have selected the languages and scope.
AI LQA currently supports 30 languages and their variations for different locales: English, Russian, Chinese Traditional, Chinese Simplified, Japanese, Spanish, German, French, Portuguese, Arabic, Korean, Italian, Dutch, Turkish, Polish, Swedish, Indonesian, Czech, Hebrew, Danish, Finnish, Romanian, Hungarian, Vietnamese, Hindi, Lithuanian, Latvian, Slovak, Estonian, Ukrainian.
This is not an exhaustive list: it will be expanded as we gain confidence in the output quality for additional languages. If you want us to support other languages and are happy to help with quality evaluation, let us know.
Using AI LQA
To get started with AI LQA, you have to enable Reviewing for your project.
First, proceed to the project and click More > Settings:
Then, find the Quality assurance section and tick the Reviewing option:
Don't forget to save the changes.
That's it! Now you can start using AI LQA.
Creating a new LQA task
To get started with AI LQA, open any project in Lokalise and create a new task. There are two ways to create a task.
First, you can select multiple keys in the editor by ticking checkboxes next to their names and then choosing Create task... from the bulk actions menu. This will open the task creation page with a predefined task scope, limited to the keys you have just selected.
Alternatively, open your project and proceed to the Tasks page:
Click Create a task to open the task creation wizard.
In the task creation wizard you will see a new task type called AI LQA:
Select the AI LQA task, provide a task name and description, and proceed to the next step.
Adjusting task options
The Advanced options will be limited to the following:
Tag keys after the task is closed — tag translation keys added to the current task once it's completed. This way you'll be able to easily identify these keys afterwards.
Some other options are hidden and always on by default:
Lock translations (non-modifiable) — all translations added to the task will be locked until Lokalise AI assistant completes the respective language.
Auto-close languages (non-modifiable) — once Lokalise AI assistant completes the respective language, it'll be automatically closed and the task creator will receive a notification via e-mail.
Auto-close task (non-modifiable) — the task will be automatically closed once all the added languages have been completed by Lokalise AI assistant. The task creator will receive a notification via e-mail.
Adjusting task scope
Select the scope and languages:
Task scope — adjust the filter to choose the keys that should be added to the task.
Source language — choose the language that should be used as a reference for performing quality evaluation.
Target languages — choose one or more languages that should be evaluated by Lokalise AI assistant.
Task assignees — you won't be able to modify assignees as all languages will be assigned to Lokalise AI assistant for automatic evaluation.
To the right, you'll see the task summary and your Lokalise AI words balance:
In this summary you can see the AI words quota for your team and the number of AI words this task will consume. Please refer to the Team quotas article to learn more about the words quota.
You will also see the estimated delivery time for this task. Please note this is an approximate number and it can fluctuate depending on the load on our system.
Downloading a report
Once the task is completed, the Download report button will be enabled. Click on it to view a report for the task:
The report will be downloaded as a spreadsheet, with each language displayed on a separate sheet.
At the top you will see the quality metric scorecard with all the detected errors. Please refer to the translation quality evaluation framework section to understand the categories and severity levels. ETPT stands for the error type penalty total and is calculated by multiplying the error count by the severity multiplier.
At the bottom of each scorecard you'll see multiple calculations and numbers:
Evaluation word count — number of words that have been evaluated for this language.
Reference word count — an "arbitrary number of words" in a hypothetical reference evaluation text, used for easier comparison between different scorecards. The purpose of this metric is to understand what the penalty score would be for a scope of X words. The default value is 1000.
Scaling parameter — a special multiplier to differentiate between the overall normed penalty total values for different text types or projects with significantly different specifications.
In other words, this variable is used to assign weights to different scopes. For example, one project might contain high-visibility strings from your landing page, while another contains backend strings that aren't visible to customers, so their quality matters less. The default value is 1.
Max score value — maximum quality score that a language can get. Default value is 100.
Threshold value — the quality score threshold that determines whether translation quality for the language passed or failed. The default value is 85.
Per-word penalty total — calculated by dividing the absolute penalty total by the evaluation word count.
Overall normed penalty total — represents the per-word error penalty in total relative to the reference word count (1000 words by default).
Overall quality score — the primary quality measure for your translations. This value is calculated by multiplying the per-word penalty total by the maximum score value (which usually equals 100) and subtracting the result from 100, so that it resembles a percentage.
Pass/fail rating — indicates whether quality score has passed the threshold or not.
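The scorecard arithmetic above can be sketched in Python. Note that the severity multipliers and the exact placement of the scaling parameter are assumptions (typical MQM-style defaults), not values confirmed in this article:

```python
# Sketch of the scorecard calculations: ETPT, per-word penalty total,
# overall normed penalty total, overall quality score, and pass/fail rating.
# The multipliers below are assumed MQM-style defaults, not Lokalise's.
SEVERITY_MULTIPLIERS = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

def build_scorecard(errors, evaluation_word_count,
                    reference_word_count=1000, scaling_parameter=1,
                    max_score=100, threshold=85):
    """errors: mapping of severity level -> error count."""
    # ETPT: error count multiplied by the severity multiplier
    etpt = {sev: count * SEVERITY_MULTIPLIERS[sev]
            for sev, count in errors.items()}
    absolute_penalty_total = sum(etpt.values())
    # Per-word penalty total: absolute penalty / evaluated words
    per_word = absolute_penalty_total / evaluation_word_count
    # Overall normed penalty total, relative to the reference word count
    # (scaling applied here as an assumption)
    normed = per_word * reference_word_count * scaling_parameter
    # Overall quality score: per-word penalty scaled by the max score,
    # subtracted from 100
    score = max_score - per_word * max_score
    return {
        "ETPT": etpt,
        "per_word_penalty_total": per_word,
        "overall_normed_penalty_total": normed,
        "overall_quality_score": score,
        "rating": "pass" if score >= threshold else "fail",
    }

# Example: 4 minor and 2 major errors across 200 evaluated words
result = build_scorecard({"minor": 4, "major": 2}, evaluation_word_count=200)
```

With these assumed multipliers, the example yields a quality score of roughly 93, above the default threshold of 85, so the language would pass.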
Below the scorecard you'll see a detailed report that provides a granular breakdown of the issues that Lokalise AI found. Every error is represented in a separate row.
Besides error and severity level, you will be able to see valuable information such as:
Suggested correction — the AI provides a corrected translation to fix the detected issue. In the future, we plan to offer these corrections as suggestions in the Lokalise UI.
Comment — a comment left by Lokalise AI explaining why it determined this to be an error and what is wrong with this specific translation.
Translation quality evaluation framework
AI LQA utilizes the DQF-MQM framework to perform linguistic quality assurance (LQA). It can be applied to both human and machine translation. The aim is to standardize error categorization and provide data in a structured way, thereby reducing subjectivity bias in translation quality assessment.
Results can be used for identifying underperforming languages, performing root cause analysis and improving localization processes to result in higher quality of translations.
The framework consists of predefined categories and severity levels with different multipliers depending on how critical the error is.
At the moment it is not possible to modify existing error categories or to adjust their weights.
Issues related to the correctness and fidelity of the translation, such as mistranslations, omissions, or additions.
Issues related to the naturalness and readability of the translation, such as grammar, syntax, punctuation, or spelling errors.
Issues related to the use of domain-specific terms, such as incorrect or inconsistent terminology.
Issues related to the adherence to locale-specific conventions, such as date formats, number formats, or currency symbols.
Issues related to the adherence to a specific style guide or the overall tone and voice of the translation.
Issues related to the consistency of the translation, such as using of different terms for the same concept or inconsistencies in formatting.
Issues related to the logical flow and organization of the translation, such as unclear references or incorrect sentence structure.
Issues related to the visual presentation of the translated content, such as layout, formatting, or font issues.
Issues related to the handling of markup elements in the translation, such as incorrect or missing tags.
Issues related to the adaptation of the content for a specific target audience, such as cultural appropriateness or the use of region-specific examples.
Issues related to the truthfulness or factual correctness of the content, such as incorrect or outdated information.
Issues categorized as neutral have minimal impact on the overall quality of the translation and are considered inconsequential or insignificant.
Minor severity indicates small issues that might slightly affect the translation quality, but they do not significantly impact the overall understanding or usability of the translated content.
Major severity represents significant issues that affect the quality and comprehension of the translated content, potentially leading to confusion or misunderstandings for the target audience.
Critical severity denotes severe issues that completely compromise the accuracy, clarity, or usability of the translated content, rendering it unusable or misleading for its intended purpose.
Frequently asked questions
What features are available?
You'll be able to use AI Translations to provide instructions to Lokalise AI and run automatic translations in bulk, taking into account context and your terminology.
How does Lokalise AI translation work?
Lokalise AI uses OpenAI models to assist with content translation. At Lokalise, we add an extra layer on top of OpenAI's models to improve the precision and accuracy of your translations. For example, we give you the option to add context, like character limits, key descriptions, glossary terms, and style guides, which Lokalise AI takes into account to deliver terms and phrases that are more accurately translated.
What types of content can be translated using Lokalise AI?
Lokalise AI translation is incredibly versatile and can handle a wide range of content types, including websites, documents, emails, marketing materials, user-generated content, and more. It can adapt to different industries and content formats, making it the perfect solution for businesses aiming to reach global audiences.
How accurate are Lokalise AI translations?
The accuracy of Lokalise AI translations can vary depending on factors like the language pair and content complexity. Based on feedback from our customers, a ballpark estimate is around 80% accuracy. In most cases, you might just need to make minor changes to the remaining 20% to make it perfect for your specific needs.
Can Lokalise AI handle translations of highly specialized or technical content?
Absolutely! Lokalise AI can handle translations of highly specialized or technical content. Our AI models are trained to understand complex terminology, ensuring accurate translations even for the most specialized industries.
How will my data be used?
The only data processed by AI LQA is the source text, target translations, key descriptions, and glossary terms matching the content. This data is not used for any other purposes, such as model training.