Segmentation

Segmentation makes translating longer text much more convenient.

Written by Ilya Krukowski
Updated over a week ago

This feature is currently in private beta.

Segmentation splits translations into smaller, relevant chunks. It makes translators more efficient, translation memory richer and the experience better when localizing longer texts.

How does it work? Lokalise automatically splits the text of each translation into smaller chunks using rules based on code elements and on features of the particular language, such as a full stop. You can also do this manually: take any long text under a translation key, split it into multiple smaller segments, and translate them separately. A key is considered fully translated when all its segments in all languages are translated. Upon export, the segments are automatically combined into a single translation string.
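To make the split-and-recombine idea more concrete, here is a minimal Python sketch. It is not how Lokalise implements segmentation: the single "break after a full stop, exclamation mark, or question mark" rule and the function names are assumptions made for illustration only.

```python
import re

# Hypothetical rule: a segment ends after ., ! or ? plus any trailing whitespace.
# The whitespace stays with the segment so that joining is lossless.
_SEGMENT = re.compile(r".+?(?:[.!?]\s+|\Z)", re.DOTALL)


def split_into_segments(text: str) -> list[str]:
    """Split a long translation into sentence-like segments."""
    return _SEGMENT.findall(text)


def join_segments(segments: list[str]) -> str:
    """Recombine segments into a single translation string, as on export."""
    return "".join(segments)


def is_fully_translated(segments_by_language: dict[str, list[str]]) -> bool:
    """A key counts as translated only when every segment in every language has text."""
    return all(
        segment.strip()
        for segments in segments_by_language.values()
        for segment in segments
    )
```

For example, `split_into_segments("Welcome to the app. Sign in to continue.")` yields two segments, and `join_segments` on that list gives back the original string.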

Please note that segmentation cannot be disabled once a project is created, and it cannot be enabled for an existing non-segmented project. To use segmentation, create a new project and enable segmentation from the start.


Getting started

When creating a new translation project, enable the Split text into segments option:

You'll be able to choose segmentation rules: Default rules or Custom rules. If you are not sure what to choose, we suggest sticking with the Default rules for now.

If you've chosen Custom rules, you'll be asked to upload a properly formatted .srx file with your rules. You'll also be presented with a link to download a template with all the default rules Lokalise utilizes:
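If you have not worked with SRX (Segmentation Rules eXchange) before, the sketch below may help. It embeds a tiny SRX 2.0 document and applies its rules with Python's standard library. The two rules are hypothetical and are not the Lokalise default template. The rule matcher is also a simplification that treats the SRX patterns as Python regular expressions, which only holds for simple patterns like these.

```python
import re
import xml.etree.ElementTree as ET

# A minimal, hypothetical SRX 2.0 rule set (NOT the Lokalise defaults).
SRX_SAMPLE = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="no"/>
  <body>
    <languagerules>
      <languagerule languagerulename="Default">
        <!-- Exception: do not break after common abbreviations. -->
        <rule break="no">
          <beforebreak>\b(Mr|Mrs|Dr)\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <!-- Break after sentence-ending punctuation followed by whitespace. -->
        <rule break="yes">
          <beforebreak>[.!?]</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern=".*" languagerulename="Default"/>
    </maprules>
  </body>
</srx>
"""

NS = {"srx": "http://www.lisa.org/srx20"}


def load_rules(srx_text: str) -> list[tuple[bool, str, str]]:
    """Return (is_break, beforebreak, afterbreak) tuples in document order."""
    root = ET.fromstring(srx_text)
    rules = []
    for rule in root.iterfind(".//srx:languagerule/srx:rule", NS):
        rules.append((
            rule.get("break", "yes") == "yes",
            rule.findtext("srx:beforebreak", default="", namespaces=NS),
            rule.findtext("srx:afterbreak", default="", namespaces=NS),
        ))
    return rules


def segment(text: str, rules: list[tuple[bool, str, str]]) -> list[str]:
    """Check every position in the text; the first matching rule decides."""
    pieces, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            if re.search(rf"(?:{before})\Z", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    pieces.append(text[start:pos])
                    start = pos
                break  # earlier rules take precedence over later ones
    pieces.append(text[start:])
    return pieces
```

With these sample rules, the text "Dr. Smith arrived. He sat down." is split after "arrived." but not after "Dr.", because the exception rule is listed first.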

Once you are ready, click Proceed. Next, upload your translation files as usual. The text under each translation key will be segmented automatically according to the chosen rules:

Settings

While it is not possible to enable or disable segmentation once the project is created, you can at least see whether segmentation was enabled initially. To achieve that, click More > Settings and scroll to the bottom of the page:

If segmentation is enabled, the corresponding checkbox will be ticked. Please note that the Branching option is grayed out because segmented projects do not support branching.

If you've chosen Custom rules upon project creation, you'll also be presented with a link to download your .srx file:

How is segmentation performed?

To learn about some edge cases and potential issues that you may encounter when updating your segmented project, check out the corresponding article.

Segmentation has two main components:

  • Language-based segmentation — the text is split based on the language rules (processing is performed by Okapi). This type works on all content.

  • Code-based segmentation — this type works only for HTML content. In this case we perform segmentation using HTML block-level tags (like p, div, article, ul, and others). Take a look at the following example:

In this case, we have five block tags, namely article, p, section, ul, and li. These block tags act as delimiters, and the text is separated accordingly. There are also two inline tags: strong and a. These inline tags are left intact and are not used as delimiters during segmentation.

Here's the result:

As you can see, all the inline tags are present in the segments, whereas all the block tags were stripped out. However, this does not mean that p, article, and the other block tags are lost; they are simply hidden in the project editor. When you export your translation keys, all block tags and any whitespace between them are restored automatically. Under the hood, these block tags are stored in segment prefixes and suffixes.
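The prefix and suffix idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual Lokalise or Okapi logic: the HTML sample is hypothetical (it simply uses the five block tags and two inline tags named above), the list of block tags is incomplete, and the `Segment` class and function names are made up for this example.

```python
import re
from dataclasses import dataclass

# Assumed (incomplete) list of block-level tags that act as delimiters.
BLOCK_TAG = re.compile(r"(</?(?:article|section|div|p|ul|ol|li)\b[^>]*>)", re.IGNORECASE)

HTML = """<article>
  <p>Welcome to <strong>Lokalise</strong>!</p>
  <section>
    <ul>
      <li>Read the <a href="https://example.com/docs">docs</a></li>
      <li>Invite your team</li>
    </ul>
  </section>
</article>
"""


@dataclass
class Segment:
    prefix: str       # hidden block tags and whitespace before the text
    text: str         # what a translator would see, inline tags included
    suffix: str = ""  # hidden block tags and whitespace after the text


def segment_html(html: str) -> list[Segment]:
    segments: list[Segment] = []
    glue = ""  # block tags and whitespace collected since the last segment
    for token in BLOCK_TAG.split(html):
        stripped = token.strip()
        if not stripped or BLOCK_TAG.fullmatch(token):
            glue += token
            continue
        lead = token[: len(token) - len(token.lstrip())]
        trail = token[len(token.rstrip()):]
        segments.append(Segment(prefix=glue + lead, text=stripped))
        glue = trail
    if segments:
        segments[-1].suffix = glue  # closing tags after the last segment
    return segments


def export_html(segments: list[Segment]) -> str:
    """Restore the hidden block tags and whitespace around each segment."""
    return "".join(s.prefix + s.text + s.suffix for s in segments)
```

Running `segment_html(HTML)` produces three segments whose `text` still contains the strong and a tags, and `export_html` on the result reproduces the input exactly.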

Segmentation specifics

Translation editor

  • Filters apply to whole translation keys, not to individual segments. This means that if you choose to show only unverified items, and one segment of a translation key is unverified, then the whole key with all its segments will be displayed.

  • When you edit the base language translation for a segment, all the corresponding segments in the target languages will become unverified:
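The two editor behaviors above can be modeled roughly as follows. The data layout (key, then language, then an ordered list of segments) and all names are assumptions made for this sketch; they do not reflect the Lokalise data model or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    unverified: bool = False


# Hypothetical layout: project[key][language] -> ordered list of segments.
Project = dict[str, dict[str, list[Segment]]]


def keys_matching_unverified_filter(project: Project) -> list[str]:
    """A key is shown when at least one of its segments is unverified."""
    return [
        key
        for key, languages in project.items()
        if any(seg.unverified for segs in languages.values() for seg in segs)
    ]


def edit_base_segment(project: Project, key: str, base_lang: str,
                      index: int, new_text: str) -> None:
    """Update one base segment and mark the matching target segments unverified."""
    project[key][base_lang][index].text = new_text
    for lang, segs in project[key].items():
        if lang != base_lang:
            segs[index].unverified = True
```

After editing the second base segment of a key, only the second segment in each target language is flagged, yet the filter returns the whole key.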

Character limit

  • The character limit is set on a per-key basis.

  • The limit counts the sum of the characters in all segments of the key. For example, suppose you have a translation key with a character limit set to 100, and this key contains three segments. In the first segment you enter a phrase that is 60 characters long. When you proceed to the second segment, the counter will show 60/100. In other words, you'll have only 40 characters left for the last two segments.

  • If you change the key character limit and the sum of all segment characters exceeds this limit, you'll see a warning message when editing your segments.
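The arithmetic behind the per-key limit is simple enough to sketch. The function name and return shape are assumptions for illustration, and the sketch counts Python string length, which may differ from how the editor counts characters.

```python
def limit_counter(segments: list[str], limit: int) -> tuple[str, int, bool]:
    """Return the counter label, the characters remaining, and whether to warn.

    The limit applies to the key as a whole, so all segments are summed.
    """
    used = sum(len(segment) for segment in segments)
    remaining = max(limit - used, 0)
    exceeded = used > limit
    return f"{used}/{limit}", remaining, exceeded


# Three segments; only the first one is filled in with 60 characters.
segments = ["x" * 60, "", ""]
label, remaining, exceeded = limit_counter(segments, 100)
```

Lowering the limit to 50 for the same key makes the third value `True`, which corresponds to the warning shown while editing.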

Tasks

  • You can create tasks for translation keys only — not for individual segments.

  • You cannot include or exclude specific segments — all segments will always be included in the task.

  • The Offline XLIFF task works in the same way as for non-segmented projects.

  • During a task's initial analysis, each segment is counted separately. Suppose you have one translation key with two segments. The first segment has an 85% match for 5 source words, and the second has a 100% match for 7 source words, so the base value has 12 words in total. Because translation memory (TM) matches are calculated per segment, the initial analysis table will show:

    • 7 in the TM 100% category

    • 5 in the TM 85-94% category

    • 2 in the Segments total category
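The tally above can be reproduced with a short sketch. Only the two match bands mentioned in the example are modeled; the "Other" bucket, the function names, and the input shape are hypothetical and do not describe the full analysis table.

```python
from collections import Counter

# Only the bands named in the example above; any other band is an assumption.
BANDS = [
    (100, 100, "TM 100%"),
    (85, 94, "TM 85-94%"),
]


def band_for(match_percent: int) -> str:
    for low, high, label in BANDS:
        if low <= match_percent <= high:
            return label
    return "Other"  # hypothetical catch-all for bands not modeled here


def initial_analysis(segments: list[tuple[int, int]]) -> dict[str, int]:
    """segments holds one (match_percent, source_word_count) pair per segment."""
    report: Counter[str] = Counter()
    for match_percent, words in segments:
        report[band_for(match_percent)] += words  # words are counted per segment
    report["Segments total"] = len(segments)
    return dict(report)


# One key with two segments: 85% match on 5 words, 100% match on 7 words.
report = initial_analysis([(85, 5), (100, 7)])
```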

Orders

  • Translation orders can be created for translation keys only — you cannot add individual segments to the order.

Translation memory

Key merge

  • You can merge segmented translation keys, but only when the keys have the same number of segments.

Translation history

  • Each segment has its own translation history.

  • Each segment also shows the entire translation history before it was segmented.

  • You can switch between the segment and key history using the Segment history switch in the History pane:

Apps (previously known as "integrations")

  • Some apps support segmented projects. Upon importing, all texts are segmented automatically, and upon exporting they are automatically concatenated. Please note that certain plugins (for example, design plugins) do not have support for this feature.

Branching

  • You cannot enable branching for a segmented project.

  • You cannot enable segmentation for a project that has branching enabled.

Changing segment statuses

You can set individual segments to reviewed or verified, as well as assign custom translation statuses. These actions have some specifics, which are summarized in the following table:

| | Unverified | Reviewed | Custom translation status |
| --- | --- | --- | --- |
| Change in base segment affects status of target segments | Yes | No | No |
| Change of segment value deactivates its status | Yes | Yes | No |
| When splitting a segment, its status is copied to newly created base segments | No | No | Yes |
| When splitting a segment, its status is copied to newly created target segments | Yes | No | No |
| When creating a key through API and forcing the status (is_unverified, is_reviewed), the status is copied to newly created base segments | Yes | Yes | N/A |
| When creating a key through API and forcing the status (is_unverified, is_reviewed), the status is copied to newly created target segments | Yes | No | No |

Segmentation during key creation

Whenever you create new translation keys in your project with segmentation enabled, their translations will be segmented automatically. Here's an example:

Here, the text has two paragraphs marked with the p tags, and therefore the newly created translation key will contain two segments.

File exporting and segmentation

Upon exporting your translations, the segments of each key will be automatically merged into a single translation. However, there are some exceptions where you'll still receive separate segments in the output:

  • Offline XLIFF — upon export, keys will have new names: lokalise-<key_id>-<segment_number>.

  • Translation object in APIv2.

  • Segment object in APIv2.
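If you post-process Offline XLIFF files, you may need to build or parse the per-segment key names. The helpers below are a sketch: they assume numeric key IDs and segment numbers starting at 1, neither of which is confirmed here, and the function names are made up.

```python
import re

_NAME = re.compile(r"lokalise-(\d+)-(\d+)")


def offline_xliff_names(key_id: int, segment_count: int) -> list[str]:
    """Build names in the lokalise-<key_id>-<segment_number> pattern.

    Assumption: segment numbers start at 1.
    """
    return [f"lokalise-{key_id}-{number}" for number in range(1, segment_count + 1)]


def parse_offline_xliff_name(name: str) -> tuple[int, int]:
    """Return (key_id, segment_number) or raise ValueError for other names."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a segmented key name: {name!r}")
    return int(match.group(1)), int(match.group(2))
```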

Please note that translation filters work as a group of all segments for a specific key. For example, if you choose to export only the Translated strings, then all segments must be translated to be included in the export. If one of the segments is not translated, the whole translation key will be ignored. The same logic applies to other filters. For instance, if you enable the Reviewed only strings filter, then all segments must be reviewed.
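The "all segments must match" rule can be expressed as a small sketch. The `Segment` class, the predicate functions, and the input shape are assumptions for illustration and are not part of the Lokalise export API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    text: str
    reviewed: bool = False


def is_translated(segment: Segment) -> bool:
    return bool(segment.text.strip())


def is_reviewed(segment: Segment) -> bool:
    return segment.reviewed


def keys_passing_filter(
    keys: dict[str, list[Segment]],
    predicate: Callable[[Segment], bool],
) -> list[str]:
    """A key is exported only if every one of its segments passes the filter."""
    return [
        name
        for name, segments in keys.items()
        if segments and all(predicate(segment) for segment in segments)
    ]
```

A key with one empty segment is skipped by the translated filter even if its other segments are filled in, and the same holds for the reviewed filter.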

Also, if you choose to replace empty translations with the base language value, the whole translation will be replaced, not the individual segments.

Finally, sorting does not affect segment order. For example, if you choose to sort by last updated, the sorting will take the most recently updated segments into consideration and sort the keys accordingly. However, the segment order won't be changed.
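Here is one way to picture the sorting behavior. The sketch assumes that a key's "last updated" time is the newest timestamp among its segments and that the most recently updated keys come first; both are assumptions, as are the class and function names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Segment:
    text: str
    updated_at: datetime


def sort_keys_by_last_updated(keys: dict[str, list[Segment]]) -> list[str]:
    """Order keys by their most recently updated segment, newest first.

    Only the key order changes; each key's segment list is left untouched.
    """
    return sorted(
        keys,
        key=lambda name: max(segment.updated_at for segment in keys[name]),
        reverse=True,
    )


keys = {
    "footer": [
        Segment("Contact us.", datetime(2024, 1, 5)),
        Segment("All rights reserved.", datetime(2024, 1, 2)),
    ],
    "header": [
        Segment("Welcome.", datetime(2024, 1, 1)),
        Segment("Sign in.", datetime(2024, 1, 10)),
    ],
}
ordered = sort_keys_by_last_updated(keys)
```

In this example the header key sorts first because its second segment is the newest, while "Welcome." still precedes "Sign in." inside the key.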
