This feature is available exclusively for Marketing and support projects.

In this article, you’ll learn how HTML is parsed within Marketing and support projects and how to set up custom HTML parsing rules to tailor the parsing process to your specific needs.

How it works

When Lokalise imports content from various sources like Marketo, Salesforce, Iterable, and others, it first converts the content into HTML, regardless of its original format. After conversion, Lokalise separates the translatable content into individual keys. Our HTML parsing rules are designed to simplify translation texts by removing unnecessary tags, streamlining the translation process, and improving the accuracy of translation memory entries.

Here’s how it works:

Cleaning up extra tags: Any extraneous tags at the end of the content are removed to ensure cleaner, more focused text for translation.
Focusing on textual content: Keys that consist solely of iframes and images are not created, directing attention to the translation of text-based content.
Handling line breaks: If a paragraph contains a <br> tag to create a new line, Lokalise generates two separate translation keys: one for the text before the <br> tag and another for the text after it. This approach allows for more precise translations.
Extracting text from HTML attributes: Texts within HTML attributes like alt and title are extracted as separate keys. When exporting, Lokalise accurately reconstructs the original document structure, ensuring that these keys are properly integrated. Additionally, these keys are given extra context to enhance the accuracy and relevance of the translations.
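
The line-break and attribute rules above can be illustrated with a small Python sketch. This is not Lokalise's actual parser. The class name `KeySketch`, the function `extract_keys`, and the `("text", ...)` / `("attr:alt", ...)` key labels are all hypothetical and exist only to show the general idea of splitting text on <br> and pulling alt and title values into separate keys.

```python
from html.parser import HTMLParser


class KeySketch(HTMLParser):
    """Toy illustration only: splits text on <br> and at the end of a <p>,
    and records alt/title attribute values as separate keys."""

    TEXT_ATTRS = ("alt", "title")  # attributes treated as translatable here

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.keys = []     # collected (kind, text) pairs
        self._buffer = []  # text seen since the last split point

    def _flush(self):
        # Turn the buffered text into a key, skipping empty segments
        # (so an element containing only an image yields no text key).
        text = "".join(self._buffer).strip()
        if text:
            self.keys.append(("text", text))
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._flush()  # a line break starts a new key
        for name, value in attrs:
            if name in self.TEXT_ATTRS and value:
                # The attribute value is kept as-is, with no further parsing.
                self.keys.append((f"attr:{name}", value))

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()  # a paragraph boundary also ends the current key

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text


def extract_keys(html):
    parser = KeySketch()
    parser.feed(html)
    parser.close()
    return parser.keys


print(extract_keys("<p>Hello<br>World</p>"))
# [('text', 'Hello'), ('text', 'World')]
```

In this sketch a paragraph that contains only an image produces no text key, while an image with an alt attribute still produces an attribute key.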

Customize parsing

Only contributors with the Manage project settings permission can adjust the parsing rules.

You can customize HTML parsing rules on a per-project basis to better suit your specific needs. For more detailed information on the available rules, please refer to the OKAPI documentation.

Getting started with custom parsing rules

Go to your project settings and navigate to the HTML parsing rules tab.

After making your adjustments, click Save changes to apply them. If needed, you can click Restore default rules to revert all changes and reload the default settings.

Application of the rules

Once you've adjusted the rules, they will be applied to all new content imports. Previously imported content will remain unchanged, but if you re-import an existing entry, it will be recreated according to the new rules.

Technical limitations

HTML attributes extraction: Content within HTML attributes is extracted without further parsing. For example, if you have an <a title="test"> tag, the title is extracted as test without any additional HTML processing.
Invalid HTML: Issues such as duplicate tag IDs or improperly closed tags can lead to import failures or broken content in Lokalise. It’s essential to ensure your HTML is valid before importing to avoid disruptions.
Broken layouts: Sometimes, stripping out trailing tags can result in a seemingly broken layout in Lokalise. However, the exported content should be correct.

Input	Visible content in Editor	Exported content (correct)
`<span>Hello,<br></span>`	`<span>Hello,`	`<span>Hello,<br></span>`

Improperly closed tags: If the initial markup contains improperly closed tags, the content may appear broken both in the editor and upon export.

Input	Visible content in Editor	Exported content (broken)
`<strong>Hello, <u>Lokalise</strong></u>`	`Hello, <u>Lokalise</strong></u>`	`Hello, <u>Lokalise</strong></u>`
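
Because invalid markup can break both the editor view and the export, it can help to check content before importing it. The following is a minimal sketch of such a pre-import check using Python's standard `html.parser` module. It is not part of Lokalise. The names `NestingChecker` and `check_html` are hypothetical, and the check is deliberately strict: it flags mismatched or unclosed tags and duplicate id values, and it will also report tags that HTML allows you to leave unclosed (such as <p> or <li>).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Strict, illustrative check for tag nesting and duplicate ids."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open (non-void) tags
        self.seen_ids = set() # id values encountered so far
        self.problems = []    # human-readable findings

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                if value in self.seen_ids:
                    self.problems.append(f"duplicate id: {value}")
                self.seen_ids.add(value)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # The closing tag does not match the most recently opened tag.
            self.problems.append(f"unexpected </{tag}>")


def check_html(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Anything still on the stack was opened but never properly closed.
    for tag in checker.stack:
        checker.problems.append(f"unclosed <{tag}>")
    return checker.problems


print(check_html("<span>Hello,<br></span>"))
# []
print(check_html("<strong>Hello, <u>Lokalise</strong></u>"))
# ['unexpected </strong>', 'unclosed <strong>']
```

An empty list means the sketch found nothing to report. Any entry in the list is a hint that the markup should be reviewed before import.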
