Internationalization and localization: get off to a good start!

This article will guide you through the internationalization (i18n) and localization (l10n) process. Discover 4 good reasons to do it now and learn how to do it right.

Why should I care now?

Let's imagine we are the finest Belgian chocolate maker. We are working on a new mobile app that allows our customers to order chocolate. We are not sure it will be worth the investment yet, so first we want to launch an MVP (minimum viable product) in one country. But the vision is to launch it in every country where customers can buy our chocolates.

So why should I spend time on world readiness when I'm working on an MVP or a v1.0?

#1 Separation of concerns

Ever heard of that annoying spelling mistake that never gets corrected because the developers always say they have more important things to do?

I'm a little busy at the moment!

But let me tell you a secret: this is NOT their job!

The antidote to all this pain and suffering is to make this text easily editable by non-developers. Translators, copywriters and growth hackers will thank you.

#2 Almost free now, expensive later

i18n is a lot easier when done from the beginning, and it becomes much harder once the code base starts to grow.

i18n has an impact on architecture, and some choices make it painful to put in place later. For example, you might choose a bitmap image to render the complex text effect your designer had in mind instead of writing real text. Let me tell you something: image localization is a pain in the ass.

Another horror I have seen a lot is string concatenation. Developers are human, and sometimes they are just lazy. Who hasn't faced this kind of terrible code:

total.innerText = cart.totalItems + " product" + (cart.totalItems > 1 ? "s, " : ", ") + cart.totalPrice + " €";

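This will need a complete rewrite as soon as you have to handle a new language, because word order, plural rules and currency placement all change from one language to another. So don't do that! Never ever!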
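A minimal sketch of the alternative, in the same JavaScript as the snippet above: each locale owns the whole sentence, the plural form is chosen with the standard `Intl.PluralRules` API, and the price is formatted with `Intl.NumberFormat`. The `resources` table, the `cart_total_*` keys, the `fill` and `cartTotal` helpers and the explicit `currency` parameter are hypothetical names chosen for illustration, not a specific library's API.

```javascript
// Hypothetical resource tables: each locale owns the whole sentence,
// including word order, spacing and every plural form it needs.
const resources = {
  "en-US": {
    cart_total_one: "{count} product, {price}",
    cart_total_other: "{count} products, {price}",
  },
  "fr-FR": {
    cart_total_one: "{count} produit, {price}",
    cart_total_other: "{count} produits, {price}",
  },
};

// Replace named {placeholders}; unknown names are left as-is so gaps stay visible.
function fill(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in values ? String(values[name]) : match
  );
}

function cartTotal(locale, cart, currency) {
  // CLDR plural category for this count in this language: "one", "other", ...
  const category = new Intl.PluralRules(locale).select(cart.totalItems);
  const table = resources[locale];
  const template = table[`cart_total_${category}`] ?? table.cart_total_other;
  const count = new Intl.NumberFormat(locale).format(cart.totalItems);
  const price = new Intl.NumberFormat(locale, { style: "currency", currency })
    .format(cart.totalPrice);
  return fill(template, { count, price });
}

console.log(cartTotal("en-US", { totalItems: 1, totalPrice: 12.5 }, "USD")); // "1 product, $12.50"
console.log(cartTotal("fr-FR", { totalItems: 3, totalPrice: 12.5 }, "EUR"));
```

In the browser you would then write `total.innerText = cartTotal(userLocale, cart, "EUR");`. Supporting a new language means adding a resource table, not touching this code.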

#3 We live in a global world

Unless your project is very specific, users from different cultures are likely to use it. So let them use the language and culture they are comfortable with.

Besides, your app will probably run on computers with different regional configurations. Some French people, for example, set up their computers with English as the locale. This is quite common among developers, and it happens in some global companies as well. What happens when your French-only application faces such a configuration? It may crash, or silently misbehave, because of a number formatting issue: "3,14" in French is "3.14" in English.
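To make the formatting issue concrete, here is a minimal JavaScript sketch. `parseFloat` ignores the locale, so a French user's "3,14" silently becomes 3. The hypothetical `separators` and `parseLocaleNumber` helpers ask the standard `Intl.NumberFormat.prototype.formatToParts` API which group and decimal separators a locale uses; they assume plain numbers (no currency symbols, percent signs or exponents).

```javascript
// Discover a locale's group and decimal separators by formatting a sample number.
function separators(locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(12345.6);
  const find = (type) => parts.find((p) => p.type === type)?.value;
  return { group: find("group"), decimal: find("decimal") };
}

// Parse user input according to the locale (sketch: plain numbers only).
function parseLocaleNumber(text, locale) {
  const { group, decimal } = separators(locale);
  let normalized = text.trim();
  if (group) normalized = normalized.split(group).join("");
  // Some locales group with a (narrow) no-break space; users often type a plain space.
  normalized = normalized.replace(/\s/g, "");
  if (decimal) normalized = normalized.split(decimal).join(".");
  return Number(normalized); // NaN when the input is not a number
}

console.log(parseFloat("3,14"));                      // 3: the decimal part is silently lost
console.log(parseLocaleNumber("3,14", "fr-FR"));      // 3.14
console.log(parseLocaleNumber("1.234,5", "de-DE"));   // 1234.5
console.log(parseLocaleNumber("1,234.5", "en-US"));   // 1234.5
```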

I once worked with a team that implemented i18n in a hurry as part of a v2. Guess what? They got it completely wrong. I believe they made several mistakes:

  • First and foremost, ignoring i18n despite having a global target.
  • Adding it in a hurry without thinking about the team workflow.
  • A bad default/fallback culture choice. The resource keys were the French content itself, so when a resource was not found, the fallback was the resource key, that is, French text.
  • Inconsistent resource keys, with no naming convention.
  • Gender and plural handling that relied on string concatenation.

As a consequence, developers spent hours updating wording from spreadsheets. This was error-prone and generated a lot of frustration. But don't worry: thanks to this article, you will avoid all those traps.

#4 Just code quality

Internationalized, localizable software is likely to be of better quality. Don't get me wrong: I'm not talking about localized software, but localizable software.

Such a code base behaves more consistently across different configurations. Maybe i18n should be part of any quality development process, just as linting is.

What are i18n and l10n?

In computing, internationalization and localization are means of adapting computer software to different languages, regional differences and technical requirements of a target market. (Wikipedia)

Internationalization, a.k.a. i18n, is about generalization. Internationalized software abstracts language- and culture-dependent features, so it can be adapted to various cultures and languages without further engineering. It makes your software culture- and language-agnostic. Some refer to it as globalization or world readiness.

Localization, a.k.a. l10n, is about adapting your software to a specific region or language. If you took good care of internationalization, localization will mostly consist of translating resources, which should be quite straightforward. Sometimes, though, you have to write specific business rules or layouts for advanced scenarios.

i18n affects almost everything:

  • Text: titles, buttons, labels, menus, etc., sometimes even the app title
  • Images, such as logos that include taglines
  • Content, such as database content, services, etc.
  • Layouts, when dealing with right-to-left or BiDi languages
  • Dates (time zones and calendars)
  • User inputs, such as numbers or dates
  • Currency
  • Last but not least, sorting

How to do it right?

We need to build an infrastructure which allows us to retrieve appropriate resources and format data for a given culture. Let's go through some steps.

Plan for supported languages

Remember? We are a Belgian chocolate maker, so we will support Dutch! But French too. And we definitely want to export our amazing products to France (they speak French! Easy!). Oh, but we have a lot of customers in Canada, the UK and Wisconsin too! So we will definitely need English. The final list:

  • en-US
  • en-GB
  • en-CA
  • fr-CA
  • fr-FR
  • fr-BE
  • nl-BE

OK, but right now we only have Dutch, English (US) and French (fr) translations. And because UK English and American English are not exactly the same, this is where you need to introduce neutral cultures (such as en, fr and nl). They let you set default resources for a given language while still supporting regional data formatting and resource overriding.

Final language / region hierarchy

This schema shows a culture fallback chain that could work, but some platforms have more elaborate ones. For instance, in .NET the fallback sequence for fr-CA would be fr-CA, fr, fr-*, then the default language defined in the assembly.
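A simple fallback chain like the one in the schema can be sketched in JavaScript as follows. It assumes plain language[-script][-region] tags, drops subtags from the right, then appends the default culture and its parent. `DEFAULT_LOCALE`, `resources`, `fallbackChain` and `lookup` are hypothetical names. Note that a missing key produces a visible marker rather than the key itself, which avoids the fallback trap described earlier.

```javascript
const DEFAULT_LOCALE = "en-US"; // the default culture chosen for the app

// Hypothetical resources: neutral cultures hold the translations,
// regional cultures only override what differs.
const resources = {
  en: { cart_title: "Your cart", color_label: "Color" },
  "en-GB": { color_label: "Colour" },
  fr: { cart_title: "Votre panier", color_label: "Couleur" },
  nl: { cart_title: "Uw winkelwagen", color_label: "Kleur" },
};

// Drop subtags from the right for the requested tag, then for the default
// culture, skipping duplicates: "fr-CA" -> fr-CA, fr, en-US, en.
function fallbackChain(tag, defaultLocale = DEFAULT_LOCALE) {
  const chain = [];
  for (const t of [tag, defaultLocale]) {
    const subtags = Intl.getCanonicalLocales(t)[0].split("-");
    for (let i = subtags.length; i > 0; i--) {
      const candidate = subtags.slice(0, i).join("-");
      if (!chain.includes(candidate)) chain.push(candidate);
    }
  }
  return chain;
}

function lookup(key, tag) {
  for (const locale of fallbackChain(tag)) {
    const value = resources[locale]?.[key];
    if (value !== undefined) return value;
  }
  // Never return the key itself: make the missing resource obvious instead.
  return `[missing: ${key}]`;
}

console.log(fallbackChain("fr-CA").join(" > ")); // "fr-CA > fr > en-US > en"
console.log(lookup("color_label", "en-GB"));     // "Colour"
console.log(lookup("cart_title", "en-GB"));      // "Your cart"
console.log(lookup("cart_title", "fr-BE"));      // "Votre panier"
```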

You might have noticed we are using language tags as defined in RFC 5646. This is the most common and standard way of naming languages.

It even supports scripts, variants, etc. This is handy when supporting more complex languages or when tagging content for automated processing. You are unlikely to handle zh-Hans-CN (Simplified Chinese) and zh-Hant-CN (Traditional Chinese) the same way.
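Since language tags are structured, you can inspect them instead of comparing raw strings. This sketch uses the standard `Intl.Locale` API; `describeTag` and `sameScript` are hypothetical helpers. `maximize()` adds the likely subtags, so "zh-CN" is treated as Simplified Chinese.

```javascript
// Split a language tag into its subtags; absent subtags are undefined.
function describeTag(tag) {
  const { language, script, region } = new Intl.Locale(tag);
  return { language, script, region };
}

// Two tags need different text resources if they use different scripts.
// maximize() fills in likely subtags, e.g. "zh-CN" -> "zh-Hans-CN".
function sameScript(a, b) {
  return new Intl.Locale(a).maximize().script === new Intl.Locale(b).maximize().script;
}

console.log(describeTag("zh-Hant-CN"));               // { language: 'zh', script: 'Hant', region: 'CN' }
console.log(sameScript("zh-CN", "zh-Hans-CN"));       // true
console.log(sameScript("zh-Hans-CN", "zh-Hant-CN"));  // false
console.log(sameScript("fr-BE", "fr-CA"));            // true (both Latin script)
```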

Default language/region

The default culture will be used when:

  • No suitable culture was found for the user
  • A resource does not exist in the user's language

Usually the obvious choice is en-US (English with United States regional settings).
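Finding "a suitable culture for the user" usually means matching their preferences (for example `navigator.languages` in a browser, or a parsed `Accept-Language` header on a server) against the supported list. A minimal sketch, assuming the supported list above; `LANGUAGE_DEFAULTS` (which regional culture to use when only the language matches) and `negotiate` are hypothetical choices, built on the standard `Intl.getCanonicalLocales` and `Intl.Locale` APIs.

```javascript
// The supported list from above and the default culture.
const SUPPORTED = ["en-US", "en-GB", "en-CA", "fr-CA", "fr-FR", "fr-BE", "nl-BE"];
const DEFAULT_LOCALE = "en-US";
// Hypothetical choice of regional culture when only the language matches.
const LANGUAGE_DEFAULTS = { en: "en-US", fr: "fr-FR", nl: "nl-BE" };

// Walk the user's preferences in order: exact match first, then same language.
function negotiate(requested) {
  for (const tag of requested) {
    let canonical;
    try {
      canonical = Intl.getCanonicalLocales(tag)[0];
    } catch {
      continue; // malformed tag: skip it rather than crash
    }
    if (SUPPORTED.includes(canonical)) return canonical;
    const byLanguage = LANGUAGE_DEFAULTS[new Intl.Locale(canonical).language];
    if (byLanguage) return byLanguage;
  }
  return DEFAULT_LOCALE; // no suitable culture was found for the user
}

console.log(negotiate(["nl-be", "en"]));     // "nl-BE"
console.log(negotiate(["fr-CH", "en-GB"]));  // "fr-FR"
console.log(negotiate(["de-DE", "es"]));     // "en-US"
```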

Text resources

Text resources are usually the biggest part of localization. The most common and simplest way to handle them is a key-value pair approach: the key is constant across languages and is used to retrieve a specific text resource.

Many platforms have built-in support for this kind of localized text resource. Although they share roughly the same schema (key, value, description), their formats differ. Some translation companies use a standard called XLIFF. It is XML-based, and you can easily convert from XLIFF to other formats, so I would say it is the way to go for text resource management. There are several tools for managing XLIFF files, such as Java Open Language Tools, Transolution or the Microsoft Multilingual App Toolkit. Under some circumstances you might prefer .po files, but they are a little tied to the Linux/Unix ecosystem.

Microsoft's XLIFF editor

Mozilla has also worked on a very nice alternative solution: l20n.

This might seem simple, but don't fool yourself: there are a lot of traps. Let's review some rules to avoid them.

  • Do not reuse a key in several places. Translations are context-sensitive. If you want to break that rule, make sure that:

    • They are used in the same context
    • You have the same space available on the screen

    One simple example is the use of "On" and "Off". In English it is completely fine to use them for both the "Flash" and "Cloud file synchronization" settings. But in French you would rather use "Allumé/Éteint" for the flash and "Activée/Désactivée" for synchronization.

  • Do not merge nouns and verbs. Be careful not to put a verb and a noun under the same key. "Text" as a noun translates as "texte" or "message" in French. "Text" as a verb translates as "envoyer un message".

  • Do not make assumptions. Grammar, especially punctuation, agreement and gender, changes radically from one language to another. For instance, plurals in Russian are much more complicated than in English or French. Speaking of plurals, here is a rather complete list of rules.

  • Do not assume that word order is the same in every language. For example, the first name and the last name do not come in the same order in every language. Moreover, it is completely fine to hide the first name in French or English, but it is rude in Korean. This is the translator's call, not the developer's.

  • Do not exclude punctuation from localization. Punctuation must be localized. For instance, in French you put a space (usually a non-breaking one) before and after a colon (:), while in English you don't put one before.
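To see why plural rules must not be hard-coded, this sketch asks the standard `Intl.PluralRules` API, which follows the Unicode CLDR rules, which plural category a few counts fall into. `pluralTable` is a hypothetical helper; the exact category lists depend on the CLDR version bundled with your JavaScript runtime.

```javascript
// Which CLDR plural category does each count fall into, per language?
function pluralTable(locale, counts) {
  const rules = new Intl.PluralRules(locale);
  return {
    categories: rules.resolvedOptions().pluralCategories,
    samples: Object.fromEntries(counts.map((n) => [n, rules.select(n)])),
  };
}

for (const locale of ["en", "fr", "ru"]) {
  console.log(locale, pluralTable(locale, [0, 1, 2, 5, 21]));
}
// In Russian, 1 and 21 are "one", 2 is "few", 0 and 5 are "many";
// in English only 1 is "one"; in French 0 and 1 are both "one".
```

A resource system built on these categories lets translators provide one string per category (for example `_one`, `_few`, `_many`, `_other` variants of a key) instead of the developer guessing.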

Now that you're aware of common pitfalls, let's talk about "format strings" (sprintf, string.Format, etc.). We've all used them to print numeric data, such as "x items in cart" or "only x items left". Taking the previous points into consideration, I would advise against using features such as sprintf for those resources. Use an extensible templating engine such as Handlebars instead. There are several advantages to doing so:

  • You will use "Welcome {{customer.firstname}} {{customer.lastname}}!" instead of "Welcome {0} {1}!". Believe me, this is much easier to localize.
  • You can provide translators with helpers, such as plural helpers. This way, you let them handle grammar.

    resource description samples

    For instance, using Handlebars, charCounter_charsLeft will output "1 remaining character" or "10 remaining characters" depending on the value of {{this}}.

  • You can provide access to a complete data model and let translators and copywriters use the appropriate data.

When defining your resources, as shown in the previous screenshot, you should write a description of the context in which they are used. It may also contain anything else that might help translators and copywriters do their jobs.
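Handlebars itself is an extra dependency, so here is a dependency-free sketch of the same idea instead: named, dotted placeholders resolved against a data model, and resources stored with a description for translators. This is not Handlebars; `render`, `resolvePath`, `t` and the resource layout are hypothetical. The Hungarian entry shows a translator putting the last name first without any code change.

```javascript
// Hypothetical (key, value, description) resources, one table per language.
const description = "Greeting in the page header. customer is the signed-in customer.";
const resources = {
  en: { header_welcome: { value: "Welcome {{customer.firstname}} {{customer.lastname}}!", description } },
  // Hungarian puts the family name first; the translator just reorders placeholders.
  hu: { header_welcome: { value: "Üdvözöljük, {{customer.lastname}} {{customer.firstname}}!", description } },
};

// Resolve a dotted path such as "customer.firstname" against the data model.
function resolvePath(model, path) {
  return path.split(".").reduce((obj, key) => (obj == null ? undefined : obj[key]), model);
}

// Minimal {{path}} substitution; unresolved placeholders are left visible.
function render(template, model) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    const value = resolvePath(model, path);
    return value === undefined ? match : String(value);
  });
}

function t(locale, key, model) {
  return render(resources[locale][key].value, model);
}

const model = { customer: { firstname: "Anna", lastname: "Kovács" } };
console.log(t("en", "header_welcome", model)); // "Welcome Anna Kovács!"
console.log(t("hu", "header_welcome", model)); // "Üdvözöljük, Kovács Anna!"
```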

One last thing, I like my resource keys to have a prefix which indicates in which view they are used. This automatically sorts resources per view. That makes them much easier to find. For example, I would use homepage_intro_title to refer to the title in the intro part of the homepage.
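With such a convention in place, a small script can check keys and spot missing translations before they reach users. A minimal sketch, assuming the hypothetical convention view_section_element in lower camelCase (which also accepts charCounter_charsLeft); `KEY_PATTERN` and `auditResources` are illustrative names, and resources are plain objects loaded from whatever file format you use.

```javascript
// Hypothetical convention: view_section_element, each part in lower camelCase.
const KEY_PATTERN = /^[a-z][a-zA-Z0-9]*(_[a-z][a-zA-Z0-9]*)+$/;

function auditResources(resources, defaultLocale) {
  const defaultKeys = Object.keys(resources[defaultLocale]);
  const badKeys = defaultKeys.filter((key) => !KEY_PATTERN.test(key));

  // Group well-formed keys by their view prefix.
  const byView = {};
  for (const key of defaultKeys.filter((k) => KEY_PATTERN.test(k))) {
    const view = key.split("_")[0];
    if (!byView[view]) byView[view] = [];
    byView[view].push(key);
  }

  // Keys present in the default culture but absent from another one.
  const missing = {};
  for (const [locale, table] of Object.entries(resources)) {
    if (locale === defaultLocale) continue;
    const absent = defaultKeys.filter((key) => !(key in table));
    if (absent.length > 0) missing[locale] = absent;
  }
  return { badKeys, byView, missing };
}

const report = auditResources(
  {
    en: {
      homepage_intro_title: "Welcome to our chocolate shop",
      homepage_intro_text: "Order Belgian chocolate in a few taps.",
      cart_checkout_button: "Checkout",
      TitleCart: "Cart",
    },
    fr: {
      homepage_intro_title: "Bienvenue dans notre chocolaterie",
      cart_checkout_button: "Commander",
    },
  },
  "en"
);
console.log(report.badKeys); // [ 'TitleCart' ]
console.log(report.byView);
console.log(report.missing); // { fr: [ 'homepage_intro_text', 'TitleCart' ] }
```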

Image

There are two working approaches for image localization:

  • Naming convention: with this approach you will tag each image by prefixing or suffixing its name with a language tag.
  • Key-value pair: with this approach you store the path or URI of the localized image in a key-value store (like any other text resource).
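The naming convention can reuse the same fallback idea as text resources. A minimal sketch, assuming hypothetical file names of the form name.<language tag>.ext with name.ext as the neutral default; `localizedImage` is an illustrative helper, and in a real app `available` would come from your build manifest or asset folder.

```javascript
// Hypothetical convention: "logo.nl-BE.png", "logo.fr.png", with "logo.png" as default.
// Try the full tag first, then shorter ones, then the neutral file.
function localizedImage(name, ext, tag, available) {
  const subtags = Intl.getCanonicalLocales(tag)[0].split("-");
  for (let i = subtags.length; i > 0; i--) {
    const candidate = `${name}.${subtags.slice(0, i).join("-")}.${ext}`;
    if (available.has(candidate)) return candidate;
  }
  return `${name}.${ext}`;
}

const available = new Set(["logo.png", "logo.fr.png", "logo.nl-BE.png"]);
console.log(localizedImage("logo", "png", "fr-CA", available)); // "logo.fr.png"
console.log(localizedImage("logo", "png", "nl-BE", available)); // "logo.nl-BE.png"
console.log(localizedImage("logo", "png", "en-US", available)); // "logo.png"
```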

Number/Date/Currency Format

Platforms usually provide a set of APIs for this. That is not always the case, though, and you might need an extra library. I would recommend using well-known open source libraries rather than building your own. For example, in web development there are FormatJS and Moment.js.
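On the web, the platform API is `Intl`, built into modern browsers and Node.js. This sketch formats a price and a date; `formatPrice` and `formatDay` are hypothetical wrappers. It also illustrates two items from the list of things i18n affects: currency is a business choice rather than something derived from the locale, and the same instant can fall on a different calendar day depending on the time zone.

```javascript
// Currency is what you charge in; the locale only decides how it is written.
function formatPrice(locale, amount, currency) {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}

// The same instant can fall on different calendar days depending on the time zone.
function formatDay(locale, date, timeZone) {
  return new Intl.DateTimeFormat(locale, { timeZone }).format(date);
}

const order = new Date(Date.UTC(2024, 2, 5, 23, 30)); // 5 March 2024, 23:30 UTC
console.log(formatPrice("en-US", 1234.5, "EUR"));           // "€1,234.50"
console.log(formatPrice("nl-BE", 1234.5, "EUR"));
console.log(formatDay("en-US", order, "UTC"));             // "3/5/2024"
console.log(formatDay("en-GB", order, "Europe/Brussels")); // "06/03/2024"
```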

Sorting

This is a tough one. Yet again, platforms might have built-in support for most languages.

Sorting? What the hell are you talking about? B comes after A, C after B, etc. Simple! But not all languages work like those based on Latin characters. The best example is Japanese, where kanji sorting is phonetic. Easy, you would say: we just need an algorithm that converts strings to a phonetic notation and sorts based on that. Nice try! Except that wouldn't work. Japanese has another surprise for you: kanji pronunciation depends on context.

I won't lie to you: add the fact that Japanese has not only kanji but 3 other character sets, and Japanese localization becomes a complete nightmare. Meanwhile, each language has its own specificities, so just be aware of them and don't believe that ordinal (code point) ordering will do the job :).
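For languages where the platform ships collation data, the standard `Intl.Collator` API does the job. This sketch, with a hypothetical `sortFor` helper, shows that the default `Array.prototype.sort` compares UTF-16 code units, while German and Swedish disagree on where "Ä" goes. It will not by itself solve the Japanese kanji reading problem described above.

```javascript
// Sort a copy of the list with the collation rules of the given locale.
function sortFor(locale, words) {
  return [...words].sort(new Intl.Collator(locale).compare);
}

const words = ["Zoo", "zebra", "Äpfel", "apple"];
console.log([...words].sort());    // [ 'Zoo', 'apple', 'zebra', 'Äpfel' ] (code unit order)
console.log(sortFor("de", words)); // [ 'Äpfel', 'apple', 'zebra', 'Zoo' ]
console.log(sortFor("sv", words)); // [ 'apple', 'zebra', 'Zoo', 'Äpfel' ] (Ä comes after Z)
```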

Layouts

Usually you won't need layout variants unless you have to support right-to-left languages. That said, development patterns such as MVC or MVVM usually make loading specific views pretty easy.

Note that some platforms have advanced right-to-left and BiDi support. You might be tempted to build one single view that supports all scenarios, but this is not easy. I wouldn't recommend it.
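Whichever approach you take, you first need to know the text direction of a culture. A minimal sketch, assuming a partial, hand-maintained list of right-to-left scripts; `RTL_SCRIPTS`, `textDirection` and the ".rtl.html" view naming in `viewFor` are hypothetical. The standard `Intl.Locale.prototype.maximize()` supplies the likely script when the tag omits it.

```javascript
// Partial list of right-to-left scripts (an assumption for this sketch).
const RTL_SCRIPTS = new Set(["Arab", "Hebr", "Thaa", "Syrc", "Nkoo", "Adlm"]);

function textDirection(tag) {
  // maximize() fills in the likely script, e.g. "ar" -> "ar-Arab-EG".
  const script = new Intl.Locale(tag).maximize().script;
  return RTL_SCRIPTS.has(script) ? "rtl" : "ltr";
}

// Hypothetical view selection: load a dedicated RTL layout only when needed.
function viewFor(name, tag) {
  return textDirection(tag) === "rtl" ? `${name}.rtl.html` : `${name}.html`;
}

console.log(textDirection("he-IL"));        // "rtl"
console.log(viewFor("checkout", "ar-MA"));  // "checkout.rtl.html"
console.log(viewFor("checkout", "nl-BE"));  // "checkout.html"
```

In a browser you would typically also set `document.documentElement.dir = textDirection(locale)`.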

Conclusion

If you do internationalization right and implement it from the start, you will save your team a lot of time and frustration. It will also improve your code base and product quality. And, at least from the product point of view, it will make you world ready. So you should definitely consider internationalization an investment.

Localization is sometimes a little more complex. But remember, being l10n ready does not mean actually supporting every language yet. I often learn new things when working on this subject, so I will probably update this article in the future. Meanwhile, feel free to leave us a comment to submit a specific case, report an error, share your personal experience or ask questions.

We'll write technical guidance for different platforms such as Windows apps, Angular and Node.js. Subscribe to our newsletter to get notified when it's available! May the code be with you!

Image credits: Antique Globe 2 by James Saunders.