We are in the process of internationalizing and localizing one of our open source web applications, which is written entirely in C# and ASP.NET MVC. Although ASP.NET already offers straightforward i18n support through resx resource files, these are not very friendly for translators.
A resx file is just XML, but few open source tools help translators work with it comfortably. We therefore decided to try GNU gettext.
Gettext uses the .po file format, which basically consists of key/value pairs. The interesting part is that the keys are the original untranslated strings, so there is no need to keep an extra file with the untranslated text. The files also record tags, the places in the code where each string occurs, and any other annotations as comments. There are plenty of tools designed specifically for translators to work with them, such as PoEdit, or the Pootle server, which provides an online store for uploading, translating and merging .po files.
(As a side note, .NET resource files can be converted to a plain key=value text format with the ResGen utility; however, that format lacks all the added value of po files.)
GNU gettext for Windows can be downloaded from here, or you may prefer to use the tools included in the Cygwin environment.
Handling messages
Since gettext does not require a separate resource file holding the default key=value mapping for each message, it provides a utility (xgettext) that extracts the string arguments of every call to a specified "translator" function in the code.
For example, suppose we have the traditional example:
Console.WriteLine("Hello {0}!", name);
Using traditional .NET resource files, we would create a resources file, add an entry with the text, and reference it:
Console.WriteLine(Resources.Hello, name);
With gettext, however, we define a function (one of the defaults is GetString, although we may change it to a much shorter name) that both marks strings for extraction and translates them, and we wrap the string in a call to it:
Console.WriteLine(Resources.GetString("Hello {0}!"), name);
Or, using a shorter function name that also takes care of the formatting:
Console.WriteLine(Messages.T("Hello {0}!", name));
This ends up being a much faster way of working with strings, since there is no separate file to edit.
Behind GetString
Although the GNU gettext tools already ship a default resource manager implementation, I prefer a custom version that gives more control over string handling and relies on .NET's built-in lookup of satellite culture assemblies (the default implementation performs that lookup manually).
A resource manager, which we shall name Messages, can be implemented with a structure very similar to the Resources class that .NET generates when we add a resx file. That code loads the resource dictionary, and if you take a closer look, every member for accessing the externalized strings just wraps a lookup by key:
/// <summary>
/// Looks up a localized string similar to This is a test.
/// </summary>
internal static string TestString
{
    get
    {
        // The generated code uses the entry name as the lookup key.
        return ResourceManager.GetString("TestString", resourceCulture);
    }
}
Therefore, we can bypass those strongly typed methods and provide direct access to the dictionary itself. Since the keys are the untranslated version of the strings, we can simply write:
/// <summary>
/// Looks up a localized string.
/// </summary>
/// <param name="t">The untranslated string.</param>
/// <returns>Translated string according to the resource culture.</returns>
public static string T(string t)
{
    return ResourceManager.GetString(t, resourceCulture);
}
Here is the full version we are currently using with multiple overloads to handle string formatting and specifying a certain culture, in case you want to use it.
Note that .NET's ResourceManager class automatically looks up the string using the calling thread's current UI culture (CurrentUICulture), unless you specify a different one.
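To make the shape of such a class concrete, here is a minimal sketch of a Messages class with a few overloads. It is not the full version mentioned above, only an illustration under some assumptions: the base name ProjectName.Resources.Messages is hypothetical and must match wherever the .resources file is embedded, and the fallback to the untranslated text when no translation is found is a design choice of this sketch.

```csharp
using System;
using System.Globalization;
using System.Resources;

public static class Messages
{
    // Hypothetical base name; adjust it to the namespace and folder of your .resources file.
    private static readonly ResourceManager Manager =
        new ResourceManager("ProjectName.Resources.Messages", typeof(Messages).Assembly);

    // Translates a string using the current UI culture.
    public static string T(string t)
    {
        return Lookup(t, null);
    }

    // Translates the format string first, then applies the arguments.
    public static string T(string t, params object[] args)
    {
        return string.Format(Lookup(t, null), args);
    }

    // Same as above, but for an explicitly chosen culture.
    public static string T(CultureInfo culture, string t, params object[] args)
    {
        string text = Lookup(t, culture);
        return args.Length == 0 ? text : string.Format(culture, text, args);
    }

    private static string Lookup(string t, CultureInfo culture)
    {
        try
        {
            // A null culture makes ResourceManager use CultureInfo.CurrentUICulture.
            // GetString returns null when the key is missing, so fall back to the key itself.
            return Manager.GetString(t, culture) ?? t;
        }
        catch (MissingManifestResourceException)
        {
            // No resources were found at all: show the untranslated text.
            return t;
        }
    }
}
```

Because the key is the untranslated text, falling back to the key means that a missing translation still shows a readable message in the original language.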
Creating the PO files
If we want our strings localized, we must generate the po files, distribute them to the translators and then bring them back into the project. Creating these files is very easy with the xgettext tool mentioned earlier.
The first step is to create a list of the files for the tool to scan. The simplest way is to use the DOS dir command to find all C# files and store their paths in a list:
dir .\Source\*.cs /S /B > Strings.filelist
The /S option makes dir descend into subdirectories (xgettext cannot be told to recurse by itself), and /B prints a bare list containing only the file paths.
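If you would rather build the list from a build script or a small tool instead of the command prompt, the same result can be sketched in C#. The class and method names below are hypothetical; the sketch only assumes the standard Directory and File classes.

```csharp
using System;
using System.IO;

public static class FileListBuilder
{
    // Writes the full path of every .cs file under sourceRoot (including
    // subdirectories) to listPath, one per line, and returns how many were found.
    public static int Write(string sourceRoot, string listPath)
    {
        string root = Path.GetFullPath(sourceRoot);
        string[] files = Directory.GetFiles(root, "*.cs", SearchOption.AllDirectories);
        Array.Sort(files, StringComparer.Ordinal); // stable order between runs
        File.WriteAllLines(listPath, files);
        return files.Length;
    }
}
```

Calling FileListBuilder.Write("Source", "Strings.filelist") would produce a list equivalent to the one generated by the dir command.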
Then we run xgettext feeding it with the filelist we have just generated, and instructing it to process strings included in calls to Messages.T:
xgettext.exe -k -kMessages.T --from-code=UTF-8 -LC# --omit-header -oMessages.po -fStrings.filelist
The -k parameter names the function whose string arguments we want to extract (the bare -k before it disables xgettext's default keywords), -f gives the file list and -o the output po file, which will look like this:
#: D:\Test\Source\Hello.cs:10
#, csharp-format
msgid "Hello {0}!"
msgstr ""
Any of the tools mentioned above (such as PoEdit) can read this file. One copy is generated for each target language and distributed to the translators.
Creating the resources files
We have marked the strings for translation, extracted them, and look them up at runtime through the T function. What is still missing is building, from the translated po files, the resources files that .NET's ResourceManager will load.
These files must be in the specific resource format the resource manager understands. Luckily, .NET provides both a ResGen tool for creating them from simpler formats, such as a plain text name=value file, and a ResourceWriter class.
What we did was develop a simple tool on top of this class that parses po files and writes them out as a .NET resources file; you can find the code here. All you have to do is invoke it with the input po file:
Msgfmt.exe -iMessages.es.po -oMessages.es.resources
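For readers who want to see roughly what such a converter involves, here is a minimal sketch built on ResourceWriter. It is not the code of the tool linked above: the class name PoToResources is hypothetical, and the parser is deliberately simplified (it handles msgid/msgstr pairs and continuation lines, and ignores plural forms, msgctxt and the header entry).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Resources;
using System.Text;

public static class PoToResources
{
    // Reads msgid/msgstr pairs; untranslated entries (empty msgstr) are skipped.
    public static Dictionary<string, string> Parse(TextReader reader)
    {
        var entries = new Dictionary<string, string>();
        StringBuilder id = null, str = null, current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.StartsWith("msgid ", StringComparison.Ordinal))
            {
                Flush(entries, id, str); // store the previous entry, if complete
                id = new StringBuilder(Unquote(line.Substring(6)));
                str = null;
                current = id;
            }
            else if (line.StartsWith("msgstr ", StringComparison.Ordinal))
            {
                str = new StringBuilder(Unquote(line.Substring(7)));
                current = str;
            }
            else if (line.StartsWith("\"", StringComparison.Ordinal) && current != null)
            {
                current.Append(Unquote(line)); // continuation of a multi-line string
            }
            else
            {
                current = null; // comment, blank line or unsupported keyword
            }
        }
        Flush(entries, id, str);
        return entries;
    }

    // Converts a translated po file into a binary .resources file.
    public static void Convert(string poPath, string resourcesPath)
    {
        Dictionary<string, string> entries;
        using (var reader = new StreamReader(poPath, Encoding.UTF8))
            entries = Parse(reader);
        using (var writer = new ResourceWriter(resourcesPath))
        {
            foreach (var pair in entries)
                writer.AddResource(pair.Key, pair.Value);
            writer.Generate();
        }
    }

    private static void Flush(Dictionary<string, string> entries, StringBuilder id, StringBuilder str)
    {
        if (id != null && str != null && id.Length > 0 && str.Length > 0)
            entries[id.ToString()] = str.ToString();
    }

    // Strips the surrounding quotes and resolves \n, \t, \r, \" and \\ escapes.
    private static string Unquote(string s)
    {
        s = s.Trim();
        if (s.Length >= 2 && s[0] == '"' && s[s.Length - 1] == '"')
            s = s.Substring(1, s.Length - 2);
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '\\' && i + 1 < s.Length)
            {
                char c = s[++i];
                sb.Append(c == 'n' ? '\n' : c == 't' ? '\t' : c == 'r' ? '\r' : c);
            }
            else
            {
                sb.Append(s[i]);
            }
        }
        return sb.ToString();
    }
}
```

Skipping entries with an empty msgstr means an untranslated string is simply absent from the resources file, so the lookup code can decide what to show in that case.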
By including this resources file in the Visual Studio project, a satellite assembly for the specific culture will be automatically generated. Note that if you add it in a folder Resources, for example, the path to access it when creating the resources manager will be ProjectName.Resources.Strings.
In the case of ASP.NET projects, you should deploy the resources file to the App_GlobalResources folder, and access it via Resources.Strings:
Getting the user’s culture
After all this work, we still need some mechanism to obtain the user's preferred culture and apply it to the current thread, so that the application is actually displayed in the chosen language.
I will not go deep into this subject: setting the culture for desktop applications is mostly automatic, and Sergio has an excellent post on how to deal with i18n of ASP.NET MVC websites via configuration, cookies and HttpModules.
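Just to illustrate the basic idea, here is a minimal sketch that picks a culture from an Accept-Language style string and applies it to the current thread. It is not the approach from the post mentioned above: the CultureSelector class and its list of supported languages are hypothetical, and quality values in the header are ignored for brevity.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class CultureSelector
{
    // Hypothetical list of languages we ship translations for; the first is the default.
    private static readonly string[] Supported = { "en", "es" };

    // Returns the first supported language found in a list such as
    // "es-AR,es;q=0.8,en;q=0.5", taking the header order as the preference order.
    public static CultureInfo Pick(string acceptLanguage)
    {
        foreach (string item in (acceptLanguage ?? "").Split(','))
        {
            string tag = item.Split(';')[0].Trim();
            if (tag.Length == 0) continue;
            string language = tag.Split('-')[0].ToLowerInvariant();
            if (Array.IndexOf(Supported, language) >= 0)
                return new CultureInfo(language);
        }
        return new CultureInfo(Supported[0]);
    }

    // ResourceManager lookups on this thread will use the chosen culture from now on.
    public static void Apply(CultureInfo culture)
    {
        Thread.CurrentThread.CurrentUICulture = culture;
    }
}
```

In a web application this would run once per request, for example CultureSelector.Apply(CultureSelector.Pick(header)) where header holds the value of the request's Accept-Language header.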
More to come…
There are some less typical scenarios you may run into when working with these solutions. Among the most interesting ones we found are how to internationalize outgoing messages when the recipient's culture may differ from the sender's, and how to provide an easy translation function for ASP.NET views.
I hope to publish more on these subjects soon.