gettext — Message Catalogs¶
Purpose: | Message catalog API for internationalization. |
---|
The gettext
module provides a pure-Python implementation
compatible with the GNU gettext library for message translation
and catalog management. The tools available with the Python source
distribution enable you to extract messages from a set of source
files, build a message catalog containing translations, and use that
message catalog to display an appropriate message for the user at
runtime.
Message catalogs can be used to provide internationalized interfaces for a program, showing messages in a language appropriate to the user. They can also be used for other message customizations, including “skinning” an interface for different wrappers or partners.
Note
Although the standard library documentation says all of the
necessary tools are included with Python, pygettext.py
failed
to extract messages wrapped in the ngettext
call, even with
the appropriate command line options. These examples use
xgettext
from the GNU gettext tool set, instead.
Translation Workflow Overview¶
The process for setting up and using translations includes five steps.
Identify and mark up literal strings in the source code that contain messages to translate.
Start by identifying the messages within the program source that need to be translated, and marking the literal strings so the extraction program can find them.
Extract the messages.
After the translatable strings in the source are identified, use
xgettext
to extract them and create a.pot
file, or translation template. The template is a text file with copies of all of the strings identified and placeholders for their translations.Translate the messages.
Give a copy of the
.pot
file to the translator, changing the extension to.po
. The.po
file is an editable source file used as input for the compilation step. The translator should update the header text in the file and provide translations for all of the strings.“Compile” the message catalog from the translation.
When the translator sends back the completed
.po
file, compile the text file to the binary catalog format usingmsgfmt
. The binary format is used by the runtime catalog lookup code.Load and activate the appropriate message catalog at runtime.
The final step is to add a few lines to the application to configure and load the message catalog and install the translation function. There are a couple of ways to do that, with associated trade-offs.
The rest of this section will examine those steps in a little more detail, starting with the code modifications needed.
Creating Message Catalogs from Source Code¶
gettext
works by looking up literal strings in a database of
translations, and pulling out the appropriate translated string. The
usual pattern is to bind the appropriate lookup function to the name
“_
” (a single underscore character) so that the code is not
cluttered with a lot of calls to functions with longer names.
The message extraction program, xgettext
, looks for messages
embedded in calls to the catalog lookup functions. It understands
different source languages and uses an appropriate parser for each.
If the lookup functions are aliased, or extra functions are added,
give xgettext
the names of additional symbols to consider when
extracting messages.
This script has a single message ready to be translated.
import gettext
# Set up message catalog access
t = gettext.translation(
'example_domain', 'locale',
fallback=True,
)
_ = t.gettext
print(_('This message is in the script.'))
The text "This message is in the script."
is the message to be
substituted from the catalog. Fallback mode is enabled, so if the
script is run without a message catalog, the in-lined message is
printed.
$ python3 gettext_example.py
This message is in the script.
The next step is to extract the message and create the .pot
file,
using pygettext.py
or xgettext
.
$ xgettext -o example.pot gettext_example.py
The output file produced contains the following content.
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-03-18 16:20-0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
#: gettext_example.py:19
msgid "This message is in the script."
msgstr ""
Message catalogs are installed into directories organized by domain
and language. The domain is provided by the application or library,
and is usually a unique value like the application name. In this
case, the domain in gettext_example.py
is example_domain
. The
language value is provided by the user’s environment at runtime,
through one of the environment variables LANGUAGE
, LC_ALL
,
LC_MESSAGES
, or LANG
, depending on their configuration and
platform. These examples were all run with the language set to
en_US
.
Now that the template is ready, the next step is to create the
required directory structure and copy the template in to the right
spot. The locale
directory inside the PyMOTW source tree will
serve as the root of the message catalog directory for these examples,
but it is typically better to use a directory accessible system-wide
so that all users have access to the message catalogs. The full path
to the catalog input source is
$localedir/$language/LC_MESSAGES/$domain.po
, and the actual
catalog has the filename extension .mo
.
The catalog is created by copying example.pot
to
locale/en_US/LC_MESSAGES/example.po
and editing it to change
the values in the header and set the alternate messages. The result
is shown next.
# Messages from gettext_example.py.
# Copyright (C) 2009 Doug Hellmann
# Doug Hellmann <doug@doughellmann.com>, 2016.
#
msgid ""
msgstr ""
"Project-Id-Version: PyMOTW-3\n"
"Report-Msgid-Bugs-To: Doug Hellmann <doug@doughellmann.com>\n"
"POT-Creation-Date: 2016-01-24 13:04-0500\n"
"PO-Revision-Date: 2016-01-24 13:04-0500\n"
"Last-Translator: Doug Hellmann <doug@doughellmann.com>\n"
"Language-Team: US English <doug@doughellmann.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
#: gettext_example.py:16
msgid "This message is in the script."
msgstr "This message is in the en_US catalog."
The catalog is built from the .po
file using msgformat
.
$ cd locale/en_US/LC_MESSAGES; msgfmt -o example.mo example.po
The domain in gettext_example.py
is example_domain
, but the
file is called example.pot
. To have gettext
find the right
translation file, the names need to match.
t = gettext.translation(
'example', 'locale',
fallback=True,
)
Now when the script is run, the message from the catalog is printed instead of the in-line string.
$ python3 gettext_example_corrected.py
This message is in the en_US catalog.
Finding Message Catalogs at Runtime¶
As described earlier, the locale directory containing the message
catalogs is organized based on the language with catalogs named for
the domain of the program. Different operating systems define their
own default value, but gettext
does not know all of these
defaults. It uses a default locale directory of sys.prefix +
'/share/locale'
, but most of the time it is safer to always
explicitly give a localedir
value than to depend on this default
being valid. The find()
function is responsible for locating an
appropriate message catalog at runtime.
import gettext
catalogs = gettext.find('example', 'locale', all=True)
print('Catalogs:', catalogs)
The language portion of the path is taken from one of several
environment variables that can be used to configure localization
features (LANGUAGE
, LC_ALL
, LC_MESSAGES
, and LANG
).
The first variable found to be set is used. Multiple languages can be
selected by separating the values with a colon (:
). To see how
that works, use a second message catalog to run a few experiments.
$ cd locale/en_CA/LC_MESSAGES; msgfmt -o example.mo example.po
$ cd ../../..
$ python3 gettext_find.py
Catalogs: ['locale/en_US/LC_MESSAGES/example.mo']
$ LANGUAGE=en_CA python3 gettext_find.py
Catalogs: ['locale/en_CA/LC_MESSAGES/example.mo']
$ LANGUAGE=en_CA:en_US python3 gettext_find.py
Catalogs: ['locale/en_CA/LC_MESSAGES/example.mo',
'locale/en_US/LC_MESSAGES/example.mo']
$ LANGUAGE=en_US:en_CA python3 gettext_find.py
Catalogs: ['locale/en_US/LC_MESSAGES/example.mo',
'locale/en_CA/LC_MESSAGES/example.mo']
Although find()
shows the complete list of catalogs, only the
first one in the sequence is actually loaded for message lookups.
$ python3 gettext_example_corrected.py
This message is in the en_US catalog.
$ LANGUAGE=en_CA python3 gettext_example_corrected.py
This message is in the en_CA catalog.
$ LANGUAGE=en_CA:en_US python3 gettext_example_corrected.py
This message is in the en_CA catalog.
$ LANGUAGE=en_US:en_CA python3 gettext_example_corrected.py
This message is in the en_US catalog.
Plural Values¶
While simple message substitution will handle most translation needs,
gettext
treats pluralization as a special case. Depending on
the language, the difference between the singular and plural forms of
a message may vary only by the ending of a single word, or the entire
sentence structure may be different. There may also be different
forms depending on the level of plurality. To make managing plurals
easier (and, in some cases, possible), there is a separate set of
functions for asking for the plural form of a message.
from gettext import translation
import sys
t = translation('plural', 'locale', fallback=False)
num = int(sys.argv[1])
msg = t.ngettext('{num} means singular.',
'{num} means plural.',
num)
# Still need to add the values to the message ourself.
print(msg.format(num=num))
Use ngettext()
to access the plural substitution for a message.
The arguments are the messages to be translated and the item count.
$ xgettext -L Python -o plural.pot gettext_plural.py
Since there are alternate forms to be translated, the replacements are listed in an array. Using an array allows translations for languages with multiple plural forms (for example, Polish has different forms indicating the relative quantity).
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-03-18 16:20-0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\n"
#: gettext_plural.py:15
#, python-brace-format
msgid "{num} means singular."
msgid_plural "{num} means plural."
msgstr[0] ""
msgstr[1] ""
In addition to filling in the translation strings, the library needs
to be told about the way plurals are formed so it knows how to index
into the array for any given count value. The line "Plural-Forms:
nplurals=INTEGER; plural=EXPRESSION;\n"
includes two values to
replace manually. nplurals
is an integer indicating the size of
the array (the number of translations used) and plural
is a C
language expression for converting the incoming quantity to an index
in the array when looking up the translation. The literal string
n
is replaced with the quantity passed to ungettext()
.
For example, English includes two plural forms. A quantity of 0
is treated as plural (“0 bananas”). The Plural-Forms
entry is:
Plural-Forms: nplurals=2; plural=n != 1;
The singular translation would then go in position 0, and the plural translation in position 1.
# Messages from gettext_plural.py
# Copyright (C) 2009 Doug Hellmann
# This file is distributed under the same license
# as the PyMOTW package.
# Doug Hellmann <doug@doughellmann.com>, 2016.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PyMOTW-3\n"
"Report-Msgid-Bugs-To: Doug Hellmann <doug@doughellmann.com>\n"
"POT-Creation-Date: 2016-01-24 13:04-0500\n"
"PO-Revision-Date: 2016-01-24 13:04-0500\n"
"Last-Translator: Doug Hellmann <doug@doughellmann.com>\n"
"Language-Team: en_US <doug@doughellmann.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n != 1;"
#: gettext_plural.py:15
#, python-format
msgid "{num} means singular."
msgid_plural "{num} means plural."
msgstr[0] "In en_US, {num} is singular."
msgstr[1] "In en_US, {num} is plural."
Running the test script a few times after the catalog is compiled will demonstrate how different values of N are converted to indexes for the translation strings.
$ cd locale/en_US/LC_MESSAGES/; msgfmt -o plural.mo plural.po
$ cd ../../..
$ python3 gettext_plural.py 0
In en_US, 0 is plural.
$ python3 gettext_plural.py 1
In en_US, 1 is singular.
$ python3 gettext_plural.py 2
In en_US, 2 is plural.
Application vs. Module Localization¶
The scope of a translation effort defines how gettext
is
installed and used with a body of code.
Application Localization¶
For application-wide translations, it is acceptable for the
author to install a function like ngettext()
globally using the
__builtins__
namespace, because they have control over the
top-level of the application’s code.
import gettext
gettext.install(
'example',
'locale',
names=['ngettext'],
)
print(_('This message is in the script.'))
The install()
function binds gettext()
to the name _()
in the __builtins__
namespace. It also adds ngettext()
and
other functions listed in names
.
Module Localization¶
For a library or individual module, modifying __builtins__
is not
a good idea because it may introduce conflicts with an application
global value. Instead, import or re-bind the names of translation
functions by hand at the top of the module.
import gettext
t = gettext.translation(
'example',
'locale',
fallback=False,
)
_ = t.gettext
ngettext = t.ngettext
print(_('This message is in the script.'))
Switching Translations¶
The earlier examples all use a single translation for the duration of
the program. Some situations, especially web applications, need to
use different message catalogs at different times, without exiting and
resetting the environment. For those cases, the class-based API
provided in gettext
will be more convenient. The API calls are
essentially the same as the global calls described in this section,
but the message catalog object is exposed and can be manipulated
directly, so that multiple catalogs can be used.
See also
- Standard library documentation for gettext
locale
– Other localization tools.- GNU gettext – The message catalog formats, API, etc. for this module are all based on the original gettext package from GNU. The catalog file formats are compatible, and the command line scripts have similar options (if not identical). The GNU gettext manual has a detailed description of the file formats and describes GNU versions of the tools for working with them.
- Plural forms – Handling of plural forms of words and sentences in different languages.
- Internationalizing Python – A paper by Martin von Löwis about techniques for internationalization of Python applications.
- Django Internationalization – Another good source of information on using gettext, including real-life examples.