Translating the Stylesheets

The Gnome documentation stylesheets provide support for localizing the rendered output from DocBook documents. Localizable strings in the stylesheets are marked for translation and extracted into PO files using &intltool;. After they have been translated, &intltool; merges them into an XML file called &l10n_xml;, which the stylesheets then use to localize the output.

Providing full internationalization support for the DocBook stylesheets is not trivial, and providing localizations requires translators to understand how documents are formatted and, to some extent, how DocBook works. DocBook is a high-level markup language, and it requires processing applications to provide much of the formatting for documents. DocBook applications must resolve cross references, create tables of contents, format headers, and perform other formatting tasks that need to be localized. Localizing these tasks involves more than translating stand-alone sentences: in effect, the formatting code itself is localizable.

Help Us Help You

Document formatting varies greatly across the world. Each locale has a long history of formatting conventions and methods. The maintainers of these stylesheets do not know all the nuances of formatting documents in all locales. The only way we can create better output for your locale is if you tell us when you encounter problems.

Although all localization is done by translating strings in a PO file, in many cases the translatable string is not a simple sentence or phrase. Rather, translators must provide data using XML fragments. These structured fragments can be used to work with plural forms, provide alternative formattings based on context, or provide format strings.

Simple Strings

Translated strings are retrieved by the stylesheets using a template called &l10n_gettext;. This template extracts strings from an XML file, which is built from PO files by &intltool;. For example, consider the string Caution, which is used as a heading for caution elements. The stylesheets would call &l10n_gettext; to extract the translated value of this string for the document's language. The &l10n_xml; file would have an entry similar to this:

[Example entry not recoverable: it lists the original string Caution together with its translation for each supported language, for example Пажња for Serbian (sr) and Pažnja for Serbian in Latin script.]

Translators, however, work only with the PO files. Using PO files for these strings is no different from any other simple string translation. The PO entry in the sr locale for the above string is an ordinary pair: msgid "Caution" with msgstr "Пажња".
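The following minimal Python sketch makes the lookup concrete. It models the merged &l10n_xml; data as a dictionary that maps each original string to its translations by language. The L10N dictionary and the gettext_lookup function are hypothetical stand-ins for the &l10n_gettext; template. Returning the untranslated string when no translation exists is an assumption of this sketch.

```python
# Hypothetical model of merged l10n data: original string -> {language: translation}.
# The real stylesheets read an XML file; this dictionary is only an illustration.
L10N: dict[str, dict[str, str]] = {
    "Caution": {
        "de": "Achtung",
        "sr": "Пажња",
        "sr@Latn": "Pažnja",
    },
}


def gettext_lookup(msgid: str, lang: str) -> str:
    """Return the translation of msgid for lang.

    Assumption: if the string or the language is missing, the original
    (untranslated) string is returned unchanged.
    """
    return L10N.get(msgid, {}).get(lang, msgid)


print(gettext_lookup("Caution", "sr"))
print(gettext_lookup("Caution", "fi"))  # no Finnish entry in this model
```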

Plurals

In many cases, a static string is insufficient for proper localized content. For these cases, the stylesheets allow for alternate strings by placing the strings in a structured XML fragment. Alternate strings are used in two ways: to provide plural forms according to the plural rules of the language, and to provide alternate formattings based on a specified role. This section discusses plurals; see the Roles section for a discussion of roles.

Plural forms are handled much as they are in other applications. A rule transforms a number into a plural index, and a translation is provided for each of those indexes. Unfortunately, there is no standard way to encode this information in XML, so &intltool;'s XML mode has no mechanism for it. Consequently, translators must place their translations in an XML fragment. This fragment is merged into the &l10n_xml; file, and the stylesheets extract the appropriate form. Here is an example entry in the &l10n_xml; file:

[Example entry not recoverable: it gives the English forms Author and Authors and the three Serbian forms Аутор, Аутори, and Аутори.]

Since the Serbian language has three plural forms, three translations have been provided. Each of these is placed in a msgstr element, and the form attribute indicates the plural form to which each one applies. Again, translators will only see the entries in their PO files. The PO file entry for the above translation looks like this:

[Example PO entry not recoverable: the msgid holds the English Author and Authors forms, and the msgstr holds the Serbian forms Аутор, Аутори, and Аутори, one per line.]

Since &intltool; often alters whitespace, the entry in a real PO file may be laid out less neatly. When creating the translated message strings, translators may add or remove whitespace between msgstr elements if they choose. This extra text content is ignored by &l10n_gettext;.

Note that translators may add a msgstr element without a form attribute as a fallback translation. In the example above, the last two msgstr elements could have been replaced by a single msgstr element without a form attribute.
The &l10n_gettext; template would then match the first element whenever the plural form is 0, and the fallback element otherwise.

The plural form is selected using the &l10n_plural_form; template. This template takes the number of items as a parameter and returns the numeric index of the plural form to use. Currently, the rule cannot be extracted automatically from the Plural-Forms entry in the PO file, though this feature is planned for the future. Until this feature is added, plural rules have to be coded manually in the stylesheets. Translators should therefore notify the maintainers when they begin translating the stylesheets, so that the plural rule for their language can be added.
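The following Python sketch illustrates the selection logic described above. The serbian_plural_form function mirrors the Serbian rule as it is commonly written in PO Plural-Forms headers (nplurals=3). It is not the &l10n_plural_form; template itself. The function names and the tuple representation of msgstr elements are hypothetical. The Author data uses the fallback variant mentioned above, in which a single msgstr without a form replaces the last two.

```python
from typing import Optional


def serbian_plural_form(n: int) -> int:
    """Map a count to a Serbian plural index (0, 1, or 2).

    Equivalent to the usual gettext expression:
    n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
    """
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Each msgstr is modeled as (form, text); form None means "no form attribute".
def select_plural(msgstrs: list[tuple[Optional[int], str]], form: int) -> str:
    # A msgstr whose form attribute matches the index wins.
    for f, text in msgstrs:
        if f == form:
            return text
    # Otherwise use the first msgstr without a form attribute, if any.
    for f, text in msgstrs:
        if f is None:
            return text
    raise LookupError(f"no msgstr for plural form {form}")


author_sr = [(0, "Аутор"), (None, "Аутори")]
for count in (1, 3, 5, 21):
    print(count, select_plural(author_sr, serbian_plural_form(count)))
```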

Roles

In many cases, how to render an element depends on various conditions, such as its grammatical role. For these cases, the stylesheets allow translators to provide multiple translations, each marked with a role attribute from a fixed vocabulary. The list of valid roles depends on the template and should be given in the translator comment accompanying each string. However, there are a number of common cases, particularly for labels and cross references.

Translating using roles is similar to translating using plural forms. A translation consists of any number of msgstr elements, each with a role attribute. A msgstr element without a role attribute can be provided as a default if none of the roles match.

For example, the citetitle element in DocBook is used to cite the title of a publication. The type of the publication is specified in the class attribute. In many English publications, article titles are placed in quotes, while book titles are italicized. The following fragment quotes article titles and italicizes all other cited titles.

[Example fragment not recoverable: one msgstr wraps article titles in “…”, and a default msgstr without a role italicizes all other titles.]

The Serbian translation team has chosen to follow the same convention of quoting article titles and italicizing all others. The entry in sr.po follows.

[Example PO entry not recoverable: the msgid citetitle.format precedes the English fragment, and the Serbian msgstr quotes article titles with „…“ and italicizes all others.]

The meaning of the markup inside the msgstr elements is explained in the Format Strings section. For now, simply note that multiple strings have been provided, each in a msgstr element. The only difference between the original string and the Serbian string is that Serbian uses a different opening quote character, aligned at the baseline.

Note also that the original string contains an additional msgid element. This element is redundant in the merged XML. Its only purpose is to distinguish the string from other strings that might have the same formatting in English. Redundant msgid elements are sometimes used even when neither plural forms nor roles are being used. In those cases, the sole translatable string is placed in a msgstr element with no attributes.
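The following sketch implements the role-matching rule with Python's xml.etree.ElementTree. The CITETITLE fragment is a hypothetical stand-in for the stripped example. Its role value article, the i element for italics, and the node marker (covered under format strings) are assumptions about its shape. The select_by_role function is hypothetical. Like the merged XML, it ignores the msgid element.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical citetitle.format fragment: quote articles, italicize everything else.
CITETITLE = (
    "<msgid>citetitle.format</msgid>"
    "<msgstr role='article'>“<node/>”</msgstr>"
    "<msgstr><i><node/></i></msgstr>"
)


def select_by_role(fragment: str, role: Optional[str]) -> ET.Element:
    """Return the msgstr whose role matches, else the default without a role."""
    # Wrap the fragment so that sibling elements parse as a single document.
    root = ET.fromstring(f"<msg>{fragment}</msg>")
    msgstrs = root.findall("msgstr")  # the msgid element is skipped here
    if role is not None:
        for m in msgstrs:
            if m.get("role") == role:
                return m
    for m in msgstrs:
        if m.get("role") is None:
            return m
    raise LookupError(f"no msgstr for role {role!r}")


print(ET.tostring(select_by_role(CITETITLE, "article"), encoding="unicode"))
print(ET.tostring(select_by_role(CITETITLE, "book"), encoding="unicode"))
```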

Format Strings

Using specialized format strings, the Gnome documentation stylesheets can translate more than just simple strings. These format strings are similar in principle to the format strings used in C programs. The difference is that they use XML elements to insert named parameters, rather than special format tokens to insert positional parameters.

For instance, DocBook provides the quote element, used to mark inline quotations. How to render an inline quotation depends on the typographic conventions of the language. In U.S. English, quotations are rendered inside “double inverted-comma” quotation marks. In Serbian, they are typically rendered with the opening quotation mark „aligned at the baseline“. The entry in the Serbian PO file for this format string follows.

[Example PO entry not recoverable: the msgid quote.format is followed by an English msgstr using ‘…’ for inner quotes and one using “…” for outer quotes. The Serbian msgstr elements keep ‘…’ for inner quotes and use „…” for outer quotes.]

Multiple msgstr elements have been provided with roles. These are used to distinguish inner quotes from outer quotes; roles are described in the Roles section. The msgstr elements also contain inline markup, which is used to insert additional content. In this case, only the node element has been used. This element is replaced by the contents of the quote element being processed.

Each format string has a set number of named arguments available. These arguments should be documented in the translator comments that accompany the string. Note that the default translation may not make use of all the available arguments.

In addition to marker elements in format strings, translators may also use simple inline formatting markup. Any content can be wrapped in a span element with certain attributes to control formatting. The attributes are a subset of CSS properties; for HTML output, they are converted directly into the corresponding CSS. The allowed attributes are:

- font-family: Sets the font family. Specifying an exact font is generally not advisable. Instead, use this attribute to provide a generic family: serif, sans-serif, cursive, fantasy, or monospace.
- font-style: Italicizes the text. The allowed values are italic, oblique, and normal.
- font-variant: Sets the text in small caps, which renders lowercase letters as smaller versions of the uppercase glyphs. The allowed values are small-caps and normal.
- font-weight: Marks the text bold. CSS allows numeric weights from 100 to 900, with normal text being 400 and bold being 700. In addition to numeric values, you can use one of the predefined keywords bold, bolder, lighter, or normal. Generally, only bold and normal should be needed.
- font-stretch: Stretches or condenses the font. CSS allows a number of keywords that specify exactly how much to stretch the font. In practice, only wider, narrower, and normal should generally be used.
- font-size: Sets the size of the font. CSS allows absolute font sizes, using keywords or numeric lengths, as well as relative font sizes, using keywords or percentages. Generally, only larger, smaller, and normal should be used for this attribute. Better yet, use the big and small convenience elements described below, which respect the size scale used throughout the stylesheets.
- text-decoration: Sets various effects on the text. The allowed values are none, underline, overline, line-through, and blink. Don't use blink.

Additionally, extra inline elements are provided for convenience. The span element can do everything these elements do; they simply make common formatting tasks easier:

- big: Makes the text larger. This is preferred over the font-size attribute because it follows the size scale used throughout the stylesheets.
- small: Makes the text smaller. This is preferred over the font-size attribute because it follows the size scale used throughout the stylesheets.
- sub: Renders the text as a subscript.
- sup: Renders the text as a superscript.
- b: Makes the text bold. This is equivalent to setting the font-weight attribute to bold.
- i: Makes the text italic. This is equivalent to setting the font-style attribute to italic.
- tt: Makes the text monospace. This is equivalent to setting the font-family attribute to monospace.
- u: Underlines the text. This is equivalent to setting the text-decoration attribute to underline.
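The following Python sketch shows how a parsed format string might be rendered to HTML. It rests on three assumptions: the msgstr content is already parsed as XML, the marker element is spelled node, and the caller passes the already-rendered HTML for the node content. Convenience elements are expanded using the span equivalences listed above. The big and small elements are omitted because their size scale is defined by the stylesheets and not given here. The function names are hypothetical, and this is not how the stylesheets themselves are written.

```python
import html
import xml.etree.ElementTree as ET

# span attributes documented as allowed; each maps directly to a CSS property.
ALLOWED_SPAN_ATTRS = (
    "font-family", "font-style", "font-variant", "font-weight",
    "font-stretch", "font-size", "text-decoration",
)

# Convenience elements and their documented span equivalents.
SHORTHANDS = {
    "b": {"font-weight": "bold"},
    "i": {"font-style": "italic"},
    "tt": {"font-family": "monospace"},
    "u": {"text-decoration": "underline"},
}


def css(attrs: dict[str, str]) -> str:
    """Turn span attributes into a CSS declaration list, rejecting unknown ones."""
    bad = [name for name in attrs if name not in ALLOWED_SPAN_ATTRS]
    if bad:
        raise ValueError(f"unsupported span attributes: {bad}")
    return "; ".join(f"{k}: {v}" for k, v in attrs.items())


def render(elem: ET.Element, node_html: str) -> str:
    """Render the content of elem to HTML, inserting node_html at each node marker."""
    out = [html.escape(elem.text or "")]
    for child in elem:
        if child.tag == "node":
            out.append(node_html)  # assumed to be HTML rendered by the caller
        elif child.tag in ("span", *SHORTHANDS):
            attrs = dict(child.attrib) if child.tag == "span" else SHORTHANDS[child.tag]
            style = html.escape(css(attrs), quote=True)
            out.append(f'<span style="{style}">{render(child, node_html)}</span>')
        elif child.tag in ("sub", "sup"):
            out.append(f"<{child.tag}>{render(child, node_html)}</{child.tag}>")
        else:
            raise ValueError(f"unsupported element: {child.tag}")
        out.append(html.escape(child.tail or ""))
    return "".join(out)


# Serbian outer quotation, as described in the text.
print(render(ET.fromstring("<msgstr>„<node/>”</msgstr>"), "текст"))
```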

Common Formatter Types

There are a number of common types of format strings that are marked for translation in the stylesheets. DocBook contains a lot of structural markup, and many of the same sorts of formatting tasks have to be performed on different elements. For example, chapters, appendixes, and sections all have similar formatting needs, but they usually need to be handled distinctly. The stylesheets do not expose every distinct element of DocBook. Instead, they only make distinctions when they matter from a document presentation viewpoint. This section outlines many of the common types of strings translators will encounter. Strings of the same type will generally have the same format arguments and the same set of roles.

Label Formatters

Labels are used before titles in headers and contents listings. Usually, a label will contain the object's number followed by some punctuation. Formal block objects, such as tables and figures, often have more information in the label. The Serbian label formatters for sections and figures follow.

[Example PO entries not recoverable. For section.label, the English header and li forms are the section number followed by a period, and the fallback is Section followed by the number. Serbian uses the same header and li forms and Одељак for the fallback. For figure.label, all three English forms are Figure followed by the number, and all three Serbian forms use Слика.]

In both cases, translations are provided for the header and li roles. Additionally, a fallback formatting has been provided to format labels when no role is provided. Label formatters will generally use the same two roles, and fallback translations should be provided as well.

Most label formatters provide three format arguments that can be used in the translations:

- title: Inserts the title of the element being labeled. For most types of element, the title is simply provided by the title element in DocBook. A few DocBook elements, notably refentry, have more complicated content models. Translators need only reference the argument, and the stylesheets will determine the title.
- titleabbrev: Inserts the abbreviated title of the element being labeled. Abbreviated titles are provided by the titleabbrev element in DocBook. If the labeled element does not have an abbreviated title, the title is used instead.
- number: Inserts the fully qualified number of the element being labeled. For most label formatters, there is a corresponding number formatter that will be called for this argument.

Since labels are used before titles, most label formatters should only need to use the number of the element.
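The following sketch combines role selection with argument substitution for a label formatter. The SECTION_LABEL_SR fragment is an illustrative guess at the shape of the stripped Serbian section.label entry, not a copy of it. It assumes each format argument is written as an empty element named after the argument, such as number. The format_label function is hypothetical. Its titleabbrev handling follows the rule stated above: when no abbreviated title exists, the title is used.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def format_label(fragment: str, role: Optional[str], args: dict[str, str]) -> str:
    """Pick the msgstr for role (or the fallback without a role) and fill in arguments."""
    args = dict(args)
    # titleabbrev falls back to the full title when none is given.
    args.setdefault("titleabbrev", args.get("title", ""))
    root = ET.fromstring(f"<msg>{fragment}</msg>")
    msgstrs = root.findall("msgstr")
    chosen = next((m for m in msgstrs if role is not None and m.get("role") == role), None)
    if chosen is None:
        chosen = next((m for m in msgstrs if m.get("role") is None), None)
    if chosen is None:
        raise LookupError(f"no msgstr for role {role!r}")
    # Replace each argument marker element with the matching argument value.
    parts = [chosen.text or ""]
    for marker in chosen:
        parts.append(args[marker.tag])
        parts.append(marker.tail or "")
    return "".join(parts)


# Hypothetical reconstruction of the Serbian section label formatter.
SECTION_LABEL_SR = (
    "<msgid>section.label</msgid>"
    "<msgstr role='header'><number/>. </msgstr>"
    "<msgstr role='li'><number/>. </msgstr>"
    "<msgstr>Одељак <number/></msgstr>"
)

print(repr(format_label(SECTION_LABEL_SR, "header", {"number": "2.4"})))
print(repr(format_label(SECTION_LABEL_SR, None, {"number": "2.4"})))
```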

Number Formatters

Numbers are used in labels, cross references, and other identifiers. Numbers identify elements by their position in the document. Numbers can be as simple as single-level positions, or they may indicate a hierarchy. For example, the third subsection of the fourth section in the second chapter would be Section 2.4.3. The job of number formatters is to put together the hierarchical number string; thus, number formatters are not called for single-level numbers.

The single-level number of an element in its parent is referred to as that element's digit. Number formatters work by specifying how to combine the parent element's number with the current element's digit. Two format arguments are allowed:

- parent: Inserts the fully qualified number of the element's parent. In many cases, this number is constructed by calling the number formatter for the parent element.
- digit: Inserts the single-level position of the element in its parent element. How the digit is displayed is determined by the corresponding digit format.

The Serbian number formatters for sections and figures follow.

[Example PO entries not recoverable: section.number joins the parent number and the digit with a period, and figure.number joins them with a hyphen. The Serbian msgstr elements are identical to the English ones.]

Note that msgstr elements are used to contain the strings, even though neither plural forms nor roles are being used. This is because a msgid element has been inserted into the translatable string so that number formatters for different elements are distinct messages in PO files.
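The following Python sketch shows how repeatedly applying a number formatter builds a hierarchical number. It assumes the parent and digit arguments are written as empty parent and digit elements. For simplicity, it uses the same section formatter at every level, whereas the stylesheets may call a different formatter for each element type. The fill and section_number functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical formatter fragments matching the separators described above.
SECTION_NUMBER = "<msgstr><parent/>.<digit/></msgstr>"
FIGURE_NUMBER = "<msgstr><parent/>-<digit/></msgstr>"


def fill(msgstr_xml: str, args: dict[str, str]) -> str:
    """Replace each argument marker element with its value."""
    elem = ET.fromstring(msgstr_xml)
    parts = [elem.text or ""]
    for marker in elem:
        parts.append(args[marker.tag])
        parts.append(marker.tail or "")
    return "".join(parts)


def section_number(digits: list[str]) -> str:
    """Build a hierarchical number from per-level digits, outermost first.

    A single-level number is just its digit: the number formatter is only
    applied when there is a parent number to combine with.
    """
    number = digits[0]
    for digit in digits[1:]:
        number = fill(SECTION_NUMBER, {"parent": number, "digit": digit})
    return number


print(section_number(["2", "4", "3"]))
print(fill(FIGURE_NUMBER, {"parent": "3", "digit": "2"}))
```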

Digit Formats

Digits are the part of an element's number that specifies its position in its parent element. An element's number consists of its parent number and its digit. Digits can be formatted using a number of numbering systems. Digit formats aren't actually format strings, nor are they user-visible strings; they're simply tokens that specify how to format a number. Currently, only the following five digit formats are supported:

- 1: Format the number using Arabic numerals: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
- A: Format the number using uppercase Latin letters: A, B, C, D, E, F, G, H, I, J, K, L.
- a: Format the number using lowercase Latin letters: a, b, c, d, e, f, g, h, i, j, k, l.
- I: Format the number using uppercase Roman numerals: I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII.
- i: Format the number using lowercase Roman numerals: i, ii, iii, iv, v, vi, vii, viii, ix, x, xi, xii.

These five numbering systems are unlikely to be sufficient, particularly for non-Western languages. Translators who would like to format numbers differently should contact the maintainers, and we can try to add additional digit formats.
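The five digit formats can be sketched in Python as follows. The function names are hypothetical. Two behaviors are assumptions of this sketch and not taken from the stylesheets: continuing letters past Z as AA, AB, and so on, and the subtractive Roman numeral algorithm used here.

```python
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
    (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Uppercase Roman numeral using subtractive notation (IV, IX, ...)."""
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def to_letters(n: int) -> str:
    """Uppercase letters: 1 -> A, 26 -> Z, 27 -> AA (assumed beyond Z)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def format_digit(n: int, fmt: str) -> str:
    """Format a positive digit using one of the tokens 1, A, a, I, or i."""
    if n < 1:
        raise ValueError("digits start at 1")
    if fmt == "1":
        return str(n)
    if fmt == "A":
        return to_letters(n)
    if fmt == "a":
        return to_letters(n).lower()
    if fmt == "I":
        return to_roman(n)
    if fmt == "i":
        return to_roman(n).lower()
    raise ValueError(f"unsupported digit format: {fmt!r}")


print(", ".join(format_digit(n, "I") for n in range(1, 13)))
```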

Cross Reference Formatters

Cross reference formatters are used to format the text of a link to another element in the document. In many languages, how best to format an individual cross reference will depend on its usage. Often, cross references will need to be formatted differently based on their grammatical role in a sentence. The cross reference formatters allow translators to provide multiple formattings using roles. Documentation authors and translators can then select the format for a cross reference using the &xrefstyle; attribute on the xref element. The Gnome documentation stylesheets allow &xrefstyle; attributes of the form role:somerole, where somerole is the role to be passed to the cross reference formatter.

Cross reference formatters generally provide the following three format arguments:

- title: Inserts the title of the element being referenced. For most types of element, the title is simply provided by the title element in DocBook. A few DocBook elements, notably refentry, have more complicated content models. Translators need only reference the argument, and the stylesheets will determine the title.
- titleabbrev: Inserts the abbreviated title of the element being referenced. Abbreviated titles are provided by the titleabbrev element in DocBook. If the referenced element does not have an abbreviated title, the title is used instead.
- number: Inserts the fully qualified number of the element being referenced. For most elements, there is a corresponding number formatter that will be called for this argument.
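The following Python sketch shows how an &xrefstyle; value of the form role:somerole could select a cross reference format. Both functions and the SECTION_XREF table are hypothetical. The sketch uses Python's {number} and {title} placeholders in place of the XML argument markers. Treating every other &xrefstyle; value as "no role" is an assumption.

```python
from typing import Optional


def xref_role(xrefstyle: Optional[str]) -> Optional[str]:
    """Extract somerole from an xrefstyle value of the form role:somerole."""
    prefix = "role:"
    if xrefstyle and xrefstyle.startswith(prefix):
        return xrefstyle[len(prefix):] or None
    # Assumption: any other xrefstyle value selects no role.
    return None


def format_xref(formats: dict[Optional[str], str], xrefstyle: Optional[str], **args: str) -> str:
    """Pick the template for the requested role (None is the fallback) and fill it."""
    role = xref_role(xrefstyle)
    template = formats[role] if role in formats else formats[None]
    return template.format(**args)


# Hypothetical section cross reference formats.
SECTION_XREF = {None: "Section {number}", "title": "{title}"}

# As if processing <xref linkend="plurals" xrefstyle="role:title"/>
print(format_xref(SECTION_XREF, "role:title", number="2.4", title="Plurals"))
```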

Tooltip Formatters

Each hyperlink in the HTML output is given a tooltip by the stylesheets. Since hyperlinks can be created using a number of different semantic linking mechanisms in DocBook, the stylesheets are able to provide rich information in the hyperlink tooltips. The stylesheets provide tooltip formatters for various linking mechanisms. These can then be translated to provide rich information about hyperlinks in any language.

For example, the email element in DocBook is converted into a hyperlink allowing users to send email to the given address. The Serbian translation for this formatter follows.

[Example PO entry not recoverable: the msgid email.tooltip has the English text Send email to ‘…’., and the Serbian msgstr is Пошаљите е-писмо на „…“., with the address inserted in place of the ellipsis.]

Each tooltip formatter will have its own format arguments. Generally, only a single format argument will be needed, and the translator comments for the string should clearly specify the valid arguments.
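The following sketch shows how a translated tooltip might end up in HTML output. It fills the formatter with the email address and escapes the result for use in a title attribute. The {address} placeholder stands in for the stripped XML argument marker, whose real name is not given here. The email_tooltip function is hypothetical.

```python
import html

# Serbian tooltip text from the example, with a hypothetical {address} placeholder.
SR_EMAIL_TOOLTIP = "Пошаљите е-писмо на „{address}“."


def email_tooltip(template: str, address: str) -> str:
    """Fill the tooltip formatter and escape it for an HTML attribute value."""
    return html.escape(template.format(address=address), quote=True)


address = "someone@example.com"
title = email_tooltip(SR_EMAIL_TOOLTIP, address)
print(f'<a href="mailto:{address}" title="{title}">{address}</a>')
```

Escaping matters here because the translated text and the address both end up inside a quoted attribute value.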