<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52b
     from gettext.texi on 28 December 2015 -->

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
<TITLE>GNU gettext utilities - 3  The Format of PO Files</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC15" HREF="gettext_toc.html#TOC15">3  The Format of PO Files</A></H1>
<P>
<A NAME="IDX55"></A>
<A NAME="IDX56"></A>

</P>
<P>
The GNU <CODE>gettext</CODE> toolset helps programmers and translators
produce, update and use translation files, mainly PO files,
which are textual, editable files.  This chapter explains
the format of PO files.

</P>
<P>
A PO file is made up of many entries, each entry holding the relation
between an original untranslated string and its corresponding
translation.  All entries in a given PO file usually pertain
to a single project, and all translations are expressed in a single
target language.  One PO file <EM>entry</EM> has the following schematic
structure:

</P>

<PRE>
<VAR>white-space</VAR>
#  <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgid <VAR>previous-untranslated-string</VAR>
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
</PRE>

<P>
The general structure of a PO file should be well understood by
the translator.  When using PO mode, very little has to be known
about the format details, as PO mode takes care of them for her.

</P>
<P>
A simple entry can look like this:

</P>

<PRE>
#: lib/error.c:116
msgid "Unknown system error"
msgstr "Error desconegut del sistema"
</PRE>

<P>
<A NAME="IDX57"></A>
<A NAME="IDX58"></A>
<A NAME="IDX59"></A>
Entries begin with some optional white space.  Usually, when generated
through GNU <CODE>gettext</CODE> tools, there is exactly one blank line
between entries.  Then comments follow, on lines all starting with the
character <CODE>#</CODE>.  There are two kinds of comments.  The <VAR>translator
comments</VAR> have some white space immediately following the <CODE>#</CODE>, and
are created and maintained exclusively by the translator.  The <VAR>automatic
comments</VAR> have some non-white character just after the <CODE>#</CODE>, and are
created and maintained automatically by GNU <CODE>gettext</CODE> tools.  Comment lines
starting with <CODE>#.</CODE> contain comments given by the programmer, directed
at the translator; these comments are called <VAR>extracted comments</VAR>
because the <CODE>xgettext</CODE> program extracts them from the program's
source code.  Comment lines starting with <CODE>#:</CODE> contain references to
the program's source code.  Comment lines starting with <CODE>#,</CODE> contain
flags; more about these below.  Comment lines starting with <CODE>#|</CODE>
contain the previous untranslated string for which the translator gave
a translation.

</P>
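<P>
The prefix rules above can be sketched in a few lines of Python.  This is
only an illustration, not part of GNU <CODE>gettext</CODE>: the function name
<CODE>classify_comment</CODE> and the returned labels are hypothetical, and
treating a bare <CODE>#</CODE> line as an empty translator comment is an
assumption of this sketch.

</P>

```python
def classify_comment(line):
    """Return the kind of a PO comment line, based on the character after '#'."""
    if not line.startswith("#"):
        raise ValueError("not a comment line: %r" % line)
    marker = line[1:2]  # empty string if the line is just "#"
    # Assumption of this sketch: a bare "#" counts as an empty translator comment.
    if marker == "" or marker.isspace():
        return "translator"
    kinds = {
        ".": "extracted",   # "#." comments given by the programmer
        ":": "reference",   # "#:" references to the program's source code
        ",": "flag",        # "#," flags
        "|": "previous",    # "#|" previous untranslated string
    }
    # Any other non-white character is some other automatic comment.
    return kinds.get(marker, "automatic-other")


for sample in ["# checked by hand", "#: lib/error.c:116", "#, fuzzy"]:
    print(sample, "->", classify_comment(sample))
```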
<P>
All comments, of either kind, are optional.

</P>
<P>
<A NAME="IDX60"></A>
<A NAME="IDX61"></A>
After white space and comments, entries show two strings, namely
first the untranslated string as it appears in the original program
sources, and then, the translation of this string.  The original
string is introduced by the keyword <CODE>msgid</CODE>, and the translation,
by <CODE>msgstr</CODE>.  The two strings, untranslated and translated,
are quoted in various ways in the PO file, using <CODE>"</CODE>
delimiters and <CODE>\</CODE> escapes, but the translator does not really
have to pay attention to the precise quoting format, as PO mode fully
takes care of quoting for her.

</P>
<P>
The <CODE>msgid</CODE> strings, as well as automatic comments, are produced
and managed by other GNU <CODE>gettext</CODE> tools, and PO mode does not
provide means for the translator to alter these.  The most she can
do is delete them, and only by deleting the whole entry.
On the other hand, the <CODE>msgstr</CODE> string, as well as translator
comments, are really meant for the translator, and PO mode gives her
the full control she needs.

</P>
<P>
The comment lines beginning with <CODE>#,</CODE> are special because they are
not completely ignored by the programs as comments generally are.  The
comma-separated list of <VAR>flag</VAR>s is used by the <CODE>msgfmt</CODE>
program to give the user some better diagnostic messages.  Currently
there are two forms of flags defined:

</P>
<DL COMPACT>

<DT><CODE>fuzzy</CODE>
<DD>
<A NAME="IDX62"></A>
This flag can be generated by the <CODE>msgmerge</CODE> program or it can be
inserted by the translator herself.  It shows that the <CODE>msgstr</CODE>
string might not be a correct translation (anymore).  Only the translator
can judge if the translation requires further modification, or is
acceptable as is.  Once satisfied with the translation, she then removes
this <CODE>fuzzy</CODE> attribute.  The <CODE>msgmerge</CODE> program inserts this
flag when it has paired a <CODE>msgid</CODE> with a <CODE>msgstr</CODE> by fuzzy
search only.  See section <A HREF="gettext_8.html#SEC64">8.3.6  Fuzzy Entries</A>.

<DT><CODE>c-format</CODE>
<DD>
<A NAME="IDX63"></A>
<DT><CODE>no-c-format</CODE>
<DD>
<A NAME="IDX64"></A>
These flags should not be added by a human.  Instead only the
<CODE>xgettext</CODE> program adds them.  In an automated PO file processing
system as proposed here, the user's changes would be thrown away again as
soon as the <CODE>xgettext</CODE> program generates a new template file.

The <CODE>c-format</CODE> flag indicates that the untranslated string and the
translation are supposed to be C format strings.  The <CODE>no-c-format</CODE>
flag indicates that they are not C format strings, even though the untranslated
string happens to look like a C format string (with <SAMP>&lsquo;%&rsquo;</SAMP> directives).

When the <CODE>c-format</CODE> flag is given for a string the <CODE>msgfmt</CODE>
program does some more tests to check the validity of the translation.
See section <A HREF="gettext_10.html#SEC157">10.1  Invoking the <CODE>msgfmt</CODE> Program</A>, section <A HREF="gettext_4.html#SEC22">4.6  Special Comments preceding Keywords</A> and section <A HREF="gettext_15.html#SEC252">15.3.1  C Format Strings</A>.

<DT><CODE>objc-format</CODE>
<DD>
<A NAME="IDX65"></A>
<DT><CODE>no-objc-format</CODE>
<DD>
<A NAME="IDX66"></A>
Likewise for Objective C, see section <A HREF="gettext_15.html#SEC253">15.3.2  Objective C Format Strings</A>.

<DT><CODE>sh-format</CODE>
<DD>
<A NAME="IDX67"></A>
<DT><CODE>no-sh-format</CODE>
<DD>
<A NAME="IDX68"></A>
Likewise for Shell, see section <A HREF="gettext_15.html#SEC254">15.3.3  Shell Format Strings</A>.

<DT><CODE>python-format</CODE>
<DD>
<A NAME="IDX69"></A>
<DT><CODE>no-python-format</CODE>
<DD>
<A NAME="IDX70"></A>
Likewise for Python, see section <A HREF="gettext_15.html#SEC255">15.3.4  Python Format Strings</A>.

<DT><CODE>python-brace-format</CODE>
<DD>
<A NAME="IDX71"></A>
<DT><CODE>no-python-brace-format</CODE>
<DD>
<A NAME="IDX72"></A>
Likewise for Python brace, see section <A HREF="gettext_15.html#SEC255">15.3.4  Python Format Strings</A>.

<DT><CODE>lisp-format</CODE>
<DD>
<A NAME="IDX73"></A>
<DT><CODE>no-lisp-format</CODE>
<DD>
<A NAME="IDX74"></A>
Likewise for Lisp, see section <A HREF="gettext_15.html#SEC256">15.3.5  Lisp Format Strings</A>.

<DT><CODE>elisp-format</CODE>
<DD>
<A NAME="IDX75"></A>
<DT><CODE>no-elisp-format</CODE>
<DD>
<A NAME="IDX76"></A>
Likewise for Emacs Lisp, see section <A HREF="gettext_15.html#SEC257">15.3.6  Emacs Lisp Format Strings</A>.

<DT><CODE>librep-format</CODE>
<DD>
<A NAME="IDX77"></A>
<DT><CODE>no-librep-format</CODE>
<DD>
<A NAME="IDX78"></A>
Likewise for librep, see section <A HREF="gettext_15.html#SEC258">15.3.7  librep Format Strings</A>.

<DT><CODE>scheme-format</CODE>
<DD>
<A NAME="IDX79"></A>
<DT><CODE>no-scheme-format</CODE>
<DD>
<A NAME="IDX80"></A>
Likewise for Scheme, see section <A HREF="gettext_15.html#SEC259">15.3.8  Scheme Format Strings</A>.

<DT><CODE>smalltalk-format</CODE>
<DD>
<A NAME="IDX81"></A>
<DT><CODE>no-smalltalk-format</CODE>
<DD>
<A NAME="IDX82"></A>
Likewise for Smalltalk, see section <A HREF="gettext_15.html#SEC260">15.3.9  Smalltalk Format Strings</A>.

<DT><CODE>java-format</CODE>
<DD>
<A NAME="IDX83"></A>
<DT><CODE>no-java-format</CODE>
<DD>
<A NAME="IDX84"></A>
Likewise for Java, see section <A HREF="gettext_15.html#SEC261">15.3.10  Java Format Strings</A>.

<DT><CODE>csharp-format</CODE>
<DD>
<A NAME="IDX85"></A>
<DT><CODE>no-csharp-format</CODE>
<DD>
<A NAME="IDX86"></A>
Likewise for C#, see section <A HREF="gettext_15.html#SEC262">15.3.11  C# Format Strings</A>.

<DT><CODE>awk-format</CODE>
<DD>
<A NAME="IDX87"></A>
<DT><CODE>no-awk-format</CODE>
<DD>
<A NAME="IDX88"></A>
Likewise for awk, see section <A HREF="gettext_15.html#SEC263">15.3.12  awk Format Strings</A>.

<DT><CODE>object-pascal-format</CODE>
<DD>
<A NAME="IDX89"></A>
<DT><CODE>no-object-pascal-format</CODE>
<DD>
<A NAME="IDX90"></A>
Likewise for Object Pascal, see section <A HREF="gettext_15.html#SEC264">15.3.13  Object Pascal Format Strings</A>.

<DT><CODE>ycp-format</CODE>
<DD>
<A NAME="IDX91"></A>
<DT><CODE>no-ycp-format</CODE>
<DD>
<A NAME="IDX92"></A>
Likewise for YCP, see section <A HREF="gettext_15.html#SEC265">15.3.14  YCP Format Strings</A>.

<DT><CODE>tcl-format</CODE>
<DD>
<A NAME="IDX93"></A>
<DT><CODE>no-tcl-format</CODE>
<DD>
<A NAME="IDX94"></A>
Likewise for Tcl, see section <A HREF="gettext_15.html#SEC266">15.3.15  Tcl Format Strings</A>.

<DT><CODE>perl-format</CODE>
<DD>
<A NAME="IDX95"></A>
<DT><CODE>no-perl-format</CODE>
<DD>
<A NAME="IDX96"></A>
Likewise for Perl, see section <A HREF="gettext_15.html#SEC267">15.3.16  Perl Format Strings</A>.

<DT><CODE>perl-brace-format</CODE>
<DD>
<A NAME="IDX97"></A>
<DT><CODE>no-perl-brace-format</CODE>
<DD>
<A NAME="IDX98"></A>
Likewise for Perl brace, see section <A HREF="gettext_15.html#SEC267">15.3.16  Perl Format Strings</A>.

<DT><CODE>php-format</CODE>
<DD>
<A NAME="IDX99"></A>
<DT><CODE>no-php-format</CODE>
<DD>
<A NAME="IDX100"></A>
Likewise for PHP, see section <A HREF="gettext_15.html#SEC268">15.3.17  PHP Format Strings</A>.

<DT><CODE>gcc-internal-format</CODE>
<DD>
<A NAME="IDX101"></A>
<DT><CODE>no-gcc-internal-format</CODE>
<DD>
<A NAME="IDX102"></A>
Likewise for the GCC sources, see section <A HREF="gettext_15.html#SEC269">15.3.18  GCC internal Format Strings</A>.

<DT><CODE>gfc-internal-format</CODE>
<DD>
<A NAME="IDX103"></A>
<DT><CODE>no-gfc-internal-format</CODE>
<DD>
<A NAME="IDX104"></A>
Likewise for the GNU Fortran Compiler sources, see section <A HREF="gettext_15.html#SEC270">15.3.19  GFC internal Format Strings</A>.

<DT><CODE>qt-format</CODE>
<DD>
<A NAME="IDX105"></A>
<DT><CODE>no-qt-format</CODE>
<DD>
<A NAME="IDX106"></A>
Likewise for Qt, see section <A HREF="gettext_15.html#SEC271">15.3.20  Qt Format Strings</A>.

<DT><CODE>qt-plural-format</CODE>
<DD>
<A NAME="IDX107"></A>
<DT><CODE>no-qt-plural-format</CODE>
<DD>
<A NAME="IDX108"></A>
Likewise for Qt plural forms, see section <A HREF="gettext_15.html#SEC272">15.3.21  Qt Format Strings</A>.

<DT><CODE>kde-format</CODE>
<DD>
<A NAME="IDX109"></A>
<DT><CODE>no-kde-format</CODE>
<DD>
<A NAME="IDX110"></A>
Likewise for KDE, see section <A HREF="gettext_15.html#SEC273">15.3.22  KDE Format Strings</A>.

<DT><CODE>boost-format</CODE>
<DD>
<A NAME="IDX111"></A>
<DT><CODE>no-boost-format</CODE>
<DD>
<A NAME="IDX112"></A>
Likewise for Boost, see section <A HREF="gettext_15.html#SEC275">15.3.24  Boost Format Strings</A>.

<DT><CODE>lua-format</CODE>
<DD>
<A NAME="IDX113"></A>
<DT><CODE>no-lua-format</CODE>
<DD>
<A NAME="IDX114"></A>
Likewise for Lua, see section <A HREF="gettext_15.html#SEC276">15.3.25  Lua Format Strings</A>.

<DT><CODE>javascript-format</CODE>
<DD>
<A NAME="IDX115"></A>
<DT><CODE>no-javascript-format</CODE>
<DD>
<A NAME="IDX116"></A>
Likewise for JavaScript, see section <A HREF="gettext_15.html#SEC277">15.3.26  JavaScript Format Strings</A>.

</DL>
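<P>
As a rough illustration of how a <CODE>#,</CODE> line might be read, the
following Python sketch splits the comma-separated list into individual
flags.  The helper names <CODE>parse_flags</CODE> and
<CODE>needs_format_check</CODE> are hypothetical and do not correspond to any
function in the GNU <CODE>gettext</CODE> tools.

</P>

```python
def parse_flags(line):
    """Split a '#,' comment line into a list of flag strings."""
    if not line.startswith("#,"):
        raise ValueError("not a flag line: %r" % line)
    # Flags are separated by commas; surrounding white space is not significant.
    return [flag.strip() for flag in line[2:].split(",") if flag.strip()]


def needs_format_check(flags, language="c"):
    """True if the entry is marked as a format string for the given language."""
    # Exact match only, so "no-c-format" does not count as "c-format".
    return (language + "-format") in flags


flags = parse_flags("#, fuzzy, c-format")
print(flags)
print("fuzzy" in flags, needs_format_check(flags))
```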

<P>
<A NAME="IDX117"></A>
<A NAME="IDX118"></A>
It is also possible to have entries with a context specifier. They look like
this:

</P>

<PRE>
<VAR>white-space</VAR>
#  <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgctxt <VAR>previous-context</VAR>
#| msgid <VAR>previous-untranslated-string</VAR>
msgctxt <VAR>context</VAR>
msgid <VAR>untranslated-string</VAR>
msgstr <VAR>translated-string</VAR>
</PRE>

<P>
The context serves to disambiguate messages with the same
<VAR>untranslated-string</VAR>.  It is possible to have several entries with
the same <VAR>untranslated-string</VAR> in a PO file, provided that they each
have a different <VAR>context</VAR>.  Note that an empty <VAR>context</VAR> string
and an absent <CODE>msgctxt</CODE> line do not mean the same thing.

</P>
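<P>
One way to picture this rule is a catalog keyed by the pair of context and
untranslated string, with an absent <CODE>msgctxt</CODE> represented
differently from an empty one.  The Python sketch below is an assumption-laden
illustration: the function <CODE>add_entry</CODE>, the use of <CODE>None</CODE>
for an absent context, and the placeholder strings are all hypothetical.

</P>

```python
def add_entry(catalog, msgid, msgstr, msgctxt=None):
    """Store a translation under the key (context, untranslated string)."""
    # msgctxt=None stands for "no msgctxt line"; msgctxt="" is an empty context.
    key = (msgctxt, msgid)
    if key in catalog:
        raise ValueError("duplicate entry for %r" % (key,))
    catalog[key] = msgstr


catalog = {}
# Same untranslated string, four distinct entries (placeholder translations).
add_entry(catalog, "Open", "[translation as a command]", msgctxt="menu command")
add_entry(catalog, "Open", "[translation as a state]", msgctxt="door state")
add_entry(catalog, "Open", "[translation without context]")
add_entry(catalog, "Open", "[translation with empty context]", msgctxt="")
print(len(catalog))
```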
<P>
<A NAME="IDX119"></A>
<A NAME="IDX120"></A>
A different kind of entry is used for translations which involve
plural forms.

</P>

<PRE>
<VAR>white-space</VAR>
#  <VAR>translator-comments</VAR>
#. <VAR>extracted-comments</VAR>
#: <VAR>reference</VAR>...
#, <VAR>flag</VAR>...
#| msgid <VAR>previous-untranslated-string-singular</VAR>
#| msgid_plural <VAR>previous-untranslated-string-plural</VAR>
msgid <VAR>untranslated-string-singular</VAR>
msgid_plural <VAR>untranslated-string-plural</VAR>
msgstr[0] <VAR>translated-string-case-0</VAR>
...
msgstr[N] <VAR>translated-string-case-n</VAR>
</PRE>

<P>
Such an entry can look like this:

</P>

<PRE>
#: src/msgcmp.c:338 src/po-lex.c:699
#, c-format
msgid "found %d fatal error"
msgid_plural "found %d fatal errors"
msgstr[0] "s'ha trobat %d error fatal"
msgstr[1] "s'han trobat %d errors fatals"
</PRE>
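<P>
To make the role of the indexed <CODE>msgstr[N]</CODE> lines concrete, here
is a small Python sketch that reads the entry above and picks a form for a
given number.  It is not how the GNU <CODE>gettext</CODE> runtime is
implemented.  The names <CODE>plural_forms</CODE> and <CODE>choose</CODE> are
hypothetical, the selection rule <CODE>n != 1</CODE> is assumed here for this
two-form example, and escape sequences inside the quotes are not handled.

</P>

```python
import re

ENTRY = '''\
msgid "found %d fatal error"
msgid_plural "found %d fatal errors"
msgstr[0] "s'ha trobat %d error fatal"
msgstr[1] "s'han trobat %d errors fatals"
'''

# Matches lines such as: msgstr[1] "text"
PLURAL_LINE = re.compile(r'^msgstr\[(\d+)\]\s+"(.*)"$')


def plural_forms(entry_text):
    """Collect the msgstr[N] strings of one entry, ordered by index N."""
    forms = {}
    for line in entry_text.splitlines():
        match = PLURAL_LINE.match(line)
        if match:
            forms[int(match.group(1))] = match.group(2)
    return [forms[i] for i in range(len(forms))]


def choose(forms, n):
    """Pick a form with the assumed rule: index 0 for n == 1, else index 1."""
    index = int(n != 1)
    return forms[index] % n


forms = plural_forms(ENTRY)
print(choose(forms, 1))
print(choose(forms, 3))
```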

<P>
Here also, a <CODE>msgctxt</CODE> context can be specified before <CODE>msgid</CODE>,
like above.

</P>
<P>
Here, additional kinds of flags can be used:

</P>
<DL COMPACT>

<DT><CODE>range:</CODE>
<DD>
<A NAME="IDX121"></A>
This flag is followed by a range of non-negative numbers, using the syntax
<CODE>range: <VAR>minimum-value</VAR>..<VAR>maximum-value</VAR></CODE>.  It designates the
possible values that the numeric parameter of the message can take.  In some
languages, translators may produce slightly better translations if they know
that the value can only take on values between 0 and 10, for example.
</DL>
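<P>
The <CODE>range:</CODE> syntax can be checked with a short regular
expression.  The following Python sketch is illustrative only: the name
<CODE>parse_range</CODE> is hypothetical, and rejecting a minimum larger than
the maximum is a choice made by this sketch rather than documented behavior.

</P>

```python
import re

# Matches a flag such as "range: 0..10".
RANGE_FLAG = re.compile(r'^range:\s*(\d+)\.\.(\d+)$')


def parse_range(flag):
    """Return (minimum, maximum) for a range flag, or None for other flags."""
    match = RANGE_FLAG.match(flag.strip())
    if match is None:
        return None
    minimum, maximum = int(match.group(1)), int(match.group(2))
    # Choice of this sketch: an inverted range is treated as an error.
    if minimum > maximum:
        raise ValueError("minimum exceeds maximum in %r" % flag)
    return minimum, maximum


print(parse_range("range: 0..10"))
print(parse_range("c-format"))
```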

<P>
The <VAR>previous-untranslated-string</VAR> is optionally inserted by the
<CODE>msgmerge</CODE> program, at the same time as it marks a message fuzzy.
It helps the translator see which changes the developers made
to the <VAR>untranslated-string</VAR>.

</P>
<P>
Sometimes lines, usually white space or comments, follow the
very last entry of a PO file.  Such lines are not part of any entry;
the tools drop them when they process the PO file, and they may
disturb some PO file editors.

</P>
<P>
The remainder of this section may be safely skipped by those using
a PO file editor, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file.  On the other hand, those
wishing to modify PO files by hand should continue reading carefully.

</P>
<P>
An empty <VAR>untranslated-string</VAR> is reserved to contain the header
entry with the meta information (see section <A HREF="gettext_6.html#SEC44">6.2  Filling in the Header Entry</A>).  This header
entry should be the first entry of the file.  The empty
<VAR>untranslated-string</VAR> is reserved for this purpose and must
not be used anywhere else.

</P>
<P>
Each of <VAR>untranslated-string</VAR> and <VAR>translated-string</VAR> respects
the C syntax for a character string, including the surrounding quotes
and embedded backslashed escape sequences.  When the time comes
to write multi-line strings, one should not use escaped newlines.
Instead, a closing quote should follow the last character on the
line to be continued, and an opening quote should resume the string
at the beginning of the following PO file line.  For example:

</P>

<PRE>
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
</PRE>

<P>
In this example, the empty string is used on the first line, to
allow better alignment of the <CODE>H</CODE> from the word <SAMP>&lsquo;Here&rsquo;</SAMP>
over the <CODE>f</CODE> from the word <SAMP>&lsquo;for&rsquo;</SAMP>.  In this example, the
<CODE>msgid</CODE> keyword is followed by three strings, which are meant
to be concatenated.  Concatenating the empty string does not change
the resulting overall string, but it is a way for us to comply with
the necessity of <CODE>msgid</CODE> to be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner layout.  The empty string could have
been omitted, but only if the string starting with <SAMP>&lsquo;Here&rsquo;</SAMP> was
moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> Nor was it really
necessary to switch between the last two quoted strings immediately after
the newline <SAMP>&lsquo;\n&rsquo;</SAMP>; the switch could have occurred after <EM>any</EM>
other character.  We just did it this way because it is neater.

</P>
<P>
<A NAME="IDX122"></A>
One should carefully distinguish between line ends marked as
<SAMP>&lsquo;\n&rsquo;</SAMP> <EM>inside</EM> quotes, which are part of the represented
string, and line ends in the PO file itself, outside string quotes,
which have no effect on the represented string.

</P>
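<P>
The concatenation rule and the difference between the two kinds of line ends
can be demonstrated with a short Python sketch.  It is a simplified
illustration, not the parser used by the GNU <CODE>gettext</CODE> tools: the
names <CODE>unquote</CODE> and <CODE>join_segments</CODE> are hypothetical,
and only a handful of C escape sequences are decoded.

</P>

```python
import re

# Only a few C escapes are handled in this sketch; others are left untouched.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote(segment):
    """Decode one double-quoted PO string segment."""
    segment = segment.strip()
    if len(segment) < 2 or segment[0] != '"' or segment[-1] != '"':
        raise ValueError("not a quoted string: %r" % segment)
    body = segment[1:-1]
    return re.sub(
        r"\\(.)",
        lambda m: SIMPLE_ESCAPES.get(m.group(1), m.group(0)),
        body,
    )


def join_segments(segments):
    """Concatenate the quoted segments that follow a keyword such as msgid."""
    return "".join(unquote(segment) for segment in segments)


lines = [
    '""',
    '"Here is an example of how one might continue a very long string\\n"',
    '"for the common case the string represents multi-line output.\\n"',
]
text = join_segments(lines)
# Three PO file lines, but only the two "\n" escapes produce newlines.
print(text.count("\n"))
```

<P>
Splitting the quoted segments at a different character gives the same result,
since only the concatenated content matters:
<CODE>join_segments(['"ab"', '"c\\n"'])</CODE> and
<CODE>join_segments(['"a"', '"bc\\n"'])</CODE> both yield the same string.

</P>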
<P>
<A NAME="IDX123"></A>
Outside strings, white lines and comments may be used freely.
Comments start at the beginning of a line with <SAMP>&lsquo;#&rsquo;</SAMP> and extend
until the end of the PO file line.  Comments written by translators
should have the initial <SAMP>&lsquo;#&rsquo;</SAMP> immediately followed by some white
space.  If the <SAMP>&lsquo;#&rsquo;</SAMP> is not immediately followed by white space,
this comment is most likely generated and managed by specialized GNU
tools, and might disappear or be replaced unexpectedly when the PO
file is given to <CODE>msgmerge</CODE>.

</P>
<P><HR><P>
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
</BODY>
</HTML>