author | Simon Cozens <simon@simon-cozens.org> | 2015-08-25 19:57:15 +0100 |
---|---|---|
committer | Simon Cozens <simon@simon-cozens.org> | 2015-08-25 19:57:15 +0100 |
commit | f0807654da160bd7ceb9aff5b8338ec0b643171c (patch) | |
tree | 6104b66230eb7f679369763e0102c3d21c871f7f | |
parent | fdd1770e006ca2d2973c049177ceda87a575e07f (diff) | |
First two chapters. More to follow.
-rw-r--r-- | docs/usermanual-ch01.xml | 115 |
-rw-r--r-- | docs/usermanual-ch02.xml | 182 |
2 files changed, 297 insertions, 0 deletions
diff --git a/docs/usermanual-ch01.xml b/docs/usermanual-ch01.xml new file mode 100644 index 00000000..1ee0cbee --- /dev/null +++ b/docs/usermanual-ch01.xml @@ -0,0 +1,115 @@ +<sect1 id="what-is-harfbuzz"> + <title>What is Harfbuzz?</title> + <para> + Harfbuzz is a <emphasis>text shaping engine</emphasis>. It solves + the problem of selecting and positioning glyphs from a font given a + Unicode string. + </para> + <sect2 id="why-do-i-need-it"> + <title>Why do I need it?</title> + <para> + Text shaping is an integral part of preparing text for display. It + is a fairly low level operation; Harfbuzz is used directly by + graphic rendering libraries such as Pango, and the layout engines + in Firefox, LibreOffice and Chromium. Unless you are + <emphasis>writing</emphasis> one of these layout engines yourself, + you will probably not need to use Harfbuzz - normally higher level + libraries will turn text into glyphs for you. + </para> + <para> + However, if you <emphasis>are</emphasis> writing a layout engine + or graphics library yourself, you will need to perform text + shaping, and this is where Harfbuzz can help you. Here are some + reasons why you need it: + </para> + <itemizedlist> + <listitem> + <para> + OpenType fonts contain a set of glyphs, indexed by glyph ID. + The glyph ID within the font does not necessarily relate to a + Unicode codepoint. For instance, some fonts have the letter + "a" as glyph ID 1. To pull the right glyph out of + the font in order to display it, you need to consult a table + within the font (the "cmap" table) which maps + Unicode codepoints to glyph IDs. Text shaping turns codepoints + into glyph IDs. + </para> + </listitem> + <listitem> + <para> + Many OpenType fonts contain ligatures: combinations of + characters which are rendered together. For instance, it's + common for the <literal>fi</literal> combination to appear in + print as the single ligature "ﬁ". 
Whether you should + render text as <literal>fi</literal> or "ﬁ" does not + depend on the input text, but on the capabilities of the font + and the level of ligature application you wish to perform. + Text shaping involves querying the font's ligature tables and + determining what substitutions should be made. + </para> + </listitem> + <listitem> + <para> + While ligatures like "ﬁ" are typographic + refinements, some languages <emphasis>require</emphasis> such + substitutions to be made in order to display text correctly. + In Tamil, when the letter "TTA" (ட) is + followed by the vowel sign "U" (ு), the combination should appear + as the single glyph "டு". The sequence of Unicode + characters ட (U+0B9F) and ு (U+0BC1) needs to be rendered as a single + glyph from the font - text shaping chooses the correct glyph + from the sequence of characters provided. + </para> + </listitem> + <listitem> + <para> + Similarly, an Arabic letter can take up to four different forms: + within a font, there will be glyphs for the initial, medial, + final, and isolated forms of a letter. Unicode only encodes + one codepoint per character, and so a Unicode string will not + tell you which glyph to use. Text shaping chooses the correct + form of the letter and returns the correct glyph from the font + that you need to render. + </para> + </listitem> + <listitem> + <para> + Other languages have marks and accents which need to be + rendered in certain positions around a base character. For + instance, the Moldovan language has the Cyrillic letter + "zhe" (ж) with a breve accent, like so: ӂ. Some + fonts will contain this character as an individual glyph, + whereas other fonts will not contain a zhe-with-breve glyph + but expect the rendering engine to form the character by + overlaying the two glyphs ж and ˘. Where you should draw the + combining breve depends on the height of the preceding glyph. 
+ Again, for Arabic, the correct positioning of vowel marks + depends on the height of the character on which you are + placing the mark. Text shaping tells you whether you have a + precomposed glyph within your font or if you need to compose a + glyph yourself out of combining marks, and if so, where to + position those marks. + </para> + </listitem> + </itemizedlist> + <para> + If this is something that you need to do, then you need a text + shaping engine: you could use Uniscribe if you are using Windows; + you could use CoreText on OS X; or you could use Harfbuzz. In the + rest of this manual, we are going to assume that you are the + implementor of a text layout engine. + </para> + </sect2> + <sect2 id="why-is-it-called-harfbuzz"> + <title>Why is it called Harfbuzz?</title> + <para> + Harfbuzz began its life as text shaping code within the FreeType + project (and you will see references to the FreeType authors + within the source code copyright declarations), but was then + abstracted out to its own project. This project is maintained by + Behdad Esfahbod, and named Harfbuzz. Originally, it was a shaping + engine for OpenType fonts - "Harfbuzz" is the Persian + for "open type". + </para> + </sect2> +</sect1>
\ No newline at end of file diff --git a/docs/usermanual-ch02.xml b/docs/usermanual-ch02.xml new file mode 100644 index 00000000..f0a161dd --- /dev/null +++ b/docs/usermanual-ch02.xml @@ -0,0 +1,182 @@ +<sect1 id="hello-harfbuzz"> + <title>Hello, Harfbuzz</title> + <para> + Here's the simplest Harfbuzz program that can possibly work. We will improve + it later. + </para> + <orderedlist numeration="arabic"> + <listitem> + <para> + Create a buffer and put your text in it. + </para> + </listitem> + </orderedlist> + <programlisting language="C"> + #include <hb.h> + hb_buffer_t *buf; + buf = hb_buffer_create(); + hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text)); +</programlisting> + <orderedlist numeration="arabic"> + <listitem override="2"> + <para> + Guess the script, language and direction of the buffer. + </para> + </listitem> + </orderedlist> + <programlisting language="C"> + hb_buffer_guess_segment_properties(buf); +</programlisting> + <orderedlist numeration="arabic"> + <listitem override="3"> + <para> + Create a face and a font, using FreeType for now. + </para> + </listitem> + </orderedlist> + <programlisting language="C"> + #include <hb-ft.h> + FT_New_Face(ft_library, font_path, index, &face); + hb_font_t *font = hb_ft_font_create(face); +</programlisting> + <orderedlist numeration="arabic"> + <listitem override="4"> + <para> + Shape! + </para> + </listitem> + </orderedlist> + <programlisting> + hb_shape(font, buf, NULL, 0); +</programlisting> + <orderedlist numeration="arabic"> + <listitem override="5"> + <para> + Get the glyph and position information. + </para> + </listitem> + </orderedlist> + <programlisting language="C"> + hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &glyph_count); + hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &glyph_count); +</programlisting> + <orderedlist numeration="arabic"> + <listitem override="6"> + <para> + Iterate over each glyph. 
+ </para> + </listitem> + </orderedlist> + <programlisting language="C"> + for (i = 0; i < glyph_count; ++i) { + glyphid = glyph_info[i].codepoint; + x_offset = glyph_pos[i].x_offset / 64.0; + y_offset = glyph_pos[i].y_offset / 64.0; + x_advance = glyph_pos[i].x_advance / 64.0; + y_advance = glyph_pos[i].y_advance / 64.0; + draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset); + cursor_x += x_advance; + cursor_y += y_advance; + } +</programlisting> + <orderedlist numeration="arabic"> + <listitem override="7"> + <para> + Tidy up. + </para> + </listitem> + </orderedlist> + <programlisting language="C"> + hb_buffer_destroy(buf); + hb_font_destroy(font); +</programlisting> + <sect2 id="what-harfbuzz-doesnt-do"> + <title>What Harfbuzz doesn't do</title> + <para> + The code above will take a UTF-8 string, shape it, and give you the + information required to lay it out correctly on a single + horizontal (or vertical) line using the font provided. That is the + extent of Harfbuzz's responsibility. + </para> + <para> + If you are implementing a text layout engine you may have other + responsibilities that Harfbuzz will not help you with: + </para> + <itemizedlist> + <listitem> + <para> + Harfbuzz won't help you with bidirectionality. If you want to + lay out text with mixed Hebrew and English, you will need to + ensure that the buffer provided to Harfbuzz has those + characters in the correct layout order. This will be different + from the logical order in which the Unicode text is stored. 
In + other words, the user will hit the keys in the following + sequence: + </para> + <programlisting> +A B C [space] ג ב א [space] D E F + </programlisting> + <para> + but will expect to see in the output: + </para> + <programlisting> +ABC אבג DEF + </programlisting> + <para> + This reordering is called <emphasis>bidi processing</emphasis> + ("bidi" is short for bidirectional), and there's an + algorithm, published as Unicode Standard Annex #9, which tells you how + to reorder a string from logical order into presentation order. + Before sending your string to Harfbuzz, you may need to apply the + bidi algorithm to it. Libraries such as ICU and FriBidi can do + this for you. + </para> + </listitem> + <listitem> + <para> + Harfbuzz won't help you with text that contains different font + properties. For instance, if you have the string "a + <emphasis>huge</emphasis> breakfast", and you expect + "huge" to be italic, you will need to send three + strings to Harfbuzz: <literal>a</literal>, in your Roman font; + <literal>huge</literal> using your italic font; and + <literal>breakfast</literal> using your Roman font again. + Similarly, if you change font, font size, script, language or + direction within your string, you will need to shape each run + independently and then output them independently. Harfbuzz + expects to shape a run of characters sharing the same + properties. + </para> + </listitem> + <listitem> + <para> + Harfbuzz won't help you with line breaking, hyphenation or + justification. As mentioned above, it lays out the string + along a <emphasis>single line</emphasis> of, notionally, + infinite length. If you want to find out where the potential + word, sentence and line break points are in your text, you + could use the ICU library's break iterator functions. + </para> + <para> + Harfbuzz can tell you how wide a shaped piece of text is, which is + useful input to a justification algorithm, but it knows nothing + about paragraphs, lines or line lengths. 
Nor will it adjust the + space between words to fit them proportionally into a line. If you + want to lay out text in paragraphs, you will probably want to send + each word of your text to Harfbuzz to determine its shaped width + after glyph substitutions, then work out how many words will fit + on a line, and then finally output each word of the line separated + by a space of the correct size to fully justify the paragraph. + </para> + </listitem> + </itemizedlist> + <para> + As a layout engine implementor, you will find that Harfbuzz helps you with the + interface between your text and your font, and that's something + that you'll need - what you then do with the glyphs that your font + returns is up to you. The example we saw above is enough to get us + started using Harfbuzz. Now we are going to use the remainder of + Harfbuzz's API to refine that example and improve our text shaping + capabilities. + </para> + </sect2> +</sect1>
\ No newline at end of file
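The cursor arithmetic in step 6 of the second chapter can be tried without linking Harfbuzz at all. The following is a minimal sketch in C, not code from this commit: `glyph_pos_t`, `draw_point_t` and `place_glyphs` are hypothetical names, with `glyph_pos_t` standing in for the four `hb_glyph_position_t` fields that the loop reads. It assumes the positions are in 26.6 fixed point (64 units per pixel), which is what the division by 64 in the manual's loop implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the four hb_glyph_position_t fields used in
 * step 6, so that the sketch compiles without Harfbuzz installed. */
typedef struct {
    int32_t x_advance, y_advance, x_offset, y_offset;
} glyph_pos_t;

/* Where one glyph should be drawn, in pixels. */
typedef struct {
    double x, y;
} draw_point_t;

/* Walk the shaped positions as in step 6: each glyph is drawn at the cursor
 * plus its offset, and only the advance moves the cursor. Values are assumed
 * to be 26.6 fixed point, hence the division by 64.
 * Writes `count` draw points to `out` and returns the final cursor x. */
double place_glyphs(const glyph_pos_t *pos, size_t count,
                    double start_x, double start_y, draw_point_t *out)
{
    assert(count == 0 || (pos != NULL && out != NULL));

    double cursor_x = start_x;
    double cursor_y = start_y;
    for (size_t i = 0; i < count; ++i) {
        /* The offset shifts this glyph only; it does not move the cursor. */
        out[i].x = cursor_x + pos[i].x_offset / 64.0;
        out[i].y = cursor_y + pos[i].y_offset / 64.0;
        /* The advance moves the cursor for the next glyph. */
        cursor_x += pos[i].x_advance / 64.0;
        cursor_y += pos[i].y_advance / 64.0;
    }
    return cursor_x;
}
```

To feed this from a real buffer, the values returned by `hb_buffer_get_glyph_positions` would have to be copied field by field into `glyph_pos_t`, since the real struct has additional members. Keeping offsets separate from advances is the point of the sketch: a mark glyph can be nudged over its base without disturbing where the following glyph starts.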