author Philip Withnall <pwithnall@endlessos.org> 2023-04-28 11:17:23 +0100
committer Philip Withnall <pwithnall@endlessos.org> 2023-05-02 10:41:50 +0100
commit 0e941418d139a77a0a505f28e42562bd20b452ba
tree 2d797d7270707f60d5c138a259c83d837e212d56
parent c86fde7e02bb942af2165fb7e7a1947469ed45bc
docs: Document high-level UTF-8 requirements for GLib
I’ve finally found the right place in the docs to put this stuff.

This doesn’t auto-link this section from every string in the GLib documentation, but at this point (with gtk-doc in maintenance mode, and gi-docgen not fully applied to GLib) I don’t think we can do any better. The perfect is the enemy of the good, and having this stuff documented somewhere means that someone can link to it from multiple places in future *somehow*.

Signed-off-by: Philip Withnall <pwithnall@endlessos.org>
Fixes: #116
-rw-r--r-- docs/reference/glib/programming.xml 30
1 file changed, 30 insertions, 0 deletions
diff --git a/docs/reference/glib/programming.xml b/docs/reference/glib/programming.xml
index 2c38fee5d..9efa19d33 100644
--- a/docs/reference/glib/programming.xml
+++ b/docs/reference/glib/programming.xml
@@ -31,6 +31,36 @@ to test all the allocation failure code paths.
</refsect2>
<refsect2>
+<title>UTF-8 and String Encoding</title>
+
+<para>
+All GLib, GObject and GIO functions accept and return strings in
+<ulink url="https://en.wikipedia.org/wiki/UTF-8">UTF-8 encoding</ulink>
+unless otherwise specified.
+</para>
+
+<para>
+Input strings to function calls are <emphasis>not</emphasis> checked to see if
+they are valid UTF-8: it is the application developer’s responsibility to
+validate input strings at the time of input, either at the program or library
+boundary, and to only use valid UTF-8 string constants in their application.
+If GLib were to UTF-8 validate all string inputs to all functions, there would
+be a significant drop in performance.
+</para>
+
+<para>Similarly, output strings from functions are guaranteed to be in UTF-8,
+and this does not need to be validated by the calling function. If a function
+returns invalid UTF-8 (and is not documented as doing so), that’s a bug.
+</para>
+
+<para>
+See <link linkend='g-utf8-validate'><function>g_utf8_validate()</function></link>
+and <link linkend='g-utf8-make-valid'><function>g_utf8_make_valid()</function></link>
+for validating UTF-8 input.
+</para>
+</refsect2>
+
+<refsect2>
<title>Threads</title>
<para>