author Sverker Eriksson <sverker@erlang.org> 2021-11-03 11:39:42 +0100
committer Sverker Eriksson <sverker@erlang.org> 2021-11-03 11:39:42 +0100
commit f88e73f278b5799e6d647dad90eeb9ea6879b876
tree 03b6108ce3b1e6d31c9431b8a9c23706e2526902 /erts/doc
parent 773a44042c6394b78726ac8157d1ecc8025634c6
parent b4ac3e1da84d690d3eece02e268a9890a294cce4
Merge branch 'sverker/atom-encoding-doc-fix' into maint
Diffstat (limited to 'erts/doc')
-rw-r--r-- erts/doc/src/erl_ext_dist.xml | 104
-rw-r--r-- erts/doc/src/erlang.xml | 13
2 files changed, 53 insertions, 64 deletions
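
Note: the atom-encoding rule that the new "Encoding atoms" section describes can be sketched as below. This is a minimal illustration in Python, not code from ERTS. The function name `encode_atom` and the `prefer_latin1` flag are hypothetical; the tag values (100, 118, 119) are those of `ATOM_EXT`, `ATOM_UTF8_EXT` and `SMALL_ATOM_UTF8_EXT` in the external term format. With `prefer_latin1=True` it models the `term_to_binary` default as the new text words it; with `prefer_latin1=False` it models the always-UTF-8 case (ignoring `ATOM_CACHE_REF`).

```python
import struct

# Tag bytes from the external term format.
ATOM_EXT = 100             # 2-byte length, Latin-1 name (deprecated)
ATOM_UTF8_EXT = 118        # 2-byte length, UTF-8 name
SMALL_ATOM_UTF8_EXT = 119  # 1-byte length, UTF-8 name

MAX_ATOM_CHARS = 255       # limit is in characters, not bytes


def encode_atom(name: str, prefer_latin1: bool = True) -> bytes:
    """Encode an atom name (hypothetical helper, illustration only)."""
    if len(name) > MAX_ATOM_CHARS:
        raise ValueError("atom names are limited to 255 characters")

    # Latin-1 path: only taken when every code point is in 0-255.
    if prefer_latin1 and all(ord(ch) <= 255 for ch in name):
        data = name.encode("latin-1")
        return struct.pack(">BH", ATOM_EXT, len(data)) + data

    # UTF-8 path: pick the small form when the byte length fits in one byte.
    data = name.encode("utf-8")
    if len(data) <= 255:
        return struct.pack(">BB", SMALL_ATOM_UTF8_EXT, len(data)) + data
    # Up to 255 characters * 4 bytes = 1020 bytes, so a 2-byte length is needed.
    return struct.pack(">BH", ATOM_UTF8_EXT, len(data)) + data
```

For example, `encode_atom("ok")` yields the `ATOM_EXT` form, while a name containing a code point above 255 always takes one of the two UTF-8 forms regardless of the flag.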
diff --git a/erts/doc/src/erl_ext_dist.xml b/erts/doc/src/erl_ext_dist.xml
index 06e9ac639d..a4fa3449cf 100644
--- a/erts/doc/src/erl_ext_dist.xml
+++ b/erts/doc/src/erl_ext_dist.xml
@@ -117,23 +117,34 @@
<cell align="center"><c>Data</c></cell>
</row>
<tcaption>Compressed Data Format when Expanded</tcaption></table>
+ </section>
+
+ <section>
<marker id="utf8_atoms"/>
- <note>
- <p>As from ERTS 9.0 (OTP 20), atoms may contain any Unicode
- characters and are always encoded using the UTF-8 external formats
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>
- or <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>.
- The old Latin-1 formats <seeguide marker="#ATOM_EXT"><c>ATOM_EXT</c></seeguide>
- and <seeguide marker="#SMALL_ATOM_EXT"><c>SMALL_ATOM_EXT</c></seeguide>
- are deprecated and are only kept for backward
- compatibility when decoding terms encoded by older nodes.</p>
- <p>Support for UTF-8 encoded atoms in the external format has been
- available since ERTS 5.10 (OTP R16). This ability allows such old nodes
- to decode, store and encode any Unicode atoms received from a new OTP 20
- node.</p>
- <p>The maximum number of allowed characters in an atom is 255. In the
- UTF-8 case, each character can need 4 bytes to be encoded.</p>
- </note>
+ <title>Encoding atoms</title>
+ <p>
+ As from ERTS 9.0 (OTP 20), atoms may contain any Unicode characters.
+ </p>
+ <p>
+ Atoms sent over node distribution are always encoded in UTF-8 using
+ either <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
+ <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>
+ or <seeguide marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>.
+ </p>
+ <p>
+ Atoms encoded with <seemfa marker="erts:erlang#term_to_binary/1">
+ <c>erlang:term_to_binary/1,2</c></seemfa> or
+ <seemfa marker="erts:erlang#term_to_iovec/1">
+ <c>erlang:term_to_iovec/1,2</c></seemfa> are by default still using the
+ old deprecated Latin-1 format
+ <seeguide marker="#ATOM_EXT"><c>ATOM_EXT</c></seeguide>
+ for atoms that only contain Latin-1 characters (Unicode code points
+ 0-255). Atoms with higher code points will be encoded in UTF-8 using
+ either <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide> or
+ <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>.
+ </p>
+ <p>The maximum number of allowed characters in an atom is 255. In the
+ UTF-8 case, each character can need 4 bytes to be encoded.</p>
</section>
<section>
@@ -283,8 +294,8 @@
</p>
<p>
For more information on encoding of atoms, see the
- <seeguide marker="#utf8_atoms">note on UTF-8 encoded atoms</seeguide>
- in the beginning of this section.
+ <seeguide marker="#utf8_atoms">section on UTF-8 encoded atoms</seeguide>
+ above.
</p>
<p>
If the <c>NewCacheEntryFlag</c> for the next <c>AtomCacheRef</c>
@@ -666,10 +677,8 @@
<p>
Encodes a port identifier (obtained from
<seemfa marker="erlang#open_port/2"><c>erlang:open_port/2</c></seemfa>).
- <c>Node</c> is an encoded atom, that is,
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>
- or <seeguide marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>.
+ <c>Node</c> is the originating node,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.
<c>ID</c> is a 64-bit big endian unsigned integer. The <c>Creation</c>
works just like in <seeguide marker="#NEW_PID_EXT"><c>NEW_PID_EXT</c></seeguide>.
Port operations are not allowed across node boundaries.
@@ -729,11 +738,8 @@
</p>
<taglist>
<tag><c>Node</c></tag>
- <item><p>The name of the originating node, encoded using
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>
- or <seeguide
- marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>.</p>
+ <item><p>The name of the originating node,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.</p>
</item>
<tag><c>ID</c></tag>
<item><p>A 32-bit big endian unsigned integer. If distribution flag
@@ -1072,10 +1078,8 @@
</p>
<taglist>
<tag><c>Node</c></tag>
- <item><p>The name of the originating node, encoded using
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>
- or <seeguide marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>.</p>
+ <item><p>The name of the originating node,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.</p>
</item>
<tag><c>Len</c></tag>
<item><p>A 16-bit big endian unsigned integer not larger than 5 when the
@@ -1137,12 +1141,9 @@
</item>
<tag><c>Module</c></tag>
<item>
- <p>Encoded as an atom, using
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>,
- or <seeguide marker="#ATOM_CACHE_REF">
- <c>ATOM_CACHE_REF</c></seeguide>.
- This is the module that the fun is implemented in.
+ <p>
+ The module that the fun is implemented in,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.
</p>
</item>
<tag><c>Index</c></tag>
@@ -1232,13 +1233,10 @@
</item>
<tag><c>Module</c></tag>
<item>
- <p>Encoded as an atom, using
- <seeguide marker="#ATOM_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>,
- or <seeguide marker="#ATOM_CACHE_REF">
- <c>ATOM_CACHE_REF</c></seeguide>.
- Is the module that the fun is implemented in.
- </p>
+ <p>
+ The module that the fun is implemented in,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.
+ </p>
</item>
<tag><c>OldIndex</c></tag>
<item>
@@ -1295,10 +1293,8 @@
This term is the encoding for external funs: <c>fun M:F/A</c>.
</p>
<p>
- <c>Module</c> and <c>Function</c> are atoms
- (encoded using <seeguide marker="#ATOM_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>, or
- <seeguide marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>).
+ <c>Module</c> and <c>Function</c> are
+ <seeguide marker="#utf8_atoms">encoded as atoms</seeguide>.
</p>
<p>
<c>Arity</c> is an integer encoded using
@@ -1377,9 +1373,9 @@
in UTF-8.
</p>
<p>
- For more information on encoding of atoms, see the
- <seeguide marker="#utf8_atoms">note on UTF-8 encoded atoms</seeguide>
- in the beginning of this section.
+ For more information, see the
+ <seeguide marker="#utf8_atoms">section on encoding atoms</seeguide>
+ in the beginning of this page.
</p>
</section>
@@ -1405,9 +1401,9 @@
<seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>.
</p>
<p>
- For more information on encoding of atoms, see the
- <seeguide marker="#utf8_atoms">note on UTF-8 encoded atoms</seeguide>
- in the beginning of this section.
+ For more information, see the
+ <seeguide marker="#utf8_atoms">section on encoding atoms</seeguide>
+ in the beginning of this page.
</p>
</section>
diff --git a/erts/doc/src/erlang.xml b/erts/doc/src/erlang.xml
index 3b96ed5039..4ec5b38bf5 100644
--- a/erts/doc/src/erlang.xml
+++ b/erts/doc/src/erlang.xml
@@ -565,12 +565,8 @@ client(ServerPid, Request) ->
valid UTF-8 sequences.</p>
<note>
<p>As from Erlang/OTP 20, <c>binary_to_atom(<anno>Binary</anno>, utf8)</c>
- is capable of encoding any Unicode character. Earlier versions would
- fail if the binary contained Unicode characters &gt; 255.
- For more information about Unicode support in atoms, see the
- <seeguide marker="erl_ext_dist#utf8_atoms">note on UTF-8
- encoded atoms</seeguide>
- in section "External Term Format" in the User's Guide.</p>
+ is capable of decoding any Unicode character. Earlier versions would
+ fail if the binary contained Unicode characters &gt; 255.</p>
</note>
<note>
<p>The number of characters that are permitted in an atom
@@ -3233,10 +3229,7 @@ is_process_alive(P2Pid),
<p>As from Erlang/OTP 20, <c><anno>String</anno></c> may contain
any Unicode character. Earlier versions allowed only ISO-latin-1
characters as the implementation did not allow Unicode characters
- above 255. For more information on Unicode support in atoms, see
- <seeguide marker="erl_ext_dist#utf8_atoms">note on UTF-8
- encoded atoms</seeguide>
- in section "External Term Format" in the User's Guide.</p>
+ above 255.</p>
<note>
<p>The number of characters that are permitted in an atom
name is limited. The default limits can be found in the