author | Sverker Eriksson <sverker@erlang.org> | 2021-11-03 11:39:42 +0100 |
---|---|---|
committer | Sverker Eriksson <sverker@erlang.org> | 2021-11-03 11:39:42 +0100 |
commit | f88e73f278b5799e6d647dad90eeb9ea6879b876 (patch) | |
tree | 03b6108ce3b1e6d31c9431b8a9c23706e2526902 /erts/doc | |
parent | 773a44042c6394b78726ac8157d1ecc8025634c6 (diff) | |
parent | b4ac3e1da84d690d3eece02e268a9890a294cce4 (diff) | |
download | erlang-f88e73f278b5799e6d647dad90eeb9ea6879b876.tar.gz |
Merge branch 'sverker/atom-encoding-doc-fix' into maint
Diffstat (limited to 'erts/doc')
-rw-r--r-- | erts/doc/src/erl_ext_dist.xml | 104 |
-rw-r--r-- | erts/doc/src/erlang.xml | 13 |
2 files changed, 53 insertions, 64 deletions
diff --git a/erts/doc/src/erl_ext_dist.xml b/erts/doc/src/erl_ext_dist.xml
index 06e9ac639d..a4fa3449cf 100644
--- a/erts/doc/src/erl_ext_dist.xml
+++ b/erts/doc/src/erl_ext_dist.xml
@@ -117,23 +117,34 @@
 <cell align="center"><c>Data</c></cell>
 </row>
 <tcaption>Compressed Data Format when Expanded</tcaption></table>
+ </section>
+
+ <section>
 <marker id="utf8_atoms"/>
- <note>
- <p>As from ERTS 9.0 (OTP 20), atoms may contain any Unicode
- characters and are always encoded using the UTF-8 external formats
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>
- or <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>.
- The old Latin-1 formats <seeguide marker="#ATOM_EXT"><c>ATOM_EXT</c></seeguide>
- and <seeguide marker="#SMALL_ATOM_EXT"><c>SMALL_ATOM_EXT</c></seeguide>
- are deprecated and are only kept for backward
- compatibility when decoding terms encoded by older nodes.</p>
- <p>Support for UTF-8 encoded atoms in the external format has been
- available since ERTS 5.10 (OTP R16). This ability allows such old nodes
- to decode, store and encode any Unicode atoms received from a new OTP 20
- node.</p>
- <p>The maximum number of allowed characters in an atom is 255. In the
- UTF-8 case, each character can need 4 bytes to be encoded.</p>
- </note>
+ <title>Encoding atoms</title>
+ <p>
+ As from ERTS 9.0 (OTP 20), atoms may contain any Unicode characters.
+ </p>
+ <p>
+ Atoms sent over node distribution are always encoded in UTF-8 using
+ either <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
+ <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>
+ or <seeguide marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>.
+ </p>
+ <p>
+ Atoms encoded with <seemfa marker="erts:erlang#term_to_binary/1">
+ <c>erlang:term_to_binary/1,2</c></seemfa> or
+ <seemfa marker="erts:erlang#term_to_iovec/1">
+ <c>erlang:term_to_iovec/1,2</c></seemfa> are by default still using the
+ old deprecated Latin-1 format
+ <seeguide marker="#ATOM_EXT"><c>ATOM_EXT</c></seeguide>
+ for atoms that only contain Latin-1 characters (Unicode code points
+ 0-255). Atoms with higher code points will be encoded in UTF-8 using
+ either <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide> or
+ <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>.
+ </p>
+ <p>The maximum number of allowed characters in an atom is 255. In the
+ UTF-8 case, each character can need 4 bytes to be encoded.</p>
 </section>
 
 <section>
@@ -283,8 +294,8 @@
 </p>
 <p>
 For more information on encoding of atoms, see the
- <seeguide marker="#utf8_atoms">note on UTF-8 encoded atoms</seeguide>
- in the beginning of this section.
+ <seeguide marker="#utf8_atoms">section on UTF-8 encoded atoms</seeguide>
+ above.
 </p>
 <p>
 If the <c>NewCacheEntryFlag</c> for the next <c>AtomCacheRef</c>
@@ -666,10 +677,8 @@
 <p>
 Encodes a port identifier (obtained from
 <seemfa marker="erlang#open_port/2"><c>erlang:open_port/2</c></seemfa>).
- <c>Node</c> is an encoded atom, that is,
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>
- or <seeguide marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>.
+ <c>Node</c> is the originating node,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.
 <c>ID</c> is a 64-bit big endian unsigned integer.
 The <c>Creation</c> works just like in <seeguide marker="#NEW_PID_EXT"><c>NEW_PID_EXT</c></seeguide>.
 Port operations are not allowed across node boundaries.
@@ -729,11 +738,8 @@
 </p>
 <taglist>
 <tag><c>Node</c></tag>
- <item><p>The name of the originating node, encoded using
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>
- or <seeguide
- marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>.</p>
+ <item><p>The name of the originating node,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.</p>
 </item>
 <tag><c>ID</c></tag>
 <item><p>A 32-bit big endian unsigned integer. If distribution flag
@@ -1072,10 +1078,8 @@
 </p>
 <taglist>
 <tag><c>Node</c></tag>
- <item><p>The name of the originating node, encoded using
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>
- or <seeguide marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>.</p>
+ <item><p>The name of the originating node,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.</p>
 </item>
 <tag><c>Len</c></tag>
 <item><p>A 16-bit big endian unsigned integer not larger than 5 when the
@@ -1137,12 +1141,9 @@
 </item>
 <tag><c>Module</c></tag>
 <item>
- <p>Encoded as an atom, using
- <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_UTF8_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>,
- or <seeguide marker="#ATOM_CACHE_REF">
- <c>ATOM_CACHE_REF</c></seeguide>.
- This is the module that the fun is implemented in.
+ <p>
+ The module that the fun is implemented in,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.
 </p>
 </item>
 <tag><c>Index</c></tag>
@@ -1232,13 +1233,10 @@
 </item>
 <tag><c>Module</c></tag>
 <item>
- <p>Encoded as an atom, using
- <seeguide marker="#ATOM_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>,
- or <seeguide marker="#ATOM_CACHE_REF">
- <c>ATOM_CACHE_REF</c></seeguide>.
- Is the module that the fun is implemented in.
- </p>
+ <p>
+ The module that the fun is implemented in,
+ <seeguide marker="#utf8_atoms">encoded as an atom</seeguide>.
+ </p>
 </item>
 <tag><c>OldIndex</c></tag>
 <item>
@@ -1295,10 +1293,8 @@
 This term is the encoding for external funs: <c>fun M:F/A</c>.
 </p>
 <p>
- <c>Module</c> and <c>Function</c> are atoms
- (encoded using <seeguide marker="#ATOM_EXT"><c>ATOM_UTF8_EXT</c></seeguide>,
- <seeguide marker="#SMALL_ATOM_EXT"><c>SMALL_ATOM_UTF8_EXT</c></seeguide>, or
- <seeguide marker="#ATOM_CACHE_REF"><c>ATOM_CACHE_REF</c></seeguide>).
+ <c>Module</c> and <c>Function</c> are
+ <seeguide marker="#utf8_atoms">encoded as atoms</seeguide>.
 </p>
 <p>
 <c>Arity</c> is an integer encoded using
@@ -1377,9 +1373,9 @@
 in UTF-8.
 </p>
 <p>
- For more information on encoding of atoms, see the
- <seeguide marker="#utf8_atoms">note on UTF-8 encoded atoms</seeguide>
- in the beginning of this section.
+ For more information, see the
+ <seeguide marker="#utf8_atoms">section on encoding atoms</seeguide>
+ in the beginning of this page.
 </p>
 </section>
 
@@ -1405,9 +1401,9 @@
 <seeguide marker="#ATOM_UTF8_EXT"><c>ATOM_UTF8_EXT</c></seeguide>.
 </p>
 <p>
- For more information on encoding of atoms, see the
- <seeguide marker="#utf8_atoms">note on UTF-8 encoded atoms</seeguide>
- in the beginning of this section.
+ For more information, see the
+ <seeguide marker="#utf8_atoms">section on encoding atoms</seeguide>
+ in the beginning of this page.
 </p>
 </section>
 
diff --git a/erts/doc/src/erlang.xml b/erts/doc/src/erlang.xml
index 3b96ed5039..4ec5b38bf5 100644
--- a/erts/doc/src/erlang.xml
+++ b/erts/doc/src/erlang.xml
@@ -565,12 +565,8 @@ client(ServerPid, Request) ->
 valid UTF-8 sequences.</p>
 <note>
 <p>As from Erlang/OTP 20, <c>binary_to_atom(<anno>Binary</anno>, utf8)</c>
- is capable of encoding any Unicode character. Earlier versions would
- fail if the binary contained Unicode characters > 255.
- For more information about Unicode support in atoms, see the
- <seeguide marker="erl_ext_dist#utf8_atoms">note on UTF-8
- encoded atoms</seeguide>
- in section "External Term Format" in the User's Guide.</p>
+ is capable of decoding any Unicode character. Earlier versions would
+ fail if the binary contained Unicode characters > 255.</p>
 </note>
 <note>
 <p>The number of characters that are permitted in an atom
@@ -3233,10 +3229,7 @@ is_process_alive(P2Pid),
 <p>As from Erlang/OTP 20, <c><anno>String</anno></c> may contain
 any Unicode character. Earlier versions allowed only ISO-latin-1
 characters as the implementation did not allow Unicode characters
- above 255. For more information on Unicode support in atoms, see
- <seeguide marker="erl_ext_dist#utf8_atoms">note on UTF-8
- encoded atoms</seeguide>
- in section "External Term Format" in the User's Guide.</p>
+ above 255.</p>
 <note>
 <p>The number of characters that are permitted in an atom
 name is limited. The default limits can be found in the