An Erlang process is lightweight compared to threads and processes in operating systems.
A newly spawned Erlang process uses 326 words of memory. The size can be found as follows:
Erlang/OTP 24 [erts-12.0] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Eshell V12.0  (abort with ^G)
1> Fun = fun() -> receive after infinity -> ok end end.
#Fun<...>
2> {_,Bytes} = process_info(spawn(Fun), memory).
{memory,2608}
3> Bytes div erlang:system_info(wordsize).
326
The size includes 233 words for the heap area (which includes the stack). The garbage collector increases the heap as needed.
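To watch a process's memory use as it runs, you can query it with process_info/2. A minimal sketch (the item list is an assumption chosen for illustration; the sizes are reported in words, except memory, which is in bytes):

%% Returns a list such as [{memory,...},{total_heap_size,...},
%% {heap_size,...},{stack_size,...}] for the given process.
inspect_memory(Pid) ->
    process_info(Pid, [memory, total_heap_size, heap_size, stack_size]).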
The main (outer) loop for a process must be tail-recursive. Otherwise, the stack grows until the process terminates.
DO NOT
loop() ->
receive
{sys, Msg} ->
handle_sys_msg(Msg),
loop();
{From, Msg} ->
Reply = handle_msg(Msg),
From ! Reply,
loop()
end,
io:format("Message is processed~n", []).
The call to io:format/2 will never be executed, but a return address is still pushed onto the stack each time loop/0 is called recursively. The correct tail-recursive version of the function looks as follows:
DO
loop() ->
receive
{sys, Msg} ->
handle_sys_msg(Msg),
loop();
{From, Msg} ->
Reply = handle_msg(Msg),
From ! Reply,
loop()
end.
The default initial heap size of 233 words is quite conservative to support Erlang systems with hundreds of thousands or even millions of processes. The garbage collector grows and shrinks the heap as needed.
In a system that uses comparatively few processes, performance might be improved by increasing the minimum heap size, using either the +h option for erl or, on a per-process basis, the min_heap_size option for spawn_opt/4.

The gain is twofold: the heap is established at its larger size directly when the process is spawned, which is cheaper than having the garbage collector grow it step by step, and the garbage collector is kept from shrinking a heap that would soon have to grow again. The downside is that the emulator probably uses more memory, and because garbage collections occur less frequently, huge binaries can be kept much longer.
In systems with many processes, computation tasks that run for a short time can be spawned off into a new process with a higher minimum heap size. When the process is done, it sends the result of the computation to another process and terminates. If the minimum heap size is calculated properly, the process might not have to do any garbage collections at all. This optimization is not to be attempted without proper measurements.
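As a sketch of this pattern (the worker function, the message format, and the heap size of 3000 words are assumptions chosen only for illustration; measure before picking a real value):

compute_async(Parent, Data) ->
    spawn_opt(fun() ->
                      Result = heavy_computation(Data),   %% hypothetical helper
                      Parent ! {result, self(), Result}
              end,
              [{min_heap_size, 3000}]).

If the minimum heap size is large enough for the whole computation, the worker never garbage collects; its memory is simply released when it terminates.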
All data in messages sent between Erlang processes is copied, except for refc binaries and literals, which are shared between processes on the same Erlang node.
When a message is sent to a process on another Erlang node, it is first encoded to the Erlang External Format before being sent through a TCP/IP socket. The receiving Erlang node decodes the message and distributes it to the correct process.
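To get a feel for the external format and for how large a given message would be on the wire, you can produce the encoding yourself. This is only an illustration (the message term is made up); term_to_binary/1 uses the same encoding as remote sends, and erlang:external_size/1 gives an upper bound on the encoded size:

Msg = {set_option, [{timeout, 5000}, {retries, 3}]},
Encoded = term_to_binary(Msg),                     %% Erlang External Format
true = byte_size(Encoded) =< erlang:external_size(Msg),
Msg = binary_to_term(Encoded).                     %% the receiving node decodes it back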
The cost of receiving messages depends on how complicated the receive expression is. A simple expression that matches any message is very cheap, because it retrieves the first message in the message queue:
DO
receive
    Message -> handle_msg(Message)
end.
However, this is not always convenient: we can receive a message that we do not know how to handle at this point, so it is common to only match the messages we expect:
receive
    {Tag, Message} -> handle_msg(Message)
end.
While this is convenient, it means that the entire message queue must be searched until a matching message is found. This is very expensive for processes with long message queues, so we have added an optimization for the common case of sending a request and waiting for a response shortly after:
DO
MRef = monitor(process, Process),
Process ! {self(), MRef, Request},
receive
    {MRef, Reply} ->
        erlang:demonitor(MRef, [flush]),
        handle_reply(Reply);
    {'DOWN', MRef, _, _, Reason} ->
        handle_error(Reason)
end.
Since the compiler knows that the reference created by monitor/2 cannot already be present in the message queue when the call returns, the runtime system can skip every message that arrived before the reference was created and only inspect the messages received after it. This makes the receive cheap even when the message queue is very long.

The above is a simple example where the optimization is all but guaranteed to be applied, but what about more complicated code?
Use the recv_opt_info option to have the compiler print information about receive optimizations. It can be given on the command line to erlc:

erlc +recv_opt_info Mod.erl

or passed through an environment variable:

export ERL_COMPILER_OPTIONS=recv_opt_info

Notice that recv_opt_info is not meant to be a permanent option in your build scripts: not all of the messages it generates can be eliminated, so passing the option through the environment when needed is usually the most practical approach.

The warnings look as follows:

efficiency_guide.erl:194: Warning: INFO: not a selective receive, this is always fast
efficiency_guide.erl:200: Warning: NOT OPTIMIZED: all clauses do not match a suitable reference
efficiency_guide.erl:206: Warning: OPTIMIZED: reference used to mark a message queue position
efficiency_guide.erl:208: Warning: OPTIMIZED: matches reference created by monitor/2 at efficiency_guide.erl:206
efficiency_guide.erl:219: Warning: INFO: passing reference created by make_ref/0 at efficiency_guide.erl:218
efficiency_guide.erl:222: Warning: OPTIMIZED: all clauses match reference in function parameter 1
To make it clearer exactly what code the warnings refer to, the warnings in the following examples are inserted as comments after the clause they refer to, for example:
%% DO
simple_receive() ->
%% efficiency_guide.erl:194: Warning: INFO: not a selective receive, this is always fast
receive
Message -> handle_msg(Message)
end.
%% DO NOT, unless Tag is known to be a suitable reference: see
%% cross_function_receive/0 further down.
selective_receive(Tag, Message) ->
%% efficiency_guide.erl:200: Warning: NOT OPTIMIZED: all clauses do not match a suitable reference
receive
{Tag, Message} -> handle_msg(Message)
end.
%% DO
optimized_receive(Process, Request) ->
%% efficiency_guide.erl:206: Warning: OPTIMIZED: reference used to mark a message queue position
MRef = monitor(process, Process),
Process ! {self(), MRef, Request},
%% efficiency_guide.erl:208: Warning: OPTIMIZED: matches reference created by monitor/2 at efficiency_guide.erl:206
receive
{MRef, Reply} ->
erlang:demonitor(MRef, [flush]),
handle_reply(Reply);
{'DOWN', MRef, _, _, Reason} ->
handle_error(Reason)
end.
%% DO
cross_function_receive() ->
%% efficiency_guide.erl:218: Warning: OPTIMIZED: reference used to mark a message queue position
Ref = make_ref(),
%% efficiency_guide.erl:219: Warning: INFO: passing reference created by make_ref/0 at efficiency_guide.erl:218
cross_function_receive(Ref).
cross_function_receive(Ref) ->
%% efficiency_guide.erl:222: Warning: OPTIMIZED: all clauses match reference in function parameter 1
receive
{Ref, Message} -> handle_msg(Message)
end.
Constant Erlang terms (hereafter called literals) are kept in literal pools; each loaded module has its own pool. The following function does not build the tuple every time it is called, only to have it discarded the next time the garbage collector runs; instead, the tuple is located in the module's literal pool:
DO
days_in_month(M) ->
    element(M, {31,28,31,30,31,30,31,31,30,31,30,31}).
If a literal, or a term that contains a literal, is inserted into an Ets table, it is copied. The reason is that the module containing the literal can be unloaded in the future.
When a literal is sent to another process, it is not copied. When a module holding a literal is unloaded, the literal will be copied to the heap of all processes that hold references to that literal.
There also exists a global literal pool that is managed by the persistent_term module.
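For example, terms stored with persistent_term are placed in this global pool, so reading them does not copy them to the caller's heap. A minimal sketch (the key and value are made up for illustration):

ok = persistent_term:put({myapp, config}, #{pool_size => 10}),
#{pool_size := 10} = persistent_term:get({myapp, config}).

Updating or deleting such a term is expensive, because all processes must be scanned for references to it, so persistent_term is best suited for data that is written rarely and read often.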
By default, 1 GB of virtual address space is reserved for all
literal pools (in BEAM code and persistent terms). The amount of
virtual address space reserved for literals can be changed by
using the +MIscs option when starting the emulator.
Here is an example of how the reserved virtual address space for literals can be raised to 2 GB (2048 MB):
erl +MIscs 2048
An Erlang term can have shared subterms. Here is a simple example:
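A sketch (the function and variable names are illustrative): both elements of the tuple refer to the same bound variable, so the list is stored only once on the heap and the tuple holds two references to it.

shared_subterm() ->
    A = lists:seq(1, 100),
    {A, A}.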
Shared subterms are not preserved when a term is sent to another process, when a term is passed as the initial process arguments in the spawn call, or when a term is stored in an Ets table. That is an optimization. Most applications do not send messages with shared subterms.
The following example shows how a shared subterm can be created:
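A sketch of the efficiency_guide:kilo_byte/0 function called in the shell session below; it builds a deep list in which each level references the level below it twice, so nearly all of the data is shared:

kilo_byte() ->
    kilo_byte(10, [42]).

kilo_byte(0, Acc) ->
    Acc;
kilo_byte(N, Acc) ->
    kilo_byte(N-1, [Acc|Acc]).

When the deep list is flattened with list_to_binary/1, it becomes a binary of 1024 bytes: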
1> byte_size(list_to_binary(efficiency_guide:kilo_byte())).
1024
Using the erts_debug:size/1 BIF, it can be seen that the deep list requires only 22 words of heap space:
2> erts_debug:size(efficiency_guide:kilo_byte()).
22
Using the erts_debug:flat_size/1 BIF, the size of the deep list can be calculated with sharing ignored. This is the size the list will have after it has been sent to another process or stored in an Ets table:
3> erts_debug:flat_size(efficiency_guide:kilo_byte()).
4094
It can be verified that sharing will be lost if the data is inserted into an Ets table:
4> T = ets:new(tab, []).
#Ref<0.1662103692.2407923716.214181>
5> ets:insert(T, {key,efficiency_guide:kilo_byte()}).
true
6> erts_debug:size(element(2, hd(ets:lookup(T, key)))).
4094
7> erts_debug:flat_size(element(2, hd(ets:lookup(T, key)))).
4094
When the data has passed through an Ets table, erts_debug:size/1 and erts_debug:flat_size/1 return the same value: sharing has been lost.
It is possible to build an experimental variant of the runtime system that will preserve sharing when copying terms by giving the --enable-sharing-preserving option to the configure script.
The emulator takes advantage of a multi-core or multi-CPU computer by running several Erlang scheduler threads (typically, the same as the number of cores).
To gain performance from a multi-core computer, your application must have more than one runnable Erlang process most of the time. Otherwise, the Erlang emulator can still only run one Erlang process at a time.
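A minimal sketch of spreading independent work over several processes (pmap/2 is an illustrative helper, not a library function): each list element is handled in its own process, so the pieces can run on different scheduler threads, and the results are collected in the original order:

pmap(F, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Each result is matched on its unique reference, so replies cannot be mixed up.
    [receive {Ref, Result} -> Result end || Ref <- Refs].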
Benchmarks that appear to be concurrent are often sequential.
The estone benchmark, for example, is entirely sequential. So is
the most common implementation of the "ring benchmark"; usually one process
is active, while the others wait in a receive statement.