| Commit message | Author | Age | Files | Lines |
| |
Binary upgrades are not supported without a master process. It is,
however, possible that nginx running with a master process is asked
to upgrade the binary while the configuration file available on disk
at that time includes "master_process off;".
If this happens, listening sockets inherited from the previous binary
will have ls[i].previous set. But the old cycle on initial process
startup, including startup after binary upgrade, is destroyed by
ngx_init_cycle() once configuration parsing is complete. As a result,
an attempt to dereference ls[i].previous in ngx_event_process_init()
accesses already freed memory.
Fix is to avoid looking into ls[i].previous if the old cycle is already
freed.
With this change it is also no longer needed to clear ls[i].previous in
worker processes, so the relevant code was removed.
| |
Previously, if an event was posted by a read event handler, called by
ngx_close_idle_connections(), that event was not processed until the next
event loop iteration, which could happen after a timeout.
| |
When reading exactly rev->available bytes, rev->available might become 0
after FIONREAD usage introduction in efd71d49bde0. On the next call of
ngx_readv_chain() on systems with EPOLLRDHUP this resulted in return without
any actions, that is, with rev->ready set, and this in turn resulted in no
timers set in event pipe, leading to socket leaks.
Fix is to reset rev->ready in ngx_readv_chain() when returning due to
rev->available being 0 with EPOLLRDHUP, much like it is already done in
ngx_unix_recv(). This ensures that if rev->available becomes 0, on
systems with EPOLLRDHUP support appropriate EPOLLRDHUP-specific handling
will happen on the next ngx_readv_chain() call.
While here, also synced ngx_readv_chain() to match ngx_unix_recv() and
reset rev->ready when returning due to rev->available being 0 with kqueue.
This is a mostly cosmetic change, as rev->ready is reset anyway when
rev->available is set to 0.
| |
In 7583:efd71d49bde0 (nginx 1.17.5), along with the introduction of
ioctl(FIONREAD) support, proper handling of systems without EPOLLRDHUP
support in the kernel (but with EPOLLRDHUP in headers) was broken.
Before the change, rev->available was never set to 0 unless
ngx_use_epoll_rdhup was also set (that is, runtime test for EPOLLRDHUP
introduced in 6536:f7849bfb6d21 succeeded). After the change,
rev->available might reach 0 on systems without runtime EPOLLRDHUP
support, stopping further reading in ngx_readv_chain() and ngx_unix_recv().
And, if EOF happened to be already reported along with the last event,
it is not reported again by epoll_wait(), leading to connection hangs
and timeouts on such systems.
This affects Linux kernels before 2.6.17 if nginx was compiled
with newer headers, and, more importantly, emulation layers, such as
DigitalOcean's App Platform's / gVisor's epoll emulation layer.
Fix is to explicitly check ngx_use_epoll_rdhup before the corresponding
rev->pending_eof tests in ngx_readv_chain() and ngx_unix_recv().
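The check described above can be modelled in isolation. The following is
a minimal sketch, not the nginx source: sketch_event_t and
sketch_skip_read() are hypothetical names, and only the three fields
mentioned in the message are modelled.

```c
/* Hypothetical stand-in for the relevant fields of a read event. */
typedef struct {
    int       available;      /* bytes believed to be readable */
    unsigned  pending_eof:1;  /* EOF already reported by the event method */
    unsigned  ready:1;        /* event is considered ready for reading */
} sketch_event_t;

/*
 * Returns 1 if reading should be postponed until the next event,
 * 0 if recv()/readv() should be attempted now.
 */
static int
sketch_skip_read(sketch_event_t *rev, int use_epoll_rdhup)
{
    /* Trust available == 0 only when EPOLLRDHUP works at runtime. */
    if (use_epoll_rdhup && rev->available == 0 && !rev->pending_eof) {
        rev->ready = 0;
        return 1;
    }

    return 0;
}
```

Without runtime EPOLLRDHUP support the function never postpones the read,
so an EOF that was already reported is still picked up by the next
recv() call.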
| |
The NGX_HAVE_ADDRINFO_CMSG macro is defined when at least one of the methods
to deal with the corresponding control message is available.
| |
The SF_NOCACHE flag, introduced in FreeBSD 11 along with the new non-blocking
sendfile() implementation by glebius@, makes it possible to use sendfile()
along with the "directio" directive.
| |
Starting with FreeBSD 11, there is no need to use AIO operations to preload
data into cache for sendfile(SF_NODISKIO) to work. Instead, sendfile()
handles non-blocking loading data from disk by itself. It still can, however,
return EBUSY if a page is already being loaded (for example, by a different
process). If this happens, we now post an event for the next event loop
iteration, so sendfile() is retried "after a short period", as manpage
recommends.
The limit of the number of EBUSY tolerated without any progress is preserved,
but now it does not result in an alert, since on an idle system event loop
iteration might be very short and EBUSY can happen many times in a row.
Instead, SF_NODISKIO is simply disabled for one call once the limit is
reached.
With this change, sendfile(SF_NODISKIO) is now used automatically as long as
sendfile() is enabled, and no longer requires "aio on;".
| |
With sendfile in threads, "task already active" alerts might appear in logs
if a write event happens on the main HTTP/2 connection, triggering a sendfile
in threads while another thread operation is already running. Observed
with "aio threads; aio_write on; sendfile on;" and with thread event handlers
modified to post a write event to the main HTTP/2 connection (though can
happen without any modifications).
Similarly, sendfile() with AIO preloading on FreeBSD can trigger duplicate
aio operation, resulting in "second aio post" alerts. This is, however,
harder to reproduce, especially on modern FreeBSD systems, since sendfile()
usually does not return EBUSY.
Fix is to avoid starting a sendfile operation if other thread operation
is active by checking r->aio in the thread handler (and, similarly, in
aio preload handler). The added check also makes duplicate calls protection
redundant, so it is removed.
| |
On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
triggered the interrupt check, and resulted in 0-sized writev() on the
next loop iteration.
Fix is to make sure the limit is always checked, so we will return from
the loop if the limit is already reached even if number of bytes sent is
not exactly equal to the number of bytes we've tried to send.
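For illustration, the value of the limit can be worked out as follows.
This is a sketch under the assumption of 4096-byte pages; the SKETCH_*
names and sketch_sendfile_cap() are hypothetical. With that page size the
cap is 2147479552 bytes, slightly below INT_MAX.

```c
#include <limits.h>

/* Assumption: 4096-byte pages; the mask clears the in-page offset bits. */
#define SKETCH_PAGE_SIZE     4096L
#define SKETCH_PAGE_MASK     (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_MAX_RW_COUNT  ((long) INT_MAX & SKETCH_PAGE_MASK)

/* Upper bound on the bytes a single sendfile() call would accept. */
static long
sketch_sendfile_cap(long requested)
{
    return requested > SKETCH_MAX_RW_COUNT ? SKETCH_MAX_RW_COUNT : requested;
}
```

A caller asking for more than the cap sees a short write that is not an
interruption, which is why the limit has to be checked independently of
the number of bytes sent.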
| |
This allows building nginx on macOS with -Wdeprecated-declarations.
| |
In d1bde5c3c5d2, the number of preallocated iovec's for ngx_readv_chain()
was increased. Still, in some setups, the function might allocate memory
for iovec's from a connection pool, which is only freed when closing the
connection.
The ngx_readv_chain() function was modified to use only preallocated
memory, similarly to the ngx_writev_chain() change in 8e903522c17a.
| |
Due to structure's alignment, some uninitialized memory contents may have
been passed between processes.
Zeroing was removed in 0215ec9aaa8a.
Reported by Johnny Wang.
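The general technique is to zero the whole structure, padding included,
before filling in the fields. A minimal sketch follows; sketch_channel_t
is a hypothetical layout chosen to have padding on common ABIs and is not
the actual structure used by nginx.

```c
#include <string.h>

/* Hypothetical message; padding typically follows "command". */
typedef struct {
    char  command;
    long  pid;
    int   slot;
    int   fd;
} sketch_channel_t;

static void
sketch_channel_init(sketch_channel_t *ch, char command, long pid, int slot,
    int fd)
{
    /* Zero everything first so padding bytes carry no stale data. */
    memset(ch, 0, sizeof(sketch_channel_t));

    ch->command = command;
    ch->pid = pid;
    ch->slot = slot;
    ch->fd = fd;
}
```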
| |
The strerrordesc_np() function, introduced in glibc 2.32, provides an
async-signal-safe way to obtain error messages. This makes it possible
to avoid copying error messages.
| |
Previously, systems without sys_nerr (or _sys_nerr) were handled with an
assumption that errors start at 0 and are continuous. This is, however, not
something POSIX requires, and not true on some platforms.
Notably, on Linux, where sys_nerr is no longer available for newly linked
binaries starting with glibc 2.32, there are gaps in error list, which
used to stop us from properly detecting maximum errno. Further, on
GNU/Hurd errors start at 0x40000001.
With this change, maximum errno detection is moved to the runtime code,
now able to ignore gaps, and also detects the first error if needed.
This fixes observed "Unknown error" messages as seen on Linux with
glibc 2.32 and on GNU/Hurd.
| |
Clearing cache based on free space left on a file system is
expected to allow better disk utilization in some cases, notably
when disk space might be also used for something other than nginx
cache (including nginx own temporary files) and while loading
cache (when cache size might be inaccurate for a while, effectively
disabling max_size cache clearing).
Based on a patch by Adam Bambuch.
| |
With XFS, using "allocsize=64m" mount option results in large preallocation
being reported in the st_blocks as returned by fstat() till the file is
closed. This in turn results in incorrect cache size calculations and
wrong clearing based on max_size.
To avoid too aggressive cache clearing on such volumes, st_blocks values
which result in sizes larger than st_size and eight blocks (an arbitrary
limit) are no longer trusted, and we use st_size instead.
The ngx_de_fs_size() counterpart is intentionally not modified, as
it is used on closed files and hence not affected by this problem.
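The heuristic can be sketched as a pure function. This is an illustration
only: sketch_cache_file_size() is a hypothetical name, the 512-byte unit
for st_blocks is the usual convention, and falling back to st_size when
the allocated size is not larger than st_size is an assumption.

```c
#include <stdint.h>

/*
 * Returns the size to account for a cache file, given values as
 * reported by fstat(): trust st_blocks only if the allocated size is
 * larger than st_size by less than eight blocks.
 */
static int64_t
sketch_cache_file_size(int64_t st_size, int64_t st_blocks,
    int64_t st_blksize)
{
    int64_t  allocated = st_blocks * 512;

    if (allocated > st_size && allocated < st_size + 8 * st_blksize) {
        return allocated;
    }

    /* Preallocation suspected (or nothing extra allocated). */
    return st_size;
}
```

For a 1000-byte file on a volume with 64m preallocation, the reported
allocation of 67108864 bytes is ignored and 1000 is used instead.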
| |
NFS on Linux is known to report wsize as a block size (in both f_bsize
and f_frsize, both in statfs() and statvfs()). On the other hand,
typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited
to pagesize. (With FAT, block sizes can be at least up to 512k in
extreme cases, but this doesn't really matter, see below.)
To avoid too aggressive cache clearing on NFS volumes on Linux, block
sizes larger than pagesize are now ignored.
Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759
(1.0.1) cache size is calculated based on fstat() st_blocks, and rounding
to file system block size is preserved mostly for Windows.
Note well that on other OSes valid block sizes seen are at least up
to 65536. In particular, UFS on FreeBSD is known to work well with block
and fragment sizes set to 65536.
| |
Listening UNIX sockets were not removed on graceful shutdown, preventing
the next runs. The fix is to replace the custom socket closing code in
ngx_master_process_cycle() by the ngx_close_listening_sockets() call.
| |
This makes it possible to avoid looping for a long time while working
with a fast enough peer when data are added to the socket buffer faster
than we are able to read and process them (ticket #1431). This is
basically what we already do on FreeBSD with kqueue, where information
about the number of bytes in the socket buffer is returned by
the kevent() call.
With other event methods rev->available is now set to -1 when the socket
is ready for reading. Later in ngx_recv() and ngx_recv_chain(), if
full buffer is received, real number of bytes in the socket buffer is
retrieved using ioctl(FIONREAD). Reading more than this number of bytes
ensures that even with edge-triggered event methods the event will be
triggered again, so it is safe to stop processing of the socket and
switch to other connections.
Using ioctl(FIONREAD) only after reading a full buffer is an optimization.
With this approach we only call ioctl(FIONREAD) when there are at least
two recv()/readv() calls.
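The "full buffer, then ioctl(FIONREAD)" sequence can be demonstrated
outside nginx. The following is a sketch using a local socket pair; the
function name is hypothetical, and the 10-byte payload and 4-byte buffer
are arbitrary values chosen for the demonstration.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes 10 bytes to one end of a socket pair, reads a full 4-byte
 * buffer from the other end, and then asks the kernel how many bytes
 * are still queued.  Returns that number, or -1 on error.
 */
static int
sketch_pending_after_full_read(void)
{
    int      sv[2], pending = -1;
    char     buf[4];
    ssize_t  n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }

    if (write(sv[0], "0123456789", 10) == 10) {
        n = read(sv[1], buf, sizeof(buf));

        /* Only a completely filled buffer justifies the extra syscall. */
        if (n == (ssize_t) sizeof(buf)
            && ioctl(sv[1], FIONREAD, &pending) == -1)
        {
            pending = -1;
        }
    }

    close(sv[0]);
    close(sv[1]);

    return pending;
}
```

If the read had returned fewer bytes than the buffer holds, the socket
buffer would be known to be empty and no ioctl() call would be needed.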
| |
AIO support in nginx was originally developed against FreeBSD versions 4-6,
where the sival_ptr field was named as sigval_ptr (seemingly by mistake[1]),
which made nginx use the only name available then. The standard-compliant
name was restored in 2005 (first appeared in FreeBSD 7.0, 2008), retaining
compatibility with previous versions[2][3]. In DragonFly, similar changes
were committed in 2009[4], with backward compatibility recently removed[5].
The change switches to the standard name, retaining compatibility with old
FreeBSD versions.
[1] https://svnweb.freebsd.org/changeset/base/48621
[2] https://svnweb.freebsd.org/changeset/base/152029
[3] https://svnweb.freebsd.org/changeset/base/174003
[4] https://gitweb.dragonflybsd.org/dragonfly.git/commit/3693401
[5] https://gitweb.dragonflybsd.org/dragonfly.git/commit/7875042
| |
The previous interface of ngx_open_dir() assumed that the passed directory
name has room for NGX_DIR_MASK at the end (NGX_DIR_MASK_LEN bytes). While all
direct users of ngx_open_dir() followed this interface, this also implied
similar requirements for indirect uses - in particular, via ngx_walk_tree().
Currently none of the ngx_walk_tree() uses provides appropriate space, and
fixing this does not look like the right way to go. Instead, the
ngx_open_dir() interface was changed to not require any additional space and
use appropriate allocations instead.
| |
Previously, "%uA" was used, which corresponds to ngx_atomic_uint_t.
Size of ngx_atomic_uint_t can be easily different from uint64_t,
leading to undefined results.
| |
The bug in question was fixed in glibc 2.3.2 and is no longer expected
to manifest itself on real servers. On the other hand, the workaround
causes compilation problems on various systems. Previously, we've
already fixed the code to compile with musl libc (fd6fd02f6a4d), and
now it is broken on Fedora 28 where glibc's crypt library was replaced
by libxcrypt. So the workaround was removed.
| |
No functional changes.
| |
Previously, capset(2) was called with the 64-bit capabilities version
_LINUX_CAPABILITY_VERSION_3. With this version Linux kernel expected two
copies of struct __user_cap_data_struct, while only one was submitted. As a
result, random stack memory was accessed and random capabilities were requested
by the worker. This sometimes caused capset() errors. Now the 32-bit version
_LINUX_CAPABILITY_VERSION_1 is used instead. This is OK since CAP_NET_RAW is
a 32-bit capability (CAP_NET_RAW = 13).
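As an illustration, the arguments for a version 1 capset() request can be
prepared as below. This is a sketch for Linux only: the function name is
hypothetical, and the system call itself is deliberately left out so that
the example does not change the capabilities of the calling process.

```c
#include <linux/capability.h>
#include <string.h>

/*
 * Fills the header and a single data struct for a request that keeps
 * only CAP_NET_RAW.  Version 1 uses exactly one 32-bit data struct.
 */
static void
sketch_fill_net_raw(struct __user_cap_header_struct *hdr,
    struct __user_cap_data_struct *data)
{
    memset(hdr, 0, sizeof(*hdr));
    memset(data, 0, sizeof(*data));

    hdr->version = _LINUX_CAPABILITY_VERSION_1;
    hdr->pid = 0;                          /* 0 refers to the caller */

    data->effective = 1u << CAP_NET_RAW;   /* bit 13 */
    data->permitted = data->effective;
}
```

With _LINUX_CAPABILITY_VERSION_3 the kernel would read two such data
structs, so passing a pointer to a single one exposes whatever follows it
in memory.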
| |
The previously included file sys/capability.h, mentioned in the capset(2)
man page, belongs to the libcap-dev package, which may not be installed on
some Linux systems when compiling nginx. This prevented the capabilities
feature from being detected and compiled on those systems.
Now linux/capability.h system header is included instead. Since capset()
declaration is located in sys/capability.h, now capset() syscall is defined
explicitly in code using the SYS_capset constant, similarly to other
Linux-specific features in nginx.
| |
The capability is retained automatically in unprivileged worker processes after
changing UID if transparent proxying is enabled at least once in nginx
configuration.
The feature is only available in Linux.
| |
Determine cacheline size at runtime if supported
using sysconf(_SC_LEVEL1_DCACHE_LINESIZE). If it is not supported,
fall back to compile-time defaults.
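A minimal sketch of this detection follows. The function name and the
default of 64 bytes are assumptions made for the example; the sysconf()
name is only used when the C library defines it.

```c
#include <unistd.h>

/* Assumed compile-time default for the sketch. */
#define SKETCH_DEFAULT_CACHELINE  64

static long
sketch_cacheline_size(void)
{
#if defined(_SC_LEVEL1_DCACHE_LINESIZE)
    long  size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);

    /* sysconf() may return 0 or -1 when the value is unknown. */
    if (size > 0) {
        return size;
    }
#endif

    return SKETCH_DEFAULT_CACHELINE;
}
```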
| |
On some systems, it's possible that the reaper of orphaned processes is
set to something other than the "init" process. On such systems, the
binary upgrade procedure did not work.
The fix is to check if PPID has changed, instead of assuming it's
always 1 for orphaned processes.
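The idea can be sketched as follows: remember the parent PID at startup
and later compare it with the current one, instead of comparing with 1.
The function name is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the process has been re-parented since parent_at_start
 * was recorded, whatever PID the new reaper has; 0 otherwise.
 */
static int
sketch_parent_exited(pid_t parent_at_start)
{
    return getppid() != parent_at_start;
}
```

A check of the form "getppid() == 1" gives the wrong answer when a
subreaper or a container init with a different PID adopts the orphan.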
| |
After e284f3ff6831, ngx_crypt() can no longer return NGX_AGAIN.
| |
Found by gcc7 (-Wimplicit-fallthrough).
| |
Previously, the source IP address of a response UDP datagram could differ from
the original datagram destination address. This could happen if the server UDP
socket is bound to a wildcard address and the network interface chosen to output
the response packet has a different default address than the destination address
of the original packet. For example, if two addresses from the same network are
configured on an interface.
Now source address is set explicitly if a response is sent for a server UDP
socket bound to a wildcard address.
| |
This change allows setting the destination IPv6 address of a UDP datagram
received on a wildcard socket.
| |
The ngx_linux_sendfile() function is now used for both normal sendfile()
and sendfile in threads. The ngx_linux_sendfile_thread() function was
modified to use the same interface as ngx_linux_sendfile(), and is simply
called from ngx_linux_sendfile() when threads are enabled.
Special return code NGX_DONE is used to indicate that a thread task was
posted and no further actions are needed.
If number of bytes sent is less than what we were sending, we now always
retry sending. This is needed for sendfile() in threads as the number
of bytes we are sending might have been changed since the thread task
was posted. And this is also needed for Linux 4.3+, as sendfile() might
be interrupted at any time and provides no indication if it was interrupted
or not (ticket #1174).
| |
This has somehow escaped from fbdaad9b0e7b.
| |
The directive configures a timeout to be used when gracefully shutting down
worker processes. When the timer expires, nginx will try to close all
the connections currently open to facilitate shutdown.
| |
There is no need to cancel timers early if there are other timers blocking
shutdown anyway. Preserving such timers allows nginx to continue some
periodic work till the shutdown is actually possible.
With the new approach, timers with ev->cancelable are simply ignored when
checking if there are any timers left during shutdown.
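The check can be modelled over a plain array. This is a sketch only:
sketch_timer_t and sketch_no_timers_left() are hypothetical, and a real
event loop would walk its own timer data structure instead.

```c
#include <stddef.h>

/* Hypothetical timer record with just the fields needed here. */
typedef struct {
    long      expires_msec;
    unsigned  cancelable:1;   /* timer must not block shutdown */
} sketch_timer_t;

/*
 * Returns 1 if shutdown may proceed, that is, every remaining timer is
 * cancelable; 0 if some timer still blocks shutdown.
 */
static int
sketch_no_timers_left(const sketch_timer_t *timers, size_t n)
{
    size_t  i;

    for (i = 0; i < n; i++) {
        if (!timers[i].cancelable) {
            return 0;
        }
    }

    return 1;
}
```

Cancelable timers stay armed and keep firing while at least one other
timer blocks shutdown; they simply do not count towards the decision.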
| |
These messages don't seem to be needed in practice and only make
debugging logs harder to read.
| |
The ngx_chain_coalesce_file() function may produce more bytes to send than
requested in the limit passed, as it aligns the last file position
to send to memory page boundary. As a result, (limit - send) may become
negative. This resulted in big positive number when converted to size_t
while calling ngx_output_chain_to_iovec().
Another part of the problem is in ngx_chain_coalesce_file(): it changes cl
to the next chain link even if the current buffer is only partially sent
due to limit.
Therefore, if a file buffer was not expected to be fully sent due to limit,
and was followed by a memory buffer, nginx called sendfile() with a part
of the file buffer, and the memory buffer in trailer. If there was enough
room in the socket buffer, this resulted in a part of the file buffer being
skipped, and corresponding part of the memory buffer sent instead.
The bug was introduced in 8e903522c17a (1.7.8). Configurations affected
are ones using limits, that is, limit_rate and/or sendfile_max_chunk, and
memory buffers after file ones (may happen when using subrequests or
with proxying with disk buffering).
Fix is to explicitly check if (send < limit) before constructing trailer
with ngx_output_chain_to_iovec(). Additionally, ngx_chain_coalesce_file()
was modified to preserve unfinished file buffers in cl.
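The conversion problem and the guard can be shown in isolation. In this
sketch sketch_trailer_room() is a hypothetical helper, not a function
from nginx; it only illustrates why (send < limit) has to be checked
before the difference is used as a size_t.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Returns how many bytes may still be placed into the trailer.
 * Without the guard, a negative (limit - send) would turn into a very
 * large positive number when converted to size_t.
 */
static size_t
sketch_trailer_room(off_t limit, off_t send)
{
    if (send >= limit) {
        return 0;
    }

    return (size_t) (limit - send);
}
```

With a limit of 100 and 128 bytes already scheduled because of page
alignment, the helper returns 0, while the unguarded conversion would
yield a value close to the maximum of size_t.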
| |
The new parameters "manager_files", "manager_sleep"
and "manager_threshold" were added to proxy_cache_path
and friends.
Note that ngx_path_manager_pt was changed to return ngx_msec_t
instead of time_t (API change).
|