Commit message | Author | Age | Files | Lines
* MEDIUM: udp: implement udp_suspend() and udp_resume() [20201007-proxy-state-7] | Willy Tarreau | 2020-10-07 | 1 | -2/+45
    In the Linux kernel's net/ipv4/udp.c there's a udp_disconnect() function which is called when connecting to AF_UNSPEC, and which unhashes a "connection". This property, which is also documented in connect(2) both in Linux and Open Group's man pages for datagrams, is interesting because it allows reversing a connect(), which is in fact a filter on the source. As such we can suspend a receiver by making it connect to itself, which will cause it not to receive any traffic anymore, letting a new one receive it all, then resume it by breaking this connection.

    This was tested to work well on Linux; other operating systems should also be tested.

    Before this, sending a SIGTTOU to a process having a UDP syslog forwarder would cause this error:

      [WARNING] 280/194249 (3268) : Paused frontend GLOBAL.
      [WARNING] 280/194249 (3268) : Some proxies refused to pause, performing soft stop now.
      [WARNING] 280/194249 (3268) : Proxy GLOBAL stopped (cumulated conns: FE: 0, BE: 0).
      [WARNING] 280/194249 (3268) : Proxy sylog-loadb stopped (cumulated conns: FE: 0, BE: 0).

    With this change, it now proceeds just like with TCP listeners:

      [WARNING] 280/195503 (3885) : Paused frontend GLOBAL.
      [WARNING] 280/195503 (3885) : Paused frontend sylog-loadb.

    And SIGTTIN also works:

      [WARNING] 280/195507 (3885) : Resumed frontend GLOBAL.
      [WARNING] 280/195507 (3885) : Resumed frontend sylog-loadb.

    On Linux this also works with TCP listeners (which can then be resumed using listen()) and established TCP sockets (which we currently kill using setsockopt(so_linger)), both not being portable to other OSes. UNIX sockets and ABNS sockets do not support it however (connect always fails). This needs to be further explored to see if other OSes might benefit from this to perform portable and reliable resets, particularly on the backend side.
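The connect-to-self / connect-to-AF_UNSPEC trick described above can be sketched with plain socket calls. This is only an illustration, not the code from the patch: the helper names (sketch_udp_bind_loopback, sketch_udp_suspend, sketch_udp_resume) are hypothetical, and the behaviour is assumed to be that of Linux as described in the commit message.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to 127.0.0.1:<port>.
 * Returns the fd, or -1 on error. */
int sketch_udp_bind_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: "suspend" a bound datagram socket by connecting it
 * to its own address, so that the source filter rejects other senders.
 * Returns 0 on success, -1 on error. */
int sketch_udp_suspend(int fd)
{
    struct sockaddr_storage self;
    socklen_t len = sizeof(self);

    if (getsockname(fd, (struct sockaddr *)&self, &len) < 0)
        return -1;
    return connect(fd, (struct sockaddr *)&self, len);
}

/* Hypothetical helper: "resume" the socket by breaking the association,
 * which is done by connecting to an AF_UNSPEC address. */
int sketch_udp_resume(int fd)
{
    struct sockaddr sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_family = AF_UNSPEC;
    return connect(fd, &sa, sizeof(sa));
}
```

A caller would bind once, call sketch_udp_suspend() when a new process has to take over the traffic, and call sketch_udp_resume() if the old process has to take it back.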
* MEDIUM: proxy: make soft_stop() stop most listeners using protocol_stop_now() | Willy Tarreau | 2020-10-07 | 1 | -29/+7
    One difficulty in soft-stopping is to make sure not to forget unlisted listeners. By first doing a pass using protocol_stop_now() we catch the vast majority of them. The few remaining ones are the ones belonging to a proxy having a grace period. For these ones, the proxy will arm its stop_time timer and emit a log message.

    Since neither UDP listeners nor peers use the grace period, we can already get rid of the special cases there since we know they will have been stopped by the protocols.
* MINOR: protocol: add protocol_stop_now() to instant-stop listeners | Willy Tarreau | 2020-10-07 | 2 | -0/+28
    This will instantly stop all listeners except those which belong to a proxy configured with a grace time. This means that UDP listeners and peers will also be stopped when called this way.
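As a rough illustration of the rule stated above (stop everything except listeners whose proxy has a grace time), here is a minimal sketch. The structures and the function sketch_stop_now() are hypothetical stand-ins, not HAProxy's real types or the actual protocol_stop_now() implementation.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a proxy and a listener */
struct sk_proxy {
    int grace;                  /* non-zero when a grace time is configured */
};

struct sk_listener {
    struct sk_proxy *px;        /* owning proxy, may be NULL */
    int stopped;                /* 1 once the listener was stopped */
    struct sk_listener *next;   /* next listener in the list */
};

/* Stops every listener in the list except those whose proxy has a grace
 * time. Returns the number of listeners stopped by this call. */
int sketch_stop_now(struct sk_listener *head)
{
    struct sk_listener *li;
    int stopped = 0;

    for (li = head; li; li = li->next) {
        if (li->px && li->px->grace)
            continue;           /* left to the proxy's own stop timer */
        if (!li->stopped) {
            li->stopped = 1;
            stopped++;
        }
    }
    return stopped;
}
```

Calling it a second time stops nothing more, which matches the idea of a first "catch most of them" pass followed by per-proxy handling of the remaining ones.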
* MEDIUM: proxy: centralize proxy status update and reporting | Willy Tarreau | 2020-10-07 | 3 | -14/+38
    There are multiple ways a proxy may switch to the disabled state, but now it's essentially once it loses its last listener. Instead of keeping duplicate code around and reporting the state change before actually seeing it, we now report it at the moment it's performed (from the last listener leaving), which allows removing the message from all other places.
* MEDIUM: proxy: add mode PR_MODE_PEERS to flag peers frontends | Willy Tarreau | 2020-10-07 | 3 | -1/+4
    For now we cannot easily distinguish a peers frontend from another one, which will be problematic to avoid reporting them when stopping their listeners. Let's add PR_MODE_PEERS for this. It's not supposed to cause any issue since all non-HTTP proxies are handled similarly now.
* MEDIUM: proxy: make stop_proxy() now use stop_listener() | Willy Tarreau | 2020-10-07 | 1 | -22/+2
    The function will stop the listeners using this method, which in turn will ping back once it finishes disabling the proxy.
* MINOR: listeners: add a new stop_listener() function | Willy Tarreau | 2020-10-07 | 2 | -0/+72
    This function will be used to definitely stop a listener (e.g. during a soft_stop). This is actually tricky because it may be called for a proxy or for a protocol, both of which require locks and already hold some. The function takes booleans indicating which ones are already held, hoping this will be enough. It's not well defined whether proto->disable() and proto->rx_disable() are supposed to be called with any lock held, and they are used from do_unbind_listener() with all these locks. Some back annotations ought to be added on this point.

    The proxy's listeners count is updated, and the proxy is marked as disabled and woken up after the last one is gone.
* MINOR: listeners: split delete_listener() in two versions | Willy Tarreau | 2020-10-07 | 2 | -7/+15
    We'll need an already locked variant of this function so let's make __delete_listener() which will be called with the protocol lock held and the listener's lock held.
* MEDIUM: listeners: now use the listener's ->enable/disable | Willy Tarreau | 2020-10-07 | 1 | -7/+8
    At each place we used to manipulate the FDs directly we can now call the listener protocol's enable/disable/rx_enable/rx_disable depending on whether the state changes on the listener or the receiver. One exception currently remains in listener_accept() which is a bit special and which should be split into 2 or 3 parts in the various protocol layers.

    The test of fd_updt in do_unbind_listener() that was added by commit a51885621 ("BUG/MEDIUM: listeners: Don't call fd_stop_recv() if fd_updt is NULL.") could finally be removed since that part is correctly handled in the low-level disable() function.

    One disable() was added in resume_listener() before switching to LI_FULL because rx_resume() enables polling on the FD for the receiver while we want to disable it if the listener is full. There are different ways to clean this up in the future. One of them could be to consider that TCP receivers only act at the listener level. But in fact it does not reflect reality. The reality is that only the receiver is paused and that the listener's state ought not be affected here.

    Ultimately the resume_listener() function should be split so that the part controlled by the protocols only acts on the receiver, and that the receiver itself notifies the upper listener about the change so that the listener protocol may decide to disable or enable polling. Conversely the listener should automatically update its receiver when they share the same state. Since there is no harm proceeding like this, let's keep this for now.
* MINOR: protocol: add a new pair of enable/disable methods for listeners | Willy Tarreau | 2020-10-07 | 5 | -0/+94
    These methods will be used to enable/disable accepting new connections so that listeners do not play with FDs directly anymore. Since all the currently supported protocols work on sockets for now, these are identical to the rx_enable/rx_disable functions. However they were not defined in sock.c since it's likely that some will quickly start to differ. At the moment they're not used.

    We have to take care of fd_updt before calling fd_{want,stop}_recv() because it's allocated fairly late in the boot process and some such functions may be called very early (e.g. to stop a disabled frontend's listeners).
* MINOR: protocol: add a new pair of rx_enable/rx_disable methods | Willy Tarreau | 2020-10-07 | 5 | -0/+15
    These methods will be used to enable/disable rx at the receiver level so that callers don't play with FDs directly anymore. All our protocols use the generic ones from sock.c at the moment. For now they're not used.
* MINOR: sock: provide a set of generic enable/disable functions | Willy Tarreau | 2020-10-07 | 2 | -0/+20
    These will be used on receivers, to enable or disable receiving on a listener, which most of the time just consists in enabling/disabling the file descriptor.

    We have to take care of the existence of fd_updt to know whether or not we may call fd_{want,stop}_recv() since it's not permitted in very early boot.
* MINOR: listener: use the protocol's ->rx_resume() method when available | Willy Tarreau | 2020-10-07 | 1 | -3/+2
    Instead of calling listen() for IPPROTO_TCP in resume_listener(), let's call the protocol's ->rx_resume() method when defined, which does the same. This removes another hard-dependency on the fd and underlying protocol from the generic functions.
* MINOR: protocol: implement an ->rx_resume() method | Willy Tarreau | 2020-10-07 | 2 | -2/+24
    This one undoes ->rx_suspend(), it tries to restore an operational socket. It was only implemented for TCP since it's the only one we support right now.
* MINOR: protocol: replace ->pause(listener) with ->rx_suspend(receiver) | Willy Tarreau | 2020-10-07 | 8 | -44/+50
    The ->pause method is inappropriate since it doesn't exactly "pause" a listener but rather temporarily disables it so that it's not visible at all to let another process take its place. The term "suspend" is more suitable, since the "pause" is actually what we'll need to apply to the FULL and LIMITED states which really need to make a pause in the accept process. And it goes well with the use of the "resume" function that will also need to be made per-protocol.

    Let's rename the function and make it act on the receiver since it's already what it essentially does, hence the prefix "_rx" to make it more explicit.

    The protocol struct was a bit reordered because it was becoming a real mess between the parts related to the listeners and those for the receivers.
* MINOR: protocol: rename the ->listeners field to ->receivers | Willy Tarreau | 2020-10-07 | 7 | -32/+32
    Since the listeners were split into receiver+listener, this field ought to have been renamed because it's confusing. It really links receivers and not listeners, as most of the time it's used via rx.proto_list! The nb_listeners field was updated accordingly.
* CLEANUP: listeners: remove the now unused enable_all_listeners() | Willy Tarreau | 2020-10-07 | 2 | -26/+0
    It's not used anymore since the previous commit. The good thing is that no listener function directly acts on a protocol anymore.
* CLEANUP: protocol: remove the ->enable_all method | Willy Tarreau | 2020-10-07 | 5 | -9/+1
    It's not used anymore, now the listeners are enabled from protocol_enable_all().
* MINOR: protocol: directly call enable_listener() from protocol_enable_all() | Willy Tarreau | 2020-10-07 | 1 | -8/+6
    protocol_enable_all() calls proto->enable_all() for all protocols, which is always equal to enable_all_listeners() which in turn simply is a generic loop calling enable_listener() always returning ERR_NONE. Let's clean this madness by first calling enable_listener() directly from protocol_enable_all().
* MINOR: listeners: export enable_listener() | Willy Tarreau | 2020-10-07 | 2 | -1/+9
    We'll soon call it from outside.
* CLEANUP: listeners: remove unused disable_listener and disable_all_listeners | Willy Tarreau | 2020-10-07 | 2 | -41/+0
    These ones have never been called, they were referenced by the protocol's disable_all for some protocols but there are no traces of their use, so in addition to not being sure the code works, it has never been tested. Let's remove a bit of complexity starting from there.
* CLEANUP: protocol: remove the ->disable_all method | Willy Tarreau | 2020-10-07 | 4 | -23/+0
    This one has never been used, is only referenced by proto_uxst and proto_sockpair, and it's not even certain it works at all. Let's get rid of it.
* MINOR: listeners: move fd_stop_recv() to the receiver's socket code | Willy Tarreau | 2020-10-07 | 3 | -2/+4
    fd_stop_recv() has nothing to do in the generic listener code, it's per protocol as some don't need it. For instance with abns@ it could even lead to fd_stop_recv(-1). And later with QUIC we don't want to touch the fd at all! It used to be that, since commit f2cb169487 delegating fd manipulation to their respective threads, it wasn't possible to call it down there, but it's not the case anymore, so let's perform the action in the protocol-specific code.
* MINOR: listeners: correctly report pause() errors | Willy Tarreau | 2020-10-07 | 1 | -3/+2
    By using the same "ret" variable in the "if" block to test the return value of pause(), the second one shadows the first one and when forcing the result to zero in case of an error, it doesn't do anything. In practice this is not really used so we don't mind but it's dirty.

    The test on ==0 is wrong too since technically speaking a total stop validates the need for a pause, but stops the listener so it's just the resume that won't work anymore. We could switch to stopped but it's an involuntary switch and the user will not know. Better then mark it as paused and let the resume continue to fail so that only the resume will eventually report an error (e.g. abns@).
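The shadowing problem described above is easy to reproduce in isolation. The sketch below is not the original pause_listener() code; the function names are hypothetical and the convention (1 for success, 0 for failure) is an assumption made for the example.

```c
/* Hypothetical stand-in for a protocol pause method that fails */
static int sketch_failing_pause(void)
{
    return -1;
}

/* Buggy variant: the inner "ret" shadows the outer one, so forcing it to
 * zero on error never reaches the value that is actually returned. */
int sketch_pause_shadowed(int has_pause_method)
{
    int ret = 1;                          /* assume success by default */

    if (has_pause_method) {
        int ret = sketch_failing_pause(); /* new variable, hides the outer one */

        if (ret < 0)
            ret = 0;                      /* only modifies the inner copy */
    }
    return ret;                           /* the error is lost here */
}

/* Fixed variant: a single variable carries the result to the caller */
int sketch_pause_fixed(int has_pause_method)
{
    int ret = 1;

    if (has_pause_method && sketch_failing_pause() < 0)
        ret = 0;                          /* the failure is now reported */
    return ret;
}
```

Building with a warning such as GCC's or Clang's -Wshadow flags the inner declaration in the buggy variant.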
* CLEANUP: proxy: remove the now unused pause_proxies() and resume_proxies() | Willy Tarreau | 2020-10-07 | 2 | -69/+0
    They're not used anymore, delete them before someone thinks about using them again!
* MAJOR: signals: use protocol_pause_all() and protocol_resume_all() | Willy Tarreau | 2020-10-07 | 1 | -2/+11
    When temporarily pausing the listeners with SIGTTOU, we now pause all listeners via the protocols instead of the proxies. This has the benefit that listeners are paused regardless of whether or not they belong to a visible proxy. And for resuming via SIGTTIN we do the same, which allows reporting binding conflicts and addressing them, since the operation can be repeated on a per-listener basis instead of a per-proxy basis.

    While in appearance all cases were properly handled, it's impossible to completely rule out the possibility that something broken used to work by luck due to the scan ordering which is naturally different, hence the major tag.
* MINOR: protocol: introduce protocol_{pause,resume}_all() | Willy Tarreau | 2020-10-07 | 2 | -0/+58
    These two functions are used to pause and resume all listeners of all protocols. They use the standard listener functions for this so they're supposed to handle the situation gracefully regardless of the upper proxies' states, and they will report completion on proxies once the switch is performed.

    It might be nice to define a particular "failed" state for listeners that cannot resume and to count them on proxies in order to mention that they're definitely stuck. On the other hand, the current situation is retryable which is quite appreciable as well.
* MEDIUM: listener/proxy: make the listeners notify about proxy pause/resume | Willy Tarreau | 2020-10-07 | 2 | -7/+15
    Till now, we used to call pause_proxy()/resume_proxy() to enable/disable processing on a proxy, which is used during soft reloads. But since we want to drive this process from the listeners themselves, we have to instead proceed the other way around so that when we enable/disable a listener, it checks if it changed anything for the proxy and notifies about updates at this level.

    The detection is made using li_ready=0 for pause(), and li_paused=0 for resume(). Note that we must not include any test for li_bound because this state is seen by processes which share the listener with another one and which must not act on it since the other process will do it. As such the socket behind the FD will automatically be paused and resumed without its local state changing, but this is the limit of a multi-process system with shared listeners.
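The detection rule above (li_ready reaching 0 on pause, li_paused reaching 0 on resume) can be sketched as follows. Only the li_ready and li_paused names come from the commit message; the structure, the report counters and the function names are hypothetical and merely stand in for the real notification.

```c
/* Hypothetical, simplified proxy holding per-state listener counters */
struct sk_px {
    int li_ready;        /* listeners currently ready */
    int li_paused;       /* listeners currently paused */
    int pause_reports;   /* times "paused" would have been reported */
    int resume_reports;  /* times "resumed" would have been reported */
};

/* Called when one ready listener of <px> has just been paused */
void sketch_listener_paused(struct sk_px *px)
{
    px->li_ready--;
    px->li_paused++;
    if (px->li_ready == 0)
        px->pause_reports++;   /* last ready listener gone: proxy is paused */
}

/* Called when one paused listener of <px> has just been resumed */
void sketch_listener_resumed(struct sk_px *px)
{
    px->li_paused--;
    px->li_ready++;
    if (px->li_paused == 0)
        px->resume_reports++;  /* no paused listener left: proxy is resumed */
}
```

With two listeners, the proxy-level report only fires on the second pause and on the second resume, which is the "notify on the transition" behaviour the message describes.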
* MINOR: listeners: check the current listener earlier state in resume_listener() | Willy Tarreau | 2020-10-07 | 1 | -3/+3
    It's quite confusing to have the test on LI_READY very low in the function as it should be made much earlier. Just like with the previous commit, let's do it when entering. The additional states, however (limited, full), continue to go through the whole function.
* MINOR: listeners: check the current listener state in pause_listener() | Willy Tarreau | 2020-10-07 | 1 | -0/+3
    It's better not to try to perform pause() actions on wrong states, so let's check this and make sure that all callers are now safe.
* MEDIUM: proxy: merge zombify_proxy() with stop_proxy() | Willy Tarreau | 2020-10-07 | 3 | -36/+15
    The two functions don't need to be distinguished anymore since they have all the necessary info to act as needed on their listeners. Let's just pass via stop_proxy() and make it check for each listener which one to close or not.
* MEDIUM: proxy: remove start_proxies() | Willy Tarreau | 2020-10-07 | 26 | -197/+4
    Its sole remaining purpose was to display "proxy foo started", which has little benefit and pollutes output for those with plenty of proxies. Let's remove it now.

    The VTCs were updated to reflect this, because many of them had explicit counts of dropped lines to match this message.

    This is tagged as MEDIUM because some users may be surprised by the loss of this quite old message.
* MEDIUM: proxy: replace proxy->state with proxy->disabled | Willy Tarreau | 2020-10-07 | 18 | -58/+46
    The remaining proxy states were only used to distinguish an enabled proxy from a disabled one. Due to the initialization order, both PR_STNEW and PR_STREADY were equivalent after startup, and they would only differ from PR_STSTOPPED when the proxy is disabled or shut down (which is effectively another way to disable it).

    Now we just have a "disabled" field which allows distinguishing them. It's becoming obvious that start_proxies() is only used to print a greeting message now, which we'd rather get rid of. zombify_proxy() and stop_proxy() should probably be merged once their differences move to the right place.
* CLEANUP: peers: don't use the PR_ST* states to mark enabled/disabled | Willy Tarreau | 2020-10-07 | 3 | -8/+8
    The enabled/disabled config options were stored into a "state" field that is an integer but contained only PR_STNEW or PR_STSTOPPED, which is a bit confusing, and causes a dependency with proxies. This was renamed to "disabled" and is used as a boolean. The field was also moved to the end of the struct to stop creating a hole and fill another one.
* MINOR: startup: don't rely on PR_STNEW to check for listeners | Willy Tarreau | 2020-10-07 | 1 | -1/+1
    Instead of looking at listeners in proxies in PR_STNEW state, we'd rather check for listeners in those not in PR_STSTOPPED as it's only this state which indicates the proxy was disabled. And let's check the listeners count instead of testing the list's head.
* MEDIUM: proxy: remove state PR_STPAUSED | Willy Tarreau | 2020-10-07 | 2 | -11/+6
    This state was used to mention that a proxy was in PAUSED state, as opposed to the READY state. This was causing some trouble because if a listener failed to resume (e.g. because its port was temporarily in use during the resume), it was not possible to retry the operation later. Now by checking the number of READY or PAUSED listeners instead, we can accurately know if something went bad and try to fix it again later. The case of the temporary port conflict during resume now works well:

      $ socat readline /tmp/sock1
      prompt
      > disable frontend testme3
      > disable frontend testme3
      All sockets are already disabled.
      > enable frontend testme3
      Failed to resume frontend, check logs for precise cause (port conflict?).
      > enable frontend testme3
      > enable frontend testme3
      All sockets are already enabled.
* MEDIUM: proxy: remove the PR_STERROR state | Willy Tarreau | 2020-10-07 | 2 | -10/+5
    This state is only set when a pause() fails but isn't even set when a resume() fails. And we cannot recover from this state. Instead, let's just count remaining ready listeners to decide to emit an error or not. It's more accurate and will better support new attempts if needed.
* MEDIUM: proxy: remove the unused PR_STFULL state | Willy Tarreau | 2020-10-07 | 4 | -12/+4
    Since v1.4 or so, it's almost not possible anymore to set this state. The only exception is by using the CLI to change a frontend's maxconn setting below its current usage. This case makes no sense, and for other cases it doesn't make sense either because "full" is a vague concept when only certain listeners are full and not all. Let's just remove this unused state and make it clear that it's not reported. The "ready" or "open" states will continue to be reported without being misleading as they will be opposed to "stop".
* MINOR: proxy: maintain per-state counters of listeners | Willy Tarreau | 2020-10-07 | 2 | -1/+51
    The proxy state tries to be synthetic but that doesn't work well with many listeners, especially for transition phases or after a failed pause/resume.

    In order to address this, we'll instead rely on counters of listeners in a given state for the 3 major states (ready, paused, listen) and a total counter. We'll now be able to determine a proxy's state by comparing these counters only.
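To make the counter idea concrete, here is a minimal sketch in which every state change goes through one wrapper that keeps the per-state counters consistent, and the proxy status is derived from the counters alone. All type, field and function names are hypothetical, and the status rules (in particular the "partial" case) are an assumption made for illustration, not HAProxy's actual logic.

```c
/* Hypothetical listener states, reduced to what the sketch needs */
enum sk_li_state { SK_LI_INIT, SK_LI_LISTEN, SK_LI_PAUSED, SK_LI_READY };

/* Hypothetical proxy status derived from the counters */
enum sk_px_status { SK_PX_STOPPED, SK_PX_PAUSED, SK_PX_READY, SK_PX_PARTIAL };

struct sk_cnt_proxy {
    int li_all;      /* total listeners, maintained when attaching (not shown) */
    int li_listen;   /* listeners in LISTEN state */
    int li_paused;   /* listeners in PAUSED state */
    int li_ready;    /* listeners in READY state */
};

struct sk_cnt_listener {
    enum sk_li_state state;
    struct sk_cnt_proxy *px;
};

/* Adds <delta> to the proxy counter matching state <st>, if any */
static void sk_account(struct sk_cnt_proxy *px, enum sk_li_state st, int delta)
{
    switch (st) {
    case SK_LI_LISTEN: px->li_listen += delta; break;
    case SK_LI_PAUSED: px->li_paused += delta; break;
    case SK_LI_READY:  px->li_ready  += delta; break;
    default: break;  /* INIT is not counted */
    }
}

/* Single entry point for state changes so that counters never drift */
void sketch_set_state(struct sk_cnt_listener *li, enum sk_li_state st)
{
    sk_account(li->px, li->state, -1);
    li->state = st;
    sk_account(li->px, st, +1);
}

/* Derives a proxy status by comparing counters only (assumed rules) */
enum sk_px_status sketch_proxy_status(const struct sk_cnt_proxy *px)
{
    if (px->li_ready && px->li_paused)
        return SK_PX_PARTIAL;  /* e.g. after a failed pause or resume */
    if (px->li_ready)
        return SK_PX_READY;
    if (px->li_paused)
        return SK_PX_PAUSED;
    return SK_PX_STOPPED;
}
```

Because the status is recomputed from counters, a failed resume leaves the proxy in a visible intermediate situation that a later retry can fix.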
* MINOR: listeners: introduce listener_set_state() | Willy Tarreau | 2020-10-07 | 6 | -22/+31
    This function is used as a wrapper to set a listener's state everywhere. We'll use it later to maintain some counters in a consistent state when switching state so it's capital that all state changes go through it. No functional change was made beyond calling the wrapper.
* CLEANUP: proxy: remove the first_to_listen hack in zombify_proxy() | Willy Tarreau | 2020-10-07 | 1 | -22/+1
    This thing was needed for an optimization used in soft_stop() which doesn't exist anymore, so let's remove it as it's cryptic and hinders the listeners cleanup.
* MINOR: listeners: do not uselessly try to close zombie listeners in soft_stop() | Willy Tarreau | 2020-10-07 | 1 | -15/+0
    The loop doesn't match anymore since the non-started listeners are in LI_INIT and even if it had ever worked the benefit of closing zombies at this point looks void at best.
* MEDIUM: listeners: remove the now unused ZOMBIE state | Willy Tarreau | 2020-10-07 | 6 | -41/+17
    The zombie state is not used anymore by the listeners, because in the last two cases where it was tested it couldn't match as it was covered by the test on the process mask. Instead now the FD is either in the LISTEN state or the INIT state. This also avoids forcing the listener to be single-dimensional because actually belonging to another process isn't totally exclusive with the other states, which explains some of the difficulties requiring to check the proc_mask and the fd sometimes.

    So let's get rid of it now not to be tempted to reuse it.

    The doc on the listeners state was updated.
* MEDIUM: deinit: close all receivers/listeners before scanning proxies | Willy Tarreau | 2020-10-07 | 1 | -14/+31
    Because of the zombie state, proxies have a skewed vision of the state of listeners, which explains why there are hacks switching the state from ZOMBIE to INIT in the proxy cleaning loop. This is particularly complicated and not needed, as all the information is now available in the protocol list and the fdtab.

    What we do here instead is to first close all active listeners or receivers by protocol and clean their protocol parts. Then we scan the fdtab to get rid of remaining ones that were necessarily in INIT state after a previous invocation of delete_listener(). From this point, we know the listeners are cleaned, and they can safely be freed by scanning the proxies.
* MEDIUM: listeners: make unbind_listener() converge if needed | Willy Tarreau | 2020-10-07 | 1 | -6/+8
    The ZOMBIE state on listener is a real mess. Listeners passing through this state have lost their consistency with the proxy AND with the fdtab. Plus this state is not used for all foreign listeners, only for those belonging to a proxy that entirely runs on another process, otherwise it stays in INIT state, which makes the usefulness extremely questionable. But the real issue is that it's impossible to untangle the receivers from the proxy state as long as we have this because of deinit()...

    So what we do here is to start by making unbind_listener() support being called more than once. This will permit to call it again to really close the FD and finish the operations if it's called with an FD that's in a fake state (such as INIT but with a valid fd).
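The "may be called more than once" property described above can be sketched as an idempotent cleanup routine. This is a simplified illustration with hypothetical types and names, not the actual unbind_listener() code: each call finishes whatever is left to do and a further call is harmless.

```c
#include <unistd.h>

/* Hypothetical receiver states, reduced to what the sketch needs */
enum sk_rx_state { SK_RX_INIT, SK_RX_LISTEN };

struct sk_rx {
    enum sk_rx_state state;
    int fd;                /* -1 when no file descriptor is attached */
};

/* Brings <rx> back to a clean INIT state with no FD. Safe to call several
 * times: it also handles the "fake" case of an INIT receiver that still
 * holds a valid fd. Returns 1 if any work was done, 0 otherwise. */
int sketch_unbind(struct sk_rx *rx)
{
    int work = 0;

    if (rx->state != SK_RX_INIT) {
        rx->state = SK_RX_INIT;
        work = 1;
    }
    if (rx->fd >= 0) {
        close(rx->fd);     /* really release the descriptor */
        rx->fd = -1;
        work = 1;
    }
    return work;
}
```

The return value is only there to make the convergence visible: the first call does the remaining work and the next one reports that nothing was left.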
* MEDIUM: init: stop disabled proxies after initializing fdtab | Willy Tarreau | 2020-10-07 | 2 | -2/+11
    During the startup process we don't have any fdtab nor fd_updt for quite a long time, and as such some operations on the listeners are not permitted, such as fd_want_*/fd_stop_* or fd_delete(). The latter is of particular concern because it's used when stopping a disabled frontend, and it's performed very early during check_config_validity() while there is no fdtab yet. The trick till now relies on the listener's state which is a bit brittle.

    There is absolutely no valid reason for stopping a proxy's listeners this early, we can postpone it after init_pollers() which will at least have allocated fdtab.
* MEDIUM: listeners: don't bounce listeners management between queues | Willy Tarreau | 2020-10-07 | 1 | -37/+0
    During 2.1 development, commit f2cb16948 ("BUG/MAJOR: listener: fix thread safety in resume_listener()") was introduced to bounce the enabling/disabling of a listener's FD to one of its threads because the remains of fd_update_cache() were fundamentally incompatible with the need to call fd_want_recv() or fd_stop_recv() for another thread.

    However since then we've totally dropped such code and it's totally safe to use these functions on an FD that is solely used by another thread (this is even used by the FD migration code). The only remaining limitation concerning the wake up delay was addressed by previous commit "MEDIUM: fd: always wake up one thread when enabling a foreing FD".

    The current situation forces the FD management to remain in the pause_listener() and resume_listener() functions just so that it can bounce between threads, without having the ability to delegate it to the suitable protocol layer.

    So let's first remove this now unneeded workaround.
* MEDIUM: fd: always wake up one thread when enabling a foreing FD | Willy Tarreau | 2020-10-07 | 1 | -2/+12
    Since 2.2 it's safe to enable/disable another thread's FD but the fd_wake calls will not immediately be considered because nothing wakes the other threads up. This will have an impact on listeners when deciding to resume them after they were paused, so at a minimum we want to wake up one of their threads, just like the scheduler does on task_kill(). This is what this patch does.
* MEDIUM: log: syslog TCP support on log forward section. | Emeric Brun | 2020-10-07 | 2 | -5/+303
    This patch re-introduces the "bind" statement on log forward sections to handle syslog TCP listeners as defined in RFC 6587.

    As a complement, it introduces the "maxconn", "backlog" and "timeout client" statements to configure those listeners.
* MINOR: channel: new getword and getchar functions on channel. | Emeric Brun | 2020-10-07 | 2 | -0/+76
    This patch adds two new functions to get a char or a word from a channel.