| field | value | date |
|---|---|---|
| author | Jim Schutt <jaschut@sandia.gov> | 2010-12-02 12:41:35 -0700 |
| committer | Sage Weil <sage@newdream.net> | 2010-12-03 09:10:58 -0800 |
| commit | a5297388a7495fa23612d9477537d1f875784ba5 | |
| tree | d0ae2a367a50fc782ea014a719a5783dcb4212a0 /src/msg/tcp.h | |
| parent | 39b42b21e9805b3ec838f8682420166fede719f2 | |
msgr: Correctly handle half-open connections.
If poll() reports that a socket is ready for reading, but a read
returns zero bytes, the peer has sent a FIN. Handle that case.
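A minimal sketch of the check described above. It borrows the name tcp_read_wait() from the prototype in src/msg/tcp.h, but the body and the return convention (0 when data can be read, -1 on timeout, error, or FIN) are assumptions for illustration, not Ceph's implementation in src/msg/tcp.cc. After poll() reports POLLIN, the sketch peeks one byte with recv(..., MSG_PEEK): a return of 0 means the peer has sent a FIN, so the half-open connection is reported as a failure instead of as readable.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sketch: return 0 if at least one byte can be read from sd,
// -1 on timeout, socket error, or if the peer has sent a FIN.
int tcp_read_wait(int sd, int timeout) {
  struct pollfd pfd = {};
  pfd.fd = sd;
  pfd.events = POLLIN;

  int r;
  do {
    r = poll(&pfd, 1, timeout);  // timeout in ms; -1 waits forever
  } while (r < 0 && errno == EINTR);
  if (r <= 0)
    return -1;  // 0: timed out, <0: poll() failed

  if (pfd.revents & (POLLERR | POLLNVAL))
    return -1;  // socket error or invalid descriptor

  // POLLIN is also reported when the only thing pending is EOF.
  // Peek at one byte without consuming it to tell the two cases apart.
  char c;
  ssize_t n;
  do {
    n = recv(sd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
  } while (n < 0 && errno == EINTR);
  if (n <= 0)
    return -1;  // 0: peer sent FIN (half-open); <0: recv() error
  return 0;     // data is available; the peeked byte is still queued
}
```

Because MSG_PEEK leaves the byte in the receive buffer, a following read() still sees it. On Linux, adding POLLRDHUP to the requested events is another way to detect the FIN, but it is not portable.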
One way the incorrect handling manifested is as follows.
Under a heavy write load, clients log many messages like these:
```
[19021.523192] libceph: tid 876 timed out on osd6, will reset osd
[19021.523328] libceph: tid 866 timed out on osd10, will reset osd
[19081.616032] libceph: tid 841 timed out on osd0, will reset osd
[19081.616121] libceph: tid 826 timed out on osd2, will reset osd
[19081.616176] libceph: tid 806 timed out on osd3, will reset osd
[19081.616226] libceph: tid 875 timed out on osd9, will reset osd
[19081.616275] libceph: tid 834 timed out on osd12, will reset osd
[19081.616326] libceph: tid 874 timed out on osd10, will reset osd
```
After the clients have finished writing, when the file system should
be quiet, the osd hosts still have a high load with many running threads:
```
$ ps u -C cosd
USER   PID %CPU %MEM     VSZ    RSS TTY STAT START   TIME COMMAND
root  1383  162 11.5 1456248 943224 ?   Ssl  11:31 406:59 /usr/bin/cosd -i 7 -c /etc/ceph/ceph.conf
$ for p in `ps -C cosd -o pid --no-headers`; do grep -nH State /proc/$p/task/*/status | grep -v sleep; done
/proc/1383/task/10702/status:2:State: R (running)
/proc/1383/task/10710/status:2:State: R (running)
/proc/1383/task/10717/status:2:State: R (running)
/proc/1383/task/11396/status:2:State: R (running)
/proc/1383/task/27111/status:2:State: R (running)
/proc/1383/task/27117/status:2:State: R (running)
/proc/1383/task/27162/status:2:State: R (running)
/proc/1383/task/27694/status:2:State: R (running)
/proc/1383/task/27704/status:2:State: R (running)
/proc/1383/task/27728/status:2:State: R (running)
```
With this fix applied, a heavy load still causes clients to reset
many osds, but no runaway threads result.
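The runaway threads are consistent with a read loop that treats a zero-byte read as "no data yet" and goes back to poll(): once the peer has sent a FIN, poll() keeps reporting the socket as readable, so such a loop never blocks and burns CPU. That is an inference; the old loop is not shown here. The sketch below is a hypothetical body for the tcp_read(int sd, char *buf, int len, int timeout=-1) prototype declared in src/msg/tcp.h. Returning len on success and -1 on failure is an assumption; the key point is that it returns as soon as read() returns 0.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sketch, not Ceph's implementation. Reads exactly len bytes.
// Returns len on success, or -1 on timeout, error, or peer shutdown (FIN).
int tcp_read(int sd, char *buf, int len, int timeout = -1) {
  int got = 0;
  while (got < len) {
    struct pollfd pfd = {};
    pfd.fd = sd;
    pfd.events = POLLIN;
    int r = poll(&pfd, 1, timeout);
    if (r < 0 && errno == EINTR)
      continue;  // interrupted by a signal; wait again
    if (r <= 0)
      return -1;  // timed out or poll() failed

    ssize_t n = read(sd, buf + got, len - got);
    if (n < 0) {
      if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
        continue;  // transient condition; poll again
      return -1;
    }
    if (n == 0) {
      // Peer sent FIN. poll() will report POLLIN on every later call, so
      // looping here instead of returning would spin at 100% CPU.
      return -1;
    }
    got += static_cast<int>(n);
  }
  return got;
}
```

For example, if the peer sends two bytes and then closes while the caller asks for four, this version consumes the two bytes and then returns -1 instead of polling forever.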
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
Diffstat (limited to 'src/msg/tcp.h')
-rw-r--r-- src/msg/tcp.h | 2 +-
1 files changed, 1 insertions, 1 deletions
```diff
diff --git a/src/msg/tcp.h b/src/msg/tcp.h
index 31ae967747b..bccdbda213d 100644
--- a/src/msg/tcp.h
+++ b/src/msg/tcp.h
@@ -26,7 +26,7 @@ inline ostream& operator<<(ostream& out, const sockaddr_storage &ss)
 }
 
 extern int tcp_read(int sd, char *buf, int len, int timeout=-1);
-extern int tcp_wait(int sd, int timeout);
+extern int tcp_read_wait(int sd, int timeout);
 extern int tcp_read_nonblocking(int sd, char *buf, int len);
 extern int tcp_write(int sd, const char *buf, int len);
```