author: Jim Schutt <jaschut@sandia.gov> 2010-12-02 12:41:35 -0700
committer: Sage Weil <sage@newdream.net> 2010-12-03 09:10:58 -0800
commit: a5297388a7495fa23612d9477537d1f875784ba5
tree: d0ae2a367a50fc782ea014a719a5783dcb4212a0
parent: 39b42b21e9805b3ec838f8682420166fede719f2
msgr: Correctly handle half-open connections.

If poll() says a socket is ready for reading, but zero bytes
are read, that means that the peer has sent a FIN.  Handle that.

One way the incorrect handling was manifesting is as follows:

Under a heavy write load, clients log many messages like this:

[19021.523192] libceph:  tid 876 timed out on osd6, will reset osd
[19021.523328] libceph:  tid 866 timed out on osd10, will reset osd
[19081.616032] libceph:  tid 841 timed out on osd0, will reset osd
[19081.616121] libceph:  tid 826 timed out on osd2, will reset osd
[19081.616176] libceph:  tid 806 timed out on osd3, will reset osd
[19081.616226] libceph:  tid 875 timed out on osd9, will reset osd
[19081.616275] libceph:  tid 834 timed out on osd12, will reset osd
[19081.616326] libceph:  tid 874 timed out on osd10, will reset osd

After the clients are done writing and the file system should
be quiet, osd hosts have a high load with many active threads:

$ ps u -C cosd
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1383  162 11.5 1456248 943224 ?      Ssl  11:31 406:59 /usr/bin/cosd -i 7 -c /etc/ceph/ceph.conf

$ for p in `ps -C cosd -o pid --no-headers`; do grep -nH State /proc/$p/task/*/status | grep -v sleep; done
/proc/1383/task/10702/status:2:State:   R (running)
/proc/1383/task/10710/status:2:State:   R (running)
/proc/1383/task/10717/status:2:State:   R (running)
/proc/1383/task/11396/status:2:State:   R (running)
/proc/1383/task/27111/status:2:State:   R (running)
/proc/1383/task/27117/status:2:State:   R (running)
/proc/1383/task/27162/status:2:State:   R (running)
/proc/1383/task/27694/status:2:State:   R (running)
/proc/1383/task/27704/status:2:State:   R (running)
/proc/1383/task/27728/status:2:State:   R (running)

With this fix applied, a heavy load still causes many client
resets of osds, but no runaway threads result.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>

Diffstat (limited to 'src/msg/tcp.h')
-rw-r--r--  src/msg/tcp.h  2
1 files changed, 1 insertions, 1 deletions
diff --git a/src/msg/tcp.h b/src/msg/tcp.h
index 31ae967747b..bccdbda213d 100644
--- a/src/msg/tcp.h
+++ b/src/msg/tcp.h
@@ -26,7 +26,7 @@ inline ostream& operator<<(ostream& out, const sockaddr_storage &ss)
 }
 extern int tcp_read(int sd, char *buf, int len, int timeout=-1);
-extern int tcp_wait(int sd, int timeout);
+extern int tcp_read_wait(int sd, int timeout);
 extern int tcp_read_nonblocking(int sd, char *buf, int len);
 extern int tcp_write(int sd, const char *buf, int len);
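
The header hunk above shows only the rename of tcp_wait() to tcp_read_wait(); the handling described in the commit message lives outside this file. As a rough illustration of that idea, and not the actual Ceph implementation, the sketch below treats "poll() reports readable, but the read returns zero bytes" as a closed peer and returns an error instead of polling again. The names example_read_wait and example_read, their return conventions, and the millisecond timeout are assumptions made for this example.

```cpp
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not Ceph's tcp_read_wait): block until sd is readable.
// Returns 0 if a read can be attempted, -1 on timeout or error.
int example_read_wait(int sd, int timeout_ms) {
  struct pollfd pfd;
  pfd.fd = sd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  int r;
  do {
    r = poll(&pfd, 1, timeout_ms);
  } while (r < 0 && errno == EINTR);

  if (r <= 0)
    return -1;  // timeout (0) or poll failure (<0)
  if (pfd.revents & (POLLERR | POLLNVAL))
    return -1;  // socket error or invalid descriptor
  return 0;     // POLLIN and/or POLLHUP: a read will not block
}

// Hypothetical helper (not Ceph's tcp_read): read exactly len bytes.
// Returns 0 on success, -1 on error, timeout, or when the peer has closed.
int example_read(int sd, char *buf, int len, int timeout_ms) {
  while (len > 0) {
    if (example_read_wait(sd, timeout_ms) < 0)
      return -1;

    ssize_t got = recv(sd, buf, len, MSG_DONTWAIT);
    if (got == 0)
      return -1;  // readable but zero bytes: the peer sent a FIN, so stop
    if (got < 0) {
      if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
        continue;  // transient condition: wait and try again
      return -1;
    }
    buf += got;
    len -= static_cast<int>(got);
  }
  return 0;
}
```

Without the got == 0 check, a loop of this shape would keep seeing the socket as readable, read nothing, and poll again without ever sleeping, which is consistent with the runaway threads shown in the commit message.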