[nSLUG] SSH - Dropped connections

Dop Ganger nslug at fop.ns.ca
Sat Jan 17 08:49:27 AST 2004


On Sat, 17 Jan 2004 bdavidso at supercity.ns.ca wrote:

[snip]
> I changed a few network parameters on the servers and the client, as
> follows, and things have been much better:
>
> ~# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
> 15
> ~# cat /proc/sys/net/ipv4/tcp_keepalive_time
> 300
> ~# cat /proc/sys/net/ipv4/tcp_keepalive_probes
> 1
[snip]
> I don't claim to be a tcp/ip guru; these were settings suggested for a
> piece of commercial software, and I found that the server using those
> settings didn't lose connections as often as others.  I will leave it to
> others smarter than me (Peter? Dop?) to explain why it works; I just know
> that it does work.

Awww... I feel all special now ;->

The default values are:

dop at fop:/proc/sys/net/ipv4$ cat tcp_keepalive_intvl
75
dop at fop:/proc/sys/net/ipv4$ cat tcp_keepalive_time
7200
dop at fop:/proc/sys/net/ipv4$ cat tcp_keepalive_probes
9

What these settings are doing is forcing the TCP layer to send traffic
on idle connections far more often. The NAT device (or NAPT, or SNAT, or
whatever acronym you're using) watches connections, and when no traffic
has gone over a connection for a certain period of time, the connection
is considered closed and the NAT device throws away its entry for it.
From then on, packets on that connection no longer get translated, so
the link to the remote end is effectively cut. This is done because
otherwise the connection table would fill up with dead connections if
local clients crashed without tearing down their connections (for
example, some legacy operating systems have what's called a "blue screen
of death", requiring a power cycle and thus implicitly closing the
connection without explicitly closing it).
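The practical upshot is that the first keepalive probe has to go out before the NAT device's idle timer expires, i.e. tcp_keepalive_time has to be lower than the NAT box's idle timeout. A minimal sketch of that check follows, assuming you know (or can guess) the timeout of your particular NAT device, since it can't be read from the Linux host; check_nat_timeout is a made-up helper name.

```shell
#!/bin/sh
# Sketch: warn if TCP keepalives would start too late to keep an idle
# connection's NAT entry alive.
# check_nat_timeout is a hypothetical helper. Argument 1 is the NAT
# device's idle timeout in seconds (you have to supply it); argument 2 is
# the keepalive idle time, defaulting to the running kernel's value.
check_nat_timeout() {
    nat_timeout=$1
    ka_time=${2:-$(cat /proc/sys/net/ipv4/tcp_keepalive_time)}
    if [ "$ka_time" -lt "$nat_timeout" ]; then
        echo "ok: first probe after ${ka_time}s, NAT gives up after ${nat_timeout}s"
        return 0
    else
        echo "too late: first probe after ${ka_time}s, NAT gives up after ${nat_timeout}s"
        return 1
    fi
}
```

For a NAT box that forgot idle connections after an hour (a made-up figure), `check_nat_timeout 3600` with the stock 7200 second setting would report "too late", while `check_nat_timeout 3600 300` reports "ok".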

As to the settings themselves: tcp_keepalive_intvl isn't documented, but
reading the code it appears to be (as the name implies) the interval
between TCP keepalive probes, measured in seconds (or, more accurately,
multiplied by HZ and then counted off by the TCP timer); it only comes
into play once a probe has gone out and not been answered.
tcp_keepalive_time is the time in seconds (ditto HZ) a connection has to
sit idle before the first keepalive probe is sent; a connection that is
carrying traffic never gets probed at all. tcp_keepalive_probes is the
number of unanswered keepalive probes sent out before the connection is
considered dead. All of this only applies to sockets that have keepalive
turned on (SO_KEEPALIVE).
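Putting the three together: if the other end has really gone away, the first probe goes out tcp_keepalive_time seconds after the last traffic, further probes follow every tcp_keepalive_intvl seconds, and the connection is dropped at time + intvl * probes. The rough sketch below prints that timeline; keepalive_timeline is a made-up helper, and the arithmetic only approximates the kernel's timer (it ignores timer granularity and any data still waiting to be sent).

```shell
#!/bin/sh
# Sketch: print when keepalive probes go out on an idle connection whose
# peer never answers, and when the kernel gives up on it.
# keepalive_timeline is a hypothetical helper. Arguments: idle time (s),
# probe interval (s), probe count; each defaults to the kernel's value.
keepalive_timeline() {
    p=/proc/sys/net/ipv4
    ka_time=${1:-$(cat $p/tcp_keepalive_time)}
    ka_intvl=${2:-$(cat $p/tcp_keepalive_intvl)}
    ka_probes=${3:-$(cat $p/tcp_keepalive_probes)}
    i=1
    # Probe i is sent (i - 1) intervals after the first one.
    while [ "$i" -le "$ka_probes" ]; do
        echo "probe $i at $(( ka_time + (i - 1) * ka_intvl ))s"
        i=$(( i + 1 ))
    done
    # One interval after the last unanswered probe, the connection dies.
    echo "dead at $(( ka_time + ka_probes * ka_intvl ))s"
}
```

`keepalive_timeline 300 15 1` prints a single probe at 300s followed by "dead at 315s"; the defaults (`keepalive_timeline 7200 75 9`) give nine probes and "dead at 7875s", a little over two hours of silence.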

So, with your settings, the net result appears to be:

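* a connection with keepalive enabled that has been idle for 5 minutes
gets a keepalive probe, which counts as traffic to the NAT device and
resets its idle timer
* the kernel then waits 15 seconds for a reply to that probe
* since only one probe is allowed, if that probe or its reply is lost,
the connection is considered dead (roughly 315 seconds after the last
traffic)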

You might want to try ramping tcp_keepalive_probes up to something a bit
higher; that way it should be a bit more solid. If you're on a lossy
connection (eg, a congested upstream), you can quite easily lose a
keepalive probe, and with only one probe allowed your connection is then
considered dead.
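If you want to try that, something along these lines would do it on a running system. It needs root, and the change won't survive a reboot unless you also put a line like `net.ipv4.tcp_keepalive_probes = 5` in /etc/sysctl.conf (assuming your distribution reads that file at boot). set_keepalive_probes and the DRY_RUN switch are made up for this sketch; `sysctl -w` itself is the standard procps tool. The value 5 is only an illustration: with your 300/15 settings it would let the connection ride out up to four lost probes in a row, while pushing worst-case detection of a dead peer from 315 to 375 seconds.

```shell
#!/bin/sh
# Sketch: raise net.ipv4.tcp_keepalive_probes on a running system.
# set_keepalive_probes is a hypothetical helper; writing the value needs
# root. With DRY_RUN set to anything non-empty it only prints the command.
set_keepalive_probes() {
    n=$1
    # Accept only plain non-negative integers here...
    case $n in
        ''|*[!0-9]*)
            echo "usage: set_keepalive_probes N" >&2
            return 1 ;;
    esac
    # ...and then insist on at least one probe.
    if [ "$n" -lt 1 ]; then
        echo "probe count must be at least 1" >&2
        return 1
    fi
    cmd="sysctl -w net.ipv4.tcp_keepalive_probes=$n"
    if [ -n "$DRY_RUN" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}
```

`DRY_RUN=1 set_keepalive_probes 5` just prints `sysctl -w net.ipv4.tcp_keepalive_probes=5`; run `set_keepalive_probes 5` as root to actually apply it.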

Cheers... Dop.
