linux - Large number of TIME_WAIT in HAProxy

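We run haproxy 1.3.26 on a CentOS 5.9 machine with a 2.13 GHz Intel Xeon processor. It acts as an HTTP and TCP load balancer for many services, with a peak throughput of around 2000 requests/second. It has been running for 2 years, but both traffic and the number of services have been growing steadily.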

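We noticed that old haproxy processes linger even after a reload. On further investigation, we found that the old processes had many connections in the TIME_WAIT state. We also saw netstat and lsof taking a very long time. Following http://agiletesting.blogspot.in/2013/07/the-mystery-of-stale-haproxy-processes.html we introduced option forceclose, but it disrupted various monitoring services, so we reverted it. Digging further, we found that /proc/net/sockstat reported close to 200K sockets in the tw (TIME_WAIT) state. This surprised us, since maxconn in /etc/haproxy/haproxy.cfg is set to 31000 and ulimit-n to 64000. We had timeout server and timeout client set to 300s; we reduced them to 30s, but it did not help much.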

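Our questions are:

Is such a large number of TIME_WAIT connections acceptable? If so, at what number should we start to worry? Looking at What is the cost of many TIME_WAIT on the server side? and Setting TIME_WAIT TCP, it seems there should be no problem.
How can we reduce these TIME_WAIT connections?
Are there alternatives to netstat and lsof that still perform well with a very large number of TIME_WAIT connections?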

Best answer

Note: all quotes in this answer come from a mail by Willy Tarreau (the main author of HAProxy) to the HAProxy mailing list.

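Connections in the TIME_WAIT state are harmless and no longer really consume any resources. The kernel of the side that closed the connection first keeps them around for a while to handle the rare case of packets still arriving after the connection has been closed. The time a closed connection stays in this state is twice the maximum segment lifetime (2×MSL), often quoted as 120 seconds; Linux hard-codes it to 60 seconds.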

TIME_WAIT are harmless on the server side. You can easily reach millions without any issues.
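Because every actively closed connection sits in TIME_WAIT for a fixed interval, the steady-state count grows with the connection close rate, not with maxconn or ulimit-n. The sketch below is a back-of-the-envelope estimate (Little's law) under illustrative assumptions: every request uses its own connection (no keep-alive), the 2000 requests/second peak from the question holds, and TIME_WAIT lasts 60 seconds as on Linux. estimate_tw is a hypothetical helper name.

```shell
#!/bin/sh
# Back-of-the-envelope estimate (Little's law):
#   sockets in TIME_WAIT ~= closes per second * seconds spent in TIME_WAIT
# Illustrative assumptions: one connection per request (no keep-alive) and
# the proxy closing first on `sides` of its connections (1 or 2).
estimate_tw() {
    rate=$1      # connections closed per second, per side
    tw_secs=$2   # TIME_WAIT duration in seconds
    sides=$3     # 1 = client side only, 2 = client and server side
    echo $(( rate * tw_secs * sides ))
}

estimate_tw 2000 60 1   # prints 120000
estimate_tw 2000 60 2   # prints 240000
```

Under these assumptions one side of the proxy would hold roughly 120,000 sockets in TIME_WAIT, and up to about 240,000 if HAProxy closes first on both its client-facing and server-facing connections. That is in the same range as the ~200K reported in the question.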

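If you still want to reduce that number and release connections sooner, you can tell the kernel to do so. For example, to set it to 30 seconds:

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

Be aware that on Linux this setting strictly controls how long orphaned connections stay in FIN-WAIT-2; the TIME_WAIT interval itself is fixed in the kernel, so lowering tcp_fin_timeout may do little to the tw count.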

If you have a lot of connections (whether in TIME_WAIT or not), netstat, lsof and ipcs perform very poorly and can actually slow down the whole system. To quote Willy again:

There are two commands that you must absolutely never use in a monitoring system :

netstat -a

ipcs -a

Both of them will saturate the system and considerably slow it down when something starts to go wrong. For the sockets you should use what's in /proc/net/sockstat. You have all the numbers you want. If you need more details, use ss -a instead of netstat -a, it uses the netlink interface and is several orders of magnitude faster.
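As Willy suggests, /proc/net/sockstat already carries the TIME_WAIT total as the tw field on its TCP: line, so a monitoring check can read it without walking the socket table. A minimal sketch, assuming the usual `TCP: inuse N orphan N tw N alloc N mem N` layout; tw_from_sockstat is a hypothetical helper name, and the optional file argument only exists to make it easy to check against a saved copy.

```shell
#!/bin/sh
# Print the number of TCP sockets in TIME_WAIT, taken from the "tw" field
# of the "TCP:" line in /proc/net/sockstat (or a file given as $1).
tw_from_sockstat() {
    awk '$1 == "TCP:" {
        for (i = 2; i < NF; i++) { if ($i == "tw") { print $(i + 1); exit } }
    }' "${1:-/proc/net/sockstat}"
}

# Usage on a live system:
#   tw_from_sockstat
```

Reading one small proc file costs the same whether there are 200 or 200K sockets in TIME_WAIT, which is why it suits frequent polling.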

On Debian and Ubuntu systems, ss is available in the iproute or iproute2 package (depending on your release).
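When per-connection detail matters, ss output can be counted in a pipeline instead of using netstat -a. A small sketch, assuming the default `ss -tan` layout in which the first line is a header and the first column is the state (shown as TIME-WAIT for these sockets); count_tw_ss is a hypothetical helper name.

```shell
#!/bin/sh
# Count TIME-WAIT sockets in `ss -tan` output read from stdin.
# Assumes the first line is the header and the state is the first column.
count_tw_ss() {
    awk 'NR > 1 && $1 == "TIME-WAIT" { n++ } END { print n + 0 }'
}

# Usage on a live system:
#   ss -tan | count_tw_ss
```

Reading from stdin keeps the parsing separate from ss itself, so the function can be checked against saved output.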

Regarding linux - Large number of TIME_WAIT in HAProxy, a similar question on Stack Overflow: https://stackoverflow.com/questions/20421705/
