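python - Gunicorn does not respond to more than 6 requests at a time

To give you some background:

I have two server environments running the same application. The first, which I intend to abandon, is the standard Google App Engine environment, which has many limitations. The second is a Google Kubernetes cluster that runs my Python application with Gunicorn.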

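Concurrency

On the first server, I can send multiple requests to the application and it answers many of them simultaneously. I ran two batches of simultaneous requests against the application in both environments. On Google App Engine, the first and second batches were answered at the same time, and the first batch did not block the second.

On Kubernetes, the server only responds to 6 requests at a time, and the first batch blocks the second. I have read some posts about how to achieve Gunicorn concurrency with gevent or multiple threads, and they all say I need CPU cores. The problem is that no matter how much CPU I throw at it, the limit remains. I have tried Google nodes from 1 vCPU to 8 vCPUs, but it makes little difference.

Can you give me any ideas about what I might be missing? Perhaps a limit on Google cluster nodes?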

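Kubernetes response waterfall

As you can see, the second batch only starts getting responses once the first batch begins to finish.

App Engine response waterfall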

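Accepted answer

What you describe suggests that you are running the Gunicorn server with the sync worker class while serving an I/O-bound application. Can you share your Gunicorn configuration?

Is it possible that Google's platform has some kind of autoscaling feature (I am not very familiar with their services) that is being triggered on App Engine but not by your Kubernetes configuration?

Generally speaking, increasing the number of cores of a single instance only helps if you also increase the number of workers spawned to handle incoming requests. Please see Gunicorn's design documentation, with special emphasis on the worker types section (and why sync workers are suboptimal for I/O-bound applications). It is a good read and explains this problem in more detail.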
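As a rough illustration of tying the worker count to the available cores, here is a minimal sketch of a Gunicorn settings file. The file name `gunicorn.conf.py` and the specific values are assumptions chosen for illustration, not the asker's actual setup. The `(2 x cores) + 1` starting point is the rule of thumb given in Gunicorn's design documentation, and the `gevent` worker class requires the gevent package to be installed.

```python
# gunicorn.conf.py -- hypothetical settings file; all values are illustrative.
import multiprocessing

# Rule of thumb from Gunicorn's design docs: (2 x cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# "sync" serves one request per worker at a time; "gevent" lets each worker
# interleave many I/O-bound requests (assumes gevent is installed).
worker_class = "gevent"

# Maximum number of simultaneous clients per worker (async worker types only).
worker_connections = 1000

# Address used in the exercise in this answer.
bind = "127.0.0.1:9001"
```

It would be loaded with something like `gunicorn -c gunicorn.conf.py app:app`. Treat the numbers as a starting point to tune under real load, not as a recommendation.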

Just for fun, here is a small exercise to compare the two approaches:

import time

def app(env, start_response):
    time.sleep(1) # takes 1 second to process the request
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World']

Run Gunicorn with 4 sync workers: gunicorn --bind '127.0.0.1:9001' --workers 4 --worker-class sync --chdir app app:app
Let's fire 8 requests at the same time: ab -n 8 -c 8 "http://localhost:9001/"

This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        gunicorn/19.8.1
Server Hostname:        localhost
Server Port:            9001

Document Path:          /
Document Length:        11 bytes

Concurrency Level:      8
Time taken for tests:   2.007 seconds
Complete requests:      8
Failed requests:        0
Total transferred:      1096 bytes
HTML transferred:       88 bytes
Requests per second:    3.99 [#/sec] (mean)
Time per request:       2006.938 [ms] (mean)
Time per request:       250.867 [ms] (mean, across all concurrent requests)
Transfer rate:          0.53 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.2      1       1
Processing:  1003 1504 535.7   2005    2005
Waiting:     1002 1504 535.8   2005    2005
Total:       1003 1505 535.8   2006    2006

Percentage of the requests served within a certain time (ms)
  50%   2006
  66%   2006
  75%   2006
  80%   2006
  90%   2006
  95%   2006
  98%   2006
  99%   2006
 100%   2006 (longest request)

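The test takes about 2 seconds to complete. This is the behavior you saw in your tests: the first 4 requests keep your workers busy, and the second batch is queued until the first batch has been processed.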
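The queuing effect can be reproduced without Gunicorn at all. The sketch below is only an analogy: it uses a standard-library thread pool of 4 blocking workers as a stand-in for 4 sync workers, and a 0.1-second delay instead of 1 second so it runs quickly. The names `handle_request` and `run_batch` are made up for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(delay):
    """Stand-in for the WSGI app: block the worker for `delay` seconds."""
    time.sleep(delay)
    return "Hello World"


def run_batch(num_requests, num_workers, delay):
    """Serve the requests with a fixed pool of blocking workers.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each worker handles one request at a time; the rest wait in a queue.
        list(pool.map(handle_request, [delay] * num_requests))
    return time.monotonic() - start


elapsed = run_batch(num_requests=8, num_workers=4, delay=0.1)
print(f"8 requests, 4 blocking workers: {elapsed:.2f}s")
```

With 8 requests and 4 workers the batch needs two "waves", so the elapsed time should come out at roughly twice the per-request delay, mirroring the 2-second result above.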

Same test, but this time let's tell Gunicorn to use async workers: gunicorn --bind '127.0.0.1:9001' --workers 4 --worker-class gevent --chdir app app:app
Same test as above: ab -n 8 -c 8 "http://localhost:9001/"

This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        gunicorn/19.8.1
Server Hostname:        localhost
Server Port:            9001

Document Path:          /
Document Length:        11 bytes

Concurrency Level:      8
Time taken for tests:   1.005 seconds
Complete requests:      8
Failed requests:        0
Total transferred:      1096 bytes
HTML transferred:       88 bytes
Requests per second:    7.96 [#/sec] (mean)
Time per request:       1005.463 [ms] (mean)
Time per request:       125.683 [ms] (mean, across all concurrent requests)
Transfer rate:          1.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.4      1       2
Processing:  1002 1003   0.6   1003    1004
Waiting:     1001 1003   0.9   1003    1004
Total:       1002 1004   0.9   1004    1005

Percentage of the requests served within a certain time (ms)
  50%   1004
  66%   1005
  75%   1005
  80%   1005
  90%   1005
  95%   1005
  98%   1005
  99%   1005
 100%   1005 (longest request)

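In effect, we doubled the application's throughput here: it took only about 1 second to answer all 8 requests.

To understand what is going on, Gevent has a great tutorial about its architecture, and this article has a more in-depth explanation of coroutines.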
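For a feel of the cooperative model without installing gevent, here is a minimal sketch that uses the standard library's `asyncio` as a stand-in. This is not how gevent is implemented (gevent uses greenlets and, as far as I understand, patches blocking calls so existing code does not need `async`/`await`), but the scheduling idea is similar: a handler that is waiting on I/O yields control so other handlers can run on the same thread. The function names and the 0.1-second delay are assumptions made for this illustration.

```python
import asyncio
import time


async def handle_request(delay):
    """Cooperative handler: awaiting the sleep yields control to other tasks."""
    await asyncio.sleep(delay)
    return "Hello World"


async def run_batch(num_requests, delay):
    """Run all handlers concurrently on a single thread.

    Returns a tuple of (elapsed seconds, list of responses).
    """
    start = time.monotonic()
    results = await asyncio.gather(
        *(handle_request(delay) for _ in range(num_requests))
    )
    return time.monotonic() - start, results


elapsed, results = asyncio.run(run_batch(num_requests=8, delay=0.1))
print(f"8 requests, 1 thread, cooperative: {elapsed:.2f}s")
```

All 8 handlers wait at the same time, so the total should stay close to a single delay rather than growing with the number of requests. This only pays off for I/O-bound work; CPU-bound handlers would still block one another.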

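I apologize in advance if I am wrong about the actual cause of your problem (I do believe some additional information is missing from your original post for anyone to give a conclusive answer). If this does not help you, I hope it helps someone else. :)

Also note that I have oversimplified a lot (my example is a simple proof of concept). Tuning an HTTP server configuration is mostly a trial-and-error exercise; it all depends on the type of workload the application has and the hardware it sits on.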

Source: the question "python - Gunicorn does not respond to more than 6 requests at a time" on Stack Overflow: https://stackoverflow.com/questions/49454072/
