nginx - Ingress-nginx on GKE gives 502 Bad Gateway

Tags: nginx kubernetes kubernetes-ingress nginx-config nginx-ingress

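I am trying to expose an mlflow model in a GKE cluster through ingress-nginx and a Google Cloud load balancer.
The Service for the deployment is configured as follows: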

apiVersion: v1
kind: Service
metadata:
  name: model-inference-service
  labels:
    app: inference
spec:
  ports:
  - port: 5555
    targetPort: 5555
  selector:
    app: inference
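When I forward this service to localhost with kubectl port-forward service/model-inference-service 5555:5555, I can query the model successfully by sending a test image to the API endpoint with a test script.
The requests are sent to the URL http://127.0.0.1:5555/invocations
This works as expected, so I assume that the deployment running the Pod exposes the model and that the corresponding ClusterIP service model-inference-service is configured correctly.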
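The test script itself is not shown in the question. As a rough sketch only, a script of that shape might look like the following. The function name score_model and the "Status Code ..." error text mirror the traceback further down. The build_url helper, the payload format, the default content type, and the use of urllib instead of the requests library seen in the logs are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def build_url(host: str, port: int, path: str = "/invocations") -> str:
    """Return the endpoint URL, e.g. http://127.0.0.1:5555/invocations."""
    return f"http://{host}:{port}{path}"


def score_model(payload: bytes, host: str, port: int,
                content_type: str = "application/json") -> str:
    """POST the payload and return the response body, raising on HTTP errors.

    The payload encoding and content type are assumptions; the original
    script sends a test image in a format the question does not describe.
    """
    request = urllib.request.Request(
        build_url(host, port),
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Error statuses such as 502 arrive here; surface status and body.
        body = err.read().decode("utf-8", errors="replace")
        raise Exception(f"Status Code {err.code}. {body}") from err
```

With the port-forward active, the call would target host 127.0.0.1 and port 5555; only the host and port change when the same request is sent through a load balancer.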
Next, I installed ingress-nginx into the cluster by running:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install my-release ingress-nginx/ingress-nginx
The Ingress is configured as follows (I suspect the error must be here?):
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
#    nginx.ingress.kubernetes.io/rewrite-target: /invocations
  name: inference-ingress
  namespace: default
  labels:
    app: inference
spec:
  rules:
    - http:
        paths:
          - path: /invocations
            backend:
              serviceName: model-inference-service
              servicePort: 5555
The ingress controller Pod is running successfully:
my-release-ingress-nginx-controller-6758cc8f45-fwtw7   1/1     Running   0          3h33m
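In the GCP console I can see that the load balancer was also created successfully, and I can get its IP.
When I run the same test script that I used before to send requests to the REST API endpoint (when the service was forwarded to localhost), but now with the IP of the load balancer, I get a 502 Bad Gateway error.
The URL is now http://34.90.4.0:80/invocations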
Traceback (most recent call last):
  File "test_inference.py", line 80, in <module>
    run()
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "test_inference.py", line 76, in run
    print(score_model(data_path, host, port).text)
  File "test_inference.py", line 54, in score_model
    status_code=response.status_code, text=response.text
Exception: Status Code 502. <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.1</center>
</body>
</html>

When I open the same URL in the browser, it says:
502 Bad Gateway
nginx/1.19.1
The logs of the ingress controller say:
2020/08/26 16:06:45 [warn] 86#86: *42282 a client request body is buffered to a temporary file /tmp/client-body/0000000009, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
10.10.0.30 - - [26/Aug/2020:16:06:45 +0000] "POST /invocations HTTP/1.1" 502 157 "-" "python-requests/2.24.0" 86151 0.738 [default-model-inference-service-5555] [] 10.52.3.7:5555, 10.52.3.7:5555, 10.52.3.7:5555 0, 0, 0 0.000, 0.001, 0.000 502, 502, 502 0d86e360427c0a81c287da4ff5e907bc
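The error lines above already narrow the problem down: nginx resolved the Service to the endpoint 10.52.3.7:5555 and the TCP connection to it was refused (errno 111 is ECONNREFUSED on Linux), so nothing accepted connections on that address and port. One possibility, not confirmed anywhere in this thread, is a server that listens only on the Pod's loopback interface, which would still answer through kubectl port-forward. The sketch below pulls the relevant fields out of such a log line. The regular expression and the function name parse_upstream_error are illustrative and only assume the line layout shown above.

```python
import re

# Matches the parts of an nginx error-log line that matter for a 502.
UPSTREAM_ERROR = re.compile(
    r'connect\(\) failed \((?P<errno>\d+): (?P<reason>[^)]+)\)'
    r'.*upstream: "http://(?P<ip>[\d.]+):(?P<port>\d+)(?P<path>[^"]*)"'
)


def parse_upstream_error(line: str):
    """Return errno, reason, and upstream address from a log line, or None."""
    match = UPSTREAM_ERROR.search(line)
    if match is None:
        return None
    return {
        "errno": int(match["errno"]),
        "reason": match["reason"],
        "ip": match["ip"],
        "port": int(match["port"]),
        "path": match["path"],
    }


# One of the error lines from the controller log in the question.
line = ('2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed '
        '(111: Connection refused) while connecting to upstream, '
        'client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", '
        'upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"')
print(parse_upstream_error(line))
```

The extracted IP and port are the ones worth testing directly from inside the cluster.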
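To test whether the ingress and the load balancer work in principle, I replaced the Docker image of the real REST API that I want to expose with a Docker image that returns "hello world" on port 5050 and path /. I changed the port and the path (from /invocations to /) in the Service and Ingress manifests shown above, and I can successfully see "hello world" when I open the IP of the load balancer in the browser.
Does anyone see what I am doing wrong?
Thank you very much!
Best regards,
F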

Best answer

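The configuration you shared looks fine. Something in the cluster environment must be causing this behavior. Check whether Pod-to-Pod communication works: start a test container on the same node as the nginx ingress controller and curl the target service from that container. See whether you run into any DNS or network problems. Try changing the Host header when calling the service to see whether it is sensitive to it.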
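If the test container has Python but not curl, the connectivity part of that check can be approximated with a plain TCP connection attempt. This is a minimal sketch, not a replacement for the curl test: the function name can_connect and the timeout are illustrative choices, and it checks only that something accepts connections, not that the HTTP endpoint responds correctly.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run from a Pod inside the cluster, can_connect("model-inference-service", 5555) exercises DNS and the Service, while can_connect("10.52.3.7", 5555) targets the Pod IP seen in the controller log. A False result for the Pod IP would match the "Connection refused" errors nginx reported.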

Regarding "nginx - Ingress-nginx on GKE gives 502 Bad Gateway", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63601758/
