api - Impaired/delayed connections to cluster IPs from the k8s master node

Tags: api kubernetes timeout metrics flannel

I am running Kubernetes 1.17 with flannel:v0.11.0 on CentOS 7 and have trouble reaching my CLUSTER-IPs from the control plane.

I installed and set up the cluster manually with kubeadm.

This is basically my cluster:

k8s-master-01 10.0.0.50/24
k8s-worker-01 10.0.0.60/24 
k8s-worker-02 10.0.0.61/24

Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12

Note: each node has two NICs (eth0: uplink, eth1: private). The IPs listed above are assigned to eth1, and kubelet, kube-proxy, and flannel are configured to send/receive their traffic over the private network on eth1.
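For reference, a two-NIC setup like this is usually pinned to the private interface roughly as follows (a sketch of the typical flags, not necessarily my exact configuration; the kubelet drop-in path is the CentOS default and an assumption here):

# flannel: select eth1 via the container args in kube-flannel.yml
- --iface=eth1

# kubelet: advertise the eth1 address, e.g. in /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--node-ip=10.0.0.50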

I first ran into this problem when I tried to expose the metrics-server API through the kube-apiserver. I followed the instructions here. The control plane does not seem to be able to communicate properly with the service network.

These are the pods in my kube-system namespace:

$ kubectl get pods -n kube-system -o wide
NAME                                    READY   STATUS    RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
coredns-6955765f44-jrbs6                0/1     Running   9          24d     10.244.0.30   k8s-master-01   <none>           <none>
coredns-6955765f44-mwn2l                1/1     Running   8          24d     10.244.1.37   k8s-worker-01   <none>           <none>
etcd-k8s-master-01                      1/1     Running   9          24d     10.0.0.50     k8s-master-01   <none>           <none>
kube-apiserver-k8s-master-01            1/1     Running   0          2m26s   10.0.0.50     k8s-master-01   <none>           <none>
kube-controller-manager-k8s-master-01   1/1     Running   15         24d     10.0.0.50     k8s-master-01   <none>           <none>
kube-flannel-ds-amd64-7d6jq             1/1     Running   11         26d     10.0.0.60     k8s-worker-01   <none>           <none>
kube-flannel-ds-amd64-c5rj2             1/1     Running   11         26d     10.0.0.50     k8s-master-01   <none>           <none>
kube-flannel-ds-amd64-dsg6l             1/1     Running   11         26d     10.0.0.61     k8s-worker-02   <none>           <none>
kube-proxy-mrz9v                        1/1     Running   10         24d     10.0.0.50     k8s-master-01   <none>           <none>
kube-proxy-slt95                        1/1     Running   9          24d     10.0.0.61     k8s-worker-02   <none>           <none>
kube-proxy-txlrp                        1/1     Running   9          24d     10.0.0.60     k8s-worker-01   <none>           <none>
kube-scheduler-k8s-master-01            1/1     Running   14         24d     10.0.0.50     k8s-master-01   <none>           <none>
metrics-server-67684d476-mrvj2          1/1     Running   2          7d23h   10.244.2.43   k8s-worker-02   <none>           <none>

And these are my services:

$ kubectl get services --all-namespaces -o wide
NAMESPACE              NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE    SELECTOR
default                kubernetes                  ClusterIP   10.96.0.1       <none>        443/TCP                  26d    <none>
default                phpdemo                     ClusterIP   10.96.52.157    <none>        80/TCP                   11d    app=phpdemo
kube-system            kube-dns                    ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   26d    k8s-app=kube-dns
kube-system            metrics-server              ClusterIP   10.96.71.138    <none>        443/TCP                  5d3h   k8s-app=metrics-server
kubernetes-dashboard   dashboard-metrics-scraper   ClusterIP   10.99.136.237   <none>        8000/TCP                 23d    k8s-app=dashboard-metrics-scraper
kubernetes-dashboard   kubernetes-dashboard        ClusterIP   10.97.209.113   <none>        443/TCP                  23d    k8s-app=kubernetes-dashboard

The metrics API does not work because the discovery check fails:

$ kubectl describe apiservice v1beta1.metrics.k8s.io
...
Status:
  Conditions:
    Last Transition Time:  2019-12-27T21:25:01Z
    Message:               failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:  

The kube-apiserver cannot get a connection:

$ kubectl logs --tail=20 kube-apiserver-k8s-master-01 -n kube-system
...
I0101 22:27:00.712413       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
W0101 22:27:00.712514       1 handler_proxy.go:97] no RequestInfo found in the context
E0101 22:27:00.712559       1 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0101 22:27:00.712591       1 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0101 22:27:04.712991       1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:09.714801       1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:34.709557       1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:39.714173       1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I tried to figure out what is going on inside the kube-apiserver and could eventually confirm the problem: I get a delayed response after >60 seconds (unfortunately, time is not installed there):

$ kubectl exec -it kube-apiserver-k8s-master-01 -n kube-system -- /bin/sh
# echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close
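Since time is not installed in the container, the delay can be approximated with date instead (a sketch, assuming the image's shell provides date +%s):

# t0=$(date +%s); echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet > /dev/null 2>&1; echo "$(($(date +%s) - t0))s elapsed"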

I ran the same command from two of my own test pods (one on each of the two worker nodes). So the service IP is reachable from the pod network on the worker nodes:

$ kubectl exec -it phpdemo-55858f97c4-fjc6q -- /bin/sh
/usr/local/bin # echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 403 Forbidden
Content-Type: application/json
X-Content-Type-Options: nosniff
Date: Wed, 01 Jan 2020 22:53:44 GMT
Content-Length: 212

{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"","reason":"Forbidden","details":{},"code":403}

And from a worker node itself:

[root@k8s-worker-02 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}
real    0m0.146s
user    0m0.048s
sys 0m0.089s

From my master node, however, this does not work: I get a delayed response after >60 seconds:

[root@k8s-master-01 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}
real    1m3.248s
user    0m0.061s
sys 0m0.079s

From the master node I can see a lot of conntrack entries stuck in SYN_SENT with no reply:

[root@k8s-master-01 ~ ] conntrack -L -d 10.96.71.138
tcp      6 75 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48550 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=19813 mark=0 use=1
tcp      6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48287 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=23710 mark=0 use=1
tcp      6 40 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48422 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=24286 mark=0 use=1
tcp      6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48286 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=35030 mark=0 use=1
tcp      6 80 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48574 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=40636 mark=0 use=1
tcp      6 50 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48464 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=65512 mark=0 use=1
tcp      6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48290 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=47617 mark=0 use=1
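One way to narrow this down further would be to watch where these SYNs actually leave the master while repeating the request (a sketch; 8472/udp is flannel's default VXLAN port, which I am assuming applies here):

[root@k8s-master-01 ~ ] tcpdump -ni flannel.1 'tcp and port 4443'
[root@k8s-master-01 ~ ] tcpdump -ni eth1 'udp and port 8472'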

The iptables rules are in place:

[root@k8s-master-01 ~ ] iptables-save | grep 10.96.71.138
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-SVC-LC5QY66VUV2HJ6WZ

kube-proxy is up and running on every node:

$ kubectl get pods -A -o wide
...
kube-system            kube-proxy-mrz9v                             1/1     Running   10         21d    10.0.0.50     k8s-master-01   <none>           <none>
kube-system            kube-proxy-slt95                             1/1     Running   9          21d    10.0.0.61     k8s-worker-02   <none>           <none>
kube-system            kube-proxy-txlrp                             1/1     Running   9          21d    10.0.0.60     k8s-worker-01   <none>           <none>
$ kubectl -n kube-system logs kube-proxy-mrz9v
W0101 21:31:14.268698       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:14.283958       1 node.go:135] Successfully retrieved node IP: 10.0.0.50
I0101 21:31:14.284034       1 server_others.go:145] Using iptables Proxier.
I0101 21:31:14.284624       1 server.go:571] Version: v1.17.0
I0101 21:31:14.286031       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:14.286093       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:14.287207       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:14.298760       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:14.298984       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:14.300618       1 config.go:313] Starting service config controller
I0101 21:31:14.300665       1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:14.300720       1 config.go:131] Starting endpoints config controller
I0101 21:31:14.300740       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.400864       1 shared_informer.go:204] Caches are synced for service config 
I0101 21:31:14.401021       1 shared_informer.go:204] Caches are synced for endpoints config 

$ kubectl -n kube-system logs kube-proxy-slt95
W0101 21:31:13.856897       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.905653       1 node.go:135] Successfully retrieved node IP: 10.0.0.61
I0101 21:31:13.905704       1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.906370       1 server.go:571] Version: v1.17.0
I0101 21:31:13.906983       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.907032       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.907413       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.912221       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.912321       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.915322       1 config.go:313] Starting service config controller
I0101 21:31:13.915353       1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.915755       1 config.go:131] Starting endpoints config controller
I0101 21:31:13.915779       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.016995       1 shared_informer.go:204] Caches are synced for endpoints config 
I0101 21:31:14.017115       1 shared_informer.go:204] Caches are synced for service config 

$ kubectl -n kube-system logs kube-proxy-txlrp
W0101 21:31:13.552518       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.696793       1 node.go:135] Successfully retrieved node IP: 10.0.0.60
I0101 21:31:13.696846       1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.697396       1 server.go:571] Version: v1.17.0
I0101 21:31:13.698000       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.698101       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.698509       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.704280       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.704467       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.704888       1 config.go:131] Starting endpoints config controller
I0101 21:31:13.704935       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:13.705046       1 config.go:313] Starting service config controller
I0101 21:31:13.705059       1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.806299       1 shared_informer.go:204] Caches are synced for endpoints config 
I0101 21:31:13.806430       1 shared_informer.go:204] Caches are synced for service config 

These are my (default) kube-proxy settings:

$ kubectl -n kube-system get configmap kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 10.244.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    udpIdleTimeout: 250ms
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://10.0.0.50:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2019-12-06T22:07:40Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "185"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: bac4a8df-e318-4c91-a6ed-9305e58ac6d9
$ kubectl -n kube-system get daemonset kube-proxy -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "2"
  creationTimestamp: "2019-12-06T22:07:40Z"
  generation: 2
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "115436"
  selfLink: /apis/apps/v1/namespaces/kube-system/daemonsets/kube-proxy
  uid: 64a53d29-1eaa-424f-9ebd-606bcdb3169c
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-proxy
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-proxy
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
        - --hostname-override=$(NODE_NAME)
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: k8s.gcr.io/kube-proxy:v1.17.0
        imagePullPolicy: IfNotPresent
        name: kube-proxy
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kube-proxy
          name: kube-proxy
        - mountPath: /run/xtables.lock
          name: xtables-lock
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-proxy
      serviceAccountName: kube-proxy
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: kube-proxy
        name: kube-proxy
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 0
  numberReady: 3
  observedGeneration: 2
  updatedNumberScheduled: 3

Is this simply the result of a misconfiguration, or a bug? Thanks for your help.

Best answer

Here is what I did to get it working (a combined sketch of the resulting manifest excerpts follows the list):

1. Set the --enable-aggregator-routing=true flag on the kube-apiserver.

2. Set the following flags in metrics-server-deployment.yaml:

- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP

3. Set hostNetwork: true in metrics-server-deployment.yaml.
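Put together, the relevant excerpts look roughly like this (a sketch rather than the complete manifests; with kubeadm the apiserver static pod manifest normally lives at /etc/kubernetes/manifests/kube-apiserver.yaml and is picked up automatically on change):

# /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-aggregator-routing=true
    # ... existing flags unchanged ...

# metrics-server-deployment.yaml
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: metrics-server
        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP

With hostNetwork: true the metrics-server listens on the node network, which presumably side-steps the broken master-to-ClusterIP path, and --enable-aggregator-routing=true makes the apiserver route aggregated API requests to endpoint IPs rather than the service's cluster IP.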

Original question on Stack Overflow: https://stackoverflow.com/questions/59581669/
