kubernetes - Why can't I find the OSD pods in Kubernetes after deploying rook-ceph?

Tags: kubernetes ceph rook-storage kubernetes-rook

I tried to install rook-ceph on Kubernetes by following this guide:

https://rook.io/docs/rook/v1.3/ceph-quickstart.html

git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster.yaml

When I check all the pods:

$ kubectl -n rook-ceph get pod
NAME                                            READY   STATUS    RESTARTS   AGE
csi-cephfsplugin-9c2z9                          3/3     Running   0          23m
csi-cephfsplugin-provisioner-7678bcfc46-s67hq   5/5     Running   0          23m
csi-cephfsplugin-provisioner-7678bcfc46-sfljd   5/5     Running   0          23m
csi-cephfsplugin-smmlf                          3/3     Running   0          23m
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq       6/6     Running   0          23m
csi-rbdplugin-provisioner-fbd45b7c8-rp85z       6/6     Running   0          23m
csi-rbdplugin-s67lw                             3/3     Running   0          23m
csi-rbdplugin-zq4k5                             3/3     Running   0          23m
rook-ceph-mon-a-canary-954dc5cd9-5q8tk          1/1     Running   0          2m9s
rook-ceph-mon-b-canary-b9d6f5594-mcqwc          1/1     Running   0          2m9s
rook-ceph-mon-c-canary-78b48dbfb7-z2t7d         0/1     Pending   0          2m8s
rook-ceph-operator-757d6db48d-x27lm             1/1     Running   0          25m
rook-ceph-tools-75f575489-znbbz                 1/1     Running   0          7m45s
rook-discover-gq489                             1/1     Running   0          24m
rook-discover-p9zlg                             1/1     Running   0          24m
$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare
No resources found in rook-ceph namespace.

Then I did a few other things:

$ kubectl taint nodes $(hostname) node-role.kubernetes.io/master:NoSchedule-
$ kubectl -n rook-ceph delete pods rook-ceph-operator-757d6db48d-x27lm

Created a filesystem:

$ kubectl create -f filesystem.yaml

Checked again:

$ kubectl get pods -n rook-ceph -o wide
NAME                                              READY   STATUS     RESTARTS   AGE    IP             NODE     NOMINATED NODE   READINESS GATES
csi-cephfsplugin-9c2z9                            3/3     Running    0          135m   192.168.0.53   kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-s67hq     5/5     Running    0          135m   10.1.2.6       kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-sfljd     5/5     Running    0          135m   10.1.2.5       kube3    <none>           <none>
csi-cephfsplugin-smmlf                            3/3     Running    0          135m   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq         6/6     Running    0          135m   10.1.1.6       kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-rp85z         6/6     Running    0          135m   10.1.1.5       kube2    <none>           <none>
csi-rbdplugin-s67lw                               3/3     Running    0          135m   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-zq4k5                               3/3     Running    0          135m   192.168.0.53   kube3    <none>           <none>
rook-ceph-crashcollector-kube2-6d95bb9c-r5w7p     0/1     Init:0/2   0          110m   <none>         kube2    <none>           <none>
rook-ceph-crashcollector-kube3-644c849bdb-9hcvg   0/1     Init:0/2   0          110m   <none>         kube3    <none>           <none>
rook-ceph-mon-a-canary-954dc5cd9-6ccbh            1/1     Running    0          75s    10.1.2.130     kube3    <none>           <none>
rook-ceph-mon-b-canary-b9d6f5594-k85w5            1/1     Running    0          74s    10.1.1.74      kube2    <none>           <none>
rook-ceph-mon-c-canary-78b48dbfb7-kfzzx           0/1     Pending    0          73s    <none>         <none>   <none>           <none>
rook-ceph-operator-757d6db48d-nlh84               1/1     Running    0          110m   10.1.2.28      kube3    <none>           <none>
rook-ceph-tools-75f575489-znbbz                   1/1     Running    0          119m   10.1.1.14      kube2    <none>           <none>
rook-discover-gq489                               1/1     Running    0          135m   10.1.1.3       kube2    <none>           <none>
rook-discover-p9zlg                               1/1     Running    0          135m   10.1.2.4       kube3    <none>           <none>

There is no rook-ceph-osd-* pod,

and the rook-ceph-mon-c-canary-78b48dbfb7-kfzzx pod always stays Pending.

After installing the toolbox as described here:

https://rook.io/docs/rook/v1.3/ceph-toolbox.html

$ kubectl create -f toolbox.yaml
$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash

and checking the Ceph status inside the container:

[root@rook-ceph-tools-75f575489-znbbz /]# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
[errno 2] error connecting to the cluster

It is running on Ubuntu 16.04.6.


After deploying again:

$ kubectl -n rook-ceph get pod -o wide
NAME                                            READY   STATUS    RESTARTS   AGE     IP             NODE     NOMINATED NODE   READINESS GATES
csi-cephfsplugin-4tww8                          3/3     Running   0          3m38s   192.168.0.52   kube2    <none>           <none>
csi-cephfsplugin-dbbfb                          3/3     Running   0          3m38s   192.168.0.53   kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-8kt96   5/5     Running   0          3m37s   10.1.2.6       kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-kq6vv   5/5     Running   0          3m38s   10.1.1.6       kube2    <none>           <none>
csi-rbdplugin-4qrqn                             3/3     Running   0          3m39s   192.168.0.53   kube3    <none>           <none>
csi-rbdplugin-dqx9z                             3/3     Running   0          3m39s   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-7f57t       6/6     Running   0          3m39s   10.1.2.5       kube3    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-9zwhb       6/6     Running   0          3m39s   10.1.1.5       kube2    <none>           <none>
rook-ceph-mon-a-canary-954dc5cd9-rgqpg          1/1     Running   0          2m40s   10.1.1.7       kube2    <none>           <none>
rook-ceph-mon-b-canary-b9d6f5594-n2pwc          1/1     Running   0          2m35s   10.1.2.8       kube3    <none>           <none>
rook-ceph-mon-c-canary-78b48dbfb7-fv46f         0/1     Pending   0          2m30s   <none>         <none>   <none>           <none>
rook-ceph-operator-757d6db48d-2m25g             1/1     Running   0          6m27s   10.1.2.3       kube3    <none>           <none>
rook-discover-lpsht                             1/1     Running   0          5m15s   10.1.1.3       kube2    <none>           <none>
rook-discover-v4l77                             1/1     Running   0          5m15s   10.1.2.4       kube3    <none>           <none>

Describing the pending pod:

$ kubectl describe pod rook-ceph-mon-c-canary-78b48dbfb7-fv46f -n rook-ceph
Name:           rook-ceph-mon-c-canary-78b48dbfb7-fv46f
Namespace:      rook-ceph
Priority:       0
Node:           <none>
Labels:         app=rook-ceph-mon
                ceph_daemon_id=c
                mon=c
                mon_canary=true
                mon_cluster=rook-ceph
                pod-template-hash=78b48dbfb7
                rook_cluster=rook-ceph
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/rook-ceph-mon-c-canary-78b48dbfb7
Containers:
  mon:
    Image:      rook/ceph:v1.3.4
    Port:       6789/TCP
    Host Port:  0/TCP
    Command:
      /tini
    Args:
      --
      sleep
      3600
    Environment:
      CONTAINER_IMAGE:                ceph/ceph:v14.2.9
      POD_NAME:                       rook-ceph-mon-c-canary-78b48dbfb7-fv46f (v1:metadata.name)
      POD_NAMESPACE:                  rook-ceph (v1:metadata.namespace)
      NODE_NAME:                       (v1:spec.nodeName)
      POD_MEMORY_LIMIT:               node allocatable (limits.memory)
      POD_MEMORY_REQUEST:             0 (requests.memory)
      POD_CPU_LIMIT:                  node allocatable (limits.cpu)
      POD_CPU_REQUEST:                0 (requests.cpu)
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>             Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
      ROOK_POD_IP:                     (v1:status.podIP)
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/lib/ceph/mon/ceph-c from ceph-daemon-data (rw)
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-65xtn (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  rook-config-override:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rook-config-override
    Optional:  false
  rook-ceph-mons-keyring:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rook-ceph-mons-keyring
    Optional:    false
  rook-ceph-log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/rook-ceph/log
    HostPathType:  
  rook-ceph-crash:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/rook-ceph/crash
    HostPathType:  
  ceph-daemon-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/mon-c/data
    HostPathType:  
  default-token-65xtn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-65xtn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  22s (x3 over 84s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules.

Testing the installation

Create an nginx.yaml file:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
    volumeMounts:
    - name: www
      mountPath: /usr/share/nginx/html
  volumes:
  - name: www
    flexVolume:
      driver: ceph.rook.io/rook
      fsType: ceph
      options:
        fsName: myfs
        clusterNamespace: rook-ceph

Deploy it and describe the pod:

...
Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    9m28s                  default-scheduler  Successfully assigned default/nginx to kube2
  Warning  FailedMount  9m28s                  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www default-token-fnb28], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
  Warning  FailedMount  6m14s (x2 over 6m38s)  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[default-token-fnb28 www]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
  Warning  FailedMount  4m6s (x23 over 9m13s)  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched

Best answer

The rook-ceph-mon-x pods have the following affinity:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: rook-ceph-mon
        topologyKey: kubernetes.io/hostname

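This does not allow two rook-ceph-mon pods to run on the same node. Since you seem to have 3 nodes (1 master and 2 workers), only 2 mon pods could be created: one on kube2 and one on kube3. kube1 is the master node and is tainted as unschedulable, so rook-ceph-mon-c cannot be scheduled there. This is most likely also why no OSD pods appear: Rook brings up the mons before it creates the OSDs, so while mon-c cannot be scheduled the operator does not get to the OSD step.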

To fix this, you can:

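  • Add one more worker node
  • Remove the NoSchedule taint from the master with kubectl taint nodes kube1 node-role.kubernetes.io/master:NoSchedule- (the taint key node-role.kubernetes.io/master is the one reported in the FailedScheduling event)
  • Change the mon count to a lower value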
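For the third option, the mon count lives in the CephCluster resource defined in cluster.yaml. The fragment below is a sketch, assuming the default cluster name and namespace from the quickstart; it only shows the mon section, so merge it into your existing cluster.yaml (which also sets cephVersion, dataDirHostPath, storage, etc.) rather than applying it on its own.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph        # assumed: default name from the quickstart cluster.yaml
  namespace: rook-ceph
spec:
  mon:
    # Rook expects an odd number of mons. 1 fits on the two schedulable
    # workers in this setup, at the cost of having no mon redundancy.
    count: 1
    # Keep one mon per node (the anti-affinity shown above). Setting this
    # to true would let several mons share a node instead, which Rook's
    # example cluster.yaml only recommends for test environments.
    allowMultiplePerNode: false
```

Edit cluster.yaml before running kubectl create -f cluster.yaml, or change the live resource with kubectl -n rook-ceph edit cephcluster rook-ceph and let the operator reconcile the mons.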

Regarding "kubernetes - Why can't I find the OSD pods in Kubernetes after deploying rook-ceph?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62045052/
