Yesterday I upgraded my development Ceph cluster from Jewel to Luminous. Everything looked fine until I ran "ceph osd require-osd-release luminous". After that, all of the data in my cluster is now reported as unknown. If I do a detailed look at any given pg, it shows "active+clean", yet the cluster thinks the pgs are degraded and unclean. Here is what I'm seeing:
CRUSH map
-1 10.05318 root default
-2 3.71764 host cephfs01
0 0.09044 osd.0 up 1.00000 1.00000
1 1.81360 osd.1 up 1.00000 1.00000
2 1.81360 osd.2 up 1.00000 1.00000
-3 3.62238 host cephfs02
3 hdd 1.81360 osd.3 up 1.00000 1.00000
4 hdd 0.90439 osd.4 up 1.00000 1.00000
5 hdd 0.90439 osd.5 up 1.00000 1.00000
-4 2.71317 host cephfs03
6 hdd 0.90439 osd.6 up 1.00000 1.00000
7 hdd 0.90439 osd.7 up 1.00000 1.00000
8 hdd 0.90439 osd.8 up 1.00000 1.00000
Health
cluster:
id: 279e0565-1ab4-46f2-bb27-adcb1461e618
health: HEALTH_WARN
Reduced data availability: 1024 pgs inactive
Degraded data redundancy: 1024 pgs unclean
services:
mon: 2 daemons, quorum cephfsmon02,cephfsmon01
mgr: cephfsmon02(active)
mds: ceph_library-1/1/1 up {0=cephfsmds01=up:active}
osd: 9 osds: 9 up, 9 in; 306 remapped pgs
data:
pools: 2 pools, 1024 pgs
objects: 0 objects, 0 bytes
usage: 0 kB used, 0 kB / 0 kB avail
pgs: 100.000% pgs unknown
1024 unknown
HEALTH_WARN Reduced data availability: 1024 pgs inactive; Degraded data redundancy: 1024 pgs unclean
PG_AVAILABILITY Reduced data availability: 1024 pgs inactive
    pg 1.e6 is stuck inactive for 2239.530584, current state unknown, last acting []
    pg 1.e8 is stuck inactive for 2239.530584, current state unknown, last acting []
    pg 1.e9 is stuck inactive for 2239.530584, current state unknown, last acting []
Every PG in the cluster looks like this.
PG detail
"stats": {
"version": "57'5211",
"reported_seq": "4527",
"reported_epoch": "57",
"state": "active+clean",
I can't run a scrub or repair on the pgs or osds because of this:
ceph osd repair osd.0
failed to instruct osd(s) 0 to repair (not connected)
Any ideas?
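(For anyone debugging the same symptoms, a few stock ceph CLI commands help narrow down stuck or unknown pgs before attempting a scrub or repair; the pg id below is taken from the output above.)

```shell
ceph health detail           # full list of stuck pgs with reasons
ceph pg dump_stuck inactive  # only the inactive/unknown pgs
ceph pg 1.e6 query           # per-pg peering state for one stuck pg
ceph osd tree                # confirm all osds are up and in
```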
Best Answer
The issue was the firewall. I bounced the firewall on each host and the pgs were found immediately.
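This fits the symptoms: mons, mgr, and OSDs all talk over TCP, so a firewall blocking those ports leaves the mgr unable to collect pg stats and every pg shows as unknown. A sketch of opening the standard Ceph ports with firewalld (assuming a CentOS/RHEL-style host; adapt for iptables or ufw):

```shell
# Mons listen on 6789/tcp; OSDs, mgr, and mds use 6800-7300/tcp by default.
# firewalld ships predefined "ceph-mon" and "ceph" services covering these ranges.
sudo firewall-cmd --permanent --add-service=ceph-mon   # on monitor hosts
sudo firewall-cmd --permanent --add-service=ceph       # on osd/mgr/mds hosts
sudo firewall-cmd --reload
```

Bouncing the firewall on each host, as in the answer, has the same effect of letting the daemons reconnect, but opening the ports permanently keeps the cluster healthy across reboots.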
Regarding cluster-computing - data 100% unknown after Ceph update, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46079301/