I updated my dev Ceph cluster yesterday from Jewel to Luminous. Everything was seemingly okay until I ran the command "ceph osd require-osd-release luminous". Since then, every PG in the cluster is reported as unknown. If I pull up a detailed view of any given PG, it shows "active+clean", yet the cluster thinks the PGs are inactive and unclean. Here's what I am seeing:
CRUSH MAP
-1 10.05318 root default
-2 3.71764 host cephfs01
0 0.09044 osd.0 up 1.00000 1.00000
1 1.81360 osd.1 up 1.00000 1.00000
2 1.81360 osd.2 up 1.00000 1.00000
-3 3.62238 host cephfs02
3 hdd 1.81360 osd.3 up 1.00000 1.00000
4 hdd 0.90439 osd.4 up 1.00000 1.00000
5 hdd 0.90439 osd.5 up 1.00000 1.00000
-4 2.71317 host cephfs03
6 hdd 0.90439 osd.6 up 1.00000 1.00000
7 hdd 0.90439 osd.7 up 1.00000 1.00000
8 hdd 0.90439 osd.8 up 1.00000 1.00000
HEALTH
  cluster:
    id:     279e0565-1ab4-46f2-bb27-adcb1461e618
    health: HEALTH_WARN
            Reduced data availability: 1024 pgs inactive
            Degraded data redundancy: 1024 pgs unclean

  services:
    mon: 2 daemons, quorum cephfsmon02,cephfsmon01
    mgr: cephfsmon02(active)
    mds: ceph_library-1/1/1 up {0=cephfsmds01=up:active}
    osd: 9 osds: 9 up, 9 in; 306 remapped pgs

  data:
    pools:   2 pools, 1024 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             1024 unknown
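Here's the expanded warning from ceph health detail, truncated to the first few PGs: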
HEALTH_WARN Reduced data availability: 1024 pgs inactive; Degraded data redundancy: 1024 pgs unclean
PG_AVAILABILITY Reduced data availability: 1024 pgs inactive
    pg 1.e6 is stuck inactive for 2239.530584, current state unknown, last acting []
    pg 1.e8 is stuck inactive for 2239.530584, current state unknown, last acting []
    pg 1.e9 is stuck inactive for 2239.530584, current state unknown, last acting []
It looks like this for every PG in the cluster.
PG DETAIL
"stats": {
"version": "57'5211",
"reported_seq": "4527",
"reported_epoch": "57",
"state": "active+clean",
I can't run a scrub or repair on the pgs or osds because of this:
ceph osd repair osd.0
failed to instruct osd(s) 0 to repair (not connected)
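The "(not connected)" part makes me think the mon can't reach the OSD daemons at all. For reference, an OSD's address can be looked up and the port tested by hand, something like the following (the address is just a placeholder, netcat is assumed to be installed, and 6800 is the bottom of the default 6800-7300/tcp OSD port range):
ceph osd find 0
nc -zv <osd-ip> 6800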
Any ideas?
The problem was the firewall. With the Ceph ports blocked, the OSDs couldn't report their PG stats back to the mon/mgr, which is why every PG showed up as unknown even though the PGs themselves were fine. I bounced the firewall on each host and the PGs were immediately found.
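For anyone hitting the same thing: the mons listen on 6789/tcp and the OSD/MDS/mgr daemons on 6800-7300/tcp by default, and all of that has to be open between the cluster hosts. As an example only (adapt to whatever firewall you actually run), on a firewalld-based host the durable fix looks roughly like this.

On the mon hosts:
firewall-cmd --permanent --add-service=ceph-mon
firewall-cmd --reload

On the OSD/MDS/mgr hosts:
firewall-cmd --permanent --add-service=ceph
firewall-cmd --reload

If the predefined ceph/ceph-mon services aren't available in your firewalld version, the equivalent is opening 6789/tcp and 6800-7300/tcp with --add-port.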