Data 100% unknown after Ceph Update


I updated my dev Ceph cluster yesterday from Jewel to Luminous. Everything seemed fine until I ran "ceph osd require-osd-release luminous". Since then, the data in my cluster is reported as 100% unknown. A detailed view of any given PG still shows "active+clean", yet the cluster thinks they are all degraded and unclean. Here's what I am seeing:
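For context, the sections below were captured with roughly the following commands (pg 1.e6 is just one example id):

    ceph osd tree          # "CRUSH MAP" section
    ceph -s                # "HEALTH" section
    ceph health detail     # "HEALTH_WARN" section
    ceph pg 1.e6 query     # "PG DETAIL" section (truncated)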

CRUSH MAP

-1       10.05318 root default                              
-2        3.71764     host cephfs01                         
 0        0.09044         osd.0         up  1.00000 1.00000 
 1        1.81360         osd.1         up  1.00000 1.00000 
 2        1.81360         osd.2         up  1.00000 1.00000 
-3        3.62238     host cephfs02                         
 3   hdd  1.81360         osd.3         up  1.00000 1.00000 
 4   hdd  0.90439         osd.4         up  1.00000 1.00000 
 5   hdd  0.90439         osd.5         up  1.00000 1.00000 
-4        2.71317     host cephfs03                         
 6   hdd  0.90439         osd.6         up  1.00000 1.00000 
 7   hdd  0.90439         osd.7         up  1.00000 1.00000 
 8   hdd  0.90439         osd.8         up  1.00000 1.00000 

HEALTH

  cluster:
    id:     279e0565-1ab4-46f2-bb27-adcb1461e618
    health: HEALTH_WARN
            Reduced data availability: 1024 pgs inactive
            Degraded data redundancy: 1024 pgs unclean

  services:
    mon: 2 daemons, quorum cephfsmon02,cephfsmon01
    mgr: cephfsmon02(active)
    mds: ceph_library-1/1/1 up  {0=cephfsmds01=up:active}
    osd: 9 osds: 9 up, 9 in; 306 remapped pgs

  data:
    pools:   2 pools, 1024 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             1024 unknown

HEALTH_WARN

Reduced data availability: 1024 pgs inactive; Degraded data redundancy: 1024 pgs unclean
PG_AVAILABILITY Reduced data availability: 1024 pgs inactive
    pg 1.e6 is stuck inactive for 2239.530584, current state unknown, last acting []
    pg 1.e8 is stuck inactive for 2239.530584, current state unknown, last acting []
    pg 1.e9 is stuck inactive for 2239.530584, current state unknown, last acting []

It looks like this for every PG in the cluster.
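If it helps, the full list of stuck PGs can be dumped with something like:

    ceph pg dump_stuck inactive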

PG DETAIL

"stats": {
                "version": "57'5211",
                "reported_seq": "4527",
                "reported_epoch": "57",
                "state": "active+clean",

I can't run a scrub or repair on the pgs or osds because of this:

ceph osd repair osd.0
failed to instruct osd(s) 0 to repair (not connected)
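The "(not connected)" part makes me suspect the mons/mgr can't actually reach the OSD daemons, even though they show as up. A quick sanity check might be something like this (host names are from my cluster, ports are the Ceph defaults):

    ceph versions                    # do all daemons report a luminous version?
    nc -zv cephfsmon01 6789          # mon port reachable from the OSD hosts?
    nc -zv cephfs01 6800             # OSD ports (6800-7300) reachable from the mon/mgr hosts?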

Any ideas?

1 Answer

A.Keen (Best Answer)

The problem was the firewall. I bounced the firewall on each host and immediately the pgs were found.
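For anyone else hitting this: "bouncing" the firewall was roughly the following on each host (firewalld in my case; adjust for iptables/ufw), and making sure the Ceph services stay open should avoid a repeat:

    systemctl restart firewalld
    firewall-cmd --permanent --add-service=ceph-mon   # mon, port 6789
    firewall-cmd --permanent --add-service=ceph       # osd/mgr/mds, ports 6800-7300
    firewall-cmd --reload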