Ceph enters degraded state after Deis installation


I have successfully upgraded Deis to v1.0.1 on a 3-node cluster, with each node having 2 GB of RAM, hosted on DigitalOcean.

I then nsenter'ed (nse) into the deis-store-monitor service, ran ceph -s, and realized the cluster has entered the active+undersized+degraded state and never gets back to active+clean.
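
For reference, the commands were roughly as follows (nse is the nsenter helper function from the Deis troubleshooting docs; the container name on your host may differ):

    $ nse deis-store-monitor           # enter the running deis-store-monitor container
    root@deis-2:/# ceph -s             # print overall cluster status
    # ceph health detail would additionally list the degraded/undersized PGs one by one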

The detailed output follows:

root@deis-2:/# ceph -s
libust[276/276]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
  cluster dfa09ba0-66f2-46bb-8d84-12795f281f7d
  health HEALTH_WARN 1536 pgs degraded; 1536 pgs stuck unclean; 1536 pgs undersized; recovery 1314/3939 objects degraded (33.359%)
  monmap e3: 3 mons at {deis-1=10.132.183.190:6789/0,deis-2=10.132.183.191:6789/0,deis-3=10.132.183.192:6789/0}, election epoch 28, quorum 0,1,2 deis-1,deis-2,deis-3
  mdsmap e32: 1/1/1 up {0=deis-1=up:active}, 2 up:standby
  osdmap e77: 3 osds: 2 up, 2 in
   pgmap v109093: 1536 pgs, 12 pools, 897 MB data, 1313 objects
        27342 MB used, 48256 MB / 77175 MB avail
        1314/3939 objects degraded (33.359%)
             1536 active+undersized+degraded
  client io 817 B/s wr, 0 op/s

I am totally new to Ceph. I wonder:

  • Is it a big deal to fix this issue, or could I leave it in this state?
  • If it is recommended to fix this, could you point out how I should go about it?

I have read the Ceph troubleshooting section and the Pool, PG, and CRUSH Config Reference, but I still have no idea what to do next.

Thanks a lot!

1 Answer

Christopher Armstrong (Best Answer):

From this line of the output (osdmap e77: 3 osds: 2 up, 2 in), it sounds like one of your deis-store-daemons isn't responding. deisctl restart store-daemon should recover your cluster, but I'd be curious about what happened to that daemon. I'd love to see the output of journalctl --no-pager -u deis-store-daemon on all of your hosts; if you could add your logs to https://github.com/deis/deis/issues/2520, that would help us figure out why the daemon isn't responding.
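
Roughly, the sequence would look like this (unit names as shipped with Deis v1.0.x; run deisctl from wherever it is configured, and journalctl on each CoreOS host):

    deisctl list | grep store          # check the state of the store units
    deisctl restart store-daemon       # restart the OSD daemons across the cluster

    # on each host, capture the daemon logs to attach to the GitHub issue:
    journalctl --no-pager -u deis-store-daemon > store-daemon-$(hostname).log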

Also, 2 GB nodes on DigitalOcean will likely result in performance issues (and Ceph may be unhappy).
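
Once the daemon is back, one way to confirm recovery (from inside the store-monitor container, as in the output above):

    ceph -s    # the osdmap line should now report "3 osds: 3 up, 3 in"
    ceph -w    # watch PGs move from active+undersized+degraded back to active+clean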