The BMC on our Mars400 can shut down all nodes in the same chassis. Before hardware maintenance or a BMC firmware update, we therefore need to have Ceph stop I/O before shutting down a chassis.


The Ceph blog describes the steps to take before shutting down hosts in a Ceph cluster: how-to-do-a-ceph-cluster-maintenance-shutdown/


Follow these steps before shutting down a chassis or the whole cluster:


  1. Set the OSD flags

    # ceph osd set noout
    # ceph osd set nobackfill
    # ceph osd set norecover
    # ceph osd set norebalance
    # ceph osd set nodown
    # ceph osd set pause (not necessary if you are only shutting down a single chassis)


  2. Log in to the BMC and put the chassis into standby,

    or use the command line to put the chassis power into standby:

    ssh root@<BMC IP> -t "/usr/bin/bmcpipe \"standby ;\""


After about 3 minutes, which gives the nodes time to shut down gracefully, the BMC switches the power supply to standby mode.
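The flag-setting step above can be wrapped in a small shell function so that no flag is forgotten. This is only a sketch: the function name `set_maintenance_flags`, its `chassis`/`cluster` argument, and the `CEPH_CMD` override variable are hypothetical names introduced here for illustration. Only the `ceph osd set <flag>` commands themselves come from the steps above.

```shell
#!/bin/sh
# Hypothetical override so the function can be dry-run (e.g. CEPH_CMD=echo).
CEPH_CMD="${CEPH_CMD:-ceph}"

# Set the OSD flags listed in step 1.
# $1 = "chassis" (default) or "cluster"; pause is only set for "cluster".
set_maintenance_flags() {
    scope="${1:-chassis}"
    for flag in noout nobackfill norecover norebalance nodown; do
        # Stop at the first failure so a partial setup is noticed.
        $CEPH_CMD osd set "$flag" || return 1
    done
    if [ "$scope" = "cluster" ]; then
        $CEPH_CMD osd set pause || return 1
    fi
}
```

For example, `set_maintenance_flags chassis` sets the five flags only, while `set_maintenance_flags cluster` also sets pause. Running with `CEPH_CMD=echo` prints the arguments that would be passed to `ceph` without touching the cluster.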


After you complete the hardware maintenance, power on the Mars400 through the BMC, unset all the flags, and wait for the cluster to become healthy.

# ceph osd unset noout
# ceph osd unset nobackfill
# ceph osd unset norecover
# ceph osd unset norebalance
# ceph osd unset nodown
# ceph osd unset pause (only needed if you set pause before the shutdown)
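
The recovery step can be sketched the same way: unset the flags, then poll `ceph health` until it reports `HEALTH_OK`. The function names `unset_maintenance_flags` and `wait_for_health_ok`, the `CEPH_CMD` override, and the default of 60 attempts at 10-second intervals are assumptions made for this example, not values from the Mars400 or Ceph documentation.

```shell
#!/bin/sh
# Hypothetical override so the functions can be dry-run or tested.
CEPH_CMD="${CEPH_CMD:-ceph}"

# Unset the maintenance flags.
# $1 = "chassis" (default) or "cluster"; pause is only unset for "cluster".
unset_maintenance_flags() {
    scope="${1:-chassis}"
    for flag in noout nobackfill norecover norebalance nodown; do
        $CEPH_CMD osd unset "$flag" || return 1
    done
    if [ "$scope" = "cluster" ]; then
        $CEPH_CMD osd unset pause || return 1
    fi
}

# Poll "ceph health" until it starts with HEALTH_OK.
# $1 = maximum attempts (assumed default 60), $2 = seconds between attempts (assumed default 10).
# Returns 0 once healthy, 1 if the attempts run out.
wait_for_health_ok() {
    attempts="${1:-60}"
    delay="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        case "$($CEPH_CMD health 2>/dev/null)" in
            HEALTH_OK*) return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

A typical call after powering the chassis back on would be `unset_maintenance_flags chassis && wait_for_health_ok`. Adjust the attempt count and delay to match how long recovery normally takes on your cluster.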