The Mars400's BMC can shut down all nodes in a chassis at once. Before hardware maintenance or a BMC firmware update, make Ceph stop I/O before you shut down the chassis.


The Ceph blog describes the steps to take before shutting down hosts in a Ceph cluster: how-to-do-a-ceph-cluster-maintenance-shutdown/


Follow these steps before shutting down a chassis or the whole cluster:


  1. Stop all client I/O.
  2. Unmount Ceph on the clients.

  3. Set the OSD flags. Log in to a monitor node and run the following commands:

    # ceph osd set noout
    # ceph osd set nobackfill
    # ceph osd set norecover
    # ceph osd set norebalance
    # ceph osd set nodown
    # ceph osd set pause

    The pause flag is not necessary if you are only shutting down one chassis.

  4. Put the Mars400 into standby. Either run the standby command over SSH from the command line:

    ssh root@<BMC IP> -t "/usr/bin/bmcpipe \"standby ;\""

    or log in to the BMC and enter "standby" at the command prompt:



============================================
     BMC Simple Command Line Interface
============================================
Input Command.  Enter 'H' or 'help' to display available commands.
    {C} console            : Console access to module/switch
    {S} status             : Show current machine status
    {A} advance            : Advance module control functions
    {M} maintenance        : Advance device maintenance helper comamnds
    {Q} logout             : Logout
    {V} version            : Show software version
    {H} help               : Show help message
============================================
Enter command >>>standby


    

To power off additional Mars400 chassis, repeat the BMC standby step above on each one.

The BMC allows about 3 minutes for the nodes to shut down gracefully, then puts the power supply into standby mode.
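The shutdown steps above could be wrapped in a small shell helper. This is only a sketch: the function names set_maintenance_flags and standby_chassis and the MAINT_FLAGS variable are hypothetical, and it assumes the script runs on a monitor node with admin access to the cluster and SSH access to the BMC as root.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Ceph or the BMC.

# Flags taken from the procedure above. "pause" is added separately
# because it is only needed when the whole cluster is shut down.
MAINT_FLAGS="noout nobackfill norecover norebalance nodown"

# Set the maintenance flags. Pass "cluster" to also pause client I/O.
set_maintenance_flags() {
    local scope flags flag
    scope="${1:-chassis}"
    flags="$MAINT_FLAGS"
    if [ "$scope" = "cluster" ]; then
        flags="$flags pause"
    fi
    for flag in $flags; do
        # Stop at the first failure so a half-configured cluster is noticed.
        ceph osd set "$flag" || return 1
    done
}

# Ask one chassis BMC to go into standby. $1 is the BMC IP address.
standby_chassis() {
    ssh "root@$1" -t "/usr/bin/bmcpipe \"standby ;\""
}
```

For a single chassis, one possible invocation is `set_maintenance_flags chassis && standby_chassis <BMC IP>`, so the standby command is sent only if every flag was set.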


After you complete the hardware maintenance, power on the Mars400 through the BMC, unset all the flags, and wait for the cluster to become healthy:

# ceph osd unset noout
# ceph osd unset nobackfill
# ceph osd unset norecover
# ceph osd unset norebalance
# ceph osd unset nodown
# ceph osd unset pause
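
The recovery steps above could be scripted in a similar way. This is a sketch under assumptions: unset_maintenance_flags and wait_for_health_ok are hypothetical helper names, the retry count and interval are arbitrary defaults, and the check relies on `ceph health` printing a line that starts with HEALTH_OK once the cluster has recovered.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and default timings are hypothetical.

# Unset the flags in the order listed above. "pause" is unset as well,
# which only matters if it was set for a full-cluster shutdown.
unset_maintenance_flags() {
    local flag
    for flag in noout nobackfill norecover norebalance nodown pause; do
        ceph osd unset "$flag" || return 1
    done
}

# Poll cluster health. $1 = maximum attempts, $2 = seconds between attempts.
# Returns 0 once HEALTH_OK is reported, 1 if the attempts run out.
wait_for_health_ok() {
    local attempts interval i
    attempts="${1:-60}"
    interval="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ceph health | grep -q '^HEALTH_OK'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

After the nodes are back up, `unset_maintenance_flags && wait_for_health_ok` would wait up to about 10 minutes with these defaults; adjust the values to the size of the cluster.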