CRUSH is designed to approximate a uniform probability distribution for write requests that assign new data objects to PGs and PGs to OSDs. Despite the CRUSH design, it is possible for clusters to become imbalanced for various reasons. If this occurs, set the OSD weight by utilization.
An OSD weight imbalance can occur for various reasons, for example:
- Multiple Pools: You can assign multiple pools to a CRUSH hierarchy, but the pools might have different numbers of placement groups, sizes (number of replicas to store), and object size characteristics.
- Custom Clients: Ceph clients such as the block device, object gateway, and filesystem shard data from their clients and stripe the data across the cluster as uniform-sized, smaller RADOS objects. So, aside from the foregoing scenario, CRUSH usually achieves its goal. However, there is another case where a cluster can become imbalanced: namely, using librados to store data without normalizing the size of objects. This scenario can lead to imbalanced clusters (for example, storing 100 1-MB objects and 10 4-MB objects will make a few OSDs have more data than the others).
- Probability: A uniform distribution results in some OSDs with more PGs and some with fewer. For clusters with a large number of OSDs, the statistical outliers will be further out.
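Before reweighting, you can check how evenly data is distributed across your OSDs. One way to do this is the ceph osd df command, which reports each OSD's utilization, variance, and PG count. The host name in the prompt is illustrative:

Example

[ceph: root@host01 /]# ceph osd df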
You can reweight OSDs by utilization by executing the following:
Syntax
ceph osd reweight-by-utilization [THRESHOLD] [WEIGHT_CHANGE_AMOUNT] [NUMBER_OF_OSDS] [--no-increasing]
Example
[ceph: root@host01 /]# ceph osd test-reweight-by-utilization 110 .5 4 --no-increasing
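The example above uses the test-reweight-by-utilization variant, which reports the proposed weight changes without applying them. If the proposed changes look reasonable, you can apply them by running reweight-by-utilization with the same arguments:

Example

[ceph: root@host01 /]# ceph osd reweight-by-utilization 110 .5 4 --no-increasing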
Where:
- threshold is a percentage of utilization such that OSDs facing higher data storage loads will receive a lower weight, and thus fewer PGs assigned to them. The default value is 120, reflecting 120%. Any value of 100 or greater is a valid threshold. Optional.
- weight_change_amount is the amount by which to change the weight. Valid values are greater than 0.0 and up to 1.0. The default value is 0.05. Optional.
- number_of_OSDs is the maximum number of OSDs to reweight. For large clusters, limiting the number of OSDs to reweight prevents significant rebalancing. Optional.
- no-increasing is off by default. By default, the reweight-by-utilization and test-reweight-by-utilization commands are allowed to increase the OSD weight. Using this option with these commands prevents the OSD weight from being increased, even if the OSD is underutilized. Optional.
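As a minimal illustration of the optional arguments described above, you can supply only a threshold and rely on the defaults for the remaining values; the threshold value here is illustrative:

Example

[ceph: root@host01 /]# ceph osd test-reweight-by-utilization 115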
Important: Executing reweight-by-utilization is recommended and somewhat inevitable for large clusters. Utilization rates might change over time, and as your cluster size or hardware changes, the weightings might need to be updated to reflect changing utilization. If you elect to reweight by utilization, you might need to re-run this command as utilization, hardware, or cluster size changes.
Executing this or other weight commands that assign a weight will override the weight assigned by this command (for example, osd reweight-by-utilization, osd crush weight, osd weight, in, or out).
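As a sketch of the override behavior described above, manually assigning an override weight or a CRUSH weight replaces whatever weight reweight-by-utilization assigned. The OSD ID and weight values below are illustrative:

Example

[ceph: root@host01 /]# ceph osd reweight 0 0.8
[ceph: root@host01 /]# ceph osd crush reweight osd.0 1.0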