
Discourage rebalance, warn against stopping it #1298

Draft
feorlen wants to merge 2 commits into main
Conversation

@feorlen marked this pull request as draft August 14, 2024 22:37
@djwfyi (Collaborator) left a comment:

A few suggestions and comments for consideration.
I'll take another look after others have had their say.

@@ -149,12 +149,14 @@ For more information on write preference calculation logic, see :ref:`Writing Fi
Rebalancing data across all pools after an expansion is an expensive operation that requires scanning the entire deployment and moving objects between pools.
This may take a long time to complete depending on the amount of data to move.

Starting with MinIO Client version RELEASE.2022-11-07T23-47-39Z, you can manually initiate a rebalancing operation across all server pools using :mc:`mc admin rebalance`.
MinIO does not recommend manual rebalancing.
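For context on the command under discussion, the workflow looks roughly like this. This is a sketch, not official usage guidance: `myminio` is a hypothetical alias, and `mc` is stubbed as a shell function here so the sequence can be dry-run without a live deployment.

```shell
# Stub so this sketch runs without a live MinIO deployment;
# delete this function to run the commands against a real alias.
mc() { echo "mc $*"; }

mc admin rebalance start myminio    # begin rebalancing objects across all pools
mc admin rebalance status myminio   # poll progress; rebalance can take a long time
mc admin rebalance stop myminio     # halt; risky on releases before the 2024-08-17 fix
```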
Contributor:

Can we remove the recommendation text?

Collaborator (author):

@kannappanr you mean we should say it's ok to manually rebalance?

Collaborator:

Should we instead use a cautionary statement that this feature should be used only in consultation with MinIO Engineering?
The ask we had for this PR was specifically to discourage use of this feature.

Member:

Correct, we have made a release already, and the fixes are also in the EOS binaries, so to some extent we have addressed this already.

We should perhaps take a broader tone: rebalance is not a real requirement if you size your pools properly.

Member:

Basically, discourage budget setups.

Member:

Don't expand in this manner:

First pool:

  • 100 nodes, now 90% used

You botched buying hardware, so you just expand by 20 nodes:

  • 20 nodes as a second pool

These 20 nodes will take the entire I/O hit, causing significant slowness; the sizing must be appropriate to the load the 100-node pool was handling. If 20 nodes can handle it and it's new hardware, no problem; but if not, it is going to cause an outage.

We need cautionary guidance on why rebalance doesn't solve the problem of high utilization on the second pool. It may look like it does, but it won't solve the problem.
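To make the comment's numbers concrete, here is a toy model. It is my assumption of the behavior being described, not MinIO's actual write-preference code: new writes are distributed across pools in proportion to free space.

```python
# Toy model of the behavior described above: writes are distributed in
# proportion to each pool's free space. Numbers follow the comment:
# a 100-node pool at 90% used, expanded with an empty 20-node pool of
# the same per-node capacity.

def write_shares(free_space):
    """Fraction of incoming writes each pool receives, proportional to free space."""
    total = sum(free_space)
    return [f / total for f in free_space]

pool1_free = 100 * 0.10   # 100 nodes, only 10% free
pool2_free = 20 * 1.00    # 20 new nodes, completely empty

shares = write_shares([pool1_free, pool2_free])
print(f"pool 1 (100 nodes): {shares[0]:.0%} of new writes")   # 33%
print(f"pool 2 (20 nodes):  {shares[1]:.0%} of new writes")   # 67%
```

Under this assumption each of the 20 new nodes absorbs roughly ten times the per-node write load of the original pool's nodes, which is the outage risk described above.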

Collaborator (author):

We mention the "2 years" guidance in the hardware checklist, although there isn't a specific and obvious section along the lines of "How big should I make my pool?" (I noticed that AJ's blog from January recommends a minimum of 3 years of capacity.)

I think we can reinforce this in the Storage section of the hardware checklist, and maybe mention it elsewhere too, like the concepts page, to make the point that tacking on a bit of new capacity here and there doesn't go well and is not a reliable plan.

Comment on lines +342 to +343
For deployments with multiple server pools, each individual pool may have its own hardware configuration.
However, significant capacity differences between pools may temporarily result in high loads on a new pool's nodes during :ref:`expansion <expand-minio-distributed>`. For more information, see :ref:`How do I manage object distribution across a MinIO deployment? <minio-rebalance>`.
Collaborator (author):

Does it make sense to say this in the hardware checklist?

Comment on lines +147 to +151
As the new pool fills, write operations eventually balance out across all pools in the deployment.
Until then, the new pool's nodes may experience higher loads and slower writes.

To reduce this temporary performance impact, MinIO recommends expanding a deployment well before its existing pools are near capacity and with new pools of a similar size.
For more information on write preference calculation logic, see :ref:`Writing Files <minio-writing-files>`.
Collaborator (author):

Accurate? Sufficient?

Other mentions of pool sizing link to this section

Comment on lines +166 to +168
Since a pool with more free space has a higher probability of being written to, the nodes of that pool may experience higher loads as free space equalizes.

If required, you can manually initiate a rebalance procedure with :mc:`mc admin rebalance`.
Collaborator (author):

Explain what happens if pools have very different available free space. Is this text an accurate characterization?

Comment on lines +139 to +144
.. admonition:: Stopping a rebalance job on previous versions of MinIO may cause data loss
:class: warning

   A bug in MinIO releases prior to :minio-release:`RELEASE.2024-08-17T01-24-54Z` can overwrite objects while stopping an in-progress rebalance operation.
Interrupting rebalance on these older versions may result in data loss.
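The release cutoff in this admonition can be checked mechanically: MinIO release tags embed an ISO-8601 timestamp, so lexicographic comparison orders them chronologically. A bash sketch, where the `CURRENT` value is a hypothetical example to substitute with your server's release:

```shell
FIXED="RELEASE.2024-08-17T01-24-54Z"
CURRENT="RELEASE.2022-11-07T23-47-39Z"   # hypothetical example release

# ISO-8601 timestamps sort lexicographically, so plain string
# comparison orders release tags chronologically.
if [[ "$CURRENT" < "$FIXED" ]]; then
  echo "affected: do not stop an in-progress rebalance on this release"
else
  echo "contains the fix: stopping rebalance is safe"
fi
```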

Collaborator (author):

Is there a usual way we reference things like this? What should we say about the now fixed bug?


feorlen commented Aug 22, 2024

@kannappanr @harshavardhana I made several edits with proposed text that is less scary about rebalance. I'd appreciate another look.

I left the warning about stopping, but only for older versions. What should we say about that?

@feorlen requested a review from ravindk89 August 23, 2024 13:39
4 participants