Bugfix/zenko 4583 reconfigure mongodb rollback time #1870

Merged

@williamlardier williamlardier commented May 15, 2023

This PR also bumps some projects, to consolidate several PRs into one.

The goal of this change is to improve the user experience with MongoDB rather than to provide a real fix.

Today, if a MongoDB instance stays down for too long (1 day by default) and operations were performed on the cluster more than 1 day before that instance comes back, MongoDB throws an error and refuses to start, because the first oplog entry to recover is too old.

We must distinguish 2 scenarios here:

  • The whole cluster was down for more than 1 day, OR only a subset of the MongoDB instances was down but no operations completed during the downtime. In this case the fix helps by being more flexible: a full "weekend" can pass before the service is restarted, and it will come back without issue. In this scenario the oplog was not overwritten, so a higher rollback duration is fine.

  • A subset of the MongoDB instances was down, and the oplog was overwritten during the downtime. This splits into 2 more cases. Either a rollback duration that is too high causes dead loops, with a MongoDB instance unable to recover before the oplog is overwritten; or the oplog has already been overwritten, and this change has no effect because a full initial sync is required.

To sum up, the main motivation of this change is UX: being flexible if something goes wrong during a weekend. The value of 3 days instead of 1 day was chosen to avoid dead loops as much as possible while still improving UX.
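
As a rough illustration of the scenarios above, here is a minimal Python sketch. It is a simplified model of the reasoning in this description, not MongoDB's actual recovery logic: the `Outage` class, the `expected_outcome` function and the returned strings are all hypothetical names, and the dead-loop case is not modeled.

```python
from dataclasses import dataclass

DAY_SECS = 24 * 60 * 60


@dataclass
class Outage:
    """Hypothetical description of one instance's downtime."""
    downtime_secs: int        # how long the instance stayed down
    oplog_overwritten: bool   # True if the oplog rolled over during the downtime


def expected_outcome(outage: Outage, rollback_limit_secs: int) -> str:
    """Simplified model of the scenarios described above (assumption, not MongoDB code)."""
    if outage.oplog_overwritten:
        # Scenario 2: the entries needed to catch up are gone,
        # so raising the limit has no effect.
        return "full initial sync required"
    if outage.downtime_secs <= rollback_limit_secs:
        # Scenario 1, within the limit: the instance can come back on its own.
        return "restarts normally"
    # Scenario 1, beyond the limit: the first oplog entry to recover is too old.
    return "refuses to start"


# A 2.5-day "weekend" outage with no oplog overwrite, under both limits.
weekend = Outage(downtime_secs=int(2.5 * DAY_SECS), oplog_overwritten=False)
print(expected_outcome(weekend, 1 * DAY_SECS))
print(expected_outcome(weekend, 3 * DAY_SECS))
```

In this model, only the first scenario benefits from the higher limit, which matches the UX motivation stated above.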

To complete this work, we document several procedures to:

Note 1: Tests showed that under load, the oplog is overwritten in less than 1 hour.

Note 2: I saw strange cases in AWS with snapshots where the issue would or would not arise using the same snapshot, even after setting MongoDB to read-only. Let's hope this behavior was fixed in 4.4, which brings some improvements in this area.


bert-e commented May 15, 2023

Hello williamlardier,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@williamlardier williamlardier force-pushed the bugfix/ZENKO-4583-reconfigure-mongodb-rollback-time branch 3 times, most recently from 4448497 to 6869960 Compare May 15, 2023 16:19
-mongodbExtraFlags: []
+mongodbExtraFlags:
+- "--setParameter rollbackTimeLimitSecs=259200"
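
For readers checking the value, a short Python sketch of the arithmetic behind the flag. The helper names `to_days` and `extra_flag` are hypothetical and only illustrate how the 3-day value maps to seconds; the 86400 figure is the 1-day default mentioned in the description.

```python
ONE_DAY_SECS = 24 * 60 * 60          # 86400, the 1-day default
NEW_LIMIT_SECS = 3 * ONE_DAY_SECS    # value set by this PR


def to_days(secs: int) -> float:
    """Convert a duration in seconds to days."""
    return secs / ONE_DAY_SECS


def extra_flag(secs: int) -> str:
    """Build the mongod flag string as it appears in the values file (hypothetical helper)."""
    return f"--setParameter rollbackTimeLimitSecs={secs}"


print(NEW_LIMIT_SECS, to_days(NEW_LIMIT_SECS))
print(extra_flag(NEW_LIMIT_SECS))
```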

Contributor

I think we set many parameters through the Makefile as well: any specific reason we set this here instead?

Contributor Author

Because the Makefile does not work anymore for now. Charts are no longer stored in the same place, and AFAIK a 6-month retention policy was put in place. I am planning to work on it with the 4.4 upgrade.

I can add it to a patch file though, so that we don't lose this change later.

Contributor Author

Done here: 2345e95

Contributor

Because the Makefile for now does not work anymore. Charts are not stored at the same place and they put in place a 6 month retention policy AFAIK. I am planning to work on it with the 4.4 upgrade.

I meant the Makefile used to render the chart and generate the Zenko-Base ISO, which does not depend on the URL or the 6-month retention, but just renders the chart with some specific values:

(It looks like it is a build.sh script and not a Makefile, sorry for the confusion.)

Contributor Author

Oh ok I see, then I will make the appropriate changes 😄

I also found the workaround for the Makefile. Putting it here, but I'll do it later:

CHART_REPO:="https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami"

Source: bitnami/charts#10539

Contributor Author

Done (and rebased) in 456d0a1

Contributor Author

@williamlardier williamlardier May 22, 2023

Update after some rebases: done in 08c20c4 for the sharded mode only. I do not want to introduce more entropy in this PR, and the fix is only for an Artesca use case, so sharded.

@williamlardier williamlardier force-pushed the bugfix/ZENKO-4583-reconfigure-mongodb-rollback-time branch 9 times, most recently from b7a8854 to 6a96212 Compare May 22, 2023 09:52
@williamlardier
Contributor Author

/reset


bert-e commented May 22, 2023

Reset complete

I have successfully deleted this pull request's integration branches.

@scality scality deleted a comment from bert-e May 22, 2023
@scality scality deleted a comment from bert-e May 22, 2023
@williamlardier
Contributor Author

/status


bert-e commented May 22, 2023

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@scality scality deleted a comment from bert-e May 22, 2023
@scality scality deleted a comment from bert-e May 22, 2023
@scality scality deleted a comment from bert-e May 22, 2023
@scality scality deleted a comment from bert-e May 22, 2023
@williamlardier williamlardier force-pushed the bugfix/ZENKO-4583-reconfigure-mongodb-rollback-time branch from c9fa062 to 65df95b Compare May 22, 2023 13:26

bert-e commented May 22, 2023

History mismatch

Merge commit #8186192eee5f2740a79aadff9e13712f0488aaee on the integration branch
w/2.7/bugfix/ZENKO-4583-reconfigure-mongodb-rollback-time is merging a branch which is neither the current
branch bugfix/ZENKO-4583-reconfigure-mongodb-rollback-time nor the development branch
development/2.7.

It is likely due to a rebase of the branch bugfix/ZENKO-4583-reconfigure-mongodb-rollback-time and the
merge is not possible until all related w/* branches are deleted or updated.

Please use the reset command to have me reinitialize these branches.

@williamlardier williamlardier force-pushed the bugfix/ZENKO-4583-reconfigure-mongodb-rollback-time branch from 65df95b to 4159306 Compare May 22, 2023 13:50
@williamlardier
Contributor Author

/reset


bert-e commented May 22, 2023

Reset complete

I have successfully deleted this pull request's integration branches.

@scality scality deleted a comment from bert-e May 22, 2023

bert-e commented May 22, 2023

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@williamlardier
Contributor Author

/approve


bert-e commented May 22, 2023

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/2.6

  • ✔️ development/2.7

The following branches will NOT be impacted:

  • development/2.4
  • development/2.5

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve


bert-e commented May 22, 2023

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/2.6

  • ✔️ development/2.7

The following branches have NOT changed:

  • development/2.4
  • development/2.5

Please check the status of the associated issue ZENKO-4583.

Goodbye williamlardier.

@bert-e bert-e merged commit 4159306 into development/2.6 May 22, 2023
@bert-e bert-e deleted the bugfix/ZENKO-4583-reconfigure-mongodb-rollback-time branch May 22, 2023 15:25