Bugfix/zenko 4583 reconfigure mongodb rollback time #1870
Hello williamlardier,
My role is to assist you with the merge of this
Status report is not available.
4448497
to
6869960
Compare
mongodbExtraFlags: []
mongodbExtraFlags:
  - "--setParameter rollbackTimeLimitSecs=259200"
I think we also set many parameters through the makefile: any specific reason we set this here instead?
Because the Makefile does not work anymore for now. Charts are no longer stored in the same place, and they put a 6-month retention policy in place, AFAIK. I am planning to work on it with the 4.4 upgrade.
I can add it to a patch file though, so that we don't lose this change later.
Done here: 2345e95
Because the Makefile does not work anymore for now. Charts are no longer stored in the same place, and they put a 6-month retention policy in place, AFAIK. I am planning to work on it with the 4.4 upgrade.
I meant the makefile used to render the chart and generate the Zenko-Base ISO, which does not depend on the URL or the 6-month retention, but just renders the chart with some specific values:
Line 128 in 40ea563
helm template ${MONGODB_NAME} ${CHART_PATH} -n ${MONGODB_NAMESPACE} \
Line 166 in 40ea563
helm template ${MONGODB_SHARDED_NAME} ${CHART_PATH} -n ${MONGODB_NAMESPACE} \
(looks like it is a build.sh script and not a makefile, sorry for the confusion)
Oh ok I see, then I will make the appropriate changes 😄
I also found the workaround for the Makefile, putting it here but I'll do it later:
CHART_REPO:="https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami"
Source: bitnami/charts#10539
Done (and rebased) in 456d0a1
Update after some rebases: done in 08c20c4 for the sharded mode only. I do not want to introduce more entropy in this PR, and the fix is only for an Artesca use case, so sharded.
b7a8854
to
6a96212
Compare
/reset
Reset complete
I have successfully deleted this pull request's integration branches.
/status
Waiting for approval
The following approvals are needed before I can proceed with the merge:
c9fa062
to
65df95b
Compare
History mismatch
Merge commit #8186192eee5f2740a79aadff9e13712f0488aaee on the integration branch
It is likely due to a rebase of the branch
Please use the
65df95b
to
4159306
Compare
/reset
Reset complete
I have successfully deleted this pull request's integration branches.
Waiting for approval
The following approvals are needed before I can proceed with the merge:
/approve
In the queue
The changeset has received all authorizations and has been added to the
The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once
IMPORTANT
Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a
The following options are set: approve
I have successfully merged the changeset of this pull request
The following branches have NOT changed:
Please check the status of the associated issue ZENKO-4583. Goodbye williamlardier.
This PR also bumps some projects, to consolidate PRs.
The goal of this change is to improve the user experience with MongoDB, rather than to provide a real fix.
Today, if one MongoDB instance is down for too long (1 day by default), and some operations were performed on the cluster more than 1 day before the down instance comes back, MongoDB throws an error and refuses to start, because the first oplog entry to recover is too old.
We must distinguish 2 scenarios here:
The whole cluster was down for more than 1 day, OR only a subset of the MongoDB instances was down but no operations completed during the downtime: in this case, the change helps a bit by being more flexible, allowing a full "weekend" to pass before the service is restarted and comes back without issue. In this scenario the oplog was not overwritten, so a higher rollback duration is fine.
A subset of the MongoDB instances was down, and the oplog was overwritten during the downtime. Here there are 2 further cases: a rollback duration that is too high could cause dead loops, with a MongoDB instance unable to recover before the oplog is overwritten again; alternatively, the oplog could already be overwritten, in which case this change has no effect because a full init sync is required.
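The scenarios above can be summarized as a small decision sketch. This is a deliberately simplified model of the description, not MongoDB's actual recovery logic: the function name, its three inputs (downtime, oplog window and rollback limit, all in seconds) and the returned labels are assumptions made for illustration, and the dead-loop case is not modeled.

```shell
# classify_recovery DOWNTIME OPLOG_WINDOW ROLLBACK_LIMIT  (all in seconds)
# Hypothetical helper: a simplified model of the scenarios described above,
# not MongoDB's real decision logic.
classify_recovery() {
    downtime=$1
    oplog_window=$2
    rollback_limit=$3

    if [ "$downtime" -gt "$oplog_window" ]; then
        # The oplog was already overwritten: the rollback limit no longer matters.
        echo "full-init-sync"
    elif [ "$downtime" -gt "$rollback_limit" ]; then
        # The oplog is intact, but the first entry to recover is too old.
        echo "refuses-to-start"
    else
        echo "recovers"
    fi
}

# A 2-day outage on a cluster whose oplog still covers 7 days (assumed figures):
classify_recovery 172800 604800 86400    # with a 1-day rollback limit
classify_recovery 172800 604800 259200   # with a 3-day rollback limit
```

In this model the first call reports a refusal to start and the second a normal recovery, which is the flexibility this change aims for. When the oplog window is shorter than the downtime, the result is a full init sync whatever the limit.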
To sum up, the main motivation of this change is UX: being flexible if something goes wrong during a weekend. We chose 3 days rather than 1 day as the value that best avoids dead loops while still improving UX.
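For reference, the value passed to rollbackTimeLimitSecs in this PR is 3 days expressed in seconds, against 1 day for the default. A quick shell check of that arithmetic (the helper name days_to_secs is hypothetical and only used for illustration):

```shell
# Convert a number of days to seconds (hypothetical helper, illustration only).
days_to_secs() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_secs 1   # prints 86400, the 1-day default
days_to_secs 3   # prints 259200, the value set in this PR
```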
To complete this work, we document several procedures to:
(see https://github.com/scality/artesca/pull/1668)
Note 1: Tests showed that under load, the oplog is overwritten in less than 1 hour.
Note 2: I saw strange cases in AWS with snapshots where the issue would or would not arise, using the same snapshot... even after setting MongoDB to read-only. Let's hope this behavior got fixed in 4.4, which brings some improvements on this side.