Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/repo-pull: Retry pulls without static deltas if they fail #1612

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pwithnall
Copy link
Member

If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

WIP: This needs some work to tidy it up.

Signed-off-by: Philip Withnall [email protected]

https://phabricator.endlessm.com/T22873


This is a WIP commit, mostly submitted as a PR to get feedback on the concept. If you’re happy with the concept, I’ll clean the code up. I was wondering whether this is something which you might want controlled by an option (e.g. disable-static-deltas-on-failure) or always enabled.

@pwithnall pwithnall self-assigned this Jun 5, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

WIP: This needs some work to tidy it up.

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873
pwithnall added a commit to endlessm/ostree that referenced this pull request Jun 6, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873
pwithnall added a commit to endlessm/eos-updater that referenced this pull request Jun 6, 2018
Currently, we manually handle the case of pulling a broken static delta
failing, by checking for G_IO_ERROR_NOT_FOUND from a pull, and retrying
with disable-static-deltas=true set.

In future, this may be handled internally by libostree, and hence we
could eventually drop our handling for it.

See ostreedev/ostree#1612.

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873
@jlebon
Copy link
Member

jlebon commented Jun 6, 2018

We already fallback if we fail to fetch the superblock, right? So this seems to be specifically about falling back if we fail to fetch one of the delta parts? I'm not sure how likely it is that we'd successfully fetch the superblock but fail on one of the delta parts. (Or is this something you've seen happen in the wild more than once?)

@jlebon jlebon added the WIP label Jun 6, 2018
@pwithnall
Copy link
Member Author

Having looked back to the code which we originally added in eos-updater for this (about a year ago), it’s to fix this error:

While pulling app/com.endlessm.farming.en/x86_64/eos3 from remote eos-apps: opcode set-read-source: No such file object 4b52c218e72c8155f31f25eef936094ec2ea60fb4351cf2ad22c7f264817ee74

The analysis we did concluded that there was nothing wrong with the repository on the server (or mirror), but that the client’s repository was missing an object from the previous commit, so the delta could not be applied. I think the mechanism for that object going missing locally in the first place was a lack of locking between the updater and gnome-software, with a prune operation by one process racing with an update by the other. That’s been solved with the locking stuff dbn introduced though.

While that cause of a corrupt repository has been fixed, it’s conceivable that a local repository could get corrupted another way. It would be nice to be robust to that kind of failure, given that the non-static-delta code already is.

@jlebon
Copy link
Member

jlebon commented Jun 8, 2018

Thanks for the context. Wrapping the whole function still feels a bit strong-handed though. This is something ostree fsck could help recover from too, right? (It marks commits as partial if there are missing objects, so it wouldn't be usable as a delta FROM in a follow-up pull). We could also just do this on the fly by checking that the commit doesn't have missing objects in get_best_static_delta_start_for?

@cgwalters
Copy link
Member

We could also just do this on the fly by checking that the commit doesn't have missing objects in get_best_static_delta_start_for?

We already do that today - I think I added it for the pull --commit-metadata-only but it applies equally to missing objects.

Won't help existing systems though - those need a new ostree fsck run with ostree 2018.5. Which...maybe EOS could add a unit to force a one-time fsck? We need to make that operation cheaper though, Rob was already talking about how expensive prunes are on rotational drives.

Perhaps the simplest hack is to - if we encounter ENOENT when loading an object (during delta processing), drop into the fsck code and mark it partial.

@jlebon
Copy link
Member

jlebon commented Jun 8, 2018

We already do that today

We're only checking for OSTREE_REPO_COMMIT_STATE_PARTIAL in that path, right? I meant actually traversing the commit in the non-partial case to make sure it really is complete before using it as a from commit. We don't have to verify checksums, just that the files exist. That would at least put it on par with what we do in the non-delta case; right now when scanning a dirtree, if a file already exists, we just skip over it (but we don't verify its checksum).

Perhaps the simplest hack is to - if we encounter ENOENT when loading an object (during delta processing), drop into the fsck code and mark it partial.

Hmm, and a message to retry the operation? Not as magical, but that works too. (Actually, we probably should print something however this gets implemented, since missing objects means something went wrong.)

pwithnall added a commit to endlessm/ostree that referenced this pull request Jun 11, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873
pwithnall added a commit to endlessm/eos-updater that referenced this pull request Jun 11, 2018
Currently, we manually handle the case of pulling a broken static delta
failing, by checking for G_IO_ERROR_NOT_FOUND from a pull, and retrying
with disable-static-deltas=true set.

In future, this may be handled internally by libostree, and hence we
could eventually drop our handling for it.

See ostreedev/ostree#1612.

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873
@pwithnall
Copy link
Member Author

We already do that today

We're only checking for OSTREE_REPO_COMMIT_STATE_PARTIAL in that path, right?

Indeed, that’s how I read the function. I don’t see anything which could detect if an object referenced by a commit is missing; just code to detect if the commit itself is missing.

I meant actually traversing the commit in the non-partial case to make sure it really is complete before using it as a from commit. We don't have to verify checksums, just that the files exist. That would at least put it on par with what we do in the non-delta case; right now when scanning a dirtree, if a file already exists, we just skip over it (but we don't verify its checksum).

That makes sense to me, though I guess there could be a concern about statting the entire OS several times over before settling on a best from-commit to use for the static delta. It would be done on every pull, too, not just if the delta code found that an object was missing.

Perhaps the simplest hack is to - if we encounter ENOENT when loading an object (during delta processing), drop into the fsck code and mark it partial.

Hmm, and a message to retry the operation? Not as magical, but that works too. (Actually, we probably should print something however this gets implemented, since missing objects means something went wrong.)

I’d be much more in favour of something magical, which doesn’t require manually retrying the operation. Marking a commit as partial if ENOENT is encountered when applying the delta seems sensible anyway. If it could then cause the pull to be retried, that would work — get_best_static_delta_for() would then skip over the newly-marked-as-partial commit, and either find a different from-commit, or fall back to a non-static-delta pull.

mwleeds pushed a commit to endlessm/ostree that referenced this pull request Jul 19, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
mwleeds pushed a commit to endlessm/ostree that referenced this pull request Jul 20, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
mwleeds pushed a commit to endlessm/ostree that referenced this pull request Jul 21, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
mwleeds pushed a commit to endlessm/ostree that referenced this pull request Oct 16, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
mwleeds pushed a commit to endlessm/ostree that referenced this pull request Oct 22, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
mwleeds pushed a commit to endlessm/ostree that referenced this pull request Oct 25, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
mwleeds pushed a commit to endlessm/ostree that referenced this pull request Dec 11, 2018
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
mwleeds pushed a commit to endlessm/ostree that referenced this pull request Jan 22, 2019
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
mwleeds pushed a commit to endlessm/ostree that referenced this pull request Jan 23, 2019
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
andrunko pushed a commit to endlessm/ostree that referenced this pull request Sep 17, 2019
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
andrunko pushed a commit to endlessm/ostree that referenced this pull request Sep 18, 2019
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
andrunko pushed a commit to endlessm/ostree that referenced this pull request Sep 18, 2019
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
@pwithnall
Copy link
Member Author

@cgwalters, @jlebon, any further thoughts on this?

@pwithnall
Copy link
Member Author

Actually, thinking about it some more, the cost of statting all the objects in the source commit might well be quite high.

Going with the principle of “ask foregiveness rather than permission”, I think it might be better to do what I have in the current version of the PR: try the static delta update, and if that fails, do something about it, rather than trying to check perfectly that the static delta update will work before trying it.

The downside of the PR in its current form is that it means the user downloads the entire static delta, discards it, and then downloads the update object-by-object again, which uses (roughly) twice as much data as needed.

So perhaps a hybrid solution, as mentioned in some of the comments above, would work? Try the static delta and, if that fails, run enough of the fsck code to fix up the source commit, and try again? If that fails, retry the whole update with static deltas disabled (as in the current version of the PR).

@jlebon
Copy link
Member

jlebon commented Nov 18, 2019

I’d be much more in favour of something magical, which doesn’t require manually retrying the operation.

OK yeah, that makes sense to me, regardless of whatever implementation we settle on.

Marking a commit as partial if ENOENT is encountered when applying the delta seems sensible anyway. If it could then cause the pull to be retried, that would work — get_best_static_delta_for() would then skip over the newly-marked-as-partial commit, and either find a different from-commit, or fall back to a non-static-delta pull.

How about this:

  • we call get_best_static_delta_start_for() as usual
  • if it's DELTA_SEARCH_RESULT_FROM, we verify whether just that commit is complete
  • if it is, we proceed with the delta pull
  • if it's not, we mark as partial and immediately fall back to queue_scan_one_metadata_object

That way we only ever fully stat one commit. The reasoning being, if one commit we thought was complete turned out not to be, we're not going to waste time trying all the other commits too.

Actually, thinking about it some more, the cost of statting all the objects in the source commit might well be quite high.

I think for one commit it's not so bad? Anyway, if it turns out to indeed be complete (which will be the case the great majority of the time), warming up those files is probably helpful when we run the static delta ops that read from them.

dbnicholson pushed a commit to endlessm/ostree that referenced this pull request Mar 17, 2021
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
dbnicholson pushed a commit to endlessm/ostree that referenced this pull request Feb 3, 2022
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
dbnicholson pushed a commit to endlessm/ostree that referenced this pull request Feb 4, 2022
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.
@dbnicholson dbnicholson mentioned this pull request Feb 4, 2022
22 tasks
@openshift-ci
Copy link

openshift-ci bot commented Jun 29, 2023

@pwithnall: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/fcos-e2e 3e02c48 link true /test fcos-e2e
ci/prow/images 3e02c48 link true /test images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

dbnicholson pushed a commit to endlessm/ostree that referenced this pull request Dec 14, 2023
If pulling a static delta fails (due to one part of it missing from a
mirror, for example), retry the entire pull operation with
disable-static-deltas=true.

This is a functional, but ugly approach to fixing the problem, and a
neater solution may be adopted upstream in future. If so, this commit
should be dropped and replaced with the upstream solution:

ostreedev/ostree#1612

Signed-off-by: Philip Withnall <[email protected]>

https://phabricator.endlessm.com/T22873

Rebase 2018.6 (T23138): Fix a minor merge conflict due to "#ifdef
  OSTREE_ENABLE_EXPERIMENTAL_API" being gone now.

Rebase 2023.5 (T35068): Fix merge conflict with upstream reformatted
  function signature. Run clang-format to comply with upstream style.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants