-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of scratch deltas with existing commits #2005
base: main
Are you sure you want to change the base?
Fix handling of scratch deltas with existing commits #2005
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dbnicholson The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
As I said here: #2004 (comment) With this change if you had a previous version of the ref and with flatpak downloading a COMMIT_ONLY of the latest version of the ref then the is-partial check will always be true, even if we have some other version of the ref so using the from-scratch is not a good idea. |
After sleeping on it, I agree. Even though we don't build scratch deltas for our OS, we'd never be able to use them with our updater since it also does a metadata only pull first. A slightly more involved but still pretty simple check (as you're suggesting). If the current revision of partial but the parent exists and is not partial, then do an object pull as well. Actually, I could see cases where the parent is also partial because you didn't complete the previous pull and now the origin has published a new commit. It would be better to walk the ref until you find a normal commit. So, then if there was a normal commit reachable from the ref, use objects. Otherwise, use the scratch delta. Does that make sense? |
That seems fine to me. Although I'm gonna do the workaround in flatpak, so its not a huge hurry for me for this change. |
The rest of the logic for what delta to choose is in `get_best_static_delta_start_for`, so move the logic for skipping scratch deltas there, too. This makes the handling in `initiate_request` more straightforward by returning no match when it should be skipped so that it goes directly into the object pull path.
3c0fadc
to
aeaca9b
Compare
Apologies if anyone was reviewing this. I restarted after the above discussion and force pushed over the previous series. |
Actually I don't know if this is right either. The walk over the commits parent will only succeed if the direct parent of the new commit object is available locally. If you jump multiple revisions when you do the update then it will not find the intermediate parents and thus not the grandparent. |
Ugh, you're right. I added a test where the parent is partial and the grandparent is normal, but it's entirely common that you don't have the parent at all. I'm not sure I can think of a great way to keep this heuristic on the ostree side besides having metadata only pulls not update refs, but I'm sure that would break programs. I think this can only be reliably handled is if the program does not update the ref with a metadata only pull. I.e., pull by revision instead of by ref. Then the heuristic could still be applied correctly - what's on the ref when you do the real pull will accurately represent the state you want. |
Another thing that might help is a flag or option that says not to update the ref. So you can fetch the commit to your repo but not update your local ref. |
@alexlarsson did you see my comment at flatpak/flatpak#3413 (comment)? |
Ahh that's a tricky corner case. It's not at all obvious that doing a commit-only pull only would negatively affect static deltas. I guess a reflog would have been useful here. Hmm, couldn't we also use the |
We could probably extend the "partial" state to be:
The "use object fetch" path came about for things like |
A git style reflog did come to mind, but I'd forgotten about the |
I think that would be useful, but I think you'd still be in the same situation for partial metadata if you had a previously not partial commit on the ref but now you can't find it because of a non-linear history. I think the only way to have the partial metadata state not throw off the heuristic is:
|
In e7305bb a heuristic was added that if a scratch delta exists but you already have a commit on that ref that an object pull should be preferred. The idea is that if you already have something on the ref, then it's likely that an object pull require less data to be fetched than pulling the entire new commit. However, this doesn't take into account the commit state. If the existing commit is partial, then it might be better to fetch the scratch delta. On the other hand, if the existing commit is partial only because a metadata only pull has been done, then the parent commit might be normal and then you'd want to prefer an object pull again. Instead of considering just the HEAD local commit, walk the parents and check if any of them are normal commits. Only then use an object pull.
If a scratch delta exists and the caller has required static deltas, use it instead of the heuristic about existing commits.
If a normal commit cannot be found in the history of the current ref, try to find one by looking at the collection and ref bindings in all the commits in the repo. This will only work if the remote is using collection IDs since otherwise just looking at the ref bindings in the local commits would be ambiguous about what remote the commit came from. If a normal commit with matching bindings is found, prefer an object pull. With this change it's possible to have a partial local HEAD commit (from a previous metadata only pull, for example) and broken history to a normal ancestor and correctly use the heuristic that an object pull should be preferred. This is a common case since user repos are likely to have missed intervening commits on the same remote ref.
b6a5c4e
to
04a55cb
Compare
Here's another rework that includes looking for matching commit bindings as a fallback if the history search is unsuccessful. The added tests indicate it's doing the right thing, but I'm not sure how comfortable I am potentially scanning for and loading all the commits in the repo just for the benefit of this heuristic. |
@dbnicholson: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@dbnicholson: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This fixes 2 bugs with handling of scratch deltas. First, if the existing commit is partial, use the scratch delta instead of the assumption that an object pull would be better. This fixes flatpak/flatpak#3412 because it initially does a metadata only fetch to figure out some things about extra data flatpaks. Second, if the caller specifies
require-static-deltas
, honor it for a scratch delta.The tests were a little tricky to put together because ostree internally will fall back to an object pull if the delta doesn't exist. I had to corrupt the superblock to make it error instead of falling back.
Closes: #2004