-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT MERGE] prepare for vfs 2.22.0 #140
Conversation
9921d1a
to
c45448a
Compare
Do we still use the |
c45448a
to
d49ccd0
Compare
a14ed48
to
cf11154
Compare
Hmm. It seems that
@derrickstolee did you see this issue before? |
cf11154
to
1c72774
Compare
Turns out that this has been addressed in v6 of A much better idea is most likely to pick up v6 and drop the @derrickstolee do you want me to try my hand at updating this here PR with v6, or would you rather do it yourself? |
@dscho I'll create a version of the v6 stuff that reverts the old code and plays the new code and make a PR against your While you are re-rolling, I think it would be good to drop this merge and commit.
That |
We got some more test failures in the full functional tests including this output difference:
|
@derrickstolee as @dscho indicated the post index changed hook was renamed from |
I have an idea. How about special-casing that hook for now, and look for It might actually be necessary in the long run, too, unless we have a good upgrade story for already-cloned worktrees, to change the |
acbc709
to
5b3b769
Compare
I made it so: 5b3b769e655a1557b54526c4793e80fb199ee804 @derrickstolee I also rebased after merging your #143, so the old midx-expire patches are gone. |
(@derrickstolee oh, and I put back the documentation for the |
Hmm. The test case that I introduced may be flaky. I'll need to look at this again tomorrow. |
5b3b769
to
13a22bc
Compare
The failure in the "Windows - Extra" build is due to how I updated the multi-pack-index to add packs to the |
Signed-off-by: Jeff Hostetler <[email protected]>
It took this developer quite a good while to understand why the current code cannot get a `NULL` returned by `index_file_exists()`. To un-confuse readers (and future-proof the code), let's just be safe and check before we dereference the returned pointer.
Signed-off-by: Jeff Hostetler <[email protected]>
Includes gitster / jh/trace2-sid-fix
The idea of the virtual file system really is to tell Git to avoid accessing certain paths. This fixes the case where a given path is not yet included in the virtual file system and we are about to write a conflicted version of it.
…iment Gvfs trace2 checkout and reset experiment
The repack builtin deletes redundant pack-files and their associated .idx, .promisor, .bitmap, and .keep files. We will want to re-use this logic in the future for other types of repack, so pull the logic into 'unlink_pack_path()' in packfile.c. The 'ignore_keep' parameter is enabled for the use in repack, but will be important for a future caller. Signed-off-by: Derrick Stolee <[email protected]>
We will add new subcommands to the multi-pack-index, and that will make the documentation a bit messier. Clean up the 'verb' descriptions by renaming the concept to 'subcommand' and removing the reference to the object directory. Helped-by: Stefan Beller <[email protected]> Helped-by: Szeder Gábor <[email protected]> Signed-off-by: Derrick Stolee <[email protected]>
The multi-pack-index tracks objects in a collection of pack-files. Only one copy of each object is indexed, using the modified time of the pack-files to determine tie-breakers. It is possible to have a pack-file with no referenced objects because all objects have a duplicate in a newer pack-file. Introduce a new 'expire' subcommand to the multi-pack-index builtin. This subcommand will delete these unused pack-files and rewrite the multi-pack-index to no longer refer to those files. More details about the specifics will follow as the method is implemented. Add a test that verifies the 'expire' subcommand is correctly wired, but will still be valid when the verb is implemented. Specifically, create a set of packs that should all have referenced objects and should not be removed during an 'expire' operation. The packs are created carefully to ensure they have a specific order when sorted by size. This will be important in a later test. Signed-off-by: Derrick Stolee <[email protected]>
Before writing the multi-pack-index, we compute the length of the pack-index names concatenated together. This forms the data in the pack name chunk, and we precompute it to compute chunk offsets. The value is also modified to fit alignment needs. Previously, this computation was coupled with adding packs from the existing multi-pack-index and the remaining packs in the object dir not already covered by the multi-pack-index. In anticipation of this becoming more complicated with the 'expire' subcommand, simplify the computation by centralizing it to a single loop before writing the file. Signed-off-by: Derrick Stolee <[email protected]>
In anticipation of the expire subcommand, refactor the way we sort the packfiles by name. This will greatly simplify our approach to dropping expired packs from the list. First, create 'struct pack_info' to replace 'struct pack_pair'. This struct contains the necessary information about a pack, including its name, a pointer to its packfile struct (if not already in the multi-pack-index), and the original pack-int-id. Second, track the pack information using an array of pack_info structs in the pack_list struct. This simplifies the logic around the multiple arrays we were tracking in that struct. Finally, update get_sorted_entries() to not permute the pack-int-id and instead supply the permutation to write_midx_object_offsets(). This requires sorting the packs after get_sorted_entries(). Signed-off-by: Derrick Stolee <[email protected]>
The 'git multi-pack-index expire' subcommand looks at the existing mult-pack-index, counts the number of objects referenced in each pack-file, deletes the pack-fils with no referenced objects, and rewrites the multi-pack-index to no longer reference those packs. Refactor the write_midx_file() method to call write_midx_internal() which now takes an existing 'struct multi_pack_index' and a list of pack-files to drop (as specified by the names of their pack- indexes). As we write the new multi-pack-index, we drop those file names from the list of known pack-files. The expire_midx_packs() method removes the unreferenced pack-files after carefully closing the packs to avoid open handles. Test that a new pack-file that covers the contents of two other pack-files leads to those pack-files being deleted during the expire subcommand. Be sure to read the multi-pack-index to ensure it no longer references those packs. Signed-off-by: Derrick Stolee <[email protected]>
In an environment where the multi-pack-index is useful, it is due to many pack-files and an inability to repack the object store into a single pack-file. However, it is likely that many of these pack-files are rather small, and could be repacked into a slightly larger pack-file without too much effort. It may also be important to ensure the object store is highly available and the repack operation does not interrupt concurrent git commands. Introduce a 'repack' subcommand to 'git multi-pack-index' that takes a '--batch-size' option. The subcommand will inspect the multi-pack-index for referenced pack-files whose size is smaller than the batch size, until collecting a list of pack-files whose sizes sum to larger than the batch size. Then, a new pack-file will be created containing the objects from those pack-files that are referenced by the multi-pack-index. The resulting pack is likely to actually be smaller than the batch size due to compression and the fact that there may be objects in the pack- files that have duplicate copies in other pack-files. The current change introduces the command-line arguments, and we add a test that ensures we parse these options properly. Since we specify a small batch size, we will guarantee that future implementations do not change the list of pack-files. In addition, we hard-code the modified times of the packs in the pack directory to ensure the list of packs sorted by modified time matches the order if sorted by size (ascending). This will be important in a future test. Signed-off-by: Derrick Stolee <[email protected]>
To repack with a non-zero batch-size, first sort all pack-files by their modified time. Second, walk those pack-files from oldest to newest, compute their expected size, and add the packs to a list if they are smaller than the given batch-size. Stop when the total expected size is at least the batch size. If the batch size is zero, select all packs in the multi-pack-index. Finally, collect the objects from the multi-pack-index that are in the selected packs and send them to 'git pack-objects'. Write a new multi-pack-index that includes the new pack. Using a batch size of zero is very similar to a standard 'git repack' command, except that we do not delete the old packs and instead rely on the new multi-pack-index to prevent new processes from reading the old packs. This does not disrupt other Git processes that are currently reading the old packs based on the old multi-pack-index. While first designing a 'git multi-pack-index repack' operation, I started by collecting the batches based on the actual size of the objects instead of the size of the pack-files. This allows repacking a large pack-file that has very few referencd objects. However, this came at a significant cost of parsing pack-files instead of simply reading the multi-pack-index and getting the file information for the pack-files. The "expected size" version provides similar behavior, but could skip a pack-file if the average object size is much larger than the actual size of the referenced objects, or can create a large pack if the actual size of the referenced objects is larger than the expected size. Signed-off-by: Derrick Stolee <[email protected]>
During development of the multi-pack-index expire subcommand, a version went out that improperly computed the pack order if a new pack was introduced while other packs were being removed. Part of the subtlety of the bug involved the new pack being placed before other packs that already existed in the multi-pack-index. Add a test to t5319-multi-pack-index.sh that catches this issue. The test adds new packs that cause another pack to be expired, and creates new packs that are lexicographically sorted before and after the existing packs. Signed-off-by: Derrick Stolee <[email protected]>
The 'git multi-pack-index expire' subcommand may delete packs that are not needed from the perspective of the multi-pack-index. If a pack has a .keep file, then we should not delete that pack. Add a test that ensures we preserve a pack that would otherwise be expired. First, create a new pack that contains every object in the repo, then add it to the multi-pack-index. Then create a .keep file for a pack starting with "a-pack" that was added in the previous test. Finally, expire and verify that the pack remains and the other packs were expired. Signed-off-by: Derrick Stolee <[email protected]>
The major work on merge-recursive.c that went into v2.21.0 necessitates an adjustment to work well with VFSforGit's virtualfilesystem hook.
The 'git multi-pack-index repack' command can take a batch size of zero, which creates a new pack-file containing all objects in the multi-pack-index. The first 'repack' command will create one new pack-file, and an 'expire' command after that will delete the old pack-files, as they no longer contain any referenced objects in the multi-pack-index. We must remove the .keep file that was added in the previous test in order to expire that pack-file. Also test that a 'repack' will do nothing if there is only one pack-file. Signed-off-by: Derrick Stolee <[email protected]>
Replace `git multi-pack-index expire/repack` with upstream v6
When our patches to support that hook were upstreamed, the hook's name was eliciting some reviewer suggestions, and it was renamed to `post-index-change`. These patches (with the new name) made it into v2.22.0. However, VFSforGit users may very well have checkouts with that hook installed under the original name. To support this, let's just introduce a hack where we look a bit more closely when we just failed to find the `post-index-change` hook, and allow any `post-indexchanged` hook to run instead (if it exists).
84e40b2
to
58ad016
Compare
Okay, y'all, I rebased onto Git for Windows v2.22.0 and force-pushed. This is it. Hopefully the Azure Pipeline succeeds without any help from my side, so that I can spend a couple of mostly offline days... |
Almost ;-) I had to retry once because of two flaky test cases that fail sometimes (and which are on the backlog to be investigated). |
Thanks so much, @dscho! I'm going to build a PR build installer and run the VFS for Git tests one more time before updating |
Updates to v2.22.0.vfs.1 as discussed in microsoft/git#140. Includes these changes in microsoft/git: * microsoft/git#133 * microsoft/git#139 * microsoft/git#141 * microsoft/git#143 Also, the `post-indexchanged` hook was renamed to `post-index-change` as it was upstreamed.
This PR is set up to prepare for v2.22.0: its target branch tracks git-for-windows#2192, rebasing VFS for Git's patches on top of it as frequently as possible.
Note that quite a few trace2 patches from VFS for Git were upstreamed (some in slightly different forms), so they are no longer visible in this PR's commits.