Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] prepare for vfs 2.22.0 #140

Closed
wants to merge 1,317 commits into from
Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented May 24, 2019

This PR is set up to prepare for v2.22.0: its target branch tracks git-for-windows#2192, rebasing VFS for Git's patches on top of it as frequently as possible.

Note that quite a few trace2 patches from VFS for Git were upstreamed (some in slightly different forms), so they are no longer visible in this PR's commits.

@dscho
Copy link
Member Author

dscho commented May 25, 2019

Do we still use the post-index-changed hook? It was renamed to post-index-change when it was upstreamed...

@dscho dscho requested a review from benpeart May 25, 2019 09:25
@dscho dscho force-pushed the prepare-for-vfs-2.22.0 branch 2 times, most recently from a14ed48 to cf11154 Compare May 25, 2019 19:49
@dscho
Copy link
Member Author

dscho commented May 26, 2019

Hmm. It seems that t/t5319-multi-pack-index.sh fails on Windows (also in the slow PR build) with:

expecting success: 
	(
		cd dup &&
		git pack-objects --revs .git/objects/pack/pack-combined <<-EOF &&
		refs/heads/A
		^refs/heads/C
		EOF
		git multi-pack-index write &&
		ls .git/objects/pack | grep -v -e pack-[AB] >expect &&
		git multi-pack-index expire &&
		ls .git/objects/pack >actual &&
		test_cmp expect actual
	)

++ cd dup
++ git pack-objects --revs .git/objects/pack/pack-combined
3397c17c12e399df77c4f8ad89ef7ba8f628f947
++ git multi-pack-index write
++ ls .git/objects/pack
++ grep -v -e 'pack-[AB]'
++ git multi-pack-index expire
./test-lib.sh: line 908: 13244 Segmentation fault      git multi-pack-index expire
error: last command exited with $?=139
not ok 41 - expire removes unreferenced packs

@derrickstolee did you see this issue before?

@dscho
Copy link
Member Author

dscho commented May 27, 2019

./test-lib.sh: line 908: 13244 Segmentation fault git multi-pack-index expire

Turns out that this has been addressed in v6 of ds/midx-expire-repack. I papered over it by picking up the removal of FREE_AND_NULL(m->packs[i]); in expire_midx_packs() (after coming up with a fix that removes the packs from the MRU).

A much better idea is most likely to pick up v6 and drop the expire stuff that we have currently.

@derrickstolee do you want me to try my hand at updating this here PR with v6, or would you rather do it yourself?

@derrickstolee
Copy link
Collaborator

@dscho I'll create a version of the v6 stuff that reverts the old code and plays the new code and make a PR against your prepare branch.

While you are re-rolling, I think it would be good to drop this merge and commit.

*   29d26a4035 Merge 'gvfs/ds/generation-numbers-update'
|\
| * 2478506054 commit: add generation to pop_most_recent_commit()

That 247850 adds a use of generation number, but only supplies GENERATION_NUMBER_ZERO to the callers, so is worthless.

@derrickstolee
Copy link
Collaborator

We got some more test failures in the full functional tests including this output difference:

status Output Lines counts do not match. was: 2 expected: 15
Extra: nothing to commit, working tree clean
Missing: Changes not staged for commit:
Missing:   (use "git add/rm <file>..." to update what will be committed)
Missing:   (use "git checkout -- <file>..." to discard changes in working directory)
Missing: 	deleted:    GVFS/GVFS/CommandLine/CloneHelper.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/CloneVerb.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/DiagnoseVerb.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/GVFSVerb.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/LogVerb.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/MountVerb.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/PrefetchHelper.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/PrefetchVerb.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/StatusVerb.cs
Missing: 	deleted:    GVFS/GVFS/CommandLine/UnmountVerb.

@kewillford
Copy link
Member

@derrickstolee as @dscho indicated the post index changed hook was renamed from post-index-changed to post-index-change when it was upstreamed so we will need to update the VFSForGit code or we will have failures since the projection will not be getting updated.

@dscho
Copy link
Member Author

dscho commented May 28, 2019

the post index changed hook was renamed from post-index-changed to post-index-change when it was upstreamed so we will need to update the VFSForGit code or we will have failures since the projection will not be getting updated.

I have an idea. How about special-casing that hook for now, and look for .git/hooks/post-index-changed if ...-change was not found? That should work around things in the short run.

It might actually be necessary in the long run, too, unless we have a good upgrade story for already-cloned worktrees, to change the ...-changed hook into a ...-change one?

@dscho
Copy link
Member Author

dscho commented May 28, 2019

I have an idea. How about special-casing that hook for now, and look for .git/hooks/post-index-changed if ...-change was not found?

I made it so: 5b3b769e655a1557b54526c4793e80fb199ee804

@derrickstolee I also rebased after merging your #143, so the old midx-expire patches are gone.

@dscho
Copy link
Member Author

dscho commented May 28, 2019

(@derrickstolee oh, and I put back the documentation for the post-index-change hook, thank you so much for your thorough review!)

@dscho dscho mentioned this pull request May 28, 2019
@dscho
Copy link
Member Author

dscho commented May 28, 2019

I have an idea. How about special-casing that hook for now, and look for .git/hooks/post-index-changed if ...-change was not found?

I made it so: 5b3b769

Hmm. The test case that I introduced may be flaky. I'll need to look at this again tomorrow.

@derrickstolee
Copy link
Collaborator

The failure in the "Windows - Extra" build is due to how I updated the multi-pack-index to add packs to the packed_git list. See microsoft/VFSForGit#1213 for full details. It does not require any Git fix.

jeffhostetler and others added 20 commits June 8, 2019 19:37
It took this developer quite a good while to understand why the current
code cannot get a `NULL` returned by `index_file_exists()`. To
un-confuse readers (and future-proof the code), let's just be safe and
check before we dereference the returned pointer.
Includes gitster / jh/trace2-sid-fix
The idea of the virtual file system really is to tell Git to avoid
accessing certain paths. This fixes the case where a given path is not
yet included in the virtual file system and we are about to write a
conflicted version of it.
…iment

Gvfs trace2 checkout and reset experiment
The repack builtin deletes redundant pack-files and their
associated .idx, .promisor, .bitmap, and .keep files. We will want
to re-use this logic in the future for other types of repack, so
pull the logic into 'unlink_pack_path()' in packfile.c.

The 'ignore_keep' parameter is enabled for the use in repack, but
will be important for a future caller.

Signed-off-by: Derrick Stolee <[email protected]>
We will add new subcommands to the multi-pack-index, and that will
make the documentation a bit messier. Clean up the 'verb'
descriptions by renaming the concept to 'subcommand' and removing
the reference to the object directory.

Helped-by: Stefan Beller <[email protected]>
Helped-by: Szeder Gábor <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
The multi-pack-index tracks objects in a collection of pack-files.
Only one copy of each object is indexed, using the modified time
of the pack-files to determine tie-breakers. It is possible to
have a pack-file with no referenced objects because all objects
have a duplicate in a newer pack-file.

Introduce a new 'expire' subcommand to the multi-pack-index builtin.
This subcommand will delete these unused pack-files and rewrite the
multi-pack-index to no longer refer to those files. More details
about the specifics will follow as the method is implemented.

Add a test that verifies the 'expire' subcommand is correctly wired,
but will still be valid when the verb is implemented. Specifically,
create a set of packs that should all have referenced objects and
should not be removed during an 'expire' operation. The packs are
created carefully to ensure they have a specific order when sorted
by size. This will be important in a later test.

Signed-off-by: Derrick Stolee <[email protected]>
Before writing the multi-pack-index, we compute the length of the
pack-index names concatenated together. This forms the data in the
pack name chunk, and we precompute it to compute chunk offsets.
The value is also modified to fit alignment needs.

Previously, this computation was coupled with adding packs from
the existing multi-pack-index and the remaining packs in the object
dir not already covered by the multi-pack-index.

In anticipation of this becoming more complicated with the 'expire'
subcommand, simplify the computation by centralizing it to a single
loop before writing the file.

Signed-off-by: Derrick Stolee <[email protected]>
In anticipation of the expire subcommand, refactor the way we sort
the packfiles by name. This will greatly simplify our approach to
dropping expired packs from the list.

First, create 'struct pack_info' to replace 'struct pack_pair'.
This struct contains the necessary information about a pack,
including its name, a pointer to its packfile struct (if not
already in the multi-pack-index), and the original pack-int-id.

Second, track the pack information using an array of pack_info
structs in the pack_list struct. This simplifies the logic around
the multiple arrays we were tracking in that struct.

Finally, update get_sorted_entries() to not permute the pack-int-id
and instead supply the permutation to write_midx_object_offsets().
This requires sorting the packs after get_sorted_entries().

Signed-off-by: Derrick Stolee <[email protected]>
The 'git multi-pack-index expire' subcommand looks at the existing
mult-pack-index, counts the number of objects referenced in each
pack-file, deletes the pack-fils with no referenced objects, and
rewrites the multi-pack-index to no longer reference those packs.

Refactor the write_midx_file() method to call write_midx_internal()
which now takes an existing 'struct multi_pack_index' and a list
of pack-files to drop (as specified by the names of their pack-
indexes). As we write the new multi-pack-index, we drop those
file names from the list of known pack-files.

The expire_midx_packs() method removes the unreferenced pack-files
after carefully closing the packs to avoid open handles.

Test that a new pack-file that covers the contents of two other
pack-files leads to those pack-files being deleted during the
expire subcommand. Be sure to read the multi-pack-index to ensure
it no longer references those packs.

Signed-off-by: Derrick Stolee <[email protected]>
In an environment where the multi-pack-index is useful, it is due
to many pack-files and an inability to repack the object store
into a single pack-file. However, it is likely that many of these
pack-files are rather small, and could be repacked into a slightly
larger pack-file without too much effort. It may also be important
to ensure the object store is highly available and the repack
operation does not interrupt concurrent git commands.

Introduce a 'repack' subcommand to 'git multi-pack-index' that
takes a '--batch-size' option. The subcommand will inspect the
multi-pack-index for referenced pack-files whose size is smaller
than the batch size, until collecting a list of pack-files whose
sizes sum to larger than the batch size. Then, a new pack-file
will be created containing the objects from those pack-files that
are referenced by the multi-pack-index. The resulting pack is
likely to actually be smaller than the batch size due to
compression and the fact that there may be objects in the pack-
files that have duplicate copies in other pack-files.

The current change introduces the command-line arguments, and we
add a test that ensures we parse these options properly. Since
we specify a small batch size, we will guarantee that future
implementations do not change the list of pack-files.

In addition, we hard-code the modified times of the packs in
the pack directory to ensure the list of packs sorted by modified
time matches the order if sorted by size (ascending). This will
be important in a future test.

Signed-off-by: Derrick Stolee <[email protected]>
To repack with a non-zero batch-size, first sort all pack-files by
their modified time. Second, walk those pack-files from oldest
to newest, compute their expected size, and add the packs to a list
if they are smaller than the given batch-size. Stop when the total
expected size is at least the batch size.

If the batch size is zero, select all packs in the multi-pack-index.

Finally, collect the objects from the multi-pack-index that are in
the selected packs and send them to 'git pack-objects'. Write a new
multi-pack-index that includes the new pack.

Using a batch size of zero is very similar to a standard 'git repack'
command, except that we do not delete the old packs and instead rely
on the new multi-pack-index to prevent new processes from reading the
old packs. This does not disrupt other Git processes that are currently
reading the old packs based on the old multi-pack-index.

While first designing a 'git multi-pack-index repack' operation, I
started by collecting the batches based on the actual size of the
objects instead of the size of the pack-files. This allows repacking
a large pack-file that has very few referencd objects. However, this
came at a significant cost of parsing pack-files instead of simply
reading the multi-pack-index and getting the file information for
the pack-files. The "expected size" version provides similar
behavior, but could skip a pack-file if the average object size is
much larger than the actual size of the referenced objects, or
can create a large pack if the actual size of the referenced objects
is larger than the expected size.

Signed-off-by: Derrick Stolee <[email protected]>
During development of the multi-pack-index expire subcommand, a
version went out that improperly computed the pack order if a new
pack was introduced while other packs were being removed. Part of
the subtlety of the bug involved the new pack being placed before
other packs that already existed in the multi-pack-index.

Add a test to t5319-multi-pack-index.sh that catches this issue.
The test adds new packs that cause another pack to be expired, and
creates new packs that are lexicographically sorted before and
after the existing packs.

Signed-off-by: Derrick Stolee <[email protected]>
The 'git multi-pack-index expire' subcommand may delete packs that
are not needed from the perspective of the multi-pack-index. If
a pack has a .keep file, then we should not delete that pack. Add
a test that ensures we preserve a pack that would otherwise be
expired. First, create a new pack that contains every object in
the repo, then add it to the multi-pack-index. Then create a .keep
file for a pack starting with "a-pack" that was added in the
previous test. Finally, expire and verify that the pack remains
and the other packs were expired.

Signed-off-by: Derrick Stolee <[email protected]>
The major work on merge-recursive.c that went into v2.21.0 necessitates
an adjustment to work well with VFSforGit's virtualfilesystem hook.
The 'git multi-pack-index repack' command can take a batch size of
zero, which creates a new pack-file containing all objects in the
multi-pack-index. The first 'repack' command will create one new
pack-file, and an 'expire' command after that will delete the old
pack-files, as they no longer contain any referenced objects in the
multi-pack-index.

We must remove the .keep file that was added in the previous test
in order to expire that pack-file.

Also test that a 'repack' will do nothing if there is only one
pack-file.

Signed-off-by: Derrick Stolee <[email protected]>
Replace `git multi-pack-index expire/repack` with upstream v6
When our patches to support that hook were upstreamed, the hook's name
was eliciting some reviewer suggestions, and it was renamed to
`post-index-change`. These patches (with the new name) made it into
v2.22.0.

However, VFSforGit users may very well have checkouts with that hook
installed under the original name.

To support this, let's just introduce a hack where we look a bit more
closely when we just failed to find the `post-index-change` hook, and
allow any `post-indexchanged` hook to run instead (if it exists).
@dscho
Copy link
Member Author

dscho commented Jun 8, 2019

Okay, y'all, I rebased onto Git for Windows v2.22.0 and force-pushed. This is it. Hopefully the Azure Pipeline succeeds without any help from my side, so that I can spend a couple of mostly offline days...

@dscho
Copy link
Member Author

dscho commented Jun 9, 2019

Hopefully the Azure Pipeline succeeds without any help from my side

Almost ;-) I had to retry once because of two flaky test cases that fail sometimes (and which are on the backlog to be investigated).

@derrickstolee
Copy link
Collaborator

Thanks so much, @dscho! I'm going to build a PR build installer and run the VFS for Git tests one more time before updating vfs-2.22.0 to the current prepare-for-vfs-2.22.0 branch.

@dscho dscho deleted the prepare-for-vfs-2.22.0 branch June 10, 2019 16:28
derrickstolee added a commit to microsoft/VFSForGit that referenced this pull request Jun 12, 2019
Updates to v2.22.0.vfs.1 as discussed in  microsoft/git#140.

Includes these changes in microsoft/git:

* microsoft/git#133
* microsoft/git#139
* microsoft/git#141
* microsoft/git#143

Also, the `post-indexchanged` hook was renamed to `post-index-change` as it was upstreamed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.