-
-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize glob matching for the DigestTrie subset case (and others) #14890
Comments
stuhood
added a commit
that referenced
this issue
Mar 23, 2022
… by subsetting (#14889) #14858 reported that an invalid `Snapshot` was being created, and reproducing in debug mode triggered `debug_assertions` related to the validity of `remexec::Directory`s created by the subset matching code. Although porting the existing subset-matching code to `DigestTrie` _and_ fixing the bug would be one option, we've wanted to remove duplication between the "apply globs to a `Snapshot`" and "apply globs to the filesystem" codepaths for a long time (see #9967), and fixing the bug by unifying the codepaths kills two birds with one stone. This change ports to implementing subset matching using `Vfs::expand_globs`, followed by the creation of a new `Snapshot` from the matches. This is definitely not as optimized as the direct subset matching was (`./cargo bench -p store -- subset` reports a change from ~20ms to ~100ms for 10k files), but it opens the door to unified optimization of _both_ our glob-expansion code and in-memory glob matching in parallel: see #14890. Fixes #9967, fixes #14858, fixes #12462, and fixes #13112.
stuhood
added a commit
to stuhood/pants
that referenced
this issue
Mar 23, 2022
… by subsetting (pantsbuild#14889) pantsbuild#14858 reported that an invalid `Snapshot` was being created, and reproducing in debug mode triggered `debug_assertions` related to the validity of `remexec::Directory`s created by the subset matching code. Although porting the existing subset-matching code to `DigestTrie` _and_ fixing the bug would be one option, we've wanted to remove duplication between the "apply globs to a `Snapshot`" and "apply globs to the filesystem" codepaths for a long time (see pantsbuild#9967), and fixing the bug by unifying the codepaths kills two birds with one stone. This change ports to implementing subset matching using `Vfs::expand_globs`, followed by the creation of a new `Snapshot` from the matches. This is definitely not as optimized as the direct subset matching was (`./cargo bench -p store -- subset` reports a change from ~20ms to ~100ms for 10k files), but it opens the door to unified optimization of _both_ our glob-expansion code and in-memory glob matching in parallel: see pantsbuild#14890. Fixes pantsbuild#9967, fixes pantsbuild#14858, fixes pantsbuild#12462, and fixes pantsbuild#13112.
stuhood
added a commit
that referenced
this issue
Mar 24, 2022
… by subsetting (cherrypick of #14889) (#14896) #14858 reported that an invalid `Snapshot` was being created, and reproducing in debug mode triggered `debug_assertions` related to the validity of `remexec::Directory`s created by the subset matching code. Although porting the existing subset-matching code to `DigestTrie` _and_ fixing the bug would be one option, we've wanted to remove duplication between the "apply globs to a `Snapshot`" and "apply globs to the filesystem" codepaths for a long time (see #9967), and fixing the bug by unifying the codepaths kills two birds with one stone. This change ports to implementing subset matching using `Vfs::expand_globs`, followed by the creation of a new `Snapshot` from the matches. This is definitely not as optimized as the direct subset matching was (`./cargo bench -p store -- subset` reports a change from ~20ms to ~100ms for 10k files), but it opens the door to unified optimization of _both_ our glob-expansion code and in-memory glob matching in parallel: see #14890. Fixes #9967, fixes #14858, fixes #12462, and fixes #13112. [ci skip-build-wheels]
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
In #14889, we removed an optimization from glob matching of
Snapshot
s: the ability to "match everything in a directory recursively" if we encounter a recursive wildcard which is guaranteed to match everything.We should reintroduce this optimization (likely as an optional method of
Vfs
which can return "allStat
s recursively under a directory"), but we should also see what other optimizations can be made to theVfs
interface and glob matching to improve performance. For example:Vfs::scandir
as we recurse in glob matching (whichimpl Vfs for DigestTrie
could use to maintain the currentdirectory::Entry
, rather than repeatedly recursing down to it viaDigestTrie::entry
).DigestTrie::merge
, rather than recursing independently for eachPathGlob
.The text was updated successfully, but these errors were encountered: