contenthash: improve the correctness of needsScan #5060
@tonistiigi Based on my testing, this fixes the issue you described as well as having a few other fixes that should improve other possible performance regressions (all mainly dealing with non-existent files). I can cook up a few tests that call |
pathSet is the simplest way of implementing the structure needed for the prefix checks. If you feel a more complex structure is needed, let me know
I guess another radix tree would work well for this, but assuming the prefixes
array is always expected to be small, the potentially inefficient lookup in includes
shouldn't matter for practical cases.
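For readers following along, the slice-based prefix check discussed above can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the type layout, the `add` method, and the assumption that all paths are cleaned, absolute, and slash-separated are made up for the example; only the names `pathSet` and `includes` come from the discussion.

```go
package main

import "strings"

// pathSet is a hypothetical stand-in for the structure discussed in this
// thread: a plain slice of directory prefixes that have already been scanned.
type pathSet struct {
	prefixes []string
}

// add records prefix as scanned. The sketch assumes cleaned, absolute,
// slash-separated paths such as "/" or "/foo/bar".
func (s *pathSet) add(prefix string) {
	s.prefixes = append(s.prefixes, prefix)
}

// includes reports whether path is equal to, or located beneath, any recorded
// prefix. The lookup is a linear walk over the slice, which is acceptable
// while the slice stays small.
func (s *pathSet) includes(path string) bool {
	for _, p := range s.prefixes {
		// "/" is a parent of every absolute path. For other prefixes,
		// require a path-component boundary so "/foo" does not match "/foobar".
		if p == "/" || path == p || strings.HasPrefix(path, p+"/") {
			return true
		}
	}
	return false
}
```

With this shape, once `/` has been added every absolute path is reported as included, which matches the behaviour described in the commit message for a scanned root.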
I can cook up a few tests that call needsScan and scanPath directly if those are the kind of tests you'd like to add?
That sgtm, but we could also add some private counter logic using private variables that can be turned on by the tests, so we can see how many times the scanning/walking happens for certain conditions and detect if some future change causes the more expensive scanning part to happen more often than expected.
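One possible shape for such a test-only counter is sketched below. All names here (`enableScanCounter`, `scanCounter`, `fakeScanPath`) are hypothetical and chosen for illustration; the real scanPath does actual filesystem walking, which is omitted.

```go
package main

import "sync/atomic"

// Hypothetical package-private instrumentation. Tests flip enableScanCounter
// on and then read scanCounter to see how often a scan actually ran.
var (
	enableScanCounter atomic.Bool
	scanCounter       atomic.Uint64
)

// fakeScanPath is a placeholder for the expensive scan; it only records that
// it was invoked. Counting is skipped unless a test has enabled it, so normal
// runs pay only for a single atomic load.
func fakeScanPath(path string) {
	if enableScanCounter.Load() {
		scanCounter.Add(1)
	}
	// The real directory walk would happen here.
}
```

A test would enable the counter, run the checksum operations under test, and then assert on the counter value to catch changes that make scanning happen more often than expected.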
Ah yeah,
Already working on the |
Commit f724d6f ("contenthash: implement proper Linux symlink semantics for needsScan") fixed issues with needsScan's handling of symlinks, but the logic used to figure out if a parent path is in the cache was incorrect in two ways:

1. The optimisations in getFollowSymlinksCallback to avoid looking up / or no-op components in the cache lead to the callback not being called when we go through / (both for the initial currentPath=/ state and for absolute symlinks). The upshot is that needsScan(/non-existent-path) would always return true because we didn't check if / has been scanned. There were also some issues with the wrong cache record being returned if you hit a symlink to only /. These optimisations only make sense if we are only returning a path (as in rootPath) and not anything else.

2. Because needsScan would only store the _last_ good path, cases with symlink jumps to non-existent paths within directories already scanned would result in a re-scan that isn't necessary. Fix this by saving a set of prefix paths we have seen. This change also means that we can simplify the logic in the needsScan callback.

Note that with (1) and (2) combined, if / has been scanned then needsScan will always return false now (because the / prefix is always checked against, and every path has it as a parent).

The pathSet structure is the "dumb" way of doing it. We could use a radix tree and LongestPrefix() to implement it in a smarter way, but in practice the list will be very small and is both short-lived and write-many-read-few, so go-immutable-radix's node copy cost probably makes the array better in practice anyway.

Fixes: f724d6f ("contenthash: implement proper Linux symlink semantics for needsScan")
Signed-off-by: Aleksa Sarai <[email protected]>
Most of these tests revolve around making sure that we are scanning the correct path during Checksum, and that we don't think a scan is required for subpaths we have already scanned. Signed-off-by: Aleksa Sarai <[email protected]>
While we now have tests ensuring that needsScan does not regress and indicate that a scan is necessary, it seems prudent to also include checks that scanPath is definitely not running on any new paths when we don't expect it. Suggested-by: Tonis Tiigi <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]>
Commit f724d6f ("contenthash: implement proper Linux symlink semantics for needsScan") fixed issues with needsScan's handling of symlinks, but the logic used to figure out if a parent path is in the cache was incorrect in a couple of ways:
Fixes: f724d6f ("contenthash: implement proper Linux symlink semantics for needsScan")
Fixes #5042
Signed-off-by: Aleksa Sarai [email protected]