-
Notifications
You must be signed in to change notification settings - Fork 697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New infrastructure: tracking changes in files and values #3102
Conversation
In |
This PR should take careful review because I'm pretty sure there are bugs in the current implementation. Things to look out for (though maybe these are later patchsets): #3049 If these are difficult to debug, that just means we don't have the necessary logging infrastructure to help us diagnose these problems. |
Looks good on a first glance, but needs tests and Haddock comments for top-level definitions. Will do a more detailed review tomorrow. |
And the |
Re @ezyang's concerns: We do have a test suite for the file status cache. I can clean it up and include it.
@23Skidoo I'm happy to do these things before or after merging the nix-local-build branch. It's simply a question of when you want to do the 1.24 release as I'd like it in master and included in the release (not in the default code paths of course, just to give us wider beta testing and a sense of momentum). Will fix the -Werror thing. |
OK, that analysis sounds fine. Dry run pass sounds great and should help a lot; if it is difficult to use the file status cache correctly we should figure out how to make it less likely for the clients (us) to make mistakes. |
@23Skidoo Thanks for making the tracking ticket, very useful. |
Design note as I was reading the patch: why is this specialized for tracking changes in files and values, and not other things? For example, in Shake, the cache is generalized to just support tracking if any sort of value changed (a file being only one particular type of file.) |
That's all we've needed. That said, we have a number of cases for files (mtime, mtime+chash, non-existing, globbed), but it's not been a big modularity problem to cover those cases. Shake is a library so it has to make its system extensible in a modular way. Don't get me wrong, this isn't my ideal incremental rebuild framework, but my ideal one would be a lot harder to integrate with existing IO code (and more work to do in the first place). |
@23Skidoo Just checking: should I be waiting on this, or did you mean in your subsequent comments (with the tracking ticket etc) that we should merge? I still plan to add the tests and top haddock comments etc. |
-- | These two are to deal with the situation where we've been asked | ||
-- to monitor a file that's expected to exist, but when we come to | ||
-- check it's status, it no longer exists. | ||
| MonitorStateFileChanged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshedding: maybe Gone
instead of Changed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye, Gone
or even AlreadyGone
to emphasise the case it's there to cover.
LGTM modulo minor comments, let's merge. Sorry for the delay. |
What do you think about extending the globbing code to support
BTW, you can use |
So I think All that said, if Cabal starts supporting recursive file globs for data files then we may be forced to extend the file status cache to support that too, since the intention is to translate .cabal files into file monitors in a precise way, including search paths and data file globs. The nix-local-build branch also supports globs in cabal.project files, and there I think it's a bad idea to allow recursive globs since the performance is likely to be bad. For data files at least people arrange things into sensible subtrees if they're using recursive globs.
Yeah, I'll use the code from the Cabal lib directly, and/or extend it where needed for the globs for the file status cache. One thing that's different for file status checking vs simply "what files are in the package", is that we need to know what files we looked at to establish what files were in the package. For example if we probed for Foo.hs at a/Foo.hs and b/Foo.hs before finding it at c/Foo.hs then in the file status cache we need to record the non-existence of a/Foo.hs and b/Foo.hs. So this is why I've not implemented that bit yet, it's slightly more work. At the moment I cheat and include everything (leading to the problems in #3019). |
Thanks for the detailed review btw, much appreciated. |
Ok, I think that's all the comments addressed or tracked in #3104 so I'll smash the patches together and merge. I'll add the D.C.FileStatusCache tests next. |
Just a little note here: Shake does this in a pretty simple way which may be applicable for automated tracking too. (In Shake, it's semi-manual, but you might imagine how to automate it.) The idea is simply to track the list of files as a separate artifact (in Shake, a file) and then having everything else depend on that specific artifact. EDIT: And the fact that Shake tracks the contents of the |
CC @ndmitchell |
Since @ndmitchell has been summoned, I should say: It's been a while since I used Shake. I'm not sure whether that's good or bad :). |
Btw, @dcoutts is there a write-up somewhere of where all this nix-local-build stuff is leading? (I'm thinking high-level description). |
@BardurArantsson well I've talked about the general idea on various occasions, ICFP and blog posts. What are you after exactly? |
Thanks, yes, I think I actually may have read that, but I'm not quite sure what the "end game" is? Is it really just the ability to install any set of packages in a single package DB? I guess I'm just confused at some of the PRs which seemed somewhat obliquely related to the overall goal. It's probably just confusion on my part. (Don't get me wrong, I'm definitely not complaining -- it'll be great to be able to finally get of the awkwardness of sandboxes!) |
@BardurArantsson so all this file status stuff is so that we can make the interface be more like "make", where it will just do all the things (solving, installing deps, reconfiguring, rebuilding etc) but only do the ones that are needed based on what's changed. That's in contrast to the current cabal ui where you have to manually do various steps, and some of them are not quick to re-do when nothing has changed. You can try this all out in the nix-local-build branch. |
Aha! Got you! Turns out that, yes, I was confused! :) Thanks! |
@23Skidoo when updating the tests I ran into the file timestamp granularity problem with older directory/unix versions and so looked at using our D.C.Compat.Time which is apparently for that purpose a8609ae But am I going mad or does that not help at all, it's returning EpochTime which is an Int64 that's explicitly just seconds. So no sub-second resolution. What's going on with that? Of course the tests then always fail with that version so I've still got it using the System.Directory version (and adapted the tests to cope with the resolution problem). |
@dcoutts Yes, it looks like we should be using |
Re-export getInstalledPackagesMonitorFiles from Cabal lib and add getSourcePackagesMonitorFiles locally to D.C.IndexUtils. These are for tracking changes to these bits of the environment, so that it's possible for us to recompute things that depend on them.
A FileMonitor is an abstraction for monitoring the status of files, as well as changes in an in-memory value. Files can be tracked by file modification time, or mod time plus content. We can also track files that are expected not to exist (to help implement search paths). We can also have file globs. The abstraction is useful for re-running actions when input files or values change. This pattern is captured by the Rebuild monad. This adds a dependency on the hashable package (used by unordered-containers). If this is a problem we can extract just the hash function we need. This is not used yet, so there's a temporary import just to make sure it gets compiled.
New infrastructure: tracking changes in files and values
Done. |
Thanks! |
Two related patches:
The latter is fundamental to the approach we take in the nix-local-build branch. We want to run things automatically (run solver, install deps, configure, build etc) but we only want to run them when needed. This means we need to re-run actions when things those actions depend on change. What can they depend on? Files and values. So that's exactly what a file status cache monitors.