Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New infrastructure: tracking changes in files and values #3102

Merged
merged 3 commits into from
Feb 7, 2016

Conversation

dcoutts
Copy link
Contributor

@dcoutts dcoutts commented Feb 1, 2016

Two related patches:

  • get access to files we need to monitor for changes in source packages and installed packages
  • new abstractions for tracking changes to files

The latter is fundamental to the approach we take in the nix-local-build branch. We want to run things automatically (run solver, install deps, configure, build etc) but we only want to run them when needed. This means we need to re-run actions when things those actions depend on change. What can they depend on? Files and values. So that's exactly what a file status cache monitors.

@23Skidoo
Copy link
Member

23Skidoo commented Feb 1, 2016

In add-source code we do something similar when we check whether add-source dependencies need to be reinstalled.

@ezyang
Copy link
Contributor

ezyang commented Feb 1, 2016

This PR should take careful review because I'm pretty sure there are bugs in the current implementation. Things to look out for (though maybe these are later patchsets):

#3049
#3021
#3020
#3019 !!!
#3014
#2976

If these are difficult to debug, that just means we don't have the necessary logging infrastructure to help us diagnose these problems.

@23Skidoo
Copy link
Member

23Skidoo commented Feb 1, 2016

Looks good on a first glance, but needs tests and Haddock comments for top-level definitions. Will do a more detailed review tomorrow.

@23Skidoo
Copy link
Member

23Skidoo commented Feb 1, 2016

And the -Werror failure on Travis needs fixing.

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 1, 2016

Re @ezyang's concerns:

We do have a test suite for the file status cache. I can clean it up and include it.

@23Skidoo I'm happy to do these things before or after merging the nix-local-build branch. It's simply a question of when you want to do the 1.24 release as I'd like it in master and included in the release (not in the default code paths of course, just to give us wider beta testing and a sense of momentum).

Will fix the -Werror thing.

@ezyang
Copy link
Contributor

ezyang commented Feb 1, 2016

OK, that analysis sounds fine. Dry run pass sounds great and should help a lot; if it is difficult to use the file status cache correctly we should figure out how to make it less likely for the clients (us) to make mistakes.

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 1, 2016

we should figure out how to make it less likely for the clients (us) to make mistakes.

The Rebuild monad is one of these abstractions.

@23Skidoo 23Skidoo mentioned this pull request Feb 2, 2016
13 tasks
@23Skidoo
Copy link
Member

23Skidoo commented Feb 2, 2016

@dcoutts

I'm happy to do these things before or after merging the nix-local-build branch.

Sure, we can wait after the branch is merged. I've opened a ticket with a list of tasks to do post-merge: #3104.

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 2, 2016

@23Skidoo Thanks for making the tracking ticket, very useful.

@ezyang
Copy link
Contributor

ezyang commented Feb 2, 2016

Design note as I was reading the patch: why is this specialized for tracking changes in files and values, and not other things? For example, in Shake, the cache is generalized to just support tracking if any sort of value changed (a file being only one particular type of file.)

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 2, 2016

Design note as I was reading the patch: why is this specialized for tracking changes in files and values, and not other things?

That's all we've needed. That said, we have a number of cases for files (mtime, mtime+chash, non-existing, globbed), but it's not been a big modularity problem to cover those cases. Shake is a library so it has to make its system extensible in a modular way.

Don't get me wrong, this isn't my ideal incremental rebuild framework, but my ideal one would be a lot harder to integrate with existing IO code (and more work to do in the first place).

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 3, 2016

Will do a more detailed review tomorrow.

@23Skidoo Just checking: should I be waiting on this, or did you mean in your subsequent comments (with the tracking ticket etc) that we should merge? I still plan to add the tests and top haddock comments etc.

-- | These two are to deal with the situation where we've been asked
-- to monitor a file that's expected to exist, but when we come to
-- check it's status, it no longer exists.
| MonitorStateFileChanged
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding: maybe Gone instead of Changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye, Gone or even AlreadyGone to emphasise the case it's there to cover.

@23Skidoo
Copy link
Member

23Skidoo commented Feb 4, 2016

LGTM modulo minor comments, let's merge. Sorry for the delay.

@23Skidoo
Copy link
Member

23Skidoo commented Feb 4, 2016

@dcoutts

What do you think about extending the globbing code to support **/* patterns? Then we could move it to the Cabal library and reuse it for extra-source-files and data-files.

The current hack is to track all files in a package dir rather than just the package files implied by the .cabal file.

BTW, you can use cabal sdist --list-sources to just list the sources implied by the .cabal file. This is what add-source code does.

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 5, 2016

So I think ** for recursive globs are hard to do efficiently in the file status cache. I've no objection to the concept in general or the syntax or sharing one true globbing impl.

All that said, if Cabal starts supporting recursive file globs for data files then we may be forced to extend the file status cache to support that too, since the intention is to translate .cabal files into file monitors in a precise way, including search paths and data file globs.

The nix-local-build branch also supports globs in cabal.project files, and there I think it's a bad idea to allow recursive globs since the performance is likely to be bad. For data files at least people arrange things into sensible subtrees if they're using recursive globs.

BTW, you can use cabal sdist --list-sources to just list the sources implied by the .cabal file. This is what add-source code does.

Yeah, I'll use the code from the Cabal lib directly, and/or extend it where needed for the globs for the file status cache. One thing that's different for file status checking vs simply "what files are in the package", is that we need to know what files we looked at to establish what files were in the package. For example if we probed for Foo.hs at a/Foo.hs and b/Foo.hs before finding it at c/Foo.hs then in the file status cache we need to record the non-existence of a/Foo.hs and b/Foo.hs. So this is why I've not implemented that bit yet, it's slightly more work. At the moment I cheat and include everything (leading to the problems in #3019).

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 5, 2016

Thanks for the detailed review btw, much appreciated.

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 5, 2016

Ok, I think that's all the comments addressed or tracked in #3104 so I'll smash the patches together and merge. I'll add the D.C.FileStatusCache tests next.

@BardurArantsson
Copy link
Collaborator

Yeah, I'll use the code from the Cabal lib directly, and/or extend it where needed for the globs for the file status cache. One thing that's different for file status checking vs simply "what files are in the package", is that we need to know what files we looked at to establish what files were in the package. For example if we probed for Foo.hs at a/Foo.hs and b/Foo.hs before finding it at c/Foo.hs then in the file status cache we need to record the non-existence of a/Foo.hs and b/Foo.hs. So this is why I've not implemented that bit yet, it's slightly more work. At the moment I cheat and include everything (leading to the problems in #3019).

Just a little note here: Shake does this in a pretty simple way which may be applicable for automated tracking too. (In Shake, it's semi-manual, but you might imagine how to automate it.) The idea is simply to track the list of files as a separate artifact (in Shake, a file) and then having everything else depend on that specific artifact.

EDIT: And the fact that Shake tracks the contents of the .lst file means that you won't have spurious rebuilds "just because".

@ezyang
Copy link
Contributor

ezyang commented Feb 5, 2016

CC @ndmitchell

@BardurArantsson
Copy link
Collaborator

Since @ndmitchell has been summoned, I should say: It's been a while since I used Shake. I'm not sure whether that's good or bad :).

@BardurArantsson
Copy link
Collaborator

Btw, @dcoutts is there a write-up somewhere of where all this nix-local-build stuff is leading? (I'm thinking high-level description).

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 5, 2016

@BardurArantsson well I've talked about the general idea on various occasions, ICFP and blog posts.
http://www.well-typed.com/blog/2015/01/how-we-might-abolish-cabal-hell-part-2/

What are you after exactly?

@BardurArantsson
Copy link
Collaborator

Thanks, yes, I think I actually may have read that, but I'm not quite sure what the "end game" is? Is it really just the ability to install any set of packages in a single package DB? I guess I'm just confused at some of the PRs which seemed somewhat obliquely related to the overall goal. It's probably just confusion on my part. (Don't get me wrong, I'm definitely not complaining -- it'll be great to be able to finally get of the awkwardness of sandboxes!)

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 5, 2016

@BardurArantsson so all this file status stuff is so that we can make the interface be more like "make", where it will just do all the things (solving, installing deps, reconfiguring, rebuilding etc) but only do the ones that are needed based on what's changed. That's in contrast to the current cabal ui where you have to manually do various steps, and some of them are not quick to re-do when nothing has changed.

You can try this all out in the nix-local-build branch.

@BardurArantsson
Copy link
Collaborator

Aha! Got you! Turns out that, yes, I was confused! :)

Thanks!

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 7, 2016

@23Skidoo when updating the tests I ran into the file timestamp granularity problem with older directory/unix versions and so looked at using our D.C.Compat.Time which is apparently for that purpose a8609ae

But am I going mad or does that not help at all, it's returning EpochTime which is an Int64 that's explicitly just seconds. So no sub-second resolution. What's going on with that? Of course the tests then always fail with that version so I've still got it using the System.Directory version (and adapted the tests to cope with the resolution problem).

@23Skidoo
Copy link
Member

23Skidoo commented Feb 7, 2016

@dcoutts Yes, it looks like we should be using modificationTimeHiRes when it's available (Shake, from which I originally borrowed the code, now does that as well). OTOH, my recollection is that that change fixed resolution problems for me, maybe the result depends on the file system?

Re-export getInstalledPackagesMonitorFiles from Cabal lib and add
getSourcePackagesMonitorFiles locally to D.C.IndexUtils.

These are for tracking changes to these bits of the environment, so that
it's possible for us to recompute things that depend on them.
A FileMonitor is an abstraction for monitoring the status of files,
as well as changes in an in-memory value. Files can be tracked by file
modification time, or mod time plus content. We can also track files
that are expected not to exist (to help implement search paths). We can
also have file globs.

The abstraction is useful for re-running actions when input files or
values change. This pattern is captured by the Rebuild monad.

This adds a dependency on the hashable package (used by
unordered-containers). If this is a problem we can extract just the hash
function we need.

This is not used yet, so there's a temporary import just to make sure it
gets compiled.
dcoutts added a commit that referenced this pull request Feb 7, 2016
New infrastructure: tracking changes in files and values
@dcoutts dcoutts merged commit 8f77569 into haskell:master Feb 7, 2016
@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 7, 2016

@23Skidoo just noticed this interacts badly with the recent 7aab356 as this code uses the WIN32 cpp define. We should change it to use mingw32_HOST_OS.

@23Skidoo
Copy link
Member

23Skidoo commented Feb 7, 2016

@dcoutts

We should change it to use mingw32_HOST_OS.

Done.

@dcoutts
Copy link
Contributor Author

dcoutts commented Feb 7, 2016

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants