-
-
Notifications
You must be signed in to change notification settings - Fork 637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow arbitrary directories to be Snapshotted #5801
Allow arbitrary directories to be Snapshotted #5801
Conversation
As discussed in slack, I think option (2) is the way to go, via a native method like: |
e3a93d4
to
e8c3279
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/rust/engine/fs/src/lib.rs
Outdated
Ok(PosixFS { | ||
root: Self::canonicalize(root)?, | ||
pool: self.pool.clone(), | ||
ignore: self.ignore.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ignore
patterns make this a bit challenging I think... the patterns used here are relative to the root of the repository/buildroot, and so we construct the Gitignore
instance with an empty root. But if you were going to be capturing from a subdirectory of the repository and you still wanted to apply the same patterns, you'd need to pass in the portion of the subdirectory relative to the buildroot (afaik).
This is to account for patterns like:
$ cat .gitignore
/please/ignore/this/long/specific/file.sh
I think that not applying ignore patterns to this type of capture would be reasonable, and would probably be fine for now. But if we're going to apply them, I think we'd need to re-construct them as relative to the buildroot (if the capture was under the buildroot), or ignore the patterns if the capture was outside the buildroot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still an issue I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Done
src/python/pants/engine/native.py
Outdated
@@ -216,6 +216,8 @@ | |||
PyResult execution_add_root_select(Scheduler*, ExecutionRequest*, Key, TypeConstraint); | |||
RawNodes* execution_execute(Scheduler*, ExecutionRequest*); | |||
|
|||
PyResult capture_snapshot(Scheduler*, char*, Value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, for synchronous methods with concurrent/async implementations, I think it's a good idea to expose batch interfaces by default, as that sends the right signal about usage (that a caller should attempt to batch their calls wherever possible). Would recommend making this a batch method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the desire, but will push back a little for now for two reasons:
- I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.
- The overhead of taking a list of things, which I think to do nicely would be: Create a new intrinsic type for
PathGlobsAndRoot
, pass it through everywhere, take a pointer to them with separate length, project out each thing from the tuple, seems like overkill. Happy to do this if we end up actually needing it, but again, if we find ourselves in that place we should probably design an actual API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.
It will be used until "all" of pants is ported to the v2 engine in order to transition back and forth between local files and snapshots. And we immediately have a need for it to be concurrent I think, in order to capture ~500 jars into 500 snapshots on every jvm_compile
startup. See #5762 (comment)
The overhead of taking a list of things, which I think to do nicely would be: Create a new intrinsic type for PathGlobsAndRoot, pass it through everywhere, take a pointer to them with separate length, project out each thing from the tuple, seems like overkill.
I don't think you need a new intrinsic type, although wrapper datatypes
would be useful. It would look like:
project_multi
for a field that contains a list of PathGlobsAndRoots- project str for the root
- use existing PathGlobs listing code for the rest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before jumping into discussion here: This is now done.
I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.
It will be used until "all" of pants is ported to the v2 engine in order to transition back and forth between local files and snapshots.
I was expecting this to be a stop-gap until BinUtils
is migrated into v2, and somehow exposed to v1 tasks...
And we immediately have a need for it to be concurrent I think, in order to capture ~500 jars into 500 snapshots on every jvm_compile startup. See #5762 (comment)
I think I maybe don't understand #5762 then... What are those 500 jars? My understanding is that we already create snapshots for targets when we parse BUILD files for them (that's why ./pants list ::
fills up ~/.cache/pants/lmdb_store
), but that we throw away the references to the Snapshots when we create the v1 Target type. I thought that #5762 was basically a "don't throw away those references" task, plus some "clean up some legacy stuff which means we have some convoluted paths" task, rather than a "do all of those snapshots twice" kind of task...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting this to be a stop-gap until BinUtils is migrated into v2, and somehow exposed to v1 tasks...
There are lots of existing tasks that download things that will need to be ported/bootstrapped into the new model in order to be remoted... maybe this is the last integration that we will do starting from "mid graph", and all other task porting will be bottom up (and thus run resolvers in the datacenter with network access...?), but I have been assuming that ivy/coursier/npm/yarn/go/etc would need bootstrapping into Snapshots for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I maybe don't understand #5762 then... What are those 500 jars?
The jars fetched by ivy and coursier: the 3rdparty dependencies. The ivy/coursier tasks run during the resolve
goal, rather than during build file parsing.
src/python/pants/engine/scheduler.py
Outdated
@@ -286,6 +286,14 @@ def run_and_return_roots(self, execution_request): | |||
self._native.lib.nodes_destroy(raw_roots) | |||
return roots | |||
|
|||
def capture_snapshot(self, root_path, path_globs): | |||
result = self._native.lib.capture_snapshot(self._scheduler, root_path, self._to_value(path_globs)) | |||
value = self._from_value(result.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth lifting out a little _from_pyresult
helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/rust/engine/src/lib.rs
Outdated
}) | ||
} | ||
|
||
fn capture_snapshot_inner( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping this file slimmer and making this an impl Scheduler
method would be preferable I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Slightly ugly because I can't use ?
s in a function returning PyResult
, but probably a reasonable separation
e8c3279
to
874350f
Compare
(rebased this to get a useful travis run) |
We don't cache anything intermediate here, we just dump the Snapshot into the Store and return it. This is intended to be a short-term transition path to allow BinaryUtils to be consumed in the v2 engine.
874350f
to
f7ba084
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
We don't cache anything intermediate here, we just dump the Snapshot
into the Store and return it.
This is intended to be a short-term transition path to allow BinaryUtils
to be consumed in the v2 engine.