Allow arbitrary directories to be Snapshotted #5801

illicitonion · 2018-05-10T16:46:17Z

We don't cache anything intermediate here, we just dump the Snapshot
into the Store and return it.

This is intended to be a short-term transition path to allow BinaryUtils
to be consumed in the v2 engine.

stuhood · 2018-05-10T17:42:23Z

As discussed in slack, I think option (2) is the way to go, via a native method like: PyResult scheduler_capture_snapshots(Scheduler*, char*, Value*, uint64_t), which would take a base path and a list of Values wrapping PathGlobs, and return a value wrapping a list of Snapshots.

stuhood

Thanks!

stuhood · 2018-05-13T17:48:11Z

src/rust/engine/fs/src/lib.rs

+    Ok(PosixFS {
+      root: Self::canonicalize(root)?,
+      pool: self.pool.clone(),
+      ignore: self.ignore.clone(),


The ignore patterns make this a bit challenging I think... the patterns used here are relative to the root of the repository/buildroot, and so we construct the Gitignore instance with an empty root. But if you were going to be capturing from a subdirectory of the repository and you still wanted to apply the same patterns, you'd need to pass in the portion of the subdirectory relative to the buildroot (afaik).

This is to account for patterns like:

$ cat .gitignore /please/ignore/this/long/specific/file.sh

I think that not applying ignore patterns to this type of capture would be reasonable, and would probably be fine for now. But if we're going to apply them, I think we'd need to re-construct them as relative to the buildroot (if the capture was under the buildroot), or ignore the patterns if the capture was outside the buildroot.

This is still an issue I think.

stuhood · 2018-05-13T17:50:43Z

src/python/pants/engine/native.py

@@ -216,6 +216,8 @@
 PyResult  execution_add_root_select(Scheduler*, ExecutionRequest*, Key, TypeConstraint);
 RawNodes* execution_execute(Scheduler*, ExecutionRequest*);

+PyResult capture_snapshot(Scheduler*, char*, Value);


In general, for synchronous methods with concurrent/async implementations, I think it's a good idea to expose batch interfaces by default, as that sends the right signal about usage (that a caller should attempt to batch their calls wherever possible). Would recommend making this a batch method.

I understand the desire, but will push back a little for now for two reasons:

I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.

The overhead of taking a list of things, which I think to do nicely would be: Create a new intrinsic type for PathGlobsAndRoot, pass it through everywhere, take a pointer to them with separate length, project out each thing from the tuple, seems like overkill. Happy to do this if we end up actually needing it, but again, if we find ourselves in that place we should probably design an actual API.

I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.

It will be used until "all" of pants is ported to the v2 engine in order to transition back and forth between local files and snapshots. And we immediately have a need for it to be concurrent I think, in order to capture ~500 jars into 500 snapshots on every jvm_compile startup. See #5762 (comment)

The overhead of taking a list of things, which I think to do nicely would be: Create a new intrinsic type for PathGlobsAndRoot, pass it through everywhere, take a pointer to them with separate length, project out each thing from the tuple, seems like overkill.

I don't think you need a new intrinsic type, although wrapper datatypes would be useful. It would look like:

project_multi for a field that contains a list of PathGlobsAndRoots

project str for the root

use existing PathGlobs listing code for the rest

Before jumping into discussion here: This is now done.

I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.

It will be used until "all" of pants is ported to the v2 engine in order to transition back and forth between local files and snapshots.

I was expecting this to be a stop-gap until BinUtils is migrated into v2, and somehow exposed to v1 tasks...

And we immediately have a need for it to be concurrent I think, in order to capture ~500 jars into 500 snapshots on every jvm_compile startup. See #5762 (comment)

I think I maybe don't understand #5762 then... What are those 500 jars? My understanding is that we already create snapshots for targets when we parse BUILD files for them (that's why ./pants list :: fills up ~/.cache/pants/lmdb_store), but that we throw away the references to the Snapshots when we create the v1 Target type. I thought that #5762 was basically a "don't throw away those references" task, plus some "clean up some legacy stuff which means we have some convoluted paths" task, rather than a "do all of those snapshots twice" kind of task...?

I was expecting this to be a stop-gap until BinUtils is migrated into v2, and somehow exposed to v1 tasks...

There are lots of existing tasks that download things that will need to be ported/bootstrapped into the new model in order to be remoted... maybe this is the last integration that we will do starting from "mid graph", and all other task porting will be bottom up (and thus run resolvers in the datacenter with network access...?), but I have been assuming that ivy/coursier/npm/yarn/go/etc would need bootstrapping into Snapshots for now.

I think I maybe don't understand #5762 then... What are those 500 jars?

The jars fetched by ivy and coursier: the 3rdparty dependencies. The ivy/coursier tasks run during the resolve goal, rather than during build file parsing.

stuhood · 2018-05-13T17:51:13Z

src/python/pants/engine/scheduler.py

@@ -286,6 +286,14 @@ def run_and_return_roots(self, execution_request):
      self._native.lib.nodes_destroy(raw_roots)
    return roots

+  def capture_snapshot(self, root_path, path_globs):
+    result = self._native.lib.capture_snapshot(self._scheduler, root_path, self._to_value(path_globs))
+    value = self._from_value(result.value)


Worth lifting out a little _from_pyresult helper?

stuhood · 2018-05-13T17:58:36Z

src/rust/engine/src/lib.rs

+  })
+}
+
+fn capture_snapshot_inner(


Keeping this file slimmer and making this an impl Scheduler method would be preferable I think.

Done. Slightly ugly because I can't use ?s in a function returning PyResult, but probably a reasonable separation

stuhood · 2018-05-13T18:18:56Z

(rebased this to get a useful travis run)

We don't cache anything intermediate here, we just dump the Snapshot into the Store and return it. This is intended to be a short-term transition path to allow BinaryUtils to be consumed in the v2 engine.

stuhood

Thanks!

illicitonion requested a review from stuhood May 10, 2018 16:46

illicitonion force-pushed the dwagnerhall/binutil/snapshotoutsideroot branch from e3a93d4 to e8c3279 Compare May 11, 2018 10:18

stuhood force-pushed the master branch from b6bb42d to 9e2fdb5 Compare May 11, 2018 23:54

stuhood reviewed May 13, 2018

View reviewed changes

stuhood force-pushed the dwagnerhall/binutil/snapshotoutsideroot branch from e8c3279 to 874350f Compare May 13, 2018 18:18

illicitonion added 2 commits May 16, 2018 13:05

Allow arbitrary directories to be Snapshotted

af27e5e

We don't cache anything intermediate here, we just dump the Snapshot into the Store and return it. This is intended to be a short-term transition path to allow BinaryUtils to be consumed in the v2 engine.

Review comments

f7ba084

illicitonion force-pushed the dwagnerhall/binutil/snapshotoutsideroot branch from 874350f to f7ba084 Compare May 16, 2018 12:05

illicitonion added 2 commits May 17, 2018 14:28

Review comments

fb9d774

Add pydoc

fa72578

stuhood approved these changes May 17, 2018

View reviewed changes

stuhood merged commit 777e500 into pantsbuild:master May 17, 2018

illicitonion mentioned this pull request Jun 22, 2018

Expose materialize_directory as an intrinsic function on scheduler #6007

Closed

illicitonion deleted the dwagnerhall/binutil/snapshotoutsideroot branch August 15, 2018 03:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow arbitrary directories to be Snapshotted #5801

Allow arbitrary directories to be Snapshotted #5801

illicitonion commented May 10, 2018 •

edited

Loading

stuhood commented May 10, 2018 •

edited

Loading

stuhood left a comment

stuhood May 13, 2018

stuhood May 16, 2018

illicitonion May 17, 2018

stuhood May 13, 2018

illicitonion May 16, 2018

stuhood May 16, 2018

illicitonion May 17, 2018

stuhood May 17, 2018

stuhood May 17, 2018 •

edited

Loading

stuhood May 13, 2018

illicitonion May 16, 2018

stuhood May 13, 2018 •

edited

Loading

illicitonion May 16, 2018

stuhood commented May 13, 2018

stuhood left a comment

Allow arbitrary directories to be Snapshotted #5801

Allow arbitrary directories to be Snapshotted #5801

Conversation

illicitonion commented May 10, 2018 • edited Loading

stuhood commented May 10, 2018 • edited Loading

stuhood left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood May 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood May 13, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood commented May 13, 2018

stuhood left a comment

Choose a reason for hiding this comment

illicitonion commented May 10, 2018 •

edited

Loading

stuhood commented May 10, 2018 •

edited

Loading

stuhood May 17, 2018 •

edited

Loading

stuhood May 13, 2018 •

edited

Loading