Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port pantsd client to rust. #11831

Closed
jsirois opened this issue Apr 2, 2021 · 13 comments · Fixed by #18957
Closed

Port pantsd client to rust. #11831

jsirois opened this issue Apr 2, 2021 · 13 comments · Fixed by #18957
Assignees

Comments

@jsirois
Copy link
Contributor

jsirois commented Apr 2, 2021

This saves us ~500ms on my machine:

$ hyperfine -w 2 './pants --no-anonymous-telemetry-enabled -V' 'nails-client 0.0.0.0:$(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V'
Benchmark #1: ./pants --no-anonymous-telemetry-enabled -V
  Time (mean ± σ):     700.7 ms ±  11.5 ms    [User: 483.5 ms, System: 45.2 ms]
  Range (min … max):   688.7 ms … 722.2 ms    10 runs
 
Benchmark #2: nails-client 0.0.0.0:$(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V
  Time (mean ± σ):     178.1 ms ±   4.6 ms    [User: 2.8 ms, System: 1.5 ms]
  Range (min … max):   173.4 ms … 193.3 ms    16 runs
 
Summary
  'nails-client 0.0.0.0:$(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V' ran
    3.93 ± 0.12 times faster than './pants --no-anonymous-telemetry-enabled -V'
@jsirois
Copy link
Contributor Author

jsirois commented Apr 2, 2021

And the rust client is as snappy as we'll likely get since:

$ hyperfine -w 3 './pants --no-anonymous-telemetry-enabled -V' 'nails-client 127.0.0.1:$(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V' 'ng --nailgun-server 127.0.0.1 --nailgun-port $(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V'
Benchmark #1: ./pants --no-anonymous-telemetry-enabled -V
  Time (mean ± σ):     703.9 ms ±  14.8 ms    [User: 490.4 ms, System: 45.6 ms]
  Range (min … max):   688.6 ms … 739.4 ms    10 runs
 
Benchmark #2: nails-client 127.0.0.1:$(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V
  Time (mean ± σ):     181.9 ms ±   9.9 ms    [User: 2.6 ms, System: 1.9 ms]
  Range (min … max):   174.4 ms … 205.5 ms    16 runs
 
Benchmark #3: ng --nailgun-server 127.0.0.1 --nailgun-port $(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V
  Time (mean ± σ):     220.1 ms ±   4.0 ms    [User: 1.2 ms, System: 1.1 ms]
  Range (min … max):   215.8 ms … 227.0 ms    13 runs
 
Summary
  'nails-client 127.0.0.1:$(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V' ran
    1.21 ± 0.07 times faster than 'ng --nailgun-server 127.0.0.1 --nailgun-port $(cat .pids/1fd8a12b4fc1/pantsd/socket) does-not-matter --no-anonymous-telemetry-enabled -V'
    3.87 ± 0.22 times faster than './pants --no-anonymous-telemetry-enabled -V'

@jsirois jsirois self-assigned this Apr 2, 2021
jsirois added a commit to jsirois/pants that referenced this issue Apr 14, 2021
Currently the client only handles talking to a pantsd brought up by
other means. The Pants repo ./pants script is updated to optionally use
the native client and prop up pantsd using the python client as needed.

Work towards pantsbuild#11831
stuhood added a commit that referenced this issue Apr 16, 2021
As described in #11618, when `pantsd` intentionally exits due to low memory, a few types of work can be cut short:
1. if the run ends in Ctrl+C, processes that were cancelled may not have had time to be dropped before `pantsd exits.
2. async StreamingWorkunitHandler threads might still be running.

This change adds orderly-shutdown mechanisms to the `Scheduler`/`Core` to join all ongoing `Sessions` (including the SWH), and improves tests to ensure that the SWH is waited for.

Additionally, in the last commit, added purging of the `pantsd` metadata as soon as we decide to restart, which should reduce (but probably not eliminate) the incidence of item 1. from #11618. Work for #11831 will likely further harden this path.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Apr 16, 2021
As described in pantsbuild#11618, when `pantsd` intentionally exits due to low memory, a few types of work can be cut short:
1. if the run ends in Ctrl+C, processes that were cancelled may not have had time to be dropped before `pantsd exits.
2. async StreamingWorkunitHandler threads might still be running.

This change adds orderly-shutdown mechanisms to the `Scheduler`/`Core` to join all ongoing `Sessions` (including the SWH), and improves tests to ensure that the SWH is waited for.

Additionally, in the last commit, added purging of the `pantsd` metadata as soon as we decide to restart, which should reduce (but probably not eliminate) the incidence of item 1. from pantsbuild#11618. Work for pantsbuild#11831 will likely further harden this path.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Apr 16, 2021
…11934)

As described in #11618, when `pantsd` intentionally exits due to low memory, a few types of work can be cut short:
1. if the run ends in Ctrl+C, processes that were cancelled may not have had time to be dropped before `pantsd` exits.
2. async StreamingWorkunitHandler threads might still be running.

This change adds orderly-shutdown mechanisms to the `Scheduler`/`Core` to join all ongoing `Sessions` (including the SWH), and improves tests to ensure that the SWH is waited for.

Additionally, in the last commit, added purging of the `pantsd` metadata as soon as we decide to restart, which should reduce (but probably not eliminate) the incidence of item 1. from #11618. Work for #11831 will likely further harden this path.

[ci skip-build-wheels]
jsirois added a commit to jsirois/pants that referenced this issue Apr 21, 2021
Currently the client only handles talking to a pantsd brought up by
other means. The Pants repo ./pants script is updated to optionally use
the native client and prop up pantsd using the python client as needed.

Work towards pantsbuild#11831
jsirois added a commit to jsirois/pants that referenced this issue Apr 22, 2021
Currently the client only handles talking to a pantsd brought up by
other means. The Pants repo ./pants script is updated to optionally use
the native client and prop up pantsd using the python client as needed.

Work towards pantsbuild#11831

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
jsirois added a commit to jsirois/pants that referenced this issue Apr 22, 2021
Currently the client only handles talking to a pantsd brought up by
other means. The Pants repo ./pants script is updated to optionally use
the native client and prop up pantsd using the python client as needed.

Work towards pantsbuild#11831

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
jsirois added a commit to jsirois/pants that referenced this issue Jun 22, 2021
Currently the client only handles talking to a pantsd brought up by
other means. The Pants repo ./pants script is updated to optionally use
the native client and prop up pantsd using the python client as needed.

Work towards pantsbuild#11831

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
jsirois added a commit that referenced this issue Jul 6, 2021
* Introduce a native pants client.

Currently the client only handles talking to a pantsd brought up by
other means. The Pants repo ./pants script is updated to optionally use
the native client and prop up pantsd using the python client as needed.

Work towards #11831

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Debug macOS test failure.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Tom's feedback.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Fix parser precedence.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Build engine and client together.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Fixup flag parsing precedence / string list gathering.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Use nailgun crate to manage the pantsd ng connection.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Tighten up test CWD lock impl.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Break tests out into seperate module files.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Restructure BuildRoot to be testable.

Previously the reliance on CWD forced races across test modules which
would require a Mutex shared across those modules. That's awkward and
this arrangement provides for the important test coverage still.

[ci skip-build-wheels]

* Fix clippy lints.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Add best effort tty settings save & restore.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Use nix for termios.

[ci skip-build-wheels]

* Hack in an attempt at cross-platform process cmdline handling.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Switch from remoteprocess to sysinfo.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Use sysinfo exclusively to check pants process.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Make rust Process name extraction match Python.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Introduce ParseError to delay expensive error formatting.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Fix shell fmt and lint.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Add a client test.

Refactor launch_pantsd into a test utility to support this.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Add parse_tests.

This found a few bugs, particularly handling escape sequences for
implicit adds.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Add Args tests.

[ci skip-build-wheels]

* Fix test string formatting.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

* Add Env tests.

[ci skip-build-wheels]

* Clean up Args tests.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@benjyw
Copy link
Contributor

benjyw commented Jul 6, 2021

#11922 is epic! The client feels snappier, but do we have numbers to prove it? Is there more speed-related work to do? Or is the remaining work for this ticket related to bringing up pantsd?

@benjyw
Copy link
Contributor

benjyw commented Jul 6, 2021

I just did some casual benchmarking of ./pants and ./pants help and I don't see a noticeable improvement. Is that on deck for future work?

@jsirois
Copy link
Contributor Author

jsirois commented Jul 6, 2021

The client feels snappier, but do we have numbers to prove it? Is there more speed-related work to do?

It should be a lot snappier. Are you using USE_NATIVE_PANTS=1?

The numbers are in this PR: #11922 (comment)

The client connection now takes a few ms (see this with -ldebug), so we can't win much more there. The remaining ~200ms noop is all in pantsd. Certainly there is work we could do on that side

Or is the remaining work for this ticket related to bringing up pantsd?

Yes, exactly.

@benjyw
Copy link
Contributor

benjyw commented Jul 6, 2021

Oh, hah, I was not using USE_NATIVE_PANTS=1. That is... excellent.

jsirois added a commit to jsirois/pants that referenced this issue Jul 12, 2021
This is a follow-up to pantsbuild#11831.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
jsirois added a commit that referenced this issue Jul 12, 2021
This is a follow-up to #11831.
@benjyw benjyw removed the performance label Sep 9, 2021
@jsirois
Copy link
Contributor Author

jsirois commented Feb 1, 2023

The combination of #18145 and pantsbuild/scie-pants#94 give a natural home for the Pants native client in Pants distributions and execution flows. It would be good to finish off the client in anticipation of this. Towards that end I'll be updating this ticket with the remaining work outline needed to enable fully replacing the Pants python main entrypoint with the native Pants client.

@stuhood
Copy link
Member

stuhood commented Feb 1, 2023

FWIW: I've had USE_NATIVE_PANTS set for ... almost two years now, and never noticed an issue with it.

@jsirois
Copy link
Contributor Author

jsirois commented Feb 1, 2023

I'll edit this comment as I gather all the required missing elements and cleanups. The lists are not ordered intentionally, but they are labelled for discussion.

The required missing functionality:

  • 1. Handle launching pantsd when pantsd is not up and pantsd is enabled.

The cleanup tasks:

Notes:

  • 1 could be parametrized such that some information about how to launch Pantsd comes from configuration / the environment. In the pants scie case, there will be a known Python binary the Pants native client knows nothing about but will need to know about to run Pantsd as well as venv information.
  • B. Requires implementation of dict option parsing in Rust at the least - further review is needed.

@jsirois
Copy link
Contributor Author

jsirois commented Feb 2, 2023

I actually think the comment above covers things. 1 is the only thing needed to get a prototype of this working and it can probably be implemented robustly by injecting the command line for running the Python Pants client (or that command line's key components) via env var in the lift manifest. The client would then gain the logic currently in the Pants repo ./pants script - try to connect to Pantsd and if there is a failure, fall back to the python client command line which already knows how to start pantsd and wait for it to be up. This is probably not an ideal end-state - it would likely be good to get the launching of pantsd and wait loop ported into the native client - but that work would be non-disruptive to the release + install process and users of it.

@stuhood
Copy link
Member

stuhood commented Apr 24, 2023

.@benjyw and I hacked on this a bit today. We think that a quick path to getting the native client used by default is to:

  • Begin including the native-client binary in the Pants wheels (draft PR here: Include the native client in the pantsbuild wheel. #18827)
  • Adjust the native-client binary to either detect-or-be-explicitly-passed a venv that it is running in, and execute its own fallback from connecting to pantsd to invoking the server's entrypoint in that venv.
    • The environment variable would switch behavior for the native-client: if running in a venv, it would be responsible for execv'ing into the venv. If not running in a venv (i.e., if running Pants from source), it could continue to return 75, so that the pantsbuild/pants/pants script could continue to drive bootstrap (for now).
  • Adjust scie-pants to invoke the native client binary if it exists in the venv that has been created.

@stuhood stuhood self-assigned this May 2, 2023
@stuhood
Copy link
Member

stuhood commented May 4, 2023

Made some progress on the above: it looks like the native-client does its own buildroot detection, which lightly conflicts with what is going on in scie-pants: will adjust that.

The outcome (tomorrow hopefully?) should be 1 PR to each of scie-pants and pants.

stuhood added a commit that referenced this issue May 10, 2023
The Pants native client which was introduced in #11922 has so far only
been usable in the Pants repo when the `USE_NATIVE_PANTS` environment
variable was set.

To make the native client available to end users, this change begins
distributing the native-client binary in Pants wheels. A corresponding
change in the `scie-pants` repo
(pantsbuild/scie-pants#172) uses the native
client to run `pants`.

To reduce the surface area of `scie-pants` (rather than having it be
responsible for handling the special `75` exit code similar to the
`pants` script integration), this PR also moves to having the
native-client execute its own fallback (via `execv`) to the Pants
entrypoint. In future, the `pantsbuild/pants` `pants` script could also
use that facility (e.g. by specifying a separate `pants_server` bash
script as the entrypoint to use as the `_PANTS_SERVER_EXE`).

----

As originally demonstrated on #11831, the native client is still much
faster than the legacy client. Using
pantsbuild/scie-pants#172, the timings look
like:
```
Benchmark #1: PANTS_NO_NATIVE_CLIENT=true PANTS_SHA=836cceb74e6af042e7c82677f3ceb4927efce20e scie-pants-macos-x86_64 help
  Time (mean ± σ):      1.161 s ±  0.067 s    [User: 830.6 ms, System: 79.2 ms]
  Range (min … max):    1.054 s …  1.309 s    10 runs

Benchmark #2: PANTS_SHA=836cceb74e6af042e7c82677f3ceb4927efce20e scie-pants-macos-x86_64 help
  Time (mean ± σ):     271.0 ms ±  30.6 ms    [User: 8.9 ms, System: 6.9 ms]
  Range (min … max):   241.5 ms … 360.6 ms    12 runs

Summary
  'PANTS_SHA=836cceb74e6af042e7c82677f3ceb4927efce20e scie-pants-macos-x86_64 help' ran
    4.29 ± 0.54 times faster than 'PANTS_NO_NATIVE_CLIENT=true PANTS_SHA=836cceb74e6af042e7c82677f3ceb4927efce20e scie-pants-macos-x86_64 help'
```

Fixes #11831.
@benjyw
Copy link
Contributor

benjyw commented May 10, 2023

🎉

@stuhood
Copy link
Member

stuhood commented May 10, 2023

The final piece is over in pantsbuild/scie-pants#172 : users will need to upgrade scie-pants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants