Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPICE-0009: External Readers #10

Merged
merged 10 commits into from
Nov 5, 2024
Merged

SPICE-0009: External Readers #10

merged 10 commits into from
Nov 5, 2024

Conversation

HT154
Copy link
Contributor

@HT154 HT154 commented Aug 6, 2024

@HT154 HT154 marked this pull request as ready for review August 7, 2024 01:26
Copy link
Contributor

@bioball bioball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some high level thoughts:

Using a persistent process with message passing isn't a bad idea--for processes with a higher startup cost (e.g. JVM-based processes), you pay that penalty once for the life of the evaluator, rather than every time a read() happens. However, there's an argument to be made for having the external reader binary be short-lived. For example, this makes it possible to write an external reader that's simply a bash script. And making these things bash-script-able is quite nice:

#!/usr/bin/env bash

set -e

URI="$1"

if [[ "$URI" != ldap:* ]]; then
  >&2 echo "Unexpected URI: $URI. Expected ldap scheme."
  exit 1
fi

ldapsearch -D "${URI:5}"

If it's a persistent process, it should probably be spawned the first time a module or resource is read, then kept alive while the evaluator session is alive (e.g. until it is closed).

I don't know if we need a DiscoverReadersRequest and DiscoverReadersResponse? Wouldn't the evaluator already know about the external readers it needs?

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
spices/SPICE-0009-external-readers.adoc Show resolved Hide resolved
Copy link
Contributor

@stackoverflow stackoverflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing an example that shows the whole process. Your example just shows a Pkl file using the reader and a cli call, but it's not clear to me how these things glue together.

Also think having the processes being long-running would put too much hurdles for people who want to contribute a reader, so let's keep that as a future improvement.

For the package support:
That seems to complicated the design quite a bit. Now Pkl has to understand how to properly process packages-as-readers. If my reader is a Ruby script and I want to deploy it as a package, do I have to bundle the script together? Gets even more complicated with executables and multiplatform support. Or maybe I misunderstood you and package readers have to be Pkl-only.
I'm not sure how useful writing readers in Pkl instead of general languages will be. Though perhaps I'm wrong. My point is, maybe pkl packages are not the best distribution mechanism for readers, even though I can't think of a good solution for this.

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
@HT154
Copy link
Contributor Author

HT154 commented Aug 8, 2024

Thanks for the review @bioball!

Using a persistent process with message passing isn't a bad idea--for processes with a higher startup cost (e.g. JVM-based processes), you pay that penalty once for the life of the evaluator, rather than every time a read() happens. However, there's an argument to be made for having the external reader binary be short-lived. For example, this makes it possible to write an external reader that's simply a bash script.

I definitely agree that there's utility in enabling the super simple scripting route! My primary motivation for a persistent subprocess is, as you've identified, startup cost.

Here's a real world example: I would like to be able to use an external reader to mediate access to a secret store in a large monorepo. I'd estimate that there are 80+ unique resource URIs that would be read during a "world eval". The process that would implement the external reader currently takes ~0.25 seconds to invoke from the command line (in the happy path), which would add ~20 seconds to the overall evaluation time. Most of that quarter second is process and initialization overhead and the actual read takes ~0.05 seconds, so if the reader process were persistent it would save ~15 seconds from the total evaluation time.

If persistent reader processes are implemented, it also becomes fairly trivial to implement one-shot readers. Spitballing, it might look something like this:

pkl eval <module.pkl> --external-reader 'pkl-cmd ldap=path/to/pkl-ldap.sh ldaps=path/to/pkl-ldap.sh' --allowed-resources ...

Where the pkl-cmd executable implements the external reader message passing API and registers a ResourceReader for each specified scheme->script mapping. I'd be interested in shipping pkl-cmd as part of the Pkl project.

If it's a persistent process, it should probably be spawned the first time a module or resource is read, then kept alive while the evaluator session is alive (e.g. until it is closed).

I don't know if we need a DiscoverReadersRequest and DiscoverReadersResponse? Wouldn't the evaluator already know about the external readers it needs?

The ergonomic motivation for spawning a persistent process on startup and having the readers be discovered at runtime would be to decouple which reader binaries are used from which schemes they provide. If one-shot reader processes are used, the scheme->binary registration is unavoidable, but persistent readers may implement many schemes. Having to perform the same mapping for persistent processes would be very verbose (since setting allowed modules is also required), but would be able to spawn the process on first use instead of during evaluator initialization. Another consideration here is user error: if a user registers a reader for a scheme that it does not provide, how should pkl respond?

@HT154
Copy link
Contributor Author

HT154 commented Aug 8, 2024

Thanks for the review @stackoverflow!

I'm missing an example that shows the whole process. Your example just shows a Pkl file using the reader and a cli call, but it's not clear to me how these things glue together.

I think I've addressed this in other comments, but please let me know if you still have questions.

Also think having the processes being long-running would put too much hurdles for people who want to contribute a reader, so let's keep that as a future improvement.

My main concerns with only targeting "one-shot" reader processes at this time are:

  • It's not clear how to support globbing (i.e. ListResourcesRequest/ListModulesRequest)
  • It may make "persistent" readers more complicated to implement in the future from a configuration/CLI perspective
  • Performance, as outlined above

Above I proposed a pkl-cmd persistent external reader that could function as an "adapter" of sorts to simpler one-shot reader processes. I definitely agree that the simplicity of writing a simple script should be a goal here, and I don't think the more involved persistent reader design rules that out.

For the package support: That seems to complicated the design quite a bit. Now Pkl has to understand how to properly process packages-as-readers.

I may be misinterpreting this concern, but I'm not sure this is so complicated. Reader executables would need to be supplied as package file URIs including the fragment #/path/to/binary.

If my reader is a Ruby script and I want to deploy it as a package, do I have to bundle the script together?

My thought here is to primarily target languages for which (message-passing) bindings already exist. Both Golang and Swift (at least on linux) can produce statically linked binaries that can be packaged dependency-free. If there are bindings for interpreted languages in the future, distribution would likely need to be via the language package manager, which this design does not rule out.

Gets even more complicated with executables and multiplatform support.

Definitely agree, but I think pkl:platform makes this reasonable. I show what this might look like in a PklProject here: #10 (comment)

@stackoverflow
Copy link
Contributor

I think I've addressed this in other comments, but please let me know if you still have questions.

You addressed it in the comments, but someone reading the SPICE shouldn't have to read comments to understand it, so it would be nice if that was clear in the SPICE itself.

@stackoverflow
Copy link
Contributor

About having readers being persistent:

I think you are approaching this from the wrong side. For Pkl, the reader should just be a process it spawns once it finds a read. If you want it to be persistent (which I consider an edge case) you can run a persistent background process and write a simple script for Pkl to call which would pipe the reads.
This would simplify your proposal and there wouldn't be a need for a pkl-cmd.

Also it's not clear to me how errors and the communication happens. I imagine we can keep the unix tradition of using return codes so 0 would mean success and other numbers would be errors. stderr could contain the error message in case things went wrong and stdout would contain the actual result of the read. But anyway, all of this should be in the SPICE.

@bioball
Copy link
Contributor

bioball commented Aug 9, 2024

Josh actually brings up a good point--the process of resolving a read can involve multiple calls to a reader, which can make a short-lived execution quite complicated. For example, resolving a glob star import (e.g. import*("foo:/**.pkl") is linear to the number of directories needed to read.

Additionally, Pkl wants to know three facts about each scheme:

  • Is it globbable?
  • Is it hierarchical?
  • Is it local?

It's quite a rough experience to provide all this to Pkl as CLI args: this is quite long: --external-reader ldap=pkl-ldap --globbable-reader=pkl-ldap --hierarchical-reader=pkl-ldap --local-reader=pkl-ldap.

I echo Islon's concerns about distributing binaries using Pkl packages. I see the value here, but this isn't what packages are designed for. Besides, there are already many ways to distribute binaries; languages typically already offer their own mechanisms for installing executables (go install, npm isntall -g, pip install, etc), and so do OS-level tools (apt-get, rpm, winget, yum, brew). Also: a tarball isn't a sufficient way to distribute executables; some executables need to link to libraries (e.g. glibc), and others require a runtime (e.g. Node.js). It's better to delegate this task to something else.

Some additional thoughts:

  • Right now, the message passing API is held outside of pkl-core. We did this to reduce the number of dependencies (no msgpack), and to minimize the amount of core that gets shipped in our core library. This feature implies that much of our message passing logic (e.g. MessageTransport) gets moved to pkl-core. However, I think we want this anyway--we've talked about moving BinaryEvaluator into pkl-core, and providing a serializer/deserializer to our Java library users (and perhaps even in-language).
  • I don't know if we want shelling out to executables to become the norm for Pkl. This feature makes sense to bring feature parity to the CLI, but library users may not want Pkl to spawn executables. However, it still makes sense that library users should be able to configure Pkl to do this if the CLI can do this too.

@HT154
Copy link
Contributor Author

HT154 commented Aug 17, 2024

I've taken another pass on this. Here are the major changes:

  • Moved URI-based reader executables to future directions
  • Addressed exec-per-read and up-front scheme->reader registration under alternatives considered
  • Added significant detail to the detailed design section

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved
HT154 added a commit to HT154/pkl that referenced this pull request Sep 9, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Sep 9, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Sep 9, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Sep 9, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Sep 9, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Sep 9, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Sep 15, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

This is required to allow deferred launch of external reader processes without requiring up-front configuration of scheme->process mappings.
More details on why this is required here: apple/pkl-evolution#10 (comment)

One concern here (which applies before this change as well) is that the semantics of module key factories and resource readers is reversed.
For module key factories, the first factory that answers for a URI scheme "wins", while for resource readers, the _last_ reader that answers for a URI scheme "wins".
This PR preserves that behavior, but it may be worth reconsidering this design at this time.
This behavior can be observed in practice in `ResourceReadersEvaluatorTest.\`module path\`` where `ResourceReaders.modulePath` is added after the pre-configured `ResourceReaders.classPath` and takes precedence.
HT154 added a commit to HT154/pkl that referenced this pull request Sep 15, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

This is required to allow deferred launch of external reader processes without requiring up-front configuration of scheme->process mappings.
More details on why this is required here: apple/pkl-evolution#10 (comment)

One concern here (which applies before this change as well) is that the semantics of module key factories and resource reader factories is reversed.
For module key factories, the first factory that answers for a URI scheme "wins", while for resource reader factories, the _last_ reader factory that answers for a URI scheme "wins".
This PR preserves that behavior, but it may be worth reconsidering this design at this time.

This behavior can be observed in practice in <code>ResourceReadersEvaluatorTest.\`module path\`</code> where `ResourceReaders.modulePath` is added after the pre-configured `ResourceReaders.classPath` and takes precedence.

This is a fairly large breaking API change, but in practice most clients will only need to replace a few lines in their calls to `EvaluatorBuilder`, eg.
```java
.addResourceReader(ResourceReaders.environmentVariable())
// becomes
.addResourceReaderFactory(ResourceReaderFactories.environmentVariable())
```
HT154 added a commit to HT154/pkl that referenced this pull request Sep 16, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

This is required to allow deferred launch of external reader processes without requiring up-front configuration of scheme->process mappings.
More details on why this is required here: apple/pkl-evolution#10 (comment)

One concern here (which applies before this change as well) is that the semantics of module key factories and resource reader factories is reversed.
For module key factories, the first factory that answers for a URI scheme "wins", while for resource reader factories, the _last_ reader factory that answers for a URI scheme "wins".
This PR preserves that behavior, but it may be worth reconsidering this design at this time.

This behavior can be observed in practice in <code>ResourceReadersEvaluatorTest.\`module path\`</code> where `ResourceReaders.modulePath` is added after the pre-configured `ResourceReaders.classPath` and takes precedence.

This is a fairly large breaking API change, but in practice most clients will only need to replace a few lines in their calls to `EvaluatorBuilder`, eg.
```java
.addResourceReader(ResourceReaders.environmentVariable())
// becomes
.addResourceReaderFactory(ResourceReaderFactories.environmentVariable())
```
requestId: Int

/// The scheme of the resource to discover the spec for
/// The scheme of the resource to initialize.
scheme: String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reader should already know what schemes it should read when it receives a Read*Request. Do we need scheme here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initialize messages only request the spec for a single scheme at a time. It would be possible to do an "all at once" sort of thing like the original proposal, but this method is actually super convenient from an implementation standpoint as it enables using the battle-tested ConcurrentHashMap response memoization already widely employed by the message passing API. My initial pass at the implementation that did the "all at once" discovery needed some messy locking; it would be great to pass the buck to the standard library instead.

HT154 added a commit to HT154/pkl that referenced this pull request Sep 24, 2024
HT154 added a commit to HT154/pkl-go that referenced this pull request Sep 24, 2024
[SPICE-0009](apple/pkl-evolution#10)

* Add `EvaluatorOptions.ExternalModuleReaders` and `EvaluatorOptions.ExternalResourceReaders`.
* Add `ExternalReaderRuntime` to host the child process side of the external reader workflow.
HT154 added a commit to HT154/pkl-swift that referenced this pull request Sep 25, 2024
HT154 added a commit to HT154/pkl-swift that referenced this pull request Sep 25, 2024
HT154 added a commit to HT154/pkl-swift that referenced this pull request Sep 25, 2024
HT154 added a commit to HT154/pkl-swift that referenced this pull request Sep 25, 2024
HT154 added a commit to HT154/pkl that referenced this pull request Oct 3, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Oct 3, 2024
HT154 added a commit to HT154/pkl that referenced this pull request Oct 7, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Oct 7, 2024
HT154 added a commit to HT154/pkl-go that referenced this pull request Oct 10, 2024
[SPICE-0009](apple/pkl-evolution#10)

* Add `EvaluatorOptions.ExternalModuleReaders` and `EvaluatorOptions.ExternalResourceReaders`.
* Add `ExternalReaderRuntime` to host the child process side of the external reader workflow.
HT154 added a commit to HT154/pkl-swift that referenced this pull request Oct 10, 2024
HT154 added a commit to HT154/pkl that referenced this pull request Oct 16, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Oct 16, 2024
HT154 added a commit to HT154/pkl that referenced this pull request Oct 17, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Oct 17, 2024
HT154 added a commit to HT154/pkl that referenced this pull request Oct 17, 2024
HT154 added a commit to HT154/pkl that referenced this pull request Oct 24, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Oct 24, 2024
HT154 added a commit to HT154/pkl that referenced this pull request Oct 24, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Oct 24, 2024
HT154 added a commit to HT154/pkl that referenced this pull request Oct 26, 2024
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10).
It is being contributed in a separate pull request to ease review.

The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
HT154 added a commit to HT154/pkl that referenced this pull request Oct 26, 2024
@bioball bioball merged commit 5c8b77d into apple:main Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants