SPICE-0009: External Readers #10
Some high-level thoughts:
Using a persistent process with message passing isn't a bad idea. For processes with a higher startup cost (e.g. JVM-based processes), you pay that penalty once for the life of the evaluator, rather than every time a `read()` happens. However, there's an argument to be made for having the external reader binary be short-lived. For example, this makes it possible to write an external reader that's simply a bash script. And making these things bash-script-able is quite nice:
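```bash
#!/usr/bin/env bash
set -e

# The URI to read is expected as the first argument.
URI="$1"
if [[ "$URI" != ldap:* ]]; then
  >&2 echo "Unexpected URI: $URI. Expected ldap scheme."
  exit 1
fi
# Strip the leading "ldap:" (5 characters) before querying.
ldapsearch -D "${URI:5}"
```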
If it's a persistent process, it should probably be spawned the first time a module or resource is read, then kept alive while the evaluator session is alive (e.g. until it is closed).
I don't know if we need a `DiscoverReadersRequest` and `DiscoverReadersResponse`? Wouldn't the evaluator already know about the external readers it needs?
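A minimal Java sketch of the "spawn on first read, keep alive until the evaluator session closes" lifecycle described above. The class name `LazyReaderProcess` and its methods are hypothetical and not part of Pkl's API; only `ProcessBuilder` and `Process` come from the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical holder for a persistent external reader process.
final class LazyReaderProcess implements AutoCloseable {
  private final List<String> command;
  private Process process; // null until the first read needs it

  LazyReaderProcess(List<String> command) {
    this.command = List.copyOf(command);
  }

  // Starts the process on first use and reuses it afterwards.
  synchronized Process get() {
    if (process == null) {
      try {
        process = new ProcessBuilder(command).start();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return process;
  }

  synchronized boolean isStarted() {
    return process != null;
  }

  // Intended to be called when the evaluator session is closed.
  @Override
  public synchronized void close() {
    if (process != null) {
      process.destroy();
      process = null;
    }
  }
}
```

An evaluator could hold one such instance per reader executable and call `close()` from its own close path, so a reader that is never used is never spawned.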
I'm missing an example that shows the whole process. Your example just shows a Pkl file using the reader and a CLI call, but it's not clear to me how these things glue together.
I also think that requiring the processes to be long-running would put too many hurdles in front of people who want to contribute a reader, so let's keep that as a future improvement.
For the package support:
That seems to complicate the design quite a bit. Now Pkl has to understand how to properly process packages-as-readers. If my reader is a Ruby script and I want to deploy it as a package, do I have to bundle the script together? It gets even more complicated with executables and multiplatform support. Or maybe I misunderstood you and package readers have to be Pkl-only.
I'm not sure how useful writing readers in Pkl instead of general-purpose languages will be, though perhaps I'm wrong. My point is that Pkl packages may not be the best distribution mechanism for readers, even though I can't think of a good alternative.
Co-authored-by: Islon Scherer <[email protected]>
Thanks for the review @bioball!
I definitely agree that there's utility in enabling the super simple scripting route! My primary motivation for a persistent subprocess is, as you've identified, startup cost. Here's a real world example: I would like to be able to use an external reader to mediate access to a secret store in a large monorepo. I'd estimate that there are 80+ unique resource URIs that would be read during a "world eval". The process that would implement the external reader currently takes ~0.25 seconds to invoke from the command line (in the happy path), which would add ~20 seconds to the overall evaluation time. Most of that quarter second is process and initialization overhead and the actual read takes ~0.05 seconds, so if the reader process were persistent it would save ~15 seconds from the total evaluation time. If persistent reader processes are implemented, it also becomes fairly trivial to implement one-shot readers. Spitballing, it might look something like this:
Where the
The ergonomic motivation for spawning a persistent process on startup and having the readers be discovered at runtime would be to decouple which reader binaries are used from which schemes they provide. If one-shot reader processes are used, the scheme->binary registration is unavoidable, but persistent readers may implement many schemes. Having to perform the same mapping for persistent processes would be very verbose (since setting allowed modules is also required), but it would allow spawning the process on first use instead of during evaluator initialization. Another consideration here is user error: if a user registers a reader for a scheme that it does not provide, how should Pkl respond?
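The startup-cost estimate given earlier in this comment can be reproduced with a short Java sketch. The class and method names are illustrative, and the 0.20 s startup figure is an assumption derived from the ~0.25 s invocation time minus the ~0.05 s read time.

```java
// Back-of-the-envelope model of reader overhead; all names are illustrative.
final class ReaderCostEstimate {
  // One-shot: every read pays process startup plus the read itself.
  static double oneShotSeconds(int reads, double startup, double perRead) {
    return reads * (startup + perRead);
  }

  // Persistent: startup is paid once, then each read costs only perRead.
  static double persistentSeconds(int reads, double startup, double perRead) {
    return startup + reads * perRead;
  }

  public static void main(String[] args) {
    int reads = 80;
    double startup = 0.20; // assumed: 0.25 s invocation minus 0.05 s read
    double perRead = 0.05;
    double oneShot = oneShotSeconds(reads, startup, perRead);
    double persistent = persistentSeconds(reads, startup, perRead);
    System.out.printf("one-shot: %.1f s, persistent: %.1f s, saved: %.1f s%n",
        oneShot, persistent, oneShot - persistent);
  }
}
```

With these inputs the model gives roughly 20 s for one-shot readers versus about 4.2 s for a persistent one, which is consistent with the ~15 second saving estimated above.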
Thanks for the review @stackoverflow!
I think I've addressed this in other comments, but please let me know if you still have questions.
My main concerns with only targeting "one-shot" reader processes at this time are:
Above I proposed a
I may be misinterpreting this concern, but I'm not sure this is so complicated. Reader executables would need to be supplied as package file URIs including the fragment
My thought here is to primarily target languages for which (message-passing) bindings already exist. Both Golang and Swift (at least on Linux) can produce statically linked binaries that can be packaged dependency-free. If there are bindings for interpreted languages in the future, distribution would likely need to be via the language package manager, which this design does not rule out.
Definitely agree, but I think
You addressed it in the comments, but someone reading the SPICE shouldn't have to read comments to understand it, so it would be nice if that were clear in the SPICE itself.
About having readers being persistent: I think you are approaching this from the wrong side. For Pkl, the reader should just be a process it spawns once it finds a read. If you want it to be persistent (which I consider an edge case), you can run a persistent background process and write a simple script for Pkl to call, which would pipe the reads. Also, it's not clear to me how errors and communication are handled. I imagine we can keep the Unix tradition of using return codes, so 0 would mean success and any other number would be an error.
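A sketch of the one-shot model with Unix-style exit codes, in Java. `OneShotReader`, `ReaderFailedException`, and the convention of passing the URI as the last argument and returning the content on stdout are assumptions for illustration, not something the SPICE specifies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical error type for a reader that exits with a non-zero status.
final class ReaderFailedException extends RuntimeException {
  final int exitCode;

  ReaderFailedException(int exitCode) {
    super("reader exited with status " + exitCode);
    this.exitCode = exitCode;
  }
}

final class OneShotReader {
  // Runs `command <uri>`, returning stdout on exit code 0 and throwing otherwise.
  static String read(List<String> command, String uri) {
    List<String> argv = new ArrayList<>(command);
    argv.add(uri);
    try {
      Process process = new ProcessBuilder(argv)
          .redirectError(ProcessBuilder.Redirect.INHERIT) // pass diagnostics through
          .start();
      process.getOutputStream().close(); // the reader receives no stdin
      byte[] stdout = process.getInputStream().readAllBytes();
      int exitCode = process.waitFor();
      if (exitCode != 0) {
        throw new ReaderFailedException(exitCode);
      }
      return new String(stdout, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for reader", e);
    }
  }
}
```

A bash reader like the LDAP script earlier in this thread fits this shape: the URI arrives as `$1`, and a non-zero `exit` signals failure.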
Josh actually brings up a good point: the process of resolving a read can involve multiple calls to a reader, which can make a short-lived execution quite complicated. For example, resolving a glob star import (e.g.
Additionally, Pkl wants to know three facts about each scheme:
It's quite a rough experience to provide all this to Pkl as CLI args: this is quite long:
I echo Islon's concerns about distributing binaries using Pkl packages. I see the value here, but this isn't what packages are designed for. Besides, there are already many ways to distribute binaries; languages typically already offer their own mechanisms for installing executables (
Some additional thoughts:
I've taken another pass on this. Here are the major changes:
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10). It is being contributed in a separate pull request to ease review. The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10). It is being contributed in a separate pull request to ease review.

This is required to allow deferred launch of external reader processes without requiring up-front configuration of scheme->process mappings. More details on why this is required here: apple/pkl-evolution#10 (comment)

One concern here (which applies before this change as well) is that the semantics of module key factories and resource readers are reversed. For module key factories, the first factory that answers for a URI scheme "wins", while for resource readers, the _last_ reader that answers for a URI scheme "wins". This PR preserves that behavior, but it may be worth reconsidering this design at this time. This behavior can be observed in practice in `ResourceReadersEvaluatorTest.\`module path\`` where `ResourceReaders.modulePath` is added after the pre-configured `ResourceReaders.classPath` and takes precedence.
This is preparatory work for [SPICE-0009](apple/pkl-evolution#10). It is being contributed in a separate pull request to ease review.

This is required to allow deferred launch of external reader processes without requiring up-front configuration of scheme->process mappings. More details on why this is required here: apple/pkl-evolution#10 (comment)

One concern here (which applies before this change as well) is that the semantics of module key factories and resource reader factories are reversed. For module key factories, the first factory that answers for a URI scheme "wins", while for resource reader factories, the _last_ reader factory that answers for a URI scheme "wins". This PR preserves that behavior, but it may be worth reconsidering this design at this time. This behavior can be observed in practice in <code>ResourceReadersEvaluatorTest.\`module path\`</code> where `ResourceReaders.modulePath` is added after the pre-configured `ResourceReaders.classPath` and takes precedence.

This is a fairly large breaking API change, but in practice most clients will only need to replace a few lines in their calls to `EvaluatorBuilder`, e.g.

```java
.addResourceReader(ResourceReaders.environmentVariable())
// becomes
.addResourceReaderFactory(ResourceReaderFactories.environmentVariable())
```
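To make the ordering difference concrete, here is a small Java sketch. The `Factory` and `Resolution` classes are hypothetical stand-ins, not the actual `org.pkl.core` types; they only model "first match wins" versus "last match wins".

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a factory that answers for one URI scheme.
final class Factory {
  final String name;
  final String scheme;

  Factory(String name, String scheme) {
    this.name = name;
    this.scheme = scheme;
  }

  boolean handles(String uriScheme) {
    return scheme.equals(uriScheme);
  }
}

final class Resolution {
  // Module key factory semantics: the first registered match wins.
  static Optional<Factory> firstWins(List<Factory> factories, String scheme) {
    for (Factory f : factories) {
      if (f.handles(scheme)) {
        return Optional.of(f);
      }
    }
    return Optional.empty();
  }

  // Resource reader semantics: the last registered match wins.
  static Optional<Factory> lastWins(List<Factory> factories, String scheme) {
    Optional<Factory> result = Optional.empty();
    for (Factory f : factories) {
      if (f.handles(scheme)) {
        result = Optional.of(f); // later registrations override earlier ones
      }
    }
    return result;
  }
}
```

When two factories are registered for the same scheme, `firstWins` returns the earlier one and `lastWins` the later one, which is the asymmetry the commit message describes.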
requestId: Int

/// The scheme of the resource to discover the spec for
/// The scheme of the resource to initialize.
scheme: String
The reader should already know what schemes it should read when it receives a `Read*Request`. Do we need `scheme` here?
The initialize messages only request the spec for a single scheme at a time. It would be possible to do an "all at once" sort of thing like the original proposal, but this method is actually super convenient from an implementation standpoint, as it enables using the battle-tested `ConcurrentHashMap` response memoization already widely employed by the message passing API. My initial pass at the implementation that did the "all at once" discovery needed some messy locking; it would be great to pass the buck to the standard library instead.
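A minimal sketch of the per-scheme memoization described here, in Java. `SpecCache` and `ReaderSpec` are hypothetical names, and the `initialize` function stands in for sending an initialize request to the reader process; only `ConcurrentHashMap.computeIfAbsent` is real JDK API, and the actual implementation may differ.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical result of initializing a reader for one scheme.
final class ReaderSpec {
  final String scheme;

  ReaderSpec(String scheme) {
    this.scheme = scheme;
  }
}

final class SpecCache {
  private final ConcurrentHashMap<String, ReaderSpec> specs = new ConcurrentHashMap<>();
  private final Function<String, ReaderSpec> initialize;

  SpecCache(Function<String, ReaderSpec> initialize) {
    this.initialize = initialize;
  }

  // computeIfAbsent runs the initializer at most once per absent scheme, even
  // when several threads ask for the same scheme concurrently, so no explicit
  // locking is needed here.
  ReaderSpec specFor(String scheme) {
    return specs.computeIfAbsent(scheme, initialize);
  }
}
```

Requesting one scheme at a time maps directly onto one map key per scheme, whereas an "all at once" discovery would need its own coordination around a single shared result.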
[SPICE-0009](apple/pkl-evolution#10)
* Add `EvaluatorOptions.ExternalModuleReaders` and `EvaluatorOptions.ExternalResourceReaders`.
* Add `ExternalReaderRuntime` to host the child process side of the external reader workflow.
a237987 to cd20d2e
75171b3 to 46ba868
[SPICE-0009](apple/pkl-evolution#10) New close flow
Implemented in: