Job Spec #8
base: main
Conversation
* Trying to incorporate warpforge formulas, bucket-vm, and Bacalhau jobs
job/job.var1.json
Outdated
"right": {
  "run": {
    "wasm": "bafyWasmRight",
    "input": [{"from": "start", "output": 4}, {"from": "database"}]
is the output here just "returning a number"?
The `4`? Current theory is that it'll be the index on the multivalued return in this completely made up example.
Yeah, the `4`. So, it's the index, gotcha.
Evidently the field name should be changed for clarity. Good feedback :)
I suggest named slots for clarity (and robustness to changes in # of output parameters)
Another suggestion for robustness: Optional expected type/schema for both input and outputs. Feeling there's a good opportunity for a Cambrian-like data lens structure here...
Maybe that's out of scope (because we have clear expectations of stability from the model) but then again, given that mutability is expected...
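To make the named-slots suggestion concrete, here is a small sketch contrasting positional and named output references (all slot names and values are hypothetical, not from the spec):

```python
# Sketch: index-based vs named output slots (names hypothetical).
# With positional indices, a reference like {"from": "start", "output": 4}
# silently changes meaning if the producer's return arity changes.

def resolve_positional(outputs: list, index: int):
    """Fragile: the meaning of `index` shifts when outputs are reordered."""
    return outputs[index]

def resolve_named(outputs: dict, slot: str):
    """Robust: the reference survives adding/reordering other outputs."""
    if slot not in outputs:
        raise KeyError(f"producer exposes no output slot named {slot!r}")
    return outputs[slot]

# A producer task's multivalued return, expressed both ways:
positional = ["rows", "schema", "stats", "log", "row-count"]
named = {"rows": "rows", "schema": "schema", "stats": "stats",
         "log": "log", "row_count": "row-count"}

assert resolve_positional(positional, 4) == resolve_named(named, "row_count")
```

A named reference also gives a natural attachment point for the optional expected type/schema mentioned above, keyed per slot.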
Some notes @pdxjohnny took from the 2022-11-08 IPVM November meeting (sorry for any errors)
…en?: Link to shouldi Coach Alice: Our Open Source Guide Related: ipvm-wg/spec#8 2022-11-14 @pdxjohnny Engineering Logs: #1406
… within Architecting Alice: She's Arriving When? Related: ipvm-wg/spec#8 2022-11-14 @pdxjohnny Engineering Logs: #1406 2022-11-14 SCITT Meeting Notes: #1406
Not sure if this has been brought up before, but would it make sense to use IPLD Schemas instead of JSON Schemas as much as possible and to link to their DMT CID so stuff can be loaded from the network? edit: Whoops, didn't see the latest commit. :P
task/README.md
Outdated
"type": "ipvm/task",
"version": "0.1.0",
"using": "docker:Qm12345",
"meta": {
  "annotations": []
},
"args": {
  "resources": {
    "ram": {"gb": 10}
  },
  "inputs": [1, 2, 3],
  "entry": "/",
  "workdir": "/",
  "env": {
    "$FOO": "bar"
  },
  "timeout": {"seconds": "3600"},
  "contexts": [],
  "output": [],
  "sharding": 5
}
}
@lukemarsden & @simonwo: this is a very rough mockup, but is this more in line with what you need?
Changes:
- They're all just "tasks" (wasm, docker, etc.)
- Nested fields for docker-specific things
- Signalling Docker with `docker:`... not sure if there's an official URI scheme from Docker/CNCF/OCI
- Annotations are a good idea! Here they're in the common fields section (un-nested)
this looks broadly good!
- what determines the fields that are valid under `args`? is it the scheme in the `using` URI? I'd maybe prefer to have it be an explicit `kind` sub-field (open to a better name!) i.e. `type` is top-level `task`, `kind` is the kind of task it is, which determines the valid `args` fields. WDYT?
Yeah, I was thinking of it as being scoped to the URI, but you're right that we probably want at least a version in there or something. We could also keep the same pattern of the `type` field inside the payload:
{
"type": "ipvm/task",
"version": "0.1.0",
"using": "docker:Qm12345",
"meta": {
"annotations": []
},
"args": {
"type": "bacalhau/docker", // This line
"version": "0.1.0", // Possibly this, too
"resources": {
"ram": {"gb": 10}
},
"inputs": [1, 2, 3],
"entry": "/",
"workdir": "/",
"env": {
"$FOO": "bar"
},
"timeout": {"seconds": "3600"},
"contexts": [],
"output": [],
"sharding": 5
}
}
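As a sketch of how an executor might use that self-declared type inside `args` (the required-field sets and error strings here are illustrative assumptions, not spec):

```python
# Sketch: validate the `args` payload from its own declared type, rather
# than inferring the shape from the `using` URI scheme. The field sets
# below are assumptions drawn from the examples in this thread.

REQUIRED_ARGS = {
    "bacalhau/docker": {"resources", "inputs", "entry", "workdir"},
    "ipvm/wasm": {"mod", "fun", "args"},
}

def validate_task(task: dict) -> list:
    args = task.get("args", {})
    kind = args.get("type")
    if kind not in REQUIRED_ARGS:
        return [f"unknown args type: {kind!r}"]
    missing = REQUIRED_ARGS[kind] - args.keys()
    return [f"missing field: {f}" for f in sorted(missing)]

task = {
    "type": "ipvm/task",
    "using": "docker:Qm12345",
    "args": {"type": "bacalhau/docker", "resources": {"ram": {"gb": 10}},
             "inputs": [1, 2, 3], "entry": "/", "workdir": "/"},
}
assert validate_task(task) == []
```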
Yes!
Also ideally we don't have bacalhau specific things, but generic things that bacalhau and others can implement. So we try and spearhead ipvm/wasm and add ipvm/docker
So I haven't read all of everything yet but it's looking fabulous so far!
I'm still trying to get clear that we are aiming at the same end result. In my head it seems to be:
- For memoization, we are looking for a thing we can hash and link to a piece of output. In the best case, someone can compute something on IPVM and then we can reuse that result on Bacalhau, in a fully distributed way. So the thing we want represents the computation done (the "what") but doesn't depend on execution environment (the "how" i.e. memory limits, gas, timeouts).
- (Because you can run the same computation with timeout = 1hr and timeout = 1day and get the same result, if no error).
- The "IPVM Task" represents the "what" of a unit of computation i.e. what is the code to run, what are the arguments to that code, what data is expected to be available, etc. It doesn't contain the "how" of the execution.
So the thing we want to share across CoD networks specifically is the Task. Bacalhau will adopt the Task and use it in place of our own `Spec`. And then when we start building the memoization service, we'll use the Task as the input key into the map of jobs to outputs.
- Separately, we want to be able to invoke a Task on Bacalhau from within IPVM. That will involve sending an IPVM Task towards Bacalhau but it will also require sending some extra data as well (the "how"). E.g. what time limits Bac should apply, what publisher to use for the results. So we also need to build an IPVM Effect that will execute a Bac job. Am I right?
- On the outside, the IPVM Workflow defines orchestration of tasks and verification options. Bacalhau's options in this area are currently richer on e.g. the hardware limits side than what is here, but less sophisticated around orchestration. I'm not sure if we want to try and jam everything into this IPVM Workflow? Because there will be a lot of stuff that is not common between networks.
- But Bac should work towards being able to use the Workflow, not least because we don't have our own orchestration stuff. But I think it will mean implementing/using quite a lot of the IPVM scheduler internally because it sounds like there are quite strict rules around how things are executed esp. around Effects.
- How interested are we in the Output object? On your slides there is an "IPLD Output" which I am also keen to explore – I imagine a lot of it might be network specific but there are some clear commonalities e.g. data CIDs created, verification results, links to previous Task specs, etc.
WDYT? Am I in the right place with this?
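The memoization idea above — hash only the "what" and keep the "how" out of the key — might be sketched like this (the field layout and the rule of dropping `meta` are assumptions for illustration):

```python
import hashlib
import json

# Sketch: memoize on a canonical hash of the Task (the "what"), keeping
# execution config (the "how": timeouts, limits) out of the key, so the
# same computation run with different timeouts shares one cached result.
# Treating `meta` as the "how" bucket is an assumption.

def memo_key(task: dict) -> str:
    what = {k: v for k, v in task.items() if k != "meta"}  # drop the "how"
    canonical = json.dumps(what, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

a = {"with": "docker:Qm123", "do": "docker/run",
     "inputs": {"entrypoint": ["echo", "hi"]},
     "meta": {"timeout": 3600}}
b = {**a, "meta": {"timeout": 86400}}  # same work, different time limit

assert memo_key(a) == memo_key(b)
```

In a real system the key would presumably be a CID over canonical IPLD rather than a JSON hash, but the separation of concerns is the same.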
task/README.md
Outdated
### 2.1.1 `type`

The `type` field is used to declare the shape of the object. This field MUST be either `ipvm/wasm` for pure Wasm, or `ipvm/effect` for effectful computation.
So I feel like what we are saying is that the `type` field is not limited to these two things, but is user-defined. So the comment about effectful computation is true in an IPVM context (i.e. IPVM will only accept these two values and will treat them like this) but other contexts will accept other values for either type of computation. WDYT?
task/README.md
Outdated
### 2.1.2 `with` Resource

The `with` field MUST contain a CID or URI of the resource to interact with. For example, this MAY be the Wasm to execute, or the URL of a web server to send a message to.
What we have found is that arguments required vary considerably on a per-type basis. And so generally one field isn't enough. And I think you have taken this out below?
task/README.md
Outdated
| Field     | Type                  | Description                               | Required | Default                       |
|-----------|-----------------------|-------------------------------------------|----------|-------------------------------|
| `type`    | `"ipvm/wasm"`         | Identify this task as Wasm 1.0            | Yes      |                               |
| `version` | SemVer                | The Wasm module's Wasm version            | No       | `"0.1.0"`                     |
| `mod`     | CID                   | Reference to the Wasm module to run       | Yes      |                               |
| `fun`     | `String or OutputRef` | The function to invoke on the Wasm module | Yes      |                               |
| `args`    | `[{String => CID}]`   | Arguments to the Wasm executable          | Yes      |                               |
| `secret`  | Boolean               |                                           | No       | `False`                       |
| `maxgas`  | Integer               | Maximum gas for the invocation            | No       | 1000 <!-- ...or something --> |
So this looks great! I am intrigued about how the `args` array works :)
But also in Bacalhau we are mainly running WASI-type workloads in a different way...
We have `mod` and `fun`, but we also have environment variables, filesystem mounts, and program arguments (strings as if executed on a command line). And we are just adding references to other WASM modules to load and link.
So I think we have two different types here...

- `ipvm/wasm` for the IPVM style of deterministic WASM, invoking a single WASM function with args
- `wasi-32/wasm` (or something) for the Bacalhau style invocation...

IPVM will focus on the top one, Bac will support both eventually?
task/README.md
Outdated
``` json
{
  "type": "ipvm/effect",
  "version": "0.1.0",
  "using": "docker:Qm12345",
  "meta": {
    "description": "Tensorflow container",
    "tags": ["machine-learning", "tensorflow", "myproject"]
  },
  "do": {
    "resources": {
      "ram": {"gb": 10}
    },
    "inputs": [1, 2, 3],
    "entry": "/",
    "workdir": "/",
    "env": {
      "$FOO": "bar"
    },
    "timeout": {"seconds": "3600"},
    "contexts": [],
    "output": [],
    "sharding": 5
  }
}
```
I'm not sure I've really understood the intention around effects w.r.t. the rest of the objects in this spec... it seems like an effect IS a task? But I also feel like these things are very different? E.g. the effect you have on line 215 looks very different from this one.
Is an effect really a wrapper around a task? So an effect may be "invoke another task" but it may also be something else?
workflow/README.md
Outdated
"version": "0.1.0",
"requestor": "did:key:zAlice",
"nonce": "o3--8Gdu5",
"verification": {"optimistic": 2},
There's a lot in here! And we are coming to the conclusion (like Koii) that verification can be quite task specific... and hence probably another user-definable object.
Interesting — do you mean defining arbitrary user-defined verification types, or pushing this field into each task?
How does Koii do verification?
My understanding is only from watching Al Morris' talk, but he seemed to suggest that it is up to the owner of the Task to define how it is verified. This opens up doing quite specific things for verification, e.g. he mentioned that deterministic work over data might be verified via hashing, but non-deterministic work like web scraping might be verified stochastically, in a domain-specific way.
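That owner-defined verification idea could be sketched as a registry of strategies, roughly like this (the strategy names, `check` field usage, and default are hypothetical):

```python
# Sketch: user-definable verification. Each strategy is registered under a
# name; a task's `check` field selects one. Strategy names are invented.

VERIFIERS = {}

def register(name):
    def deco(fn):
        VERIFIERS[name] = fn
        return fn
    return deco

@register("deterministic-hash")
def verify_hash(outputs: list) -> bool:
    # Deterministic work: every replica must produce identical output.
    return len(set(outputs)) == 1

@register("majority")
def verify_majority(outputs: list) -> bool:
    # Stochastic work (e.g. web scraping): accept the majority answer.
    top = max(outputs, key=outputs.count)
    return outputs.count(top) > len(outputs) / 2

def verify(task: dict, outputs: list) -> bool:
    return VERIFIERS[task.get("check", "deterministic-hash")](outputs)

assert verify({"check": "deterministic-hash"}, ["X", "X", "X"])
assert verify({"check": "majority"}, ["X", "X", "Y"])
assert not verify({"check": "majority"}, ["X", "Y", "Z"])
```

The registry is what makes verification "another user-definable object": networks can add domain-specific strategies without touching the task schema.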
workflow/README.md
Outdated
| Field    | Type              | Description                             | Required | Default                  |
|----------|-------------------|-----------------------------------------|----------|--------------------------|
| `secret` | `Boolean or null` | Whether the output is unsafe to publish | No       | `null`                   |
| `check`  | `Verification`    | How to verify the output                | No       | `"attestation"`          |
| `time`   | `TimeLength`      | Timeout                                 | No       | `[5, "minutes"]`         |
| `memory` | `InfoSize`        | Memory limit                            | No       | `[100, "kilo", "bytes"]` |
| `disk`   | `InfoSize`        | Disk limit                              | No       | `[10, "mega", "bytes"]`  |
Not sure if you are intending this struct to be used by more than IPVM, but if yes, we also have things like number of GPUs. If you do want this to be standard, I'll pull out all of our other config options.
workflow/README.md
Outdated
type Verification union {
  | Oracle
  | Consensus(Integer)
  | Optimistic(Integer)
  | ZKP
} representation keyed
Are user-defined verifications (e.g. probabilistic) contained in this concept? Maybe the "attestations" are designed to scale to that?
Ok, here is a full Docker example, with comments:

{
// with: for a Docker job, this is the Docker image to use
// this must be the full form specified with a hash
// the user is able to submit tags/labels but the system should canonicalise this
"with": "ubuntu@sha256:4b1d0c4a2d2aaf63b37111f34eb9fa89fa1bf53dd6e4ca954d47caebca4005c2",
"do": "docker/run",
"inputs": {
// entrypoint: the program and arguments that will be invoked in the container
"entrypoint": ["bash", "-c", "echo", "hello, world"],
// mounts: read-only disk mounts that will contain the data to be stored
// this maps a mount path to some storage source
// if this is a Link, the storage source is assumed to be an IPFS CID to use
// other storage types are available... like URL download. We want to introduce more!
// uh oh, does this mean we have to standardise the storage sources too?
// or we could turn the mounts into a MAY field e.g. "implementations MAY accept arbitrary storage sources"
"mounts": {
"/inputs": {"/": "bafybeih5fo3ggbgkg5ftwzycup4shhcqrjpd7toyvnagwicxzkqekjtqba"},
"/more": {"url": "https://example.com/data.txt"}
},
// outputs: writeable disk mounts, initially empty
// we may wish to give them options in the future (such as are they encrypted)
// another MAY field e.g. "implementations MAY accept arbitrary storage sources"
"outputs": {
"/outputs": {}
},
// env: environment variables (no $ required)
"env": {
"FOO": "bar"
},
"workdir": "/"
},
// meta: keys not involved with describing the work but only the execution
"meta": {
"bacalhau/config": {
"verifier": "deterministic", // may include options in future
"publisher": "estuary",
"resources": {
"cpu": "100m", // not sure what units this actually is?
"disk": [1, "mega", "byte"], // love these units, will steal
"memory": [1, "giga", "byte"],
"gpu": 1
},
"timeout": 300,
// annotations: arbitrary user-defined labels
"annotations": [
"my-lovely-job"
],
"dnt": true,
// not included here is sharding...
}
}
}
{
// with: for wasm jobs, the wasm module to invoke
// ... but hang on, this can be a storage source at the moment! So a URL or a CID or maybe something more complex...
"with": "Qmajb9T3jBdMSp7xh2JruNrqg3hniCnM6EUVsBocARPJRQ",
"do": "wasm32-wasi/run",
"inputs": {
// As for Docker
"mounts": {
"/input.csv": {"url": "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/latest/tables/measures-as-defined/data?format=csv&download"}
},
// As for Docker
"outputs": {
"/outputs": {}
},
// Defined by WASI, not really needed if we are specifying WASI elsewhere
"entrypoint": "_start",
// imports: modules to download from remote storage and make available to the WASM runtime
// so that our main module doesn't need to be self-contained and can require non-WASI imports
"imports": [
{"/": "bafybeih5fo3ggbgkg5ftwzycup4shhcqrjpd7toyvnagwicxzkqekjtqba"},
{"url": "https://example.com/library.wasm"}
],
// args: passed to the WASM program, the equivalent of commandline args
"args": [
"/inputs.csv",
"/outputs/uk-tariff-2021-01-01--latest--measures-as-defined.parquet"
],
// As for Docker
"env": {
"FOO": "bar"
}
},
"meta": {
"bacalhau/config": {
// As for Docker
}
}
}

So we seem to have some contention over the

I'm guessing

Also, in your pipelines example, can the
And here is an IPVM task (the Docker eg above) slotted into the current Bacalhau job structure:

{
"APIVersion": "V1beta1",
"ID": "92d5d4ee-3765-4f78-8353-623f5f26df08",
"RequestorNodeID": "QmXaXu9N5GNetatsvwnTfQqNtSeKAD6uCmarbh3LMRYAcF",
"RequestorPublicKey": "...",
"ClientID": "ac13188e93c97a9c2e7cf8e86c7313156a73436036f30da1ececc2ce79f9ea51",
"Task": {
"with": "ubuntu@sha256:4b1d0c4a2d2aaf63b37111f34eb9fa89fa1bf53dd6e4ca954d47caebca4005c2",
"do": "docker/run",
"inputs": {
"entrypoint": ["bash", "-c", "echo", "hello, world"],
"mounts": {
"/inputs": {"/": "bafybeih5fo3ggbgkg5ftwzycup4shhcqrjpd7toyvnagwicxzkqekjtqba"},
"/more": {"url": "https://example.com/data.txt"}
},
"outputs": {
"/outputs": {}
},
"env": {
"FOO": "bar"
},
"workdir": "/"
},
"meta": {
"bacalhau/config": {
"verifier": "deterministic",
"publisher": "estuary",
"resources": {
"cpu": "100m",
"disk": [1, "mega", "byte"],
"memory": [1, "giga", "byte"],
"gpu": 1
},
"timeout": 300,
"annotations": [
"my-lovely-job"
],
"dnt": true,
}
}
},
"Deal": {
"Concurrency": 1, // would be better called "multiplicity"
"Confidence": 1, // really an option for the deterministic verifier
"MinBids": 3
},
"CreatedAt": "2022-11-17T13:29:01.871140291Z",
}

I've also not really worked out how our sharding feature fits into this world – on the one hand, each "invocation" in a sharded job is a different piece of work to be done, so it feels like there is a reification step for the system where it converts each shard into a different Task, and then submits them to compute nodes separately, so this structure works. But the user also needs to be able to define where they want sharding to happen, which at the moment happens to all "inputs" but not to "contexts", which are the same as inputs but unshardable (I have glossed over this in the above examples because it seems very Bac specific). Actually, I think I'm realising that trying to fit into a "pure invocation data structure" world will be good for us and lead to a better factoring of our job structure.
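The reification step described above might be sketched like this (the round-robin splitting policy and field names are assumptions based on the examples in this thread):

```python
# Sketch: a sharded job is expanded into one Task per shard before
# scheduling. "contexts" ride along unsharded in every Task, matching
# the description of contexts as inputs that are unshardable.

def reify_shards(job: dict) -> list:
    n = job.get("sharding", 1)
    base = {k: v for k, v in job.items() if k not in ("sharding", "inputs")}
    shards = [job["inputs"][i::n] for i in range(n)]  # round-robin split
    return [{**base, "inputs": shard} for shard in shards if shard]

job = {"do": "docker/run", "inputs": [1, 2, 3, 4, 5],
       "contexts": ["shared.cfg"], "sharding": 2}
tasks = reify_shards(job)

assert len(tasks) == 2
assert sorted(x for t in tasks for x in t["inputs"]) == [1, 2, 3, 4, 5]
assert all(t["contexts"] == ["shared.cfg"] for t in tasks)
```

Each resulting dict is an independently hashable Task, which is what makes the per-shard memoization story line up.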
{
with: "ipfs://container",
do: "docker/run",
inputs: myargs
}

...as being roughly analogous to `container.docker_run(myargs)`. It's not a perfect analogy because these are distributed objects, but roughly that. We normally think of things like objects as containing some state, which these don't — it's a bit more functional:

# Elixir
Executor.send(containerId, {"docker/run", myargs})

These can be totally stateless, pure functions from inputs to outputs.
You could inline e.g. Wasm, sure. It's probably not the most efficient way to do it over a network; you probably want to pass it by reference.
For the IPVM runtime it cannot! We want to be able to negotiate workloads with the network, and you can spark new jobs as a request to the runtime if you need dynamic behaviour. We could generalize it so that you could put a promise in that field, and IPVM would just reject those but Bacalhau could take them. Do you have a concrete use case for that? My guess is that possibly I need to do a better job at describing the difference between a task and workflow here. A Task is just this part:

{
"with": "dns://example.com?TYPE=TXT",
"do": "crud/update",
"inputs": {
"value": "hello world"
},
"meta": {
"ipvm/config": {
"secret": false,
"timeout": [500, "milli", "seconds"],
"verification": "attestation"
}
}
}

These can absolutely enqueue a following job! It needs to return a description of what that job would be, which can be literally another one of these above. This tells the runtime "please I would like one of these to be scheduled". In IPVM (which you don't necessarily have to follow but may be a good idea), we're putting this kind of output in a special output to distinguish it from a normal pure value return: this is the effect system. Think of it like having

// Input:
{
"with": "dns://example.com?TYPE=TXT",
"do": "crud/update",
"inputs": {
"value": "hello world"
},
"meta": {
"ipvm/config": {
"secret": false,
"timeout": [500, "milli", "seconds"],
"verification": "attestation"
}
}
}
// Output, not in the spec but SHOULD BE!
{
"receipt": {
"bafyYourJobCID": {
"trace": {
"who": someDID,
"verification": [
signedVerification1,
signedVerification2
]
},
"value": "bafyOutputImageCID",
"effects": [ // request to enqueue
{
"with": "ipfs://bafyMyWasmModule",
"do": "wasm/run",
"inputs": {
"func": "photoFilter",
"arg": "bafyOutputImageCID"
},
"meta": {
"ipvm/config": {
"verification": "attestation"
}
}
}
]
}
}
}

Which is the purple part of this image:

Now, the executor hasn't agreed to this yet! So in the full-blown IPVM version, this potentially has a performance impact unless you know that the runner has agreed to e.g. accept more of your credits, but if they don't have a GPU, it may need to get moved around the network. It's a new contract. So the TL;DR is that posting a static workflow in advance lets both sides agree on the kinds of things that will get run, and it can still get dynamic behaviour via the effect system, because enqueuing a job is "just" another kind of effect.
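That enqueue-via-effect flow could be sketched roughly as follows (the executor callback and receipt shape are simplified assumptions; real receipts are signed and CID-addressed):

```python
# Sketch: the runtime executes a task, and if its receipt carries
# `effects`, each one is treated as a request to schedule another task
# rather than as a plain return value.

def run_workflow(initial_task, execute):
    queue, receipts = [initial_task], []
    while queue:
        task = queue.pop(0)
        receipt = execute(task)  # returns {"value": ..., "effects": [...]}
        receipts.append(receipt)
        # Enqueuing a job is "just" another kind of effect:
        queue.extend(receipt.get("effects", []))
    return receipts

def fake_execute(task):
    # Stand-in executor for illustration only.
    if task["do"] == "crud/update":
        follow_up = {"with": "ipfs://bafyMyWasmModule", "do": "wasm/run",
                     "inputs": {"func": "photoFilter"}}
        return {"value": "ok", "effects": [follow_up]}
    return {"value": "filtered", "effects": []}

receipts = run_workflow({"with": "dns://example.com?TYPE=TXT",
                         "do": "crud/update", "inputs": {}}, fake_execute)
assert [r["value"] for r in receipts] == ["ok", "filtered"]
```

The missing piece relative to the full IPVM story is negotiation: here every requested effect is accepted, whereas a real scheduler may reject or re-route it.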
Err, to clarify further the above: I'm saying that depending on Bacalhau's goals, you could skip the workflow part, and only use the individual tasks to chain actions together procedurally.
So perhaps here is some text to use in a Docker invocation:

X.1 Docker invocations

A Docker invocation is identified by a UCAN with the

For a Docker invocation, the
where the fields are defined as:
At the moment we support IPFS CIDs and URLs for consumption over HTTP(S) for WASM. I guess either one of those could appear in a

So the thing I am worried about re:
Just the "build in Docker + run in WASM" example. At job submission time it would be known that the output of the first job would be the program to run for the second, so it would be nice to negotiate all of the e.g. payment and compute provision up front. E.g. we would line up a node to build the WASM and then a node to run it, because we know that both steps are going to happen, just not what the WASM actually is, yet. I guess an Effect being part of the same Session means it can reuse payment channels etc, but we just won't have scheduled some of this stuff up-front. As you say, we should probably focus on getting everything from the Task-downwards right, and if we can adopt the Workflow then that is a bonus!
Apologies, I'm going to mix the task description with some discussion of IPVM-specific considerations, because it is relevant to our scheduler and making as many things as memoizable as possible. This doesn't mean that users have to write these lines of config directly; one could put a sugar layer on top as long as it has these semantics.
This is super helpful, thanks! 🙏
Ah, I see. For memoizability, that

In IPVM's model these definitely need to be separate steps, because you need to cleanly separate the step that could fail because the executor may not be able to pull a particular container. We have the same situation with CIDs, too (e.g. running this in offline mode you want to know that you have certain CIDs available locally, which is an important use case for Fission). If I'm understanding you correctly, you don't want to kick off an entirely new workflow, because you know that this thing is a docker container. What you really want is something like the following mangled pseudocode:

{
"with": "docker",
"do": "container/run",
"inputs": {
"container": {promise: [previous_step_output]},
"args": {/* ... */}
}
}

🤔 Okay, let me mull this one over! It would be convenient to say roughly "I know that this thing is one of these kinds of actions, I don't want to have to negotiate with the runner at every step when working procedurally".
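The promise substitution implied by that pseudocode might look roughly like this (the `{"promise": [...]}` selector shape is inferred from the examples in this thread; the "last path segment names the step" rule is an assumption):

```python
# Sketch: before a task runs, the scheduler walks its structure and
# substitutes each {"promise": [...]} placeholder with the receipt value
# of the step it names.

def resolve(value, results: dict):
    if isinstance(value, dict):
        if set(value) == {"promise"}:
            step = value["promise"][-1]  # last path segment names the step
            return results[step]
        return {k: resolve(v, results) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, results) for v in value]
    return value

task = {"with": "docker", "do": "container/run",
        "inputs": {"container": {"promise": ["/", "previous-step"]},
                   "args": ["--fast"]}}
results = {"previous-step": "bafyBuiltContainerCID"}

assert resolve(task, results)["inputs"]["container"] == "bafyBuiltContainerCID"
```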
The URI format for CIDs is
Can you give me a concrete example of what that would look like for Bacalhau? I could be misunderstanding the goal of using RFC822 syntax: you're trying to do point-to-point communication? Or just give users a friendly, almost TOML-like interface? A sketched example may help me understand the kinds of things that you're trying to do here. If I'm understanding correctly, for the IPVM runtime we'd interpret this as a normal HTTP request. If you wanted to send this to a DID or an email, it would look similar, but the runtime would need to understand how to route that.

// A lot like Axios or similar:
{
"with": "https://example.com",
"do": "http/put",
"input": {
"headers": {
"content-type": "application/json"
},
"payload": "hello world"
}
}
Which is a bit like a higher order function. If we could address "Docker version X" by URI, then this would be really simple 🤔 We mainly need a way to say "this is the thing to run". URIs are very flexible, but I'd prefer not to have to resort to something like

The problem with not having it as a URI is you really do want namespacing of some kind. There are other options, but URIs and URNs cover such a massive portion of the solution space.
Rubber ducking in public, don't mind me. Here, we're trying to manage both the flow of authority (security, privacy, payment, etc but across many entities) and a convenient calling convention that makes sense for IPVM and Bacalhau. Object capabilities (e.g. CapTP) give us exactly this, but you need globally understandable pointers, because you have to send messages to things. URIs usually fit the bill nicely. You should absolutely be able to say "and please run the Docker container that came out of the previous step". Here the calling convention gets a bit tricky in a declarative spec for the case you cited above. It's not a main use case of something like CapTP, where the system is dynamic. What we're modelling statically here is roughly a message to the executor to perform some action, not a message to some external service or module. We're basically saying "please load this thing and run it" or "please resolve this CID and give me a local handle". It's almost recursive.
The underlying UCAN capability could be something like this (though it's more likely to be between the requestor and the coordinator process, not "the network")

// pseudocode
{
iss: "did:bacalhau",
aud: "did:user",
att: [{
with: "docker:*", // any docker container
can: "run/arbitrary_containers",
nb: [{timeout: "10m"}]
}]
}

Which also has this same addressing challenge, but strangely more straightforward, because we're getting the ability to run arbitrary containers on Bacalhau's network from Bacalhau. Authority and invocation are not the same thing, so this doesn't translate directly. A rough equivalent invocation would be something like:

{
"with": "docker:*", // any docker container
"do": "run/arbitrary_containers",
"inputs": {
"container": {promise: ["/", "previous-step"]}
}
Strictly speaking, in UCAN we have a URI scheme for "any of this kind of thing", which IIRC (will look at spec later) is something like:

We almost want like... string interpolation:

"with": "docker:${promise_goes_here}",
I cheated and peeked at Spritely's OCap system. Being a Scheme, for them "lambda is the ultimate" (LtU). They resolve steps with lambdas, which is actually maybe what we want, too! Here's a sketch for what is almost like a quoted action:

{
"with": "urn:ipvm:lambda",
"do": "ipvm/run",
"inputs": {
"use-the-built-container": {
"type": "effect",
"resource": "docker",
"with": {"promise": ["/", "previous-step"]}, // <-- here it is!
"do": "container/run",
"inputs": {
"env": {"promise": ["/", "different-step-sure-why-not"]}
}
},
"do-some-wasm": {
// ... more of same
}
}
}
Scratch that, I think we can do just this:

{
"with": {"promise": ["/", "previous-step"]}, // <-- here it is!
"do": "container/run",
"inputs": {
"env": {"promise": ["/", "different-step-sure-why-not"]}
}
}

But we need to be more careful with signalling in the
As usual, @walkah is really awesome to bounce ideas around with! Also it turns out that the Spritely folks have correct opinions in the right general direction... as usual (though they often use Magnet links, which won't help us here).

Free & Easy 🚴‍♀️

There are two layers: authority and invocation. These are definitely different things (see also big chunks of the UCAN Invocation PR conversation). A capability MAY be a cryptographic certificate, but it can also be a file handle to the thing itself (or the raw bytes). If someone has your container, we literally can't stop them from running it, unless it's encrypted. This is core to the capabilities model. Authority is extremely important to running tasks in a trustless ("mutually suspicious") network, but does not apply in the case where you have direct access to the bytes. Put another way: you don't need a UCAN if you have the container (except for metering and payment, but that's a separate concern).

CID ≈ IPLD ≈ Bytes 💾

A CID by itself represents some IPLD. This is a bit richer than raw bytes, since it also has links. As the Iroh folks often say, "we only care about bytes and links". That said, it's isomorphic to bytes. In essence, CIDs point to binary data. The URI for binary data is
Scheduler Safety & Authority 🦺

Both scheduler safety properties (idempotent, destructive, verifiable) and authority (capabilities) use the tuple

("https:", "crud/create")
("dns:", "crud/read")
("mailto:", "msg/send")
("did:key:", "crypto/sign")

But the relevant one for us in a moment:

Signalling 🚦

Binary data coming off an

In the spec as it stands today, we already do signaling:

The pair

All of these ability namespaces are buckets for how to interpret the thing. They're kind of like Rust traits or Haskell type classes. Let's interpret binary as a Wasm blob:

{
"with": "data:<bytes>",
"can": "docker/run",
"inputs": {
"entrypoint": ["/"],
"mounts": {"mnt": "/"},
"outputs": {"tmp": "/tmp"},
"env": {"$HELLO": "world"},
"workdir": "/work"
},
"meta": {
"ipvm/config": {
"disk": "200GB"
}
}
}

Which — under the reasoning above — we can give a CID:

{
"with": "ipfs://QmYqWegi4pfTJAsWN45FiRTDB75SyhmJG3L122dMWVCGWd",
"can": "docker/run",
"inputs": {
"entrypoint": ["/"],
"mounts": {"mnt": "/"},
"outputs": {"tmp": "/tmp"},
"env": {"$HELLO": "world"},
"workdir": "/work"
},
"meta": {
"ipvm/config": {
"disk": "200GB"
}
}
}

...and if we always treat promises in

{
"with": {"ucan/promise": ["/", "previous_action"]},
"can": "docker/run",
"inputs": {
"entrypoint": ["/"],
"mounts": {"mnt": "/"},
"outputs": {"tmp": "/tmp"},
"env": {"$HELLO": "world"},
"workdir": "/work"
},
"meta": {
"ipvm/config": {
"disk": "200GB"
}
}
}

If someone puts

Wrap Up 🎁

I... think that may solve the problem without needing to change the format. It does mean that we need to clarify the above promise semantics (i.e. always binary in a

Does that work for you @simonwo?
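The (URI scheme, ability) signalling described above is essentially a dispatch table keyed on the pair; a minimal sketch (handler entries and descriptions are illustrative, not spec):

```python
# Sketch: the pair (URI scheme of `with`, ability in `can`) selects how
# the executor interprets the resource, like a trait/typeclass instance.

HANDLERS = {
    ("ipfs", "docker/run"): "interpret bytes as a container image",
    ("ipfs", "wasm/run"): "interpret bytes as a Wasm module",
    ("https", "crud/create"): "HTTP request effect",
    ("dns", "crud/read"): "DNS lookup effect",
}

def dispatch(task: dict) -> str:
    scheme = task["with"].split(":", 1)[0]
    pair = (scheme, task["can"])
    if pair not in HANDLERS:
        raise ValueError(f"no interpretation registered for {pair}")
    return HANDLERS[pair]

task = {"with": "ipfs://QmYqWegi4pfTJAsWN45FiRTDB75SyhmJG3L122dMWVCGWd",
        "can": "docker/run"}
assert dispatch(task) == "interpret bytes as a container image"
```

A promise in `with` slots into the same table: once resolved to binary, the ability half of the pair still says how to interpret it.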
Moved this spec to a standalone repo (it's the pattern we've used) ipvm-wg/workflow#1
The `secret` flag marks a task as being unsuitable for publication.

If the `sceret` field is explicitly set, the task MUST be treated per that setting. If not set, the `secret` field defaults to `null`, which behaves as a soft `false`. If such a task consumes input from a `secret` source, it is also marked as `secret`.
minor: sceret -> secret
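The defaulting-and-propagation rule in the quoted section can be sketched as a hypothetical helper (not spec text):

```python
# Sketch of the `secret` rule: an explicitly set True/False is honored;
# an unset flag (None, i.e. the spec's `null`) behaves as a soft false,
# but the task becomes secret if any consumed input is secret.

def effective_secret(task_secret, input_secrets) -> bool:
    if task_secret is not None:       # explicitly set: MUST be honored
        return task_secret
    return any(input_secrets)         # null: soft false, tainted by inputs

assert effective_secret(None, []) is False
assert effective_secret(None, [False, True]) is True
assert effective_secret(False, [True]) is False   # explicit setting wins
assert effective_secret(True, []) is True
```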
Sure – we are talking about this sort of thing. Only an idea at the moment but it's the sort of thing I wouldn't want to make more difficult.
Yeah 🤔... the excellent point you're making is that IPVM's way of modelling these is to have an Effect that can already have arbitrary structured

So you have helped me realise that my idea kinda stinks and I think your model is better! So I think Bac should probably move more towards treating anything more advanced than a URL as an Effect, rather than trying to jam all the invocation inputs into a URL. Which is all a long-winded way of saying, perhaps don't worry about my concern, this is the right direction!
Preview 📝
WIP but working in the open (as one does)
Added some text about effects. "But Brooke!" you say. "Scheduling is an implicit effect!". Yeah, I know. But it's not in the DSL — it's handled at a different layer. From the programmer's perspective, everything may as well be getting executed single-threaded. We're going to great pains to make this cleanly schedule-able, so that's all getting pushed out of the programmer's direct control.