
Job Spec #8 (Draft)

wants to merge 42 commits into main
Conversation

@expede (Member) commented Nov 8, 2022

Preview 📝

WIP but working in the open (as one does)

Added some text about effects. "But Brooke!" you say. "Scheduling is an implicit effect!". Yeah, I know. But it's not in the DSL — it's handled at a different layer. From the programmer's perspective, everything may as well be getting executed single-threaded. We're going to great pains to make this cleanly schedule-able, so that's all getting pushed out of the programmer's direct control.

"right": {
"run": {
"wasm": "bafyWasmRight",
"input": [{"from": "start", "output": 4}, {"from": "database"}]

(reviewer):

is the output here just "returning a number"?

@expede (Member, Author):

The 4? Current theory is that it'll be the index on the multivalued return in this completely made up example.

(reviewer):

Yeah, the 4. So, it's the index, gotcha.

@expede (Member, Author):

Evidently the field name should be changed for clarity. Good feedback :)

(reviewer):

I suggest named slots for clarity (and robustness to changes in # of output parameters)
Another suggestion for robustness: Optional expected type/schema for both input and outputs. Feeling there's a good opportunity for a Cambrian-like data lens structure here...
Maybe that's out of scope (because we have clear expectations of stability from the model) but then again, given that mutability is expected...

@johnandersen777 commented Nov 8, 2022

Some notes @pdxjohnny took at the 2022-11-08 IPVM November meeting (sorry for any errors)

Feel free to edit this comment to clean up the notes into something more coherent.
Cross-posting here because this pull request was the main topic of discussion, but maybe there are more cross-refs or a better place.
Cross ref: intel/dffml#1406

  • Brooklyn Leading
  • TODO Link recording
  • Agenda
    • Updates
    • Convos in Lisbon
    • Discussion
  • Last month didn't happen due to busy-ness
  • Lisbon
    • Folks on this call were there in person for network labs week
    • Talked about IPVM and other topics
    • How to plug into other systems
    • How it's different than other things
    • IPVM got a grant, some funding, there is community faith
    • First step is to work on invocation spec
    • If we do a good job then in the next week or so it can serve as a basis for a few different projects
    • BucketVM
      • UCAN based invocation
    • WarpForge
      • Build system, sets up linux sandbox then does deterministic builds (not WASM)
        • Goals: Build libc from source
        • Possibly aligned
        • Catalogs and formulas
    • Optimine?
      • Nondeterministic computation in docker containers
      • Getting existing workloads running
      • They have a golang based configuration
    • IPVM is less interested in distributed algs and more interested in doing fast WASM
  • How is interop being planned?
    • IPVM wants to be fully deterministic, cached, verifiable
    • Often need to resolve IPNS link, send email, etc. done "off chain"
      • WASI is one way to do that
      • That's not deterministic; you can do traced execution and read the stream in, but you can't parallelize and compare results
    • If you use a managed effect system, you leave all the impure stuff to the runtime
    • Example Effect: Operation invocation manifest, it calls back in using the input effect.
    • If there are chunks then they can call into IPVM and it can use the
    • Effects are like input events in DFFML dataflows
    • Affinity
      • I already have this cached, you should send me these effects
      • I have a GPU
    • Brooklyn has been laying out and thinking about what's reasonable
      • Data pipelines, composable out of existing jobs
      • Can tell it to run things concurrently
      • Dataflows are nice for this; diamond validation came up as an example
      • Issues: JSON due to DAG
        • There is a draft PR in the repo which says let's just name all the jobs
          • Job Spec #8
          • There might be a multi value output
          • This is static invocation, we know ahead of time this is the level of parallelism
          • You might have an output which invokes more jobs
  • Ideally, here's a UCAN, please do it
    • There is already a place for authorizations
    • In a UCAN, you have all the info you need to say please run this
    • Sometimes people will add invoke:true; it's unclear if you should be able to delegate.
    • Another approach is to put a thin wrapper: you can rip off the auth part and wrap a new one
  • Irakli
    • CID of WASM with data in, not invocation by CID, but invocation by mutable pointer?
      • Brooklyn says ya we want multiple pointers?
        • There is a before block in the invocation, do this effect as an input, then place that and that gets a name.
    • How to define interfaces?
      • https://radu-matei.com/blog/intro-wasm-components/ might get into major interfaces soon
      • Challenge of links outside of IPLD
      • Need to have some native notion of "I'm reading 9TB data but I have to read in blocks" needs to read off of streams and emit streams
        • Autocodec inside of IPVM usually makes sense
          • Instead of baking in JSON and CBOR and protobuf and all these things, we just pass around WASM and say run this on these blocks of data; it's like eBPF, it's dynamic
          • To get their webfilesystem to show in a gateway they had to do a bunch of hacks right now
            • If you put it in IPVM then you can just reuse that as the distributed compute method
  • What happens when a user creates one of these? How do we put syntactic sugar on top?
    • How do we look at caching?
  • Non-goal: Support WASI right off the bat
    • WASM allows us to restrict what will be run with effects
      • Putting all effects on the outside then WASM always allows us to use
        • They want to replace FaaS stuff with distributed compute
          • Fission goals: Decentralized open functions as a service, small short deterministic data flow, simple image transformations, etc.
  • Coming from the Erlang/Elixir world
    • What happens when there is an issue? How does the Erlang supervision pattern apply, and what are the failure cases/states for DAGs? How do we filter off into declarative specs based on locality?
      • Not sure if giving people the choice of supervisor pattern is the right choice
      • We should come up with the secure-by-default option (giving people the ability to modify supervision patterns has been a loss for Erlang)
        • With great power comes great responsibility, supervision is the correct concept, IPVM could be opinionated
        • Affinity, this depends on that, defined failure modes with overlays?
        • Look at k8s affinity and anti-affinity patterns
          • Please go to another node
    • WASM is a pure function with pure data (deterministic)
    • People want things that look like objects or actors
      • You can build that around this!
      • It will look like eventual consistency or software transactional memory
      • If you need locking then you can use effects and so forth to land where you need
  • In IPVM we want an analysis step: I'm going to reorder, come up with the dependency tree (then overlay failure modes, possibly?)
    • Failure modes defined as effects?
  • IPVM as a distributed scheduler
  • Melanie: Microkernel
    • From chat: There is always a minimal set of functions application code needs to communicate with the system; in our case we care about IPLD blocks. Is there a way to define affinity, so if a node has executed a command and loaded the IPFS data into its cache, it's more likely to get the next job with the same base data? Looks like it could be done outside Wasm. I'd like to say IPVM host code is close-ish to a microkernel that ships with a kernel that can be pasted on modules when they get run to provide a better interface to the system calls
    • Looking to have effectively this syscall-style interface which you can reference by CID
    • Works on the Filecoin VM; using WASM and a microkernel approach has been useful
  • Autocodec sounds similar to a WASM version of a shim

johnandersen777 pushed a commit to intel/dffml that referenced this pull request Nov 14, 2022
…en?: Link to shouldi Coach Alice: Our Open Source Guide

Related: ipvm-wg/spec#8
2022-11-14 @pdxjohnny Engineering Logs: #1406
johnandersen777 pushed a commit to intel/dffml that referenced this pull request Nov 14, 2022
… within Architecting Alice: She's Arriving When?

Related: ipvm-wg/spec#8
2022-11-14 @pdxjohnny Engineering Logs: #1406
2022-11-14 SCITT Meeting Notes: #1406

@RangerMauve commented Nov 22, 2022

Not sure if this has been brought up before, but would it make sense to use IPLD Schemas instead of JSON Schemas as much as possible and to link to their DMT CID so stuff can be loaded from the network?

edit: Whoops, didn't see the latest commit. :P

task/README.md Outdated
Comment on lines 217 to 238
"type": "ipvm/task",
"version": "0.1.0",
"using": "docker:Qm12345"
"meta": {
"annotations": []
},
"args": {
"resources": {
"ram": {"gb": 10}
},
"inputs": [1, 2, 3],
"entry": "/",
"workdir": "/",
"env": {
"$FOO": "bar"
},
"timeout": {"seconds": "3600"},
"contexts": [],
"output": [],
"sharding": 5
}
}

@expede (Member, Author):

@lukemarsden & @simonwo: this is a very rough mockup, but is this more in line with what you need?

Changes:

  • They're all just "tasks" (wasm, docker, etc.)
  • Nested fields for docker-specific things
  • Signalling Docker with docker:... not sure if there's an official URI scheme from Docker/CNCF/OCI
  • Annotations are a good idea! Here they're in the common fields section (un-nested)

(reviewer):

this looks broadly good!

  • what determines the fields that are valid under args? is it the scheme in the using URI? I'd maybe prefer to have it be an explicit kind sub-field (open to a better name!) i.e. type is top-level task, kind is the kind of task it is, which determines the valid args fields. WDYT?

@expede (Member, Author):

Yeah, I was thinking of it as being scoped to the URI, but you're right that we probably want at least a version in there or something. We could also keep the same pattern of the type field inside the payload:

{
  "type": "ipvm/task",
  "version": "0.1.0",
  "using": "docker:Qm12345"
  "meta": {
    "annotations": []
  },
  "args": {
    "type": "bacalhau/docker", // This line
    "version": "0.1.0", // Possibly this, too
    "resources": {
      "ram": {"gb": 10}
    },
    "inputs": [1, 2, 3],
    "entry": "/",
    "workdir": "/",
    "env": {
      "$FOO": "bar"
    },
    "timeout": {"seconds": "3600"},
    "contexts": [],
    "output": [],
    "sharding": 5
  }
}

(reviewer):

Yes!

@lukemarsden commented Nov 25, 2022:

Also, ideally we don't have Bacalhau-specific things, but generic things that Bacalhau and others can implement. So we try and spearhead ipvm/wasm and add ipvm/docker

@simonwo left a comment:

So I haven't read all of everything yet but it's looking fabulous so far!

I'm still trying to get clear that we are aiming at the same end result. In my head it seems to be:

  • For memoization, we are looking for a thing we can hash and link to a piece of output. In the best case, someone can compute something on IPVM and then we can reuse that result on Bacalhau, in a fully distributed way. So the thing we want represents the computation done (the "what") but doesn't depend on execution environment (the "how", i.e. memory limits, gas, timeouts).
    • (Because you can run the same computation with timeout = 1hr and timeout = 1day and get the same result, if no error).
  • The "IPVM Task" represents the "what" of a unit of computation i.e. what is the code to run, what are the arguments to that code, what data is expected to be available, etc. It doesn't contain the "how" of the execution.

So the thing we want to share across CoD networks specifically is the Task. Bacalhau will adopt the Task and use it in place of our own Spec. And then when we start building the memoization service, we'll use the Task as the input key into the map of jobs to outputs.

  • Separately, we want to be able to invoke a Task on Bacalhau from within IPVM. That will involve sending an IPVM Task towards Bacalhau but it will also require sending some extra data as well (the "how"). E.g. what time limits Bac should apply, what publisher to use for the results. So we also need to build an IPVM Effect that will execute a Bac job. Am I right?
  • On the outside, the IPVM Workflow defines orchestration of tasks and verification options. Bacalhau's options in this area are currently richer on e.g. the hardware-limits side than what is here, but less sophisticated around orchestration. I'm not sure if we want to try and jam everything into this IPVM Workflow? Because there will be a lot of stuff that is not common between networks.
    • But Bac should work towards being able to use the Workflow, not least because we don't have our own orchestration stuff. But I think it will mean implementing/using quite a lot of the IPVM scheduler internally because it sounds like there are quite strict rules around how things are executed esp. around Effects.
  • How interested are we in the Output object? On your slides there is an "IPLI Output" which I am also keen to explore – I imagine a lot of it might be network specific but there are some clear commonalities e.g. data CIDs created, verification results, links to previous Task specs, etc.

WDYT? Am I in the right place with this?

task/README.md Outdated

### 2.1.1 `type`

The `type` field is used to declare the shape of the object. This field MUST be either `ipvm/wasm` for pure Wasm, or `ipvm/effect` for effectful computation.

(reviewer):

So I feel like what we are saying is that the type field is not limited to these two things, but is user-defined. So the comment about effectful computation is true in an IPVM context (i.e. IPVM will only accept these two values and will treat them like this) but other contexts will accept other values for either type of computation. WDYT?

task/README.md Outdated
Comment on lines 137 to 139
### 2.1.2 `with` Resource

The `with` field MUST contain a CID or URI of the resource to interact with. For example, this MAY be the Wasm to execute, or the URL of a web server to send a message to.

(reviewer):

What we have found is that the arguments required vary considerably on a per-type basis, and so generally one field isn't enough. And I think you have taken this out below?

task/README.md Outdated
Comment on lines 167 to 175
| Field | Type | Description | Required | Default |
|-----------|-----------------------|-------------------------------------------|----------|-------------------------------|
| `type` | `"ipvm/wasm"` | Identify this task as Wasm 1.0 | Yes | |
| `version` | SemVer | The Wasm module's Wasm version | No | `"0.1.0"` |
| `mod` | CID | Reference to the Wasm module to run | Yes | |
| `fun` | `String or OutputRef` | The function to invoke on the Wasm module | Yes | |
| `args` | `[{String => CID}]` | Arguments to the Wasm executable | Yes | |
| `secret` | Boolean | | No | `False` |
| `maxgas` | Integer | Maximum gas for the invocation | No | 1000 <!-- ...or something --> |

(reviewer):

So this looks great! I am intrigued about how the args array works :)

But also in Bacalhau we are mainly running WASI-type workloads in a different way...
We have mod and fun, but we also have environment variables, filesystem mounts, and program arguments (strings as if executed on a command line). And we are just adding references to other WASM modules to load and link.

So I think we have two different types here...

  • ipvm/wasm for the IPVM style of deterministic WASM, invoking a single WASM function with args
  • wasi-32/wasm (or something) for the Bacalhau style invocation...

IPVM will focus on the top one, Bac will support both eventually?

task/README.md Outdated
Comment on lines 243 to 268
``` json
{
  "type": "ipvm/effect",
  "version": "0.1.0",
  "using": "docker:Qm12345",
  "meta": {
    "description": "Tensorflow container",
    "tags": ["machine-learning", "tensorflow", "myproject"]
  },
  "do": {
    "resources": {
      "ram": {"gb": 10}
    },
    "inputs": [1, 2, 3],
    "entry": "/",
    "workdir": "/",
    "env": {
      "$FOO": "bar"
    },
    "timeout": {"seconds": "3600"},
    "contexts": [],
    "output": [],
    "sharding": 5
  }
}
```

(reviewer):

I'm not sure I've really understood the intention around effects w.r.t. the rest of the objects in this spec... it seems like an effect IS a task? But I also feel like these things are very different? E.g. the effect you have on line 215 looks very different from this one.

Is an effect really a wrapper around a task? So an effect may be "invoke another task" but it may also be something else?

"version": "0.1.0",
"requestor": "did:key:zAlice",
"nonce": "o3--8Gdu5",
"verification": {"optimistic": 2},

(reviewer):

There's a lot in here! And we are coming to the conclusion (like Koii) that verification can be quite task specific... and hence probably another user-definable object.

@expede (Member, Author):

Interesting — do you mean defining arbitrary user-defined verification types, or pushing this field into each task?

How does Koii do verification?

(reviewer):

My understanding is only from watching Al Morris' talk, but he seemed to suggest that it is up to the owner of the Task to define how it is verified. This opens up doing quite specific things for verification, e.g. he mentioned that deterministic work over data might be verified via hashing, but non-deterministic work like web scraping might be verified stochastically, in a domain-specific way.

Comment on lines 244 to 250
| Field | Type | Description | Required | Default |
|----------|-------------------|-----------------------------------------|----------|--------------------------|
| `secret` | `Boolean or null` | Whether the output is unsafe to publish | No | `null` |
| `check` | `Verification` | How to verify the output | No | `"attestation"` |
| `time` | `TimeLength` | Timeout | No | `[5, "minutes"]` |
| `memory` | `InfoSize` | Memory limit | No | `[100, "kilo", "bytes"]` |
| `disk` | `InfoSize` | Disk limit | No | `[10, "mega", "bytes"]` |

(reviewer):

Not sure if you are intending this struct to be used by more than IPVM, but if yes, we also have things like number of GPUs. If you do want this to be standard, I'll pull out all of our other config options.

Comment on lines 214 to 286
```diff
-type Verification enum {
+type Verification union {
   | Oracle
-  | Consensus(Integer)
-  | Optimistic(Integer)
+  | Consensus
+  | Optimistic
   | ZKP
-}
+} representation keyed
```

(reviewer):

Are user-defined verifications (e.g. probabilistic) contained in this concept? Maybe the "attestations" are designed to scale to that?

@simonwo commented Nov 30, 2022

Ok, here is a full Docker example, with comments:

{
    // with: for a Docker job, this is the Docker image to use
    // this must be the full form specified with a hash
    // the user is able to submit tags/labels but the system should canonicalise this
    "with": "ubuntu@sha256:4b1d0c4a2d2aaf63b37111f34eb9fa89fa1bf53dd6e4ca954d47caebca4005c2",
    "do": "docker/run",
    "inputs": {
        // entrypoint: the program and arguments that will be invoked in the container
        "entrypoint": ["bash", "-c", "echo", "hello, world"],
        // mounts: read-only disk mounts that will contain the data to be stored
        // this maps a mount path to some storage source
        // if this is a Link, the storage source is assumed to be an IPFS CID to use
        // other storage types are available... like URL download. We want to introduce more!
        // uh oh, does this mean we have to standardise the storage sources too?
        // or we could turn the mounts into a MAY field e.g. "implementations MAY accept arbitrary storage sources"
        "mounts": {
            "/inputs": {"/": "bafybeih5fo3ggbgkg5ftwzycup4shhcqrjpd7toyvnagwicxzkqekjtqba"},
            "/more": {"url": "https://example.com/data.txt"}
        },
        // outputs: writeable disk mounts, initially empty
        // we may wish to give them options in the future (such as are they encrypted)
        // another MAY field e.g. "implementations MAY accept arbitrary storage sources"
        "outputs": {
            "/outputs": {}
        },
        // env: environment variables (no $ required)
        "env": {
            "FOO": "bar"
        },
        "workdir": "/"
    },
    // meta: keys not involved with describing the work but only the execution
    "meta": {
        "bacalhau/config": {
            "verifier": "deterministic", // may include options in future
            "publisher": "estuary",
            "resources": {
                "cpu": "100m", // not sure what units this actually is?
                "disk": [1, "mega", "byte"], // love these units, will steal
                "memory": [1, "giga", "byte"],
                "gpu": 1
            },
            "timeout": 300,
            // annotations: arbitrary user-defined labels
            "annotations": [
                "my-lovely-job"
            ],
            "dnt": true,
            // not included here is sharding... 
        }
    }
}

@simonwo commented Nov 30, 2022

{
    // with: for wasm jobs, the wasm module to invoke
    // ... but hang on, this can be a storage source at the moment! So a URL or a CID or maybe something more complex...
    "with": "Qmajb9T3jBdMSp7xh2JruNrqg3hniCnM6EUVsBocARPJRQ",
    "do": "wasm32-wasi/run",
    "inputs": {
        // As for Docker
        "mounts": {
            "/input.csv": {"url": "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/latest/tables/measures-as-defined/data?format=csv&download"}
        },
        // As for Docker
        "outputs": {
            "/outupts": {}
        },
        // Defined by WASI, not really needed if we are specifying WASI elsewhere
        "entrypoint": "_start",
        // imports: modules to download from remote storage and make available to the WASM runtime
        // so that our main module doesn't need to be self-contained and can require non-WASI imports
        "imports": [
            {"/": "bafybeih5fo3ggbgkg5ftwzycup4shhcqrjpd7toyvnagwicxzkqekjtqba"},
            {"url": "https://example.com/library.wasm"}
        ],
        // args: passed to the WASM program, the equivalent of commandline args
        "args": [
            "/inputs.csv",
            "/outputs/uk-tariff-2021-01-01--latest--measures-as-defined.parquet"
        ],
        // As for Docker
        "env": {
            "FOO": "bar"
        }
    },
    "meta": {
        "bacalhau/config": {
            // As for Docker
        }
    }
}

So we seem to have some contention over the with field... we seem to be using it kinda arbitrarily between our different types? For WASM it is a program to run whereas for Docker it is a Docker image reference (i.e. the container to use, not a program to run). What is the intended meaning of the with field?

I'm guessing with being so central to UCAN means it can't be anything other than a string? Because for WASM we technically support getting a WASM blob from any storage source, which can have structure beyond what is easily serializable.

Also, in your pipelines example, can the with be something that is supplied by an earlier pipeline? So that we can build WASM in one step and then execute it another?

@simonwo commented Nov 30, 2022

And here is an IPVM task (the Docker eg above) slotted into the current Bacalhau job structure:

{
    "APIVersion": "V1beta1",
    "ID": "92d5d4ee-3765-4f78-8353-623f5f26df08",
    "RequestorNodeID": "QmXaXu9N5GNetatsvwnTfQqNtSeKAD6uCmarbh3LMRYAcF",
    "RequestorPublicKey": "...",
    "ClientID": "ac13188e93c97a9c2e7cf8e86c7313156a73436036f30da1ececc2ce79f9ea51",
    "Task": {
        "with": "ubuntu@sha256:4b1d0c4a2d2aaf63b37111f34eb9fa89fa1bf53dd6e4ca954d47caebca4005c2",
        "do": "docker/run",
        "inputs": {
            "entrypoint": ["bash", "-c", "echo", "hello, world"],
            "mounts": {
                "/inputs": {"/": "bafybeih5fo3ggbgkg5ftwzycup4shhcqrjpd7toyvnagwicxzkqekjtqba"},
                "/more": {"url": "https://example.com/data.txt"}
            },
            "outputs": {
                "/outputs": {}
            },
            "env": {
                "FOO": "bar"
            },
            "workdir": "/"
        },
        "meta": {
            "bacalhau/config": {
                "verifier": "deterministic", 
                "publisher": "estuary",
                "resources": {
                    "cpu": "100m", 
                    "disk": [1, "mega", "byte"], 
                    "memory": [1, "giga", "byte"],
                    "gpu": 1
                },
                "timeout": 300,
                "annotations": [
                    "my-lovely-job"
                ],
                "dnt": true,
            }
        }
    },
    "Deal": {
        "Concurrency": 1, // would be better called "multiplicity"
        "Confidence": 1, // really an option for the deterministic verifier
        "MinBids": 3
    },
    "CreatedAt": "2022-11-17T13:29:01.871140291Z",
}

I've also not really worked out how our sharding feature fits into this world. On the one hand, each "invocation" in a sharded job is a different piece of work to be done, so it feels like there is a reification step where the system converts each shard into a different Task and then submits them to compute nodes separately, so this structure works. But the user also needs to be able to define where they want sharding to happen, which at the moment happens to all "inputs" but not to "contexts", which are the same as inputs but unshardable (I have glossed over this in the above examples because it seems very Bac-specific).

Actually, I think I'm realising that trying to fit into a "pure invocation data structure" world will be good for us and lead to a better factoring of our job structure.

@expede commented Nov 30, 2022

What is the intended meaning of the with field?

with is the target that you're working with, and is any URI. It can be https://example.com/timeline or ipfs://bafyMyWasmfunction. In an OO sense, you can think of this as an object reference. Every object takes different messages. The do field contains what you're sending to that thing, so you can think of this...

{
  with: "ipfs://container",
  do: "docker/run",
  inputs: myargs
}

...as being roughly analogous to...

container.docker_run(myargs)

It's not a perfect analogy because these are distributed objects, but roughly that. We normally think of things like objects as containing some state, which these don't — it's a bit more functional:

# Elixir
Executor.send(containerId, {"docker/run", myargs})

These can be totally stateless, pure functions from inputs to outputs.

I'm guessing with being so central to UCAN means it can't be anything other than a string?

You could inline e.g. Wasm, sure. It's probably not the most efficient way to do it over a network; you probably want to pass it by reference.

can the with be something that is supplied by an earlier pipeline?

For the IPVM runtime it cannot! We want to be able to negotiate workloads with the network, and you can spark new jobs as a request to the runtime if you need dynamic behaviour. We could generalize it so that you could put a promise in that field, and IPVM would just reject those but Bacalhau could take them.

Do you have a concrete use case for that?

My guess is that possibly I need to do a better job at describing the difference between a task and workflow here. A Task is just this part:

{
  "with": "dns://example.com?TYPE=TXT",
  "do": "crud/update",
  "inputs": { 
    "value": "hello world"
  },
  "meta": {
    "ipvm/config": {
      "secret": false
      "timeout": [500, "milli", "seconds"],
      "verification": "attestation"
    }
  }
}

These can absolutely enqueue a following job! It needs to return a description of what that job would be, which can be literally another one of these above. This tells the runtime "please, I would like one of these to be scheduled". In IPVM (which you don't necessarily have to follow, but it may be a good idea), we're putting this kind of output in a special output channel to distinguish it from a normal pure value return: this is the effect system. Think of it like having stdeff to go with stdout and stderr.

// Input:
{
  "with": "dns://example.com?TYPE=TXT",
  "do": "crud/update",
  "inputs": { 
    "value": "hello world"
  },
  "meta": {
    "ipvm/config": {
      "secret": false
      "timeout": [500, "milli", "seconds"],
      "verification": "attestation"
    }
  }
}

// Output, not in the spec but SHOULD BE!

{
  "receipt": {
    "bafyYourJobCID": {
      "trace": {
        "who": someDID,
        "verification": [
          signedVerification1,
          signedVerification2
        ]
      },
      "value": "bafyOutputImageCID",
      "effects": [ // request to enqueue
        {
          "with": "ipfs://bafyMyWasmModule",
          "do": "wasm/run",
          "inputs": {
            "func": "photoFilter",
            "arg": "bafyOutputImageCID"
          },
          "meta": {
            "ipvm/config": {
              "verification": "attestation"
            }
          }
        }
      ]
    }
  }
}

Which is the purple part of the accompanying diagram (image omitted).

Now, the executor hasn't agreed to this yet! So in the full-blown IPVM version, this potentially has a performance impact unless you know that the runner has agreed to e.g. accept more of your credits, but if they don't have a GPU, it may need to get moved around the network. It's a new contract.

So the TL;DR is that posting a static workflow in advance lets both sides agree on the kinds of things that will get run, and it can still get dynamic behaviour via the effect system, because enqueuing a job is "just" another kind of effect.

@expede commented Nov 30, 2022

Err, to clarify the above further: I'm saying that depending on Bacalhau's goals, you could skip the workflow part and only use the individual tasks, chaining actions together procedurally.

@simonwo commented Nov 30, 2022

So perhaps here is some text to use for a Docker invocation:

X.1 Docker invocations

A Docker invocation is identified by a UCAN with the do field set to docker/run. The with field SHOULD then be interpreted as a Docker image identifier. The image identifier SHOULD be a fully-resolved image digest (e.g. ubuntu@sha256:4b1d0c) but implementations MAY support label tags (e.g. ubuntu:latest).

For a Docker invocation, the inputs structure MUST conform to the following schema:

type DockerInputs struct {
    entrypoint [String]
    mounts {String:StorageSource}
    outputs {String:StorageOutput}
    env {String:String}
    workdir String
}

type StorageSource union {
    | Link
    | URL
    | Any
}

type StorageOutput Any
type URL String

where the fields are defined as:

| Name | Type | Description |
|------|------|-------------|
| `entrypoint` | `[String]` | The program and arguments that will be invoked in the container |
| `mounts` | `{String:StorageSource}` | Read-only disk mounts that will contain the data to be stored |
| `outputs` | `{String:StorageOutput}` | Writeable disk mounts, initially empty |
| `env` | `{String:String}` | Environment variables made available to the container |
| `workdir` | `String` | Path within the container in which the entrypoint will be invoked |

@simonwo commented Nov 30, 2022

At the moment we support IPFS CIDs and URLs for consumption over HTTP(S) for WASM. I guess either one of those could appear in a with and it's up to the system to interpret it as a URI?

So the thing I am worried about re: with is having to boil all future storage source types down into a URI format. E.g. we are thinking about supporting a more RFC 822-like syntax where users can specify arbitrary HTTP headers. Of course, it's possible to represent any data as a URI, but it's not necessarily a better way of representing things if nested structures are available. Feels like this is a pretty non-negotiable part of using this spec though!

@simonwo commented Nov 30, 2022

Do you have a concrete use case for that?

Just the "build in Docker + run in WASM" example. At job submission time it would be known that the output of the first job would be the program to run for the second, so it would be nice to negotiate all of the e.g. payment and compute provision up front. E.g. we would line up a node to build the WASM and then a node to run it, because we know that both steps are going to happen, just not what the WASM actually is, yet.

I guess an Effect being part of the same Session means it can reuse payment channels etc, but we just won't have scheduled some of this stuff up-front.

As you say, we should probably focus on getting everything from the Task-downwards right, and if we can adopt the Workflow then that is a bonus!

@expede commented Nov 30, 2022

Apologies, I'm going to mix the task description with some discussion of IPVM-specific considerations, because it is relevant to our scheduler and to making as many things memoizable as possible. This doesn't mean that users have to write these lines of config directly; one can put a sugar layer on top as long as it has these semantics.

where the fields are defined as:

This is super helpful, thanks! 🙏

the image identifier SHOULD be a fully-resolved image digest (e.g. ubuntu@sha256:4b1d0c) but implementations MAY support label tags (e.g. ubuntu:latest).

Ah, I see. For memoizability, that ubuntu:latest needs to get resolved before it gets passed to a step that can be memoized. You know that you're going to run some container, but not which container.

In IPVM's model these definitely need to be separate steps, because you need to cleanly separate the step that could fail because the executor may not be able to pull a particular container. We have the same situation with CIDs, too (e.g. running this in offline mode you want to know that you have certain CIDs available locally, which is an important use case for Fission).

If I'm understanding you correctly, you don't want to kick off an entirely new workflow, because you know that this thing is a docker container. What you really want is something like the following mangled pseudocode:

{
  "with": "docker",
  "do": "container/run",
  "inputs": {
    "container": {promise: [previous_step_output]},
    "args": {/* ... */}
  }
}

🤔 Okay, let me mull this one over! It would be convenient to say roughly "I know that this thing is one of these kinds of actions, I don't want to have to negotiate with the runner at every step when working procedurally".

it's up to the system to interpret it as a URI

The URI format for CIDs is ipfs://QmTheCIDGoesHere. I used to think that this meant "on the public DHT", but this is the syntax for any Multiformat-flavoured CID.

RFC 822-like syntax where users can specify arbitrary HTTP headers

Can you give me a concrete example of what that would look like for Bacalhau?

I could be misunderstanding the goal of using RFC822 syntax: you're trying to do point-to-point communication? Or just give users a friendly, almost TOML-like interface? A sketched example may help me understand the kinds of things that you're trying to do here.

If I'm understanding correctly, for the IPVM runtime we'd interpret this as a normal HTTP request. If you wanted to send this to a DID or an email, it would look similar, but the runtime would need to understand how to route that.

// A lot like Axios or similar:
{
  "with": "https://example.com",
  "do": "http/put",
  "input": {
    "headers": {
      "content-type": "application/json"
    },
    "payload": "hello world"
  }
}

@expede commented Nov 30, 2022

"with": "docker",

Which is a bit like a higher order function. If we could address "Docker version X" by URI, then this would be really simple 🤔 We mainly need a way to say "this is the thing to run". URIs are very flexible, but I'd prefer not to have to resort to something like urn:ipvm:effect:dockerV20. Still noodling on it, but I think we have options.

The problem with not having it as a URI is you really do want namespacing of some kind. There are other options, but URIs and URNs cover such a massive portion of the solution space.

@expede commented Nov 30, 2022

Rubber ducking in public, don't mind me.

Here, we're trying to manage both the flow of authority (security, privacy, payment, etc but across many entities) and a convenient calling that makes sense for IPVM and Bacalhau.

Object capabilities (e.g. CapTP) give us exactly this, but you need globally understandable pointers, because you have to send messages to things. URIs usually fit the bill nicely. You should absolutely be able to say "and please run the Docker container that came out of the previous step". Here the calling convention gets a bit tricky in a declarative spec for the case you cited above. It's not a main use case of something like CapTP, where the system is dynamic.

What we're modelling statically here is roughly a message to the executor to perform some action, not a message to some external service or module. We're basically saying "please load this thing and run it" or "please resolve this CID and give me a local handle". It's almost recursive.

@expede commented Nov 30, 2022

The underlying UCAN capability could be something like this (though it's more likely to be between the requestor and the coordinator process, not "the network")

// pseudocode
{
  iss: "did:bacalhau",
  aud: "did:user",
  att: [{
    with: "docker:*", // any docker container
    can: "run/arbitrary_containers",
    nb: [{timeout: "10m"}]
  }]
}

Which also has this same addressing challenge, but strangely more straightforward, because we're getting the ability to run arbitrary containers on Bacalhau's network from Bacalhau. Authority and invocation are not the same thing, so this doesn't translate directly. A rough equivalent invocation would be something like:

{
  "with": "docker:*", // any docker container
  "do": "run/arbitrary_containers",
  "inputs": {
    "container": {promise: ["/", "previous-step"]}
  }
}

Strictly speaking, in UCAN we have a URI scheme for "any of this kind of thing", which IIRC (will look at the spec later) is something like urn:<scheme>:*. This gives us an equivalent to a type field, which lets you do the negotiation needed for IPVM and Bacalhau to know that you're able to accept the job... but also it's a bit weird in this context, because you're basically addressing all Docker or Wasm jobs, which then looks different from the individual call case.

We almost want like... string interpolation

 "with": "docker:${promise_goes_here}",

@expede commented Nov 30, 2022

I cheated and peeked at Spritely's OCap system. Being a Scheme, for them "lambda is the ultimate" (LtU). They resolve steps with lambdas, which is actually maybe what we want, too! Here's a sketch for what is almost like a quoted action:

{
  "with": "urn:ipvm:lambda",
  "do": "ipvm/run",
  "inputs": {
    "use-the-built-conatiner": {
      "type": "effect",
      "resource": "docker",
      "with": {"promise": ["/", "previous-step"]}, // <-- here it is!
      "do": "container/run",
      "inputs": {
         "env": {"promise": ["/", "different-step-sure-why-not"]}
      }
    },
    "do-some-wasm": {
      // ... more of same
    }
  }
}

@expede commented Nov 30, 2022

Scratch that, I think we can do just this:

{
  "with": {"promise": ["/", "previous-step"]}, // <-- here it is!
  "do": "container/run",
  "inputs": {
    "env": {"promise": ["/", "different-step-sure-why-not"]}
  }
}

But we need to be more careful with signalling in the do, since it's unclear what kind of thing you're doing. This is always the case with ipfs:// URIs 🤷‍♀️ zcap (and Spritely?) use magnet links because you can shove all kinds of context in there.

@expede commented Dec 1, 2022

As usual, @walkah is really awesome to bounce ideas around with! Also it turns out that the Spritely folks have correct opinions in the right general direction... as usual (though they often use Magnet links, which won't help us here).

Free & Easy 🚴‍♀️

There's two layers: authority and invocation. These are definitely different things (see also big chunks of the UCAN Invocation PR conversation). A capability MAY be a cryptographic certificate, but it can also be a file handle to the thing itself (or the raw bytes). If someone has your container, we literally can't stop them from running it, unless it's encrypted. This is core to the capabilities model.

Authority is extremely important to running tasks in a trustless ("mutually suspicious") network, but does not apply in the case where you have direct access to the bytes. Put another way: you don't need a UCAN if you have the container (except for metering and payment, but that's a separate concern).

CID ≈ IPLD ≈ Bytes 💾

A CID by itself represents some IPLD. This is a bit richer than raw bytes, since it also has links. As the Iroh folks often say, "we only care about bytes and links". That said, it's isomorphic to bytes. In essence, CIDs point to binary data. The URI for binary data is data:<bytes>, so for our purposes we can treat data: as resolved bytes and ipfs:// as soon-to-be raw bytes (with some extra structure that we can path through in chunks, but still binary data).

(Aside: conceptually we could also use the extended fields in data: to signal the type of data, but I don't think we need to do that. Keep reading 👇)

Scheduler Safety & Authority 🦺

Both scheduler safety properties (idempotent, destructive, verifiable) and authority (capabilities) use the tuple (<uri-scheme>, <some ability>) to agree on the kind of action, and the level of safety. It's openly extensible, so anyone can register new actions without having to ask by using these kinds of strings + UCAN to back the authority. Some examples:

("https:", "crud/create")
("dns:", "crud/read")
("mailto:", "msg/send")
("did:key:", "crypto/sign")

But the relevant one for us in a moment: ("data:", <some action>).

Signalling 🚦

Binary data coming off an ipfs:// CID doesn't help you identify its contents. That's fine. We can infer from the ability.

In the spec as it stands today, we already do signaling: crud/read is a very generic thing that lets us write standard libs, but that string can be literally anything.

The pair ("data:<bytes>", "crud/read") is meaningless. Sure, we can read that data. ("data:<bytes>", "crud/create") is meaningless, because it's already created.

All of these ability namespaces are buckets for how to interpret the thing. They're kind of like Rust traits or Haskell type classes. Let's interpret binary as a Wasm blob: ("data:<bytes>", "wasm/run"). Here's Docker: ("data:<bytes>", "docker/run"). Or in IPVM syntax:

{
  "with": "data:<bytes>",
  "can": "docker/run",
  "inputs": {
    "entrypoint": ["/"],
    "mounts": {"mnt": "/"},
    "outputs": {"tmp": "/tmp"},
    "env":  {"$HELLO": "world"},
    "workdir":  "/work"
  },
  "meta": {
    "ipvm/config": {
      "disk": "200GB"
    }
  }
}

Which — under the reasoning above — we can give a CID:

{
  "with": "ipfs://QmYqWegi4pfTJAsWN45FiRTDB75SyhmJG3L122dMWVCGWd",
  "can": "docker/run",
  "inputs": {
    "entrypoint": ["/"],
    "mounts": {"mnt": "/"},
    "outputs": {"tmp": "/tmp"},
    "env":  {"$HELLO": "world"},
    "workdir":  "/work"
  },
  "meta": {
    "ipvm/config": {
      "disk": "200GB"
    }
  }
}

...and if we always treat promises in with as binary (because they resolve to CIDs, which in turn resolve to binary), then:

{
  "with": {"ucan/promise": ["/", "previous_action"]},
  "can": "docker/run",
  "inputs": {
    "entrypoint": ["/"],
    "mounts": {"mnt": "/"},
    "outputs": {"tmp": "/tmp"},
    "env":  {"$HELLO": "world"},
    "workdir":  "/work"
  },
  "meta": {
    "ipvm/config": {
      "disk": "200GB"
    }
  }
}

If someone puts "https://example.com" in that promise, we're still safe because the semantics are equivalent to "data:\"https://example.com\"".

Wrap Up 🎁

I... think that may solve the problem without needing to change the format. It does mean that we need to clarify the above promise semantics (i.e. always binary in a with) in the spec.

Does that work for you @simonwo?

@expede commented Dec 1, 2022

Moved this spec to a standalone repo (it's the pattern we've used) ipvm-wg/workflow#1


The `secret` flag marks a task as being unsuitable for publication.

If the `sceret` field is explicitely set, the task MUST be treated per that setting. If not set, the `secret` field defaults to `null`, which behaves as a soft `false`. If such a task consumes input from a `secret` source, it is also marked as `secret`.

(reviewer):

minor: sceret -> secret

@simonwo commented Dec 2, 2022

RFC 822-like syntax where users can specify arbitrary HTTP headers
Can you give me a concrete example of what that would look like for Bacalhau?

Sure – we are talking about this sort of thing. Only an idea at the moment but it's the sort of thing I wouldn't want to make more difficult.

If I'm understanding correctly, for the IPVM runtime we'd interpret this as a normal HTTP request. If you wanted to send this to a DID or an email, it would look similar, but the runtime would need to understand how to route that.

Yeah 🤔... the excellent point you're making is that IPVM's way of modelling these is to have an Effect that can already have arbitrary structured inputs. For anything that is more than an identifier (like how to retrieve a resource) you need an Effect. In Bacalhau we don't have orchestration, so data retrieval isn't a first-class activity atm; it's just something handled by the runtime, and what I am talking about in the doc above is making the data description more like a request invocation.

So you have helped me realise that my idea kinda stinks and I think your model is better! So I think Bac should probably move more towards treating anything more advanced than a URL as an Effect, rather than trying to jam all the invocation inputs into a URL. Which is all a long-winded way of saying, perhaps don't worry about my concern, this is the right direction!

johnandersen777 pushed a commit to johnandersen777/dffml that referenced this pull request Jan 5, 2023
…en?: Link to shouldi Coach Alice: Our Open Source Guide

Related: ipvm-wg/spec#8
2022-11-14 @pdxjohnny Engineering Logs: intel#1406
johnandersen777 pushed a commit to johnandersen777/dffml that referenced this pull request Jan 5, 2023
… within Architecting Alice: She's Arriving When?

Related: ipvm-wg/spec#8
2022-11-14 @pdxjohnny Engineering Logs: intel#1406
2022-11-14 SCITT Meeting Notes: intel#1406