
feat!(messagev2): tweak dag-cbor message schema #354

Merged (2 commits), Feb 11, 2022


@rvagg (Member) commented Feb 7, 2022

For:

  1. Efficiency: compacting the noisy structures into tuple representations and making top-level components of a message optional.
  2. Migrations: providing a secondary mechanism to lean on for versioning if we want a gentler upgrade path than libp2p protocol versioning.
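To make point 2 concrete: the example messages in this PR wrap the whole payload under a single version key ("gs1" in the first example, "gs2" after the schema adjustment), so a decoder can branch on that key before looking at anything else. Below is a minimal sketch of that dispatch, assuming the dag-json form and using Go's encoding/json purely for illustration. decodeEnvelope is a hypothetical helper, not go-graphsync code; a real decoder would presumably work on dag-cbor via go-ipld-prime.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeEnvelope expects exactly one top-level key naming the message
// version and returns that key plus the raw body stored under it.
// Hypothetical helper for illustration only.
func decodeEnvelope(data []byte) (string, json.RawMessage, error) {
	var env map[string]json.RawMessage
	if err := json.Unmarshal(data, &env); err != nil {
		return "", nil, err
	}
	if len(env) != 1 {
		return "", nil, fmt.Errorf("expected exactly one version key, got %d", len(env))
	}
	var version string
	var body json.RawMessage
	for k, v := range env {
		version, body = k, v
	}
	switch version {
	case "gs1", "gs2": // version tags seen in the example messages
		return version, body, nil
	}
	return "", nil, fmt.Errorf("unsupported message version %q", version)
}

func main() {
	version, body, err := decodeEnvelope([]byte(`{"gs2":{"blk":[]}}`))
	fmt.Println(version, string(body), err)
}
```

Here an unknown key fails loudly. A gentler upgrade path could instead hand older tags to an older decoder within the same libp2p protocol version.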

Closes: #351

In terms of what this looks like: an example of an original message is in #351, and with this change the same data would look like this (as dag-json):

{
  "gs1": {
    "blk": [
      [
        { "/": { "bytes": "AVUSIA" } },
        { "/": { "bytes": "QgTLmh40xfCOmyCqdgkOcAILtWwMo9OvcpbNEFilESiQ/tIYSI8ITY355INftUrQRf/ZNuO/cmGwQmxRNSoJeBbtdEgruQhLSn7YrcUX8zceDgQ0tRFiXNGkF5IkPczc/ogJSw" } }
      ],
      [
        { "/": { "bytes": "AVUSIA" } },
        { "/": { "bytes": "xfPTKlWZ2kO4US4NsU11HtHWQJ2dy3rtBdbMbAmHxjTufEI28FF8INzGqZ6G5LrcAl5fXoG0WWEoUimfNvEQ6Qa6IW4ByDcDUq0ziUrrH+WNzNCC9FaSKBovz42VowZ4zaRVhw" } }
      ]
    ],
    "req": [
      [
        { "/": { "bytes": "GmIWTBeVSRyOznxIsnBumg" } },
        "n",
        101,
        { "/": "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi" },
        {"R":{":>":{"|":[{".":{}},{"a":{">":{"@":{}}}}]},"l":{"none":{}}}},
        {
          "AppleSauce/McGee": "yee haw"
        }
      ],
      [
        { "/": { "bytes": "vd/PRdnYTg27RolghnknVQ" } },
        "n",
        202,
        { "/": "bafyreibdoxfay27gf4ye3t5a7aa5h4z2azw7hhhz36qrbf5qleldj76qfy" },
        {"R":{":>":{"a":{">":{"@":{}}}},"l":{"none":{}}}},
        {}
      ]
    ],
    "rsp": [
      [
        { "/": { "bytes": "GmIWTBeVSRyOznxIsnBumg" } },
        34,
        [
          [
            { "/": "bafyreibdoxfay27gf4ye3t5a7aa5h4z2azw7hhhz36qrbf5qleldj76qfy" },
            "m"
          ]
        ],
        {}
      ],
      [
        { "/": { "bytes": "vd/PRdnYTg27RolghnknVQ" } },
        14,
        [],
        {
          "Hippity+Hoppity": { "/": { "bytes": "9V/48SUItj7yv+ynVXrpDfYxGl7BYxtKH6hDMQvZw6cQ6qzlob3XKtC/4El3HBHnVjOL2Thl5kXxreybnJnvQH+9T8aFnnkExa19yb0QpcwWlz1bKOwabdQ9n4L58Yw9A0GONQ" } }
        }
      ]
    ]
  }
}
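Each "blk" entry above is a two-element [prefix, data] tuple rather than a map. The first prefix, "AVUSIA", is unpadded base64, which is how dag-json encodes bytes. It decodes to 0x01 0x55 0x12 0x20, which reads as CID version 1, the raw codec, sha2-256 and a 32-byte digest. The sketch below reads such a tuple with the standard library. It assumes the prefix is exactly four unsigned varints in that order, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
)

// dagJSONBytes matches dag-json's bytes form: {"/": {"bytes": "<base64>"}}.
type dagJSONBytes struct {
	Slash struct {
		Bytes string `json:"bytes"`
	} `json:"/"`
}

// decode returns the raw bytes; dag-json uses unpadded standard base64.
func (b dagJSONBytes) decode() ([]byte, error) {
	return base64.RawStdEncoding.DecodeString(b.Slash.Bytes)
}

// cidPrefix holds the four varints assumed to make up a block prefix.
type cidPrefix struct {
	Version, Codec, MhType, MhLength uint64
}

// parsePrefix reads four consecutive unsigned varints.
func parsePrefix(raw []byte) (cidPrefix, error) {
	var fields [4]uint64
	for i := range fields {
		v, n := binary.Uvarint(raw)
		if n <= 0 {
			return cidPrefix{}, errors.New("malformed or missing varint in prefix")
		}
		fields[i] = v
		raw = raw[n:]
	}
	return cidPrefix{fields[0], fields[1], fields[2], fields[3]}, nil
}

// parseBlock decodes one "blk" entry, a [prefix, data] tuple.
func parseBlock(entry []byte) (cidPrefix, []byte, error) {
	var tuple [2]dagJSONBytes
	if err := json.Unmarshal(entry, &tuple); err != nil {
		return cidPrefix{}, nil, err
	}
	rawPrefix, err := tuple[0].decode()
	if err != nil {
		return cidPrefix{}, nil, err
	}
	prefix, err := parsePrefix(rawPrefix)
	if err != nil {
		return cidPrefix{}, nil, err
	}
	data, err := tuple[1].decode()
	if err != nil {
		return cidPrefix{}, nil, err
	}
	return prefix, data, nil
}

func main() {
	// Prefix from the example; the data is shortened to three bytes.
	prefix, data, err := parseBlock([]byte(`[{"/":{"bytes":"AVUSIA"}},{"/":{"bytes":"AAEC"}}]`))
	fmt.Printf("%+v %v %v\n", prefix, data, err)
	// {Version:1 Codec:85 MhType:18 MhLength:32} [0 1 2] <nil>
}
```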

In terms of bytes saved, using the randomish data in TestGraphsyncRoundTrip (it has some variability in output length, and the three runs don't use exactly the same data, so consider these figures approximate):

  • v1.0 -> v1.0 protobuf original: 213 bytes written by client, 25,361 bytes written by server
  • v2.0 -> v2.0 prior to this change: 271 bytes written by client, 23,129 bytes written by server
  • v2.0 -> v2.0 with this change: 239 bytes written by client, 22,076 bytes written by server
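As relative changes, the figures above can be worked out with a throwaway sketch. pctChange is just an illustrative helper, and the only inputs are the numbers quoted here.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Byte counts copied from the measurements above.
	fmt.Printf("server, v1.0 protobuf -> v2.0 with this change: %+.1f%%\n", pctChange(25361, 22076))
	fmt.Printf("server, v2.0 before -> v2.0 with this change:   %+.1f%%\n", pctChange(23129, 22076))
	fmt.Printf("client, v2.0 before -> v2.0 with this change:   %+.1f%%\n", pctChange(271, 239))
	fmt.Printf("client, v1.0 protobuf -> v2.0 with this change: %+.1f%%\n", pctChange(213, 239))
}
```

That works out to roughly 13% fewer bytes written by the server than the protobuf v1.0 baseline, and about 4.6% fewer than v2.0 before this change. On the client side, the new layout is about 11.8% smaller than the earlier v2.0 layout but still about 12% larger than protobuf, which fits with the longer UUID request IDs mentioned in the aside.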

@mvdan @warpfork care to critique my schema?

Aside: it's interesting that moving from protobuf to dag-cbor saves bytes written by the server even without these changes. Most of the data sent should be blocks and their CIDs, and all these maps with string keys should cancel out any savings CBOR gets from its compact integer and length representation. Plus, v1.0 uses int request IDs while v2 uses UUIDs (as bytes), so v2's request IDs are longer. Nothing in the protobuf spec is optional, although I believe repeated allows for zero occurrences, so in effect the top-level items are at least optional. Perhaps it's all down to moving the metadata into the core message rather than carrying it as an extension (so we avoid writing the string "graphsync/response-metadata" on each response). An interesting mystery that might be worth investigating at some point.
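For a rough sense of scale, CBOR (RFC 8949) encodes a length or small integer in the initial byte when it is below 24. Larger values need 1, 2, 4 or 8 extra bytes, which is in the same range as protobuf's base-128 varints. As a CBOR text string, "graphsync/response-metadata" costs 29 bytes every time it appears: 27 characters plus a 2-byte head. A short key like "meta" costs 5 bytes. The sketch below does that arithmetic. It counts key bytes only and ignores any surrounding framing, and the helper names are hypothetical.

```go
package main

import "fmt"

// cborHeadLen is the size of a CBOR initial byte plus its argument:
// values below 24 fit in the initial byte, larger ones need 1, 2, 4 or 8 more.
func cborHeadLen(n uint64) int {
	switch {
	case n < 24:
		return 1
	case n <= 0xff:
		return 2
	case n <= 0xffff:
		return 3
	case n <= 0xffffffff:
		return 5
	default:
		return 9
	}
}

// varintLen is the size of n as a protobuf base-128 varint.
func varintLen(n uint64) int {
	size := 1
	for n >= 0x80 {
		n >>= 7
		size++
	}
	return size
}

// textStringCost is the encoded size of a CBOR text string (major type 3).
func textStringCost(s string) int {
	return cborHeadLen(uint64(len(s))) + len(s)
}

func main() {
	fmt.Println(textStringCost("graphsync/response-metadata")) // 29
	fmt.Println(textStringCost("meta"))                        // 5
	// A length of a few hundred bytes: CBOR head vs protobuf varint.
	fmt.Println(cborHeadLen(300), varintLen(300)) // 3 2
}
```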

@hannahhoward (Collaborator) left a comment


So I have a few comments:

  • I am pretty anxious about tuple representation for requests and responses. While the versioning helps, I much prefer being able to move fields around without breaking versions. Also, it seems like message size is already reduced, so... maybe that's fine?
  • I feel less anxious about blocks being tuples. Blocks are blocks. They have a well-defined standard and aren't likely to change.
  • Why do we need versioning in the message itself? Isn't the libp2p protocol version sufficient to give us this information?

@rvagg (Member, Author) commented Feb 8, 2022

I've adjusted the schema per the feedback above and some discussion today via Zoom. Requests and responses are back to maps with keys, because we don't expect many of them per message and it's nice to have them descriptive. I've also made more of their fields optional, so they can be left out entirely without any pain.

{
    "gs2": {
        "blk": [
            [
                { "/": { "bytes": "AVUSIA" } },
                { "/": { "bytes": "QgTLmh40xfCOmyCqdgkOcAILtWwMo9OvcpbNEFilESiQ/tIYSI8ITY355INftUrQRf/ZNuO/cmGwQmxRNSoJeBbtdEgruQhLSn7YrcUX8zceDgQ0tRFiXNGkF5IkPczc/ogJSw" } }
            ],
            [
                { "/": { "bytes": "AVUSIA" } },
                { "/": { "bytes": "xfPTKlWZ2kO4US4NsU11HtHWQJ2dy3rtBdbMbAmHxjTufEI28FF8INzGqZ6G5LrcAl5fXoG0WWEoUimfNvEQ6Qa6IW4ByDcDUq0ziUrrH+WNzNCC9FaSKBovz42VowZ4zaRVhw" } }
            ]
        ],
        "req": [
            {
                "ext": {
                    "AppleSauce/McGee": "yee haw"
                },
                "id": { "/": { "bytes": "k3Nu8gWHTfq53ypWnVFtwg" } },
                "pri": 101,
                "root": { "/": "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi" },
                "sel": {"R":{":>":{"|":[{".":{}},{"a":{">":{"@":{}}}}]},"l":{"none":{}}}},
                "type": "n"
            },
            {
                "id": { "/": { "bytes": "mdBpy6/wQyqT1LiaihfQWQ" } },
                "pri": 202,
                "root": { "/": "bafyreibdoxfay27gf4ye3t5a7aa5h4z2azw7hhhz36qrbf5qleldj76qfy" },
                "sel": {"R":{":>":{"a":{">":{"@":{}}}},"l":{"none":{}}}},
                "type": "n"
            }
        ],
        "rsp": [
            {
                "meta": [
                    [
                        { "/": "bafyreibdoxfay27gf4ye3t5a7aa5h4z2azw7hhhz36qrbf5qleldj76qfy" },
                        "m"
                    ]
                ],
                "reqid": { "/": { "bytes": "k3Nu8gWHTfq53ypWnVFtwg" } },
                "stat": 34
            },
            {
                "ext": {
                    "Hippity+Hoppity": { "/": { "bytes": "9V/48SUItj7yv+ynVXrpDfYxGl7BYxtKH6hDMQvZw6cQ6qzlob3XKtC/4El3HBHnVjOL2Thl5kXxreybnJnvQH+9T8aFnnkExa19yb0QpcwWlz1bKOwabdQ9n4L58Yw9A0GONQ" } }
                },
                "reqid": { "/": { "bytes": "mdBpy6/wQyqT1LiaihfQWQ" } },
                "stat": 14
            }
        ]
    }
}
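With requests and responses back to maps, absent optional keys simply don't appear. The first response above has "meta" but no "ext", and the second has the reverse. The sketch below decodes that "rsp" list with encoding/json, with the extension payload shortened. The response and metaEntry types are hypothetical, and reading each "meta" entry as a [link, action] pair is an assumption based on the example. Absent keys come back as nil slices and maps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bytesNode is the dag-json form of bytes: {"/": {"bytes": "<base64>"}}.
type bytesNode struct {
	Slash struct {
		Bytes string `json:"bytes"`
	} `json:"/"`
}

// metaEntry is one [link, action] tuple from a response's "meta" list.
type metaEntry struct {
	Link   string // CID string from a dag-json link {"/": "<cid>"}
	Action string // e.g. "m" in the example
}

// UnmarshalJSON reads the two-element tuple form.
func (m *metaEntry) UnmarshalJSON(data []byte) error {
	var tuple [2]json.RawMessage
	if err := json.Unmarshal(data, &tuple); err != nil {
		return err
	}
	var link struct {
		CID string `json:"/"`
	}
	if err := json.Unmarshal(tuple[0], &link); err != nil {
		return err
	}
	m.Link = link.CID
	return json.Unmarshal(tuple[1], &m.Action)
}

// response mirrors the keys of an "rsp" entry; "meta" and "ext" are optional.
type response struct {
	ReqID bytesNode                  `json:"reqid"`
	Stat  int                        `json:"stat"`
	Meta  []metaEntry                `json:"meta,omitempty"`
	Ext   map[string]json.RawMessage `json:"ext,omitempty"`
}

// exampleResponses is the "rsp" list from the example, extension data shortened.
const exampleResponses = `[
  {"meta": [[{"/": "bafyreibdoxfay27gf4ye3t5a7aa5h4z2azw7hhhz36qrbf5qleldj76qfy"}, "m"]],
   "reqid": {"/": {"bytes": "k3Nu8gWHTfq53ypWnVFtwg"}}, "stat": 34},
  {"ext": {"Hippity+Hoppity": {"/": {"bytes": "AAEC"}}},
   "reqid": {"/": {"bytes": "mdBpy6/wQyqT1LiaihfQWQ"}}, "stat": 14}
]`

func decodeResponses(data []byte) ([]response, error) {
	var rsps []response
	err := json.Unmarshal(data, &rsps)
	return rsps, err
}

func main() {
	rsps, err := decodeResponses([]byte(exampleResponses))
	if err != nil {
		panic(err)
	}
	for _, r := range rsps {
		fmt.Printf("reqid=%s stat=%d meta=%v ext keys=%d\n",
			r.ReqID.Slash.Bytes, r.Stat, r.Meta, len(r.Ext))
	}
}
```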

In terms of bytes sent, the difference from the tuple version is negligible when averaged out. We send a few more bytes for the map keys, but there aren't many of them and the blocks are still tuples, and we save bytes on optional fields by skipping them entirely.

Unfortunately there are more pointers in here than I'd like. Getting bindnode to work with implicit values that don't need to be backed by pointers would help with that, but it's tricky (/cc @mvdan: e.g. a field that is optional but has an implicit 0 shouldn't need a pointer field).

@hannahhoward (Collaborator) left a comment


This schema is looking good @rvagg

@hannahhoward merged commit 259905a into rvagg/uuid-rebasing on Feb 11, 2022
hannahhoward added a commit that referenced this pull request Feb 18, 2022
…ng (#332)

* feat(net): initial dag-cbor protocol support

also added first roundtrip benchmark

* feat(requestid): use uuids for requestids

Ref: #278
Closes: #279
Closes: #281

* fix(requestmanager): make collect test requests with uuids sortable

* fix(requestid): print requestids as string uuids in logs
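The request IDs in the example messages in this PR are 22-character unpadded base64 strings, i.e. 16 bytes. Decoding one shows the UUID version-4 and RFC 4122 variant bits, and formatting it in the usual 8-4-4-4-12 hex form is what printing request IDs as string UUIDs in logs amounts to. The sketch below uses only the standard library, and the helper names are hypothetical; go-graphsync may well use a UUID library rather than anything like this.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// newRequestID returns 16 random bytes marked as an RFC 4122 version-4 UUID.
func newRequestID() ([16]byte, error) {
	var id [16]byte
	if _, err := rand.Read(id[:]); err != nil {
		return id, err
	}
	id[6] = (id[6] & 0x0f) | 0x40 // version 4
	id[8] = (id[8] & 0x3f) | 0x80 // RFC 4122 variant
	return id, nil
}

// uuidString formats the bytes in the canonical 8-4-4-4-12 hex form.
func uuidString(id [16]byte) string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", id[0:4], id[4:6], id[6:8], id[8:10], id[10:16])
}

// requestIDFromDagJSON decodes the unpadded base64 used for dag-json bytes.
func requestIDFromDagJSON(s string) ([16]byte, error) {
	var id [16]byte
	raw, err := base64.RawStdEncoding.DecodeString(s)
	if err != nil {
		return id, err
	}
	if len(raw) != len(id) {
		return id, errors.New("request ID must be 16 bytes")
	}
	copy(id[:], raw)
	return id, nil
}

func main() {
	// A request ID from the first example message.
	id, err := requestIDFromDagJSON("GmIWTBeVSRyOznxIsnBumg")
	if err != nil {
		panic(err)
	}
	fmt.Println(uuidString(id)) // 1a62164c-1795-491c-8ece-7c48b2706e9a

	fresh, err := newRequestID()
	if err != nil {
		panic(err)
	}
	fmt.Println(uuidString(fresh)) // random, with a 4 at the start of the third group
}
```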

* fix(requestid): use string as base type for RequestId

* chore(requestid): wrap requestid string in a struct

* feat(libp2p): add v1.0.0 network compatibility

* chore(net): resolve most cbor + uuid merge problems

* feat(net): to/from ipld bindnode types, more cbor protocol improvements

* feat(net): introduce 2.0.0 protocol for dag-cbor

* fix(net): more bindnode dag-cbor protocol fixes

Not quite working yet, still need some upstream fixes and no extensions work
has been attempted yet.

* chore(metadata): convert metadata to bindnode

* chore(net,extensions): wire up IPLD extensions, expose as Node instead of []byte

* Extensions now working with new dag-cbor network protocol
* dag-cbor network protocol still not default, most tests are still exercising
  the existing v1 protocol
* Metadata now using bindnode instead of cbor-gen
* []byte for deferred extensions decoding is now replaced with datamodel.Node
  everywhere. Internal extensions now using some form of go-ipld-prime
  decode to convert them to local types (metadata using bindnode, others using
  direct inspection).
* V1 protocol also using dag-cbor decode of extensions data and exporting the
  bytes - this may be a breaking change for existing extensions - need to check
  whether this should be done differently. Maybe a try-decode and if it fails
  export a wrapped Bytes Node?

* fix(src): fix imports

* fix(mod): clean up go.mod

* fix(net): refactor message version format code to separate packages

* feat(net): activate v2 network as default

* fix(src): build error

* chore: remove GraphSyncMessage#Loggable

Ref: #332 (comment)

* chore: remove intermediate v1.1 pb protocol message type

v1.1.0 was introduced to start the transition to UUID RequestIDs. That
change has since been combined with the switch to DAG-CBOR messaging format
for a v2.0.0 protocol. Thus, this interim v1.1.0 format is no longer needed
and has not been used at all in a released version of go-graphsync.

Fixes: filecoin-project/lightning-planning#14

* fix: clarify comments re dag-cbor extension data

As per discussion in #338, we are going
to be erroring on extension data that is not properly dag-cbor encoded from now
on

* feat: new LinkMetadata iface, integrate metadata into Response type (#342)

* feat(metadata): new LinkMetadata iface, integrate metadata into Response type

* LinkMetadata wrapper around existing metadata type to allow for easier
  backward-compat upgrade path
* integrate metadata directly into GraphSyncResponse type, moving it from an
  optional extension
* still deal with metadata as an extension for now—further work for v2 protocol
  will move it into the core message schema

Ref: #335

* feat(metadata): move metadata to core protocol, only use extension in v1 proto

* fix(metadata): bindnode expects Go enum strings to be at the type level

* fix(metadata): minor fixes, tidy up naming

* fix(metadata): make gofmt and staticcheck happy

* fix(metadata): docs and minor tweaks after review

Co-authored-by: Daniel Martí <[email protected]>

* fix: avoid double-encode for extension size estimation

Closes: filecoin-project/lightning-planning#15

* feat(requesttype): introduce RequestType enum to replace cancel&update bools (#352)

Closes: #345
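The RequestType change replaces two independent bools (cancel, update), which admit a meaningless combination, with a single enum that cannot express it. The sketch below illustrates the idea. The identifiers and the "c"/"u" wire strings are assumptions for illustration, since only "n" appears in the example messages in this PR.

```go
package main

import "fmt"

// requestType replaces a pair of independent flags (cancel, update).
type requestType int

const (
	reqNew requestType = iota
	reqCancel
	reqUpdate
)

// wireNames holds assumed single-letter encodings, indexed by requestType.
var wireNames = [...]string{reqNew: "n", reqCancel: "c", reqUpdate: "u"}

func (t requestType) String() string {
	if t < 0 || int(t) >= len(wireNames) {
		return fmt.Sprintf("requestType(%d)", int(t))
	}
	return wireNames[t]
}

// parseWire maps a wire string back to the enum.
func parseWire(s string) (requestType, error) {
	for i, name := range wireNames {
		if name == s {
			return requestType(i), nil
		}
	}
	return 0, fmt.Errorf("unknown request type %q", s)
}

// fromBools converts the old flag pair, rejecting the combination the enum
// makes unrepresentable.
func fromBools(cancel, update bool) (requestType, error) {
	switch {
	case cancel && update:
		return 0, fmt.Errorf("a request cannot be both a cancel and an update")
	case cancel:
		return reqCancel, nil
	case update:
		return reqUpdate, nil
	default:
		return reqNew, nil
	}
}

func main() {
	rt, err := parseWire("n")
	fmt.Println(rt, err) // n <nil>
	_, err = fromBools(true, true)
	fmt.Println(err)
}
```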

* fix(metadata): extend round-trip tests to byte representation (#350)

* feat!(messagev2): tweak dag-cbor message schema (#354)

* feat!(messagev2): tweak dag-cbor message schema

For:

1. Efficiency: compacting the noisy structures into tuples representations and
   making top-level components of a message optional.
2. Migrations: providing a secondary mechanism to lean on for versioning if we
   want a gentler upgrade path than libp2p protocol versioning.

Closes: #351

* fix(messagev2): adjust schema per feedback

* feat(graphsync): unify req & resp Pause, Unpause & Cancel by RequestID (#355)

* feat(graphsync): unify req & resp Pause, Unpause & Cancel by RequestID

Closes: #349

* fixup! feat(graphsync): unify req & resp Pause, Unpause & Cancel by RequestID

* fixup! feat(graphsync): unify req & resp Pause, Unpause & Cancel by RequestID

when using error type T, use *T with As, rather than **T

* fixup! feat(graphsync): unify req & resp Pause, Unpause & Cancel by RequestID

* fixup! feat(graphsync): unify req & resp Pause, Unpause & Cancel by RequestID

Co-authored-by: Daniel Martí <[email protected]>

* feat: SendUpdates() API to send only extension data via an existing request

* fix(responsemanager): send update while completing

If request has finished selector traversal but is still sending blocks,
I think it should be possible to send updates. As a side effect, this
fixes our race.

Logically, this makes sense, because our external indicator that we're
done (completed response listener) has not been called.

* fix(requestmanager): revert change to pointer type

* Refactor async loading for simplicity and correctness (#356)

* feat(reconciledloader): first working version of reconciled loader

* feat(traversalrecorder): add better recorder for traversals

* feat(reconciledloader): pipe reconciled loader through code

style(lint): fix static checks

* Update requestmanager/reconciledloader/injest.go

Co-authored-by: Rod Vagg <[email protected]>

* feat(reconciledloader): respond to PR comments

Co-authored-by: Rod Vagg <[email protected]>

* fix(requestmanager): update test for rebase

Co-authored-by: Daniel Martí <[email protected]>
Co-authored-by: hannahhoward <[email protected]>
@mvdan deleted the rvagg/ipld-schema-tweaks branch March 7, 2022 11:57
marten-seemann pushed a commit that referenced this pull request Mar 2, 2023
avoid panic when a decoder is not present for a voucher type