Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DagJSON/DagCBOR encode+decode changes selector traversal order? #220

Closed
wants to merge 2 commits into from

Conversation

hannahhoward
Copy link
Collaborator

@hannahhoward hannahhoward commented Aug 5, 2021

I attempted to upgrade go-graphsync to master and found my tests are now broken.

It looks like we added code to "sort" fields on encode/decode from DAG-JSON / DAG-CBOR per the IPLD spec I guess?

The problem here is that means that building an IPLD node, then encoding/decoding from DAG-JSON or DAG-CBOR produces a different selector traversal order, which is unexpected to say the least.

It's also of course important cause selector traversal order remaining constant is a fairly important gaurantee, at least for Filecoin.

I've produced a test to demonstrate the behavior.

Not sure if this is "works as intended", but definitely quite unexpected and will break several things.

I wonder if it makes sense to force things to get added "in order" when building nodes?

Oh also this might cause a lot of data to change CIDs as well.

@hannahhoward hannahhoward added the kind/bug A bug in existing code (including security flaws) label Aug 5, 2021
@willscott
Copy link
Member

Relaying some useful context from @rvagg :

the state of world regarding map key sorting is

  • Old Go stack sorts RFC7049 on encode (via refmt)
  • Old and new JS stack sorts RFC7049 on encode
  • ipld-prime did no sorting until now, due to ongoing arguments about where the responsibility for sorting lies, but also for performance concerns
  • cbor-gen does its own form of sorting, simple bytewise, but maps are not used on main chain at all

But the reason why this isn’t the end of the world is that we don’t care about round-trip consistency in most cases (we do in some, i.e. chain consensus, which is why we need to be careful and why avoiding maps (and floats) is a great choice). On decode, what matters most of the time is that our implementations see the fields in the order that they are encoded in. So ipld-prime does this, traversals will work on the blocks as they are so selector ordering is stable. We have a problem with our JS stack in that we don’t preserve this ordering properly on decode, which is one reason why selectors in JS are a bit of a problem.

@warpfork
Copy link
Collaborator

The problem here is that means that building an IPLD node, then encoding/decoding from DAG-JSON or DAG-CBOR produces a different selector traversal order, which is unexpected to say the least.

^ I'm also here. I think we've made more footguns by enforcing this sorting than we remove.

But I feel I've been outvoted.

@mvdan
Copy link
Contributor

mvdan commented Aug 10, 2021

@rvagg said:

On decode, what matters most of the time is that our implementations see the fields in the order that they are encoded in. So ipld-prime does this, traversals will work on the blocks as they are so selector ordering is stable.

Just so I can understand what's going on here better: what's causing breakage here is not the decoding, but the encoding, since the encoding by default now does the sorting as per the IPLD codec spec.

And again as per Rod's input, it seems like this will generally be fine if users avoid tricky types like maps, where the key order can add edge cases.

So if we have a system that already uses maps and requires determinism with encode-decode roundtrips, could we not compromise by forcing the codec Encode calls use "no map sorting"? That has existed since 083c395.

The question then might be "why should the default be to sort and break those users"? If we swapped the default to not sort map keys for the user, in my opinion we open up even bigger footguns, as the implementation of the codecs would simply not do what the spec requires by default.

Comparatively, deterministic traversal order when using encode-decode roundtrips while using maps feels like a fairly advanced use case. I think it's more reasonable for those users to opt out via MapSortMode or an equivalent option :)

@mvdan
Copy link
Contributor

mvdan commented Aug 10, 2021

To add another bit of info that might get lost in Slack, Rod has revived the cbor-gen PR to also make it follow the spec: whyrusleeping/cbor-gen#56

@rvagg
Copy link
Member

rvagg commented Aug 11, 2021

(slightly philosophical rant below, feel free to gloss)

IMO the horse has bolted on this, sorting was baked into the codec usage for IPLD, as far as I understand from the beginning, and it's based on the 7049 suggestions. If you want to do something else wrt map ordering then that should probably be a different codec.

One of the primary goals is that of convergence from data produced by multiple systems. When you only have a single implementation then whatever but we're already starting off with two equally prominent stacks and when we move into blockchain-space we have multiple implementations that are having to match the rules of the reference implementation and it can get gnarly (witness the hassles over big.Int for non-Go implementations).

The problem here is that means that building an IPLD node, then encoding/decoding from DAG-JSON or DAG-CBOR produces a different selector traversal order, which is unexpected to say the least.

This sentiment seems reasonable, but it's a shifting of the goalposts. If this is what we really want then we're signing up for significant investment in making this work in both of our stacks. I think there's an important point here about IPLD - it's slowly moved from being primarily about data serialization to be concerned with higher level concerns about instantiated forms, which we've largely captured under the term "data model". This really has been driven almost entirely by @warpfork through go-ipld-prime but has not, at all, been mirrored by the JavaScript stack which still views IPLD as a matter of data serialization and bytes.

Maybe we need to be honest about the status quo and the huge differentials its created and the risks therein, or we're going to be picking up big.Int scraps for years to come as we "just make it work" in JavaScript and any other stack that tries to follow the specifics opinions arising out of go-ipld-prime. @warpfork I understand where your head's at regarding this map ordering issue, for the most part, but you're not taking everyone else along for the ride, we're brute forcing these problems through the stack by means of go-ipld-prime without having a chance to solve them elsewhere. As I consider the problems of graphsync, and more specificially, selectors, in JavaScript I'm terrified at the rabbit holes of patching needed to make it work because nobody's going to allow any kind of official investment in mirroring the go-ipld-prime vision of IPLD in JavaScript. Even our specifications and most of our documentation are focused on serialization and avoid many of these issues of what it means for the "data model" to exist outside of serialization concerns. Even now the web3.storage team is puzzling over why CARs created by https://github.com/web3-storage/ipfs-car are different to those produced by ipfs dag export (with go-ipfs) ... The reasons are here, but how do they get addressed without patching over patches over patches for each new edge that doesn't quite fit.

@aschmahmann
Copy link

that building an IPLD node, then encoding/decoding from DAG-JSON or DAG-CBOR produces a different selector traversal order, which is unexpected to say the least.

Something I'd like to explore/poke at here a little is what the implications are of the above being considered a problem.

If we are given an IPLD node built with some node builder there is no way to know ahead of time whether we're going to be serializing it to DagJSON, DagCBOR, DagPB, GitRaw, EthTx, or AdinSillyFormat. Unless we were to absolutely preclude support for codecs that ordered any of their fields we'd be out of luck on round-tripping the ordering of anything that's not explicitly ordered (i.e. a list).

For example, say we have two codecs DagAdinAZ and DagAdinZA. The first orders maps from A to Z and the other does the reverse. When I use node builders to create an ipld.Node with a map builder that adds foo = 5 and bar = 6 there is no way that I can traverse that map in a way that would round-trip with both DagAdinAZ and DagAdinZA.

So how could we get out of this if we really wanted to be able to create nodes with mapbuilders that are traversed in order the same way as when encoded and decoded? These are what come to mind:

  1. Have an IPLD library spec that defines a single sort order for map traversals that applies at the data model layer (i.e. above the codec layer)
  2. Have go-ipld-prime refuse to run selectors against nodes that are not backed by a concrete object (i.e. you must traverse over data that's already been encoded to a particular codec)
  3. Have users make sure to add their data to the mapbuilder in sorted order knowing which codec(s) they will be encoding to ahead of time and if the sort orders could conflict then create multiple different nodes with identical data (aside from sorting) one which will be serialized to each codec
  4. Disallow support for any codec that has any defined field order

All of these seem like pretty bad options to me, although if anyone has alternative ideas or disagrees with the badness levels I'd be interested to hear.

This seems to then boil down to the data model only enforcing a universal sort (i.e. option 1) on a single data model type and that is List and not Map. IIUC in the case of the go-graphsync tests they are failing for a good reason which is that until encoded and decoded the ordering of maps cannot be determined. However, IIUC we should be able to deterministically traverse a CID+selector combination since our decoders aren't strict and will still parse non-sorted DagCBOR. This is essentially option 2, but localized to where it's needed (i.e. traversals over CIDs + selectors, since the CID is immutable and defines the encoding) as opposed to the library as a whole.

@BigLep
Copy link

BigLep commented Oct 29, 2021

What are the next steps here?

@BigLep BigLep added the need/author-input Needs input from the original author label Oct 29, 2021
@rvagg
Copy link
Member

rvagg commented Oct 29, 2021

I think this can be closed, discussion here is a good reference

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) need/author-input Needs input from the original author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants