-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: serve CBOR encoded DAG nodes from the gateway #8037
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one suggestion. Good to merge as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am supportive for this, with some caveats:
- CBOR is not JSON, so the request should explicitly ask for JSON (details inline)
- Cache-Control needs more nuance (details inline)
- Needs sharness tests similar to
test/sharness/t0115-gateway-dir-listing.sh
Note that we want to support CAR export too: ipfs/in-web-browsers#170
core/corehttp/gateway_handler.go
Outdated
@@ -523,6 +527,25 @@ func (i *gatewayHandler) serveFile(w http.ResponseWriter, req *http.Request, nam | |||
http.ServeContent(w, req, name, modtime, content) | |||
} | |||
|
|||
func (i *gatewayHandler) serveCBOR(w http.ResponseWriter, r *http.Request, n format.Node) { | |||
w.Header().Set("Cache-Control", "public, max-age=29030400, immutable") | |||
w.Header().Set("Content-Type", "application/json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this. We need to make a decision on default behavior before this ships:
- @warpfork suggested we should aim at returning more user-friendly HTML by default, similar to https://explore.ipld.io, for all non-unixfs DAGs.
- I kinda agree: we already return custom HTML for directory listings (and not JSON)
- We want gateways to support everything, not just dag-cbor
- What if for dag-cbor we return JSON only when request had an explicit
Content-Type: application/json
header?- This way makes semantic sense (I explicitly ask for JSON representation of dag-cbor) and won't break if we add HTML explorer for IPLD universe as the default response
- For now, if content-type is missing from request, return error saying that "To preview dag-cbor via gateway, request with
Content-type: application/json
header"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikeal pointed out that
there’s a lot of caches out there that aren’t smart about this and you’ll get responses from cache that don’t match the accept header
This is really unfortunate, but likely relevant concern.
I was thinking maybe gateway code could consider file extension if there is one and infer Content-Type from there ? E.g. if content targeted is ipfs://${cid}/metadata.json
in dag-cbor format it's probably reasonable to assume that content type application/json
should be used. In fact that seems like a reasonable approach to dag-pb as well.
Although I realize this is getting uncomfortable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be ok with streaming the bytes as application/cbor
and add a querystring parameter ?json
or ?encoding=json
to have it converted to JSON with a application/json
header.
IMHO we should be serving data from the gateway that is by default the easiest format to consume from a web application. JSON is that. Also, right now we have NO support, and this is significantly better than nothing and what's here is consistent with the CLI.
So I'd prefer to keep what's happening here and maybe add ?encoding=raw
for getting the raw cbor bytes.
I like @Gozala's idea but it feels like too many special cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should keep complexity at bay and avoid special-casing different DAG types too much.
I propose we introduce DAG CBOR support in (IMO) the least controversial way:
- Stream
application/cbor
by default- Returning the original, space-efficient format makes more sense than JSON
- Rationale: Gateway should prioritize returning original bytes if possible and avoid conversion in default mode. If someone wants JSON by default, then they should use
dag-json
– can we adddag-json
as part of this PR as well?
- Rationale: Gateway should prioritize returning original bytes if possible and avoid conversion in default mode. If someone wants JSON by default, then they should use
- IIUC dag-cbor is a subset of cbor, so this is a valid content-type (I'd like to avoid introducing proprietary one if we don't have to).
- Returning the original, space-efficient format makes more sense than JSON
- JSON requires conversion step, so we
should supportapplication/json
as an explicit opt-in by either:- requesting
Content-Type: application/json
- Rationale: standards matter and we can't be sloppy if we want to be respected by serious vendors and standards bodies
- OR passing
?enc=json
parameter (as a fallback way for use in poorly written software)- Rationale: the same parameter could be re-used for returning JSON directory listings in Gateway: client opt-in JSON responses for directory listing #7552, and we already have
--enc=json
inipfs
CLI, so this is keeping our conventions neat.
- Rationale: the same parameter could be re-used for returning JSON directory listings in Gateway: client opt-in JSON responses for directory listing #7552, and we already have
- requesting
Update: I no longer feel as strong about keeping CBOR as default, see #8037 (review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just require the header and the querystring. The problem you’ll run into with it only being a fallback is that the user, client developer, and the meddling proxy/cache are different actors and can’t always coordinate whether or not to explicitly opt into a feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean we should require both?
That is too wacky, and bad for DX/UX: makes it impossible to load JSON via address bar in browser.
If someone uses ductape-based CDN/middleware and Content-Type
does not work, then they add ?enc=json
, but we should not make redundant opt-in mandatory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point on the address bar. It would actually have inconsistent behavior based on your browser plugins because everyone with JSON viewer plugin includes the content type in browser requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's continue in ipfs/in-web-browsers#182
@@ -523,6 +527,25 @@ func (i *gatewayHandler) serveFile(w http.ResponseWriter, req *http.Request, nam | |||
http.ServeContent(w, req, name, modtime, content) | |||
} | |||
|
|||
func (i *gatewayHandler) serveCBOR(w http.ResponseWriter, r *http.Request, n format.Node) { | |||
w.Header().Set("Cache-Control", "public, max-age=29030400, immutable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is invalid if DAG is fetched from anything other than /ipfs/
like /ipns/
(not immutable). See gateway_handler.go#L318-L319.
This may require a bigger refactor, as we want to avoid separate code paths for unixfs and cbor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about this some more, and playing with this PR, I feel @alanshaw convinced me that defaulting dag-cbor
response to application/json
contributes to a way better onboarding flow for newcomers and also better interop (numbers game, more tools speak JSON than CBOR, including web browsers).
I created ipfs/in-web-browsers#182 to discuss the overall strategy when it comes to IPLD on gateways, the default behavior is one of open questions there – would like to hear more voices before we commit to this.
As for this PR, needs sharness tests for dag-pb
requests ensuring expected response with:
- default behavior (we may flip it to JSON before merging this)
- request with
Content-Type: application/cbor
- request with
?enc=cbor
- request with
Content-Type: application/json
- request with
?enc=json
@alanshaw lmk if you need help with sharness
I've come to realise a possible reason for CBOR (or other encoding) not being implemented yet is pathing within a node - typically you can |
@alanshaw iiuc our path traversal semantics, pathing makes sense only when we talk about separate documents, so it should work only for CID tags within Also, pathing makes it impossible to extract more than one value, so I suggested support for graphql-like queries that return only a subset of fields from a document in ipfs/in-web-browsers#182, but that requires more design discussion (needs to be future-proof and compatible with IPLD selectors), lets descope it from this PR. Remaining work here (so we dont miss anything):
|
@alanshaw After sleeping on this I think we should not return JSON by default, but instead return an error informing user to retry request with explicit format (JSON or CBOR) either via That ensures devs will build on deterministic response types and enable us to introduce user-friendly UI for exploring DAGs in the future, as suggested in ipfs/in-web-browsers#182 (comment) |
Big agree, 💯 . I think we should leave room for that kind of experience to be built into the Gateway. It would be consistent with what we do for unixfsv1 directories already and we should leave room to carry that design logic further; and for the encoding types, explicit is good. |
Let me join the chorus suggesting against converting objects to JSON by default. Despite it being almost the only case right now, unixfs is a special case of IPLD data that can be interpreted by a gateway. When a client asks the gateway for some CID and that CID is a unixfs root, the gateway doesn't give the client what they actually asked for (an IPLD node), but infers a request for the DAG data to be traversed and rendered to a flat file (which does not have the same hash). Or analogous behaviour for a directory. This is great for the use case of URLs in hyperlinks, rendering in browsers etc. We don't ask the client to pass headers or query parameters for this "special" rendering behaviour because it's too awkward for the browser URL bar case; so the default is to render. Imagine some node that the gateway currently doesn't know how to render, but in the future might be told how to render as something a browser can handle. We should preserve a similar default approach of assuming that the request is from a browser URL bar. Maybe some third party has an alternative schema for chunking flat files, and we later add support for it too. We don't want to change the response data and break old clients relying on the default JSON. Most other contexts outside a browser URL bar have easy capacity to pass parameters describing what content they want (JS code in browsers, server-side code, When serving IPLD data, another reason not to JSONify everything is that it breaks the verifiability of the content from the CID. It's ok for that to be an explicit request, but unfortunate as a default behaviour. JS code can parse CBOR too.
I would tweak this. We should be serving data from the gateway that is by default the easiest format to consume from a web browser, because the URL bar is the least good place to require parameters describing what the client wants, and is already set by precedent. But web applications are just as capable as other clients of requesting a particular data format – the request and response will both usually be handled by code. |
Switched to draft while this is still undergoing some design discussion and isn't quite ready for mergable-reviewing. |
This might become much easier once we have #7995 which will allow for a |
return | ||
var contentType string | ||
var data []byte | ||
if r.URL.Query().Get("enc") == "json" || r.Header.Get("Content-Type") == "application/json" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be: r.Header.Get("Accept")
"Content-Type" in the request is about the body of the request, not about the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... oh, except "Accept" can be a comma-separated list, so the header value needs to be parsed - not just ==
compared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to work pretty well:
diff --git a/core/corehttp/gateway_handler.go b/core/corehttp/gateway_handler.go
index 11e5db671..5b847ae10 100644
--- a/core/corehttp/gateway_handler.go
+++ b/core/corehttp/gateway_handler.go
@@ -39,6 +39,7 @@ const (
)
var onlyAscii = regexp.MustCompile("[[:^ascii:]]")
+var anyJSON = regexp.MustCompile(`^application/(|.+\+)json$`)
// HTML-based redirect for errors which can be recovered from, but we want
// to provide hint to people that they should fix things on their end.
@@ -537,9 +538,24 @@ func (i *gatewayHandler) serveCBOR(w http.ResponseWriter, r *http.Request, n for
}
modtime := time.Unix(1, 0)
+ useJSON := false
+ if r.URL.Query().Get("enc") == "json" {
+ useJSON = true
+ } else {
+ for _, v := range r.Header.Values("Accept") {
+ for _, a := range strings.Split(v, ",") {
+ mediatype, _, err := mime.ParseMediaType(a)
+ if err == nil && anyJSON.MatchString(mediatype) {
+ useJSON = true
+ break
+ }
+ }
+ }
+ }
+
var contentType string
var data []byte
- if r.URL.Query().Get("enc") == "json" || r.Header.Get("Content-Type") == "application/json" {
+ if useJSON {
contentType = "application/json"
b, err := json.Marshal(n)
if err != nil {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d'oh! : forgot the label on the outer loop so the break can break all the way out.
Quick check-in note:
|
Note to self: after #8758 lands we revisit this and add dag-cbor and dag-json formats using the newly created reusable code for cache control and content disposition. |
@BigLep yes, this is unblocked, I've created #8823 with some additional notes, so it is not blocked on my availability. Aiming at 0.14 sounds good: ideally we would resolve #8568 before this ships, as people will start using dag-json and dag-cbor way more thanks to this, and we want good UX around it, but in this specific case it is not a hard blocker. Implementation wise, it should be pretty easy to add on top of refactor that happened in #8758: Anyone with spare time could do it, given a spare day.
(at minimum, we want to test the real dag-json and dag-cbor + have one for dag-pb represented as dag-json) |
Continued in #9335 |
This PR enables the gateway to serve CBOR encoded DAG nodes. They are returned as JSON objects, the same encoding used on the CLI.
Closes #8823 and ipfs/in-web-browsers#182