-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: JSON decoder / encoder fns #403
Conversation
...and move some utility schemas to a shared util.validation namespace. These schemas are useful in the http-api-gateway for coercing & encoding & decoding requests & responses to/from JSON.
1b7abf1
to
775623e
Compare
(fn [v] | ||
(log/debug "decoding iri key:" v) | ||
(if (string? v) | ||
(if (str/includes? v "://") ; non-compact IRI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this non-compact iri test may need to be a bit more sophisticated - here's the spec: https://www.rfc-editor.org/rfc/rfc3987#section-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there some that could contain a colon but not contain ://
? That's all this needs to ensure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think a colon is all you need to be an iri: foo:foo
is a valid fully expanded iri.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which should or should not be converted into :foo/foo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these should get touched. We’d never want to convert id => :id nor convert a string compact iri to a keyword compact iri as the @context meaning would break.
This would not only produce incorrect results but the concept of it introduces a bunch of extra compute for no real value, even if it would work (which it won’t).
Maybe I’m misunderstanding how this is intended to be used, is there a ticket for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: a colon designating an iri - we can/will store relative iris which would not have a colon until expanded with a default @base context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, here's a deeper dive explanation of what this does and why:
The original problem I set out to solve was described by this bug: fluree/http-api-gateway#25. The TL;DR of that issue is that queries like this weren't working:
{
"ledger": "example/time-travel",
"query": {
"commit-details": true,
"t": {
"at": "latest"
}
}
}
The core issue there is that when application/json
requests come in we need to convert some (but not all) map keys to Clojure keywords before passing the request into the fluree.db.json-ld.api
fns. Typically the outermost map keys need to be keywords but we need to leave the inner JSON-LD data alone so we didn't end up with keys like :schema:Person
(which wouldn't match the string-based context, etc.).
My first cut at a solution for this was a custom fluree-json-ld-decoder
in http-api-gateway (defined here: https://github.com/fluree/http-api-gateway/blob/e216d7fc30a55afb55ef845f50307bd11eb3eba2/src/fluree/http_api/components/http.clj#L100-L118) that converts only the outermost map keys from strings to keywords. This worked for many types of requests, but notably not that history query from the bug mentioned above.
That history query needs the "commit-details"
and "t"
keys under the "query"
key converted to keywords also. There are a few ways that could be solved, of course, but it was starting to feel like this string / keyword bug whack-a-mole was going to constantly plague us unless we could find a more comprehensive solution for it.
The goal I had in mind was to get to one standard internal format whenever possible, regardless of the format of the request or the requested response format. That one standard internal format could use string keys or keyword keys, but keywords felt more natural and idiomatic for Clojure.
So this solution uses malli schemas to coerce & transform incoming requests and outgoing responses to/from this internal format to whatever the incoming Content-Type
was and whatever the requested Accept
format was (if supported). It also coerces the context to keywords regardless of what format it is in the request. It does the colon -> namespacing conversion when it converts a string compact IRI in a JSON request to a Clojure keyword and vice versa when encoding back to JSON. I think this could obviate the need for the separate :context-str
data structure in db, but I haven't fully explored that yet.
This solution is certainly not comprehensive nor fully correct-to-spec yet, but I would be happy to push it further in that direction before merging this if anyone thinks that would be important at this stage. But it is important to note that this covers a lot more use cases than the previous solution to this same problem. And it removes the need for that fluree-json-ld-decoder
in http-api-gateway. It may also introduce new bugs, though, and if we have some use cases in mind already that might not work with this current implementation, I would be happy to add tests for them and get them covered too.
In the performance department, malli seems to be much faster at this than most hand-rolled solutions. Since this is something we need to do anyway (that is, convert from the incoming raw request to a data structure that the f.d.json-ld.api
fns can accept), malli appears to be one of the most performant ways to do it.
So, in the end, this allows us to have simpler, more performant code because it can make more assumptions, do less work, and leverages all of the hard work that went into making malli as fast as it is. And it is just scratching the surface of what we can do with malli to cover additional use cases.
I'd be happy to hop on a screen share to go over this in more detail if there are still questions or concerns that this explanation doesn't cover. In spite of its approved status, I'm going to leave this PR (and its counterpart in http-api-gateway: fluree/http-api-gateway#29) open until early next week at least so we can continue to discuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🪜
@@ -27,8 +30,6 @@ | |||
(and (or (string? x) (symbol? x) (keyword? x)) | |||
(-> x name first (= \?)))) | |||
|
|||
(def value? (complement coll?)) | |||
|
|||
(defn sid? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably beyond the scope of this pr, but i think this function could go in the new validation util ns too.
::var [:fn {:decode/json | ||
(fn [v] | ||
(log/debug "decoding var:" v) | ||
(if (string? v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the variable?
fn already does a check for if this is a string or symbol iirc. why do we need to repeat that at the json decode level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the :decode/json
fn runs before validation / coercion. It even runs on things that malli ultimately decides don't conform to that (e.g. if they're a part of an :or
schema or similar). So you have to do a little sanity checking and just return the arg if it's not something you want to transform.
For example, the selector (i.e. value of the :select
key in a query) schema looks like this:
[:orn
[:var ::var]
[:pred ::iri-pred]
[:aggregate ::function]
[:select-map ::select-map]]
...which means that this decoder fn will get passed the {?p [:*]}
map when you have a query like {:select '{?p [:*]} ...}
because we've told it it could be a ::var
. So it has to know to ignore that.
v))} | ||
variable?] | ||
::val [:fn v/value?] | ||
::iri v/iri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad we have tighter validation for these.
Update: I'm going to investigate a context-aware version of this on Monday. I have a SHACL use case @mpoffald identified that shows where this implementation is a bit too simple and gets some things wrong, as we suspected it might. So I'll turn that into a test and then see if I can get something that at least leaves things alone that it shouldn't change while still fixing the same issues that this current branch fixes. |
These don't recurse back into ::where-map so no need to make them :ref's
Closing for now. Going to approach this differently. |
Part of fixing fluree/http-api-gateway#25 (for real). This allows the malli schema to decode & coerce JSON into our internal format and then re-encode to JSON when outputting that format. For example, it can transform
"foo:bar"
into:foo/bar
and vice versa.The latest commits pushed to fluree/http-api-gateway#29 rely on these changes.