Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: JSON decoder / encoder fns #403

Closed
wants to merge 5 commits into from

Conversation

cap10morgan
Copy link
Contributor

@cap10morgan cap10morgan commented Feb 28, 2023

Part of fixing fluree/http-api-gateway#25 (for real). This allows the malli schema to decode & coerce JSON into our internal format and then re-encode to JSON when outputting that format. For example, it can transform "foo:bar" into :foo/bar and vice versa.

The latest commits pushed to fluree/http-api-gateway#29 rely on these changes.

@cap10morgan cap10morgan requested a review from a team February 28, 2023 20:42
...and move some utility schemas to a shared util.validation namespace.

These schemas are useful in the http-api-gateway for coercing & encoding & decoding requests & responses to/from JSON.
(fn [v]
(log/debug "decoding iri key:" v)
(if (string? v)
(if (str/includes? v "://") ; non-compact IRI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this non-compact iri test may need to be a bit more sophisticated - here's the spec: https://www.rfc-editor.org/rfc/rfc3987#section-2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there some that could contain a colon but not contain ://? That's all this needs to ensure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think a colon is all you need to be an iri: foo:foo is a valid fully expanded iri.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which should or should not be converted into :foo/foo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these should get touched. We’d never want to convert id => :id nor convert a string compact iri to a keyword compact iri as the @context meaning would break.

This would not only produce incorrect results but the concept of it introduces a bunch of extra compute for no real value, even if it would work (which it won’t).

Maybe I’m misunderstanding how this is intended to be used, is there a ticket for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: a colon designating an iri - we can/will store relative iris which would not have a colon until expanded with a default @base context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, here's a deeper dive explanation of what this does and why:

The original problem I set out to solve was described by this bug: fluree/http-api-gateway#25. The TL;DR of that issue is that queries like this weren't working:

{
  "ledger": "example/time-travel",
  "query": {
     "commit-details": true,
     "t": {
        "at": "latest"
      }
   }
}

The core issue there is that when application/json requests come in we need to convert some (but not all) map keys to Clojure keywords before passing the request into the fluree.db.json-ld.api fns. Typically the outermost map keys need to be keywords but we need to leave the inner JSON-LD data alone so we didn't end up with keys like :schema:Person (which wouldn't match the string-based context, etc.).

My first cut at a solution for this was a custom fluree-json-ld-decoder in http-api-gateway (defined here: https://github.com/fluree/http-api-gateway/blob/e216d7fc30a55afb55ef845f50307bd11eb3eba2/src/fluree/http_api/components/http.clj#L100-L118) that converts only the outermost map keys from strings to keywords. This worked for many types of requests, but notably not that history query from the bug mentioned above.

That history query needs the "commit-details" and "t" keys under the "query" key converted to keywords also. There are a few ways that could be solved, of course, but it was starting to feel like this string / keyword bug whack-a-mole was going to constantly plague us unless we could find a more comprehensive solution for it.

The goal I had in mind was to get to one standard internal format whenever possible, regardless of the format of the request or the requested response format. That one standard internal format could use string keys or keyword keys, but keywords felt more natural and idiomatic for Clojure.

So this solution uses malli schemas to coerce & transform incoming requests and outgoing responses to/from this internal format to whatever the incoming Content-Type was and whatever the requested Accept format was (if supported). It also coerces the context to keywords regardless of what format it is in the request. It does the colon -> namespacing conversion when it converts a string compact IRI in a JSON request to a Clojure keyword and vice versa when encoding back to JSON. I think this could obviate the need for the separate :context-str data structure in db, but I haven't fully explored that yet.

This solution is certainly not comprehensive nor fully correct-to-spec yet, but I would be happy to push it further in that direction before merging this if anyone thinks that would be important at this stage. But it is important to note that this covers a lot more use cases than the previous solution to this same problem. And it removes the need for that fluree-json-ld-decoder in http-api-gateway. It may also introduce new bugs, though, and if we have some use cases in mind already that might not work with this current implementation, I would be happy to add tests for them and get them covered too.

In the performance department, malli seems to be much faster at this than most hand-rolled solutions. Since this is something we need to do anyway (that is, convert from the incoming raw request to a data structure that the f.d.json-ld.api fns can accept), malli appears to be one of the most performant ways to do it.

So, in the end, this allows us to have simpler, more performant code because it can make more assumptions, do less work, and leverages all of the hard work that went into making malli as fast as it is. And it is just scratching the surface of what we can do with malli to cover additional use cases.

I'd be happy to hop on a screen share to go over this in more detail if there are still questions or concerns that this explanation doesn't cover. In spite of its approved status, I'm going to leave this PR (and its counterpart in http-api-gateway: fluree/http-api-gateway#29) open until early next week at least so we can continue to discuss.

Copy link
Contributor

@zonotope zonotope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🪜

@@ -27,8 +30,6 @@
(and (or (string? x) (symbol? x) (keyword? x))
(-> x name first (= \?))))

(def value? (complement coll?))

(defn sid?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably beyond the scope of this pr, but i think this function could go in the new validation util ns too.

::var [:fn {:decode/json
(fn [v]
(log/debug "decoding var:" v)
(if (string? v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable? fn already does a check for if this is a string or symbol iirc. why do we need to repeat that at the json decode level?

Copy link
Contributor Author

@cap10morgan cap10morgan Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the :decode/json fn runs before validation / coercion. It even runs on things that malli ultimately decides don't conform to that (e.g. if they're a part of an :or schema or similar). So you have to do a little sanity checking and just return the arg if it's not something you want to transform.

For example, the selector (i.e. value of the :select key in a query) schema looks like this:

[:orn
 [:var ::var]
 [:pred ::iri-pred]
 [:aggregate ::function]
 [:select-map ::select-map]]

...which means that this decoder fn will get passed the {?p [:*]} map when you have a query like {:select '{?p [:*]} ...} because we've told it it could be a ::var. So it has to know to ignore that.

v))}
variable?]
::val [:fn v/value?]
::iri v/iri
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad we have tighter validation for these.

@cap10morgan
Copy link
Contributor Author

Update: I'm going to investigate a context-aware version of this on Monday. I have a SHACL use case @mpoffald identified that shows where this implementation is a bit too simple and gets some things wrong, as we suspected it might. So I'll turn that into a test and then see if I can get something that at least leaves things alone that it shouldn't change while still fixing the same issues that this current branch fixes.

These don't recurse back into ::where-map so no need to make them :ref's
@cap10morgan
Copy link
Contributor Author

Closing for now. Going to approach this differently.

@cap10morgan cap10morgan deleted the feature/json-decoder-fns branch March 10, 2023 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants