Handling subtypes in relationships + for validation/middleware #149
@carlbennettnz Since both of the solutions above could require adapter changes, can you weigh in here if you have a preference? I don't want to do anything that causes too much trouble for your postgres adapter. Also, lmk if my post is unclear in places. I wrote it as a "think out loud" exercise to myself, so there might be some places where I'm using non-standard phrases that only make sense in my head. (That's also why it has all the rejected approaches and dead ends.) |
Some other notes about/implications of option 2:
|
Hmm, tricky problem. Here are some assorted thoughts. Whichever solution you choose, it probably won't affect my postgres adapter too much, mostly because I haven't implemented subtypes at all yet. When I eventually do (I'm planning on doing the last few things needed before PRing it sometime next month), I can't foresee any major issues with either option. I'll run through the implications of solution 1 for each relationship type.

**To-many relationships**

A response like this

```json
{
  "type": "authors",
  "id": "1",
  "relationships": {
    "posts": {
      "data": [
        { "type": "posts", "id": "1" },
        { "type": "posts", "id": "2" }
      ]
    }
  }
}
```

is generated using a SQL query like this:

```sql
SELECT author.*, array_agg(post.id)
FROM author
LEFT JOIN post ON post.author = author.id
GROUP BY author.id;
```

If I wanted to add a posts subtype, fancy-posts, I could use the hidden `tableoid` system column. A response like this

```json
{
  "type": "authors",
  "id": "1",
  "relationships": {
    "posts": {
      "data": [
        { "type": "posts", "id": "1" },
        { "type": "fancy-posts", "id": "2" }
      ]
    }
  }
}
```

could be generated with this query

```sql
SELECT author.*, array_agg((post.id, post.tableoid))
FROM author
LEFT JOIN post ON post.author = author.id
GROUP BY author.id;
```

with no serious overhead.

**To-one relationships**

These would take a slight hit. Currently, the foreign key is loaded directly without a join. That would change to this:

```sql
SELECT author.*, manager.tableoid
FROM author
LEFT JOIN manager ON manager.id = author.manager;
```

so one extra join would be needed for every foreign key referencing a parent type. Not really a big deal IMO, and it's invisible to the library user.

**Many-to-many relationships**

These would also take a minor hit on render. The current situation

```sql
CREATE TABLE post (id int, title text);
CREATE TABLE tag (id int, name text);
CREATE TABLE post_tag (post int REFERENCES post (id), tag int REFERENCES tag (id));

SELECT post.*, array_agg(post_tag.tag)
FROM post
LEFT JOIN post_tag ON post_tag.post = post.id
GROUP BY post.id;
```

would change to

```sql
SELECT post.*, array_agg((post_tag.tag, tag.tableoid))
FROM post
LEFT JOIN post_tag ON post_tag.post = post.id
LEFT JOIN tag ON post_tag.tag = tag.id
GROUP BY post.id;
```

So again, an extra join per foreign key referencing a parent type.

The only other implication I can think of is that I would need to check that the type the client sent is actually correct. I can imagine a security issue arising through the client sending a PATCH for a school, but setting the type field to 'organization', bypassing the extra access control for schools. Inherited tables in postgres are exposed transparently enough that an UPDATE on an organization that's actually a school could conceivably be made without errors. However, this is just a general issue with implementing JSON API subtypes and probably not relevant here.

That's all I've got for now, but I'll comment again if I think of anything else. Also, I haven't forgotten about that full review of v3 I promised, just really busy right now. I'll try to find some time to go through it soon. |
Ok, glad to hear it! Based on that, and thinking about it more, I've decided to go with option 2. Performance tradeoffs aside, I also like option 2 much better for:
My intention is to free the adapter from having to do these checks as much as possible. The idea is that the adapter should only have to implement a function for getting the full set of types for a (list of) resources by their ids. I'm working on a PR for this now, and will try to have something soon. |
Sweet. If you could document the specifics of how yours works, I'll make sure the behaviour of mine matches. |
Resolves #149. This is a pretty major change, as it touches most parts of the system. In particular:

- The Resource and ResourceIdentifier types now have `typesList`, `typePath`, and `adapterExtra` members. See the Resource class for details on their role. These members are set as appropriate (on update and create) by the APIController using the setTypePaths function, which is now passed to all query factories.
- The transform logic has been rewritten to read from Resource.typePath instead of Resource.type, while still reading from ResourceIdentifierObject.type for transforming linkage (which can now occasionally have different results; see UPGRADING.md).
- Various validation routines have been updated to account for the new values in the `type` key and have been split up where appropriate (e.g. validate-resources => validate-resource-types, validate-resource-ids, and validate-resource-data).
- The adapter interface has been updated to require Adapters to provide an instance method called `getModelName` (before, this was required as a static method) and an instance method called `getTypePaths`. See MongooseAdapter for an example. Other MongooseAdapter methods have been removed. See UPGRADING.md. In the process, the adapter has been cleaned up, and adapter-specific tests have been pulled out into a separate `tests/integration/mongoose-adapter` file.
- Using multiple adapters simultaneously is no longer possible. See UPGRADING.md.
- Lots of new tests have been added.
@carlbennettnz I've implemented this here and published the latest as 3.0.0-beta.11. I've tried to document things reasonably thoroughly in the commit message and in the UPGRADING file, but there's a lot that changed, so let me know if anything's missing. The solution ended up being basically what we discussed above, with some small naming tweaks. |
Thanks for this, especially the incredibly thorough commit message and UPGRADING file. I'm looking through it now. |
@carlbennettnz Cool cool. Random question for you: in your Postgres adapter/with your fork of the library, how do you validate incoming data for field-specific requirements? Like, if you have a zip code field that needs to be x-y digits long, where is that checked? In beforeSave, I guess? If so, are those checks written directly in each model's beforeSave functions, or is there some abstraction for declaring the validation constraints in the model, and then automatically applying those in beforeSave? |
We're checking simple constraints like that in the database itself:

```sql
CREATE TABLE zip (
  code varchar(10) CHECK (char_length(code) BETWEEN 5 AND 10)
);
```

Our access control system produces resource descriptions that check AC policies and also call the functions defined in each model, which looks like this:

```js
// models/model-name.js
module.exports = {
  attributes: [],
  relationships: [],
  validate() {},
  transforms: {
    beforeRender() {},
    beforeSave() {},
    beforeDelete() {}
  }
}
``` |
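For illustration, such a resource description might be filled in like this, reusing the zip example from earlier in the thread (a hypothetical sketch; the attribute name, resource shape, and error style are invented, and the db CHECK constraint remains the real gatekeeper):

```javascript
// models/zip.js -- hypothetical sketch of a filled-in resource description.
// The db CHECK constraint is authoritative; this just fails fast with a
// friendlier error before the query is even attempted.
const zipModel = {
  attributes: ['code'],
  relationships: [],
  validate(resource) {
    const code = resource.attrs && resource.attrs.code;
    if (typeof code !== 'string' || code.length < 5 || code.length > 10) {
      throw new Error('code must be 5-10 characters long');
    }
  },
  transforms: {
    beforeRender(resource) { return resource; },
    beforeSave(resource) { return resource; },
    beforeDelete() {}
  }
};

module.exports = zipModel;
```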
Awesome, thanks! |
Would you mind taking a look at ethanresnick/json-api-example#v3-wip to see if anything subtle has fallen out of date? I got it running with a few minor tweaks, but the following req adds an organization, not a school, for me. 99% sure I'm doing the same thing as the tests.
|
Good catch. Should be fixed now in the example repo. I think the issue was just that the mongoose dependency was too old. (Mongoose < 4.8 had some bugs around discriminators.) I'll document this new requirement too. |
I'm still experimenting and writing up my feedback (there are quite a few points I hadn't considered earlier about actually consuming the API, especially with libraries like Ember Data) but while I'm working on that, is this summary of how subtyping works accurate? I just want to check my understanding of the changes to the public API. Fetching
Creating
Updating
Deleting
|
Also, here's a few (mostly minor) issues I found while testing/reading.
|
Thanks for doing such a thorough review. Having you as a second set of eyes to sanity check the logic, and verify that the implementation is actually doing what it's supposed to do, is super useful. Most of what you've described is spot on, so I'll just respond to the bits where I have something to add.
In the serialization of
Maybe I'm crazy, but I'm pretty sure the
Yeah, these behaviors are a bug. The adapter should at least verify that the type the client provided is valid.

Arguably, the adapter should go further and verify that there actually is a document that exists with that id and that its discriminator key indicates that it is really a school, but I think that check is lower priority/I go back and forth on whether it's worth the overhead. (That check for the actual existence of a document with the related id didn't happen before subtypes were introduced either, and would require another query for each relationship in mongo land. Checking the existence of the related resource is easier in SQL land, I imagine, as the foreign key constraint should just fail.)
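The minimal check described here (that a client-supplied linkage type is the relationship's declared type or a known subtype of it) could be sketched as follows; the `subtypesOf` registry and function name are hypothetical, not the library's actual API:

```javascript
// Hypothetical registry: parent type -> set of type names valid in linkage
// for relationships declared against that parent.
const subtypesOf = {
  organizations: new Set(['organizations', 'schools'])
};

// True iff the type a client put in a linkage object is the relationship's
// declared (parent) type or one of its known subtypes.
function isValidLinkageType(declaredType, providedType) {
  const valid = subtypesOf[declaredType] || new Set([declaredType]);
  return valid.has(providedType);
}
```

The stronger check (that the referenced document exists and really has that discriminator) would still require a query, as noted above.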
The logic is supposed to be: in all cases,
Yeah, that's a bad bug. I was pretty beat by the time I was working on the relationship-related adapter methods, so I'm kinda not surprised that those aren't working. I'll definitely add some tests, and then try to fix whatever stupid bug I introduced in my state of "can this feature please be done by now?".
D'oh! And that, kids, is why it's important to see your tests fail before they pass :D Thanks for catching this. Just pushed a fix.
Hmm, I can't reproduce this... even when I drop the whole db, the tests seem to pass on next run. There are some tests that depend on some fixtures data being inserted first, though. On my machine, that data gets added as part of running the test command, but maybe those fixtures aren't running for you for some reason?
I'm not sure I follow. If you're talking about the example API, I left out the
The bug as I understand it really comes up anywhere the |
Nope, I'm the crazy one. No idea where I got the idea that this was a new thing.
Yep, this was actually a big factor in why we switched. We were getting more and more issues from relationships that pointed to nowhere. Postgres now checks that references are valid both when adding references and when deleting/updating referenced records.
Oh boy do I know that feeling
Ah sorry, I forgot to mention I was only running the subtype tests. I get 5 failures with this:
Ah, okay. Should have looked a little closer here. Ignore me, it's working perfectly.
Woah, cool. I never realised there was a way to protect against someone messing with |
Makes sense. Does this new approach to subtyping make that check (prohibitively) harder (because you also, I think, would want to check the discriminator key)?
Ahh, gotcha. Yeah, that'll fail. I've updated the library to pull out installing the fixtures into a separate npm command. Now, if you do |
Right now, there's a bug with how subtypes work (at least when backed by the Mongoose adapter, which implements subtypes with Mongo discriminators).

E.g., suppose you have type `Organization`, with a subtype of `School`. Also, imagine there's a `Person` type that has a `manages` relationship pointing to an `Organization`. In Mongoose, that becomes:

The problem is that, when the library serializes a person document, all it has to go on for serializing the `manages` relationship is an ObjectId in the document and a schema saying that id points to a document in the collection for the Organization model. Accordingly, it renders the relationship as:

When the related document is actually rendered, however (whether directly or through an include), its `type` key's value might be `"schools"` instead of `"organizations"`, because, currently, the discriminator is read to produce that type. This causes a big problem, as it makes it impossible to match up the related resource to the resource identifier object in the person resource.

**Solutions**
**Return proper subtype in resource identifier objects**

One fix here would be to detect whether a relationship is pointing to a type that has subtypes (like Organization) and, if so, actually query for the discriminator key of the related documents, to check its value before rendering the resource identifier object. (In SQL, this would be a simple join.)

In Mongo, this could be done with an aggregate query leveraging `$lookup`. That query is more complex, and so would presumably take longer to execute (way longer if not all of the database's data fits in memory), but I doubt this is a big issue, and at least it wouldn't require multiple network roundtrips. There would be some other issues, but they should all be surmountable. E.g.:

- The results that come out of the aggregate won't be Mongoose model instances (which makes sense, because they could have any shape). That means that Mongoose document initialization operations won't happen (idk what these include -- casting fields?) and pre/post `init` middleware won't run. It is possible, though, to instantiate Mongoose docs from the aggregate result (using `Model.hydrate`), which should solve this.
- find/findOne query middleware won't run -- but I guess that could be ok because the `aggregate` middleware hook could be used instead. That would still be a breaking change, and I suspect that manipulating the aggregation pipeline is more cumbersome/brittle than manipulating a find query, but it's doable.
**Always use the parent type**

So, the above approach would work, but there's also another option entirely: always return the parent type in the `type` key, and then indicate any subtypes in a `meta.interfaces` array or similar. The argument for this is that:

- it makes more information available to the client, which can see (by the shared `"type"` value) that the subtype is also an instance of the parent type;
- it makes it possible to reclassify a resource as a further subtype later (e.g., an `organization` can become a school in a non-breaking way; the resource still has `"type": "organizations"` and the interfaces list just gets added to).

The challenge with this approach is that the subtypes aren't necessarily provided by the client on an update or delete request, because the subtype list isn't part of the resource's identity (which is what makes advantage 2 possible above). This is a problem if we want to use the deepest subtype that applies to the resource to trigger the appropriate Mongoose middleware and validators (because we don't know what that deepest subtype is just from what the client's provided).
There are a few possible workarounds:
We could require that the client provide an `interfaces` array with its delete and update requests. This would be read to determine which model (and so which middleware/validators) to use, and it would be put into a criteria on the query, so that the client can't just make up a value for `interfaces`. That is, if the `interfaces` provided by the client don't match those in the db, the query will match 0 documents and have no effect. There are a few problems with this, though:

- Deleting or updating a resource without first fetching it becomes impossible. This might be fine in most cases, but there are some common-enough cases where it seems pretty bad. In particular, if a client fetches resource A and wants to delete all the resources in one of its relationships, it can't just use the resource identifier objects from the relationship to do that. (Unless resource identifier objects also contain the `interfaces` in meta, but then we're back to querying a bunch of extra data on find.)
- JSON:API requests to delete single resources have no body, so the `interfaces` list would have to go in a query param or something. Ick.
- Off-the-shelf JSON:API client libraries, which might well have some built-in logic for tracking changed fields and putting them in a PATCH request, or triggering a delete, would all have to be configured to store and send back the interfaces too. This might be non-trivial depending on how the library's architected, and breaks a lot of the "you don't have to think about it" value that JSON:API is trying to offer.
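The "put the interfaces into a criteria" idea in this workaround might look like the following sketch (hypothetical; the `_interfaces` field name and function are invented for illustration):

```javascript
// Fold the client-supplied interfaces list into the update/delete criteria.
// If the client spoofs the list, the criteria match zero documents and the
// write is a no-op, so no wrongly-chosen validators can take effect.
function buildCriteria(id, clientInterfaces) {
  return {
    _id: id,
    // $all + $size together require exactly this set (order-insensitive)
    _interfaces: { $all: clientInterfaces, $size: clientInterfaces.length }
  };
}
```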
We could simply always use the parent model (which is identified in the `type` key), with its middleware and validators, on update and delete. Validations that depend on the discriminator key could be implemented at the database level, in the same way validations that depend on multiple fields already need to be (because the adapter doesn't load the document first). There are still a few problems with this approach, though:

- While validation can usually still be accomplished (and maybe even in a more reliable, if less flexible, way than at the application level), it's likely to be somewhat confusing to users that even single-field validators defined on the subtype don't run during updates. That just seems like a recipe for bugs/security vulnerabilities and surprise.
- There are other middleware that the subtype might want to define, e.g. for modifying the query or responding after an update/delete, and these won't run. Instead, the parent type's middleware will run. I suspect this will also be confusing, and has poor ergonomics. If the user ends up needing to query for the discriminator type in those middleware anyway, that query may even end up being less efficient than if we did it in the adapter (because there we can do one query for many documents, if a bulk update or bulk delete is happening).
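A database-level validation that depends on the discriminator key, as this approach requires, might be expressed with MongoDB's collection validators. The sketch below is hedged: the `district` field and the rule itself are invented purely to show the shape, and `__t` assumes Mongoose's default discriminator key:

```javascript
// Hypothetical collMod command adding a validator: documents whose
// discriminator says they're a School must also carry a non-empty district.
const addSchoolValidator = {
  collMod: 'organizations',
  validator: {
    $or: [
      { __t: { $ne: 'School' } },                 // non-schools: no extra rule
      { district: { $type: 'string', $ne: '' } }  // schools need a district
    ]
  }
};
```

This keeps subtype-dependent rules enforced even when only the parent model's application-level validators run.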
A third option would be to actually query for the document before update and delete in order to get its subtype, and then use that to trigger the proper middleware and validators. This could be done with the consistency control described here, to make sure the document isn't changed out from under us between our read and our subsequent write. This option probably makes the most sense, because it preserves all the usability wins of mongoose and actually gives us back (with no extra work) application-level, cross-field validation (which is arguably necessary for cases where db validation isn't expressive enough). And we can use the version key to prevent race conditions. Even with this approach, though, there are still a few problems/things to work out:
- How do we run the subtype's beforeSave function, since we don't know the subtype until we've queried for it in the adapter? One natural solution would be to add a `getTypes(type, id): Promise<string[]>` method to the adapter. Then, this would run first, and the result would be passed back to the update/delete methods later. One problem with this is that those methods would still have to query for the full document (to get all the fields for the validators, and the version key), so that'd be duplicative. `getTypes` could return `Promise<{ types: string[], extra: any }>`, though, and the adapter could put the full doc in `extra` in order to not have to find it again later. Then the last problem is that, on bulk updates and deletes, we'd be issuing one query per document. I guess the easy workaround for that would be `getTypesBulk(ids: {type: string, id: string}[])`, which would have a default implementation of calling `getTypes` for each resource identifier and awaiting all the results, but could be overridden in each adapter to be more efficient. (I guess we'd mix that default implementation into whatever adapter instance the user provides.)
- What are the rules for subtype mutability? That is, can an update change the resource's subtype? If so, how does that change which beforeSaves and validators run? Maybe the simple answer is to forbid this altogether for now (from standard PATCH requests; the user could presumably define a custom endpoint for doing it outside the semantics of the json-api library) and figure out the semantics later -- although making this change impossible out of the box maybe defeats some of the value of using the parent type in the type field.
So, summing it all up, we have two options that seem ok:
1. Always use the child type in the `type` key. This adds some expense to reads, because we have to look up the discriminator value of the referenced document when building resource identifier objects. But it ensures that we always have the subtype when we're handling updates and deletions, and so can run the proper middleware and single-field validators. Validations that depend on multiple (non-discriminator) fields still have to be done at the db level. Changing an existing resource to a new subtype -- which should be a very rare event -- would change the resource's identity, which would break client efforts at, e.g., synchronization after coming back from a period offline. However, doing that (or adding a new subtype) could still happen without breaking clients, if clients look at a `meta.interfaces` key to determine which resources can be used where. (The new subtype would still have the parent type in its interfaces.)
2. Always use the parent type in the `type` key, and look up the document (to get its full type hierarchy) on update and delete. This makes reads faster at the expense of writes, which is generally a good idea, especially since reads would probably be slowed down more. (That's just a guess. I imagine the extra network trip for writes costs more than the extra reading if the database contents fit in RAM. But, if they don't, reading from disk is probably slower than the network overhead in a good datacenter, though it might be close with an SSD(?). Obviously, it depends on the number of docs being read, the number of relationships that can hold subtypes, and whether mongo can parallelize 'sequential' $lookup stages to different collections.) This approach also means that application-level, cross-field validations can run, because we might as well look up the whole document when we go to look up its list of types. Finally, it means that, in theory, we could change a resource's type without changing its id from the client's perspective -- even if I haven't worked out the rules for that yet. Note that, in this approach, we could still allow types (presumably those with no subtypes) to opt in individually to the faster findOneAndUpdate path that would work like option 2 above. Also note that, in this approach, resource identifier objects wouldn't indicate their subtypes -- which is probably appropriate if they're potentially mutable (like attributes).

All told, option two seems better, so I'm leaning towards implementing that.