Handling subtypes in relationships + for validation/middleware #149

Closed
ethanresnick opened this issue Feb 5, 2018 · 17 comments

@ethanresnick
Owner

ethanresnick commented Feb 5, 2018

Right now, there's a bug with how subtypes work (at least when backed by the Mongoose adapter, which implements subtypes with Mongo discriminators).

E.g., suppose you have type Organization, with a subtype of School. Also, imagine there's a Person type that has a manages relationship pointing to an Organization.

In Mongoose, that becomes:

const { Schema } = require('mongoose');

const personSchema = new Schema({
  // ...
  manages: { type: Schema.Types.ObjectId, ref: 'Organization' }
});

The problem is that, when the library serializes a person document, all it has to go on for serializing the manages relationship is an ObjectId in the document and a schema saying that id points to a document in the collection for the Organization model. Accordingly, it renders the relationship as:

data: { "type": "organizations", "id": "....." }

When the related document is actually rendered, however (whether directly or through an include), its type key's value might be "schools" instead of "organizations", because, currently, the discriminator is read to produce that type. This causes a big problem, as it makes it impossible to match up the related resource to the resource identifier object in the person resource.

Solutions

Return proper subtype in resource identifier objects

One fix here would be to detect whether a relationship is pointing to a type that has subtypes (like Organization) and, if so, actually query for the discriminator key of the related documents, to check its value before rendering the resource identifier object. (In SQL, this would be a simple join.)

In Mongo, this could be done with an aggregate query leveraging $lookup (a rough sketch follows the list below). That query is more complex, and so would presumably take longer to execute (way longer if not all of the database’s data fits in memory), but I doubt this is a big issue, and at least it wouldn't require multiple network roundtrips. There would be some other issues, but they should all be surmountable. E.g.:

  1. The results that come out of the aggregate won't be Mongoose model instances (which makes sense, because they could have any shape). That means that Mongoose document initialization operations won't happen (idk what these include -- casting fields?) and pre/post init middleware won't run. It is possible, though, to instantiate Mongoose docs from the aggregate result (using Model.hydrate), which should solve this.

  2. find/findOne query middleware won't run -- but I guess that could be ok because the aggregate middleware hook could be used instead. That would still be a breaking change, and I suspect that manipulating the aggregation pipeline is more cumbersome/brittle than manipulating a find query, but it's doable.
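
Here's a rough sketch of that aggregate approach (TypeScript with Mongoose; the pipeline shape, field names, and the Person/Organization models are illustrative assumptions based on the example above):

import { Person } from "./models"; // hypothetical Mongoose model

async function findPeopleWithManagesType() {
  const results = await Person.aggregate([
    {
      // Join the referenced organizations collection so we can read the
      // discriminator key (__t) of each person's `manages` document.
      $lookup: {
        from: "organizations",
        localField: "manages",
        foreignField: "_id",
        as: "managesDoc"
      }
    },
    // All we need from the join is the discriminator value.
    { $addFields: { managesType: { $arrayElemAt: ["$managesDoc.__t", 0] } } },
    { $project: { managesDoc: 0 } }
  ]);

  // Aggregate results are plain objects, not Mongoose documents, so
  // rehydrate them (per point 1 above) before any document-level logic.
  return results.map(r => ({
    doc: Person.hydrate(r),
    managesType: r.managesType // e.g. "School"; undefined for a plain Organization
  }));
}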

Always use the parent type

So, the above approach would work, but there's also another option entirely: always return the parent type in the type key, and then indicate any subtypes in a meta.interfaces array or similar. The argument for this is that:

  1. it makes more information available to the client, which can see (by the shared "type" value) that the subtype is also an instance of the parent type;

  2. it makes it possible to reclassify a resource as a further subtype later (e.g., an organization can become a school in a non-breaking way; the resource still has "type": "organizations" and the interfaces list just gets added to).

The challenge with this approach is that the subtypes aren’t necessarily provided by the client on an update or delete request, because the subtype list isn’t part of the resource’s identity (which is what makes advantage 2 possible above).

This is a problem if we want to use the deepest subtype that applies to the resource to trigger appropriate Mongoose middleware and validators (because we don’t know what that deepest subtype is just from what the client’s provided).

There are a few possible workarounds:

  1. We could require that the client provide an interfaces array with its delete and update requests. This would be read to determine which model (and so which middleware/validators) to use, and it would be added to the query's criteria, so that the client can’t just make up a value for interfaces. That is, if the interfaces provided by the client don’t match those in the db, the query will match 0 documents and have no effect. There are a few problems with this, though:

    • Deleting or updating a resource without first fetching it becomes impossible. This might be fine in most cases, but there are some common-enough cases where it seems pretty bad. In particular, if a client fetches resource A and wants to delete all resources in one of its relationships, it can’t just use the resource identifier objects from the relationship in order to do that. (Unless resource identifier objects also contain the interfaces in meta, but then we’re back to querying a bunch of extra data on find.)

    • JSON:API requests to delete single resources have no body, so the interfaces list would have to go in a query param or something. Ick.

    • Off the shelf JSON:API client libraries, which might well have some built-in logic for tracking changed fields and putting them in a PATCH request, or triggering a delete, would all have to be configured to store and send back the interfaces too. This might be non-trivial depending on how the library’s architected, and breaks a lot of the “you don’t have to think about it” value that JSON:API is trying to offer.

  2. We could simply always use the parent model (which is identified in the type key), with its middleware and validators, on update and delete. Validations that depend on the discriminator key could be implemented at the database level, in the same way validations that depend on multiple fields already need to be (because the adapter doesn’t load the document first). The validation would look something like this:

    db.createCollection("contacts", {
      validator: {
        $or: [
          { $and: [
              { __t: { $eq: "SubType" } },
              /* other, sub-type specific validators */
            ]
          },
          { $and: [
              { __t: { $eq: "SuperType" } },
              /* validators for supertype */
            ]
          }
        ]
      }
    })
    

    There are still a few problems with this approach, though:

    • While validation can usually still be accomplished (and maybe even in a more reliable, if less flexible, way than at the application level), it's likely to be somewhat confusing to users that even single-field validators defined on the subtype don’t run during updates. That just seems like a recipe for bugs/security vulnerabilities and surprise.

    • There are other middleware that the subtype might want to define, e.g. for modifying the query or responding after an update/delete, and these won’t run. Instead, the parent type’s middleware will run. I suspect this will also be confusing, and has poor ergonomics. If the user ends up needing to query for the discriminator type in those middleware anyway, that query may even end up being less efficient than if we did it in the adapter (because there we can do one query for many documents, if a bulk update or bulk delete is happening).

  3. A third option would be to actually query for the document before update and delete in order to get its subtype, and then use that to trigger the proper middleware and validators. This could be done with the consistency control described here to make sure the document isn’t changed out from under us between our read and our subsequent write. This option probably makes the most sense, because it preserves all the usability wins of mongoose and actually gives us back (with no extra work) application-level, cross-field validation (which is arguably necessary for cases where db validation isn’t expressive enough). And, we can use the version key to prevent race conditions. Even with this approach, though, there are still a few problems/things to work out:

    • How do we run the subtype’s beforeSave function, since we don’t know the subtype until we’ve queried for it in the adapter? One natural solution would be to add a getTypes(type, id): Promise<string[]> method to the adapter. Then, this would run first, and the result would be passed back to the update/delete methods later. One problem with this is that those methods would still have to query for the full document (to get all the fields for the validators, and the version key), so that’d be duplicative. getTypes could return Promise<{ types: string[], extra: any }>, though, and the adapter could put the full doc in extra in order to not have to find it again later. Then the last problem is that, on bulk updates and deletes, we’d be issuing one query per document. I guess the easy workaround for that would be getTypesBulk(ids: {type: string, id: string}[]), which would have a default implementation of calling getTypes for each resource identifier and awaiting all the results, but could be overridden in each adapter to be more efficient. (I guess we’d mix that default implementation into whatever adapter instance the user provides; see the sketch after this list.)

    • What are the rules for subtype mutability? That is, can an update change the resource’s subtype? If so, how does that change which beforeSaves and validators run? Maybe the simple answer is to forbid this altogether for now (from standard PATCH requests; the user could presumably define a custom endpoint for doing it outside the semantics of the json-api library) and figure out the semantics later — although making this change impossible out of the box maybe defeats some of the value of using the parent type in the type field.
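
Here's a rough TypeScript sketch of the adapter additions floated in the first bullet above (all of the names are assumptions for illustration, not a settled API):

type ResourceIdentifier = { type: string; id: string };

// `extra` lets the adapter hand back the already-fetched document,
// so the later update/delete doesn't have to query for it again.
type TypesResult = { types: string[]; extra?: unknown };

interface AdapterWithTypes {
  getTypes(type: string, id: string): Promise<TypesResult>;
  // Optional bulk variant; adapters can override this with a single query.
  getTypesBulk?(ids: ResourceIdentifier[]): Promise<TypesResult[]>;
}

// Default bulk implementation that could be mixed into whatever adapter
// instance the user provides: fan out to getTypes and await everything.
async function defaultGetTypesBulk(
  adapter: AdapterWithTypes,
  ids: ResourceIdentifier[]
): Promise<TypesResult[]> {
  return Promise.all(ids.map(({ type, id }) => adapter.getTypes(type, id)));
}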

So, summing it all up, we have two options that seem ok:

  1. Always use the child type in the type key. This adds some expense to reads, because we have to look up the discriminator value in the referenced document in resource identifier objects. But, it ensures that we always have the subtype when we’re handling updates and deletions, and so can run the proper middleware and single-field validators. Validations that depend on multiple (non-discriminator) fields still have to be done at the db level. Changing an existing resource to a new subtype — which should be a very rare event — would change the resource’s identity (its type + id pair), which would break client efforts at, e.g., synchronization after coming back from a period offline. However, doing that (or adding a new subtype) could still happen without breaking clients, if clients look at a meta.interfaces key to determine which resources can be used where. (The new subtype would still have the parent type in its interfaces.)

  2. Always use the parent type in the type key, and look up the document (to get its full type hierarchy) on update and delete. This makes reads faster at the expense of writes, which is generally a good idea, especially since reads would probably be slowed down more (under option 1) than writes would be (under this option). (That’s just a guess. I imagine the extra network trip for writes costs more than the extra reading if the database contents fit in RAM. But, if they don’t, reading from disk is probably slower than the network overhead in a good datacenter, though it might be close with an SSD(?). Obviously depends on the number of docs being read, number of relationships that can hold subtypes, and whether mongo can parallelize ‘sequential’ $lookup stages to different collections.) This approach also means that we can have application-level, cross-field validations run, because we might as well look up the whole document when we go to look up its list of types. Finally, it means that, in theory, we could change a resource’s type without changing its id from the client’s perspective — even if I haven’t worked out the rules for that yet. Note that, in this approach, we could still allow types (presumably those with no subtypes) to opt in individually to the faster findOneAndUpdate path that would work like option 2 above. Also note that, in this approach, resource identifier objects wouldn't indicate their subtypes -- which is probably appropriate if they're potentially mutable (like attributes).

All told, option two seems better, so I'm leaning towards implementing that.

@ethanresnick
Owner Author

@carlbennettnz Since both of the solutions above could require adapter changes, can you weigh in here if you have a preference? I don't want to do anything that causes too much trouble for your postgres adapter.

Also, lmk if my post is unclear in places. I wrote it as a "think out loud" exercise to myself, so there might be some places where I'm using non-standard phrases that only make sense in my head. (That's also why it has all the rejected approaches and dead ends.)

@ethanresnick
Owner Author

Some other notes about/implications of option 2:

  • Because the subtype and the parent type would be serialized with the same type key in json, it wouldn't be possible to use the ?fields[TYPE] parameter to target each independently. In other words, you could previously ask for some fields of the parent type and then have them excluded on the subtype. Now, any fields included in the parent type will also be included on the subtype. I don't think those extra bits are a big deal. Note: it's still possible to put subtype-only fields in ?fields[type], and the parent type resources will just be missing those fields.

  • The subtype would internally still have its own resource description (as alluded to above re running its beforeSave) and be able to show up as a type name in all internal queries (see below). From the library's POV, in other words, the subtype is a full-fledged thing. The only difference is the serialization of the type key.

    • Find and delete queries could still have the subtype in their type key, which would happen if the endpoint that created the query is only meant to act on (return/delete) instances of the subtype. (E.g. DELETE /subtype/:id. Of course, you might not want that particular endpoint now, as GET /subtype/:id could go from 404 to 200 and back again if the subtype changes, which it might in the future; probably better to just have /parentType/:id, but that decision is outside the scope of the library.) This is equivalent to an (already possible) find/delete query with the parent type in the type key and a where clause limiting the results to a subtype; the adapter would be expected to treat those equivalently (see the sketch after this list). AddTo and RemoveFromRelationship queries would work similarly. In those queries, type, which refers to the type of the resource owning the relationship, could also be a subtype (e.g. POST /subtype/:id/relationships/:relname), and the adapter would try to look up a resource with id = id, table/collection = parent type where discriminator = subtype, just like on a find.

    • Create queries could still refer to the subtype in type as well; on create, the library would read meta.interfaces (or whatever it's called; maybe meta.types) to determine which value to put there. The logic in the validate-resources step would have to be updated to account for how the new types work.

    • Update queries could also have type as the subtype (e.g. on PATCH /subtype). The json type key of the resources to PATCH, though, would not match subtype, so this endpoint's restriction would have to be enforced at the make-query step by calling the getTypes function described above (which was gonna need to happen anyway).
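
To illustrate the equivalence mentioned in the first sub-bullet above, here's a hypothetical pair of queries (the query shape is made up for this example, not the library's actual Query classes):

// A find created from a subtype endpoint (GET /schools/42)...
const viaSubtypeEndpoint = { method: "find", type: "schools", id: "42" };

// ...should behave identically to a find against the parent type plus a
// discriminator restriction (__t being Mongoose's default discriminator key):
const viaParentType = {
  method: "find",
  type: "organizations",
  id: "42",
  criteria: { __t: "School" }
};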

@carlbennettnz
Contributor

Hmm, tricky problem. Here are some assorted thoughts.

Whichever solution you choose, it probably won't affect my postgres adapter too much, mostly because I haven't implemented subtypes at all yet. When I eventually do (I'm planning on doing the last few things needed before PRing it sometime next month) I can't foresee any major issues with either option. I'll run through the implications for solution 1 for each relationship type.

To-many relationships

A response like this

{
  "type": "authors",
  "id": "1",
  "relationships": {
    "posts": {
      "data": [
        { "type": "posts", "id": "1" },
        { "type": "posts", "id": "2" }
      ]
    }
  }
}

is generated using a SQL query like this

SELECT author.*, array_agg(post.id)
FROM author
LEFT JOIN post ON post.author = author.id
GROUP BY author.id;

If I wanted to add a posts subtype, fancy-posts, I could use the hidden system column tableoid to differentiate. In that case, this response

{
  "type": "authors",
  "id": "1",
  "relationships": {
    "posts": {
      "data": [
        { "type": "posts", "id": "1" },
        { "type": "fancy-posts", "id": "2" }
      ]
    }
  }
}

could be generated with this query

SELECT author.*, array_agg((post.id, post.tableoid))
FROM author
LEFT JOIN post ON post.author = author.id
GROUP BY author.id;

with no serious overhead.

To-one relationships

This would receive a slight hit. Currently, the foreign key is loaded directly without a join. This would change to this

SELECT author.*, manager.tableoid
FROM author
LEFT JOIN manager ON manager.id = author.manager;

so one extra join would be needed for every foreign key referencing a parent type. Not really a big deal IMO, and it's invisible to the library user.

Many-to-many relationships

These would also receive a minor hit on render. Current situation

CREATE TABLE post (id int PRIMARY KEY, title text);
CREATE TABLE tag (id int PRIMARY KEY, name text);
CREATE TABLE post_tag (post int REFERENCES post (id), tag int REFERENCES tag (id));

SELECT post.*, array_agg(post_tag.tag)
FROM post
LEFT JOIN post_tag ON post_tag.post = post.id
GROUP BY post.id;

would change to

SELECT post.*, array_agg((post_tag.tag, tag.tableoid))
FROM post
LEFT JOIN post_tag ON post_tag.post = post.id
LEFT JOIN tag ON post_tag.tag = tag.id
GROUP BY post.id;

So again, an extra join per foreign key referencing a parent type.


The only other implication I can think of would be that I would need to check that the type the client sent was actually correct. I can imagine a security issue arising through the client sending a PATCH for a school, but setting the type field to 'organization', bypassing the extra access control for schools. Inherited tables in postgres are exposed transparently enough that an UPDATE on an organization that's actually a school could conceivably be made without errors. However, this is just a general issue with implementing JSON API subtypes and probably not relevant here.

That's all I've got for now, but I'll comment again if I think of anything else. Also, I haven't forgotten about that full review of v3 I promised, just really busy right now. I'll try to find some time to go through it soon.

@ethanresnick
Owner Author

ethanresnick commented Feb 12, 2018

Whichever solution you choose, it probably won't affect my postgres adapter too much, mostly because I haven't implemented subtypes at all yet. When I eventually do (I'm planning on doing the last few things needed before PRing it sometime next month) I can't foresee any major issues with either option.

Ok, glad to hear it!

Based on that, and thinking about it more, I've decided to go with option 2. Performance tradeoffs aside, I also like option 2 much better for:

  1. its semantics (e.g., changing between subtypes, if possible down the line, really shouldn't affect the resource's identity from JSON:API's POV);

  2. features it makes more natural (e.g., if we're already querying at least for the resource's discriminator value on update, querying for the whole resource has smaller marginal cost, and that enables app-level validation based on the resource's current state, which I think we want anyway);

  3. other things it simplifies, like dealing with ?fields[TYPE] on GET requests (and reconciling that with ?includes).

The only other implication I can think of would be that I would need to check that the type the client said was actually correct. I can imagine a security issue arising through the client sending a PATCH for a school, but setting the type field to 'organization', bypassing the extra access control for schools.

My intention is to free the adapter from having to do these checks as much as possible. The idea is that, as long as the adapter implements a function for getting the full set of types for a (list of) resources by their (type, id), the library can just call that on updates to verify that the resource being patched is in fact of an appropriate type for the endpoint, and to figure out which beforeSave methods to call.
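
A minimal sketch of that check, reusing the hypothetical getTypes signature from earlier in the thread (names and error handling here are assumptions):

// On e.g. PATCH /schools/:id, resolve the stored resource's full type
// path, then confirm the endpoint's type actually appears in it before
// picking beforeSave transforms or running the update.
async function assertEndpointType(
  adapter: { getTypes(type: string, id: string): Promise<{ types: string[] }> },
  rootType: string,      // e.g. "organizations"
  endpointType: string,  // e.g. "schools"
  id: string
): Promise<void> {
  const { types } = await adapter.getTypes(rootType, id);
  if (!types.includes(endpointType)) {
    throw new Error(`Resource ${id} is not a "${endpointType}".`);
  }
}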

I'm working on a PR for this now, and will try to have something soon.

@carlbennettnz
Contributor

Sweet. If you could document the specifics of how yours works, I'll make sure the behaviour of mine matches.

ethanresnick added a commit that referenced this issue Feb 14, 2018
Resolves #149.

This is a pretty major change, as it touches most parts of the system.
In particular:

- The Resource and ResourceIdentifier types now have `typesList`,
`typePath`, and `adapterExtra` members. See the Resource class for
details on their role. These members are set as appropriate (on update
and create) by the APIController using the setTypePaths function, which
is now passed to all query factories.

- The transform logic has been re-written to read from
Resource.typePath instead of Resource.type, while still reading from
ResourceIdentifierObject.type for transforming linkage (which can now
occasionally have different results; see UPGRADING.md).

- Various validation routines have been updated to account for the new
values in the `type` key and have been split up where appropriate (e.g.
validate-resources => validate-resource-types, validate-resource-ids,
and validate-resource-data).

- The adapter interface has been updated to require Adapters to provide
an instance method called `getModelName` (before, this was required as
a static method) and an instance method called `getTypePaths`. See
MongooseAdapter for an example. Other MongooseAdapter methods have been
removed. See UPGRADING.md. In the process, the adapter has been cleaned
up and adapter-specific tests have been pulled out into a separate
`tests/integration/mongoose-adapter` file.

- Using multiple adapters simultaneously is no longer possible. See
UPGRADING.md.

- Lots of new tests have been added.
@ethanresnick
Owner Author

@carlbennettnz I've implemented this here and published the latest as 3.0.0-beta.11. I've tried to document things reasonably thoroughly in the commit message and in the UPGRADING file, but there's a lot that changed, so let me know if anything's missing. The solution ended up being basically what we discussed above, with some small naming tweaks.

@carlbennettnz
Contributor

Thanks for this, especially the incredibly thorough commit message and UPGRADING file. I'm looking through it now.

@ethanresnick
Owner Author

@carlbennettnz Cool cool.

Random question for you: in your Postgres adapter/with your fork of the library, how do you validate incoming data for field-specific requirements? Like, if you have a zip code field that needs to be x-y digits long, where is that checked? In beforeSave, I guess? If so, are those checks written directly in each model's beforeSave functions, or is there some abstraction for declaring the validation constraints in the model, and then automatically applying those in beforeSave?

@carlbennettnz
Contributor

We're checking simple constraints like that in the database itself.

CREATE TABLE zip (
  code varchar(10) CHECK (char_length(code) BETWEEN 5 AND 10)
);

Our access control system produces resource descriptions that check AC policies and also call before{Render|Save|Delete} hooks if they're present in the model definition. There's also a validate() function called by the adapter, but that's basically only there as a hangover from using Mongoose.

// models/model-name.js
module.exports = {
  attributes: [],
  relationships: [],
  validate() {},
  transforms: {
    beforeRender() {},
    beforeSave() {},
    beforeDelete() {}
  }
}

@ethanresnick
Owner Author

Awesome, thanks!

@carlbennettnz
Contributor

Would you mind taking a look at ethanresnick/json-api-example#v3-wip to see if anything subtle has fallen out of date? I got it running with a few minor tweaks, but the following req adds an organization, not a school, for me. 99% sure I'm doing the same thing as the tests.

POST /schools
{
  "data": {
    "type": "organizations",
    "attributes": {
      "name": "My School",
      "description": "Desc",
      "isCollege": true
    },
    "meta": {
       "types": ["organizations", "schools"]
     }
  }
}
> db.organizations.find()
{
	"_id" : ObjectId("5a84e78fe640b4ac23cd0600"),
	"name" : "My School",
	"description" : "Desc",
	"liaisons" : [ ],
	"__v" : 0
}

@ethanresnick
Owner Author

Good catch. Should be fixed now in the example repo. I think the issue was just that the mongoose dependency was too old. (Mongoose < 4.8 had some bugs around discriminators.) I'll document this new requirement too.

@carlbennettnz
Contributor

I'm still experimenting and writing up my feedback (there are quite a few points I hadn't considered earlier about actually consuming the API, especially with libraries like Ember Data) but while I'm working on that, is this summary of how subtyping works accurate? I just want to check my understanding of the changes to the public API.

Fetching

  • The root type is returned for every resource, regardless of the actual type of the resource.
  • For non-root type resources, meta.types contains an array of type names, root type last.
  • The fields parameter is now type-delimited, in the format ?fields[<type>]=<field>. This type-delimitation is required everywhere, even if subtypes aren't used.
  • Fetching from relationship endpoints works similarly. In the response, data.[].type is always the root type.

Creating

  • The process to create resources that are not subtypes is unchanged. Both the request and the response are the same.
  • When creating a resource with a relationship (to-one or to-many), the type key in the linkage is ignored. This is probably an unrelated bug.
  • To create a subtype resource, data.type must be the root type, and data.meta.types must be an array of type names, in any order. This is true regardless of whether the resource is POSTed to the root type's endpoint, or the subtype's.
  • If data.meta.types is provided when creating non-subtype resources, it is checked, but otherwise ignored.

Updating

  • Updating non-subtype resources remains unchanged.
  • Updating subtype resources requires use of the parent type in data.type. A useful error message is returned if the child type is used. No data.meta.types value is required.
  • Like in POST requests, the type key of linkage values is ignored.
  • The response to successful PATCH requests displays the root type for the resources and any related resources.
  • POST requests to any to-many relationship endpoints should work, but currently appear to give 500 errors. PATCH requests appear to work fine, though again, type keys are ignored.

Deleting

  • The process to delete non-subtype resources is unchanged.
  • Subtype resources can be deleted from any route in their type hierarchy.
  • Parent types cannot be deleted from their children's endpoints.

@carlbennettnz
Contributor

Also, here's a few (mostly minor) issues I found while testing/reading.

  • You've added a whole lot more tests with this change, which is awesome. However, I don't think relationship endpoints for subtypes are covered, and POST requests to these seem to be giving 500 errors.
  • Validation errors on model (unique, etc.) give 500 errors, at least in the example project.
  • The subtypes test "should apply ?fields restrictions based on the rendered type" has two problems. expect(...) should be expect(...).to.be.true and in the .every() we should be checking that the attributes are/aren't present on the attributes object of the resource, not the resource itself.
  • This isn't directly related to subtypes and probably isn't one to bother with now, but the tests aren't atomic at all. A bunch of tests fail the first time they're run because they rely on data being inserted by other tests. I'd be happy to help with this once v3 stabilizes a little more.
  • Attempting to request a sparse fieldset with ?fields=name gives a vague 'Invalid parameter value' error. This could use a better explanation, especially given this was how it worked previously. I'm also not sure that the move to type-delimited fields is necessary, but I'll discuss that more later.
  • Virtual relationships created with API.Relationship.of() don't create a data key when empty, unlike real relationships.
  • This doesn't matter at all, but you're using function(obj = Object.create(null)) {} in a few places. As far as I can tell, function(obj = {}) {} doesn't create the referencing bug I assume you're trying to avoid.

@ethanresnick
Owner Author

ethanresnick commented Feb 16, 2018

Thanks for doing such a thorough review. Having you as a second set of eyes to sanity check the logic, and verify that the implementation is actually doing what it's supposed to do, is super useful.

Most of what you've described is spot on, so I'll just respond to the bits where I have something to add.

  • For non-root type resources, meta.types contains an array of type names, root type last.

In the serialization of meta.types, the order of the entries isn't intended to be significant to the client. (It's just meant as a list of "interfaces" the resource "implements".) In practice, though, you're right that the order will always have the child types first, because Resource.typePath is what's getting serialized to meta.types, and the order in Resource.typePath is significant for the beforeX transforms.

The fields parameter is now type-delimited, in the format ?fields[<type>]=<field>. This type-delimitation is required everywhere, even if subtypes aren't used.

Maybe I'm crazy, but I'm pretty sure the ?fields parameter has always been type-delimited. If it hasn't been, then it should've been, because the type-delimitation of ?fields is specified directly by json-api and isn't related to subtyping. The only thing that should've changed is that, previously, a subtype could have had a different fieldset restriction than the super type, whereas now they share one set of field restrictions because they're serialized with the same type key.
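
For example, a sparse fieldsets request has always taken this form (illustrative URL):

GET /organizations?fields[organizations]=name,description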

When creating a resource with a relationship (to-one or to-many), the type key in the linkage is ignored. This is probably an unrelated bug.
[On update...] Like in POST requests, the type key of linkage values is ignored.
PATCH requests [to relationship endpoints] appear to work fine, though again, type keys are ignored....

Yeah, these behaviors are a bug. The adapter should at least verify that the type key in the linkage matches the type for the root model of the model that the relationship points to. That is, if the relationship is specified as holding a school, the adapter should at least verify that the linkage type is "organizations". If you want to work up a failing test for that, it would be much appreciated.

Arguably, the adapter should go further and verify that there actually is a document that exists with that id and that its discriminator key indicates that it is really a school, but I think that check is lower priority/I go back and forth on whether it's worth the overhead. (That check for the actual existence of a document with the related id didn't happen before subtypes were introduced either, and would require another query for each relationship in mongo land. Checking the existence of the related resource is easier in SQL land, I imagine, as the foreign key constraint should just fail.)

If data.meta.types is provided when creating non-subtype resources, it is checked, but otherwise ignored.

The logic is supposed to be: in all cases, data.meta.types is checked, regardless of whether the resource being created is a subtype. However, in all cases, if data.meta.types is not provided, its value is implicitly defaulted to [data.type]. That happens here. The only difference between subtype and root type creation is that, when data.meta.types is automatically given that default value, that value is only valid at some endpoints. Concretely, a POST /schools with no data.meta.types should fail, but the same request to POST /organizations should succeed, because the implicit ["organizations"] value for the type list is only valid in the latter case. If the above is not actually what the code is doing then that's really bad and should definitely be fixed.
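
In other words (a minimal sketch of that defaulting rule; the function name is made up):

function resolveTypePath(data: { type: string; meta?: { types?: string[] } }): string[] {
  // If the client omits data.meta.types, implicitly default it to [data.type].
  return data.meta?.types ?? [data.type];
}

// resolveTypePath({ type: "organizations" }) => ["organizations"]
//   => accepted at POST /organizations, rejected at POST /schools
// resolveTypePath({ type: "organizations", meta: { types: ["organizations", "schools"] } })
//   => ["organizations", "schools"] => accepted at either endpoint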

POST requests to any to-many relationship endpoints should work, but currently appear to give 500 errors.... You've added a whole lot more tests with this change, which is awesome. However, I don't think relationship endpoints for subtypes are covered

Yeah, that's a bad bug. I was pretty beat by the time I was working on the relationship-related adapter methods, so I'm kinda not surprised that those aren't working. I'll definitely add some tests, and then try to fix whatever stupid bug I introduced in my state of "can this feature please be done by now?".

The subtypes test "should apply ?fields restrictions based on the rendered type" has two problems.

D'oh! And that, kids, is why it's important to see your tests fail before they pass :D

Thanks for catching this. Just pushed a fix.

A bunch of tests fail the first time they're run because they rely on data being inserted by other tests.

Hmm, I can't reproduce this... even when I drop the whole db, the tests seem to pass on next run. There are some tests that depend on some fixtures data being inserted first, though. On my machine, that data gets added as part of running the test command, but maybe those fixtures aren't running for you for some reason?

Virtual relationships created with API.Relationship.of() don't create a data key when empty, unlike real relationships.

I'm not sure I follow. The API.Relationship.of method can take a data key in its argument and should use that. (E.g., I believe API.Relationship.of({ data: X, owner: ... }) works to create a relationship that gets serialized with the requested data.)

If you're talking about in the example API, I left out the data key intentionally (it's optional in the JSON:API spec when links are provided) because populating it would require an extra query, and having the linkage isn't all that useful anyway unless the user is doing an include. So, the query factory detects the include and only fills in the data key in that case.

This doesn't matter at all, but you're using function(obj = Object.create(null)) {} in a few places. As far as I can tell, function(obj = {}) {} doesn't create the referencing bug I assume you're trying to avoid.

The bug as I understand it really comes up anywhere the in operator is used to detect the existence of an (own) key, as that will (inadvertently) look up the prototype chain. Since the user can do an in check in their code, using Object.create(null) doesn't seem like a bad idea to me. (E.g., if a user's beforeSave has 'valueOf' in resource.attrs, Object.create(null) will prevent a false positive.) The real issue, though, is that I'm not applying Object.create(null) consistently enough for the in operator to be safe anyway. E.g., when the MongooseAdapter creates its resource objects, I'm pretty sure it uses an object literal for the attributes object, so then the in operator is broken anyway. Given that, I should probably take it out everywhere (to not imply that in is safe when it isn't) or be more consistent about it. Hmm...
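
For a concrete, runnable illustration of the hazard (standalone example, not library code):

const literalAttrs = { name: "x" };
console.log("valueOf" in literalAttrs);  // true -- inherited from Object.prototype

const bareAttrs = Object.assign(Object.create(null), { name: "x" });
console.log("valueOf" in bareAttrs);     // false -- no prototype chain to consult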

@carlbennettnz
Contributor

The fields parameter is now type-delimited, in the format ?fields[<type>]=<field>. This type-delimitation is required everywhere, even if subtypes aren't used.

Maybe I'm crazy, but I'm pretty sure the ?fields parameter has always been type-delimited. If it hasn't been, then it should've been, because the type-delimitation of ?fields is specified directly by json-api and isn't related to subtyping. The only thing that should've changed is that, previously, a subtype could have had a different fieldset restriction than the super type, whereas now they share one set of field restrictions because they're serialized with the same type key.

Nope, I'm the crazy one. No idea where I got the idea that this was a new thing.

Arguably, the adapter should go further and verify that there actually is a document that exists with that id [...] Checking the existence of the related resource is easier in SQL land, I imagine, as the foreign key constraint should just fail.

Yep, this was actually a big factor in why we switched. We were getting more and more issues from relationships that pointed to nowhere. Postgres now checks references are valid both when adding references and when deleting/updating referenced records.

and then try to fix whatever stupid bug I introduced in my state of "can this feature please be done by now?".

Oh boy do I know that feeling

A bunch of tests fail the first time they're run because they rely on data being inserted by other tests.

Hmm, I can't reproduce this... even when I drop the whole db, the tests seem to pass on next run. There are some tests that depend on some fixtures data being inserted first, though. On my machine, that data gets added as part of running the test command, but maybe those fixtures aren't running for you for some reason?

Ah sorry, I forgot to mention I was only running the subtype tests. I get 5 failures with this:

$ mongo integration-test --eval 'db.dropDatabase()'
$ NODE_ENV=testing npx mocha --compilers ts:ts-node/register test/integration/subtypes/index.ts

Virtual relationships created with API.Relationship.of() don't create a data key when empty, unlike real relationships.

[...]

If you're talking about in the example API, I left out the data key intentionally (it's optional in the JSON:API spec when links are provided) because populating it would require an extra query, and having the linkage isn't all that useful anyway unless the user is doing an include. So, the query factory detects the include and only fills in the data key in that case.

Ah, okay. Should have looked a little closer here. Ignore me, it's working perfectly.

This doesn't matter at all, but you're using function(obj = Object.create(null)) {} in a few places. As far as I can tell, function(obj = {}) {} doesn't create the referencing bug I assume you're trying to avoid.

The bug as I understand it really comes up anywhere the in operator is used to detect the existence of an (own) key, as that will (inadvertently) look up the prototype chain. Since the user can do an in check in their code, using Object.create(null) doesn't seem like a bad idea to me. (E.g., if a user's beforeSave has 'valueOf' in resource.attrs, Object.create(null) will prevent a false positive.) The real issue, though, is that I'm not applying Object.create(null) consistently enough for the in operator to be safe anyway. E.g., when the MongooseAdapter creates its resource objects, I'm pretty sure it uses an object literal for the attributes object, so then the in operator is broken anyway. Given that, I should probably take it out everywhere (to not imply that in is safe when it isn't) or be more consistent about it. Hmm...

Woah, cool. I never realised there was a way to protect against someone messing with Object.prototype. Makes sense.

@ethanresnick
Owner Author

Yep, this was actually a big factor in why we switched. We were getting more and more issues from relationships that pointed to nowhere.

Makes sense. Does this new approach to subtyping make that check (prohibitively) harder (because you also, I think, would want to check the discriminator key)?

I was only running the subtype tests

Ahh, gotcha. Yeah, that'll fail. I've updated the library to pull out installing the fixtures into a separate npm command. Now, if you do npm run test:install-fixtures first, you'll be able to run just the subtype tests after that and things should work fine. (If you do just npm t, it'll still install the fixtures automatically, as before.)
