
[Saved Objects] introduce per-type validation schemas. #104088

Closed
pgayvallet opened this issue Jul 1, 2021 · 13 comments · Fixed by #118969
Assignees: lukeelmers
Labels: Feature:Saved Objects, impact:high, loe:medium, Team:Core

Comments

@pgayvallet (Contributor)

It is currently trivial to create saved objects containing invalid or corrupted data: you simply have to create a new object (or update an existing one) with data that doesn't match the expected shape. We currently do nothing to protect against that.

This is dangerous because developer errors can lead to corrupted data. Even worse, since the SO APIs are documented, even end users can easily create corrupted objects that would break the system.

Another consequence is that even if a corrupted object is unused by any application (e.g. a corrupted dashboard that the end user never opens), it can still cause a migration failure during an upgrade, since all documents of a given type go through migration functions that expect a valid shape and/or data.

The first obvious improvement for basic protection against such corruption would be to add per-type validation schemas, provided when registering the associated type:

core.savedObjects.registerType({
  name: 'myType',
  mappings: {
    properties: {
       someField: { type: 'text' }
    }
  },
  schema: schema.object({
    someField: schema.string(),
  })
})

This would allow us to validate the attributes provided during a SOR.create operation against the schema, to ensure the created object is valid (and throw an error if it's not).
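
A minimal sketch of what that check could look like (the validateAttributes helper and its wiring are illustrative, not the actual repository implementation; it assumes @kbn/config-schema's Type and its validate method):

import { schema, Type } from '@kbn/config-schema';

// Hypothetical wiring: look up the schema registered for the type and
// validate the incoming attributes before the document is written.
function validateAttributes(
  type: string,
  attributes: unknown,
  registeredSchema?: Type<unknown>
) {
  if (!registeredSchema) return; // types without a schema stay unvalidated
  try {
    registeredSchema.validate(attributes);
  } catch (e) {
    throw new Error(`[${type}] attributes validation failed: ${(e as Error).message}`);
  }
}

// With the registration above:
const myTypeSchema = schema.object({ someField: schema.string() });
validateAttributes('myType', { someField: 'valid' }, myTypeSchema); // passes
validateAttributes('myType', { someField: 123 }, myTypeSchema);     // throws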

I'm unsure how we could implement that for the other APIs. E.g. update is effectively a partial update, so I'm not sure we could find a way to validate only the updated properties.

Alternatively, we could use a plain object as the root schema instead of a schema.object, and pick the validators for the properties being updated:

schema: {
    someField: schema.string(),
}

but then we would lose some of the advanced schema validation features (such as by-reference validation between fields, or global validation methods), so I'm not sure it's worth it; maybe only validating create is good enough.
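
For illustration, picking per-field validators out of such a plain-object schema could look like this (a sketch only; it also shows why cross-field validation is lost, since each field is checked in isolation):

import { schema, Type } from '@kbn/config-schema';

// Plain-object root schema: one independent validator per field.
const fieldSchemas: Record<string, Type<unknown>> = {
  someField: schema.string(),
};

// Validate only the properties present in the partial update. Cross-field
// rules can't be expressed this way: each field is checked on its own.
function validatePartialUpdate(attributes: Record<string, unknown>) {
  for (const [key, value] of Object.entries(attributes)) {
    fieldSchemas[key]?.validate(value); // throws on the first invalid field
  }
}

validatePartialUpdate({ someField: 'ok' }); // passes
validatePartialUpdate({ someField: 42 });   // throws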

@pgayvallet added the Team:Core, Team:APM, and Feature:Saved Objects labels Jul 1, 2021
@elasticmachine (Contributor)

Pinging @elastic/kibana-core (Team:Core)

@elasticmachine (Contributor)

Pinging @elastic/apm-ui (Team:apm)

@pgayvallet removed the Team:APM label Jul 1, 2021
@sorenlouv (Member)

Was team:apm added on purpose?

@pgayvallet (Contributor, Author)

@sqren Sorry, no; I removed it after creating the issue.

@mshustov (Contributor) commented Jul 8, 2021

  mappings: {
    properties: {
       someField: { type: 'text' }
    }
  },
  schema: schema.object({
    someField: schema.string(),
  })

How hard would it be to build schema validation automatically based on the provided properties?
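
Something like the following, perhaps? (A rough sketch assuming a naive translation from ES field types to config-schema validators; the helper name and the tiny FieldMapping shape are illustrative.)

import { schema, Type } from '@kbn/config-schema';

// Minimal mappings shape for the sketch; real SO mappings are richer.
interface FieldMapping {
  type: 'text' | 'keyword' | 'long' | 'boolean';
}

function schemaFromMappings(properties: Record<string, FieldMapping>) {
  const props: Record<string, Type<any>> = {};
  for (const [field, mapping] of Object.entries(properties)) {
    // Mappings say nothing about required-ness, so treat everything as optional.
    switch (mapping.type) {
      case 'text':
      case 'keyword':
        props[field] = schema.maybe(schema.string());
        break;
      case 'long':
        props[field] = schema.maybe(schema.number());
        break;
      case 'boolean':
        props[field] = schema.maybe(schema.boolean());
        break;
    }
  }
  // Allow unknown keys so non-indexed attributes don't fail validation.
  return schema.object(props, { unknowns: 'allow' });
}

schemaFromMappings({ someField: { type: 'text' } }).validate({ someField: 'hello' });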

This is dangerous because developer errors can lead to corrupted data. Even worse, since the SO APIs are documented, even end users can easily create corrupted objects that would break the system.

Also, we could start by validating the well-known top-level properties created by the SavedObjects service itself:

schema.object({
  id: schema.string({ validate: validateIdFormat }), // hypothetical id-format check
  type: schema.string(), // or a literal?
  version: schema.maybe(schema.string()),
  attributes: schema.any(),
  // ...
})

@pgayvallet (Contributor, Author)

How hard would it be to build schema validation automatically based on the provided properties?

That would probably be tedious for some specific field types, such as nested fields. It would also not work for attributes that are not indexed (and therefore are not present in the mappings provided during type registration).

But mostly, it would not be good enough. In some cases, we would like the type owners to do more than surface validation. E.g. for dashboards, we would expect to validate the panelsJSON and optionsJSON properties by parsing them and running validation against the parsed values, not just ensuring that a string was passed, which is all auto-generated schemas could do.
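
For example, a custom string validator could parse the JSON and validate the parsed value (a sketch; the inner panelsSchema is illustrative, not the real dashboard panel structure):

import { schema } from '@kbn/config-schema';

// Illustrative inner schema for a parsed panelsJSON value.
const panelsSchema = schema.arrayOf(
  schema.object({ panelIndex: schema.string(), type: schema.string() }, { unknowns: 'allow' })
);

const dashboardAttributesSchema = schema.object({
  title: schema.string(),
  // Parse the stringified JSON and validate the parsed value, instead of
  // only asserting that a string was passed.
  panelsJSON: schema.string({
    validate: (value) => {
      try {
        panelsSchema.validate(JSON.parse(value));
      } catch (e) {
        return `invalid panelsJSON: ${(e as Error).message}`;
      }
    },
  }),
});

dashboardAttributesSchema.validate({ title: 'my dashboard', panelsJSON: '[]' }); // passes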

We could still try to generate a schema for types that do not explicitly provide one; it would be better than no validation at all. But even then, we would be forced to use unknowns: 'allow' to avoid breaking on potentially unregistered/non-indexed fields.

Also, we can start with validating well-known properties created by SavedObjects service itself

I don't think we need that to secure the SOR.create API, as the consumers are only allowed to provide the attributes used during the creation of the doc?

@mshustov (Contributor) commented Jul 8, 2021

I don't think we need that to secure the SOR.create API, as the consumers are only allowed to provide the attributes used during the creation of the doc?

Yeah, it's rather for the update case.

We could try to generate a schema for types that does not explicitly provide one though, it would be better than no validation at all

Yes, I didn't say we should use auto-generation in 100% of cases, but it might be a good default.

@lukeelmers (Member)

Discussed today that for now we will focus on SOR.create here. We can treat update as a follow-up task if we think the tradeoffs discussed in this thread are worth it.

@lukeelmers added the impact:high and loe:medium labels Oct 29, 2021
@exalate-issue-sync bot added the loe:large label and removed the loe:medium label Nov 2, 2021
@lukeelmers self-assigned this Nov 2, 2021
@lukeelmers (Member)

Another consequence is that even if a corrupted object is unused by any application (e.g. a corrupted dashboard that the end user never opens), it can still cause a migration failure during an upgrade, since all documents of a given type go through migration functions that expect a valid shape and/or data.

As I've started looking into this, one trade-off we will need to decide on is whether to run the type validations before or after the object is migrated during SOR.create.

If we validate before, devs would need to worry about writing a schema that maintains BWC, since field names, for example, could be changed when the object is migrated. This would be a pain. It also means we'd need a map of schemas to versions, like the one we use for migrations, which would significantly increase complexity and require adding this to the migration algorithm:

core.savedObjects.registerType({
  name: 'myType',
  mappings: {
    properties: {
       someField: { type: 'text' }
    }
  },
  migrations: {
    '1.0.0': fn1,
    '2.0.0': fn2,
  },
  schemas: {
    '1.0.0': schema.object({ someField: schema.string() }),
    '2.0.0': schema.object({ someMigratedField: schema.string() }),
  }
})

If we validate after, the schema can be written based on the current structure of the object; however, the error messages become less useful. For example, an invalid object may cause a failed migration, but you'd get back an error about the migration, not a general validation error. That's a bummer, because it means migration functions cannot trust that the provided attributes are valid.

Overall, I think validating after still makes the most sense and is the least complex/risky route, and it's the path I'm going down now, but I wanted to note it here in case others feel strongly otherwise. The user would still be protected against data corruption in this scenario, because the invalid object won't be written to ES. It just means they wouldn't always get a useful validation error message unless the invalid object made it through the migration, and migrations themselves won't really benefit from this feature.
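
A sketch of the ordering being described (the function and its signature are illustrative, not the actual repository code):

import { Type } from '@kbn/config-schema';

type SavedObjectDoc = Record<string, unknown>;
type MigrationFn = (doc: SavedObjectDoc) => SavedObjectDoc;

// Validate-after: run the applicable migrations first, then validate the
// migrated attributes against the current-version schema. An invalid object
// can therefore fail inside a migration fn before validation ever runs.
function migrateThenValidate(
  attributes: SavedObjectDoc,
  migrations: MigrationFn[],
  currentSchema: Type<unknown>
): SavedObjectDoc {
  const migrated = migrations.reduce((doc, fn) => fn(doc), attributes);
  currentSchema.validate(migrated); // throws a proper validation error
  return migrated;
}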

@lukeelmers (Member)

Just discussed this as a team, and we decided that moving toward per-version schemas would make sense for us longer term, as (among other things) it could improve the migration authoring experience if a version-specific validation could be run before each migration function.

However, this would significantly blow up the scope of this effort, and we want to start small here.

So we agreed a good path forward would be:

  • Provide versioned schemas when registering SO types as shown above
  • For now, just proceed with validating a SO inside SOR.create, and do it after the migration is run against the object
  • We would only run the schema validation for the current Kibana version

In this way, we get the near-term benefit of reducing the likelihood of data corruption, while having a future-proofed API that would allow for version-by-version schema validation to take place during migrations in the future. The tradeoff is the same as what I mentioned above, basically that users might get a migration error back when creating a SO instead of a validation error, which isn't the ideal DX but is at least an improvement on the current situation.
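
Concretely (a sketch; the version lookup is illustrative), only the schema registered for the running Kibana version would be enforced for now:

import { schema, Type } from '@kbn/config-schema';

const schemas: Record<string, Type<unknown>> = {
  '1.0.0': schema.object({ someField: schema.string() }),
  '2.0.0': schema.object({ someMigratedField: schema.string() }),
};

// Only the schema matching the running Kibana version is enforced for now;
// older entries are kept around for future version-by-version validation.
function validateForCurrentVersion(kibanaVersion: string, attributes: unknown) {
  const current = schemas[kibanaVersion];
  if (current) {
    current.validate(attributes);
  }
}

validateForCurrentVersion('2.0.0', { someMigratedField: 'ok' }); // passes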

@rudolf (Contributor) commented Nov 4, 2021

👍

Adding validations to migrations would also mean that upgrades fail when we encounter invalid documents, whereas some migrations are deliberately written to ignore invalid data so that the migration succeeds even with corrupt documents.
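
For example, a lenient migration function might look like this (illustrative sketch):

type SavedObjectDoc = { attributes: Record<string, unknown> };

// A lenient migration: if the data isn't the expected shape, pass the
// document through untouched instead of failing the whole upgrade.
function migrateSomeField(doc: SavedObjectDoc): SavedObjectDoc {
  const value = doc.attributes.someField;
  if (typeof value !== 'string') {
    return doc; // ignore invalid data
  }
  return { ...doc, attributes: { ...doc.attributes, someField: value.trim() } };
}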

So we would want to have API validation in place for a long time, then perhaps warn on failed migration validation and perhaps only then enforce it.

@mshustov (Contributor) commented Nov 4, 2021

So we would want to have API validation in place for a long time, then perhaps warn on failed migration validation and perhaps only then enforce it.

Can we start by emitting a warning in dev mode and collecting production telemetry to track the number of such malformed docs?

@lukeelmers (Member)

#118969 has merged with support for create and bulkCreate, so I will go ahead and close this issue as it represents what we wanted to accomplish for a first iteration.

I've opened #122343 to track some of the potential follow-ups we discussed here.
