Support for validating schema properties #14

Open
iaincollins opened this issue Mar 6, 2020 · 4 comments

@iaincollins
Owner

Summary of proposed feature

Schema properties should be checked for validity.

Purpose of proposed feature

Currently properties are only checked to see if they exist, and not if the value they contain is valid.

Detail of the proposed feature

The value of properties should be checked.

This may include primitive types (strings, numbers) as well as specific types (dates, URLs) and complex objects (including nested types).

  • Strings should be valid
  • Numbers should be valid
  • Dates should be valid
  • URLs should be valid
  • Objects should be valid
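As a rough illustration of what "valid" could mean for the simpler types listed above, here is a minimal sketch. The names `ValueType`, `validators` and `isValidValue` are hypothetical and not part of this tool, and the rules themselves (non-empty strings, finite numbers, `YYYY-MM-DD` dates, http(s) URLs) are assumptions chosen for the example.

```typescript
// Hypothetical value validators; names and rules are illustrative assumptions.
type ValueType = "Text" | "Number" | "Date" | "URL";

const validators: Record<ValueType, (value: unknown) => boolean> = {
  // A string counts as valid here if it is not empty or whitespace only.
  Text: (v) => typeof v === "string" && v.trim().length > 0,
  // Rejects NaN and Infinity as well as non-numbers.
  Number: (v) => typeof v === "number" && Number.isFinite(v),
  // Accepts calendar dates written as YYYY-MM-DD only (a deliberate simplification).
  Date: (v) => {
    if (typeof v !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(v)) return false;
    const parsed = new Date(`${v}T00:00:00Z`);
    // The round trip rejects dates that roll over, such as 30 February.
    return !Number.isNaN(parsed.getTime()) && parsed.toISOString().startsWith(v);
  },
  // Accepts absolute http(s) URLs only.
  URL: (v) => {
    if (typeof v !== "string") return false;
    try {
      const url = new URL(v);
      return url.protocol === "http:" || url.protocol === "https:";
    } catch {
      return false;
    }
  },
};

function isValidValue(type: ValueType, value: unknown): boolean {
  return validators[type](value);
}
```

Objects are left out on purpose, since validating them depends on how nested types are handled.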

Potential problems

As per the outline for the version 5.0 milestone, doing this for all schemas is expected to involve extending the schema.org scraper, writing a parser to handle scraping, and using meta-programming to create tests that apply validation rules to properties.

Initial versions may include simple handling for primitive types and easily checkable types, but supporting complex types and properties that can be one of many types will be more difficult and support for that will likely come later. There may be edge cases it is not practical to support.

Describe any alternatives you've considered

It would be nice to have a list of valid templates that are parsable (e.g. in JSON Schema format) but I have not been able to find a suitable library of these and it does not appear there is a list of them published by Schema.org.

Additional context

Is there value in creating JSON Schema profiles, as something other people could reuse?

This would require extra work to integrate schema validation into this tool, but that is something I am familiar with from other projects.

@iaincollins iaincollins added the enhancement New feature or request label Mar 6, 2020
@iaincollins iaincollins added this to the Version 5.0 milestone Mar 6, 2020
@raffaelj

and it does not appear there is a list of them published by Schema.org.

Have a look at https://schema.org/docs/developers.html and scroll down to "Vocabulary Definition Files" if you didn't find that before.

"all-layers" seems to cover all
https://schema.org/version/6.0/all-layers.jsonld

and "schema" covers a lot
http://schema.org/version/latest/schema.jsonld

I didn't look through the whole json strings, but I did a quick search for WebSite and WebPage properties. They still need to be converted into a parsable string with an iteration over the domainIncludes and rangeIncludes properties.
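The iteration over domainIncludes and rangeIncludes could look roughly like the sketch below. It assumes the file has already been parsed and its graph nodes extracted into an array, and that each reference is an object with an `@id` field. The exact key names differ between vocabulary releases, so they are passed in as parameters rather than hard-coded. `buildPropertyIndex`, `refIds` and the type names are hypothetical.

```typescript
// Assumed node shape, loosely modelled on the schema.org JSON-LD vocabulary files.
type Ref = { "@id": string };
type GraphNode = { "@id": string; [key: string]: unknown };

interface PropertyInfo {
  domains: string[]; // types the property may appear on
  ranges: string[]; // types the property's value may take
}

// Normalise a value that may be missing, a single reference, or an array of references.
function refIds(value: unknown): string[] {
  if (value === undefined || value === null) return [];
  const items: unknown[] = Array.isArray(value) ? value : [value];
  return items
    .filter(
      (item): item is Ref =>
        typeof item === "object" && item !== null && typeof (item as Ref)["@id"] === "string"
    )
    .map((item) => item["@id"]);
}

function buildPropertyIndex(
  graph: GraphNode[],
  domainKey: string,
  rangeKey: string
): Map<string, PropertyInfo> {
  const index = new Map<string, PropertyInfo>();
  for (const node of graph) {
    const domains = refIds(node[domainKey]);
    if (domains.length === 0) continue; // no domain listed, so not treated as a property
    index.set(node["@id"], { domains, ranges: refIds(node[rangeKey]) });
  }
  return index;
}
```

Looking up a property then returns both the types it may appear on and the types its value may take, which covers the name checks and, later, the value checks.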

iaincollins added a commit that referenced this issue Mar 12, 2020
Partially addresses #14 by adding checks for top level properties for Schema.org objects.

* It only tests top level properties.
* It checks if a property name is valid (passes).
* It checks if a property name is not valid (fails).
* It checks if a property name is a draft property (warning).

Known limitations:

* Does not support nested schemas (ignored, unless the top level property is invalid for the specified schema).
* Does not validate the property type or value (e.g. String, Number, Date, etc).

Additionally, includes updated tests and improvements to CLI output.
@iaincollins
Owner Author

Hey there, I've actually been working on this over the last couple of days!

The data for mapping properties to types is sourced from the following files (the latter two are for schemas and properties that are not yet final / still in draft stage):

I've added checks for whether a Schema.org property name is valid (passes), invalid (fails) or valid but still in draft (warning). It does not yet support nested properties or check values.
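The three outcomes for a top-level property name can be sketched as follows. This is not the code in the commit; `Vocabulary`, `checkPropertyName` and `checkTopLevel` are hypothetical names, and the lookup tables are assumed to have been built beforehand from the released and draft vocabulary data.

```typescript
type PropertyStatus = "pass" | "fail" | "warning";

// Hypothetical lookup tables: schema name -> property names allowed on it.
interface Vocabulary {
  released: Record<string, string[]>;
  draft: Record<string, string[]>;
}

function checkPropertyName(vocab: Vocabulary, schema: string, property: string): PropertyStatus {
  if ((vocab.released[schema] ?? []).includes(property)) return "pass";
  if ((vocab.draft[schema] ?? []).includes(property)) return "warning";
  return "fail";
}

// Only top-level keys are examined; JSON-LD keywords such as "@type" are skipped.
// Nested objects and property values are not inspected.
function checkTopLevel(
  vocab: Vocabulary,
  schema: string,
  data: Record<string, unknown>
): Record<string, PropertyStatus> {
  const results: Record<string, PropertyStatus> = {};
  for (const key of Object.keys(data)) {
    if (key.startsWith("@")) continue;
    results[key] = checkPropertyName(vocab, schema, key);
  }
  return results;
}
```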

I will likely tackle nested properties then value checks, as the behaviour for testing nested values will likely impact type checking if I do it the other way round.

I've started expanding the tests too. Adding this actually found real errors in a couple of the example schemas (such as invalid properties on test schemas).

@raffaelj

Thanks for the CSV sources. I wasn't aware of that GitHub repository.

I would suggest transforming the CSV files into a SQLite database with relations instead of parsing the CSV files all over again and splitting the results on ', '.

@iaincollins
Owner Author

Hey there! After a few weeks' break I've followed up and done some more work on this for 5.x.

It's in master, but is not released to NPM yet. I'd like to continue to work on it for a while and do some refactoring to clean things up first.

I'll probably add validation for nested properties, think about whether I want to tackle validating the content of properties in 5.0 (or save it for an update in 5.1) and think about some of the edge cases that can arise (e.g. referring to other schema objects on the same page using @id, which I've written up in #21).

I suspect I'll try and do nested validation for properties, just to indicate if they are valid or invalid for a given schema (this is something I worked on in similar projects), and then do more code clean up and cut a release before I go much further, i.e. before I add things like actually validating the value of properties, which I think will be in 5.1.
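One possible shape for nested name validation is a recursive walk that reports each property by its path. The sketch below is only an assumption about how it might work: it relies on every nested object declaring its own `@type`, ignores `@id` references and arrays of types, and uses a hypothetical `PropertyTable` (schema name to allowed property names).

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical table: schema name -> allowed property names.
type PropertyTable = Record<string, string[]>;

interface Finding {
  path: string;
  valid: boolean;
}

function checkNested(table: PropertyTable, node: { [key: string]: Json }, path = ""): Finding[] {
  const schema = node["@type"];
  const allowed = typeof schema === "string" ? table[schema] ?? [] : [];
  const findings: Finding[] = [];
  for (const [key, value] of Object.entries(node)) {
    if (key.startsWith("@")) continue; // skip JSON-LD keywords
    const here = path ? `${path}.${key}` : key;
    findings.push({ path: here, valid: allowed.includes(key) });
    const children: Json[] = Array.isArray(value) ? value : [value];
    children.forEach((child, i) => {
      // Recurse only into plain objects that declare their own @type.
      if (typeof child === "object" && child !== null && !Array.isArray(child)) {
        if (typeof child["@type"] === "string") {
          const childPath = Array.isArray(value) ? `${here}[${i}]` : here;
          findings.push(...checkNested(table, child, childPath));
        }
      }
    });
  }
  return findings;
}
```

Each finding only says whether the name is valid for the enclosing schema; values are still not inspected.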

I'm actually inclined to try and avoid using SQLite, as it creates a dependency that can be awkward (it's a super useful library, but somewhat heavyweight and can have breaking changes). I've run into issues with it before; I normally think it's great, but I'm not as keen to introduce it as a dependency for a library as I would be in an application.

Right now the performance hit of CSV is negligible, but in future I'll probably end up transforming the CSV files to JSON when they are imported (and writing an import script), as JSON is extremely fast to work with and has no dependencies, while also being easy to diff.
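A minimal sketch of such an import step, assuming a CSV parser has already produced one record per property and that multi-valued cells are comma-separated. The column names `label`, `domainIncludes` and `rangeIncludes` are assumptions about the source files, and `rowsToJson` is a hypothetical helper, not an existing script.

```typescript
// Assumed row shape after CSV parsing (one record per property).
interface PropertyRow {
  label: string;
  domainIncludes: string; // e.g. "http://schema.org/WebPage, http://schema.org/WebSite"
  rangeIncludes: string;
}

interface PropertyRecord {
  domains: string[];
  ranges: string[];
}

// Split a multi-valued cell once, at import time, instead of on every lookup.
function splitCell(cell: string): string[] {
  return cell
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .sort();
}

function rowsToJson(rows: PropertyRow[]): string {
  const sorted = [...rows].sort((a, b) => a.label.localeCompare(b.label));
  const out: Record<string, PropertyRecord> = {};
  for (const row of sorted) {
    out[row.label] = {
      domains: splitCell(row.domainIncludes),
      ranges: splitCell(row.rangeIncludes),
    };
  }
  // Sorted keys and fixed indentation keep diffs between vocabulary versions small.
  return JSON.stringify(out, null, 2);
}
```

Writing the result to a file that is checked in means the split on ', ' happens once per vocabulary update rather than on every run.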

Will keep posting updates on progress!
