Create a format Registry #845

Closed
webron opened this issue Nov 18, 2016 · 43 comments

@webron
Member

webron commented Nov 18, 2016

In reference to #607 and #811.

Instead of adding additional formats to the spec, we're considering two options: either a set of formats in a supporting guidelines document, or a formal OAI format registry with guidelines for how to add new formats to it. The registry would serve as an official repository for these formats, and tools could use it as a reference. The @OAI/tdc is currently leaning towards the registry option.

This ticket is a reminder to tackle it and finalize the approach and the guidelines for either case.

@MikeRalphson
Member

MikeRalphson commented Apr 28, 2017

Summary of existing formats used in the wild. https://github.com/Mermade/openapi-specification-extensions/blob/master/formats/combined.tsv

It may be worth recommending that format strings only contain lower-case letters, digits and hyphens, or some other restriction, to draw attention to the fact it is not intended as an essay on the content of the property or an enum, pattern or template.
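
As a rough illustration of that restriction, here is a minimal sketch that accepts only lower-case letters, digits and hyphens. The function name `is_plain_format_name` is hypothetical, and the rule itself is only the suggestion made above, not something the spec requires.

```python
import re

# Assumed rule, taken from the suggestion above: lower-case letters,
# digits and hyphens only. Nothing in the spec enforces this.
FORMAT_NAME = re.compile(r"[a-z0-9-]+")


def is_plain_format_name(name: str) -> bool:
    """Return True if `name` uses only lower-case letters, digits and hyphens."""
    return FORMAT_NAME.fullmatch(name) is not None
```

Under this rule `date-time` and `int64` would pass, while something like `yyyy/MM/dd` (a template rather than a name) would not.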

@handrews
Member

handrews commented Apr 3, 2018

Relevant: json-schema-org/json-schema-spec#563

@MikeRalphson
Member

MikeRalphson commented Apr 3, 2018

Proposed candidates discussed in other issues (work-in-progress):

| Type(s) | Format | Description | Issue |
| --- | --- | --- | --- |
| integer | int53 | 53-bit integer | #1517 |
| integer | int16 | Signed 16-bit integer (short) | |
| integer | int8 | Signed 8-bit integer (to help with misunderstandings of the current byte format) | |
| integer | uint8 | Unsigned 8-bit integer (to help with misunderstandings of the current byte format) | |
| number\|string | decimal | Fixed-point decimal numbers | #889 |
| string | int64s | 64-bit integer held as a string for interoperability reasons | #1517 |
| string | uuid | UUID v4 format - RFC4122 | #542 |
| string | base64url | url-safe binary | #606 |
| string | time | time of day - as defined by partial-time - RFC3339 | #358 |
| string | duration | Duration - as defined by xs:dayTimeDuration - XML Schema 1.1 / ISO 8601 | #359 |
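
To make the fixed-width integer candidates concrete, here is a small sketch of the value ranges they imply. The names `INTEGER_FORMAT_RANGES` and `fits_integer_format` are hypothetical; `int53` is left out on purpose because the list above does not pin down its exact range.

```python
# Inclusive (minimum, maximum) ranges for the signed/unsigned
# fixed-width integer formats proposed in the list above.
INTEGER_FORMAT_RANGES = {
    "int8": (-2**7, 2**7 - 1),     # -128 .. 127
    "uint8": (0, 2**8 - 1),        # 0 .. 255
    "int16": (-2**15, 2**15 - 1),  # -32768 .. 32767
}


def fits_integer_format(value: int, fmt: str) -> bool:
    """Return True if `value` lies within the range implied by `fmt`."""
    low, high = INTEGER_FORMAT_RANGES[fmt]
    return low <= value <= high
```

For example, 255 fits `uint8` but not `int8`, which is the kind of confusion the descriptions above mention in relation to the current byte format.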

@MikeRalphson
Member

@handrews - I like the idea of potentially delegating responsibility for formats to a JSON Schema-led registry and collaborating on that.

For now, we'll keep a small standard set in one of the major specification drafts.

How close would we be on the "core" OAS formats ?

@darrelmiller
Member

And there are a bunch more here #607

@handrews
Member

handrews commented Apr 3, 2018

@MikeRalphson regarding "core" OAS formats:

  • "byte" should be done with "contentEncoding": "base64" as of draft-07
  • "binary" should be done with "contentMediaType": "application/octet-stream" as of draft-07
  • Both of the above have existed in some form for much longer than draft-07 under various names
  • The numeric formats are reasonable (to me) but we've never really sorted that out
  • "password" should be done through a UI vocabulary, so probably not a format

@darrelmiller regarding #607:

  • "decimal" is proposed for JSON Schema, but as a string format, if someone who understands the use cases and concerns will just write a $*%$@ PR for it. It could have been in draft-07 if anyone who was asking about it had stepped up. I don't think it's possible to reliably implement it as a number format, so I'm confused there.
  • "uuid" Is this just a UUID or a UUID URN? The latter should be {"type": "string", "format": "uri", "pattern": "^urn:uuid:"}
  • "duration" is one that I'd like to have, we already have the other date and time ones
  • "base64url" should also be "contentEncoding": "base64url", not format
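
The keyword substitutions suggested in the two lists above could be sketched as a small rewrite step. This is only an illustration: `FORMAT_REPLACEMENTS` and `replace_content_formats` are hypothetical names, the function handles a single schema object as a plain dict, and it does not recurse into nested schemas.

```python
# Assumed mapping, taken from the suggestions in the lists above.
FORMAT_REPLACEMENTS = {
    "byte": {"contentEncoding": "base64"},
    "binary": {"contentMediaType": "application/octet-stream"},
    "base64url": {"contentEncoding": "base64url"},
}


def replace_content_formats(schema: dict) -> dict:
    """Return a copy of a string schema with byte/binary/base64url formats
    rewritten as content* keywords; other schemas are copied unchanged."""
    replacement = FORMAT_REPLACEMENTS.get(schema.get("format"))
    if schema.get("type") != "string" or replacement is None:
        return dict(schema)
    rewritten = {key: val for key, val in schema.items() if key != "format"}
    rewritten.update(replacement)
    return rewritten
```

A schema such as `{"type": "string", "format": "byte"}` would come out as `{"type": "string", "contentEncoding": "base64"}`, while formats like `date-time` are left alone.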

@cyberphone

What's the difference between "byte" and "binary"?
In most newer standards, byte-arrays are encoded as Base64Url rather than Base64.

BigInteger support is currently available in Java, .NET, and probably in a bunch of other platforms as well. Java and .NET use entirely different BigInteger serialization schemes.

I would separate a possible BigNumber type from Decimal/Money because the latter is based on decimal arithmetic and usually does not come with exponents. If an application needs exponents, BigNumber would be the logical choice.

@cyberphone

cyberphone commented May 17, 2018

The JSON-P API for Java uses the following algorithm for long, and a similar one for other extended numeric types:

if (value >= 9007199254740992 || value <= -9007199254740992)
    serializeAsString(value);
else
    serializeAsNumber(value);

IMO, it is a bad idea, but they claim it is the "industry standard" as well as the correct interpretation of the JSON RFC.

https://github.com/cyberphone/I-JSON-Number-System#extended-numeric-data-types-compliant-with-i-json
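
For readers who want to try the threshold rule quoted above, here is a minimal Python sketch of the same branching. It is not the JSON-P code itself; `serialize_long` and `LIMIT` are hypothetical names used only for illustration.

```python
import json

LIMIT = 2**53  # 9007199254740992, the threshold in the snippet above


def serialize_long(value: int) -> str:
    """Emit `value` as a JSON string at or beyond +/-2**53, else as a JSON number."""
    if value >= LIMIT or value <= -LIMIT:
        return json.dumps(str(value))
    return json.dumps(value)
```

One consequence visible in the sketch: the same field can arrive as either a JSON number or a JSON string, depending only on its magnitude.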

@ioggstream
Contributor

I propose the fixed-point numeric decimal as per

  • "DECIMAL" SQL standard "ISO/IEC 9075-2: 2016 12 15"

@cyberphone

I propose the fixed-point numeric decimal as per
"DECIMAL" SQL standard "ISO/IEC 9075-2: 2016 12 15"

Don't you just love standards organizations that want money for their work? https://webstore.iec.ch/publication/59685

@ioggstream
Contributor

ioggstream commented May 29, 2018

"DECIMAL" SQL standard "ISO/IEC 9075-2: 2016 12 15"
Don't you just love standards organizations that want money for their work?

Use the draft, Luke ;)

@peteraritchie

peteraritchie commented Feb 19, 2021

How would a compliant OAS parser deal with additions to this external registry? I think versioning the spec based on the state of a registry at an instant of time might not be the most beneficial.

I also thought, how would OAS authors deal with draft formats... What I was thinking is that there be a way to specify format/pattern pairs in the components or something. e.g.

  components:
    x-formats:
      html:
        pattern: '<\s*a[^>]*>(.*?)<\s*/\s*a>'
    # ...
    schemas:
      markup-text:
        type: string
        format: html # a parser would know here to use '<\s*a[^>]*>(.*?)<\s*/\s*a>' to validate a value
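
A tool could resolve such format/pattern pairs roughly as sketched below. This assumes the YAML above has already been parsed into a dict; `x-formats` is the extension proposed in this comment rather than part of the OpenAPI Specification, and `matches_custom_format` is a hypothetical helper. Python's `re` is used here for convenience even though JSON Schema patterns are defined in terms of ECMA-262, so the dialects may differ in edge cases.

```python
import re

# Parsed form of the YAML example above. `x-formats` is the proposed
# extension from this comment, not an official OpenAPI keyword.
document = {
    "components": {
        "x-formats": {"html": {"pattern": r"<\s*a[^>]*>(.*?)<\s*/\s*a>"}},
        "schemas": {"markup-text": {"type": "string", "format": "html"}},
    }
}


def matches_custom_format(doc: dict, schema_name: str, value: str) -> bool:
    """Check `value` against the pattern registered for the schema's format.

    Formats without an `x-formats` entry are not checked and count as a match.
    """
    components = doc["components"]
    fmt = components["schemas"][schema_name].get("format")
    entry = components.get("x-formats", {}).get(fmt)
    if entry is None:
        return True
    # Patterns are unanchored, so search anywhere in the value.
    return re.search(entry["pattern"], value) is not None
```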

@karenetheridge
Member

I don't see much gain in distilling formats down to simple regexes. The advantage of the "format" keyword is to be able to specify constraints that are not possible with a simple regexp, such as "date" which not only checks that digits and symbols are in the right place in the string, but also that the value represents a real date (for example "2021-13-13 25:00:00" passes a simple regexp but is not actually a valid date).
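
The gap between shape and meaning can be shown with a short sketch. The `YYYY-MM-DD HH:MM:SS` layout is assumed from the example value above, and both function names are hypothetical.

```python
import re
from datetime import datetime

# Shape-only check: digits and separators in the right places.
SHAPE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def has_date_time_shape(value: str) -> bool:
    """Return True if the string merely looks like a date and time."""
    return SHAPE.fullmatch(value) is not None


def is_real_date_time(value: str) -> bool:
    """Return True only if the string also denotes an actual date and time."""
    if not has_date_time_shape(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return True
```

With these definitions, "2021-13-13 25:00:00" satisfies `has_date_time_shape` but not `is_real_date_time`, because month 13 and hour 25 do not exist.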

FWIW, many JSON Schema implementations do support plugging additional formats into them. Because of the nature of format validation, though, this has to be done with code rather than data, so there is no standard way of doing it, and it will be different for every implementation/language/platform.
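
As a generic illustration of what "plugging in a format with code" can look like, here is a sketch of a checker registry. It does not reproduce any particular implementation's API; `FORMAT_CHECKERS`, `register_format` and `check_format` are hypothetical names, and treating unknown formats as passing is an assumption made for this sketch.

```python
import ipaddress
from typing import Callable, Dict

# Hypothetical registry: format name -> checker function.
FORMAT_CHECKERS: Dict[str, Callable[[str], bool]] = {}


def register_format(name: str):
    """Decorator that registers a checker function under a format name."""
    def decorator(func: Callable[[str], bool]) -> Callable[[str], bool]:
        FORMAT_CHECKERS[name] = func
        return func
    return decorator


@register_format("ipv4")
def is_ipv4(value: str) -> bool:
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


def check_format(name: str, value: str) -> bool:
    """Run the registered checker; unknown formats pass (assumption)."""
    checker = FORMAT_CHECKERS.get(name)
    return True if checker is None else checker(value)
```

The checker is ordinary code, which is why it cannot be shipped as data alongside a schema and has to be rewritten for each platform.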

@handrews
Member

handrews commented Feb 23, 2021

Since OAS 3.1 is out now, with JSON Schema draft 2020-12 support including support for extension vocabularies, I would recommend closing this and focusing on adding vocabularies to meet the various needs. OAI can at some point designate certain vocabularies as preferred without breaking any compatibility guarantees at all. Or even make certain ones mandatory, which would still be backwards compatible but would require work to ensure that the newly mandatory vocabularies are supported. Presumably, only well-supported vocabularies would be blessed in that way, though.

For those who do not know the saga of format and what happened in the 4+ years since this issue was filed:

  • format is a giant pain for implementations because the set of formats is open-ended, but it was specified in such a way that people expected it to work for validation. That was a misreading: it was always optional in some ways.
  • The exact situation of what was and was not required in terms of format support was a huge point of contention for years.
  • format was the only place in the spec where a lot of things people wanted fit, so the list of format values grew over time, both within and outside of the JSON Schema spec.
  • This was extra-messy because there was no registry of format values or expected behavior (hence this issue, and I think other people have their own registries somewhere).
  • No other keyword behaves remotely like this. While contentMediaType is open-ended, it is explicitly not validated and parsing any such embedded document is handled separately from validation, by whoever calls the validator. No other keyword is even close (other enumerated value keywords, including contentEncoding, use a closed set of values).
  • Because of all of this, format support "in the wild" was extremely variable, ranging from not supported at all, to partial support (the canonical example being validating the "email" format by just checking that there's an @ somewhere in the middle), to full support. And each specific format was variable in level of support.
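
To show how loose the "partial support" end of that range can be, here is a sketch of the canonical weak email check described in the last bullet. The function name is hypothetical, and this is deliberately not a recommended validator.

```python
def has_at_sign_in_middle(value: str) -> bool:
    """Partial "email" check: an @ somewhere other than the first or last character."""
    return "@" in value[1:-1]
```

A check like this accepts `user@example.com`, but it also accepts `not an @ email`, which a stricter implementation would reject. Two validators can therefore disagree about the same schema and instance.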

This was a mess, so in draft 2019-09 (which introduced extension vocabularies), we made format not validate at all by default. Technically you can configure things to require it to validate, and then you probably end up in the unreliable situation described above.

The point of vocabulary support is that, as a schema author, you can indicate what extension vocabularies MUST be present in order for your schema to be correctly processed. This is not possible with format values (vocabularies work at the keyword level, not the keyword value level). The benefit here is that you could (and really should) have multiple vocabularies, each of which replaces a coherent part of format with reliable, clear behavior. For example:

  • A date-time vocabulary that supports at least as much as format does. This is easy to validate and I'm sure would be widely supported.
  • A networking vocabulary (or possibly multiple ones) covering IP addresses, hostnames, etc.
  • Email, with clear guidance on how much syntactic validation is required (because people argue about it endlessly now)
  • URI/IRI, perhaps where the keyword takes a list of valid schemes (one of the challenges of URI validation right now is that full syntactical validation is scheme-specific, and the set of schemes is open-ended), and fails validation if the scheme is not recognized, e.g. "uri": ["http", "https"] or "uri": ["data"] or "uri": ["urn"] (I just made this up right now, don't take it too seriously, there are many possible ways to solve this problem!)
  • A regular expression vocabulary that could indicate the specific dialect of regular expression expected (default ECMA whatever it is that JSON Schema itself references)
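
The made-up "uri" keyword from the list above could behave roughly like the following sketch. It only checks the scheme against the allowed list and does no scheme-specific syntax validation; `uri_scheme_allowed` is a hypothetical name, and the keyword itself is an off-the-cuff idea rather than an existing vocabulary.

```python
from urllib.parse import urlsplit


def uri_scheme_allowed(value: str, allowed_schemes: list) -> bool:
    """Pass only if the value has a scheme and it is one of the listed ones."""
    scheme = urlsplit(value).scheme
    allowed = {item.lower() for item in allowed_schemes}
    return scheme != "" and scheme in allowed
```

With `"uri": ["http", "https"]`, a value like `https://example.com/` would pass, while `ftp://example.com/` or a string with no scheme at all would fail.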

I hope this gives a feel for why JSON Schema wants to move away from format, and how better options can be added without having to wait for the JSON Schema spec team to approve anything.

@peteraritchie

Since OAS 3.1 is out now, with JSON Schema draft 2020-12 support including support for extension vocabularies, I would recommend closing this and focusing on adding vocabularies to meet the various needs. OAI can at some point designate certain vocabularies as preferred without breaking any compatibility guarantees at all. Or even make certain ones mandatory, which would still be backwards compatible but would require work to ensure that the newly mandatory vocabularies are supported. Presumably, only well-supported vocabularies would be blessed in that way, though.

...

Agreed. My comment was more of an "in lieu of anything better..." idea. Support for vocabularies/dialects is a much better way to support something like this.

@handrews
Member

@Bessonov if you have particular questions or concerns that led you to put a "confused" emoji on my comment, I'd be happy to address them.

@Bessonov

Hey @handrews, thanks for pinging! The GitHub 😕 emoji description doesn't make any sense to me. I use it to mean "unhappy", like ":(". In this particular case I'm unhappy with the current state and the challenges described, not with your explanation ❤️

@baywet
Contributor

baywet commented Jan 4, 2023

hi everyone 👋
We do have an additional list of mappings here microsoft/OpenAPI.NET#1094
Here https://github.com/microsoft/kiota/blob/bac004b2c3688e6dde6ddf010ec0f8f272d7d080/tests/Kiota.Builder.Tests/KiotaBuilderTests.cs#L2724

And here https://json-schema.org/draft/2020-12/json-schema-validation.html#name-defined-formats
I'd be happy to update the list here if someone points me to the source, or to update the vocabulary, whichever we think is best at this point.

@MikeRalphson
Member

I'd be happy to update the list here if someone points me to the source

@baywet, the source of the draft registry is yaml data in the gh-pages branch, specifically here.

@baywet
Contributor

baywet commented Feb 20, 2023

Thanks for the pointers.

Everyone: I put a PR together at #3167 feel free to give it a look and tell me what I missed :)

@baywet
Contributor

baywet commented Mar 14, 2023

Update: The main PR has been merged.

There are a couple of follow up PRs you should jump on if you're interested in the specific topics:

@MikeRalphson
Member

I've uploaded a more recent summary of formats in the wild from 2021.

@Speeddymon

I was looking for a format to constrain IP addresses with a CIDR yesterday. I tried format: ipv4, but it didn't work because the definition of ipv4 doesn't allow for that notation. It'd be nice to see an additional format value for that in both the ipv4 and ipv6 flavors. Happy to submit a separate issue or be redirected to the proper location to make this request if this is not the correct place.
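
The difference between a plain address and CIDR notation can be sketched with Python's standard `ipaddress` module. The function names are hypothetical and no format name is proposed here. Note that `IPv4Network` also accepts a dotted netmask after the slash, which a real format definition might or might not want to allow.

```python
import ipaddress


def is_ipv4_address(value: str) -> bool:
    """True for a bare IPv4 address such as "10.0.0.1"."""
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


def is_ipv4_cidr(value: str) -> bool:
    """True for IPv4 CIDR notation such as "10.0.0.0/8"; a bare address is rejected."""
    if "/" not in value:
        return False
    try:
        # strict=False allows host bits to be set, e.g. "10.0.0.1/24".
        ipaddress.IPv4Network(value, strict=False)
    except ValueError:
        return False
    return True
```

Here "10.0.0.0/8" fails the plain address check but passes the CIDR check, which matches the behaviour described above.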

@lornajane
Contributor

@Speeddymon could you open a new issue to request that please? The issue should be in this repository, and I recommend you include your use cases to help us understand the request.

@lornajane
Contributor

This was fixed in #3167
