Add Defined Format: UUID #542

Closed
JLLeitschuh opened this issue Jan 24, 2018 · 14 comments

Comments

@JLLeitschuh

The UUID format is covered in RFC 4122. It could be a very simple addition to the default set of Defined Formats that JSON Schema ships with.

There is a standard format that could be encoded as a pattern: https://en.wikipedia.org/wiki/Universally_unique_identifier#Format

Something like the following would work:

^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$
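
A minimal sketch of how that pattern behaves as an ECMA 262 regular expression, run under Node.js. The name UUID_PATTERN is illustrative only. One thing the sketch makes visible: the pattern as written accepts lowercase hex digits only, whereas RFC 4122 says UUIDs are lowercase on output but case-insensitive on input, so the character classes would need widening if uppercase input should pass.

```javascript
// Proposed pattern from this issue, as an ECMA 262 regular expression.
// UUID_PATTERN is an illustrative name, not part of any spec.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

const lower = 'f81d4fae-7dec-11d0-a765-00a0c91e6bf6';

console.log(UUID_PATTERN.test(lower));               // true
console.log(UUID_PATTERN.test(lower.toUpperCase())); // false: uppercase hex is not matched
console.log(UUID_PATTERN.test('urn:uuid:' + lower)); // false: the URN prefix is not part of the pattern
```
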
@handrews
Contributor

@JLLeitschuh since UUIDs are URIs, we've generally taken the position that to require a specific sort of URI you use both format and pattern:

{
    "type": "string",
    "format": "uri",
    "pattern": "^urn:uuid:"
}

This is a pretty concise and readable way to indicate uuids (or https URIs, or mailto URIs, or whatever scheme you want) without having an explosion of format values.

This has come up before so perhaps a note about this usage in the spec would help?

@JLLeitschuh
Author

Yes, noting this would be very useful.
I haven't seen anywhere before where ^urn:uuid: was a valid regex. Are you sure that this is valid in ECMA 262?

@handrews
Contributor

@JLLeitschuh why wouldn't it be? ^ is one of the most basic regex operators and : is not a reserved character (except in constructs like (?:)).

bash$ node
> let r = /^urn:uuid:/;
undefined
> r.test('urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6');
true
> r.test('https://example.com/foo');
false
> 

@JLLeitschuh
Author

Sorry, I guess I'm confused. I thought that the regex ^urn:uuid: was a way of validating UUID with the format of [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.

I totally misunderstood.

@handrews
Contributor

@JLLeitschuh Oh, I see! Yeah, what I mean is that if the implementation you are using understands validating UUID URNs as part of "format": "uri" in the first place, then you can restrict it to UUID URNs (instead of all URIs) by adding "pattern": "^urn:uuid:" alongside it. That's also good if you want a human to be able to understand that you want a UUID URN, as it's pretty readable.

I'm also realizing that maybe you just want the UUID part and not the URN? In that case, you are best off just defining a schema using that regex and referencing it with $ref.
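
A rough sketch of that approach, written as a JavaScript object so it can be run under Node.js. The "definitions" location, the "id" property name, and both helper functions are assumptions made for illustration; the helpers only handle this one local reference and are not a substitute for a real validator.

```javascript
// A reusable UUID definition referenced via $ref.
// The "definitions" location and the property name "id" are illustrative.
const schema = {
  definitions: {
    uuid: {
      type: 'string',
      pattern: '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
    }
  },
  type: 'object',
  properties: {
    id: { $ref: '#/definitions/uuid' }
  }
};

// Toy resolver for local "#/a/b" references only; a real validator does much more.
function resolveLocalRef(root, ref) {
  return ref.slice(2).split('/').reduce((node, key) => node[key], root);
}

// Hypothetical helper: checks only the "type: string" and "pattern" keywords for "id".
function matchesUuidProperty(instance) {
  const target = resolveLocalRef(schema, schema.properties.id.$ref);
  return typeof instance.id === 'string' && new RegExp(target.pattern).test(instance.id);
}

console.log(matchesUuidProperty({ id: 'f81d4fae-7dec-11d0-a765-00a0c91e6bf6' })); // true
console.log(matchesUuidProperty({ id: 'not-a-uuid' }));                            // false
```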

The format keyword is for documenting semantics, particularly for formats like email addresses that cannot be validated with a regexp. For such formats, if an implementation validates them then it may use a different approach than a regexp, and if not, then the application using the schema can notice that the field is semantically an email address (or whatever it is) and do its own additional validation.

I don't think that UUIDs on their own are a good candidate for a standardized format value. They are not as universal as URIs in general or IP addresses, nor are they particularly hard to validate with existing keywords such as pattern.

@JLLeitschuh
Author

I think you hit the nail on the head.

I'm only referring to validating UUIDs, not to whether the value is a URN.

"you are best off just defining a schema using that regex and referencing it with $ref."

That's exactly what I was planning on doing. However, UUIDs are pretty common in the industry for database indexes and for APIs that query data from those databases.
It seemed to me that having the UUID format in the spec would make sense given the widespread adoption of the format as a standard.

I agree that they aren't that hard to validate with a regex; it just seemed like a common format that would allow code generators for languages like Java to see and know that they should be using the java.util.UUID type.

Just my 10¢. Thanks!

@ralfhandl

We also need/use a uuid format, plus a few other ones, e.g.

  • date for ISO 8601 dates
  • time for ISO 8601 times
  • duration for ISO 8601 durations without fixed endpoints, also used in XML Schema
  • ...

OpenAPI defines a few additional formats, and the OAI discusses creating a format registry, see OAI/OpenAPI-Specification#845 and OAI/OpenAPI-Specification#1345.

@handrews
Contributor

handrews commented Mar 8, 2018

@ralfhandl date and time were added (or rather restored, along with regex) in draft-07. Also, we reserved other RFC 3339 names as possible future standardized formats (there was disagreement on how far to go, so we settled for saying "don't define anything that contradicts this" as a compromise while we thought on it more).

I don't think RFC 3339 includes duration, though.

Anyway, if you'd like durations please file that separately, let's keep this issue focused on UUID.

We're still working out the right threshold for standardizing formats. It's currently an awkward balancing act of extensibility vs interoperability vs implementation costs (right now if you implement one format, you are supposed to implement all of them, which is an increasingly high bar and somewhat problematic). The concept of modular vocabularies (#561) may help with this, although that issue as filed does not address the specifics of format.

@handrews
Contributor

handrews commented Mar 8, 2018

@JLLeitschuh regarding:

"would allow code generators for languages like Java to see and know that they should be using the java.util.UUID type"

one of the main drivers of multi-vocabulary support (see #561) is to add a code generation vocabulary that would include hints for this kind of thing. It would also include hints as to whether an allOf is being used for inheritance vs composition or whatever.

The validation spec is not really a good place for that sort of thing. See also #513, which moves some keywords out of validation so that they can serve as a basis for validation, hyper-schema, code generation, etc.

If anyone out there likes the multi-vocabulary idea, even if the exact details of #561 aren't quite ideal for you, adding comments to #561 supporting the general direction would be most helpful.

@ralfhandl

Formats usually serve the dual purpose of a programming-language-independent code-generation hint for the recipient and a production/validation instruction for the sender or an intermediate validator.

So pulling formats out of validation seems to be the right direction.

Opened #565 for duration.

@broofa

broofa commented Mar 15, 2018

It's probably worth pointing out that in the most popular language on GitHub (JS), the most popular module for schema validation (AJV @ 40M downloads/month) already supports format: uuid.

Kinda feels like the standard has some catching up to do on this. :-/

[FWIW, came here because I'm using JSON schema for what I suspect is a very common use-case: server-API validation of REST resources backed by a DB where all the keys are UUIDs.]

@handrews
Contributor

@broofa format is intentionally extensible and it is neither intended nor desirable for the standard to encompass all possible formats. Please see #563 for further discussion.

@gpakosz

gpakosz commented Jan 4, 2019

Hello, I landed here as I'm learning JSON Schema and I want to represent a field in a document that's a UUID v4.

I'm ready to extend https://pypi.org/project/jsonschema/ to add support for it. However, after reading this discussion, I'm not sure what the recommended way to represent UUIDs is.

The documents will be of the form

{
  "id": "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
  ...
}

Is there now an official stance? Should it be what OpenAPI suggests?

{
  "type": "string",
  "format": "uuid"
}

Or something else?

Also, I'm not sure it's desirable to have "urn:uuid:xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx" in our JSON documents. The values will almost always be plain UUID strings.

Can someone please shed some light?
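
For reference, one way the pattern-based option discussed earlier in this thread could be narrowed to version 4 only. This is a sketch, not an official recommendation. It assumes the "4" in the template above is the version digit and that "y" stands for the RFC 4122 variant digit (8, 9, a, or b); the name UUID_V4_PATTERN is illustrative.

```javascript
// Version 4 pattern: the third group starts with "4" and the fourth group
// starts with one of 8, 9, a, b (assumed to be what "y" denotes above).
// UUID_V4_PATTERN is an illustrative name.
const UUID_V4_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

console.log(UUID_V4_PATTERN.test('9b2f5c1e-3d4a-4f6b-8c7d-0e1f2a3b4c5d')); // true
console.log(UUID_V4_PATTERN.test('f81d4fae-7dec-11d0-a765-00a0c91e6bf6')); // false: version digit is 1
```

The same string can be used as the value of "pattern" next to "type": "string" in a schema.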

@awwright
Member

I'd say "uuid" seems like a good format to add, as it's described in a standard endorsed by a major organization (IETF), it's in common use "in the wild", and it provides some semantic meaning (it's not merely a bunch of characters with dashes; if you know it's a UUID then you also know its URI).


6 participants