OpenAPI vocabulary or dialect for code generation #2542
@mkistler great to see you getting this going! I have some thoughts, but take them with a grain of salt as I'll probably just drop in on this discussion periodically and don't have the bandwidth to push anything. I'm just offering some ideas in case they help. I think that this is a good, comprehensive overview of what's needed to create successful tooling, but there's also a separation of concerns here in how it might best be approached. I would see three components:
I admit to being a bit confused about the intended scope, but I would caution against calling this the code generation system, as requirements will vary and not all tools will target the same language(s). A code generator targeting Python will have different needs and capabilities from one targeting Java. Regarding JSON Schema vocabularies, the approach that looks most promising is one that disambiguates JSON Schema validation constructs that are challenging for code generation by placing new annotation keywords adjacent to (in the same schema object as) the keyword being disambiguated. The latest JSON Schema specification gives an example of this approach.
To me, an ideal code gen vocabulary is one that allows me to use the full power of JSON Schema for validation while also allowing me to use the same schemas for code gen. That would mean that tooling would have to ignore some things that only relate to validation. It also means tooling can't make assumptions about how a pattern is to be interpreted in OO. I believe that the way forward is a vocabulary of annotation keywords that allows you to be explicit about how you expect a schema to be used for code gen, without it having an effect on validation. Here are a couple of examples of the kind of thing I'm thinking of.

```json
{
"$comment": "Example of anyOf expressing an enum",
"interpretAs": "enum",
"anyOf": [
{ "const": 0, "title": "Sunday" }
{ "const": 1, "title": "Monday" }
{ "const": 2, "title": "Tuesday" }
{ "const": 3, "title": "Wednesday" }
{ "const": 4, "title": "Thursday" }
{ "const": 5, "title": "Friday" }
{ "const": 6, "title": "Saturday" }
]
}
```

```json
{
"$comment": "className is used for the internal name because URI $ids don't work as class names",
"className": "Foo",
"type": "object",
"properties": {
"foo": { "type": "string" }
}
}
```

```json
{
"$comment": "The baseClass keyword makes it explicit that a reference is intended to represent an inheritance relationship",
"className": "FooBar",
"allOf": [{ "$ref": "/schema/foo", "baseClass": true }],
"properties": {
"bar": {
"$comment": "A reference that does not express inheritance. (In a way it does, but that's not how we would expect code to be generated)",
"$ref": "/schema/common#/nonnegative",
"maximum": 100
}
}
}
```

This is just off the top of my head. It's probably not the best approach and the names will certainly need some workshopping, but hopefully this gets across the general idea. I think it would be useful to get some details about the reason for each of the restrictions in the original proposal. That way we can work backwards and try to solve those problems in ways that don't require restrictions on JSON Schema validation.

One more thing I want to point out is that the current proposal is coupled to the OpenAPI document. I believe that we should be solving the general case. OpenAPI users aren't the only JSON Schema users that are interested in code gen, and it would be great if we could solve for their needs as well.
This is a note to hopefully remember a point for this discussion. Possibly we can learn something from TypeScript type annotations: they add strong typing to an untyped language, so similarly structured annotations might help the code-generation case.
For clarification:
Should this be restricted to the specific top-level component sections, or would references within a top-level component be allowed?
@landrito it's not really a good practice to `$ref` anything other than top-level component schemas. Of course nothing bad automatically happens if you don't. But it's like abusing the leading underscore convention in Python, which usually indicates a private method. You can call it like a public one, but you're doing something that most people reading or maintaining the code wouldn't expect. Most people will assume that a random property schema is not being re-used elsewhere and will feel free to change it without looking for `$ref`s pointing to it.

@MikeRalphson good idea on TypeScript. Which may be the only time I've ever said something positive about TypeScript, but that's just my preference for loose/dynamic typing speaking 😝
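To make the distinction above concrete, here is a small hypothetical sketch (the `Customer`/`Order` names and paths are invented for illustration, not taken from the thread): the first `$ref` targets a top-level component schema, while the second reaches into another schema's `properties`, which is the kind of reference being cautioned against.

```json
{
  "components": {
    "schemas": {
      "Customer": {
        "type": "object",
        "properties": {
          "email": { "type": "string", "format": "email" }
        }
      },
      "Order": {
        "type": "object",
        "properties": {
          "customer": { "$ref": "#/components/schemas/Customer" },
          "contactEmail": {
            "$comment": "Reaches into Customer's property subschema; anyone editing Customer.email is unlikely to expect this dependency",
            "$ref": "#/components/schemas/Customer/properties/email"
          }
        }
      }
    }
  }
}
```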
I just updated the original description to add rationale to each of my original bullet points.
The approach @jdesrosiers illustrated is very much the sort of thing I had in mind when I talked about disambiguating validation constructs. I'll probably get out of the way on this point now and let him carry it forward 😃

@mkistler Many of your rationales cite limitations in specific programming languages, or specific sorts of programming languages. If this is to be THE code generation system endorsed by OpenAPI, that would damage the specification for other environments. I recall you stating something to the effect that everything should be consumable by those languages, but that is a design choice that OAS should allow but not enforce. I would really like to see some acknowledgement and discussion of this point.

I have absolutely no objection to there being a strict no-union statically typed code gen vocabulary, as long as there is also one that allows for full idiomatic usage of loosely typed languages. These would likely build on each other in some way, perhaps a core shared vocabulary and an additional vocabulary for one approach, or two additional vocabularies, one for each, if each has additional needs. Please do not limit OAS for those of us who live entirely in the loosely typed world.
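As a concrete illustration of the layering idea above, JSON Schema 2020-12 lets a dialect meta-schema declare which vocabularies it requires via `$vocabulary`. The sketch below is purely hypothetical (the `example.com` URIs and vocabulary names are invented): it shows how a shared code-gen core vocabulary might be combined with an approach-specific one, such as a statically-typed profile.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/dialects/codegen-static",
  "$comment": "Hypothetical dialect meta-schema combining a shared codegen core vocabulary with a statically-typed profile",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true,
    "https://json-schema.org/draft/2020-12/vocab/applicator": true,
    "https://json-schema.org/draft/2020-12/vocab/validation": true,
    "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
    "https://example.com/vocab/codegen-core": true,
    "https://example.com/vocab/codegen-static-typing": false
  },
  "$dynamicAnchor": "meta",
  "allOf": [
    { "$ref": "https://json-schema.org/draft/2020-12/schema" },
    { "$ref": "https://example.com/meta/codegen-core" },
    { "$ref": "https://example.com/meta/codegen-static-typing" }
  ]
}
```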
@handrews I have no intention of, and would indeed resist, "limiting OAS" by adopting any of these restrictions in general. At the same time, I think it would be helpful to have a means to describe subsets of OAS that are amenable to code generators. It was towards this end that I posted the issue. I think it is likely that there will be many code generation vocabularies -- as you suggest, one might be for statically-typed languages, another for dynamically typed languages, and perhaps others for special situations.
Thanks for the clarification @mkistler. It just read as if it were a proposal for a single system to me, and I remember someone (I'm not even sure if it was in the context of OAS or on the JSON Schema slack or what) arguing that everything should be done based on strongly typed languages only.
@handrews I agree. I think we are very much on the same page.

@mkistler Thanks for adding rationale for each of the constraints you proposed. The first thing that jumps out at me is that we aren't just talking about a JSON Schema vocabulary/dialect here. Several of the points seem to apply only to OpenAPI. I assumed you were talking about a JSON Schema vocabulary because OpenAPI doesn't have a similar concept, but now I think you may be extending the concept to OpenAPI as a whole. Either way, I think it would be good to clarify the scope of what we are talking about here.
> Arrays must contain items of a single type

If you are referring to an unconstrained array that allows items of any type, I don't know of any language that doesn't have such a concept. For example, in Java you can use `List<Object>`.
It's not impossible, you just need to return an `Object`.
Why does it have to be forbidden? Why can't it just ignore it? Code gen has no use for `not`.
> The API document should be "self contained" (no external "$refs")

On the contrary, I consider external `$ref`s a feature worth supporting rather than a problem.
> All "$refs" must be to elements in the "components" section of the document

The biggest problem with this is that it's coupled to OpenAPI. Ideally, this would be a vocabulary that any JSON Schema user could use even if they aren't using OpenAPI. This constraint would couple the vocabulary to OpenAPI. Other than that, why do you think it's unnecessary and why do you think it's complicated? It seems to me an unnecessary complication for the processing engine to have to care where a reference is pointing.
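For concreteness, this is the sort of reference that restriction would disallow. The sketch is hypothetical (the `/widgets` paths and schema shapes are invented): one operation's response schema is defined inline, and a second operation `$ref`s it by a JSON Pointer into the `paths` section rather than into `components`.

```json
{
  "paths": {
    "/widgets": {
      "get": {
        "responses": {
          "200": {
            "description": "A list of widgets",
            "content": {
              "application/json": {
                "schema": {
                  "type": "array",
                  "items": {
                    "type": "object",
                    "properties": { "name": { "type": "string" } }
                  }
                }
              }
            }
          }
        }
      }
    },
    "/widgets/recent": {
      "get": {
        "responses": {
          "200": {
            "description": "Same shape, referenced by a pointer into another path item instead of components",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/paths/~1widgets/get/responses/200/content/application~1json/schema"
                }
              }
            }
          }
        }
      }
    }
  }
}
```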
Just wanted to mention that I started a discussion in the JSON Schema community repository suggesting that we work together on this problem of using JSON Schema in tooling. The outcome of the suggested process would tackle the very issues talked about here, especially since this is a common issue that we are all currently trying to solve in parallel, each with our own way of doing it. Looking forward to hearing your thoughts on the matter! 👍
Code generation tools often have special requirements or restrictions on the structure of an OpenAPI definition (document?) that improve the generated code. Here are some examples of restrictions from the IBM OpenAPI SDK generator:
- Parameters must be unique by name only, irrespective of "in".
  Rationale: Operation parameters are often rendered as the parameters on a function or method in the target language of the code generator. Since most languages require parameters to have unique names, the code generator would need to incorporate the "in" of a parameter into its name to prevent name collisions. This is undesirable, since it exposes the mechanics of the API without adding any value.
- There should be at most one success response with a response body. A 204 plus one other 2XX is okay, but no other combination of two or more 2XX responses.
  Rationale: In statically-typed languages like Java, the return value of a method must have a single static type. This makes it difficult to represent an operation with two different response schemas as a single method returning a single response type. (See the sketch after this list.)
- Property names and parameter names must be "case-insensitive" unique.
  Rationale: Code generators often reformat the names of parameters, properties, and schemas to use idiomatic case formatting for the target language: lower_snake_case for Python, lowerCamelCase for Java, etc. This reformatting could introduce naming conflicts if two names, e.g. "foo_bar" and "fooBar", are not "case-insensitive" unique.
- Arrays must contain items of a single type.
  Rationale: Many languages require an array to contain only values of a single type.
- Schema type must specify a single type -- no type arrays.
  Rationale: Some widely-used statically-typed languages (e.g. Java and Go) have no provision for "union" types, making it impossible to define a `type: [integer, string]` typed property or parameter. (See the sketch after this list.)
- Don't use "nullable".
  Rationale: It's deprecated, and is just an alternate way of expressing type arrays.
- Don't use JSON Schema "not".
  Rationale: There's no obvious way to represent this in many widely used programming languages.
- No "if-then-else" in JSON Schema.
  Rationale: There's no obvious way to represent this in many widely used programming languages.
- The API document should be "self contained" (no external "$refs").
  Rationale: External refs can easily create multiple namespaces for schemas, parameters, security schemes, etc. These are unnecessary complications for code generators.
- All "$refs" must be to elements in the "components" section of the document.
  Rationale: "$ref" targets outside of "components" are unnecessary complications for code generators.
It would be nice to have a common set of rules like this that could be codified into a "Code generation" vocabulary or dialect for OpenAPI.