Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

top-level types are always inline, so duplicated in the schema #104

Closed
davepacheco opened this issue Apr 16, 2021 · 4 comments · Fixed by #105
Closed

top-level types are always inline, so duplicated in the schema #104

davepacheco opened this issue Apr 16, 2021 · 4 comments · Fixed by #105

Comments

@davepacheco
Copy link
Collaborator

davepacheco commented Apr 16, 2021

The problem

In oxidecomputer/omicron#64 @david-crespo reported that when running openapi-generator for TypeScript on the Omicron API schema, he was getting duplicate type definitions like ApiRackView and ApiRackView1. The types were exactly the same. This API has two different endpoints that use this type: one returns an ApiRackView, and the other returns a list of ApiRackView. It appears that the dropshot-generated OpenAPI spec inlines the top-level type in the response, but it generates references for subtypes.

Test case

I was able to replicate this by modifying test_openapi.rs with this change:

diff --git a/dropshot/tests/test_openapi.rs b/dropshot/tests/test_openapi.rs
index 97daf80..3af7e5f 100644
--- a/dropshot/tests/test_openapi.rs
+++ b/dropshot/tests/test_openapi.rs
@@ -136,6 +136,36 @@ async fn handler7(
     unimplemented!();
 }
 
+#[derive(JsonSchema, Serialize)]
+struct NeverDuplicatedTopLevel {
+    b: NeverDuplicatedNextLevel,
+}
+
+#[derive(JsonSchema, Serialize)]
+struct NeverDuplicatedNextLevel {
+    v: bool,
+}
+
+#[endpoint {
+    method = PUT,
+    path = "/dup1",
+}]
+async fn handler8(
+    _rqctx: Arc<RequestContext<()>>,
+) -> Result<HttpResponseOk<NeverDuplicatedTopLevel>, HttpError> {
+    unimplemented!();
+}
+
+#[endpoint {
+    method = PUT,
+    path = "/dup2",
+}]
+async fn handler9(
+    _rqctx: Arc<RequestContext<()>>,
+) -> Result<HttpResponseOk<NeverDuplicatedTopLevel>, HttpError> {
+    unimplemented!();
+}
+
 fn make_api() -> Result<ApiDescription<()>, String> {
     let mut api = ApiDescription::new();
     api.register(handler1)?;
@@ -145,6 +175,8 @@ fn make_api() -> Result<ApiDescription<()>, String> {
     api.register(handler5)?;
     api.register(handler6)?;
     api.register(handler7)?;
+    api.register(handler8)?;
+    api.register(handler9)?;
     Ok(api)
 }

Here's how that changes the generated schema for that test:

dap@zathras dropshot $ git diff dropshot/tests/test_openapi.json
diff --git a/dropshot/tests/test_openapi.json b/dropshot/tests/test_openapi.json
index 1233f79..688000d 100644
--- a/dropshot/tests/test_openapi.json
+++ b/dropshot/tests/test_openapi.json
@@ -26,6 +26,58 @@
         }
       }
     },
+    "/dup1": {
+      "put": {
+        "operationId": "handler8",
+        "responses": {
+          "200": {
+            "description": "successful operation",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "title": "NeverDuplicatedTopLevel",
+                  "type": "object",
+                  "properties": {
+                    "b": {
+                      "$ref": "#/components/schemas/NeverDuplicatedNextLevel"
+                    }
+                  },
+                  "required": [
+                    "b"
+                  ]
+                }
+              }
+            }
+          }
+        }
+      }
+    },
+    "/dup2": {
+      "put": {
+        "operationId": "handler9",
+        "responses": {
+          "200": {
+            "description": "successful operation",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "title": "NeverDuplicatedTopLevel",
+                  "type": "object",
+                  "properties": {
+                    "b": {
+                      "$ref": "#/components/schemas/NeverDuplicatedNextLevel"
+                    }
+                  },
+                  "required": [
+                    "b"
+                  ]
+                }
+              }
+            }
+          }
+        }
+      }
+    },
     "/impairment": {
       "get": {
         "operationId": "handler6",
@@ -288,6 +340,17 @@
   },
   "components": {
     "schemas": {
+      "NeverDuplicatedNextLevel": {
+        "type": "object",
+        "properties": {
+          "v": {
+            "type": "boolean"
+          }
+        },
+        "required": [
+          "v"
+        ]
+      },
       "ResponseItem": {
         "type": "object",
         "properties": {

We see that the top-level type shows up twice, inline in both places, while the next level type gets a reference instead.

Analysis

When you use the endpoint macro, we generate a From<...> for ApiEndpoint for this specific handler

#[allow(non_camel_case_types, missing_docs)]
#[doc = "API Endpoint: handler9"]
struct handler9 {}
#[allow(non_upper_case_globals, missing_docs)]
#[doc = "API Endpoint: handler9"]
const handler9: handler9 = handler9 {};
impl From<handler9>
    for dropshot::ApiEndpoint<
        <Arc<RequestContext<()>> as dropshot::RequestContextArgument>::Context,
    >
{
    fn from(_: handler9) -> Self {
        async fn handler9(
            _rqctx: Arc<RequestContext<()>>,
        ) -> Result<HttpResponseOk<NeverDuplicatedTopLevel>, HttpError> {
            ::core::panicking::panic("not implemented");
        }
        dropshot::ApiEndpoint::new(
            "handler9".to_string(),
            handler9,
            dropshot::Method::PUT,
            "/dup2",
        )
    }
}

(That's from cargo expand -p dropshot --test=test_openapi with my changes.)

When you register the handler function, we use that From impl (well, the corresponding Into) here:

pub fn register<T>(&mut self, endpoint: T) -> Result<(), String>
where
T: Into<ApiEndpoint<Context>>,
{
let e = endpoint.into();

We see above that that invokes ApiEndpoint::new(), which winds up invoking metadata() on the response type:

response: ResponseType::metadata(),

ResponseType is HttpResponseOk here, and the metadata() impl comes from HttpTypedResponse:

fn metadata() -> ApiEndpointResponse {
ApiEndpointResponse {
schema: Some(ApiSchemaGenerator::Gen {
name: T::Body::schema_name,
schema: T::Body::json_schema,
}),
success: Some(T::STATUS_CODE),
description: Some(T::DESCRIPTION.to_string()),
}
}

What's important here is that we supply Body::json_schema as the schema function here:

schema: T::Body::json_schema,

Again, that's all at registration time. Later when we generate the schema, we use that schema function to generate a schema for the response body:

ApiSchemaGenerator::Gen {
name,
schema,
} => (Some(name()), schema(&mut generator)),

Let's take a closer look at the derived JsonSchema impls. For the top-level type, it looks like this (back to cargo expand):

impl schemars::JsonSchema for NeverDuplicatedTopLevel {
    fn schema_name() -> std::string::String {
        "NeverDuplicatedTopLevel".to_owned()
    }
    fn json_schema(gen: &mut schemars::gen::SchemaGenerator) -> schemars::schema::Schema {
        {
            let schema = {
                let mut schema_object = schemars::schema::SchemaObject {
                    instance_type: Some(schemars::schema::InstanceType::Object.into()),
                    ..Default::default()
                };
                <NeverDuplicatedNextLevel as schemars::JsonSchema>::add_schema_as_property(
                    gen,
                    &mut schema_object,
                    "b".to_owned(),
                    None,
                    true,
                );
                schemars::schema::Schema::Object(schema_object)
            };
            gen.apply_metadata(schema, None)
        }
    }
}

The schema for NeverDuplicatedNextLevel must be added via add_schema_as_property, which winds up invoking not json_schema() on NeverDuplicatedNextLevel, but gen.subschema_for(). That's documented as:

Generates a JSON Schema for the type T, and returns either the schema itself or a $ref schema referencing T’s schema.

If T is referenceable, this will add T’s schema to this generator’s definitions, and return a $ref schema referencing that schema. Otherwise, this method behaves identically to JsonSchema::json_schema.

This explains why the non-top-level schemas get references: JsonSchema internally uses a function that tries to add references if possible. The top-level ones don't get references because we're using json_schema() directly, which always generates a new, non-reference schema.

@davepacheco
Copy link
Collaborator Author

davepacheco commented Apr 16, 2021

With this change:

diff --git a/dropshot/src/handler.rs b/dropshot/src/handler.rs
index 85d544d..38560c2 100644
--- a/dropshot/src/handler.rs
+++ b/dropshot/src/handler.rs
@@ -1163,13 +1163,20 @@ where
         ApiEndpointResponse {
             schema: Some(ApiSchemaGenerator::Gen {
                 name: T::Body::schema_name,
-                schema: T::Body::json_schema,
+                schema: make_subschema_for::<T::Body>,
             }),
             success: Some(T::STATUS_CODE),
             description: Some(T::DESCRIPTION.to_string()),
         }
     }
 }
+
+fn make_subschema_for<T: JsonSchema>(
+    gen: &mut schemars::gen::SchemaGenerator,
+) -> schemars::schema::Schema {
+    gen.subschema_for::<T>()
+}
+

With this change, this is how the schema for test_openapi.rs changes:

dap@zathras dropshot $ git diff dropshot/tests/test_openapi.json
diff --git a/dropshot/tests/test_openapi.json b/dropshot/tests/test_openapi.json
index 1233f79..4ce534a 100644
--- a/dropshot/tests/test_openapi.json
+++ b/dropshot/tests/test_openapi.json
@@ -26,6 +26,40 @@
         }
       }
     },
+    "/dup1": {
+      "put": {
+        "operationId": "handler8",
+        "responses": {
+          "200": {
+            "description": "successful operation",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/NeverDuplicatedTopLevel"
+                }
+              }
+            }
+          }
+        }
+      }
+    },
+    "/dup2": {
+      "put": {
+        "operationId": "handler9",
+        "responses": {
+          "200": {
+            "description": "successful operation",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/NeverDuplicatedTopLevel"
+                }
+              }
+            }
+          }
+        }
+      }
+    },
     "/impairment": {
       "get": {
         "operationId": "handler6",
@@ -67,25 +101,7 @@
             "content": {
               "application/json": {
                 "schema": {
-                  "title": "ResponseItemResultsPage",
-                  "description": "A single page of results",
-                  "type": "object",
-                  "properties": {
-                    "items": {
-                      "description": "list of items on this page of results",
-                      "type": "array",
-                      "items": {
-                        "$ref": "#/components/schemas/ResponseItem"
-                      }
-                    },
-                    "next_page": {
-                      "description": "token used to fetch the next page of results (if any)",
-                      "type": "string"
-                    }
-                  },
-                  "required": [
-                    "items"
-                  ]
+                  "$ref": "#/components/schemas/ResponseItemResultsPage"
                 }
               }
             }
@@ -122,8 +138,7 @@
             "content": {
               "application/json": {
                 "schema": {
-                  "title": "Response",
-                  "type": "object"
+                  "$ref": "#/components/schemas/Response"
                 }
               }
             }
@@ -288,6 +303,31 @@
   },
   "components": {
     "schemas": {
+      "NeverDuplicatedNextLevel": {
+        "type": "object",
+        "properties": {
+          "v": {
+            "type": "boolean"
+          }
+        },
+        "required": [
+          "v"
+        ]
+      },
+      "NeverDuplicatedTopLevel": {
+        "type": "object",
+        "properties": {
+          "b": {
+            "$ref": "#/components/schemas/NeverDuplicatedNextLevel"
+          }
+        },
+        "required": [
+          "b"
+        ]
+      },
+      "Response": {
+        "type": "object"
+      },
       "ResponseItem": {
         "type": "object",
         "properties": {
@@ -298,6 +338,26 @@
         "required": [
           "word"
         ]
+      },
+      "ResponseItemResultsPage": {
+        "description": "A single page of results",
+        "type": "object",
+        "properties": {
+          "items": {
+            "description": "list of items on this page of results",
+            "type": "array",
+            "items": {
+              "$ref": "#/components/schemas/ResponseItem"
+            }
+          },
+          "next_page": {
+            "description": "token used to fetch the next page of results (if any)",
+            "type": "string"
+          }
+        },
+        "required": [
+          "items"
+        ]
       }
     }
   }

The hunks related to ResponseItemResultsPage are noise -- those are other instances of the same problem being fixed. Importantly, we see the top-level type showing up as a reference now, with the body showing up in components.schemas in the spec.

This is promising but I'm not positive I've really applied the fix at the right spot. That said, I'm not sure where else it would go.

There are also a few other callers of json_schema() that may need the same treatment. The only suspicious one I found is get_metadata() in dropshot/src/handler.rs, which might be causing us to duplicating types that are used as parameters. I'm not sure if the fix will just work there because the generator is instantiated right there.

@davepacheco
Copy link
Collaborator Author

davepacheco commented Apr 16, 2021

Query parameters seem to have the same problem. I confirmed with this change:

diff --git a/dropshot/tests/test_openapi.rs b/dropshot/tests/test_openapi.rs
index 3af7e5f..621ce4f 100644
--- a/dropshot/tests/test_openapi.rs
+++ b/dropshot/tests/test_openapi.rs
@@ -147,7 +147,7 @@ struct NeverDuplicatedNextLevel {
 }
 
 #[endpoint {
-    method = PUT,
+    method = GET,
     path = "/dup1",
 }]
 async fn handler8(
@@ -157,7 +157,7 @@ async fn handler8(
 }
 
 #[endpoint {
-    method = PUT,
+    method = GET,
     path = "/dup2",
 }]
 async fn handler9(
@@ -166,6 +166,38 @@ async fn handler9(
     unimplemented!();
 }
 
+#[derive(Deserialize, JsonSchema)]
+struct NeverDuplicatedParamTopLevel {
+    _b: NeverDuplicatedParamNextLevel
+}
+
+#[derive(Deserialize, JsonSchema)]
+struct NeverDuplicatedParamNextLevel {
+    _v: bool
+}
+
+#[endpoint {
+    method = PUT,
+    path = "/dup3",
+}]
+async fn handler10(
+    _rqctx: Arc<RequestContext<()>>,
+    _q: Query<NeverDuplicatedParamTopLevel>,
+) -> Result<HttpResponseOk<()>, HttpError> {
+    unimplemented!();
+}
+
+#[endpoint {
+    method = PUT,
+    path = "/dup4",
+}]
+async fn handler11(
+    _rqctx: Arc<RequestContext<()>>,
+    _q: Query<NeverDuplicatedParamTopLevel>,
+) -> Result<HttpResponseOk<()>, HttpError> {
+    unimplemented!();
+}
+
 fn make_api() -> Result<ApiDescription<()>, String> {
     let mut api = ApiDescription::new();
     api.register(handler1)?;
@@ -177,6 +209,8 @@ fn make_api() -> Result<ApiDescription<()>, String> {
     api.register(handler7)?;
     api.register(handler8)?;
     api.register(handler9)?;
+    api.register(handler10)?;
+    api.register(handler11)?;
     Ok(api)
 }

We get this change in spec:

diff --git a/dropshot/tests/test_openapi_fuller.json b/dropshot/tests/test_openapi_fuller.json
index 19a1c59..e602b4a 100644
--- a/dropshot/tests/test_openapi_fuller.json
+++ b/dropshot/tests/test_openapi_fuller.json
@@ -35,7 +35,7 @@
       }
     },
     "/dup1": {
-      "put": {
+      "get": {
         "operationId": "handler8",
         "responses": {
           "200": {
@@ -52,7 +52,7 @@
       }
     },
     "/dup2": {
-      "put": {
+      "get": {
         "operationId": "handler9",
         "responses": {
           "200": {
@@ -68,6 +68,48 @@
         }
       }
     },
+    "/dup3": {
+      "put": {
+        "operationId": "handler10",
+        "parameters": [
+          {
+            "in": "query",
+            "name": "_b",
+            "required": true,
+            "schema": {
+              "$ref": "#/components/schemas/NeverDuplicatedParamNextLevel"
+            },
+            "style": "form"
+          }
+        ],
+        "responses": {
+          "200": {
+            "description": "successful operation"
+          }
+        }
+      }
+    },
+    "/dup4": {
+      "put": {
+        "operationId": "handler11",
+        "parameters": [
+          {
+            "in": "query",
+            "name": "_b",
+            "required": true,
+            "schema": {
+              "$ref": "#/components/schemas/NeverDuplicatedParamNextLevel"
+            },
+            "style": "form"
+          }
+        ],
+        "responses": {
+          "200": {
+            "description": "successful operation"
+          }
+        }
+      }
+    },
     "/impairment": {
       "get": {
         "operationId": "handler6",
@@ -366,6 +408,17 @@
         "required": [
           "items"
         ]
+      },
+      "NeverDuplicatedParamNextLevel": {
+        "type": "object",
+        "properties": {
+          "_v": {
+            "type": "boolean"
+          }
+        },
+        "required": [
+          "_v"
+        ]
       }
     }
   }

@davepacheco
Copy link
Collaborator Author

I think this probably also affects TypedBody as well.

The query params case is a little trickier to fix than I expected because the obvious change breaks the way we're detecting whether this is a paginated endpoint.

@davepacheco
Copy link
Collaborator Author

davepacheco commented Apr 16, 2021

I think I misread the output from the query params case in my comment above. I realized this when I took a closer look at the generated schema. The spec sort of looks like it has inlined the top-level type, in that there are two schemas for an object with property _b, etc. However, being query parameters, there's not an actual type for all the parameters. The NeverDuplicatedParamTopLevel type doesn't appear anywhere in the spec, let alone twice. The inner type is already a reference because it's not a top-level type. So the query params case was correct before my change.

davepacheco added a commit to oxidecomputer/omicron that referenced this issue Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant