[Infra/Logs UI] [R&D] Should we replace GraphQL? #36674

jasonrhodes · 2019-05-20T11:07:44Z

We've discussed whether having GraphQL as part of the infrastructure and logs UI data layer is a good thing or whether we should remove it in favor of a more simple REST (or other?) interface. We should outline the basic pros and cons. I think it would be good to have the folks who have been working with this code (@simianhacker, @weltenwort, @skh, and @Kerry350) document the following answers:

"Keep GraphQL" side:

Where and how do we currently use GraphQL in the infra and logs UI?
What advantages do we get out of using GraphQL?
What might we lose by switching to REST?

"Remove GraphQL" side:

What are the downsides of continuing to use and support GraphQL in the infra and logs UI?
What are the best options to replace GraphQL if we remove it?
How complicated/risky will it be to remove?

Time-box: 2-3 hours

elasticmachine · 2019-05-20T11:07:45Z

Pinging @elastic/infrastructure-ui

weltenwort · 2019-05-20T12:19:17Z

Thanks for remembering to bring this up.

Current Usage

We're using GraphQL for all http API endpoints in the infra plugin except the rather new metrics explorer. The usage looks like this:

There is a single api at ${BASE_PATH}/api/infra/graphql
The server-side Query and Mutation schemata of the API are composed from schema.gql.ts files, which are co-located with the respective resolver code.
The client-side query schemata are derived from *.gql_query.ts, which are co-located with the react containers that use them in their apollo client calls or Query/Mutation usages.
The common, server-side and client-side TypeScript types are generated by the x-pack/plugins/infra/scripts/generate_types_from_graphql.js script, which uses https://github.com/dotansimha/graphql-code-generator to turn the aforementioned schema files into TypeScript interfaces.

(Warning, opinions and speculation from here on)

Advantages

End-to-end type safety is provided by using the graphql schema as the single source of truth in combination with the type generation script
Request and response payload validation is implicitly performed by the apollo client and server.

Disadvantages

The more advanced features of apollo-client (like caching and local state) don't work very reliably, which is why we chose not to use them.
The type generation script requires some additional effort to maintain.
Resolvers are harder to type and read than normal HTTP request handlers.

Alternative

Considering the current usage outlined above, I think we could preserve the advantages and reduce disadvantages by using basic HTTP endpoints with JSON serialization in combination with https://github.com/gcanti/io-ts (already in Kibana's dependency list) for validation and type safety. I don't think the ReST paradigm matches our API very well (and I've rarely seen it implemented correctly). Using Joi for validation also is not ideal IMHO, because it (1) doesn't work in the browser and (2) only validates the request, but not the response and doesn't integrate with TypeScript. Instead, if we replaced GraphQL with HTTP routes, I would propose something like the following:

(common) For every route create 1-3 io-ts runtime types and their derived TypeScript types in a common location for every route:
- optionally one type for the request url parameters
- optionally one type for the request body
- one type for the response body
(server-side) Register the HTTP routes with the Kibana HTTP server as usual.
(server-side) At the beginning of every request handler, decode the request url params and body using the corresponding runtime types. This makes them available as correctly typed values in downstream server-side code. A helper function might be useful to encapsulate this.
(server-side) At the end of every request handler use the runtime type to validate response body.
(client-side) Perform the query using Kibana's kfetch.
(client-side) Before sending the requests, validate the request url params and body using the corresponding runtime types. Again, a helper function or wrapper of kfetch might be useful (possibly a hook that extracts kfetch from the context and uses the supplied runtime type defs?).
(client-side) After receiving a reponse, decode the response using the runtime types. This could also be done using the helper just mentioned.

This would preserve the following characteristics on both client and server

Single source of truth for API types
Full validation of the request params and payloads
Correctly typed values

while improving on problems we currently have with GraphQL

No additional type DSL and type generation script
More commonly used tooling (hapi HTTP/kfetch)
Limited complexity compared to GraphQL

Thanks for reading this wall of text. I'm looking forward to hearing opinions and alternative proposals.

Kerry350 · 2019-05-21T13:03:39Z

"Keep GraphQL" side:

Where and how do we currently use GraphQL in the infra and logs UI?

@weltenwort's answer to this was very exhaustive. All I'll add is there's now one additional non-GraphQL usage in the "IP to host" endpoint (this was only merged yesterday).

What advantages do we get out of using GraphQL?
- Type safety and the ability to generate types from the schema
- A "source of truth" with the schema
- Parity with the wider community. By this I mean the trend was, for a while, to move to GraphQL from things like REST. This means more libraries, articles, bug fixes etc become relevant to us. However, I've actually seen this trend reversing of recent.
- In an ideal world we'd have the ability to request "what we need" (GraphQL's biggest selling point), but we don't use this functionality.
- There is a theoretical advantage that GraphQL can provide better analytics.
What might we lose by switching to REST?
- We will lose the overarching schema, which is useful as a source of truth and for generating types. We will need to fill this gap ourself with io-ts, Joi et al.
- We'd lose the ability to "play" with / test the API with GraphiQL.

"Remove GraphQL" side:

What are the downsides of continuing to use and support GraphQL in the infra and logs UI?
- We're not using the features that make GraphQL powerful - things like requesting a subset of fields, requesting multiple resources, dealing with polymorphic data etc
- Everything is a POST, HTTP caching is lost (say we want to start caching things like the auto suggestion results)
- We have the overhead of Apollo, but don't use two of it's most powerful features: caching and subscriptions (for realtime data)
- Cognitive overhead: there is more to reason about in implementing something in GraphQL (schema, resolvers etc) than a simple endpoint function
- Testing is a bit more involved (than using something like Supertest)
What are the best options to replace GraphQL if we remove it?

I actually really like the method proposed by @weltenwort (for context we spoke a little about it at GAH too, and Felix explained the issues with Joi in the browser).

I don't think the ReST paradigm matches our API very well (and I've rarely seen it implemented correctly).

I do strongly agree with this. Now, I fall in to the "It's RESTful" camp when it's really not just as much as anyone else. But, if we're completely redefining how we do things here, we might as well be accurate 😄 I honestly can't see that for every resource we provide we'll do things like provide hypermedia links to transition that resource from one state to another (pagination), or to navigate to linked resources.

How complicated/risky will it be to remove?

If we go endpoint by endpoint I don't think there's too much risk, at least no more than the Redux -> Hooks refactors that have been taking place.

weltenwort · 2019-05-21T15:20:28Z

We'd lose the ability to "play" with / test the API with GraphiQL.

Good point! This really helped me during development and testing and it might be difficult to replicate that without GraphQL. There are similar tools for HTTP APIs, but they also require some kind of machine-readable API specification or documentation to provide a comparable level of convenience.

If we go endpoint by endpoint I don't think there's too much risk, at least no more than the Redux -> Hooks refactors that have been taking place.

Agreed. The complexity of the migration might be a bit higher if we want to slim down the abstraction layers at the same time. But nothing that can't be handled with a bit of API testing discipline.

simianhacker · 2019-05-21T16:56:59Z

We'd lose the ability to "play" with / test the API with GraphiQL.

Good riddance! The GraphiQL enables us to be lazy. We shouldn't be playing with the API for testing/development, we should be writing API integration tests to trigger the endpoints for development.

I don't think the ReST paradigm matches our API very well (and I've rarely seen it implemented correctly).

Let's agree to stop calling it REST and just call it an HTTP API. Getting bogged down in the details of REST will end up paralyzing us into an endless analysis of how REST like we are, total waste of time.

Cognitive overhead: there is more to reason about in implementing something in GraphQL (schema, resolvers etc) than a simple endpoint function

This is the primary reason we are making this change. Since we are barely using the "killer" features of GraphQL it's pretty hard to justify asking new team members to learn yet another library (along with all the idiosyncrasies of Kibana). Everyone should know how HTTP API's work without much explanation.

Another related issue, the types that GraphQL script generates are practically unreadable and increases the complexity.

import { SourceQuery } from '../../../graphql/types';
export const useMetricsExplorerState = (
  source: SourceQuery.Query['source']['configuration'], // <-- Yuk!
  derivedIndexPattern: StaticIndexPattern
) => {
  ...
}

Furthermore, the type generation is not "automatic", we have to remember to re-generate the types when the schema changes, this can be frustrating to debug when things get out of sync or the output from the underlying library changes. Yet another moving piece we have to manage and maintain.

skh · 2019-05-22T07:54:34Z

Pro GraphQL

I definitely do access the API directly during development and for manual testing. GraphiQL is an extremely nice tool for this, but I can do what I need with curl or httpie from the command line. This won't have the nice autocompletion and API documentation, but I haven't really relied on that.
I like that GraphQL, through its rigid approach of doing things, forced us to write our endpoints mostly the same way, type and document them. We will probably not keep up the same level of consistency without it.

Contra GraphQL

I agree with everything that's been said above about mental overhead, and us not taking advantage of the real strengths of GraphQL anyway.

Replacement

I like the approach @weltenwort outlines above, but would like to suggest we start with a rather basic (simple, stupid) implementation and only add abstractions / helpers when we're sure they simplify things.
We're mostly only reading data, so I wasn't worried about us getting lost in REST discussions, but by all means, let's call it just an HTTP API and not try to follow a standard where there is not a single one anyway.

jasonrhodes · 2019-05-23T15:42:26Z

This is fantastic feedback, @infra-logs-ui team! I like everything that's laid out here, and in the end agree with the approach spelled out by @skh that we should try to move to a very simple HTTP API approach and then improve it as we go, although if there are parts of the implementation that @weltenwort outlined above that we think are simple enough to do, we can consider them. I confess I don't completely follow it so we should zoom about it when everyone is back from holiday in a week and begin to make a more structured plan for this work. I'll put something on the calendar.

Thanks!

weltenwort · 2019-06-05T14:05:39Z

An example of what it would look like with client-to-server typing: weltenwort#2

roncohen · 2019-06-11T09:32:51Z

what's the the runtime overhead of io-ts validation? I understood it would run through every JSON response to check that everything is the right type. For large responses (1000+ numbers and dates, for example), I'd assume this could add significant overhead if we're not careful.

weltenwort · 2019-06-11T11:47:18Z

I'm not aware of any benchmarks, but we could perform some. But since not performing validation is not really an alternative, we should compare against other validation libraries.

roncohen · 2019-06-11T13:05:23Z

But since not performing validation is not really an alternative, we should compare against other validation libraries.

Is it common to do run time type checking of APIs in production? As an aside, React propTypes are only validated in development. If there's a significant overhead from io-ts, could we use it in development mode only?

weltenwort · 2019-06-11T13:24:40Z

I consider validating API payloads a security best practice in production. Not blindly trusting user input (which anything from the network essentially is) is a basic principle IMHO. The apollo server and client did that for us so far, but when we move to a custom HTTP API we will have to manually include that validation step. I proposed io-ts for that because it not only performs the validation but also ensures the compiler is aware of the value types, thereby providing a single source of truth (like the GraphQL schema was before).

The analogy to React prop types is not valid, I think. React component props don't really represent a boundary of the system to an untrusted environment in the same way that a network API does.

roncohen · 2019-06-11T13:30:56Z

to be clear, i think it's great if we can get type checked API responses, but we need to be aware of the overhead

jfsiii · 2019-07-21T17:58:53Z

An example of what it would look like with client-to-server typing: weltenwort#2

@weltenwort I just saw this and it reminds me of the explorations I did in #40041 around using JSON Schema to describe the types. As the io-ts author outlines here, depending on one's preferences for which format should be the "source of truth", it's possible to either

convert a runtime type to JSON Schema (either at runtime or compile time)
convert a JSON Schema to a runtime type (either at runtime or compile time)

I'd love to talk more about this when you get a moment. I think this could unlock some very useful features.

weltenwort · 2019-07-22T08:47:59Z

Absolutely, I'd be happy to discuss it when you have the time.

Thanks for linking the io-ts issue. From the experience we gathered with GraphQL I would prefer the variant where we define the io-ts type at runtime as the source of trutz. The separate compilation step is seen as one of the biggest pains in our current graphql setup, so a pure runtime setup would be preferable.

jfsiii · 2019-07-22T16:55:00Z

The separate compilation step is seen as one of the biggest pains in our current graphql setup, so a pure runtime setup would be preferable.

Totally agree about avoiding the compilation step. I was hoping for something like Enjoi which I used in that PR to convert from JSON Schema to Joi at runtime without a compilation step

kibana/x-pack/legacy/plugins/integrations_manager/server/routes.ts

Line 18 in 5301b2c

import listSchema from '../schemas/consumed/list.schema.json';

kibana/x-pack/legacy/plugins/integrations_manager/server/routes.ts

Lines 34 to 43 in 5301b2c

    
           options: { 
        
             tags: [`access:${PLUGIN_ID}`, 'api'], 
        
             response: { 
        
               // TODO: Figure out why we need .meta and to spread 
        
               schema: Enjoi.schema({ 
        
                 ...listSchema, 
        
                 ...listSchema.definitions.IntegrationListElement, 
        
               }).meta({ className: 'IntegrationListElement' }), 
        
             }, 
        
           },

But I see that both io-ts-codegen and json-schema-to-typescript require writing out to disk. Which make sense I guess.

I was hoping to have a spec expressed in JSON Schema in implemented in other places/languages (Principle of Least Power, etc), but the problems caused by having them out of sync (loss of trust, false positives & negatives) and pain of keeping them in sync seem to shift to authoring in TS and exporting to JSON Schema as needed.

One item from that issue about starting with a runtime type:

However conversions might become tricky if there are recursive runtime types involved.

Can you think of any places where we have recursive runtime types?

weltenwort · 2019-07-22T18:26:39Z

Can you think of any places where we have recursive runtime types?

Not really. And we probably want to use a safe JSON serializer (with stable key order and cycle detection) in any case when doing this kind of conversion. Do you intend to serve the JSON schema on a well-know route or do you want to write it out to a file?

jfsiii · 2019-07-22T20:11:05Z

I don't even know if we'll need it now. My initial idea was have a spec in JSON Schema so it could be a) converted to Joi for validation, TS for types, etc and b) included in an OpenAPI definition

We've gone over how we'll try to address (a) in the last few comments and I just saw that there are a few references, especially #40623, for incorporating OpenAPI definitions.

I think I'll start with having Integrations Manager use io-ts and see where that leads.

weltenwort · 2019-07-22T20:19:15Z

Yes, that makes the most sense to me as well. We should be able to generate everything both at runtime and any other point in time from the runtime types.

elasticmachine · 2019-09-05T09:31:37Z

Pinging @elastic/infra-logs-ui

jasonrhodes added Feature:Metrics UI Metrics UI feature Feature:Logs UI Logs UI feature R&D Research and development ticket (not meant to produce code, but to make a decision) labels May 20, 2019

jasonrhodes added the [zube]: Investigate label May 28, 2019

jasonrhodes self-assigned this May 28, 2019

jasonrhodes mentioned this issue Jun 5, 2019

[Infra/Logs UI] HTTP API Documentation (replaces graphql) #38164

Closed

jasonrhodes closed this as completed Jun 5, 2019

jasonrhodes removed the [zube]: Investigate label Jun 5, 2019

jfsiii mentioned this issue Aug 28, 2019

NP Router handler doesn't provide to request body on validate: false #44170

Closed

sgrodzicki added the Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services label Sep 5, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Infra/Logs UI] [R&D] Should we replace GraphQL? #36674

[Infra/Logs UI] [R&D] Should we replace GraphQL? #36674

jasonrhodes commented May 20, 2019

elasticmachine commented May 20, 2019

weltenwort commented May 20, 2019

Kerry350 commented May 21, 2019

weltenwort commented May 21, 2019

simianhacker commented May 21, 2019

skh commented May 22, 2019

jasonrhodes commented May 23, 2019

weltenwort commented Jun 5, 2019

roncohen commented Jun 11, 2019

weltenwort commented Jun 11, 2019

roncohen commented Jun 11, 2019

weltenwort commented Jun 11, 2019 •

edited

Loading

roncohen commented Jun 11, 2019

jfsiii commented Jul 21, 2019

weltenwort commented Jul 22, 2019

jfsiii commented Jul 22, 2019

weltenwort commented Jul 22, 2019

jfsiii commented Jul 22, 2019

weltenwort commented Jul 22, 2019

elasticmachine commented Sep 5, 2019

[Infra/Logs UI] [R&D] Should we replace GraphQL? #36674

[Infra/Logs UI] [R&D] Should we replace GraphQL? #36674

Comments

jasonrhodes commented May 20, 2019

elasticmachine commented May 20, 2019

weltenwort commented May 20, 2019

Kerry350 commented May 21, 2019

"Keep GraphQL" side:

"Remove GraphQL" side:

weltenwort commented May 21, 2019

simianhacker commented May 21, 2019

skh commented May 22, 2019

Pro GraphQL

Contra GraphQL

Replacement

jasonrhodes commented May 23, 2019

weltenwort commented Jun 5, 2019

roncohen commented Jun 11, 2019

weltenwort commented Jun 11, 2019

roncohen commented Jun 11, 2019

weltenwort commented Jun 11, 2019 • edited Loading

roncohen commented Jun 11, 2019

jfsiii commented Jul 21, 2019

weltenwort commented Jul 22, 2019

jfsiii commented Jul 22, 2019

weltenwort commented Jul 22, 2019

jfsiii commented Jul 22, 2019

weltenwort commented Jul 22, 2019

elasticmachine commented Sep 5, 2019

weltenwort commented Jun 11, 2019 •

edited

Loading