Extend Studio usage reporting to support referenced fields in addition to executed fields #5708

hwillson · 2021-09-14T14:45:45Z

In Apollo Studio, the "Fields" page lets you see how often your fields are executed by operations — ie, how often their resolvers run. It additionally lets you see which clients and operations ran those operations.

However, knowing if an operation executed a field doesn't tell the whole story about the relationship between operations and fields. You might also be curious to learn if a field was textually referenced in the operation itself.

It's possible for a field to be referenced without executing. In fact, there are many reasons this can happen:

The field is nested under another field which evaluates to null
The field is nested under another field which evaluates to an empty list
The field is nested under a non-matching fragment
The field is nested under @include or @skip

If you're using the Fields page to determine "what are all the operations that use this field", you probably want to know about these usages too!

It's also possible for a field to be executed without being referenced. That's because when we track "executed fields", we are always tracking the concrete object type that is being executed, not an interface type it may have been resolved through. So with schema:

interface Animal {
  legs: Int
}
type Dog implements Animal {
  legs: Int
}
type Query {
  myFavoriteAnimal: Animal  # In practice, always returns a Dog
}

the operation { myFavoriteAnimal { legs } } "references" Animal.legs, not Dog.legs, but it is the field Dog.legs that is executed.

We aim to improve Apollo Studio and the apollo-server usage reporting so that the Fields page displays both the number of executions and the number of textual references.

In addition to this just being an interesting thing to ask about, this will also mean that the Fields page can be useful without needing full execution tracing. There is a real performance impact of full execution tracing, especially in the federation context where the ftv1 protocol involves sending subgraph traces to the gateway/router in-band in the response. And not every GraphQL server supports full execution tracing in the first place. If we support referenced fields on the Fields page, then you can run a gateway/router in front of an arbitrary subgraph without federated tracing (whether you want to do this for performance or lack of implementation reasons) and still get some useful data on the Fields page. (You can then perhaps run full tracing on a sampled subset of queries to get reasonable approximate execution data too.)

The text was updated successfully, but these errors were encountered:

glasser · 2021-09-21T18:52:37Z

Just as an external status update: this project had two halves. One is an internal refactoring to improve how related data is stored in our backend databases, and that has now been implemented. The next step is to make the Apollo Server related changes to report field usage differently. This won't happen until October.

RichardWright · 2021-10-12T10:34:08Z

If you could add a diagnostic channel (https://nodejs.org/api/diagnostics_channel.html) for graphql render times, that would be fantastic.

glasser · 2021-10-19T23:30:23Z

@RichardWright That sounds like something you could build with the plugin API? Seems pretty separate from Studio usage reporting.

alexandernanberg · 2021-11-24T13:47:00Z

Will you be able to use this without Apollo Studio? E.g. if you would like to use a custom server solution?

glasser · 2021-12-01T01:10:17Z

This is an improvement to a feature of Apollo Studio. You certainly could use the open source usage reporting plugin as a basis for reporting to something else if you'd like, but that's not a goal of this project.

In Apollo Studio, the "Fields" page lets you see how often your fields are *executed* by operations --- ie, how often their resolvers run. It additionally lets you see which clients and operations ran those operations. However, knowing if an operation *executed* a field doesn't tell the whole story about the relationship between operations and fields. You might also be curious to learn if a field was textually referenced in the operation itself. It's possible for a field to be referenced without executing. In fact, there are many reasons this can happen: - The field is nested under another field which evaluates to null - The field is nested under another field which evaluates to an empty list - The field is nested under a non-matching fragment - The field is nested under `@include` or `@skip` If you're using the Fields page to determine "what are all the operations that use this field", you probably want to know about these usages too! It's also possible for a field to be executed without being referenced. That's because when we track "executed fields", we are always tracking the concrete object type that is being executed, not an interface type it may have been resolved through. So with schema: ```graphql interface Animal { legs: Int } type Dog implements Animal { legs: Int } type Query { myFavoriteAnimal: Animal # In practice, always returns a Dog } ``` the operation `{ myFavoriteAnimal { legs } }` "references" `Animal.legs`, not `Dog.legs`, but it is the field `Dog.legs` that is executed. (Additionally, when using federation, fields can be executed without being referenced in the operation if the query plan requires them to be executed to fulfill an `@requires` or `@key` directive.) This PR extends Apollo Server's usage reporting plugin to provide a list of "referenced fields" along with every operation. Note that these fields depend only on the operation, not on the variables passed or anything at execution time, so we don't need one set per trace, just one set per operation. In addition to the fact that this statistic is an interesting one, this will also mean that the Fields page can be useful without needing full execution tracing. There is a real performance impact of full execution tracing, especially in the federation context where the ftv1 protocol involves sending subgraph traces to the gateway/router in-band in the response. And not every GraphQL server supports full execution tracing in the first place. With support for referenced fields on the Fields page, you will be able to run a gateway/router in front of an arbitrary subgraph without federated tracing (whether you want to do this for performance or lack of implementation reasons) and still get some useful data on the Fields page. (You can then perhaps run full tracing on a sampled subset of queries to get reasonable approximate execution data too.) Note that the new Studio functionality may not be in general availability when this PR merges into the Apollo Server release branch or when the first alphas are created; it will be fully enabled (including docs) by the time this PR is released in a non-prerelease. As part of the implementation, we extend the existing "signature cache" to cache the referenced field list as well. Note that the signature is a pure function of the operation, whereas the referenced field list also depends on the schema, so we add a little mechanism to throw out the cache if the schema changes. Fixes #5708.

The usage reporting plugin already has an includeRequest option which allows you to tell the plugin to completely ignore an operation. One reason you may have wanted to use this is to avoid the overhead of capturing a full field-by-field execution trace. But in this case, includeRequest=false is overkill: it removes the operation from aspects of Studio such as the Operations page and schema checks which don't require full execution traces to operate. This PR adds a new option, captureTraces, which is invoked after includeRequest returns true. If you return false from this operation, you won't incur the overhead of capturing a detailed trace (either directly in this process, or in subgraphs if this process is a Gateway). Most of Studio's features (including the newly-added "referencing operations" column on the Fields page) will still reflect the existence of this operation. As of the end of 2021, the features that this operation will not contribute to are: - The "field executions" column on the Fields page - The per-field timing hints shown in Explorer and in vscode-graphql - The trace view on the Operations page You can also pass a number between 0 or 1, which will be interpreted as randomly capturing traces that percent of a time. (Specifying this as a number rather than a function that invokes Math.random means that a future version of Apollo Server and Studio could include this number in the usage report and use it to let you view an estimate for total field executions based on using the number as a scale; this is however not yet implemented.) Since this may provide a common use case for running a GraphQL server without any need for per-field instrumentation, the request pipeline now only instruments the schema if it ever sees a willResolveField callback. This *almost* means that if you're running a monolithic server with captureTraces always returning false (or usage reporting disabled), or if you're running a subgraph whose gateway has captureTraces:false (and thus never receives a request with the `apollo-federation-include-trace` header), then execution won't have the small performance impact of instrumentation. In practice you need to also disable the cache control plugin to get this speedup, as it is installed by default and uses willResolveField to implement dynamic cache control. If this optimization proves to be important we can provide a mode of the cache control plugin that doesn't allow for dynamic cache control. (Alternatively we may eventually find a way to support the instrumentation of GraphQL execution with lower overhead.) Part of #5708.

The usage reporting plugin already has an `includeRequest` option which allows you to tell the plugin to completely ignore an operation. One reason you may have wanted to use this is to avoid the overhead of capturing a full field-by-field execution trace. But in this case, `includeRequest: false` is overkill: it removes the operation from aspects of Studio such as the Operations page and schema checks which don't require full execution traces to operate. This PR adds a new option, `fieldLevelInstrumentation`, which is invoked after includeRequest returns true. If you return false from this operation, you won't incur the overhead of capturing a detailed trace (either directly in this process, or in subgraphs if this process is a Gateway). Most of Studio's features (including the newly-added "referencing operations" column on the Fields page) will still reflect the existence of this operation. As of the end of 2021, the features that this operation will not contribute to are: - The "field executions" column on the Fields page - The per-field timing hints shown in Explorer and in vscode-graphql - The trace view on the Operations page Apollo Server now sends both an "observed execution count" and "estimated execution count" for each field (for each operation and client). The former is literally how many times we saw the field get executed (only during operations where `fieldLevelInstrumentation` returned truthy). If the hook only returns true or false, the latter is the same, but you may also return a number between 0 and 1 which represents the sample rate for that operation; the "estimated execution count" will be incremented by the reciprocal of that number instead of by 1. So for example, with: fieldLevelInstrumentation: () => Math.random() < 0.01 ? 0.01 : false Apollo Server will instrument 1% of operations, and the "estimated execution count" will be 100 times more than the observed execution count. (You can imagine more sophisticated implementations of `fieldLevelInstrumentation` which sample more common operations more aggressively than rare operations.) If you pass a number for `fieldLevelInstrumentation`, it is equivalent to passing a function of the form in the above example; that is, the previous example behaves identically to `fieldLevelInstrumentation: 0.01`. The `latency_count` sent with field stats (which powers per-field timing hints) is now always scaled by the given fact (ie, there's no separate "observed histogram"). Since this may provide a common use case for running a GraphQL server without any need for per-field instrumentation, the request pipeline now only instruments the schema if it ever sees a willResolveField callback. This *almost* means that if you're running a monolithic server with fieldLevelInstrumentation always returning false (or usage reporting disabled), or if you're running a subgraph whose gateway has fieldLevelInstrumentation:false (and thus never receives a request with the `apollo-federation-include-trace` header), then execution won't have the small performance impact of instrumentation. In practice you need to also disable the cache control plugin to get this speedup, as it is installed by default and uses willResolveField to implement dynamic cache control. If this optimization proves to be important we can provide a mode of the cache control plugin that doesn't allow for dynamic cache control. (Alternatively we may eventually find a way to support the instrumentation of GraphQL execution with lower overhead.) Part of #5708.

The usage reporting plugin already has an `includeRequest` option which allows you to tell the plugin to completely ignore an operation. One reason you may have wanted to use this is to avoid the overhead of capturing a full field-by-field execution trace. But in this case, `includeRequest: false` is overkill: it removes the operation from aspects of Studio such as the Operations page and schema checks which don't require full execution traces to operate. This PR adds a new option, `fieldLevelInstrumentation`, which is invoked after includeRequest returns true. If you return false from this operation, you won't incur the overhead of capturing a detailed trace (either directly in this process, or in subgraphs if this process is a Gateway). Most of Studio's features (including the newly-added "referencing operations" column on the Fields page) will still reflect the existence of this operation. As of the end of 2021, the features that this operation will not contribute to are: - The "field executions" column on the Fields page - The per-field timing hints shown in Explorer and in vscode-graphql - The trace view on the Operations page Apollo Server now sends both an "observed execution count" and "estimated execution count" for each field (for each operation and client). The former is literally how many times we saw the field get executed (only during operations where `fieldLevelInstrumentation` returned truthy). If the hook only returns true or false, the latter is the same, but you may also return a number between 0 and 1 which represents the sample rate for that operation; the "estimated execution count" will be incremented by the reciprocal of that number instead of by 1. So for example, with: fieldLevelInstrumentation: () => Math.random() < 0.01 ? 0.01 : false Apollo Server will instrument 1% of operations, and the "estimated execution count" will be 100 times more than the observed execution count. (You can imagine more sophisticated implementations of `fieldLevelInstrumentation` which sample more common operations more aggressively than rare operations.) If you pass a number for `fieldLevelInstrumentation`, it is equivalent to passing a function of the form in the above example; that is, the previous example behaves identically to `fieldLevelInstrumentation: 0.01`. The `latency_count` sent with field stats (which powers per-field timing hints) is now always scaled by the given fact (ie, there's no separate "observed histogram"). Note that the semantics of the `requestContext.metrics.captureTraces` field changes with this PR. Previously its value matched the value returned by the `includeRequest` hook; now it matches the truthiness of the value returned by the `fieldLevelInstrumentation` hook. This field determines whether Apollo Gateway includes the `apollo-federation-include-trace` header with outgoing requests so this is an appropriate change. Since this may provide a common use case for running a GraphQL server without any need for per-field instrumentation, the request pipeline now only instruments the schema if it ever sees a willResolveField callback. This *almost* means that if you're running a monolithic server with fieldLevelInstrumentation always returning false (or usage reporting disabled), or if you're running a subgraph whose gateway has fieldLevelInstrumentation:false (and thus never receives a request with the `apollo-federation-include-trace` header), then execution won't have the small performance impact of instrumentation. In practice you need to also disable the cache control plugin to get this speedup, as it is installed by default and uses willResolveField to implement dynamic cache control. If this optimization proves to be important we can provide a mode of the cache control plugin that doesn't allow for dynamic cache control. (Alternatively we may eventually find a way to support the instrumentation of GraphQL execution with lower overhead.) Part of #5708.

The usage reporting plugin already has an `includeRequest` option which allows you to tell the plugin to completely ignore an operation. One reason you may have wanted to use this is to avoid the overhead of capturing a full field-by-field execution trace. But in this case, `includeRequest: false` is overkill: it removes the operation from aspects of Studio such as the Operations page and schema checks which don't require full execution traces to operate. This PR adds a new option, `fieldLevelInstrumentation`, which is invoked after includeRequest returns true. If you return false from this operation, you won't incur the overhead of capturing a detailed trace (either directly in this process, or in subgraphs if this process is a Gateway). Most of Studio's features (including the newly-added "referencing operations" column on the Fields page) will still reflect the existence of this operation. As of the end of 2021, the features that this operation will not contribute to are: - The "field executions" column on the Fields page - The per-field timing hints shown in Explorer and in vscode-graphql - The trace view on the Operations page Apollo Server now sends both an "observed execution count" and "estimated execution count" for each field (for each operation and client). The former is literally how many times we saw the field get executed (only during operations where `fieldLevelInstrumentation` returned truthy). If the hook only returns true or false, the latter is the same, but you may also return a number (typically either 0 or at least 1) which represents a weight for that operation; the "estimated execution count" will be incremented by that number instead of by 1. So for example, with: fieldLevelInstrumentation: () => Math.random() < 0.01 ? 1/0.01 : false Apollo Server will instrument 1% of operations, and the "estimated execution count" will be 100 times more than the observed execution count. (You can imagine more sophisticated implementations of `fieldLevelInstrumentation` which sample more common operations more aggressively than rare operations.) If you pass a number for `fieldLevelInstrumentation`, it is equivalent to passing a function of the form in the above example; that is, the previous example behaves identically to `fieldLevelInstrumentation: 0.01`. The `latency_count` sent with field stats (which powers per-field timing hints) is now always scaled by the given fact (ie, there's no separate "observed histogram"). Note that the semantics of the `requestContext.metrics.captureTraces` field changes with this PR. Previously its value matched the value returned by the `includeRequest` hook; now it matches the truthiness of the value returned by the `fieldLevelInstrumentation` hook. This field determines whether Apollo Gateway includes the `apollo-federation-include-trace` header with outgoing requests so this is an appropriate change. Since this may provide a common use case for running a GraphQL server without any need for per-field instrumentation, the request pipeline now only instruments the schema if it ever sees a willResolveField callback. This *almost* means that if you're running a monolithic server with fieldLevelInstrumentation always returning false (or usage reporting disabled), or if you're running a subgraph whose gateway has fieldLevelInstrumentation:false (and thus never receives a request with the `apollo-federation-include-trace` header), then execution won't have the small performance impact of instrumentation. In practice you need to also disable the cache control plugin to get this speedup, as it is installed by default and uses willResolveField to implement dynamic cache control. If this optimization proves to be important we can provide a mode of the cache control plugin that doesn't allow for dynamic cache control. (Alternatively we may eventually find a way to support the instrumentation of GraphQL execution with lower overhead.) Part of #5708.

glasser · 2021-12-30T18:26:43Z

This is released in Apollo Server 3.6.0.

hwillson added the 2021-09 label Sep 14, 2021

hwillson added this to the Reporting milestone Sep 14, 2021

hwillson assigned glasser Sep 14, 2021

glasser added 2021-10 and removed 2021-09 labels Sep 21, 2021

hwillson added the size/large Estimated to take MORE THAN A WEEK label Oct 1, 2021

hwillson added 2021-11 and removed 2021-10 labels Nov 1, 2021

glasser mentioned this issue Nov 10, 2021

Usage reporting changes for better server/gateway performance #5586

Closed

glasser changed the title ~~Usage reporting performance improvements~~ Extend Studio usage reporting to support referenced fields in addition to executed fields Nov 10, 2021

glasser added 2021-12 and removed 2021-11 labels Dec 10, 2021

glasser mentioned this issue Dec 17, 2021

Usage reporting: report referenced fields in addition to executed fields #5956

Merged

glasser mentioned this issue Dec 22, 2021

usage reporting plugin: add fieldLevelInstrumentation option #5963

Merged

glasser closed this as completed in 17c6211 Dec 30, 2021

glasser mentioned this issue Jan 26, 2022

Change usage reporting defaults backwards-incompatibly to send less data #6051

Closed

hwillson removed size/large Estimated to take MORE THAN A WEEK 2021-12 labels Jun 24, 2022

github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend Studio usage reporting to support referenced fields in addition to executed fields #5708

Extend Studio usage reporting to support referenced fields in addition to executed fields #5708

hwillson commented Sep 14, 2021 •

edited by glasser

Loading

glasser commented Sep 21, 2021

RichardWright commented Oct 12, 2021

glasser commented Oct 19, 2021

alexandernanberg commented Nov 24, 2021

glasser commented Dec 1, 2021

glasser commented Dec 30, 2021

Extend Studio usage reporting to support referenced fields in addition to executed fields #5708

Extend Studio usage reporting to support referenced fields in addition to executed fields #5708

Comments

hwillson commented Sep 14, 2021 • edited by glasser Loading

glasser commented Sep 21, 2021

RichardWright commented Oct 12, 2021

glasser commented Oct 19, 2021

alexandernanberg commented Nov 24, 2021

glasser commented Dec 1, 2021

glasser commented Dec 30, 2021

hwillson commented Sep 14, 2021 •

edited by glasser

Loading