-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
@apollo/gateway: Make ApolloGateway.stop() reliable and required #452
Conversation
Before merging this I should implement the Apollo Server change described in it and release v2.20. It's ready for review but I'll mark as draft for this reason. |
@@ -243,6 +262,11 @@ export class ApolloGateway implements GraphQLService { | |||
apollo?: ApolloConfig; | |||
engine?: GraphQLServiceEngineConfig; | |||
}) { | |||
if (this.state.phase !== 'initialized') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I be able to call load
again after awaiting the stop
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably disregard, I can tell below it's explicitly written with the intention of not allowing that.
this.pollServices(); | ||
if (this.state.phase === 'polling') { | ||
// If we weren't stopped, we should transition back to the initial 'loading' state and trigger | ||
// another call to itself. (Do that in a setImmediate to avoid unbounded stack sizes.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! TIL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I actually don't know offhand if you get unbounded stack sizes from recursion in async functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I checked and you do.)
Because AS is what invokes ApolloGateway.load, it should be its responsibility to invoke the matching stop method. apollographql/federation#452 (aimed at @apollo/gateway 0.23) will change ApolloGateway to no longer "unref" its polling timer: ie, it makes calling `stop()` actually part of its expected API instead of something you can ignore if you feel like it without affecting program shutdown. It has a bit of a hack to still unref the timer if it looks like you're using an old (pre-2.18) version of Apollo Server, but this PR (which will be released in v2.20.0) will make ApolloServer stop the gateway for you. Part of fixing #4428.
Because AS is what invokes ApolloGateway.load, it should be its responsibility to invoke the matching stop method. apollographql/federation#452 (aimed at @apollo/gateway 0.23) will change ApolloGateway to no longer "unref" its polling timer: ie, it makes calling `stop()` actually part of its expected API instead of something you can ignore if you feel like it without affecting program shutdown. It has a bit of a hack to still unref the timer if it looks like you're using an old (pre-2.18) version of Apollo Server, but this PR (which will be released in v2.20.0) will make ApolloServer stop the gateway for you. Part of fixing #4428. Also simplify typings of toDispose set. `() => ValueOrPromise<void>` is a weird type because `void` means "I'm not going to look at the return value" which is sorta incompatible with "but I need to see if it's a Promise or not". So just make these all async (changing a couple implementations by adding `async`).
The (not particularly documented) ApolloGateway.stop() method didn't reliably stop the gateway from polling. All it did was cancel a timeout. But if it is called while pollServices is in the middle of running, it would do nothing, and then pollServices would carry on and set another timeout. Better semantics would be for stop() to reliably stop polling: allow the current poll to complete, ensure that there will be no more polls, and then (async) return. This PR implements those semantics, by implementing an explicit state machine for ApolloGateway's polling. One reason that these bugs were able to survive is that calling stop() is often unnecessary. In apollographql/apollo-server#3105 we chose to `unref` the polling timeout to allow the Node process to exit if it's the only thing left on the event loop, instead of encouraging users of `ApolloGateway` to be responsible and call `stop()` themselves. While this may be reasonable when the gateway's lifecycle is merely a schema polling timer, we may have future changes to the gateway where proper lifecycle handling is more important. So this PR also moves away from the world where it's fine to not bother to explicitly stop the gateway. That said, in the common case, we don't want users to have to write gateway stopping code. It makes more sense for stopping the `ApolloGateway` to be the responsibility of `ApolloServer.stop()`, just as `ApolloServer.stop()` can trigger events in plugins. So in the recently-released Apollo Server v2.20, `ApolloServer.stop()` calls `ApolloGateway.stop()`. This should mean that in most cases, the missing `unref` shouldn't keep the process running, as long as you've run `ApolloServer.stop()` (and if you don't, it's likely that other parts of the server are keeping your process running). What if you're still running an old version of Apollo Server? For a bit of backwards compatibility, `ApolloGateway` detects if it appears to be connected to Apollo Server older than v2.18. If so, it continues to do the old unref(). If you're using v2.18 or v2.19, the upgrade to v2.20 should be pretty painless (it's mostly just minor bugfixes). If you're using ApolloGateway on its own for some reason, and this change causes your processes to hang on shutdown, adding a `stop()` call should be pretty straightforward. (If we learn that this change really is devastating, we can always go back to an unconditional timer.unref() later.) Fixes #4428.
c4fe737
to
d82e916
Compare
The (not particularly documented) ApolloGateway.stop() method didn't reliably
stop the gateway from polling. All it did was cancel a timeout. But if it is
called while pollServices is in the middle of running, it would do nothing, and
then pollServices would carry on and set another timeout.
Better semantics would be for stop() to reliably stop polling: allow the current
poll to complete, ensure that there will be no more polls, and then (async)
return. This PR implements those semantics, by implementing an explicit state
machine for ApolloGateway's polling.
One reason that these bugs were able to survive is that calling stop() is often
unnecessary. In apollographql/apollo-server#3105 we
chose to
unref
the polling timeout to allow the Node process to exit if it'sthe only thing left on the event loop, instead of encouraging users of
ApolloGateway
to be responsible and callstop()
themselves. While this maybe reasonable when the gateway's lifecycle is merely a schema polling timer, we
may have future changes to the gateway where proper lifecycle handling is more
important. So this PR also moves away from the world where it's fine to not
bother to explicitly stop the gateway.
That said, in the common case, we don't want users to have to write gateway
stopping code. It makes more sense for stopping the
ApolloGateway
to be theresponsibility of
ApolloServer.stop()
, just asApolloServer.stop()
cantrigger events in plugins. So in the recently-released Apollo Server v2.20,
ApolloServer.stop()
callsApolloGateway.stop()
. This should mean that inmost cases, the missing
unref
shouldn't keep the process running, as long asyou've run
ApolloServer.stop()
(and if you don't, it's likely that other partsof the server are keeping your process running).
What if you're still running an old version of Apollo Server? For a bit of
backwards compatibility,
ApolloGateway
detects if it appears to be connectedto Apollo Server older than v2.18. If so, it continues to do the old
unref(). If you're using v2.18 or v2.19, the upgrade to v2.20 should be pretty
painless (it's mostly just minor bugfixes). If you're using ApolloGateway on its
own for some reason, and this change causes your processes to hang on shutdown,
adding a
stop()
call should be pretty straightforward. (If we learn that thischange really is devastating, we can always go back to an unconditional
timer.unref() later.)
Fixes apollographql/apollo-server#4428.