Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove access to process.env from hot code paths #5657

Merged
merged 7 commits into from
Sep 30, 2021
Merged

Remove access to process.env from hot code paths #5657

merged 7 commits into from
Sep 30, 2021

Conversation

moberegger
Copy link

@moberegger moberegger commented Aug 26, 2021

Removes access to process.env from hot code paths. My understanding on this is admittedly somewhat limited, but there is a performance penalty that occurs when accessing process.env (or perhaps just process) because Node needs to drop down to C which is (apparently) quite expensive. The penalty wouldn't be noticeable as a one time thing, but for things that are running often at scale, it can really add up.

I vaguely recall this being an issue in React SSR early on... remember reading something about this being discovered and remedied, leading to immediate perf improvements. Can't find the article about it... and now I'm wondering if my memory is failing me... but I recall this being the first thing that made me aware of the process.env penalty.

(Edit: I think it was this: https://reactjs.org/blog/2017/09/26/react-v16.0.html#better-server-side-rendering)

Offhand, I know express recommends this, too.

From what I could tell, these aren't transpiled out (I looked in the dist directory that gets installed in node_modules). Perhaps this is already being handled differently, making this PR moot, but figured I'd throw this up and get your thoughts.

There are other spots I saw doing this, but they looked like they were only being done when the server started, so the penalty wouldn't be noticeable at all. Suppose it could be done across the board for consistency.

Comment on lines -285 to +287
if (process.env.NODE_ENV === 'development') {
if (NODE_ENV === 'development') {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear in the diff, but this is for the trace method. A production environment would still pay the cost for this check, even though it would eventually evaluate to false and skip the tracing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is fine. RESTDataSource hasn't been actively maintained in a little while (the current Apollo Server team has not yet ramped up on it; it's basically an entirely unrelated project that is very loosely coupled to Apollo Server). I don't know if anyone depends on the ability to set process.env.NODE_ENV at runtime (eg, potentially after this file is imported) and have it take effect. But if it breaks people we can revert it.

Comment on lines 136 to 143
if (options.debug === undefined) {
const nodeEnv =
'__testing_nodeEnv__' in options
? options.__testing_nodeEnv__
: process.env.NODE_ENV;
'__testing_nodeEnv__' in options ? options.__testing_nodeEnv__ : NODE_ENV;
options.debug = debugFromNodeEnv(nodeEnv);
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one also looks like it would happen in production environments, where debug would likely be undefined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I would like to be sensitive to runtime NODE_ENV changes. We actually already capture NODE_ENV once in the ApolloServer constructor and pass that through to various ways. Unfortunately for historical reasons much of the core processing in this package happens technically outside of the ApolloServer class even though it is only usable when called from ApolloServer, so this file doesn't have access to that.

That said, I basically implemented this already but just as a "testing" hack. What I'd suggest is taking the __testing_nodeEnv__ fields of Config and GraphQLServerOptions and renaming them to nodeEnv. Then in the place in packages/apollo-server-core/src/ApolloServer.ts where we do const nodeEnv = we should set that on this.requestOptions.nodeEnv instead. That should lead to options.nodeEnv on this line in runHttpQuery being equal to the env var's value (or a manual override) already, so no need to read the env var in this file.

Is this something you can try?

Copy link
Author

@moberegger moberegger Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spent a few days trying to do this, and I'm not sure what to do.

The logic differs on the existence of nodeEnv, not necessarily its value. More specifically, providing nodeEnv: undefined is not the same thing as omitting it altogether.

What I would expect to happen is that if you don't provide nodeEnv, ApolloServer would default to the NODE_ENV environment variable. In other words, if nodeEnv is undefined, then default to NODE_ENV.

This doesn't work with the current implementation, at least as per the tests. The current implementation will simply check if you attempt to set it, even if it's value is undefined; as long as it's there, it's considered an override.

So what is happening is that the tests fail, because when nodeEnv is not provided, it defaults to NODE_ENV, and when NODE_ENV is test, debug is false, and you don't get stacktraces back in errors.

The old implementation would avoid defaulting to test because __testing_nodeEnv__ was simply there, even though it was undefined. The test cases look like they are attempting to validate behaviour when no node environment is set.

I'm not sure how to accommodate this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, I don't think it's possible to get this to not refer NODE_ENV in some way anyways, because resolveGraphqlOptions is what is supposed to give you the nodeEnv option, but that can error out and put you in a state where you need to know the node environment, but have no ability to get nodeEnv, and thus have to default to NODE_ENV anyways.

This would also introduce a quirk that even if you provided nodeEnv, it will be ignored depending on how the thing errors out, which sounds like a foot gun to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When __testing_nodeEnv__ was only being used in a few tests, making undefined distinct from "unset" seemed like a reasonable choice. It sounds like this is more of a burden when it's an actual part of the API.

How about we just change that? So we could make nodeEnv: undefined and "nodeEnv unset" be the same thing (use the actual environment variable if it's set), and we can use nodeEnv: '' to mean "seriously, don't be sensitive to the actual process environment, act as if the node env is unspecified". (Technically in the Unix model, there is a distinction between unset environment variables and environment variables that are set to the empty string, but it's not great to treat them differently and our code that reads this doesn't, so this should be fine.)

Does that resolve all the concerns you have here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!

@glasser
Copy link
Member

glasser commented Aug 27, 2021

This seems like a good idea! I look forward to reviewing it soon.

Comment on lines -285 to +287
if (process.env.NODE_ENV === 'development') {
if (NODE_ENV === 'development') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is fine. RESTDataSource hasn't been actively maintained in a little while (the current Apollo Server team has not yet ramped up on it; it's basically an entirely unrelated project that is very loosely coupled to Apollo Server). I don't know if anyone depends on the ability to set process.env.NODE_ENV at runtime (eg, potentially after this file is imported) and have it take effect. But if it breaks people we can revert it.

Comment on lines 136 to 143
if (options.debug === undefined) {
const nodeEnv =
'__testing_nodeEnv__' in options
? options.__testing_nodeEnv__
: process.env.NODE_ENV;
'__testing_nodeEnv__' in options ? options.__testing_nodeEnv__ : NODE_ENV;
options.debug = debugFromNodeEnv(nodeEnv);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I would like to be sensitive to runtime NODE_ENV changes. We actually already capture NODE_ENV once in the ApolloServer constructor and pass that through to various ways. Unfortunately for historical reasons much of the core processing in this package happens technically outside of the ApolloServer class even though it is only usable when called from ApolloServer, so this file doesn't have access to that.

That said, I basically implemented this already but just as a "testing" hack. What I'd suggest is taking the __testing_nodeEnv__ fields of Config and GraphQLServerOptions and renaming them to nodeEnv. Then in the place in packages/apollo-server-core/src/ApolloServer.ts where we do const nodeEnv = we should set that on this.requestOptions.nodeEnv instead. That should lead to options.nodeEnv on this line in runHttpQuery being equal to the env var's value (or a manual override) already, so no need to read the env var in this file.

Is this something you can try?

@moberegger
Copy link
Author

Ok, I took a first attempt. Still getting my feet wet with Typescript, so I'm sure I missed something. I'll come back to this later and correct any mistakes.

Comment on lines -102 to -108
// Apollo Server only uses process.env.NODE_ENV to determine defaults for
// other behavior which have other mechanisms of setting explicitly. Sometimes
// our tests want to test the exact logic of how NODE_ENV affects defaults;
// they can set this parameter, but there's no reason to do so other than for
// tests. Note that an explicit `__testing_nodeEnv__: undefined` means "act as
// if the environment variable is not set", whereas the absence of
// `__testing_nodeEnv__` means to honor the environment variable.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't think that this comment was necessary anymore, since I suppose you can consider nodeEnv a first class option now.

@@ -29,6 +29,7 @@ describe('runHttpQuery', () => {
query: '{ testString }',
},
options: {
debug: false,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was required to avoid returning a stack trace, which the test didn't expect. Without running through ApolloServer first, the nodeEnv stuff never runs, so it doesn't get setup correctly for this test. This avoids that.

@glasser
Copy link
Member

glasser commented Sep 29, 2021

Sorry for the delay — hope to get to this tomorrow.

@@ -131,16 +133,12 @@ export async function runHttpQuery(
// debug. Therefore, we need to do some unnatural things, such
// as use NODE_ENV to determine the debug settings
return throwHttpGraphQLError(500, [e as Error], {
debug: debugFromNodeEnv(process.env.NODE_ENV),
debug: debugFromNodeEnv(NODE_ENV),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably at least extend the comment above to note that unlike other aspects of NODE_ENV in this package, this one isn't sensitive to runtime changes to the environment variable.

(Alternatively, perhaps it's OK performance-wise to read the environment variable in this 500 situation?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, perhaps it's OK performance-wise to read the environment variable in this 500 situation

Yeah, I suppose this isn't a hot code path. I imagine an error here would be something the would only happen intermittently... and if it's happening more than that, you likely got bigger problems than a runtime cost!

Either way, in my new revision I have debugFromNodeEnv default to the "cached" NODE_ENV, so I added the comment you requested. I think that this keeps it consistent with the default behaviour (ie. nodeEnv defaults to NODE_ENV and is also not sensitive to runtime changes).

@glasser
Copy link
Member

glasser commented Sep 29, 2021

I'm not sure if you want me to respond to the various comments that you've "resolved". The PR looks largely good, though I think you were right to point out that the current API (where undefined and unset are distinct) isn't great. Do you want to make that change?

@glasser
Copy link
Member

glasser commented Sep 30, 2021

Looks good! I tightened up a few types by dropping | undefined and added CHANGELOG and docs.

@glasser glasser merged commit c2a3347 into apollographql:main Sep 30, 2021
nodeEnv?: string | undefined;
nodeEnv?: string;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah! I made a note to do this, too! Totally missed it.

@moberegger
Copy link
Author

Awesome! Thank you @glasser !

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants