Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drain manager: Make probabilistic drain configurable #11403

Merged
merged 38 commits into from
Jun 10, 2020

Conversation

auni53
Copy link
Contributor

@auni53 auni53 commented Jun 2, 2020

  • Add DrainStrategy enum to Options with Graceful and Immediate
  • Disable probabilistic drain in DrainManager if DrainStrategy == Immediate
  • Add integration tests

Risk Level: Low.
Testing: Integration tests, verify that the race condition from #11240 does not occur if the probabilistic drain is disabled.

@auni53
Copy link
Contributor Author

auni53 commented Jun 2, 2020

@mattklein123 wanted to get your thoughts on this before I add the docs and release notes. I added a bool that guards the probabilistic drain behaviour, but I've got two thoughts:

  1. Instead of bool drainIncrementally() I was thinking about enum drainStrategy(), where we can define different drain strategies in the server options header. There'd just be two right now, but this would make it easier to put more of the overall server shutdown sequence into the drainmanager in the future.

  2. It's a little tiresome having to change so many codepaths every time we want to add a new option to the integration server config. Is it possible to modify the server optionsimpl directly? If not, I'll probably add a simple struct DrainManagerOptions to pass in the three drain manager configs.

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
@mattklein123
Copy link
Member

I was thinking about enum drainStrategy(), where we can define different drain strategies in the server options header. There'd just be two right now, but this would make it easier to put more of the overall server shutdown sequence into the drainmanager in the future.

+1 agreed this seems more future proof.

It's a little tiresome having to change so many codepaths every time we want to add a new option to the integration server config. Is it possible to modify the server optionsimpl directly? If not, I'll probably add a simple struct DrainManagerOptions to pass in the three drain manager configs.

For integration tests it seems reasonable to allow direct access, but I will defer to @alyssawilk on that (it might already be possible).

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
@auni53 auni53 marked this pull request as ready for review June 3, 2020 21:47
@alyssawilk
Copy link
Contributor

I don't mind allowing direct access, but please be mindful of thread safety issues - I think accessing it after the server starts running would be problematic.

@auni53
Copy link
Contributor Author

auni53 commented Jun 4, 2020

I don't mind allowing direct access, but please be mindful of thread safety issues - I think accessing it after the server starts running would be problematic.

I decided this was too complicated for this PR and just plumbed the other variable. I'll maybe try to get this to work on the side.

@alyssawilk
Copy link
Contributor

I'm going to assume this is still not ready for review as CI is red. Ping when it's ready for a pass!

auni53 and others added 7 commits June 4, 2020 23:31
Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
Moving the choice of http or tcp connection pool from the router to the generic connection pool.

This will allow pluggable connection pools to choose to do HTTP or TCP on their own, as well as getting rid of a bunch of ugly variant logic.

Risk Level: medium (router refactor, ideally no-op)
Testing: existing test pass
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
…nvoyproxy#11126)

Commit Message: grpc-json: preserve http request method in `x-envoy-original-method` header so that applications have access to it.
Additional Description: The grpc-json transcoder currently forwards HTTP path to applications via "x-envoy-original-path" header. We would find it useful if it also forwarded the HTTP method.
Risk Level: Low
Testing: Updated grpc-json-transcoder unit tests
Docs Changes: Added docs
Release Notes: Added release notes

Signed-off-by: Phillip Huang <[email protected]>

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
@auni53
Copy link
Contributor Author

auni53 commented Jun 7, 2020

OK @alyssawilk this is green.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! just a few comments from my end.

void setParentShutdownTime(std::chrono::seconds parent_shutdown_time) {
parent_shutdown_time_ = parent_shutdown_time;
}
void setDrainStrategy(Server::DrainStrategy drain_strategy) { drain_strategy_ = drain_strategy; }
void setLogLevel(spdlog::level::level_enum log_level) { log_level_ = log_level; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, the diff makes it look weird, but I was trying to move setParentShutdownTime(std::chrono::seconds) in order to group all of the drain-related setters together.

EXPECT_CALL(server_, healthCheckFailed()).Times(0); // Listener check will short-circuit
EXPECT_FALSE(drain_manager.drainClose());
}
INSTANTIATE_TEST_SUITE_P(DrainGradually, DrainManagerImplTest, testing::Bool());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is DrainGradually the correct name for this? Perhaps DrainStrategies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

test/integration/integration.h Show resolved Hide resolved
signal_handling_enabled_(true), mutex_tracing_enabled_(false), cpuset_threads_(false),
fake_symbol_table_enabled_(false) {}
parent_shutdown_time_(900), drain_strategy_(Server::DrainStrategy::Gradual),
mode_(Server::Mode::Serve), hot_restart_disabled_(false), signal_handling_enabled_(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are non-google users to configure this option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you're saying this config path is only used by Google?

I was sticking to YAGNI, don't add further complications to the config plane unless someone actually wants it. I'd be happy to add CLI options for this similar to the --drain-time-s, though I'd prefer to do that in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty envoy-standard to land the configuration, release notes, and code changes in a single PR - I don't think it'll make this one too large to review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done I think, I looked at similar PRs to try to make sure I updated the right spot everywhere, but I'm not sure if I missed anything or if the language is correct.

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/.

🐱

Caused by: #11403 was synchronize by auni53.

see: more, trace.

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
@jmarantz
Copy link
Contributor

jmarantz commented Jun 9, 2020

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s), but failed to run 2 pipeline(s).

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a code perspective. Will defer to @htuch on API

@htuch
Copy link
Member

htuch commented Jun 10, 2020

/lgtm api

@htuch htuch merged commit 8c7df0f into envoyproxy:master Jun 10, 2020
@auni53 auni53 deleted the incremental branch June 11, 2020 14:15
yashwant121 pushed a commit to yashwant121/envoy that referenced this pull request Jun 24, 2020
Add DrainStrategy enum to Options with Graceful and Immediate
Disable probabilistic drain in DrainManager if DrainStrategy == Immediate

Add integration tests
Risk Level: Low.
Testing: Integration tests, verify that the race condition from envoyproxy#11240 does not occur if the probabilistic drain is disabled.

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: yashwant121 <[email protected]>
songhu pushed a commit to songhu/envoy that referenced this pull request Jun 25, 2020
Add DrainStrategy enum to Options with Graceful and Immediate
Disable probabilistic drain in DrainManager if DrainStrategy == Immediate

Add integration tests
Risk Level: Low.
Testing: Integration tests, verify that the race condition from envoyproxy#11240 does not occur if the probabilistic drain is disabled.

Signed-off-by: Auni Ahsan <[email protected]>
yashwant121 pushed a commit to yashwant121/envoy that referenced this pull request Jul 24, 2020
Add DrainStrategy enum to Options with Graceful and Immediate
Disable probabilistic drain in DrainManager if DrainStrategy == Immediate

Add integration tests
Risk Level: Low.
Testing: Integration tests, verify that the race condition from envoyproxy#11240 does not occur if the probabilistic drain is disabled.

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: yashwant121 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.