Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hds: group endpoint health response by cluster and locality #12452

Merged
merged 56 commits into from
Aug 18, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
4a5e7c5
add backwards-compatible hds protos to better group endpoint health info
drewsortega Jul 17, 2020
39ffd06
formatting fixes for hds protos
drewsortega Jul 17, 2020
0275293
get correct Locality field location
drewsortega Jul 18, 2020
980a8e5
regenerate api shadow and v4alpha protos
drewsortega Jul 18, 2020
b0287eb
retain locality information in cluster config
drewsortega Jul 20, 2020
89f3937
update tests to include locality, attempt locality pulling on respons…
drewsortega Jul 21, 2020
834f84b
update comments to hide unimplemented, mark future deprecation
drewsortega Jul 21, 2020
5ef7776
Merge branch 'hds_proto_rework' into hds_proto_rework_impl
drewsortega Jul 21, 2020
8baaa1d
Merge branch 'master' into hds_proto_rework_impl
drewsortega Jul 22, 2020
0a1c3f7
formatting fixes
drewsortega Jul 22, 2020
2170fde
formatting fix
drewsortega Jul 22, 2020
3d59d43
fix endpoint parsing in HdsCluster by using cluster_ member object ov…
drewsortega Jul 23, 2020
206510e
fill out cluster_endpoints_health proto on response gen
drewsortega Jul 23, 2020
5897d0e
use MessageUtil::hash for flat_hash_map construction
drewsortega Jul 23, 2020
4f8d0e2
use MergeFrom to copy Locality into Cluster constructor
drewsortega Jul 23, 2020
7695766
use Built-in proto hashing for HashMap, remove redundant Locality store
drewsortega Jul 23, 2020
81dbac4
revert whitespace change
drewsortega Jul 23, 2020
6b5a374
compare locality differences on host iteration instead of hash table
drewsortega Jul 24, 2020
4f6680e
remove not needed import
drewsortega Jul 24, 2020
a215f1b
undo extra comment change
drewsortega Jul 24, 2020
5f38b88
add complex hds test specifier builder with many cluster-loc-ends
drewsortega Jul 25, 2020
f90e6b0
change hds test to use loops, add hds cluster name getter
drewsortega Jul 30, 2020
2c27e93
get working unit test with failing destructor
drewsortega Jul 31, 2020
b9cada9
fix mock connections for endpoints to not be shared
drewsortega Aug 3, 2020
3ae54b2
use proper name field from info_ by mocking prod impl on call
drewsortega Aug 3, 2020
7ed5ca2
remove underscores from local vars, remove not needed EXPECT_CALLs
drewsortega Aug 3, 2020
e387f7f
change existing hds intg tests to use new proto construction
drewsortega Aug 3, 2020
bd829dc
remove hidden comment and add deprecate mentioned by hds protos comments
drewsortega Aug 3, 2020
bb686c5
Merge branch 'master' into hds_proto_rework_impl_flat
drewsortega Aug 3, 2020
9520832
upgrade hds intg tests to check for correct updated proto layouts
drewsortega Aug 4, 2020
9fccebb
add final comments, fix some style issues
drewsortega Aug 4, 2020
132655a
Minor testing and comment changes as a response to PR feedback
drewsortega Aug 4, 2020
c9c1336
Update current version history to reflect changes and deprecation.
drewsortega Aug 4, 2020
979049d
update hds intg tests, use temp vars in field checks, better loops
drewsortega Aug 5, 2020
c256c10
hds_test changes, improving readability with temp vars and absl strcat
drewsortega Aug 5, 2020
93b5a85
update hds intg test flaky wait loops to ensure health checks completed
drewsortega Aug 6, 2020
31cd01c
add TODO comment to transition hds protos to v4
drewsortega Aug 6, 2020
04dab14
move ASSERT statements out of helper functions in hds intg tests
drewsortega Aug 7, 2020
56d48d5
fix clang-tidy issues
drewsortega Aug 7, 2020
f38b5ba
Merge branch 'master' into hds_proto_rework_impl_flat
drewsortega Aug 11, 2020
33a10d4
Use unique_ptr in specifier construction, use specific numeric types
drewsortega Aug 11, 2020
b9a494e
More integer type fixes.
drewsortega Aug 11, 2020
d36406c
Use a hosts_by_locality to keep groups by locality in priority set
drewsortega Aug 11, 2020
f9ab70e
Kick CI
drewsortega Aug 11, 2020
d371378
fix variable casing, fix shared pointer assignment, optimizations
drewsortega Aug 11, 2020
5b7c703
fix emplace vs push
drewsortega Aug 12, 2020
f72761b
whitespace fixes
drewsortega Aug 12, 2020
408bdb7
Kick CI
drewsortega Aug 12, 2020
ffa4fee
Merge branch 'master' into hds_proto_rework_impl_flat, Kick CI
drewsortega Aug 12, 2020
58d0695
Kick CI
drewsortega Aug 12, 2020
a5eb281
Kick CI
drewsortega Aug 12, 2020
0ea3f1a
Merge branch 'master' into hds_proto_rework_impl_flat
drewsortega Aug 13, 2020
c0ceb90
Merge branch 'master' into hds_proto_rework_impl_flat
drewsortega Aug 16, 2020
4e9ccec
Kick CI
drewsortega Aug 17, 2020
2c300ae
Kick CI
drewsortega Aug 17, 2020
4f192d4
replace if-else chain with cleaner switch block
drewsortega Aug 17, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 1 addition & 3 deletions api/envoy/service/health/v3/hds.proto
Original file line number Diff line number Diff line change
Expand Up @@ -128,11 +128,9 @@ message EndpointHealthResponse {
option (udpa.annotations.versioning).previous_message_type =
"envoy.service.discovery.v2.EndpointHealthResponse";

// [#comment:TODO(drewsortega): add deprecate annotation once cluster_endpoints_health is implemented]
// Deprecated - Flat list of endpoint health information.
repeated EndpointHealth endpoints_health = 1;
repeated EndpointHealth endpoints_health = 1 [deprecated = true];

// [#not-implemented-hide:]
// Organize Endpoint health information by cluster.
repeated ClusterEndpointsHealth cluster_endpoints_health = 2;
}
Expand Down
7 changes: 3 additions & 4 deletions api/envoy/service/health/v4alpha/hds.proto

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions docs/root/version_history/current.rst
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ New Features
* dynamic_forward_proxy: added :ref:`use_tcp_for_dns_lookups<envoy_v3_api_field_extensions.common.dynamic_forward_proxy.v3.DnsCacheConfig.use_tcp_for_dns_lookups>` option to use TCP for DNS lookups in order to match the DNS options for :ref:`Clusters<envoy_v3_api_msg_config.cluster.v3.Cluster>`.
* ext_authz filter: added support for emitting dynamic metadata for both :ref:`HTTP <config_http_filters_ext_authz_dynamic_metadata>` and :ref:`network <config_network_filters_ext_authz_dynamic_metadata>` filters.
* grpc-json: support specifying `response_body` field in for `google.api.HttpBody` message.
* hds: added :ref:`cluster_endpoints_health <envoy_v3_api_field_service.health.v3.EndpointHealthResponse.cluster_endpoints_health>` to HDS responses, keeping endpoints in the same groupings as they were configured in the HDS specifier by cluster and locality instead of as a flat list.
* http: added support for :ref:`%DOWNSTREAM_PEER_FINGERPRINT_1% <config_http_conn_man_headers_custom_request_headers>` as custom header.
* http: introduced new HTTP/1 and HTTP/2 codec implementations that will remove the use of exceptions for control flow due to high risk factors and instead use error statuses. The old behavior is used by default, but the new codecs can be enabled for testing by setting the runtime feature `envoy.reloadable_features.new_codec_behavior` to true. The new codecs will be in development for one month, and then enabled by default while the old codecs are deprecated.
* load balancer: added a :ref:`configuration<envoy_v3_api_msg_config.cluster.v3.Cluster.LeastRequestLbConfig>` option to specify the active request bias used by the least request load balancer.
Expand Down Expand Up @@ -85,5 +86,8 @@ Deprecated
----------
* The :ref:`track_timeout_budgets <envoy_v3_api_field_config.cluster.v3.Cluster.track_timeout_budgets>`
field has been deprecated in favor of `timeout_budgets` part of an :ref:`Optional Configuration <envoy_v3_api_field_config.cluster.v3.Cluster.track_cluster_stats>`.
* hds: the :ref:`endpoints_health <envoy_v3_api_field_service.health.v3.EndpointHealthResponse.endpoints_health>`
field has been deprecated in favor of :ref:`cluster_endpoints_health <envoy_v3_api_field_service.health.v3.EndpointHealthResponse.cluster_endpoints_health>` to maintain
grouping by cluster and locality.
* tap: the :ref:`match_config <envoy_v3_api_field_config.tap.v3.TapConfig.match_config>` field has been deprecated in favor of
:ref:`match <envoy_v3_api_field_config.tap.v3.TapConfig.match>` field.
4 changes: 1 addition & 3 deletions generated_api_shadow/envoy/service/health/v3/hds.proto

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 1 addition & 3 deletions generated_api_shadow/envoy/service/health/v4alpha/hds.proto

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

103 changes: 77 additions & 26 deletions source/common/upstream/health_discovery_service.cc
Original file line number Diff line number Diff line change
Expand Up @@ -102,27 +102,50 @@ void HdsDelegate::handleFailure() {
// TODO(lilika): Add support for the same endpoint in different clusters/ports
envoy::service::health::v3::HealthCheckRequestOrEndpointHealthResponse HdsDelegate::sendResponse() {
envoy::service::health::v3::HealthCheckRequestOrEndpointHealthResponse response;

for (const auto& cluster : hds_clusters_) {
// Add cluster health response and set name.
auto* cluster_health =
response.mutable_endpoint_health_response()->add_cluster_endpoints_health();
cluster_health->set_cluster_name(cluster->info()->name());

// Iterate through all hosts in our priority set.
for (const auto& hosts : cluster->prioritySet().hostSetsPerPriority()) {
for (const auto& host : hosts->hosts()) {
auto* endpoint = response.mutable_endpoint_health_response()->add_endpoints_health();
Network::Utility::addressToProtobufAddress(
*host->address(), *endpoint->mutable_endpoint()->mutable_address());
// TODO(lilika): Add support for more granular options of
// envoy::config::core::v3::HealthStatus
if (host->health() == Host::Health::Healthy) {
endpoint->set_health_status(envoy::config::core::v3::HEALTHY);
} else {
if (host->getActiveHealthFailureType() == Host::ActiveHealthFailureType::TIMEOUT) {
endpoint->set_health_status(envoy::config::core::v3::TIMEOUT);
} else if (host->getActiveHealthFailureType() ==
Host::ActiveHealthFailureType::UNHEALTHY) {
endpoint->set_health_status(envoy::config::core::v3::UNHEALTHY);
} else if (host->getActiveHealthFailureType() == Host::ActiveHealthFailureType::UNKNOWN) {
endpoint->set_health_status(envoy::config::core::v3::UNHEALTHY);
// Get a grouping of hosts by locality.
for (const auto& locality_hosts : hosts->hostsPerLocality().get()) {
// For this locality, add the response grouping.
envoy::service::health::v3::LocalityEndpointsHealth* locality_health =
cluster_health->add_locality_endpoints_health();
locality_health->mutable_locality()->MergeFrom(locality_hosts[0]->locality());

// Add all hosts to this locality.
for (const auto& host : locality_hosts) {
// Add this endpoint's health status to this locality grouping.
auto* endpoint = locality_health->add_endpoints_health();
Network::Utility::addressToProtobufAddress(
*host->address(), *endpoint->mutable_endpoint()->mutable_address());
// TODO(lilika): Add support for more granular options of
// envoy::config::core::v3::HealthStatus
if (host->health() == Host::Health::Healthy) {
endpoint->set_health_status(envoy::config::core::v3::HEALTHY);
} else {
NOT_REACHED_GCOVR_EXCL_LINE;
switch (host->getActiveHealthFailureType()) {
case Host::ActiveHealthFailureType::TIMEOUT:
endpoint->set_health_status(envoy::config::core::v3::TIMEOUT);
break;
case Host::ActiveHealthFailureType::UNHEALTHY:
case Host::ActiveHealthFailureType::UNKNOWN:
endpoint->set_health_status(envoy::config::core::v3::UNHEALTHY);
break;
default:
NOT_REACHED_GCOVR_EXCL_LINE;
break;
}
}

// TODO(drewsortega): remove this once we are on v4 and endpoint_health_response is
// removed. Copy this endpoint's health info to the legacy flat-list.
response.mutable_endpoint_health_response()->add_endpoints_health()->MergeFrom(*endpoint);
}
}
}
Expand Down Expand Up @@ -158,8 +181,15 @@ void HdsDelegate::processMessage(
ClusterConnectionBufferLimitBytes);

// Add endpoints to cluster
auto* endpoints = cluster_config.mutable_load_assignment()->add_endpoints();
for (const auto& locality_endpoints : cluster_health_check.locality_endpoints()) {
// add endpoint group by locality to config
auto* endpoints = cluster_config.mutable_load_assignment()->add_endpoints();
// if this group contains locality information, save it.
if (locality_endpoints.has_locality()) {
endpoints->mutable_locality()->MergeFrom(locality_endpoints.locality());
htuch marked this conversation as resolved.
Show resolved Hide resolved
}

// add all endpoints for this locality group to the config
for (const auto& endpoint : locality_endpoints.endpoints()) {
endpoints->add_lb_endpoints()->mutable_endpoint()->mutable_address()->MergeFrom(
endpoint.address());
Expand Down Expand Up @@ -252,13 +282,34 @@ HdsCluster::HdsCluster(Server::Admin& admin, Runtime::Loader& runtime,
{admin, runtime_, cluster_, bind_config_, stats_, ssl_context_manager_, added_via_api_, cm,
local_info, dispatcher, random, singleton_manager, tls, validation_visitor, api});

for (const auto& host : cluster_.load_assignment().endpoints(0).lb_endpoints()) {
initial_hosts_->emplace_back(
new HostImpl(info_, "", Network::Address::resolveProtoAddress(host.endpoint().address()),
nullptr, 1, envoy::config::core::v3::Locality().default_instance(),
envoy::config::endpoint::v3::Endpoint::HealthCheckConfig().default_instance(),
0, envoy::config::core::v3::UNKNOWN));
// Temporary structure to hold Host pointers grouped by locality, to build
// initial_hosts_per_locality_.
std::vector<HostVector> hosts_by_locality;
hosts_by_locality.reserve(cluster_.load_assignment().endpoints_size());

// Iterate over every endpoint in every cluster.
for (const auto& locality_endpoints : cluster_.load_assignment().endpoints()) {
// Add a locality grouping to the hosts sorted by locality.
hosts_by_locality.emplace_back();
drewsortega marked this conversation as resolved.
Show resolved Hide resolved
hosts_by_locality.back().reserve(locality_endpoints.lb_endpoints_size());

for (const auto& host : locality_endpoints.lb_endpoints()) {
// Initialize an endpoint host object.
HostSharedPtr endpoint = std::make_shared<HostImpl>(
info_, "", Network::Address::resolveProtoAddress(host.endpoint().address()), nullptr, 1,
locality_endpoints.locality(),
envoy::config::endpoint::v3::Endpoint::HealthCheckConfig().default_instance(), 0,
envoy::config::core::v3::UNKNOWN);
// Add this host/endpoint pointer to our flat list of endpoints for health checking.
initial_hosts_->push_back(endpoint);
// Add this host/endpoint pointer to our structured list by locality so results can be
// requested by locality.
hosts_by_locality.back().push_back(endpoint);
}
}
// Create the HostsPerLocality.
initial_hosts_per_locality_ =
std::make_shared<Envoy::Upstream::HostsPerLocalityImpl>(std::move(hosts_by_locality), false);
}

ClusterSharedPtr HdsCluster::create() { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; }
Expand Down Expand Up @@ -300,9 +351,9 @@ void HdsCluster::initialize(std::function<void()> callback) {
for (const auto& host : *initial_hosts_) {
host->healthFlagSet(Host::HealthFlag::FAILED_ACTIVE_HC);
}

// Use the ungrouped and grouped hosts lists to retain locality structure in the priority set.
priority_set_.updateHosts(
0, HostSetImpl::partitionHosts(initial_hosts_, HostsPerLocalityImpl::empty()), {},
0, HostSetImpl::partitionHosts(initial_hosts_, initial_hosts_per_locality_), {},
*initial_hosts_, {}, absl::nullopt);
}

Expand Down
1 change: 1 addition & 0 deletions source/common/upstream/health_discovery_service.h
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ class HdsCluster : public Cluster, Logger::Loggable<Logger::Id::upstream> {
bool added_via_api_;

HostVectorSharedPtr initial_hosts_;
HostsPerLocalitySharedPtr initial_hosts_per_locality_;
ClusterInfoConstSharedPtr info_;
std::vector<Upstream::HealthCheckerSharedPtr> health_checkers_;
ProtobufMessage::ValidationVisitor& validation_visitor_;
Expand Down
Loading