upstream: load distribution in Total Panic mode (#9343)
Changed how load is calculated when all priority levels are in panic mode. Each priority level now receives a share of the traffic proportional to its number of hosts, regardless of the health status of those hosts. This smooths out how traffic is shifted when hosts become unhealthy. See #4685 for the design proposal and discussion.

Signed-off-by: Christoph Pakulski <[email protected]>
cpakulski authored and snowp committed Jan 24, 2020
1 parent ba79d66 commit 757e0fb
Showing 7 changed files with 204 additions and 45 deletions.
Expand Up @@ -79,8 +79,15 @@ priority.

Panic mode can be disabled by setting the panic threshold to 0%.

If all hosts become unhealthy normalized total health is 0%, and if the panic threshold is above 0%
all traffic will be redirected to P=0.
Load distribution is calculated as described above as long as at least one priority level is not in panic mode.
When all priority levels enter panic mode, the load calculation algorithm changes.
In this case each priority level receives traffic in proportion to the number of hosts in that priority level
relative to the total number of hosts across all priority levels.
For example, if there are 2 priorities, P=0 and P=1, and each of them consists of 5 hosts, each level will
receive 50% of the traffic.
If there are 2 hosts in priority P=0 and 8 hosts in priority P=1, priority P=0 will receive 20% of the
traffic and priority P=1 will receive 80% of the traffic.
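
The proportional split described here can be written as a short formula. This is an illustrative sketch, not text from the commit: the symbols $n_i$ (number of hosts in priority $i$, healthy or not) and $\mathrm{load}_i$ are introduced here only for the example.

```latex
\mathrm{load}_i = 100\% \cdot \frac{n_i}{\sum_j n_j},
\qquad
n_0 = 2,\ n_1 = 8 \;\Rightarrow\;
\mathrm{load}_0 = 100\% \cdot \tfrac{2}{10} = 20\%,\quad
\mathrm{load}_1 = 100\% \cdot \tfrac{8}{10} = 80\%
```

The implementation in this commit works in whole percentages, so when the division is not exact each share is rounded down and the leftover is added to the first non-empty priority.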

However, if the panic threshold is 0% for any priority, that priority will never enter panic mode.
In this case if all hosts are unhealthy, Envoy will fail to select a host and will instead immediately
return error responses with "503 - no healthy upstream".
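
The difference between panic mode and a 0% threshold can be sketched in a few lines of C++. This is a simplified, hypothetical model rather than Envoy's implementation: `Host`, `inPanic`, `candidateHosts`, and `routeStatus` are names invented for this example, and the panic rule used here (healthy percentage below the threshold) is an assumption that ignores degraded and excluded hosts.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified host record; not an Envoy type.
struct Host {
  std::string address;
  bool healthy;
};

// Assumed rule: a host set panics when its healthy percentage is below the
// threshold. With a threshold of 0 the comparison can never be true, so the
// host set never enters panic mode.
bool inPanic(const std::vector<Host>& hosts, unsigned panic_threshold_percent) {
  assert(panic_threshold_percent <= 100);
  const auto healthy = std::count_if(hosts.begin(), hosts.end(),
                                     [](const Host& host) { return host.healthy; });
  const double healthy_percent = hosts.empty() ? 0.0 : 100.0 * healthy / hosts.size();
  return healthy_percent < panic_threshold_percent;
}

// In panic mode health is ignored; otherwise only healthy hosts qualify.
std::vector<Host> candidateHosts(const std::vector<Host>& hosts, bool in_panic) {
  if (in_panic) {
    return hosts;
  }
  std::vector<Host> healthy;
  for (const Host& host : hosts) {
    if (host.healthy) {
      healthy.push_back(host);
    }
  }
  return healthy;
}

// Returns 200 if some host can be chosen, 503 ("no healthy upstream") otherwise.
int routeStatus(const std::vector<Host>& hosts, unsigned panic_threshold_percent) {
  const bool panic = inPanic(hosts, panic_threshold_percent);
  return candidateHosts(hosts, panic).empty() ? 503 : 200;
}
```

With every host unhealthy, `routeStatus(hosts, 0)` yields 503 because panic mode never engages, while `routeStatus(hosts, 50)` yields 200 because panic mode makes the unhealthy hosts eligible again.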
1 change: 1 addition & 0 deletions docs/root/intro/version_history.rst
Expand Up @@ -5,6 +5,7 @@ Version history
================
* config: use type URL to select an extension whenever the config type URL (or its previous versions) uniquely identify a typed extension, see :ref:`extension configuration <config_overview_extension_configuration>`.
* retry: added a retry predicate that :ref:`rejects hosts based on metadata. <envoy_api_field_route.RetryPolicy.retry_host_predicate>`
* upstream: changed load distribution algorithm when all priorities enter :ref:`panic mode<arch_overview_load_balancing_panic_threshold>`.

1.13.0 (January 20, 2020)
=========================
90 changes: 76 additions & 14 deletions source/common/upstream/load_balancer_impl.cc
Expand Up @@ -134,9 +134,13 @@ LoadBalancerBase::LoadBalancerBase(
// - normalized total health is < 100%. There are not enough healthy hosts to handle the load.
// Continue distributing the load among priority sets, but turn on panic mode for a given priority
// if # of healthy hosts in priority set is low.
// - normalized total health is 0%. All hosts are down. Redirect 100% of traffic to P=0.
// And if panic threshold > 0% then enable panic mode for P=0, otherwise disable.

// - all host sets are in panic mode. Situation called TotalPanic. Load distribution is
// calculated based on the number of hosts in each priority regardless of their health.
// - all hosts in all priorities are down (normalized total health is 0%). If panic
// threshold > 0% the cluster is in TotalPanic (see above). If panic threshold == 0
// then priorities are not in panic, but there are no healthy hosts to route to.
// In this case just mark P=0 as recipient of 100% of the traffic (nothing will be routed
// to P=0 anyways as there are no healthy hosts there).
void LoadBalancerBase::recalculatePerPriorityState(uint32_t priority,
const PrioritySet& priority_set,
HealthyAndDegradedLoad& per_priority_load,
Expand Down Expand Up @@ -183,9 +187,9 @@ void LoadBalancerBase::recalculatePerPriorityState(uint32_t priority,
const uint32_t normalized_total_availability =
calculateNormalizedTotalAvailability(per_priority_health, per_priority_degraded);
if (normalized_total_availability == 0) {
// Everything is terrible. Send all load to P=0.
// In this one case sumEntries(per_priority_load) != 100 since we sinkhole all traffic in P=0.
per_priority_load.healthy_priority_load_.get()[0] = 100;
// Everything is terrible. There is nothing to calculate here.
// Let recalculatePerPriorityPanic and recalculateLoadInTotalPanic deal with
// load calculation.
return;
}

Expand Down Expand Up @@ -239,24 +243,82 @@ void LoadBalancerBase::recalculatePerPriorityPanic() {
const uint64_t panic_threshold = std::min<uint64_t>(
100, runtime_.snapshot().getInteger(RuntimePanicThreshold, default_healthy_panic_percent_));

// Panic mode is disabled only when panic_threshold is 0%.
if (panic_threshold > 0 && normalized_total_availability == 0) {
// Everything is terrible. All load should be to P=0. Turn on panic mode.
ASSERT(per_priority_load_.healthy_priority_load_.get()[0] == 100);
per_priority_panic_[0] = true;
// This is a corner case: panic is disabled and there are no hosts available.
// LoadBalancerBase::choosePriority method expects that the sum of
// load percentages always adds up to 100.
// To satisfy that requirement 100% is assigned to P=0.
// In reality no traffic will be routed to P=0 priority, because
// the panic mode is disabled and LoadBalancer will try to find
// a healthy node and none is available.
if (panic_threshold == 0 && normalized_total_availability == 0) {
per_priority_load_.healthy_priority_load_.get()[0] = 100;
return;
}

bool total_panic = true;
for (size_t i = 0; i < per_priority_health_.get().size(); ++i) {
// For each level check if it should run in panic mode. Never set panic mode if
// normalized total health is 100%, even when individual priority level has very low # of
// healthy hosts.
const HostSet& priority_host_set = *priority_set_.hostSetsPerPriority()[i];
per_priority_panic_[i] =
(normalized_total_availability == 100 ? false : isGlobalPanic(priority_host_set));
(normalized_total_availability == 100 ? false : isHostSetInPanic(priority_host_set));
total_panic = total_panic && per_priority_panic_[i];
}

// If all priority levels are in panic mode, load distribution
// is done differently.
if (total_panic) {
recalculateLoadInTotalPanic();
}
}
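
The per-priority loop above can be modelled outside Envoy to see when total panic is reached. This is a minimal sketch under stated assumptions: `PriorityCounts`, `PanicState`, `hostSetInPanic`, and `computePanic` are hypothetical names, the single-priority rule (available percentage of non-excluded hosts below the threshold) is assumed because the body of `isHostSetInPanic` is only partly visible in this diff, and the early return for a 0% threshold is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-priority host counts; not an Envoy type.
struct PriorityCounts {
  uint32_t hosts;     // all hosts in the priority
  uint32_t available; // healthy plus degraded hosts
  uint32_t excluded;  // hosts left out of the health calculation
};

// Assumed rule for one priority: panic when the share of available hosts,
// measured against non-excluded hosts, is below the panic threshold.
bool hostSetInPanic(const PriorityCounts& p, uint32_t panic_threshold) {
  assert(p.excluded <= p.hosts);
  const uint32_t counted = p.hosts - p.excluded;
  const double available_percent = counted == 0 ? 0.0 : 100.0 * p.available / counted;
  return available_percent < panic_threshold;
}

struct PanicState {
  std::vector<bool> per_priority_panic;
  bool total_panic;
};

// Mirrors the loop above: no priority panics while normalized total
// availability is 100, and total panic requires every priority to panic.
PanicState computePanic(const std::vector<PriorityCounts>& priorities,
                        uint32_t normalized_total_availability,
                        uint32_t panic_threshold) {
  PanicState state{std::vector<bool>(priorities.size(), false), true};
  for (size_t i = 0; i < priorities.size(); ++i) {
    const bool panic = normalized_total_availability != 100 &&
                       hostSetInPanic(priorities[i], panic_threshold);
    state.per_priority_panic[i] = panic;
    state.total_panic = state.total_panic && panic;
  }
  return state;
}
```

For instance, two priorities with counts `{4, 0, 4}` and `{4, 1, 0}`, a normalized total availability of 35, and a threshold of 50 both panic under this model, so `total_panic` is true.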

// recalculateLoadInTotalPanic method is called when all priority levels
// are in panic mode. The load distribution is NOT based on the number
// of healthy hosts in the priority, but on the number of hosts
// in each priority regardless of their health.
void LoadBalancerBase::recalculateLoadInTotalPanic() {
// First calculate the total number of hosts across all priorities regardless
// of whether they are healthy or not.
const uint32_t total_hosts_count = std::accumulate(
priority_set_.hostSetsPerPriority().begin(), priority_set_.hostSetsPerPriority().end(), 0,
[](size_t acc, const std::unique_ptr<Envoy::Upstream::HostSet>& host_set) {
return acc + host_set->hosts().size();
});

if (0 == total_hosts_count) {
// Backend is empty, but load must be distributed somewhere.
per_priority_load_.healthy_priority_load_.get()[0] = 100;
return;
}

// Now iterate through all priority levels and calculate how much
// load is supposed to go to each priority. In panic mode the calculation
// is based not on the number of healthy hosts but based on the number of
// total hosts in the priority.
uint32_t total_load = 100;
int32_t first_noempty = -1;
for (size_t i = 0; i < per_priority_panic_.size(); i++) {
const HostSet& host_set = *priority_set_.hostSetsPerPriority()[i];
const auto hosts_num = host_set.hosts().size();

if ((-1 == first_noempty) && (0 != hosts_num)) {
first_noempty = i;
}
const uint32_t priority_load = 100 * hosts_num / total_hosts_count;
per_priority_load_.healthy_priority_load_.get()[i] = priority_load;
per_priority_load_.degraded_priority_load_.get()[i] = 0;
total_load -= priority_load;
}

// Add the remaining load to the first non-empty priority.
per_priority_load_.healthy_priority_load_.get()[first_noempty] += total_load;

// The total load should come up to 100%.
ASSERT(100 == std::accumulate(per_priority_load_.healthy_priority_load_.get().begin(),
per_priority_load_.healthy_priority_load_.get().end(), 0));
}
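
The integer rounding in the function above is easiest to follow with plain numbers. The following standalone sketch reproduces the same arithmetic on a vector of host counts; `totalPanicLoad` is a hypothetical helper written for illustration and is not part of the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Splits 100% of the load across priorities in proportion to their host
// counts, using whole percentages. Each share is rounded down and the
// leftover goes to the first priority that has at least one host.
std::vector<uint32_t> totalPanicLoad(const std::vector<size_t>& hosts_per_priority) {
  std::vector<uint32_t> load(hosts_per_priority.size(), 0);
  if (load.empty()) {
    return load;
  }
  const size_t total_hosts =
      std::accumulate(hosts_per_priority.begin(), hosts_per_priority.end(), size_t{0});
  if (total_hosts == 0) {
    // No hosts at all, but the load still has to add up to 100.
    load[0] = 100;
    return load;
  }

  uint32_t remaining = 100;
  size_t first_non_empty = hosts_per_priority.size();
  for (size_t i = 0; i < hosts_per_priority.size(); ++i) {
    if (first_non_empty == hosts_per_priority.size() && hosts_per_priority[i] != 0) {
      first_non_empty = i;
    }
    // Integer division rounds the share down.
    const uint32_t share = static_cast<uint32_t>(100 * hosts_per_priority[i] / total_hosts);
    load[i] = share;
    remaining -= share;
  }
  load[first_non_empty] += remaining;

  assert(std::accumulate(load.begin(), load.end(), 0u) == 100);
  return load;
}
```

With host counts 5, 6, and 3 the rounded-down shares are 35, 42, and 21, which leaves 2% over; adding it to P=0 gives 37, 42, 21. With counts 0, 6, and 3 the leftover 1% goes to P=1 because P=0 is empty, giving 0, 67, 33.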

std::pair<HostSet&, LoadBalancerBase::HostAvailability>
LoadBalancerBase::chooseHostSet(LoadBalancerContext* context) {
if (context) {
Expand Down Expand Up @@ -461,7 +523,7 @@ HostConstSharedPtr LoadBalancerBase::chooseHost(LoadBalancerContext* context) {
return host;
}

bool LoadBalancerBase::isGlobalPanic(const HostSet& host_set) {
bool LoadBalancerBase::isHostSetInPanic(const HostSet& host_set) {
uint64_t global_panic_threshold = std::min<uint64_t>(
100, runtime_.snapshot().getInteger(RuntimePanicThreshold, default_healthy_panic_percent_));
const auto host_count = host_set.hosts().size() - host_set.excludedHosts().size();
Expand Down Expand Up @@ -593,7 +655,7 @@ ZoneAwareLoadBalancerBase::hostSourceToUse(LoadBalancerContext* context) {
return hosts_source;
}

if (isGlobalPanic(localHostSet())) {
if (isHostSetInPanic(localHostSet())) {
stats_.lb_local_cluster_not_ok_.inc();
// If the local Envoy instances are in global panic, and we should not fail traffic, do
// not do locality based routing.
9 changes: 8 additions & 1 deletion source/common/upstream/load_balancer_impl.h
Expand Up @@ -58,7 +58,14 @@ class LoadBalancerBase : public LoadBalancer {
* majority of hosts are unhealthy we'll be likely in a panic mode. In this case we'll route
* requests to hosts regardless of whether they are healthy or not.
*/
bool isGlobalPanic(const HostSet& host_set);
bool isHostSetInPanic(const HostSet& host_set);

/**
* Method is called when all host sets are in panic mode.
* In such state the load is distributed based on the number of hosts
* in given priority regardless of their health.
*/
void recalculateLoadInTotalPanic();

LoadBalancerBase(const PrioritySet& priority_set, ClusterStats& stats, Runtime::Loader& runtime,
Runtime::RandomGenerator& random,
117 changes: 97 additions & 20 deletions test/common/upstream/load_balancer_impl_test.cc
Expand Up @@ -86,7 +86,6 @@ class LoadBalancerBaseTest : public LoadBalancerTestBase {
}

for (; i < (num_healthy_hosts + num_degraded_hosts + num_excluded_hosts); ++i) {
host_set.degraded_hosts_.push_back(host_set.hosts_[i]);
host_set.excluded_hosts_.push_back(host_set.hosts_[i]);
}
host_set.runCallbacks({}, {});
Expand Down Expand Up @@ -130,10 +129,18 @@ TEST_P(LoadBalancerBaseTest, PrioritySelection) {
HealthyAndDegradedLoad priority_load{Upstream::HealthyLoad({100, 0, 0}),
Upstream::DegradedLoad({0, 0, 0})};
EXPECT_CALL(context, determinePriorityLoad(_, _)).WillRepeatedly(ReturnRef(priority_load));
// With both the primary and failover hosts unhealthy, we should select an
// unhealthy primary host.
EXPECT_EQ(100, lb_.percentageLoad(0));
EXPECT_EQ(0, lb_.percentageLoad(1));
// Primary and failover are in panic mode. Load distribution is based
// on the number of hosts regardless of their health.
EXPECT_EQ(50, lb_.percentageLoad(0));
EXPECT_EQ(50, lb_.percentageLoad(1));
EXPECT_EQ(&host_set_, &lb_.chooseHostSet(&context).first);

// Modify the number of hosts in failover, but leave them in the unhealthy state.
// Primary and failover are in panic mode, so load distribution is
// based on the number of hosts regardless of their health.
updateHostSet(failover_host_set_, 2, 0);
EXPECT_EQ(34, lb_.percentageLoad(0));
EXPECT_EQ(66, lb_.percentageLoad(1));
EXPECT_EQ(&host_set_, &lb_.chooseHostSet(&context).first);

// Update the priority set with a new priority level P=2 and ensure the host
Expand Down Expand Up @@ -302,12 +309,27 @@ TEST_P(LoadBalancerBaseTest, GentleFailover) {
ASSERT_THAT(getPanic(), ElementsAre(false, false));

// Health P=0 == 100*1.4 == 35 P=1 == 35
// Since 4 hosts are excluded and are unhealthy, P=0 should be considered fully unavailable.
// Total health = 35% is less than 100%. Panic should trigger.
// Total health = 35% is less than 100%.
// All priorities are in panic mode (situation called TotalPanic)
// Load is distributed based on number of hosts regardless of their health status.
// P=0 and P=1 have 4 hosts each so each priority will receive 50% of the traffic.
updateHostSet(host_set_, 4 /* num_hosts */, 0 /* num_healthy_hosts */, 0 /* num_degraded_hosts */,
4 /* num_excluded_hosts */);
updateHostSet(failover_host_set_, 4 /* num_hosts */, 1 /* num_healthy_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(0, 100));
ASSERT_THAT(getLoadPercentage(), ElementsAre(50, 50));
ASSERT_THAT(getPanic(), ElementsAre(true, true));

// Make sure that in TotalPanic mode (all levels are in Panic),
// load distribution depends only on number of hosts.
// excluded_hosts should not be taken into account.
// P=0 has 4 hosts with 1 excluded, P=1 has 6 hosts with 2 excluded.
// P=0 should receive 4/(4+6)=40% of traffic
// P=1 should receive 6/(4+6)=60% of traffic
updateHostSet(host_set_, 4 /* num_hosts */, 0 /* num_healthy_hosts */, 0 /* num_degraded_hosts */,
1 /* num_excluded_hosts */);
updateHostSet(failover_host_set_, 6 /* num_hosts */, 1 /* num_healthy_hosts */,
0 /* num_degraded_hosts */, 2 /* num_excluded_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(40, 60));
ASSERT_THAT(getPanic(), ElementsAre(true, true));
}

Expand Down Expand Up @@ -382,11 +404,13 @@ TEST_P(LoadBalancerBaseTest, GentleFailoverWithExtraLevels) {

// Levels P=0 and P=1 are totally down. P=2 is 40*1.4=56% healthy.
// 100% of the traffic should go to P=2. All levels P=0, P=1 and P=2 should
// be in panic mode even though P=0 and P=1 do not receive any load.
// be in panic mode.
// Since all levels are in panic mode load distribution is based
// on number of hosts in each level.
updateHostSet(host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(failover_host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(tertiary_host_set_, 5 /* num_hosts */, 2 /* num_healthy_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(0, 0, 100));
ASSERT_THAT(getLoadPercentage(), ElementsAre(34, 33, 33));
ASSERT_THAT(getDegradedLoadPercentage(), ElementsAre(0, 0, 0));
ASSERT_THAT(getPanic(), ElementsAre(true, true, true));

Expand All @@ -402,21 +426,34 @@ TEST_P(LoadBalancerBaseTest, GentleFailoverWithExtraLevels) {
ASSERT_THAT(getDegradedLoadPercentage(), ElementsAre(0, 0, 0));
ASSERT_THAT(getPanic(), ElementsAre(false, false, false));

// All levels are completely down. 100% of traffic should go to P=0
// and P=0 should be in panic mode
// All levels are completely down - situation called TotalPanic.
// Load is distributed based on the number
// of hosts in the priority in relation to the total number of hosts.
// Here the total number of hosts is 10.
// priority 0 will receive 5/10: 50% of the traffic
// priority 1 will receive 3/10: 30% of the traffic
// priority 2 will receive 2/10: 20% of the traffic
updateHostSet(host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(failover_host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(tertiary_host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(100, _, _));
updateHostSet(failover_host_set_, 3 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(tertiary_host_set_, 2 /* num_hosts */, 0 /* num_healthy_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(50, 30, 20));
ASSERT_THAT(getDegradedLoadPercentage(), ElementsAre(0, 0, 0));
ASSERT_THAT(getPanic(), ElementsAre(true, _, _));
ASSERT_THAT(getPanic(), ElementsAre(true, true, true));

// Rounding errors should be picked up by the first healthy priority.
// Rounding errors should be picked up by the first priority.
// All priorities are in panic mode - situation called TotalPanic.
// Load is distributed based on the number
// of hosts in the priority in relation to the total number of hosts.
// Total number of hosts is 5+6+3=14.
// priority 0 should receive 5/14=35% plus the 2% rounding remainder: 37% of traffic
// priority 1 should receive 6/14=42% of traffic
// priority 2 should receive 3/14=21% of traffic
updateHostSet(host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(failover_host_set_, 5 /* num_hosts */, 2 /* num_healthy_hosts */);
updateHostSet(tertiary_host_set_, 5 /* num_hosts */, 1 /* num_healthy_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(0, 67, 33));
updateHostSet(failover_host_set_, 6 /* num_hosts */, 2 /* num_healthy_hosts */);
updateHostSet(tertiary_host_set_, 3 /* num_hosts */, 1 /* num_healthy_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(37, 42, 21));
ASSERT_THAT(getDegradedLoadPercentage(), ElementsAre(0, 0, 0));
ASSERT_THAT(getPanic(), ElementsAre(true, true, true));

// Load should spill over into degraded.
updateHostSet(host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */,
Expand All @@ -429,13 +466,53 @@ TEST_P(LoadBalancerBaseTest, GentleFailoverWithExtraLevels) {

// Rounding errors should be picked up by the first priority with degraded hosts when
// there are no healthy priorities.
// Disable panic threshold to prevent total panic from kicking in.
EXPECT_CALL(runtime_.snapshot_, getInteger("upstream.healthy_panic_threshold", 50))
.WillRepeatedly(Return(0));
updateHostSet(host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(failover_host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */,
2 /* num_degraded_hosts */);
updateHostSet(tertiary_host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */,
1 /* num_degraded_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(0, 0, 0));
ASSERT_THAT(getDegradedLoadPercentage(), ElementsAre(0, 67, 33));

// Simulate Total Panic mode. There are no healthy hosts, but there are
// degraded hosts. Because of Total Panic, load is distributed
// based only on the number of hosts in each priority regardless of their health.
// Rounding errors should be picked up by the first priority.
// Re-enable the panic threshold.
EXPECT_CALL(runtime_.snapshot_, getInteger("upstream.healthy_panic_threshold", 50))
.WillRepeatedly(Return(50));
updateHostSet(host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(failover_host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */,
2 /* num_degraded_hosts */);
updateHostSet(tertiary_host_set_, 5 /* num_hosts */, 0 /* num_healthy_hosts */,
1 /* num_degraded_hosts */);
ASSERT_THAT(getLoadPercentage(), ElementsAre(34, 33, 33));
ASSERT_THAT(getDegradedLoadPercentage(), ElementsAre(0, 0, 0));

// Rounding error should be allocated to the first non-empty priority
// In this test P=0 is not empty.
updateHostSet(host_set_, 3 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(failover_host_set_, 3 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(tertiary_host_set_, 3 /* num_hosts */, 0 /* num_healthy_hosts */);
ASSERT_THAT(getPanic(), ElementsAre(true, true, true));
ASSERT_THAT(getLoadPercentage(), ElementsAre(34, 33, 33));

// Rounding error should be allocated to the first non-empty priority
// In this test P=0 is empty and P=1 is not empty.
updateHostSet(host_set_, 0 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(failover_host_set_, 6 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(tertiary_host_set_, 3 /* num_hosts */, 0 /* num_healthy_hosts */);
ASSERT_THAT(getPanic(), ElementsAre(true, true, true));
ASSERT_THAT(getLoadPercentage(), ElementsAre(0, 67, 33));
// In this test P=1 is not empty.
updateHostSet(host_set_, 3 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(failover_host_set_, 3 /* num_hosts */, 0 /* num_healthy_hosts */);
updateHostSet(tertiary_host_set_, 3 /* num_hosts */, 0 /* num_healthy_hosts */);
ASSERT_THAT(getPanic(), ElementsAre(true, true, true));
ASSERT_THAT(getLoadPercentage(), ElementsAre(34, 33, 33));
}

TEST_P(LoadBalancerBaseTest, BoundaryConditions) {