feat(balancer): expose balancer_health even when healthchecks are off #5885

Merged
bungle merged 1 commit into master from feat/balancer_health on Mar 16, 2023


hishamhm
Contributor

The Kong load balancer has concepts of upstream health, which is managed by the healthchecker, and balancer health, which is affected by the healthchecker but also affected by the balancer "health threshold" setting. This means a balancer can be unhealthy (and return HTTP 503) even when healthchecks for the upstream are disabled.

The endpoint /upstreams/<upstream>/health?balancer_health=1 is an "advanced mode" for viewing health, which shows the balancer health as opposed to upstream health. (It is hidden behind a query argument to avoid confusion: in most cases the upstream health is sufficient, and the "balancer" terminology, which refers to a mostly internal Kong object, may be unclear to end-users.) This endpoint, however, returns the balancer health only when healthchecks for the upstream are enabled.

This PR changes this behavior so that using ?balancer_health=1 we can always see the balancer health, through a new attribute balancer_health, which always returns HEALTHY or UNHEALTHY (reporting the true state of the balancer), even if the overall upstream health status is HEALTHCHECKS_OFF. This is useful for debugging.

The original attribute health is preserved with the existing semantics for backward compatibility, returning HEALTHY, UNHEALTHY or HEALTHCHECKS_OFF.
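
As a rough illustration of how a client might read the two attributes side by side, here is a minimal Python sketch. It assumes the response body is a JSON object whose data field is an object carrying health and balancer_health; that shape, the helper name summarize_health, and the sample payload are assumptions made for this example, not something this PR specifies.

```python
import json


def summarize_health(body: str) -> dict:
    """Summarize a response from /upstreams/<upstream>/health?balancer_health=1.

    Assumption: the body is a JSON object with a "data" object holding
    "health" and "balancer_health". This helper is hypothetical.
    """
    payload = json.loads(body)
    data = payload.get("data")
    if not isinstance(data, dict):
        # No balancer-level object was returned, so nothing can be concluded.
        return {"health": None, "balancer_health": None, "may_return_503": None}

    health = data.get("health")                    # HEALTHY, UNHEALTHY or HEALTHCHECKS_OFF
    balancer_health = data.get("balancer_health")  # HEALTHY or UNHEALTHY

    if balancer_health is None:
        may_return_503 = None
    else:
        # An unhealthy balancer can answer with HTTP 503 even when
        # healthchecks for the upstream are disabled.
        may_return_503 = balancer_health == "UNHEALTHY"

    return {
        "health": health,
        "balancer_health": balancer_health,
        "may_return_503": may_return_503,
    }


# Hypothetical payload: healthchecks are off, yet the balancer is unhealthy.
sample = (
    '{"data": {"health": "HEALTHCHECKS_OFF", "balancer_health": "UNHEALTHY"},'
    ' "next": null, "node_id": "example-node"}'
)
print(summarize_health(sample))
```

The point of keeping both fields in the summary is that health alone would report HEALTHCHECKS_OFF here and hide the fact that the balancer itself is unhealthy.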

@gszr gszr changed the base branch from next to master April 15, 2021 18:20
@guanlan guanlan requested a review from a team as a code owner August 25, 2022 01:38
@pull-request-size pull-request-size bot added size/S and removed size/M labels Oct 5, 2022
@hishamhm hishamhm force-pushed the feat/balancer_health branch 2 times, most recently from aa426b0 to eb2a473 October 5, 2022 17:04
@hishamhm hishamhm changed the title from "feat(balancer) expose balancer_health even when healthchecks are off" to "feat(balancer): expose balancer_health even when healthchecks are off" Oct 5, 2022
@hishamhm
Contributor Author

hishamhm commented Oct 6, 2022

Thanks @hutchic for giving me a ping about this! It's been a while! I've just rebased it against the latest master (it wasn't too much trouble in spite of the balancer refactors!) This feature was an old ask from @Tieske, who wanted more diagnostics from the balancer state even when healthchecks were disabled, based on his experience troubleshooting customer issues. It might still be useful!

@hbagdi
Member

hbagdi commented Oct 26, 2022

Closing because of lack of activity. We would like to accept this contribution if anyone wishes to pick this back up.

@hbagdi hbagdi closed this Oct 26, 2022
@hbagdi hbagdi deleted the feat/balancer_health branch October 26, 2022 18:05
@hishamhm
Contributor Author

lack of activity? I rebased it like 3 hours ago 😆

@hbagdi
Member

hbagdi commented Oct 26, 2022

Oops, apologies. Could you please re-open this one?

@hishamhm
Contributor Author

hishamhm commented Oct 26, 2022

I can't, the reopen button is grayed out.

@hbagdi hbagdi restored the feat/balancer_health branch October 26, 2022 20:54
@hbagdi hbagdi reopened this Oct 26, 2022
@hbagdi hbagdi requested a review from locao October 26, 2022 21:08
The Kong load balancer has concepts of upstream health, which is managed by
the healthchecker, and balancer health, which is affected by the healthchecker
but also affected by the balancer "health threshold" setting. This means a
_balancer_ can be unhealthy (and return HTTP 503) even when healthchecks for
the upstream are disabled.

The endpoint `/upstreams/<upstream>/health?balancer_health=1` is an "advanced
mode" for viewing health, which shows the _balancer health_ as opposed to
upstream health. (It is hidden behind a query argument to avoid confusion: in
most cases the upstream health is sufficient, and the "balancer" terminology,
which refers to a mostly internal Kong object, may be unclear to end-users.)
This endpoint, however, returns the balancer health only when healthchecks for
the upstream are enabled.

This commit changes this behavior so that using `?balancer_health=1` we can always
see the balancer health, through a new attribute `balancer_health`, which
always returns `HEALTHY` or `UNHEALTHY` (reporting the true state of the
balancer), even if the overall upstream health status is `HEALTHCHECKS_OFF`.
This is useful for debugging.

The original attribute `health` is preserved with the existing semantics for
backward compatibility, returning `HEALTHY`, `UNHEALTHY` or
`HEALTHCHECKS_OFF`.

Signed-off-by: Aapo Talvensaari <[email protected]>
@bungle bungle merged commit fdc653d into master Mar 16, 2023
@bungle bungle deleted the feat/balancer_health branch March 16, 2023 10:26
@hishamhm
Contributor Author

@bungle thank you! ❤️

@ankujuniyal

@bungle @hishamhm I'd like to understand how the query parameter in "health?balancer_health=1" works. In my case, even though the target exists in the upstream, the response does not always include the data key and its value. Sometimes it shows up and sometimes it doesn't.
For example, sometimes I get
{"node_id":"908a59ea-1ff1-47eb-89a1-xxxxxxxxx","next":null}
and other times I get {"node_id":"908a59ea-1ff1-47eb-89a1-xxxxxxxxx","data":[***] "next":null}
I can see the same thing in the admin logs, where the response size in bytes is much smaller when the data key is missing:

x.x.x.x - - [30/Mar/2023:16:45:37 +0530] "GET /upstreams/908a59ea-1ff1-47eb-89a1-xxxxxxxxx/health?balancer_health=1 HTTP/1.1" 200 62 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/111.0"

x.x.x.x - - [30/Mar/2023:16:18:23 +0530] "GET /upstreams/908a59ea-1ff1-47eb-89a1-xxxxxxxxx/health?balancer_health=1 HTTP/1.1" 200 621 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/111.0"

Kong version: 2.4.0
lua_version: "LuaJIT 2.1.0-beta3"

Please let me know how I can debug this.
