Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/plugin server release lock #10561

Merged
merged 4 commits into from
Apr 27, 2023
Merged

Conversation

StarlightIbuki
Copy link
Contributor

@StarlightIbuki StarlightIbuki commented Mar 27, 2023

Summary

NOTE: when merging, please make sure all changes are included in the message.

When the plugin server sock is not ready and a request comes, the "not ready" error will leak lock, fail to release the instance, and cause the CPU to reach 100%.

Checklist

  • There's an entry in the CHANGELOG
  • The Pull Request has tests N/A. This is hard to test with CI.
  • There is a user-facing docs PR N/A

Issue reference

KAG-1204


-- to prevent a potential dead loop when someone failed to release the ID
wait_count = wait_count + 1
if wait_count > 10 then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the max wait count looks okay to me, but the sleep change doesn't seem necessary, as the sleep call will cause an yield. can you elaborate on it?
for the fix itself, only the reset_instance should be necessary (and lgtm); let's please make sure it's in its own commit so we can do a rebase + merge (squash + merge will mix things up)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, when we start the Kong, the Go plugin server needs time to start. If we waited for a while, we could expect to wait longer.
When the Go plugin server does not respond, Kong can do nothing except for waiting for more rounds, and it might quickly run out of counts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got your point @StarlightIbuki, just checking: is this a theoretic scenario or have you confirmed that the first (10) wait count can be exceeded and therefore the wait increase becomes useful in practice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samugi each round may take less than 0.01 seconds and it's very possible for the go plugin server to take 0.1 seconds to start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of relying on loops count (which would vary in time on different hardware), does it make sense to go with a naive approach that:

  • wait explictly 0.1 in every loop, check if we are good
  • if have waited for more than 10 seconds, return an error

@ms2008
Copy link
Contributor

ms2008 commented Apr 14, 2023

Is it possible to add a test to verify this behavior?

@StarlightIbuki
Copy link
Contributor Author

StarlightIbuki commented Apr 14, 2023

Is it possible to add a test to verify this behavior?

It's difficult and could be a source of flakiness.

Copy link
Contributor

@ms2008 ms2008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chronolaw
Copy link
Contributor

DO NOT SQUASH WHEN MERGING! There are 2 separate commits that need to be merged.

Now this PR has 4 commits, should we rebase it?

@StarlightIbuki
Copy link
Contributor Author

DO NOT SQUASH WHEN MERGING! There are 2 separate commits that need to be merged.

Now this PR has 4 commits, should we rebase it?

Let's squash and merge. But keep the 2 commits' message.

@fffonion fffonion merged commit 4646065 into master Apr 27, 2023
@fffonion fffonion deleted the fix/plugin-server-release-lock branch April 27, 2023 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants