
fix: Prevents internal http2 requests from having shared connections closed #8507

Merged

Conversation

AlanConfluent (Member)

Description

Related to #8505. Since we currently close connections once a request finishes, this change disables multiplexing so that requests do not share connections, and closing a connection after one request cannot affect other in-flight requests.
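
For context, a minimal sketch of how multiplexing can be disabled on a Vert.x HTTP/2 client (ksqlDB's internal client is Vert.x-based). The option setters below are real Vert.x APIs, but the helper and its wiring are illustrative assumptions, not this PR's exact code:

import io.vertx.core.Vertx;
import io.vertx.core.http.HttpClient;
import io.vertx.core.http.HttpClientOptions;
import io.vertx.core.http.HttpVersion;

public final class InternalClientSketch {
  // Hypothetical helper; the actual ksqlDB client construction may differ.
  public static HttpClient createInternalClient(final Vertx vertx, final int maxPoolSize) {
    final HttpClientOptions options = new HttpClientOptions()
        .setProtocolVersion(HttpVersion.HTTP_2)
        // Allow only one concurrent stream per HTTP/2 connection, i.e. no
        // multiplexing, so closing a connection after a request cannot
        // affect other in-flight requests.
        .setHttp2MultiplexingLimit(1)
        // Compensate with a larger pool, bounded by the new
        // ksql.internal.http2.max.pool.size config.
        .setHttp2MaxPoolSize(maxPoolSize);
    return vertx.createHttpClient(options);
  }
}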

Testing done

Ran locally.

Reviewer checklist

  • Ensure docs are updated if necessary (e.g., if a user-visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@AlanConfluent AlanConfluent requested a review from a team as a code owner December 13, 2021 21:56
@@ -381,6 +381,12 @@
+ " otherwise be verbose. Note that this works on the entire URI, respecting the "
+ KSQL_ENDPOINT_LOGGING_LOG_QUERIES_CONFIG + " configuration";

public static final String KSQL_INTERNAL_HTTP2_MAX_POOL_SIZE_CONFIG
= "ksql.internal.http2.max.pool.size";
public static final int KSQL_INTERNAL_HTTP2_MAX_POOL_SIZE_DEFAULT = 1000;
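
For illustration, the new constant would typically be registered with the server's ConfigDef so operators can override it. A hypothetical sketch, assuming Kafka's ConfigDef is used as elsewhere in ksqlDB; the class, method, and doc string here are illustrative, not the PR's actual code:

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

// Hypothetical helper; the constant names follow the diff above.
final class InternalHttp2Configs {
  static final String KSQL_INTERNAL_HTTP2_MAX_POOL_SIZE_CONFIG =
      "ksql.internal.http2.max.pool.size";
  static final int KSQL_INTERNAL_HTTP2_MAX_POOL_SIZE_DEFAULT = 1000;
  static final String KSQL_INTERNAL_HTTP2_MAX_POOL_SIZE_DOC =
      "Maximum connection pool size used by the internal HTTP/2 client.";

  static ConfigDef defineInternalHttp2PoolSize(final ConfigDef configDef) {
    return configDef.define(
        KSQL_INTERNAL_HTTP2_MAX_POOL_SIZE_CONFIG,
        Type.INT,
        KSQL_INTERNAL_HTTP2_MAX_POOL_SIZE_DEFAULT,
        Importance.LOW,
        KSQL_INTERNAL_HTTP2_MAX_POOL_SIZE_DOC);
  }
}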
Contributor

What's the risk of setting this even higher, say 10k? I know we've been defensive before and it's often just bitten us unnecessarily. I'm wondering if this could be the case here.

Member

There are other limits (the port range that the kernel sets aside for listening connections, the file handle limit, etc.). We wanted to shoot for a limit that's high enough to permit our use case, but low enough that we will hit our own limit before some weird and hard-to-debug subsystem limit.

Personally, I really hope we can find a "better" solution to this problem sooner than later, but I also agree we should patch this problem asap while we seek a "proper" solution.

Member Author

It depends on the settings of the machine it runs on. I checked a few for reference and they allowed about 1M open files, so I think we could go higher. Do you have a value in mind?

Contributor

Let's keep it at 3k; it seems reasonable to allow 1.5k push queries on a 3-node cluster.

@vvcephei vvcephei (Member) left a comment

Thanks, @AlanConfluent !

Not to be "that guy", but would it be appropriate to have a test for this logic? It seems like it might be a bummer to try and set this config in an emergency later only to discover that it doesn't work for some silly reason.

@AlanConfluent (Member Author)


Wrote a test for it!
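
For illustration, a sketch of what such a test might check, assuming the client options are built from the new config; the test class and its setup here are hypothetical, not the PR's actual test:

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;

import io.vertx.core.http.HttpClientOptions;
import org.junit.Test;

public class InternalHttp2PoolSizeTest {

  @Test
  public void shouldUseConfiguredHttp2MaxPoolSize() {
    // Hypothetical: build the options the way the server would, with the
    // pool size overridden and multiplexing disabled.
    final int configuredPoolSize = 3000;
    final HttpClientOptions options = new HttpClientOptions()
        .setHttp2MultiplexingLimit(1)
        .setHttp2MaxPoolSize(configuredPoolSize);

    // Verify the configured values actually take effect on the options.
    assertThat(options.getHttp2MaxPoolSize(), is(configuredPoolSize));
    assertThat(options.getHttp2MultiplexingLimit(), is(1));
  }
}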

@AlanConfluent AlanConfluent merged commit 5b0be58 into confluentinc:master Dec 14, 2021