Benchmark: HTTP2 fix cancel_stream_error #2612
Fixes Jetty HTTP2 cancel_stream_error by increasing the flow control limit. jetty.http2.rateControl.maxEventsPerSecond=2000
DRAFT; I still see the error and I'm confused
Could you please walk me through running the specific benchmark that produces the error, such as the command or anything else that would help me reproduce it on my system?
The benchmark module uses JettySolrRunner via MiniClusterState. Googling for this is inconclusive as to the cause. It seems the server is choosing to cancel the client's stream, but we don't know why. I dug into this before and thought it was a rate limiting mechanism, so this PR disables that. Perhaps the next step to debug is to ascertain what Jetty config/logging would explain that. Would want to avoid debug level on everything -- too much output.
I recall encountering a somewhat similar error in the IndexFetcher class when we transitioned to HTTP2. The IndexFetcher class was prematurely closing the client connection and sending a GOAWAY packet to the server. When the flow control limit was reached, it triggered an exception. I suspect the same issue might be occurring in this case. Unfortunately, I can't run the crave script anymore since upgrading to the new Apple Silicon chip. I have to look into that.
Is there any way to enable more logging for the benchmark? I wonder if it would be too much overhead with the current Jetty configuration.
Maybe you are unfamiliar with Log4j, but look at log4j2-bench.xml in this module.
MaxConnectionsPerDestination 8
MaxConnectionsPerDestination 16
MaxConnectionsPerDestination 32
MaxConnectionsPerDestination 64
One thing I noticed: once I increased maxConnectionsPerDestination to 8 and kept increasing it up to 64, I didn't see any exception and was able to run the full benchmark without any issue. Below I ran it with 2, and the first benchmark failed immediately! MaxConnectionsPerDestination 2
MaxConnectionsPerDestination 128
Wow; good find! I should mention that fairly soon, in #2410, we'll have maxConnectionsPerDestination set more appropriately (not 4!), and then this issue may no longer be experienced.
Actually, I am still not convinced that the issue is related to There is a possibility that Jetty is cancelling the stream because one of the shards went into recovery!
This one looks different!
We can test a setup where we try to index documents while a shard is down (by rewriting the ZooKeeper cluster state) and see if we are getting the same Also, in some cases I encountered an Out of Memory exception, suggesting we might need to scale it down a bit, as the current setup seems somewhat unstable.
This error is not related; just make sure your version field has docValues enabled.
I'm surprised by, and suspicious of, the idea that app-level concerns or errors (Solr returning an HTTP error to the client) would result in a lower-level HTTP/TCP stream cancellation of some kind.
We are using Jetty 10, right?
Yes; our versions are in
Latest Benchmark with different values for maxConnectionPerDestination
It's a good improvement as we increase maxConnectionPerStream. The pattern I see so far is that if we increase maxConnectionsPerDestination then yes, there are failures, but all of them are related to Out Of Memory exceptions rather than Clearly, increasing EDIT: Wait, I was looking at the wrong value; actually there is no improvement as we increase
There should be a way to expose a setting so that users can control this value and experiment with it. We do have We can also introduce another setting for HTTP2 in UpdateShardHandler.
I inquired about the Jetty version because I posted a question regarding the cancel_stream error on the Jetty forum, and they informed me that Jetty 10 is at the end of its support: "Please try Jetty 12 and report back if the problem still happens. Note that Jetty 12 requires JDK 17!"
Thanks for getting those numbers; very interesting that it had no effect.
This is a low priority thing; feel free to abandon this until we get to Jetty 12 and JDK 17. In the meantime, we know maxConnectionsPerDestination is misconfigured to be too low; the default will be increased quite a bit -- Houston; I'm looking at you :-)
Fixes Jetty HTTP2 cancel_stream_error by removing the flow control limit.
I was encountering this issue with CloudIndexing, esp when using single-node, single-replica, single-shard -- like 30K ops/sec. It'd routinely fail. I bumped this to 1000, then still hit it sometimes, then bumped to 2000. I haven't run it much since then to be honest. Then I saw how to disable it, which looks a little weird TBH (the default method returns a NO_RATE_CONTROL impl).
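For readers unfamiliar with an "events per second" style limit, here is a minimal, stdlib-only sketch of the general idea: count events inside a sliding one-second window and reject once the budget is exceeded. The class name `WindowRateSketch` and its methods are hypothetical and are not Jetty's implementation. Which events a real server counts against `maxEventsPerSecond` is not established in this thread, so the sketch simply treats every call as one counted event.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a sliding-window event limit.
// This is NOT Jetty's code; names and behavior are assumptions for the sketch.
final class WindowRateSketch {
    private final int maxEvents;       // budget per window, e.g. 1000 or 2000
    private final long windowMillis;   // window length, e.g. 1000 ms
    private final Deque<Long> timestamps = new ArrayDeque<>();

    WindowRateSketch(int maxEvents, long windowMillis) {
        this.maxEvents = maxEvents;
        this.windowMillis = windowMillis;
    }

    /** Records an event at nowMillis; returns false once the window's budget is exceeded. */
    synchronized boolean onEvent(long nowMillis) {
        // Drop events that have aged out of the window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        // Rejected events are still recorded, so a sustained burst keeps failing.
        timestamps.addLast(nowMillis);
        return timestamps.size() <= maxEvents;
    }

    public static void main(String[] args) {
        WindowRateSketch limit = new WindowRateSketch(2000, 1000);
        int rejected = 0;
        // Assumption: 30,000 counted events spread evenly over one second.
        for (int i = 0; i < 30_000; i++) {
            if (!limit.onEvent(i / 30)) {
                rejected++;
            }
        }
        System.out.println("rejected=" + rejected);
    }
}
```

Under that assumption, raising the budget only moves the point at which a burst starts being rejected, which would be consistent with hitting the error less often at 1000 and then 2000 rather than never.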