GC pauses make maxInflightRequest highly volatile when response times are shorter than the pauses themselves #156
Comments
I think you either tune GC or change the architecture so that some sort of sidecar proxy observes latency from the outside and is not affected by the application's GC. The second approach seems way harder 😃
Hi @IgorPerikov, thanks for your answer!
Oh, I was answering another "question" 😅 I thought you had problems calculating RTT correctly because of GC. Now I see.
It seems that, according to your configuration, they should not have been processed. You limited the number of in-flight requests, and a long GC cycle means your application is struggling, so it should reject some requests to be able to recover from the GC impact. If your service is latency-critical, it might be fine to reject them (a retried request will likely reach a less busy server); if extra waiting is fine, you can queue them.
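To make the reject-versus-queue choice above concrete, here is a minimal sketch in Python. It is not this library's API: the class `InflightLimiter` and its method names are made up for illustration, and it assumes a fixed limit rather than an adaptive one.

```python
import threading


class InflightLimiter:
    """Hypothetical fixed-limit in-flight request limiter (illustration only)."""

    def __init__(self, limit):
        # One semaphore slot per allowed in-flight request.
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        # Latency-critical style: reject immediately when every slot is taken.
        return self._slots.acquire(blocking=False)

    def acquire_or_wait(self, timeout_s):
        # Queueing style: wait up to timeout_s for a slot before giving up.
        return self._slots.acquire(timeout=timeout_s)

    def release(self):
        # Must be called once for every successful acquire.
        self._slots.release()


limiter = InflightLimiter(limit=3)
accepted = [limiter.try_acquire() for _ in range(4)]
print(accepted)  # the fourth request finds no free slot
```

Rejecting fast lets the client retry against another server, while waiting trades extra latency for fewer rejections; which one fits depends on the client-side timeout.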
The RTT calculation is affected for sure, but since I'm using a windowed strategy with the 0.5 percentile instead of an average, it does not cause a real issue.

I'm not sure that a long GC cycle implies the application is truly struggling. Before implementing the concurrency limiter there were some timeouts on the client side (400 ms), taking network times into consideration.

But I would like to show one quick example where the outcome is not the best. Let's say that my maximum number of concurrent requests is 3. What if R6 arrives 1 ms before the GC pause finishes? That request will be rejected even though its lifetime would be much shorter than that of R4, which will be answered and may trigger a client-side timeout if the GC pause was long enough. This scenario supports the solution you mentioned before of having a sidecar proxy, because the proxy would have rejected R4, R5 and R6 immediately.
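As a small illustration of why a 0.5 percentile is less sensitive to a single GC-inflated sample than an average, here is a sketch in Python. The RTT samples are invented for the example, and `summarize_window` is a hypothetical helper, not part of this project.

```python
import statistics


def summarize_window(rtts_ms):
    """Return (median, mean) of one window of RTT samples in milliseconds."""
    return statistics.median(rtts_ms), statistics.mean(rtts_ms)


# Made-up window: four normal responses and one that sat through a GC pause.
window_ms = [10, 11, 9, 10, 250]
median_rtt, mean_rtt = summarize_window(window_ms)
print(median_rtt, mean_rtt)
```

With these invented samples the median stays at 10 ms while the mean jumps to 58 ms, so a limit derived from the median barely moves. That protects the RTT estimate only; it does not change how many requests are counted as in flight during the pause.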
From my point of view, of course it is (I even had a production incident because of long GC 😄). Sure, a long GC can be a false-positive sensor. But imagine changing your API so that it returns much more data (thus allocating more objects on the heap). In that case RTT might go up and the system will spend more time on GC, which means the previous static limit is no longer valid, because operations are more expensive now. If you let requests flow the same as before, the server will be flooded with work and start to degrade slowly. Spending more CPU cycles on GC will affect latency and overall health, so it is a signal to refuse excess requests and probably lower the limit.
I think that is the key to my question. But I also believe there has to be a way to avoid falling into those false positives. Thanks for sharing your ideas!
Hi.
I've just asked this question on Stack Overflow:
https://stackoverflow.com/questions/59311752/how-to-limit-concurrency-in-a-webapp-when-gc-pauses-last-more-than-the-average-r#
I've forked this project and implemented it in an application, and I realized that the GC pauses last longer than my average response time. When tracking the value of maxInflightRequest, I can see that whenever a GC (minor or major) is performed, the value goes up and reaches the threshold I've configured, stressing the application. So I'm getting rejections of requests that should have been processed.
All the details are in the Stack Overflow question.
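The effect described above can be reproduced with a toy model. The Python sketch below is an assumption-laden simulation, not a measurement: arrivals are evenly spaced, every request needs the same service time, a stop-the-world pause freezes all work, and there is no limiter or thread-pool cap. All names and numbers are invented.

```python
def finish_time(arrival, service_ms, pause_start, pause_ms):
    """Completion time of a request when all work freezes during the pause."""
    pause_end = pause_start + pause_ms
    if arrival >= pause_end or arrival + service_ms <= pause_start:
        return arrival + service_ms           # never overlaps the pause
    if arrival >= pause_start:
        return pause_end + service_ms         # arrived mid-pause, starts after it
    return arrival + service_ms + pause_ms    # was running when the pause began


def peak_inflight(interval_ms, service_ms, pause_start, pause_ms, horizon_ms):
    """Highest number of simultaneously in-flight requests over the horizon."""
    arrivals = list(range(0, horizon_ms, interval_ms))
    finishes = [finish_time(a, service_ms, pause_start, pause_ms) for a in arrivals]
    peak = 0
    for t in arrivals:  # the peak can only change at an arrival instant
        inflight = sum(1 for a, f in zip(arrivals, finishes) if a <= t < f)
        peak = max(peak, inflight)
    return peak


# One request every 5 ms, 10 ms of work each, pause starting at t = 100 ms.
no_pause = peak_inflight(5, 10, 100, 0, 400)
with_pause = peak_inflight(5, 10, 100, 100, 400)
print(no_pause, with_pause)
```

In this model, a 100 ms pause against a 10 ms service time raises the peak from 2 to 22 in-flight requests, which is why a limit sized for steady state is hit as soon as a pause outlasts the typical response time.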