Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a deadline timeout when calling Nevergreen server -> CI server #254

Closed
GentlemanHal opened this issue Nov 4, 2018 · 4 comments
Closed
Labels
complicated Changes that are difficult or time consuming to implement improvement Changes that improve an existing feature
Milestone

Comments

@GentlemanHal
Copy link
Member

The current timeouts when calling Nevergreen server -> CI server are for establishing the initial connection and for receiving each packet of data. Whereas the timeouts when calling the UI -> Nevergreen server are for receiving the first packet of data and a deadline for the entire response to complete.

This means if the CI server is responding slowly the Nevergreen server wouldn't timeout but the UI would. Given we've added automatic retries as part of #238 and the monitor view could be polling every 5 seconds, this could result in loads of extra calls to a CI server already responding slowly.

We should add a deadline timeout to the CI server calls that matches the timeout of the UI. This would mean the server timeouts at the same time as the UI, and we wouldn't be stacking calls to a probably broken CI server.

Unfortunately this isn't simple with clj-http, which is the library we are using to make http calls, it requires adding a manual timer and aborting the request if it takes too long as per this example: https://github.com/dakrone/clj-http#cancelling-requests

@GentlemanHal GentlemanHal added the improvement Changes that improve an existing feature label Nov 4, 2018
@GentlemanHal GentlemanHal added the complicated Changes that are difficult or time consuming to implement label Aug 16, 2019
@jainsahab
Copy link
Contributor

Hi @GentlemanHal , I was thinking of solving this issue using circuit breaker pattern. We can break the circuit from nevergreen -> ci server after a certain threshold is reached and when UI is polling it'll get the same response for some time until the circuit is established again. Let me know your thoughts on this and then I can pick this one up.

@GentlemanHal
Copy link
Member Author

Hi @jainsahab , thanks for offering to help!

My first thought is the circuit breaker pattern might be overkill for the issue. I'd be interested in hearing some specific implementations details however.

Some things to consider:

  • The Nevergreen server is currently stateless. This is a good thing that allows us to run a hosted version and allows others to run it themselves easily.
    • So I don't want any kind of DB or persistent storage used.
    • We'd have to make sure any kind of in-memory cache doesn't cause out of memory errors or require ridiculous amounts of memory allocated to be useful.
  • We shouldn't be hiding CI server errors from the users, so we want to return an error if a CI server times out (rather than the last known good response).
  • It's possible to add multiple CI servers which get sent as a single request from the UI to the Nevergreen server. We'd want the working CI servers to return projects and only the broken server to return an error.

I'm thinking that adding the deadline timeout as originally stated (or finding an alternative library that provides the functionality out the box) is the way to close this issue, and maybe doing something more complicated like circuit breaking to solve issue #199.

@jainsahab
Copy link
Contributor

sure @GentlemanHal , Yes. Having a circuit breaker would add the overhead of maintaining persistent storage. and it would be a bit difficult to maintain when having multiple CIs configured on UI.

then, I'll go with the originally stated solution by you. Thanks :)

@GentlemanHal GentlemanHal added this to the v5.0.0 milestone Oct 15, 2019
@GentlemanHal
Copy link
Member Author

Done via pull request #293

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
complicated Changes that are difficult or time consuming to implement improvement Changes that improve an existing feature
Projects
None yet
Development

No branches or pull requests

2 participants