CircuitBreakerOpenException at high load persisting / restoring #24

Closed
thjaeckle opened this issue Jun 1, 2015 · 12 comments

Comments

@thjaeckle
Contributor

Hi.

I am currently evaluating this nice Akka Persistence plugin for MongoDB and it suits my needs really well.
But now I have run into some problems with the "CircuitBreaker" pattern it includes.

Under high load of journal writes or journal reads (when restoring), a CircuitBreakerOpenException is thrown.
I have no clue what goes wrong (is there an exception in the Casbah driver?) or how to handle this correctly.

I'm not saying this is wrong - I just don't know what to do with the "CircuitBreakerOpenException" at this point.

Any hints?

@marcuslinke
Contributor

Please take a look at #22

@scullxbones
Owner

Thanks @marcuslinke !

@thomasjaeckle let me know if you have more questions after looking at #22.

@thjaeckle
Contributor Author

Thanks to both of you, @marcuslinke and @scullxbones - I didn't think of looking through the resolved issues :)
Your settings, @marcuslinke, also work for my scenario:

akka.contrib.persistence.mongodb.mongo.journal-wtimeout = 10s
akka.contrib.persistence.mongodb.mongo.breaker.maxTries = 0
akka.contrib.persistence.mongodb.mongo.breaker.timeout.call = 10s

That way I get a constant rate of ~3,000 inserts per second (all on the same dev machine) without losing events, thanks :)

I still have no clue how to find out which exception caused the CircuitBreakerOpenException (and I'm getting lost in the Scala code - I use Java for Akka coding). Any tips for that?

@scullxbones
Owner

I took a look over the code, both in akka and in the plugin. Based on my reading, if there's no exception preceding the CircuitBreakerOpenException then it's a timeout that's occurring. By design, the open exception is only thrown while the breaker is open, and doesn't retain the original cause.

You can test that the timeout is the issue by setting the timeout.call to 0s and increasing maxTries to 1. Setting the timeout to 0 disables the timeout detection logic. If the issue does not reproduce, then it's a timeout problem.

@thjaeckle
Contributor Author

Yep, if I set timeout.call to 0s the error no longer reproduces. Thanks - now I understand how the CircuitBreakerOpenException is used in your fine library :)

@marcuslinke
Contributor

@scullxbones Any chance of adapting the default settings accordingly?

@scullxbones
Owner

@marcuslinke I'm not a big fan of these becoming the default settings, mainly because you've disabled the circuit breaker. I've been convinced that failing fast with circuit breakers is the correct default model to use. Running without timeouts sets you up for cascading failures, which can be very ugly things. For more on this, I can't recommend the book Release It! by Michael Nygard enough. I think ultimately users will need to understand what is causing the timeouts and size accordingly.

What I can do is add an item to the README.md about timeouts so that it's easier for people to understand what's happening, and adjust as needed.

@marcuslinke
Contributor

@scullxbones It would be great to have this documented in the README. Nevertheless, as I understand these settings, they don't disable the circuit breaker:

maxTries = 0 - switch to OPEN state right after the first failure
journal-wtimeout = 10s - a failure is detected if a write to the journal takes more than 10 seconds
timeout.call = 10s - switch back to CLOSED state again after 10 seconds

Or am I missing something here?

@scullxbones
Owner

Hi @marcuslinke ... I double-checked the source code of the CircuitBreaker in Akka, and setting maxTries = 0 disables it: the failure count is bumped with incrementAndGet before it is compared to maxTries, so the first value compared is already 1 and the breaker never trips. In this case, the timeouts do not matter.
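
A toy model of that comparison may make the effect easier to see. This is not Akka's source: `ToyBreaker` and its methods are hypothetical names, and the sketch assumes the breaker opens only when the incremented failure count is exactly equal to the configured limit.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only (hypothetical class, NOT Akka's CircuitBreaker).
// Assumption: the breaker trips when the incremented count EQUALS maxTries.
final class ToyBreaker {
    private final int maxTries;
    private final AtomicInteger failures = new AtomicInteger();
    private volatile boolean open = false;

    ToyBreaker(int maxTries) {
        this.maxTries = maxTries;
    }

    // Record one failed call. The count is incremented before the comparison,
    // so the first value ever compared against maxTries is 1.
    void callFails() {
        if (failures.incrementAndGet() == maxTries) {
            open = true;
        }
    }

    boolean isOpen() {
        return open;
    }

    public static void main(String[] args) {
        ToyBreaker zero = new ToyBreaker(0);
        ToyBreaker one = new ToyBreaker(1);
        for (int i = 0; i < 1000; i++) {
            zero.callFails();
        }
        one.callFails();
        System.out.println("maxTries=0 open? " + zero.isOpen()); // prints false
        System.out.println("maxTries=1 open? " + one.isOpen());  // prints true
    }
}
```

Under that assumption the count runs 1, 2, 3, ... and never equals 0, so a limit of 0 leaves the toy breaker closed forever, while a limit of 1 opens it on the first failure.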

To summarize, maxTries = 0 turns the breaker off completely. timeout = 0s disables just the timeout feature of the breaker, assuming your maxTries > 0. Of course, this is fine for prototyping and testing. For production, depending on your app's requirements, you should consider re-enabling the breaker and potentially retrying while the breaker is open (exponential back-off would be nice). What you're seeing is essentially back-pressure from MongoDB that needs to be handled appropriately.
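
For Java users, retrying with exponential back-off could look roughly like the sketch below. Everything here is illustrative: `BreakerOpenException` is a hypothetical stand-in for Akka's `akka.pattern.CircuitBreakerOpenException` so the snippet compiles without Akka on the classpath, and `BackoffRetry.retry` with its parameters is an assumed helper, not part of the plugin.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for akka.pattern.CircuitBreakerOpenException,
// defined here only so the sketch is self-contained.
final class BreakerOpenException extends RuntimeException {
    BreakerOpenException(String message) {
        super(message);
    }
}

final class BackoffRetry {
    // Runs op; each time it fails with BreakerOpenException, waits and tries again.
    // The wait starts at initialDelayMs and doubles after every failed attempt.
    // After maxAttempts failed calls the last exception is rethrown.
    static <T> T retry(Callable<T> op, int maxAttempts, long initialDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (BreakerOpenException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the failure to the caller
                }
                Thread.sleep(delay); // blocking wait, kept simple for the sketch
                delay *= 2;
            }
        }
    }
}
```

A call such as `BackoffRetry.retry(() -> writeEvent(), 5, 100L)` (with `writeEvent` being whatever operation hits the breaker in your code) would wait 100 ms, 200 ms, 400 ms and 800 ms between its five attempts. The blocking `Thread.sleep` is only for brevity; inside an actor a scheduled message is usually the better fit, since blocking ties up a dispatcher thread.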

@marcuslinke
Contributor

@scullxbones Thanks for clarifying this. So maxTries should be set to 1 to achieve the behaviour described above. So what about defining

akka.contrib.persistence.mongodb.mongo.journal-wtimeout = 10s
akka.contrib.persistence.mongodb.mongo.breaker.maxTries = 1
akka.contrib.persistence.mongodb.mongo.breaker.timeout.call = 10s

as the default?

@scullxbones
Owner

@marcuslinke I think those defaults are way too long, to be honest. That said, this discussion brought a question to mind - I'm wondering if the circuit breaker is being applied correctly in the plugin. Do you know what the plugin was doing when you got the open breaker errors? It may be that it's not being applied at a fine enough grain on replay. It should be fine on append, I think.

Have you hooked up the metrics as described in the README? Is there anything interesting there?

@marcuslinke
Contributor

@scullxbones As I remember, the CircuitBreakerOpenException was thrown mainly during normal operation, so I assume this is what you called 'append' mode. But sometimes it also occurred during initialization of the system and forced multiple restarts of the application. Sorry, I can't remember exactly.

Also I have to say that I didn't use the metrics library at all. That's something I have on my list but haven't gotten around to yet.


3 participants