[CI] multiple tests started failing with NoNodeAvailableException #37275
Pinging @elastic/es-security
Pinging @elastic/es-core-infra
The interesting log lines here seem to indicate a problem with establishing a connection. @tbrooks8 do you have an idea what might be going on here?
@tbrooks8 I see some SSL issues here; can this be related to #37246?
This build was run on commit …; I see a variety of exceptions there:
This means that this transport (…) closed the connection. This does not tell us much besides the fact that the other side wanted to close the connection before the handshake was complete (and it wanted it enough to send a close_notify).
This means that we are closing the channel before we could send the CLOSE_NOTIFY message. This happens because:
This is a netty transport failing to send a message because the channel is closed. This test is a mix of nio and netty transports communicating. Unfortunately I do not see an obvious underlying cause, just messages indicating that channels were closed, sometimes not in a clean manner. Important to note: it is possible that the transports using netty are not fully sending and receiving their close_notifys. Netty swallows those exceptions (many implementations in the wild, and most web browsers, do not send and receive them). Some of these exceptions are kind of noisy. I need to think about whether these should all be warn-level exceptions.
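As a generic illustration of the failure mode discussed above (plain Java sockets, not the actual Elasticsearch transport code; class and names are hypothetical): when the peer closes the connection before the handshake completes, the other side reads an orderly EOF instead of handshake bytes, which the TLS layer then surfaces as a handshake failure or close_notify-related exception.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class AbruptCloseDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread accepter = new Thread(() -> {
                try {
                    // Accept and immediately close: the "peer" goes away
                    // before any handshake bytes are exchanged.
                    server.accept().close();
                } catch (IOException ignored) {
                }
            });
            accepter.start();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                // The peer closed cleanly (FIN), so read() returns -1 (EOF).
                int b = client.getInputStream().read();
                System.out.println("read=" + b);
            }
            accepter.join();
        }
    }
}
```

In a TLS setting, that same EOF arriving mid-handshake is what JSSE reports as the remote host terminating the handshake.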
This happened again in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/1244/consoleText
also:
Seems like the whole test cluster froze up for quite a while and then came back to life eventually. ... The above case also doesn't run into any …
This has happened quite a few times recently in 6.x as well, not just in master:
Occurrences seem to be down and localized to a specific worker:
@atorok yea, it seems moving to 64GB of memory on the workers pretty much fixed this. I guess we can close here?
Infra changes seem to have helped a lot, but according to build stats we still see ~10 failures like this per day; that's still fairly high.
@atorok yea, you're right; judging from the timings, this apparently has other, non-resource causes. See #37965 (comment) if interested :) I'm investigating this currently.
Fixed one (potentially not so special) case that leads to this failure in #39118 |
* Remove unnecessary `synchronized` statements
* Make `Predicate`s constants where possible
* Cleanup some stream usage
* Make unsafe public methods `synchronized`
* Closes elastic#37965
* Closes elastic#37275
* Closes elastic#37345
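For context on the first two bullets of the linked PR, here is a hedged sketch (the class and field names are illustrative, not actual Elasticsearch code) of what "make `Predicate`s constants where possible" typically means: a predicate that captures only constant state can be hoisted into a `static final` field instead of being rebuilt on every invocation.

```java
import java.util.Set;
import java.util.function.Predicate;

public class PredicateHoisting {
    private static final Set<String> BLOCKED = Set.of("a", "b");

    // Before: builds a fresh Predicate object on every call, even though
    // the captured state (BLOCKED) never changes.
    static Predicate<String> allowedPerCall() {
        return name -> BLOCKED.contains(name) == false;
    }

    // After: the predicate depends only on constant state, so it can be a
    // constant itself, created exactly once at class initialization.
    static final Predicate<String> ALLOWED = name -> BLOCKED.contains(name) == false;

    public static void main(String[] args) {
        System.out.println(ALLOWED.test("a")); // "a" is blocked
        System.out.println(ALLOWED.test("c")); // "c" is allowed
    }
}
```

The hoisting is only safe when the predicate's behavior cannot vary between calls, which is presumably what "where possible" qualifies in the PR description.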
Seems to be triggered by the first test to run for the class.
Example build failure
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=java8,nodes=virtual&&linux/170/console
Reproduction line
does not reproduce locally
Example relevant log:
Frequency
Up to 5 times per day.