-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
socket reconnect
from ping timeout does not work correctly
#3652
Comments
We have logic to relaunch browsers (and thus retry the test) if the socket disconnects during a test. Is this worth having? It's not fundamentally wrong: if the test passes its fine. It seems marginally wrong in that the most common case would be a flaky test that is close to the pingTimeout (too large JS or too long test). The retry might pass if the first one fails, but it means we don't address the underlying problem (pingTimeout too small). Can the network could be legitimately flaky in some cases these days? |
There is some replay code: |
This feature has been trouble and there are cases it cannot work. Any socket disconnect during a test can lose data. The server will disconnect if the client does not reply. Combined it makes any long running or large test error prone. Better to bail and restart the test. Fixes karma-runner#3652
A co-worker and myself as new users of karma ran into this issue (Disconnected reconnect failed before timeout of 2000ms (ping timeout)) many times during testing. We pulled John's PR and after some pretty exhaustive testing, I can say it has 100% resolved our problems. Thank you! Looking forward to this getting merged. |
@johnjbarton, any update on this? The reason my team recently adopted Karma is so that we could leverage the integration with Istanbul to generate coverage reports for our suite of Jasmine tests. When using Karma with headless Chrome via Puppeteer, the current Karma code fails far more often than not at the very end of the test run (with the ping timeout error). As a result, the Istanbul tests are not written to disk. For such a use case, the pull request completely resolves the problem (based on our testing) and is essential to generating coverage reports. |
I just want to be sure that all karma 6.0 issues are resolved before we go ahead. This reconnect code is only triggered when things go bad so I don't trust our test coverage. I appreciate your efforts and I will try to pick up the pace. |
This feature has been trouble and there are cases it cannot work. Any socket disconnect during a test can lose data. The server will disconnect if the client does not reply. Combined it makes any long running or large test error prone. Better to bail and restart the test. Fixes karma-runner#3652
This feature has been trouble and there are cases it cannot work. Any socket disconnect during a test can lose data. The server will disconnect if the client does not reply. Combined it makes any long running or large test error prone. Better to bail and restart the test. Fixes karma-runner#3652
@johnjbarton, just wanted to check in on this one. I appreciate it's only one issue of many the team is focused on, so I understand it can take some time. I also wanted to update you that today I pulled the disconnect branch from your forked repo, and tested again. Unfortunately, the original problem has resurfaced. When I originally tested the fix you made in February, it was rock solid. Today when looking at the git log on that disconnect branch, I can see your fix at the top of the disconnect branch, but a number of other commits with a more recent date beneath it the git log. I guess maybe you rebased with the base branch, and perhaps something was incompatible with your changes. Perhaps this is something you already know, but I wanted to bring it to your attention in case it's news. Thanks! |
Hi @trevorgithub , John has left the team and probably won't be active on Karma project. @jginsburgn and I (@gjyalpha ) will keep working on the Karma open source project. Well, due to other priorities, I'm not sure when we will get another chance to look on this yet. Will keep this thread posted if we have any update. Thanks! |
@gjyalpha , thank you very much for the prompt feedback, it is much appreciated. I'm saddened to hear that work may be lost (or at least postponed) but I am very much grateful for the information. |
@trevorgithub can you please share a log from the failing run using the latest PR build with the |
Actually, please wait a bit. I see that the last build in PR failed, I'll fix it in the coming days and then ask you to re-test. Sorry for the noise! |
Two or three e2e (cucumber) tests fail. (I was debugging these on my last
day when the power on our block went out for 4 hours so I was unable to
finish).
As far as I can tell, these fails indicate that the PR is not fully
correct. The failing cases seem to involve 1) launching the server 2)
running additional command line commands. These additional commands fail
because the server has exited. Since part of the change involves the logic
to exit when the number of browsers goes to zero, I think that the
related (no)single run logic is broken. Since I've never used the related
command-line features I was uncertain about that logic anyway.
I planned to take another look but I'm unsure when I will have a chance. So
far my retirement hasn't left me with any free time ;-)
jjb
…On Wed, May 12, 2021 at 12:52 AM Yaroslav Admin ***@***.***> wrote:
Actually, please wait a bit. I see that the last build in PR failed, I'll
fix it in the coming days and then ask you to re-test. Sorry for the noise!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3652 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABSGAW5W5JURJLPN5IUDYDTNIXTZANCNFSM4XJ6XLHQ>
.
|
Thanks so much for taking a look @johnjbarton. Enjoy your retirement! @devoto13: Do you want me to take a look at this? |
@jginsburgn Yes, once I have some time to spare :) |
This feature has been trouble and there are cases it cannot work. Any socket disconnect during a test can lose data. The server will disconnect if the client does not reply. Combined it makes any long running or large test error prone. Better to bail and restart the test. Fixes #3652
Could this issue be why sometimes Perhaps the following occurs:
|
This is a long standing issue I've tried to fix in the past.
A user in my company hit a case that failed 30% of the time or so and he ran down the reason:
As far as I can tell, any data that the client sends after the server thinks the socket is disconnected is lost. The socket.io layer has no replay and neither does karma.
The problem is fundamental. Either we implement replay or we work to avoid disconnect but fail if we hit it.
Replay is not super hard since we know that tests are finite in duration.
But I think that, for the most part, disconnect is self-inflicted. The client is parsing a huge chunk of JS or it is off doing a long test: it fails to respond to a ping. The browser is there and the runtime is live, but we are starving the ping reply. If we increase the pingTimeout we can avoid the disconnect.
The other ways that the pingTimeout can be hit is browser/tab crash or network partition. With a large pingTimeout we wait a long time to discover the tab crash.
I can't come up with a scenario where automatic socket reconnect makes sense for us. Any time we lose connection, we lose data and the test is invalid. We could say that, eg logging up until the test starts is "optional" so we could reconnect if we lost connection before start. But if we lose connection before start the thing we need to figure out why is exactly those logs.
I think we should flush the entire reconnect logic, all of the timeouts other than pingTimeout, and retry logic in favor error messages directing users to look into the issue and then increase the pingTimeout.
The text was updated successfully, but these errors were encountered: