Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Killing webserver process should use SIGTERM on Linux, with SIGKILL as fallback #18209

Open
rmunn opened this issue Oct 20, 2022 · 9 comments · Fixed by #18220
Open

[BUG] Killing webserver process should use SIGTERM on Linux, with SIGKILL as fallback #18209

rmunn opened this issue Oct 20, 2022 · 9 comments · Fixed by #18220
Labels
feature-test-runner Playwright test specific issues

Comments

@rmunn
Copy link
Contributor

rmunn commented Oct 20, 2022

Context:

  • Playwright Version: 1.20.0-alpha-feb-23-2022
  • Operating System: Linux
  • Node.js version: v16.10.0
  • Browser: Chromium
  • Extra: Using docker-compose to launch webserver

Config Snippets

Playwright config:

const config: PlaywrightTestConfig = {
  // ...
  webServer: {
    command: 'docker-compose -f docker/docker-compose.yml up app-for-e2e',
    port: 3238,
    timeout: 10 * 1000,
    reuseExistingServer: !process.env.CI,
  }
};

docker-compose.yml:

app-for-e2e:
  # ...
  container_name: app-for-e2e
  platform: linux/amd64
  ports:
    - 3238:80

Describe the bug

We use a Docker container to launch our dev webserver and run Playwright tests against it. The docker-compose up command responds to SIGINT and SIGTERM by gracefully terminating the container(s) that it launched, but if it receives SIGKILL then it cannot terminate the containers, and they stay running. At the moment, the WebServer class does not provide any way to gracefully close the launched process, instead using the killAndWait function from processLauncher.ts, which sends SIGKILL to the process without ever trying SIGTERM.

The result is that after I run npx playwright test, the app-for-e2e container is still running, because Playwirght never attempted to send SIGTERM to the process and just sent SIGKILL instead. I then have to stop the app-for-e2e container manually, since I don't want it running and taking up system resources when I'm not using it.

Proposed solution

To solve this bug, I would suggest changing killProcess in processLauncher.ts to take a boolean argument gracefully, defaulting to false for backwards compatibility. If gracefully is true, send SIGTERM rather than SIGKILL on Unix-like platforms (Linux and MacOS), or on Windows, run taskkill without the /F flag. Add the same parameter to killProcessAndCleanup and killAndWait. Then change the this._killProcess() method in the WebServer class to look something like this:

this._killProcess = async () => {
  const _hasExited = async () => launchedProcess.exitCode != null;
  const killTimeout = this.config.timeout || 60 * 1000;
  kill(true);
  const cancellationToken = { canceled: false };
  const { timedOut } = (await Promise.race([
    raceAgainstTimeout(() => waitFor(_hasExited, 100, cancellationToken), killTimeout),
    this._processExitedPromise,
  ]));
  cancellationToken.canceled = true;
  if (timedOut)
    kill(false);
};

Additional Info

This is #12299 reopened, since #12299 was closed without a good reason for closing it. (The reasons given were "It looks like this issue only has a handful of upvotes, has not been touched recently and/or we lack sufficient feedback to act on it", none of which are good reasons to close a bug without discussion in the issue first. If a bug is going to be closed for any of those reasons, a comment to that effect should be made in the bug itself first. Closing bugs with no discussion like that is bad issue-management practice.)

@thenbe
Copy link

thenbe commented Oct 20, 2022

Possibly related: #18041

My current workaround is to manually detect if a process has been started by playwright, then use tree-kill to kill it in GlobalTeardown.

@pavelfeldman
Copy link
Member

I'm reading docker docs and it seems like docker compose itself is sending SIGTERM to the containers to gracefully kill them: https://docs.docker.com/compose/faq/#:~:text=Compose%20stop%20attempts%20to%20stop,they%20receive%20the%20SIGTERM%20signal. It probably does the same thing when you send SIGTERM to docker compose, or it happens automatically if they are a part of the same process tree.

This is a known anti-pattern when reliably managing dependent process trees - it is prone to dangling and zombie processes. Any process can die at any moment, you can't rely on SIGTERM solely to process the cleanup code. I.e. we can support graceful termination, but they must ensure hard kill still works. It looks like a docker bug to me. Even with your proposed fix, clicking Ctrl+C twice would allow container processes to survive, there is no guarantee they will die.

@rmunn
Copy link
Contributor Author

rmunn commented Oct 20, 2022

A quick note on Linux signal handling in case someone reading this is unfamiliar with it: processes are allowed to handle most signals however they want, and shutting down when you receive SIGTERM is only a (universally-followed) convention: you could handle SIGTERM and do nothing with it, though that would be considered a bug in nearly every case. But processes cannot handle SIGKILL. When a process is sent SIGKILL, the Linux kernel immediately kills the process, giving it no time to do any cleanup. Which is why the pattern on Linux is usually to send SIGTERM first (which by convention means "do any necessary cleanup, then shut yourself down"), then wait a configurable number of seconds before sending SIGKILL if the process hasn't terminated in a timely fashion.

So yes, docker-compose needs to have something in place to handle cases where the docker-compose process dies unexpectedly (or is sent a SIGKILL). But bugs in other software are no excuse for Playwright not doing the right thing, and sending SIGTERM first in order to allow webserver processes to do whatever cleanup they might find necessary. I experienced this issue with docker-compose, but there are many other scenarios where a webserver process might have some cleanup (deleting temporary files, etc.) to do, and sending SIGKILL first without ever attempting SIGTERM doesn't allow that cleanup to ever happen.

@dgozman dgozman added the v1.28 label Oct 20, 2022
aslushnikov added a commit to aslushnikov/playwright that referenced this issue Oct 20, 2022
We now will send `SIGTERM` to the webserver and wait for the `timeout`
before sending `SIGKILL` to it.

Fixes microsoft#18209
aslushnikov added a commit to aslushnikov/playwright that referenced this issue Oct 20, 2022
We now will send `SIGTERM` to the webserver and wait for the `timeout`
before sending `SIGKILL` to it.

Fixes microsoft#18209
aslushnikov added a commit that referenced this issue Oct 21, 2022
We now will send `SIGTERM` to the webserver and wait for the `timeout`
before sending `SIGKILL` to it.

Fixes #18209
@domvo
Copy link

domvo commented Feb 6, 2023

This should be reopened, since the initial fix was reverted.

@rmunn
Copy link
Contributor Author

rmunn commented Feb 6, 2023

I'd reopen it, but I don't have permissions to do so on this repo. Any maintainers want to reopen this, or would you prefer I open a new issue linking to this one?

@alexis-aquanty
Copy link

The fact that the webserver is killed instead of terminating gracefully seems to break any tools that rely on Node V8 coverage of the webserver process, since coverage data is written when the process terminates (but not when it's killed). This breaks code coverage using c8, which is important for us. I don't know why the fix for this issue was correctly implemented and then reverted. SIGKILL should never be the default.

@dangoodman
Copy link

any update?

divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Mar 29, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Apr 2, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Apr 2, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Apr 2, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Apr 2, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Apr 2, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Apr 2, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to divdavem/AgnosUI that referenced this issue Apr 2, 2024
Note that this change is also a workaround for:
microsoft/playwright#18209
divdavem added a commit to AmadeusITGroup/AgnosUI that referenced this issue Apr 2, 2024
@rmunn
Copy link
Contributor Author

rmunn commented Apr 25, 2024

This is a quick comment to keep this issue open, since apparently the Playwright team closes issues without warning when they haven't had much recent activity.

Psst, Playwright team: it would be nice if you would drop a comment on stale issues first, say 14 days before closing them, saying "This issue has not had any recent activity and will be closed in 14 days". That way people wouldn't feel the need to add comments like this one, which is essentially spam but I don't know any other way to be sure this issue will be kept open. After all, it isn't fixed yet. If you're closing issues for not having recent activity, then "bump" comments like this are the only way, apparently, to make sure an issue that hasn't been fixed stays open.

@swersk
Copy link

swersk commented Aug 16, 2024

Following this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-test-runner Playwright test specific issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants