Graphical applications crash with Xorg error after being visible for a split second #529

Open
Matoking opened this issue Sep 11, 2022 · 8 comments


@Matoking

Matoking commented Sep 11, 2022

Your system information

  • Steam Runtime Version: Soldier 0.20220803.0
  • Distribution (e.g. Ubuntu 18.04): Arch Linux
  • Link to your full system information (Help -> System Information) in a Gist: https://gist.github.com/Matoking/daecb6d447f58598de60148cbee0de99
  • Have you checked for system updates?: Yes
  • What compatibility tool are you using?: Proton 7.0
  • If you are using Steam Linux Runtime, or Proton 5.13 or newer: What versions are listed in SteamLinuxRuntime_soldier/VERSIONS.txt?
#Name	Version		Runtime	Runtime_Version	Comment
depot	0.20220804.66			# Overall version number
pressure-vessel	0.20220803.0			
scripts	v0.20220803.0-0-gbca628e			# Entry point scripts, etc.
soldier	0.20220803.0	soldier	0.20220803.0	# soldier_platform_0.20220803.0/

Please describe your issue in as much detail as possible:

When running graphical applications such as winecfg and regedit using Steam Runtime and Proton 7.0, the window appears for a split second and then the application crashes with the following X11 error:

X Error of failed request:  BadWindow (invalid Window parameter)
  Major opcode of failed request:  10 (X_UnmapWindow)
  Resource id in failed request:  0x9600001
  Serial number of failed request:  225
  Current serial number in output stream:  227

To reproduce the issue, the wineserver process must stay alive across multiple Wine calls, and the command that launches the graphical application must be run twice for the crash to occur.

Steps for reproducing this issue:

  1. Launch a Steam game using Proton 7.0 with PROTON_DUMP_DEBUG_COMMANDS=1 set.
  2. Go to the created debug directory (/tmp/proton_USERNAME) and edit the run script to remove steam.exe from the command and replace wine64 with wine. For some reason, the latter change is needed for the X11 error to be printed.
  3. Open the installation directory for "Steam Linux Runtime - Soldier".
  4. Run the command ./run --share-pid --launcher --filesystem /mnt -- --bus-name "com.foo.TestProton.Test" in the Steam Runtime directory to launch a Steam Runtime session (adjust --filesystem as needed to ensure the runtime directory is mounted inside the container). Keep this command running.
  5. Open another session in <Steam Runtime directory>/pressure-vessel/bin and run the following command: ./steam-runtime-launch-client --bus-name "com.foo.TestProton.Test" --share-pids --directory /tmp/proton_USERNAME -- ./run cmd.exe. Keep this command running as well, since it keeps a wineserver process alive for the subsequent Wine calls.
  6. Open a third session and run the command ./steam-runtime-launch-client --bus-name "com.foo.TestProton.Test" --share-pids --directory /tmp/proton_USERNAME -- ./run winecfg twice. On the first run, the configuration window should appear as normal. On the second run, it should appear for a moment and then close, with the X11 error printed on the command line. All subsequent attempts crash as well.
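The reproduction steps above can be scripted. The following is a minimal, hypothetical Python sketch that assumes the launcher service from step 4 and the long-running cmd.exe session from step 5 are already running in other terminals. The helper names (`launcher_service_cmd`, `launch_client_cmd`, `run_winecfg_twice`) are illustrative only, and the command-line options are copied verbatim from the steps rather than checked against the tools' documentation.

```python
import shlex
import subprocess
from pathlib import Path

# D-Bus name used throughout the reproduction steps.
BUS_NAME = "com.foo.TestProton.Test"


def launcher_service_cmd(filesystem="/mnt"):
    """Step 4: argv for starting the launcher service (run from the
    "Steam Linux Runtime - Soldier" installation directory)."""
    return ["./run", "--share-pid", "--launcher", "--filesystem", filesystem,
            "--", "--bus-name", BUS_NAME]


def launch_client_cmd(debug_dir, *wine_args):
    """Steps 5 and 6: argv for running the edited Proton `run` script inside
    the container (run from <runtime dir>/pressure-vessel/bin)."""
    return ["./steam-runtime-launch-client", "--bus-name", BUS_NAME,
            "--share-pids", "--directory", str(debug_dir),
            "--", "./run", *wine_args]


def run_winecfg_twice(runtime_dir, debug_dir):
    """Step 6: launch winecfg twice and report each exit status.
    Assumes steps 4 and 5 are already running elsewhere."""
    bin_dir = Path(runtime_dir) / "pressure-vessel" / "bin"
    for attempt in (1, 2):
        result = subprocess.run(launch_client_cmd(debug_dir, "winecfg"), cwd=bin_dir)
        print(f"attempt {attempt}: exit status {result.returncode}")


if __name__ == "__main__":
    # Print the commands for steps 4-6 so they can be pasted into separate terminals.
    print(shlex.join(launcher_service_cmd()))
    print(shlex.join(launch_client_cmd("/tmp/proton_USERNAME", "cmd.exe")))
    print(shlex.join(launch_client_cmd("/tmp/proton_USERNAME", "winecfg")))
```

According to the report, the first `winecfg` attempt made by `run_winecfg_twice` should succeed and the second should fail with the BadWindow error.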

The issue can be reproduced on Proton 7.0-4 and 5.13-6, but not on Proton 6.3-8. The issue also can't be reproduced if Steam Runtime is not used.

Logs for each command with WINEDEBUG="+timestamp,+pid,+tid,+seh,+debugstr,+loaddll,+mscoree" and --verbose can be found here:

https://gist.github.com/Matoking/9db8e55c8bbf3325a4613db2ffc59cdb

@smcv
Contributor

smcv commented Sep 19, 2022

./run --share-pid --launcher --filesystem /mnt -- --bus-name "com.foo.TestProton.Test"

Please be aware that running Proton or SteamLinuxRuntime from outside Steam is not really a supported configuration. The container runtime and Proton both expect to be run from inside Steam in order to behave correctly: if you run them externally, they will not have the expected environment variables set and a lot of things won't work as intended.

The issue can be reproduced on Proton 7.0-4 and 5.13-6, but not on Proton 6.3-8

I think someone with Proton knowledge will need to look at this. The container runtime infrastructure doesn't talk to X11, and if X11 can show a window at all (even intermittently), then the container runtime has done its job by providing the X11 socket and the XAUTHORITY data.

You might find that you get different results by using the client_beta branch of Steam Linux Runtime - soldier, which changed the way it sets up X11 so that it tries to reuse the same display number that's used on the host system (often :0 or :1) instead of remapping it to :99. I don't know whether that will help to solve this problem or not.

@smcv
Contributor

smcv commented Sep 19, 2022

If you think this could be a recent regression, it might also be useful to try the previous_release branch of Steam Linux Runtime - soldier and see whether it works there.

@smcv
Contributor

smcv commented Sep 19, 2022

Your log mentions some options that you didn't mention in the original issue report, like --pass-env XAUTHORITY --pass-env DISPLAY. This might be interfering with the container runtime setup: the DISPLAY and XAUTHORITY environment variables inside the container are intentionally not the same as on the host system.

If you don't use --pass-env, they should inherit the correct values from the steam-runtime-launcher-service.
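One way to see which values the container actually receives is to print and parse `DISPLAY` and `XAUTHORITY` both on the host and inside the container (for example, by running the script through `steam-runtime-launch-client`) and compare the results. The Python sketch below is a hypothetical helper, not part of pressure-vessel, and it only handles the common `[host]:display[.screen]` form of `DISPLAY`. Per the earlier comment, a remapped display inside the container would typically show up as `:99`, while the client_beta branch tries to reuse the host's display number.

```python
import os
import re

# Common DISPLAY syntax: [host]:display[.screen], e.g. ":0", ":99", "localhost:10.0".
# IPv6 literals and other exotic forms are deliberately not handled.
DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<num>\d+)(?:\.(?P<screen>\d+))?$")


def parse_display(value):
    """Split a DISPLAY value into (host, display number, screen number)."""
    m = DISPLAY_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised DISPLAY value: {value!r}")
    screen = int(m["screen"]) if m["screen"] is not None else 0
    return m["host"], int(m["num"]), screen


if __name__ == "__main__":
    display = os.environ.get("DISPLAY")
    if display:
        try:
            host, num, screen = parse_display(display)
            print(f"DISPLAY={display} (host={host or 'local'}, display={num}, screen={screen})")
        except ValueError as exc:
            print(exc)
    else:
        print("DISPLAY is not set")
    print(f"XAUTHORITY={os.environ.get('XAUTHORITY')}")
```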

@Matoking
Author

I tested both previous_release:

#Name	Version		Runtime	Runtime_Version	Comment
depot	0.20220727.64			# Overall version number
pressure-vessel	0.20220726.0			
scripts	v0.20220726.0-0-ga110829			# Entry point scripts, etc.
soldier	0.20220726.0	soldier	0.20220726.0	# soldier_platform_0.20220726.0/

and client_beta:

#Name	Version		Runtime	Runtime_Version	Comment
depot	0.20220919.70			# Overall version number
pressure-vessel	0.20220919.0			
scripts	v0.20220823.0-0-gcc4e44f			# Entry point scripts, etc.
soldier	0.20220919.0	soldier	0.20220919.0	# soldier_platform_0.20220919.0/

Both still cause the crash.

This crash didn't occur before, however, so I'll have to see whether I can find the runtime version that introduced the issue. Older versions might be available through Steam depots. I also tried compiling the runtime myself so I could bisect the issue more precisely, but that turned out to be more time-consuming than I expected.

Your log mentions some options that you didn't mention in the original issue report, like --pass-env XAUTHORITY --pass-env DISPLAY. This might be interfering with the container runtime setup: the DISPLAY and XAUTHORITY environment variables inside the container are intentionally not the same as on the host system.

If you don't use --pass-env, they should inherit the correct values from the steam-runtime-launcher-service.

I uploaded new logs here:

https://gist.github.com/Matoking/e0459d62b429584fd09731c4dd6da69b

I initially used both parameters, but noticed they didn't affect the result, so I ran the commands again without them but forgot to update my logs. The end result is the same, though.


I also managed to reproduce the crash on Proton 6.3, though it takes a little more effort: I had to close the cmd.exe process and start it again. It also turns out the issue isn't fully deterministic on the other Proton versions either: sometimes the crash doesn't happen immediately, and another attempt is needed before it starts occurring.

I don't have experience with X11, but a quick lookup of the error in question suggests that there may be a stale handle of some sort that causes the crash, which might explain why it takes at least two attempts for the crash to occur?

@smcv
Contributor

smcv commented Sep 20, 2022

@kisak-valve, please could you point Proton people towards this?

This crash didn't occur before

Before when? Possible triggers, other than the container runtime, include:

  • upgrading Proton
  • upgrading some library on your Arch system

You've tested all the container runtime releases for the last few weeks, so my suggestion would be to look at what else has changed. Does Arch's package manager have an equivalent of /var/log/apt/history.log that would tell you what you upgraded at around the time this started happening?
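On Arch Linux, pacman records its transactions in /var/log/pacman.log, which can serve a similar purpose. A minimal Python sketch for listing upgrades within a time window follows. It assumes the ISO 8601 timestamp format used by recent pacman versions (for example `[2022-09-01T18:02:11+0300] [ALPM] upgraded ...`); entries in older formats are simply skipped. The function name and the example date range are illustrative.

```python
import re
import sys
from datetime import datetime, timezone

# Matches pacman log lines such as:
# [2022-09-01T18:02:11+0300] [ALPM] upgraded libx11 (1.8.1-1 -> 1.8.1-3)
UPGRADE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def upgrades_between(lines, start, end):
    """Yield (timestamp, package, old version, new version) for upgrades
    in [start, end]. start and end must be timezone-aware datetimes."""
    for line in lines:
        m = UPGRADE_RE.match(line.strip())
        if m is None:
            continue
        try:
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S%z")
        except ValueError:
            continue  # older log format without an ISO timestamp
        if start <= ts <= end:
            yield ts, m["pkg"], m["old"], m["new"]


if __name__ == "__main__":
    # Example window: the weeks before the first reports on September 2, 2022.
    start = datetime(2022, 8, 1, tzinfo=timezone.utc)
    end = datetime(2022, 9, 3, tzinfo=timezone.utc)
    try:
        with open("/var/log/pacman.log", encoding="utf-8", errors="replace") as log:
            for ts, pkg, old, new in upgrades_between(log, start, end):
                print(f"{ts.isoformat()} {pkg}: {old} -> {new}")
    except OSError as exc:
        print(f"could not read pacman log: {exc}", file=sys.stderr)
```

Upgrades to X11-related libraries such as libx11 or libxcb within that window would be the first candidates to check, given the suggestion below about differing library versions.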

a quick lookup of the error in question suggests that there may be a stale handle of some sort that causes the crash

The container runtime's involvement in X11 should be mostly limited to "X11 works" or "X11 doesn't work"; anything involving state, windows, etc. is between the X11 client (a Proton/Wine process) and the server (Xorg or Xwayland). It's weird that this is only happening with the container runtime; maybe it's related to differing versions of some library like libxcb or libX11 that is involved in the stateful parts of the X11 protocol?

@Matoking
Author

Matoking commented Sep 20, 2022

The issue was reported on the Protontricks repository here and here on September 2. Both of the users use SteamOS 3.3.1 on Steam Deck, and I was able to reproduce the issue on my Arch Linux installation as well.

The issue could have appeared earlier, though; I only noticed it after checking one of the linked bug reports.

@GloriousEggroll

GloriousEggroll commented Feb 5, 2024

I was hitting this exact same problem and banging my head against it for hours; it turns out to be specific to running inside a Flatpak.

Turns out it's simply because the application running the runtime needs background permissions.

See:
flatpak/flatpak#5427
https://www.reddit.com/r/flatpak/comments/15tzx0w/flatpak_apps_close_a_few_seconds_after_opening/

For reference: we built ULWGL around the Steam Runtime, and we launch non-Steam games using the Steam Runtime plus custom scripts (which pass the environment variables it requires) and a custom Proton version. When running inside Flatpak, the application would run for a second or so and then close completely. After finding the Flatpak issue mentioned above, we added this to our Flatpak manifest:

  - --talk-name=org.freedesktop.portal.Background

And it resolved the issue. If that does not work, you can also try enabling the 'Background' toggle in Flatseal.

Note that this appears to happen with flatpak-builder builds installed directly from the build folder. When the app was built and then installed from a local repo, the issue did not occur.

@smcv
Contributor

smcv commented Feb 6, 2024

I was hitting this exact same problem

Are you sure it was the exact same problem? Including the characteristic BadWindow (invalid Window parameter) in X_UnmapWindow?

The symptom "visible for a split second, and then crashes with BadWindow" described on this issue is not the same as "visible for a second or so, and then killed with SIGKILL", even though it's superficially similar.

After finding the above mentioned flatpak issue we added this to our flatpak: --talk-name=org.freedesktop.portal.Background

This should never be necessary. Flatpak is hard-coded to do the equivalent of --talk-name=org.freedesktop.portal.* without any further action from you - the whole point of portals is that they're something that is safe to give to every sandboxed app, because they have taken responsibility for prompting the user for permission where necessary.

If that does not work you can also try enabling the 'Background' toggle in flatseal

If your app was being killed by the Background portal, then I think you'll find that this is actually how you resolved the problem.

Since xdg-desktop-portal 1.18, it should log a message to the systemd Journal whenever it does this, which will look like:

Terminating app xyz (process 12345) because the app does not have permission to run in the background. You may be able to grant this app the permission to run in background in the system settings of your desktop environment.
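To check whether this is what happened, the journal can be searched for that message. A minimal Python sketch follows. It assumes the message text matches the example above and that xdg-desktop-portal runs as a systemd user service, so its messages appear in `journalctl --user`; the script name `find_kills.py` is hypothetical.

```python
import re
import sys

# Pattern for the xdg-desktop-portal (>= 1.18) message quoted above.
TERMINATE_RE = re.compile(
    r"Terminating app (?P<app>\S+) \(process (?P<pid>\d+)\) because the app "
    r"does not have permission to run in the background"
)


def find_background_kills(lines):
    """Return (app ID, PID) pairs for every Background-portal termination found."""
    kills = []
    for line in lines:
        m = TERMINATE_RE.search(line)
        if m:
            kills.append((m["app"], int(m["pid"])))
    return kills


if __name__ == "__main__":
    # Usage: journalctl --user -b > journal.txt && python3 find_kills.py journal.txt
    if len(sys.argv) != 2:
        print("usage: find_kills.py JOURNAL_TEXT_FILE", file=sys.stderr)
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for app, pid in find_background_kills(f):
                print(f"{app} (PID {pid}) was terminated by the Background portal")
```

Because the pattern is matched with `search` rather than `match`, journal prefixes such as timestamps and the `xdg-desktop-portal[PID]:` tag do not interfere.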

xdg-desktop-portal >= v1.17 also allows apps to run in the background by default, but pre-existing apps might have an old entry in the permissions database: see flatpak/flatpak#5427 (comment).

If you have an older version of xdg-desktop-portal, I would recommend upgrading if possible.
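The two version thresholds mentioned above can be expressed as a small check. This is a hypothetical Python helper that only compares the major and minor version numbers. The 1.17 default does not override an old entry already present in the permissions database, as described above.

```python
import re


def portal_background_behaviour(version):
    """Summarise the background-related behaviour described above for an
    xdg-desktop-portal version string such as "1.16.0" or "1.18.2-1"."""
    numbers = re.findall(r"\d+", version)
    if len(numbers) < 2:
        raise ValueError(f"unrecognised version: {version!r}")
    major_minor = (int(numbers[0]), int(numbers[1]))
    return {
        # >= 1.17: apps may run in the background by default
        # (unless an old permissions-database entry says otherwise)
        "allowed_by_default": major_minor >= (1, 17),
        # >= 1.18: terminations are logged to the systemd Journal
        "logs_terminations": major_minor >= (1, 18),
    }


print(portal_background_behaviour("1.16.0"))
# {'allowed_by_default': False, 'logs_terminations': False}
```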


4 participants