client crashes with opengl enabled and transparency #717

Closed
totaam opened this issue Oct 17, 2014 · 57 comments

Comments

@totaam
Collaborator

totaam commented Oct 17, 2014

Issue migrated from trac ticket # 717

component: client | priority: major | resolution: fixed

2014-10-17 20:33:07: nickc created the issue


When attaching with this command line, in version 0.15.0:

xpra --encoding=jpeg --opengl=yes attach tcp:0.0.0.0:1200

If epiphany is run in the xterm window, the client crashes. This occurs in 0.15.0, r7955. I could not reproduce it in 0.14.9.

This does not happen if I run Firefox in xterm.

For simplicity, I set up epiphany beforehand so it only loads a simple HTML file with no content, to rule out any possible problems with the target site that's being loaded.

The client crashes, but the server remains intact.

I've enclosed output from the server and client logs using -d all, and output from a gdb session of the same crash scenario.

totaam commented Oct 17, 2014

2014-10-17 20:33:35: nickc uploaded file client.log (1102.3 KiB)

totaam commented Oct 17, 2014

2014-10-17 20:33:45: nickc uploaded file server.log (925.0 KiB)

totaam commented Oct 17, 2014

2014-10-17 20:33:53: nickc uploaded file gdb.log (7.1 KiB)

totaam commented Oct 19, 2014

2014-10-19 16:02:56: totaam changed status from new to assigned

totaam commented Oct 19, 2014

2014-10-19 16:02:56: totaam changed owner from antoine to totaam

totaam commented Oct 19, 2014

2014-10-19 16:02:56: totaam commented


> This does not happen if I run Firefox in xterm.
> For simplicity, I set up epiphany..

I wasn't really sure here what was needed to reproduce the bug (xterm matters? epiphany only?).
But it quickly looked like this might be transparency related, so I found a much simpler test case: [/browser/xpra/trunk/src/tests/xpra/test_apps/transparent_colors.py]

FYI, to get better gdb backtraces, install whatever debuginfo packages are needed and then run gdb like so:

gdb --args /usr/bin/python /usr/bin/xpra ...

Here's the one I got:

#0  0x00000033c0e50c00 in g_logv () at /lib64/libglib-2.0.so.0
#1  0x00000033c0e50e3f in g_log () at /lib64/libglib-2.0.so.0
#2  0x00000033cfe6824d in gdk_x_error () at /lib64/libgdk-x11-2.0.so.0
#3  0x00000033c2a454dd in _XError () at /lib64/libX11.so.6
#4  0x00000033c2a42427 in handle_error () at /lib64/libX11.so.6
#5  0x00000033c2a424e5 in handle_response () at /lib64/libX11.so.6
#6  0x00000033c2a42e95 in _XEventsQueued () at /lib64/libX11.so.6
#7  0x00000033c2a2468a in XFlush () at /lib64/libX11.so.6
#8  0x00000033cfe40d50 in gdk_window_process_all_updates () at /lib64/libgdk-x11-2.0.so.0
#9  0x00000033cfe40df9 in gdk_window_update_idle () at /lib64/libgdk-x11-2.0.so.0
#10 0x00000033cfe1ea97 in gdk_threads_dispatch () at /lib64/libgdk-x11-2.0.so.0
#11 0x00000033c0e49afb in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#12 0x00000033c0e49e98 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#13 0x00000033c0e4a1c2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#14 0x00000033cf344ea7 in gtk_main () at /lib64/libgtk-x11-2.0.so.0

Which isn't very helpful to us. It only shows that this is an unexpected X11 error, firing when gtk processes updates.

totaam commented Oct 20, 2014

2014-10-20 16:49:18: nickc commented


I did the simpler test of setting start-child to transparent_colors, loaded all the debug packages it asked for, and got more detailed output from bt and py-bt, which I've attached. Don't know if these will be any more helpful.

totaam commented Oct 20, 2014

2014-10-20 16:51:45: nickc uploaded file bt.log (43.1 KiB)

totaam commented Oct 20, 2014

2014-10-20 16:51:56: nickc uploaded file py-bt.log (11.3 KiB)

totaam commented Oct 20, 2014

2014-10-20 20:26:43: totaam changed status from assigned to new

totaam commented Oct 20, 2014

2014-10-20 20:26:43: totaam changed owner from totaam to nickc

totaam commented Oct 20, 2014

2014-10-20 20:26:43: totaam commented


Oh thanks, that's much more useful: that's a different stacktrace (I hope it is the same bug!), and a much more interesting one:

#36 Frame 0x2e10820, for file /usr/lib64/python2.7/site-packages/xpra/client/gl/gl_window_backing_base.py, line 537, \
    in _do_paint_rgb (self=<GLPixmapBacking(offscreen_fbo=<c_uint at remote 0x290fc20>, \
    wid=1, _last_pixmap_data=None, _alpha_enabled=True, pixel_format=None, \
    _backing=<gtk.DrawingArea at remote 0x29ca2d0>, border=None, \
    idle_add=<built-in function idle_add>, size=(320, 320), paint_screen=True, mmap=None, \
    gl_setup=True, debug_setup=True, draw_needs_refresh=True, \
    _PIL_encodings=['png', 'png/L', 'png/P', 'jpeg'], \
    textures=<numpy.ndarray at remote 0x2d89070>, texture_pixel_format=<IntConstant(name='GL_RGBA') at remote 0x1e0b830>, \
    texture_size=(0, 0), _video_decoder=None, \
    _decoder_lock=<thread.lock at remote 0x1a5c570>, _csc_decoder=None, shaders=[1, 2], \
    glconfig=<gtk.gdkgl.Config at remote 0x29ca280>, mmap_enabled=False) at remote 0x29cb110>, \
    bpp=32, img_data=<memoryview at remote 0x29326d8>, x=0, y=0, width=320, height=320, rowstride=1280, \
    options=<typedict at remote 0x7fffc8001770>, context=<GLContextMan...(truncated)

Does this help:

--- xpra/client/gl/gl_window_backing_base.py	(revision 7956)
+++ xpra/client/gl/gl_window_backing_base.py	(working copy)
@@ -107,6 +107,7 @@
 memoryview_type = None
 if sys.version_info[:2]>=(2,7) and OpenGL_version.__version__.split('.')[:2]>=['3','1']:
     memoryview_type = memoryview
+memoryview_type = None
 try:
     buffer_type = buffer
 except:

If not, can you please post the output with -d opengl. I seem to recall a problem with the gl pixel format, maybe we should be using GL_RGB here?
(just out of curiosity more than anything else: which distro is this? it has python 2.7.5)

If none of this helps, it may be worth trying to reproduce this with older versions of the Nvidia drivers (like 331.x). I am on a laptop with an Intel chipset for a few days...

totaam commented Oct 20, 2014

2014-10-20 22:54:48: nickc commented


Yes, that stacktrace was taken directly after getting the crash.

And yes, that patch does indeed fix the bug. I tried it with and without the patch several times, and it always succeeds with the patch (running Epiphany or transparent_colors) and always fails to run either of them without the patch.

Would that work as a permanent fix, or was it just an experiment to narrow down the problem?

I'm running on Fedora 20, and I'm pretty sure it came with python 2.7.5. At least I don't recall manually installing python.

totaam commented Oct 20, 2014

2014-10-20 23:05:43: nickc commented


Strike the dumb question of mine from the record. I can see now, by looking at the surrounding code, that this was obviously an experimental patch.

Anyway, I hope it gives you enough info to fix the problem.

totaam commented Oct 20, 2014

2014-10-20 23:33:03: nickc commented


I discovered today that this bug also happens in our latest code.

So whatever fix gets applied to xpra we'll need to merge into that as well.

totaam commented Oct 21, 2014

2014-10-21 15:47:53: totaam commented


> this was obviously an experimental patch

It is, but maybe this is what we should apply until we can figure out a better solution... crashing is bad!
I just do not understand why this only affects jpeg: png is also decoded by pillow but it is not crashing (not on my system anyway).
This issue looks similar to #465#comment:11, but I previously thought that this only affected the "new buffer interface" which we only use with py3k (because we have no choice there).

Can you please post the -d opengl debug output?

totaam commented Oct 21, 2014

2014-10-21 16:39:14: nickc commented


The attached opengl.log was the output from a server started with:

xpra --no-daemon --bind-tcp=0.0.0.0:1200 --start-child=work/xpra/src/tests/xpra/test_apps/transparent_colors.py start :11

And the client attached with:

xpra --encoding=jpeg -d opengl attach tcp:0.0.0.0:1200

This is on my Fedora 20 desktop, so I didn't need to pass --opengl=yes, because it's enabled by default. If I pass --opengl=no, then the bug doesn't happen.

totaam commented Oct 21, 2014

2014-10-21 16:39:34: nickc uploaded file opengl.log (15.4 KiB)

totaam commented Oct 24, 2014

2014-10-24 20:10:20: totaam uploaded file remove-opengl-zerocopy-upload.patch (2.5 KiB)

completely removes all the memoryview code from the opengl backend

totaam commented Oct 24, 2014

2014-10-24 20:22:17: totaam commented


I am testing this again, and disabling memoryview does NOT help here. Are you sure that the change from comment:3 improved things?

I am still getting the crash, with or without memoryviews (the patch above is a more complete removal).

totaam commented Oct 24, 2014

2014-10-24 21:50:20: totaam commented


What I found so far is that taking out this one line prevents the crash:

self._backing.set_colormap(rgba)

But obviously disables transparency too.

Taking out do_paint_rgb32 alone does not help, you also need to take out gl_expose_event. I think that the backtrace is misleading, the crash can also be triggered by present_fbo.

I am certain that the test colour code used to work fine, so it looks like we may have to bisect this...

totaam commented Oct 24, 2014

2014-10-24 22:10:35: nickc commented


Noted.

And btw, I just did another round of testing with and without the first patch, and it consistently works with the patch, and fails without it. So I'm pretty sure the earlier results were correct.

Sounds like it's more complex than that though, based on your recent findings.

totaam commented Oct 25, 2014

2014-10-25 10:04:22: totaam changed status from new to assigned

totaam commented Oct 25, 2014

2014-10-25 10:04:22: totaam changed owner from nickc to totaam

totaam commented Oct 25, 2014

2014-10-25 10:04:22: totaam commented


Thanks for the update.
It is quite possible that my laptop has another unrelated opengl bug, after all the driver is blacklisted (Intel 4000 again... see also #563).

So, I have removed opengl zero copy upload for the v0.14.x branch in 7974.
I am still hoping to get a better fix in trunk, one that will allow us to keep zero copy. So I have NOT applied the fix there. I will get back to this once I am back on less crippled hardware for testing.

totaam commented Oct 31, 2014

2014-10-31 02:53:44: totaam changed status from assigned to new

totaam commented Oct 31, 2014

2014-10-31 02:53:44: totaam changed owner from totaam to nickc

totaam commented Oct 31, 2014

2014-10-31 02:53:44: totaam commented


I am unable to reproduce this bug for now on Fedora 20 with nvidia drivers version 343.22 and a GTX 760.
I have also tried Fedora 19 in a virtual machine (both i686 and amd64 arches) and Fedora 20 in a virtual machine (amd64 only), and there seems to be a problem with the OpenGL accelerate package we ship with F19 (not going to bother fixing it since it is EOLed soon) and the virtual driver is blacklisted anyway. So opengl is definitely disabled on those systems, and still no crash there at all, no matter what encoding I try to use.

Can you try the newer drivers to see if that helps? If not, we'll need to narrow it down. GTX 750 vs 760 shouldn't matter!
Maybe also try using vesa or vga drivers to see if the drivers matter at all on this system.

totaam commented Nov 7, 2014

2014-11-07 07:38:17: totaam commented


This ticket seems to be stuck, so I have forward ported 7978 in trunk.

Will revisit once I have enough information to make progress.

totaam commented Nov 8, 2014

2014-11-08 15:18:54: totaam commented


Logs were added for what looks exactly like this bug to #654.
Of particular interest, shortly before the crash (crash is in UI thread, this is the decoding thread) we have a paint with alpha using webp encoding:

process_draw 13094 bytes for window 2 using webp encoding with options={'has_alpha': True, 'quality': 99, 'speed': 86, 'rgb_format': 'BGRA'}

and

#36 Frame 0x3031cd0, for file /usr/lib64/python2.7/site-packages/xpra/client/gl/gl_window_backing_base.py, line 537, \
    in _do_paint_rgb (self=<GLPixmapBacking(offscreen_fbo=<c_uint at remote 0x2919b00>, wid=3, _last_pixmap_data=None, \
    _alpha_enabled=True, pixel_format=None, _backing=<gtk.DrawingArea at remote 0x2a395f0>, border=None, \
    idle_add=<built-in function idle_add>, size=(320, 320), paint_screen=True, mmap=None, gl_setup=True, debug_setup=True, \
    draw_needs_refresh=True, _PIL_encodings=['png', 'png/L', 'png/P', 'jpeg'], textures=<numpy.ndarray at remote 0x2d90860>, \
    texture_pixel_format=<IntConstant(name='GL_RGBA') at remote 0x1d7f3b0>, texture_size=(0, 0), _video_decoder=None, \
    _decoder_lock=<thread.lock at remote 0x1cf8f30>, _csc_decoder=None, shaders=[1, 2], \
    glconfig=<gtk.gdkgl.Config at remote 0x2a395a0>, mmap_enabled=False) at remote 0x2a34e90>, bpp=32, \
    img_data=<memoryview at remote 0x2a305a8>, x=0, y=0, width=320, height=320, rowstride=1280, \
    options=<typedict at remote 0x7fffc400a530>, context=<GLContextMan...(truncated)

img_data=<memoryview at ..

Something is giving us a memoryview as upload data, and I suspect it might be webp. I've hit a similar bug today with #729, and there was a change in the cairo backing to do just that in r7558.

@nickc: does r8077 + r8078 fix things?

To verify that the pixel upload causing the crash is the webp one (there is little doubt), this would help:

--- src/xpra/client/window_backing_base.py	(revision 8011)
+++ src/xpra/client/window_backing_base.py	(working copy)
@@ -240,6 +240,7 @@
         buffer_wrapper, width, height, stride, has_alpha, rgb_format = dec_webp.decompress(img_data, has_alpha, options.get("rgb_format"))
         #replace with the actual rgb format we get from the decoder:
         options["rgb_format"] = rgb_format
+        options["WEBP-DEBUG-MARKER"] = True
         def free_buffer(*args):
             buffer_wrapper.free()
         callbacks.append(free_buffer)

(as we already log the options during paint_rgb32 / paint_rgb24 in the UI thread)
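
The idea behind the marker patch can be sketched in isolation: the decode step tags the options dictionary, and the paint step, which already logs the options, then reveals where the pixels came from. The following is a minimal Python 3 sketch with hypothetical stand-in functions (`decode_webp_stub`, `paint_rgb32_stub`); it is not xpra's actual code path.

```python
import logging

log = logging.getLogger("paint-debug")

def decode_webp_stub(img_data, options):
    """Stand-in for the webp decode step: tags the options it hands on."""
    options = dict(options)                 # copy, leave the caller's dict alone
    options["WEBP-DEBUG-MARKER"] = True
    return img_data, options

def paint_rgb32_stub(img_data, options):
    """Stand-in for the paint step: logs the options and reports the source."""
    log.debug("paint_rgb32 options=%s", options)
    return "webp" if options.get("WEBP-DEBUG-MARKER") else "other"
```

If the last paint logged before a crash carries the marker, the upload that triggered it came from the webp decoder.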

The really puzzling thing is why I am unable to reproduce it.
Can you please also post the output of xpra/codecs/loader.py?
Which version of webp / Pillow do you have installed?

totaam commented Nov 8, 2014

2014-11-08 20:28:55: nickc commented


I can only reproduce this bug at work. When I VNC from home, using xrdp, I'm getting a different desktop, and I notice that transparent_colors.py does not render using transparency when I remote to the box, and the crash does not happen.

Apparently the bug only happens with my GNOME desktop in the office.

So I drove into work to check out your latest changes. I tested on r8078 on my work machine, and those changes did fix the problem (and the window transparency was properly enabled as well). I verified by trying r8060 again, and it still fails on that revision.

Running xpra/codecs/loader.py gives me the following versions:

codecs versions:
* PIL                  : 2.2.1
* avcodec2             : 55.39.101
* cython               : 0.3.0.20
* dec_webp             : 0.3.1
* enc_webp             : 0.3.1
* numpy                : 1.8.2
* swscale              : 2.5.101
* vpx                  : 1.3.0
* x264                 : 138

totaam commented Nov 8, 2014

2014-11-08 20:34:28: nickc commented


Sorry, I didn't understand what you want me to do with the patch you included regarding the pixel upload. Did you want me to patch a crashing system with that and see if it fixes it, patch a working system to see if that breaks it, or patch one or the other versions, and print out logs for you?

totaam commented Nov 9, 2014

2014-11-09 04:15:53: totaam commented


> When I VNC from home, using xrdp, I'm getting a different desktop

Try:

xpra shadow ssh:WORKMACHINE:0

(or you can also start xpra shadow on the server first then connect via tcp or whatever)
This will give you a copy of your current desktop rather than a new one. It's going to be slower than a brand new one (shadow mode is not very efficient).
You can also do the same thing with x11vnc but xpra is faster.

> So I drove into work to check out your latest changes

Thanks! It wasn't that urgent!

> those changes did fix the problem

Great. Backport in 8079.

> Running xpra/codecs/loader.py gives me the following versions:
> (..)

Here's the big difference with my setup... now I think we know why we get different results on the same OS.
You're building against the host libraries instead of building against the xpra private libraries in /usr/lib[64]/xpra.
I have instead:

codecs versions:
* PIL                  : 2.6.1
* avcodec2             : 56.1.100
* cython               : 0.3.0.21.1
* dec_webp             : 0.4.2
* enc_webp             : 0.4.2
* numpy                : 1.8.2
* nvenc3               : 3.0.0
* nvenc4               : 4.0.0
* swscale              : 3.0.100
* vpx                  : 1.3.0-484-g3bcece9
* x264                 : 142

Don't worry about nvenc if you don't have the hardware for it, but the rest should match.

To use the private libraries, apart from installing the required -devel rpms from the repository, you need to build and install an RPM, or at least build using the same command as the RPM spec file, i.e. for 64-bit:

LDFLAGS=-Wl,-rpath=/usr/lib64/xpra \
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/lib64/xpra/pkgconfig \
    ./setup.py install

Installing the updated Cython, yasm, etc would not hurt either.
I've updated the wiki with more up to date (and clearer?) build information for fedora/centos: [/Building#FedoraandCentOSRHEL]

Now, I've tried doing the same thing you did, and I still didn't get the crash!?
Maybe your PyOpenGL packages are also out of date? (I have not tried that one)
I have:

$ yum list | grep -i pyopengl
PyOpenGL.noarch                        3.1.0final-1.fc20       @/PyOpenGL-3.1.0final-1.fc20.noarch
PyOpenGL-accelerate.x86_64             3.1.0-1.fc20            @/PyOpenGL-accelerate-3.1.0-1.fc20.x86_64

totaam commented Nov 9, 2014

2014-11-09 04:29:54: nickc commented


Here's what I get from the same yum listing:

PyOpenGL.noarch                        3.1.0b2-1.fc20          @updates         
PyOpenGL-Tk.noarch                     3.1.0b2-1.fc20          updates          
python3-PyOpenGL.noarch                3.1.0b2-1.fc20          updates          
python3-PyOpenGL-Tk.noarch             3.1.0b2-1.fc20          updates          

totaam commented Nov 9, 2014

2014-11-09 05:08:13: totaam changed priority from blocker to major

totaam commented Nov 9, 2014

2014-11-09 05:08:13: totaam commented


Right, it would be good to confirm that upgrading from the buggy beta2 packages found in the Fedora 20 updates repository to the one we ship does fix the problem (undo the fixes, or revert to r8076 or earlier to test).
Then we would know the actual combination of libraries that causes the crashes on your system.
Note: I've tried downgrading here... and still no crash!?

The new memoryview_to_bytes call makes an unnecessary copy of the data before the upload. To make matters worse, it does so in the UI thread where we normally try to avoid doing any heavy work.
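
To make the cost concrete, here is a minimal Python 3 sketch (an illustration only; xpra's own `memoryview_to_bytes` may be implemented differently) showing that converting a memoryview to bytes duplicates the pixel data, whereas handing over the memoryview itself shares the underlying buffer. The 320x320, 4 bytes per pixel frame size matches the one seen in the stack traces above.

```python
# Illustration only: a 320x320 frame at 32 bits per pixel.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 320, 320, 4

pixels = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)  # stands in for decoder output
view = memoryview(pixels)                              # zero copy: shares the buffer

copied = view.tobytes()                                # allocates and copies every byte

# A write to the original buffer is visible through the view, not the copy.
pixels[0] = 255
shares_buffer = view[0] == 255
copy_is_detached = copied[0] == 0
```

For a single small frame the copy is cheap, but it is repeated for every paint, and here it would happen in the UI thread.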

So if we can identify the buggy pyopengl version, we can skip the copy on 3.1.0 proper.
With the buggy one, running ./xpra/client/gl/gl_check.py |& grep pyopengl I get:

 - pyopengl                 : 3.1.0b2

And with the correct one:

 - pyopengl                 : 3.1.0

And the new one normally gets installed with the "accelerate" module. So we should be able to tell them apart.
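
One possible way to tell the two versions apart is sketched below in Python 3. This is an assumption about how such a check could look, not the check that was committed; the helper names `parse_pyopengl_version` and `is_final_release` are made up for illustration. Note that a comparison of only the first two version components, as in the diff earlier in this ticket, is true for both `3.1.0b2` and `3.1.0`, so the suffix has to be inspected as well.

```python
import re

def parse_pyopengl_version(version):
    """Split a version string such as '3.1.0b2' into ((3, 1, 0), 'b2')."""
    match = re.match(r"(\d+(?:\.\d+)*)(.*)$", version.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    numbers = tuple(int(part) for part in match.group(1).split("."))
    numbers += (0,) * (3 - len(numbers))    # pad '3.1' to (3, 1, 0)
    return numbers, match.group(2)

def is_final_release(version, minimum=(3, 1, 0)):
    """True only for a release at or above `minimum` without a beta suffix."""
    numbers, suffix = parse_pyopengl_version(version)
    # 'final' is accepted because the rpm above is named 3.1.0final
    return numbers >= minimum and suffix in ("", "final")
```

With this, `is_final_release("3.1.0b2")` is false and `is_final_release("3.1.0")` is true, which is the distinction needed to skip the copy only on the fixed release.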

Lowering the ticket priority as we no longer have a "crash by default" behaviour!

Also, before closing this ticket we should verify which encoding really does cause the crash - it can't be jpeg as in the ticket title, since jpeg isn't used for transparent windows. I think it's webp. (png and rgb can do transparency, so won't trigger the bug - vp8, h264 should trigger it too)

totaam commented Nov 10, 2014

2014-11-10 18:41:52: nickc commented


I got my machine into a state where it can probably no longer reliably test the failing case: I can no longer reproduce the bug. Here's what I did:

-- I attempted to update my PyOpenGL and PyOpenGL-accelerate packages using yum install.

-- It wasn't finding PyOpenGL-accelerate.

-- I did a yum update.

-- Then I installed PyOpenGL and PyOpenGL-accelerate from our own rpm distributions.

-- I could no longer reproduce the bug.

So, to verify that it was the updates to PyOpenGL that did the trick, I removed the packages and reinstalled PyOpenGL from the Fedora repository. But I still couldn't reproduce the crash.

Even though my system was reporting that the beta package of PyOpenGL was the installed package, I still could not reproduce the bug.

I tried to set my LDFLAGS as you specified, but that didn't seem to matter either way.

So either something got updated when I installed our newest PyOpenGL packages, which did not get removed when I removed them, or something got updated when I ran the yum update.

I tried running gl_check.py to verify the version that way, but running that utility crashes due to a missing module:

Traceback (most recent call last):
  File "./gl_check.py", line 469, in <module>
    sys.exit(main())
  File "./gl_check.py", line 450, in main
    props = check_support(0, True, verbose)
  File "./gl_check.py", line 368, in check_support
    from xpra.client.gl.gtk_compat import get_info, gdkgl, Config_new_by_mode, MODE_DOUBLE, GLDrawingArea
  File "/usr/lib64/python2.7/site-packages/xpra/client/gl/gtk_compat.py", line 59, in <module>
    from gtk import gdkgl, gtkgl
ImportError: cannot import name gdkgl

totaam commented Nov 11, 2014

2014-11-11 02:30:57: totaam commented


The reason it doesn't crash any more is that you're not using opengl, because you are missing the (py)gtkglext packages.

When you removed PyOpenGL, yum removed some packages which depended on it. You should have reinstalled them.

totaam commented Nov 11, 2014

2014-11-11 21:31:13: nickc commented


Thanks for the tip on that pygtkglext package. I've restored that, and as you suggested, I'm again able to reproduce the crash.

With this capability back in place, I can confidently report now, that with PyOpenGL 3.1.0b2 installed, the crash occurs, and with PyOpenGL 3.1.0 installed, the crash does not occur.

It's unfortunate however that you're not getting the same behavior on your machine. So maybe there is some other package that I have installed that causes the crash in combination with using the PyOpenGL beta.

If you can think of some other packages to check, I can take a look. What would be great, would be a comprehensive package list I could check against. Is there such a master list anywhere, or would it just be too many miles in length due to all the dependencies?

totaam commented Nov 13, 2014

2014-11-13 05:41:17: totaam commented


> would be a comprehensive package list I could check against

I don't think so:

$ rpm -qa | wc -l
2718

That would be a lot of false positives to sift through!

r8099 re-enables zero copy for rgb paint (used by all non-video encodings), it would be good to re-enable it for all video encodings (added to #679), as this is where the biggest savings will be.

@nickc: can you break it at all? I've tried hard to crash it, using the buggy Fedora 20 pyopengl and with the xpra.org non-buggy build, and I cannot get it to crash. But since I couldn't get it to crash before... that doesn't mean much!

Note: the ./xpra/client/gl/gl_check.py script will now show the status of zerocopy (renamed in r8100), ie with the broken Fedora version:

OpenGL properties:
 - GLU extensions           : GLU_EXT_nurbs_tessellator GLU_EXT_object_space_tess 
 - GLU version              : 1.3
 - display_mode             : ALPHA, SINGLE
 - gdkgl.version            : 1.4
 - gdkglext.version         : 1.2.0
 - gtkglext.version         : 1.2.0
 - has_alpha                : True
 - opengl                   : 4.4
 - pygdkglext.version       : 1.1.0
 - pyopengl                 : 3.1.0b2
 - renderer                 : GeForce GTX 760/PCIe/SSE2
 - rgba                     : True
 - shading language version : 4.40 NVIDIA via Cg compiler
 - vendor                   : NVIDIA Corporation
 - zerocopy                 : False

TIL: zerocopy True or False.
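
When comparing two machines, it can be easier to read the property listing programmatically than by eye. The following Python 3 sketch assumes the ` - key : value` layout shown above stays as it is; `parse_properties` is a hypothetical helper, not part of xpra, and the sample is a shortened copy of the listing.

```python
def parse_properties(text):
    """Parse ' - key : value' lines into a dict of strings."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("- ") or ":" not in line:
            continue                        # skip headings and blank lines
        key, _, value = line[2:].partition(":")
        props[key.strip()] = value.strip()
    return props

SAMPLE = """\
OpenGL properties:
 - has_alpha                : True
 - pyopengl                 : 3.1.0b2
 - renderer                 : GeForce GTX 760/PCIe/SSE2
 - zerocopy                 : False
"""

props = parse_properties(SAMPLE)
zerocopy_enabled = props.get("zerocopy") == "True"
```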

As of r8101, we also show which pixel upload method we use:

  • buggy pyopengl:
gtk2.GLWindowBacking(1, (499, 316), None).gl_marker(BGRX 24bpp update at (155,301) size 6x13 (312 bytes), \
    stride=24, row length 0, alignment 8, using GL copy:str format=BGRA)

  • up to date pyopengl:
gtk2.GLWindowBacking(1, (499, 316), None).gl_marker(BGRX 24bpp update at (155,301) size 6x13 (312 bytes), \
    stride=24, row length 0, alignment 8, using GL ('zerocopy:memoryview', <type 'str'>) format=BGRA)

TIL: copy:str vs zerocopy:memoryview ....
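
To check a long `-d opengl` log for the upload method without reading every line, a small filter along these lines could be used. It is a Python 3 sketch that assumes the two `using GL ...` spellings shown above are the only ones; the names `UPLOAD_RE`, `upload_method` and `uses_zerocopy` are made up for illustration.

```python
import re

# Matches both "using GL copy:str" and "using GL ('zerocopy:memoryview', ...)".
UPLOAD_RE = re.compile(r"using GL \(?'?(?P<method>(?:zero)?copy:\w+)")

def upload_method(log_line):
    """Return e.g. 'copy:str' or 'zerocopy:memoryview', or None if absent."""
    match = UPLOAD_RE.search(log_line)
    return match.group("method") if match else None

def uses_zerocopy(log_line):
    """True when the line reports a zero copy pixel upload."""
    method = upload_method(log_line)
    return method is not None and method.startswith("zerocopy:")
```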

totaam commented Nov 13, 2014

2014-11-13 18:30:57: nickc commented


I think I've found a missing package on my system that was at least partially causing the problem: PyOpenGL-accelerate. Running gl_check.py and looking at its full output (rather than grepping for the pyopengl line), I noticed it was giving me a warning that PyOpenGL-accelerate was not installed.

And I see also that you've added it to the list of packages that should be installed.

So I installed that, and now the buggy build of xpra at r8017 shows an alternate symptom. The transparent_colors.py window comes up empty (just its chrome shows, and the content is all fully transparent - no colors). But it doesn't crash the client.

Instead I get this exception twice when I run the window:

2014-11-13 09:42:23,539 do_paint_rgb32 error
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 292, in do_paint_rgb32
    success = (self._backing is not None) and self._do_paint_rgb32(img_data, x, y, width, height, rowstride, options)
  File "/usr/lib64/python2.7/site-packages/xpra/client/gl/gl_window_backing_base.py", line 475, in _do_paint_rgb32
    return self._do_paint_rgb(32, img_data, x, y, width, height, rowstride, options)
  File "/usr/lib64/python2.7/site-packages/xpra/client/gl/gl_window_backing_base.py", line 537, in _do_paint_rgb
    glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, self.texture_pixel_format, width, height, 0, pformat, GL_UNSIGNED_BYTE, img_data)
  File "latebind.pyx", line 32, in OpenGL_accelerate.latebind.LateBind.__call__ (src/latebind.c:989)
  File "wrapper.pyx", line 299, in OpenGL_accelerate.wrapper.Wrapper.__call__ (src/wrapper.c:6294)
  File "wrapper.pyx", line 161, in OpenGL_accelerate.wrapper.PyArgCalculator.c_call (src/wrapper.c:4241)
  File "wrapper.pyx", line 128, in OpenGL_accelerate.wrapper.PyArgCalculatorElement.c_call (src/wrapper.c:3601)
  File "wrapper.pyx", line 122, in OpenGL_accelerate.wrapper.PyArgCalculatorElement.c_call (src/wrapper.c:3520)
  File "/usr/lib/python2.7/site-packages/OpenGL/GL/images.py", line 453, in __call__
    return arrayType.asArray( arg )
  File "arraydatatype.pyx", line 174, in OpenGL_accelerate.arraydatatype.ArrayDatatype.asArray (src/arraydatatype.c:4221)
  File "arraydatatype.pyx", line 51, in OpenGL_accelerate.arraydatatype.HandlerRegistry.c_lookup (src/arraydatatype.c:2084)
  File "buffers_formathandler.pyx", line 63, in OpenGL_accelerate.buffers_formathandler.MemoryviewHandler.__init__ (src/buffers_formathandler.c:1128)
RuntimeError: ('Unable to load array constants', <OpenGL.GL.images.ImageInputConverter object at 0x3452d10>)

And if I run a video in epiphany, I get a continual stream of that same exception.

But, as with transparent_colors, I couldn't get epiphany to crash the client, with that package added.

I'll post a second comment to summarize my testing of the latest trunk code.

totaam commented Nov 13, 2014

2014-11-13 23:47:18: nickc commented


I tested the newest trunk code with 4 combinations, trying with beta PyOpenGL vs new, and PyOpenGL-accelerate installed vs uninstalled and the results are:

Beta PyOpenGL PyOpenGL-accelerate installed -- no problems
New PyOpenGL PyOpenGL-accelerate installed -- no problems
Beta PyOpenGL PyOpenGL-accelerate uninstalled -- no problems
New PyOpenGL PyOpenGL-accelerate uninstalled -- xterm crashes, can't even keep it displaying long enough to run transparent_colors.py

If you look at that failure case above, and think that's expected, then we can leave it as is. But if that combination should be supported, and you'd like me to pull logs let me know.

totaam commented Nov 14, 2014

2014-11-14 01:20:56: totaam commented


You say "New PyOpenGL PyOpenGL-accelerate uninstalled -- xterm crashes", but in the stacktrace I see OpenGL_accelerate.buffers_formathandler.MemoryviewHandler.__init__ - where is this file from if not from the rpm?

It looks like one way of avoiding this code path would be to check for the presence of OpenGL_accelerate, which should not be required - but if it avoids crashes.. maybe that's what we need to do.
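
A presence check of that kind could look roughly like the Python 3 sketch below. It uses the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it; `module_available` is a hypothetical helper name, and whether xpra performs the check this way is not stated in this ticket.

```python
import importlib.util

def module_available(name):
    """True if the module `name` can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: the parent package of a dotted name is missing
        # ValueError: the module is loaded but has no usable __spec__
        return False

# OpenGL_accelerate is the import name seen in the traceback above.
HAS_ACCELERATE = module_available("OpenGL_accelerate")
```

The zero copy upload path would then only be considered when `HAS_ACCELERATE` is true.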

totaam commented Nov 14, 2014

2014-11-14 01:39:12: nickc commented


That exception was occurring when I did have PyOpenGL-accelerate installed, testing it against the older r8017 code, to see if that fixed the first crash. That's where I was seeing the empty window, but no crash, and that variation in symptoms was only occurring because PyOpenGL-accelerate was installed.

totaam commented Nov 14, 2014

2014-11-14 03:10:40: totaam commented


> That exception was occurring..

OK, so we can ignore that one since it no longer occurs, right?


> ... uninstalled -- xterm crashes

I don't understand how uninstalling the new rpms vs uninstalling the old ones can make any difference: the way rpm works, the two situations should be 100% identical. Something else must be different.

Running without any opengl is definitely supported, as this is the fallback case for when the video card is blacklisted or not capable of opengl acceleration.

totaam commented Nov 14, 2014

2014-11-14 16:33:34: nickc commented


OK, so we can ignore that one since it no longer occurs, right?
Agreed.

I don't understand how uninstalling the new rpms vs uninstalling the old ones can make any difference: the way rpm works, the two situations should be 100% identical. Something else must be different.

Right, I see now that the way I showed those results was ambiguous. Here is a hopefully clearer summary of the results I got, testing trunk/LATEST:

PyOpenGL Version    PyOpenGL-accelerate installed    Test Results
----------------    -----------------------------    ------------
3.1.0b2             Yes                              Passed
3.1.0               Yes                              Passed
3.1.0b2             No                               Passed
3.1.0               No                               Failed - xterm crashes

totaam commented Nov 14, 2014

2014-11-14 16:50:41: totaam commented


Gotcha - I think.

The crash is not with trunk or even 0.14.11 I hope, is it?

totaam commented Nov 14, 2014

2014-11-14 17:01:19: nickc commented


[[BR]]

The crash is not with trunk or even 0.14.11 I hope, is it?
[[BR]]
Yes, all 4 of those tests were done with the latest trunk code at r8105. I didn't try it on 0.14.11.

totaam commented Nov 14, 2014

2014-11-14 17:24:35: totaam commented


OK, reproduced it easily enough now - thanks!

I believe r8111 should be the final fix for this bug (bar the backport to v0.14.x).

Does that work for you in all cases?
Preferably, also check that we still allow zero copy upload:

xpra attach --no-mmap --encoding=jpeg -d opengl |& grep copy

This shows (when zero copy is possible, that is, when we have accelerate and 3.1.0 final!):

BGRA 32bpp update at (0,0) size 320x320 (409600 bytes), stride=1280, row length 0, alignment 8, \
    using GL ('zerocopy:memoryview', <type 'buffer'>) format=BGRA
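As a rough alternative to eyeballing the grep output, the upload method can be pulled out of a captured debug line with a short script. This is only a sketch: the regular expression is modeled on the single sample line above, and the helper names upload_method and is_zerocopy are hypothetical.

```python
import re

# Modeled on the sample line: ... using GL ('zerocopy:memoryview', <type 'buffer'>) ...
UPLOAD_RE = re.compile(r"using GL \('([^']+)'")


def upload_method(log_text):
    """Return the first quoted value after "using GL (", or None if absent."""
    match = UPLOAD_RE.search(log_text)
    return match.group(1) if match else None


def is_zerocopy(log_text):
    """True if the reported upload method starts with "zerocopy:"."""
    method = upload_method(log_text)
    return method is not None and method.startswith("zerocopy:")


if __name__ == "__main__":
    sample = (
        "BGRA 32bpp update at (0,0) size 320x320 (409600 bytes), stride=1280, "
        "row length 0, alignment 8, "
        "using GL ('zerocopy:memoryview', <type 'buffer'>) format=BGRA"
    )
    print(upload_method(sample), is_zerocopy(sample))
```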

totaam commented Nov 14, 2014

2014-11-14 17:56:32: nickc commented


Ok, I've retested all 4 cases with the r8111 code, and tracked the output of that BGRA log. The zerocopy column indicates whether or not I was seeing zerocopy:memoryview in that log. All 4 cases passed, with no crash, and no other display anomalies.

PyOpenGL    Accelerate    Results    zerocopy
--------    ----------    -------    --------
3.1.0       yes           Passed     yes
3.1.0b2     no            Passed     no
3.1.0b2     yes           Passed     no
3.1.0       no            Passed     no
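
The zerocopy column above can be summarized as a small predicate. This is just a sketch encoding the observed results, not xpra's actual detection logic; the function names are hypothetical, and treating any purely numeric version string as a "final" release is an assumption based on the two versions tested here.

```python
import re


def is_final_release(version):
    """True for purely numeric dotted versions like '3.1.0', False for '3.1.0b2'."""
    return re.fullmatch(r"\d+(\.\d+)*", version) is not None


def expect_zerocopy(pyopengl_version, accelerate_installed):
    """Zero copy was only observed with a final PyOpenGL release plus accelerate."""
    return accelerate_installed and is_final_release(pyopengl_version)


# (PyOpenGL version, accelerate installed, zerocopy observed) from the r8111 retest
OBSERVED = [
    ("3.1.0", True, True),
    ("3.1.0b2", False, False),
    ("3.1.0b2", True, False),
    ("3.1.0", False, False),
]

if __name__ == "__main__":
    for version, accelerate, seen in OBSERVED:
        print(version, accelerate, expect_zerocopy(version, accelerate) == seen)
```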

totaam commented Nov 19, 2014

2014-11-19 00:03:33: totaam changed status from new to closed

totaam commented Nov 19, 2014

2014-11-19 00:03:33: totaam changed resolution from ** to fixed

totaam commented Nov 19, 2014

2014-11-19 00:03:33: totaam commented


Backport in r8119. Closing.

@totaam totaam closed this as completed Nov 19, 2014

totaam commented Nov 20, 2014

2014-11-20 23:55:16: antoine commented


This "fix" broke mmap, see #741

totaam commented Nov 27, 2014

2014-11-27 01:25:50: totaam changed title from client crashes with opengl enabled, encoding jpeg to client crashes with opengl enabled and transparency

totaam commented Nov 27, 2014

2014-11-27 01:25:50: totaam commented


(updating ticket title - we may have to revisit as per #745)

totaam commented Sep 10, 2018

2018-09-10 06:57:49: antoine commented


Removed extra pixel copy in #1954. (with potential for regressions...)
