Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve picture buffer handling #465

Closed
totaam opened this issue Dec 3, 2013 · 48 comments
Closed

improve picture buffer handling #465

totaam opened this issue Dec 3, 2013 · 48 comments

Comments

@totaam
Copy link
Collaborator

totaam commented Dec 3, 2013

Issue migrated from trac ticket # 465

component: core | priority: major | resolution: fixed

2013-12-03 08:51:10: totaam created the issue


At the moment, some picture decoders return the pixel data as a read-only string buffer (PIL does). Others return a read-write buffer just because downstream requires it (ie: OpenCL does, despite only reading from the pixel buffer..)

Tasks:

  • Use the new buffer API
  • We should use something better and allow read-write access if the underlying buffer allows it (non reference frames), or use a copy-on-write mechanism
  • Try to fix OpenCL? (if possible..) see r4281 and r5255, question to the PyOpenCL mailing list here: read_only images, answer here Should be fixed in git - which means we can make the buffer read-only again and require a newer version of pyopencl (global switch?)
  • Carry a flag telling us if the buffer is read-only or not?
  • Carry a flag telling us if the buffer is thread safe or not? (so we know when to free the image from the UI thread - better than r4963)
@totaam
Copy link
Collaborator Author

totaam commented Dec 17, 2013

2013-12-17 16:07:58: totaam changed status from new to assigned

@totaam
Copy link
Collaborator Author

totaam commented Dec 17, 2013

2013-12-17 16:07:58: totaam changed owner from antoine to totaam

@totaam
Copy link
Collaborator Author

totaam commented Dec 17, 2013

2013-12-17 16:07:58: totaam edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented Jan 23, 2014

2014-01-23 11:00:02: antoine edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented Feb 4, 2014

2014-02-04 12:13:02: antoine edited the issue description

@totaam
Copy link
Collaborator Author

totaam commented Feb 18, 2014

2014-02-18 11:22:22: totaam commented


r5498 replaces r4963 with a much cleaner solution

@totaam
Copy link
Collaborator Author

totaam commented Feb 28, 2014

2014-02-28 05:38:05: totaam uploaded file newbuffer-vpx-opengl.patch (4.6 KiB)

uses the new buffer API for decoding vpx via dec_vpx and tries to upload to PyOpenGL without copying

@totaam
Copy link
Collaborator Author

totaam commented Feb 28, 2014

2014-02-28 05:41:01: totaam commented


As per this answer to my question on PyOpenGL zero copy buffer handling support, we should be able to pass a memoryview object directly for upload instead of calling clone_pixel_data in the OpenGL codepath. (see r5622 on why we copy).
The patch above should do that for vpx - which I am unable to test as this requires the latest PyOpenGL and a machine with OpenGL support. More work would be needed:

  • the memoryview code is Python 2.7 only, so needs to be moved to a common location so we can more easily re-use it from all the codecs
  • csc modules expect buffers not memory views, so again they need a more generic way of getting pointers from data
  • clone_pixel_data barfs that we cannot modify the view..
    etc.

Some pointers:

@totaam
Copy link
Collaborator Author

totaam commented Apr 5, 2014

2014-04-05 16:26:40: totaam uploaded file newbuffer-v2.patch​ (13.8 KiB)

updated patch using the new buffer API, works but leaks memory..

@totaam
Copy link
Collaborator Author

totaam commented Apr 7, 2014

2014-04-07 09:57:35: totaam uploaded file newbuffer-pyopengl-leaks.patch (4.0 KiB)

enables memoryview direct memory copy with pyopengl for memoryviews, str and buffers (indirectly) - all leak

@totaam
Copy link
Collaborator Author

totaam commented Apr 7, 2014

2014-04-07 10:03:33: totaam commented


Merged into a new buffer codec module in r6050, used by all decoders and CSC when "memoryview" is enabled (use --with-memoryview).
As can be seen by running for 5 minutes with the patch above which forces the use of memoryviews (creating one if necessary), the memory leaks only when using pyopengl and memoryviews together... (not leaking with the plain pixbuf window backing, not leaking when we copy the pixels before use)

Good read on buffer and memoryview: Ugliness of 2.7.

@totaam
Copy link
Collaborator Author

totaam commented Apr 7, 2014

2014-04-07 12:15:44: totaam uploaded file gl-memoryview-nocopy.patch (3.5 KiB)

zerocopy patch for pyopengl texture upload

@totaam
Copy link
Collaborator Author

totaam commented Apr 7, 2014

2014-04-07 12:24:57: totaam commented


Actually, it only leaks with pyopengl, but that's not where the leak is.
The new code works absolutely fine, as can be seen by running with the dec_vpx decoder instead of dec_avcodec2. The problem is with ffmpeg2 frame refcounting, using pyopengl makes us free the pixel data from the UI thread later than usual and that's what seems to confuse the avframe logic.

Could be related to #457.
Other tickets: the ffmpeg2 ticket was #415 and the buffer management for ffmpeg1 was #350.

As per this answer to my question to the libav mailing list, I think we need to use an approach more similar to what we did with ffmpeg1: "You need to fill AVFrame->buf[] using av_buffer_create when you allocate a frame, and that function takes a release callback which will be called once the buffer is unused and should be released. AVCodecContext->release_buffer is unused in this model."

I'm far from being the only one who is completely lost in this ffmpeg memory management mess, as can be seen with a quick google search, or directly on the ffmpeg mailing list: Memory Leaks? - "Everything else is slightly or completely broken or works only on older FFmpeg versions. It's possible that you don't need to unref the frame on very new FFmpeg, but I haven't checked."

@totaam
Copy link
Collaborator Author

totaam commented Apr 7, 2014

2014-04-07 14:58:08: totaam commented


New-style buffer support has been added in r6050, used in more places in r6053.
It is disabled by default and requires the build time switch: --with-memoryview (and Python 2.7)

r6055 enables zerocopy memoryview based texture upload with opengl, but not for avcodec2 because of the problems discussed in comment:7

Still TODO:

  • read-only: decide if we disable current versions of opencl on the clientside (not a big loss), or if we keep the buffer read-write for now
  • fix dec_avcodec2 (big)
  • add locking (noop when not needed) so we can re-use and free buffers without causing crashes (vpx?)

@totaam
Copy link
Collaborator Author

totaam commented Apr 30, 2014

2014-04-30 09:32:49: antoine commented


The argb module needed support for writeable buffers, added in r6248

@totaam
Copy link
Collaborator Author

totaam commented May 7, 2014

2014-05-07 16:38:55: antoine commented


Lots more work done on this to try to improve GTK3 support (see #90), for example: r6397. This is so ugly, there has to be a better way. What sort of an upgrade is this? python2 string behaviour is more error prone when dealing with character encodings, but still way better then this awful mess..

@totaam
Copy link
Collaborator Author

totaam commented May 17, 2014

2014-05-17 06:26:20: totaam commented


I think this causes random crashes even with python 2.7 when I enable it, deferring it.

@totaam
Copy link
Collaborator Author

totaam commented Sep 11, 2014

2014-09-11 03:48:45: totaam commented


Good link: An Introduction to the Python Buffer Protocol

@totaam
Copy link
Collaborator Author

totaam commented Oct 27, 2014

2014-10-27 19:30:44: totaam commented


Also related: #717 had to disable zero copy opengl buffer upload..

@totaam
Copy link
Collaborator Author

totaam commented Apr 15, 2015

2015-04-15 12:14:45: antoine uploaded file memoryview-encode.patch (5.0 KiB)

changes that may be needed for memoryview support encoding side

@totaam
Copy link
Collaborator Author

totaam commented Apr 15, 2015

2015-04-15 12:18:22: antoine commented


Running with memoryviews seems to work OK for the client, but with the server I get screen corruption.
The patch above is needed to avoid errors - nor merging it just yet because I'm not sure it is right. Here's the sort of errors you get without it:

  • with webp:
error processing damage data: pixel buffer is too small: expected at least 161476 bytes but got 0
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 1769, in encode_loop
    fn_and_args[0](*fn_and_args[1:])
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 1187, in make_data_packet_cb
    packet = self.make_data_packet(damage_time, process_damage_time, wid, image, coding, sequence, options)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 1513, in make_data_packet
    ret = encoder(coding, image, options)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 1584, in webp_encode
    return webp_encode(coding, image, self.rgb_formats, self.supports_transparency, q, s, options)
  File "/usr/lib64/python2.7/site-packages/xpra/server/picture_encode.py", line 62, in webp_encode
    cdata = enc_webp.compress(image.get_pixels(), w, h, stride=stride/4, quality=quality, speed=speed, has_alpha=alpha)
  File "xpra/codecs/webp/encode.pyx", line 345, in xpra.codecs.webp.encode.compress (xpra/codecs/webp/encode.c:1948)
    assert pic_buf_len>=c, "pixel buffer is too small: expected at least %s bytes but got %s" % (c, pic_buf_len)
AssertionError: pixel buffer is too small: expected at least 161476 bytes but got 0
  • with nvenc:
error processing damage data: bad argument type for built-in operation
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 1769, in encode_loop
    fn_and_args[0](*fn_and_args[1:])
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 1187, in make_data_packet_cb
    packet = self.make_data_packet(damage_time, process_damage_time, wid, image, coding, sequence, options)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 1513, in make_data_packet
    ret = encoder(coding, image, options)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 1251, in video_encode
    ret = self._video_encoder.compress_image(csc_image, quality, speed, options)
  File "xpra/codecs/nvenc4/encoder.pyx", line 1774, in xpra.codecs.nvenc4.encoder.Encoder.compress_image (xpra/codecs/nvenc4/encoder.c:19357)
    return self.do_compress_image(image, options)
  File "xpra/codecs/nvenc4/encoder.pyx", line 1821, in xpra.codecs.nvenc4.encoder.Encoder.do_compress_image (xpra/codecs/nvenc4/encoder.c:20292)
    self.inputBuffer.data[:len(pixels)] = pixels
TypeError: bad argument type for built-in operation

@totaam
Copy link
Collaborator Author

totaam commented Apr 16, 2015

2015-04-16 09:49:51: antoine changed priority from minor to critical

@totaam
Copy link
Collaborator Author

totaam commented Apr 16, 2015

2015-04-16 09:49:51: antoine commented


Fixes in r9019 (needs backporting), r9020.
The only problem remaining is the xor code, r9021 adds a test for it, r9022 + r9023 + r9024 improve the code a little bit.
It manifests itself as a memory corruption, and maybe it is. Raising.

@totaam
Copy link
Collaborator Author

totaam commented Apr 16, 2015

2015-04-16 11:18:27: antoine changed priority from critical to blocker

@totaam
Copy link
Collaborator Author

totaam commented Apr 16, 2015

2015-04-16 11:18:27: antoine commented


Found the first problem: we allocate a new buffer in the imagewrapper's allocate_buffer, and it frees the old one... which we are still using.

This explains why the same code works when used as client, but not as server: the server will call restride which does this naughty use-after-free.
The fix is not simple unfortunately, but it is critical.

Some minor fixes and tweaks in r9026 + r9027.

I have a patch for not freeing the buffer until we are done copying it, but:

  • it's ugly (would be better to move the restride code to image.pyx)
  • it doesn't actually fix (all?) the visual corruption I am seeing..

@totaam
Copy link
Collaborator Author

totaam commented Apr 16, 2015

2015-04-16 11:19:02: antoine uploaded file image-pixels-reallocate-fix.patch (3.3 KiB)

work in progress patch to fix use-after-free

@totaam
Copy link
Collaborator Author

totaam commented Apr 16, 2015

2015-04-16 17:40:12: antoine uploaded file xterm-xor-corruption.png (5.2 KiB)

what the corruption looks like
xterm-xor-corruption.png

@totaam
Copy link
Collaborator Author

totaam commented Apr 16, 2015

2015-04-16 18:27:27: antoine commented


For the record, the problem only occurs when building with --with-memoryview and with delta enabled (any number of buckets - see #756).
To make it easier to trigger I launch the server with:

XPRA_ENCODING_STRICT_MODE=1 XPRA_MIN_DELTA_SIZE=0 XPRA_MAX_DELTA_SIZE=6553600 \
    xpra start -d delta

And the client with:

XPRA_DELTA_BUCKETS=1 XPRA_OPENGL_PAINT_BOX=1 \
    xpra attach --no-mmap  --encoding=rgb -d delta

The output looks like this when corruption occurs:
[[Image(xterm-xor-corruption.png​)]]

There is nothing of interest in the debug output:

  • server:
delta: found empty bucket 0
delta: client options={'bucket': 0, 'lz4': 1, 'store': 2, 'rgb_format': 'BGRX'} (for region (0, 0, 511, 316))
  • client:
delta: storing sequence 2 in bucket 0

Stumped.

@totaam
Copy link
Collaborator Author

totaam commented Apr 17, 2015

2015-04-17 06:16:52: antoine commented


Things I have ruled out, I think, by turning it off (either with switches or modifying the code):

  • lz4 / lzo / zlib compression stage
  • restride stage
  • verified we don't rgb_reformat the pixel order
  • buckets getting cleared
  • tried copying the pixels instead of referencing the memoryview
  • cyxor code: switching to the old numpy also gives visual corruption, and if anything, more reliably! (why?)
  • all delta encodings are affected (not just rgb)

Some minor tweaks and improvements in r9034, r9035.

@totaam
Copy link
Collaborator Author

totaam commented Apr 17, 2015

2015-04-17 06:17:36: antoine uploaded file xor-debug.patch (11.3 KiB)

the patch I am using for debugging

@totaam
Copy link
Collaborator Author

totaam commented Apr 17, 2015

2015-04-17 09:05:21: antoine commented


Well, that was HARD.

Using this bit of code to print the address of the buffers we pass around:

def pp(info, v):
    cdef unsigned char * buf = NULL              #@DuplicatedSignature
    cdef Py_ssize_t buf_len = 0                     #@DuplicatedSignature
    assert object_as_buffer(v, <const void**> &buf, &buf_len)==0, "cannot get buffer pointer for %s: %s" % (type(v), v)
    print("%s=%#x (len=%s)" % (info, <unsigned long> buf, buf_len))

I quickly found that we were sometimes xoring the same memory region..
Turns out that slicing a memoryview just gives you a new memoryview pointing to the same memory... And so when the pixels get re-allocated, we end up pointing to whatever got allocated there after it got freed!
Fixed in r9039.
I've also merged the ugly reallocate fix above in r9040. Both need backporting. And I still want to make this code a bit nicer: moving the restride code to ximage (the only image wrapper that needs restriding).

More minor related updates (debugging, cleanups, etc) in r9036, r9037, r9038.

@totaam
Copy link
Collaborator Author

totaam commented Apr 17, 2015

2015-04-17 13:26:07: antoine commented


r9043 makes the xor code about 25% faster by avoiding unnecessary copying of the output (ideally we could do it in place in some cases when we own the buffer already... but the wrapper does not expose that information, yet - same issue as restriding: these should be methods on the wrapper, a bit like what PIL does with its Image class)

@totaam
Copy link
Collaborator Author

totaam commented Apr 19, 2015

2015-04-19 11:58:22: antoine changed priority from blocker to major

@totaam
Copy link
Collaborator Author

totaam commented Apr 19, 2015

2015-04-19 11:58:22: antoine changed status from assigned to new

@totaam
Copy link
Collaborator Author

totaam commented Apr 19, 2015

2015-04-19 11:58:22: antoine changed owner from totaam to afarr

@totaam
Copy link
Collaborator Author

totaam commented Apr 19, 2015

2015-04-19 11:58:22: antoine commented


Forgot to record some partial fixes to the mmap code, wrongly committed together with sound fixes in r9018.
Found some missing bits in less used codepaths: r9060 and r9062 fix those.
Minimal backports for v0.14.x have been applied in 9056, 9057, 9059, 9061.
There is a follow up ticket with further enhancements: #839.

Note: the v0.14.x branch is not affected by the ximage use-after-free bug because we do extra memory copies in this older version of the code.
I am lowering the priority now that the corruption bugs have been fixed, and also because they would not affect our default non-gtk3 builds.

@afarr: these trunk (v0.15) changes could have a significant impact on performance. (and a negative one if I have made mistakes..) So it is worth running some benchmarks to check that trunk has not regressed (unlikely but worth checking), and also comparing trunk "normal builds" with --with-memoryviews builds. (both client and server, mix and match them too)
Since the changes are very likely to affect some encodings more than others (rgb with delta, webp?, nvenc?), it may well be necessary to narrow down the benchmarks using command lines similar to the ones in comment:17 to try to identify if there really are differences between the old and new implementations.

I do not intend to make "new buffers" the default for v0.15.x (too late in the release cycle for that), but once we get some data on how well this performs, we can consider it for v0.16.x

@totaam
Copy link
Collaborator Author

totaam commented May 3, 2015

2015-05-03 12:46:38: antoine commented


0.16 enables memoryview by default in r9223, so checking the performance is becoming more important.

@totaam
Copy link
Collaborator Author

totaam commented May 16, 2015

2015-05-16 15:21:08: antoine commented


Since this is deep in the C / Cython layer, it is difficult to know what option was used for building the codecs, r9394 + r9395 (trunk / 0.16) expose it as buffer_api in the codec loader info, ie: with Fedora building against local libraries:

codecs versions:
* PIL                  : 2.7.0
* avcodec2             : 56.1.100
* buffer_api           : 1
* cython               : 0.3.0.22
* dec_webp             : 0.4.2
* enc_webp             : 0.4.2
* numpy                : 1.8.2
* nvenc4               : 4.0.0
* nvenc5               : 5.0.0
* swscale              : 3.0.100
* vpx                  : 1.3.0
* x264                 : 142

And also via xpra info:

$ xpra info | grep buffer_api
encoding.buffer_api.version=1

Note: in theory, it is still possible to build one module against one version and another module against a different version.. (for example by interrupting the build half way through, then switching to the other mode to continue.. or when fixing bugs). We only report what was used when building the argb codec (which serves as a bit of a dumping ground for utility cython functions).

@totaam
Copy link
Collaborator Author

totaam commented May 22, 2015

2015-05-22 17:41:43: antoine commented


Some related bugs with link to the fixes: #863, #866

@totaam
Copy link
Collaborator Author

totaam commented May 28, 2015

2015-05-28 04:33:09: antoine commented


As per #863#comment:13, this got tested with a Fedora server: Building --with-memoryview and --without-memoryview has no noticeable effect in the 0.15.X branch or trunk..

@totaam
Copy link
Collaborator Author

totaam commented May 28, 2015

2015-05-28 04:54:04: antoine commented


As per #866#comment:8, the proxy server with nvenc and nvenc standalone should be tested more thoroughly.

@totaam
Copy link
Collaborator Author

totaam commented Jul 28, 2015

2015-07-28 07:52:04: antoine commented


Note: a backport to v0.15.x caused a performance regression, see #926.

As part of this ticket, please add more performance data to CSC Performance. Maybe this data should be part of some regression testing.

@totaam
Copy link
Collaborator Author

totaam commented Aug 2, 2015

2015-08-02 17:54:06: antoine commented


Note: I believe that there may well be some issues left in 0.15 that we never caught before the release, so it is a good thing that this is not enabled by default there - re-assigning to the 0.16 milestone where it is now the default with Python 2.7 and later.

Further work on buffers will continue in #935.

@totaam
Copy link
Collaborator Author

totaam commented Aug 7, 2015

2015-08-07 07:53:25: antoine commented


Huge use after free bug fixed in r10230, caused crashes with the vpx decoder on win32 (less of a problem with avcodec because although we free the buffer, avcodec still used it - so it didn't crash)

@totaam
Copy link
Collaborator Author

totaam commented Jan 5, 2016

2016-01-05 06:35:31: antoine commented


Support for old style buffers will be dropped in 0.18: #1073

@totaam
Copy link
Collaborator Author

totaam commented Mar 16, 2016

2016-03-16 07:17:43: antoine changed status from new to closed

@totaam
Copy link
Collaborator Author

totaam commented Mar 16, 2016

2016-03-16 07:17:43: antoine set resolution to fixed

@totaam
Copy link
Collaborator Author

totaam commented Mar 16, 2016

2016-03-16 07:17:43: antoine commented


No problems found in 0.16 with the new buffers. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant