Refresh bug in certain applications #1781

totaam · 2018-03-02T17:32:11Z

Issue migrated from trac ticket # 1781

component: encodings | priority: major | resolution: fixed

2018-03-02 17:32:11: maxmylyn created the issue

I finally managed to reliably reproduce this - it's easiest in Firefox and Chrome on [https://sfbay.craigslist.org/search/cta] (a site for classifieds - this sub-site happens to be for cars, I've gotten it to trigger on the Motorcycles one as well).

My client and server are both Fedora 26 machines running trunk r18661 built from source.

I started my server with: XPRA_BANDWIDTH_DETECTION=0 xpra start :13 --bind-tcp=0.0.0.0: --start-new-commands=yes --start-child=xterm --start-child=google-chrome --start-via-proxy=no --systemd-run=no --no-daemon (running without the bandwidth limits makes it easier to detect) and attached with xpra attach tcp:ip.

Once attached I nagivated to the link I provided above. Just mousing over a few of the ads (and cycling through the photos) is enough to induce a state where the client is constantly repainting. Continued use of this site makes the paints more and more aggressive to the point where it becomes almost impossible to type into the Xterm.

I ran two different sessions - one to get the xpra info of both a fresh and a bugged state. And then a second one with -d refresh, compress enabled. On the second run I got into the bugged state almost instantly - I let it sit for at least 10-20 seconds before trying to stop the session with the Xterm. Hopefully that's enough.

The text was updated successfully, but these errors were encountered:

totaam · 2018-03-02T17:33:04Z

2018-03-02 17:33:04: maxmylyn uploaded file `1781freshinfo.txt` (149.2 KiB)

Fresh xpra info

totaam · 2018-03-02T17:33:19Z

2018-03-02 17:33:19: maxmylyn uploaded file `1781buggedinfo.txt` (150.8 KiB)

After stuck in a constant refresh state

totaam · 2018-03-02T17:41:19Z

2018-03-02 17:41:19: maxmylyn uploaded file `1781refreshcompresstrimmed.log` (325.2 KiB)

uncompressed and trimmed log - the first few seconds, and the last few seconds (at the point where it was sitting and not being interacted with)

totaam · 2018-03-02T17:41:45Z

2018-03-02 17:41:45: maxmylyn uploaded file `1781log.tar` (20.0 KiB)

tarball of the entire log - tarred with -caf

totaam · 2018-03-05T16:14:32Z

2018-03-05 16:14:32: antoine changed owner from antoine to maxmylyn

totaam · 2018-03-05T16:14:32Z

2018-03-05 16:14:32: antoine commented

Your log file is empty:
$ tar -xvf 1781log.tar 
1781refreshcompress.log
$ ls -la 1781*
-rw-rw-r--. 1 antoine antoine 20480 Mar  3 12:40 1781log.tar
-rw-rw-r--. 1 antoine antoine 10240 Mar  3 00:38 1781refreshcompress.log
$ wc -l 1781refreshcompress.log 
0 1781refreshcompress.log
In any case, never put a single file in a tar file. That never does anything useful except increasing the file size, tar is not a compressed format, it is archival format.
Use xz instead.

As for the log sample attached, it is trimmed just around the time where things got interesting so I don't really know how things got into this state.

totaam · 2018-03-06T17:16:59Z

2018-03-06 17:16:59: maxmylyn uploaded file `1781log.tar.xz` (192.7 KiB)

(properly) compressed full log

totaam · 2018-03-06T17:21:03Z

2018-03-06 17:21:03: maxmylyn commented

Well I've learned something today:

The a flag on tar will auto use the correct compression based on the name, rather than magically auto-selecting the proper one based on what I assumed it was before

I've properly tarred (and verified that it's correct) the full log from start to finish.

totaam · 2018-03-06T17:27:58Z

2018-03-06 17:27:58: antoine commented

As per comment:1:

never put a single file in a tar file. That never does anything useful except increasing the file size, tar is not a compressed format, it is archival format. Use xz instead.
200	1781log.tar.xz
196	1781refreshcompress.log.xz

totaam · 2018-03-10T05:51:26Z

2018-03-10 05:51:26: antoine commented

I believe this is caused by r17540, triggering a refresh loop.
This bug is present in the 2.2.x branch, but I believe that the bandwidth detection changes in 2.3 (#999) make it easier to trigger.

FYI: you can generally avoid refresh related problems by turning off auto-refresh.

18683 disables lossy refresh in the 2.2.x branch.
I will find a better solution for 2.3.

totaam · 2018-03-10T09:27:25Z

2018-03-10 09:27:25: antoine commented

My guess in comment:4 was based on log events like this one:
compress:   0.1ms for    2x1    pixels at 1126,965  for wid=2     using      webp with ratio 525.0%  (    0KB to     0KB), sequence  3608, client_options={'flush': 3, 'quality': 100, 'rgb_format': 'BGRX'}
compress:   0.3ms for    3x16   pixels at 1126,22   for wid=2     using      webp with ratio  21.9%  (    0KB to     0KB), sequence  1165, client_options={'flush': 3, 'quality': 100, 'rgb_format': 'BGRX'}
We shouldn't be using "webp" for such tiny screen updates.

But it turns out that this may not be fixed by 18683, r18685 is likely to be a more correct solution. (not sure this should be backported though).

It's not clear to me which part of the screen was refreshing unnecessarily. Can you specify the coordinates or at least take a screenshot with paint boxes turned on.
And assuming that you can reproduce this reliably still, please try turning off features to see if they are involved in this, ie: video region detection, refresh delay, etc..

totaam · 2018-03-13T20:34:17Z

2018-03-13 20:34:17: maxmylyn commented

Upped client and server to r18711

I no longer see the excessive flickering that made it obvious what was going on. However, my session still bogs down to the point of being unusable in the same manner, and the logs still show a flood of Webp refreshes -d refresh (see below). Turning auto-refresh off xpra control :13 auto-refresh 0 makes the issue go away; alternatively detaching and re-attaching help, but the issue will crop up again eventually.

2018-03-13 13:17:53,688 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 0, 1128, 877), refresh regions=[])
2018-03-13 13:17:53,688 auto refresh: rgb24 screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(474, 78, 637, 28), refresh regions=[])
2018-03-13 13:17:53,735 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 0, 1128, 877), refresh regions=[])
2018-03-13 13:17:53,901 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 0, 1128, 877), refresh regions=[])
2018-03-13 13:17:54,067 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 0, 1128, 877), refresh regions=[])
2018-03-13 13:17:54,290 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 0, 1128, 877), refresh regions=[])
2018-03-13 13:17:54,368 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 0, 1128, 877), refresh regions=[])
2018-03-13 13:17:54,492 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 0, 1128, 877), refresh regions=[])
2018-03-13 13:17:54,585 auto refresh:  h264 screen update (actual quality=  0, lossy= True), added pixels to refresh regions, scheduling refresh in 150ms (pct=0, batch=28) (region=rectangle(0, 73, 1128, 804), refresh regions=[])
2018-03-13 13:17:54,586 auto refresh: rgb24 screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 0, 1128, 73), refresh regions=[])

I also managed to get a traceback once or twice(something about video subregion), but can't seem to get it again for this comment.

totaam · 2018-03-14T00:20:18Z

2018-03-14 00:20:18: antoine commented

Hmmm, looking at this again, maybe r18712 will fix this.
This code is far from being new though, so I would have expected it to be reported sooner (every version going back to pre 1.0 releases would be affected, unless somehow a changeset like r17480 made it easier to trigger).

If you still hit the issue, please provide "-d damage,refresh,regionrefresh,subregion" with more than a few lines and maybe also with "-d encoding" added. And if it does fire similar large-ish webp refreshes, does the region being refreshed match xpra info | grep video_subregion?

The traceback could be critical, please always save and report ALL tracebacks.

totaam · 2018-03-16T17:13:20Z

2018-03-16 17:13:20: maxmylyn commented

Upped client and server to r18739:

I can still trigger this exact same state fairly reliably - by opening Chrome and navigating to [sfbay.craigslist.org] and navigating a few of the ads (cars and motorcycles in particular). Oddly enough it was far more difficult to trigger while the logs were running, at least in my case. It took a few tries yesterday afternoon, but I managed to get it fairly quickly this morning before the log file got way too large. Running with the OpenGL paint boxes makes it much easier to tell when you've gotten into the correct state - it's most apparent when you see a ton of small paint boxes along the edges of the windows. I'll attach the log you asked for.

NOTE: I switched from a browser and terminal start children to two start terminal children, and I used one for starting the browser and the other to stop the session quickly.

As you mentioned as well, I double checked to see if they matched from xpra info | grep video_subregion and....they kind of do.

From my Xpra info (different session than the logs I'll be attaching):
window.3.video_subregion.rectangle=(1, 1, 1127, 965)
window.3.video_subregion.score=99
window.3.video_subregion.scores.(1, 1, 1127, 965)=99
window.3.video_subregion.set-at=7354
window.3.video_subregion.supported=True
window.3.video_subregion.time=65861
window.3.video_subregion.video-mode=False
window.3.video_subregion.width=1127
window.3.video_subregion.x=1
window.3.video_subregion.y=1
and from the logs (same session as the one I got the info from):
2018-03-16 09:45:43,432 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(1, 106, 4, 856), refresh regions=[])
2018-03-16 09:45:43,470 remove_refresh_region(rectangle(161, 301, 6, 13)) updated refresh regions=[]
2018-03-16 09:45:43,470 auto refresh: rgb24 screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(161, 301, 6, 13), refresh regions=[])
2018-03-16 09:45:43,479 refresh_subregion([R(0, 3, 1, 960)])
2018-03-16 09:45:43,503 remove_refresh_region(rectangle(1, 1, 1126, 105)) updated refresh regions=[]
2018-03-16 09:45:43,504 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(1, 1, 1126, 105), refresh regions=[])
2018-03-16 09:45:43,505 remove_refresh_region(rectangle(1126, 1, 2, 383)) updated refresh regions=[]
2018-03-16 09:45:43,505 auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(1126, 1, 2, 383), refresh regions=[])
2018-03-16 09:45:43,600 refresh_subregion([R(0, 0, 1128, 3), R(1124, 3, 4, 959), R(0, 962, 1128, 4)])
2018-03-16 09:45:43,602 refresh_subregion([R(3, 0, 1122, 1)])
2018-03-16 09:45:43,602 remove_refresh_region(rectangle(0, 0, 1129, 967)) updated refresh regions=[]
2018-03-16 09:45:43,602 refresh_subregion([R(1, 1, 1126, 105), R(1, 106, 4, 856), R(1124, 106, 3, 856), R(1, 962, 1126, 3)])
Note the (1, 1, 1126, 105) from the logs and the (1, 1, 1127, 965) from the info. The start coordinates are the same, and the ending X-coordinates almost match, but the subregion from the info is much taller than what's getting auto-refreshed. That feels like it's too big of a coincidence, but I have a feeling you know a lot more about what's going on here.

totaam · 2018-03-16T17:14:20Z

2018-03-16 17:14:20: maxmylyn uploaded file `1781damagerefreshregionsubregion.log` (3203.3 KiB)

totaam · 2018-03-16T17:51:26Z

2018-03-16 17:51:26: antoine commented

Note the (1, 1, 1126, 105) from the logs and the (1, 1, 1127, 965) from the info.
That's very likely to be because the browser paints the video area using horizontal bands. This data from the log file uses a different geometry but the idea is the same (here with 58 pixel high bands):
damage(5, 48, 1119, 58, {})    wid=3, scheduling batching expiry for sequence 1809 in 34.0 ms
damage(5, 106, 1119, 58, {})   wid=3, using existing delayed {} regions created 0.0ms ago
damage(5, 164, 1119, 58, {})   wid=3, using existing delayed {} regions created 0.0ms ago
...
damage(5, 860, 1119, 58, {})   wid=3, using existing delayed {} regions created 0.0ms ago
damage(5, 918, 1119, 44, {})   wid=3, using existing delayed {} regions created 0.0ms ago
We then run some clever code to identify the "real" video region from that. (usually by aggregating the horizontal bands together)
We also round down the dimensions (1127 becomes 1126) because video encoders often use colourspace subsampling, which requires even dimensions to operate.

This log file looks like it would be useful, but I'm not sure where to look (~25000 lines in there): did the problem appear at the start? If not, when?
Do you have the corresponding "paint boxes" screenshot so I can try to reconcile? Or at least the "xpra info"?

totaam · 2018-03-16T17:53:01Z

2018-03-16 17:53:01: maxmylyn commented

I managed to get it into the stuck state in the last few seconds of the log - when the flood of webp paints start is usually when it's in the stuck state.

I'll try again and post a new log with screenshots and an xpra info.

totaam · 2018-03-16T18:19:14Z

2018-03-16 18:19:14: maxmylyn commented

Okay got myself stuck into the state really quick with the logs running, so I'll attach them and a screenshot and an Xpra info.

totaam · 2018-03-16T18:19:45Z

2018-03-16 18:19:45: maxmylyn uploaded file `1781loground2.log` (1025.9 KiB)

totaam · 2018-03-16T18:19:58Z

2018-03-16 18:19:58: maxmylyn uploaded file `1781loginfo.txt` (170.6 KiB)

totaam · 2018-03-16T18:20:44Z

2018-03-16 18:20:44: maxmylyn uploaded file `1781 paint bug.png` (701.9 KiB)

Stuck paint state - note the boxes along the edge

totaam · 2018-03-17T17:27:51Z

2018-03-17 17:27:51: antoine commented

From the log, we see refreshes following the last add_video_refresh on line 7802:
refresh_subregion([R(1124, 3, 4, 184), R(1124, 207, 4, 25), R(1124, 251, 4, 694), R(1124, 958, 4, 3), R(1124, 961, 4, 1), R(0, 962, 1128, 1)])
Which would match the right and bottom edges of the window where the black paint boxes are on the screenshot: window.3.size=(1129, 967).
Just noticed that webp was missing from the default [/browser/xpra/trunk/src/xpra/client/paint_colors.py paint colours] definition, which is why it showed up as black rather than pink, this is fixed in r18745.

Then each one of those regions gets refreshed, ie:
auto refresh:  webp screen update (actual quality=100, lossy=False), nothing to do (region=rectangle(0, 962, 1128, 1), refresh regions=[])
Yet, after about 130ms, we're back to refreshing the same list again. Without seeing any damage(..) events in between.

That's because the video area shrank and we then request the areas that are no longer managed as video to get refreshed... and the method fires repeatedly instead of just once. The minimal fix for this is r18747.
This bug affected all versions that have video region support. (some recent optimizations may have made it easier to hit)

r18748 is a more complete code update in this area and r18749 is another related minor improvement.

totaam · 2018-03-28T16:57:47Z

2018-03-28 16:57:47: maxmylyn commented

Meant to close this one but somehow I never got around to it. I'll just blame that on being retasked.

Anyways, post r18749, I no longer see the flickering or excessive webp refreshes. I do see some fairly consistent small updates after a short time in some cases, but I can't rule out odd Firefox or Chrome painting behavior as being a culprit, and it doesn't seem to have any notable impact on my session like I was seeing before.

Closing

totaam closed this as completed Mar 28, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refresh bug in certain applications #1781

Refresh bug in certain applications #1781

totaam commented Mar 2, 2018

totaam commented Mar 2, 2018

totaam commented Mar 2, 2018

totaam commented Mar 2, 2018

totaam commented Mar 2, 2018

totaam commented Mar 5, 2018

totaam commented Mar 5, 2018

totaam commented Mar 6, 2018

totaam commented Mar 6, 2018

totaam commented Mar 6, 2018

totaam commented Mar 10, 2018

totaam commented Mar 10, 2018

totaam commented Mar 13, 2018

totaam commented Mar 14, 2018

totaam commented Mar 16, 2018

totaam commented Mar 16, 2018

totaam commented Mar 16, 2018

totaam commented Mar 16, 2018

totaam commented Mar 16, 2018

totaam commented Mar 16, 2018

totaam commented Mar 16, 2018

totaam commented Mar 16, 2018

totaam commented Mar 17, 2018

totaam commented Mar 28, 2018

Refresh bug in certain applications #1781

Refresh bug in certain applications #1781

Comments

totaam commented Mar 2, 2018

2018-03-02 17:32:11: maxmylyn created the issue

totaam commented Mar 2, 2018

2018-03-02 17:33:04: maxmylyn uploaded file 1781freshinfo.txt (149.2 KiB)

totaam commented Mar 2, 2018

2018-03-02 17:33:19: maxmylyn uploaded file 1781buggedinfo.txt (150.8 KiB)

totaam commented Mar 2, 2018

2018-03-02 17:41:19: maxmylyn uploaded file 1781refreshcompresstrimmed.log (325.2 KiB)

totaam commented Mar 2, 2018

2018-03-02 17:41:45: maxmylyn uploaded file 1781log.tar (20.0 KiB)

totaam commented Mar 5, 2018

2018-03-05 16:14:32: antoine changed owner from antoine to maxmylyn

totaam commented Mar 5, 2018

2018-03-05 16:14:32: antoine commented

totaam commented Mar 6, 2018

2018-03-06 17:16:59: maxmylyn uploaded file 1781log.tar.xz (192.7 KiB)

totaam commented Mar 6, 2018

2018-03-06 17:21:03: maxmylyn commented

totaam commented Mar 6, 2018

2018-03-06 17:27:58: antoine commented

totaam commented Mar 10, 2018

2018-03-10 05:51:26: antoine commented

totaam commented Mar 10, 2018

2018-03-10 09:27:25: antoine commented

totaam commented Mar 13, 2018

2018-03-13 20:34:17: maxmylyn commented

totaam commented Mar 14, 2018

2018-03-14 00:20:18: antoine commented

totaam commented Mar 16, 2018

2018-03-16 17:13:20: maxmylyn commented

totaam commented Mar 16, 2018

2018-03-16 17:14:20: maxmylyn uploaded file 1781damagerefreshregionsubregion.log (3203.3 KiB)

totaam commented Mar 16, 2018

2018-03-16 17:51:26: antoine commented

totaam commented Mar 16, 2018

2018-03-16 17:53:01: maxmylyn commented

totaam commented Mar 16, 2018

2018-03-16 18:19:14: maxmylyn commented

totaam commented Mar 16, 2018

2018-03-16 18:19:45: maxmylyn uploaded file 1781loground2.log (1025.9 KiB)

totaam commented Mar 16, 2018

2018-03-16 18:19:58: maxmylyn uploaded file 1781loginfo.txt (170.6 KiB)

totaam commented Mar 16, 2018

2018-03-16 18:20:44: maxmylyn uploaded file 1781 paint bug.png (701.9 KiB)

totaam commented Mar 17, 2018

2018-03-17 17:27:51: antoine commented

totaam commented Mar 28, 2018

2018-03-28 16:57:47: maxmylyn commented

2018-03-02 17:33:04: maxmylyn uploaded file `1781freshinfo.txt` (149.2 KiB)

2018-03-02 17:33:19: maxmylyn uploaded file `1781buggedinfo.txt` (150.8 KiB)

2018-03-02 17:41:19: maxmylyn uploaded file `1781refreshcompresstrimmed.log` (325.2 KiB)

2018-03-02 17:41:45: maxmylyn uploaded file `1781log.tar` (20.0 KiB)

2018-03-06 17:16:59: maxmylyn uploaded file `1781log.tar.xz` (192.7 KiB)

2018-03-16 17:14:20: maxmylyn uploaded file `1781damagerefreshregionsubregion.log` (3203.3 KiB)

2018-03-16 18:19:45: maxmylyn uploaded file `1781loground2.log` (1025.9 KiB)

2018-03-16 18:19:58: maxmylyn uploaded file `1781loginfo.txt` (170.6 KiB)

2018-03-16 18:20:44: maxmylyn uploaded file `1781 paint bug.png` (701.9 KiB)