move content guesser data to a user editable file #1950

totaam · 2018-09-09T03:34:59Z

Issue migrated from trac ticket # 1950

component: encodings | priority: major | resolution: fixed

2018-09-09 03:34:59: antoine created the issue

Move the hard-coded definitions added in r17665 (#1699).

The default should go in /usr/share/xpra, but we should support per-user overrides. (~/local/share/xpra)

The file should support regular expressions.
So we can match:

gmail and ensure this is tagged as "browser":
_NET_WM_NAME(UTF8_STRING) = "Inbox (7) - [email protected] - Gmail - Google Chrome"
youtube as "video":
WM_NAME(UTF8_STRING) = "(9) "SomeTitle - YouTube - Google Chrome"
etc

The text was updated successfully, but these errors were encountered:

totaam · 2018-09-09T06:26:34Z

2018-09-09 06:26:34: antoine commented

Done in r20364 + r20365.

We ship a set of default configuration files and users can add their own.

@maxmylyn: we should improve those definitions to cover more of the top websites (see r20365 for format - those are python regular expressions), you can find the values to match using xpra info or just using xprop on the browser window.

We should take advantage of this hint more: #1952.

totaam · 2018-09-10T19:23:56Z

2018-09-10 19:23:56: maxmylyn commented

I can take a stab at this, but it's going to be very very time-consuming.

I'll bet we can find a shortcut somewhere by using other application's compatibility lists. That is, we could find a list of youtube-dl's supported sites and put those into a giant list. Or we could do something similar for a list of all news sites, etc etc. And that's just for a website list, and that doesn't even begin to tackle the thousands of desktop applications.

totaam · 2018-09-11T05:14:20Z

2018-09-11 05:14:20: antoine commented

I can take a stab at this, but it's going to be very very time-consuming.
No-one is asking you to catalogue every app or every website.
Focusing on the top 10 or top 20 can already give you 80% of page views / application used.

we could find a list of youtube-dl's supported sites and put those into a giant list
Not sure what "youtube-dl" means here.

that doesn't even begin to tackle the thousands of desktop applications.
Again, the top 10 is what matters, a generic solution is now in #1956

totaam · 2018-09-21T22:20:59Z

2018-09-21 22:20:59: maxmylyn commented

I did a first-pass using [https://moz.com/top500] as guidance, I'll attach my 30_title.conf for some feedback as I'm still not 100% sure about it. I think I've got a handle on how the Python Regex works, but Regex really is not a strong suit of mine. At all..

Not sure what "youtube-dl" means here.

youtube-dl is a utility that can download videos from thousands of supported sites. I was theorizing we could take its compatibility list as a shortcut for grabbing nearly every major video site on the internet. But, that would also include hundreds of adult websites, so I'm not sure how we as a project feel about that. (I know I don't care, but I'm not the Xpra project)

But, more importantly, that would also catch a number of sites that should be primarily defined as picture sites rather than video.

totaam · 2018-09-21T22:21:29Z

2018-09-21 22:21:29: maxmylyn uploaded file `30_title.conf` (0.9 KiB)

revised 30_title.conf

totaam · 2018-09-22T06:36:59Z

2018-09-22 06:36:59: antoine commented

I'm still not 100% sure about it

it says "top 20", but the half of those sites are above that header line - just drop it altogether, it will change over time anyway

no need to repeat "a no-go for the same reason as" for every site this applies to, just group them together in a commented out section

comments were not supported unless the whole line started with "#" (r20498 changes that) so I assume that you didn't actually test any of this? (ie: verifying the content-type for the window with "xpra info")

but Regex really is not a strong suit of mine
Here is a one liner:
python -c 'import re;print(bool(re.search("- Gmail -", "Inbox - Gmail - Whatever")))'
The upstream doc is here: python 2.7 re module

that would also include hundreds of adult websites
That's not really a problem.
also catch a number of sites that should be primarily defined as picture sites rather than video
But this is.

And it would make the file unnecessarily long. Let's keep it small and simple.

What content types are allowed?
It's free text.

I see text, picture, and video,
Those are the only 3 content-types that the encoding heuristics currently know about.

but are there more?
No, but we can add more if needed.

As in, what's the complete list of acceptable content types for sites.
FYI: content-types are for window content, sites happen to be a type of window content seen through a browser.

For example, a site like Facebook and Instagram have both pictures and videos - so assigning one or the other will have an impact on the other type of content that appears thereon. Something like "mixed" is probably the right way to classify them.
No.
Use text if the priority is to get crisp text quickly, use picture or video if the text content is secondary. You should still get crisp text, just sometimes not as quickly, and the graphical content will compress and look better.
So I would use "text" for facebook and "picture" for instagram.

totaam · 2018-09-24T22:58:48Z

2018-09-24 22:58:48: maxmylyn commented

No, I haven't actually tested this to make sure it's working; I was more concerned with getting the proper site content matched to the site than I was making sure the updated conf worked. If that train of thought makes sense.

No.
Use text if the priority is to get crisp text quickly, use picture or video if the text content is secondary. You should still get crisp text, just sometimes not as quickly, and the graphical content will compress and look better.

Okay that was my thought process as well, I just wanted to make sure that was actually the case.

I touched it up and removed all the comments. I'll post it to this ticket for your approval.

totaam · 2018-09-24T22:59:12Z

2018-09-24 22:59:12: maxmylyn uploaded file `30_title.2.conf` (0.5 KiB)

Updated 30_title.conf with comments and unnecessary text removed.

totaam · 2018-09-25T06:18:37Z

2018-09-25 06:18:37: antoine commented

Applied in r20525.

See also #2023

totaam · 2019-01-09T15:35:41Z

2019-01-09 15:35:41: antoine commented

Improvement merged from #2100: r21302 allows users to specify content type definitions using the XPRA_CONTENT_TYPE_DEFS env var.

totaam · 2019-01-18T17:36:47Z

2019-01-18 17:36:47: antoine commented

I believe this has been tested by nathan_lstc as part of #2100 / r21302. So closing.

totaam closed this as completed Jan 18, 2019

totaam added the v2.3.x label Jan 22, 2021

totaam added encoding enhancement New feature or request labels Jan 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

move content guesser data to a user editable file #1950

move content guesser data to a user editable file #1950

totaam commented Sep 9, 2018

totaam commented Sep 9, 2018

totaam commented Sep 10, 2018

totaam commented Sep 11, 2018

totaam commented Sep 21, 2018

totaam commented Sep 21, 2018

totaam commented Sep 22, 2018

totaam commented Sep 24, 2018

totaam commented Sep 24, 2018

totaam commented Sep 25, 2018

totaam commented Jan 9, 2019

totaam commented Jan 18, 2019

move content guesser data to a user editable file #1950

move content guesser data to a user editable file #1950

Comments

totaam commented Sep 9, 2018

2018-09-09 03:34:59: antoine created the issue

totaam commented Sep 9, 2018

2018-09-09 06:26:34: antoine commented

totaam commented Sep 10, 2018

2018-09-10 19:23:56: maxmylyn commented

totaam commented Sep 11, 2018

2018-09-11 05:14:20: antoine commented

totaam commented Sep 21, 2018

2018-09-21 22:20:59: maxmylyn commented

totaam commented Sep 21, 2018

2018-09-21 22:21:29: maxmylyn uploaded file 30_title.conf (0.9 KiB)

totaam commented Sep 22, 2018

2018-09-22 06:36:59: antoine commented

totaam commented Sep 24, 2018

2018-09-24 22:58:48: maxmylyn commented

totaam commented Sep 24, 2018

2018-09-24 22:59:12: maxmylyn uploaded file 30_title.2.conf (0.5 KiB)

totaam commented Sep 25, 2018

2018-09-25 06:18:37: antoine commented

totaam commented Jan 9, 2019

2019-01-09 15:35:41: antoine commented

totaam commented Jan 18, 2019

2019-01-18 17:36:47: antoine commented

2018-09-21 22:21:29: maxmylyn uploaded file `30_title.conf` (0.9 KiB)

2018-09-24 22:59:12: maxmylyn uploaded file `30_title.2.conf` (0.5 KiB)