Use topics from a meta tag on Special Topics Provider Sites #206

dmarti · 2023-06-23T01:17:56Z

Check to see if the page is from a Special Topics Provider Site (STPS), one that hosts content on many topics (such as youtube.com). If so:

- Do not use the hostname to train the classifier.
- Check for a meta tag in the page head containing the section or channel name. Use the content of this meta tag to train the classifier instead.
- If no meta tag containing the section or channel name is found, disable Topics API on this STPS page.
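The three steps above can be sketched roughly as follows. This is only an illustration: the meta tag name `topics-section`, the `STPS_HOSTS` set, and the `classifier_input` function are hypothetical names (the proposal does not fix any of them), and stripping a leading `www.` is a simplification.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical values: the proposal names neither the meta tag nor the list format.
STPS_HOSTS = {"youtube.com"}
SECTION_META_NAME = "topics-section"


class _HeadMetaFinder(HTMLParser):
    """Collects the content of the first matching <meta> tag inside <head>."""

    def __init__(self, meta_name: str) -> None:
        super().__init__()
        self.meta_name = meta_name
        self.in_head = False
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head and self.content is None:
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == self.meta_name:
                value = (attr.get("content") or "").strip()
                self.content = value or None

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def classifier_input(hostname: str, html: str) -> Optional[str]:
    """Return the string to classify, or None if Topics is disabled for the page."""
    host = hostname.lower()
    if host.startswith("www."):  # simplification, not real site matching
        host = host[4:]
    if host not in STPS_HOSTS:
        return host  # ordinary site: classify by hostname, as today
    finder = _HeadMetaFinder(SECTION_META_NAME)
    finder.feed(html)
    return finder.content  # None means no usable meta tag, so skip this page
```

A `None` result corresponds to the third step: the page is on an STPS but has no section or channel meta tag in its head, so it contributes nothing.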

Special Topics Provider Sites could enroll, using the existing enrollment process, specifying that they want to be part of the STPS program. The browser or an independent party could crawl the site and check that the site has at least "n" pages that are classified as at least "m" different topics before adding the site to the STPS list.
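As a minimal sketch of that eligibility check, assuming the crawler yields one topic (or nothing) per crawled page: the function name `qualifies_as_stps` is hypothetical, and the thresholds `n` and `m` are left as parameters because the proposal does not set them.

```python
from typing import Iterable, Optional


def qualifies_as_stps(page_topics: Iterable[Optional[str]], n: int, m: int) -> bool:
    """page_topics has one entry per crawled page: its topic, or None if unclassified."""
    classified = [topic for topic in page_topics if topic is not None]
    # At least n classified pages, covering at least m distinct topics.
    return len(classified) >= n and len(set(classified)) >= m
```

For example, `qualifies_as_stps(["Autos", "Music", "News", None, "Autos"], n=4, m=3)` is `True`, while ten pages that all classify as "Autos" would fail the same thresholds.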

(simpler solution to achieve a large fraction of the benefits of #118 with less complexity and risk)

jkarlin · 2023-06-23T18:01:18Z

This doesn't actually address the privacy concerns from #118. Further, it picks a single site (a rather arbitrary heuristic) as opposed to applying equally web-wide, which doesn't seem particularly webby. Finally, due to filtering, there would be some benefit to all from this (global top topic selection being more refined), but one would still have to observe the user on some page with that topic in order to receive it.

dmarti · 2023-06-23T20:12:44Z

I agree that it's suboptimal to treat a single site as a special case. But as long as there is no more general approach to the YouTube problem being pursued, this would be better than nothing. Possibly other very large sites that also cover all or most topics could be special cased as well.

michaelkleber · 2023-06-23T20:40:51Z

I think this feature request should be interpreted as something like: "For some browser-chosen list of Special Topic Provider Sites, pages on those sites should be able to declare what Topics they are about, and those become available to everyone, as if every Topics caller had observed them. And also YouTube should be on that list." In this sense it's more like a restricted version of #1 than of #118.
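One way to picture that reading, as a rough sketch only: the per-caller filter lets a topic through if the caller observed it, or if it was declared by a page on an STPS. The function `topics_for_caller` and its data shapes are hypothetical and do not reflect how any browser stores this.

```python
from typing import Dict, List, Set


def topics_for_caller(
    caller: str,
    user_topics: List[str],
    observers: Dict[str, Set[str]],
    stps_declared: Set[str],
) -> List[str]:
    """Filter the user's topics down to what this caller may receive."""
    return [
        topic
        for topic in user_topics
        # STPS-declared topics count as if every caller had observed them.
        if topic in stps_declared or caller in observers.get(topic, set())
    ]
```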

I don't know that I agree with this proposal! — no idea whether YouTube would be interested in being a Special Topic Provider, no idea how we would determine what other sites should have the same special status, etc. But this version seems "tricky and subtle" rather than "impossible".

dmarti · 2023-06-24T00:48:47Z

@michaelkleber That makes a lot of sense. The list doesn't have to be browser-chosen.

- Special Topics Provider Sites could enroll, using the existing enrollment process and specifying that they want to be part of the STPS program.
- The browser or an independent party could crawl the site and check that the site has at least "n" pages that are classified as at least "m" different topics.
- If the number of pages and topics is high enough, the site is added to the STPS list.
- ~~Sites~~ Pages from sites on the STPS list are classified by the content of the appropriate meta tag, not domain.

dmarti · 2023-06-26T14:49:44Z

I have rewritten the text of this issue to cover Special Topics Provider Sites, as @michaelkleber suggested. This seems like a possible path forward considering that #118 was closed, and that there still appears to be interest in fairly classifying content from large, multi-topic sites. See p. 7 of the CMA update report on implementation of the Privacy Sandbox commitments, April 2023.

jkarlin · 2023-06-26T15:10:01Z

I think you can achieve the same effect with a default () permission policy that declares that the page would like to include something other than domain in its topics rather than needing to make changes to enrollment.

michaelkleber · 2023-06-26T15:42:53Z

Don, I see you're still hoping that the browser does the work of turning the "section or channel name" into topics, rather than letting the STPS just declare the page's topics directly. Is that distinction important to you?

It seems to me that the way to turn a YouTube channel name into a Topic could be very different from how you turn a hostname into a Topic. So it feels like this version of the proposal implicitly asks browsers to build a specialized STPS-to-Topics model for each Provider Site.

On the one hand, that seems like putting the work in the wrong place: Surely the site is in a good position to do a better job! On the other hand, you might worry that an STPS would be able to abuse this by maliciously giving out the wrong topics — but if you're letting them control the "section or channel name" input and the model is public, then surely it would be easy for them to maliciously push false topics either way.

dmarti · 2023-06-26T16:24:11Z

Hi @michaelkleber -- I don't know. On one hand, it seems like the choice of whether or not to allow sites or channels to choose their own topics should apply to both sites and channels or to neither. Some hostnames provide usable Topics API information to the classifier, and others don't. Some YouTube channel names provide usable information to the classifier, and others don't. (For example, Jalopnik dot com is about cars, but it's a made-up word so it doesn't get classified, last I checked. And the YouTube channel "LazerPig" is not about lasers or pigs. Other site and channel names have better keywords in them.) You might be able to use the same classifier for hostnames and channels/sections if STPSs had to transform the channel name into something that would be a valid hostname ("My YouTube Channel" becomes "my-youtube-channel" or similar).
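A minimal sketch of that transformation, assuming a simple ASCII-only rule: the function name `channel_to_label` is hypothetical, and the 63-character cap is the DNS limit for a single label, not something the proposal specifies.

```python
import re
import unicodedata


def channel_to_label(name: str) -> str:
    """Turn a channel or section name into a hostname-style label."""
    # Drop accents and any characters that have no ASCII equivalent.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of other characters into a single hyphen.
    label = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    # A DNS label is at most 63 characters and may not end with a hyphen.
    return label[:63].rstrip("-")
```

Under this rule "My YouTube Channel" becomes "my-youtube-channel" and "LazerPig" becomes "lazerpig", so the second case still gives the classifier nothing useful. Names written entirely in non-Latin scripts would come out empty and would need separate handling.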

On the other hand, there are relatively few STPSs and it would be fairly straightforward to spot-check how accurately they were assigning topics to each channel, so it might be fine to have STPSs pass topics directly.

@jkarlin Yes, that seems to be another workable option.

michaelkleber · 2023-06-26T18:44:24Z

> On one hand, it seems like the choice of whether or not to allow sites or channels to choose their own topics should apply to both sites and channels or to neither.

Hmm, the two questions feel quite different to me. Changing a domain name is both much harder and much more user-visible than changing an invisible meta tag on a page, for example. Using something user-visible seems like a huge contributor to maintaining quality of input data.

But a lot of this comes around to the question of what qualifications a site would need to have to be an STPS. Besides just being large and heterogeneous, if we think it would include a site being more "reputable" in some way, then perhaps that reputation would lead us to expect a lower chance of being pushed useless/fabricated topics. (OTOH would you let Reddit onto the list? Seems all-but-guaranteed that some subreddits would claim a random absurd topic for each pageview.)

dmarti · 2023-06-26T21:23:07Z

@michaelkleber Yes, I agree about the Reddit problem (one of the current best international news subreddits has a deliberately embarrassing and NSFW name in an effort to avoid ads, and they would probably pass the most embarrassing possible topics too). But there are few enough STPSs that the browser (or other STPS list maintainer) could check the privacy policy for whether it covers passing best-effort accurate topics or something else, and spot-check what the site is actually passing.

Some sites that are eligible to be STPSs will probably not see a reason to do it until some other party offers them an incentive to more accurately classify their audiences. In that case the other party will be in a position to require and check that the STPS is passing accurate topics, and the browser won't need to enforce.

michaelkleber · 2023-06-26T21:28:19Z

> the browser (or other STPS list maintainer) could check the privacy policy for whether it covers passing best-effort accurate topics or something else, and spot-check what the site is actually passing.

This strikes me as very unappealing, and we should do whatever we can to avoid ending up in that position.

dmarti · 2023-06-26T23:40:02Z

Yes, but it's less unappealing the fewer privacy policies you have to read. The number of pages and topics required for STPS status can be set high enough to keep the work on the browser (or independent evaluator) easily manageable, and not all sites eligible for STPS will apply.

jkarlin · 2023-06-28T14:15:01Z

If we were to go in the direction of allowing metadata, then it might make sense to do so in a page-level opt-in way to address privacy concerns. My primary concern there is that I imagine very few pages would opt in, as it's unclear what their incentive would be. And without a significant user base, it's hard to justify the costs of training the new model and having it sit on users' devices.

dmarti · 2023-06-28T15:21:35Z

Hi @jkarlin, yes, that's a good point. There are at least two scenarios in which a large, multi-topic site will choose page-level opt-in or STPS.

- Competition regulators require page-level opt-in and/or STPS when a company owns both a Topics API browser and a large, general-interest site that would otherwise benefit in an illegal or questionable way from domain-based Topics API training.
- Adtech intermediaries compensate large, general-interest publishers for providing additional data that they can use to increase ad revenue on other sites (in this case the intermediary is motivated to check on the site's topics, so there would be little administrative burden on the browser maintainers).

The first scenario is the one that seems to be the immediate problem. I know that either opt-ins or STPS would represent additional development work, but realistically considering the time required for browser development tasks compared to the time required for regulator and lawyer meetings, it seems to me that it's worth the additional time to implement Topics API in a way that takes some meaningful steps toward treating niche sites and YouTube channels in a comparable way.

dmarti · 2023-07-24T23:54:41Z

Added #224 to cover the opt-in suggested by @jkarlin

dmarti changed the title ~~Handle youtube.com channels as Topics API data sources~~ Use topics from a meta tag on Special Topics Provider Sites Jun 24, 2023

dmarti mentioned this issue Jun 29, 2023

Update permissions policy to support separate permissions for retrieve and observe #92

Closed

dmarti mentioned this issue Jul 24, 2023

Permissions to observe topics in page head and body #224

Open
