Placeholder issue for discussion of issues in ABP/AdGuard issue tracker -- and possible solutions #1930

gorhill · 2016-08-23T17:00:34Z

[Intentionally empty]

gorhill · 2016-08-23T17:07:43Z

Regarding issue https://issues.adblockplus.org/ticket/2278:

Being able to have a token for regex-based filters would definitely help performance. However trying to programmatically extract a token from a regex-based filter sounds scary to me, too much risk of extracting erroneous tokens.

Suggestion: create a new filter option, token=[...], which filter creators can use to assign a predefined token to the filter. The creator of a filter is best placed to figure if and what token will work to store the filter internally.

For example, this filter in EasyList:

/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script

Could simply have been written by a filter creator:

/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script,token=filenuke

ameshkov · 2016-08-23T17:44:01Z

Hey guys! I was thinking about solving this issue a while ago. Even tried to implement a simple token-extracting algorithm. I will post my ideas a bit later though.

Meanwhile, here is a list of known regexp rules:

/^(?![a-z]+\:\/+([^\/\:]+\.(il\|com\|net)\|[\.0-9]+\|([^\/\:\.]+\.)*(spot\.im\|vine\.co\|periscope\.tv\|vid\.me\|mako\.tools\|minidom\.org\|jquerymin\.org\|logidea\.info\|zoomanalytics\.co\|firstimpression\.io))\.?([\/\:]\|$))^[^\/\:\.]+\:\/+[^\/\:\.]/$third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(google\|icdn\|auto\|sport5\|smartair\|mysupermarket\|blms\|linicom)\.co\.il\.?([\/\:]\|$))^[a-z]+\:\/+[^\/\:]+\.il\.?([\/\:]\|$)/$third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/^[a-z]+\:\/+[\.0-9]+([\/\:]\|$)/$image,media,object,script,stylesheet,subdocument,third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(fbcdn\|cloudfront\|facebook\|akamaihd\|ctedgecdn\|2mdn\|uploaditnow\|edgesuite\|doubleclick\|dmcdn\|slideshare\|advsnx)\.net\.?([\/\:]\|$))^[a-z]+\:\/+[^\/\:]+\.net\.?([\/\:]\|$)/$third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(google\|facebook\|twitter\|instagram\|youtube\|jquery\|googleapis\|vicomi\|twimg\|cdninstagram\|pinterest\|pinimg\|giphy\|playbuzz\|outbrain\|ytimg\|amazonaws\|cloudflare\|gstatic\|sniperm\|dinovich\|shortaudition\|linkedin\|opinionstage\|vimeo\|vimeocdn\|dailymotion\|flickr\|staticflickr\|tumblr\|soundcloud\|scribd\|syteapi\|addthis\|addthisedge\|reddit\|disqus\|disquscdn\|apester\|qmerce\|taboola\|taboolasyndication\|google-analytics\|googletagservices\|googletagmanager\|googleadservices\|googlesyndication\|h-cdn\|scorecardresearch\|serving-sys\|bootstrapcdn\|tiviclick\|ruchlis\|hotjar\|flx1\|mxpnl\|themarker\|adnxs\|conduit\|fourtips\|makojs)\.com\.?([\/\:]\|$))^[a-z]+\:\/+[^\/\:]+\.com\.?([\/\:]\|$)/$third-party,domain=mako.co.il	EasyList Hebrew	https://github.com/AdBlockPlusIsrael/EasyListHebrew
/quang%20cao/	ABPVN List	http://abpvn.com/
/YanAds/	ABPVN List	http://abpvn.com/
/www/images/	ABPVN List	http://abpvn.com/
/ads-pic/	Adblock-Persian list	http://ideone.com/K452p
/eshop-eca/	Adblock-Persian list	http://ideone.com/K452p
/eshop98/	Adblock-Persian list	http://ideone.com/K452p
/402x192/	Adblock-Persian list	http://ideone.com/K452p
/^http://m\.autohome\.com\.cn\/[a-z0-9]{32}\//$domain=m.autohome.com.cn	ChinaList+EasyList	http://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.tt1069\.com\/(?!bbs)/$script,domain=tt1069.com	ChinaList+EasyList	http://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.iqiyi\.com\/common\/flashplayer\/[0-9]{8}/[0-9a-z]{32}.swf/$domain=iqiyi.com	ChinaList+EasyList	http://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.dnvod\.eu.*?\/[a-z0-9]{9,}\.swf/$domain=dnvod.eu	ChinaList+EasyList	http://www.adtchrome.com/extension/adt-chinalist-easylist.html
/NetInsight/text/$domain=~ads.pandora.tv\|~opt.mgoon.com	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/omniture/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/NetInsight/html/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/cgi-bin/conad.fcgi/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/acecounter/$domain=~acecounter.com	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/adNdsoft/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/wisenut/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/ad-pay/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/wp-content/plugins/google-analyticator/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/realclick/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/max-banner-ads-pro/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/RealMedia/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/bannerManager/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/autoPage/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/overture/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/wiseAd/euckr/inc/$subdocument	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/NetInsight/js/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/scrap_logs/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/banner_event/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/images/adpresso/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/AdBanner/	Korean Adblock List	https://github.com/gfmaster/adblock-korea-contrib
/cdsbData_gal/bannerFile/$image,domain=mybogo.net\|zipbogo.net	List-KR	https://list-kr.github.io/
/nad/media/	List-KR	https://list-kr.github.io/
/ajrotator/	Filtros Nauscopicos	http://nauscopio.nireblog.com/cat/filtrado
/:\/\/(?!biuropodrozy)(?!liveblog)(?!relacje)(?!opinie)(?!zalacznik)(?!magazyn)(?!newsletter)(?!rodzinnawycieczka)(?!doladowania)(?!fantasyliga)(?!funduszeue)(?!imperiumstylu)(?!kodyrabatowe)(?!ogloszenia)(?!orangekinoletnie)(?!rekrutacja)(?!rycerzeiksiezniczki)(?!speedwaymanager)(?!sportowefakty)(?!sportowybar)(?!talesofmagic)(?!ubezpieczenia)(?!warofdragons)(?!wiadomosci)[a-zA-Z0-9]{10,}\.wp.pl\//	Adblock polskie reguły	http://certyficate.it/polski-filtr-adblock/
/:\/\/(?!biuropodrozy)(?!liveblog)(?!relacje)(?!opinie)(?!zalacznik)(?!magazyn)(?!newsletter)(?!facet)(?!wyleczto)(?!kuchnia)(?!film)(?!moto)(?!gwiazdy)(?!teleshow)(?!finanse)(?!kobieta)(?!dom)(?!pogoda)(?!tech)(?!historia)(?!czat)(?!ksiazki)(?!gryonline)(?!hotele)(?!narty)(?!samoloty)(?!wycieczki)(?!hosting)(?!irlandia)(?!multikurs)(?!casino)(?!foto)(?!tech)(?!www)(?!stg)(?!doladowania)(?!fantasyliga)(?!funduszeue)(?!imperiumstylu)(?!kodyrabatowe)(?!alefolwark)(?!angielski)(?!arenamody)(?!beniamin)(?!bon)(?!bsg)(?!casino)(?!diety)(?!dlaprasy)(?!dlugi)(?!doladowania)(?!dom)(?!dysk)(?!ebiznes)(?!ebooki)(?!empire)(?!fantasyliga)(?!film)(?!fundusze)(?!ogloszenia)(?!orangekinoletnie)(?!rekrutacja)(?!rycerzeiksiezniczki)(?!speedwaymanager)(?!sportowefakty)(?!sportowybar)(?!talesofmagic)(?!ubezpieczenia)(?!warofdragons)(?!wiadomosci)(?!gazetki)(?!gry)(?!horoskop)(?!kalendarz)(?!katalog)(?!khanwars)(?!komiks)(?!konflikty)(?!kontakty)(?!korsarze)(?!kultura)(?!mini)(?!mmho)(?!mobilna)(?!morizon)(?!moto)(?!muzyka)(?!narty)(?!naryby)(?!onas)(?!orangekinoletnie)(?!piraci)(?!poczta)(?!pomoc)(?!praca)(?!profil)(?!programtv)(?!pytamy)(?!rekrutacja)(?!rss)(?!rtvagd)(?!rycerzeiksiezniczki)(?!smeet)(?!speedwaymanager)(?!szkola)(?!szukaj)(?!tech)(?!teleshow)(?!triviador)(?!turystyka)(?!twojeip)(?!ulubiency)(?!warodfragons)(?!wycieczki)(?!zdrowie)(?!zoomumba)(?!topnews)(?!erotyka)(?!dzieci)(?!fitness)(?!gielda)(?!finansomat)(?!biznes)(?!sport)[a-zA-Z0-9]{4,9}\.wp.pl\//	Adblock polskie reguły	http://certyficate.it/polski-filtr-adblock/
/commoncfm/images/microsoftxboxone/$domain=buffed.de\|gamesaktuell.de\|gamezone.de\|pcgames.de\|videogameszone.de	German filter	http://adguard.com/filters.html#german
/[a-z0-9]{32,}/$third-party,domain=picshare.ru	Russian filter	http://adguard.com/filters.html#russian
/[a-zA-Z0-9]{35,}/$script,third-party,domain=bigtorrent.org\|bigtorrents.ru\|cashtube.ru\|cmexota.ru\|dreamprogs.net\|dsvload.net\|ecsebo.ru\|enotbox.com\|faspiic.ru\|imagefile.org\|imgpay.ru\|kordonivkakino.net\|mcdownloads.ru\|mega-pic.org\|odnopolchane.net\|payforpic.ru\|pic4cash.ru\|pic4you.ru\|picclick.ru\|picforall.ru\|pics-money.ru\|pirat-pic.ru\|planeta51.com\|pronpic.org\|prons.org\|q32.ru\|rustorrents.net\|santikov.net\|sharezones.biz\|torrent-pirat.com\|unionpeer.org\|uraltrack.net\|viewy.ru\|xhamster-pic.com	Russian filter	http://adguard.com/filters.html#russian
/http:\/\/rustorka.com\/[a-z]+\.js/$domain=rustorka.com	Russian filter	http://adguard.com/filters.html#russian
/http:\/\/rustorka.com\/[a-z0-9]+\.(jpg\|gif)/$image,domain=rustorka.com	Russian filter	http://adguard.com/filters.html#russian
/[a-zA-Z0-9]{35,}/$domain=anime-free.net\|cyberpirate.me\|imgbum.net\|online-porno-hd.ru\|tecnomectrani.com	Russian filter	http://adguard.com/filters.html#russian
/[a-z0-9]{30,}/$script,third-party,domain=free-torrent.org\|free-torrents.org	Russian filter	http://adguard.com/filters.html#russian
/^http://[a-z0-9_]{15,}\.[a-z0-9-]+\.[a-z]{2,}\/.*[a-zA-Z0-9]{100,}/$object-subrequest,domain=wat.tv	Liste FR	http://adblock-listefr.com/
/^http://[a-z0-9_-]{10,}\.[a-z0-9-]+\.[a-z]{2,}\/.*?\w{30,}/$~xmlhttprequest,domain=gentside.com\|maxisciences.com\|ohmymag.com	Liste FR	http://adblock-listefr.com/
/content/stargate/$domain=hlamer.ru\|kadu.ru\|krasview.ru	RU AdList	https://code.google.com/p/ruadlist/
/output/index/$third-party,script	RU AdList	https://code.google.com/p/ruadlist/
/https?://(?!(mc\.yandex\.ru\|www\.google-analytics\.com)/)/$third-party,script,subdocument,domain=massivmebel.by	RU AdList	https://code.google.com/p/ruadlist/
/^https?://goodgame\.ru/[a-z0-9]+$/$subdocument,domain=goodgame.ru	RU AdList	https://code.google.com/p/ruadlist/
/wp-content/plugins/popup-maker/$domain=info-life.in.ua\|intermarium.com.ua\|paragraf.net.ua\|unn24.com.ua\|varota.com.ua	RU AdList	https://code.google.com/p/ruadlist/
/^https?://(?!static\.)([^.]+\.)+?fastpic\.ru[:/]/$script,domain=fastpic.ru	RU AdList	https://code.google.com/p/ruadlist/
/images/brandings/$image,domain=sc2tv.ru	RU AdList	https://code.google.com/p/ruadlist/
/default/vbanners/$domain=noi.md	RU AdList	https://code.google.com/p/ruadlist/
/branding/$subdocument,domain=fanserials.tv\|kino-filmi.net	RU AdList	https://code.google.com/p/ruadlist/
/serial_adv_files/$image,domain=xn--80aacbuczbw9a6a.xn--p1ai\|куражбамбей.рф	RU AdList	https://code.google.com/p/ruadlist/
/^https?://(?!www\.)([^.]+\.)+?(kordonivkakino\.net\|m(ac-torrent-download\.net\|oviki\.ru))[:/]/$script	RU AdList	https://code.google.com/p/ruadlist/
/popupclick/$popup	RU AdList	https://code.google.com/p/ruadlist/
/http://[a-zA-Z0-9]+\.[a-z]+\/.(?:[!"#$%&'()+,:;<=>?@/\^_`{\|}~-]).*[a-zA-Z0-9]+/$script,third-party,domain=keezmovies.com\|redtube.com\|tube8.com\|tube8.es\|tube8.fr\|www.pornhub.com\|youporn.com	EasyList	https://easylist.github.io/
/\/[0-9].\-.\-[a-z0-9]{4}/$script,xmlhttprequest,domain=gaytube.com\|keezmovies.com\|spankwire.com\|tube8.com\|tube8.es\|tube8.fr	EasyList	https://easylist.github.io/
/\.sharesix\.com/.*[a-zA-Z0-9]{4}/$script	EasyList	https://easylist.github.io/
/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script	EasyList	https://easylist.github.io/
/^http://m\.autohome\.com\.cn\/[a-z0-9]{32}\//$domain=m.autohome.com.cn	EasyList China	http://abpchina.org/forum/
/^http://www\.iqiyi\.com\/common\/flashplayer\/[0-9]{8}/[0-9a-z]{32}.swf/$domain=iqiyi.com	EasyList China	http://abpchina.org/forum/
/^http://www\.dnvod\.eu.*?\/[a-z0-9]{9,}\.swf/$domain=dnvod.eu	EasyList China	http://abpchina.org/forum/
/^http://www\.tt1069\.com\/(?!bbs)/$script,domain=tt1069.com	EasyList China	http://abpchina.org/forum/
/ulightbox/$domain=hdkinomax.com\|tvfru.net	RU AdList: BitBlock	https://code.google.com/p/ruadlist/
/http://cdn[0-9]\.spiegel\.de/images/image-([^-]+)-[^-]+-[^-]+-(?!\1)[^-]+\.jpg/$image,domain=spiegel.de	EasyList Germany	https://easylist.github.io/

ameshkov · 2016-08-23T17:45:38Z

Please note the number of rules which are mistakenly made regexp-type.

kzar · 2016-08-23T17:48:13Z

@gorhill I've not been involved in that issue so far, so just done a quick bit of reading. I might get some things wrong.

While I agree that grabbing a keyword from the regexp seems scary, I'm not sure how the suggested token option would help. Take your filenuke example, there the automatic keyword would have been "filenuke" anyway.

Now if you think of a more advanced example which matches one of two possible domains, what would you put for the token option? If you chose to use parts of one of the domain as a keyword you'd end up not matching the other domain. Instead you'd have to omit the token option, which would end up as the same result as the automatic approach. (Since they mention that those kind of strings should be ignored.)

kzar · 2016-08-23T17:51:20Z

(I wonder if we could copy the content blocking approach of compiling all these regular expressions into a finite state machine? That could be a way to make matching regular expression filters faster without worrying about keywords.)

ameshkov · 2016-08-23T18:51:30Z

(I wonder if we could copy the content blocking approach of compiling all these regular expressions into a finite state machine? That could be a way to make matching regular expression filters faster without worrying about keywords.)

This would be an overkill
In order to do it they have restricted regular expressions support to a very limited subset.

gorhill · 2016-08-23T20:32:50Z

Take your filenuke example

Yes, bad example. Here is another one found in EasyList:

/\/[0-9].*\-.*\-[a-z0-9]{4}/$script,xmlhttprequest,domain=gaytube.com|keezmovies.com|spankwire.com|tube8.com|tube8.es|tube8.fr

Not sure if a token was available for this one -- whoever created the filter knows, but mainly my point is that token= option, would be an easy low-tech way available immediately (easy implementation) to deal with this, with no need for a regex parser (which would fail anyway with the filter here). If no token is present for untokenizable filter, then we just end up with the current behavior.

ameshkov · 2016-08-23T21:49:32Z

Let's first think about what issue we are trying to solve.

First of all, domain-restricted filters are not a problem as there is no influence on the overall performance.

I suppose, that what we really need is to reduce the negative impact of the mistakes made by filters authors. For instance, the filters like /ajrotator/ and such. There is no problems with extracting a token from a rule like this.

Here is just a dirty example of a token extracting function:

var extractToken = function(ruleText) {

    // Get the regexp text
    var reText = ruleText.match(/\/(.*)\/(\$.*)?/)[1];

    var specialCharacter = "...";

    if (reText.indexOf('(?') >= 0 || reText.indexOf('(!?') >= 0) {
        // Do not mess with complex expressions which use lookahead
        return null;
    }

    // (Dirty) prepend specialCharacter for the following replace calls to work properly
    reText = specialCharacter + reText;

    // Strip all types of brackets
    reText = reText.replace(/[^\\]\(.*[^\\]\)/, specialCharacter);
    reText = reText.replace(/[^\\]\[.*[^\\]\]/, specialCharacter);
    reText = reText.replace(/[^\\]\{.*[^\\]\}/, specialCharacter); 

    // Strip some special characters
    reText = reText.replace(/[^\\]\\[a-zA-Z]/, specialCharacter); 

    // Split by special characters
    var parts = reText.split(/[\\^$*+?.()|[\]{}]/);
    var token = "";
    var iParts = parts.length;
    while (iParts--) {
        var part = parts[iParts];
        if (part.length > token.length) {
            token = part;
        }
    }

    return token;
};

I've tried this function with the rules above and here is the result:
https://ameshkov.github.io/web/regex-tokens.html?1

What for the token proposition, here are the downsides I see:

It does not solve the issue with regex filters created by mistake.
Complex rules which cannot be tokenized are rare. There are only 2 such filters in EasyList and both are domain-restricted.
No backward compatibility, filters with unknown options will be ignored by old versions. Also, for instance, getadblock guys aren't invited to our party so it could be a surprise for them.

gorhill · 2016-08-23T21:55:52Z

getadblock guys aren't invited to our party

They are using ABP's filtering engine since AdBlock v3.0. See https://github.com/kzar/watchadblock/releases/tag/3.0.

ameshkov · 2016-08-23T21:57:31Z

The other points still stand though:)

gorhill · 2016-08-24T00:07:57Z

I wasn't aware of the many erroneous regex filters, looks like this can be easily addressed with a trivial code for these cases.

Mainly it was just to throw an idea out there, since these untokenizable filters have always bothered me[1], and I knew there was an issue like this opened on ABP issue tracker -- so I just threw the idea out there to have an easy fix, worth only if actually used by filter list maintainers.

Anyway, I will just use this issue here to throw ideas once in a while which I think might be good for all blockers[2], especially when it comes to make the life of filter list maintainers easier.

[1] I was looking to even skip testing for domain hit -- but this is an implementation-dependent detail I suppose
[2] I understand that when a filter syntax is not supported by ABP, EasyList et al. maintainers won't use it.

ameshkov · 2016-08-24T08:47:16Z

[2] I understand that when a filter syntax is not supported by ABP, EasyList et al. maintainers won't use it.

By the way, I'd like to raise a question about the non-standard syntax.

You have recently added a couple of pseudo-classes extending element hiding rules syntax. I am talking about :has(), :xpath(), :matches-css [1] and such.

The idea is really great and we will support some of these extended selectors as well (:has() and :contains() are currently in the beta testing stage, :matches-css() is coming).

However, there is one issue that bothers me. The syntax you use (pseudo-classes syntax) is not backward-compatible and it will break good old stylesheet-based ad blockers like Adguard and ABP.

/* browser will ignore the whole style due to the second selector */
#banner, #banner:has(.test) { display: none; }

I suggest introducing a backward-compatible syntax along with the modern pseudo-classes-based one.

Backward compatible synonym for :has(...) will be [-ext-has="..."]
Backward compatible synonym for :matches-css(...) will be [-ext-matches-css="..."]
Backward compatible synonym for :xpath(...) will be [-ext-xpath="..."]

[1] As I understand, there is a backward compatible :matches-css() option already: https://issues.adblockplus.org/ticket/2390

kzar · 2016-08-24T09:28:58Z

You have recently added a couple of pseudo-classes extending element hiding rules syntax. I am talking about :has() ...

FWIW We are working towards adding the :has selectors too https://issues.adblockplus.org/ticket/3143

Anyway, I will just use this issue here to throw ideas once in a while which I think might be good for all blockers[2], especially when it comes to make the life of filter list maintainers easier.

👍 Please do, I think collaboration benefits us all.

ameshkov · 2016-08-24T09:34:01Z

@kzar so, what do you think about the backward compatible syntax proposition?

ameshkov · 2016-08-24T09:52:02Z

@kzar regarding Lain's comment:

I think it's worth mentioning that :has() selector must work in combination with -abp-properties. So, filter like site.name##.block:has([-abp-properties="background: yellow"])

Using proposed syntax it could look like this:
##.block[-ext-has="*:matches-css(background: yellow)"]

kzar · 2016-08-24T10:09:28Z

@ameshkov Well I think the idea is that when browsers eventually support :has selectors those filters will be again using standard CSS selectors anyway. We only need to implement special logic for those filters in the mean time as a stop-gap. I guess it's true (and unfortunate) that the syntax will break filters for ad blockers which haven't added support for now, but I guess that's not too bad since uBlock, AdGuard and Adblock Plus all plan to support them. (Also because they are only planned to be something used as a last resort.)

As for the general point of using backward compatible syntax like you've suggested, I think it's a good idea. (We already do something like that for CSS property filters using the -abp-properties attribute.)

ameshkov · 2016-08-24T10:16:21Z

Well I think the idea is that when browsers eventually support :has selectors those filters will be again using standard CSS selectors anyway.

True. However, here is one more argument for that type of syntax. We all support a lot of different browsers (including mobile and such) and trying to use pseudo-classes syntax requires us to do it simultaneously for all the platforms. While backward-compatible syntax allows us to roll this feature out gradually.

As for the general point of using backward compatible syntax like you've suggested, I think it's a good idea. (We already do something like that for CSS property filters using the -abp-properties attribute.)

Yeah, I know, that's why I was surprised by the implementation proposed in the issue 3143.

gorhill · 2016-08-24T12:58:46Z

I suggest introducing a backward-compatible syntax along with the modern pseudo-classes-based one.

I will support the backward-compatible syntax where possible, but personally, internally I prefer using the :() syntax. I see these new operators as nodes in a processing graph, and thus being able to easily and freely combine them I see this as a requirement for the future. Example[1]:

div.red:has(div.blue:matches-css(position: fixed;):contains(allo)):contains(publicité)

It does feel to me like a backward-compatible syntax would complicate writing such filters (especially the use of quotes):

div.red[-ext-has="div.blue[-ext-matches-css=\"position: fixed;\"][-ext-contains=\"allo\"]"][-ext-contains="publicité"]

Aren't you validating element hiding filters at load time (or else using invalid CSS selector would break element hiding) so isn't true that old versions will discard filters with this new syntax? (Element:matches('div:has(span)') would throw).

[1] Ok, the example is contrived, but it's just to illustrate easily combining such filters.

ameshkov · 2016-08-24T13:10:34Z

It does feel to me like a backward-compatible syntax would complicate writing such filters (especially the use of quotes):

Yeah, frankly, when I check something, I prefer to use the newer syntax as well.

However, it's not that bad, there's no need to support it inside of a composite filter.

Here, look at this example:

div.red[-ext-has="div.blue:matches-css(position: fixed):contains(allo):contains(publicité)"]

ameshkov · 2016-08-24T13:23:31Z

Aren't you validating element hiding filters at load time (or else using invalid CSS selector would break element hiding) so isn't true that old versions will discard filters with this new syntax? (Element:matches('div:has(span)') would throw).

Nope, in fact it was all of a sudden for us:) Also there's no way we could do it in desktop and mobile versions.

ameshkov · 2016-08-24T14:23:04Z

@gorhill one more thing regarding the :matches-css(). I propose using a bit different syntax for it.

Could you please read this issue description and tell me what you think about it?
AdguardTeam/ExtendedCss#7

gorhill · 2016-08-24T17:07:03Z

Q: Why additional pseudo-classes for matching before and after

I already support selector:after:style-properties(pattern), I just extract the :after before using the selector at setup time. But I would not mind selector:style-properties-before(pattern) -- it would just make the setup code a bit simpler.

Q: Why pattern-matching?

I agree with (optional) pattern matching. Pattern-matching is not something I implemented, but I don't see a problem supporting this. For the implementation side of such filter however, I would just want to be sure its semantic does not force a very specific implementation.[1]

I suppose that using this approach we could also cover existing abp-properties rules

Note that ABP's -abp-properties has been implemented with a very different semantic in mind than something like :matches-css: to reverse lookup CSS rules. Such filters shouldn't be used directly on a set of nodes for filtering purpose. The purpose of all the filters I have been adding lately are to reduce a set of nodes (starting with one as small as possible), so the suffix part is key, to start with the smallest set of nodes possible is key for performance.

For example, a filter such as wetter.com##[-abp-properties='margin-left: 24px'], given that it has no suffix selector, would have to be tested for all elements on a page, which would just kill performance.

[1] I see using cssText as a potentially high overhead approach, so I went with the dictionary approach, to test only for the enumerated properties. a) I suspect the cssText string is generated on the fly by the browser when "getted"; b) using cssText forces the use of a regex which will apply to a potentially large string.

ameshkov · 2016-08-24T17:49:06Z

I already support selector:after:style-properties(pattern)

It may look pretty good, but it bothers me that :after in fact can't be part of a valid selector as pseudo-element cannot be selected. I suppose it could mislead a filter author.

[1] I see using cssText as a potentially high overhead approach, so I went with the dictionary approach, to test only for the enumerated properties. a) I suspect the cssText string is generated on the fly by the browser when "getted"; b) using cssText forces the use of a regex which will apply to a potentially large string.

Yep, I've run into a number of issues while implementing it. For now I've used a cross-browser function for extracting the cssText string:
https://github.com/AdguardTeam/ExtendedCss/blob/feature/issues/7/lib/style-property-matcher.js#L96

Also I agree with you on the enumerated properties approach. There's no need in building the cssText field, I will change the current implementation.

For example, a filter such as wetter.com##[-abp-properties='margin-left: 24px'],

Yeah, you're right. Also now when I know how this type of rules work, I find it a bit misleading. At least I think Lain_13 does not understand how it works.

@kzar what do you think about implementing something more "straightforward"?

ameshkov · 2016-08-24T17:57:05Z

I guess if we use the properties approach and agree on *-before/after postfix, there is no need for me to use another name for that pseudo class. matches-css, matches-css-before and matches-css-after sounds good and describes the filter behaviour very well.

gorhill · 2016-08-24T19:51:59Z

matches-css, matches-css-before and matches-css-after sounds good and describes the filter behaviour very well.

I agreed with this. This new selector, combinable with :has() is going make filter list maintainers' life easier.

ameshkov · 2016-08-24T20:16:56Z

I've updated the syntax description:
AdguardTeam/ExtendedCss#7

gorhill · 2016-08-26T13:05:56Z

Looking into this specific case this morning: uBlockOrigin/uAssets#110.

This would be solvable without exception filters if it was possible to outright remove the targeted nodes from the DOM:

finanzen.net###bodyCenter > div[id]:has(:scope > #Ads_BA_Sky):remove()

The current implicit action to take on targeted nodes is to hide them. However, being to re-style has make the job of working against anti-blocker mechanisms much easier (AdGuard support this).

Additionally, being able to remove nodes from the DOM is something I have found would take care of many other cases as well (I do believe AdGuard support this in some ways, not sure). From my point of view, being forced to whitelist network requests from 3rd-party advertisers/trackers is always the worst option, and we should extend the capabilities of cosmetic filtering (element hiding) to avoid such whitelisting.

ameshkov · 2016-08-26T14:30:19Z

Oh, you have finally faced these german wunderwaffe-anti-adblock-solutions:)
I was impressed when I saw this particular script for the first time.

Currently the easiest way to circumvent it is to inject a script like this:

Object.defineProperty(window, `UABPtracked`, { get: function() { return true; }, set: function() {} })

hemantgoyal · 2017-06-21T10:45:38Z

Why does the chrome store say it is corrupted now ? :(

bershan2 · 2017-06-21T11:25:33Z

@hemantgoyal That is a Chrome hash function bug. See https://github.com/gorhill/uBlock/issues/2720 (already fixed with a bit of padding).

gorhill · 2017-11-08T13:35:44Z

@mapx- re. https://issues.adblockplus.org/ticket/6002, see https://bugzilla.mozilla.org/show_bug.cgi?id=1415194.

mapx- · 2017-11-08T13:43:45Z

thanks:
https://issues.adblockplus.org/ticket/6002#comment:5

ameshkov · 2017-11-08T13:51:41Z

Hey guys, I've recently stumbled upon an interesting adblocking circumvention technique used by Yandex.

The thing is that they use shadow DOM to circumvent element hiding rules:
https://uploads.adguard.com/up04_5pkb0_Yandeks.png

The open root is used in the screenshot so we can get inside with a /deep/, and when a closed root is used, /deep/ cannot help us. Anyway, all the shadow piercing selectors are deprecated and will be eventually removed so we have a problem here.

Possible solution (rather ugly though):

Support ::shadow pseudo-selector to pierce inside open roots.
Override attachShadow and force all shadow roots to be open.

Have you already faced anything like that? Thoughts?

mapx- · 2017-11-08T14:07:50Z

@ameshkov
I added you here: https://issues.adblockplus.org/ticket/5318
and
https://issues.adblockplus.org/ticket/5302

perhaps it's about the same yandex stuff

ameshkov · 2017-11-08T14:19:51Z

@mapx- thank you! Yeah, it's been a while since they began their crusade and both issues are relevant.

What bothers me is that the "closed shadow root" approach seems to be a universal way to avoid elements hiding and even user stylesheets won't help us defeat it once Chrome stops supporting /deep/ and ::shadow.

gorhill · 2017-11-08T14:19:59Z

The /deep/ issue has been raised before. It's being deprecated as a valid CSS selector component in a CSS rule, but will still be valid as a CSS selector in querySelector[All] call. My understanding.

So currently, not an issue with Firefox I presume (does not support shadow stuff yet). An issue with Chrome, but can be worked around by manually hiding through querySelectorAll.

ameshkov · 2017-11-08T14:46:18Z

@gorhill

So currently, not an issue with Firefox I presume (does not support shadow stuff yet). An issue with Chrome, but can be worked around by manually hiding through querySelectorAll.

That's basically what I meant -- support either shadow or /deep/ "polyfills" just like we do with :has.

Good news is that /deep/ is able to pierce inside closed shadow roots so my the second point in my comment is redundant.

gorhill · 2017-11-09T16:34:09Z

just like we do with :has.

I just realized we can probably already just use :has for filter with /deep/?

example.com##div.container:has(/deep/ .aq)

Would this work now?

ameshkov · 2017-11-09T16:37:43Z

I just realized we can probably already just use :has for filter with /deep/?

We don't yet support it (but we definitely will), but this is a partial solution anyway.

For instance, in Yandex case they shadow contains legit elements as well so we need something like example.org##div /deep/ span:has(.banner)

gorhill · 2017-11-13T17:58:03Z

@ameshkov

The "fastest" way to perform a dynamic script injection is to use the onCommitted listener

I'm experimenting with this and this works fine so far on both Chromium and Firefox. I see 10-20ms gain in how earlier the scriptlets are injected (using tabs.executeScript), though when I measure with the number of scripts already handled (document.scripts.length), I can't see gain so far for the little I have tested.

Anyway, I want to ask why did you chose to go through messaging to inject the scriptlets rather than injecting directly using tabs.executeScript?

ameshkov · 2017-11-15T10:30:37Z

@gorhill

Anyway, I want to ask why did you chose to go through messaging to inject the scriptlets rather than injecting directly using tabs.executeScript?

As I recall, we compared both and didn't see any serious difference. Actually, in future updates, we'll migrate to tabs.insertCSS in light of the coming user-agent styles priority improvement.

mjethani · 2018-02-03T13:23:06Z

@ameshkov regarding Shadow DOM, we have user style sheets in Chromium now (works on Canary) along with the cssOrigin option to tabs.insertCSS. I'm also working on allowing extensions to access closed shadow roots. We should be in a good place soon.

ameshkov · 2018-02-04T18:30:35Z

Hi @mjethani! I'm actually keeping an eye on all the pull requests you're pushing to Chromium, and you're doing a great job, thank you!

ameshkov · 2018-03-04T23:51:38Z

Hey guys, coming at you with a new modifier idea:
AdguardTeam/AdguardBrowserExtension#961

I suppose it can benefit all the privacy-oriented subscriptions so we're planning to implement it in one of the future updates.

gorhill · 2018-03-05T11:21:56Z

@ameshkov I don't see that much privacy value in dealing with cookies alone given that data can be stored in other local storage such as localStorage, indexedDB, etc.Blocking 3rd-party cookies in browser settings should be the first step for any privacy conscious person -- this also takes care of all local storages.

ameshkov · 2018-03-05T13:43:52Z

Extending this modifier to handle localStorage sounds useful indeed, and it can be done without changing the modifier syntax.

However, I've never seen indexedDB used for tracking purpose.

ameshkov mentioned this issue Aug 24, 2016

Extract a token/shortcut from simple regex rules AdguardTeam/AdguardBrowserExtension#332

Closed

gorhill changed the title ~~Placeholder issue for discussion of issues in ABP's issue tracker -- and possible solutions~~ Placeholder issue for discussion of issues in ABP/AdGuard issue tracker -- and possible solutions Aug 26, 2016

theseanl mentioned this issue Jun 18, 2017

https://ru.investing.com/ - Missed ad AdguardTeam/AdguardFilters#5207

Closed

gorhill mentioned this issue Oct 23, 2017

<all_urls> matching rule not working pes10k/web-api-manager#14

Closed

okiehsch referenced this issue in uBlockOrigin/uAssets Dec 6, 2017

https://forums.lanik.us/viewtopic.php?f=62&t=39280&p=129024#p129024

4633bf8

ghostwords mentioned this issue Feb 11, 2018

Not being able to guarantee interception gbaptista/luminous#55

Open

alexristich mentioned this issue Feb 12, 2018

Update conditional content script injection to use faster methods EFForg/privacybadger#1865

Open

cretz mentioned this issue Jun 13, 2018

Support CSP Injection In Ad Blocking Rules cretz/doogie#73

Open

gorhill mentioned this issue Jul 7, 2018

Advanced anti-adblock techniques on many CZ local servers uBlockOrigin/uAssets#2790

Closed

ameshkov mentioned this issue Jul 20, 2018

Multiple CSP headers restriction is unclear w3c/webappsec-csp#215

Closed

gwarser mentioned this issue Dec 29, 2018

blocking cookies files uBlockOrigin/uBlock-issues#352

Closed

gorhill closed this as completed May 19, 2019

polcak mentioned this issue May 29, 2020

Fix handling of non-default levels polcak/jsrestrictor#40

Closed

gorhill mentioned this issue Nov 2, 2020

Implement $queryprune parameter uBlockOrigin/uBlock-issues#760

Closed

8 tasks

Placeholder issue for discussion of issues in ABP/AdGuard issue tracker -- and possible solutions #1930

Placeholder issue for discussion of issues in ABP/AdGuard issue tracker -- and possible solutions #1930

Comments

gorhill commented Aug 23, 2016

gorhill commented Aug 23, 2016 • edited Loading

ameshkov commented Aug 23, 2016

ameshkov commented Aug 23, 2016

kzar commented Aug 23, 2016

kzar commented Aug 23, 2016 • edited Loading

ameshkov commented Aug 23, 2016 • edited Loading

gorhill commented Aug 23, 2016 • edited Loading

ameshkov commented Aug 23, 2016 • edited Loading

gorhill commented Aug 23, 2016 • edited Loading

ameshkov commented Aug 23, 2016

gorhill commented Aug 24, 2016

ameshkov commented Aug 24, 2016 • edited Loading

kzar commented Aug 24, 2016

ameshkov commented Aug 24, 2016

ameshkov commented Aug 24, 2016 • edited Loading

kzar commented Aug 24, 2016

ameshkov commented Aug 24, 2016

gorhill commented Aug 24, 2016

ameshkov commented Aug 24, 2016 • edited Loading

ameshkov commented Aug 24, 2016

ameshkov commented Aug 24, 2016

gorhill commented Aug 24, 2016 • edited Loading

ameshkov commented Aug 24, 2016

ameshkov commented Aug 24, 2016 • edited Loading

gorhill commented Aug 24, 2016

ameshkov commented Aug 24, 2016

gorhill commented Aug 26, 2016

ameshkov commented Aug 26, 2016

hemantgoyal commented Jun 21, 2017

bershan2 commented Jun 21, 2017

gorhill commented Nov 8, 2017

mapx- commented Nov 8, 2017

ameshkov commented Nov 8, 2017 • edited Loading

mapx- commented Nov 8, 2017

ameshkov commented Nov 8, 2017

gorhill commented Nov 8, 2017

ameshkov commented Nov 8, 2017

gorhill commented Nov 9, 2017 • edited Loading

ameshkov commented Nov 9, 2017

gorhill commented Nov 13, 2017 • edited Loading

ameshkov commented Nov 15, 2017

mjethani commented Feb 3, 2018 • edited Loading

ameshkov commented Feb 4, 2018

ameshkov commented Mar 4, 2018

gorhill commented Mar 5, 2018 • edited Loading

ameshkov commented Mar 5, 2018

gorhill commented Aug 23, 2016 •

edited

Loading

kzar commented Aug 23, 2016 •

edited

Loading

ameshkov commented Aug 23, 2016 •

edited

Loading

gorhill commented Aug 23, 2016 •

edited

Loading

ameshkov commented Aug 23, 2016 •

edited

Loading

gorhill commented Aug 23, 2016 •

edited

Loading

ameshkov commented Aug 24, 2016 •

edited

Loading

ameshkov commented Aug 24, 2016 •

edited

Loading

ameshkov commented Aug 24, 2016 •

edited

Loading

gorhill commented Aug 24, 2016 •

edited

Loading

ameshkov commented Aug 24, 2016 •

edited

Loading

ameshkov commented Nov 8, 2017 •

edited

Loading

gorhill commented Nov 9, 2017 •

edited

Loading

gorhill commented Nov 13, 2017 •

edited

Loading

mjethani commented Feb 3, 2018 •

edited

Loading

gorhill commented Mar 5, 2018 •

edited

Loading