Reduce the surface for bot detection #448

Open
englehardt opened this issue Aug 9, 2019 · 7 comments · May be fixed by #1037
Labels
enhancement Not a bug or a feature request high-priority

Comments

@englehardt
Collaborator

englehardt commented Aug 9, 2019

There are likely a number of ways to detect that we're running Firefox with Selenium/geckodriver. Back in the Selenium 2 days, these identifying signals were injected by the Selenium extension. We made some efforts to hide them (#108), but later removed those countermeasures with the upgrade to Selenium 3 because, at least at the time, Selenium 3 didn't self-identify via navigator.webdriver (#152). I'm guessing that's no longer the case. The move from Xvfb to headless mode in #426 may further increase discoverability, since in headless mode Firefox may skip loading some graphics-related components (rather than loading them fully in a virtual display).

See also:

@englehardt
Collaborator Author

It looks like WebGL doesn't work properly in headless mode:

selenium_firefox     - DEBUG    - BROWSER -1382707880: driver: JavaScript warning: https://login.taobao.com/member/login.jhtml?tpl_redirect_url=https%3A%2F%2Fwww.tmall.com&style=miniall&enup=true&newMini2=true&full_redirect=true&sub=true&from=tmall&allp=assets_css%3D3.0.10/login_pc.css&pms=1566085516350, line 450: Error: WebGL warning: getContext: Disallowing antialiased backbuffers due to blacklisting.
selenium_firefox     - DEBUG    - BROWSER -1382707880: driver: JavaScript warning: https://login.taobao.com/member/login.jhtml?tpl_redirect_url=https%3A%2F%2Fwww.tmall.com&style=miniall&enup=true&newMini2=true&full_redirect=true&sub=true&from=tmall&allp=assets_css%3D3.0.10/login_pc.css&pms=1566085516350, line 450: Error: WebGL warning: <SetDimensions>: Can't use WebGL in headless mode (https://bugzil.la/1375585).
selenium_firefox     - DEBUG    - BROWSER -1382707880: driver: JavaScript warning: https://login.taobao.com/member/login.jhtml?tpl_redirect_url=https%3A%2F%2Fwww.tmall.com&style=miniall&enup=true&newMini2=true&full_redirect=true&sub=true&from=tmall&allp=assets_css%3D3.0.10/login_pc.css&pms=1566085516350, line 450: Error: WebGL warning: <SetDimensions>: Failed to create WebGL context: WebGL creation failed:
selenium_firefox     - DEBUG    - BROWSER -1382707880: driver: * Can't use WebGL in headless mode (https://bugzil.la/1375585).
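Because WebGL context creation fails in headless mode, whether a page can create a WebGL context is itself a detection signal. Below is a minimal sketch of such a page-side probe; `webglAvailable` is a hypothetical name, and the document object is passed in as a parameter (an assumption made so the probe can be exercised with stubs outside a browser):

```javascript
// Hypothetical probe: returns true if a WebGL context can be created.
// In headless Firefox, getContext fails (see the warnings above), so a
// false result is one hint that the browser may be running headless.
function webglAvailable(doc) {
  try {
    const canvas = doc.createElement("canvas");
    // "experimental-webgl" is the legacy context name some browsers still accept.
    const gl = canvas.getContext("webgl") || canvas.getContext("experimental-webgl");
    return gl != null;
  } catch (e) {
    // Some environments throw instead of returning null; treat that as unavailable.
    return false;
  }
}

// Usage with stub documents standing in for a headed and a headless browser.
const headedDoc = { createElement: () => ({ getContext: () => ({}) }) };
const headlessDoc = { createElement: () => ({ getContext: () => null }) };
console.log(webglAvailable(headedDoc));   // true
console.log(webglAvailable(headlessDoc)); // false
```

This is a weak signal on its own: a headed browser can also fail to create a context, for example when the GPU or driver is blocklisted (compare the "blacklisting" warning above), so a page would likely combine it with other checks.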

@englehardt
Collaborator Author

Removing the webdriver attribute and anything else that reveals to websites that automation is active is a great first step. See:

We could make a test page that checks for these properties to see which ones are exposed while OpenWPM is driving Firefox with geckodriver/Selenium. Then we'll want to figure out how to remove them.

Overwriting in JS via a content script is still probably the easiest option, but it is a bit hacky. Ideally, we would patch Firefox with a build flag that allows us to disable the webdriver self-identification when running crawls; it might be as simple as adding an ifdef around this line [2]. However, it would be helpful to know whether that's the only property exposed when Marionette is enabled (i.e., when geckodriver/Selenium is used).
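A test page along these lines could start from a probe like the following. It is a sketch, not an exhaustive list: the function name `probeAutomationSignals` is hypothetical, and the signals are the ones mentioned in this thread (`navigator.webdriver` and the markers checked by the in-the-wild detection script posted further down). The window object is passed in so the probe can be tried against a stub:

```javascript
// Hypothetical test-page probe: reports which automation signals are visible
// to page scripts. The signal list is drawn from this thread and is not exhaustive.
function probeAutomationSignals(win) {
  const doc = win.document;
  const checks = {
    // true in a browser under WebDriver control
    "navigator.webdriver": () => !!win.navigator && win.navigator.webdriver === true,
    // "webdriver" attribute on the <html> element
    "html[webdriver]": () =>
      !!(doc && doc.documentElement && doc.documentElement.hasAttribute &&
         doc.documentElement.hasAttribute("webdriver")),
    "window.domAutomation": () => win.domAutomation != null,
    "window.domAutomationController": () => win.domAutomationController != null,
    "window._WEBDRIVER_ELEM_CACHE": () => win._WEBDRIVER_ELEM_CACHE != null,
    "window._phantom": () => win._phantom != null,
    "window.callPhantom": () => win.callPhantom != null,
  };
  const exposed = [];
  for (const [name, check] of Object.entries(checks)) {
    try {
      if (check()) exposed.push(name);
    } catch (e) {
      // A throwing getter is itself unusual; record it rather than hide it.
      exposed.push(name + " (threw)");
    }
  }
  return exposed;
}

// On a real test page this would be called as probeAutomationSignals(window)
// and the result reported back, e.g. rendered into the page.
const stubWindow = {
  navigator: { webdriver: true },
  document: { documentElement: { hasAttribute: () => false } },
};
console.log(probeAutomationSignals(stubWindow)); // [ 'navigator.webdriver' ]
```

Running the same page once in a manually launched Firefox and once under OpenWPM, and comparing the two lists, would show which signals automation adds.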

@Flnch

Flnch commented Oct 20, 2019

I am currently fixing the webdriver attribute so that it is set to false, as it is for a regular Firefox instance on Ubuntu. My approach is to overwrite the attribute in a JS content script.

Using Object.defineProperty() opens up a new way of identification. Therefore, I switched to another way of overwriting it. As this is a bit complex, I based my code on other code published under the GNU General Public License v3.0. Is this compatible with the license of OpenWPM? Do I need to consider anything special before opening a pull request?

Another thing: for my thesis project I will continue to fix other revealing properties as well. Should I submit them as separate pull requests or as one "big" pull request?

-Daniel
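Daniel's point that a plain Object.defineProperty() override opens a new way of identification can be illustrated with two checks a page script could run. The sketch below uses a stand-in `FakeNavigator` class (hypothetical, not the browser's `Navigator`) so it runs outside a browser; it only demonstrates the detection side and does not reproduce the alternative overwriting approach mentioned above:

```javascript
// Stand-in for the browser's Navigator interface (hypothetical, for illustration).
// In the browser, `webdriver` is an accessor on Navigator.prototype rather than
// an own property of the navigator object; this class mimics that layout.
class FakeNavigator {
  get webdriver() {
    return true; // value reported while the browser is under automation
  }
}

// Naive override: shadow the prototype getter with an own property on the instance.
function naiveOverride(nav) {
  Object.defineProperty(nav, "webdriver", { get: () => false, configurable: true });
}

// Check 1: the genuine property is inherited, so an own property hints at tampering.
function hasOwnWebdriver(nav) {
  return Object.prototype.hasOwnProperty.call(nav, "webdriver");
}

// Check 2: built-in functions stringify as "... { [native code] }", whereas a
// getter written in JavaScript exposes its source text.
function isNativeFunction(fn) {
  return /\{\s*\[native code\]\s*\}/.test(Function.prototype.toString.call(fn));
}

const nav = new FakeNavigator();
console.log(nav.webdriver, hasOwnWebdriver(nav)); // true false
naiveOverride(nav);
console.log(nav.webdriver, hasOwnWebdriver(nav)); // false true

const getter = Object.getOwnPropertyDescriptor(nav, "webdriver").get;
console.log(isNativeFunction(Math.max), isNativeFunction(getter)); // true false
```

Moving the override onto the prototype avoids the own-property check, but a getter written in page-visible JavaScript still fails the `[native code]` check unless its string form is also disguised, which is one reason a robust override takes more than a single defineProperty call.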

@englehardt
Collaborator Author

> I am currently fixing the webdriver attribute so that it is set to false, as it is for a regular Firefox instance on Ubuntu. My approach is to overwrite the attribute in a JS content script.

> Using Object.defineProperty() opens up a new way of identification. Therefore, I switched to another way of overwriting it. As this is a bit complex, I based my code on other code published under the GNU General Public License v3.0. Is this compatible with the license of OpenWPM? Do I need to consider anything special before opening a pull request?

OpenWPM is GPLv3, so that's fine. You can just reference the other codebase following this example.

> Another thing: for my thesis project I will continue to fix other revealing properties as well. Should I submit them as separate pull requests or as one "big" pull request?

Individual, self-contained PRs are best. That way we can decide whether to accept your fix for each component individually and can give you feedback as you go. We may choose not to accept some components (due to complexity, etc.), but may accept others.

@englehardt
Collaborator Author

Found in the wild: some tricks a script uses to detect various browsers (and WebDriver):

function Y() {
    try {
        // 99: PhantomJS globals
        if (null != window._phantom || null != window.callPhantom) return 99;
        // 98: automation markers ("webdriver" attribute on <html>, driver globals)
        if (document.documentElement.hasAttribute && document.documentElement.hasAttribute("webdriver") || null != window.domAutomation || null != window.domAutomationController || null != window._WEBDRIVER_ELEM_CACHE) return 98;
        // 4: Opera (Presto-era window.opera, or Chromium-based window.opr)
        if (void 0 != window.opera && void 0 != window.history.navigationMode || void 0 != window.opr && void 0 != window.opr.addons && "function" == typeof window.opr.addons.installExtension) return 4;
        // 3: Chrome (chrome.csi / chrome.loadTimes plus document.webkitHidden)
        if (void 0 != window.chrome &&
            "function" == typeof window.chrome.csi && "function" == typeof window.chrome.loadTimes && void 0 != document.webkitHidden && (1 == document.webkitHidden || 0 == document.webkitHidden)) return 3;
        // 2: Firefox (moz-prefixed window properties and InstallTrigger)
        if (void 0 != window.mozInnerScreenY && "number" == typeof window.mozInnerScreenY && void 0 != window.mozPaintCount && 0 <= window.mozPaintCount && void 0 != window.InstallTrigger && void 0 != window.InstallTrigger.install) return 2;
        // 1: Internet Explorer (uniqueID, documentMode, document.all, ActiveXObject, updateSettings)
        if (void 0 != document.uniqueID && "string" == typeof document.uniqueID && (void 0 != document.documentMode && 0 <= document.documentMode ||
                void 0 != document.all && "object" == typeof document.all || void 0 != window.ActiveXObject && "function" == typeof window.ActiveXObject) || window.document && window.document.updateSettings && "function" == typeof window.document.updateSettings) return 1;
        // CSS text-shadow support probe, used by the final check
        var b = !1;
        try {
            var c = document.createElement("p");
            c.innerText = ".";
            c.style = "text-shadow: rgb(99, 116, 171) 20px -12px 2px";
            b = void 0 != c.style.textShadow
        } catch (d) {}
        // 5: Safari-style WebKit (HTMLElement "Constructor" string or webkit-prefixed APIs); 0 otherwise
        return (0 < Object.prototype.toString.call(window.HTMLElement).indexOf("Constructor") || window.webkitAudioPannerNode &&
            window.webkitConvertPointFromNodeToPage) && b && void 0 != window.innerWidth && void 0 != window.innerHeight ? 5 : 0
    } catch (d) {
        return 0
    }
}

top_url https://www.pragmatismopolitico.com.br/
script_url https://rtbcdn.doubleverify.com/bsredirect5_internal41.js

@englehardt
Collaborator Author

Specifically relevant to WebDriver is this check:

if (document.documentElement.hasAttribute && document.documentElement.hasAttribute("webdriver") || null != window.domAutomation || null != window.domAutomationController || null != window._WEBDRIVER_ELEM_CACHE) return 98;

@englehardt
Collaborator Author

From @birdsarah: this search for IsHeadless() in mozilla-central, https://dxr.mozilla.org/mozilla-central/search?q=IsHeadless()&redirect=false, might be useful for tracking down differences between headless and headed modes.

@englehardt englehardt added enhancement Not a bug or a feature request high-priority and removed feature-request labels Nov 9, 2020
@vringar vringar linked a pull request Feb 7, 2024 that will close this issue

2 participants