Reduce the surface for bot detection #448
Comments
It looks like WebGL doesn't work properly in headless mode:
|
Removing the
We could make a test page that checks for these properties to see which ones are exposed while OpenWPM is driving Firefox with geckodriver/Selenium. Then we'll want to figure out how to remove them. Overwriting them in JS via a content script is still probably the easiest option, but is a bit hacky. Ideally, we would patch Firefox with a build flag that allows us to disable the webdriver self-identification when running crawls; it might be as simple as adding an ifdef around this line [2]. However, it would be helpful to know whether that's the only property exposed when marionette is enabled (i.e., when geckodriver/Selenium is used). |
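As a rough illustration of the content-script option, the sketch below shadows a `webdriver` getter on a navigator-like object. It assumes the flag is exposed as an accessor on the object's prototype (falling back to the object itself otherwise); `hideWebdriverFlag` is a hypothetical helper name, and in a real extension the code would likely need to run in the page's own context rather than in the content script's isolated view.

```javascript
// Hypothetical helper: make `nav.webdriver` report false.
// Assumption: the flag is an accessor on the prototype of `nav`;
// if it is not found there, the property is redefined on `nav` itself.
function hideWebdriverFlag(nav) {
  const proto = Object.getPrototypeOf(nav);
  const target = proto && "webdriver" in proto ? proto : nav;
  Object.defineProperty(target, "webdriver", {
    get: function () {
      return false;
    },
    configurable: true,
    enumerable: true,
  });
  return nav;
}
```

On a page this would be called as `hideWebdriverFlag(window.navigator)`. A determined script could still notice the override, for example by inspecting the replaced getter and seeing that it is not native code, which is part of why this approach is hacky.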
I am currently fixing the Using Another thing: For my thesis project I will also continue to fix other revealing things. Should I submit these as separate pull requests or as one "big" pull request? -Daniel |
OpenWPM is GPLv3, so that's fine. You can just reference the other codebase following this example.
Individual, self-contained PRs are best. That way we can decide whether to accept your fix for each component individually and can give you feedback as you go. We may choose not to accept some components (due to complexity, etc), but may accept others. |
Found in the wild, some tricks a script uses to detect various browsers (and webdriver):
function Y() {
try {
if (null != window._phantom || null != window.callPhantom) return 99;
if (document.documentElement.hasAttribute && document.documentElement.hasAttribute("webdriver") || null != window.domAutomation || null != window.domAutomationController || null != window._WEBDRIVER_ELEM_CACHE) return 98;
if (void 0 != window.opera && void 0 != window.history.navigationMode || void 0 != window.opr && void 0 != window.opr.addons && "function" == typeof window.opr.addons.installExtension) return 4;
if (void 0 != window.chrome &&
"function" == typeof window.chrome.csi && "function" == typeof window.chrome.loadTimes && void 0 != document.webkitHidden && (1 == document.webkitHidden || 0 == document.webkitHidden)) return 3;
if (void 0 != window.mozInnerScreenY && "number" == typeof window.mozInnerScreenY && void 0 != window.mozPaintCount && 0 <= window.mozPaintCount && void 0 != window.InstallTrigger && void 0 != window.InstallTrigger.install) return 2;
if (void 0 != document.uniqueID && "string" == typeof document.uniqueID && (void 0 != document.documentMode && 0 <= document.documentMode ||
void 0 != document.all && "object" == typeof document.all || void 0 != window.ActiveXObject && "function" == typeof window.ActiveXObject) || window.document && window.document.updateSettings && "function" == typeof window.document.updateSettings) return 1;
var b = !1;
try {
var c = document.createElement("p");
c.innerText = ".";
c.style = "text-shadow: rgb(99, 116, 171) 20px -12px 2px";
b = void 0 != c.style.textShadow
} catch (d) {}
return (0 < Object.prototype.toString.call(window.HTMLElement).indexOf("Constructor") || window.webkitAudioPannerNode &&
window.webkitConvertPointFromNodeToPage) && b && void 0 != window.innerWidth && void 0 != window.innerHeight ? 5 : 0
} catch (d) {
return 0
}
}
top_url: https://www.pragmatismopolitico.com.br/ |
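The webdriver-related branch of this script (the one returning 98) could be turned into a small probe for a test page. The sketch below takes a window-like object and reports which of those signals are present; it additionally checks `navigator.webdriver`, which the detection script itself does not look at. The function name `probeAutomationSignals` and the returned key names are made up for illustration.

```javascript
// Hypothetical probe: list automation signals visible on a window-like object.
// The first four checks mirror the "return 98" branch of the script above;
// the navigator.webdriver check is an addition.
function probeAutomationSignals(win) {
  const doc = win.document || {};
  const root = doc.documentElement;
  const nav = win.navigator || {};
  const signals = {
    webdriverAttribute: Boolean(
      root &&
        typeof root.hasAttribute === "function" &&
        root.hasAttribute("webdriver")
    ),
    domAutomation: win.domAutomation != null,
    domAutomationController: win.domAutomationController != null,
    webdriverElemCache: win._WEBDRIVER_ELEM_CACHE != null,
    navigatorWebdriver: nav.webdriver === true,
  };
  // Keep only the names of the signals that are actually exposed.
  return Object.keys(signals).filter(function (name) {
    return signals[name];
  });
}
```

On a test page, `console.log(probeAutomationSignals(window))` would print the exposed signal names. Passing the window in as a parameter keeps the function easy to exercise with stub objects outside a browser.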
Specifically relevant to webdriver are: |
from @birdsarah: https://dxr.mozilla.org/mozilla-central/search?q=IsHeadless()&redirect=false might be useful for tracking down differences between headless and headed modes |
There are likely a number of ways to identify that we're running Firefox with Selenium/geckodriver. Back in the Selenium 2 days these were injected by the Selenium extension. We made some efforts to prevent that (#108). We later removed them with the upgrade to Selenium 3 because, at least at the time, Selenium 3 didn't self-identify via navigator.webdriver (#152). I'm guessing that's no longer the case. The move to headless mode from XVFB in #426 may further increase discoverability, since Firefox may skip loading some graphics-related things (rather than load fully in a virtual environment). See also: