Ensure that NEL does not report on requests that users do not make voluntarily. #150
Scenario 3: A website (let's call it tracker.example) appears on a block list that is commonly used by DNS firewalls in the wild. Because the browsers of users behind such a firewall have already installed NEL policies for tracker.example, they start reporting many DNS failures to the operator of tracker.example. At first, the operator may be confused by the number of errors learnt from NEL; after investigation, the operator would likely learn that widespread DNS blocking of its domain has appeared in the wild. Such information would be hidden, or much harder to discover, without NEL. Helping the operator of tracker.example in this way does not look like something that users who deliberately block tracker.example want to do voluntarily.
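For context, here is a rough sketch of how such a policy ends up in a browser's policy cache in the first place. The header names and fields follow the NEL and Reporting specifications; the collector URL (https://collector.tracker.example/reports) is hypothetical. While tracker.example is still reachable, it can respond with headers like:
NEL: {"report_to": "network-errors", "max_age": 2592000, "include_subdomains": true, "failure_fraction": 1.0}
Report-To: {"group": "network-errors", "max_age": 2592000, "endpoints": [{"url": "https://collector.tracker.example/reports"}]}
The policy is cached for max_age seconds (30 days here), which is why it can outlive a later deployment of a DNS firewall or block list.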
We plan to talk about this and potentially other issues at our annual TPAC meeting next Thursday.
Regarding Scenario 3, it is OK for a site to learn that a resource/fetch has failed, but I do not think that it is appropriate for a site to learn that the reason for the failure is the application of a blocklist. I might be OK with aggregated (and differentially private) measures to count the number of instances of blocking.
I think that is a misreading of the original paper - the actual quote is
and states specifically that what NEL can (should?) not do is make connectivity checks in the background, while the user is not voluntarily accessing the Web. While the user is actively browsing, the requests that the user agent makes are in scope for NEL logging. This means that requests for subresources of the main document, even if they are hosted on different origins, should trigger NEL if the services that provide them are unreachable.
Hello @clelland, yes, you have the right quote. So we have:
Let's take a user who visits a service and is unaware of the tracking on that page. Their user agent fails to distinguish trackers from content that the user voluntarily accesses. That is understandable, because user agents are only a piece of software: they cannot see in advance whether such requests benefit the user or some other party. That is also the reason why tools like DNS block lists exist. Now, you are right that a strict reading of the original paper does not prevent NEL from providing information about requests that users do not voluntarily make but that their user agents execute anyway.
However, read https://www.w3.org/TR/privacy-principles/#user-agents. Can we agree that user agents should protect the interests of their users? (Yes, I know that document is a work in progress and can change at any time.) From those principles, I strongly believe that user agents should act in favour of the individual and not the first party (the web server operator). The user agent must also limit retention, help ensure that only strictly necessary data is collected, and require guarantees from any actor that the user agent can reasonably be aware that data is shared to. When a user agent carries out processing that is detrimental to its user's interests and instead benefits another actor, this is disloyal. Behaviour can be disloyal even if it happens alongside processing that is in the person's interest; what matters is that it potentially conflicts with that person's interest. Additionally, it is important to keep in mind that additional processing almost always implies additional risk. NEL introduces additional processing, so it likely implies additional risk.
Now, see the three scenarios provided. Can we agree that we have identified such additional risks? Those risks imply that the user agent cannot guarantee that it stays loyal to the user who controls it. In this light, I argue that we can interpret the quote in a simplified way, i.e., that NEL should log only requests users voluntarily make. This is what a loyal agent would do. But even if we read more into the original paper and it indeed meant that all requests the user agent makes while the user is actively browsing are in scope for NEL logging, the problem of disloyalty of user agents stands.
So I am changing the proposal. I suggest improving the privacy considerations with text like: "Web pages often contain content from multiple domains. User agents should be loyal to the user. However, current user agents are not sophisticated enough to guarantee that they issue only requests that align with the user's will. This means that in some deployments, such as scenarios with DNS firewalls or embedded trackers, NEL could make user agents disloyal a second time: first, the user agent starts an HTTP transfer against the user's will, and then it reports the error or success of that request. Hence, browsers should consider limiting the parties that are able to install NEL policies. For example, NEL could report only on the availability of the domain currently displayed in the URL bar."
The web doesn't have a concept of "requests that the user made voluntarily". Limiting NEL to the top-level origin would significantly restrict the use cases the API tackles without providing any user benefits. Regarding DNS-based blocking, we reached a conclusion on the call that the server will not really have any way to distinguish between blocking-related DNS failures and "normal" DNS failures. On top of that, the fact that the server will not have this information in real time, nor the availability information of servers on other origins, means that the chances of retaliation are slim to none.
I disagree. The benefit for the user is in less information and fewer requests leaving the device. Some users have data caps or pay more for higher data usage. Users might not be interested in helping parties other than the domain they visited improve their services. Or there might be a different reason for each specific user. Why do you suppose that users are interested in NEL in the first place?
False. A normal DNS failure looks like this (copied from https://w3c.github.io/network-error-logging/#dns-misconfiguration):
{
"age": 0,
"type": "network-error",
"url": "https://new-subdomain.example.com/",
"body": {
"sampling_fraction": 1.0,
"server_ip": "",
"protocol": "http/1.1",
"method": "GET",
"request_headers": {},
"response_headers": {},
"status_code": 0,
"elapsed_time": 48,
"phase": "dns",
"type": "dns.name_not_resolved"
}
}
But for DNS firewalls, see https://w3c.github.io/network-error-logging/#example-14 and the comment below the report:
{
"age": 0,
"type": "network-error",
"url": "https://example.com/",
"body": {
"sampling_fraction": 1.0,
"server_ip": "IP address returned by the DNS firewall, typically 127.0.0.1, 0.0.0.0 or similar",
"protocol": "http/1.1",
"method": "GET",
"request_headers": {},
"response_headers": {},
"status_code": 0,
"elapsed_time": 0,
"phase": "dns",
"type": "dns.address_changed"
}
}
The comment, using the NEL editor's draft as a template: The user agent then tries to send a request to 127.0.0.1 (returned by the DNS firewall), but isn't able to establish a connection to localhost. The user agent still has the NEL policy in the policy cache, and would use this policy to generate a tcp.timed_out report about the failed network request. However, because the policy's received IP address (192.0.2.2) doesn't match the IP address that this request was sent to, the user agent cannot verify that the server at 127.0.0.1 is actually owned by the owners of example.com. The user agent must therefore downgrade the report to dns.address_changed.
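To make the downgrade step concrete, here is a minimal TypeScript sketch of that check; the type and field names (NelPolicy, receivedIpAddress, and so on) are illustrative, not taken from any browser implementation or from the spec's algorithms:
// Sketch of the downgrade check described above.
interface NelPolicy {
  origin: string;
  receivedIpAddress: string; // IP the policy was received from, e.g. "192.0.2.2"
}

interface NelReportBody {
  server_ip: string;
  elapsed_time: number;
  phase: string;
  type: string;
}

function maybeDowngrade(policy: NelPolicy, requestIp: string, body: NelReportBody): NelReportBody {
  if (requestIp === policy.receivedIpAddress) {
    // The IPs match, so the user agent can attribute the failure to the
    // policy's owner and deliver the full report (e.g. tcp.timed_out).
    return body;
  }
  // The IPs differ: the user agent cannot verify that the server it talked
  // to belongs to the policy's origin, so only a DNS-level signal remains.
  return {
    ...body,
    server_ip: requestIp, // e.g. 127.0.0.1 returned by the DNS firewall
    elapsed_time: 0,
    phase: "dns",
    type: "dns.address_changed",
  };
}
The downgraded body mirrors the dns.address_changed example above: the elapsed time is zeroed and only the DNS-level signal is delivered.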
I just wanted to offer a few personal thoughts/observations in the hope that they help form a more complete view, along with everyone else's views:
I agree that it's generally good for the end user to be sending less data to the internet; I personally run a content blocker and a DNS blocker to help with this. I'm interested in data minimisation for every reason: privacy, security, environment, cost, performance.
At least in my experience (personal, and working with our teams elsewhere in the world), upload bandwidth isn't typically metered in terms of cost or amount. Maybe I'm not aware of some scenarios, but I've not personally seen that.
I'd propose the argument that users would generally prefer that the services they use are reliable - NEL definitely helps with this, as it's real-world feedback that's more or less impossible to obtain any other way. There's definitely a balance to be struck (privacy vs information), as always, but I don't think NEL is implicitly bad (and in fact we discussed removing as much of the sent data as possible at TPAC, which was broadly agreed in that conversation).
That's certainly one possibility but it's not the whole story. There are many more failure modes than that. For instance, some DNS firewalls (mine included) return
I don't think there's a robust way to know what a user intended to download versus what they didn't when all elements are considered, so I'm more focused on limiting the information sent to the bare minimum instead, as I think that's a much simpler target which we definitely can achieve. Cheers
I know of a network that limits uploads. Users of software that I develop reported that, for example, https://github.com/AdguardTeam/AdGuardHome returns 127.0.0.1 for blocked domains.
Oh cool, that's what I use, but I configured mine to
Cheers
If we are concerned about local IPs as a blocking method, we can most probably filter those from NEL reports (and consider them equivalent to NXDOMAIN). @clelland - WDYT?
Or potentially these would be connection failures, rather than DNS failures (but still, we can filter out local IPs, or even not report locally resolved IPs at all).
Any private IP* is potentially fair game for data minimisation. (*For lack of time to research: e.g. the whole range of addresses in https://en.wikipedia.org/wiki/Private_network.)
I think there's a case to be made that any IP within an RFC 1918 block should be excluded (or probably more broadly, as https://wicg.github.io/private-network-access/ defines more private blocks than those). (I do wonder if there are applications running on private networks which would benefit from NEL if it could operate entirely within that network -- if an intranet host sets a NEL policy, could it be used for network errors within that private network, so long as the endpoint was similarly within the network? Probably worth a separate GitHub issue at this point to avoid forking the discussion here.)
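A rough TypeScript sketch of the filtering idea discussed above, assuming the check runs just before a report is queued. The ranges below cover only the IPv4 loopback block, the unspecified address, and the RFC 1918 blocks; a real implementation would also need link-local ranges and the IPv6 equivalents:
// Returns true when a resolved IPv4 address is loopback, unspecified,
// or inside one of the RFC 1918 private ranges.
function isPrivateOrLocalIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    return false; // not a plain IPv4 literal; would need separate handling
  }
  const [a, b] = parts;
  return (
    a === 0 ||                           // 0.0.0.0/8 (unspecified)
    a === 127 ||                         // 127.0.0.0/8 (loopback)
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

// Treat a resolution to a private/local address like NXDOMAIN: keep the
// failure signal, drop the address itself.
function sanitizeDnsResult(resolvedIp: string): { server_ip: string; type: string } {
  if (isPrivateOrLocalIPv4(resolvedIp)) {
    return { server_ip: "", type: "dns.name_not_resolved" };
  }
  return { server_ip: resolvedIp, type: "dns.address_changed" };
}
Mapping a private or local answer to dns.name_not_resolved with an empty server_ip would make it indistinguishable from the "normal" DNS failure shown earlier in the thread.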
This issue was originally raised in #136 and I was asked to open a separate issue.
The original paper behind NEL lists four security, privacy, and ethical principles. One of them is that NEL logs only requests users voluntarily make.
Our SECRYPT 2023 paper explains two scenarios in which the expectation fails:
Scenario 1: A user or a network operator can deploy a DNS firewall, for example by changing the hosts file or the local DNS resolver (Špaček et al., 2019). Consequently, the DNS firewall returns invalid IP addresses for queries of the blocked domains. So, for example, if a web page A includes content from a blocked domain D, the browser cannot access the server of D because the DNS returns an invalid IP address.
However, NEL policies installed before the deployment of such a DNS firewall stay in place. Hence, the browser would report to the collector of the blocked domain (if the collector is not also blocked) that the IP address of the server changed, the IP address to which the DNS firewall remaps the domain, and the IP address of the computer behind the DNS firewall.
We argue that the user or the operator that deployed the DNS firewall took active measures against accessing the blocked domains. Despite this, NEL reveals the DNS firewall to the blocked party.
Scenario 2: online trackers and advertisement. Some data protection authorities (Information Commissioner's Office, 2019) report that people are often not aware that their browsers interact with the online advertisement business. Acar et al. (2020) list more examples of companies misusing online tracking to steal login credentials and other information. Suppose that one of these companies deploys NEL. We argue that if the data protection authorities claim that the online advertisement processing is "disproportionate, intrusive, and unfair" (Information Commissioner's Office, 2019), users do not voluntarily access such services. Hence, in this scenario, NEL would track HTTP services that the users are unaware of and thus do not access voluntarily.
Acar, G., Englehardt, S., and Narayanan, A. (2020). No boundaries: data exfiltration by third parties embedded on web pages. Proceedings on Privacy Enhancing Technologies, 2020:220–238.
Information Commissioner’s Office (2019). Update report into adtech and real time bidding. https://ico.org.uk/media/about-the-ico/documents/2615156/adtech-real-time-bidding-report-201906.pdf.
Špaček, S., Laštovička, M., Horák, M., and Plesník, T. (2019). Current issues of malicious domains blocking. In 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pages 551–556.
Proposal
I suggest improving privacy considerations with a text like:
"Web pages often contain content from multiple domains. As the NEL security, privacy, and ethical principles require to report on HTTP transfers that were willingly started by the users, browsers should consider limiting the parties that are able to insert NEL policies. For example, NEL could report only on the availability of the domain currently displayed in the URL bar."