Ensure that NEL does not report on requests that users do not make voluntarily. #150

Open
polcak opened this issue Jun 28, 2023 · 14 comments

Comments

@polcak

polcak commented Jun 28, 2023

This issue was originally raised in #136 and I was asked to open a separate issue.

The original paper behind NEL lists four security, privacy, and ethical principles. One of them is that NEL logs only requests users voluntarily make.

Our SECRYPT 2023 paper explains two scenarios in which the expectation fails:

Scenario 1: A user or a network operator can deploy a DNS firewall, for example, by changing the hosts file or the local DNS resolver (Špaček et al., 2019). The DNS firewall then returns invalid IP addresses for queries of blocked domains. So, for example, if a web page A includes content from a blocked domain D, the browser cannot reach the server of D because DNS returns an invalid IP address.

However, NEL policies installed before the deployment of such a DNS firewall stay in place. Hence, the browser would report to the collector of the blocked domain (if the collector is not also blocked) that the IP address of the server changed, the IP address to which the DNS firewall remaps the domain, and the IP address of the computer running behind the DNS firewall.

We argue that the user or the operator that deployed the DNS firewall took active measures against accessing the blocked domains. Despite this, NEL reveals the DNS firewall to the blocked party.

Scenario 2: Online trackers and advertising. Some data protection authorities (Information Commissioner’s Office, 2019) report that people are often unaware of the requests their browsers make when interacting with the online advertising business. Acar et al. (2020) list more examples of companies misusing online tracking to steal login credentials and other information. Suppose that one of these companies deploys NEL. We argue that if data protection authorities consider online advertisement processing “disproportionate, intrusive, and unfair” (Information Commissioner’s Office, 2019), users do not voluntarily access such services. Hence, in this scenario, NEL would track HTTP services that the users are unaware of and thus do not access voluntarily.

Acar, G., Englehardt, S., and Narayanan, A. (2020). No boundaries: data exfiltration by third parties embedded on web pages. Proceedings on Privacy Enhancing Technologies, 2020:220–238.

Information Commissioner’s Office (2019). Update report into adtech and real time bidding. https://ico.org.uk/media/about-the-ico/documents/2615156/adtech-real-time-bidding-report-201906.pdf.

Špaček, S., Laštovička, M., Horák, M., and Plesník, T. (2019). Current issues of malicious domains blocking. In 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pages 551–556.

Proposal

I suggest improving privacy considerations with a text like:

"Web pages often contain content from multiple domains. As the NEL security, privacy, and ethical principles require reporting only on HTTP transfers that users willingly started, browsers should consider limiting the parties that are able to insert NEL policies. For example, NEL could report only on the availability of the domain currently displayed in the URL bar."

@polcak
Author

polcak commented Jul 14, 2023

Scenario 3: A website (let's call it tracker.example) appears on a block list that is commonly used by DNS firewalls in the wild. As the browsers of users behind the firewall have already installed NEL policies for tracker.example, these browsers start to report many DNS failures to the operator of tracker.example.

In the beginning, the operator may be confused by the number of errors learnt from NEL. After investigation, the operator would likely learn that widespread DNS blocking of its domain has appeared in the wild. Such information would be hidden or harder to discover without NEL.

Users who actually want to block tracker.example would thereby be helping the operator of tracker.example, which does not look like something these users would do voluntarily.

@yoavweiss
Contributor

We plan to discuss this, and potentially other issues, at our annual TPAC meeting next Thursday.

@martinthomson
Member

Regarding Scenario 3, it is OK for a site to learn that a resource/fetch has failed, but I do not think that it is appropriate for a site to learn that the reason for a failure is the application of a blocklist.

I might be OK with aggregated (and differentially private) measures to count the number of instances of blocking.

@clelland
Contributor

> The original paper behind NEL lists four security, privacy, and ethical principles. One of them is that NEL logs only requests users voluntarily make.

I think that is a misreading of the original paper - the actual quote is:

> We can only collect information about requests that user agents issue when users voluntarily access services on the Web. We cannot issue requests in the background (i.e., outside of normal user activity), even though this prevents us from proactively ascertaining service reachability

and states specifically that what NEL can (should?) not do is to make connectivity checks in the background, while the user is not voluntarily accessing the Web.

While the user is actively browsing, then the requests that the user agent makes are in scope for NEL logging. This means that requests for subresources of the main document, even if those are hosted on different origins, should trigger NEL if the services which provide them are unreachable.

@polcak
Author

polcak commented Sep 15, 2023

Hello @clelland,

Yes, you have the right quote.

So we have:

  • users that voluntarily access services on the Web,
  • requests that user agents issue when users voluntarily access services on the Web,
  • collecting information about such requests.

Let's take a user who visits a service and is unaware of the tracking on that page. Their user agent fails to distinguish trackers from content that the user voluntarily accesses. That is understandable because user agents are only a piece of software. They are not able to see in advance whether such requests benefit the user or some other party. That is also the reason why tools like DNS block lists exist.

Now, you are right that strict analysis of the original paper does not prevent NEL from providing information about requests that users do not voluntarily make but their user agents execute anyway.

However, read https://www.w3.org/TR/privacy-principles/#user-agents. Can we agree that user agents should protect the interests of the users? (Yes, I know, that document is a work in progress and can change at any time.)

From those principles, I strongly believe that user agents should act in favour of the individuals and not the first party (web server operator). The user agent must also limit retention, help ensure that only strictly necessary data is collected, and require guarantees from any actor that the user agent can reasonably be aware that data is shared with. When a user agent carries out processing that is detrimental to its user's interests and instead benefits another actor, this is disloyal. Behaviour can be disloyal even if it is done at the same time as processing that is in the person's interest; what matters is that it potentially conflicts with that person's interest. Additionally, it is important to keep in mind that additional processing almost always implies additional risk.

So, NEL introduces additional processing, so it likely implies additional risk. Now, see the 3 provided scenarios. Can we agree that we identified such additional risks? Such risks imply that the user agent cannot guarantee that it would stay loyal to the user that controls the user agent.

In this light, I argue that we can actually interpret the quote in a simplified way, i.e., so that NEL should log only requests users voluntarily make. This is what a loyal agent would do.

But even if we read more from the original paper and it actually meant that all requests that the user agent makes while the user is actively browsing are in scope for NEL logging, the problem of disloyalty of the user agents stands.

  • In scenarios 1 and 3, the visited website tricks the user agent into being disloyal to the user by connecting to third-party trackers. The DNS firewall blocks such requests. NEL then makes the browser disloyal again, in the sense that it propagates information about the firewall to the collector.
  • In scenario 2, we know that the user wanted to visit the first party. However, we do not know if they wanted to interact with the third parties. Possibly it was only the user agent that was disloyal. I believe that standards bodies should address such cases.

So I change the proposal

I suggest improving privacy considerations with a text like:

"Web pages often contain content from multiple domains. User agents should be loyal to the user. However, current user agents are not sophisticated enough to guarantee that they issue only requests that align with the user's will. This means that in some deployments, such as those with DNS firewalls or embedded trackers, NEL could make user agents disloyal a second time: first, the user agent starts an HTTP transfer against the user's will, and then it reports the error or success of that request. Hence, browsers should consider limiting the parties that are able to insert NEL policies. For example, NEL could report only on the availability of the domain currently displayed in the URL bar."

@yoavweiss
Contributor

The web doesn't have a concept of "requests that the user made voluntarily". Limiting NEL to the top-level origin would significantly restrict the use-cases the API tackles without providing any user benefits.

Regarding DNS-based blocking, we reached a conclusion on the call that the server will not really have any way to distinguish between blocking-related DNS failures and "normal" DNS failures. On top of that, the fact that the server will not have this information in real time nor would it have the availability information of servers on other origins, means that the chances of retaliation are slim to none.
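
For concreteness, the restriction being debated in this thread (reporting only on the host shown in the URL bar) could be sketched roughly as below. This is only an illustration under stated assumptions: `may_report` is a hypothetical helper, the URLs are made up, and comparing exact hostnames is just one possible reading of "the domain currently displayed in the URL bar"; a real design would have to choose between origin, site, or another boundary.

```python
from urllib.parse import urlsplit


def may_report(request_url, top_level_url):
    """Hypothetical check: allow a NEL report only for the top-level host."""
    # Assumption: exact hostname match; scheme and port are ignored here.
    return urlsplit(request_url).hostname == urlsplit(top_level_url).hostname


top_level = "https://news.example/article"
first_party_ok = may_report("https://news.example/style.css", top_level)
third_party_ok = may_report("https://tracker.example/pixel.gif", top_level)
```

Under this reading, failures of subresources hosted on other hosts would never be reported, which is the loss of use cases the comment above refers to.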

@polcak
Author

polcak commented Sep 19, 2023

> Limiting NEL to the top-level origin would significantly restrict the use-cases the API tackles without providing any user benefits.

I disagree. The benefit for the user is that less information and fewer requests leave the device. Some users have data limits or pay more for higher data usage. Users might not be interested in helping parties other than the domain they visited to improve their services. Or each user might have a different reason.

Why do you suppose that users are interested in NEL in the first place?

> we reached a conclusion on the call that the server will not really have any way to distinguish between blocking-related DNS failures and "normal" DNS failures

False. A normal DNS failure (copied from https://w3c.github.io/network-error-logging/#dns-misconfiguration) looks like this:

{
  "age": 0,
  "type": "network-error",
  "url": "https://new-subdomain.example.com/",
  "body": {
    "sampling_fraction": 1.0,
    "server_ip": "",
    "protocol": "http/1.1",
    "method": "GET",
    "request_headers": {},
    "response_headers": {},
    "status_code": 0,
    "elapsed_time": 48,
    "phase": "dns",
    "type": "dns.name_not_resolved"
  }
}

But for DNS firewalls, see https://w3c.github.io/network-error-logging/#example-14 and the comment above:

{
  "age": 0,
  "type": "network-error",
  "url": "https://example.com/",
  "body": {
    "sampling_fraction": 1.0,
    "server_ip": "IP address returned by the DNS firewall, typically 127.0.0.1, 0.0.0.0 or similar",
    "protocol": "http/1.1",
    "method": "GET",
    "request_headers": {},
    "response_headers": {},
    "status_code": 0,
    "elapsed_time": 0,
    "phase": "dns",
    "type": "dns.address_changed"
  }
}

The comment, using the NEL editor's draft as template:

The user agent then tries to send a request to 127.0.0.1 (returned by DNS firewall), but isn't able to establish a connection to the localhost. The user agent still has the NEL policy in the policy cache, and would use this policy to generate a tcp.timed_out report about the failed network request. However, because the policy's received IP address (192.0.2.2) doesn't match the IP address that this request was sent to, the user agent cannot verify that the server at 127.0.0.1 is actually owned by the owners of example.com. The user agent must therefore downgrade the report to dns.address_changed.
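
A minimal sketch of the downgrade step described above, assuming a simplified in-memory policy record. `NelPolicy` and `downgrade_if_address_changed` are hypothetical names used only for illustration, not any browser's implementation; the IP addresses and report types are the ones from the comment, and the `elapsed_time` of the failed request is an invented value.

```python
from dataclasses import dataclass


@dataclass
class NelPolicy:
    origin: str
    received_ip: str  # IP address the policy was originally received from


def downgrade_if_address_changed(policy, report_body):
    """Return a copy of report_body, downgraded if the server IP is unverified."""
    body = dict(report_body)
    # DNS-phase errors are left alone; other failures are downgraded when the
    # request went to an IP other than the one the policy came from.
    if body["phase"] != "dns" and body["server_ip"] != policy.received_ip:
        body.update(phase="dns", type="dns.address_changed",
                    status_code=0, elapsed_time=0)
    return body


policy = NelPolicy(origin="https://example.com", received_ip="192.0.2.2")
failed = {"server_ip": "127.0.0.1", "phase": "connection",
          "type": "tcp.timed_out", "status_code": 0,
          "elapsed_time": 30000}  # illustrative value, not from the thread
downgraded = downgrade_if_address_changed(policy, failed)
```

Note that in this sketch the downgraded report still carries the remapped `server_ip`, which is the piece of information the scenarios above are concerned about.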

@neilstuartcraig
Contributor

I just wanted to offer a few personal thoughts/observations in the hope that they help form a more complete view, along with everyone else's views:

> > Limiting NEL to the top-level origin would significantly restrict the use-cases the API tackles without providing any user benefits.

> I disagree. The benefit for the user is that less information and fewer requests leave the device.

I agree that it's generally good for the end user to be sending less data to the internet, I personally run a content and a DNS blocker to help with this. I'm interested in data minimisation for every reason: privacy, security, environment, cost, performance.

> Some users have data limits or pay more for higher data usage.

At least in my experience (personal and working with our teams elsewhere in the world), upload bandwidth isn't typically metered in terms of cost or amount. Maybe I'm not aware of some scenarios but I've not personally seen that.

> Users might not be interested in helping parties other than the domain they visited to improve their services. Or each user might have a different reason.

I'd propose the argument that users would generally prefer that the services they use are reliable - NEL definitely helps with this as it's real-world feedback that's more or less impossible to obtain any other way. There's definitely a balance to be struck (privacy vs information), as always, but I don't think NEL is implicitly bad (and in fact we discussed removing as much of the sent data as possible at TPAC, which was broadly agreed in that conversation).

(The DNS failure scenario)

That's certainly one possibility but it's not the whole story. There are many more failure modes than that. For instance, some DNS firewalls (mine included) return NXDOMAIN (as it's a faster/more complete failure mode, IMO) rather than localhost - it would not (as far as I know, correct me if I am wrong) be possible to distinguish between a DNS Firewall returning NXDOMAIN or a resolver returning the same.
There may be other scenarios which would not be possible to disambiguate but that's the one I know of.

I don't think there's a robust way to know what a user intended to download versus what they didn't when all elements are considered so I'm more focussed on limiting the information sent to the bare minimum instead as I think that's a much simpler target which we definitely can achieve.

Cheers

@polcak
Author

polcak commented Sep 19, 2023

I know a network that limits upload.

Users of software that I develop reported that, for example, https://github.com/AdguardTeam/AdGuardHome returns 127.0.0.1 for blocked domains.

@neilstuartcraig
Contributor

> I know a network that limits upload.

Is that a limit in terms of a monthly quota? That'd be a bit of a pain for users 😭.

> Users of software that I develop reported that, for example, https://github.com/AdguardTeam/AdGuardHome returns 127.0.0.1 for blocked domains.

Oh cool, that's what I use but I configured mine to NXDOMAIN (I think the default was to return 0.0.0.0). The point I was making is just that returning 127.0.0.1 is not the only possible blocking method, thus looking for dns.address_changed with a server_ip of 127.0.0.1/0.0.0.0 is an unreliable/incomplete method, unfortunately (would've been great to have a way to be sure and thus action NEL accordingly).

Cheers
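
To illustrate why the heuristic mentioned above is incomplete, here is a rough sketch of what a collector-side check might look like. `looks_like_local_remap` is a hypothetical helper, and the two sample reports are trimmed-down versions of the report shapes quoted earlier in the thread.

```python
import ipaddress


def looks_like_local_remap(report):
    """Heuristic: dns.address_changed pointing at a loopback/unspecified IP."""
    body = report.get("body", {})
    if body.get("type") != "dns.address_changed":
        return False
    try:
        ip = ipaddress.ip_address(body.get("server_ip", ""))
    except ValueError:
        return False  # empty or unparsable server_ip
    return ip.is_loopback or ip.is_unspecified


# A firewall that remaps the blocked domain to localhost is flagged...
remapped = {"body": {"type": "dns.address_changed", "server_ip": "127.0.0.1"}}
# ...but one that answers NXDOMAIN produces an ordinary-looking DNS failure.
nxdomain = {"body": {"type": "dns.name_not_resolved", "server_ip": ""}}

flag_remapped = looks_like_local_remap(remapped)
flag_nxdomain = looks_like_local_remap(nxdomain)
```

The second report is identical in shape to a genuine resolution failure, so a check like this cannot tell NXDOMAIN-based blocking apart from a resolver returning the same answer.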

@yoavweiss
Contributor

If we are concerned about local IPs as a blocking method, we can most probably filter those from NEL reports (and consider them equivalent to NXDOMAIN). @clelland - WDYT?

@yoavweiss
Contributor

> If we are concerned about local IPs as a blocking method, we can most probably filter those from NEL reports (and consider them equivalent to NXDOMAIN). @clelland - WDYT?

Or potentially these would be connection failures, rather than DNS failures (but still, we can filter out local IPs, or even not report locally resolved IPs at all).

@LPardue

LPardue commented Sep 20, 2023

Any private IP* is potentially fair game for data minimisation.

(* for lack of time to research, e.g. the whole range of addresses in https://en.wikipedia.org/wiki/Private_network)

@clelland
Contributor

I think there's a case to be made that any IP within an RFC1918 block should be excluded (or probably more broadly, as https://wicg.github.io/private-network-access/ defines more private blocks than those).

(I do wonder if there are applications running on private networks which would benefit from NEL if it could operate entirely within that network -- if an intranet host sets a NEL policy, could it be used for network errors within that private network, so long as the endpoint was similarly within the network? Probably worth a separate github issue at this point to avoid forking the discussion here.)
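
The filtering idea from the last few comments could look roughly like the sketch below. It is only one possible reading: `LOCAL_NETWORKS`, `is_local_address`, and `minimise_report_body` are hypothetical names, the list of ranges is an assumption (loopback, unspecified, RFC 1918, and IPv6 unique-local), and rewriting the report to `dns.name_not_resolved` is just one way to treat a local answer as "equivalent to NXDOMAIN".

```python
import ipaddress

# Assumed set of "local" ranges for this sketch; a real design might use a
# broader list, such as the one in the Private Network Access draft.
LOCAL_NETWORKS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "0.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "::1/128", "::/128", "fc00::/7",
)]


def is_local_address(ip_text):
    """True if ip_text parses as an address inside one of LOCAL_NETWORKS."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False
    return any(ip in net for net in LOCAL_NETWORKS)


def minimise_report_body(report_body):
    """Return a copy of report_body that does not reveal a local server_ip."""
    body = dict(report_body)
    if is_local_address(body.get("server_ip", "")):
        body["server_ip"] = ""
        body["phase"] = "dns"
        body["type"] = "dns.name_not_resolved"
    return body


blocked = minimise_report_body(
    {"server_ip": "127.0.0.1", "phase": "dns", "type": "dns.address_changed"})
public = minimise_report_body(
    {"server_ip": "203.0.113.5", "phase": "connection", "type": "tcp.timed_out"})
```

With this rewrite, a localhost remap and an NXDOMAIN answer would produce the same report type, while reports about servers outside the listed ranges pass through unchanged.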


6 participants