
Attestation of device lifetime #15

Open
pamellaprevedel-hpm opened this issue Sep 21, 2022 · 24 comments

@pamellaprevedel-hpm

We are Allowme, a business unit of Tempest Security Intelligence, a cybersecurity company from Brazil (Latam) with more than 22 years in operation. Allowme's mission is to help companies protect the digital identities of their legitimate customers through a complete fraud-prevention platform.

Context and threat
Automation is one of the main requirements for large-scale, high-profit attacks; it has therefore become a priority from a malicious actor's point of view.

When carrying out a massive attack, fraudsters usually use browser-automation tools without a graphical interface, i.e. a headless browser (https://en.wikipedia.org/wiki/Headless_browser), typically driven by Chrome WebDriver (https://en.wikipedia.org/wiki/Selenium_(software)#Selenium_WebDriver).

However, a common characteristic of attacks of this nature is that the attacker needs to create many browser instances to execute the attack; when this is done, the created browsers generally have very distinctive characteristics, as if they were installations performed at that very moment.

Proposal
Being able to attest accurately, and safely from improper manipulation, to the lifetime of a User Agent instance, from its initialization to the present moment, can be of extreme importance and value in the detection of automated threats, both on the web and on mobile devices.

On web browsers
A combination of different signals could be used to estimate the lifetime of a running browser instance, for example: the lifetime of cookies, plugin installation times, time since the last update, the lifetime of a profile associated with the browser, etc.
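As a rough illustration, the signals above could be combined conservatively by taking the oldest available artifact as a lower bound on the instance's age. All names and values here are hypothetical, not a real browser API:

```python
# Hypothetical signal ages in seconds; names are illustrative, not a real API.
def estimate_browser_lifetime(signal_ages):
    """Estimate UA lifetime as the maximum age among available signals:
    any long-lived artifact shows the instance is at least that old
    (absent tampering)."""
    ages = [age for age in signal_ages.values() if age is not None]
    return max(ages) if ages else 0

signals = {
    "oldest_cookie_age": 90 * 86400,        # 90 days
    "profile_age": 120 * 86400,             # 120 days
    "time_since_last_update": 14 * 86400,   # updated 2 weeks ago
}
print(estimate_browser_lifetime(signals) // 86400)  # -> 120 (days)
```

Taking the maximum rather than the minimum reflects that each surviving artifact is evidence the instance existed at least that long ago.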

On mobile devices
For mobile devices, knowing the OS lifetime can be even more accurate, as the hardware can indicate it, in addition to the link between the device and the manufacturer's application store (Google Play or the Apple App Store).

On Android, for example, we could use some relevant information, such as:

  • the date an App was acquired on Google Play
  • the date of installation/re-installation of the App from Google Play

However, an important decision to be made is how to treat Apps recompiled after the first installation, as this could compromise the apparent lifetime of a given App.

Privacy implications and safeguards
There is no PII data being used to calculate the lifetime of a particular device, so there is very little threat to user privacy.

However, this data could be used as an additional signal to re-identify users if combined with browsing history and other behavioral information.

Safeguard #1
The API could return only whether the lifetime exceeds a specific period, for example:

  • longer than 1 day
  • longer than 1 week
  • longer than 1 month
  • longer than 3 months
  • longer than 1 year

Thus, it would be difficult to use this data to identify a person, even when combined with other behavioral data.
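A minimal sketch of such a thresholded API, assuming the user agent knows its own lifetime in seconds (the bucket labels and function name are illustrative, not part of any proposed spec):

```python
# Illustrative threshold buckets; the API exposes only the coarse answer.
THRESHOLDS = [
    ("1 day", 86400),
    ("1 week", 7 * 86400),
    ("1 month", 30 * 86400),
    ("3 months", 90 * 86400),
    ("1 year", 365 * 86400),
]

def lifetime_bucket(lifetime_seconds):
    """Return only the largest threshold the lifetime exceeds,
    never the raw lifetime value."""
    label = None
    for name, seconds in THRESHOLDS:
        if lifetime_seconds >= seconds:
            label = name
    return label
```

Because a caller learns at most which of five buckets the device falls into, the signal carries under three bits of entropy, limiting its fingerprinting value.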

@SpaceGnome

Definitely like the idea - would be interesting if we could merge this and #9 ?

@philippp
Contributor

Attesting the lifetime of the UA instance, OS, or any other part of the system requires attesting that the UA instance / OS / etc. is indeed what it presents itself as being. #9 proposes an attestation of the time since the last cookie reset, which becomes meaningful only on systems that can attest everything up to the integrity of the cookie jar. (The intention of that issue is to not leave an attack opportunity via the UI in that end state.)

Is this in line with how you are thinking about it?

@bmayd

bmayd commented Sep 27, 2022

I like the general concept of having attestations for UA ages, as well as the suggestion that the API be limited to returning true or false when asked whether the lifetime is greater than a specified interval.

It seems that attesting to the age of a thing is generally useful and that it would be worthwhile developing an abstract pattern that could be broadly applied.

I think we might also want to attest to the fact that something is used, how recently and how often, though the latter may be more difficult to provide. An app instance installed a month ago and used for the first time an hour ago is meaningfully different from an app instance installed a month ago and used daily. Alternatively or additionally, we might provide cumulative utilization information e.g., used for longer than an hour since install or used on average more than 10 minutes/week.

I think we want to be careful about the question of privacy:

Privacy implications and safeguards
There is no PII data being used to calculate the lifetime of a particular device, so there is very little threat to user privacy.

Although no PII is used, there is meaningful potential for this API to provide information that could contribute to the development of user or subgroup profiles. For example, a list of app installs by date could be combined with responses from this API to identify the subset of users a device might belong to and repeating the process with different apps would progressively narrow the list of potential devices. Given that, we'd probably want to budget access to this sort of API.

@dvorak42
Member

We'll be briefly (3-5 minutes) going through open proposals at the Anti-Fraud CG meeting this week. If you have a short 2-4 sentence summary/slide you'd like the chairs to use when representing the proposal, please attach it to this issue otherwise the chairs will give a brief overview based on the initial post.

@AramZS

AramZS commented Oct 28, 2022

I want to note my agreement with @bmayd here:

There is no PII data being used to calculate the lifetime of a particular device, so there is very little threat to user privacy.

Lifetime data as described here would instantly become a major vector for fingerprinting.

Also, I'm very unclear on the extent to which this could prevent the signal itself from being fraudulently set by artificial browsers.

@dvorak42
Member

From the CG meeting, the question came up of how a device attests that it is not lying about its lifetime, though there was also a question of whether certain use cases are okay relying on weaker lifetime guarantees. One next step might be to pick out specific cases that work under the weaker lifetime model and see whether they are worth pursuing, or whether there is an attestable technique to manage device lifetime in a privacy-preserving way.

@michaelficarra
Member

If we're concerned about the sensitivity of the lifetime, a possible solution would be to use a zero-knowledge proof like Yao's Millionaires' problem. Of course, this would still depend on device integrity to ensure that the client isn't lying.

@SamuelSchlesinger

The zero-knowledge proof idea is nice @michaelficarra. Here's a sketch of an approach that I think works: when the server first sees a client, the client requests a signed copy of the current time. When the server wants to verify that client age is older than a certain date, the client sends them a zero-knowledge proof that there exists a time which was 1. signed by the server, and 2. older than the given cutoff.
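A simplified sketch of this flow, with the zero-knowledge step elided: here the client simply reveals the signed time t, whereas the real proposal would prove t < cutoff without revealing t, and would use a blind signature rather than the symmetric HMAC used purely for illustration:

```python
import hmac
import hashlib

SERVER_KEY = b"server-signing-key"  # illustrative; a real scheme uses blind signatures

def server_issue(now: int) -> dict:
    """First visit: server signs the current time."""
    sig = hmac.new(SERVER_KEY, str(now).encode(), hashlib.sha256).hexdigest()
    return {"t": now, "sig": sig}

def server_check_age(cred: dict, cutoff: int) -> bool:
    """Later visit: verify the signature and the age claim.
    In the real proposal the client proves t < cutoff in zero knowledge
    instead of revealing t; that step is elided here."""
    expect = hmac.new(SERVER_KEY, str(cred["t"]).encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, cred["sig"]) and cred["t"] < cutoff
```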

This approach is not adversarially robust, as a bad actor can simply take the signed time and send it around to all of their different bots. So here's an attempt to mitigate that, using the anonymous rate-limiting scheme from the zkCreds paper: instead of a signed time, we get a blind signature over a random nonce X with the current time t as the public metadata. When the server wants to verify that the client age is older than a specific cutoff, the client now sends a proof as well as an output Y from a cryptographically secure PRF. The proof says that there exists such a signed X with public metadata t such that PRF(X || epoch || counter) = Y, counter < RATE_LIMIT, and t < cutoff, where epoch is based on the current time interval (the given week or month, say) and is well known to both the server and client. This way there are only RATE_LIMIT valid Y outputs per epoch, and the signed time in the browser can't be used by an unlimited number of bots in any given time window.
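The rate-limiting core of this scheme can be illustrated with HMAC standing in for the PRF. The blind signature over X and the zero-knowledge proof tying Y to it are elided; all names here are illustrative:

```python
import hmac
import hashlib

RATE_LIMIT = 5

def prf(key: bytes, msg: bytes) -> bytes:
    # HMAC-SHA256 as a stand-in for a proof-friendly PRF.
    return hmac.new(key, msg, hashlib.sha256).digest()

def client_token(X: bytes, epoch: int, counter: int) -> bytes:
    """Y = PRF(X, epoch || counter). The proof (elided) would show that
    counter < RATE_LIMIT and that X carries a signed time t < cutoff."""
    assert 0 <= counter < RATE_LIMIT
    return prf(X, f"{epoch}|{counter}".encode())

# Only RATE_LIMIT distinct valid outputs exist per client and epoch,
# so the verifier can cap presentations by tracking seen Y values.
X = b"client-secret-nonce"
epoch = 202407
tokens = {client_token(X, epoch, c) for c in range(RATE_LIMIT)}
print(len(tokens))  # -> 5
```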

@SamuelSchlesinger

@AramZS, I think my comment addresses your concern about fingerprinting by only allowing a single bit based on whether the date is before a certain threshold. The one major difference between this issue's request and what I'd endorse is that device age seems far too sticky to be consistent with the privacy goals of the web platform. Instead, I'd like to propose a user-wipeable "profile age" which would be cross-site but can be wiped by the user along with other cross-site state.

@dvorak42 I think it would make sense to discuss this in the next week's AFCG meeting -- do we have an open slot on the agenda?

@akakou

akakou commented Jun 21, 2024

@SamuelSchlesinger

Thank you for your very interesting presentation.
I would like to share some ideas that may help this proposal.

My idea is to reconstruct your idea based on Scrappy, discussed in #21.
Although Scrappy is very similar to zkCreds, it has some beneficial points:

  1. Scrappy is based on standardized cryptographic protocols (i.e., DAA, EPID).
  2. Scrappy's latency is likely lower than zkCreds' because it does not use zk-SNARKs.*
  3. The key for Scrappy can be stored in a TPM (i.e., a secure hardware chip), since Scrappy is compatible with TPMs.

*If Scrappy is run directly on the computer and not on a TPM.

@akakou

akakou commented Jun 21, 2024

By the way, the related work section of Scrappy's paper might help answer the question in your slide (i.e., alternatives to rate limiting).

[Screenshot: the related work table from the Scrappy paper]

Paper: https://www.ndss-symposium.org/ndss-paper/scrappy-secure-rate-assuring-protocol-with-privacy/

@SamuelSchlesinger

@akakou if I understand Scrappy correctly, it rate-limits to a single request within a given time window, which is significantly less flexible than allowing a rate limit per time window with no constraint on how quickly that limit can be consumed. For instance, if I open 10 tabs within a minute while scrolling through a news feed, it would be a shame if only one anonymous show were allowed.

@akakou

akakou commented Jun 25, 2024

@SamuelSchlesinger
As you mentioned, Scrappy as described in the paper has certain inconveniences. However, I think this can be overcome with your idea.

Scrappy currently uses a time window in the signing parameter (called the basename) of DAA. By extending Scrappy with your idea, we can use a concatenation of a time window and a counter in the basename instead of just the time window.

Concretely...

Before Scrappy:
$\sigma = DAA\_Sign(msg, T, sk)$

  • msg is the message
  • T is the time window
  • sk is the user's secret key
  • $\sigma$ is the signature

Extended Scrappy with your idea:
$\sigma = DAA\_Sign(msg, T||C, sk)$

  • C is the counter
  • || is concatenation

This works similarly to your rate-limiting system because DAA functions as a PRF and zero-knowledge proof.
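The determinism that makes this work can be illustrated with a toy pseudonym computation, $\mathtt{pseudonym} = H(\mathtt{basename})^{\mathtt{sk}}$, over integers modulo a prime. Real DAA uses pairing-friendly curves and a full signature of knowledge; this is only a sketch of the pseudonym's behavior:

```python
import hashlib

# Toy group: arithmetic mod a Mersenne prime, for illustration only;
# real DAA pseudonyms live on pairing-friendly elliptic curves.
P = 2**127 - 1

def H(basename: str) -> int:
    # Hash-to-group stand-in.
    return int.from_bytes(hashlib.sha256(basename.encode()).digest(), "big") % P

def pseudonym(sk: int, T: str, C: int) -> int:
    # Deterministic in (sk, basename): the same (T, C) always yields
    # the same pseudonym, so the verifier can detect counter reuse,
    # while a fresh C yields an unlinkable-looking value.
    return pow(H(f"{T}|{C}"), sk, P)

sk = 123456789
assert pseudonym(sk, "2024-W26", 0) == pseudonym(sk, "2024-W26", 0)  # deterministic
assert pseudonym(sk, "2024-W26", 0) != pseudonym(sk, "2024-W26", 1)  # new counter, new pseudonym
```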

@SamuelSchlesinger

This works similarly to your rate-limiting system because DAA functions as a PRF and zero-knowledge proof.

This makes good sense; it's better than my worse idea of adding RATE_LIMIT basenames per service and selecting a random one. Do you know how to efficiently add a proof that C < RATE_LIMIT? I think this approach would likely be more performant than a SNARK-based approach, so I am very open to it.

@akakou

akakou commented Jun 25, 2024

@SamuelSchlesinger

You mean how Scrappy proves $C < \mathtt{RATE\_LIMIT}$ under a specific epoch, right?

@akakou

akakou commented Jun 25, 2024

@SamuelSchlesinger
Honestly, I do not fully understand zkCreds yet, so the logic may be close to your ideas, I think.

We can accomplish this by the deterministic computation in $DAA\_Sign()$ .

A part of the signature (called the pseudonym) is computed deterministically from the basename (i.e., $C$ *1).
(In detail, $\mathtt{pseudonym} = H(\mathtt{basename})^{\mathtt{sk}} = H(C)^{\mathtt{sk}}$.)

In this case, the signer cannot get a valid signature with $C \ge \mathtt{RATE\_LIMIT}$ accepted, for the following reasons:

  1. If the signer chooses $C \ge \mathtt{RATE\_LIMIT}$ , the verifier can notice this from the $C$ sent by the signer.
  2. If the signer reuses $C$ which has already been sent in the past, the verifier can recognize this by checking the $\mathtt{pseudonym}$ .
    • This is because the verifier received the same $\mathtt{pseudonym}$ during the first verification.
  3. If the signer lies about $C$ , the verifier's verification will fail.
    • This is due to the DAA signatures having the (zero-knowledge) proof of $SPK\{(\mathtt{sk}): \mathtt{pseudonym} = H(C)^{\mathtt{sk}}\}$.

*1 For ease of explanation, we omit epoch $T$ but the same logic applies even if $T$ is added to the basename.
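The verifier-side checks 1–3 above could be sketched as follows, with proof verification stubbed out by a boolean (in this version of the scheme $C$ is sent in the clear; all names are illustrative):

```python
RATE_LIMIT = 3
seen_pseudonyms = set()

def verify(C: int, pseudonym: str, proof_ok: bool) -> bool:
    """Mirror of checks 1-3: counter range, pseudonym reuse, and the
    DAA signature / SPK proof (stubbed out as proof_ok)."""
    if not (0 <= C < RATE_LIMIT):
        return False          # check 1: counter out of range
    if pseudonym in seen_pseudonyms:
        return False          # check 2: pseudonym (hence C) reused this epoch
    if not proof_ok:
        return False          # check 3: proof that pseudonym = H(C)^sk fails
    seen_pseudonyms.add(pseudonym)
    return True
```

In a full implementation the `seen_pseudonyms` set would be scoped per epoch and cleared when the epoch rolls over.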

@akakou

akakou commented Jun 30, 2024

@SamuelSchlesinger

Hi! Is it all clear for you?
(If my explanation is not enough, I am happy to supplement it.)

@SamuelSchlesinger

The thing that I can't really live with is sending C to the verifier, as it creates an information leak. We need to prove that 0 <= C < RATE_LIMIT in zero knowledge, not just reveal C.

@akakou

akakou commented Jul 2, 2024

@SamuelSchlesinger
I see your concern: it would cause significant privacy concerns.

How about randomly choosing $C$?*
This mitigates the privacy concern.

*choosing $C$ under the conditions that $C < RATE\_LIMIT$ and $C$ has not been used

@akakou

akakou commented Jul 5, 2024

@SamuelSchlesinger
How about that?
(I have an additional idea if you are still concerned about privacy.)

@SamuelSchlesinger

Randomly choosing it doesn't exactly mitigate it. For instance, there is a set of clients that have used RATE_LIMIT - 1 of their choices; they have only one choice left, and that choice can be used to re-identify them. After reviewing Scrappy, I'm not convinced it is a sufficient approach for the goals we're trying to achieve here and the generalizations we're trying to head towards.

@akakou

akakou commented Jul 9, 2024

Umm... as far as I understand, to track users by $C$, the verifier needs to know the signer's remaining choices. For example, the verifier must know the remaining choice (e.g., choice $RATE\_LIMIT - 1$) and that it is the only one left.

However, I cannot imagine this situation, because the verifier cannot learn a user's remaining choices, due to Scrappy's unlinkability and the random selection of $C$. (Conversely, to learn this, the verifier would need to track users with some other scheme.)

@akakou

akakou commented Jul 9, 2024

@SamuelSchlesinger
Is this reasonable? If anything is missing, I would like to know.

@akakou

akakou commented Jul 19, 2024

@SamuelSchlesinger
How about this...?
