Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose SHA256 value of DownloadItem #401

Open
patrickkettner opened this issue Jun 1, 2023 · 13 comments
Open

Expose SHA256 value of DownloadItem #401

patrickkettner opened this issue Jun 1, 2023 · 13 comments
Labels
neutral: chrome Not opposed or supportive from Chrome neutral: firefox Not opposed or supportive from Firefox

Comments

@patrickkettner
Copy link
Contributor

@ericlaw1979 opened an issue on crbug about exposing SHA256 of DownloadItem. This is a fairly trivial change in chromium as the value is already calculated and stored, but were unclear if adding this functionality would be burdensome elsewhere and want to check that before discussing more internally if we want to do this.

@Rob--W
Copy link
Member

Rob--W commented Jun 7, 2023

Regardless of currently existing implementations, I don't think that it would be a good design to include this information in DownloadItem by default. It being there implies that the information should be present at all times, even for the majority of the extensions that don't use it. This increases the required complexity for browsers that implement the downloads API.

It would be more acceptable to provide the functionality through a dedicated method, e.g. a new downloads.getFileHash:

  • behind a new downloads.getFileHash permission, even if there is no permission message.
  • only meaningful for completed downloads.
  • rejects if a hash is not available.
  • to be decided: desired behavior if the file does not exist.
  • to be decided: desired behavior if the file has changed.
  • to be decided: desired behavior if the file has relocated.

Beyond the API design aspect, there is the privacy/security consideration: this functionality is essentially an oracle that emits the sha256 hash for any web content. This hash can easily be reversed to the original value when the number of possible options is relatively small. A way to counter this is to add a permission warning for the extension permission.

@Rob--W Rob--W added the follow-up: chrome Needs a response from a Chrome representative label Jun 22, 2023
@Rob--W
Copy link
Member

Rob--W commented Jun 22, 2023

follow-up: chrome Needs a response from a Chrome representative label here means: Patrick to follow up with the engineers to answer the (design) questions/considerations raised above and in the meeting.

@ericlaw1979
Copy link

ericlaw1979 commented Jun 22, 2023

  • I agree that putting this API behind a permission makes sense, and agree that most extensions, including many download-monitoring extensions will not need it.

  • From a privacy/security POV: Yes, this does reveal information about the response content, and because a malicious extension could cause any URL to be treated as a "Download", it does seem like we should probably put this behind a permission prompt at install time.

  • Making this a function rather than a property seems entirely reasonable for performance and ergonomics reasons.

  • Returning an error if the file has been moved or deleted seems entirely reasonable.

  • Behavior for cases where the file has been modified is an interesting question. To address the current use cases, the file's hash at the point of download completion is the only interesting value, so I would definitely want to deem returning that hash value, at any point in the future, regardless of subsequent modification, as "Valid behavior."

However, if there's some other client which does NOT calculate the hash in the course of its normal operation, it probably does make sense to allow the API to hash the current content of the file on disk and return that value, which would avoid introducing (potentially unnecessary) hash computations on every download completion, and avoid the complexity of checking to see if the file was modified, etc.

(An alternative approach might be to say that the getFileHash method is only valid to call in the onChanged callback when the download enters the complete state, but that would probably be non-trivial to implement and somewhat challenging to explain to developers)

@patrickkettner
Copy link
Contributor Author

@ericlaw1979 I spoke to engineering on our side, and we're open whats been discussed, but want a more formal spec. Is that something you are planning on working on?

@oliverdunk oliverdunk added neutral: chrome Not opposed or supportive from Chrome and removed follow-up: chrome Needs a response from a Chrome representative labels Jul 27, 2023
@bershanskiy
Copy link
Member

I wrote a patch for Chromium (no CL yet) just to see what the API could look like and how to use it. I came up with the following:

IDL (simplified):

namespace downloads {
    callback GetFileHashCallback = void(ArrayBuffer hash);
    [supportsPromises, permissions="downloads.getFileHash"] static void getFileHash(
        long downloadId,
        [legalValues=("SHA-256")] DOMString algorithm,
        GetFileHashCallback callback);
}

Specific API choices to consider:

  1. The API creates a new optional permission downloads.getFileHash
  2. The API adds a new method called downloads.getFileHash() which takes two required arguments, downloadId and constant string SHA-256, and an optional callback.
  3. So far, only SHA-256 is supported. To match behavior of SubtleCrypto, this argument is normalized. (E.g, it can be spelled as sha-256 or Sha-256, etc.)
  4. To match behavior of SubtleCrypto, this method returns ArrayBuffer (via callback or promise)
  5. Hash is calculated only when the download was completed and not updated if user updates the file (because that would be a privacy violation) and maintained as long as download exists in history (even if user deletes the file)*
  6. The call succeeds only if it can return a valid hash. It throws and exception in any of the following:
    • extension does not have downloads.getFileHash or downloads permission
    • specified hash algorithm is not SHA-256
    • downloadId is not valid or such download does not exist
    • download is not in "complete" state (download was paused, cancelled)
    • hash is not available (e.g., download was created on current version of Chromium which does not persist hashes across browsing sessions)

*the hash will need to be persisted to disk, which will grow serialization format a bit

Overall, the API addition seems small and I would gladly create the CLs for it. (Only two are needed: one for the API part, one for persisting hash to disk across sessions.)

@bershanskiy
Copy link
Member

@oliverdunk @patrickkettner does IDL in comment above make sense? If so, I can create a Chromium CL for this functionality this week.

@Rob--W
Copy link
Member

Rob--W commented Jan 23, 2024

In response to #401 (comment) :
I would suggest the following changes:

  • Accept only "SHA-256". Not "sha-256", "Sha-256", etc. More than one value is not strictly necessary, but would only increase the implementation complexity by having to normalize.
  • Return the hex-encoded string value instead of an ArrayBuffer of the digest. The two can trivially be converted to each other, but I expect the string value to be more ergonomic for extension developers.

Hash is calculated only when the download was completed

I'm concerned about this part of the proposal. It reintroduces the top level concern from #401 (comment) .

and not updated if user updates the file (because that would be a privacy violation) and maintained as long as download exists in history (even if user deletes the file)*

I wouldn't mind the state to be saved when available (and then consistently be available), but I not guarantee its availability in the API when the file is absent. If a hash algo other than SHA-256 were to be introduced in the future, it wouldn't be able to meet this requirement either, since the hash may be unavailable.

In response to #401 (comment) :

However, if there's some other client which does NOT calculate the hash in the course of its normal operation, it probably does make sense to allow the API to hash the current content of the file on disk and return that value, which would avoid introducing (potentially unnecessary) hash computations on every download completion, and avoid the complexity of checking to see if the file was modified, etc.

From the original crbug and here, it seems that the intent is to retrieve the hash of the downloaded file. To that end, returning the current hash by design would be undesirable. Clients may have different ways to ascertain integrity on disk (e.g. heuristics such as file size, or a different hash/checksum). Clients that don't already save SHA-256 hashes can rely on these alternative internal integrity checks to determine whether the file on disk is still the same, and if so, compute the SHA-256 hash on the fly (and reject with an error otherwise). Clients that do not have any of these integrity heuristics could either return the current hash or reject with an error.

If the source of the hash (at download time / at runtime plus integrity check / at runtime without integrity check) is important to extensions, then it may be worth adding another optional option to enable extensions to select the reliability of the hash. Alternatively, getFileHash could return a dictionary with the hash plus a description of the hash's reliability.

@bershanskiy
Copy link
Member

In response to #401 (comment):

Accept only "SHA-256". Not "sha-256", "Sha-256", etc. More than one value is not strictly necessary, but would only increase the implementation complexity by having to normalize.

I proposed case-insensitive match to follow similar precedent set by Subtle Crypto API (digest() method). Specifically, Web Cryptography API section 18.4.4. "Normalizing an algorithm" step 5 specifies "case-insensitive string match".

In my patch, implementation complexity is just upper-casing that string, and adding a few test cases. Most complexity actually resides in extra tests.

What is the Chrome's opinion on this?

Return the hex-encoded string value instead of an ArrayBuffer of the digest. The two can trivially be converted to each other, but I expect the string value to be more ergonomic for extension developers.

No strong opinion here, I just used ArrayBuffer to match Subtle Crypto API digest() method. Also, I just like the binary formats over their hex encoding. Could we get official opinion from Chrome?

I wouldn't mind the state to be saved when available (and then consistently be available), but I not guarantee its availability in the API when the file is absent. If a hash algo other than SHA-256 were to be introduced in the future, it wouldn't be able to meet this requirement either, since the hash may be unavailable.

Yes, there will be many cases when the hash is not available. Of course, I did not mean to imply that the browser should calculate the hash after saving the file (bad for performance, privacy concerns of extension detecting that the file was modified after download completed, etc.). I only suggest that the browser can calculate multiple hashes at once (just hash input stream with multiple algorithms before saving it to disk and then saving multiple hashes). The choice of desired hashes could be later specified by a separate API (for a sake of argument, downloads.setFileHashAlgorithms(['SHA-256', 'SHA-512'])).

@Rob--W
Copy link
Member

Rob--W commented Jan 24, 2024

In response to #401 (comment):

Accept only "SHA-256". Not "sha-256", "Sha-256", etc. More than one value is not strictly necessary, but would only increase the implementation complexity by having to normalize.

In my patch, implementation complexity is just upper-casing that string, and adding a few test cases. Most complexity actually resides in extra tests.

... which shows that there is a non-zero cost with no extra benefit to supporting case-insensitive algo names. The usual way of declaring supported values in an extension API is through an enum. That enum can then be used for feature detection (e.g. browser.downloads.FILE_HASH_ALGO?.SHA256) if desired.

I wouldn't mind the state to be saved when available (and then consistently be available), but I not guarantee its availability in the API when the file is absent. If a hash algo other than SHA-256 were to be introduced in the future, it wouldn't be able to meet this requirement either, since the hash may be unavailable.

Yes, there will be many cases when the hash is not available. Of course, I did not mean to imply that the browser should calculate the hash after saving the file

I agree that the hash should not be available until after download completion. I interpreted your point 5 as a requirement to have the hash computed and persisted at download completion. The rest of your latest response emphasizes this aspect again, that you would like to precompute all hashes upfront.

I only suggest that the browser can calculate multiple hashes at once (just hash input stream with multiple algorithms before saving it to disk and then saving multiple hashes). The choice of desired hashes could be later specified by a separate API (for a sake of argument, downloads.setFileHashAlgorithms(['SHA-256', 'SHA-512'])).

As described at the end of my last comment, it is not necessary to precompute all hashes upfront IF the API does not guarantee the availability of the hash when the file is unavailable. An implementation that wishes to return the hash using a previously unsupported algo could do so by reading the file, and compute the new hash and the (internally available) sha-256 hash (or whatever also is used). If the computed sha-256 hash matches the internally known hash, then we can assume the integrity of the file to be intact, and that the other hash is therefore also a representation of the downloaded data.

@david-va
Copy link

david-va commented Sep 9, 2024

@Rob--W, @oliverdunk is there an update or timeline for this item?

@oliverdunk
Copy link
Member

@Rob--W, @oliverdunk is there an update or timeline for this item?

Hi David, this is being worked on by an external contributor (@bershanskiy). There is currently an open PR but there is some Chrome specific work that needs to be done before I can get approval on the Chrome side and merge the PR. My understanding is that Anton has limited capacity right now so I expect it might not move forward in the short term unfortunately.

@david-va
Copy link

Hi @oliverdunk, thanks for the update.
It seems like the PR @bershanskiy contributed has been in approved status for a few months now.
Can you detail what is currently missing or estimate when you expect it to be ready?

@oliverdunk
Copy link
Member

Sure - while the PR has approval from Safari and Firefox, there isn't an approval from Chrome right now. Since Chrome is listed as the sponsoring browser (the browser that would drive implementation) that is something we need before we can merge it.

On the Chrome side, we'd like to get privacy and security approval for this API before we commit to accepting a patch and approve the PR. That requires a Chrome-specific API proposal document.

As mentioned, Anton has limited capacity right now so I don't expect this to move forward in the short term unfortunately. I don't want to say more than that since I don't want to speak on his behalf 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
neutral: chrome Not opposed or supportive from Chrome neutral: firefox Not opposed or supportive from Firefox
Projects
None yet
Development

No branches or pull requests

6 participants