Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support blake algorithms for file hashing #993

Merged
merged 2 commits into from
Mar 10, 2020

Conversation

joshuagl
Copy link
Member

@joshuagl joshuagl commented Mar 6, 2020

Fixes issue #: N/A

Description of the changes being introduced by the pull request:

  • Don't use securesystemslib.settings.HASH_ALGORITHMS for file hashing – doing so is poor cohesion and overloading a setting meant for key hashing
  • Add a new tuf.settings.FILE_HASHING_ALGORITHM for file hashing
  • Use tuf.settings.FILE_HASHING_ALGORITHM for hashes in timestamp and targets metadata

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Using securesystemslib.settings.HASH_ALGORITHMS is undersirable, because
it binds tuf to an implementation detail of the underlying library.
In this specific instance of file hashing algorithms it's even more
undesirable because it's overloading the intended use of the setting
which is "algorithm(s) [...] used to generate key IDs".

Add a new setting tuf.settings.FILE_HASH_ALGORITHMS, with a default
value of ['sha256', 'sha512'] (that matches the current value of
securesystemslib.settings.HASH_ALGORITHMS), to be used for file
hashing operations in tuf.

Signed-off-by: Joshua Lock <[email protected]>
Timestamp.json includes a METAFILES entry for snapshot.json. METAFILES
includes HASHES:
"HASHES is the dictionary that specifies one or more hashes, including the
cryptographic hash function. For example: { "sha256": HASH, ... }."

We've been hard-coding this to a single sha256 hash, as that's the default
algorithms argument of securesystemlib.util.get_file_details() -- this
feels wrong. Change to using the new tuf.settings.FILE_HASH_ALGORITHMS
setting.

Signed-off-by: Joshua Lock <[email protected]>
@joshuagl
Copy link
Member Author

joshuagl commented Mar 6, 2020

(I closed PR #991 because I didn't want that branch name to live forever in the git history, sorry for the noise)
cc @mnm678 who was assigned to review #991 by @JustinCappos

Note: this change has been tested with the simple script in this GitHub gist: test_blake.py which requires the corresponding change in secure-systems-lab/securesystemslib#218

@lukpueh shared a list of places to check for use of hashing algorithms:

  • targets (for targets metadata) – addressed in this PR with
    1f0cb9c
  • targets (for consistent snapshot prefixes) – uses the already computed hashes in the fileinfo dict
  • targets (for hashed bins) – assigns the value of tuf.settings.DEFAULT_HASH_ALGORITHM to the module level constant HASH_FUNCTION and uses that for hashing
  • targets metadata (for snapshot metadata) – I think this uses the already computed hashes in the fileinfo dict
  • snapshot metadata (for timestamp metadata) – addressed in this PR with 930d832
  • keys (for keyid_hash_algorithms) – uses securesystemslib.settings.HASH_ALGORITHMS and often includes an awkward dance where that setting is changed before calling securesystemslib.keys.format_metadata_to_key() and then reverted. This is a prime candidate for a refactoring, but doesn't preclude this PR being merged
  • client updater – needs work, for example _get_target_hash() hard-codes the use of 'sha256' hashing algorithm

In order to unblock @woodruffw's PEP 458 implementation could we land this and create an issue to address client updater in a follow-on PR?

@lukpueh
Copy link
Member

lukpueh commented Mar 6, 2020

In order to unblock @woodruffw's PEP 458 implementation could we land this and create an issue to address client updater in a follow-on PR?

@joshuagl: Have you already identified any issues in the client updater resulting from this PR?

@joshuagl
Copy link
Member Author

joshuagl commented Mar 6, 2020

In order to unblock @woodruffw's PEP 458 implementation could we land this and create an issue to address client updater in a follow-on PR?

@joshuagl: Have you already identified any issues in the client updater resulting from this PR?

I have not. The only thing I've spotted so far is the related issue that _get_target_hash() hard-codes the hash algorithm to 'sha256', whereas I think it should be using tuf.settings.DEFAULT_HASH_ALGORITHM.

I have been able to make a simple modification to tuf/scripts/repo.py to create a repository which uses blake algorithms for filesystem hashes and successfully use tuf/scripts/client.py to fetch a target file from that repo.

@@ -102,6 +102,9 @@
# the securesystemslib external library.
DEFAULT_HASH_ALGORITHM = 'sha256'

# The hashing algorithms used to compute file hashes
FILE_HASH_ALGORITHMS = ['sha256', 'sha512']
Copy link
Contributor

@woodruffw woodruffw Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't apply directly to this PR, but could someone on the TUF team comment on best practices re: configuring tuf.settings?

After this PR, Warehouse will probably do something like this:

import tuf.settings

# ...
tuf.settings.FILE_HASH_ALGORITHMS = ['blake2b']

is this the intended configuration pattern? It feels slightly wrong to modify a third-party module global like this, but it doesn't look like any other interfaces for configuring TUF are exposed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it feels wrong, but I fear in some cases it is the only way to configure TUF. We should fix this at some point.

@mnm678
Copy link
Contributor

mnm678 commented Mar 6, 2020

This looks good as a short term fix, but in the long term I would prefer changing the requirements for keyid calculation as discussed here: #848. As currently implemented, the client has to hard code the same hash algorithm for keyids as the metadata which does not leave much room for flexibility, especially because the keyid does not need to be globally unique.

@joshuagl
Copy link
Member Author

joshuagl commented Mar 6, 2020

This looks good as a short term fix

Thanks @mnm678. This is very much intended as a short-term fix to unblock PEP 458 implementation. @lukpueh mentioned #848 and some other issues around cohesion between securesystemslib and tuf when we discussed this quick fix. I'm absolutely interested in contributing to discussions and solutions towards long-term fixes too.

@trishankatdatadog
Copy link
Member

I agree with both of you. I think a good long-term fix might be specifying the targets hashing and keyid calculation (not necessarily hashing) algorithms in the root metadata?

@mnm678
Copy link
Contributor

mnm678 commented Mar 6, 2020

I think an even simpler long term solution would be to keep the keyid calculation independent of the client. The client only needs to know that keyids are unique within a metadata file, not how they are determined. But we can have a meeting/discussion about this elsewhere.

@lukpueh
Copy link
Member

lukpueh commented Mar 10, 2020

Quick question, @joshuagl, regarding:

targets (for consistent snapshot prefixes) – uses the already computed hashes in the fileinfo dict

Are you referring to the updater ... ?
https://github.com/theupdateframework/tuf/blob/4fe29d138df02035eed9716657cfe0a76e17e178/tuf/client/updater.py#L1375-L1377

(btw. the pop seems hazardous, but that's a different story)

I couldn't find any repo tooling that helps prefixing target files with their hash digests. 🤷‍♂

@lukpueh
Copy link
Member

lukpueh commented Mar 10, 2020

I couldn't find any repo tooling that helps prefixing target files with their hash digests. 🤷‍♂

Never mind, just found it:
https://github.com/theupdateframework/tuf/blob/4fe29d138df02035eed9716657cfe0a76e17e178/tuf/repository_lib.py#L1319-L1324

@lukpueh
Copy link
Member

lukpueh commented Mar 10, 2020

Regarding

targets metadata (for snapshot metadata) – I think this uses the already computed hashes in the fileinfo dict

Rather looks like python-tuf does not include hashes (only version) in snapshot metadata. See repository_lib.generate_snapshot_metadata.

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lukpueh lukpueh merged commit ba57da0 into theupdateframework:develop Mar 10, 2020
@joshuagl joshuagl deleted the joshuagl/quickblake branch March 10, 2020 17:03
@joshuagl
Copy link
Member Author

Regarding

targets metadata (for snapshot metadata) – I think this uses the already computed hashes in the fileinfo dict

Rather looks like python-tuf does not include hashes (only version) in snapshot metadata. See repository_lib.generate_snapshot_metadata.

Seems like this could be worthy of an issue so that we can remember to add to python-tuf?

@lukpueh
Copy link
Member

lukpueh commented Mar 11, 2020

Seems like this could be worthy of an issue so that we can remember to add to python-tuf?

Good suggestion! Here we go #996

For the interested reader: The fields were removed in #469 to fix #468 because of the spec changes in #466 motivated by #465. (I just love git and GitHub! :D)

@mnm678 mnm678 mentioned this pull request Apr 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants