Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update for FIPS Compliance #3291

Merged
merged 11 commits into from
Dec 22, 2023
Merged

Update for FIPS Compliance #3291

merged 11 commits into from
Dec 22, 2023

Conversation

ccravens
Copy link
Contributor

@ccravens ccravens commented Dec 21, 2023

I need to run altair in a FIPS compliance environment and currently unable to due to the use of md5 as a hashing algorithm. I've updated these to FIPS-compliant sha256 in the hopes that we can get altair running in a FIPS-compliant environment.

Example of error in FIPS environment:

image

@ccravens ccravens mentioned this pull request Dec 21, 2023
…on changed due to the different hash algorithm which is used for the dataset name
Copy link
Contributor

@binste binste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR and for the additional info in your comment at #3292 (comment)!

I pushed a small commit to fix the failing test. Looks good from my side! I'm also in favour of making Altair FIPS compliant. Based on my understanding of your comment and some reading, this might be the only change necessary but we can't say for sure. In your FIPS environment, can you pip install from GitHub to test this?

Waiting with the merge until @mattijn also had another look as he answered in #3292.

@ccravens
Copy link
Contributor Author

hello @binste yes I would be more than happy to test in our FIPS environment! I'll let you know how that goes so you can get feedback on that before merging. Thanks :)

@jakevdp
Copy link
Collaborator

jakevdp commented Dec 21, 2023

A comment here: these hashes are just a way to generate unique keys for data objects, and the hashes appear literally in the resulting vega-lite output. Changing from md5 to sha256 doubles the length of the hashes, which in my opinion is not an improvement, because it makes the raw specifications harder for a human to read. It also leads to much longer vega-lite viewer URLs (see the modified test in this PR for an example) which arguably leads to a worse user experience.

The concerns of FIPS don't really apply here, because we're not using the md5 hash in any way that is relevant to privacy or security. Is there any way to mark a usage of md5 to tell the FIPS compilance checker that their assumptions about the code are incorrect?

@mattijn
Copy link
Contributor

mattijn commented Dec 21, 2023

Thanks for chiming in Jake! From here, I read it is not a checker but a special "FIPS compliant" Python build:

Constructors for hash algorithms that are always present in this module are sha1(), sha224(), sha256(), sha384(), sha512(), sha3_224(), sha3_256(), sha3_384(), sha3_512(), shake_128(), shake_256(), blake2b(), and blake2s(). md5() is normally available as well, though it may be missing or blocked if you are using a rare “FIPS compliant” build of Python.

Could we maybe use a non-cryptographic hash function that still produces hashes, but not doubles the length?

@jakevdp
Copy link
Collaborator

jakevdp commented Dec 21, 2023

Sure, a non-cryptographic hash would be fine. The history here (if I recall correctly) is that we previously used a counter within the runtime, but this occasionally produced collisions in notebooks that had chart outputs from different Python sessions. Then we switched to hash to avoid these collisions, but due to the fact that hash has a different salt in each runtime, users reported frustrations that re-running a notebook (in a CI job, say) would change the value of the outputs. Then we switched to md5, because it's a convenient hash function that produces consistent results across runs.

If we can find an alternative that produces short hashes that are consistent across runs, then that should be fine. Maybe slicing off the first 32 characters of the sha256 hash would be a good solution.

@ccravens
Copy link
Contributor Author

Hello all!

@mattijn you are correct, the md5 hash function is not available in a FIPS-compliant environment.

@jakevdp given that the hash value is not used for security purposes, I've updated to truncate the sha256 to 32 bytes (128-bits). This should allow it to run in FIPS-compliant environments while also maintaining the same length as an md5 hash. Obviously chance of collision is there, but still 128-bits which is very very small (practically zero)

Hope this helps!

@mattijn mattijn merged commit 5540ed9 into vega:main Dec 22, 2023
10 checks passed
@mattijn
Copy link
Contributor

mattijn commented Dec 22, 2023

I updated the tests and added an enhancement note that we are now FIPS compliant and a backward incompatible change that we moved from md5 to a truncated sha256 non-cryptographic hash. Thanks for this PR @ccravens and reviews @binste and @jakevdp! Merging.

@ccravens
Copy link
Contributor Author

Thanks @mattijn I'll install main branch on my FIPS environment and let you know how it goes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants