-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix binary array to hex #64
Fix binary array to hex #64
Conversation
1 similar comment
c8dd266
to
0e96b2c
Compare
1 similar comment
0e96b2c
to
49ee130
Compare
That looks very good already, thank you! Could you also include the problematic cases you pointed out in #51 with Lenna.png (hash_size=6, hash_size=2) as new unit tests? You can use the imagehash.png in imagehash/tests/data/, which is not copyrighted. Could you also include one or two unit tests for the old implementation, just to make sure we don't break it in the future (coverage decreased, I guess because of this). I had been flirting earlier on with the idea of using hypothesis for testing (e.g. generating new images of various sizes, making hash of some size, generating string, make a hash out of this again). Not necessary to do this here, but might be an idea. Do you think it would make sense to include bitarray (mentioned in #51) as a dependency? Or binascii.hexlify/unhexlify? Or binhex.binhex/hexbin? |
['1', '1'], | ||
['11', '3'], | ||
['111', '7'], | ||
['1111', 'f'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JohannesBuchner , this test case covers hash_size=2. Previously there was no output. This test case assert there is a output and it is correct.
other_hash = imagehash.hex_to_hash(str(image_hash)) | ||
emsg = 'stringified hash {} != original hash {}'.format(other_hash, | ||
image_hash) | ||
self.assertEqual(image_hash, other_hash, emsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JohannesBuchner , this test assert one can generate a hex from a hash and later generate the hash from the hex, for all values of hash_size >= 2 and < 21. It does not assert the hex value is correct, but it implicit test the problem with hash_size=6 described in #51.
I use pyenv to run different Python versions and it uses this file. It should not be committed, my fault. I will add it to .gitignore
OK.
My opinion on this is that the fewer dependencies the better. Everything already works without it. |
First: check_hash_length test was testing only two values for the hash_size. Changed to check all values from 2 to 20. Second: check_hash_stored was testing only the default value for the hash_size. Changed to check all values from 2 to 20. Third: Add missing tests: test_whash, test_whash_length and test_whash_stored.
This is a fix for the given problems: 1. When a hash is generated with hash_size=6 _binary_array_to_hex will give a 8 digit hexadecimal representation for it while the correct would be a 9 digit hecadecimal representation. Retrieving a hash from the hex value in these cases is impossible. 2. _binary_array_to_hex gives no output when hash_size=2. 3. The _binary-to-hexadecimal conversion is not interoperable with other implementations of the same algorithm. Fixing _binary_array_to_hex demands hex_to_hash to be changed as well. That makes hex_to_hash incompatible with hex values generate with previous versions of ImageHash. For that reason, the old implementation is available as old_hex_to_hash in order to allow users convert old hex values back to hashes. Regarding the new version of hex_to_hash: - It assumes all hashes are bidimensional arrays with dimensions hash_size * hash_size. This is the current behavior of all hashes. - The implementation does not work for hash_size < 2.
49ee130
to
b89333f
Compare
ping @JohannesBuchner, will you merge it? Any more change needed? |
Aha, sorry, I did not see that you added another commit. Thank you, I merged it. Could you check if the newest version (4.0) works OK for you? |
Working! |
Fix _binary_array_to_hex as discussed in #51
Fixing _binary_array_to_hex demands hex_to_hash to be changed as well. That makes hex_to_hash incompatible with hex values generate with previous versions of ImageHash. For that reason, the old implementation is available as old_hex_to_hash in order to allow users convert old hex values back to hashes.
Because of that, I think that this PR should be followed by a major release and the README should point out how to upgrade.
Regarding the new version of hex_to_hash:
This PR also fix some tests.
This PR closes #51.