Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix binary array to hex #64

Merged

Conversation

djunzu
Copy link

@djunzu djunzu commented Nov 2, 2017

Fix _binary_array_to_hex as discussed in #51

Fixing _binary_array_to_hex demands hex_to_hash to be changed as well. That makes hex_to_hash incompatible with hex values generate with previous versions of ImageHash. For that reason, the old implementation is available as old_hex_to_hash in order to allow users convert old hex values back to hashes.

Because of that, I think that this PR should be followed by a major release and the README should point out how to upgrade.

Regarding the new version of hex_to_hash:

  • It assumes all hashes are bidimensional arrays with dimensions hash_size * hash_size. This is the current behavior of all hashes.
  • The implementation does not work for hash_size < 2. (But hash_size = 1 does not make sense anyway.

This PR also fix some tests.
This PR closes #51.

@coveralls
Copy link

Coverage Status

Coverage decreased (-7.1%) to 75.0% when pulling c8dd266 on djunzu:fix__binary_array_to_hex into 43481e5 on JohannesBuchner:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage decreased (-7.1%) to 75.0% when pulling c8dd266 on djunzu:fix__binary_array_to_hex into 43481e5 on JohannesBuchner:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-7.1%) to 75.0% when pulling 0e96b2c on djunzu:fix__binary_array_to_hex into 43481e5 on JohannesBuchner:master.

1 similar comment
@coveralls
Copy link

coveralls commented Nov 2, 2017

Coverage Status

Coverage decreased (-7.1%) to 75.0% when pulling 0e96b2c on djunzu:fix__binary_array_to_hex into 43481e5 on JohannesBuchner:master.

@coveralls
Copy link

coveralls commented Nov 2, 2017

Coverage Status

Coverage decreased (-7.1%) to 75.0% when pulling 49ee130 on djunzu:fix__binary_array_to_hex into 43481e5 on JohannesBuchner:master.

@JohannesBuchner
Copy link
Owner

That looks very good already, thank you!
What does the .python-version file do?

Could you also include the problematic cases you pointed out in #51 with Lenna.png (hash_size=6, hash_size=2) as new unit tests? You can use the imagehash.png in imagehash/tests/data/, which is not copyrighted.

Could you also include one or two unit tests for the old implementation, just to make sure we don't break it in the future (coverage decreased, I guess because of this).

I had been flirting earlier on with the idea of using hypothesis for testing (e.g. generating new images of various sizes, making hash of some size, generating string, make a hash out of this again). Not necessary to do this here, but might be an idea.

Do you think it would make sense to include bitarray (mentioned in #51) as a dependency? Or binascii.hexlify/unhexlify? Or binhex.binhex/hexbin?

['1', '1'],
['11', '3'],
['111', '7'],
['1111', 'f'],
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohannesBuchner , this test case covers hash_size=2. Previously there was no output. This test case assert there is a output and it is correct.

other_hash = imagehash.hex_to_hash(str(image_hash))
emsg = 'stringified hash {} != original hash {}'.format(other_hash,
image_hash)
self.assertEqual(image_hash, other_hash, emsg)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohannesBuchner , this test assert one can generate a hex from a hash and later generate the hash from the hex, for all values of hash_size >= 2 and < 21. It does not assert the hex value is correct, but it implicit test the problem with hash_size=6 described in #51.

@djunzu
Copy link
Author

djunzu commented Nov 3, 2017

What does the .python-version file do?

I use pyenv to run different Python versions and it uses this file. It should not be committed, my fault. I will add it to .gitignore

Could you also include one or two unit tests for the old implementation, just to make sure we don't break it in the future (coverage decreased, I guess because of this).

OK.

Do you think it would make sense to include bitarray (mentioned in #51) as a dependency? Or binascii.hexlify/unhexlify? Or binhex.binhex/hexbin?

My opinion on this is that the fewer dependencies the better. Everything already works without it.

First: check_hash_length test was testing only two values for the
hash_size. Changed to check all values from 2 to 20.

Second: check_hash_stored was testing only the default value for the
hash_size. Changed to check all values from 2 to 20.

Third: Add missing tests: test_whash, test_whash_length and
test_whash_stored.
This is a fix for the given problems:

1. When a hash is generated with hash_size=6 _binary_array_to_hex
will give a 8 digit hexadecimal representation for it while the
correct would be a 9 digit hecadecimal representation. Retrieving
a hash from the hex value in these cases is impossible.

2. _binary_array_to_hex gives no output when hash_size=2.

3. The _binary-to-hexadecimal conversion is not interoperable with
other implementations of the same algorithm.

Fixing _binary_array_to_hex demands hex_to_hash to be changed as well.
That makes hex_to_hash incompatible with hex values generate with
previous versions of ImageHash. For that reason, the old
implementation is available as old_hex_to_hash in order to allow
users convert old hex values back to hashes.

Regarding the new version of hex_to_hash:
- It assumes all hashes are bidimensional arrays with dimensions
hash_size * hash_size. This is the current behavior of all hashes.
- The implementation does not work for hash_size < 2.
@coveralls
Copy link

coveralls commented Nov 11, 2017

Coverage Status

Coverage remained the same at 82.143% when pulling b89333f on djunzu:fix__binary_array_to_hex into 43481e5 on JohannesBuchner:master.

@djunzu
Copy link
Author

djunzu commented Dec 6, 2017

ping

@JohannesBuchner, will you merge it? Any more change needed?

@JohannesBuchner JohannesBuchner merged commit 5450033 into JohannesBuchner:master Dec 6, 2017
@JohannesBuchner
Copy link
Owner

JohannesBuchner commented Dec 7, 2017

Aha, sorry, I did not see that you added another commit. Thank you, I merged it. Could you check if the newest version (4.0) works OK for you?

@djunzu
Copy link
Author

djunzu commented Dec 7, 2017

Working!
🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

_binary_array_to_hex gives wrong value
3 participants