
dask deterministic hashing #300

Closed
crusaderky opened this issue Nov 12, 2019 · 3 comments · Fixed by #320

crusaderky (Contributor) commented Nov 12, 2019

sparse.COO should implement dask's deterministic hashing protocol:
https://docs.dask.org/en/latest/custom-collections.html#deterministic-hashing

Currently, dask falls back to a random uuid4 for COO inputs, so the chunk keys change on every call:

>>> import dask.array
>>> import numpy
>>> import sparse
>>> a = numpy.array([1,2])
>>> sa = sparse.COO(a)
>>> dask.array.from_array(a).__dask_keys__()
[('array-de932becc43e72c010bc91ffefe42af1', 0)]
>>> dask.array.from_array(a).__dask_keys__()
[('array-de932becc43e72c010bc91ffefe42af1', 0)]
>>> dask.array.from_array(sa).__dask_keys__()
[('array-77964bd4315f9a3eb43953586a91d273', 0)]
>>> dask.array.from_array(sa).__dask_keys__()
[('array-cf0d4b78cf3a2b277c351ea3d534a790', 0)]

Suggested implementation:

class COO:
    def __dask_tokenize__(self):
        from dask.base import normalize_token
        # Hash only the constituent arrays and metadata, so equal COO
        # objects always produce the same dask token.
        return normalize_token(
            (type(self), self.coords, self.data, self.shape, self.fill_value)
        )
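
For illustration, a minimal sanity check of the behaviour we would expect once such a method exists (a sketch only, assuming the implementation above is added to sparse.COO):

# Sketch of a check for deterministic hashing, assuming sparse.COO
# gains the __dask_tokenize__ method suggested above.
import dask.array
import numpy
import sparse
from dask.base import tokenize

sa = sparse.COO(numpy.array([1, 2]))

# tokenize() should now return the same token on every call...
assert tokenize(sa) == tokenize(sa)

# ...and from_array() should produce stable dask keys.
assert (
    dask.array.from_array(sa).__dask_keys__()
    == dask.array.from_array(sa).__dask_keys__()
)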

hameerabbasi (Collaborator)

Feel free to make a PR.

danielballan (Contributor)

Is anyone working on this already? If not, I'd be happy to make that PR. The implementation suggested above looks right to me.

hameerabbasi (Collaborator)

Feel free. Thanks a lot for offering to contribute; it's very welcome.
