Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 'to_masked_array' computing dask arrays twice #3006

Merged
merged 1 commit into from
Jun 10, 2019

Conversation

djhoese
Copy link
Contributor

@djhoese djhoese commented Jun 10, 2019

I ran in to this issue when using imshow on a dask-based DataArray and dask's ProgressBar() context manager. I noticed that when using my_data_arr.plot.imshow the dask array was being computed twice. This PR fixes that.

Suggestions for tests and documenting this fix are welcome.

  • Closes #xxxx
  • Tests added
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

@djhoese
Copy link
Contributor Author

djhoese commented Jun 10, 2019

In my own tests I've been using the following custom scheduler with dask.config.set(scheduler=CustomScheduler()) to point out what code is computing the array when I don't want it to:

class CustomScheduler(object):
    """Custom dask scheduler that raises an exception if dask is computed too many times."""

    def __init__(self, max_computes=1):
        """Set starting and maximum compute counts."""
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        """Compute dask task and keep track of number of times we do so."""
        import dask
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError("Too many dask computations were scheduled: {}".format(self.total_computes))
        return dask.get(dsk, keys, **kwargs)

Does something like this exist in the xarray tests? If not, I could add it then add a dask test to the DataArray tests.

@shoyer
Copy link
Member

shoyer commented Jun 10, 2019

I'd be happy to merge this fix. I think it's a vestige of when we used to always cached computations.

I'd also love to have more comprehensive test coverage, and I like the look of your custom scheduler. But I'm not sure it's worth adding that many lines of test logic for what is essentially a one line fix.

@djhoese
Copy link
Contributor Author

djhoese commented Jun 10, 2019

@shoyer Makes sense. Any idea what's up with the travis test? It doesn't look like it is from my changes.

@shoyer
Copy link
Member

shoyer commented Jun 10, 2019

@djhoese It looks like something broke in the development version of dask. But that shouldn't hold up merging your fix here...

@shoyer shoyer merged commit adbd59a into pydata:master Jun 10, 2019
@djhoese djhoese deleted the bugfix-masked-compute branch June 11, 2019 23:44
dcherian added a commit to dcherian/xarray that referenced this pull request Jun 24, 2019
* master: (31 commits)
  Add quantile method to GroupBy (pydata#2828)
  rolling_exp (nee ewm) (pydata#2650)
  Ensure explicitly indexed arrays are preserved (pydata#3027)
  add back dask-dev tests (pydata#3025)
  ENH: keepdims=True for xarray reductions (pydata#3033)
  Revert cmap fix (pydata#3038)
  Add "errors" keyword argument to drop() and drop_dims() (pydata#2994) (pydata#3028)
  More consistency checks (pydata#2859)
  Check types in travis (pydata#3024)
  Update issue templates (pydata#3019)
  Add pytest markers to avoid warnings (pydata#3023)
  Feature/merge errormsg (pydata#2971)
  More support for missing_value. (pydata#2973)
  Use flake8 rather than pycodestyle (pydata#3010)
  Pandas labels deprecation (pydata#3016)
  Pytest capture uses match, not message (pydata#3011)
  dask-dev tests to allowed failures in travis (pydata#3014)
  Fix 'to_masked_array' computing dask arrays twice (pydata#3006)
  str accessor (pydata#2991)
  fix safe_cast_to_index (pydata#3001)
  ...
@djhoese
Copy link
Contributor Author

djhoese commented Jun 25, 2019

@shoyer Any idea when there might be another release of xarray where this fix will be included? I'm teaching a tutorial at SciPy this year that is effected by this bug. Learners are starting to prepare for the tutorials and I'd like if they could have this fix before the day of the tutorial.

@shoyer
Copy link
Member

shoyer commented Jun 25, 2019

hopefully out this week! I am going to wait another day or two for someone else to look at #3040 and then will probably be issuing a release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants