Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle log colormapping change in bokeh 2.0 #4376

Closed
jlstevens opened this issue Apr 10, 2020 · 16 comments · Fixed by #4383
Closed

Handle log colormapping change in bokeh 2.0 #4376

jlstevens opened this issue Apr 10, 2020 · 16 comments · Fixed by #4383

Comments

@jlstevens
Copy link
Contributor

jlstevens commented Apr 10, 2020

Example of the issue as reported by K.-Michael Aye on gitter:

image

Bokeh 2.x no longer lets the log colormapper handle zeros. I understand the goal of this change was to address this issue bokeh/bokeh#8061 (also see the discussion here bokeh/bokeh#8724). This problem can be more easily avoided when using floats as you can use NaN to avoid the log zero issues.

Unfortunately, this fix will cause a lot of confusion as illustrated by the screenshot above. After discussing this with @jbednar, we listed the following ways we could address this:

  1. Document this issue and update how to use clim appropriately, checking and updating all our notebooks and handling all the confusion caused to users who miss the updated docs.

  2. Maintain backwards compatibility somehow in bokeh 2.0, stating that our logz semantics match Bokeh > 2.0 and offer a new option for the better behavior (the old behavior resulted in incorrect colorbars). I'm not sure how this could be achieved technically if the old behavior is not available in bokeh 2.x anymore...

  3. Somehow support a masking approach at the element level (i.e Image at least, and that could be generalized to other elements?). Seems like a whole lot of work...

  4. Address this for the most common use case which will be for any datashading operation that returns zero integers (e.g count). To do this you have to avoid generating integers in the first place as NaN isn't a valid integer: the idea would be to cast integer arrays to float and replace all zeros with NaNs.

If it isn't already obvious, I am in favor of option 4 (and maybe option 2, if it is possible) as it would fix the problem in the majority of cases without having to update all the notebooks and inform all affected users. Although I don't think this is ideal (I would prefer to keep count dtypes as integer) this transformation doesn't violate datashader semantics where NaN and zero are equivalent. Maybe we could have an option for the operations (default off) to skip this transformation for people who want to preserve integer types?

Unfortunately, I don't see any easy solutions and nothing that doesn't involve documenting this change in behavior in Bokeh 2.0 somewhere. What are your thoughts @philippjfr ?

@philippjfr
Copy link
Member

philippjfr commented Apr 10, 2020

My opinion is that we should warn if the log color mapper encounters a zero value and tell the user to set a clim lower bound. Another option is that if we encounter an integer array we can automatically set the lower bound of the log range to 1 but warn if a float is used since it's not obvious how to choose a default lower bound.

Somehow support a masking approach at the element level (i.e Image at least, and that could be generalized to other elements?). Seems like a whole lot of work...

We absolutely should do this as it's needed for masking integer arrays in linked selections. Not sure how quickly we can do this however.

If it isn't already obvious, I am in favor of option 4 (and maybe option 2, if it is possible) as it would fix the problem in the majority of cases without having to update all the notebooks and inform all affected users.

I'm -1 on both 2 and 4 tbh. There was a reason that log1p was disabled in bokeh because it was effectively wrong in many cases. I'm also not too happy casting ints to floats really.

@jlstevens
Copy link
Contributor Author

My opinion is that we should warn if the log color mapper encounters a zero value and tell the user to set a clim lower bound.

This is a pretty good suggestion, especially if the warning could point to the docs to help avoid user confusion. We would still have to update all our notebooks though...

Another option is that if we encounter an integer array we can automatically set the lower bound of the log range to 1 but warn if a float is used since it's not obvious how to choose a default lower bound.

I think this suggestion is even better as it would automatically handle the common case automatically without having to update existing notebooks. We would just have to document what we are doing in the integer case and issue a warning for the float case.

@philippjfr
Copy link
Member

Okay, I'll work up a PR to do the int lower bound handling and float warning.

@jbednar
Copy link
Member

jbednar commented Apr 10, 2020

I think what Philipp is proposing is a better version of option 2, i.e. to set up HoloViews options to avoid both the current urgent broken-colormap problem but also the original broken colorbar-tick issue and misleading colorbar issue, all at once. So it sounds ideal, if it works.

@michaelaye
Copy link
Contributor

just one thought: I thought the integer NaNs exist now? would that help?

@michaelaye
Copy link
Contributor

apparently only experimentally in pandas, still not in numpy: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html

@jbednar
Copy link
Member

jbednar commented Apr 10, 2020

Pandas uses a masked array approach, which we could optionally return from Datashader as well, but it would take a good bit of plumbing to connect things up between Datashader, HoloViews, and Bokeh.

@philippjfr
Copy link
Member

Datashader outputs an xarray, not pandas DataFrames so that won't work anyway.

@jbednar
Copy link
Member

jbednar commented Apr 11, 2020

I didn't mean that Datashader would return a DataFrame, but that it would (optionally) return a masked array.

@jbednar
Copy link
Member

jbednar commented Apr 11, 2020

But also note that even if we did return a masked array from Datashader for integer aggregates, I'm not at all certain that I'd want the output of a count aggregation to use masking. A count of zero isn't missing data; it's simply a count of zero. Yet I still want to be able to use a log colormapper with counts, such as for the Datashader census example, which works perfectly well using log1p for the colormapping in Datashader's shade function, but has always been messed up in bokeh (as colormapping used log1p but colorbar ticks used log). So I think the clim approach proposed by Philipp is the right answer for counts independently of whether we also support masked integer arrays, which would be used for non-count aggregations (e.g. for sum, masking out pixels where it would be summing zero items).

@poplarShift
Copy link
Collaborator

int lower bound handling and float warning.

+1 on this.

I would also vote for throwing a warning in case the clim gets cut off at 1 for integers. After all, holoviews wouldn't know the array came out of a counting operation or if it simply happens to be integer valued, in which case the user should be warned 0s were omitted. I don't know how easy it would be to control the verbosity of such warnings, but I assume if you are aware there are zeros but you still want to log transform it isn't too much asked of the user to find a specific way of handling this before passing it into holoviews.

@philippjfr
Copy link
Member

I would also vote for throwing a warning in case the clim gets cut off at 1 for integers.

Are you sure this is really needed? What is the alternative here? A zero value can never be displayed in the log color mapper so a lower bound of 1 is the only possible default lower bound for ints.

@michaelaye
Copy link
Contributor

I'd agree that the logic shouldn't be so contrived that it depends on the source of data. It should be as simple as @philippjfr indicates, however, a source of floats would need more user input I guess? Would the palette start at 0.1? Or 0.01? User still needs to decide that.

@poplarShift
Copy link
Collaborator

I would also vote for throwing a warning in case the clim gets cut off at 1 for integers.

Are you sure this is really needed?

No of course, no warning is ever strictly needed so long as the user can be confident that what they're passing in is what they think it is (not always easy/practical in a pipeline) and they are fully aware of holoviews' behaviour. But it is still an ad-hoc choice, even if not completely arbitrary, on the part of holoviews.

In the end, the reason for setting the lower clim to 1 is mostly that it is easy to do (easier than piecing together a color scale that goes as the logarithm for values >=1 but also has a category for zeros, anyway), not because it is correct (because as James said, NaN is not the same as 0, not even in count data). Or put differently, if your data contain zeros, you should not log-transform them in the first place except maybe for quick exploration. Anyway, just my two cents; we've had this discussion before.

@philippjfr
Copy link
Member

@poplarShift I do see what you're getting at and it's a point well taken. I'd say that in future we should support masked arrays properly and at that point we could expand the warning to ints as well asking the user to mask out the zero values explicitly.

a source of floats would need more user input I guess? Would the palette start at 0.1? Or 0.01? User still needs to decide that.

Right, floats definitely need a very deliberate choice by the user and will warn.

@poplarShift
Copy link
Collaborator

I'd say that in future we should support masked arrays properly and at that point we could expand the warning to ints as well asking the user to mask out the zero values explicitly.

Sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants