LogTicker broken for colormaps that include zero #8061

Open
jbednar opened this issue Jul 7, 2018 · 17 comments

Comments

@jbednar
Contributor

jbednar commented Jul 7, 2018

ref: #6517
ref: #6536

At least for Bokeh 0.13.0 and earlier, adding a LogTicker() to color_data_map.py makes the numerical colorbar labels disappear, so the colorbar no longer serves its intended purpose (showing the mapping from colors to numerical values):

color_bar = ColorBar(color_mapper=mapper, location=(0, 0))

image

color_bar = ColorBar(color_mapper=mapper, location=(0, 0), ticker=LogTicker())

image

The problem can be avoided by using a lower bound > 0 on the colorbar:

p1 = make_plot(LinearColorMapper(palette=Viridis256, low=1, high=100), title='Viridis256 - Linear, low/high = blue/red')
p2 = make_plot(LogColorMapper(palette=Viridis256, low=1, high=100), title='Viridis256 - Log, low/high = blue/red')

image

But because having a lower bound of 0 is not a problem for the actual colormapping, it doesn't seem like it should be an issue for the colorbar's ticker, right? Presumably the colorbar is using log1p, while the ticker is trying to take the regular log of 0 and failing. Can we simply have the ticker start at 1 when the colorbar starts at 0?
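The presumed mismatch can be illustrated without Bokeh at all. The following is only a sketch of the arithmetic, not of what the ticker or color mapper actually does internally; `safe_log` is a hypothetical helper introduced here for illustration.

```python
import math

def safe_log(x):
    """Return log10(x), or None where the logarithm is undefined (x <= 0)."""
    try:
        return math.log10(x)
    except ValueError:
        # Python raises "math domain error" for zero and negative inputs.
        return None

low = 0
print(safe_log(low))    # None: log10(0) is undefined
print(math.log1p(low))  # 0.0: log1p(0) = log(1 + 0)
```

In JavaScript, `Math.log(0)` evaluates to `-Infinity` instead of raising, so a plain-log computation on a range starting at 0 would produce a non-finite value there. Whether that is what happens inside BokehJS is the guess made above, not something this sketch establishes.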

You might argue that we should instead be using a lower bound of 1 for the colormapping, but allowing 0 values is an extremely important feature for us with datashader and other types of heatmaps that plot counts: we want log colormapping to handle large counts, yet many cells in a heatmap or raster plot have a count of zero. In any case, I think the ticker behavior should match the colormapping behavior, as the purpose of a ticker is to reveal the colormap.

@bryevdv
Member

bryevdv commented Jul 9, 2018

> Can we simply have the ticker start at 1 when the colorbar starts at 0?

I'm not sure what you are suggesting, concretely. The log ticker doesn't know that it is being used by a color bar. The only way to achieve what you suggest would be to have the colorbar actively intervene and override whatever values the colorbar is set to to begin with, after everything is initialized. But that would be an unusual and atypical thing to do, and, I think, lead to unexpected behavior. It seems like a better and more general solution is just to have a log ticker never try to return ticks for values <= 0, regardless of what the requested range is.
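A minimal sketch of the "never return ticks for values <= 0" idea, written in plain Python and not based on the actual LogTicker code. The helper name `positive_decade_ticks` is hypothetical, and starting at 10**0 = 1 when the requested lower bound is not positive is an assumption that suits count data.

```python
import math

def positive_decade_ticks(low, high):
    """Hypothetical helper: powers of ten inside [low, high], ignoring values <= 0."""
    if high <= 0:
        # Nothing in the requested range has a defined logarithm.
        return []
    # Assumption: when low <= 0, begin at 10**0 = 1 (sensible for counts).
    start = math.ceil(math.log10(low)) if low > 0 else 0
    end = math.floor(math.log10(high))
    return [10 ** e for e in range(start, end + 1)]

print(positive_decade_ticks(0, 100))  # [1, 10, 100]
```

With this behavior a requested range of 0 to 100 yields the same ticks as 1 to 100, which matches the workaround of setting `low=1` shown earlier in the thread, but nothing is drawn for the part of the range at or below zero.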

@bryevdv
Member

bryevdv commented Jul 9, 2018

Or maybe you mean that the colorbar should set its range to avoid this situation? It would clarify things to have a PR with a concrete proposed change to evaluate.

@jbednar
Contributor Author

jbednar commented Jul 9, 2018

I'm only proposing that it work, not precisely how to do it. :-) Not returning a tick for values <=0 would be a good start, but then it seems like the bottom of the range would be missing a tick. But maybe it could first clip the requested range to something where log is valid, then spread the ticks over that? Or else just use log1p instead of log?

@bryevdv
Member

bryevdv commented Jul 9, 2018

> Or else just use log1p instead of log?

I'm not sure how that would work for the ticker, AFAICT the color mapper uses log1p to define a relative scale (difference in orders of magnitude) at a certain place, but the ticker ultimately has to contend with an absolute range, whatever it actually happens to be. But maybe I am missing something.

It seems the best way to address this is to just have a way to make the colorbar not include zero in its range, when including zero would cause problems. This could be an explicit property (e.g. skip_zero=True) that has to be set on the color bar, or perhaps we could try to make an adjustment to the internally created ranges based on the ticker type:

https://github.com/bokeh/bokeh/blob/master/bokehjs/src/lib/models/annotations/color_bar.ts#L660

I'm happy to entertain a PR for either, you guys know what best will support your needs.

@bryevdv bryevdv modified the milestone: short-term Sep 11, 2018
@xhongyi

xhongyi commented Dec 15, 2018

Up voting this issue. Not working when low is 0 is disappointing.

@giogit

giogit commented Mar 29, 2019

Up voting this issue

@bryevdv
Member

bryevdv commented Sep 28, 2019

So after doing a survey of Altair, Plotly, and Matplotlib I am unable to find any evidence that any of them support putting 0 on a log axis, except for MPL's symlog, which is both generally regarded as bad and definitely not appropriate for color bars in any case.

Anyone advocating for this will need to supply some references to other tools or libraries that do allow zero so that their policies can be considered and studied. Otherwise I am inclined to close this issue with noaction

I do think we should have color bar log scales stop using log1p for correctness and to be consistent with other log scales, but there is another issue for that.

@jbednar
Contributor Author

jbednar commented Sep 30, 2019

I'm not certain that the other issue (#8724?) covers everything, as this issue is about a problem with the ticker rather than the colormap itself, but maybe you're right that this problem will go away when the other issue is addressed.

It might be helpful to clarify that for Datashader use cases, wanting a logarithmic mapping that includes 0 comes up primarily because there is no NaN available for integer types. For floating-point types, Datashader uses NaN for areas with no data, which I believe works fine already, but that option is not available for integer types. So while doing a log of 0 doesn't make any sense mathematically, it can make sense as a workaround for the limitations of integer types on computers.

With that in mind, would it be meaningful to somehow mask out the zero values for integer arrays to avoid the problem instead? Datashader can't really do this internally, as a user who has asked for an int64 array should get one (which thus cannot represent NaN), but in HoloViews I suppose we could convert the data to float64 and replace zeros with NaNs before giving the array to Bokeh. Or maybe Bokeh could do that before plotting, i.e. to mask out zero values for integer arrays used with log? I'm pretty sure I haven't thought of all the implications of the various options here; I just know that it's an issue.
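The float64-with-NaN conversion suggested for HoloViews could look roughly like the sketch below. It uses plain nested lists so it runs with the standard library only; `mask_zero_counts` is a hypothetical name, not an existing HoloViews or Bokeh function.

```python
import math

def mask_zero_counts(grid):
    """Return a float copy of a 2D integer grid with zeros replaced by NaN."""
    return [[float("nan") if v == 0 else float(v) for v in row] for row in grid]

counts = [[0, 3, 120], [7, 0, 1]]   # made-up integer counts
masked = mask_zero_counts(counts)

# The original integer data is left untouched; only the copy carries NaN.
print(math.isnan(masked[0][0]))  # True
print(masked[0][1])              # 3.0
```

On a NumPy array the equivalent would be along the lines of `np.where(arr == 0, np.nan, arr.astype("float64"))`. Either way the integer array the user asked for stays as it is, and only the copy handed to the plotting layer changes type.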

@bryevdv
Member

bryevdv commented Sep 30, 2019

Allowing a user-defined sentinel value of some sort (which could be 0, for that matter, if so desired) that gets converted to a color directly before any other transformation seems more reasonable to accommodate that case than inventing new art that does not seem to exist anywhere else. Then the question is: how do you convey what this unique color represents to users (e.g. because it won't necessarily show up in the colorbar; say you map 0 to bright pink, 0 will not be on the colorbar). I think we could leave that to the user to supply in annotations, subtitles or accompanying text, though.
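A rough sketch of such a sentinel mechanism, assuming a plain-log mapping with `0 < low < high`. The function `map_to_color`, its `special_colors` parameter, and the four-entry palette are all hypothetical and exist only for illustration; none of them is an existing Bokeh API.

```python
import math

def map_to_color(value, palette, low, high, special_colors=None):
    """Hypothetical mapper: distinguished values bypass the log transform."""
    special_colors = special_colors or {}
    # Short circuit: a distinguished value gets its color before any math runs.
    if value in special_colors:
        return special_colors[value]
    # Ordinary log mapping; assumes 0 < low < high.
    clamped = min(max(value, low), high)
    frac = (math.log(clamped) - math.log(low)) / (math.log(high) - math.log(low))
    index = min(int(frac * len(palette)), len(palette) - 1)
    return palette[index]

palette = ["#440154", "#31688e", "#35b779", "#fde725"]  # arbitrary example colors

# Zero is mapped directly to bright pink and never reaches math.log.
print(map_to_color(0, palette, 1, 1000, {0: "#ff69b4"}))  # #ff69b4
```

Passing `{0: palette[0]}` instead would send zero to the first palette color, and because the lookup happens first, `math.log` is never called on the sentinel.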

@jbednar
Contributor Author

jbednar commented Oct 1, 2019

I keep coming back to the specific, extremely well defined case that's the starting point: an array of whole-number counts, specifically used as a 2D histogram in the case of Datashader, but in general any array of whole-number counts would have the same property -- (a) the counts are often distributed logarithmically, making a linear colormap a poor representation, and (b) a count of zero often occurs. Under those conditions, 0 shouldn't have some arbitrary color; if it's mapped to a non-transparent color, that color should normally be the bottom of a color palette, with increasing values mapping to other colors in the same color palette. So having to display or explain some arbitrary color wouldn't come up; 0 has a very natural mapping to the first color already.
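To make the "natural mapping" concrete, here is a small sketch of a log1p-style palette lookup for counts. It is an illustration of the idea only, not the LogColorMapper implementation; `log1p_color_index` is a hypothetical name and `high` is assumed to be positive.

```python
import math

def log1p_color_index(count, high, n_colors):
    """Hypothetical: map a non-negative count to a palette index on a log1p scale."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    # log1p(0) == 0, so a count of zero lands at the bottom of the palette.
    frac = math.log1p(min(count, high)) / math.log1p(high)
    return min(int(frac * n_colors), n_colors - 1)

print(log1p_color_index(0, 1000, 256))     # 0
print(log1p_color_index(1000, 1000, 256))  # 255
```

Zero needs no special treatment here: it falls into the first slot simply because `log1p(0)` is 0, and larger counts move monotonically up the palette.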

@bryevdv
Member

bryevdv commented Oct 1, 2019

It sounds like a solution that works for you then? You just choose to map zero to the first bin color.

@jbednar
Contributor Author

jbednar commented Oct 1, 2019

Sure!

@poplarShift
Contributor

@bryevdv @jbednar

I think the only use case where a log-transform on data including zero could be consistently advocated for is when you have data that is in fact lognormally (or similar) distributed, but the (measurement) noise is such that some values happen to be "too close" to zero or even below.

However, I would personally argue that in such cases, it is up to the user to do the right data analysis, find the right lower limit for the color map, and map the rest to some outlier value. Bokeh shouldn't be expected to magically do that.

@jbednar
Contributor Author

jbednar commented Oct 3, 2019

I'm not focusing on cases where "measurement noise" is an issue; my concern is for integer count values, such as those in http://datashader.org/topics/census.html (which has a detailed analysis of why a linear mapping is inappropriate and why a log (well, log1p) mapping reveals the data much more faithfully). For counts, zero is a valid value, not a measurement error, and should map to the lowest slot on the color scale. For floating point values, the situation is quite different and very poorly defined, but luckily in those cases NaN is available to avoid this problem. Users can already mask out the problematic values with NaN, and Bokeh doesn't need to do anything there, but that option isn't available for integer arrays.

@jbednar
Contributor Author

jbednar commented Oct 3, 2019

Oh, and simply having the user add 1 to the whole integer array to avoid the problem isn't appropriate either, because then the incorrect count values will be shown on the color bar and in hover. The fact that other libraries haven't addressed this problem doesn't make it any less of a problem!

@poplarShift
Contributor

poplarShift commented Oct 3, 2019

Meanwhile: "Do not log‐transform count data" :)

I am not an expert but why could your count data not be modelled and then transformed e.g. with a Poisson distribution, or whatever distribution accurately describes the data? Isn't that what histogram equalization does (very roughly) in the end?

@bryevdv
Member

bryevdv commented Oct 3, 2019

@jbednar @poplarShift while I think this is an interesting conversation, I think it might be better suited to continue on the Datashader issue tracker.

I think we have a clear course of action for this issue now: provide a mechanism to supply mappings from distinguished values to arbitrary colors, making sure that those mappings short circuit any subsequent computations. That's the affordance, and then everyone can use it however suits their needs.


5 participants