Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor geom_contour()? #3044

Closed
clauswilke opened this issue Dec 21, 2018 · 28 comments
Closed

Refactor geom_contour()? #3044

clauswilke opened this issue Dec 21, 2018 · 28 comments
Assignees

Comments

@clauswilke
Copy link
Member

As discussed on Twitter here and previously explained in a blog post by @thomasp85 an argument can be made that geom_contour() should take a 2d density map as input and calculate its own contour lines without using a stat. I'm opening this issue so we can discuss pros and cons of various approaches.

I think there is precedent for this kind of refactoring with the refactoring of geom_bar(), geom_histogram(), and geom_col() some time back.

@clauswilke
Copy link
Member Author

clauswilke commented Dec 21, 2018

One of my personal pet peeves is that stat_density_2d() returns a contour by default. I've always thought that is confusing.

library(ggplot2)

set.seed(4393)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
d <- ggplot(dsmall, aes(x, y))

# works
d + geom_raster(aes(fill = stat(density)), stat = "density_2d", contour = FALSE)

# doesn't work but should in my opinion
d + geom_raster(aes(fill = stat(density)), stat = "density_2d")
#> Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
#> Error in is.finite(x): default method not implemented for type 'closure'

Created on 2018-12-21 by the reprex package (v0.2.1)

@clauswilke
Copy link
Member Author

The argument for changing the default of the contour argument to FALSE in stat_density_2d() if we change the input data for geom_contour() is that then stat_density_2d() will work by default with the new geom_contour() and also with geom_tile() and geom_raster(). By contrast, if we leave contour = TRUE it won't work with either.

The argument against changing this default is that it will potentially break existing code.

One way around this problem may be to soft-deprecate stat_density_2d() and introduce a new stat_2d_density(). This will give users of the current stat_density_2d() time to adjust their code while introducing a new function that can be designed cleanly and without baggage.

@clauswilke
Copy link
Member Author

One other issue: geom_contour() doesn't currently understand the fill aesthetic. I see no principled reason why it shouldn't. It would be easy to make the fill aesthetic NA by default but draw polygons underneath the path when fill isn't NA. The only issue to handle better than is currently the case would be polygons that intersect with the plot boundaries.

library(ggplot2)

set.seed(4393)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
d <- ggplot(dsmall, aes(x, y))

# works
d + geom_contour(stat = "density_2d")

# doesn't work but should
d + geom_contour(aes(fill = stat(level)), stat = "density_2d")
#> Warning: Ignoring unknown aesthetics: fill

# works
d + geom_polygon(aes(fill = stat(level)), stat = "density_2d")

Created on 2018-12-21 by the reprex package (v0.2.1)

@thomasp85
Copy link
Member

The problem of filled contours has come up time and time again. I don’t think we should attempt to add support before ggplot2 has native support for holed polygons

@thomasp85
Copy link
Member

I’m petty sure @hadley has opinions on this as well, if I remember correctly from the last issue that requested this

@yutannihilation
Copy link
Member

native support for holed polygons

For reference: #2534, #2502

@clauswilke
Copy link
Member Author

I've looked at the prior issues and the output returned by grDevices::contourLines(). Filled contours won't be possible with that output. I think @hadley is right that a new contouring implementation would be needed, preferably written in C or C++.

In case anybody is curious, the code for the current implementation is here:
https://github.com/wch/r-source/blob/trunk/src/main/plot3d.c
https://github.com/wch/r-source/blob/trunk/src/main/contour-common.h
It's not that long, so an adventurous person could try to figure out exactly what it does. It's not immediately obvious to me to what extent the code takes shortcuts based on the assumption that it only needs to generate lines, not closed polygons.

@clauswilke
Copy link
Member Author

Instead of trying to understand the poorly documented C code in the base R distribution, it's probably easier to just implement the marching squares algorithm from scratch. A good overview is provided on wikipedia: https://en.wikipedia.org/wiki/Marching_squares

@clauswilke
Copy link
Member Author

To keep the issues focused, let's take filled contours off the table for this one (#3044). Issue #2534 is about filled contours.

@clauswilke
Copy link
Member Author

I realized the refactoring I proposed may not be possible. It's not clear to me how we would map levels in the contours to colors if the contours don't exist until the very end, right before drawing.

@thomasp85 @yutannihilation If you agree this would be a problem, then I'll close this issue for now. If you can see a way to work around this issue, please let me know.

@thomasp85
Copy link
Member

I don't think that is a problem as such. The colour is directly related to the underlying data of the density/height map and really just a binning performed by the geom...

There are other issues to be solved for sure, such as what the fill of the band should correspond to (the mean, lower, or upper value of the spanned range) as well as how to achieve the current type of plot with disconnected paths, but they should all be doable...

@clauswilke
Copy link
Member Author

There are other issues to be solved for sure, such as what the fill of the band should correspond to (the mean, lower, or upper value of the spanned range) as well as how to achieve the current type of plot with disconnected paths, but they should all be doable...

Yeah, I've thought through these. I think the geom could take a parameter indicating which value should be used to determine fill. The problem with the binning still concerns me, because the geom itself would have perform the color mapping. In any case I'll take a stab at this kind of a geom, but we may find that it's not appropriate for ggplot2 proper and needs to live in an extension package.

@yutannihilation
Copy link
Member

I'm yet to figure out what the problem is...

a parameter indicating which value should be used to determine fill

Are there any way to bypass scale mapping and keep the raw fill values so that the geom can translate later?

But, I feel this is going to be hacky, considering there's no formal way for Geoms to affect on Scales. So, probably I'll agree with Claus that "we may find that it's not appropriate for ggplot2 proper."

@clauswilke
Copy link
Member Author

Since we have closed #2534, I think filled contours are back on the table for this issue. A first release of the isoband package is on CRAN, so we have the underlying functionality to generate filled contour polygons: https://cran.r-project.org/web/packages/isoband/

Some additional thoughts:

  1. An alternative ggplot2 interface to contour lines and areas is available here: https://github.com/clauswilke/ggisoband It could be the basis for a redesign in ggplot2.
  2. To keep things backwards compatible, it may make sense to have two alternative implementations, one where the stat generates lines/polygons and the geom draws those (as is the case currently), and one where the stat (or the new binned scale) bins the data and the geom generates the lines/bands and then draws (as is the case in my alternative code).
  3. I would also like to be able to draw labeled contour lines. I think I'll have the relevant infrastructure in the isoband package soon.

@yutannihilation
Copy link
Member

labeled contour sounds cool!

@clauswilke
Copy link
Member Author

Proof of concept for labeled contour lines.

library(isoband)   # needs version 0.2.0 from github
library(grid)

x <- (1:ncol(volcano))/(ncol(volcano)+1)
y <- (nrow(volcano):1)/(nrow(volcano)+1)
lines <- isolines(x, y, volcano, 10*(10:18))

# create a colored background
grid.newpage()
grid.rect(width = .8, height = .8, gp = gpar(fill = "azure1", col = NA))
grid.rect(width = .6, height = .6, gp = gpar(fill = "cornsilk1", col = NA))
grid.rect(width = .4, height = .4, gp = gpar(fill = "cornsilk2", col = NA))
grid.rect(width = .2, height = .2, gp = gpar(fill = "cornsilk3", col = NA))

# draw labeled lines
grid.draw(isolines_grob(lines))

Created on 2019-02-05 by the reprex package (v0.2.1)

@clauswilke
Copy link
Member Author

@yutannihilation, @thomasp85, and anybody else who would like to chime in: As I mentioned in the previous comment, I'm working on a grob that can draw labeled contour lines. I'm now at the point where the basic infrastructure is in place and the one remaining issue is the correct API, in particular with respect to specifying graphical parameters. I'd like to receive feedback on different options, and I've opened an issue where we can discuss the pros and cons of various approaches:
r-lib/isoband#3

Please take a look and let me know your thoughts. I'd hate to pick an API that'll cause predictable problems down the road.

cc @brodieG @eliocamp @mdsumner

@hadley
Copy link
Member

hadley commented May 9, 2019

@paleolimbot is going to take a look — he'll brainstorm some API ideas for the ggplot2 side, think about whether we can make it a drop in replacement for the existing geom_contour(), and then report back here.

@clauswilke
Copy link
Member Author

Sounds good. One thing I feel pretty certain about at this time: We'll probably need at least two separate geoms, one for contour lines and one for contour bands, e.g. geom_contour() and geom_filled_contour(). Trying to squeeze everything into one geom is not going to give satisfactory results.

@eliocamp
Copy link
Contributor

eliocamp commented May 9, 2019

Claus, does your grob work if points are not defined in a regular grid? I'm asking because computing contours at draw time could prove impossible under some coordinate transformations otherwise. I'm thinking of a scalar field defined at a regular longitude by latitude grid but that is then projected into a cartographical projection.

@clauswilke
Copy link
Member Author

You have to separate the grob (which only draws lines/polygons but doesn't generate them) from the isolining/isobanding functions which calculate the lines/polygons. The functions that actually calculate the lines/polygons require a regular grid. They will not work on transformed data. For this and other reasons, I feel fairly certain now that the default geoms (geom_contour()/geom_filled_contour()) should not compute contours at draw time. Additional geoms could be provided (either as part of ggplot2 proper or as part of an extension package) that calculate contours at draw time. Those geoms may not work with some coordinate transformations.

@mdsumner
Copy link

mdsumner commented May 9, 2019

I've thought that it could do this, these grids are common in climate models, and usually aren't a transformation, i e there's no regular grid but the index. But, isn't it enough to march on the index, then use real coordinates on the vertices? (I'll make an example)

@clauswilke
Copy link
Member Author

Michael, yes, as long as the input can be provided as a regular matrix things should work. Also, the isoband package reconstructs isobands and isolines internally without interpolating coordinates until the very last minute, so it would be fairly straightforward to modify the package to allow for alternative coordinate interpolation approaches if necessary.

See here. Point coordinates are calculated at the very end, when we're collecting the final polygons:
https://github.com/wilkelab/isoband/blob/6bfa72a5eb214d3cf84b16039bc296fefcd7fc83/src/isoband.cpp#L1243-L1246

@paleolimbot
Copy link
Member

paleolimbot commented Jul 12, 2019

@clauswilke I'm working on this now, and I want to make sure I'm doing something with the output of isoband::isolines() and isoband::isobands() that is something like what you envisioned. So far I've only implemented this in StatContour, with isoband::isolines() as the backend for complete = FALSE and isoband::isobands() for complete = TRUE (from reading the code, it looks like the value of complete was unused):

library(ggplot2)

ggplot(faithfuld, aes(waiting, eruptions)) +
  geom_contour(aes(z = density, col = factor(stat(level))))

ggplot(faithfuld, aes(waiting, eruptions)) +
  stat_contour(aes(z = density, fill = stat(level)), complete = TRUE, geom = "polygon")

Created on 2019-07-12 by the reprex package (v0.2.1)

The Geom version described above (and implemented in ggisoband) is cool as well...how that fits into ggplot2 proper is less clear-cut to me, but a better Stat version seems like a good start to me.

@paleolimbot
Copy link
Member

In case you'd like to have a look before the PR, the code is here: master...paleolimbot:issue-3044-isoband-contours

@clauswilke
Copy link
Member Author

clauswilke commented Jul 12, 2019

Yes, what you currently have is approximately what I envisioned for stat_contour(). I think any changes to it that are visible from the outside need to be at most minor.

A couple of comments, in no specific order:

  1. I had envisioned adding a stat_filled_contour() in place of your stat_contour(complete = TRUE). I do think we generally need to distinguish between contour lines and contour polygons, and trying to wrap everything into one function is going to make things more confusing than less. In any case, stat_filled_contour() could simply call stat_contour(complete = TRUE) so you'd get both options in that scenario.

  2. As in the previous point, I think we need geom_contour() and geom_filled_contour(). Importantly, for geom_filled_contour() we need to be able to style both the fill and the outline of the polygons, and there are valid cases where one would use both geom_filled_contour() and geom_contour() in subsequent layers. Again, trying to squeeze everything into one will make things more complex and more confusing.

  3. If the data marshalling is slow, we could add functions to isoband to directly return the data in more appropriate formats. When I wrote the current functions I wasn't sure what the best return format would be. This shouldn't hold up your current PR, though. We can always make this change later.

  4. Visual tests will be problematic, because isoband uses hashmaps internally to construct the lines and polygons. The resulting polygons are always mathematically equivalent, but they differ in how exactly they are drawn. (E.g., in a closed loop, it is irrelevant where you start and stop drawing, but you can see the start point in the generated polygons and hence in the svg output. Similarly, for an open path, it doesn't matter at which end you start drawing, but the svg records it.) To fully resolve this issue, I think we'll need raster-based visual tests. cc @lionel-

  5. I'm not sure at this time how to best implement something like geom_isobands() (https://github.com/clauswilke/ggisoband), which calculates the contours at draw time. We would need to figure out how to make sure the geom knows at draw time what the levels are that were defined earlier, to avoid issues such as Polygons leaving voids of white space  clauswilke/ggisoband#2. I think this is related to the problem of giving geoms access to the scales at draw time. In any case, this is a topic for a later day, in my opinion.

@paleolimbot
Copy link
Member

Thank you! I think I addressed these in the PR, but if you have the time to take a look at #3439 I'd be more than happy to make any changes!

@lock
Copy link

lock bot commented Jan 12, 2020

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants