regression in geom_sf (+ geom_bin2d) after latest update #4527

Closed
Mashin6 opened this issue Jun 22, 2021 · 10 comments

@Mashin6

Mashin6 commented Jun 22, 2021

I am not sure whether this should be posted here or in r-spatial/sf. I recently updated several packages on my setup (R version 4.0.4 (2021-02-15), macOS 11.4) and noticed several odd behaviors when plotting a 2D density over a spatial map of countries.

  1. Performance: plotting takes unusually long, e.g. ~10 s before the update versus >1 min now.
  2. I get the warning message: `In st_cast.GEOMETRYCOLLECTION(X[[i]], ...) : only first part of geometrycollection is retained`
  3. The square bins of the 2D density are now drawn with visible outlines (overlaps). geom_bin2d() works fine on its own but shows this issue when combined with geom_sf().
``` r
library(tidyverse)
library(ggspatial)
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1
library(rnaturalearth)
library(rnaturalearthdata)
library(rgeos)
#> Loading required package: sp
#> rgeos version: 0.5-5, (SVN revision 640)
#>  GEOS runtime version: 3.8.1-CAPI-1.13.3 
#>  Linking to sp version: 1.4-2 
#>  Polygon checking: TRUE
    

world <- ne_countries(scale = "medium", returnclass = "sf")
csvData <- data.frame(lat = rnorm(10^5, 41.5, 0.1),
                      lon = rnorm(10^5, -72.75, 0.1))

# OK plot
world %>% 
    ggplot() +
        geom_bin2d(data = csvData, 
                   aes(x = lon, 
                       y = lat, 
                       fill = after_stat(log10(count))), 
                   binwidth = 0.005)

# Bins with borders
world %>% 
    ggplot() +
        geom_bin2d(data = csvData, 
                   aes(x = lon, 
                       y = lat, 
                       fill = after_stat(log10(count))), 
                   binwidth = 0.005) +
        geom_sf(color = "grey70", size = 0.1, fill = "transparent") +
        coord_sf(expand = FALSE, xlim = c(-73.7, -71.8), ylim = c(41.0, 42.0))
#> Warning in st_cast.GEOMETRYCOLLECTION(X[[i]], ...): only first part of
#> geometrycollection is retained
```

Created on 2021-06-22 by the reprex package (v2.0.0)

@clauswilke
Member

This is probably the right location to report this, but I'm not sure whether you're reporting one issue or multiple issues. Could you try to simplify and really bring out one (or more) concrete problem(s)? If there are multiple problems, please open one issue per problem. Also, please simplify the reproducible example so it uses the minimum amount of packages required (ideally, only ggplot2). You can use one of the existing examples of geom_sf() as a starting point: https://ggplot2.tidyverse.org/reference/ggsf.html#examples

@Mashin6
Author

Mashin6 commented Jun 22, 2021

The main issue is 3., which makes the 2D bin plot look bad. I assume the performance issue is related, since the two only occur together.

I tried a few things, and it seems to be related to the assigned CRS.

``` r
library(ggplot2)
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1


csvData <- data.frame(lat = rnorm(10^5, 41.5, 0.1),
                      lon = rnorm(10^5, -72.75, 0.1))

pol <- st_polygon(list(rbind(c(-73.5,41.2), c(-73.5,41.5), c(-72,41.5), c(-72,41.2), c(-73.5,41.2))))

test_sf <- st_sf(st_sfc(pol))

ggplot(test_sf) +
        geom_sf(color = "grey70", size = 0.1, fill = "transparent") +
        geom_bin2d(data = csvData, 
                   aes(x = lon, 
                       y = lat, 
                       fill = after_stat(log10(count))), 
                   binwidth = 0.005) +
        coord_sf(expand = FALSE, xlim = c(-73.7, -71.8), ylim = c(41.0, 42.0))

st_crs(test_sf) <- 4326

ggplot(test_sf) +
        geom_sf(color = "grey70", size = 0.1, fill = "transparent") +
        geom_bin2d(data = csvData, 
                   aes(x = lon, 
                       y = lat, 
                       fill = after_stat(log10(count))), 
                   binwidth = 0.005) +
        coord_sf(expand = FALSE, xlim = c(-73.7, -71.8), ylim = c(41.0, 42.0))
#> Warning in st_cast.GEOMETRYCOLLECTION(X[[i]], ...): only first part of
#> geometrycollection is retained
```

Created on 2021-06-22 by the reprex package (v2.0.0)

@clauswilke
Member

Yes, the warning is unrelated. The sf package doesn't always behave correctly when no non-geometry column is available. The following fixes that issue:

``` r
test_sf <- st_sf(x = 1, geometry = st_sfc(pol))
```

I can confirm the performance and rendering issues with geom_bin2d() and will look into it.

@clauswilke
Member

clauswilke commented Jun 23, 2021

The rendering errors go away if you set the color aesthetic as well: color = after_stat(log10(count)). In general, you need to set both color and fill when drawing rectangles.
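A minimal sketch of that workaround, using only ggplot2 and a smaller simulated data set (10^4 points with a fixed seed, chosen here only to keep it quick). The same statistic is mapped to both `fill` and `colour`, so each tile's outline is drawn in its own fill colour and the hairline gaps between neighbouring tiles are covered.

```r
library(ggplot2)

# Smaller simulated sample than in the thread; the seed is only for reproducibility.
set.seed(1)
csvData <- data.frame(
  lat = rnorm(10^4, 41.5, 0.1),
  lon = rnorm(10^4, -72.75, 0.1)
)

# Map the same computed statistic to fill AND colour, so the tile
# outline takes the same colour as the tile interior.
p <- ggplot(csvData) +
  geom_bin2d(
    aes(
      x = lon, y = lat,
      fill = after_stat(log10(count)),
      colour = after_stat(log10(count))
    ),
    binwidth = 0.005
  )

built <- ggplot_build(p)$data[[1]]
```

Because the default continuous colour and fill scales use the same gradient, the two aesthetics end up with identical colours here; with customized scales they would have to be kept in sync.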

The performance impact is probably due to coordinate transformation. However, in the example you provide, coordinates for the bins shouldn't get transformed, so I'll have to see what's going on there.

@thomasp85
Member

@clauswilke should I hold off on submitting 3.3.5 so it can include a fix for this?

@Mashin6
Author

Mashin6 commented Jun 23, 2021

@clauswilke Thank you for looking into this. I'm relieved that you can reproduce it.

Having to set both color and fill seems counterintuitive given how the function is documented. To change the scale, one would have to duplicate the definition in both scale_color and scale_fill. Outside of use with geom_sf() on data with a CRS set, geom_bin2d() behaves fine.

@clauswilke
Member

@thomasp85 Let me try today to see if there's something simple I can do, though I already thought I had done the simple thing (let coord_sf() report that it is a linear coordinate system when default_crs = NULL). If there's anything on that level that can be done, I'll make a fix. Otherwise, I'd leave it as is.

@Mashin6 That's just how polygons work. If you draw many polygons side-by-side, there can be rounding errors between them that show up as tiny gaps. You can use a single scale function and set the aesthetics argument to both aesthetics you want to have covered by the function.
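For example, one `scale_fill_gradient()` call can drive both aesthetics through its `aesthetics` argument. This is a sketch; the gradient endpoints and the simulated data are arbitrary choices, not taken from the thread.

```r
library(ggplot2)

set.seed(1)
csvData <- data.frame(
  lat = rnorm(10^4, 41.5, 0.1),
  lon = rnorm(10^4, -72.75, 0.1)
)

p <- ggplot(csvData) +
  geom_bin2d(
    aes(lon, lat,
        fill = after_stat(log10(count)),
        colour = after_stat(log10(count))),
    binwidth = 0.005
  ) +
  # A single scale definition covers both colour and fill,
  # so the outline and interior palettes cannot drift apart.
  scale_fill_gradient(
    low = "white", high = "darkblue",
    aesthetics = c("colour", "fill")
  )

built <- ggplot_build(p)$data[[1]]
```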

@clauswilke
Member

I'm quite sure I know what causes the difference in behavior @Mashin6 sees. For linear coordinate systems, geom_bin2d() uses a raster, which doesn't require setting of color, but for non-linear coordinate systems it uses tiles, which do require color. The regression is that coord_sf() currently always reports that it is non-linear, even when it is linear.
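One way to see which path a plot will take is to build it and ask the coord directly. The helper name `coord_is_linear()` is hypothetical; it only wraps `ggplot_build()` and the coord's `is_linear()` method. `coord_polar()` serves as a known non-linear reference; the result for a given `coord_sf()` plot depends on the ggplot2 version installed.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = 1:3)

# Hypothetical helper: build the plot and return whether its coord
# reports itself as linear (the flag that decides how rectangles are drawn).
coord_is_linear <- function(plot) {
  ggplot_build(plot)$layout$coord$is_linear()
}

cartesian <- coord_is_linear(ggplot(df, aes(x, y)) + geom_point())
polar <- coord_is_linear(ggplot(df, aes(x, y)) + geom_point() + coord_polar())
```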

@thomasp85 I think the problem is in this line:

``` r
default_crs = self$default_crs %||% crs
```

It should just be:

``` r
default_crs = self$default_crs
```

If the fix is indeed that simple, we should make it. I'm just worried that something else will blow up if we don't test properly. There must be a reason I wrote it this way, but I can't remember it right now.
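A simplified model of why the `%||%` fallback causes the regression, assuming the linearity check reduces to `is.null(default_crs)` as in the `is_linear` line quoted in this comment. The variable names are hypothetical and this is not ggplot2's actual code.

```r
# Null-default operator, defined locally so the example is self-contained
# (rlang provides an equivalent).
`%||%` <- function(x, y) if (is.null(x)) y else x

# Simplified stand-in for the linearity check: "linear" means no default CRS.
is_linear <- function(default_crs) is.null(default_crs)

user_default_crs <- NULL  # the user did not set default_crs in coord_sf()
crs <- 4326               # but the sf layer has a CRS assigned

# Current line: default_crs falls back to crs, so it is never NULL here.
before_fix <- is_linear(user_default_crs %||% crs)
# Proposed line: default_crs is passed through unchanged.
after_fix <- is_linear(user_default_crs)

before_fix  # FALSE
after_fix   # TRUE
```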

Otherwise, we'd have to add more sophisticated logic to this line, checking whether the default CRS, if set, matches the CRS:

``` r
is_linear = function(self) is.null(self$get_default_crs()),
```

Either way, it should be a simple fix, I guess.
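A sketch of that more sophisticated check, assuming both CRS values are anything `sf::st_crs()` accepts (EPSG codes, strings, or `crs` objects). `is_linear_sketch()` is a hypothetical stand-in, not ggplot2's code; it relies on sf's `==` method for `crs` objects.

```r
library(sf)

# Hypothetical alternative check: the coord counts as linear when no
# default CRS is set, or when the default CRS equals the CRS the data
# are drawn in (so no coordinate transformation is needed).
is_linear_sketch <- function(default_crs, crs) {
  is.null(default_crs) || isTRUE(st_crs(default_crs) == st_crs(crs))
}

unset <- is_linear_sketch(NULL, 4326)   # TRUE
same  <- is_linear_sketch(4326, 4326)   # TRUE
diff  <- is_linear_sketch(4326, 3857)   # FALSE
```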

clauswilke added a commit to wilkelab/ggplot2_archive that referenced this issue Jun 23, 2021
@clauswilke
Member

With the proposed fix, things appear fine and there is no performance difference between the three plots.

``` r
library(ggplot2)
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1

csvData <- data.frame(
  lat = rnorm(10^5, 41.5, 0.1),
  lon = rnorm(10^5, -72.75, 0.1)
)

pol <- st_polygon(list(rbind(c(-73.5,41.2), c(-73.5,41.5), c(-72,41.5), c(-72,41.2), c(-73.5,41.2))))

test_sf <- st_sf(x = 1, geometry = st_sfc(pol))

ggplot(csvData) +
  geom_bin2d(
    aes(lon, lat, fill = after_stat(log10(count)))
  ) +
  coord_fixed(xlim = c(-73.5, -72))

ggplot(csvData) +
  geom_sf(data = test_sf) +
  geom_bin2d(
    aes(lon, lat, fill = after_stat(log10(count)))
  ) +
  coord_sf(xlim = c(-73.5, -72))

test2_sf <- test_sf
st_crs(test2_sf) <- 4326

ggplot(csvData) +
  geom_sf(data = test2_sf) +
  geom_bin2d(
    aes(lon, lat, fill = after_stat(log10(count)))
  ) +
  coord_sf(xlim = c(-73.5, -72))
```

Created on 2021-06-23 by the reprex package (v1.0.0)

@Mashin6
Author

Mashin6 commented Jun 23, 2021

Awesome! Thanks a ton!
