tidyverse implementation for sf subclasses #1958
If that would solve your problem and not cause any new ones, I'd be happy to accept a PR! |
update: pull request submitted: #1963 |
With this PR I still get:
library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
library(dplyr)
# Attaching package: ‘dplyr’
# The following objects are masked from ‘package:stats’:
# filter, lag
# The following objects are masked from ‘package:base’:
# intersect, setdiff, setequal, union
demo(nc, ask = FALSE, echo = FALSE)
class(nc) <- c("myclass", class(nc))
class(nc)
# [1] "myclass" "sf" "data.frame"
nc |> filter(NAME == "Ashe") |> class()
# [1] "sf" "data.frame"
Ah, my bad:
library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
library(dplyr)
# Attaching package: ‘dplyr’
# The following objects are masked from ‘package:stats’:
# filter, lag
# The following objects are masked from ‘package:base’:
# intersect, setdiff, setequal, union
demo(nc, ask = FALSE, echo = FALSE)
class(nc) <- c("myclass", class(nc))
class(nc)
# [1] "myclass" "sf" "data.frame"
dplyr_row_slice.myclass <- function(data, i, ...){
out <- vctrs::vec_slice(data, i)
dplyr_reconstruct(out, data)
}
dplyr_reconstruct.myclass <- function(data, template){
class(data) <- class(template)
data
}
nc |> filter(NAME == "Ashe") |> class()
# [1] "myclass" "sf" "data.frame"
@henningte could I ask you to review this PR? |
@edzer I'll have a look at it tomorrow. |
Before having a closer look into the code, I have the following questions and issues:
|
As I understand it, it will not replace the column selectors (select, mutate, ...) because sf objects break the tidy contract (or assumption) that obj[1] always has length 1. |
Yes, you're right! Except that the PR does replace |
The behavior of
"My concern is only that it may get confusing which dplyr method is implemented the old way and which the new way." Re: The way I see it is that this makes it clearer whether the dplyr verb has a special behavior on the class.
Definitely, some documentation on this would be helpful here.
|
Thanks @huizezhang-sherry , this helped me to better understand the purpose of the new functions and where they do not apply. In this case, the PR is as complete as it could possibly be. The only open question is then whether |
Thanks a lot to both - now merged! |
Your PR creates a reverse dependency problem in package
|
Oh, I think this bit of code inside
In both errors,
The old code ran smoothly because it does not directly print the object and the later line
My solution here would be:
|
Thanks - I followed 1, to restore old behaviour. |
* redist problem see alarm-redist/redist#148
@huizezhang-sherry could you pls check that 2df4b5c still does what you had in mind? It seems to fix the redist revdep problem. |
yea, I'm happy with that. Just went through the rest in the redist revdep problem, I'm thinking the test on |
Dear All, I really appreciate these changes, and they seem to solve quite a few issues. I do, however, note that mutate still seems to change the order of classes when inheriting from sf (adapted from the example above):
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
demo(nc, ask = FALSE, echo = FALSE)
class(nc) <- c("myclass", class(nc))
class(nc)
#> [1] "myclass" "sf" "data.frame"
dplyr_row_slice.myclass <- function(data, i, ...){
out <- vctrs::vec_slice(data, i)
dplyr_reconstruct(out, data)
}
dplyr_reconstruct.myclass <- function(data, template){
class(data) <- class(template)
data
}
nc |> mutate(NAME2 = NAME) |> class()
#> [1] "sf" "myclass" "data.frame"
nc |> filter(NAME == "Ashe") |> class()
#> [1] "myclass" "sf" "data.frame"
This would mean packages inheriting sf would still need to implement custom
Created on 2022-07-10 by the reprex package (v2.0.1) |
Thanks; it seems |
I somehow had expected the calling order would be reversed, when thinking about reconstructing a class hierarchy. |
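The class order matters because S3 dispatch walks the class vector from left to right. The following is a minimal base-R sketch, not sf or dplyr code: the generic `trace_dispatch()` and the classes `myclass` and `parent` are hypothetical names used only to show which method runs first for each ordering.

```r
# Hypothetical generic and classes, used only to trace S3 dispatch order.
trace_dispatch <- function(x) UseMethod("trace_dispatch")

# Each method records its own name, then hands over to the next class in line.
trace_dispatch.myclass <- function(x) c("myclass", NextMethod())
trace_dispatch.parent  <- function(x) c("parent", NextMethod())
trace_dispatch.default <- function(x) "default"

# Same two classes, opposite order.
a <- structure(list(), class = c("myclass", "parent"))
b <- structure(list(), class = c("parent", "myclass"))

trace_dispatch(a)  # "myclass" "parent" "default"
trace_dispatch(b)  # "parent" "myclass" "default"
```

With the subclass listed second, as in the `mutate()` output above, the parent's methods are tried before the subclass's, which is why preserving the original order is relevant for packages that extend sf.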
Thanks for having a look. It also seems that if you implement
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
demo(nc, ask = FALSE, echo = FALSE)
class(nc) <- c("myclass", class(nc))
class(nc)
#> [1] "myclass" "sf" "data.frame"
dplyr_row_slice.myclass <- function(data, i, ...){
message("called 1")
out <- NextMethod()
dplyr_reconstruct(out, data)
}
dplyr_reconstruct.myclass <- function(data, template){
message("called 2")
class(data) <- class(template)
data
}
nc |> mutate(NAME2 = NAME) |> class()
#> called 2
#> [1] "sf" "myclass" "data.frame"
nc |> filter(NAME == "Ashe") |> class()
#> called 1
#> called 2
#> called 2
#> [1] "myclass" "sf" "data.frame"
Could this relate to this line which could result in |
Hi, thanks for being interested in this discussion. I think there are two issues here:
|
We now get the following, new, revdep errors on CRAN checks:
So I think I am going to revert this change to how it was in 1.0-7, and leave this issue for someone with more time on their hands and willing to do all the revdep checks before writing a PR. |
Hi @edzer, I apologize for not being aware of the revdep checks before. I'm running these checks now and will investigate accordingly if any issue arises. Would that be okay with you? |
Yes, but it will probably not make it for sf 1.0-8, which needs to get out now. |
Thanks, I'm okay with that. |
Add a dedicated `dplyr::dplyr_reconstruct()` method for `sftime` objects. Relying on the method for `sf` objects caused erroneous column binding when the second object was a data frame without conflicting column names for the `sf` and time columns. In this case, an `sf` object was returned, even though an `sftime` object should have been returned. See also r-spatial/sf#1958 (comment).
Hi @edzer, I looked into the issue with the The object The code below describes the same issue:
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.1, PROJ 9.0.1; sf_use_s2() is TRUE
nc = read_sf(system.file("shape/nc.shp", package="sf"))
nc_g <- nc %>% dplyr::group_by(AREA) %>% head(5)
nc_g
#> Simple feature collection with 5 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 5 × 15
#> # Groups: AREA [5]
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alleg… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curri… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 North… 37131 37131 66 1421 9 1066
#> # … with 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <MULTIPOLYGON [°]>
names(nc_g)[1] <- "area"
nc_g
#> # A tibble: 5 × 15
#> # Groups: area [5]
#> area PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
#> 2 0.061 1.23 1827 1827 Alleg… 37005 37005 3 487 0 10
#> 3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
#> 4 0.07 2.97 1831 1831 Curri… 37053 37053 27 508 1 123
#> 5 0.153 2.21 1832 1832 North… 37131 37131 66 1421 9 1066
#> # … with 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> # geometry <MULTIPOLYGON [°]>
Created on 2022-07-14 by the reprex package (v2.0.1)
This can be fixed by adding:
in the |
Great, does that also fix |
I think the problem with In the following line: the class of In the conflicting This is not the case in the current So the modified |
Sure, I will modify the quoted line in the upcoming pull request. |
From the source code, the `sf` class implements tidyverse methods by directly providing methods for each verb. This will cause an issue for `sf` subclasses since the `sf` class will always get prioritised over its subclass.
I'm wondering if the sf team would like to take tidyverse's recommendation to implement through `dplyr_row_slice()` and `dplyr_col_modify()`?
Some more details:
Here, the data `nc` is augmented with a class `myclass`. The functions `dplyr_row_slice()` and `dplyr_reconstruct()` are implemented for `myclass`.
Now if we run `out <- nc %>% filter(NAME == "Ashe")`, one may expect `filter()` -> `filter.data.frame()` -> `dplyr_row_slice.myclass()`. This would result in `out` having classes in the order `"myclass" "sf" "data.frame"`.
But this is not the case:
Created on 2022-06-13 by the reprex package (v2.0.1)
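The difference between the two designs can be sketched in base R without sf or dplyr. Everything named here is hypothetical: `keep_rows()` stands in for a verb such as `filter()`, `rebuild()` stands in for a reconstruct hook in the spirit of `dplyr_reconstruct()`, and `parent`/`child` stand in for `sf` and a subclass. It is an illustration of the dispatch problem under those assumptions, not the actual sf implementation.

```r
# Hypothetical verb generic, standing in for something like filter().
keep_rows <- function(x, i) UseMethod("keep_rows")

# Per-verb method on the parent class that hard-codes its own class.
# A subclass without its own keep_rows method lands here and loses its class.
keep_rows.parent <- function(x, i) {
  out <- as.data.frame(x)[i, , drop = FALSE]
  class(out) <- c("parent", "data.frame")
  out
}

# Alternative: one shared row-slicing step plus a reconstruct hook that
# dispatches on the template, so the original class vector can be restored.
rebuild <- function(out, template) UseMethod("rebuild", template)
rebuild.default <- function(out, template) {
  class(out) <- class(template)
  out
}
keep_rows2 <- function(x, i) {
  rebuild(as.data.frame(x)[i, , drop = FALSE], x)
}

x <- structure(data.frame(id = 1:3),
               class = c("child", "parent", "data.frame"))

class(keep_rows(x, 1))   # "parent" "data.frame"
class(keep_rows2(x, 1))  # "child" "parent" "data.frame"
```

In the second design a subclass only needs to supply its own `rebuild` method if restoring the class vector is not enough, rather than re-implementing every verb.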