Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tidyverse implementation for sf subclasses #1958

Closed
huizezhang-sherry opened this issue Jun 13, 2022 · 30 comments
Closed

tidyverse implementation for sf subclasses #1958

huizezhang-sherry opened this issue Jun 13, 2022 · 30 comments

Comments

@huizezhang-sherry
Copy link
Contributor

From the source code, the sf class implements tidyverse methods by directly providing methods for each verb. This will cause an issue for sf subclasses since the sf class will always get prioritised over its subclass.

I'm wondering if the sf team would like to take tidyverse's recommendation to implement through dplyr_row_slice() and dplyr_col_modify()?


Some more details:

Here, the datanc augmented with a class myclass. The function dplyr_row_slice() and dplyr_reconstruct() are implemented for myclass.

Now if we run out <- nc %>% filter(NAME == "Ashe"), one may expect filter() -> filter.data.frame()-> dplyr_row_slice.myclass(). This would result out to have class in the order of "myclass" "sf" "data.frame".

But this is not the case:

library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.4.0, PROJ 8.1.1; sf_use_s2() is TRUE
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
nc <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source 
#>   `/Library/Frameworks/R.framework/Versions/4.1/Resources/library/sf/shape/nc.shp' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27
class(nc) <- c("myclass", class(nc))
class(nc)
#> [1] "myclass"    "sf"         "data.frame"
dplyr_row_slice.myclass <- function(data, i, ...){
  out <- vctrs::vec_slice(data, i)
  dplyr_reconstruct(out, data)
}

dplyr_reconstruct.myclass <- function(data, template){
  class(data) <- class(template)
  data
}

out <- nc %>% filter(NAME == "Ashe")
out
#> Simple feature collection with 1 feature and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -81.74107 ymin: 36.23436 xmax: -81.23989 ymax: 36.58965
#> Geodetic CRS:  NAD27
#>    AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#> 1 0.114     1.442  1825    1825 Ashe 37009  37009        5  1091     1      10
#>   BIR79 SID79 NWBIR79                       geometry
#> 1  1364     0      19 MULTIPOLYGON (((-81.47276 3...
class(out)
#> [1] "sf"         "myclass"    "data.frame"

Created on 2022-06-13 by the reprex package (v2.0.1)

@edzer
Copy link
Member

edzer commented Jun 13, 2022

I'm wondering if the sf team would like to take tidyverse's recommendation to implement through dplyr_row_slice() and dplyr_col_modify()?

If that would solve your problem and not cause any new ones, I'd be happy to accept a PR!

@huizezhang-sherry
Copy link
Contributor Author

update: pull request submitted: #1963

@edzer
Copy link
Member

edzer commented Jun 22, 2022

With this PR I still get

library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
library(dplyr)

# Attaching package: ‘dplyr’

# The following objects are masked from ‘package:stats’:

#     filter, lag

# The following objects are masked from ‘package:base’:

#     intersect, setdiff, setequal, union

demo(nc, ask = FALSE, echo = FALSE)
class(nc) <- c("myclass", class(nc))
class(nc)
# [1] "myclass"    "sf"         "data.frame"
nc |> filter(NAME == "Ashe") |> class()
# [1] "sf"         "data.frame"

@edzer
Copy link
Member

edzer commented Jun 22, 2022

Ah, my bad:

library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
library(dplyr)

# Attaching package: ‘dplyr’

# The following objects are masked from ‘package:stats’:

#     filter, lag

# The following objects are masked from ‘package:base’:

#     intersect, setdiff, setequal, union

demo(nc, ask = FALSE, echo = FALSE)
class(nc) <- c("myclass", class(nc))
class(nc)
# [1] "myclass"    "sf"         "data.frame"
dplyr_row_slice.myclass <- function(data, i, ...){
  out <- vctrs::vec_slice(data, i)
  dplyr_reconstruct(out, data)
}

dplyr_reconstruct.myclass <- function(data, template){
  class(data) <- class(template)
  data
}
nc |> filter(NAME == "Ashe") |> class()
# [1] "myclass"    "sf"         "data.frame"

@edzer
Copy link
Member

edzer commented Jun 22, 2022

@henningte could I ask you to review this PR?

@henningte
Copy link

@edzer I'll have a look at it tomorrow.

@henningte
Copy link

Before having a closer look into the code, I have the following questions and issues:

  1. If I understand correctly, dplyr::dplyr_row_slice() and dplyr::dplyr_col_modify() are meant to replace all current implementations of dplyr methods in the sf package, right?
  2. Currently, only four of thedplyr methods currently implemented separately in sf are replaced by the pull request. I understand from the PR comment that implementing this might get complicated for select.sf() and transmute.sf() and all dplyr methods depending on this. However, it should be possible for *join() methods, right?
    Correct me if I'm wrong! My concern is only that it may get confusing which dplyr method is implemented the old way and which the new way.
  3. "Extending dplyr with new data frame subclasses" is still listed as experimental feature of the dplyr package. Is there any sf policy whether to implement experimental features?

@edzer
Copy link
Member

edzer commented Jun 24, 2022

As I understand it it will not replace the column selectors (select, mutate, ...) because sf objects break the tidy contract (or assumption) that obj[1] always has length 1.

@henningte
Copy link

henningte commented Jun 24, 2022

Yes, you're right! Except that the PR does replace mutate.sf() since it does not interfere with the sticky behavior of the active geometry column.

@huizezhang-sherry
Copy link
Contributor Author

huizezhang-sherry commented Jun 24, 2022

  1. No, the replacement can happen if two conditions are fulfilled:

    a) the dplyr function itself is implemented with dplyr_row_slice() or dplyr_col_modify()
    b) the behavior of the dplyr method on the new class is basically the same

The behavior of filter(), slice(), arrange(), and mutate() on an sf object are no different than their behaviours on a data frame, hence can be replaced with the two proposed verbs. select and transmute have different behaviors (they always retains the geometry column), hence need to have its customised select.sf() and transmute.sf().

  1. No, the four basic joins do not use any of the two function. On the dplyr extending page, it states that

inner_join(), left_join(), right_join(), and full_join() coerces x to a tibble, modify the rows, then uses dplyr_reconstruct() to convert back to the same type as x.

"My concern is only that it may get confusing which dplyr method is implemented the old way and which the new way."

Re: The way I see it is that this makes it clearer whether the dplyr verb has a special behavior on the class.

  • If nothing special, it would go through dplyr_row_slice() or dplyr_col_modify.
  • A written up method.class() would signal the method works differently on the class.

Definitely, some documentation on this would be helpful here.

  1. It is true that the document is labelled as experimental, but the page also stats that "... but it's likely that any code you write to implement them will find a home in what comes next". In terms of the sf policy on this, I'm not the best person to answer this.

@henningte
Copy link

henningte commented Jun 24, 2022

Thanks @huizezhang-sherry , this helped me to better understand the purpose of the new functions and where they do not apply. In this case, the PR is as complete as it could possibly be.

The only open question is then whether sf should built on an experimental feature of another package. @edzer , what do you think about this?

@edzer
Copy link
Member

edzer commented Jul 6, 2022

Thanks a lot to both - now merged!

@edzer edzer closed this as completed Jul 6, 2022
@edzer
Copy link
Member

edzer commented Jul 6, 2022

Your PR creates a reverse dependency problem in package himach, would you mind taking a look at this @huizezhang-sherry?

Package: himach
Check: tests
New result: ERROR
    Running ‘spelling.R’ [0s/0s]
    Running ‘testthat.R’ [32s/32s]
  Running the tests in ‘tests/testthat.R’ failed.
  Complete output:
    > library(testthat)
    > library(himach)
    > 
    > test_check("himach")
    [ FAIL 2 | WARN 0 | SKIP 1 | PASS 66 ]
    
    ══ Skipped tests ═══════════════════════════════════════════════════════════════
    • On CRAN (1)
    
    ══ Failed tests ════════════════════════════════════════════════════════════════
    ── Error (test_routes.R:94:3): find_route works with subsonic option ───────────
    Error in `st_sf(x, ..., agr = agr, sf_column_name = sf_column_name)`: sf_column_name %in% all_sfc_names is not TRUE
    Backtrace:
         ▆
      1. ├─utils::capture.output(...) at test_routes.R:94:2
      2. │ └─base::withVisible(...elt(i))
      3. ├─... %>% summarise_routes_for_test()
      4. ├─himach (local) summarise_routes_for_test(.)
      5. │ └─... %>% ... at test_routes.R:17:2
      6. ├─dplyr::mutate(...)
      7. ├─dplyr::summarise(...)
      8. ├─dplyr::group_by(., fullRouteID)
      9. ├─sf::st_drop_geometry(.)
     10. ├─dplyr::mutate(., envelope = st_area(envelope))
     11. ├─dplyr::mutate(., across(c(gc, crow), st_length), gc_length = gc)
     12. └─dplyr:::mutate.data.frame(...)
     13.   ├─dplyr::dplyr_col_modify(.data, cols)
     14.   ├─sf:::dplyr_col_modify.sf(.data, cols)
     15.   ├─base::NextMethod()
     16.   └─dplyr:::dplyr_col_modify.data.frame(.data, cols)
     17.     └─dplyr::dplyr_reconstruct(out, data)
     18.       ├─dplyr:::dplyr_reconstruct_dispatch(data, template)
     19.       └─sf:::dplyr_reconstruct.sf(data, template)
     20.         ├─sf::st_as_sf(data, sf_column_name = sfc_name, crs = crs, precision = prec)
     21.         └─sf:::st_as_sf.data.frame(...)
     22.           └─sf::st_sf(x, ..., agr = agr, sf_column_name = sf_column_name)
     23.             └─base::stopifnot(sf_column_name %in% all_sfc_names)
    ── Error (test_routes.R:134:3): Find multiple routes for multiple aircraft ─────
    Error in `st_sf(x, ..., agr = agr, sf_column_name = sf_column_name)`: sf_column_name %in% all_sfc_names is not TRUE
    Backtrace:
         ▆
      1. ├─utils::capture.output(...) at test_routes.R:134:2
      2. │ └─base::withVisible(...elt(i))
      3. ├─... %>% summarise_routes_for_test()
      4. ├─himach (local) summarise_routes_for_test(.)
      5. │ └─... %>% ... at test_routes.R:17:2
      6. ├─dplyr::mutate(...)
      7. ├─dplyr::summarise(...)
      8. ├─dplyr::group_by(., fullRouteID)
      9. ├─sf::st_drop_geometry(.)
     10. ├─dplyr::mutate(., envelope = st_area(envelope))
     11. ├─dplyr::mutate(., across(c(gc, crow), st_length), gc_length = gc)
     12. └─dplyr:::mutate.data.frame(...)
     13.   ├─dplyr::dplyr_col_modify(.data, cols)
     14.   ├─sf:::dplyr_col_modify.sf(.data, cols)
     15.   ├─base::NextMethod()
     16.   └─dplyr:::dplyr_col_modify.data.frame(.data, cols)
     17.     └─dplyr::dplyr_reconstruct(out, data)
     18.       ├─dplyr:::dplyr_reconstruct_dispatch(data, template)
     19.       └─sf:::dplyr_reconstruct.sf(data, template)
     20.         ├─sf::st_as_sf(data, sf_column_name = sfc_name, crs = crs, precision = prec)
     21.         └─sf:::st_as_sf.data.frame(...)
     22.           └─sf::st_sf(x, ..., agr = agr, sf_column_name = sf_column_name)
     23.             └─base::stopifnot(sf_column_name %in% all_sfc_names)
    
    [ FAIL 2 | WARN 0 | SKIP 1 | PASS 66 ]
    Error: Test failures
    Execution halted

@huizezhang-sherry
Copy link
Contributor Author

Oh I think this bit of codes inside summarise_routes_for_test() in their test_routes.R is causing some problems here:

r %>%
  # wkt is machine-dependent so just extract length/area
  mutate(across(c(gc, crow), st_length),
         gc_length = gc) %>%
  mutate(envelope = st_area(envelope)) %>%
  sf::st_drop_geometry() %>%
  ...

In both errors, r is an sf object with gc as the sf_column. The mutate step transforms the gc column to its length and makes it no longer an sfc column. The old code stores the output still as an sf object and produces an error when printed:

Error in st_geometry.sf(x) : 
  attr(obj, "sf_column") does not point to a geometry column.
Did you rename it, without setting st_geometry(obj) <- "newname"?

The old code went smooth because the code does not directly print out the object and the later line sf::st_drop_geometry() happens to fix this sf_column problem.

My solution here would be:

  1. A new line in dplyr_reconstruct.sf() is needed: if (!inherits(sfc_name, "sfc")) return(data), so that if the sf_column is no longer an sfc, the data can be returned as a tibble (even if it can't be constructed as an sf) .

  2. They need to modify the test codes, potentially other places as well, by either giving the mutated length a different name or dropping the line sf::st_drop_geometry() since as per the current code, the object passed by mutate will not be an sf object.

edzer added a commit that referenced this issue Jul 7, 2022
@edzer
Copy link
Member

edzer commented Jul 7, 2022

Thanks - I followed 1, to restore old behaviour.

@edzer
Copy link
Member

edzer commented Jul 8, 2022

@huizezhang-sherry could you pls check that 2df4b5c still does what you had in mind? It seems to fix the redist revdep problem.

@huizezhang-sherry
Copy link
Contributor Author

yea, I'm happy with that.

Just went through the rest in the redist revdep problem, I'm thinking the test on rowwise_df can be better with using the filter(AREA > .1) example. The current slice(1) will give the first row of each group, which is basically everything in a rowwise_df. But, that's trivial!

edzer added a commit that referenced this issue Jul 9, 2022
@bart1
Copy link
Contributor

bart1 commented Jul 10, 2022

Dear All, I really appreciate these changes and they seem to solve quite a few issue I do however note that mutate still seems to change the order of classes when inheriting sf (adopted from the example above):

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

demo(nc, ask = FALSE, echo = FALSE)
class(nc) <- c("myclass", class(nc))
class(nc)
#> [1] "myclass"    "sf"         "data.frame"
dplyr_row_slice.myclass <- function(data, i, ...){
    out <- vctrs::vec_slice(data, i)
    dplyr_reconstruct(out, data)
}

dplyr_reconstruct.myclass <- function(data, template){
    class(data) <- class(template)
    data
}
nc |> mutate(NAME2 = NAME) |> class()
#> [1] "sf"         "myclass"    "data.frame"
nc |> filter(NAME == "Ashe") |> class()
#> [1] "myclass"    "sf"         "data.frame"

This would mean packages inheriting sf would still need to implement custom mutatemethods. Would there be a way to prevent this?

Created on 2022-07-10 by the reprex package (v2.0.1)

@edzer
Copy link
Member

edzer commented Jul 10, 2022

Thanks; it seems dplyr_reconstruct.myclass is called before dplyr_reconstruct.sf. Maybe the latter should strip all class labels before label sf, and add them back later before sf before returning?

@edzer
Copy link
Member

edzer commented Jul 10, 2022

I somehow had expected the calling order would be reversed, when thinking about reconstructing a class hierarchy.

@bart1
Copy link
Contributor

bart1 commented Jul 10, 2022

Thanks for having a look. It also seems that if you implement dplyr_row_slice.myclass with NextMethod the dplyr_reconstruct.myclass is called twice:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
demo(nc, ask = FALSE, echo = FALSE)
class(nc) <- c("myclass", class(nc))
class(nc)
#> [1] "myclass"    "sf"         "data.frame"
dplyr_row_slice.myclass <- function(data, i, ...){
  message("called 1")
  out <- NextMethod()
  dplyr_reconstruct(out, data)
}

dplyr_reconstruct.myclass <- function(data, template){
  message("called 2")
  class(data) <- class(template)
  data
}
nc |> mutate(NAME2 = NAME) |> class()
#> called 2
#> [1] "sf"         "myclass"    "data.frame"
nc |> filter(NAME == "Ashe") |> class()
#> called 1
#> called 2
#> called 2
#> [1] "myclass"    "sf"         "data.frame"

Could this relate to this line which could result in myclass being moved down the class order?

@huizezhang-sherry
Copy link
Contributor Author

Hi, thanks for being interested in this discussion. I think there are two issues here:

  1. mutate is a column wise operation, so you will need to implement dplyr_col_modify.myclass() to have myclass take precedence over sf.

  2. I didn't show the mutate example because implementing dplyr_col_modify.myclass() will need to take into account the sticky behavior of select/ subset in sf - see [.sf. Depending on how it is implemented with myclass, you may also need these methods ([.myclass, [[<-.myclass, print.myclass, etc) for a mutate example to work properly. This is beyond the complexity of a proof of concept, but you may have already had them when creating a new class.

@edzer
Copy link
Member

edzer commented Jul 12, 2022

We now get the following, new, revdep errors on CRAN checks:


Changes to worse in reverse depends:

Package: eks
Check: examples
New result: ERROR
  Running examples in ‘eks-Ex.R’ failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: tidyst_kdr
  > ### Title: Tidy and geospatial kernel density ridge estimates
  > ### Aliases: tidy_kdr st_kdr
  > ### Keywords: smooth
  > 
  > ### ** Examples
  > 
  > ## tidy density ridge estimate
  > 
  > ## geospatial density ridge estimate
  > data(wa)
  > data(grevilleasf)
  > hakeoides <- dplyr::filter(grevilleasf, species=="hakeoides")
  > ## gridsize=c(21,21) is for illustrative purposes only 
  > ## remove for more complete KDR
  > s1 <- st_kdr(hakeoides, gridsize=c(21,21))
  > 
  > ## base R plot
  > xlim <- c(1.2e5, 1.1e6); ylim <- c(6.1e6, 7.2e6)
  > plot(wa, xlim=xlim, ylim=ylim)
  > plot(sf::st_geometry(hakeoides), add=TRUE, col=3, pch=16)
  > plot(s1, add=TRUE, col=6, lwd=3, alpha=0.8)
  Adding missing grouping variables: `segment`
  Warning in xy.coords(x, y, xlabel, ylabel, log) :
    NAs introduced by coercion
  Warning in min(x) : no non-missing arguments to min; returning Inf
  Warning in max(x) : no non-missing arguments to max; returning -Inf
  Warning in plot.window(...) : "add" is not a graphical parameter
  Error in plot.window(...) : need finite 'ylim' values
  Calls: plot ... NextMethod -> plot.default -> localWindow -> plot.window
  Execution halted

Package: mapping
Check: R code for possible problems
New result: NOTE
  Warning in fun(libname, pkgname) :
    rgeos: versions of GEOS runtime 3.11.0-CAPI-1.17.0
  and GEOS at installation 3.10.2-CAPI-1.16.0differ

Package: sftime
Check: examples
New result: ERROR
  Running examples in ‘sftime-Ex.R’ failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: bind
  > ### Title: Bind rows (features) of 'sftime' objects
  > ### Aliases: bind rbind.sftime cbind.sftime
  > 
  > ### ** Examples
  > 
  > g1 <- st_sfc(st_point(1:2))
  > x1 <- st_sftime(a = 3, geometry = g1, time = Sys.time())
  > 
  > g2 <- st_sfc(st_point(c(4, 6)))
  > x2 <- st_sftime(a = 4, geometry = g2, time = Sys.time())
  > 
  > rbind(x1, x2) # works because both tc1 and tc2 have the same class
  Spatiotemporal feature collection with 2 features and 1 field
  Geometry type: POINT
  Dimension:     XY
  Bounding box:  xmin: 1 ymin: 2 xmax: 4 ymax: 6
  CRS:           NA
  Time column with class: 'POSIXt'.
  Ranging from 2022-07-12 16:54:09 to 2022-07-12 16:54:09.
    a    geometry                time
  1 3 POINT (1 2) 2022-07-12 16:54:09
  2 4 POINT (4 6) 2022-07-12 16:54:09
  > 
  > ## Not run: 
  > ##D st_time(x2) <- 1
  > ##D rbind(x1, x2) # error because both tc1 and tc2 do not have the same class
  > ## End(Not run)
  > 
  > cbind(x1, x2) 
  Spatiotemporal feature collection with 1 feature and 3 fields
  Active geometry column: geometry
  Geometry type: POINT
  Dimension:     XY
  Bounding box:  xmin: 1 ymin: 2 xmax: 1 ymax: 2
  CRS:           NA
  Time column with class: 'POSIXt'.
  Representing 2022-07-12 16:54:09.
    a a.1              time.1    geometry  geometry.1                time
  1 3   4 2022-07-12 16:54:09 POINT (1 2) POINT (4 6) 2022-07-12 16:54:09
  > 
  > if (require(dplyr))
  +   dplyr::bind_cols(x1, x2)
  Loading required package: dplyr
  
  Attaching package: ‘dplyr’
  
  The following objects are masked from ‘package:stats’:
  
      filter, lag
  
  The following objects are masked from ‘package:base’:
  
      intersect, setdiff, setequal, union
  
  New names:
  • `a` -> `a...1`
  • `geometry` -> `geometry...2`
  • `time` -> `time...3`
  • `a` -> `a...4`
  • `geometry` -> `geometry...5`
  • `time` -> `time...6`
  Error in st_agr.default(x) : all(is.na(x)) is not TRUE
  Calls: <Anonymous> ... print.sftime -> st_agr -> st_agr.default -> stopifnot
  Execution halted

Package: sspm
Check: re-building of vignette outputs
New result: ERROR
  Error(s) in re-building vignettes:
    ...
  --- re-building ‘An_example_with_simulated_data.Rmd’ using rmarkdown
  Quitting from lines 250-252 (An_example_with_simulated_data.Rmd) 
  Error: processing vignette ‘An_example_with_simulated_data.Rmd’ failed with diagnostics:
  `.data` must be a valid <grouped_df> object.
  Caused by error in `validate_grouped_df()`:
  ! The `groups` attribute must be a data frame.
  --- failed re-building ‘An_example_with_simulated_data.Rmd’
  
  --- re-building ‘Package_and_workflow_design.Rmd’ using rmarkdown
  --- finished re-building ‘Package_and_workflow_design.Rmd’
  
  SUMMARY: processing the following file failed:
    ‘An_example_with_simulated_data.Rmd’
  
  Error: Vignette re-building failed.
  Execution halted

So I think I am going to revert this change to how it was in 1.0-7, and leave this issue for someone with more time on their hands and willing to do all the revdep checks before writing a PR.

edzer added a commit that referenced this issue Jul 13, 2022
@huizezhang-sherry
Copy link
Contributor Author

Hi @edzer,

I apologize for not being aware of the revdep checks before. I'm running these checks now and will investigate accordingly if any issue raises. Would that be okay with you?

@edzer
Copy link
Member

edzer commented Jul 13, 2022

Yes, but it will probably not make if for sf 1.0-8, which needs to get out now.

@huizezhang-sherry
Copy link
Contributor Author

Thanks, I'm okay with that.

henningte added a commit to r-spatial/sftime that referenced this issue Jul 13, 2022
Add a dedicated `dplyr::dplyr_reconstruct()` method for `sftime` objects.
Relying on the method for `sf` objects caused erroneously column binding when the second object was a data frame without conflicting column names for the `sf` and time columns. In this case, a `sf` objects was returned, even though an `sftime` object should be returned. See also r-spatial/sf#1958 (comment).
@huizezhang-sherry
Copy link
Contributor Author

huizezhang-sherry commented Jul 14, 2022

Hi @edzer,

I looked into the issue with the eks package. The failure is because in the list s1, the element sf is not an sf object. The class label goes missing at this line when relocate() is called from mutate() in st_kdr().

The object fhat.sf passed into mutate() in st_kdr() is an sf object with groped_df. The grouped_df class has a names<-.grouped_df method but sf doesn't, so the sf class is missing after resetting the name.

The code below describes the same issue:

library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.1, PROJ 9.0.1; sf_use_s2() is TRUE
nc = read_sf(system.file("shape/nc.shp", package="sf"))
nc_g <- nc %>% dplyr::group_by(AREA) %>% head(5)
nc_g
#> Simple feature collection with 5 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
#> Geodetic CRS:  NAD27
#> # A tibble: 5 × 15
#> # Groups:   AREA [5]
#>    AREA PERIMETER CNTY_ CNTY_ID NAME   FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#>   <dbl>     <dbl> <dbl>   <dbl> <chr>  <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#> 1 0.114      1.44  1825    1825 Ashe   37009  37009        5  1091     1      10
#> 2 0.061      1.23  1827    1827 Alleg… 37005  37005        3   487     0      10
#> 3 0.143      1.63  1828    1828 Surry  37171  37171       86  3188     5     208
#> 4 0.07       2.97  1831    1831 Curri… 37053  37053       27   508     1     123
#> 5 0.153      2.21  1832    1832 North… 37131  37131       66  1421     9    1066
#> # … with 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> #   geometry <MULTIPOLYGON [°]>
names(nc_g)[1] <- "area"
nc_g
#> # A tibble: 5 × 15
#> # Groups:   area [5]
#>    area PERIMETER CNTY_ CNTY_ID NAME   FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#>   <dbl>     <dbl> <dbl>   <dbl> <chr>  <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#> 1 0.114      1.44  1825    1825 Ashe   37009  37009        5  1091     1      10
#> 2 0.061      1.23  1827    1827 Alleg… 37005  37005        3   487     0      10
#> 3 0.143      1.63  1828    1828 Surry  37171  37171       86  3188     5     208
#> 4 0.07       2.97  1831    1831 Curri… 37053  37053       27   508     1     123
#> 5 0.153      2.21  1832    1832 North… 37131  37131       66  1421     9    1066
#> # … with 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
#> #   geometry <MULTIPOLYGON [°]>

Created on 2022-07-14 by the reprex package (v2.0.1)

This can be fixed with adding:

#' @export
"names<-.sf" <- function(x, value) {
	out <- NextMethod()
	dplyr_reconstruct.sf(out, x)
}

in the sf package and I get eks passes the revdep check afterwards.

@edzer
Copy link
Member

edzer commented Jul 14, 2022

Great, does that also fix sftime checking? I now reverted your original PR (and released 1.0-8 to CRAN without it), but did this manually as automatically no longer worked, so I guess you'd have to start the PR from scratch when working from branch main.

@henningte
Copy link

henningte commented Jul 14, 2022

I think the problem with sftime is actually a problem with the modified dplyr_reconstruct.sf() in huizezhang-sherry/sf@main.

In the following line:

https://github.com/huizezhang-sherry/sf/blob/cab24cfdc5f0a75dd08a440a4e35e1a224cf9213/R/tidyverse.R#L23

the class of data is updated with the class of template (with only the sf class removed from the class attribute of template). This means that if there is a subclass to an sf object, the subclass remains in the returned object.

In the conflicting sftime example, exactly this happens, i.e. the merged sf objects are invalid because the sf columns get renamed and therefore the modified dplyr_reconstruct.sf() returns an object with class c("sftime", "data.frame") which causes the error (but only upon printing).

This is not the case in the current r-spatial/sf@main code since the class attribute of data is only changed to tibble in a safe way (via dplyr::as_tibble()) or to sf in a safe way (via st_as_sf()).

So the modified dplyr_reconstruct.sf() should be rewritten to either use only safe class conversions (as is currently the case in r-spatial/sf@main) or remove the sf class and all subclasses when the data is no valid sf object any more.

@huizezhang-sherry
Copy link
Contributor Author

Sure, I will modify the quoted line in the upcoming pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants