Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move POINT geometries out of a list column, into a matrix #2059

Open
wants to merge 38 commits into
base: main
Choose a base branch
from
Open

Conversation

edzer
Copy link
Member

@edzer edzer commented Dec 13, 2022

This PR moves POINT geometries out of the list column, leaves a list column filled with NULL values in place, and moves the point coordinates in an n x 2 (or 3 for XYM/XYZ, or 4 for XYZM) matrix attribute called points. The class dim label (XY/XYZ/XYM/XYZM) is put in attribute points_dim. This happens when:

  • st_as_sf.data.frame is called with e.g. coords specifying the coordinate colums
  • WKB in converted into an sfc, i.e. when coming back from st_read() or any of the GEOS functions.
    Right now, a[] restores the original list of sfg objects:
library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE
data.frame(z=rnorm(3), x=1:3, y=3:1) |>
  st_as_sf(coords = c("x", "y")) -> x
x$geometry |>
  unclass() |> 
  str()
# List of 3
#  $ : NULL
#  $ : NULL
#  $ : NULL
#  - attr(*, "points")= num [1:3, 1:2] 1 2 3 3 2 1
#  - attr(*, "points_dim")= chr "XY"
#  - attr(*, "n_empty")= int 0
#  - attr(*, "precision")= num 0
#  - attr(*, "crs")=List of 2
#   ..$ input: chr NA
#   ..$ wkt  : chr NA
#   ..- attr(*, "class")= chr "crs"
#  - attr(*, "bbox")= 'bbox' Named num [1:4] 1 1 3 3
#   ..- attr(*, "names")= chr [1:4] "xmin" "ymin" "xmax" "ymax"
x$geometry[] |>
  unclass() |> 
  str()
# List of 3
#  $ : 'XY' num [1:2] 1 3
#  $ : 'XY' num [1:2] 2 2
#  $ : 'XY' num [1:2] 3 1
#  - attr(*, "n_empty")= int 0
#  - attr(*, "precision")= num 0
#  - attr(*, "crs")=List of 2
#   ..$ input: chr NA
#   ..$ wkt  : chr NA
#   ..- attr(*, "class")= chr "crs"
#  - attr(*, "bbox")= 'bbox' Named num [1:4] 1 1 3 3
#   ..- attr(*, "names")= chr [1:4] "xmin" "ymin" "xmax" "ymax"

This may give some performance changes, both in memory footprint and runtime; still need to do benchmarks.

Comments welcome!

@edzer
Copy link
Member Author

edzer commented Dec 13, 2022

Memory size: decreases with a factor 18:

library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE
n = 1e6
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
st_as_sf(df, coords = c("x", "y")) |>
  st_geometry() -> pts
(s1 = object.size(pts))
# 24002864 bytes
(s2 = object.size(pts[]))
# 432002312 bytes
as.numeric(s2)/as.numeric(s1)
# [1] 17.99795

@edzer
Copy link
Member Author

edzer commented Dec 13, 2022

This one is pretty obvious:

library(sf)
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE
n = 1e6
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
st_as_sf(df, coords = c("x", "y")) |>
  st_geometry() -> pts
pts   |> st_is_empty() |> system.time()
#    user  system elapsed 
#   1.692   0.072   1.764 
system.time(pts0 <- pts[])
#    user  system elapsed 
#   1.104   0.036   1.140 
pts0 |> st_is_empty()  |> system.time()
#    user  system elapsed 
#   1.853   0.057   1.909 
pts |> st_combine() |> system.time()
#    user  system elapsed 
#   0.002   0.000   0.002 
pts0 |> st_combine() |> system.time()
#    user  system elapsed 
#   3.119   0.155   3.275 

@paleolimbot
Copy link
Contributor

This is a very cool idea! The use of an attribute to maintain a semblance of backward compatibility with the structure is clever.

I do wonder if this change is going to result in enough performance gains to justify the maintenance cost of the additional representation. My adventures playing with this branch are below...basically all the ways I know about based on the benchmarks above. wk::xy() is more like data.frame(x, y, z) than what you have here...basically just because it doesn't require the matrix attribute shenanigan to work with existing data frames. The GEOS representation is a list() of external pointers to GEOSGeometry(). Geoarrow isn't quite ready for this yet but its kernels will probably be faster than both of those.

While I certainly don't mind if this representation exists in sf, my first impression is that it's rather complicated and not that fast. If performance is the motivation, I wonder if broadening the sf API to work with other S3 classes (e.g., wk::xy(), geos::geos_geometry(), whatever terra has on the go). Functionally that mostly means just making more things S3 generic (then people like me can provide fast implementations in wk and geos without you having to do anything!). Slightly harder would be loosening that assumption for the sf geometry column.

Another low-hanging performance fruit would be to drop the cached bbox and geometry classes (that's most of why the geos package is faster than sf in most cases).

library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.0, PROJ 9.1.1; sf_use_s2() is TRUE

n = 1e5
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
st_as_sf(df, coords = c("x", "y")) |>
    st_geometry() -> pts

pts0 <- pts[]
wk_xy <- wk::as_xy(df)
geos_pts <- geos::as_geos_geometry(wk_xy)

lobstr::obj_size(pts)
#> 2.40 MB
lobstr::obj_size(pts0)
#> 12.80 MB
lobstr::obj_size(wk_xy)
#> 2.40 MB
# not really accurate...only includes R allocs
lobstr::obj_size(geos_pts)
#> 7.20 MB

bench::mark(
    st_is_empty(pts),
    st_is_empty(pts0),
    wk::wk_meta(wk_xy)$size == 0,
    wk::wk_meta(geos_pts)$size == 0,
    geos::geos_is_empty(geos_pts)
)
#> # A tibble: 5 × 6
#>   expression                           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 st_is_empty(pts)                 33.83ms  34.48ms      28.9    1.16MB     12.8
#> 2 st_is_empty(pts0)                46.59ms   46.8ms      21.4    1.15MB     49.8
#> 3 wk::wk_meta(wk_xy)$size == 0      2.06ms   2.25ms     421.      3.1MB     53.9
#> 4 wk::wk_meta(geos_pts)$size == 0   4.21ms   4.43ms     221.     3.06MB     31.1
#> 5 geos::geos_is_empty(geos_pts)   756.29µs 840.21µs    1180.   406.18KB     22.7

bench::mark(
    st_combine(pts),
    st_combine(pts0),
    wk::wk_collection(wk_xy, wk::wk_geometry_type("multipoint")),
    geos::geos_make_collection(geos_pts, "multipoint"),
    check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 4 × 6
#>   expression                                                        min   median
#>   <bch:expr>                                                   <bch:tm> <bch:tm>
#> 1 st_combine(pts)                                               97.99µs 100.61µs
#> 2 st_combine(pts0)                                             139.98ms 181.44ms
#> 3 wk::wk_collection(wk_xy, wk::wk_geometry_type("multipoint"))   1.85ms   2.03ms
#> 4 geos::geos_make_collection(geos_pts, "multipoint")             6.09ms   6.51ms
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>, `gc/sec` <dbl>

Copy link
Member

@rsbivand rsbivand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the POINT object (XY and XYZ) more like sp, but:

> data.frame(z=rnorm(3), x=1:3, y=3:1) |> st_as_sf(coords = c("x", "y")) -> x
> as(x$geometry[], "Spatial")
SpatialPoints:
     coords.x1 coords.x2
[1,]         1         3
[2,]         2         2
[3,]         3         1
Coordinate Reference System (CRS) arguments: NA 
> as(x$geometry, "Spatial")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'coordinates' for signature '"NULL"'
> as(x, "Spatial")
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'addAttrToGeom': unable to find an inherited method for function 'coordinates' for signature '"NULL"'
> sp_x <- as(x$geometry[], "Spatial")
> st_as_sfc(sp_x)
Geometry set for 3 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 1 ymin: 1 xmax: 3 ymax: 3
CRS:           NA
POINT (1 3)
POINT (2 2)
POINT (3 1)
> st_as_sfc(sp_x) |> unclass() |> str()
List of 3
 $ : 'XY' num [1:2] 1 3
 $ : 'XY' num [1:2] 2 2
 $ : 'XY' num [1:2] 3 1
 - attr(*, "precision")= num 0
 - attr(*, "bbox")= 'bbox' Named num [1:4] 1 1 3 3
  ..- attr(*, "names")= chr [1:4] "xmin" "ymin" "xmax" "ymax"
 - attr(*, "crs")=List of 2
  ..$ input: chr NA
  ..$ wkt  : chr NA
  ..- attr(*, "class")= chr "crs"
 - attr(*, "n_empty")= int 0

@edzer
Copy link
Member Author

edzer commented Dec 14, 2022

@rsbivand thanks, that should now work; I finished implementation so far only to pass sf's own tests, this was not in there.

@bart1
Copy link
Contributor

bart1 commented Dec 22, 2022

I just encountered an other case where this pull request could provide a big speedup. When using print.sfc it can be quite slow (20 seconds for 10^7 points) as the dimensions are calculated in an apply function: https://github.com/r-spatial/sf/blob/main/R/sfc.R#L195 . If this information can directly be taken from the attribute points_dim this should speed up considerably

@edzer
Copy link
Member Author

edzer commented Dec 22, 2022

... which is rather stupid (the 20 seconds), given that #2046 suggests sf reading breaks (unintended) in case of mixed dimensions...

@kadyb
Copy link
Contributor

kadyb commented Feb 28, 2023

Today ALTREP support for lists appeared in R SVN. Do you think it can somehow affect {sf} if it uses lists as data structure?

@edzer
Copy link
Member Author

edzer commented Feb 28, 2023

Thanks for reporting! I'm not an altrep expert, but would expect no miracles; I think it's mostly useful for (i) data structures out of memory (e.g. in a memory mapped area) or (ii) unrealised arrays (e.g. 1:1e12, where you're going to compute the value when you need it rather than realize it in memory). The first would help with massive point sets (but merely make it possible, not fast), the second doesn't help us here.

The speedup of this PR (if any) is due to the flat memory model of matrices vs. the arbitrary memory layout of lists; accessing lists elements is more expensive because of that; the smaller memory footprint may be even more important.

@paleolimbot
Copy link
Contributor

FWIW I actually think ALTREP for lists will be very useful here. It will let you use the data structure you've prototyped here without breaking backwards compatibility with users of the previous data structure (i.e., everybody).

@edzer
Copy link
Member Author

edzer commented Mar 30, 2023

Great idea, do you know of an example of such a thing?

@bart1
Copy link
Contributor

bart1 commented Jun 29, 2023

I just played around with this pull request out of curiosity what the consequences would be for move2. I see substantial memory usage reductions, in the example data from 7mb to 2mb. That is great!
I also noticed when running the test of the package, some errors occur when adding empty points to an sfc vector. For example st_distance failing here (NaN results). I guess this relates to combining vectors or creating empty points. Here is an example (code works with current sf package):

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
n <- 4
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
st_as_sf(df, coords = c("x", "y")) |>
  st_geometry() -> pts
empty <- st_as_sfc("POINT(EMPTY)", crs = st_crs(pts))
c(pts, empty)
#> Geometry set for 5 features  (with 1 geometry empty)
#> Geometry type: POINT
#> Error in a$points[i, , drop = FALSE]: subscript out of bounds
st_distance(c(pts, empty), c(empty, pts), by_element = T)
#> [1] NaN NaN NaN NaN
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 22.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2023-06-29
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  class         7.3-22  2023-05-03 [1] CRAN (R 4.1.2)
#>  classInt      0.4-9   2023-02-28 [1] CRAN (R 4.1.2)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.1.2)
#>  DBI           1.1.3   2022-06-18 [1] CRAN (R 4.1.2)
#>  digest        0.6.32  2023-06-26 [1] CRAN (R 4.1.2)
#>  dplyr         1.1.2   2023-04-20 [1] CRAN (R 4.1.2)
#>  e1071         1.7-13  2023-02-01 [1] CRAN (R 4.1.2)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.1.2)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.1.2)
#>  fs            1.6.2   2023-04-25 [1] CRAN (R 4.1.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.1.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.2)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.1.2)
#>  KernSmooth    2.23-21 2023-05-03 [1] CRAN (R 4.1.2)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.1.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.2)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.2)
#>  proxy         0.4-27  2022-06-09 [1] CRAN (R 4.1.2)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.1.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.1.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.1.2)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.1.2)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.1.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.2)
#>  Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.1.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.1.2)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.22    2023-06-01 [1] CRAN (R 4.1.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.1.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
#>  sf          * 1.0-13  2023-06-29 [1] Github (r-spatial/sf@e3def06)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.1.2)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.1.2)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.1.2)
#>  units         0.8-2   2023-04-27 [1] CRAN (R 4.1.2)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.1.2)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.1.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.1.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.1.2)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@edzer
Copy link
Member Author

edzer commented Jun 29, 2023

Great example; that should run better, now.

@bart1
Copy link
Contributor

bart1 commented Jun 30, 2023

Thanks @edzer , I tested a bit more and noticed two other examples changing for unexpected behavior. Sorry for having these issues iteratively, the popup up in some of the unit test. Let me also know if your not looking for these detailed tests yet

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
n <- 4
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
dfsf<-st_as_sf(df, coords = c("x", "y"))
# The filtered data turns empty:
dfsf |> filter(rep(T,4))
#> Simple feature collection with 4 features and 1 field (with 4 geometries empty)
#> Geometry type: GEOMETRY
#> Dimension:     XY
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           NA
#>             z                 geometry
#> 1 -0.32671172 GEOMETRYCOLLECTION EMPTY
#> 2 -0.05574357 GEOMETRYCOLLECTION EMPTY
#> 3 -1.34424667 GEOMETRYCOLLECTION EMPTY
#> 4 -0.24700773 GEOMETRYCOLLECTION EMPTY
# there is no empty point inserted
dfsf$geometry[2]<-st_point()
dfsf
#> Simple feature collection with 4 features and 1 field (with 4 geometries empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 0.02291049 ymin: 0.2227787 xmax: 0.6861906 ymax: 0.4822213
#> CRS:           NA
#>             z                     geometry
#> 1 -0.29728105  POINT (0.3049286 0.4746749)
#> 2 -0.33397322  POINT (0.5350945 0.2988531)
#> 3 -0.93209304  POINT (0.6861906 0.4822213)
#> 4  0.08929689 POINT (0.02291049 0.2227787)
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 22.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2023-06-30
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  ...
#>  dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.1.2)
#>  ...
#>  sf          * 1.0-14  2023-06-29 [1] Github (r-spatial/sf@3ec76b8)
#>  ...
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@edzer
Copy link
Member Author

edzer commented Jun 30, 2023

This should now also run.

@bart1
Copy link
Contributor

bart1 commented Jun 30, 2023

That works for the examples above but filter fails if there is an empty location:

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
n <- 4
data.frame(z = rnorm(n), x = runif(n), y = runif(n)) -> df
dfsf<-st_as_sf(df, coords = c("x", "y"))
dfsf$geometry[2]<-st_point()
dfsf |> dplyr::filter(rep(T,4))
#> Error in st_as_sf.data.frame(NextMethod(), coords = attr(.data, "sf_column"), : missing values in coordinates not allowed

I also see a few other changes in behavior/test error but I have not been able to make small reproducible examples out of those yet

@bart1
Copy link
Contributor

bart1 commented Jul 2, 2023

It seems dplyr filter drops the CRS:

require(dplyr)
require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
data(meuse, package = "sp")
meuse_sf = st_as_sf(meuse, coords = c("x", "y"), crs = 28992, agr = "constant")
st_crs(meuse_sf)
#> Coordinate Reference System:
#>   User input: EPSG:28992 
#>   wkt:
....
#>         BBOX[50.75,3.2,53.7,7.22]],
#>     ID["EPSG",28992]]
st_crs(dplyr::filter(meuse_sf, copper>80))
#> Coordinate Reference System: NA
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
.....
#>  sf          * 1.0-14  2023-07-01 [1] Github (r-spatial/sf@203187d)
.....
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-07-02 with reprex v2.0.2

@bart1
Copy link
Contributor

bart1 commented Jan 22, 2024

Hi, I have been doing some more testing. In general I use the test suite from move2 (most test/functions use sf in the background in some way) to find these changes from before (for the move2 package this pull request would be quite beneficial). Then I try to isolate these errors to sf workflows. Let me know there is no desire to merge this request then I will stop testing.

I have found a few more cases that show some strange/undesired? behavior.

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
d<-st_as_sf(data.frame(x=c(1:4), y=c(1:3,NA)), coords=c("x","y"), na.fail = F)
st_is_empty(st_geometry(d))
#> [1] FALSE FALSE FALSE FALSE
st_is_empty(d) 
#> [1] FALSE FALSE FALSE FALSE
# should be c(F,F,F, T)?, now is c(F,F,F,F)

dput(st_geometry(d))
#> structure(list(NULL, NULL, NULL, NULL), points = structure(c(1, 
#> 2, 3, 4, 1, 2, 3, NA), dim = c(4L, 2L)), points_dim = "XY", n_empty = 0L, precision = 0, crs = structure(list(
#>     input = NA_character_, wkt = NA_character_), class = "crs"), bbox = structure(c(xmin = 1, 
#> ymin = 1, xmax = 4, ymax = 3), class = "bbox"), class = c("sfc_POINT", 
#> "sfc"))
dput(st_geometry(d[1:4,])) 
#> structure(list(structure(c(1, 1), class = c("XY", "POINT", "sfg"
#> )), structure(c(2, 2), class = c("XY", "POINT", "sfg")), structure(c(3, 
#> 3), class = c("XY", "POINT", "sfg")), structure(c(4, NA), class = c("XY", 
#> "POINT", "sfg"))), n_empty = 0L, precision = 0, crs = structure(list(
#>     input = NA_character_, wkt = NA_character_), class = "crs"), bbox = structure(c(xmin = 1, 
#> ymin = 1, xmax = 4, ymax = 3), class = "bbox"), class = c("sfc_POINT", 
#> "sfc"))
# sub setting directly reduces to the old structure, keeping the matrix might be nicer and makes sure the benefits are throughout scripts

e<-st_as_sf(data.frame(x=c(1:3, NA), y=c(1:3,NA)), coords=c("x","y"), na.fail = F)

st_is_empty(st_geometry(e)) 
#> [1] FALSE FALSE FALSE  TRUE
# works correctly
st_is_empty(st_geometry(dplyr::filter(e, rep(T,4)))) 
#> Error in scan(text = lst[[length(lst)]], quiet = TRUE): scan() expected 'a real', got 'ParseException:'
#> Error in (function (msg) : ParseException: Unexpected EOF parsing WKB
# fails,  one difference I see is that a Z dimension is added to points_dim.
sessioninfo::session_info(pkgs = "attached")
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-01-22
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package * version date (UTC) lib source
#>  sf      * 1.0-14  2024-01-22 [1] Github (r-spatial/sf@d38b3dc)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-01-22 with reprex v2.0.2

@bart1
Copy link
Contributor

bart1 commented Jan 23, 2024

Another small change/corner case casting and empty sfc fails:

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
st_cast(st_sfc(),"POINT")
#> Error in cls[2, ]: incorrect number of dimensions
sessioninfo::session_info("attached")
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-01-23
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package * version date (UTC) lib source
#>  sf      * 1.0-14  2024-01-22 [1] Github (r-spatial/sf@d38b3dc)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

This used to work fine:

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
st_cast(st_sfc(),"POINT")
#> Geometry set for 0 features 
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           NA
sessioninfo::session_info("attached")
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-01-23
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package * version date (UTC) lib source
#>  sf      * 1.0-15  2023-12-18 [1] CRAN (R 4.3.2)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-01-23 with reprex v2.0.2

@bart1
Copy link
Contributor

bart1 commented Jan 23, 2024

Great to see progress! @edzer am I right to notice that this resolves the cast error I posted today but does not address the remarks from yesterday?

@bart1
Copy link
Contributor

bart1 commented Jan 23, 2024

Additionally I found this issue with replacing values:

require(sf)
d<-st_as_sf(data.frame(x=1:3,y=1:3), coords = c("x","y"))
e<-st_as_sf(data.frame(x=4:5,y=4:5), coords = c("x","y"))
# Works correctly if both are of the list type:
replace(st_geometry(d)[T,], c(F,T,T), st_geometry(e)[T,])
#> Geometry set for 3 features 
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1 ymin: 1 xmax: 5 ymax: 5
#> CRS:           NA
#> POINT (1 1)
#> POINT (4 4)
#> POINT (5 5)
# However not of the replacement is to the matrix type:
replace(st_geometry(d)[T,], c(F,T,T), st_geometry(e))
#> Geometry set for 3 features  (with 2 geometries empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1 ymin: 1 xmax: 1 ymax: 1
#> CRS:           NA
#> POINT (1 1)
#> POINT EMPTY
#> POINT EMPTY
# Replace is equivalent to:
dd<-st_geometry(d)[T,]
ee<-st_geometry(e)
dd[c(F,T,T)]<-ee
dd
#> Geometry set for 3 features  (with 2 geometries empty)
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1 ymin: 1 xmax: 1 ymax: 1
#> CRS:           NA
#> POINT (1 1)
#> POINT EMPTY
#> POINT EMPTY
sessioninfo::package_info("attached")
#>  package * version date (UTC) lib source
#>  sf      * 1.0-16  2024-01-23 [1] Github (r-spatial/sf@4ce501e)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants