Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

st_intersection 30 times slower in 0.9_6 compared to 0.7_4 #1510

Closed
geogale opened this issue Oct 13, 2020 · 49 comments
Closed

st_intersection 30 times slower in 0.9_6 compared to 0.7_4 #1510

geogale opened this issue Oct 13, 2020 · 49 comments

Comments

@geogale
Copy link

geogale commented Oct 13, 2020

Hi. I am running the following code:

library(sf)

download.file("https://www.dropbox.com/s/izol20rxf90apye/Areas.gpkg?dl=1",
              destfile = "Areas.gpkg" , mode='wb')
Boundary1 <- st_transform(st_read("Areas.gpkg"),crs=27700)

download.file("https://www.dropbox.com/s/z8pham6w9ldesfy/Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip?dl=1",
              destfile = "Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip" , mode='wb')
unzip("Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip", exdir = ".")
Boundary2 <- st_transform(st_read("Local_Authority_Districts__May_2020__Boundaries_UK_BFE.shp"),crs=27700)

Boundary_Intersection <- st_intersection(Boundary1,Boundary2)
Boundary_Intersection$Area <- as.numeric(st_area(Boundary_Intersection))

The intersection takes about 15 minutes to complete when using R 4.0.2 and sf 0.9_6 but only around 30 seconds when using R 3.4.0 and sf 0.7_4. Is there a reason why it is now a lot slower?

@rsbivand
Copy link
Member

Please report GEOS versions for both cases - and your OS, with your installation choices (MacOS or Windows CRAN binaries, source install, etc.). Windows binaries are now using a more recent GEOS, as I think are MacOS. In both cases, sessionInfo() and sf::sf_extSoftVersion(), with a description of how you installed sf and if you built from source, how you installed GEOS.

@geogale
Copy link
Author

geogale commented Oct 13, 2020

sf 0.9_6 is using GDAL 3.8.0 and sf 0.7_4 is using GDAL 3.6.1. I am running this on Windows (although, I have 0.7_4/3.6.1 on a Mac and it's just as quick). I am doing this on a work machine that uses JFrog Artifactory to mirror Windows binaries - installing a package looks like this: install.packages("sf", depends=TRUE, type = "win.binary")

@rsbivand
Copy link
Member

OK, you mean GEOS, not GDAL. So either something changed in sf, GEOS, or both. Let's see if @edzer can comment before proceeding. I'm assuming that the platforms are otherwise comparable.

@geogale
Copy link
Author

geogale commented Oct 13, 2020

Oops! I should have said GDAL 2.2.3 vs 3.0.4

@rsbivand
Copy link
Member

I asked about GEOS, not GDAL. Even though sf may do topology predicates and operations through GDAL, it is the version of GEOS that matters. The older variant of CRAN Windows binaries uses https://github.com/rwinlib/gdal2 GEOS 3.6.1, the newer https://github.com/rwinlib/gdal3 GEOS 3.8.0.

@edzer
Copy link
Member

edzer commented Oct 13, 2020

sf uses GEOS directly, not through GDAL (with the exception of specifying ops in the query argument to st_read).

@rsbivand
Copy link
Member

Sorry, didn't check. Do you know of obvious changes in GEOS or the sf interface to GEOS (in Rcpp?) since early 2018?

@edzer
Copy link
Member

edzer commented Oct 13, 2020

I don't see, and cannot recall, any changes made to st_intersection, neither the R nor the C++ code, since april 2019 (release of 0.7-4).

@rsbivand
Copy link
Member

Shall I try to repllicate on Win10 - CRAN sf 0.9-6 and source install from GDAL2 of 0.7-4? Or do you have access?

@edzer
Copy link
Member

edzer commented Oct 13, 2020

That would be great - I don't have (simple) access to windows, and can't install 0.7-4 on my working machine (requires proj_api.h, no longer avail in PROJ 7).

@geogale geogale closed this as completed Oct 13, 2020
@geogale geogale reopened this Oct 13, 2020
@rsbivand
Copy link
Member

rsbivand commented Oct 13, 2020

On sf 0.9-7 with development GEOS (with NG overlay I think) - so it could be a validity handling issue - @geogale don't worry about this, we test as-yet unreleased GEOS etc. versions. This is a message, and the inputs and the output are all valid, F32 time 617 seconds:

CBR: result (after common-bits addition) is INVALID: Self-intersection at or near point 368576.69999999995 864696 (368576.69999999995343 864696)

@rsbivand
Copy link
Member

On Win10, R 4.0.2 and sf 0.9-6 and GDAL3 GEOS 3.8.0 16.95 min (old T420s laptop)
On Win10 old T420s laptop R 4.0.2 and sf 0.7-4 built locally with current Rcpp and Rtools4, GEOS 3.6.1 38 sec. So the regression is very real for this data set. It is not coming from different versions of Rcpp or R itself. I have not succeeded in installing sf 0.9-6 against rwinlib/GDAL2, I think I would have to install a pre-Rtools4 R and Rtools. I'll try to choose GEOS/PROJ/GDAL versions on F32 matching the un-degraded case, so GEOS 3.6.1, GDAL 2.2.3 and PROJ 4.9.3 or similar. Maybe trying on Debian/Ubuntu with GEOS 3.8.0 or 3.8.1 could cross-check to see if my 617 seconds (10.3 min on a much faster laptop) is caused by the NG overlay or was there before.

@rsbivand
Copy link
Member

With GEOS 3.9.0dev, sf was 617 seconds for intersection, but rgeos with the same GEOS is 154 seconds for the same intersection (F32).

@edzer
Copy link
Member

edzer commented Oct 13, 2020

> system.time(Boundary_Intersection <- st_intersection(Boundary1,Boundary2))
   user  system elapsed 
 27.705   0.007  27.716 
Warning message:
attribute variables are assumed to be spatially constant throughout all geometries 
> sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
       "3.6.0"        "2.2.4"        "4.8.0"        "false"        "false" 
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bullseye/sid

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_0.9-7

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         class_7.3-15       crayon_1.3.4       dplyr_0.8.3       
 [5] assertthat_0.2.1   grid_3.6.2         R6_2.4.1           DBI_1.1.0         
 [9] magrittr_1.5       e1071_1.7-3        units_0.6-5        pillar_1.4.3      
[13] KernSmooth_2.23-16 rlang_0.4.4        tools_3.6.2        glue_1.3.1        
[17] purrr_0.3.3        compiler_3.6.2     pkgconfig_2.0.3    classInt_0.4-2    
[21] tidyselect_1.0.0   tibble_2.1.3      

indicates to me that the regression is not in sf...

@rsbivand
Copy link
Member

And it is a big regression, isn't it? What are the timings for GEOS 3.8.1 (I'm stuck with 3.9.0dev at the moment - do you have 3.8.0 or 3.8.1 running)? It isn't rwinlib either.

@edzer
Copy link
Member

edzer commented Oct 13, 2020

With

GEOS 3.8.1, GDAL 3.1.3, PROJ 7.1.1

I see

> system.time(Boundary_Intersection <- st_intersection(Boundary1,Boundary2))
   user  system elapsed 
562.452   0.067 562.728 
Warning message:
attribute variables are assumed to be spatially constant throughout all geometries 

so, yeah, a factor 20.

@olcaysah
Copy link

olcaysah commented Oct 13, 2020 via email

@rsbivand
Copy link
Member

rsbivand commented Oct 13, 2020

Oops: current rgeos with GEOS runtime version: 3.6.3-CAPI-1.10.3 runs 190 seconds, as against 154 seconds with rgeos 3.9.0dev - with overlay-NG I think (all F32, R 4.0.3). With GEOS 3.8.1 (also before overlay-NG), it takes 150 seconds. I'm hesitant to rebuild sf because GDAL is built with GEOS, so I'd have to rebuild GDAL too. I'm not sure that this isn't sf or perhaps Rcpp, which is needed for the interface? I recall that at some point (3.7.?)? it became more necessary to check for validity?

@edzer
Copy link
Member

edzer commented Oct 13, 2020

Or it's in parts that sf uses, but rgeos doesn't, like the STRtree?

@rsbivand
Copy link
Member

Unfortunately, for R profiling, .Call gets 99.98% of the run time. How could we profile Rcpp::wrap(CPL_geos_op2()) or CPL_geos_op2(), or further back the internals of that function?

@edzer
Copy link
Member

edzer commented Oct 13, 2020

I tried compiling with -pg, but that doesn't work with a package; probably need to recompile entire R with -pg (if supported at all).

@rsbivand
Copy link
Member

Maybe I can check STRtree separately in rgeos::gUnarySTRtreeQuery() to see? Doesn't look like a smoking gun:

> library(rgeos)
Loading required package: sp
rgeos version: 0.5-5, (SVN revision 640)
 GEOS runtime version: 3.9.0dev-CAPI-1.14.0 
 Linking to sp version: 1.4-4 
 Polygon checking: TRUE 

> load("GEOS_regression_objects.rda")
> system.time(res <- gUnarySTRtreeQuery(a_sp))
   user  system elapsed 
  0.113   0.002   0.116 
> system.time(res <- gUnarySTRtreeQuery(b_sp))
   user  system elapsed 
  0.777   0.016   0.796 
> system.time(res <- gBinarySTRtreeQuery(a_sp, b_sp))
   user  system elapsed 
  0.874   0.007   0.889 
> system.time(res <- gBinarySTRtreeQuery(b_sp, a_sp))
   user  system elapsed 
  0.882   0.013   0.901 

@rsbivand
Copy link
Member

@edzer
Copy link
Member

edzer commented Oct 14, 2020

sf 0.7-5 is the earliest version that works without proj_api.h; compiled against current GEOS/GDAL/PROJ it also has the regression:

library(sf)
# Linking to GEOS 3.8.1, GDAL 3.1.3, PROJ 7.1.1
utils::packageVersion("sf")
# [1] ‘0.7.5’
#download.file("https://www.dropbox.com/s/izol20rxf90apye/Areas.gpkg?dl=1",
#              destfile = "Areas.gpkg" , mode='wb')
Boundary1 <- st_transform(st_read("Areas.gpkg"),crs=27700)
# Reading layer `Areas' from data source `/tmp/Areas.gpkg' using driver `GPKG'
# Simple feature collection with 1000 features and 1 field
# geometry type:  MULTIPOLYGON
# dimension:      XY
# bbox:           xmin: -116.1928 ymin: 68218.7 xmax: 640329.7 ymax: 868949.7
# epsg (SRID):    27700
# proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs

#download.file("https://www.dropbox.com/s/z8pham6w9ldesfy/Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip?dl=1",
#              destfile = "Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip" , mode='wb')
#unzip("Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip", exdir = ".")
Boundary2 <- st_transform(st_read("Local_Authority_Districts__May_2020__Boundaries_UK_BFE.shp"),crs=27700)
# Reading layer `Local_Authority_Districts__May_2020__Boundaries_UK_BFE' from data source `/tmp/Local_Authority_Districts__May_2020__Boundaries_UK_BFE.shp' using driver `ESRI Shapefile'
# Simple feature collection with 379 features and 10 fields
# geometry type:  MULTIPOLYGON
# dimension:      XY
# bbox:           xmin: -116.1928 ymin: 5333.81 xmax: 655989 ymax: 1220302
# epsg (SRID):    27700
# proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs
system.time(Boundary_Intersection <- st_intersection(Boundary1, Boundary2))
#    user  system elapsed 
# 581.509   0.383 582.846 
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries 

Is one of you able to time the sf 0.7-5 binary on Windows?

@rsbivand
Copy link
Member

rsbivand commented Oct 14, 2020

I'll try that in a moment and report back. The st_transform() steps are not needed as the input is in OSGB National Grid anyway.

However, after asking on geos-devel how to build with overlay-ng (I thought it was enabled in master by default but wasn't) https://lists.osgeo.org/pipermail/geos-devel/2020-October/009750.html, I have:

> library(rgeos)
Loading required package: sp
rgeos version: 0.5-5, (SVN revision 640)
 GEOS runtime version: 3.9.0dev-CAPI-1.14.0 
 Linking to sp version: 1.4-4 
 Polygon checking: TRUE 

> load("GEOS_regression_objects.rda")
> system.time(bI <- gIntersection(a_sp, b_sp, byid=TRUE))
   user  system elapsed 
 12.970   0.157  13.264 
> system.time(bI <- gIntersection(a_sp, b_sp, byid=TRUE))
   user  system elapsed 
 12.591   0.018  12.654 
> library(sf)
Linking to GEOS 3.9.0dev, GDAL 3.2.0dev-b51dc28ab7, PROJ 7.2.0
> a_sf <- st_as_sf(a_sp)
> b_sf <- st_as_sf(b_sp)
> system.time(bIa <- st_intersection(a_sf, b_sf))
   user  system elapsed 
 16.414   0.098  16.558 
Warning message:
attribute variables are assumed to be spatially constant throughout all geometries 
> packageVersion("sf")
[1] ‘0.9.7’

So maybe GEOS 3.7 introduced the regression but overlay-ng will not only remove it, but will also be faster. I still need to run checks and revdep checks.

Maybe 3.9 will build on Solaris (3.7 and 3.8 are broken)??

@rsbivand
Copy link
Member

rsbivand commented Oct 14, 2020

I can only install sf 0.7-5 on Windows with rwinlib/GDAL2, so PROJ 4.9.3. I tried to mix GDAL2 and GDAL3 yesterday without success, as GDAL2 is Rtools < 4, and GDAL3 >= 4 IIUC. I don't think the sf change is related to proj.h. I'll try to trap the regression by GEOS releases, I suspect 3.6.* are fast, and will check whether the regression was 3.7.* or 3.8.* . Since 3.9.0dev is faster anyway (than 3.6.*), maybe we'll be lucky?

@rsbivand
Copy link
Member

I cannot detect the regression through rgeos, unfortunately:

> library(rgeos)
Loading required package: sp
rgeos version: 0.5-5, (SVN revision 640)
 GEOS runtime version: 3.6.0-CAPI-1.10.0
 Linking to sp version: 1.4-4
 Polygon checking: TRUE

> load("GEOS_regression_objects.rda")
> system.time(bI <- gIntersection(a_sp, b_sp, byid=TRUE))
   user  system elapsed
185.260   0.303 195.412


> library(rgeos)
Loading required package: sp
rgeos version: 0.5-5, (SVN revision 640)
 GEOS runtime version: 3.6.3-CAPI-1.10.3
 Linking to sp version: 1.4-4 
 Polygon checking: TRUE 

> load("GEOS_regression_objects.rda")
> system.time(bI <- gIntersection(a_sp, b_sp, byid=TRUE))

   user  system elapsed 
187.596   0.320 195.193

> library(rgeos)
Loading required package: sp
rgeos version: 0.5-5, (SVN revision 640)
 GEOS runtime version: 3.7.0-CAPI-1.11.0
 Linking to sp version: 1.4-4
 Polygon checking: TRUE

> load("GEOS_regression_objects.rda")
> system.time(bI <- gIntersection(a_sp, b_sp, byid=TRUE))
   user  system elapsed
183.152   0.261 191.275

To try with sf, I'd need to re-build GDAL for each GEOS version too. Looks more like container instances with a succession of version sets.

@edzer
Copy link
Member

edzer commented Oct 14, 2020

I built sf 0.7-5 in a docker against

> library(sf)
Linking to GEOS 3.6.0, GDAL 2.2.4, PROJ 4.8.0
> utils::packageVersion("sf")
[1] ‘0.7.5’

and get

> system.time(Boundary_Intersection <- st_intersection(Boundary1,Boundary2))
   user  system elapsed 
 28.033   0.000  28.035 
Warning message:
attribute variables are assumed to be spatially constant throughout all geometries 

This is the identical sf package code (downloaded from CRAN) that gave me 600 secs here. I'm a bit at a loss what to try next.

@rsbivand
Copy link
Member

Could you try Overlay-NG - a recent nightly (I had 9 October on my desktop, master from git on gitea on laptop), ../configure --enable-overlayng, and see if you also get the speed-up I'm seeing. As rgeos seems not to see a regression for gIntersection(), maybe it isn't GEOS but an interaction between sf, Rcpp and GEOS (and here I think Rcpp is constant).

@edzer
Copy link
Member

edzer commented Oct 14, 2020

With overlayng-enabled GEOS compiled from github, and

> library(sf)
Linking to GEOS 3.9.0dev, GDAL 3.1.1, PROJ 7.1.0

I see

> system.time(Boundary_Intersection <- st_intersection(Boundary1,Boundary2))
   user  system elapsed 
 16.668   0.040  16.711 
Warning message:
attribute variables are assumed to be spatially constant throughout all geometries 

edzer added a commit that referenced this issue Oct 14, 2020
@rsbivand
Copy link
Member

Very good! This is the same improvement that I see. My revdeps are running, so I'll see whether there are any breakages attributable to Overlay-NG.

@edzer
Copy link
Member

edzer commented Oct 14, 2020

And the other issue seems to be related to precision, default set in rgeos to 1e8, set to zero (not set) in sf. When setting to 1e8:

library(sf)
# Linking to GEOS 3.8.1, GDAL 3.1.3, PROJ 7.1.1
utils::packageVersion("sf")
# [1] ‘0.9.7’
#download.file("https://www.dropbox.com/s/izol20rxf90apye/Areas.gpkg?dl=1",
#              destfile = "Areas.gpkg" , mode='wb')
Boundary1 <- st_set_precision(st_transform(st_read("Areas.gpkg"),crs=27700),
	1e8)
# Reading layer `Areas' from data source `/tmp/Areas.gpkg' using driver `GPKG'
# Simple feature collection with 1000 features and 1 field
# geometry type:  MULTIPOLYGON
# dimension:      XY
# bbox:           xmin: -116.1928 ymin: 68218.7 xmax: 640329.7 ymax: 868949.7
# projected CRS:  OSGB 1936 / British National Grid

#download.file("https://www.dropbox.com/s/z8pham6w9ldesfy/Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip?dl=1",
#              destfile = "Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip" , mode='wb')
#unzip("Local_Authority_Districts__May_2020__Boundaries_UK_BFE-shp.zip", exdir = ".")
Boundary2 <- st_set_precision(st_transform(st_read("Local_Authority_Districts__May_2020__Boundaries_UK_BFE.shp"),crs=27700),
	1e8)
# Reading layer `Local_Authority_Districts__May_2020__Boundaries_UK_BFE' from data source `/tmp/Local_Authority_Districts__May_2020__Boundaries_UK_BFE.shp' using driver `ESRI Shapefile'
# Simple feature collection with 379 features and 10 fields
# geometry type:  MULTIPOLYGON
# dimension:      XY
# bbox:           xmin: -116.1928 ymin: 5333.81 xmax: 655989 ymax: 1220302
# projected CRS:  OSGB 1936 / British National Grid
system.time(Boundary_Intersection <- st_intersection(Boundary1, Boundary2))
#    user  system elapsed 
#  22.916   0.038  22.962 
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries 

@edzer
Copy link
Member

edzer commented Oct 14, 2020

For me, that closes the issue.

@edzer
Copy link
Member

edzer commented Oct 14, 2020

... but not before mentioning #1482 and @dbaston - this has taken both Roger and me several hours, not for the first time, and would not have been needed if sf had used the same default precision as rgeos.

@geogale
Copy link
Author

geogale commented Oct 14, 2020

For me, that closes the issue.

@edzer just to confirm my layperson understanding. Are you suggesting setting the precision to 1e8 for all st_intersection operations? Obviously with my example it's a known problem, but with other examples with larger volumes of data it might not be feasible to try with and without setting the precision to see if it makes a time difference.

@dbaston
Copy link
Contributor

dbaston commented Oct 14, 2020

Yes, perturbing the inputs can have a large effect on algorithm runtime. Most of the time it will probably improve. In other cases it will cause the overlay to fail entirely or return an incorrect result.

@rsbivand
Copy link
Member

With 1e8, 3.9.0dev runs 10 seconds, as against 17 seconds with default precision. So Overlay-NG seems to help anyway, but for < 3.9.0, users (of GEOS 3.7 and 3.8 ??) need to be advised to try setting precision to 1e8 if run times are excessive for binary (?) topological operations.

@edzer
Copy link
Member

edzer commented Oct 14, 2020

@geogale setting precision to 1e8 means (with coordinates having units m) that all coordinates are rounded to the nearest multiple of 10 nm before computing the intersection. As @dbaston mentions, it might be a good thing to always do so, but it also might be another cause for problems.

@Robinlovelace
Copy link
Contributor

Following this thread with interest, don't understand every part of it but very grateful that this has been discovered and documented for the benefit of everyone working with medium-large datasets. From @edzer's comment above, I guess that this would mean running a command such as

st_precision(Boundary1) = 0.00000001

and the same for Boundary2 would speed-up the operation on machines that show the potential regression, as per https://r-spatial.github.io/sf/reference/st_precision.html

If a default setting like that has the potential for factor 30 speed-ups, and given most users will not know about let alone read this issue, my first impression is: in favour.

@edzer
Copy link
Member

edzer commented Oct 14, 2020

Robin, I set it to 1e8, not 1e-8. You can indeed set it to units::set_units(10, nm), which is equivalent to setting it to 1e8.

@Robinlovelace
Copy link
Contributor

Aha, so:

st_precision(Boundary1) = 1e8

It's not clear to me from the example in https://r-spatial.github.io/sf/reference/st_precision.html what appropriate values are. Is it worth updating that help page with an example that is closer to a real-world scenario?

@Robinlovelace
Copy link
Contributor

P.s. also just spotted that there is the equivalent code above.

Boundary2 <- st_set_precision(st_transform(st_read("Local_Authority_Districts__May_2020__Boundaries_UK_BFE.shp"),crs=27700),
	1e8)

@edzer
Copy link
Member

edzer commented Oct 14, 2020

I agree that for instance

"Local_Authority_Districts__May_2020__Boundaries_UK_BFE.shp" %>%
  st_read() %>%
  st_transform(crs=27700) %>%
  st_set_precision(units::set_units(10, nm)) ->
  Boundary2

is a more readable form.

The documentation of precision refers here, did you make it there?

@Robinlovelace
Copy link
Contributor

Both are readable and it's partly a matter of style, I think style is a continuum and that both ways of writing it represents different ends of the continuum. I would probably have gone for an intermediary approach, but definitely wasn't aiming to comment on the style, just the substance.

In answer to your question, no I didn't make it there and probably should have checked. From there I can see the key quote:

to specify 3 decimal places of precision, use a scale factor of 1000...

Worth saying something like that in the docs for st_precision() also? Happy to give it a bash if that would help.

@Robinlovelace
Copy link
Contributor

Robinlovelace commented Oct 14, 2020

For what it's worth I had a think about how to type that 'tidy pipeline' and came up with this, a balance that is readable and concise?

f = "Local_Authority_Districts__May_2020__Boundaries_UK_BFE.shp"
Boundary2_original = st_read(f)
Boundary2 = Boundary2_original
  st_transform(crs=27700) %>%
  st_set_precision(units::set_units(10, nm)) 

I didn't know about the st_set_precision(units::set_units(10, nm)) notation. That is really neat!

@Robinlovelace
Copy link
Contributor

Also as a heads-up @geogale, I wrote a function that you can point to one of the datasets provided by data.gov.uk/ESRI and it auto downloads/unzips the shapefile + tries to read it in a single command - take a look if of use/interest, I've been using it a lot of late: https://github.com/Robinlovelace/ukboundaries/blob/master/R/ukborders-package.R#L6

Reprex:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
u = "https://opendata.arcgis.com/datasets/910f48f3c4b3400aa9eb0af9f8989bbe_0.zip?outSR=%7B%22latestWkid%22%3A27700%2C%22wkid%22%3A27700%7D"
Boundary2_original = ukboundaries::duraz(u)
#> Using default data cache directory ~/.ukboundaries/cache 
#> Use cache_dir() to change it.
#> Reading layer `Local_Authority_Districts__May_2020__UK_BUC' from data source `/tmp/RtmpEfD8o1/reprex174185e970dd2/Local_Authority_Districts__May_2020__UK_BUC.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 379 features and 10 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -116.1928 ymin: 7054.1 xmax: 655653.8 ymax: 1218618
#> projected CRS:  OSGB 1936 / British National Grid
#> Removing the following files: Local_Authority_Districts__May_2020__UK_BUC.dbf Local_Authority_Districts__May_2020__UK_BUC.prj Local_Authority_Districts__May_2020__UK_BUC.shp Local_Authority_Districts__May_2020__UK_BUC.shx
Boundary2 = Boundary2_original %>% 
  st_transform(crs=27700) %>%
  st_set_precision(units::set_units(10, nm)) 
plot(Boundary2)

Created on 2020-10-14 by the reprex package (v0.3.0)

@geogale
Copy link
Author

geogale commented Oct 14, 2020

Thanks @Robinlovelace - I have no doubt that's been well used lately! I may be reading the data from the place where it lives before ending up in an Esri interface - but that would be telling!

@rsbivand
Copy link
Member

rsbivand commented Oct 15, 2020

Fun (?) fact: GEOS 3.9.0 with Overlay-NG generates the same slivers, but reports them in different order, see r-spatial/asdar-book.org@144f3d5. Same changed order of output breaks geozoning, uribo/kuniezu#6, plotly/plotly.R#1862, sf itself #1512, hypertidy/sfdct#13.

@dr-jts
Copy link

dr-jts commented Nov 7, 2020

Yes, perturbing the inputs (by using a precision value) can have a large effect on algorithm runtime. Most of the time it will probably improve. In other cases it will cause the overlay to fail entirely or return an incorrect result.

A note that this behaviour changes in GEOS 3.9 with OverlayNG. OverlayNG should generally run faster if floating (= no) precision is specified, since it can use faster noding algorithms. Conversely, specifying a precision invokes snap-rounding noding, which is generally slower (anywhere from 1.5x to 3x). So the recommendation to use a precision model should probably be revisited (once confirmed by experience by the sf community).

@edzer edzer closed this as completed Mar 3, 2021
ateucher added a commit to bcgov/ecosystem-representation-analysis that referenced this issue Aug 11, 2021
javierps added a commit to HopkinsIDD/cholera-mapping-pipeline that referenced this issue Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants