-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
132 lines (98 loc) · 3.4 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
geobench is a place to put data (in the releases) and code (in any language) to test performance.
It relies on the `bench` package for testing and datasets uploaded into the releases.
## Pre-requisites
This section lists the software that needs to be installed to run these benchmarks.
It will likely evolve over time.
### R packages
- `bench`
- `sf`
- `sp`
- `spData`
## Set-up
The code in this section only needs to run once, or at least each time a new dataset is added to the benchmarks.
Python was set-up to run `python3`.
This can be done at the system level or, if you're running these benchmarks from RStudio by adding the following line to `.Renviron` (which can be edited with `usethis::edit_r_environ()`):
```
RETICULATE_PYTHON=/usr/bin/python3
```
### Large point dataset covering UK
```{r, eval=FALSE}
ac = stats19::get_stats19(year = 2022, output_format = "sf")
sf::write_sf(ac, "ac.geojson", delete_dsn = TRUE)
fs::file_size("ac.geojson")
# 147 MB .geojson file
sf::write_sf(ac[1:1e6, ], "ac-M.geojson", delete_dsn = TRUE)
sf::write_sf(ac[1:1e5, ], "ac-100K.geojson", delete_dsn = TRUE)
sf::write_sf(ac[1:1e4, ], "ac-10K.geojson", delete_dsn = TRUE)
sf::write_sf(ac[1:1e3, ], "ac-1K.geojson", delete_dsn = TRUE)
zip("ac.geojson.zip", "ac.geojson") # 14 MB
piggyback::pb_upload(file = "ac.geojson.zip")
piggyback::pb_upload(file = "ac-100K.geojson")
piggyback::pb_upload(file = "ac-10K.geojson")
reticulate::py_install("pytictoc")
```
## Download data
```{r}
if(!file.exists("ac.geojson")) {
download.file("https://github.com/ATFutures/geobench/releases/download/0.0.1/ac.geojson.zip", "ac.geojson.zip")
unzip("ac.geojson.zip")
download.file("https://github.com/ATFutures/geobench/releases/download/0.0.1/ac-100k.geojson", "ac-100k.geojson")
download.file("https://github.com/ATFutures/geobench/releases/download/0.0.1/ac-10k.geojson", "ac-10k.geojson")
}
```
## Benchmark 1: reading data
### Full dataset
Took too long to run and consumed too much memory to evaluate each time.
Results from first run pasted below:
```{r, eval=FALSE}
bench::mark(iterations = 1, check = FALSE,
{ac_sf = sf::read_sf("ac.geojson")},
{ac_sp = rgdal::readOGR("ac.geojson")}
)
#> expression min mean median max `itr/sec` mem_alloc
#> <chr> <bch:t> <bch:t> <bch:tm> <bch:t> <dbl> <bch:byt>
#> 1 {... 36.7s 36.7s 36.7s 36.7s 0.0272 655.6MB
#> 2 {... 2m 2m 2m 2m 0.00835 1.2GB
```
### 100K sample
This runs happily and quickly so the results are run each time:
```{r, message=FALSE}
bench::mark(iterations = 1, check = FALSE,
geosf = {ac_gsf = geojsonsf::geojson_sf("ac-100K.geojson")},
sf = {ac_sf10k = sf::read_sf("ac-100K.geojson")},
sp = {ac_sp10k = rgdal::readOGR("ac-100K.geojson", verbose = F)}
)
```
```{r, engine='bash', eval=FALSE}
pip3 install geopandas
pip3 install pytictoc
pip3 install rasterio
```
```{python}
import sys
print(sys.version)
import geopandas as gpd
from pytictoc import TicToc
t = TicToc()
t.tic()
s = gpd.read_file("ac-100K.geojson")
t.toc()
```
## Benchmark 2: spatial subsetting
Work in progress...
```{r}
```
## System info
```{r}
system("lscpu", intern = TRUE)
```