README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

geobench is a place to put data (in the releases) and code (in any language) to test performance.

It relies on the `bench` package for testing and datasets uploaded into the releases.

## Pre-requisites

This section lists the software that needs to be installed to run these benchmarks.
It will likely evolve over time.

### R packages

- `bench`
- `sf`
- `sp`
- `spData`

## Set-up

The code in this section only needs to run once, or at least each time a new dataset is added to the benchmarks.

Python was set-up to run `python3`.
This can be done at the system level or, if you're running these benchmarks from RStudio by adding the following line to `.Renviron` (which can be edited with `usethis::edit_r_environ()`):

```
RETICULATE_PYTHON=/usr/bin/python3
```


### Large point dataset covering UK

```{r, eval=FALSE}
ac = stats19::get_stats19(year = 2022, output_format = "sf")
sf::write_sf(ac, "ac.geojson", delete_dsn = TRUE)
fs::file_size("ac.geojson")
# 147 MB .geojson file
sf::write_sf(ac[1:1e6, ], "ac-M.geojson", delete_dsn = TRUE)  
sf::write_sf(ac[1:1e5, ], "ac-100K.geojson", delete_dsn = TRUE)
sf::write_sf(ac[1:1e4, ], "ac-10K.geojson", delete_dsn = TRUE)
sf::write_sf(ac[1:1e3, ], "ac-1K.geojson", delete_dsn = TRUE)
zip("ac.geojson.zip", "ac.geojson") # 14 MB
piggyback::pb_upload(file = "ac.geojson.zip")
piggyback::pb_upload(file = "ac-100K.geojson")
piggyback::pb_upload(file = "ac-10K.geojson")
reticulate::py_install("pytictoc")
```

## Download data

```{r}
if(!file.exists("ac.geojson")) {
  download.file("https://github.com/ATFutures/geobench/releases/download/0.0.1/ac.geojson.zip", "ac.geojson.zip")
  unzip("ac.geojson.zip")
  download.file("https://github.com/ATFutures/geobench/releases/download/0.0.1/ac-100k.geojson", "ac-100k.geojson")
  download.file("https://github.com/ATFutures/geobench/releases/download/0.0.1/ac-10k.geojson", "ac-10k.geojson")
}
```

## Benchmark 1: reading data

### Full dataset

Took too long to run and consumed too much memory to evaluate each time.
Results from first run pasted below:

```{r, eval=FALSE}
bench::mark(iterations = 1, check = FALSE,
            {ac_sf = sf::read_sf("ac.geojson")},
            {ac_sp = rgdal::readOGR("ac.geojson")}
)
#> expression     min    mean   median     max `itr/sec` mem_alloc  
#>   <chr>      <bch:t> <bch:t> <bch:tm> <bch:t>     <dbl> <bch:byt>       
#> 1 {...         36.7s   36.7s    36.7s   36.7s   0.0272    655.6MB 
#> 2 {...            2m      2m       2m      2m   0.00835     1.2GB 
```

### 100K sample

This runs happily and quickly so the results are run each time:

```{r, message=FALSE}
bench::mark(iterations = 1, check = FALSE,
            geosf = {ac_gsf = geojsonsf::geojson_sf("ac-100K.geojson")},
            sf = {ac_sf10k = sf::read_sf("ac-100K.geojson")},
            sp = {ac_sp10k = rgdal::readOGR("ac-100K.geojson", verbose = F)}
)
```

```{r, engine='bash', eval=FALSE}
pip3 install geopandas
pip3 install pytictoc
pip3 install rasterio
```


```{python}
import sys
print(sys.version)
import geopandas as gpd
from pytictoc import TicToc
t = TicToc()
t.tic()
s = gpd.read_file("ac-100K.geojson")
t.toc()
```

## Benchmark 2: spatial subsetting

Work in progress...

```{r}
```

## System info

```{r}
system("lscpu", intern = TRUE)
```