
Switch parts of overline to data.table #517

Merged: 7 commits merged into master on Aug 17, 2023

Conversation

@mem48 (Collaborator) commented Aug 15, 2023

@Robinlovelace here's a very quick look at speeding up overline2: I've replaced some dplyr code with data.table.

# Compare the released overline2() with the data.table version; check = FALSE
# because the two results come out in a different row order
bench::mark(t1 = stplanr::overline2(r, c("bicycle", "car_driver", "all")),
            t2 = overline2(r, attrib = c("bicycle", "car_driver", "all")),
            check = FALSE)

# A tibble: 2 × 13
  expression    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr> <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 t1          45.3s  45.3s    0.0221   13.46GB    0.728     1    33      45.3s <NULL>
2 t2          24.6s  24.6s    0.0406    2.06GB    1.50      1    37      24.6s <NULL>

# Sort both results the same way so that row order does not affect the comparison
t1 <- t1[order(t1$bicycle, t1$all), ]
t2 <- t2[order(t2$bicycle, t2$all), ]
identical(t1, t2) # TRUE

Results come out in a different order, but I don't think that matters. I've not tested all use cases, such as large datasets and multicore use.

It's a modest speed improvement, but a major reduction in memory usage, which should help.

There is another bit of dplyr code I can't work out how to do in data.table:

# Group by the four ID columns, then apply `fun` to every remaining column
sls <- dplyr::group_by_at(sl, c("1", "2", "3", "4"))
sls <- dplyr::ungroup(dplyr::summarise_all(sls, .funs = fun))

but the fix I have made is to create only one object, sls, rather than also making an slg object that is never used again.
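
As an aside, here is a possible data.table equivalent of that grouped summarise, as a sketch only: it assumes sl can be treated as a plain data frame for this step (if sl is an sf object, the geometry column would need separate handling), with the grouping columns and fun as in the snippet above.

library(data.table)

# Convert once, then group by the four ID columns and apply `fun` to all
# remaining columns via .SD; this mirrors group_by_at() + summarise_all()
sl_dt <- as.data.table(sl)
sls <- sl_dt[, lapply(.SD, fun), by = c("1", "2", "3", "4")]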

@mem48 (Collaborator, Author) commented Aug 15, 2023

Test done with:

# Edinburgh commute routes used for the benchmark above
r <- readRDS("../../nptscot/npt/outputdata/2023-07-09/routes_max_dist_commute_fastest.Rds")
nrow(r) # 257284

So 257,284 segments in Edinburgh.

@Robinlovelace (Member) commented

Sounds promising. If I recall correctly, I was seeing more than a 2x speed-up in tests I was doing a few weeks ago: https://github.com/Robinlovelace/overline-tests

Just opened it up; I was aiming for the tests to be a bit more ready before sharing.

👍 to speeding up overline!

@mem48 (Collaborator, Author) commented Aug 16, 2023

@Robinlovelace I've now checked this and got the multicore support working, although multicore seems to matter a lot less with data.table. Can we get this merged, as it would benefit the NPT builds?
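
For reference, a minimal multicore usage sketch, assuming the ncores argument of the existing overline()/overline2() signature in stplanr (the value 4 here is just an example):

# Run the data.table-backed overline2() across multiple cores
rnet <- overline2(r, attrib = c("bicycle", "car_driver", "all"), ncores = 4)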

@mem48 (Collaborator, Author) commented Aug 16, 2023

> Sounds promising. If I recall correctly, I was seeing more than a 2x speed-up in tests I was doing a few weeks ago: https://github.com/Robinlovelace/overline-tests

Perhaps I'm not understanding, but I only see small speedups on some of the faster parts of the process?

@Robinlovelace (Member) commented

Will take a look.

@Robinlovelace (Member) commented

> Can we get this merged, as it would benefit the NPT builds?

You can use this branch with:

# Install the data.table branch of stplanr directly from GitHub
remotes::install_github("ropensci/stplanr@overline-dt")

@Robinlovelace (Member) commented

5x reduction in memory use = very promising.

@Robinlovelace merged commit 6583e63 into master on Aug 17, 2023. 5 checks passed.
@mpadge (Member) commented Aug 17, 2023

@Robinlovelace @mem48 FYI: after a heap of CRAN rejections for one of my packages with heavy data.table usage, I learnt that new CRAN checks mean I need this line at the top of all examples, tests, and vignettes:

data.table::setDTthreads(1L)

Without that, CRAN autochecks always rejected because of "ratio of CPU to elapsed time > X". In case that helps.
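
For illustration, a minimal sketch of where that line might sit in a testthat file; the file name and test below are hypothetical, not from stplanr:

# Hypothetical test file: tests/testthat/test-overline.R
library(testthat)

data.table::setDTthreads(1L) # keep the CPU/elapsed-time ratio within CRAN limits

test_that("data.table runs single-threaded under CRAN checks", {
  expect_equal(data.table::getDTthreads(), 1L)
})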

@Robinlovelace (Member) commented

Handy. Probably overdue a CRAN release. Thanks Mark!
