
Switch parts of overline to data.table #517

Merged: 7 commits merged into master on Aug 17, 2023

Conversation

@mem48 (Collaborator) commented Aug 15, 2023

@Robinlovelace here's a very quick look at speeding up overline2: I've replaced some dplyr code with data.table.

# Compare the released overline2() with the data.table version; check = FALSE
# because the two results come out in a different row order
bench::mark(t1 = stplanr::overline2(r, c("bicycle", "car_driver", "all")),
            t2 = overline2(r, attrib = c("bicycle", "car_driver", "all")),
            check = FALSE)

# A tibble: 2 × 13
  expression    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr> <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 t1          45.3s  45.3s    0.0221   13.46GB    0.728     1    33      45.3s <NULL>
2 t2          24.6s  24.6s    0.0406    2.06GB    1.50      1    37      24.6s <NULL>

# Sort both results the same way so that row order does not affect the comparison
t1 <- t1[order(t1$bicycle, t1$all), ]
t2 <- t2[order(t2$bicycle, t2$all), ]
identical(t1, t2) # TRUE

Results come out in a different order, but I don't think that matters. I've not tested all use cases, such as large datasets and multicore use.

It's a modest speed improvement, but a major reduction in memory usage, which should help.

There is another bit of dplyr code I can't work out how to do in data.table:

# Group by the four ID columns, then apply `fun` to every remaining column
sls <- dplyr::group_by_at(sl, c("1", "2", "3", "4"))
sls <- dplyr::ungroup(dplyr::summarise_all(sls, .funs = fun))

but the fix I have made is to create only one object, sls, rather than also making an slg object that is never used again.
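
As an aside, here is a possible data.table equivalent of that grouped summarise, as a sketch only: it assumes sl can be treated as a plain data frame for this step (if sl is an sf object, the geometry column would need separate handling), with the grouping columns and fun as in the snippet above.

library(data.table)

# Convert once, then group by the four ID columns and apply `fun` to all
# remaining columns via .SD; this mirrors group_by_at() + summarise_all()
sl_dt <- as.data.table(sl)
sls <- sl_dt[, lapply(.SD, fun), by = c("1", "2", "3", "4")]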

@mem48 (Collaborator, Author) commented Aug 15, 2023

Test done with:

# Edinburgh commute routes used for the benchmark above
r <- readRDS("../../nptscot/npt/outputdata/2023-07-09/routes_max_dist_commute_fastest.Rds")
nrow(r) # 257284

So 257,284 segments in Edinburgh.

@Robinlovelace (Member) commented

Sounds promising. If I recall correctly, I was seeing more than a 2x speed-up in tests I was doing a few weeks ago: https://github.com/Robinlovelace/overline-tests

Just opened it up; I was aiming for the tests to be a bit more ready before sharing.

👍 to speeding up overline!

@mem48 (Collaborator, Author) commented Aug 16, 2023

@Robinlovelace I've now checked this and got the multicore support working, although multicore seems to matter a lot less with data.table. Can we get this merged, as it would benefit the NPT builds?
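
For reference, a minimal multicore usage sketch, assuming the ncores argument of the existing overline()/overline2() signature in stplanr (the value 4 here is just an example):

# Run the data.table-backed overline2() across multiple cores
rnet <- overline2(r, attrib = c("bicycle", "car_driver", "all"), ncores = 4)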

@mem48 (Collaborator, Author) commented Aug 16, 2023

> Sounds promising. If I recall correctly, I was seeing more than a 2x speed-up in tests I was doing a few weeks ago: https://github.com/Robinlovelace/overline-tests

Perhaps I'm not understanding, but I only see small speedups on some of the faster parts of the process?

@Robinlovelace (Member) commented

Will take a look.

@Robinlovelace (Member) commented

> Can we get this merged, as it would benefit the NPT builds?

You can use this branch with:

# Install the data.table branch of stplanr directly from GitHub
remotes::install_github("ropensci/stplanr@overline-dt")

@Robinlovelace (Member) commented

5x reduction in memory use = very promising.

@Robinlovelace merged commit 6583e63 into master on Aug 17, 2023. 5 checks passed.
@mpadge (Member) commented Aug 17, 2023

@Robinlovelace @mem48 FYI: after a heap of CRAN rejections for one of my packages with heavy data.table usage, I learnt that new CRAN checks mean I need this line at the top of all examples, tests, and vignettes:

data.table::setDTthreads(1L)

Without that, CRAN autochecks always rejected because of "ratio of CPU to elapsed time > X". In case that helps.
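
For illustration, a minimal sketch of where that line might sit in a testthat file; the file name and test below are hypothetical, not from stplanr:

# Hypothetical test file: tests/testthat/test-overline.R
library(testthat)

data.table::setDTthreads(1L) # keep the CPU/elapsed-time ratio within CRAN limits

test_that("data.table runs single-threaded under CRAN checks", {
  expect_equal(data.table::getDTthreads(), 1L)
})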

@Robinlovelace (Member) commented

Handy. Probably overdue a CRAN release. Thanks Mark!
