Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

network centrality function #90

Open
mpadge opened this issue Apr 5, 2019 · 14 comments
Open

network centrality function #90

mpadge opened this issue Apr 5, 2019 · 14 comments
Labels
Documentation Documentation update needed must do Critically important

Comments

@mpadge
Copy link
Member

mpadge commented Apr 5, 2019

After #69 ad #89, dodgr will be able to calculate centrality more efficiently than igraph, and it will be straightforward to adapt existing code to do so.

@mpadge
Copy link
Member Author

mpadge commented May 13, 2019

Centrality can already be calculated with dodgr_flows and a unit flow matrix, but this is not the same as igraph::centrality, because the latter simply uses the graph with unit edge weights and unit flow matrix, whereas the dodgr-equivalent version uses the full weighted graph. The igraph version can be calculated by setting all edge distances to 1, but dodgr won't calculate that as efficiently as igraph because dodgr remains hard coded for <double> edge weights (see #89). Closing this for now and leaving the 2 options as:

  1. Faster, cheaper calculation via igraph to give centrality independent of everything but graph connections; or
  2. Calculation "as is" via dodgr that takes longer than igraph, but will give a much more accurate measure that reflects the fullness of the underlying graph structure including edge properties.

@mpadge mpadge closed this as completed May 13, 2019
@mpadge
Copy link
Member Author

mpadge commented Oct 4, 2019

Re-opening because this now seems really worth doing. I just have to code a custom Dijkstra to (i) use (rounded) <int> edge weights; and (ii) aggregate to unit-integer values. Can then calculate centrality via arbitrary metrics: distance, time, and even angle. Ping @Robinlovelace @agila5

@mpadge
Copy link
Member Author

mpadge commented Oct 10, 2019

Note this algorithm (original paper here - "A faster algorithm for betweenness centrality" (2001)). This is what igraph uses, and fairly straightfoward to implement.

@mpadge
Copy link
Member Author

mpadge commented Oct 15, 2019

All of the C++ work done in this commit. Now just have to wrap in an R function, and test.

@Robinlovelace
Copy link
Contributor

Awesome!

mpadge added a commit that referenced this issue Oct 16, 2019
mpadge added a commit that referenced this issue Oct 16, 2019
@mpadge mpadge closed this as completed in d85bfc0 Oct 16, 2019
@mpadge
Copy link
Member Author

mpadge commented Oct 16, 2019

@Robinlovelace That gets us to this:

library(dodgr)
graph <- weight_streetnet (hampi, wt_profile = "foot")
igr <- dodgr_to_igraph (graph)
graph_c <- dodgr_contract_graph (graph)
igr_c <- dodgr_to_igraph (graph_c)

# edge betweenness on full graph
f_dodgr <- function (graph) dodgr_centrality (graph, contract = FALSE, edges = TRUE)
f_igraph <- function (graph) igraph::edge_betweenness (graph)
rbenchmark::benchmark (
             res_d <- f_dodgr (graph),
             res_i <- f_igraph (igr),
             replications = 2
) [, 1:4]
#>                      test replications elapsed relative
#> 1 res_d <- f_dodgr(graph)            2   1.234    1.213
#> 2  res_i <- f_igraph(igr)            2   1.017    1.000
identical (res_i, res_d$centrality)
#> [1] TRUE

# edge betweenness on contracted graph
rbenchmark::benchmark (
             res_d <- f_dodgr (graph_c),
             res_i <- f_igraph (igr_c),
             replications = 2
) [, 1:4]
#>                        test replications elapsed relative
#> 1 res_d <- f_dodgr(graph_c)            2   0.022    1.222
#> 2  res_i <- f_igraph(igr_c)            2   0.018    1.000
identical (res_i, res_d$centrality)
#> [1] TRUE

Created on 2019-10-16 by the reprex package (v0.3.0)

dodgr is only about 20% slower, essentially because it's C++ while igraph is pure C. But the point of this is now the trivial extra step of wrapping it in RcppParalllel, so that dodgr will effectively be several times faster through parallelisation... Re-opening until that's been done.

@mpadge mpadge reopened this Oct 16, 2019
@Robinlovelace
Copy link
Contributor

Legend. Looks very cool. I look forward to results and giving this a spin. Heads-up @agila5, if this can work with different weighting profiles that respect oneway and other restrictions this will be a big step forward.

@mpadge
Copy link
Member Author

mpadge commented Oct 16, 2019

It already does that, and the idea is to enable flexible specification of different ways of measuring centrality that can't be done via igraph, notably including for example centrality considering effects of incline, or using time-based routing including effects of turning across oncoming traffic.

@Robinlovelace
Copy link
Contributor

the idea is to enable flexible specification of different ways of measuring centrality

The ideal contents of WHO-funded research into scenarios of intervention in the transport system.

@mpadge mpadge closed this as completed in ef3f4f6 Oct 16, 2019
@mpadge
Copy link
Member Author

mpadge commented Oct 16, 2019

setwd ("/data/mega/code/repos/atfutures/dodgr")
devtools::load_all (".", export_all = FALSE)
#> Loading dodgr
graph <- dodgr_streetnet ("ilkley england") %>%
    weight_streetnet (wt_profile = "bicycle") %>%
    dodgr_contract_graph ()
#> The following highway types are present in data yet lack corresponding weight_profile values: road,
message ("graph has ", format (nrow (graph), big.mark = ","), " edges")
#> graph has 5,671 edges
igr <- dodgr_to_igraph (graph)

# -------- edge centrality --------
f_dodgr <- function (graph) dodgr_centrality (graph, contract = FALSE,
                                              edges = TRUE, parallel = FALSE)
f_dodgr_par <- function (graph) dodgr_centrality (graph, contract = FALSE,
                                                  edges = TRUE, parallel = TRUE)
f_igraph <- function (graph) igraph::edge_betweenness (graph)
rbenchmark::benchmark (
             res_d <- f_dodgr (graph),
             res_dp <- f_dodgr_par (graph),
             res_i <- f_igraph (igr),
             replications = 1, order = "relative") [, 1:4]
#>                           test replications elapsed relative
#> 2 res_dp <- f_dodgr_par(graph)            1   0.252    1.000
#> 3       res_i <- f_igraph(igr)            1   0.867    3.440
#> 1      res_d <- f_dodgr(graph)            1   0.917    3.639
identical (res_i, res_d$centrality)
#> [1] TRUE
identical (res_i, res_dp$centrality)
#> [1] TRUE

# -------- vertex centrality --------
f_dodgr <- function (graph) dodgr_centrality (graph, contract = FALSE,
                                              edges = FALSE, parallel = FALSE)
f_dodgr_par <- function (graph) dodgr_centrality (graph, contract = FALSE,
                                                  edges = FALSE, parallel = TRUE)
f_igraph <- function (graph) igraph::betweenness (graph)
rbenchmark::benchmark (
             res_d <- f_dodgr (graph),
             res_dp <- f_dodgr_par (graph),
             res_i <- f_igraph (igr),
             replications = 1, order = "relative") [, 1:4]
#>                           test replications elapsed relative
#> 2 res_dp <- f_dodgr_par(graph)            1   0.246    1.000
#> 3       res_i <- f_igraph(igr)            1   0.838    3.407
#> 1      res_d <- f_dodgr(graph)            1   0.890    3.618
identical (unname (res_i), res_d$centrality)
#> [1] TRUE
identical (unname (res_i), res_dp$centrality)
#> [1] TRUE

Created on 2019-10-16 by the reprex package (v0.3.0)

@Robinlovelace @agila5 - now we're talking - dodgr_centrality is now parallelised, and should run several times faster than igraph. This is what we need to calculate centrality of huge graphs. (Note that now that it all works, the non-parallel version and accompanying parameter shown above will be ditched, and everything will always be in parallel.)

@mpadge
Copy link
Member Author

mpadge commented Oct 17, 2019

Re-opened to implement threshold parameter - centrality measured only within a certain distance of each point converges to value measured over entire graph (within some constant scale factor), so this enables considerable speed-ups. Next commits will implement ...

@mpadge mpadge reopened this Oct 17, 2019
@agila5
Copy link
Contributor

agila5 commented Oct 17, 2019

Hi Mark, amazing, as always, and, as always, I'm super lost 😰

Next will I'm going to get back to studying these topics!

mpadge added a commit that referenced this issue Oct 17, 2019
@mpadge mpadge closed this as completed in 7be1f1b Oct 17, 2019
@mpadge
Copy link
Member Author

mpadge commented Oct 17, 2019

That commit and the few ones prior implement a couple of helper functions, estimate_centrality_threshold() and estimate_centrality_time(). These are very useful for big graphs, for which true centrality measures using every pair of points are often impractical. Calculating centrality only within a given distance threshold often provides good approximations to the full values, and the estimate_centrality_threshold() function provides an estimate of these thresholds for a specified error tolerance. The estimate_centrality_time() function can then be used to estimate the likely time required for a complete centrality calculation for a given threshold.


Now re-opening for hopefully the last time, to close upon writing a vignette on all these centrality calculations and options.

@mpadge mpadge reopened this Oct 17, 2019
@Robinlovelace
Copy link
Contributor

As an additional point, it seems that the centrality values are lost when converting back to sf:

library(dodgr)
graph <- weight_streetnet (hampi, wt_profile = "foot")
igr <- dodgr_to_igraph (graph)
#> Loading required namespace: igraph


system.time({res_igr <- igraph::edge_betweenness(igr)})
#>    user  system elapsed 
#>   0.504   0.000   0.504
system.time({res_dodgr <- dodgr_centrality(graph, contract = FALSE, edges = TRUE)})
#>    user  system elapsed 
#>   0.771   0.040   0.160
res_dodgr_sf <- dodgr_to_sf(res_dodgr)
#> Loading required namespace: sf
names(res_dodgr_sf)
#>  [1] "geom_num"      "edge_id"       "from_id"       "from_lon"     
#>  [5] "from_lat"      "to_id"         "to_lon"        "to_lat"       
#>  [9] "d"             "d_weighted"    "highway"       "way_id"       
#> [13] "time"          "time_weighted" "component"     "geometry"

Created on 2019-10-22 by the reprex package (v0.3.0)

Speedup is impressive though, good work Mark!

@mpadge mpadge pinned this issue Oct 29, 2019
@mpadge mpadge added must do Critically important Documentation Documentation update needed labels Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Documentation update needed must do Critically important
Projects
None yet
Development

No branches or pull requests

3 participants