.data pronouns slow down plots #5730

Closed
teunbrand opened this issue Feb 29, 2024 · 3 comments · Fixed by #5731

Comments

@teunbrand
Collaborator

This issue is separated from #5729.

Using the .data pronoun is about 20% slower than using normal aesthetics, which seems unreasonable to me.
All the slowdown is in the build stage, not the gtable stage.

library(ggplot2)

p <- ggplot(mtcars) +
  geom_point() +
  facet_grid(gear ~ cyl)

standard <- p + aes(x = mpg, y = disp)
pronoun  <- p + aes(x = .data[["mpg"]], y = .data[["disp"]])

bench::mark(
  ggplot_build(standard),
  ggplot_build(pronoun),
  check = FALSE, min_iterations = 5
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                  min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>             <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 ggplot_build(standard)   48.9ms   50.5ms      19.5    6.46MB     60.5
#> 2 ggplot_build(pronoun)    75.4ms   76.8ms      11.6    7.88MB     32.9

standard <- ggplot_build(standard)
pronoun  <- ggplot_build(pronoun)

bench::mark(
  ggplot_gtable(standard),
  ggplot_gtable(pronoun),
  check = FALSE, min_iterations = 5
)
#> # A tibble: 2 × 6
#>   expression                   min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>              <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 ggplot_gtable(standard)   41.4ms   42.2ms      23.6    3.39MB     70.7
#> 2 ggplot_gtable(pronoun)    42.4ms   43.1ms      23.0  218.79KB     61.4

Created on 2024-02-29 with reprex v2.1.0
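
As a rough cross-check of where the time goes, the ratios of the median timings printed above can be computed directly. This is only arithmetic on the numbers reported in this reprex; the variable names are made up for illustration, and the values will differ between machines and runs.

```r
# Median timings (ms) copied from the two bench::mark() tables above.
# The object names here are illustrative, not part of ggplot2 or bench.
build_median  <- c(standard = 50.5, pronoun = 76.8)
gtable_median <- c(standard = 42.2, pronoun = 43.1)

# Ratio of pronoun to standard for each stage.
ratios <- c(
  build  = build_median[["pronoun"]]  / build_median[["standard"]],
  gtable = gtable_median[["pronoun"]] / gtable_median[["standard"]]
)

round(ratios, 2)
# build is about 1.52, gtable is about 1.02
```

On these numbers the gtable stage is essentially unchanged, which is consistent with the slowdown sitting in the build stage.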

@aphalo
Contributor

aphalo commented Feb 29, 2024

The delay increases with the number of layers in the plot. Once the namespaces have been loaded, the increase in time can easily exceed 20%, and even 200%. Most of the slowdown takes place when each layer is rendered. When profiling plots that use .data, I see calls to utils:::readCitationFile(), parsing of the bibentry, etc., in each layer. Such calls are absent when .data is not used. Hopefully, this provides some clue to the origin of the problem. The example above uses a plot with no layers, which explains why the slowdown is only 20%. I give some more realistic timings in the reprex below.

The more complex the plot, the proportionally larger the slowdown seems to be. In the two examples below, the median time increases by factors of about 2.4 and 4, respectively, with some small variation between runs. (If a file is actually being read, the slowdown is likely to vary with OS and computer hardware.)

library(ggplot2)

p1 <- ggplot(mtcars, aes(hp, mpg, colour = factor(cyl))) +
  geom_point()

p2 <- ggplot(mtcars, aes(.data[["hp"]], 
                         .data[["mpg"]], 
                         colour = factor(.data[["cyl"]]))) +
  geom_point()

gc()
#>           used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells  841786 45.0    1432812 76.6  1432812 76.6
#> Vcells 1446284 11.1    8388608 64.0  2317017 17.7
bench::mark(ggplotGrob(p1), 
            ggplotGrob(p2),
            check = FALSE, 
            min_iterations = 10
)
#> # A tibble: 2 × 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 ggplotGrob(p1)   50.8ms   51.7ms     19.4    10.54MB     12.9
#> 2 ggplotGrob(p2)  121.2ms  121.2ms      8.25    9.13MB     74.2

p3 <- ggplot(mtcars, aes(hp, mpg, colour = factor(cyl), label = am)) +
  geom_point() +
  geom_text()

p4 <- ggplot(mtcars, aes(.data[["hp"]], 
                         .data[["mpg"]], 
                         colour = factor(.data[["cyl"]]), 
                         label = .data[["am"]])) +
  geom_point() +
  geom_text()

gc()
#>           used (Mb) gc trigger  (Mb) max used  (Mb)
#> Ncells 1151472 61.5    2304648 123.1  2304648 123.1
#> Vcells 2053219 15.7    8388608  64.0  7469216  57.0
bench::mark(ggplotGrob(p3), 
            ggplotGrob(p4),
            check = FALSE, 
            min_iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 ggplotGrob(p3)   59.5ms   63.3ms     15.9   444.89KB     7.93
#> 2 ggplotGrob(p4)  241.7ms  246.7ms      3.77    2.19MB     6.78

Created on 2024-02-29 with reprex v2.1.0
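
For anyone who wants to reproduce the profiling observation mentioned above, here is a minimal sketch using base R's sampling profiler (Rprof() and summaryRprof() from utils). The loop count, the sampling interval, and the grep pattern are my own assumptions, chosen only to collect enough samples and to surface citation-related frames. On a ggplot2 version that already includes the fix, the search is expected to come back empty.

```r
library(ggplot2)

# Same kind of plot as p2 above: every aesthetic goes through the .data pronoun.
p_pronoun <- ggplot(mtcars, aes(.data[["hp"]], .data[["mpg"]])) +
  geom_point()

# Sample the call stack while rendering the plot repeatedly.
# 20 repetitions and a 5 ms interval are arbitrary choices.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)
for (i in seq_len(20)) invisible(ggplotGrob(p_pronoun))
Rprof(NULL)

# Summarise by total time and look for citation/bibentry frames.
prof <- summaryRprof(prof_file)
hits <- grep("citation|bibentry", rownames(prof$by.total),
             ignore.case = TRUE, value = TRUE)
print(hits)
head(prof$by.total, 10)
```

Running the same loop on the equivalent plot without .data gives a baseline to compare against.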

@aphalo
Contributor

aphalo commented Feb 29, 2024

Just for completeness, the reprex below shows that vars() is not affected.

library(ggplot2)

p5 <- ggplot(mtcars, aes(hp, mpg, colour = factor(vs), label = am)) +
  geom_point() +
  geom_text() +
  facet_wrap(facets = vars(cyl))

p6 <- ggplot(mtcars, aes(hp, mpg, colour = factor(vs), label = am)) +
  geom_point() +
  geom_text() +
  facet_wrap(facets = vars(.data[["cyl"]]))

p7 <- ggplot(mtcars, aes(hp, mpg, colour = factor(vs), label = am)) +
  geom_point() +
  geom_text() +
  facet_wrap(facets = vars(factor(.data[["cyl"]])))


gc()
#>           used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells  864372 46.2    1435730 76.7  1435730 76.7
#> Vcells 1480113 11.3    8388608 64.0  2354795 18.0
bench::mark(ggplotGrob(p5), 
            ggplotGrob(p6),
            ggplotGrob(p7),
            check = FALSE, 
            min_iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 ggplotGrob(p5)    124ms    129ms      7.34   16.23MB     6.61
#> 2 ggplotGrob(p6)    130ms    131ms      7.44    1.58MB     7.44
#> 3 ggplotGrob(p7)    131ms    135ms      6.92  522.75KB     6.92

Created on 2024-02-29 with reprex v2.1.0

@teunbrand
Collaborator Author

teunbrand commented Feb 29, 2024

Found the culprit: see #5731 for diagnosis.
Thanks for spotting this bug and the help debugging!
