Logging overhead investigations #6046
Comments
I ran the renderpass benchmarks through superluminal (with |
they are still compiled in though? I assumed so far that the entire logging performance discussion is on the premise that most logs are not rendered (either displayed on console or written to disk). Log rendering performance depends on which logger is used (e.g. env_logger) and which target it writes to (some terminals are far slower than others). I.e. we want to measure the overhead of the calls to/interaction with the |
I checked that they are indeed compiled in. They should be useful if the workload is a good enough proxy for a game (so it would still be useful to check using bevy itself).
That is my understanding as well. |
All the bevy performance results have been with only |
Assuming you mean |
A lot of logging was removed in #6065, so it'll be important to pick a recent revision of wgpu for new measurements. It turns out d3d12 had a fair amount more logging than the other backends, which might explain discrepancies between results measured by different people (I've been benchmarking the vulkan backend on linux so far). Backends should produce a more similar volume of logs now. |
I have previously observed that logs which are ultimately filtered out but which aren't filtered out by the global
The benchmarks in wgpu don't include a logger, so the
I took https://github.com/bevyengine/bevy/blob/0070bdccd873e186a9ccec3ecb6f1d3d83b8710b/crates/bevy_log/src/lib.rs, removed the irrelevant portions, and simplified it to:
use tracing_subscriber::{prelude::*, registry::Registry, EnvFilter};
let filter = "wgpu=error,naga=warn".to_string();
let level = tracing::Level::INFO;
let default_filter = { format!("{},{}", level, filter) };
let filter_layer = EnvFilter::try_from_default_env()
.or_else(|_| EnvFilter::try_new(&default_filter))
.unwrap();
let fmt_layer = tracing_subscriber::fmt::Layer::default().with_writer(std::io::stderr);
let subscriber = Registry::default().with(filter_layer).with(fmt_layer);
tracing_log::LogTracer::init().unwrap();
tracing::subscriber::set_global_default(subscriber).unwrap();
Notably,
Benchmarks
To test this out, I did some benchmarks starting with the commit before the log cleanup in #6065
Using:
(0) 7ff80d6
(1) 7ff80d6 with logging setup to match bevy defaults (compared against 0):
diff --git a/Cargo.lock b/Cargo.lock
index 2dbf69ee7..91b650830 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -3661,9 +3661,21 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c3523ab5a71916ccf420eebdf5521fcef02141234bbc0b8a49f2fdc4544364ef"
dependencies = [
"pin-project-lite",
+ "tracing-attributes",
"tracing-core",
]
+[[package]]
+name = "tracing-attributes"
+version = "0.1.27"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "34704c8d6ebcbc939824180af020566b01a7c01f80641264eba0999f6c2b6be7"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn 2.0.72",
+]
+
[[package]]
name = "tracing-core"
version = "0.1.32"
@@ -4251,6 +4263,9 @@ dependencies = [
"pollster",
"profiling",
"rayon",
+ "tracing",
+ "tracing-log",
+ "tracing-subscriber",
"tracy-client",
"wgpu",
]
diff --git a/benches/Cargo.toml b/benches/Cargo.toml
index 82207d510..75745579b 100644
--- a/benches/Cargo.toml
+++ b/benches/Cargo.toml
@@ -42,5 +42,8 @@ once_cell.workspace = true
pollster.workspace = true
profiling.workspace = true
rayon.workspace = true
+tracing = "0.1.40"
+tracing-log = "0.2.0"
+tracing-subscriber = { version = "0.3.18", features = ["env-filter"] }
tracy-client = { workspace = true, optional = true }
wgpu = { workspace = true, features = ["wgsl", "metal", "dx12"] }
diff --git a/benches/benches/root.rs b/benches/benches/root.rs
index 064617783..c4aa3529c 100644
--- a/benches/benches/root.rs
+++ b/benches/benches/root.rs
@@ -6,6 +6,24 @@ mod renderpass;
mod resource_creation;
mod shader;
+fn init_logging() {
+ use tracing_subscriber::{prelude::*, registry::Registry, EnvFilter};
+
+ let filter = "wgpu=error,naga=warn".to_string();
+ let level = tracing::Level::INFO;
+ let default_filter = { format!("{},{}", level, filter) };
+ let filter_layer = EnvFilter::try_from_default_env()
+ .or_else(|_| EnvFilter::try_new(&default_filter))
+ .unwrap();
+
+ let fmt_layer = tracing_subscriber::fmt::Layer::default().with_writer(std::io::stderr);
+
+ let subscriber = Registry::default().with(filter_layer).with(fmt_layer);
+
+ tracing_log::LogTracer::init().unwrap();
+ tracing::subscriber::set_global_default(subscriber).unwrap();
+}
+
struct DeviceState {
adapter_info: wgpu::AdapterInfo,
device: wgpu::Device,
@@ -14,6 +32,7 @@ struct DeviceState {
impl DeviceState {
fn new() -> Self {
+ init_logging();
#[cfg(feature = "tracy")]
tracy_client::Client::start();
Bench results
(2) 1 with max level passed to
NOTE: The benchmarks varied a bit for me and I didn't put any effort into creating a stable benchmarking environment. I.e., I've seen differences as significant as -10% in other runs comparing 1 vs 2.
diff --git a/benches/benches/root.rs b/benches/benches/root.rs
index c4aa3529c..d3cea8afe 100644
--- a/benches/benches/root.rs
+++ b/benches/benches/root.rs
@@ -7,7 +7,8 @@ mod resource_creation;
mod shader;
fn init_logging() {
- use tracing_subscriber::{prelude::*, registry::Registry, EnvFilter};
+ use tracing_log::AsLog;
+ use tracing_subscriber::{filter::LevelFilter, prelude::*, registry::Registry, EnvFilter};
let filter = "wgpu=error,naga=warn".to_string();
let level = tracing::Level::INFO;
@@ -15,12 +16,13 @@ fn init_logging() {
let filter_layer = EnvFilter::try_from_default_env()
.or_else(|_| EnvFilter::try_new(&default_filter))
.unwrap();
+ let max_level = filter_layer.max_level_hint().unwrap_or(LevelFilter::TRACE);
let fmt_layer = tracing_subscriber::fmt::Layer::default().with_writer(std::io::stderr);
let subscriber = Registry::default().with(filter_layer).with(fmt_layer);
- tracing_log::LogTracer::init().unwrap();
+ tracing_log::LogTracer::init_with_filter(max_level.as_log()).unwrap();
tracing::subscriber::set_global_default(subscriber).unwrap();
}
Bench results
(3) 9c6ae1b with changes from 1 (compared against 1)
Here we look at the impact of the log cleanup when logs aren't being filtered out immediately using the max level.
Bench results
(4) 1 with trace level logs statically disabled (compared against 1)
diff --git a/Cargo.lock b/Cargo.lock
index 91b650830..2bee5ccd0 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -4257,6 +4257,7 @@ dependencies = [
"bincode",
"bytemuck",
"criterion",
+ "log",
"naga",
"nanorand",
"once_cell",
diff --git a/benches/Cargo.toml b/benches/Cargo.toml
index 75745579b..848faa22d 100644
--- a/benches/Cargo.toml
+++ b/benches/Cargo.toml
@@ -47,3 +47,4 @@ tracing-log = "0.2.0"
tracing-subscriber = { version = "0.3.18", features = ["env-filter"] }
tracy-client = { workspace = true, optional = true }
wgpu = { workspace = true, features = ["wgsl", "metal", "dx12"] }
+log = { version = "0.4", features = ["release_max_level_debug"] }
Bench results
I think it would be interesting to see the impact of the change from 2 in bevy's
Going back to the topic of selectively enabling trace level logs. In my previous investigations,
In my experience, the typical slowness of the default debug build configuration leads to enabling optimizations for the debug/development profile to make things actually usable. So I don't think the slowness of default debug builds can be relied on to make the logging overhead negligible. |
Thanks @Imberflur! Sounds like there is some digging to do on the bevy side to configure the tracing/logging in the most efficient way. I'm operating under the assumption that overhead associated with logs that are filtered out by I'll rebase #6010 and see if it makes a difference on top of your
Interesting. @ErichDonGubler recently suggested switching to |
A previous misunderstanding led to an incorrect *Performance* section implying the log filter string was the cause of performance loss. Rather, there is performance to be _gained_ by using a compile-time feature of the `log` crate to eliminate logging calls entirely. Updated information is based on the [`log` documentation](https://docs.rs/log/#compile-time-filters) and benchmark 4 from [this GitHub comment](gfx-rs/wgpu#6046 (comment)).
Context
Bevy folks measured that wgpu's trace-level logging, when enabled at build time but disabled at run time, can have a significant overhead (15%). This overhead comes from checking a global atomic variable that defines the log level.
In #6017, @Elabajaba proposes gating api logs and resource logs (trace level by default) behind a build flag.
I'm not fond of adding another feature flag for this, so I would like to explore other options first.
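To make the run-time cost described in this section concrete, here is a minimal std-only sketch of a log call that is compiled in but filtered out at run time. It is loosely modeled on how a logging facade gates each call site on a global atomic maximum level. The names `Level`, `MAX_LEVEL`, `set_max_level`, `enabled` and `record_commands` are hypothetical and are not wgpu or `log` APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical log levels, ordered from least to most verbose.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(usize)]
pub enum Level {
    Error = 1,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical global maximum level, standing in for the facade's own.
static MAX_LEVEL: AtomicUsize = AtomicUsize::new(Level::Info as usize);

/// Changes the run-time maximum level.
pub fn set_max_level(level: Level) {
    MAX_LEVEL.store(level as usize, Ordering::Relaxed);
}

/// The check each call site performs, even when the message ends up dropped.
#[inline]
pub fn enabled(level: Level) -> bool {
    level as usize <= MAX_LEVEL.load(Ordering::Relaxed)
}

/// Stand-in for a hot loop with one trace-level call site per command.
/// Returns how many messages would have been formatted and emitted.
pub fn record_commands(command_count: usize) -> usize {
    let mut emitted = 0;
    for _ in 0..command_count {
        // One atomic load per command, whether or not anything is logged.
        if enabled(Level::Trace) {
            emitted += 1;
        }
    }
    emitted
}
```

With the level left at `Info`, the loop emits nothing but still performs one atomic load per iteration, which is the kind of per-call cost this issue is about.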
Disable trace level logs at build time
The most straightforward option is to note that trace level logging is the most verbose of the five logging levels. If the most verbose logging level is too verbose for a bevy application, then bevy could disable trace-level logging in optimized builds. This removes the performance concerns for bevy while letting its dependencies and bevy itself keep a log level for potentially verbose logs that are useful for development or for debugging particularly tricky issues. Firefox, for example, disables both trace and debug log levels in optimized builds by default for the same reason.
Because debug builds are orders of magnitude slower than release builds, I believe it is safe to assume that trace-level logging overhead is negligible in debug builds (unless someone has benchmarks to show otherwise).
I'm tempted to leave it at that. That said, 15% overhead is a lot and there should be ways to mitigate some of it without requiring more build configurations on the wgpu side.
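As an illustration of the build-time option discussed in this section, the sketch below shows why a compile-time ceiling removes the cost entirely: the level comparison involves only constants, so the compiler can drop the branch and the formatting code. This is a std-only sketch under assumptions, not the `log` crate's implementation. `Level`, `STATIC_MAX_LEVEL`, `trace_log!` and `draw` are hypothetical names, and `cfg!(debug_assertions)` stands in for whatever build configuration selects the ceiling.

```rust
/// Hypothetical log levels, ordered from least to most verbose.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error = 1,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical build-time ceiling: everything in debug builds,
/// nothing more verbose than Debug in optimized builds.
pub const STATIC_MAX_LEVEL: Level = if cfg!(debug_assertions) {
    Level::Trace
} else {
    Level::Debug
};

/// Hypothetical trace macro. The condition compares two constants, so in an
/// optimized build the whole body can be removed at compile time.
macro_rules! trace_log {
    ($sink:expr, $($arg:tt)+) => {
        if (Level::Trace as usize) <= (STATIC_MAX_LEVEL as usize) {
            $sink.push(format!($($arg)+));
        }
    };
}

/// Stand-in for an API entry point with a trace-level log.
pub fn draw(sink: &mut Vec<String>, vertex_count: u32) {
    trace_log!(sink, "draw {} vertices", vertex_count);
}
```

In a debug build `draw` records one message; in an optimized build it records none and no run-time level check is left behind.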
Reduce the overhead of some of the logs
For context, wgpu's repository has a few benchmarks. Removing all trace-level logs yields improvements ranging from 0 to 5% on these benchmarks. This doesn't mean that reducing the logging overhead will improve bevy by at most 5%; it only means that the highest improvement we can observe through these benchmarks is in that range. We'll have to scale these numbers a bit to transpose them to other projects, and getting some performance measurements from bevy itself would be valuable.
Resource logs
Resource logs only happen when creating and destroying API objects. For a typical application or game that's fairly rare (on the order of a few times per frame). Not worth optimizing. Removing them entirely does not affect wgpu's benchmarks.
API logs
API logs can be very verbose. Most of the API log traffic in a typical frame would be in render and compute passes with one log per command (and games tend to submit a lot of commands per render pass). In #6010 I proposed to check the log level once at the beginning of the render pass instead of once per command. The idea was that checking a local variable instead of a global atomic at a high-ish frequency should be faster. I ran wgpu's render pass benchmarks on two computers and the numbers (2) are puzzling. It is a mixed bag of improvements and regressions in the [-1.4% .. +1.7%] range.
Removing the api logs entirely was just as puzzling. The results (3) show either no changes or regressions.
Why? I don't know, I'm curious. It's easy to speculate but digging into it would take time. I ran the measurements multiple times on two computers and the results are consistent.
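For readers who want to see the shape of the idea tried in #6010, here is a minimal std-only sketch contrasting a per-command level check with a check hoisted to the start of the pass. It is not the code from that PR. `GLOBAL_MAX_LEVEL`, `TRACE`, `Command`, `encode_per_command`, `encode_per_pass` and `describe` are hypothetical names, and the sketch makes no claim about which variant is faster, in line with the inconclusive measurements above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical numeric levels: 3 stands for Info, 5 for Trace.
pub const TRACE: usize = 5;
pub static GLOBAL_MAX_LEVEL: AtomicUsize = AtomicUsize::new(3);

/// Hypothetical render pass commands.
pub enum Command {
    SetPipeline(u32),
    Draw { vertices: u32 },
}

/// Formats the message an API log would carry for a command.
fn describe(cmd: &Command) -> String {
    match cmd {
        Command::SetPipeline(id) => format!("RenderPass::set_pipeline {id}"),
        Command::Draw { vertices } => format!("RenderPass::draw {vertices}"),
    }
}

/// Per-command check: one load of the global atomic for every command.
pub fn encode_per_command(commands: &[Command], log: &mut Vec<String>) {
    for cmd in commands {
        if TRACE <= GLOBAL_MAX_LEVEL.load(Ordering::Relaxed) {
            log.push(describe(cmd));
        }
    }
}

/// Per-pass check: one load at the start of the pass, then a local bool.
pub fn encode_per_pass(commands: &[Command], log: &mut Vec<String>) {
    let trace_enabled = TRACE <= GLOBAL_MAX_LEVEL.load(Ordering::Relaxed);
    for cmd in commands {
        if trace_enabled {
            log.push(describe(cmd));
        }
    }
}
```

One behavioral difference worth noting: the per-pass variant does not observe a level change made while the pass is being encoded, whereas the per-command variant does.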
Other trace-level logs
There's a fair amount of trace-level logging outside of api and resource logs. Some of them might not be useful anymore and could be removed.
I'll open a PR to remove some of it and maintainers can chime in to "save" any log they care about. We already discussed keeping api, resource and initialization-related logs.
Conclusions
It is hard to draw conclusions at this point, other than that optimized builds are faster without trace-level logs but that focusing on resource and api logs yields inconclusive results.
Measurements
Focusing on the render pass benchmarks here since other benchmarks were mostly not affected by the changes.
(1) Disable all trace-level logs
(2) Reduce the overhead of api logs in render passes
By checking the global log level once per render pass instead of once per command (#6010).
(3) Remove api logs at build time