Re-evaluate --use-memory-manager #2581
Comments
Quick ad-hoc benchmark using the tgen tests. v2.3.0:
v2.3.0 with
Seeing some perf benefit here: 780s vs 770s, or about 1%.
Another example of a user needing to disable this feature: #2622. If we don't remove this feature, we should probably stabilize it, and:
Results: https://github.com/shadow/benchmark-results/tree/master/tgen/2022-12-25-T05-07-08

The "real time" graph in https://github.com/shadow/benchmark-results/blob/master/tgen/2022-12-25-T05-07-08/plots/shadow.results.pdf shows a slight performance penalty with the MM disabled. The RSS gets significantly smaller, since without the MM enabled, the plugin memory isn't mapped into shadow. This doesn't necessarily say anything about total system memory usage.
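For anyone reproducing the RSS-vs-total distinction: a process's own RSS is the `VmRSS:` line of `/proc/self/status`, while system-wide headroom shows up in `/proc/meminfo`. A minimal sketch of reading both (the `/proc` files are real Linux interfaces, but this helper is hypothetical and not part of the benchmark tooling):

```c
// Pages mmap'd from /dev/shm into both shadow and a managed process are
// counted in each process's RSS, so summing RSS values over-counts the
// shared physical pages; /proc/meminfo is a better guide to total usage.
#include <stdio.h>
#include <string.h>

// Return the value in kB of a "Key:   12345 kB" line, or -1 if absent.
// Hypothetical helper for illustration; parsing is simplified.
long proc_kb(const char* path, const char* key) {
    FILE* f = fopen(path, "r");
    if (!f) return -1;
    char line[256];
    long val = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, key, strlen(key)) == 0) {
            sscanf(line + strlen(key), " %ld", &val);
            break;
        }
    }
    fclose(f);
    return val;
}

int main(void) {
    printf("own RSS:      %ld kB\n", proc_kb("/proc/self/status", "VmRSS:"));
    printf("MemAvailable: %ld kB\n", proc_kb("/proc/meminfo", "MemAvailable:"));
    return 0;
}
```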
Is it worth also running a manual Tor benchmark? Because of the better tooling, I think that'll give us the overall system memory usage too (i.e., because we're running
Yeah, I'll kick one off. I think we've run into this total-memory-vs-RSS issue before and found the MM not to consume more memory, but it's also worth checking the performance difference on our primary use case. Notably, a user in #2686 (comment) reported a large performance difference in their tor simulation, though there are some other variables in play there.

Running: https://github.com/shadow/benchmark/actions/runs/3988395349
Hmm, looks like GH still thinks we're over quota on storage, and discarded artifacts from this run :/. https://github.com/shadow/benchmark/actions/runs/3988395349/jobs/6839490367. I'm guessing there's some periodic job that re-evaluates storage vs quota, that hasn't run since cleaning up storage earlier today. Can get a rough idea from the logs at least. Without the MM:
vs the most recent nightly https://github.com/shadow/benchmark/actions/runs/3964662216/jobs/6793720522
I think that RAM is RSS again, so we still don't have a RAM comparison. As expected, there's a slight performance penalty for disabling the MM (~2.6%). Probably enough benefit to justify keeping it around, but maybe not enough to keep it as the default.
After discussion with @robgjansen, the tentative plan is:
Revisiting this now that
New runs: tor: https://github.com/shadow/benchmark/actions/runs/6318052173
The effect on the tgen benchmark appears to be negligible: https://github.com/shadow/benchmark-results/tree/master/tgen/2023-09-26-T20-35-40
The Tor benchmark seems to be a bit slower, as before, though it has an overlapping confidence interval with the weekly: https://github.com/shadow/benchmark-results/tree/master/tor/2023-09-26-T20-37-14
Progress on #2581. I'm still on the fence about ripping it out entirely. I think we definitely want to disable it by default so that `fork` works by default. I'd feel better waiting a release or two to give ourselves a chance to change our minds before removing it.
We currently have two paths to access managed process memory. The default (`--use-memory-manager=true`) mmaps most of managed process memory into shared memory (`/dev/shm`) so that shadow can access it directly. The fallback mechanism (`--use-memory-manager=false`, or for accessing regions we haven't mmap'd) is to use `process_vm_readv` and `process_vm_writev` to read and write the memory, as in the sketch below.
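For reference, a minimal sketch of what the fallback path boils down to, using the (real) Linux `process_vm_readv` syscall. The pid and remote address are hypothetical stand-ins for state shadow tracks internally; this is not shadow's actual code:

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

// Copy `len` bytes at `remote_addr` in process `pid` into `buf`.
// Every transfer costs a syscall, unlike the mapping path, where the
// managed process's memory is directly addressable from shadow.
ssize_t read_managed_memory(pid_t pid, void* remote_addr,
                            void* buf, size_t len) {
    struct iovec local = { .iov_base = buf, .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

`process_vm_writev` works the same way in the other direction.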
In early iterations of shadow 2.0, the mapping path was a bigger win, because the fallback mechanisms were slower (e.g. transferring control back to the plugin and asking it to copy chunks of memory into a fixed-size shared-memory buffer for us in "preload" mode; `PTRACE_PEEKDATA` or file operations on `/proc/.../mem` in "ptrace" mode).

Last time we looked at it, mapping does still give us a moderate win in I/O-intensive microbenchmarks, but I'm not sure it ends up being a significant win for more realistic simulations.
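For contrast, a minimal sketch of the idea behind the mapping path: back a region with a file under `/dev/shm` so that two processes map the same physical pages and can use plain loads and stores, with no syscall per access. The shm name is made up and error handling is omitted; this illustrates the mechanism, not shadow's implementation:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    // shm_open creates a file under /dev/shm ("/mm-sketch" is made up).
    int fd = shm_open("/mm-sketch", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 4096);
    char* mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (fork() == 0) {
        // "Managed process" side: an ordinary store into the region...
        strcpy(mem, "hello from the managed process");
        _exit(0);
    }
    wait(NULL);

    // ..."shadow" side: the write is directly visible, no syscall needed.
    printf("%s\n", mem);

    munmap(mem, 4096);
    shm_unlink("/mm-sketch");
    return 0;
}
```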
Downsides of using the mapping path:
It may be worth doing some benchmarks on more realistic simulations (e.g. tor) to check how much performance benefit the mapping path gives us there. If it's not much, we might want to change the default, and maybe even consider removing that code.