Adjust external allocated memory less often #429

Merged

Conversation

@benjamn (Contributor) commented Dec 31, 2019

Calling v8::Isolate::AdjustAmountOfExternalAllocatedMemory frequently appears to result in excessive garbage collections and wasted CPU cycles, per the discussion in meteor/meteor#10527 and as stated in the Node documentation: "Registering externally allocated memory will trigger global garbage collections more often than it would otherwise."

While I think the V8 implementation could/should do a better job of handling requests to adjust external memory without over-reacting to each and every request, in the meantime the fibers library can work around the issue by not calling uni::AdjustAmountOfExternalAllocatedMemory as often. You can see in the Meteor 1.9 PR discussion that this change seems to have fixed the CPU spikes that multiple developers encountered in production, and passes all of Meteor's tests, which exercise fibers pretty thoroughly.

This fix was inspired by marudor/libxmljs2#22, which addressed nodejs/node#30995. I would note that the current implementation in this PR under-reports external memory usage, since the reporting lags behind the actual usage by up to ~10MB. I'm definitely open to switching things around a bit so that uni::AdjustAmountOfExternalAllocatedMemory gets called with a larger number of bytes than necessary, thereby over-reporting the external memory usage, if you prefer that approach.

Here's another project that benefitted from adjusting external allocated memory less often: mapnik/node-mapnik#136

It would be ideal to ship Meteor 1.9 with an official version of fibers, rather than a GitHub URL, but I realize that's asking for your time at a busy/vacation time of year.

Apparently calling v8::Isolate::AdjustAmountOfExternalAllocatedMemory
frequently results in lots of wasted CPU cycles on garbage collection, per
discussion here: meteor/meteor#10527 (comment)

This fix was inspired by marudor/libxmljs2#22,
which seems to have addressed nodejs/node#30995.

Another project that benefitted from adjusting external allocated memory
less often: mapnik/node-mapnik#136

Thanks to the allocation of the Coroutine stack (see set_stack_size), each
Fiber weighs approximately 1MB on a 64-bit machine, so a threshold of 1MB
gives no improvement over adjusting the memory every single time a fiber
starts or exits. Instead (with this commit), we adjust memory only when the
unreported gap exceeds 10x the size of a Fiber.
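
To make the batching concrete, here is a minimal sketch of the deferral strategy described above. The names (AdjustExternalMemoryLazily, pending_external_delta, kAdjustThreshold) are illustrative placeholders and do not necessarily match the identifiers in src/fibers.cc; only v8::Isolate::AdjustAmountOfExternalAllocatedMemory is a real V8 API.

```cpp
#include <cstdint>

#include <v8.h>

namespace {

// Approximate cost of one Fiber, dominated by its coroutine stack
// (see set_stack_size): roughly 1MB on a 64-bit machine.
const int64_t kFiberStackSize = 1024 * 1024;
// Only report to V8 once the unreported gap exceeds ~10 Fibers' worth.
const int64_t kAdjustThreshold = 10 * kFiberStackSize;

// Externally allocated bytes not yet reported to V8.
int64_t pending_external_delta = 0;

// Accumulate adjustments and notify V8 only when the pending amount
// crosses the threshold, instead of on every Fiber start/exit.
void AdjustExternalMemoryLazily(v8::Isolate* isolate, int64_t change_in_bytes) {
  pending_external_delta += change_in_bytes;
  if (pending_external_delta >= kAdjustThreshold ||
      pending_external_delta <= -kAdjustThreshold) {
    isolate->AdjustAmountOfExternalAllocatedMemory(pending_external_delta);
    pending_external_delta = 0;
  }
}

}  // namespace
```

Note that this sketch deliberately under-reports, just as the PR description says: up to one threshold's worth of external memory can go unreported at any time.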
Review comment on src/fibers.cc (outdated, resolved)
@laverdet (Owner)

Doesn't v8 already have some threshold logic?

https://github.com/v8/v8/blob/e08436ce07c11eb1a43b2eb5065da1f6a1f89c6b/include/v8.h#L11799-L11838

It seems like v8 won't actually send a notification to the GC unless more than 32MB has been allocated. Each fiber is about 1MB, so that would be a lot of fibers.
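
For reference, the gist of the linked v8.h logic looks roughly like the following. This is a simplified paraphrase for illustration only, not V8's actual source; the field and method names are placeholders.

```cpp
#include <cstdint>

// Simplified paraphrase of the linked v8.h threshold logic; names here are
// placeholders, not V8 internals.
struct IsolateSketch {
  static constexpr int64_t kMemoryPressureThreshold = 32 * 1024 * 1024;  // ~32MB

  int64_t external_memory = 0;             // running total reported by embedders
  int64_t external_memory_at_last_gc = 0;  // snapshot from the last mark-compact

  int64_t AdjustAmountOfExternalAllocatedMemory(int64_t change_in_bytes) {
    external_memory += change_in_bytes;
    // The GC is only nudged once the growth since the last full GC
    // exceeds the ~32MB activation limit.
    if (external_memory - external_memory_at_last_gc > kMemoryPressureThreshold) {
      NotifyMemoryPressure();
    }
    return external_memory;
  }

  void NotifyMemoryPressure() { /* real V8 would schedule GC work here */ }
};
```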

Also, did you experiment with just removing AdjustAmountOfExternalAllocatedMemory entirely? That might be a good idea, because the GC probably isn't finding many orphaned fibers anyway.

@benjamn (Contributor, Author) commented Jan 2, 2020

It looks like V8 has imposed a threshold since v6.1.58, which first shipped in Node v9.0.0 (the release that updated V8 to v6.1.534.36). The threshold was only 1MB until v6.1.94, when it increased to 32MB (its current value), but that change should also have been included in Node 9 (and thus in Node 12.14.0 too). Hmm.

Without doing a lot more digging, I don't have a good answer as to why the V8 thresholding was inadequate, though I would note that this area of the code is fairly complicated, with some significant known bugs like getting the subtraction backwards. Perhaps there are still unknown bugs that cause the memory pressure to be checked too often? It's either that, or multiple Meteor developers are misreading their CPU charts in the same way, which seems unlikely.

In other words, I would be more than happy for fibers to stop using AdjustAmountOfExternalAllocatedMemory entirely, given that it seems to be causing more problems than it's solving—which, as you pointed out, may be a positive number of problems versus zero real benefits.

Is there a particular experiment you would like to see to validate this removal, other than running a patched version of fibers through the Meteor test suite?

@benjamn (Contributor, Author) commented Jan 3, 2020

@laverdet Ok, from the Meteor perspective, simply not calling AdjustAmountOfExternalAllocatedMemory seems to work.

Would you be willing to release this (much simpler) change as [email protected]? If you're not 100% confident yet, we could avoid tagging 4.0.3 as latest for the time being, so Meteor could install it explicitly, but it would not be the default version of fibers installed from npm. Even [email protected] would allow us to release Meteor 1.9 without depending on a GitHub URL.

Besides your point that V8 doesn't have much opportunity to collect Fibers anyway, I'm fairly confident this approach is reasonable: Fiber objects are pooled and recycled, which effectively provides a form of garbage collection, and the way V8 does its own thresholding suggests that absolute precision in reporting external memory is not very important. So it seems safe to skip the reporting entirely.
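
As a purely hypothetical illustration of the pooling argument (this is not the node-fibers implementation), a recycled stack pool means the expensive ~1MB coroutine stacks are reused rather than freed, so there is little external garbage for V8 to discover in the first place:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical freelist, not the node-fibers code: finished fibers return
// their ~1MB coroutine stack to the pool instead of freeing it, so the
// memory is reused rather than becoming garbage for the GC to find.
struct CoroutineStack {
  static constexpr std::size_t kStackSize = 1024 * 1024;  // ~1MB, see set_stack_size
  std::vector<char> memory;
  CoroutineStack() : memory(kStackSize) {}
};

class StackPool {
 public:
  // Reuse a previously released stack if one is available,
  // otherwise allocate a fresh one.
  CoroutineStack* Acquire() {
    if (!free_list_.empty()) {
      CoroutineStack* stack = free_list_.back();
      free_list_.pop_back();
      return stack;
    }
    return new CoroutineStack();
  }

  // Finished fibers hand their stack back here for reuse.
  void Release(CoroutineStack* stack) { free_list_.push_back(stack); }

 private:
  std::vector<CoroutineStack*> free_list_;
};
```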

laverdet merged commit a63d776 into laverdet:master on Jan 7, 2020
@laverdet (Owner) commented Jan 7, 2020

Sounds good to me. In my experience, the v8 garbage collector is perfectly capable of shooting itself in the foot without our help.

@laverdet (Owner) commented Jan 7, 2020

4.0.3 is on npm and untagged as suggested. I didn't test any of the binaries and rushed through the deployment but I had my fingers crossed so everything should be fine.

benjamn added a commit to meteor/meteor that referenced this pull request on Jan 8, 2020
This includes laverdet/node-fibers#429, fixing the
CPU spikes reported and discussed here: #10527 (comment)

Using an official fibers release rather than a GitHub URL is preferable
because it doesn't require building fibers from source when deploying a
Meteor app to production, and also doesn't rely on GitHub being
operational, though of course it does rely on other networked services
like npm.
@benjamn (Contributor, Author) commented Jan 8, 2020

I did notice in the Galaxy logs that [email protected] is falling back to compilation from source because there's a problem with the binary:

> [email protected] install /app/bundle/programs/server/node_modules/fibers
> node build.js || nodejs build.js

`linux-x64-72-glibc` exists; testing
Problem with the binary; manual build incoming
make: Entering directory `/app/bundle/programs/server/node_modules/fibers/build'
 CXX(target) Release/obj.target/fibers/src/fibers.o
 CXX(target) Release/obj.target/fibers/src/coroutine.o
 CC(target) Release/obj.target/fibers/src/libcoro/coro.o
 SOLINK_MODULE(target) Release/obj.target/fibers.node
 COPY Release/fibers.node
make: Leaving directory `/app/bundle/programs/server/node_modules/fibers/build'
Installed in `/app/bundle/programs/server/node_modules/fibers/bin/linux-x64-72-glibc/fibers.node`

However, this was also happening for [email protected], and I can install [email protected] without compiling it on a different x86_64 linux machine, so I think this is just a harmless problem with the specific Docker image we use for Galaxy (maybe a different GLIBC version, not sure). Anyway, since the fallback compilation succeeds, it should keep succeeding, so this seems to be a Galaxy-specific problem that we can ignore for now.

@filipenevola If this ever becomes a problem, I think we just need to update our base Ubuntu image here: https://github.com/meteor/galaxy-images/blob/master/galaxy-app/Dockerfile#L1 (let me know if you need my help with that).

@laverdet (Owner) commented Jan 8, 2020

No telling. The linux binaries come from a Gentoo server I have lying around that hasn't seen an update in 5 years or so.
