-
Notifications
You must be signed in to change notification settings - Fork 7.3k
V8: avoid deadlock when profiling is active #25309
Conversation
The issue was reported multi time recently. cc @misterdjules |
FYI @tunniclm is validating this change against issues we've seen as well |
Background
In preparing to share this with the community I found @dmelikyan 's work. I wanted to check the fixes with my tests just in case they could find a similar window as (1). Result |
// ISSUE-14576: To avoid deadlock, return if there is a thread lock | ||
if (Isolate::GetProcessWideMutex().Pointer()->TryLock()) { | ||
Isolate::GetProcessWideMutex().Pointer()->Unlock(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: I'm not familiar with V8's code base's style guidelines, but I think this line should be } else {
(curly braces on the same line as the else
keyword).
@tunniclm Thank you for the tests report! @dmelikyan Besides my two comments, I think it's worth it to try to come up with a regression test, is this something you have some time to work on? Thank you very much again for your work and your patience! |
@misterdjules I've added a regression test. Not sure if "simple" if is the right place for it though. Also fixed the style problem. |
172f9c3
to
b9a5b85
Compare
@misterdjules Switched to using mutex pointer instead of object just in case related to your comments. |
|
||
var child = undefined; | ||
|
||
var deadlockTimer = setTimeout(function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests suite driver has a default timeout of 60 seconds, and we usually don't setup timeouts in the tests themselves, so I would suggest removing that mechanism. If the test times out, it will eventually be killed by the tests suite driver and the failure will be reported as a timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spawned and hanging node processes aren't killed by the test suite, because they do not die when parent process dies (at least on Mac). They should be killed explicitly. I've changed the way it's done so that it is compliant with the timeout option. Also running the test script multiple times now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmelikyan That's great, thank you!
@dmelikyan |
}); | ||
} | ||
|
||
var testScript = "var i = 0; function r() { if(++i > 25) return; setTimeout(r, 1); }; r();"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: wrap at 80 columns by concatenating the string.
@dmelikyan Thank you very much for making the changes according to my comments 👍 I think we're making some good progress to get this landed soon. I just have one remaining question and some styling comments. |
@misterdjules See latest commit for style changes. Other thing, running with --prof creates files, e.g. isolate-0x101004c00-v8.log in the node directory. Should we worry about it? I'll try to find a way to remove them or disable if possible at all. |
@dmelikyan That's a good question, I think it's worth it to investigate if we can remove them when the test completes. |
@dmelikyan @tunniclm So I would think that this should be ready to land when after we determine what can be done with the log files generated by the test. Thank you for the great work so far! |
@misterdjules There is a V8 option for disabling per isolate log files. See my last commit. No log files are generated now. |
@dmelikyan Great! Now let's just squash the commits into one and we'll be ready to land it. |
A deadlock happens when sampler initiated by SIGPROF tries to lock the thread and the thread is already locked by the same thread. As a result, other thread involved in sampling process hangs. The patch adds a check for thread lock before continuing sampler operation. The fix has been tested on a sample app under load with and without profiling turned on. Fixes issue nodejs#14576 and specifically the duplicate issue nodejs#25295
9503117
to
4e1b4d6
Compare
@misterdjules squashed. |
LGTM. |
A deadlock happens when sampler initiated by SIGPROF tries to lock the thread and the thread is already locked by the same thread. As a result, other thread involved in sampling process hangs. The patch adds a check for thread lock before continuing sampler operation. The fix has been tested on a sample app under load with and without profiling turned on. Fixes issue #14576 and specifically the duplicate issue #25295 Reviewed-By: Julien Gilli <[email protected]> PR-URL: #25309
Landed in b81a643, thank you all 👍 |
Added a reference to this change to the list of floating patches on top of V8 in v0.12. |
A deadlock happens when sampler initiated by SIGPROF tries to lock
the thread and the thread is already locked by the same thread. As
a result, other thread involved in sampling process hangs. The
patch adds a check for thread lock before continuing sampler
operation.
The fix has been tested on a sample app under load with and without
profiling turned on.
Fixes issue #14576 and specifically the duplicate issue #25295