Segfault in v12.13.1 during GC #30875

Closed
Sebmaster opened this issue Dec 10, 2019 · 17 comments
Labels
v8 engine Issues and PRs related to the V8 dependency.

Comments

@Sebmaster
Contributor

Sebmaster commented Dec 10, 2019

I'm running into a consistently reproducible segfault during GC (each attempt takes about 20 minutes to trigger). It happens in a data pipeline with a significant amount of data flowing through the system, so I don't know how to start creating a minimal repro. I did manage to get a core dump, however, so I can run whatever commands are needed for debugging.

Basic gdb info is as follows:

Core was generated by `/usr/local/bin/node cli.js'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000cae8df in v8::internal::ArrayBufferTracker::PrepareToFreeDeadInNewSpace(v8::internal::Heap*) ()
[Current thread is 1 (Thread 0x7f46ee176740 (LWP 30))]
(gdb) where
#0  0x0000000000cae8df in v8::internal::ArrayBufferTracker::PrepareToFreeDeadInNewSpace(v8::internal::Heap*) ()
#1  0x0000000000d4c47a in v8::internal::ScavengerCollector::CollectGarbage() ()
#2  0x0000000000cddd81 in v8::internal::Heap::Scavenge() ()
#3  0x0000000000cf1eb3 in v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#4  0x0000000000cf2a65 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#5  0x0000000000cf5478 in v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) ()
#6  0x0000000000cbbda7 in v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType) ()
#7  0x0000000000ff1e0b in v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) ()
#8  0x0000000001374fd9 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit () at ../../deps/v8/../../deps/v8/src/builtins/base.tq:3028
#9  0x0000080db0c38adb in ?? ()
#10 0x0000000000000000 in ?? ()

Only native module in the project is [email protected].

I'll try to bisect some node versions to maybe make finding the root cause easier.

@addaleax addaleax added the "v8 engine" label Dec 11, 2019
@addaleax
Member

I'll try to bisect some node versions to maybe make finding the root cause easier.

That might be helpful; also, if you can, maybe give a debug build of Node.js a try. It’ll be slower but could provide a lot more information about what’s going wrong.

@mhassan1

mhassan1 commented Jan 7, 2020

I am also seeing a reproducible segfault in 12.13.1. @Sebmaster did you ever discover anything beyond your initial comment?

@Sebmaster
Contributor Author

I'm hoping to do some testing tomorrow. In our case this is only consistently reproducible when spinning up hundreds of child processes; limiting that number has allowed us to work around it for now.
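
For anyone wanting to try the same workaround, here is a minimal sketch of capping the number of concurrently running child processes. The `runLimited` helper and its `jobs`/`limit` parameters are hypothetical names used for illustration, not code from the affected project, and the cap that avoids the crash (if any) will depend on your workload.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run every job ([command, args]) while keeping at most
// `limit` child processes alive at the same time.
function runLimited(jobs, limit) {
  return new Promise((resolve) => {
    const results = new Array(jobs.length);
    let next = 0;      // index of the next job to start
    let active = 0;    // children currently running
    let finished = 0;  // children that have exited or failed to start

    if (jobs.length === 0) {
      resolve(results);
      return;
    }

    const launch = () => {
      while (active < limit && next < jobs.length) {
        const index = next++;
        const [command, args] = jobs[index];
        active++;
        const child = spawn(command, args, { stdio: 'ignore' });
        let settled = false;
        const done = (outcome) => {
          if (settled) return; // 'error' and 'exit' can both fire
          settled = true;
          results[index] = outcome;
          active--;
          finished++;
          if (finished === jobs.length) resolve(results);
          else launch();
        };
        child.on('error', (err) => done({ error: err.message }));
        child.on('exit', (code, signal) => done({ code, signal }));
      }
    };

    launch();
  });
}
```

Calling `runLimited(jobs, 8)` would start at most eight children at once and resolve with one `{ code, signal }` (or `{ error }`) entry per job, in the original job order.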

@Sebmaster
Contributor Author

I tried replicating this again, but with a slightly changed setup I was no longer able to. I don't think I can track this down any further. Sorry about that.

@CSLTech

CSLTech commented Mar 3, 2020

We seem to be hitting very similar symptoms on 12.16.1. Our app doesn't use any native modules, so we can rule those out as the cause.

If there is anything that we can try to diagnose the issue, I'm willing to try it.

@addaleax
Member

addaleax commented Mar 3, 2020

@CSLTech Yeah, it would be helpful to:

  1. Get a stack trace to verify that this is, in fact, the same issue as the one reported here originally
  2. Have a reproduction, if possible, or maybe at least a core dump if it isn’t
  3. Know which Node.js versions are affected.
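
For the third point, a small sketch like the following can be run with each Node.js binary you test, so the exact Node.js and bundled V8 versions end up in the report. The `describeRuntime` name is made up for this example; the `process` properties it reads are standard.

```javascript
// Collect the runtime details that help match a crash to a specific release.
function describeRuntime() {
  return {
    node: process.version,     // Node.js release string
    v8: process.versions.v8,   // version of the bundled V8 engine
    platform: process.platform,
    arch: process.arch,
  };
}

console.log(JSON.stringify(describeRuntime(), null, 2));
```

Pasting this output next to each stack trace makes it easier to see which releases are affected.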

@mmarchini
Contributor

FYI, lldb usually produces a better stack trace for Node.js processes (gdb gets lost while unwinding V8 frames). llnode might be useful as well to see where in the JS stack your code is crashing.

A core dump as @addaleax suggested would be great, but be careful not to share sensitive information publicly (if it is a core dump from a production server, or any application that deals with passwords in any way, it might be better not to share).

@CSLTech

CSLTech commented Mar 8, 2020

Hi,

Here is the stack trace that is returned by segfault-handler. This was running 12.15.0:

/home/instant/run/run_20200306_220309/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2ca1)[0x7fdd90423ca1]
/lib64/libpthread.so.0(+0xf5f0)[0x7fdd91a055f0]
node(_ZN2v88internal18ArrayBufferTracker27PrepareToFreeDeadInNewSpaceEPNS0_4HeapE+0xef)[0xcafe1f]
node(_ZN2v88internal18ScavengerCollector14CollectGarbageEv+0x109a)[0xd4d9ba]
node(_ZN2v88internal4Heap8ScavengeEv+0x141)[0xcdf2c1]
node(_ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE+0x663)[0xcf33f3]
node(_ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE+0x215)[0xcf3fa5]
node(_ZN2v88internal4Heap26AllocateRawWithRetryOrFailEiNS0_14AllocationTypeENS0_19AllocationAlignmentE+0x48)[0xcf69b8]
node(_ZN2v88internal7Factory19NewRawOneByteStringEiNS0_14AllocationTypeE+0x36)[0xcc4606]
node(_ZN2v88internal7Factory17NewStringFromUtf8ERKNS0_6VectorIKcEENS0_14AllocationTypeE+0x8d)[0xcc4dbd]
node(_ZN2v86String11NewFromUtf8EPNS_7IsolateEPKcNS_13NewStringTypeEi+0xbf)[0xb538df]
node(_ZN4node11StringBytes6EncodeEPN2v87IsolateEPKcmNS_8encodingEPNS1_5LocalINS1_5ValueEEE+0x5c0)[0xa92590]
node[0x9b4fc6]
node[0x12f776d]

@mmarchini
Contributor

Thanks for sharing. It does look like the same GC stack trace, but the first few frames are different.

Demangled stack trace:

/home/instant/run/run_20200306_220309/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2ca1)[0x7fdd90423ca1]
/lib64/libpthread.so.0(+0xf5f0)[0x7fdd91a055f0]
node(v8::internal::ArrayBufferTracker::PrepareToFreeDeadInNewSpace(v8::internal::Heap*)+0xef)[0xcafe1f]
node(v8::internal::ScavengerCollector::CollectGarbage()+0x109a)[0xd4d9ba]
node(v8::internal::Heap::Scavenge()+0x141)[0xcdf2c1]
node(v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags)+0x663)[0xcf33f3]
node(v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags)+0x215)[0xcf3fa5]
node(v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationAlignment)+0x48)[0xcf69b8]
node(v8::internal::Factory::NewRawOneByteString(int, v8::internal::AllocationType)+0x36)[0xcc4606]
node(v8::internal::Factory::NewStringFromUtf8(v8::internal::Vector<char const> const&, v8::internal::AllocationType)+0x8d)[0xcc4dbd]
node(v8::String::NewFromUtf8(v8::Isolate*, char const*, v8::NewStringType, int)+0xbf)[0xb538df]
node(node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, node::encoding, v8::Local<v8::Value>*)+0x5c0)[0xa92590]
node[0x9b4fc6]
node[0x12f776d]

Are you experiencing the same issue on v12.16.1? V8 was upgraded in 12.16.0, so if the issue was in V8, it may already be fixed.
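
To compare traces like the ones above across versions, it can help to split each frame into its parts first. The sketch below is an illustration only: `parseFrame` is a hypothetical helper, and the regular expression assumes the `module(symbol+offset)[address]` layout seen in the frames posted in this thread. It does not demangle anything itself; the extracted symbols can be passed to a tool such as `c++filt` for that.

```javascript
// Split one backtrace line into module, symbol, offset and address.
// Returns null for lines that do not look like a frame.
function parseFrame(line) {
  const match = /^(.*?)(?:\((.*)\+(0x[0-9a-f]+)\))?\[(0x[0-9a-f]+)\]$/i.exec(line.trim());
  if (!match) return null;
  return {
    module: match[1],
    symbol: match[2] || null,   // null when only an offset or address is given
    offset: match[3] || null,
    address: match[4],
  };
}

const frame = parseFrame('node(_ZN2v88internal4Heap8ScavengeEv+0x141)[0xcdf2c1]');
console.log(frame.symbol, frame.offset);
```

Because addresses differ between builds while symbol names mostly do not, comparing the `symbol` fields is usually the quickest way to check whether two crashes share the same frames.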

@CSLTech

CSLTech commented Mar 8, 2020

Hi,

Here is the equivalent stack trace from 12.16.1 (Same JS code)

/home/instant/run/run_20200219_180646/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2ca1)[0x7f2e884daca1]
/lib64/libpthread.so.0(+0xf5f0)[0x7f2e89abc5f0]
node(_ZN2v88internal18ArrayBufferTracker27PrepareToFreeDeadInNewSpaceEPNS0_4HeapE+0xef)[0xcd410f]
node(_ZN2v88internal18ScavengerCollector14CollectGarbageEv+0x109a)[0xd7273a]
node(_ZN2v88internal4Heap8ScavengeEv+0x141)[0xd03221]
node(_ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE+0x663)[0xd17963]
node(_ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE+0x215)[0xd18515]
node(_ZN2v88internal4Heap26AllocateRawWithRetryOrFailEiNS0_14AllocationTypeENS0_16AllocationOriginENS0_19AllocationAlignmentE+0x4c)[0xd1afcc]
node(_ZN2v88internal7Factory13NewFixedArrayEiNS0_14AllocationTypeE+0x6e)[0xce1e9e]
node[0xe72c64]
node(_ZN2v88internal18FastKeyAccumulator11GetKeysFastENS0_17GetKeysConversionE+0x1e2)[0xeca5b2]
node(_ZN2v88internal14KeyAccumulator7GetKeysENS0_6HandleINS0_10JSReceiverEEENS0_17KeyCollectionModeENS0_14PropertyFilterENS0_17GetKeysConversionEbb+0xf3)[0xecc203]
node(_ZN2v88internal15JsonStringifier23SerializeJSReceiverSlowENS0_6HandleINS0_10JSReceiverEEE+0x60a)[0xe031aa]
node(_ZN2v88internal15JsonStringifier10Serialize_ILb0EEENS1_6ResultENS0_6HandleINS0_6ObjectEEEbS6_+0x802)[0xe044c2]
node(_ZN2v88internal15JsonStringifier22SerializeArrayLikeSlowENS0_6HandleINS0_10JSReceiverEEEjj+0x1f3)[0xe05503]
node(_ZN2v88internal15JsonStringifier10Serialize_ILb0EEENS1_6ResultENS0_6HandleINS0_6ObjectEEEbS6_+0x5b8)[0xe04278]
node(_ZN2v88internal13JsonStringifyEPNS0_7IsolateENS0_6HandleINS0_6ObjectEEES5_S5_+0xd4)[0xe058b4]
node(_ZN2v88internal21Builtin_JsonStringifyEiPmPNS0_7IsolateE+0x5b)[0xc0f24b]
node[0x13a72b9]

@mmarchini
Contributor

Unfortunately the stack alone doesn't say much. The last one happened during JSON.stringify(...), and the previous one during a string operation, I think (not sure). @CSLTech are you still affected by this issue? Were you able to find a reproduction for it?
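
As a starting point for a reproduction attempt, a stress loop of roughly the following shape exercises the operations that appear in the traces so far (Buffer-to-string encoding and JSON.stringify) while creating many short-lived ArrayBuffers. This is only a sketch built on assumptions: `churn` is a made-up name, the sizes are arbitrary, and nothing here is known to trigger the crash.

```javascript
// Allocate many short-lived Buffers (each backed by its own ArrayBuffer)
// interleaved with string and JSON work, so that young-generation GCs run
// while dead ArrayBuffers are waiting to be freed.
function churn(iterations) {
  let checksum = 0;
  for (let i = 0; i < iterations; i++) {
    const buf = Buffer.alloc(1024, i & 0xff);   // fresh backing store per call
    const text = buf.toString('latin1');        // Buffer-to-string encoding
    const parts = text.split('');               // string splitting
    const json = JSON.stringify({
      i,
      parts: parts.slice(0, 16),
      keys: Object.keys(process.versions),
    });
    checksum = (checksum + json.length + buf.length) % 1000003;
  }
  return checksum;
}

console.log(churn(10000));
```

Raising the iteration count, or running several of these in parallel child processes, may get closer to the load described in the original report, but that is a guess.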

@CSLTech

CSLTech commented Apr 10, 2020

Hi,

We have semi-reproducible code for it. We've tracked the issue down to one of our internal libraries, which is shared by a few processes, all of which are affected. We are working on getting more information, but it's slow going.

@dfoody

dfoody commented Jun 19, 2020

@CSLTech - have you made any progress on this? We're seeing a very similar crash on v12.18.0.

 * thread #1: tid = 20603, 0x0000000000cf740f node.v12.prod`v8::internal::ArrayBufferTracker::PrepareToFreeDeadInNewSpace(v8::internal::Heap*) + 239, name = 'node', stop reason = signal SIGSEGV
  * frame #0: 0x0000000000cf740f node.v12.prod`v8::internal::ArrayBufferTracker::PrepareToFreeDeadInNewSpace(v8::internal::Heap*) + 239
    frame #1: 0x0000000000d957ea node.v12.prod`v8::internal::ScavengerCollector::CollectGarbage() + 4250
    frame #2: 0x0000000000d26331 node.v12.prod`v8::internal::Heap::Scavenge() + 321
    frame #3: 0x0000000000d3aacb node.v12.prod`v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) + 1611
    frame #4: 0x0000000000d3b635 node.v12.prod`v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) + 533
    frame #5: 0x0000000000d3e0ec node.v12.prod`v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) + 76
    frame #6: 0x0000000000d04cbb node.v12.prod`v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) + 43
    frame #7: 0x00000000010464be node.v12.prod`v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) + 158
    frame #8: 0x00000000013cb519 <exit>

@ruchiraw

ruchiraw commented Jun 23, 2020

@CSLTech were you able to find the root cause or a reproduction?

We are also seeing similar errors on Node.js v12.15.0, which are causing our production servers to crash intermittently.

The following are two different segfaults that crashed our servers.

/srv/www/workers/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2c81)[0x7efe78bbdc81]
/lib64/libpthread.so.0(+0xf600)[0x7efe7bba4600]
/usr/local/bin/node(_ZN2v88internal18ArrayBufferTracker27PrepareToFreeDeadInNewSpaceEPNS0_4HeapE+0xef)[0xcafe1f]
/usr/local/bin/node(_ZN2v88internal18ScavengerCollector14CollectGarbageEv+0x109a)[0xd4d9ba]
/usr/local/bin/node(_ZN2v88internal4Heap8ScavengeEv+0x141)[0xcdf2c1]
/usr/local/bin/node(_ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE+0x663)[0xcf33f3]
/usr/local/bin/node(_ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE+0x215)[0xcf3fa5]
/usr/local/bin/node(_ZN2v88internal4Heap26AllocateRawWithRetryOrFailEiNS0_14AllocationTypeENS0_19AllocationAlignmentE+0x48)[0xcf69b8]
/usr/local/bin/node(_ZN2v88internal7Factory19NewRawOneByteStringEiNS0_14AllocationTypeE+0x36)[0xcc4606]
/usr/local/bin/node(_ZN2v88internal7Factory18NewProperSubStringENS0_6HandleINS0_6StringEEEii+0x141)[0xcc5651]
/usr/local/bin/node(_ZN2v88internal19Runtime_StringSplitEiPmPNS0_7IsolateE+0x2ac)[0x101f49c]
/usr/local/bin/node[0x1376519]
#
# Fatal error in , line 0
# Check failed: AllowJavascriptExecution::IsAllowed(isolate).
/srv/www/workers/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2c81)[0x7f7a07bfbc81]
/lib64/libpthread.so.0(+0xf600)[0x7f7a16dc2600]
/usr/local/nodejs-binary-12.15.0/bin/node(_ZN2v88internal18ArrayBufferTracker27PrepareToFreeDeadInNewSpaceEPNS0_4HeapE+0xef)[0xcafe1f]
/usr/local/nodejs-binary-12.15.0/bin/node(_ZN2v88internal18ScavengerCollector14CollectGarbageEv+0x109a)[0xd4d9ba]
/usr/local/nodejs-binary-12.15.0/bin/node(_ZN2v88internal4Heap8ScavengeEv+0x141)[0xcdf2c1]
/usr/local/nodejs-binary-12.15.0/bin/node(_ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE+0x663)[0xcf33f3]
/usr/local/nodejs-binary-12.15.0/bin/node(_ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE+0x215)[0xcf3fa5]
/usr/local/nodejs-binary-12.15.0/bin/node(_ZN2v88internal4Heap26AllocateRawWithRetryOrFailEiNS0_14AllocationTypeENS0_19AllocationAlignmentE+0x48)[0xcf69b8]
/usr/local/nodejs-binary-12.15.0/bin/node(_ZN2v88internal7Factory15NewFillerObjectEibNS0_14AllocationTypeE+0x27)[0xcbd2e7]
/usr/local/nodejs-binary-12.15.0/bin/node(_ZN2v88internal33Runtime_AllocateInYoungGenerationEiPmPNS0_7IsolateE+0x9b)[0xff334b]
/usr/local/nodejs-binary-12.15.0/bin/node[0x1376519]

@mmarchini
Contributor

# Fatal error in , line 0
# Check failed: AllowJavascriptExecution::IsAllowed(isolate).

That failed check might give us a clue about what is happening, although looking at the code, the check doesn't fit the stack trace. We still need a reproduction to track down this issue (I tried to come up with one based on the stack traces, but none of the examples I tried crashed).

If anyone experiencing this issue could share the output of pmap <node pid> (replace <node pid> with the PID of the running Node.js process), that would help rule out any direct or transitive native dependency issues.
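
If running pmap against the process is inconvenient, a similar list can be produced from inside the process on Linux by reading /proc/self/maps. The sketch below is an alternative I am assuming is acceptable, not a replacement for the pmap output requested above; `nativeLibraries` is a hypothetical helper name, and it only reports mapped files ending in .so (optionally with a version suffix) or .node.

```javascript
const fs = require('fs');

// Extract the unique paths of native libraries (.so, .node) from the text of
// a /proc/<pid>/maps file.
function nativeLibraries(mapsText) {
  const paths = new Set();
  for (const line of mapsText.split('\n')) {
    // Columns: address perms offset dev inode pathname (pathname is optional).
    const fields = line.trim().split(/\s+/);
    const path = fields.slice(5).join(' ');
    if (/\.(so(\.\d+)*|node)$/.test(path)) paths.add(path);
  }
  return [...paths].sort();
}

// Linux only: /proc/self/maps describes the current process.
if (process.platform === 'linux') {
  console.log(nativeLibraries(fs.readFileSync('/proc/self/maps', 'utf8')).join('\n'));
}
```

Any .node entry in the result is a native addon loaded into the process, which is the kind of direct or transitive native dependency this check is meant to surface.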

@dfoody

dfoody commented Jul 26, 2020

FWIW we upgraded to v14.4.0 and the problem went away.

@targos targos added the v12.x label Nov 20, 2021
@targos
Member

targos commented Apr 8, 2022

Closing this issue because v12.x goes EOL at the end of this month and no more releases are planned.

@targos targos closed this as completed Apr 8, 2022