Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault in ada::url_aggregator #49960

Closed
isaacs opened this issue Sep 29, 2023 · 8 comments · Fixed by #49984
Closed

segfault in ada::url_aggregator #49960

isaacs opened this issue Sep 29, 2023 · 8 comments · Fixed by #49984
Labels
confirmed-bug Issues with confirmed bugs. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@isaacs
Copy link
Contributor

isaacs commented Sep 29, 2023

Version

v20.7.0

Platform

Darwin moxy.lan 22.6.0 Darwin Kernel Version 22.6.0: Fri Sep 15 13:41:28 PDT 2023; root:xnu-8796.141.3.700.8~1/RELEASE_ARM64_T6000 arm64

Subsystem

url

What steps will reproduce the bug?

It's unclear, unfortunately. I'm finding this only when running quite a lot of node processes at one time, when testing all the packages in the tapjs monorepo. It is very sporadic, and only happens on node 20.

How often does it reproduce? Is there a required condition?

Sporadically

What is the expected behavior? Why is that the expected behavior?

Expect that a segfault will not happen.

This is expected because programs that don't segfault tend to be more useful 😅

What do you see instead?

Occasional segfaults.

Additional information

Here's the stack trace where the segv happens:

Thread 7 Crashed:
0   node                          	       0x10145c7b0 ada::url_aggregator ada::parser::parse_url<ada::url_aggregator>(std::__1::basic_string_view<char, std::__1::char_traits<char>>, ada::url_aggregator const*) + 280
1   node                          	       0x1014614f0 ada::can_parse(std::__1::basic_string_view<char, std::__1::char_traits<char>>, std::__1::basic_string_view<char, std::__1::char_traits<char>> const*) + 384
2   ???                           	       0x1160a8bb0 ???
3   node                          	       0x100eaf210 Builtins_AsyncFunctionAwaitResolveClosure + 80
4   node                          	       0x100f5cfb8 Builtins_PromiseFulfillReactionJob + 56
5   node                          	       0x100e9eb94 Builtins_RunMicrotasks + 596
6   node                          	       0x100e763f4 Builtins_JSRunMicrotasksEntry + 148
7   node                          	       0x10074bc58 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 2932
8   node                          	       0x10074c144 v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) + 88
9   node                          	       0x10074c320 v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*) + 64
10  node                          	       0x1007733dc v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) + 324
11  node                          	       0x100773b78 v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) + 124
12  node                          	       0x1003b8c64 node::InternalCallbackScope::Close() + 252
13  node                          	       0x1003b901c node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) + 576
14  node                          	       0x1003cf48c node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) + 188
15  node                          	       0x1004c9b9c node::worker::MessagePort::OnMessage(node::worker::MessagePort::MessageProcessingMode) + 504
16  node                          	       0x100e55658 uv__async_io + 268
17  node                          	       0x100e67730 uv__io_poll + 1020
18  node                          	       0x100e55c1c uv_run + 476
19  node                          	       0x1003b9754 node::SpinEventLoopInternal(node::Environment*) + 256
20  node                          	       0x10053a7b8 node::worker::Worker::Run() + 2164
21  node                          	       0x10053dad0 node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::$_3::__invoke(void*) + 56
22  libsystem_pthread.dylib       	       0x181ee7fa8 _pthread_start + 148
23  libsystem_pthread.dylib       	       0x181ee2da0 thread_start + 8

Stack is always the same when the fault occurs.

Full macOS ips report: https://gist.github.com/isaacs/4f313a514b3a95e6268381e957fb32fe

Happy to capture a proper core dump and share it with y'all if that's useful, but I'd rather not put the contents of my computer's memory buffer on the internet in a gist, just in to be on the safe side. Could be some sensitive stuff in process.env or something.

@anonrig anonrig added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Sep 29, 2023
@anonrig
Copy link
Member

anonrig commented Sep 29, 2023

cc @lemire @nodejs/url

@anonrig
Copy link
Member

anonrig commented Sep 29, 2023

It appears that the bug is caused with this trace:

0   node                          	       0x1016ba17c ada::unicode::has_tabs_or_newline(std::__1::basic_string_view<char, std::__1::char_traits<char>>) + 80 (ada.cpp:9841) [inlined]

I've talked to @isaacs and appears that they're using M1 (NEON). We might have a bug in our SIMD solution @lemire

Ref: https://github.com/ada-url/ada/blob/main/src/unicode.cpp#L49

@mscdex
Copy link
Contributor

mscdex commented Sep 29, 2023

If it only crashes when you have a large number of processes, then it is likely not a bug in the C++ code. It is likely ressource exhaustion, and it has likely to do with Node.js/v8 stability.

If it's a segfault that would have to do with invalid memory access, not running out of memory.

@lemire
Copy link
Member

lemire commented Sep 29, 2023

If it's a segfault that would have to do with invalid memory access, not running out of memory.

If you try to allocate memory and it fails, and you have disabled exceptions in C++, you get back a null pointer. Dereferencing a null pointer is a segmentation fault.

I am not claiming that's the mechanism at play here, but we don't have a reproducible test case so one can only speculate. The only information that we have is that it only occurs if the system is highly stressed. Other possibilities include data races (outside of ada::can_parse since ada is single-threaded). Without a reproducible test case, it is difficult to do anything productive.

@lemire
Copy link
Member

lemire commented Sep 30, 2023

Fixed upstream in ada-url/ada#519

Thanks @isaacs for the report.

@lemire
Copy link
Member

lemire commented Sep 30, 2023

Both node.js and ada need to get ARM-based sanitized testing. For ada, I opened an issue at ada-url/ada#520

For node.js, someone should check. I know it runs sanitizers, but possibly only on x64 builds.

@anonrig
Copy link
Member

anonrig commented Sep 30, 2023

cc @nodejs/build

@anonrig anonrig added the confirmed-bug Issues with confirmed bugs. label Sep 30, 2023
@isaacs
Copy link
Contributor Author

isaacs commented Oct 1, 2023

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants