
Node buffer allocation crash #23627

Closed
me4502 opened this issue Apr 30, 2020 · 22 comments
Labels
type: bug An issue or pull request relating to a bug in Gatsby
me4502 (Contributor) commented Apr 30, 2020

Description

Recently a node buffer crash has started occurring. It did not start after a specific Gatsby update, and changing the Gatsby version does not appear to fix it.

This appears to be similar to some previous issues; however, those were fixed by lowering the buffer size of the cache chunks passed to v8.serialize.

success run queries - 312.371s - 14100/14100 45.14/s
/Users/madelinemiller/.npm/bin/node[65919]: ../src/node_buffer.cc:455:MaybeLocal<v8::Object> node::Buffer::New(node::Environment *, char *, size_t, bool): Assertion `length <= kMaxLength' failed.
 1: 0x1010285f9 node::Abort() (.cold.1) [/Users/madelinemiller/.npm/bin/node]
 2: 0x10008634d node::FatalError(char const*, char const*) [/Users/madelinemiller/.npm/bin/node]
 3: 0x1000861e2 node::AppendExceptionLine(node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Message>, node::ErrorHandlingMode) [/Users/madelinemiller/.npm/bin/node]
 4: 0x100068aed node::Buffer::New(node::Environment*, char*, unsigned long, bool) [/Users/madelinemiller/.npm/bin/node]
 5: 0x1000e0cb8 node::(anonymous namespace)::SerializerContext::ReleaseBuffer(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/madelinemiller/.npm/bin/node]
 6: 0x1009031ed Builtins_CallApiCallback [/Users/madelinemiller/.npm/bin/node]
 7: 0x10207d868c71 
error Command failed with signal "SIGABRT".

Related issues:

#17233
#21555

It's worth noting that unlike the prior issues, which involved 100k+ pages, we only have 14k.

Steps to reproduce

Still working on this

Expected result

No node crash

Actual result

Node crashes after queries are run

Environment

System:
OS: macOS 10.15.4
CPU: (12) x64 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
Shell: 3.2.57 - /bin/bash
Binaries:
Node: 12.16.3 - ~/.npm/bin/node
Yarn: 1.22.0 - ~/.npm/bin/yarn
npm: 6.14.4 - ~/.npm/bin/npm
Languages:
Python: 2.7.17 - /usr/local/bin/python
Browsers:
Firefox: 70.0.1
Safari: 13.1

Gatsby version is 2.21.5

@me4502 me4502 added the type: bug An issue or pull request relating to a bug in Gatsby label Apr 30, 2020
me4502 (Contributor, Author) commented Apr 30, 2020

I believe this issue is due to guessSafeChunkSize in persist.ts not checking the size of each type of node, meaning large nodes (such as blog posts) may cause issues if the nodes it samples are tiny.

There's a TODO outlining that it should be changed to test a few nodes of each type, rather than just a few nodes overall.

pieh (Contributor) commented Apr 30, 2020

I'll loop in @pvdz on this as well.

We know that the node size checking is not foolproof; it is a heuristic rather than something deterministic. Checking every node's size would be extremely expensive, which is why we have to make guesses there. As mentioned in the TODO, there are ways to improve its accuracy (but it will never be 100% accurate).

@pieh pieh added the scaling label Apr 30, 2020
pvdz (Contributor) commented Apr 30, 2020

Matthew, why are you base64 encoding 2 gigs of images in your blog post? ;)

On a more serious note, would you reasonably expect the internal node collection of your 14k pages to serialize to over 2 GB? I'm curious what kind of setup would lead to that. It might help us get to a better heuristic.

Would this happen to be a contentful site? I'm aware that that sourcing plugin generates a ton of nodes even for smaller sites.

Anyways, you're right, the heuristic is not doing its check per node type and it would be an improvement to probe a few nodes per type, rather than arbitrarily across the set. That shouldn't be too hard to implement.

me4502 (Contributor, Author) commented Apr 30, 2020

Yeah it's a contentful site.

My assumption is the heuristic is finding tiny nodes, so it's chunking to ~1k nodes or something crazy.

Our Contentful download takes a decent amount of time, so it's quite possible it's a large amount of data. The blog post nodes can contain many images and a large amount of written content. Some of these images may be GIFs, etc., so the size may be rather large. If the heuristic allows a large number of nodes per chunk, one of those chunks may be full of massive blog posts and therefore go over the limit.

pvdz (Contributor) commented Apr 30, 2020

My image comment was a joke, apologies if that wasn't obvious.

I've triaged a few contentful sites and one problem seems to be that the sourcing plugin is creating a lot of nodes. If you inspect the internal storage I suspect you'll encounter a million nodes or so, at 14k pages. It may also be 800k nodes, or 1.5 million nodes. Seems to depend a bit on the setup.

I suspect your assumption is correct because contentful has ~100 node types in the sites I checked so far. And it's not unreasonable to expect the heuristic to simply find tiny nodes and figure the shards are massive.

pvdz (Contributor) commented Apr 30, 2020

@me4502 there's a potential fix in #23643 but it won't work with Loki. As I'm slated to drop Loki in the next two weeks anyways, we'll probably merge this PR after that's done. Are you okay with that? The fix is fairly straightforward and I wouldn't be surprised if you already kind of did this yourself, locally, anyways :p

me4502 (Contributor, Author) commented Apr 30, 2020

Yeah, I got that it was a joke 😂 It just may not be too far from the truth, as some pages do have a lot of GIFs etc.

Thanks, yeah - I hadn't had a chance to actually write it yet but my plan was to do something super similar.

I'll use patch-package to apply that change until it's merged. Thanks

pvdz (Contributor) commented May 11, 2020

Loki is out so I'm going to work on getting this merged this week :)

pvdz (Contributor) commented May 11, 2020

Hm I was almost ready to merge the fix when I realized the implementation was buggy and the performance implication was quite bad. While we do have the nodes grouped by type, this group is a Map without direct access. This means we either have to create a temporary array to hold all nodes of one type, so that we can access them directly, or we loop through all nodes, which is also expensive.

The middle ground is using an iterator and just checking the first n elements, breaking the for-of loop afterwards. But I'm not a fan of that approach either. Then again, not sure how many options we have here... :/ Thoughts?
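That middle ground can be sketched as follows; `sampleFromMap` is a hypothetical helper name, but the mechanics (an iterator over the Map's values, broken out of after n elements) are exactly what the comment describes:

```javascript
// Probe only the first n values of a type's Map without materializing
// a temporary array of all nodes of that type.
function sampleFromMap(typeMap, n = 3) {
  const sample = [];
  for (const node of typeMap.values()) {
    sample.push(node);
    if (sample.length >= n) break; // stop early; avoids a full copy
  }
  return sample;
}
```

The trade-off being debated: this is O(n) per type rather than O(size), but the "sample" is always the first few insertion-ordered entries, so it is biased in the same way any non-random probe is.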

me4502 (Contributor, Author) commented May 12, 2020

We've actually stopped using the patch so it's not major if it doesn't go in right now. We made another patch to set a maximum depth of contentful nodes, as that lowered the size by a decent amount without causing issues.

pvdz (Contributor) commented May 12, 2020

Can you elaborate on that? The contentful plugin is a problem because of the number of nodes it generates, and it's quite hard to find a proper solution that doesn't come down to "make Gatsby faster at scale". A 15k-page contentful site behaves like a million-page locally-sourced site.

me4502 (Contributor, Author) commented May 13, 2020

Basically, the contentful plugin appears to create nodes that recurse for rich text fields. E.g. if I have a rich text field that links to a page containing another rich text field, which in turn links to another page, the node for the first rich text field will contain the entirety of the two other pages. So we added a maximum recursion depth for rich text fields, preventing the nodes from becoming too large. In our case, our navbar was basically containing the entire website.
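A hypothetical sketch of that workaround: cap how many levels of linked entries are expanded inside a rich text tree, so a heavily-linked entry (like the navbar) cannot pull in the entire site. The `nodeType`/`embedded-entry-block` field names follow Contentful's rich text document shape, but the depth-limit logic itself is the illustrative part:

```javascript
// Walk a rich text tree and truncate embedded entries once maxDepth
// levels of entry nesting have been expanded. Pruned entries are
// replaced with a small stub instead of their full payload.
function pruneLinkedEntries(node, maxDepth, depth = 0) {
  if (node === null || typeof node !== "object") return node;
  const isEntry = node.nodeType === "embedded-entry-block";
  if (isEntry && depth >= maxDepth) {
    return { nodeType: node.nodeType, pruned: true }; // drop the payload
  }
  // Depth only increases when we pass through a linked entry, not on
  // every object level of the tree.
  const nextDepth = isEntry ? depth + 1 : depth;
  const out = Array.isArray(node) ? [] : {};
  for (const [key, value] of Object.entries(node)) {
    out[key] = pruneLinkedEntries(value, maxDepth, nextDepth);
  }
  return out;
}
```

With `maxDepth = 1`, a page's rich text keeps its directly linked entries but drops anything those entries link to, which is the kind of cap that stopped the navbar from containing the whole site.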

pvdz (Contributor) commented May 13, 2020

Maybe creating arrays for each type isn't all that bad and is still a viable way forward, but I'll have to run some scaling perf tests to verify that. You may not be blocked by this right now, but it's definitely an issue in the back of my mind.

I feel like, if nothing else, we should explicitly probe PageData type nodes, since those are generally speaking the most likely culprits?

github-actions bot commented Jun 3, 2020

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@github-actions github-actions bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Jun 3, 2020
@LekoArts LekoArts removed the stale? Issue that may be closed soon due to the original author not responding any more. label Jun 3, 2020
@pvdz pvdz self-assigned this Jun 3, 2020
j218 commented Nov 4, 2020

@pvdz @me4502 Hi, I'm running into the same node buffer crash with a 15k+ contentful site. I increased the nodesToTest variable and had no issues until now. Was there ever a solution to this issue?

me4502 (Contributor, Author) commented Nov 4, 2020

If you're using contentful, the issue is likely that RichText fields are causing some nodes to be very, very large. Try the contentful@next version of the plugin, as it improves RichText handling.

j218 commented Nov 4, 2020

@me4502 Thank you for the quick response, I will check it out!

pvdz (Contributor) commented Nov 9, 2020

This issue still haunts me. It's frustrating that there's no simple way of estimating the size of a node to prevent this problem.

@me4502 How many nodes does your site have these days? Are you still running into this problem? I think it was a contentful site, right? The .next tag was merged just now and will be published as v4 of the plugin later this week.

I wonder how frequent people get this in the wild. Or maybe it's more a problem for small-medium contentful sites due to the number of internal nodes.

me4502 (Contributor, Author) commented Nov 9, 2020

According to a build from earlier today, we're on 100k nodes.

Since the rich text improvement branch for the Contentful plugin, however, we haven't had this issue. I strongly believe it was caused by our blog post type having giant chunks of JSON data due to post content.

pvdz (Contributor) commented Nov 11, 2020

Ok.

I think I'm going to close this issue and consider it a wontfix for now. The reasoning is that it's highly unlikely that a small site would reach this limit without some kind of configuration problem or bug:

  • small sites should never reach this limit without trying hard
  • large sites are likely to be optimized to have small payloads
  • bugs like dumping large blobs of data in a page context that are only partially used should be fixed instead of supported

If we start to see a good use case for needing to work around this limit we can revisit this problem.

Cheers :)

@pvdz pvdz closed this as completed Nov 11, 2020
shmakovdima commented

The problem still exists, v2.24.0.

pvdz (Contributor) commented Dec 3, 2020

For one, we're currently on 2.28.0, and for two, yes, it still exists.

Please explain your use case. Do you have a super large site, >100k pages with lots of content? How do you think the content is triggering this problem?
