Node buffer allocation crash #23627
Comments
I believe this issue is due to the node size check. There's a TODO outlining that it should be changed to test a few nodes of each type, rather than just a few nodes overall.
I'll loop @pvdz in on this as well. We know that the node size checking is not foolproof; it is a heuristic rather than something deterministic. Checking each node's size would be super expensive, which is why we need to make guesses there. As mentioned in the TODO, there are ways to improve its accuracy (but it will never be 100% accurate).
Matthew, why are you base64 encoding 2 gigs of images in your blog post? ;) On a more serious note, would you reasonably expect that the internal node collection of your 14k pages would cause Gatsby to serialize to over 2gb? I'm curious what kind of setup would lead to it. This might help us get to a better heuristic. Would this happen to be a contentful site? I'm aware that the sourcing plugin is generating a ton of nodes even for smaller sites. Anyways, you're right, the heuristic is not doing its check per node type, and it would be an improvement to probe a few nodes per type rather than arbitrarily across the set. That shouldn't be too hard to implement.
Yeah, it's a contentful site. My assumption is the heuristic is finding tiny nodes, so it's chunking to ~1k nodes or something crazy. Our Contentful download takes a decent amount of time, so it's quite possible it's a large amount of data. The blog post nodes can contain many images and have a large amount of written content in them. Some of these images may be GIFs, etc, so the size may be rather large. If it's allowing a large number of nodes per chunk, one of these chunks may be full of massive blog posts and therefore go over the limit.
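To make that failure mode concrete, here is a sketch with made-up numbers; the per-chunk budget and node sizes below are illustrative assumptions, not Gatsby's actual values:

```js
// Hypothetical numbers, purely to illustrate the failure mode described above.
const MAX_CHUNK_BYTES = 500 * 1024 * 1024; // assumed per-chunk budget

function nodesPerChunk(estimatedAvgNodeBytes) {
  return Math.max(1, Math.floor(MAX_CHUNK_BYTES / estimatedAvgNodeBytes));
}

// If the sampled nodes happen to be tiny (~500 bytes), the chunk size
// comes out at roughly a million nodes per chunk...
console.log(nodesPerChunk(500)); // ~1,048,576

// ...and if some of those nodes are multi-megabyte blog posts, a single
// chunk can exceed Node's buffer limit when it is serialized.
```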
My image comment was a joke, apologies if that wasn't obvious. I've triaged a few contentful sites and one problem seems to be that the sourcing plugin is creating a lot of nodes. If you inspect the internal storage, I suspect you'll encounter a million nodes or so at 14k pages. It may also be 800k nodes, or 1.5 million nodes. Seems to depend a bit on the setup. I suspect your assumption is correct because contentful has ~100 node types in the sites I checked so far. And it's not unreasonable to expect the heuristic to simply find tiny nodes and figure the shards are massive.
@me4502 there's a potential fix in #23643 but it won't work with Loki. As I'm slated to drop Loki in the next two weeks anyways, we'll probably merge this PR after that's done. Are you okay with that? The fix is fairly straightforward and I wouldn't be surprised if you already kind of did this yourself, locally, anyways :p
Yeah, I got that it was a joke 😂 It just may not be too far from the truth, as some pages do have a lot of GIFs, etc. Thanks, yeah - I hadn't had a chance to actually write it yet, but my plan was to do something super similar. I'll use patch-package to apply that change until it's merged. Thanks
Loki is out, so I'm going to work on getting this merged this week :)
Hm, I was almost ready to merge the fix when I realized the implementation was buggy and the performance implications were quite bad. While we do have the nodes grouped by type, each group is a Map without direct indexed access. This means we either have to create a temporary array to hold all nodes of one type so that we can access them directly, or we loop through all nodes, which is also expensive. The middle ground is using an iterator and just checking the first few nodes of each type.
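A minimal sketch of that middle ground, assuming the store is roughly a Map of types to Maps of nodes; the shape and names here are assumptions, not Gatsby's actual internals:

```js
const v8 = require("v8");

// Sample only the first few entries of each per-type Map via its iterator,
// so no temporary array holding a whole type is ever materialized.
function sampleNodeSizesPerType(nodesByType, samplesPerType = 3) {
  const samples = [];
  for (const [type, nodesOfType] of nodesByType) {
    const iterator = nodesOfType.values();
    for (let i = 0; i < samplesPerType; i++) {
      const { value: node, done } = iterator.next();
      if (done) break;
      samples.push({ type, bytes: v8.serialize(node).length });
    }
  }
  return samples;
}
```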
We've actually stopped using the patch so it's not major if it doesn't go in right now. We made another patch to set a maximum depth of contentful nodes, as that lowered the size by a decent amount without causing issues.
Can you elaborate on that? The contentful plugin has been a problem with the number of nodes that it generates, and it's quite hard to find a proper solution for it that doesn't come down to "make Gatsby faster at scale". A 15k page contentful site acts like a million-page locally sourced site.
Basically the contentful plugin appears to be creating nodes that recurse for rich text fields. E.g., if I have a rich text field that links to a page containing another rich text field that links to another page, the node for that rich text field will contain the entirety of the two other pages. So we added a maximum recursion depth for rich text fields, preventing it from becoming too large. In our case, our navbar basically contained the entire website.
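For reference, a hedged sketch of what that kind of depth cap might look like; the entry shape, field names, and depth value are assumptions for illustration, not the actual patch:

```js
// Stop inlining linked entries once a maximum depth is reached.
const MAX_RICH_TEXT_DEPTH = 3;

function resolveLinkedEntry(entry, depth = 0) {
  if (depth >= MAX_RICH_TEXT_DEPTH) {
    // Past the cap, keep only a reference instead of the full entry,
    // so a navbar entry can't drag the whole site into one node.
    return { sys: { id: entry.sys.id, type: "Link", linkType: "Entry" } };
  }
  const fields = {};
  for (const [key, value] of Object.entries(entry.fields || {})) {
    fields[key] = isEntry(value) ? resolveLinkedEntry(value, depth + 1) : value;
  }
  return { sys: entry.sys, fields };
}

function isEntry(value) {
  return Boolean(value && typeof value === "object" && value.sys && value.fields);
}
```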
Maybe creating arrays for each type isn't all that bad and is still a viable way forward, but I'll have to run some scaling perf tests to verify this. You may not currently be blocked by this, but it's definitely an issue in the back of my mind. I feel like, if nothing else, we should try to explicitly pick a few nodes of each type.
Hiya! This issue has gone quiet. Spooky quiet. 👻 We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here. Thanks for being a part of the Gatsby community! 💪💜
If you're using contentful, it's likely the issue is that RichText fields are causing some nodes to be very, very large. Try using the
@me4502 thank you for the quick response, I will check it out!
This issue still haunts me. It's frustrating that there's no simple way of estimating the size of a node to prevent this problem. @me4502 how many nodes does your site have these days? Are you still running into this problem? I think it was a contentful site, right? The .next tag was merged just now and will be published as v4 of the plugin later this week. I wonder how frequently people hit this in the wild. Or maybe it's more a problem for small-to-medium contentful sites due to the number of internal nodes.
According to a build from earlier today, we're on 100k nodes. Ever since the rich text improvement branch for the Contentful plugin, however, we haven't had this issue. I strongly believe it was caused by our blog post type having giant chunks of JSON data due to post content.
Ok, I think I'm going to close this issue and consider it a wontfix for now. The reasoning is that it's highly unlikely that a small site should reach this limit without having some kind of configuration problem or bug.
If we start to see a good use case for needing to work around this limit, we can revisit this problem. Cheers :)
The problem still exists, v2.24.0
For one, we're currently on 2.28.0, and for two, yes, it still exists. Please explain your use case. Do you have a super large site? >100k pages with lots of content? How do you think the content is triggering this problem?
Description
Recently a node buffer crash has started occurring. It did not start after a specific Gatsby update, and changing the Gatsby version does not appear to fix it.
This appears to be similar to some previous issues; however, those were fixed by lowering the buffer size of the cache when passed to v8.serialize (see the sketch below).
Related issues:
#17233
#21555
It's worth noting that unlike the prior issue, which had 100k+ pages, we only have 14k.
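For context, the fix in those related issues amounted to serializing the data in smaller slices so that no single v8.serialize call allocates a buffer near Node's limit. A minimal sketch of the idea, where the chunk size and the array-of-nodes shape are assumptions:

```js
const v8 = require("v8");

// Serialize the node collection in slices instead of one giant buffer.
function serializeInChunks(nodes, nodesPerChunk = 1000) {
  const buffers = [];
  for (let i = 0; i < nodes.length; i += nodesPerChunk) {
    buffers.push(v8.serialize(nodes.slice(i, i + nodesPerChunk)));
  }
  return buffers;
}
```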
Steps to reproduce
Still working on this
Expected result
No node crash
Actual result
Node crashes after queries are run
Environment
System:
OS: macOS 10.15.4
CPU: (12) x64 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
Shell: 3.2.57 - /bin/bash
Binaries:
Node: 12.16.3 - ~/.npm/bin/node
Yarn: 1.22.0 - ~/.npm/bin/yarn
npm: 6.14.4 - ~/.npm/bin/npm
Languages:
Python: 2.7.17 - /usr/local/bin/python
Browsers:
Firefox: 70.0.1
Safari: 13.1
Gatsby version: 2.21.5