Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): Make filter/sort query only hold onto node properties it needs #34747

Merged
merged 46 commits into from
Feb 15, 2022

Conversation

imjoshin
Copy link
Contributor

@imjoshin imjoshin commented Feb 7, 2022

Description

When filtering and sorting, entire node objects are currently stored in memory. This becomes a problem when nodes have large fields, as those fields are likely not used in the filter/sort queries.

What this does

This PR introduces a new object type, IGatsbyNodePartial. As it's name implies, it is a partial object of the original IGatsbyNode. This object will only hold onto the id, internal.counter, and any indexFields (used for sorting, potentially others in the future) that are needed.

Now, as we're iterating over filters, we create buckets of node partials, so we don't keep the entire node in memory. On top of that, the references to each object are stored as WeakRefs. If they do happen to be garbage collected (will only happen with these changes), the first time we grab the partial again, we'll reload it in memory so we can act upon it again. If at any point we encounter a new indexField that we hadn't tracked before, we will also grab the original node's value and store it on the object in memory.

Any large nodes are now shrunk down, as long as we are not sorting on any of those large fields.

What this does not do

This is not a huge rewrite. We would love to revisit the fast filters and indexing flow of query parsing, but this was the quickest, least risk win with significant improvements.

This will provide very little benefit to memory usage in the case of many small nodes. If the nodes are not too much larger than their id/internal.counter attributes, storing the object as a partial provides little benefit.

Benchmarks

We did some limited testing against the main thread of the build process. This was done on a local environment using #34724, so tests can be skewed based on what's running locally. Since these are based off of average runs, I figured it was a decent baseline to make sure nothing crazy was happening.

Conclusion: As suspected, many tiny nodes saw little improvement, and larger nodes saw the benefit. There are no notable differences in build times.

Smaller nodes just eq filter:
Capture

Larger nodes with a sort:
Capture

Documentation

All internal, none needed for now!

Related Issues

[sc-44553]
[sc-45216]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: core Relates to Gatsby's core (e.g. page loading, reporter, state machine) topic: performance Related to runtime & build performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants