Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gatsby-transform-json out of memory #34081

Closed
2 tasks done
joernroeder opened this issue Nov 25, 2021 · 5 comments · Fixed by #34084
Closed
2 tasks done

gatsby-transform-json out of memory #34081

joernroeder opened this issue Nov 25, 2021 · 5 comments · Fixed by #34084
Labels
help wanted Issue with a clear description that the community can help with. topic: plugins Related to plugin system, themes & catch-all for plugins that don't have a label type: bug An issue or pull request relating to a bug in Gatsby

Comments

@joernroeder
Copy link
Contributor

joernroeder commented Nov 25, 2021

Preliminary Checks

Description

I'm running a 10k+ pages instance on gatsby cloud and in a dedicated branch I upgraded gatsby 4 dependencies for a while.
Unfortunately I never got it running due to high memory consumption.

Your Gatsby build's memory consumption exceeded the limits allowed in your plan. 
For more details, see https://gatsby.dev/memory.

Also my local builds on my machine consumed huge amounts of memory until I killed the process.

Screenshot 2021-11-25 at 17 32 36

Long story short, today I had some time to dig into the issue, commented out all my plugins and isolated the issue.
The source of the issue is this line https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-transformer-json/src/gatsby-node.js#L63

A .forEach loop over the array items of a json file to create the individual json nodes. Whats the issue you may ask?

Gatsbys createNode function which is used by transformObject returns a

Promise [which] resolves when all cascading onCreateNode API calls triggered by createNode have finished.

whereas .forEach

Note: forEach expects a synchronous function.

forEach does not wait for promises. Make sure you are aware of the implications while using promises (or async functions) as forEach callback.

The result is that the loop kicks off all 10k transformations right away instead of in some serial/concurrent loop which the code suggests.

The fix for that seems quite simple:

  • mark transformObject as an async function
  • use a regular loop which awaits the transformObject call
  if (_.isArray(parsedContent)) {
    for (let i = 0, l = parsedContent.length; i < l; i++) {
      const obj = parsedContent[i];

      await transformObject(
        obj,
        createNodeId(`${node.id} [${i}] >>> JSON`),
        getType({ node, object: obj, isArray: true })
      )
    }
  }

I feel you could get fancy here and use something like .eachLimit to add some concurrency and speed it up again.

Reproduction Link

https://github.com/joernroeder/gatsby-json-memory

Steps to Reproduce

  1. clone the reproduction link and rungatsby start
  2. Watch the memory consumption

Expected Result

no multi GB memory usage :)

Actual Result

broken builds

Environment

Binaries:
    Node: 14.16.1
    Yarn: 1.22.17
    npm: 7.16.0 - ~/.nvm/versions/node/v14.16.1/bin/npm
  Languages:
    Python: 2.7.16 - /usr/bin/python
  npmPackages:
    gatsby: @next => 4.3.0-next.1 
    gatsby-source-filesystem: ^4.2.0 => 4.2.0 
    gatsby-transformer-json: ^4.2.0 => 4.2.0

Config Flags

No response

@joernroeder joernroeder added the type: bug An issue or pull request relating to a bug in Gatsby label Nov 25, 2021
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Nov 25, 2021
@LekoArts LekoArts added help wanted Issue with a clear description that the community can help with. topic: plugins Related to plugin system, themes & catch-all for plugins that don't have a label and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels Nov 26, 2021
@LekoArts
Copy link
Contributor

Hi, thanks for the issue! As you already pointed out the issue might be related to #33868 and as my colleague also wrote there it might be a good idea to try streaming instead. It's not something we can prioritize at the moment but we'd be happy to review a PR with this. Thanks!

@joernroeder
Copy link
Contributor Author

@LekoArts streaming might be the ultimate solution but I feel converting the loop to respect the async nature of createNode might be a good start and doesn't seem to be to difficult (the code above is all we need) and would fix at least this issue here. I would open a PR to get the fix out (and me to gatsby 4) if you don't mind.

@joernroeder
Copy link
Contributor Author

joernroeder commented Nov 26, 2021

Also just to make it clear, the same amount of nodes (same json file) is currently transformed in the gatsby 3 setup successfully which indicates that something might have changed upstream and causes the issue now.

@KyleAMathews
Copy link
Contributor

What changed with v4 is we now write data to LMDB which is more costly than writing to memory. The sync forEach loop doesn't allow LMDB to flush data to disk meaning in progress changes accumulate. Switching to async solves as does using e.g. process.nextTick to let the event loop run through again.

@joernroeder
Copy link
Contributor Author

@KyleAMathews ah that makes sense, didn't know LMDB is part of gatsby 4 by default – that's awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issue with a clear description that the community can help with. topic: plugins Related to plugin system, themes & catch-all for plugins that don't have a label type: bug An issue or pull request relating to a bug in Gatsby
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants