gatsby-transformer-csv hangs if CSV too large #11839

Closed
marvinmarnold opened this issue Feb 17, 2019 · 13 comments
Labels: stale?, topic: performance

Comments

@marvinmarnold

marvinmarnold commented Feb 17, 2019

Description

I'm trying to include data from a 3.5MB CSV using gatsby-transformer-csv. When I run gatsby develop, the terminal hangs at "source and transform nodes" for a long time, but it does eventually complete.

Is gatsby-transformer-csv meant for files this large? How can I add a progress indicator for large files? Are there other best practices for loading large files?

Steps to reproduce

git clone https://github.com/marvinmarnold/new_orleans_cameras
cd new_orleans_cameras
git checkout 2854f8f1937b2767f042a67a4bb88e039c82bf22
gatsby develop

Expected result

Should have some kind of indication that processing is happening.

Actual result

Build seems to hang, then eventually finishes. 3.5MB does not feel like it should take this long to complete.

Environment

$ npx gatsby info --clipboard

System:
  OS: Linux 4.15 Ubuntu 16.04.5 LTS (Xenial Xerus)
  CPU: (3) x64 Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  Shell: 4.3.48 - /bin/bash
Binaries:
  Node: 8.15.0 - /usr/bin/node
  npm: 6.5.0 - /usr/bin/npm
Browsers:
  Firefox: 65.0
npmPackages:
  gatsby: ^2.0.76 => 2.0.76
  gatsby-image: ^2.0.20 => 2.0.25
  gatsby-plugin-manifest: ^2.0.9 => 2.0.12
  gatsby-plugin-offline: ^2.0.16 => 2.0.20
  gatsby-plugin-react-helmet: ^3.0.2 => 3.0.5
  gatsby-plugin-react-leaflet: ^2.0.3 => 2.0.3
  gatsby-plugin-sharp: ^2.0.14 => 2.0.16
  gatsby-source-filesystem: ^2.0.8 => 2.0.12
  gatsby-transformer-csv: ^2.0.7 => 2.0.7
  gatsby-transformer-sharp: ^2.1.8 => 2.1.9
npmGlobalPackages:
  gatsby-cli: 2.4.8

@wardpeet wardpeet added the "help wanted" and "type: feature or enhancement" labels Feb 18, 2019
@wardpeet
Contributor

@marvinmarnold thanks a bunch for adding a reproduction right away! We probably want to add some kind of progress bar, as we do for images and queries. I need to dig a bit deeper into the code to see whether this is something we can do from the Gatsby end, or whether we should look for a CSV stream parser.

@marvinmarnold
Author

marvinmarnold commented Feb 18, 2019

@wardpeet I'm happy to make a pull request if you point me in the right direction.

I've also noticed that the time it takes to process the CSV doesn't grow linearly with file size. If I sample 25% of records then it takes less than a minute to process (under 1MB). But when I include all the records (~3.5MB), it takes over 10 minutes.
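To put a rough number on that non-linearity, the two observations can be fitted to a power law, time ∝ size^k. The sketch below is only an illustration: the function name `scalingExponent` is hypothetical, and the inputs treat "less than a minute" and "over 10 minutes" as about 1 and 10 minutes, so the result is a lower bound on k rather than a measurement.

```javascript
// Estimate k in "time is proportional to size^k" from two (size, time) samples.
function scalingExponent(sizeA, timeA, sizeB, timeB) {
  return Math.log(timeB / timeA) / Math.log(sizeB / sizeA);
}

// Assumed inputs, taken loosely from the comment above:
// 25% of the records in ~1 minute, 100% of the records in ~10 minutes.
const k = scalingExponent(0.25, 1, 1.0, 10);
console.log(k.toFixed(2)); // 1.66
```

An exponent of 1 would mean linear growth and 2 would mean quadratic, so these figures suggest something clearly worse than linear.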

@wardpeet wardpeet self-assigned this Feb 18, 2019
@wardpeet
Contributor

Let me get back to you tomorrow so I have some time to point you in the right direction 😂

@KyleAMathews
Contributor

@marvinmarnold could you check how long it takes to process the CSV directly? We use https://www.npmjs.com/package/csvtojson to convert CSVs to JSON and then to Gatsby nodes.

Also, how many rows are in the full file?

@marvinmarnold
Author

marvinmarnold commented Feb 19, 2019

@KyleAMathews using csvtojson, I'm able to read each row of the complete (58,934 line) source dataset in less than a second.

Source: https://github.com/marvinmarnold/new_orleans_cameras/tree/master/data/csvtojson-test

$ node index.js 
Read 58934 rows in 720ms
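A standalone timing check along these lines might look like the sketch below. It is not the script from the linked repository: `timeParse` and the `data.csv` path are hypothetical names, and the parser is passed in as a function so that the helper itself depends on nothing beyond Node.

```javascript
// Time any promise-returning parser and report how many rows it produced.
async function timeParse(parseFile, filePath) {
  const start = Date.now();
  const rows = await parseFile(filePath);
  return { rows: rows.length, ms: Date.now() - start };
}

// Possible usage with csvtojson (installed separately; path is a placeholder):
//   const csv = require("csvtojson");
//   timeParse((file) => csv().fromFile(file), "data.csv")
//     .then(({ rows, ms }) => console.log(`Read ${rows} rows in ${ms}ms`));
```

If a check like this finishes in under a second while the Gatsby step takes minutes, the time is probably going to something after parsing rather than to the CSV parsing itself.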

@gatsbot

gatsbot bot commented Mar 12, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot gatsbot bot added the "stale?" label Mar 12, 2019
@marvinmarnold
Author

marvinmarnold commented Mar 12, 2019 via email

@LekoArts LekoArts added the "not stale" label and removed the "stale?" label Mar 12, 2019
@ramseytisher

ramseytisher commented Aug 1, 2019

I'm seeing this issue as well with a CSV of over 72,000 rows. Build time is pretty lengthy, but I'm also seeing a large page-data.json file that at times takes over 30s to download. I'm less worried about the build time than about the roughly 30s it takes for a user to load the page and for it to be ready.

My repo: https://github.com/ramseytisher/mds-dx-aws
Deployed Site: https://mock.caretrackeronline.com/DiagnosisSearch

@makezi

makezi commented Dec 18, 2019

Same issue with either this plugin using a 4.2MB CSV or the gatsby-transformer-json plugin using an 8.9MB JSON. My environment is below:

System:
  OS: macOS 10.15.2
  CPU: (12) x64 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  Shell: 5.7.1 - /bin/zsh
Binaries:
  Node: 12.12.0 - /usr/local/bin/node
  npm: 6.11.3 - /usr/local/bin/npm
Languages:
  Python: 2.7.16 - /usr/bin/python
Browsers:
  Chrome: 79.0.3945.88
  Firefox: 71.0
  Safari: 13.0.4
npmPackages:
  gatsby: ^2.18.12 => 2.18.12
  gatsby-image: ^2.2.34 => 2.2.34
  gatsby-plugin-manifest: ^2.2.31 => 2.2.31
  gatsby-plugin-offline: ^3.0.27 => 3.0.27
  gatsby-plugin-react-helmet: ^3.1.16 => 3.1.16
  gatsby-plugin-root-import: ^2.0.5 => 2.0.5
  gatsby-plugin-sharp: ^2.3.5 => 2.3.5
  gatsby-source-filesystem: ^2.1.42 => 2.1.42
  gatsby-transformer-csv: ^2.1.21 => 2.1.21
  gatsby-transformer-json: ^2.2.22 => 2.2.22
  gatsby-transformer-sharp: ^2.3.7 => 2.3.7
npmGlobalPackages:
  gatsby-cli: 2.8.19

@tsriram
Contributor

tsriram commented Dec 28, 2019

I'm facing this issue as well with a ~25MB CSV 😱 (~150K rows & ~10 fields). And yeah, csvtojson converts the file to JSON in a few seconds, but gatsby build never finishes. I'm happy to help if there's more info on where to look.

@tsriram
Contributor

tsriram commented Dec 30, 2019

Hello @ramseytisher @makezi @marvinmarnold 👋

@KyleAMathews suggested splitting the CSV into multiple files to see if it helps. Though my build is still failing when creating pages, splitting the CSV appears to significantly speed up the "source and transform nodes" step. Give it a try and see if it helps. I used https://github.com/aleung/csvsplit to split the CSV. You could probably add an npm script that does this and run it before gatsby develop and gatsby build.
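For anyone who would rather not add a dependency, the splitting step could be sketched in plain Node as below. This is not how csvsplit works internally; `splitCsvText`, `splitCsvFile`, and the `<name>-<index>.csv` naming are hypothetical, and the line-based split assumes no quoted field contains a line break.

```javascript
const fs = require("fs");
const path = require("path");

// Split CSV text into chunks of at most `rowsPerChunk` data rows,
// repeating the header line at the top of every chunk.
// Assumption: no quoted field contains a line break.
function splitCsvText(text, rowsPerChunk) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const header = lines[0];
  const rows = lines.slice(1);
  const chunks = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push([header, ...rows.slice(i, i + rowsPerChunk)].join("\n") + "\n");
  }
  return chunks;
}

// Write each chunk to outDir as <name>-<index>.csv and return the paths written.
function splitCsvFile(inputPath, outDir, rowsPerChunk) {
  const text = fs.readFileSync(inputPath, "utf8");
  const base = path.basename(inputPath, ".csv");
  fs.mkdirSync(outDir, { recursive: true });
  return splitCsvText(text, rowsPerChunk).map((chunk, index) => {
    const outPath = path.join(outDir, `${base}-${index}.csv`);
    fs.writeFileSync(outPath, chunk);
    return outPath;
  });
}
```

A script that calls `splitCsvFile` could then be wired to run ahead of the build, for example through an npm `prebuild` script.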

@pvdz pvdz added the "topic: performance" and "topic: scaling builds" labels and removed the "help wanted" label Sep 22, 2020
@pvdz pvdz assigned pvdz and unassigned wardpeet Sep 22, 2020
@github-actions

github-actions bot commented Mar 7, 2021

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here.
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@github-actions github-actions bot added the "stale?" label Mar 7, 2021
@github-actions

github-actions bot commented Jun 7, 2021

Hey again!

It’s been 60 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.
Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to comment on this issue or create a new one if you need anything else.
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community! 💪💜

@github-actions github-actions bot closed this as completed Jun 7, 2021