gatsby-transformer-csv hangs if CSV too large #11839
@marvinmarnold thanks a bunch for adding a reproduction right away! We probably want to add some kind of progress bar, as we do for images and queries. I need to dig a bit deeper into the code to see whether this is something we can do from the Gatsby end, or whether we should look for a CSV stream parser.
@wardpeet I'm happy to make a pull request if you point me in the right direction. I've also noticed that the time it takes to process the CSV doesn't grow linearly with file size: if I sample 25% of the records (under 1MB), processing takes less than a minute, but with all the records included (~3.5MB) it takes over 10 minutes.
Let me get back to you tomorrow so I can have some time to point you in the right direction 😂
@marvinmarnold could you check how long it takes to process the CSV directly? We use https://www.npmjs.com/package/csvtojson to convert CSVs to JSON and then to Gatsby nodes. Also, how many rows is the full file?
@KyleAMathews using csvtojson, I'm able to read each row of the complete (58,934-line) source dataset in less than a second. Source: https://github.com/marvinmarnold/new_orleans_cameras/tree/master/data/csvtojson-test
Hiya! This issue has gone quiet. Spooky quiet. 👻 We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here. If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open! Thanks for being a part of the Gatsby community! 💪💜
not stale
Seeing this issue as well, trying to use a CSV with over 72,000 rows. Build time is pretty lengthy, but I'm also seeing a large page-data.json file that at times takes over 30s to pull down. I'm not as worried about the build time as about the ~30s it takes for a user to load the page and for it to be ready. My repo: https://github.com/ramseytisher/mds-dx-aws
Same issue with either this plugin using a 4.2MB CSV or the gatsby-transformer-json plugin using an 8.9MB JSON.
I'm facing this issue as well with a ~25MB CSV 😱 (~150K rows, ~10 fields).
Hello @ramseytisher @makezi @marvinmarnold 👋 @KyleAMathews suggested splitting the CSV into multiple files to see if it helps. Though my build is still failing when creating pages, splitting the CSV appears to significantly speed up the "source and transform nodes" step. Give it a try and see if it helps. I used https://github.com/aleung/csvsplit to split the CSV. You could probably have an npm script do this and run it before the build.
Hiya! This issue has gone quiet. Spooky quiet. 👻 We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 20 days since the last update here. Thanks for being a part of the Gatsby community! 💪💜
Hey again! It’s been 60 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it. Thanks again for being part of the Gatsby community! 💪💜
Description
I'm trying to include data from a 3.5MB CSV using gatsby-transformer-csv. When I run `gatsby develop`, the terminal hangs at "source and transform nodes" for a long time, but does eventually complete. Is gatsby-transformer-csv meant for files this large? How can I add a progress indicator for large files? Are there other best practices for loading large files?
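For context, a typical setup looks like the fragment below: gatsby-source-filesystem pointed at the directory holding the CSV, plus the transformer. This is an illustrative config, not the reporter's actual file, and the `./src/data` path is an assumption.

```javascript
// gatsby-config.js -- illustrative setup; the ./src/data path is an assumption
module.exports = {
  plugins: [
    {
      resolve: "gatsby-source-filesystem",
      options: { path: "./src/data", name: "data" },
    },
    "gatsby-transformer-csv",
  ],
};
```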
Steps to reproduce
Expected result
There should be some kind of indication that processing is happening.
Actual result
The build seems to hang, then eventually finishes. 3.5MB does not feel like it should take this long to process.
Environment
```
$ npx gatsby info --clipboard
System:
  OS: Linux 4.15 Ubuntu 16.04.5 LTS (Xenial Xerus)
  CPU: (3) x64 Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  Shell: 4.3.48 - /bin/bash
Binaries:
  Node: 8.15.0 - /usr/bin/node
  npm: 6.5.0 - /usr/bin/npm
Browsers:
  Firefox: 65.0
npmPackages:
  gatsby: ^2.0.76 => 2.0.76
  gatsby-image: ^2.0.20 => 2.0.25
  gatsby-plugin-manifest: ^2.0.9 => 2.0.12
  gatsby-plugin-offline: ^2.0.16 => 2.0.20
  gatsby-plugin-react-helmet: ^3.0.2 => 3.0.5
  gatsby-plugin-react-leaflet: ^2.0.3 => 2.0.3
  gatsby-plugin-sharp: ^2.0.14 => 2.0.16
  gatsby-source-filesystem: ^2.0.8 => 2.0.12
  gatsby-transformer-csv: ^2.0.7 => 2.0.7
  gatsby-transformer-sharp: ^2.1.8 => 2.1.9
npmGlobalPackages:
  gatsby-cli: 2.4.8
```