Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sql unnest to optimize entity insertion #1098

Merged
merged 4 commits into from
Mar 14, 2024
Merged

Conversation

ktuite
Copy link
Member

@ktuite ktuite commented Mar 6, 2024

Related to #1095

Happy to report that I was able to insert 500,000 entities w/ 10 properties in about ~30 seconds (on my M3 mac) with this sql.unnest change.

That payload was about 76,000KB (76MB) and this limit was increased to 250000kb (250MB)
service.use(bodyParser.json({ type: 'application/json', limit: '250kb' }));

What has been done to verify that this works as intended?

Tests of small insertions still work.
I tried it on my own with those larger numbers.

Why is this the best possible solution? Were any other approaches considered?

There is still more to do to create many entities at once (streaming JSON?), but I think it's worth merging this particular change now so the bottleneck is no longer the database insert.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Does this change require updates to the API documentation? If so, please update docs/api.md as part of this PR.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

Copy link
Member

@matthew-white matthew-white left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very exciting that the database is no longer the bottleneck! 🎉 I have a question about insertMany(), but this is looking like a nice change to me.

lib/model/query/entities.js Outdated Show resolved Hide resolved
lib/model/query/entities.js Outdated Show resolved Hide resolved
lib/model/query/entities.js Show resolved Hide resolved
lib/model/query/entities.js Outdated Show resolved Hide resolved
lib/util/db.js Show resolved Hide resolved
lib/model/frames/entity.js Outdated Show resolved Hide resolved
lib/model/frames/entity.js Show resolved Hide resolved
lib/model/frames/entity.js Outdated Show resolved Hide resolved
lib/util/db.js Outdated Show resolved Hide resolved
@ktuite ktuite merged commit e610ed1 into master Mar 14, 2024
5 checks passed
@ktuite ktuite deleted the ktuite/improve_bulk_upload branch March 14, 2024 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants