Support large, batch importing of relationships #94

Closed
josephschorr opened this issue Mar 3, 2022 · 0 comments · Fixed by #120
Labels
area/CLI (Affects the command line), priority/3 low (This would be nice to have)

Comments

josephschorr (Member) commented Mar 3, 2022

zed import currently constructs a single WriteRelationships call, whose size limit is exceeded after ~5000 relationships. We should add support for batching the relationships to be imported into chunks, and for executing those chunks in parallel.

As mentioned in Discord (https://discord.com/channels/844600078504951838/844600078948630559/949071336574181457), there are a number of issues to address:

  1. The gRPC server has a limit on message sizes. This can easily be solved by batching the requests.
  2. Batching is tricky: serially executing each batch works to an extent, but it's not very scalable. This could be improved by executing each batch in a goroutine (see the sketches below). Of course, this comes with some risk, as there's roughly a 9k batch size limit (see 3). If a zed import is trying to import 6 million+ rows, that's about 650 connections each trying to shove through 9k tuples.
  3. Postgres and MySQL both have a limit on how many placeholders you can have in a single query. This appears to be 65535 for both Postgres and MySQL (https://stackoverflow.com/a/49379324, https://stackoverflow.com/a/24447922), which roughly translates to a maximum of 9362 relationship tuple writes (assuming each write requires 7 placeholders). I didn't initially hit this limitation because I was testing with an in-memory database.
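
A minimal sketch of the chunking step, assuming a batch size derived from the placeholder limit above (65535 / 7 ≈ 9362, rounded down for headroom). The package name, constant, and helper here are illustrative, not existing zed code:

```go
package importer

// maxBatchSize keeps each WriteRelationships call under the datastore's
// placeholder limit: 65535 placeholders / 7 per relationship ≈ 9362 rows,
// rounded down to leave headroom. The exact value is an assumption.
const maxBatchSize = 9000

// chunkRelationships splits the parsed updates into batches of at most
// batchSize elements; the final batch holds whatever remains.
func chunkRelationships[T any](updates []T, batchSize int) [][]T {
	var batches [][]T
	for start := 0; start < len(updates); start += batchSize {
		end := start + batchSize
		if end > len(updates) {
			end = len(updates)
		}
		batches = append(batches, updates[start:end])
	}
	return batches
}
```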

We'll need to set a reasonable limit on the number of parallelized write requests, display a progress bar, etc.
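
A rough sketch of bounded parallel writes with a coarse progress counter, using errgroup's SetLimit. The relationship type, writeBatchFunc callback, and importInParallel function are placeholders for whatever zed actually wraps around the WriteRelationships RPC, not its real API:

```go
package importer

import (
	"context"
	"fmt"
	"sync/atomic"

	"golang.org/x/sync/errgroup"
)

// relationship and writeBatchFunc are placeholders: the real import code
// would use the parsed relationship update type and a call that issues one
// WriteRelationships request per batch.
type relationship struct{}

type writeBatchFunc func(ctx context.Context, batch []relationship) error

// importInParallel writes batches concurrently, capping the number of
// in-flight requests at maxParallel, and prints coarse progress as each
// batch completes. Any batch error cancels the remaining work.
func importInParallel(ctx context.Context, batches [][]relationship, maxParallel int, writeBatch writeBatchFunc) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxParallel) // e.g. 16, instead of one unbounded goroutine per batch

	var completed int64
	for _, batch := range batches {
		batch := batch // capture the per-iteration value for the goroutine
		g.Go(func() error {
			if err := writeBatch(ctx, batch); err != nil {
				return err
			}
			fmt.Printf("\rwrote %d/%d batches", atomic.AddInt64(&completed, 1), len(batches))
			return nil
		})
	}
	return g.Wait()
}
```

With the earlier sketch, something like importInParallel(ctx, chunkRelationships(updates, maxBatchSize), 16, writeBatch) would keep roughly 16 requests in flight rather than the ~650 unbounded connections described above.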

@josephschorr added the area/CLI (Affects the command line) and priority/3 low (This would be nice to have) labels on Mar 3, 2022
@jzelinskie linked a pull request on Mar 10, 2022 that will close this issue