feat: allow scalar indices to be updated with new data #1576

Merged
merged 4 commits into from
Nov 17, 2023

@westonpace westonpace commented Nov 10, 2023

Training a scalar index is quick, so on update we simply retrain it. We sort the new data, read in the already-sorted old data, and train on the merged stream.

Closes #1568
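
A minimal sketch of the retrain-on-update idea described above, using only the Rust standard library. The `Entry` type and the `merge_sorted` and `retrain_input` functions are hypothetical names chosen for illustration and are not part of Lance; the actual change works on data streams through DataFusion plans rather than on in-memory vectors.

```rust
/// One index entry: the indexed scalar value and the id of the row it came from.
/// (Hypothetical type for illustration; not the actual Lance representation.)
type Entry = (i64, u64);

/// Merge two slices that are each already sorted by value into one sorted Vec.
fn merge_sorted(old: &[Entry], new: &[Entry]) -> Vec<Entry> {
    let mut merged = Vec::with_capacity(old.len() + new.len());
    let (mut i, mut j) = (0, 0);
    while i < old.len() && j < new.len() {
        // `<=` keeps old entries ahead of new ones when values tie.
        if old[i].0 <= new[j].0 {
            merged.push(old[i]);
            i += 1;
        } else {
            merged.push(new[j]);
            j += 1;
        }
    }
    // At most one of these tails is non-empty.
    merged.extend_from_slice(&old[i..]);
    merged.extend_from_slice(&new[j..]);
    merged
}

/// Build the input for retraining: only the new batch needs sorting,
/// because the old index data is assumed to be stored in sorted order.
fn retrain_input(old_sorted: &[Entry], mut new_data: Vec<Entry>) -> Vec<Entry> {
    new_data.sort_by_key(|entry| entry.0);
    merge_sorted(old_sorted, &new_data)
}
```

The point of the design is that the sort cost is paid only on the new rows; the old rows are read once in order and interleaved during the merge.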

@westonpace westonpace force-pushed the feat/scalar-index-update branch 3 times, most recently from 960294b to 532c782 on November 14, 2023 18:07
@westonpace westonpace marked this pull request as ready for review November 14, 2023 18:09
Comment on lines +1125 to +1126
let all_data = Arc::new(UnionExec::new(vec![old_input, new_input]));
let ordered = Arc::new(SortPreservingMergeExec::new(vec![sort_expr], all_data));
Contributor

Nice use of DataFusion 🔥

Contributor Author

It's a very cool tool
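
For readers unfamiliar with the pattern in the two reviewed lines: the old and new inputs are combined with a union, and a sort-preserving merge then produces one ordered stream from inputs that are each already sorted. The sketch below shows the general k-way merge technique with a min-heap from the Rust standard library. It is an illustration only, not DataFusion's implementation, and `k_way_merge` is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge any number of individually sorted inputs into one sorted output.
/// (Hypothetical helper; operates on in-memory vectors for simplicity.)
fn k_way_merge(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    // Heap entries are (next value, input index, position within that input).
    // `Reverse` turns the max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (idx, input) in inputs.iter().enumerate() {
        if let Some(&first) = input.first() {
            heap.push(Reverse((first, idx, 0usize)));
        }
    }

    let mut out = Vec::with_capacity(inputs.iter().map(Vec::len).sum());
    while let Some(Reverse((value, idx, pos))) = heap.pop() {
        out.push(value);
        // Refill from the input we just consumed, if it has more values.
        if let Some(&next) = inputs[idx].get(pos + 1) {
            heap.push(Reverse((next, idx, pos + 1)));
        }
    }
    out
}
```

Because each input is already sorted, the merge only ever compares the current head of each input, so no full re-sort of the combined data is needed.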

let index_dir = dataset.indices_dir().child(new_uuid.to_string());
let new_store = LanceIndexStore::new((*dataset.object_store).clone(), index_dir);

index.update(new_data_stream.into(), &new_store).await?;
Contributor

Remind me, why don't we remove old data? Is that a future TODO?

Contributor Author

Hmm...embarrassingly I think it's just because I didn't consider it 😆 I'll add it as a future TODO.

@westonpace
Contributor Author

The test failure is a known intermittent failure, so I will merge.

@westonpace westonpace merged commit 4ad9f96 into lancedb:main Nov 17, 2023
16 of 17 checks passed
Development

Successfully merging this pull request may close these issues.

Allow scalar indices to be updated
2 participants