-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow scalar indices to be updated with new data #1576
Conversation
960294b
to
532c782
Compare
532c782
to
47c5137
Compare
let all_data = Arc::new(UnionExec::new(vec![old_input, new_input])); | ||
let ordered = Arc::new(SortPreservingMergeExec::new(vec![sort_expr], all_data)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice use of DataFusion 🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very cool tool
let index_dir = dataset.indices_dir().child(new_uuid.to_string()); | ||
let new_store = LanceIndexStore::new((*dataset.object_store).clone(), index_dir); | ||
|
||
index.update(new_data_stream.into(), &new_store).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me, why don't we remove old data? Is that a future TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...embarrassingly I think it's just because I didn't consider it 😆 I'll add it as a future TODO.
Test failure is a known intermittent failure. I will merge. |
Training a scalar index is quick so on update we retrain. We sort the new data and then read in the already sorted old data and train on the merged stream.
Closes #1568