*: support vector search #9486

Merged
merged 30 commits into from
Sep 30, 2024

@Lloyd-Pottiger Lloyd-Pottiger commented Sep 27, 2024

What problem does this PR solve?

Issue Number: close #9032

Problem Summary:

What is changed and how it works?

*: support vector search

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Lloyd-Pottiger and others added 21 commits July 31, 2024 15:08
ref #9032

Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: JaySon-Huang <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref #9032

Co-authored-by: JaySon-Huang <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref #9032

storage: Use mmap to view vector index

Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: JaySon-Huang <[email protected]>
ref #9032

storage: Add vector search metrics

Signed-off-by: Wish <[email protected]>

Co-authored-by: Wenxuan <[email protected]>
ref #9032

*: use SimSIMD for vectors

Signed-off-by: Lloyd-Pottiger <[email protected]>
ref #9032

storage: Add system.dt_local_indexes

Signed-off-by: Lloyd-Pottiger <[email protected]>
ref #9032

DMFile: Support modify DMFile meta

---------

Signed-off-by: Wish <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Co-authored-by: Wenxuan <[email protected]>
Co-authored-by: JaySon <[email protected]>
ref #9032

storage: Force evict when downloading vector index files

Signed-off-by: Wish <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: Wenxuan <[email protected]>
ref #9032

storage: add local indexer scheduler

Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref #9032

storage: Support adding vector index in background

Signed-off-by: Lloyd-Pottiger <[email protected]>
close #9032

storage: Abort vector index building as soon as possible

Signed-off-by: Wish <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: Wenxuan <[email protected]>
Co-authored-by: Lloyd-Pottiger <[email protected]>
ref #9032

ddl: Support parsing VectorIndex defined in IndexInfo

Co-authored-by: JaySon <[email protected]>
ref #9032

storage: support the HTTP API of sync table schema

Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: Lynn <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref #9032

storage: cache PK column in memory

Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…yncTableSchema (#9451)

ref #9032

*: support vector index and adding/dropping vector index when doing syncTableSchema

Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: JaySon <[email protected]>
ref #9032

ddl: Adapt with the latest vector index def
ref #9032

Storage: Support multiple vec indexes on the same column

Signed-off-by: Lloyd-Pottiger <[email protected]>

Co-authored-by: JaySon <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…is dropped (#9475)

### What problem does this PR solve?

Issue Number: ref #9032

Problem Summary:

### What is changed and how it works?

Pick tidbcloud/tiflash-cse#283 and
tidbcloud/tiflash-cse#300

* Unify the logic of `generateLocalIndexInfos` and `initLocalIndexInfos`
* Print a single log line per table summarizing the vector indexes that
were added, dropped, or already exist. This avoids flooding the log when
TiFlash restarts with many tables that define vector indexes
* Support dropping the vector index defined on ColumnInfo after the column
has been dropped in TiDB
* Add more unit tests at the DeltaMergeStore read level
* Make vector search fall back when top_k = max uint32
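
The last bullet can be illustrated with a minimal sketch. It is not taken from the TiFlash source: the function name `choose_read_path`, the labels `"vector_index"` and `"full_scan"`, and the reading of "fallback" as "skip the vector index and use a regular scan" are all assumptions made for illustration.

```python
UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit integer can hold


def choose_read_path(top_k: int, has_vector_index: bool) -> str:
    """Pick a read path for a top-k vector query.

    Hypothetical helper: the name and the returned labels are illustrative only.
    """
    if not 0 <= top_k <= UINT32_MAX:
        raise ValueError("top_k must fit in an unsigned 32-bit integer")
    # Assumption: top_k == UINT32_MAX is treated as "no usable limit",
    # so the index is skipped and the query falls back to a regular scan.
    if not has_vector_index or top_k == UINT32_MAX:
        return "full_scan"
    return "vector_index"
```

Under these assumptions, `choose_read_path(10, True)` selects the index path, while `choose_read_path(UINT32_MAX, True)` falls back to the scan path.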

```commit-message

```

### Check List

Tests <!-- At least one of them must be included. -->

- [ ] Unit test
- [ ] Integration test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No code

Side effects

- [ ] Performance regression: Consumes more CPU
- [ ] Performance regression: Consumes more Memory
- [ ] Breaking backward compatibility

Documentation

- [ ] Affects user behaviors
- [ ] Contains syntax changes
- [ ] Contains variable changes
- [ ] Contains experimental features
- [ ] Changes MySQL compatibility

### Release note

<!-- bugfix or new feature needs a release note -->

```release-note
None
```

---------

Signed-off-by: Lloyd-Pottiger <[email protected]>
Co-authored-by: JaySon <[email protected]>
Co-authored-by: jinhelin <[email protected]>
@Lloyd-Pottiger Lloyd-Pottiger marked this pull request as draft September 27, 2024 08:59
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 27, 2024

ti-chi-bot bot commented Sep 27, 2024

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 27, 2024

JaySon-Huang and others added 3 commits September 27, 2024 09:55
)

close #9485

vector: Fix ColumnArray does not work well with CHBlockChunkCodec

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Storage: Add error message when fail to build local index
JaySon-Huang and others added 4 commits September 29, 2024 09:16
…oat32), Nullable(Array(Float32))" (#9490)

ref #9032

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
storage: remove vector_index in column level
@JaySon-Huang JaySon-Huang marked this pull request as ready for review September 30, 2024 05:00
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 30, 2024
@JaySon-Huang JaySon-Huang removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 30, 2024
@JaySon-Huang JaySon-Huang reopened this Sep 30, 2024
Temporarily disable clang-tidy for this PR: it changes too many files, so the check takes too long.


@JaySon-Huang JaySon-Huang left a comment


LGTM

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. labels Sep 30, 2024
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 30, 2024

ti-chi-bot bot commented Sep 30, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-30 08:01:06.261755596 +0000 UTC m=+255421.681968632: ☑️ agreed by JaySon-Huang.
  • 2024-09-30 08:13:12.453821332 +0000 UTC m=+256147.874034343: ☑️ agreed by zanmato1984.


ti-chi-bot bot commented Sep 30, 2024

@zimulala: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


ti-chi-bot bot commented Sep 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, zanmato1984, zimulala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,zanmato1984]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot merged commit 2a198f1 into master Sep 30, 2024
5 checks passed
@ti-chi-bot ti-chi-bot bot deleted the feature/vector-index branch September 30, 2024 08:17
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Vector Search
4 participants