Storage: Support multiple vec indexes on the same column #9469

Merged


Lloyd-Pottiger

@Lloyd-Pottiger Lloyd-Pottiger commented Sep 25, 2024

What problem does this PR solve?

Issue Number: ref #9032

Problem Summary:

What is changed and how it works?

  • For queries, add index_id to ANNQueryInfo (DMFileWithVectorIndexBlockInputStream.cpp)
    • If index_id > 0, try to query with the vector index identified by index_id
    • If index_id <= 0, try to query with the vector index defined on the column
  • For writes, change ColumnStat.vector_index from std::optional<dtpb::VectorIndexFileProps> to std::vector<dtpb::VectorIndexFileProps>
    • DMFileIndexWriter.cpp supports saving multiple vector indexes with different index_id values
    • Save the size of each index in its own dtpb::VectorIndexFileProps instead of in ColumnStat.index_bytes
    • The vector index filename is defined as "idx_${index_id}.vector"
  • Ensure atomicity between reading the vector index from a DMFile and bumping the DMFile's meta version
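
The query-side lookup and the file naming described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the structs are hypothetical stand-ins for dtpb::VectorIndexFileProps and ColumnStat, the field names index_id and index_bytes are assumptions, and findVectorIndex and vectorIndexFileName are helper names invented for the example. It also assumes the column-defined index is stored with index_id 0.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for dtpb::VectorIndexFileProps (field names assumed).
struct VectorIndexFileProps
{
    int64_t index_id = 0;     // 0 is assumed to mean "index defined on the column"
    uint64_t index_bytes = 0; // size is kept per index, not per column
};

// Hypothetical stand-in for ColumnStat: one column may carry several indexes.
struct ColumnStat
{
    std::vector<VectorIndexFileProps> vector_index;
};

constexpr int64_t COLUMN_DEFINED_INDEX_ID = 0;

// Pick the index a query should use:
//   query_index_id > 0  -> the index with exactly that id
//   query_index_id <= 0 -> the index defined on the column (assumed id 0)
std::optional<VectorIndexFileProps> findVectorIndex(const ColumnStat & stat, int64_t query_index_id)
{
    const int64_t wanted = query_index_id > 0 ? query_index_id : COLUMN_DEFINED_INDEX_ID;
    for (const auto & props : stat.vector_index)
    {
        if (props.index_id == wanted)
            return props;
    }
    return std::nullopt; // no matching index in this file
}

// Build the file name in the "idx_${index_id}.vector" form.
std::string vectorIndexFileName(int64_t index_id)
{
    return "idx_" + std::to_string(index_id) + ".vector";
}
```

Returning std::nullopt when no index matches leaves the caller free to decide what to do; what the real reader does in that case is not stated in this description.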

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 25, 2024
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 26, 2024
@JinheLin

If the index_id > 0, then try to query with the vector index defined by index_id
If the index_id <= 0, then try to query with the vector index defined in column

When would index_id <= 0? Why not make index_id always greater than 0?

@JaySon-Huang

If the index_id > 0, then try to query with the vector index defined by index_id
If the index_id <= 0, then try to query with the vector index defined in column

When would index_id <= 0? Why not make index_id always greater than 0?

In the beta implementation, Vector Search Index (Beta), we supported adding a vector index through a column comment. That kind of vector index is created along with the column_id and has no index_id. For backward compatibility, we define index_id == 0 for those indexes. In the cse branch, those indexes will remain usable for a period of time.
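
As a rough illustration of that compatibility rule, the sketch below maps an index definition without an index_id to 0. IndexDef, effectiveIndexId, and hasUniqueIndexIds are hypothetical names made up for this example, and the uniqueness check rests on the assumption that a column carries at most one comment-defined index; none of this is taken from the PR's code.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical index definition: comment-defined (beta) indexes carry no id.
struct IndexDef
{
    std::optional<int64_t> index_id;
};

constexpr int64_t LEGACY_INDEX_ID = 0;

// Indexes created through a column comment are treated as index_id == 0.
int64_t effectiveIndexId(const IndexDef & def)
{
    return def.index_id.value_or(LEGACY_INDEX_ID);
}

// Assumption: ids on one column must not collide once the legacy
// index has been mapped to 0.
bool hasUniqueIndexIds(const std::vector<IndexDef> & defs)
{
    std::set<int64_t> seen;
    for (const auto & def : defs)
    {
        if (!seen.insert(effectiveIndexId(def)).second)
            return false;
    }
    return true;
}
```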

@Lloyd-Pottiger
Contributor Author

If the index_id > 0, then try to query with the vector index defined by index_id
If the index_id <= 0, then try to query with the vector index defined in column

When would index_id <= 0? Why not make index_id always greater than 0?

This follows TiDB: https://github.com/pingcap/tidb/blob/9dff38ba98405422cb0eb15993f385efe9068b47/pkg/meta/model/index.go#L26


ti-chi-bot bot commented Sep 26, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, JinheLin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,JinheLin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 26, 2024

ti-chi-bot bot commented Sep 26, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-26 07:01:14.514491868 +0000 UTC m=+1722144.254915806: ☑️ agreed by JaySon-Huang.
  • 2024-09-26 07:56:53.747144389 +0000 UTC m=+1725483.487568331: ☑️ agreed by JinheLin.

@ti-chi-bot ti-chi-bot bot merged commit 290972e into pingcap:feature/vector-index Sep 26, 2024
5 of 7 checks passed
@Lloyd-Pottiger Lloyd-Pottiger deleted the support-muliti-index branch September 26, 2024 09:51
@JaySon-Huang JaySon-Huang mentioned this pull request Sep 30, 2024
12 tasks