[Feature Request] Star Tree File Formats #14837

Closed
sarthakaggarwal97 opened this issue Jul 19, 2024 · 0 comments · Fixed by #14809
Labels
enhancement Enhancement or improvement to existing feature or request Indexing:Performance Indexing & Search Indexing Indexing, Bulk Indexing and anything related to indexing v2.17.0

Comments

sarthakaggarwal97 commented Jul 19, 2024

Is your feature request related to a problem? Please describe

This issue discusses the file formats used to store star trees and their associated metadata. A composite index can have multiple implementations, the star tree being one of them.

Required Files:

  1. Composite Index Metadata (.cim) ~ Stores the metadata for the composite index. It is primarily used to initialize the star tree metadata and to provide the offsets needed to read the respective star tree.
  2. Composite Index Data (.cid) ~ Stores the actual star tree data structure in serialized form.
  3. Composite Index Data Doc Values (.cidvd) ~ Stores the doc values of the star tree dimensions and metrics.
  4. Composite Index Metadata Doc Values (.cidvm) ~ Stores the doc values metadata.

Note: These files are designed to be extensible and are not limited to the star tree. If a new data structure based on the composite index is introduced, it can be stored in the same files.

Composite Index Meta (cim)

Header

  1. Composite Index Marker
  2. Version
  3. Composite Index Field Name
  4. Composite Index Field Type (here Star Tree)

Metadata

  1. Number of dimensions
  2. Dimension Field Names
  3. Number of metric entries (field - metric pairs)
  4. Metric Entries
  5. Segment Aggregated Document Count
  6. Max Leaf Docs
  7. Number of skip star node creation dimensions
  8. Skip star node creation dimensions
  9. Star Tree Build Mode (OnHeap / OffHeap)
  10. Data File Pointer (where respective star-tree data is stored)
  11. Data Length (length of the star tree)
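
To make the field order above concrete, here is a minimal sketch of writing and reading one `.cim` entry. It only illustrates the layout listed in this issue. The marker value, the version number, the length-prefixed UTF-8 strings, the fixed-width big-endian integers, and all names (`StarTreeMeta`, `write_meta`, `read_meta`) are assumptions made for the example; the issue does not specify any of them.

```python
import io
import struct
from dataclasses import dataclass

# Hypothetical values: the issue does not define the marker or the version.
COMPOSITE_INDEX_MARKER = 0xC0950513
VERSION = 1


@dataclass
class StarTreeMeta:
    field_name: str
    field_type: str          # e.g. "star_tree"
    dimensions: list
    metrics: list            # (field, metric) pairs
    segment_doc_count: int   # segment aggregated document count
    max_leaf_docs: int
    skip_star_dims: list     # skip star node creation dimensions
    build_mode: str          # on-heap or off-heap
    data_pointer: int        # offset of the star tree in the .cid file
    data_length: int         # length of the star tree in the .cid file


def write_string(out, s):
    # Assumed encoding: 4-byte length followed by UTF-8 bytes.
    data = s.encode("utf-8")
    out.write(struct.pack(">I", len(data)))
    out.write(data)


def read_string(inp):
    (n,) = struct.unpack(">I", inp.read(4))
    return inp.read(n).decode("utf-8")


def write_meta(out, m):
    # Header
    out.write(struct.pack(">II", COMPOSITE_INDEX_MARKER, VERSION))
    write_string(out, m.field_name)
    write_string(out, m.field_type)
    # Metadata, in the order listed in the issue
    out.write(struct.pack(">I", len(m.dimensions)))
    for dim in m.dimensions:
        write_string(out, dim)
    out.write(struct.pack(">I", len(m.metrics)))
    for fld, metric in m.metrics:
        write_string(out, fld)
        write_string(out, metric)
    out.write(struct.pack(">II", m.segment_doc_count, m.max_leaf_docs))
    out.write(struct.pack(">I", len(m.skip_star_dims)))
    for dim in m.skip_star_dims:
        write_string(out, dim)
    write_string(out, m.build_mode)
    out.write(struct.pack(">QQ", m.data_pointer, m.data_length))


def read_meta(inp):
    marker, version = struct.unpack(">II", inp.read(8))
    if marker != COMPOSITE_INDEX_MARKER:
        raise ValueError("not a composite index metadata entry")
    if version != VERSION:
        raise ValueError("unsupported version")
    field_name = read_string(inp)
    field_type = read_string(inp)
    (n,) = struct.unpack(">I", inp.read(4))
    dimensions = [read_string(inp) for _ in range(n)]
    (n,) = struct.unpack(">I", inp.read(4))
    metrics = [(read_string(inp), read_string(inp)) for _ in range(n)]
    seg_docs, max_leaf = struct.unpack(">II", inp.read(8))
    (n,) = struct.unpack(">I", inp.read(4))
    skip_dims = [read_string(inp) for _ in range(n)]
    build_mode = read_string(inp)
    pointer, length = struct.unpack(">QQ", inp.read(16))
    return StarTreeMeta(field_name, field_type, dimensions, metrics, seg_docs,
                        max_leaf, skip_dims, build_mode, pointer, length)
```

A reader would parse this entry first, then use `data_pointer` and `data_length` to locate the corresponding star tree in the `.cid` file.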

Composite Index Data (cid)

Header

  1. Composite Index Marker
  2. Version
  3. Number of nodes

Star Node:

  1. dimension_id
  2. dimension_value
  3. start_doc_id
  4. end_doc_id
  5. aggregate_doc_id
  6. is_star
  7. first_child
  8. last_child
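
As a rough illustration of the `.cid` layout above, the sketch below serializes a header followed by fixed-width star node records. The field widths, the marker and version values, the use of `-1` for "no children", and the assumption that a node's children are stored contiguously between `first_child` and `last_child` (inclusive node indices) are all choices made for this example and are not stated in the issue.

```python
import struct
from typing import NamedTuple

# Hypothetical values: the issue does not define the marker or the version.
MARKER = 0xC0950513
VERSION = 1

# Header: marker, version, number of nodes.
HEADER = struct.Struct(">IIi")
# Node: dimension_id, dimension_value, start_doc_id, end_doc_id,
# aggregate_doc_id, is_star, first_child, last_child (widths are assumed).
NODE = struct.Struct(">iqiiibii")


class StarNode(NamedTuple):
    dimension_id: int
    dimension_value: int
    start_doc_id: int
    end_doc_id: int
    aggregate_doc_id: int
    is_star: int        # 1 for a star node, 0 otherwise
    first_child: int    # index of the first child node, -1 if none
    last_child: int     # index of the last child node, -1 if none


def serialize_tree(nodes):
    """Write the header, then one fixed-width record per node."""
    buf = bytearray(HEADER.pack(MARKER, VERSION, len(nodes)))
    for node in nodes:
        buf += NODE.pack(*node)
    return bytes(buf)


def read_node(data, index):
    """Read a single node by index without decoding the whole tree."""
    marker, _version, count = HEADER.unpack_from(data, 0)
    if marker != MARKER:
        raise ValueError("not a composite index data file")
    if not 0 <= index < count:
        raise IndexError("node index out of range")
    offset = HEADER.size + index * NODE.size
    return StarNode(*NODE.unpack_from(data, offset))


def children(data, node):
    """Return the children of a node, assuming they are stored contiguously."""
    if node.first_child < 0:
        return []
    return [read_node(data, i)
            for i in range(node.first_child, node.last_child + 1)]
```

Because every record has the same size in this sketch, a node can be located by its index alone, which is what lets `first_child` and `last_child` act as pointers into the file.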

sarthakaggarwal97 added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 19, 2024
sarthakaggarwal97 added Indexing & Search Indexing Indexing, Bulk Indexing and anything related to indexing labels Jul 19, 2024
sarthakaggarwal97 self-assigned this Jul 19, 2024
mgodwan added v2.17.0 and removed untriaged labels Jul 22, 2024