Lay the foundation for external sorting and index implementation #3

Open
lffg opened this issue Mar 21, 2023 · 0 comments

Brainstorming.

Generalized Heap Page Operations

Currently, the implementation in exec/query/table/* isn't high-level, contrary to what the name "table" would imply: it deals directly with heap pages. Ideally, such an implementation would be generic over the data stored in the heap's data section, which would avoid the code duplication currently seen in the exec/query/{table, object} implementations.

One problem with making heap operations generic over the data section is how to handle the context that a T: SerdeCtx requires.
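To make the context problem concrete, here is a minimal sketch of one way it could look. Everything in it is an assumption for illustration: `SerdeWithCtx` is a stand-in for the project's `SerdeCtx` (whose real definition may differ), and `HeapPage`, `Row`, and `Schema` are hypothetical names. The idea sketched here is that the context becomes an associated type, and every heap operation takes it as an explicit argument.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the project's `SerdeCtx`; the real trait may differ.
trait SerdeWithCtx: Sized {
    /// Extra information needed to (de)serialize a value, e.g. a table schema.
    type Ctx;
    fn serialize(&self, ctx: &Self::Ctx, out: &mut Vec<u8>);
    /// Reads one value from the front of `input` and advances it.
    fn deserialize(ctx: &Self::Ctx, input: &mut &[u8]) -> Option<Self>;
}

/// Hypothetical heap page, generic over the record type in its data section.
struct HeapPage<T: SerdeWithCtx> {
    bytes: Vec<u8>,
    capacity: usize,
    _marker: PhantomData<T>,
}

impl<T: SerdeWithCtx> HeapPage<T> {
    fn new(capacity: usize) -> Self {
        HeapPage { bytes: Vec::new(), capacity, _marker: PhantomData }
    }

    /// Appends `record` if it fits; returns `false` when the page is full.
    fn try_insert(&mut self, ctx: &T::Ctx, record: &T) -> bool {
        let mut buf = Vec::new();
        record.serialize(ctx, &mut buf);
        if self.bytes.len() + buf.len() > self.capacity {
            return false;
        }
        self.bytes.extend_from_slice(&buf);
        true
    }

    /// Decodes every record stored in the page.
    fn records(&self, ctx: &T::Ctx) -> Vec<T> {
        let mut input = self.bytes.as_slice();
        let mut out = Vec::new();
        while !input.is_empty() {
            match T::deserialize(ctx, &mut input) {
                Some(record) => out.push(record),
                None => break,
            }
        }
        out
    }
}

/// Example record: a row of `u32` columns; the column count lives in the context.
#[derive(Debug, PartialEq)]
struct Row(Vec<u32>);

/// Example context (hypothetical).
struct Schema {
    columns: usize,
}

impl SerdeWithCtx for Row {
    type Ctx = Schema;

    fn serialize(&self, ctx: &Schema, out: &mut Vec<u8>) {
        assert_eq!(self.0.len(), ctx.columns, "row does not match schema");
        for value in &self.0 {
            out.extend_from_slice(&value.to_le_bytes());
        }
    }

    fn deserialize(ctx: &Schema, input: &mut &[u8]) -> Option<Self> {
        let need = ctx.columns * 4;
        let data: &[u8] = *input;
        if need == 0 || data.len() < need {
            return None;
        }
        let (head, tail) = data.split_at(need);
        *input = tail;
        let values = head
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Some(Row(values))
    }
}
```

The trade-off in this shape is that callers must thread the context through every call. Storing the context inside the page handle would be an alternative, at the cost of tying the handle's lifetime to the context.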

Drop current exec/query/{table, object} operations.

Since, as per the above section, the heap implementation would be generalized (becoming a primitive building block of database execution), the table and object handlers would be promoted to high-level interfaces, replacing the current mix of high- and low-level code.

A possible new structure:

  • exec
    • operations (low-level operations)
      • heap
      • index
    • query (high-level user-facing queries)
      • table/*
      • object/*
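As a rough illustration of the layering, the tree above could map to Rust modules along these lines. The module names follow the tree; the functions (`heap::insert`, `table::insert_row`) and the `Vec<Vec<u8>>` storage are placeholders invented for this sketch, not existing code.

```rust
// Hypothetical layout mirroring the proposed tree; all items are placeholders.
pub mod exec {
    /// Low-level building blocks that know about pages and bytes.
    pub mod operations {
        pub mod heap {
            /// Stores raw record bytes and returns the record's slot number.
            pub fn insert(storage: &mut Vec<Vec<u8>>, record: &[u8]) -> usize {
                storage.push(record.to_vec());
                storage.len() - 1
            }
        }

        pub mod index {}
    }

    /// High-level, user-facing queries built only on top of `operations`.
    pub mod query {
        pub mod table {
            use super::super::operations::heap;

            /// Encodes a row and delegates the storage work to the heap layer.
            pub fn insert_row(storage: &mut Vec<Vec<u8>>, columns: &[u32]) -> usize {
                let bytes: Vec<u8> = columns.iter().flat_map(|c| c.to_le_bytes()).collect();
                heap::insert(storage, &bytes)
            }
        }

        pub mod object {}
    }
}
```

The point of the split is the dependency direction: `query` imports from `operations`, never the reverse.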

Bulk Heap Inserts

The database doesn't currently implement bulk heap inserts. They would be useful, for example, for implementing the external sort's "tape" abstraction, which would then not have to write sorted record "runs" manually.

However, in this case, one probably wouldn't use the actual database instance directly. Instead, a new temporary "instance" would be required to avoid unnecessary contention on the first page.

Also, a hypothetical BulkHeapInsert should accept an option to avoid writing bookkeeping information on the primary page, as this isn't needed in some cases (e.g., when storing external sorting runs).
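A minimal in-memory sketch of what such an API might look like, under stated assumptions: `TempHeap` stands in for the temporary "instance", `BulkInsertOptions` and its `update_bookkeeping` flag are hypothetical names, and the bookkeeping is reduced to a single record count. Writing that count once in `finish` rather than per record is a choice made for this sketch, not something the design above prescribes.

```rust
/// Hypothetical options for a bulk insert.
#[derive(Clone, Copy)]
struct BulkInsertOptions {
    /// When `false`, the first page's bookkeeping is left untouched.
    update_bookkeeping: bool,
}

/// In-memory stand-in for a temporary heap "instance" (hypothetical).
struct TempHeap {
    page_capacity: usize,
    /// Each page holds whole records.
    pages: Vec<Vec<Vec<u8>>>,
    /// Stand-in for the bookkeeping kept on the first page.
    first_page_record_count: Option<u64>,
}

impl TempHeap {
    fn new(page_capacity: usize) -> Self {
        TempHeap { page_capacity, pages: Vec::new(), first_page_record_count: None }
    }
}

/// Buffers records into pages and writes each page once it is full.
struct BulkHeapInsert<'a> {
    heap: &'a mut TempHeap,
    options: BulkInsertOptions,
    current: Vec<Vec<u8>>,
    current_bytes: usize,
    inserted: u64,
}

impl<'a> BulkHeapInsert<'a> {
    fn new(heap: &'a mut TempHeap, options: BulkInsertOptions) -> Self {
        BulkHeapInsert { heap, options, current: Vec::new(), current_bytes: 0, inserted: 0 }
    }

    /// Adds one record, starting a new page when the current one would overflow.
    fn push(&mut self, record: &[u8]) {
        let overflows = self.current_bytes + record.len() > self.heap.page_capacity;
        if !self.current.is_empty() && overflows {
            self.flush_page();
        }
        self.current_bytes += record.len();
        self.current.push(record.to_vec());
        self.inserted += 1;
    }

    fn flush_page(&mut self) {
        if !self.current.is_empty() {
            self.heap.pages.push(std::mem::take(&mut self.current));
            self.current_bytes = 0;
        }
    }

    /// Writes the last partial page and, if requested, the bookkeeping.
    /// Returns the number of records inserted.
    fn finish(mut self) -> u64 {
        self.flush_page();
        if self.options.update_bookkeeping {
            let previous = self.heap.first_page_record_count.unwrap_or(0);
            self.heap.first_page_record_count = Some(previous + self.inserted);
        }
        self.inserted
    }
}
```

An external sorting run would create its own `TempHeap`, pass `update_bookkeeping: false`, push the already sorted records, and call `finish`.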
