Dataframe v2: support for clear semantics #7652

Merged Oct 10, 2024 (6 commits)

@teh-cmc commented Oct 9, 2024

Support clear semantics in the dataframe API.
Tombstones are never visible to end-users, only their effect.

Like every other Dataframe v2 feature PR, and following recommendations from @jleibs, this prioritizes convenience of implementation over everything else, for now.
All clear chunks are fetched, post-processed, and re-injected into the view contents during init(), and then the streaming join runs as usual after that.
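
As a rough illustration of the flow described above, here is a toy Python model. It is not the actual implementation: `Row`, `Clear`, `init_view`, and `latest_at` are hypothetical names invented for this sketch, chunks are reduced to single rows, and timestamps are assumed to be distinct so tie-breaking is not modeled. Clears are turned into null rows once, up front, and the query path afterwards only ever sees ordinary rows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    entity: str
    time: int
    value: Optional[str]  # None models a cell emptied by a clear


@dataclass(frozen=True)
class Clear:  # hypothetical stand-in for a tombstone
    entity: str
    time: int
    recursive: bool


def applies_to(clear: Clear, entity: str) -> bool:
    """A clear hits its own entity and, if recursive, every descendant."""
    if clear.entity == entity:
        return True
    prefix = clear.entity.rstrip("/") + "/"
    return clear.recursive and entity.startswith(prefix)


def init_view(rows: List[Row], clears: List[Clear]) -> List[Row]:
    """Toy init step: re-inject each clear as null rows for affected entities."""
    entities = sorted({r.entity for r in rows})
    injected = [
        Row(e, c.time, None) for c in clears for e in entities if applies_to(c, e)
    ]
    # Assumes distinct timestamps per entity; ties are not modeled here.
    return sorted(rows + injected, key=lambda r: r.time)


def latest_at(view: List[Row], entity: str, time: int) -> Optional[str]:
    """Return the most recent value at or before `time`, or None if cleared."""
    value = None
    for row in view:  # view is sorted by time
        if row.entity == entity and row.time <= time:
            value = row.value
    return value


rows = [
    Row("world/points", 1, "a"),
    Row("world/points", 3, "b"),
    Row("world/camera", 1, "cam"),
]
clears = [Clear("world", 2, recursive=True)]
view = init_view(rows, clears)

print(latest_at(view, "world/points", 1))  # value logged before the clear
print(latest_at(view, "world/points", 2))  # cleared
print(latest_at(view, "world/points", 3))  # value logged after the clear
print(latest_at(view, "world/camera", 3))  # still cleared, nothing logged since
```

In this sketch the caller never receives a `Clear` object, only the nulls it produces, which mirrors the "only their effect" behaviour stated in the description.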

Static clear semantics can get pretty unhinged, but that's A) not specific to the dataframe API and B) so extremely niche that our time is better spent on real-world problems right now.


Checklist

  • I have read and agree to the Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • I have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@teh-cmc added the enhancement, 🔍 re_query, do-not-merge, and include in changelog labels Oct 9, 2024
@teh-cmc changed the base branch from main to cmc/dataframev2_chunk_tools October 9, 2024 10:47
@jleibs left a comment

Nice. I like this partitioning. Makes it easier to cache/optimize fetch_clear_chunks in the future with some additional data-structures + the component-refactoring we've discussed.

Base automatically changed from cmc/dataframev2_chunk_tools to main October 10, 2024 07:20

Deployed docs

Commit: 5f92358
Link: https://landing-4xjs9140y-rerun.vercel.app/docs

@teh-cmc removed the do-not-merge label Oct 10, 2024
@teh-cmc merged commit c6d842b into main Oct 10, 2024
27 of 28 checks passed
@teh-cmc deleted the cmc/dataframev2_clears branch October 10, 2024 07:22
teh-cmc added a commit that referenced this pull request Oct 10, 2024
This implements support for "constant-time" pagination.
Obviously it's not constant-time, but it scales in a sane fashion.

This PR is still _not_ about general performance optimizations. It is
the last step before those can start though.

* Fixes #7657 
* DNM: requires #7652
Labels: enhancement, include in changelog, 🔍 re_query
2 participants