Dataframe v2: new and improved chunk tools #7649

teh-cmc · 2024-10-09T08:58:29Z

A bunch of improvements and additions to the Chunk toolbox, made as part of implementing the dataframe v2 API.

Checklist

I have read and agree to Contributor Guide and the Code of Conduct
I've included a screenshot or gif (if applicable)
I have tested the web demo (if applicable):
- Using examples from latest main build: rerun.io/viewer
- Using full set of examples from nightly build: rerun.io/viewer
The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
If applicable, add a new check to the release checklist!
I have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

crates/store/re_chunk/src/chunk.rs

jleibs · 2024-10-09T15:20:06Z

crates/store/re_chunk/src/slice.rs

+    /// WARNING: the returned chunk has the same old [`crate::ChunkId`]! Change it with [`Self::with_id`].
+    #[must_use]
+    #[inline]
+    pub fn components_removed(self) -> Self {


chunk.without_components() reads more intuitively to me but I don't feel strongly

teh-cmc · 2024-10-10

I'm trying (hard) to keep to the seemingly de facto arrow standard of using past participles (I think that's what they're called?) for methods that take ownership, filter, and return a new value.
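
To illustrate the naming convention discussed here, the following is a minimal sketch, not the actual re_chunk implementation. `ToyChunk`, its fields, and `strip_and_reid` are hypothetical names invented for the example, and the sketch assumes that `components_removed` drops every component column, which the diff above does not confirm. Only the method names `components_removed` and `with_id` come from the snippet under review.

```rust
/// Hypothetical stand-in for a chunk: an id plus named component columns.
/// This is NOT the real `re_chunk::Chunk`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyChunk {
    pub id: u64,
    pub components: Vec<(String, Vec<i64>)>,
}

impl ToyChunk {
    /// Takes ownership, drops all component columns, and returns the result.
    /// Assumption: "removed" means every component column is dropped.
    /// The id is left untouched, mirroring the WARNING in the doc comment.
    #[must_use]
    #[inline]
    pub fn components_removed(mut self) -> Self {
        self.components.clear();
        self
    }

    /// Takes ownership and returns the same chunk with a different id.
    #[must_use]
    #[inline]
    pub fn with_id(mut self, id: u64) -> Self {
        self.id = id;
        self
    }
}

/// Hypothetical helper showing how past-participle methods chain:
/// each call consumes its input, so no intermediate bindings are needed.
pub fn strip_and_reid(chunk: ToyChunk, new_id: u64) -> ToyChunk {
    chunk.components_removed().with_id(new_id)
}
```

Because each method consumes `self`, the `#[must_use]` attribute matters: ignoring the return value would silently discard the chunk.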

jleibs · 2024-10-09T15:30:29Z

crates/store/re_chunk/src/slice.rs

+
+    /// Applies a [take] kernel to the [`Chunk`] as a whole.
+    ///
+    /// In release builds, indices are allowed to have null entries (they will be taken as `null`s).


What are the situations that cause us to query with null indices? Seems like returning a ChunkResult here and always making that an error condition would be preferable.

teh-cmc · 2024-10-10

We don't, but this is technically part of the public Rust API, so I don't want to punish end users trying to do something that is perfectly valid and apparently well accepted in the broader ecosystem (whether it's panics or results, they're both extremely annoying in these filter chains).
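
The two options debated in this thread can be sketched on plain slices. This is only an illustration under stated assumptions: `take_nullable`, `take_strict`, and `NullIndexError` are hypothetical names, `Option` stands in for arrow's validity bitmap, and none of this is the actual `Chunk` code or the arrow kernel.

```rust
/// Lenient take (hypothetical): a null index yields a null output slot.
/// Simplification: an out-of-bounds index panics in this sketch.
pub fn take_nullable<T: Clone>(values: &[Option<T>], indices: &[Option<usize>]) -> Vec<Option<T>> {
    indices
        .iter()
        .map(|idx| match idx {
            Some(i) => values[*i].clone(),
            None => None, // null index is taken as a null value
        })
        .collect()
}

/// Error reported by the strict variant (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct NullIndexError {
    /// Position of the first null entry in the index list.
    pub position: usize,
}

/// Strict take (hypothetical): any null index is an error.
/// This models the `Result`-returning alternative raised in review.
pub fn take_strict<T: Clone>(
    values: &[Option<T>],
    indices: &[Option<usize>],
) -> Result<Vec<Option<T>>, NullIndexError> {
    indices
        .iter()
        .enumerate()
        .map(|(position, idx)| match idx {
            Some(i) => Ok(values[*i].clone()),
            None => Err(NullIndexError { position }),
        })
        .collect() // stops at the first error
}
```

The lenient form keeps filter chains free of `?` and `unwrap`, which is the ergonomic argument made in the reply; the strict form surfaces a null index at the point where it occurs.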

@jleibs

Support clear semantics in the dataframe API.

Tombstones are never visible to end-users, only their effect.

Like every other Dataframe v2 feature PR, and following recommendations from @jleibs, this prioritizes convenience of implementation over everything else, for now. All clear chunks are fetched, post-processed, and re-injected into the view contents during init(), and then the streaming join runs as usual after that.

Static clear semantics can get pretty unhinged, but that's A) not specific to the dataframe API and B) so extremely niche that our time is better spent on real-world problems right now:
- #7650
- #7631

---

- Fixes #7495
- Fixes #7414
- Fixes #7468
- Fixes #7493
- DNM: requires #7649
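
The statement that tombstones are never visible to end-users, only their effect, can be illustrated with a toy model. This is a sketch under heavy assumptions and says nothing about how the dataframe API or the streaming join is actually implemented: `Event`, `latest_at`, and the single-column `(time, event)` layout are all hypothetical.

```rust
/// Hypothetical event on a single entity/component: a value or a clear.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    Value(i64),
    /// Tombstone: marks the data as cleared from this time onward.
    Clear,
}

/// Toy latest-at lookup (hypothetical).
/// `events` must be sorted by time in ascending order.
/// Returns the most recent value at or before `t`, or `None` if there is
/// no data yet or the most recent event is a tombstone. The tombstone
/// itself is never returned, only its effect (the value disappears).
pub fn latest_at(events: &[(u64, Event)], t: u64) -> Option<i64> {
    events
        .iter()
        .rev()
        .find(|(time, _)| *time <= t)
        .and_then(|(_, event)| match event {
            Event::Value(v) => Some(*v),
            Event::Clear => None,
        })
}
```

In this model a caller cannot distinguish "cleared" from "never logged", which matches the idea that only the effect of a tombstone is observable.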

new and improved chunk slicing tools

742dc40

teh-cmc added the enhancement (New feature or request), 🔍 re_query (affects re_query itself), and include in changelog labels Oct 9, 2024

teh-cmc mentioned this pull request Oct 9, 2024

Dataframe v2: support for clear semantics #7652

Merged

6 tasks

zehiko approved these changes Oct 9, 2024

jleibs approved these changes Oct 9, 2024

more doc

8cfe25a

teh-cmc merged commit ab69022 into main Oct 10, 2024
30 checks passed

teh-cmc deleted the cmc/dataframev2_chunk_tools branch October 10, 2024 07:20
