Clear field meta data during reading so that it can be processed #2949

Closed

@niyue niyue commented Sep 28, 2024

This PR aims to demonstrate the issue described in LanceDB/lance#2947, where a dataset containing a nested field with associated metadata can be written successfully but cannot be read correctly by Lance.

In this PR, I propose a quick fix that clears the field metadata when reading from the manifest. However, this approach doesn't seem to be a perfect solution to the issue. I also attempted to remove the field metadata right before the decode method’s StructArray::try_new call, but there appear to be multiple places with similar issues, so I opted to clear the metadata once when reading the field from the manifest instead.
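The quick fix described above can be pictured with a minimal sketch. The `Field` struct here is a simplified, hypothetical stand-in (a name, a metadata map, and child fields), not Lance's or Arrow's actual type, and `clear_metadata` and `has_no_metadata` are illustrative names. It only shows the idea of clearing metadata recursively so that nested fields are covered as well.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for a schema field.
/// This is NOT Lance's real `Field` type; it only models what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
    children: Vec<Field>,
}

impl Field {
    /// Build a field from a name, key/value metadata pairs, and child fields.
    fn new(name: &str, metadata: &[(&str, &str)], children: Vec<Field>) -> Self {
        Field {
            name: name.to_string(),
            metadata: metadata
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
            children,
        }
    }

    /// Recursively drop metadata from this field and every nested child.
    fn clear_metadata(&mut self) {
        self.metadata.clear();
        for child in &mut self.children {
            child.clear_metadata();
        }
    }

    /// True when neither this field nor any descendant carries metadata.
    fn has_no_metadata(&self) -> bool {
        self.metadata.is_empty() && self.children.iter().all(Field::has_no_metadata)
    }
}
```

Calling `clear_metadata` once on a top-level field at read time also covers every nested child, which is the appeal of doing it in one place instead of at each decode site.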

It would be great to have a better solution to this issue. Thanks.

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

let mut batch_stream = result.map(|b| b);
let mut batches = Vec::new();
while let Some(batch) = batch_stream.next().await {
    assert!(batch.is_ok());

The error occurs at this point, but for some reason cargo test does not report the exact error message (it crashes without providing useful information). However, if the test case is placed in lance/examples/write_read_ds.rs, the error message described in issue #2947 is displayed.
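As a side note, `assert!(batch.is_ok())` discards the error value, so even when a failure is an ordinary `Err` the message is lost. The sketch below shows one way to carry the error into the failure message. It assumes the failure surfaces as an `Err` (the crash described above may have a different cause), and `collect_or_report` is a hypothetical helper over plain `Result` values, not part of Lance.

```rust
use std::fmt::Debug;

/// Collect all successful items, or return a message naming the first
/// failing item and its error, instead of only checking `is_ok()`.
/// Hypothetical helper for illustration; it works on plain `Result` values.
fn collect_or_report<T, E: Debug>(results: Vec<Result<T, E>>) -> Result<Vec<T>, String> {
    let mut out = Vec::new();
    for (i, r) in results.into_iter().enumerate() {
        match r {
            Ok(v) => out.push(v),
            // Keep the error's Debug output so the cause is visible in the report.
            Err(e) => return Err(format!("batch {i} failed: {e:?}")),
        }
    }
    Ok(out)
}
```

In a test, unwrapping the returned `Result` (or using `expect`) would then print the underlying error text when a batch fails.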

@codecov-commenter

Codecov Report

Attention: Patch coverage is 98.57143% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.01%. Comparing base (681db8c) to head (363e55d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset.rs | 98.48% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2949      +/-   ##
==========================================
+ Coverage   78.99%   79.01%   +0.02%     
==========================================
  Files         234      234              
  Lines       72822    72890      +68     
  Branches    72822    72890      +68     
==========================================
+ Hits        57525    57596      +71     
+ Misses      12332    12325       -7     
- Partials     2965     2969       +4     
| Flag | Coverage Δ |
|---|---|
| unittests | 79.01% <98.57%> (+0.02%) ⬆️ |

niyue commented Sep 29, 2024

Closing this PR in favor of #2950.

@niyue niyue closed this Sep 29, 2024