Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fuse_snapshot only show the latest snapshot for external table #15721

Merged
merged 4 commits into from
Jun 4, 2024

Conversation

dantengsky
Copy link
Member

@dantengsky dantengsky commented Jun 3, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Before this PR, during the traversal of the table history, the application-level data operator is used. However, this approach fails to load the snapshot history of external tables because their storage parameters differ from those of the application-level data operator.

The confusing part is that during the process of writing a new snapshot, we also generate the snapshot cache. Therefore, on the node where data is written to the external table, despite the aforementioned issue, the traversal of the table history snapshot still works because the snapshot can be retrieved from the cache.

However, on other nodes that do not have the snapshot cache mentioned above, the issue persists. This results in fuse_snapshot only being able to return the most recent snapshot (the corresponding table snapshot cache is populated during table instantiation).

  • fix

    • use table's data operator while traversing table history
    • use table's data operator in index refreshing
  • refactors:

    • rename TableContext::get_data_operator to get_application_level_data_operator

      clarify that the data operator got from ctx is NOT Table's operator

    • de-dup code by introducing shared_table::save_share_table_info

  • tweak CI script to test with table meta cache disabled

  • Fixes bug: table function fuse_snapshot only show the latest snapshot for external table  #15724

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

- rename `TableContext::get_data_operator` to `get_application_level_data_operator`

  the clarify the the operator got from ctx is NOT Table's operator

- dedup code by introducing `shared_table::save_share_table_info`
@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Jun 3, 2024
@dantengsky dantengsky marked this pull request as ready for review June 3, 2024 16:36
@dantengsky dantengsky changed the title fix: table function fuse_snapshot only show the latest snapshot for external table fix:fuse_snapshot only show the latest snapshot for external table Jun 3, 2024
@dantengsky dantengsky changed the title fix:fuse_snapshot only show the latest snapshot for external table fix: fuse_snapshot only show the latest snapshot for external table Jun 3, 2024
@BohuTANG BohuTANG merged commit 440eead into databendlabs:main Jun 4, 2024
86 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix this PR patches a bug in codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: table function fuse_snapshot only show the latest snapshot for external table
2 participants