Fulltext stager's cache behavior is in conflict with OpenDAL's cache layer #4594

Open
waynexia opened this issue Aug 20, 2024 · 1 comment
Labels
C-bug Category Bugs

@waynexia
Member

What type of bug is this?

Other

What subsystems are affected?

Datanode

Minimal reproduce step

  • Create a table with a fulltext column
  • Write data
  • Query data

What did you expect to see?

No duplicate ranges in the cached files.

What did you see instead?

Many duplicated ranges, such as 0 ~ 20480 and 4 ~ 20480.

2G of data files consumes about 60G of disk cache.
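To make the blow-up concrete: if each cached range is stored as its own entry, every overlapping byte is paid for once per entry. The following is a minimal sketch (the helper cached_vs_unique is hypothetical, not GreptimeDB code) that compares the bytes stored across all entries with the distinct bytes they actually cover, assuming the reported ranges are well-formed, half-open byte offsets into a single file.

```rust
use std::ops::Range;

/// Hypothetical helper: returns (bytes stored across all cache entries,
/// distinct file bytes those entries cover). Assumes each entry is a
/// well-formed half-open range (start <= end) of the same file.
fn cached_vs_unique(entries: &[Range<u64>]) -> (u64, u64) {
    // Each entry is stored separately, so overlapping bytes count once per entry.
    let stored: u64 = entries.iter().map(|r| r.end - r.start).sum();

    // Sort by start offset, then merge overlapping or touching ranges.
    let mut sorted: Vec<Range<u64>> = entries.to_vec();
    sorted.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in sorted {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end {
                // Overlaps the previous merged range: extend it instead of adding a new one.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }

    // Distinct bytes are the total length of the merged, non-overlapping ranges.
    let unique: u64 = merged.iter().map(|r| r.end - r.start).sum();
    (stored, unique)
}
```

For the two ranges above it returns (40956, 20480): two entries already cost about 2x the bytes they cover. More overlapping entries per file push the ratio further, which is consistent with the roughly 30x (60G / 2G) observed.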

What operating system did you use?

ArchLinux AMD64

What version of GreptimeDB did you use?

nightly

Relevant log output and stack trace

No response

waynexia added the C-bug Category Bugs label on Aug 20, 2024
@zhongzc
Contributor

zhongzc commented Sep 10, 2024

In the S3 scenario, combining OpenDAL's AsyncRead and AsyncSeek abstractions with object_store::layers::LruCacheLayer causes the LRU cache to store the data from the seek point to the end of the file almost every time a seek + read is performed, even when the caller only needs a few bytes.
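A simplified model of the two access patterns may help. It assumes, per the comment above (with "almost every time" rounded up to every time), that a seek + read through the cache layer stores offset..EOF, while an explicit range read stores only offset..offset+len. The function names are hypothetical and do not correspond to OpenDAL or GreptimeDB APIs.

```rust
use std::ops::Range;

/// Hypothetical model: after seek(offset) + read, the cache layer is assumed
/// to fetch and store everything from `offset` to the end of the file.
fn stored_on_seek_then_read(offset: u64, file_len: u64) -> Range<u64> {
    offset..file_len
}

/// Hypothetical model: an explicit range read fetches and stores only the requested bytes.
fn stored_on_range_read(offset: u64, len: u64) -> Range<u64> {
    offset..offset + len
}

/// Total bytes stored in the cache for a sequence of `(offset, len)` reads
/// against one file, under either access pattern.
fn total_cached(reads: &[(u64, u64)], file_len: u64, seek_based: bool) -> u64 {
    reads
        .iter()
        .map(|&(offset, len)| {
            let r = if seek_based {
                stored_on_seek_then_read(offset, file_len)
            } else {
                stored_on_range_read(offset, len)
            };
            r.end - r.start
        })
        .sum()
}
```

For example, a 4-byte read at offset 0 followed by a 64-byte read at offset 4 on a 20480-byte file (sizes chosen only for illustration) caches 40956 bytes in the seek-based model, as the two entries 0 ~ 20480 and 4 ~ 20480, versus 68 bytes with range reads.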

Improvement Plan:

Puffin and the index readers should stop using AsyncRead + AsyncSeek as input sources. Instead, they should use a RangeReader that takes a Range<u64> as its read parameter. This maps directly onto OpenDAL's ranged reads, fetches exactly the requested range from the object store, and avoids caching unnecessary content.
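A minimal sketch of the proposed shape follows. It uses a synchronous trait for brevity (a real reader would presumably be async), and the trait signature is an assumption, not GreptimeDB's actual API. MemFile is a hypothetical in-memory stand-in for an object-store file, and CachedReader is a hypothetical read-through cache keyed by the exact requested range.

```rust
use std::collections::HashMap;
use std::io;
use std::ops::Range;

/// Hypothetical range-based reader: the caller always states exactly which
/// half-open byte range it needs, so no "read until EOF" is implied.
trait RangeReader {
    fn read(&mut self, range: Range<u64>) -> io::Result<Vec<u8>>;
}

/// Hypothetical in-memory stand-in for an object-store file.
struct MemFile(Vec<u8>);

impl RangeReader for MemFile {
    fn read(&mut self, range: Range<u64>) -> io::Result<Vec<u8>> {
        let (start, end) = (range.start as usize, range.end as usize);
        // Reject inverted or out-of-bounds ranges instead of panicking on the slice.
        if start > end || end > self.0.len() {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"));
        }
        Ok(self.0[start..end].to_vec())
    }
}

/// Hypothetical read-through cache keyed by the exact requested range,
/// so each entry holds only the bytes the caller asked for.
struct CachedReader<R> {
    inner: R,
    cache: HashMap<(u64, u64), Vec<u8>>,
}

impl<R: RangeReader> CachedReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, cache: HashMap::new() }
    }

    /// Total bytes currently held in the cache.
    fn cached_bytes(&self) -> usize {
        self.cache.values().map(|v| v.len()).sum()
    }
}

impl<R: RangeReader> RangeReader for CachedReader<R> {
    fn read(&mut self, range: Range<u64>) -> io::Result<Vec<u8>> {
        let key = (range.start, range.end);
        if let Some(hit) = self.cache.get(&key) {
            return Ok(hit.clone());
        }
        // Miss: fetch only the requested range, then remember it.
        let data = self.inner.read(range)?;
        self.cache.insert(key, data.clone());
        Ok(data)
    }
}
```

Because every entry is bounded by the range the caller asked for, a small read can no longer pull the rest of the file into the cache. With this naive keying, overlapping requests such as 0..4 and 0..8 would still be stored separately; coalescing them is a separate concern.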
