Fulltext stager's cache behavior is in conflict with OpenDAL's cache layer #4594

waynexia · 2024-08-20T09:30:40Z

What type of bug is this?

Other

What subsystems are affected?

Datanode

Minimal reproduce step

1. Create a table with a fulltext column.
2. Write data.
3. Query data.

What did you expect to see?

no duplicate ranges in cached files

What did you see instead?

Too many duplicated ranges, such as 0 ~ 20480 and 4 ~ 20480.

2 GB of data files consume ~60 GB of disk cache.

What operating system did you use?

ArchLinux AMD64

What version of GreptimeDB did you use?

nightly

Relevant log output and stack trace

No response

zhongzc · 2024-09-10T12:15:36Z

In the S3 scenario, combining OpenDAL's AsyncRead and AsyncSeek abstractions with object_store::layers::LruCacheLayer causes the LRU cache to store data from the seek point to the end of the file on almost every seek + read operation.
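To illustrate the effect, here is a minimal sketch of the arithmetic, not the actual OpenDAL or LruCacheLayer code. It assumes a simplified model in which every seek + read caches either the range from the seek point to the end of the file, or only the bytes requested. `SeekRead`, `cached_seek_to_end`, `cached_exact`, and `total_bytes` are hypothetical names made up for this example.

```rust
use std::ops::Range;

/// Hypothetical model of one access: seek to `offset`, then read `len` bytes.
struct SeekRead {
    offset: u64,
    len: u64,
}

/// Range that ends up cached if the reader fetches from the seek point
/// to the end of the file (the behavior described in this comment).
fn cached_seek_to_end(op: &SeekRead, file_len: u64) -> Range<u64> {
    op.offset.min(file_len)..file_len
}

/// Range that would be cached if only the requested bytes were fetched.
fn cached_exact(op: &SeekRead, file_len: u64) -> Range<u64> {
    let start = op.offset.min(file_len);
    let end = op.offset.saturating_add(op.len).min(file_len);
    start..end
}

/// Total number of bytes held by a set of cached ranges,
/// counting overlapping ranges once per entry (as separate cache entries).
fn total_bytes(ranges: &[Range<u64>]) -> u64 {
    ranges.iter().map(|r| r.end - r.start).sum()
}
```

With a 20480-byte file and two 4-byte reads at offsets 0 and 4, the seek-to-end model caches 0..20480 and 4..20480 (40956 bytes in total), while the exact model caches 0..4 and 4..8 (8 bytes). The first pair matches the duplicated ranges reported above.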

Improvement Plan:

Puffin and Index should stop using AsyncRead + AsyncSeek as input sources and use RangeReader instead, which takes a Range<u64> as its read parameter. This lets them adapt to OpenDAL, read exactly the requested range from the object store, and avoid caching unnecessary content.
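A rough sketch of what a range-based reader could look like, for discussion only. This is a simplified, synchronous stand-in and not the actual RangeReader trait in the codebase (which may be async and have a different signature). `RecordingStore` is a hypothetical in-memory backend that logs the ranges it is asked for, so the exact ranges that would reach the object store (and the cache) are visible.

```rust
use std::cell::RefCell;
use std::io;
use std::ops::Range;

/// Hypothetical, synchronous stand-in for a range-based reader:
/// the caller states exactly which bytes it needs.
trait RangeReader {
    fn read(&self, range: Range<u64>) -> io::Result<Vec<u8>>;
}

/// Hypothetical in-memory "object store" that records every requested range.
struct RecordingStore {
    data: Vec<u8>,
    requested: RefCell<Vec<Range<u64>>>,
}

impl RecordingStore {
    fn new(data: Vec<u8>) -> Self {
        Self {
            data,
            requested: RefCell::new(Vec::new()),
        }
    }
}

impl RangeReader for RecordingStore {
    fn read(&self, range: Range<u64>) -> io::Result<Vec<u8>> {
        let len = self.data.len() as u64;
        // Reject inverted or out-of-bounds ranges instead of reading to EOF.
        if range.start > range.end || range.end > len {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "range out of bounds",
            ));
        }
        // Only the requested range is fetched, so only it could be cached.
        self.requested.borrow_mut().push(range.clone());
        Ok(self.data[range.start as usize..range.end as usize].to_vec())
    }
}
```

Because the range is an explicit parameter, there is no implicit "read from the current position to the end" step for a cache layer to capture.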

waynexia added the C-bug Category Bugs label Aug 20, 2024

killme2008 assigned zhongzc Aug 23, 2024

zhongzc mentioned this issue Sep 10, 2024

[Tracking Issue] RangeReader as input source of Index #4717

Open

6 tasks

zhongzc linked a pull request Nov 4, 2024 that will close this issue

feat(puffin): apply range reader #4928

Open

3 tasks
