Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

idea: Allow concurrent stat in list #3097

Closed
Xuanwo opened this issue Sep 17, 2023 · 9 comments
Closed

idea: Allow concurrent stat in list #3097

Xuanwo opened this issue Sep 17, 2023 · 9 comments

Comments

@Xuanwo
Copy link
Member

Xuanwo commented Sep 17, 2023

RFC-2779 allows user to list with metakey. However, the stat inside list could make the list process much slower. We should allow concurrent stat during list so that stat could be send concurrently.

let l = op.lister_with(path).metakey(Metakey::ContentLength).concurrent(10).await?;
@morristai
Copy link
Member

Hi @Xuanwo , by "allow concurrent stat during list", do you mean we need to send a stat call to fetch the metadata if the metadata is unknown, and this should be concurrent, but currently, it's sequential, right?

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 8, 2023

do you mean we need to send a stat call to fetch the metadata if the metadata is unknown, and this should be concurrent, but currently, it's sequential, right?

Yep

@morristai
Copy link
Member

Hi @Xuanwo, I want to look into this issue. Can you assign me to this?

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 10, 2023

Hi @Xuanwo, I want to look into this issue. Can you assign me to this?

This will introduce a change to our public API, would you like to start an RFC first?

@morristai
Copy link
Member

Should changing our public API be a necessity? My current thought is to utilize Semaphore and JoinSet to concurrently execute the stat API. Because JoinSet::spawn() will execute the task without requiring a call to await(). As a result, all the tasks that the JoinSet holds will be triggered before calling self.handlers.join_next().await inside poll_next() to obtain the result. Please enlighten me if I miss something 🫣.

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 10, 2023

My current thought is to utilize Semaphore and JoinSet to concurrently execute the stat API. Because JoinSet::spawn() will execute the task without requiring a call to await(). As a result, all the tasks that the JoinSet holds will be triggered before calling self.handlers.join_next().await inside poll_next() to obtain the result. Please enlighten me if I miss something 🫣.

I have some questions in mind:

  • Our users care about not only performance but also memory usage. How can they control the concurrency inside? For example, how to control the Semaphore permits?
  • Can we avoid using Semaphore and JoinSet which introduces extra cost? I didn't take a closer look into this feature, but I thought we could store those features directly?
  • Is this implementaion zero cost to users who don't want to do concurrent stat?
  • How to implement a similar logic to blocking API?

@morristai
Copy link
Member

morristai commented Nov 12, 2023

Hi @Xuanwo, to answer your question, I conducted a small POC to demonstrate the concept, which can be found at #3565. I'd like to ask about your typical approach to benchmarking and testing performance in various scenarios?

  • Our users care about not only performance but also memory usage. How can they control the concurrency inside? For example, how to control the Semaphore permits?
    A: the Semaphore permit is the concurrent value itself.
  • Is this implementaion zero cost to users who don't want to do concurrent stat?
    A: If we can set a toggle switch to turn on if concurrent value is > 1?
  • How to implement a similar logic to blocking API?
    A: I have no clue right now.

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 12, 2023

  • A: the Semaphore permit is the concurrent value itself.

So there is a public API change introduced, we need an RFC. I don't want to delve into the implementation details too early.

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 23, 2023

Thanks to @morristai's bravo work!

@Xuanwo Xuanwo closed this as completed Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants