Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: further separate CLI logic from the API related functionality (see #117) #124

Open
wants to merge 15 commits into
base: v3
Choose a base branch
from

Conversation

Rolv-Apneseth
Copy link

Continuation of #117.

As discussed, some refactors. You will probably want some more done here and that's OK, just let me know what to do.

One thing for sure is the String values in api/check.rs - which ones exactly would you like me to make &str or Cow<'_, str>?

Also, for cli/mod.rs, I could potentially remove the need for the manual call of cmd.execute for each subcommand using the enum_dispatch crate if you're OK with adding a dependency? Fine as it is anyway though in my opinion, as there aren't many subcommands.

Copy link

codspeed-hq bot commented Oct 5, 2024

CodSpeed Performance Report

Merging #124 will not alter performance

Comparing Rolv-Apneseth:refactor-v3 (b863f36) with v3 (a7247e4)

Summary

✅ 6 untouched benchmarks

@jeertmans
Copy link
Owner

Hi @Rolv-Apneseth! I have quickly looked at your PR, it looks very good! I will give it a better look later, but let me answer your first questions.

One thing for sure is the String values in api/check.rs - which ones exactly would you like me to make &str or Cow<'_, str>?

Well, I think &str should be enough, because we will never need to mutate (i.e., overwrite) the text strings, so keeping a reference should be fine. However, If I remember well, Clap makes this hard as you don't get reference to string read from the terminal, but owned string. So parsing text (either pure text or data annotation) from the terminal will required to be able to pass owned data, I fear. Using borrowed data is only useful for when reading files, as we may want to avoid allocating the full content of the files, and read by chunks instead.

So in conclusion, Cow<'_, str> may be the only solution (apart from String).

Also, for cli/mod.rs, I could potentially remove the need for the manual call of cmd.execute for each subcommand using the enum_dispatch crate if you're OK with adding a dependency? Fine as it is anyway though in my opinion, as there aren't many subcommands.

We can give it a try :-)
Dependencies should be fine, especially as we can keep it under the cli feature.

@Rolv-Apneseth
Copy link
Author

I have quickly looked at your PR, it looks very good! I

Great!

Well, I think &str should be enough, because we will never need to mutate (i.e., overwrite) the text strings, so keeping a reference should be fine. However, If I remember well, Clap makes this hard as you don't get reference to string read from the terminal, but owned string. So parsing text (either pure text or data annotation) from the terminal will required to be able to pass owned data, I fear. Using borrowed data is only useful for when reading files, as we may want to avoid allocating the full content of the files, and read by chunks instead.

Sorry, I meant which things in check.rs you actually wanted converted, but from this I'm assuming it's the Request.text? But I am confused how this would be achieved. Do we not need to allocate a String no matter what the source is? As for reading the file by chunks, do you want me to just make this use a BufReader and create and await requests one at a time instead of the way it is currently?

            let text = std::fs::read_to_string(filename)?;
            let requests = request
                .clone()
                .with_text(text.clone())
                .split(self.max_length, self.split_pattern.as_str());
            let response = server_client.check_multiple_and_join(requests).await?;

And for the text.clone() caused by server_client.check_multiple_and_join, I could make that function just return a ResponseWithContext instead of having it convert to Response automatically like it's doing, so we can pass the owned string in and get it back out of the response.

We can give it a try :-)

I've added it in there - I think it fits nicely.

@Rolv-Apneseth
Copy link
Author

I could make that function just return a ResponseWithContext instead of having it convert to Response automatically like it's doing, so we can pass the owned string in and get it back out of the response.

I've pushed some changes there for this, avoiding the need to clone the input text.

@jeertmans
Copy link
Owner

Sorry, I meant which things in check.rs you actually wanted converted, but from this I'm assuming it's the Request.text?

Yes, Request.text and Request.data's inner fields. Mainly because one could create a vector of data annotations from a stream of Markdown tokens, which are usually &str of the initial source.

But I am confused how this would be achieved.

I think using Cow<'_, str> should make the trick.

Do we not need to allocate a String no matter what the source is?

Yes and no. Yes because reqwest will have to allocate a string for the HTML request, but no because we don't actually need the user to provide us an owned String. We don't care because reqwest will allocate anyway. So it's good to provide flexibility.

As for reading the file by chunks, do you want me to just make this use a BufReader and create and await requests one at a time instead of the way it is currently?

            let text = std::fs::read_to_string(filename)?;
            let requests = request
                .clone()
                .with_text(text.clone())
                .split(self.max_length, self.split_pattern.as_str());
            let response = server_client.check_multiple_and_join(requests).await?;

Yes, this would be great. The split-text logic isn't really trivial to solve, because we want to avoid splitting text where it would break the meaning of a sentence, but also we cannot have long-running text either. I think working on a better split-logic can be a next step. For the moment, we can just use BufReader and read the whole file, and feed the server with multiple requests (if text is too large). We could also read the file by chunks, but I think this is a fair assumption to assume that any file we want to check should fit completely in the memory.

And for the text.clone() caused by server_client.check_multiple_and_join, I could make that function just return a ResponseWithContext instead of having it convert to Response automatically like it's doing, so we can pass the owned string in and get it back out of the response.

Yes, the end-goal would be to remove the complex logic from Server and perform multiple requests merging outside, so the server only has one method per HTTP request, nothing more.

@Rolv-Apneseth
Copy link
Author

Hey again. Had a lot of trouble with lifetimes and I could only get 'static to work - not sure I love the whole solution, but if you havea look at the latest commit, is that what you had in mind?

@jeertmans
Copy link
Owner

Hello @Rolv-Apneseth, rapidly looking at your commit, I think you just forgot to add the lifetime parameter in Request -> Request<'source> (I would prefer using 'source over anonymous lifetime names like 'a). This should explain why you could only use 'static.

However, it makes little sense to have reference in responses, because reqwest::RequestBuilder::send returns an owned Response. We can keep owned String in that case :-)

This will replicate a similar behavior to that of reqwest. E.g., the builder takes a reference to be constructed pub fn json<T: Serialize + ?Sized>(self, json: &T) -> RequestBuilder, but the response returns an owned struct: pub async fn json<T: DeserializeOwned>(self) -> Result<T>.

.as_ref()
.ok_or(Error::InvalidRequest("missing text field".to_string()))?;
pub fn try_split(mut self, n: usize, pat: &str) -> Result<Vec<Self>> {
let text = mem::take(&mut self.text)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, one goal would be that try_split returns a reference to &self, so the signature should be as follows:

pub fn try_split<'source>(&'source self, n: usize, pat: &str) -> Result<Vec<Self<'source>>>

Note that we should not need to mutate the actual request, because we only take references.

@Rolv-Apneseth
Copy link
Author

I can have a go again but something was complaining about needing to live longer than 'static. As for Response, that makes sense.

@jeertmans
Copy link
Owner

I can have a go again but something was complaining about needing to live longer than 'static. As for Response, that makes sense.

No issue. If you can't make it compile, just push with 'source, and I will try to fix it myself :-)

@Rolv-Apneseth
Copy link
Author

So this code:

#[cfg_attr(feature = "cli", derive(Args))]
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Hash)]
#[serde(rename_all = "camelCase")]
#[non_exhaustive]
pub struct Request<'source> {
    /// The text to be checked. This or 'data' is required.
    #[cfg_attr(
        feature = "cli",
        clap(short = 't', long, conflicts_with = "data", allow_hyphen_values(true))
    )]
    #[serde(skip_serializing_if = "Option::is_none")]
    pub text: Option<Cow<'source, str>>,

Leads to this error:

error: lifetime may not live long enough
   --> src/api/check.rs:398:15
    |
391 | pub struct Request<'source> {
    |                    ------- lifetime `'source` defined here
...
398 |     pub text: Option<Cow<'source, str>>,
    |               ^^^^^^ requires that `'source` must outlive `'static`

error: lifetime may not live long enough
   --> src/api/check.rs:392:5
    |
391 | pub struct Request<'source> {
    |                    ------- lifetime `'source` defined here
392 |     /// The text to be checked. This or 'data' is required.
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'source` must outlive `'static`
    |
    = note: this error originates in the macro `clap::value_parser` (in Nightly builds, run with -Z macro-backtrace for more info)

So I don't think it will work as it is. Should I take out clap from here and have a separate struct for the request arguments?

@jeertmans
Copy link
Owner

^ requires that 'source must outlive 'static

Oh yeah, I remember why I struggled with &str back then ^^'

Creating a separate struct would work, but that means duplicating a lot of the code. I wonder if there is still a way to allow 'source to take 'static in the case of App::parse, but not in general... This isn't as easy as I imagined

@Rolv-Apneseth
Copy link
Author

I wonder if there is still a way to allow 'source to take 'static in the case of App::parse, but not in general

Not that I can think of anyway. I'd say the options are to either stick with String, use 'static, or a different struct. For the different struct, could an impl From keep fields in sync at least?

@jeertmans
Copy link
Owner

I wonder if there is still a way to allow 'source to take 'static in the case of App::parse, but not in general

Not that I can think of anyway. I'd say the options are to either stick with String, use 'static, or a different struct. For the different struct, could an impl From keep fields in sync at least?

Ok so some news: at the moment, it seems impossible to do that in one struct, see clap-rs/clap#5773 and the related discussion, so I suggest we duplicated the structure: to original Request, with Cow<'source, str>, and the clap-compatible RequestCli, with String. Maybe we can simply avoid duplicating code with a macro rule. If you don't see how to do that, I think I can put have some time during the weekend for this.

@Rolv-Apneseth
Copy link
Author

I think it would have to be a procedural macro right? Can't see how to do that with a declarative one.

I have a bit of time today so I'll have a look but feel free to do it if you have a good idea of how it should be done

@jeertmans
Copy link
Owner

I actually started some work on your branch today, I will hopefully have something working this afternoon :)

@Rolv-Apneseth
Copy link
Author

Oh nice, good luck

@jeertmans
Copy link
Owner

Ok so it compiles with --no-default-features (without CLI), but I fear it isn't possible to conditionally compile the #[deriver(Parser)] based on lifetime $lt == 'static.

I am looking a possible crates that might provide useful tools to avoid rewriting most of the code.

@jeertmans
Copy link
Owner

Unfortunately, I haven't been able to put more work on this PR this weekend, sorry for that.

I fear more and more that the copy/paste option would be the only way of solving this.

@Rolv-Apneseth
Copy link
Author

Hey - been busy moving this weekend. If I have any time in the coming days I can have a look for any crates that could solve this problem - if not, yes, copy paste might have to do for now

@jeertmans
Copy link
Owner

Hey - been busy moving this weekend. If I have any time in the coming days I can have a look for any crates that could solve this problem - if not, yes, copy paste might have to do for now

No issue! Yeah, maybe let's copy and paste (keeping the 'source lifetime for the api/click.rs and use String in cli/checks.rs). The "easiest" solution would be to fix the upstream issue, i.e., do a PR to Clap's repository, but that I will only look at after this month,

@Rolv-Apneseth
Copy link
Author

Yeah couldn't find any crates for this situation.

I do think just copying the struct is better but to provide another option, what do you think of something like this:

pub struct Request<'source> {
    // HACK: keep compiler happy when enabling the cli feature flag
    #[cfg(feature = "cli")]
    #[clap(skip)]
    _lifetime: PhantomData<&'source ()>,
    /// (...docs)
    #[cfg(feature = "cli")]
    #[clap(short = 't', long, conflicts_with = "data", allow_hyphen_values(true))]
    #[serde(skip_serializing_if = "Option::is_none")]
    pub text: Option<Cow<'static, str>>,
    /// (...docs)
    #[cfg(not(feature = "cli"))]
    #[serde(skip_serializing_if = "Option::is_none")]
    pub text: Option<Cow<'source, str>>,
    /// (...docs)
    #[cfg(feature = "cli")]
    #[clap(short = 'd', long, conflicts_with = "text")]
    #[serde(skip_serializing_if = "Option::is_none")]
    pub data: Option<Data<'static>>,
    /// (...docs)
    #[cfg(not(feature = "cli"))]
    #[serde(skip_serializing_if = "Option::is_none")]
    pub data: Option<Data<'source>>,

As for the upstream issue - I had a look but I believe it's beyond my skills. Needs quite a bit of familiarity with how their derive macros work, and the author seems to agree having assigned it that E-hard tag. If you want to hold this off until you have a look at making a PR yourself though, that would be fine by me.

@jeertmans
Copy link
Owner

e struct is better but to provide another option,

This is not a wrong idea, but the issue is that compiling with cli will then enforce 'static lifetime everywhere. Which is something we want to avoid even in the CLI, when we read text from files with ltrs check FILES...

This is an edge case, but we don't want to have to keep a static lifetime reference to the content of each file, where we don't need it. So copying the struct is the only we to both have Clap happy and still be able to use non-static lifetime when desired.

As for the upstream issue - I had a look but I believe it's beyond my skills. Needs quite a bit of familiarity with how their derive macros work, and the author seems to agree having assigned it that E-hard tag. If you want to hold this off until you have a look at making a PR yourself though, that would be fine by me.

I agree, this is not easy, especially as it involves proc macros!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants