Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quotes from doc comments are unnecessarily escaped #885

Open
andreymal opened this issue Sep 3, 2024 · 6 comments
Open

Quotes from doc comments are unnecessarily escaped #885

andreymal opened this issue Sep 3, 2024 · 6 comments
Labels
bug c: register Register classes, functions and other symbols to GDScript

Comments

@andreymal
Copy link
Contributor

I literally copied godot/tests/docs.rs into a new project and got this:

gdext-escaped-quotes

The quotes are "double"-escaped because of Rust string literals:

.flat_map(|doc| {
doc.into_iter().map(|x| {
x.to_string()
.trim_start_matches('r')
.trim_start_matches('#')
.trim_start_matches('"')
.trim_end_matches('#')
.trim_end_matches('"')
.to_string()
})
})

Can probably be fixed with litrs:

        .flat_map(|doc| {
            doc.into_iter().map(|x| {
                litrs::StringLit::parse(x.to_string()).unwrap().value().to_string()
            })
        })

but not sure adding a new dependency is a good idea

@Bromeon Bromeon added bug c: register Register classes, functions and other symbols to GDScript labels Sep 3, 2024
@Bromeon
Copy link
Member

Bromeon commented Sep 3, 2024

The discussion about litrs has come up before, not only for registered docs, but also for proc-macro attribute parsing. It's quite a lightweight crate, so if it helps simplify our code in many places, we can maybe consider it? 🤔

What didn't convince me last time was the suggestion to create extra intermediate representations for input literals -- if at all, litr types should just be used at the parsing stage, and then converted to canonical Rust ones (String, int, etc).

Maybe we should check more concretely which other places could benefit from it?

@Yarwin
Copy link
Contributor

Yarwin commented Sep 23, 2024

I believe you linked wrong snippet:

'"' => result.push_str("""),
. The one you linked is crudely filtered (with .filter(|x| x.get_single_path_segment().is_some_and(|x| x == "doc"))) to handle inline attributes such as:

    #[doc = r#"this is very documented"#]
    #[var]
    item: f32,

@andreymal
Copy link
Contributor Author

andreymal commented Sep 23, 2024

@Yarwin you've probably missed the comment before this snippet:

/// `///` is expanded to `#[doc = "…"]`.

The doc comments are expanded to regular, non-raw strings. This means that quotes inside these doc strings are always escaped with \ (not "), and the linked snippet handles them, so it is correct and your example is not relevant here.

Also, note that some other characters are also escaped. For example, try using a tab inside a doc string:

/// a	b

You'll get a\tb in Godot. This clearly demonstrates that xml_escape is absolutely unrelated here

@Yarwin
Copy link
Contributor

Yarwin commented Sep 23, 2024

You are right, sorry for doubting you! – it was a bit late.
Alternatively, we can just convert all the escaped characters to non escaped ones 🙃. It works albeit is a bit crude and unelegant. (very naive ofc) PoC:

image

pub fn unescape(escaped: &str) -> String {
    let mut unescaped_string = String::new();
    let mut iterator = escaped.chars();
    while let Some(c) = iterator.next() {
        match c {
            '\\' => match iterator.next() {
                Some('t') => {
                    unescaped_string.push_str(r#"   "#)
                },
                Some('"') => {
                    unescaped_string.push('"')
                },
                Some(other) => {
                    unescaped_string.push(c);
                    unescaped_string.push(other);
                }
                None => unescaped_string.push(c),
            },
            _ => unescaped_string.push(c)
        }
    }

    unescaped_string
}


()
                unescape(&x.to_string()
                    .trim_start_matches('r')
                    .trim_start_matches('#')
                    .trim_start_matches('"')
                    .trim_end_matches('#')
                    .trim_end_matches('"'))
            })

(obviously replace(r#"\""#, r#"""#) replace(r#"\t"#, r#" "#) works too, albeit is a bit slower)

@Bromeon
Copy link
Member

Bromeon commented Sep 23, 2024

Or we try with litrs 😉

@andreymal
Copy link
Contributor Author

we can just convert all the escaped characters to non escaped ones

This is what I originally planned to do, but then I opened The Rust Reference and got #[doc = "sc\x61r\u{0065}d"] 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug c: register Register classes, functions and other symbols to GDScript
Projects
None yet
Development

No branches or pull requests

3 participants