Use of &nbsp; by rustdoc makes APIs not copyable into Rust source code #106098

Closed
dtolnay opened this issue Dec 23, 2022 · 5 comments · Fixed by #107615
Labels
regression-untriaged Untriaged performance or correctness regression. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@dtolnay
Member

dtolnay commented Dec 23, 2022

Suppose we have this trait:

// lib.rs

pub trait Trait {
    fn method<T>()
    where
        T: Default;
}

Run cargo doc; it renders as:

Now let's begin writing an impl of this trait:

struct MyStruct;

impl Trait for MyStruct {
}

and copy in the method's signature directly from the documentation:

struct MyStruct;

impl Trait for MyStruct {
    fn method<T>()
    where
        T: Default {}
}

We get this distressing error from cargo check:

error: unknown start of token: \u{a0}
  --> src/main.rs:11:1
   |
11 |     where
   | ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
11 |     where
   | +

error: unknown start of token: \u{a0}
  --> src/main.rs:11:2
   |
11 |     where
   |  ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
11 |     where
   |  +

error: unknown start of token: \u{a0}
  --> src/main.rs:11:3
   |
11 |     where
   |   ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
11 |     where
   |   +

error: unknown start of token: \u{a0}
  --> src/main.rs:11:4
   |
11 |     where
   |    ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
11 |     where
   |    +

error: unknown start of token: \u{a0}
  --> src/main.rs:12:1
   |
12 |         T: Default {}
   | ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
12 |         T: Default {}
   | +

error: unknown start of token: \u{a0}
  --> src/main.rs:12:2
   |
12 |         T: Default {}
   |  ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
12 |         T: Default {}
   |  +

error: unknown start of token: \u{a0}
  --> src/main.rs:12:3
   |
12 |         T: Default {}
   |   ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
12 |         T: Default {}
   |   +

error: unknown start of token: \u{a0}
  --> src/main.rs:12:4
   |
12 |         T: Default {}
   |    ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
12 |         T: Default {}
   |    +

error: unknown start of token: \u{a0}
  --> src/main.rs:12:5
   |
12 |         T: Default {}
   |     ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
12 |         T: Default {}
   |     +

error: unknown start of token: \u{a0}
  --> src/main.rs:12:6
   |
12 |         T: Default {}
   |      ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
12 |         T: Default {}
   |      +

error: unknown start of token: \u{a0}
  --> src/main.rs:12:7
   |
12 |         T: Default {}
   |       ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
12 |         T: Default {}
   |       +

error: unknown start of token: \u{a0}
  --> src/main.rs:12:8
   |
12 |         T: Default {}
   |        ^
   |
help: Unicode character ' ' (No-Break Space) looks like ' ' (Space), but it is not
   |
12 |         T: Default {}
   |        +
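As far as I understand the Rust Reference, rustc rejects these characters because the lexer only treats the Unicode Pattern_White_Space set as whitespace, and U+00A0 is not in that set even though it carries the broader White_Space property. Here is a minimal sketch of that distinction. The function name is_lexer_whitespace is hypothetical, and the code only mirrors the documented character set; it is not rustc's implementation.

```rust
/// Hypothetical helper mirroring the Pattern_White_Space set that the
/// Rust Reference lists as whitespace. Not rustc's actual code.
fn is_lexer_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}' // tab, LF, vertical tab, form feed, CR
            | '\u{0020}'        // space
            | '\u{0085}'        // next line
            | '\u{200E}'        // left-to-right mark
            | '\u{200F}'        // right-to-left mark
            | '\u{2028}'        // line separator
            | '\u{2029}'        // paragraph separator
    )
}
```

Under this sketch an ordinary space is accepted while '\u{a0}' is not, although the standard library's char::is_whitespace returns true for both. That mismatch is why the pasted indentation looks fine in an editor yet fails to lex.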

Hopefully there is some other way rustdoc would be able to display the same content in a way that doesn't suffer this limitation.

Is this just as simple as using white-space: pre; in the CSS and then writing ordinary spaces instead of &nbsp;? If so, that seems definitely worthwhile doing.
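To make the suggestion concrete, here is a rough sketch of the two indentation strategies. All three function names are hypothetical and this is not rustdoc's actual rendering code; it only contrasts emitting &nbsp; entities with emitting ordinary spaces, which would rely on the enclosing element being styled white-space: pre so the browser does not collapse them.

```rust
/// Hypothetical: indentation emitted as HTML entities, as this issue
/// describes the current output.
fn indent_with_nbsp(width: usize) -> String {
    "&nbsp;".repeat(width)
}

/// Hypothetical alternative: ordinary spaces. These only survive HTML
/// whitespace collapsing if the element uses `white-space: pre`.
fn indent_with_spaces(width: usize) -> String {
    " ".repeat(width)
}

/// Renders the example signature from this issue with the given
/// one-level indentation string.
fn render_where_clause(indent: &str) -> String {
    format!("fn method<T>()\n{indent}where\n{indent}{indent}T: Default;")
}
```

With indent_with_spaces(4), the rendered text is already valid Rust when copied. With indent_with_nbsp(4), what ends up on the clipboard depends on how the browser translates the entity.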

@dtolnay dtolnay added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Dec 23, 2022
@dtolnay
Member Author

dtolnay commented Dec 23, 2022

For now the best workaround I've found is to comment out the whole copied code using a block comment:

/*
impl Trait for MyStruct {
    fn method<T>()
    where
        T: Default {}
}
*/

Then run cargo fmt, which deletes all the U+00A0 characters (it is unclear whether this behavior is a rustfmt bug):

/*
impl Trait for MyStruct {
    fn method<T>()
where
T: Default {}
}
*/

Then you can uncomment that code:

impl Trait for MyStruct {
    fn method<T>()
where
T: Default {}
}

and finally run rustfmt again to get normal spaces:

impl Trait for MyStruct {
    fn method<T>()
    where
        T: Default,
    {
    }
}
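An alternative to the comment-and-format round trip is to replace the characters directly before pasting. A minimal sketch, assuming a hypothetical helper named replace_nbsp (not part of any existing tool):

```rust
/// Hypothetical helper: replace every U+00A0 (no-break space) in the
/// pasted text with an ordinary ASCII space.
///
/// Caveat: this is a blunt textual replacement, so it also rewrites any
/// U+00A0 that appears inside string literals or comments.
fn replace_nbsp(src: &str) -> String {
    src.replace('\u{a0}', " ")
}
```

Because each no-break space becomes exactly one ordinary space, the original indentation is preserved rather than dropped, so a single rustfmt pass afterwards should be enough.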

@est31
Member

est31 commented Dec 23, 2022

The behaviour in rustdoc is pretty old; it has occurred since at least 1.12 (I haven't checked further back).

What is new, though, is Firefox's behaviour of translating &nbsp; to U+00A0. They changed this recently in version 103; it is mentioned in the release notes and tracked in a Bugzilla issue. There are years-old comments, however, saying that Chrome had similar behaviour.

@dtolnay
Member Author

dtolnay commented Dec 24, 2022

That lines up. Yes, I am using Firefox. I have been copying code from rustdoc-rendered documentation for years and don't remember it being a problem until recently.

Would it be worth triaging this as a regression, due to the recent regression in user experience, despite not being caused by a code change in rustdoc?

@dtolnay dtolnay added the regression-untriaged Untriaged performance or correctness regression. label Dec 24, 2022
@est31
Member

est31 commented Dec 24, 2022

Would it be worth triaging this as a regression, due to the recent regression in user experience, despite not being caused by a code change in rustdoc?

No idea really. IIRC the main purpose of those labels is to track regressions caused by rustc changes. Chromium users have had this behaviour for 4+ years, maybe even since Chromium has existed, so for them this is not a regression but has been the status quo.

This doesn't mean that this isn't a rustc side bug though; copying code from generated rustdoc should work. I've done it in the past myself. I usually only copy single lines, however, and don't copy the indentation, so I didn't notice this change.

@est31
Member

est31 commented Dec 24, 2022

More Firefox issue threads:

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 16, 2023
Emit only one nbsp error per file

Fixes rust-lang#106101.

See rust-lang#106098 for an explanation of how someone would end up with a large number of these nbsp characters in their source code, which is why I think rustc needs to handle this specific case in a friendlier way.
bors added a commit to rust-lang/miri that referenced this issue Jan 23, 2023
Emit only one nbsp error per file

Fixes #106101.

See rust-lang/rust#106098 for an explanation of how someone would end up with a large number of these nbsp characters in their source code, which is why I think rustc needs to handle this specific case in a friendlier way.
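The idea of reporting the problem once per file rather than once per character could be sketched roughly as follows. This is a hypothetical illustration only: the function name summarize_nbsp and the message wording are assumptions, and this is not how rustc implements the change.

```rust
/// Hypothetical diagnostic summary, not rustc's actual implementation.
/// Returns one message naming the first U+00A0 (1-based line:column,
/// columns counted in chars) plus the total count, or None if the
/// source contains no U+00A0 at all.
fn summarize_nbsp(src: &str) -> Option<String> {
    let mut first: Option<(usize, usize)> = None;
    let mut count = 0usize;
    for (line_idx, line) in src.lines().enumerate() {
        for (col_idx, ch) in line.chars().enumerate() {
            if ch == '\u{a0}' {
                count += 1;
                // Remember only the first position we encounter.
                first.get_or_insert((line_idx + 1, col_idx + 1));
            }
        }
    }
    first.map(|(line, col)| {
        format!(
            "unknown start of token: \\u{{a0}} at {line}:{col} ({count} occurrences in this file)"
        )
    })
}
```

For the pasted trait impl in this issue, such a summary would produce a single message instead of twelve near-identical errors.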
@bors bors closed this as completed in f7210b3 Feb 4, 2023