Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong unicode char reporting. #23682

Closed
lunny opened this issue Mar 24, 2023 · 12 comments · Fixed by #28454
Closed

Wrong unicode char reporting. #23682

lunny opened this issue Mar 24, 2023 · 12 comments · Fixed by #28454
Labels

Comments

@lunny
Copy link
Member

lunny commented Mar 24, 2023

image

@lunny lunny added the type/bug label Mar 24, 2023
@wolfogre
Copy link
Member

It is a Chinese unicode , my typo.

@lunny lunny reopened this Mar 24, 2023
@silverwind
Copy link
Member

I agree, this detection is still far too agressive. It should only warn on actual suspicious stuff, like RTL characters and such, not "uncommon" punctations.

@wxiaoguang
Copy link
Contributor

Also agree that "the detection is too aggressive"

At least, I think we can make the detection skip the Markdown/HTML/XML/etc types. That detection only makes sense for source code files.

@silverwind
Copy link
Member

silverwind commented Jun 5, 2023

I would prefer a solution that adds a config option ui.CHARACTER_HIGHLIGHT=bidi,ambiguous that triggers on certain character classes:

  1. bidi: Nefarious BIDI characters like RTL Override character. I think this list on Wikipedia is accurate.
  2. ambiguous: Also highlight ambiguous characters like non-standard whitespace and puncation.

The default should be bidi to cover CVE-2021-42574, but admins could also set it to empty string to disable the checking completely.

@silverwind
Copy link
Member

At least, I think we can make the detection skip the Markdown/HTML/XML/etc types. That detection only makes sense for source code files.

No, don't skip any files. BIDI exploits can also be in the filename. From the linked page:

when using it in the file name my-text.'U+202E'cod.exe, the file name is actually displayed as my-text.exe.doc

@wxiaoguang
Copy link
Contributor

I think it worries too much.

https://github.com/wxiaoguang/playground/wiki/test-202e

$ xxd playground.wiki/test-202e.md
00000000: 6162 63e2 80ae 7879 7a                   abc...xyz

image

@silverwind
Copy link
Member

silverwind commented Jun 5, 2023

GitHub only does it for code. I guess that is where it matters the most. So we could do two options with these defaults:

[ui]
CHARACTER_HIGHLIGHT=bidi
CHARACTER_HIGHLIGHT_INCLUDE_MARKUP=false

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jun 5, 2023

GitHub only does it for code.

Yup, for "source code" only. For markdown in code repo:

image

GitHub only does it for "source code", not for rendered contents.


Update: it seems that the "display invisible char" button on the github page is broken .... at least it doesn't work for me 😂

@silverwind
Copy link
Member

Yeah, that's what I meant, source code, e.g. stuff that renders in a monospace font 😉.

@jirutka
Copy link

jirutka commented Jan 5, 2024

It is a Chinese unicode ’, my typo.

No, it’s not; U+2019 (’) is Right Single Quotation Mark. This is the true (typographic) single quotation mark, unlike U+0027 Apostrophe (').

This got my attention because of this nonsensual and stupid PR in one of my repositories replacing correct characters with their plainpoor ASCII alternatives. Please remove this ignorant “feature”.

@techknowlogick
Copy link
Member

@jirutka there is a a recently merged config option to disable this highlighting. As for removing it (or disabling by default), due to it being a claimed security issue it is required to have, otherwise we risk having a CVE opened that'll apply to all versions (and ones in the future). If that happens then that prevents many people from running the software (there are many places that can't run software with known CVEs regardless of context, sometimes this is legislated by law too).

@go-gitea go-gitea locked and limited conversation to collaborators Jan 5, 2024
@wxiaoguang
Copy link
Contributor

You could set this option: [ui].AMBIGUOUS_UNICODE_DETECTION=false (Gitea >= 1.21.3)

image

fuxiaohei pushed a commit to fuxiaohei/gitea that referenced this issue Jan 17, 2024
silverwind pushed a commit to silverwind/gitea that referenced this issue Feb 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants