Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix commenting on non-utf8 encoded files #11916

Merged

Conversation

zeripath
Copy link
Contributor

@zeripath zeripath commented Jun 16, 2020

The original implementation of models.Comment.Patch maps this to an SQL TEXT field. This works fine when the line being commented on is UTF-8 however, if there are non-UTF-8 characters this fails on MySQL and PostgreSQL databases.

This PR changes this field to quote it if necessary.

Signed-off-by: Andrew Thornton [email protected]

@zeripath zeripath added type/bug pr/wip This PR is not ready for review labels Jun 16, 2020
@zeripath zeripath added this to the 1.13.0 milestone Jun 16, 2020
@zeripath
Copy link
Contributor Author

zeripath commented Jun 16, 2020

Finally! This PR now replicates the errors seen in #7270 & #10393

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jun 16, 2020
@zeripath zeripath force-pushed the fix-7270-use-blob-for-comment-patch branch from 9c71bfa to 77cb452 Compare June 17, 2020 12:13
@zeripath zeripath changed the title WIP: Add comment on non-unicode line to force fail Fix commenting on non-utf8 encoded files Jun 17, 2020
models/migrations/v143.go Outdated Show resolved Hide resolved
models/migrations/v143.go Outdated Show resolved Hide resolved
models/issue_comment.go Outdated Show resolved Hide resolved
models/migrations/v143.go Outdated Show resolved Hide resolved
@zeripath
Copy link
Contributor Author

Hmm... seems like Blob won't work without adding functions to convert between blob and text.

In which case we may as well keep it as TEXT and just escape the bad chars.

models/issue_comment.go Outdated Show resolved Hide resolved
@lafriks
Copy link
Member

lafriks commented Jun 17, 2020

You should change struct Patch to byte array instead of string

@zeripath zeripath force-pushed the fix-7270-use-blob-for-comment-patch branch from 0d9611e to cb9374d Compare June 17, 2020 19:26
@zeripath
Copy link
Contributor Author

I think the simplest thing is to shadow the Patch column and quote/unquote as necessary.

@zeripath zeripath removed the pr/wip This PR is not ready for review label Jun 17, 2020
@zeripath zeripath force-pushed the fix-7270-use-blob-for-comment-patch branch from d7c5f4f to 015f9ce Compare June 17, 2020 20:27
@lafriks
Copy link
Member

lafriks commented Jun 17, 2020

Can't we just convert to UTF-8 when saving if it is not valid UTF-8 like we do it for displaying with detect encoding?

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jun 18, 2020
@zeripath
Copy link
Contributor Author

So I've thought about that but:

a) We don't have a long enough piece of text to detect the encoding and we can't assume that we will always be enable to detect the encoding.
b) How do you demonstrate changes of encoding? It's entirely possible that one side of the change has a different encoding to the other.
c) That breaks the patch - admittedly we don't use the patch at present except for demonstration but I am working on something at present that might use the patch.

Keeping the broken bytes is really better as we can use them to improve our rendering in future.

Copy link
Member

@6543 6543 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to fix it this is totaly ok - we should just konsiddert to refactor&improve this at some point

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jun 18, 2020
@6543
Copy link
Member

6543 commented Jun 18, 2020

🚀

@techknowlogick techknowlogick merged commit 654a970 into go-gitea:master Jun 18, 2020
@6543
Copy link
Member

6543 commented Jun 18, 2020

I'll send a backport

6543 pushed a commit to 6543-forks/gitea that referenced this pull request Jun 18, 2020
* Add comment on non-unicode line to force fail

Signed-off-by: Andrew Thornton <[email protected]>

* Just quote/unquote patch

Signed-off-by: Andrew Thornton <[email protected]>
@zeripath zeripath deleted the fix-7270-use-blob-for-comment-patch branch June 18, 2020 14:14
@techknowlogick techknowlogick added the backport/done All backports for this PR have been created label Jun 18, 2020
lafriks pushed a commit that referenced this pull request Jun 18, 2020
* Add comment on non-unicode line to force fail

Signed-off-by: Andrew Thornton <[email protected]>

* Just quote/unquote patch

Signed-off-by: Andrew Thornton <[email protected]>

Co-authored-by: zeripath <[email protected]>
ydelafollye pushed a commit to ydelafollye/gitea that referenced this pull request Jul 31, 2020
* Add comment on non-unicode line to force fail

Signed-off-by: Andrew Thornton <[email protected]>

* Just quote/unquote patch

Signed-off-by: Andrew Thornton <[email protected]>
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/done All backports for this PR have been created lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants