Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix utf8 length handling for shellwords #5738

Merged
merged 1 commit into from
Feb 1, 2023

Conversation

trink
Copy link
Contributor

@trink trink commented Jan 30, 2023

If the last argument to shellwords ends in a multibyte utf8 character the entire argument will be dropped.
e.g. :sh echo test1 test2𒀀 will only output test1

@trink trink mentioned this pull request Jan 30, 2023
@grasshorse
Copy link

I see it... thanks.

@the-mikedavis the-mikedavis added C-bug Category: This is a bug A-core Area: Helix core improvements S-waiting-on-review Status: Awaiting review from a maintainer. labels Jan 31, 2023
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Mixed up utf-8 and utf-32 again :D


#[test]
fn test_multibyte_at_end() {
assert_eq!(Shellwords::from("𒀀").parts(), &["𒀀"]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked your examples in the PR description, let's add cases for them to make sure we don't have regressions in the future 👍

Suggested change
assert_eq!(Shellwords::from("𒀀").parts(), &["𒀀"]);
assert_eq!(Shellwords::from("𒀀").parts(), &["𒀀"]);
assert_eq!(Shellwords::from(":sh echo 𒀀").parts(), &[":sh", "echo", "𒀀"]);
assert_eq!(Shellwords::from(":sh echo hello𒀀").parts(), &[":sh", "echo", "hello𒀀"]);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the change as a separate commit as it is generally cleaner to review (as opposed to an amend). I can squash them if you want. What is the general preference for this project?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewers generally squash themselves for small PRs like this if they find it necessary. I don't think it matters much either way,

If the last argument to shellwords ends in a multibyte utf8 character
the entire argument will be dropped.
e.g. `:sh echo test1 test2𒀀` will only output `test1`

Add additional tests based on the code review feedback
@the-mikedavis
Copy link
Member

(Force pushed without changes to trigger CI. I'm not sure why it didn't run the first time 🤔)

@the-mikedavis the-mikedavis changed the title Fix the shellwords truncation of a word ending in a multibyte character Fix utf8 length handling for shellwords Feb 1, 2023
@the-mikedavis the-mikedavis merged commit 62d046f into helix-editor:master Feb 1, 2023
@trink trink deleted the shellwords_utf8 branch March 2, 2024 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements C-bug Category: This is a bug S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants