Fix UTF32 characters breaking sentence parser due to directly accessing string array indexes #1452

Merged 1 commit on Oct 3, 2024


@Kuuuube (Member) commented Oct 3, 2024

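UTF32 characters (characters outside the Basic Multilingual Plane, such as most emoji) take up two indexes in JS strings, because JS strings are sequences of UTF-16 code units and these characters are stored as surrogate pairs. The sentence parser was indexing the string directly, which caused an incorrect offset to be applied to the sentence. As a result, the {cloze-prefix}, {cloze-body}, and {cloze-suffix} handlebars selected the wrong text.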
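To make the offset problem concrete, here is a minimal sketch of how the two ways of counting diverge. The sample string and variable names are hypothetical and are not taken from the Yomitan source.

```javascript
// Hypothetical sample: an emoji outside the BMP followed by Japanese text.
const sampleText = '🚫猫が好き';

// .length counts UTF-16 code units, so the emoji contributes 2.
const codeUnitCount = sampleText.length; // 6

// The spread operator iterates by code point, so the emoji contributes 1.
const codePointCount = [...sampleText].length; // 5

// The same character sits at different offsets depending on the unit used.
const codePointOffset = [...sampleText].indexOf('猫'); // 1
const codeUnitOffset = sampleText.indexOf('猫'); // 2

// Applying a code point offset directly to the string selects the second
// half of the emoji's surrogate pair instead of the intended character.
const wrongSelection = sampleText.slice(codePointOffset, codePointOffset + 1);
console.log(wrongSelection === '猫'); // false
```

Every such character that precedes the target shifts the offset by one more index, which is why the cloze handlebars ended up with the wrong text.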

Converting the text to an array before parsing splits it by code point, so each UTF32 character occupies a single index and the parser's offsets line up with the characters again.
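A rough sketch of the array-based approach follows. `splitCloze` is a hypothetical helper written for illustration only; it is not the function changed in this PR, and it assumes the start and end offsets are expressed in code points.

```javascript
// Hypothetical helper: split text into cloze parts using code point offsets.
function splitCloze(text, start, end) {
    // Array.from iterates by code point, so a surrogate pair becomes one element.
    const chars = Array.from(text);
    return {
        prefix: chars.slice(0, start).join(''),
        body: chars.slice(start, end).join(''),
        suffix: chars.slice(end).join(''),
    };
}

const cloze = splitCloze('🚫猫が好き', 1, 2);
console.log(cloze.prefix); // 🚫
console.log(cloze.body); // 猫
console.log(cloze.suffix); // が好き
```

Note that `Array.from` splits by code point, not by grapheme cluster, so sequences built from several code points (for example emoji joined with a zero-width joiner) still span more than one index.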

Discovered on this page: https://learnjapanese.moe/texthooker.html. It has a UTF32 emoji character in the top right (🚫) which gets pulled into the sentence parser.

@Kuuuube added the kind/bug and area/anki labels Oct 3, 2024
@Kuuuube requested a review from a team as a code owner October 3, 2024 02:02
@jamesmaa added this pull request to the merge queue Oct 3, 2024
Merged via the queue into yomidevs:master with commit 9d549ec Oct 3, 2024
11 checks passed
austinyu12 pushed a commit to austinyu12/yomitan that referenced this pull request Oct 22, 2024