Fix UTF32 characters breaking sentence parser due directly accessing string array indexes #1452

Kuuuube · 2024-10-03T02:02:40Z

UTF32 characters (code points above U+FFFF, which JS stores as a surrogate pair) take up two indexes in JS strings. The sentence parser was indexing the string directly, so an incorrect offset was applied to the sentence. This causes the {cloze-prefix}, {cloze-body}, and {cloze-suffix} handlebars to select the wrong text.

Converting the text to an array before parsing combines UTF32 characters into a single index.
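The difference between the two indexing schemes can be sketched as follows. This is a minimal illustration, not the code from this PR: `Array.from` is assumed as the conversion method, and `sliceByCodePoint` is a hypothetical helper that does not exist in the Yomitan source.

```javascript
// Sample text containing the emoji from the texthooker page.
const text = 'A🚫B';

// Direct indexing counts UTF-16 code units, so the emoji spans two indexes.
const codeUnitLength = text.length;
const brokenHalf = text[1]; // a lone high surrogate, not the whole emoji

// Array.from iterates by code point, so the emoji occupies a single index.
const chars = Array.from(text);
const codePointLength = chars.length;
const emoji = chars[1];

// Hypothetical helper (an assumption for illustration only):
// slice by code point offsets instead of code unit offsets.
function sliceByCodePoint(source, start, end) {
    return Array.from(source).slice(start, end).join('');
}

console.log(codeUnitLength, codePointLength); // 4 3
console.log(sliceByCodePoint(text, 2, 3)); // B
```

Note that `Array.from` splits by code point rather than by grapheme cluster, so sequences built from several code points (for example an emoji with a variation selector) would still occupy more than one index.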

Discovered on this page: https://learnjapanese.moe/texthooker.html. It has a UTF32 emoji character in the top right (🚫) which gets pulled into the sentence parser.

Fix UTF32 characters breaking sentence parser due directly accessing string array indexes

13bd968

Kuuuube added the kind/bug and area/anki labels Oct 3, 2024

Kuuuube requested a review from a team as a code owner October 3, 2024 02:02

jamesmaa approved these changes Oct 3, 2024

jamesmaa added this pull request to the merge queue Oct 3, 2024

Merged via the queue into yomidevs:master with commit 9d549ec Oct 3, 2024
11 checks passed

austinyu12 pushed a commit to austinyu12/yomitan that referenced this pull request Oct 22, 2024

Fix UTF32 characters breaking sentence parser due directly accessing string array indexes (yomidevs#1452)

57f7cdd

Kuuuube mentioned this pull request Oct 27, 2024

Fix UTF16 and UTF32 characters breaking cloze data due to the use of substring #1533

Merged
