Commit
Fix that some valid subdivisions were not decompressed (REGEX_VALID)
janlelis committed Oct 17, 2024
1 parent d3b39b7 commit c777cd4
Showing 8 changed files with 16 additions and 7 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,8 @@
### 3.7.0 (unreleased)

- Bump required Ruby slightly to 2.5
- - Be stricter about selection of tag characters in REGEX_WELL_FORMED
+ - Fix that some valid subdivisions were not decompressed (`REGEX_VALID`)
+ - Be stricter about selection of tag characters in (`REGEX_WELL_FORMED`)
- Only U+E0030..U+E0039, U+E0061..U+E007A allowed
- Max tag sequence length
- Use native /\p{RI}/ regex for regional indicators
2 changes: 1 addition & 1 deletion data/generate_constants.rb
@@ -125,7 +125,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
"(?:" +
pack(EMOJI_TAG_BASE_FLAG) +
"(?:" + VALID_SUBDIVISIONS.sort_by(&:length).reverse.map{ |sd|
- Regexp.escape(sd.tr("\u{20}-\u{7E}", "\u{E0020}-\u{E007E}"))
+ sd.tr("\u{30}-\u{39}\u{61}-\u{7A}", "\u{E0030}-\u{E0039}\u{E0061}-\u{E007A}")
}.join("|") + ")" +
pack(CANCEL_TAG) +
")"
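
The effect of this one-line change can be sketched in isolation. The snippet below is an illustration only, not code from the gem: the `entry` string is a hypothetical subdivision pattern that already contains a regex character class. It compares the previous translation (all printable ASCII mapped to tag characters, then escaped) with the new one (only digits and lowercase letters mapped, brackets left intact).

```ruby
# Hypothetical entry containing a regex character class (an assumption for illustration)
entry = "lv0[0-9][0-9]"

# Previous approach: every printable ASCII character becomes a tag character,
# so "[", "-" and "]" turn into literal tag characters instead of regex syntax.
old_form = Regexp.escape(entry.tr("\u{20}-\u{7E}", "\u{E0020}-\u{E007E}"))

# New approach: only 0-9 and a-z are translated; brackets and hyphen stay ASCII
# and keep their meaning as a character class.
new_form = entry.tr("\u{30}-\u{39}\u{61}-\u{7A}", "\u{E0030}-\u{E0039}\u{E0061}-\u{E007A}")

# Tag-character spelling of "lv042", as it appears inside a subdivision flag
target = "lv042".tr("\u{30}-\u{39}\u{61}-\u{7A}", "\u{E0030}-\u{E0039}\u{E0061}-\u{E007A}")

p Regexp.new(old_form).match?(target)  # false
p Regexp.new(new_form).match?(target)  # true
```

Because the hyphen inside the brackets is left untouched, the translated class still reads as a range, now spanning U+E0030..U+E0039.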
2 changes: 1 addition & 1 deletion lib/unicode/emoji/generated/regex_valid.rb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion lib/unicode/emoji/generated/regex_valid_include_text.rb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion lib/unicode/emoji/generated_native/regex_valid.rb

Large diffs are not rendered by default.

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion lib/unicode/emoji/lazy_constants.rb
@@ -13,13 +13,16 @@ module Emoji
EXTENDED_PICTOGRAPHIC_NO_EMOJI= INDEX[:PROPERTIES].select{ |ord, props| props.include?(:X) && !props.include?(:E) }.keys.freeze
EMOJI_KEYCAPS = INDEX[:KEYCAPS].freeze
VALID_REGION_FLAGS = INDEX[:FLAGS].freeze
- VALID_SUBDIVISIONS = INDEX[:SD].freeze
+ VALID_SUBDIVISIONS = INDEX[:SD].map{_1.sub(/(.)~(.)/, '[\1-\2]') }
RECOMMENDED_SUBDIVISION_FLAGS = INDEX[:TAGS].freeze
RECOMMENDED_ZWJ_SEQUENCES = INDEX[:ZWJ].freeze

LIST = INDEX[:LIST].freeze.each_value(&:freeze)
LIST_REMOVED_KEYS = [
"Smileys & People",
].freeze



end
end
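
As a rough illustration of the decompression step above, the following sketch applies the same `sub` call to a small list. The entries are hypothetical (the real `INDEX[:SD]` data is not shown in this diff), and a named block parameter is used instead of `_1`, which requires Ruby 2.7 or later.

```ruby
# Hypothetical compressed entries; "0~9" is assumed to stand for the range 0 through 9
compressed = ["gbeng", "lv00~9"]

# Same substitution as in the diff: "X~Y" becomes the character class "[X-Y]".
# Note that sub (unlike gsub) rewrites only the first "~" in each entry.
expanded = compressed.map { |sd| sd.sub(/(.)~(.)/, '[\1-\2]') }

p expanded  # ["gbeng", "lv0[0-9]"]

# The expanded strings are regex fragments, so they can be joined into an alternation
matcher = Regexp.new("\\A(?:" + expanded.join("|") + ")\\z")
p matcher.match?("lv07")  # true
p matcher.match?("lv0~")  # false
```

Entries without a `~` pass through unchanged, so the list can mix plain and compressed codes.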
5 changes: 5 additions & 0 deletions spec/unicode_emoji_spec.rb
@@ -216,6 +216,11 @@
assert_equal "🏴󠁧󠁢󠁡󠁧󠁢󠁿", $&
end

+ it "matches valid tag sequences (compressed one)" do
+ "🏴󠁬󠁶󠀰󠀴󠀲󠁿 lv042" =~ Unicode::Emoji::REGEX_VALID
+ assert_equal "🏴󠁬󠁶󠀰󠀴󠀲󠁿", $&
+ end
+
it "does not match invalid tag sequences" do
"🏴󠁧󠁢󠁡󠁡󠁡󠁿 GB AAA" =~ Unicode::Emoji::REGEX_VALID
assert_equal "🏴", $& # only base flag is matched
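
The behavior these specs exercise can be approximated with a much smaller regex. The sketch below is not the gem's `REGEX_VALID`; it only assembles the tag-sequence branch (base flag, alternation of subdivisions, cancel tag) from a hypothetical two-entry list, using the same `=~` and `$&` style as the spec.

```ruby
base_flag  = "\u{1F3F4}"  # black flag, the tag base
cancel_tag = "\u{E007F}"  # terminates a tag sequence

# Helper (hypothetical name): translate 0-9 and a-z to tag characters
def to_tags(str)
  str.tr("\u{30}-\u{39}\u{61}-\u{7A}", "\u{E0030}-\u{E0039}\u{E0061}-\u{E007A}")
end

# Hypothetical subdivision list: one plain entry, one expanded from a compressed form
subdivisions = ["gbeng", "lv0[0-9][0-9]"]

# Longest entries first, as in generate_constants.rb
alternation  = subdivisions.sort_by(&:length).reverse.map { |sd| to_tags(sd) }.join("|")
sketch_regex = Regexp.new(base_flag + "(?:" + alternation + ")" + cancel_tag)

valid   = base_flag + to_tags("lv042") + cancel_tag
invalid = base_flag + to_tags("gbaaa") + cancel_tag

(valid + " lv042") =~ sketch_regex
p $& == valid                   # true
p sketch_regex.match?(invalid)  # false
```

Unlike the full `REGEX_VALID`, this sketch has no fallback branch for the bare flag emoji, so an invalid sequence yields no match at all rather than a match on the base flag alone.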