Make all regexes constants lazy loaded from pregenerated files #9
I removed the code that checks whether native regex properties can be used. I'm not sure I get the idea. Why not use the same regexes in all environments, even if they support emoji properties? Wouldn't it be better to have consistent behaviour on every platform? The way it was done can introduce subtle differences, depending on whether native or non-native emoji properties were used. However, if this is still needed, I can handle that as well. It would require generating two versions of the regexes and dynamically deciding which one should be loaded.
Thanks for the PR, I will take a closer look over the next few days. Regarding the native Emoji properties: it certainly adds complexity, but there is a lot to be won when Ruby's own regex properties can be used. A much smaller regex gets generated, which is good not only for memory usage but also for performance. I agree that there is potential for undesired behavior there, but since both this library and the Ruby core integrate the Unicode Standard data very closely, there shouldn't be any deviations (if there are, it's a bug).
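A minimal sketch of how "dynamically decide which one should be loaded" could look, assuming the decision is made by probing whether the running Ruby accepts a given `\p{...}` property. The helper names `native_property_supported?` and `regex_directory` are hypothetical and not taken from the gem; the two directory names mirror the ones in this PR's rake task description.

```ruby
# Returns true when the running Ruby's regex engine accepts \p{<name>}.
# An unknown property name makes Regexp.new raise RegexpError.
def native_property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

# Hypothetical helper: picks a directory of pre-generated regexes.
# The selection logic is an assumption, not the gem's actual code.
def regex_directory(property = "Emoji")
  native_property_supported?(property) ? "generated_native" : "generated"
end

puts regex_directory
```

Probing by compiling a tiny regex keeps the check independent of Ruby version numbers, at the cost of one throwaway Regexp at load time.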
I made two changes.
Support for native regex properties
Generated regexes are saved to
Character classes
For a list of consecutive characters (at least 3) instead of big union
To get memory usage of regexes, I used this code:
require 'objspace'
Unicode::Emoji.constants.sort.filter_map do |name|
  const = Unicode::Emoji.const_get(name)
  [name, ObjectSpace.memsize_of(const)] if const.is_a?(Regexp)
end
Comparing
I didn't check it, but after this change some regexes may perform better. I suspect that when a big union is used, the regex engine has to check every alternative one by one. When a character class with a range is used, it (probably) only compares against the beginning and end of the range.
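The character-class idea can be sketched as follows. This is an illustration under assumptions, not the generator script from this PR: the function name `compact_character_class` and the `min_run` parameter are hypothetical, and the input is assumed to be a non-empty list of integer codepoints.

```ruby
# Builds a character class from codepoints, collapsing runs of at least
# `min_run` consecutive codepoints into a single range.
# Assumes `codepoints` is a non-empty array of integers.
def compact_character_class(codepoints, min_run: 3)
  runs = codepoints.uniq.sort.slice_when { |a, b| b != a + 1 }
  parts = runs.flat_map do |run|
    if run.size >= min_run
      [format("\\u{%X}-\\u{%X}", run.first, run.last)]
    else
      run.map { |cp| format("\\u{%X}", cp) }
    end
  end
  Regexp.new("[#{parts.join}]")
end

# The same set expressed as a plain union, for comparison.
codepoints = [0x1F600, 0x1F601, 0x1F602, 0x2764]
union_regex = Regexp.union(codepoints.map { |cp| [cp].pack("U") })
class_regex = compact_character_class(codepoints)

puts class_regex.source
puts union_regex.source
```

Both regexes match the same single characters here; only the character-class form can express the three consecutive codepoints as one range.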
Hi Radosław, thank you for all the work - the PR looks great and is a huge memory improvement. I am going to merge this, but I'd like to ask you about two small details:
Before merging, I was going to ask whether you think the current specs are sufficient to feel comfortable with this PR's changes. I didn't dig too deep, but I see they check some things, not everything. For example, I had a misspelled constant name
I like it! I have no idea why I didn't think of that :D. It doesn't make a huge difference (from what I checked briefly, it saves just a couple of kilobytes of memory), but there is no reason not to have it. I will create a commit.
Absolutely, I'll do it.
Co-authored-by: Jan Lelis <[email protected]>
desc "#{gemspec.name} | Generates all regex constants and saves them to lib/unicode/emoji/{generated,generated_native} directories"
task :generate_constants do
  load "data/generate_constants.rb", true
end
Passing true to load will "wrap" the loaded code inside a temporary module. So, for example, when the loaded code does a global include or defines global methods (as our script "generate_constants.rb" does), it won't affect the global (main) object.
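To illustrate the wrapping behaviour described above, here is a small self-contained sketch. The script contents and the method name `helper_defined_by_script` are made up for the demonstration; only `Kernel#load` with its second argument comes from the discussion.

```ruby
require "tempfile"

# Write a throwaway script that defines a top-level method,
# similar in spirit to what a generator script might do.
script = Tempfile.new(["wrapped_script", ".rb"])
script.write(<<~RUBY)
  def helper_defined_by_script
    :from_script
  end
  $script_ran = true
RUBY
script.close

# With wrap = true the script runs under an anonymous module,
# so its top-level method definitions do not land on Object.
load script.path, true

leaked = Object.private_method_defined?(:helper_defined_by_script) ||
         Object.method_defined?(:helper_defined_by_script)

puts "script ran: #{$script_ran}"
puts "helper leaked into Object: #{leaked}"
```

Note that global variables are not isolated by the wrapper, which is why the sketch can use one to confirm that the script actually executed.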
Great, thanks for the last adaptations!
Do you remember the details about this? (It should have been caught by unicode-emoji/spec/unicode_emoji_spec.rb, lines 410 to 415 in 91a6172.)
This spec checks a different constant (
Hi, just to clarify, does the major version number bump indicate a breaking change for downstream users of this gem, or can we continue using it as we had before? Thank you! Love the library!
Hi @jywarren, I chose the major version step to indicate that there were fundamental changes in the implementation, so it might behave differently (better memory usage) and might have bugs because of that (it doesn't seem so, so far). However, there were no API changes. Thanks for using the library!
This PR addresses the problem described in #8.
Instead of generating regexes on the fly when the gem is loaded, they are now pre-generated. Every regex constant is in a separate file and is loaded through Ruby's autoload. If a constant is not used, it doesn't take up memory.
To generate regexes, just run:
Before this PR, memory usage was:
Every run gives a slightly different result, but it oscillates around 7.5MB on macOS and 6.5MB on Linux.
After this PR:
On both macOS and Linux it oscillates around 0.82MB. The difference is 8-9x.
This was measured on Ruby 2.7 with this program:
Also, the time of require dropped 4x:
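Separately from the measurements, the lazy loading through autoload described in this PR can be sketched as below. This is an illustration only: the module `LazyRegexDemo`, the constant `REGEX_DIGITS` and the temporary file are hypothetical and do not reflect the gem's actual file layout.

```ruby
require "tmpdir"

module LazyRegexDemo
end

# Stand-in for one pre-generated file holding a single regex constant.
dir = Dir.mktmpdir
path = File.join(dir, "regex_digits.rb")
File.write(path, <<~RUBY)
  module LazyRegexDemo
    REGEX_DIGITS = /[0-9]+/
  end
RUBY

# Register the constant; the file is not read until the constant is used.
LazyRegexDemo.autoload(:REGEX_DIGITS, path)

pending_before = LazyRegexDemo.autoload?(:REGEX_DIGITS) # path while still unloaded
matched = LazyRegexDemo::REGEX_DIGITS.match?("42")       # first access triggers the load
pending_after = LazyRegexDemo.autoload?(:REGEX_DIGITS)  # nil once loaded

puts "pending before access: #{!pending_before.nil?}"
puts "matched: #{matched}"
puts "pending after access: #{!pending_after.nil?}"
```

A constant that is never referenced never has its file loaded, which is the property the PR relies on for the memory savings.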