Add smart weight: a better way to balance classification probabilities #40
Commits on Feb 16, 2022
Feat: add smart weight support
What's smart weight? It's a default weight (count) that each word gets, even if we've never seen it before. The main advantage is that a new word is now weighted equally between groups (50/50 with two groups), and evidence is needed to disturb this balance. More evidence means a stronger weighting towards a group. To enable it, set `smart_weight` to true, either during initialization (`Groupie.new smart_weight: true`) or via the `groupie.smart_weight = true` setter.
Commit c80b50c
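A minimal sketch of the idea in Ruby, not Groupie's actual code: `ToyClassifier` and its methods are hypothetical, and the default weight is assumed to be the average count per known word (total words divided by unique words). Adding the same pseudo-count to every group leaves an unseen word split evenly, while observed counts tip the balance.

```ruby
# Hypothetical toy classifier illustrating "smart weight": every word gets
# the same default pseudo-count in every group before probabilities are computed.
class ToyClassifier
  attr_accessor :smart_weight

  def initialize(smart_weight: false)
    @smart_weight = smart_weight
    # group name => { word => count }, missing words count as 0
    @groups = Hash.new { |hash, name| hash[name] = Hash.new(0) }
  end

  def add(group, words)
    words.each { |word| @groups[group][word] += 1 }
  end

  # Assumption: the default weight is the average count per known word.
  def default_weight
    return 0.0 unless smart_weight

    unique = @groups.values.flat_map(&:keys).uniq
    # Nothing known yet: any positive constant keeps groups balanced.
    return 1.0 if unique.empty?

    total = @groups.values.sum { |counts| counts.values.sum }
    total.to_f / unique.size
  end

  # Returns { group => probability } for a single word.
  def classify(word)
    weight = default_weight
    scores = @groups.transform_values { |counts| counts[word] + weight }
    total = scores.values.sum
    return scores.transform_values { 0.0 } if total.zero?

    scores.transform_values { |score| score / total }
  end
end

classifier = ToyClassifier.new(smart_weight: true)
classifier.add(:spam, %w[buy now buy])
classifier.add(:ham, %w[meeting now])
classifier.classify('unseen') # both groups get 0.5
classifier.classify('buy')    # spam ~0.69, ham ~0.31
```

Without smart weight, the same classifier would give `'buy'` a probability of 1.0 for `:spam` from only two observations; with it, the evidence has to outweigh the shared default first.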
Style: apply a few layout rules regarding indentation
I'm using these in another place and just ran into a situation where the rules were needed. They prevent multi-line arguments from being indented really far. Oh, and newlines are now consistently `\n`.
Commit a471f58
Test: moved serialization spec from Groupie::Group to Groupie
It's more useful to test that the entire Groupie can be serialized than just a single Group.
Commit c8a02ad
Change: Groups track their total word count
This is a refactoring with side effects, so it's a change. Tracking the total means we don't have to count it manually in Groupie#default_weight. A side effect is that old serialized data won't have this cached count, so its smart weight default weight will be incorrect. Adding the old data to a new instance should fix this.
Commit a8ee866
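A minimal sketch of a group that caches its total as words are added. `ToyGroup`, `add` and `word_counts` are hypothetical names; only the idea of a tracked total word count comes from the commit.

```ruby
# Hypothetical Group that updates a running total on every add,
# so callers never need to re-sum the per-word counts.
class ToyGroup
  attr_reader :word_counts, :total_word_count

  def initialize
    @word_counts = Hash.new(0)
    @total_word_count = 0
  end

  def add(words)
    Array(words).each do |word|
      @word_counts[word] += 1
      @total_word_count += 1 # cache kept in sync with word_counts
    end
  end
end

group = ToyGroup.new
group.add(%w[buy now buy])
group.total_word_count # => 3
```

This also shows where the serialization caveat comes from: an object revived from data written before the counter existed skips `initialize`, so `@total_word_count` is never set and the reader returns `nil`.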
Refactor: Groupie tracks known words to speed up default_weight
This eliminates another piece of waste from Groupie#default_weight: it no longer has to iterate over all Groups to compile a list of known unique words.
Commit cfcfcf6
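A sketch of tracking known words at the classifier level with a `Set`; the class and method names are hypothetical. Each `add` updates the Set as words arrive, so the unique-word count is just `unique_words.size` instead of a scan over every group.

```ruby
require 'set'

# Hypothetical classifier that records every word it has seen in a Set.
class KnownWordsClassifier
  attr_reader :unique_words

  def initialize
    @groups = Hash.new { |hash, name| hash[name] = Hash.new(0) }
    @unique_words = Set.new
  end

  def add(group, words)
    words.each do |word|
      @groups[group][word] += 1
      @unique_words << word # Set ignores duplicates
    end
  end
end

classifier = KnownWordsClassifier.new
classifier.add(:spam, %w[buy now buy])
classifier.add(:ham, %w[meeting now])
classifier.unique_words.size # => 3
```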
Refactor: simplified Groupie#default_weight
Now that Groupie tracks unique words and Groups track their total word counts, we can rewrite the method to simply sum what we have, without explicit iteration and manual summing. The net result is code that is much more straightforward to read.
Commit 2f720a0
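With both caches in place, the default weight reduces to one sum and one division. This sketch assumes the weight is the average count per unique known word, and uses a hypothetical `GroupTotals` struct and `WeightSketch` class in place of Groupie's real objects.

```ruby
require 'set'

# Stand-in for a Group that already caches its total word count.
GroupTotals = Struct.new(:total_word_count)

class WeightSketch
  def initialize(groups, unique_words, smart_weight: true)
    @groups = groups             # group name => GroupTotals
    @unique_words = unique_words # Set of every known word
    @smart_weight = smart_weight
  end

  # Assumption: default weight = total words / unique words.
  def default_weight
    return 0.0 unless @smart_weight
    return 1.0 if @unique_words.empty?

    @groups.values.sum(&:total_word_count).fdiv(@unique_words.size)
  end
end

groups = { spam: GroupTotals.new(3), ham: GroupTotals.new(2) }
WeightSketch.new(groups, Set.new(%w[buy now meeting])).default_weight # ~1.67
```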
Refactor: Groupie#classify internals are simpler
This is another old method that gained new features over time, which complicated it. The double iteration and calls to apply_count_strategy bothered me. Using Hash#transform_values makes it cleaner.
Commit b4ed9a1
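An illustration of the kind of simplification `Hash#transform_values` allows; both methods are hypothetical and leave out Groupie's count strategies and smart weight. The manual version builds its result hashes by hand, while `transform_values` keeps the group keys and maps only the values.

```ruby
# Before: build each intermediate hash by hand.
def classify_manual(word, groups)
  scores = {}
  groups.each { |name, counts| scores[name] = counts.fetch(word, 0) }
  total = scores.values.sum
  result = {}
  scores.each do |name, score|
    result[name] = total.zero? ? 0.0 : score.fdiv(total)
  end
  result
end

# After: transform_values maps values and keeps the group names as keys.
def classify(word, groups)
  scores = groups.transform_values { |counts| counts.fetch(word, 0) }
  total = scores.values.sum
  scores.transform_values { |score| total.zero? ? 0.0 : score.fdiv(total) }
end

groups = { spam: { 'buy' => 3 }, ham: { 'buy' => 1, 'now' => 2 } }
classify('buy', groups) # spam 0.75, ham 0.25
```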
Specfix: switch to Psych for YAML test
On Ruby 2.6 and 2.7, the bundled version of YAML has an inconsistent interface: YAML.unsafe_load, YAML.safe_load and YAML.load are not consistently present. So embrace the gem version of the underlying library (Psych) and simply use the latest version.
Commit 09f5f74
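A sketch of a YAML round-trip spec helper that calls Psych directly rather than going through the `YAML` alias. It assumes a Psych release where `Psych.safe_load` accepts the `permitted_classes:` keyword (3.1 and later); `SerializableClassifier` and `yaml_round_trip` are hypothetical stand-ins for a full Groupie instance and the real spec.

```ruby
require 'psych'

# Hypothetical stand-in for a whole classifier.
class SerializableClassifier
  attr_reader :groups

  def initialize
    # String keys avoid needing permitted_symbols: when loading.
    @groups = {} # group name => { word => count }
  end

  def add(group, words)
    counts = (@groups[group] ||= {})
    words.each { |word| counts[word] = counts.fetch(word, 0) + 1 }
  end

  def ==(other)
    other.is_a?(SerializableClassifier) && other.groups == groups
  end
end

# Dump to YAML and load it back, permitting only our own class.
def yaml_round_trip(object)
  yaml = Psych.dump(object)
  Psych.safe_load(yaml, permitted_classes: [SerializableClassifier])
end

original = SerializableClassifier.new
original.add('spam', %w[buy now buy])
yaml_round_trip(original) == original # => true
```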