-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance: memomize LinesClassifier::no_cov_line #662
Performance: memomize LinesClassifier::no_cov_line #662
Conversation
In my measurements the majority of the time in SimpleCov was being spent in this one-line function. In my tests on a large project, this change makes SimpleCov run from 2.5x to 3.75x faster. Admittedly, memoizing it does change its behavior but only in the edge case where someone runs SimpleCov twice in a single test run, changing the +nocov_token+ between the two runs. This is such an odd edge case I decided not to worry about it.
2d31d4b
to
93b82b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there,
thanks for the contribution!
Could you share a benchmark or parts of it for this? I guess you can't share the airbnb code base ;P
I'm just curious (plus like to keep benchmarks around and repeatable) it seems odd for such a little line to cause so much overhead, but as it's evaluated a lot and a new regex is instantiated every time I guess it makes sense...
I agree, the edge case is negligible / people shouldn't change there nocov tokens. I'd even debate the ability to change this tokens as not really necessary imo (and then this could even be in a constant...). But that's nod a change for now.
Thanks a bunch!
Haha I'd be happy to share the airbnb code base with you if you come work here! 😛 But hopefully this will help: our single largest Ruby project has tens of thousands of files and millions of lines of code. Our CI runs it with a test splitter that splits it into 720 parallel runs on 24 machines. (!) If we use If we don't use Either way, all those constructions of RegExp objects take a heck of a long time: Before memoizing:
After memoizing:
|
P.S. Those benchmarks were in Ruby 2.1.8 |
Please hold off on merging this until I resolve an issue I'm getting in our CI, although I can't repro it locally:
|
The error is happening because The string is "text = "This is a test of the new xml markup. I\xF1t\xEBrn\xE2ti\xF4n\xE0liz\xE6ti\xF8n\n" * 10000\n" I can verify this in pry:
What I'm curious about is why, on SimpleCov head, this file is being processed at all. It seems like a bug. I was previously using 0.14.1 and it did not process this file because our filters are: add_filter [
"/config/",
"/db/",
"/script/",
"/spec/",
"/test/",
"/vendor/",
] and the filename is I am familiar with the change in 0.15.1 to add support for regexp filters but it does not seem like that would affect it. Is this a bug that was introduced after 0.14.1? Is it calling |
Update: feel free to merge this. The issue I'm seeing is unrelated to this change. I get the same behavior when I upgrade from 0.14.1 to 0.15.1 without this change. Haven't yet figured out why. |
@BMorearty thanks for the input. It does seem like we might have had some form of regression regarding filters but I'm unsure and honestly I don't have the most time to spend on simplecov but I might be able to investigate and repro. Similarly, maybe I get to wip up a quick benchmark for our line classification. |
Ok wipped up a quick benchmark and it looks good:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my measurements the majority of the time in SimpleCov was being
spent in this one-line function.
In my tests on a large project at Airbnb, this change makes SimpleCov run
from 2.5x to 3.75x faster when processing results.
(From 154.2 seconds down to 60.6 seconds in one test.
From 16.6 seconds down to 4.4 seconds in another.)
Admittedly, memoizing it does change its behavior but only in the edge
case where someone runs SimpleCov twice in a single test run, changing
the +nocov_token+ between the two runs. This is such an odd edge case I
decided not to worry about it.