Issues with swapping GenericVector for std::vector #3473
Comments
Thank you for this valuable hint. What do you mean by the "performance decrease" of 4%? Increased character error rates? Longer execution time?
I am afraid that the necessary fixes will increase the execution time even further.
The difference is not large but measurable. After applying the changes mentioned in the comments here, execution time went back to normal, so I have some hope. There is yet another performance issue like that, introduced in later commits, but I have not found it yet.
We need some script to find such incorrect fixes in the git history and update those places.
Fixes: 9710bc0 Signed-off-by: Stefan Weil <[email protected]>
Sure, I'll be happy to test it, but from my initial grepping it looks like there were changes in a lot of places. I just realised I pasted a wrong commit range: the GenericVector changes started in Oct 2020 and finished in Mar 2021, commits 92b6c652..65d882f9. Updated the issue description, sorry about that. This is quite a lot of code to go through.
If you have other issues to work on, I may try and change all the wrong code I can find, but it will take me a few days.
Fixes: c8b8d26 Fixes: 9710bc0 Signed-off-by: Stefan Weil <[email protected]>
That was quick! I ran your merged code from master and it looks better (runs faster) but still not exactly the same as originally. You can try to find and replace problematic Thank you.
Looks like the other thing may be related to some compiler optimization. The time differences are pretty slim but show up consistently. The commit that introduced additional delays: 2a3682a3. Created a PR that makes the problem go away on my machine: #3481. Is there a CI pipeline that can track and report tesseract performance? If you see no other problems with GenericVector replacements then this issue can be closed.
We don't have CI perf tests at the moment.
Can you also provide some numbers? From 2a3682a I see copies in loops in two places
Not sure if this is intended.
493620 us vs. 479074 us on GCC (minimum of ~300 runs). I checked all the other code that looked weird, including the copies in those two for loops. I was surprised to see that none of it made any measurable difference; only getting the data pointer did. But this is a bit of a guessing game.
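For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a "minimum of N runs" timer. The helper names `min_runtime_us` and `slowdown_percent` are hypothetical and not part of Tesseract; the workload passed in stands for whatever is being benchmarked.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper: call `work` `runs` times and return the fastest
// wall-clock time in microseconds. Taking the minimum filters out most
// of the noise from scheduling and cache warm-up.
template <typename F>
std::int64_t min_runtime_us(F&& work, int runs) {
  assert(runs > 0);
  using clock = std::chrono::steady_clock;
  std::int64_t best = std::numeric_limits<std::int64_t>::max();
  for (int i = 0; i < runs; ++i) {
    const auto start = clock::now();
    work();
    const auto stop = clock::now();
    const std::int64_t us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
            .count();
    best = std::min(best, us);
  }
  return best;
}

// Relative slowdown of `slow_us` against `fast_us`, in percent.
inline double slowdown_percent(std::int64_t slow_us, std::int64_t fast_us) {
  return 100.0 * static_cast<double>(slow_us - fast_us) /
         static_cast<double>(fast_us);
}
```

With the two figures quoted above, `slowdown_percent(493620, 479074)` comes out at roughly 3%.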
Those would be very cool to have.
So can it be closed now, or is there anything still open?
@stweil yes, thank you so much for the support!
Environment
Problem
Replacing `GenericVector` with `std::vector` has resulted in a small execution time performance decrease (about 4% on my simple test). I am not sure whether some other bugs have been introduced this way.

Most of the issues I could see are from changing `generic_vector.init_to_size(n, x);` directly into `vector.resize(n, x);`. This is because `init_to_size()` not only resizes the vector but also resets all of its values (not only the new ones).

Suggested Fix
Recheck the logic and update the changed `GenericVector` lines:

- `gv.init_to_size(n, x)` and `gv.resize(n, x)` should become:
- `gv.truncate(n)` (other than `truncate(0)`, which is just `v.clear()`) should become:
- `gv.resize_no_init(n)` (I think this one was ok):
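
A minimal sketch of how the first two cases could map onto `std::vector`, assuming `init_to_size(n, x)` overwrites every element (as described in the Problem section) and `truncate(n)` only ever shrinks the vector. The helper names `init_to_size_equiv`, `truncate_equiv` and `resize_keeps_old_values` are hypothetical and not part of Tesseract.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed semantics of the old init_to_size(n, x): the size becomes n and
// EVERY element is set to x. std::vector::assign(n, x) replaces the whole
// contents with n copies of x, so old values do not survive.
template <typename T>
void init_to_size_equiv(std::vector<T>& v, std::size_t n,
                        const typename std::vector<T>::value_type& x) {
  v.assign(n, x);
}

// Assumed semantics of the old truncate(n): shrink to n elements if the
// vector is longer, otherwise do nothing. A bare v.resize(n) would also
// grow the vector, which is why the size check is needed.
template <typename T>
void truncate_equiv(std::vector<T>& v, std::size_t n) {
  if (n < v.size()) {
    v.resize(n);
  }
}

// Shows why a plain resize(n, x) is not a drop-in replacement for
// init_to_size(n, x): only the newly added elements are set to x.
inline std::vector<int> resize_keeps_old_values() {
  std::vector<int> v = {1, 2, 3};
  v.resize(5, 0);  // v is now {1, 2, 3, 0, 0}; the old values survive
  assert(v.size() == 5);
  return v;
}
```

Where a call site is known to start from an empty vector, a plain `resize(n, x)` gives the same result as `assign(n, x)`, so not every converted line is necessarily wrong.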
I have added an example in the comments of the commit where the first performance hit was noted.
As far as I could see the code affected is within commit range 92b6c652..65d882f9. I might be able to provide a PR but this could take some time.