
gh-85160: improve performance of singledispatchmethod #107148

Merged: 32 commits into python:main, Aug 6, 2023

Conversation

@eendebakpt (Contributor) commented Jul 23, 2023:

This PR implements the idea from #85160 to improve performance of the singledispatchmethod by caching the generated dispatch method. It is a continuation of #23213

Also see #106448
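
For context, here is a minimal sketch of the caching idea (an illustration, not the code merged by this PR): the descriptor builds the dispatching wrapper once per instance and memoizes it in a WeakKeyDictionary, so repeated attribute accesses reuse it instead of rebuilding it.

# Minimal sketch of the caching idea; names and details are illustrative,
# not the exact implementation merged by this PR.
from functools import singledispatch, update_wrapper
from weakref import WeakKeyDictionary

class cached_singledispatchmethod:
    """Single-dispatch method descriptor that caches the generated
    dispatch wrapper per instance."""

    def __init__(self, func):
        self.func = func
        self.dispatcher = singledispatch(func)
        self._method_cache = WeakKeyDictionary()

    def register(self, cls, method=None):
        return self.dispatcher.register(cls, method)

    def __get__(self, obj, cls=None):
        if obj is not None:
            try:
                # Fast path: reuse the wrapper built on a previous access.
                return self._method_cache[obj]
            except (KeyError, TypeError):
                pass  # not cached yet, or obj cannot be weakly referenced

        def _method(*args, **kwargs):
            method = self.dispatcher.dispatch(args[0].__class__)
            return method.__get__(obj, cls)(*args, **kwargs)

        update_wrapper(_method, self.func)
        if obj is not None:
            try:
                self._method_cache[obj] = _method
            except TypeError:
                pass  # e.g. instances of slotted classes without __weakref__
        return _method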

eendebakpt and others added 2 commits on July 23, 2023 (co-authored by Alex Waygood).
@corona10 (Member) left a comment:

LGTM with a nit comment.

It's become a lot faster:

Benchmark script:

import pyperf

runner = pyperf.Runner()
runner.timeit(name="bench singledispatchmethod",
              stmt="""
_ = t.go(1, 1)
""",
setup="""
from functools import singledispatch, singledispatchmethod

class Test:
    @singledispatchmethod
    def go(self, item, arg):
        pass

    @go.register
    def _(self, item: int, arg):
        return item + arg

t = Test()
""")

Mean +- std dev: [base] 1.37 us +- 0.01 us -> [opt] 410 ns +- 2 ns: 3.35x faster
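
(For reference, comparison lines like the one above are typically produced by running the pyperf script twice, e.g. python bench.py -o base.json on the base interpreter and python bench.py -o opt.json on the patched build, then python -m pyperf compare_to base.json opt.json; the exact invocation used here is not shown.)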

@eendebakpt (Contributor, Author) commented:

@AlexWaygood The weakref approach can be improved a bit. We can use _all_weakrefable_instances instead of _unweakrefable_instances, which avoids a negation in the check. We can also use _all_weakrefable_instances to avoid the expensive lookups in the WeakKeyDictionary in the case of slotted classes.

That's very nice indeed. Good spot! I can reproduce your results on my machine.

Could you change your PR to this approach? I think I agree with @ethanhs that mutating the instance dictionary of instances with singledispatchmethod is too risky a behaviour change here :/

The PR was adapted. Since cls is not None is true for all common cases (methods, staticmethods and classmethods), I swapped the order of the _all_weakrefable_instances and cls is not None checks.
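
Applied to the sketch above, a __get__ fast path guarded by such a flag might look roughly like this (illustrative only; the exact presence and ordering of the None checks was tuned during review, and the merged code may differ):

# Sketch: a boolean flag, initialised to True in __init__, guards the
# WeakKeyDictionary lookup, so instances that cannot be weakly referenced
# only pay for the failing lookup once per descriptor.
def __get__(self, obj, cls=None):
    if self._all_weakrefable_instances and cls is not None and obj is not None:
        try:
            return self._method_cache[obj]
        except TypeError:
            # e.g. an instance of a slotted class without __weakref__:
            # stop consulting the cache for this descriptor.
            self._all_weakrefable_instances = False
        except KeyError:
            pass
    # ... fall through to building (and, when possible, caching) the
    # dispatch wrapper as in the first sketch ...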

@AlexWaygood (Member) commented Jul 31, 2023:

I did some more playing around locally and found we could improve performance even more with a few tweaks. Proposed a PR to your branch at eendebakpt#5, which also adds some more test coverage.

@eendebakpt (Contributor, Author) commented Jul 31, 2023:

@AlexWaygood I merged your PR and added one more optimization: we can skip the obj is not None check since for obj=None the weakref indexing will fail with a TypeError.

One more optimization is in https://github.com/eendebakpt/cpython/pull/6/files. It eliminates the caching variable. The assumption there is that if self._method_cache[obj] raises a KeyError, the assignment self._method_cache[obj] = ... is safe.

I still have to do benchmarking on this.
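
Putting both tweaks into the running sketch (again illustrative, not necessarily the merged code): the explicit None test disappears because the WeakKeyDictionary lookup raises TypeError for obj=None anyway, and the KeyError branch simply falls through to building and storing the wrapper. Note that in this simplified form a class-level access also trips the TypeError branch and switches caching off.

# Sketch combining the two tweaks described above.
def __get__(self, obj, cls=None):
    if self._all_weakrefable_instances:
        try:
            # Raises TypeError both for obj=None (access via the class)
            # and for instances that cannot be weakly referenced.
            return self._method_cache[obj]
        except TypeError:
            self._all_weakrefable_instances = False
        except KeyError:
            # The lookup itself worked, so storing into the cache below
            # is assumed to be safe as well.
            pass

    def _method(*args, **kwargs):
        method = self.dispatcher.dispatch(args[0].__class__)
        return method.__get__(obj, cls)(*args, **kwargs)

    update_wrapper(_method, self.func)
    if self._all_weakrefable_instances:
        self._method_cache[obj] = _method
    return _method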

@AlexWaygood (Member) left a comment:

LGTM. @ethanhs, how does this look to you now?

@eendebakpt (Contributor, Author) commented:

Latest benchmarks:

bench singledispatchmethod: Mean +- std dev: [main] 2.48 us +- 0.02 us -> [pr] 1.04 us +- 0.02 us: 2.38x faster
bench classmethod: Mean +- std dev: [main] 3.45 us +- 0.06 us -> [pr] 3.50 us +- 0.07 us: 1.01x slower
bench classmethod on instance: Mean +- std dev: [main] 3.49 us +- 0.06 us -> [pr] 1.12 us +- 0.01 us: 3.11x faster
bench slotted class: Mean +- std dev: [main] 2.48 us +- 0.03 us -> [pr] 2.53 us +- 0.03 us: 1.02x slower

Geometric mean: 1.64x faster

So the performance improves for the common case (normal methods) and is more or less equal for class methods and slotted classes.

One more variation is to eliminate attribute lookups in the dispatch wrapper (sketched after the numbers below): eendebakpt/cpython@singledispatchmethod3...eendebakpt:cpython:singledispatchmethod3b
This results in (compared to this PR):

bench singledispatchmethod: Mean +- std dev: [pr] 1.04 us +- 0.02 us -> [pr_lookup] 979 ns +- 15 ns: 1.06x faster
bench classmethod on instance: Mean +- std dev: [pr] 1.12 us +- 0.01 us -> [pr_lookup] 1.06 us +- 0.01 us: 1.06x faster
bench slotted class: Mean +- std dev: [pr] 2.53 us +- 0.03 us -> [pr_lookup] 2.52 us +- 0.03 us: 1.01x faster

Benchmark hidden because not significant (1): bench classmethod

Geometric mean: 1.03x faster
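
The attribute-lookup elimination referred to above boils down to binding the dispatch function to a local inside __get__ before defining the wrapper, so each call avoids re-resolving self.dispatcher.dispatch (a sketch, using the variable name suggested in the review comment below):

# Sketch: hoist the attribute lookups out of the per-call wrapper.
dispatch = self.dispatcher.dispatch

def _method(*args, **kwargs):
    # One fast local-name load per call instead of walking
    # self -> dispatcher -> dispatch on every invocation.
    return dispatch(args[0].__class__).__get__(obj, cls)(*args, **kwargs)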

@AlexWaygood (Member) commented Aug 1, 2023:

One more variation is to eliminate attribute lookups in the dispatch wrapper: eendebakpt/cpython@singledispatchmethod3...eendebakpt:cpython:singledispatchmethod3b

I can reproduce the speedup locally (great spot!), and it seems like a reasonable thing to do. Please name the variable dispatch rather than dp, though :)

One other optimisation that shaves around 5 microseconds off the benchmark for slotted classes and other objects that can't be weakref'd (but doesn't do much for those that can):

--- a/Lib/functools.py
+++ b/Lib/functools.py
@@ -959,7 +959,7 @@ def _method(*args, **kwargs):
             method = self.dispatcher.dispatch(args[0].__class__)
             return method.__get__(obj, cls)(*args, **kwargs)

-        _method.__isabstractmethod__ = self.__isabstractmethod__
+        _method.__isabstractmethod__ = getattr(self.func, "__isabstractmethod__", False)

But if you want to do ^that, you should probably add a comment about why we're duplicating the logic in the __isabstractmethod__ property.

@AlexWaygood (Member) commented:

One other optimisation that shaves around 5 microseconds off the benchmark for slotted classes and other objects that can't be weakref'd (but doesn't do much for those that can):

--- a/Lib/functools.py
+++ b/Lib/functools.py
@@ -959,7 +959,7 @@ def _method(*args, **kwargs):
             method = self.dispatcher.dispatch(args[0].__class__)
             return method.__get__(obj, cls)(*args, **kwargs)

-        _method.__isabstractmethod__ = self.__isabstractmethod__
+        _method.__isabstractmethod__ = getattr(self.func, "__isabstractmethod__", False)

Hmm, I thought I could see a speedup from this earlier, but now I can no longer reproduce it. I assume it was only noise -- best to leave it.

@AlexWaygood merged commit 3e334ae into python:main on Aug 6, 2023 (17 checks passed).
@AlexWaygood (Member) commented:

@mental32 and @eendebakpt, thanks so much for all your work on this!

try:
    _method = self._method_cache[obj]
except TypeError:
    self._all_weakrefable_instances = False
Member commented:

Hmm, I wonder if this would have been a good idea, to keep memory usage down for the slow path (since in the slow path we don't really need the WeakKeyDictionary at all).

Suggested change:
-            self._all_weakrefable_instances = False
+            self._all_weakrefable_instances = False
+            del self._method_cache

@corona10, what do you think? Does it matter?

@eendebakpt (Contributor, Author) replied:

One could even delay the creation of the WeakKeyDictionary until the first invocation of __get__. The code would look a bit odd, though; a simple del looks cleaner.
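
For illustration, the lazily created cache variant mentioned here might look something like this (a hypothetical sketch; attribute names follow the earlier sketches), which is why a plain del in the TypeError branch reads more cleanly:

# Hypothetical sketch of creating the WeakKeyDictionary only on first use;
# __init__ would set self._method_cache = None instead of allocating it.
def __get__(self, obj, cls=None):
    if self._all_weakrefable_instances:
        if self._method_cache is None:
            self._method_cache = WeakKeyDictionary()   # allocated lazily
        try:
            return self._method_cache[obj]
        except TypeError:
            self._all_weakrefable_instances = False
            self._method_cache = None                  # release the unneeded dict
        except KeyError:
            pass
    # ... build and possibly cache the dispatch wrapper as before ...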

Member replied:

Yeah, I think I prefer the approach with del
