-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-117709: Add vectorcall support to str()
#117725
gh-117709: Add vectorcall support to str()
#117725
Conversation
Some crude This PR# positional-only args
$ ./python.exe -m timeit "str()"
20000000 loops, best of 5: 14.6 nsec per loop
$ ./python.exe -m timeit "str(1)"
5000000 loops, best of 5: 56.8 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1')"
10000000 loops, best of 5: 37.3 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1', 'strict')"
5000000 loops, best of 5: 42.6 nsec per loop
# with kw args
$ ./python.exe -m timeit "str(b'a', 'latin1', errors='strict')"
5000000 loops, best of 5: 48.4 nsec per loop
$ ./python.exe -m timeit "str(b'a', encoding='latin1')"
5000000 loops, best of 5: 42.4 nsec per loop
# fallback to tp_call
$ ./python.exe -m timeit "str(object=b'a', encoding='latin1', errors='strict')"
2000000 loops, best of 5: 182 nsec per loop `main`# positional-only args
$ ./python.exe -m timeit "str()"
10000000 loops, best of 5: 26.2 nsec per loop
$ ./python.exe -m timeit "str(1)"
5000000 loops, best of 5: 56 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1')"¨
5000000 loops, best of 5: 75.4 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1', 'strict')"
5000000 loops, best of 5: 82.2 nsec per loop
# with kw args
$ ./python.exe -m timeit "str(b'a', 'latin1', errors='strict')"
2000000 loops, best of 5: 145 nsec per loop
$ ./python.exe -m timeit "str(b'a', encoding='latin1')"
2000000 loops, best of 5: 137 nsec per loop
$ ./python.exe -m timeit "str(object=b'a', encoding='latin1', errors='strict')"
2000000 loops, best of 5: 182 nsec per loop Footnotes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that the current complex implementation is worth it. I suggest to truncate your implementation to support only 1 or 2 arguments.
cc @corona10 who wrote similar optimizations.
} | ||
return PyUnicode_FromEncodedObject(object, encoding, errors); | ||
} | ||
if (nargs + nkwargs == 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that it's worth it to optimize the 3 arguments case using vectorcall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the real question: which cases do we want to optimise? It would be nice if we had some usage stats. Perhaps the Faster CPython team has stats. cc. @mdboom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a regular expression, I only found 5 lines in the whole stdlib (excluding tests) which call str() with 3 arguments, and all these matching lines only use positional arguments:
$ git grep '\<str *([^,()]\+,[^,()]\+,'
Lib/email/utils.py: return str(rawbytes, charset, errors)
Lib/encodings/punycode.py: base = str(text[:pos], "ascii", errors)
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')
I modified git grep
output to ignore tests, documentation and lines which care not str() calls (like Argument Clinic code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I cannot find any str(bytes, encoding=...)
or str(bytes, errors=...)
call in the stdlib. I used the regex: git grep '\<str *([^,()]\+,[^,()]\+='
. There are only results in the documentation and tests.
In short, str() is only called with positional-only arguments. We should maybe strictly focus on these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, optimising only for the positional-only argument cases would definitely make the PR smaller and the resulting code more maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A PR for positional-only cases is up for comparison:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only results in the documentation [...]
Well, people often copy the examples from the documentation, so I would expect the examples shown in the docs to be very common in the wild.
Thanks for the initial review, Victor; the PR is now ~20 lines shorter ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please a PR without keyword arguments? So we can compare the two implementations.
} | ||
|
||
static PyObject * | ||
fallback_to_tp_call(PyObject *type, Py_ssize_t nargs, Py_ssize_t nkwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to inline this function in unicode_vectorcall(), using goto if needed? The function name is too generic, it doesn't mention unicode_new().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure; that was my original code, but I refactored it out in order to reduce the size of the vectorcall function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code is nicer with this refactoring in place though; perhaps a better name would suffice.
Created: |
I don't think that the complexity of handling keyword arguments is worth it. I prefer to only optimize positional-only arguments. |
Closed in favour of #117746. We can revisit the need to speed up some of the keyword-arg cases later. |
I would be fine if code parsing keyword arguments would be generated by Argument Clinic. But hand written code is more expensive to maintain. Here I don't think that it's worth it. |
Never forget #87613 |
IMO, this is bordering too much complexity, even though the speedup is considerable for some of the cases. We consider adding a simplified variant first.
PyUnicode_Type
needs vectorcall. #117709