gh-117709: Add vectorcall support to `str()` #117725

erlend-aasland · 2024-04-10T20:10:20Z

IMO, this borders on too much complexity, even though the speedup is considerable in some of the cases. We could consider adding a simplified variant first.

Issue: PyUnicode_Type needs vectorcall. #117709

erlend-aasland · 2024-04-10T20:18:30Z

Some crude timeit benchmarks¹²:

This PR

# positional-only args
$ ./python.exe -m timeit "str()"
20000000 loops, best of 5: 14.6 nsec per loop
$ ./python.exe -m timeit "str(1)"
5000000 loops, best of 5: 56.8 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1')"
10000000 loops, best of 5: 37.3 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1', 'strict')"
5000000 loops, best of 5: 42.6 nsec per loop

# with kw args
$ ./python.exe -m timeit "str(b'a', 'latin1', errors='strict')"
5000000 loops, best of 5: 48.4 nsec per loop
$ ./python.exe -m timeit "str(b'a', encoding='latin1')"
5000000 loops, best of 5: 42.4 nsec per loop

# fallback to tp_call
$ ./python.exe -m timeit "str(object=b'a', encoding='latin1', errors='strict')"
2000000 loops, best of 5: 182 nsec per loop

`main`

# positional-only args
$ ./python.exe -m timeit "str()"
10000000 loops, best of 5: 26.2 nsec per loop
$ ./python.exe -m timeit "str(1)"
5000000 loops, best of 5: 56 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1')"
5000000 loops, best of 5: 75.4 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1', 'strict')"
5000000 loops, best of 5: 82.2 nsec per loop

# with kw args
$ ./python.exe -m timeit "str(b'a', 'latin1', errors='strict')"
2000000 loops, best of 5: 145 nsec per loop
$ ./python.exe -m timeit "str(b'a', encoding='latin1')"
2000000 loops, best of 5: 137 nsec per loop
$ ./python.exe -m timeit "str(object=b'a', encoding='latin1', errors='strict')"
2000000 loops, best of 5: 182 nsec per loop

1. release build on macOS
2. updated as of ref c59478c
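
For anyone who wants to repeat the comparison on their own build, the same call shapes can be timed from Python with the standard `timeit` module. This is a minimal sketch, not the script behind the numbers above; the helper name `bench` and the loop counts are assumptions chosen for illustration.

```python
import timeit

# Call shapes taken from the benchmark listing above.
STATEMENTS = [
    "str()",
    "str(1)",
    "str(b'a', 'latin1')",
    "str(b'a', 'latin1', 'strict')",
    "str(b'a', 'latin1', errors='strict')",
    "str(b'a', encoding='latin1')",
    "str(object=b'a', encoding='latin1', errors='strict')",
]


def bench(stmt, number=100_000, repeat=5):
    """Return the best per-loop time in nanoseconds for `stmt` (hypothetical helper)."""
    timings = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(timings) / number * 1e9


if __name__ == "__main__":
    for stmt in STATEMENTS:
        print(f"{bench(stmt):8.1f} nsec per loop  {stmt}")
```

Running the script once with each interpreter build (this PR and `main`) should give a side-by-side comparison; absolute numbers will differ by machine and build flags.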

vstinner

I'm not sure that the current complex implementation is worth it. I suggest trimming your implementation to support only 1 or 2 arguments.

cc @corona10 who wrote similar optimizations.

vstinner · 2024-04-11T07:16:55Z

Objects/unicodeobject.c

+        }
+        return PyUnicode_FromEncodedObject(object, encoding, errors);
+    }
+    if (nargs + nkwargs == 3) {


I'm not sure that it's worth it to optimize the 3 arguments case using vectorcall.

Yes, this is the real question: which cases do we want to optimise? It would be nice if we had some usage stats. Perhaps the Faster CPython team has stats. cc. @mdboom

Using a regular expression, I only found 5 lines in the whole stdlib (excluding tests) which call str() with 3 arguments, and all these matching lines only use positional arguments:

$ git grep '\<str *([^,()]\+,[^,()]\+,'
Lib/email/utils.py: return str(rawbytes, charset, errors)
Lib/encodings/punycode.py: base = str(text[:pos], "ascii", errors)
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')

I modified the git grep output to ignore tests, documentation and lines which are not str() calls (like Argument Clinic code).

By the way, I cannot find any str(bytes, encoding=...) or str(bytes, errors=...) call in the stdlib. I used the regex: git grep '\<str *([^,()]\+,[^,()]\+='. There are only results in the documentation and tests.

In short, the stdlib only ever calls str() with positional arguments. We should maybe focus strictly on these cases?
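
A regex can miss calls that span lines or contain nested parentheses. As a rough alternative (not what was used for the counts above), the call shapes could be tallied with the `ast` module. The function name `count_str_calls` and the sample source are hypothetical, and the sketch only looks at calls to the bare name `str`.

```python
import ast
from collections import Counter


def count_str_calls(source):
    """Count str(...) calls, keyed by (number of positional args, keyword names)."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "str"
        ):
            # kw.arg is None for a **mapping argument.
            keywords = tuple(kw.arg for kw in node.keywords)
            counts[(len(node.args), keywords)] += 1
    return counts


# Illustrative input only; the first two lines echo the grep hits above.
sample = '''
a = str(rawbytes, charset, errors)
b = str(data, 'utf-8', 'surrogatepass')
c = str(data, encoding='utf-8')
d = str(1)
'''

for (npos, keywords), n in count_str_calls(sample).items():
    print(f"{n} call(s) with {npos} positional arg(s), keywords={keywords}")
```

Feeding each file under `Lib/` through such a function would give per-shape totals, which is the kind of usage statistic asked for earlier in this thread.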

Well, optimising only for the positional-only argument cases would definitely make the PR smaller and the resulting code more maintainable.

A PR for positional-only cases is up for comparison:

gh-117709: Add vectorcall support for positional-only arguments of str() #117746

There are only results in the documentation [...]

Well, people often copy the examples from the documentation, so I would expect the examples shown in the docs to be very common in the wild.

erlend-aasland · 2024-04-11T08:16:19Z

Thanks for the initial review, Victor; the PR is now ~20 lines shorter ;)

vstinner

Can you please create a PR without keyword arguments? So we can compare the two implementations.

vstinner · 2024-04-11T08:22:37Z

Objects/unicodeobject.c

+}
+
+static PyObject *
+fallback_to_tp_call(PyObject *type, Py_ssize_t nargs, Py_ssize_t nkwargs,


Would it be possible to inline this function in unicode_vectorcall(), using goto if needed? The function name is too generic, it doesn't mention unicode_new().

Sure; that was my original code, but I refactored it out in order to reduce the size of the vectorcall function.

I think the code is nicer with this refactoring in place though; perhaps a better name would suffice.

erlend-aasland · 2024-04-11T08:34:55Z

Can you please create a PR without keyword arguments? So we can compare the two implementations.

Created:

gh-117709: Add vectorcall support for positional-only arguments of str() #117746

vstinner · 2024-04-11T10:29:28Z

I don't think that the complexity of handling keyword arguments is worth it. I prefer to only optimize positional-only arguments.
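
To make the trade-off concrete, here is a rough Python model of a positional-only fast path with a fallback to the generic call path. It is only a sketch of the idea: the real code is C in Objects/unicodeobject.c and is not reproduced here, and the names `str_tp_call` and `str_vectorcall` are hypothetical.

```python
def str_tp_call(*args, **kwargs):
    """Stand-in for the generic tp_call path, which does full argument parsing."""
    return str(*args, **kwargs)


def str_vectorcall(args, kwnames=()):
    """Model of a positional-only fast path (hypothetical, not the C code).

    `args` holds the positional values followed by the keyword values and
    `kwnames` holds the keyword names, mirroring the vectorcall layout.
    """
    nargs = len(args) - len(kwnames)
    if kwnames:
        # Any keyword argument: give up and take the generic path.
        kwargs = dict(zip(kwnames, args[nargs:]))
        return str_tp_call(*args[:nargs], **kwargs)
    if nargs == 0:
        return ""
    if nargs == 1:
        return str(args[0])
    if nargs in (2, 3):
        # The diff in this PR calls PyUnicode_FromEncodedObject() here.
        return str(*args)
    # Too many arguments: let the generic path raise the usual TypeError.
    return str_tp_call(*args)


print(str_vectorcall((b"a", "latin1")))                  # fast path
print(str_vectorcall((b"a", "latin1"), ("encoding",)))   # fallback path
```

With keywords handled purely by the fallback, the fast path stays a handful of branches on `nargs`, which is the maintainability argument made above.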

erlend-aasland · 2024-04-11T13:31:33Z

Closed in favour of #117746. We can revisit the need to speed up some of the keyword-arg cases later.

vstinner · 2024-04-11T13:36:00Z

Closed in favour of #117746. We can revisit the need to speed up some of the keyword-arg cases later.

I would be fine with it if the code parsing keyword arguments were generated by Argument Clinic. But hand-written code is more expensive to maintain. Here I don't think that it's worth it.

erlend-aasland · 2024-04-11T13:39:39Z

I would be fine with it if the code parsing keyword arguments were generated by Argument Clinic. But hand-written code is more expensive to maintain. Here I don't think that it's worth it.

Never forget #87613

PoC: str vectorcall

fbbf5f3

erlend-aasland requested review from vstinner, corona10 and markshannon April 10, 2024 20:10

bedevere-app bot added the awaiting core review label Apr 10, 2024

bedevere-app bot mentioned this pull request Apr 10, 2024

PyUnicode_Type needs vectorcall. #117709

Closed

erlend-aasland added 2 commits April 10, 2024 22:15

Nicer error message

2a94bfa

NEWS

1a29140

erlend-aasland marked this pull request as draft April 10, 2024 20:24

bedevere-app bot removed the awaiting core review label Apr 10, 2024

erlend-aasland added 2 commits April 10, 2024 22:44

Don't call unicode_get_empty() directly; remove some unneeded assignments

311aa3c

type is always PyUnicode_Type?

c59478c

erlend-aasland marked this pull request as ready for review April 10, 2024 21:12

bedevere-app bot added the awaiting core review label Apr 10, 2024

vstinner reviewed Apr 11, 2024

erlend-aasland added 5 commits April 11, 2024 10:07

Pull in main

10804c6

Address review: variable naming

068de64

Align exception messages with tp_call

57c01a4

Address review: use better internal APIs

37b45ed

Address review: explicitly initialise variables

7e141e7

vstinner reviewed Apr 11, 2024

erlend-aasland mentioned this pull request Apr 11, 2024

gh-117709: Add vectorcall support for positional-only arguments of str() #117746

Merged

erlend-aasland closed this Apr 11, 2024

erlend-aasland deleted the perf/unicode-vectorcall branch April 11, 2024 13:31
