Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-117709: Add vectorcall support to str() #117725

Closed

Conversation

erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Apr 10, 2024

IMO, this is bordering too much complexity, even though the speedup is considerable for some of the cases. We consider adding a simplified variant first.

@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented Apr 10, 2024

Some crude timeit benchmarks12:

This PR
# positional-only args
$ ./python.exe -m timeit "str()"
20000000 loops, best of 5: 14.6 nsec per loop
$ ./python.exe -m timeit "str(1)"
5000000 loops, best of 5: 56.8 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1')"
10000000 loops, best of 5: 37.3 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1', 'strict')"
5000000 loops, best of 5: 42.6 nsec per loop

# with kw args
$ ./python.exe -m timeit "str(b'a', 'latin1', errors='strict')"
5000000 loops, best of 5: 48.4 nsec per loop
$ ./python.exe -m timeit "str(b'a', encoding='latin1')"
5000000 loops, best of 5: 42.4 nsec per loop

# fallback to tp_call
$ ./python.exe -m timeit "str(object=b'a', encoding='latin1', errors='strict')"
2000000 loops, best of 5: 182 nsec per loop
`main`
# positional-only args
$ ./python.exe -m timeit "str()"
10000000 loops, best of 5: 26.2 nsec per loop
$ ./python.exe -m timeit "str(1)"
5000000 loops, best of 5: 56 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1')"¨
5000000 loops, best of 5: 75.4 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1', 'strict')"
5000000 loops, best of 5: 82.2 nsec per loop

# with kw args
$ ./python.exe -m timeit "str(b'a', 'latin1', errors='strict')"
2000000 loops, best of 5: 145 nsec per loop
$ ./python.exe -m timeit "str(b'a', encoding='latin1')"
2000000 loops, best of 5: 137 nsec per loop
$ ./python.exe -m timeit "str(object=b'a', encoding='latin1', errors='strict')"
2000000 loops, best of 5: 182 nsec per loop

Footnotes

  1. release build on macOS

  2. updated as of ref c59478c

@erlend-aasland erlend-aasland marked this pull request as draft April 10, 2024 20:24
@erlend-aasland erlend-aasland marked this pull request as ready for review April 10, 2024 21:12
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that the current complex implementation is worth it. I suggest to truncate your implementation to support only 1 or 2 arguments.

cc @corona10 who wrote similar optimizations.

Objects/unicodeobject.c Outdated Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
Objects/unicodeobject.c Outdated Show resolved Hide resolved
}
return PyUnicode_FromEncodedObject(object, encoding, errors);
}
if (nargs + nkwargs == 3) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it's worth it to optimize the 3 arguments case using vectorcall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the real question: which cases do we want to optimise? It would be nice if we had some usage stats. Perhaps the Faster CPython team has stats. cc. @mdboom

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a regular expression, I only found 5 lines in the whole stdlib (excluding tests) which call str() with 3 arguments, and all these matching lines only use positional arguments:

$ git grep '\<str *([^,()]\+,[^,()]\+,'
Lib/email/utils.py:        return str(rawbytes, charset, errors)
Lib/encodings/punycode.py:            base = str(text[:pos], "ascii", errors)
Lib/pickletools.py:        return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py:        return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py:        return str(data, 'utf-8', 'surrogatepass')

I modified git grep output to ignore tests, documentation and lines which care not str() calls (like Argument Clinic code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I cannot find any str(bytes, encoding=...) or str(bytes, errors=...) call in the stdlib. I used the regex: git grep '\<str *([^,()]\+,[^,()]\+='. There are only results in the documentation and tests.

In short, str() is only called with positional-only arguments. We should maybe strictly focus on these cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, optimising only for the positional-only argument cases would definitely make the PR smaller and the resulting code more maintainable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A PR for positional-only cases is up for comparison:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only results in the documentation [...]

Well, people often copy the examples from the documentation, so I would expect the examples shown in the docs to be very common in the wild.

@erlend-aasland
Copy link
Contributor Author

Thanks for the initial review, Victor; the PR is now ~20 lines shorter ;)

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please a PR without keyword arguments? So we can compare the two implementations.

}

static PyObject *
fallback_to_tp_call(PyObject *type, Py_ssize_t nargs, Py_ssize_t nkwargs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to inline this function in unicode_vectorcall(), using goto if needed? The function name is too generic, it doesn't mention unicode_new().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; that was my original code, but I refactored it out in order to reduce the size of the vectorcall function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code is nicer with this refactoring in place though; perhaps a better name would suffice.

@erlend-aasland
Copy link
Contributor Author

Can you please a PR without keyword arguments? So we can compare the two implementations.

Created:

@vstinner
Copy link
Member

I don't think that the complexity of handling keyword arguments is worth it. I prefer to only optimize positional-only arguments.

@erlend-aasland
Copy link
Contributor Author

Closed in favour of #117746. We can revisit the need to speed up some of the keyword-arg cases later.

@erlend-aasland erlend-aasland deleted the perf/unicode-vectorcall branch April 11, 2024 13:31
@vstinner
Copy link
Member

Closed in favour of #117746. We can revisit the need to speed up some of the keyword-arg cases later.

I would be fine if code parsing keyword arguments would be generated by Argument Clinic. But hand written code is more expensive to maintain. Here I don't think that it's worth it.

@erlend-aasland
Copy link
Contributor Author

I would be fine if code parsing keyword arguments would be generated by Argument Clinic. But hand written code is more expensive to maintain. Here I don't think that it's worth it.

Never forget #87613

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants