Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-117709: Add vectorcall support to str() #117725

Closed
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Speed up calls to :func:`str` by using the :pep:`590` ``vectorcall`` calling
convention. Patch by Erlend Aasland.
110 changes: 110 additions & 0 deletions Objects/unicodeobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -14617,6 +14617,115 @@ unicode_new_impl(PyTypeObject *type, PyObject *x, const char *encoding,
return unicode;
}

static const char *
as_const_char(PyObject *obj, const char *name)
{
if (!PyUnicode_Check(obj)) {
PyErr_Format(PyExc_TypeError,
"str() argument '%s' must be str, not %T",
name, obj);
return NULL;
}
const char *str = _PyUnicode_AsUTF8NoNUL(obj);
if (str == NULL) {
return NULL;
}
return str;
}

static PyObject *
fallback_to_tp_call(PyObject *type, Py_ssize_t nargs, Py_ssize_t nkwargs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to inline this function in unicode_vectorcall(), using goto if needed? The function name is too generic, it doesn't mention unicode_new().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; that was my original code, but I refactored it out in order to reduce the size of the vectorcall function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code is nicer with this refactoring in place though; perhaps a better name would suffice.

PyObject *const *args, PyObject *kwnames)
{
PyObject *tuple = _PyTuple_FromArray(args, nargs);
if (tuple == NULL) {
return NULL;
}
PyObject *dict = _PyStack_AsDict(args + nargs, kwnames);
if (dict == NULL) {
Py_DECREF(tuple);
return NULL;
}
return unicode_new(_PyType_CAST(type), tuple, dict);
}

static PyObject *
unicode_vectorcall(PyObject *type, PyObject *const *args,
size_t nargsf, PyObject *kwnames)
{
assert(Py_Is(_PyType_CAST(type), &PyUnicode_Type));

Py_ssize_t nargs = PyVectorcall_NARGS(nargsf);
Py_ssize_t nkwargs = (kwnames) ? PyTuple_GET_SIZE(kwnames) : 0;
if (nargs == 0 && nkwargs == 0) {
return unicode_get_empty();
}
PyObject *object = args[0];
if (nargs == 1 && nkwargs == 0) {
return PyObject_Str(object);
}
if (nargs + nkwargs == 2) {
const char *encoding = NULL;
const char *errors = NULL;
if (nkwargs == 1) {
PyObject *key = PyTuple_GET_ITEM(kwnames, 0);
if (_PyUnicode_EqualToASCIIString(key, "encoding")) {
encoding = as_const_char(args[1], "encoding");
if (encoding == NULL) {
return NULL;
}
}
else if (_PyUnicode_EqualToASCIIString(key, "errors")) {
errors = as_const_char(args[1], "errors");
if (errors == NULL) {
return NULL;
}
}
else {
PyErr_Format(PyExc_TypeError,
"str() got an unexpected keyword argument %R", key);
return NULL;
}
}
else if (nkwargs == 0) {
encoding = as_const_char(args[1], "encoding");
}
else {
return fallback_to_tp_call(type, nargs, nkwargs, args, kwnames);
}
return PyUnicode_FromEncodedObject(object, encoding, errors);
}
if (nargs + nkwargs == 3) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it's worth it to optimize the 3 arguments case using vectorcall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the real question: which cases do we want to optimise? It would be nice if we had some usage stats. Perhaps the Faster CPython team has stats. cc. @mdboom

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a regular expression, I only found 5 lines in the whole stdlib (excluding tests) which call str() with 3 arguments, and all these matching lines only use positional arguments:

$ git grep '\<str *([^,()]\+,[^,()]\+,'
Lib/email/utils.py:        return str(rawbytes, charset, errors)
Lib/encodings/punycode.py:            base = str(text[:pos], "ascii", errors)
Lib/pickletools.py:        return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py:        return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py:        return str(data, 'utf-8', 'surrogatepass')

I modified git grep output to ignore tests, documentation and lines which care not str() calls (like Argument Clinic code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I cannot find any str(bytes, encoding=...) or str(bytes, errors=...) call in the stdlib. I used the regex: git grep '\<str *([^,()]\+,[^,()]\+='. There are only results in the documentation and tests.

In short, str() is only called with positional-only arguments. We should maybe strictly focus on these cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, optimising only for the positional-only argument cases would definitely make the PR smaller and the resulting code more maintainable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A PR for positional-only cases is up for comparison:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only results in the documentation [...]

Well, people often copy the examples from the documentation, so I would expect the examples shown in the docs to be very common in the wild.

if (nkwargs == 1) {
PyObject *key = PyTuple_GET_ITEM(kwnames, 0);
if (!_PyUnicode_EqualToASCIIString(key, "errors")) {
PyErr_Format(PyExc_TypeError,
"str() got an unexpected keyword argument %R", key);
return NULL;
}
}
else if (nkwargs != 0) {
return fallback_to_tp_call(type, nargs, nkwargs, args, kwnames);
}
const char *encoding = as_const_char(args[1], "encoding");
if (encoding == NULL) {
return NULL;
}
const char *errors = as_const_char(args[2], "errors");
if (errors == NULL) {
return NULL;
}
return PyUnicode_FromEncodedObject(object, encoding, errors);
}
if (nargs > 3) {
PyErr_Format(PyExc_TypeError,
"str() takes at most 3 arguments (%d given)", nargs + nkwargs);
return NULL;
}

return fallback_to_tp_call(type, nargs, nkwargs, args, kwnames);
}

static PyObject *
unicode_subtype_new(PyTypeObject *type, PyObject *unicode)
{
Expand Down Expand Up @@ -14758,6 +14867,7 @@ PyTypeObject PyUnicode_Type = {
0, /* tp_alloc */
unicode_new, /* tp_new */
PyObject_Del, /* tp_free */
.tp_vectorcall = unicode_vectorcall,
};

/* Initialize the Unicode implementation */
Expand Down
Loading