Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-125916: Allow functools.reduce 'initial' to be a keyword argument #125917

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

sayandipdutta
Copy link

@sayandipdutta sayandipdutta commented Oct 24, 2024

Before:

from functools import reduce
from operator import sub
>>> reduce(sub, [1, 1, 2, 3, 5, 8], 21)
1
>>> reduce(sub, [1, 1, 2, 3, 5, 8], initial=21)
TypeError: reduce() takes no keyword arguments

After:

from functools import reduce
from operator import sub
>>> reduce(sub, [1, 1, 2, 3, 5, 8], 21)
1
>>> reduce(sub, [1, 1, 2, 3, 5, 8], initial=21)
1

Issue: gh-125916


📚 Documentation preview 📚: https://cpython-previews--125917.org.readthedocs.build/

Copy link

cpython-cla-bot bot commented Oct 24, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

Copy link
Member

@skirpichev skirpichev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also benchmark your implementation.

Kwargs handling will affect performance even if keyword will not be actually used (e.g. calls like reduce(f, seq, init)). IIUC, PyArg_ParseTupleAndKeywords is much slower in general than PyArg_UnpackTuple.

Modules/_functoolsmodule.c Outdated Show resolved Hide resolved
Doc/library/functools.rst Outdated Show resolved Hide resolved
Modules/_functoolsmodule.c Outdated Show resolved Hide resolved
@skirpichev
Copy link
Member

CC @sobolevn, as you have added AC comment.

@skirpichev

This comment was marked as outdated.

Co-authored-by: Sergey B Kirpichev <[email protected]>
@skirpichev
Copy link
Member

skirpichev commented Oct 24, 2024

Now with AC (patch2).

Patch with AC (a draft).
diff --git a/Modules/_functoolsmodule.c b/Modules/_functoolsmodule.c
index 802b1cf792..8faa8ad1ac 100644
--- a/Modules/_functoolsmodule.c
+++ b/Modules/_functoolsmodule.c
@@ -932,15 +932,30 @@ _functools_cmp_to_key_impl(PyObject *module, PyObject *mycmp)
 
 /* reduce (used to be a builtin) ********************************************/
 
-// Not converted to argument clinic, because of `args` in-place modification.
-// AC will affect performance.
+/*[clinic input]
+_functools.reduce
+
+    function as func: object
+    iterable as seq: object
+    /
+    initial as result: object(c_default="NULL") = None
+
+Apply a function of two arguments cumulatively.
+
+Apply it to the items of a sequence or iterable, from left to right, so as to
+reduce the iterable to a single value.  For example, reduce(lambda x, y: x+y,
+[1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5).  If initial is present, it is
+placed before the items of the iterable in the calculation, and serves as a
+default when the iterable is empty.
+[clinic start generated code]*/
+
 static PyObject *
-functools_reduce(PyObject *self, PyObject *args)
+_functools_reduce_impl(PyObject *module, PyObject *func, PyObject *seq,
+                       PyObject *result)
+/*[clinic end generated code: output=30d898fe1267c79d input=b7082b8b1473fdc2]*/
 {
-    PyObject *seq, *func, *result = NULL, *it;
+    PyObject *args, *it;
 
-    if (!PyArg_UnpackTuple(args, "reduce", 2, 3, &func, &seq, &result))
-        return NULL;
     if (result != NULL)
         Py_INCREF(result);
 
@@ -1006,16 +1021,6 @@ functools_reduce(PyObject *self, PyObject *args)
     return NULL;
 }
 
-PyDoc_STRVAR(functools_reduce_doc,
-"reduce(function, iterable[, initial], /) -> value\n\
-\n\
-Apply a function of two arguments cumulatively to the items of a sequence\n\
-or iterable, from left to right, so as to reduce the iterable to a single\n\
-value.  For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates\n\
-((((1+2)+3)+4)+5).  If initial is present, it is placed before the items\n\
-of the iterable in the calculation, and serves as a default when the\n\
-iterable is empty.");
-
 /* lru_cache object **********************************************************/
 
 /* There are four principal algorithmic differences from the pure python version:
@@ -1720,7 +1725,7 @@ PyDoc_STRVAR(_functools_doc,
 "Tools that operate on functions.");
 
 static PyMethodDef _functools_methods[] = {
-    {"reduce",          functools_reduce,     METH_VARARGS, functools_reduce_doc},
+    _FUNCTOOLS_REDUCE_METHODDEF
     _FUNCTOOLS_CMP_TO_KEY_METHODDEF
     {NULL,              NULL}           /* sentinel */
 };

You should run ./python Tools/clinic/clinic.py Modules/_functoolsmodule.c to update autogenerated code.

I did some benchmarks.

# a.py
import pyperf
from functools import reduce

f = lambda x, y: x + y
lst = list(range(10))
init = 123

runner = pyperf.Runner()
runner.bench_func('reduce(f, lst)', reduce, f, lst)
runner.bench_func('reduce(f, lst, init)', reduce, f, lst, init)

Run e.g. with: python a.py -q -o ref.json.

with results:

Benchmark ref patch patch2
reduce(f, lst) 2.18 us 2.42 us: 1.11x slower 2.11 us: 1.03x faster
reduce(f, lst, init) 2.35 us 2.64 us: 1.12x slower 2.27 us: 1.04x faster
Geometric mean (ref) 1.12x slower 1.03x faster

Looks the patch with AC even slightly faster than in the main.

@sayandipdutta
Copy link
Author

@skirpichev Is initial=None safe for backward compatibility? Does this mean reduce(Callable[[None, T], None], Iterable[T], None) will behave differently in 3.13 and 3.14?

Copy link
Member

@Eclips4 Eclips4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I think that if we want to add keyword support for functools.reduce, it should be done for all parameters, not just the initial. If so, this would match the behavior of the pure Python version of functools.

Lib/functools.py Outdated Show resolved Hide resolved
@Eclips4
Copy link
Member

Eclips4 commented Oct 24, 2024

@skirpichev Is initial=None safe for backward compatibility? Does this mean reduce(Callable[[None, T], None], Iterable[T], None) will behave differently in 3.13 and 3.14?

No, it doesn't. Someone can use None as initial value.

Taken from patch by Sergey B Kirpichev <[email protected]>
@skirpichev
Copy link
Member

skirpichev commented Oct 24, 2024

it should be done for all parameters, not just the initial

That might slowdown the patch v2.

No, it doesn't. Someone can use None as initial value.

That was a draft;) I think we could use same trick as for the Python version.

BTW, it seems the PEP 661 doesn't cover this at all.

Edit:

Updated AC patch with a sentinel value.
diff --git a/Modules/_functoolsmodule.c b/Modules/_functoolsmodule.c
index 802b1cf792..00b4a5e6cc 100644
--- a/Modules/_functoolsmodule.c
+++ b/Modules/_functoolsmodule.c
@@ -932,15 +932,31 @@ _functools_cmp_to_key_impl(PyObject *module, PyObject *mycmp)
 
 /* reduce (used to be a builtin) ********************************************/
 
-// Not converted to argument clinic, because of `args` in-place modification.
-// AC will affect performance.
+/*[clinic input]
+_functools.reduce
+
+    function as func: object
+    iterable as seq: object
+    /
+    initial as result: object(c_default="NULL") = _functools._initial_missing
+
+Apply a function of two arguments cumulatively to an iterable, from left to right.
+
+This efficiently reduce the iterable to a single value.  If initial is present,
+it is placed before the items of the iterable in the calculation, and serves as
+a default when the iterable is empty.
+
+For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])
+calculates ((((1+2)+3)+4)+5).
+[clinic start generated code]*/
+
 static PyObject *
-functools_reduce(PyObject *self, PyObject *args)
+_functools_reduce_impl(PyObject *module, PyObject *func, PyObject *seq,
+                       PyObject *result)
+/*[clinic end generated code: output=30d898fe1267c79d input=40be8069bcbc1a75]*/
 {
-    PyObject *seq, *func, *result = NULL, *it;
+    PyObject *args, *it;
 
-    if (!PyArg_UnpackTuple(args, "reduce", 2, 3, &func, &seq, &result))
-        return NULL;
     if (result != NULL)
         Py_INCREF(result);
 
@@ -1006,16 +1022,6 @@ functools_reduce(PyObject *self, PyObject *args)
     return NULL;
 }
 
-PyDoc_STRVAR(functools_reduce_doc,
-"reduce(function, iterable[, initial], /) -> value\n\
-\n\
-Apply a function of two arguments cumulatively to the items of a sequence\n\
-or iterable, from left to right, so as to reduce the iterable to a single\n\
-value.  For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates\n\
-((((1+2)+3)+4)+5).  If initial is present, it is placed before the items\n\
-of the iterable in the calculation, and serves as a default when the\n\
-iterable is empty.");
-
 /* lru_cache object **********************************************************/
 
 /* There are four principal algorithmic differences from the pure python version:
@@ -1720,7 +1726,7 @@ PyDoc_STRVAR(_functools_doc,
 "Tools that operate on functions.");
 
 static PyMethodDef _functools_methods[] = {
-    {"reduce",          functools_reduce,     METH_VARARGS, functools_reduce_doc},
+    _FUNCTOOLS_REDUCE_METHODDEF
     _FUNCTOOLS_CMP_TO_KEY_METHODDEF
     {NULL,              NULL}           /* sentinel */
 };
@@ -1789,6 +1795,10 @@ _functools_exec(PyObject *module)
     // lru_list_elem is used only in _lru_cache_wrapper.
     // So we don't expose it in module namespace.
 
+    if (PyModule_Add(module, "_initial_missing", _PyObject_New(&PyBaseObject_Type)) < 0) {
+        return -1;
+    }
+
     return 0;
 }
 

@sayandipdutta
Copy link
Author

sayandipdutta commented Oct 24, 2024

@skirpichev should the default be specified at all? I think reduce(function, iterable, /, initial) is closer representation of internal working than reduce(function, iterable, /, initial=None). Or is there some sort of a convention?

EDIT: Ah scratch that. That makes initial required. I meant reduce(function, iterable, /[, initial])

@skirpichev
Copy link
Member

I think reduce(function, iterable, /, initial) is closer representation

No. Current code in the main more accurately can be described as function with multiple signatures. Funny notation reduce(function, iterable[, initial], /) means it's possible to have two signature:

reduce(function, iterable, /)
reduce(function, iterable, initial, /)

The AC can't represent multiple signatures yet. The only way - using the sentinel value _initial_missing, like pure-Python version does. See updated patch above. You shouldn't use None as default value.

Lib/functools.py Outdated
@@ -236,7 +236,7 @@ def __ge__(self, other):

def reduce(function, sequence, initial=_initial_missing):
"""
reduce(function, iterable[, initial], /) -> value
reduce(function, iterable, /, initial=None) -> value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use ellipsis:

Suggested change
reduce(function, iterable, /, initial=None) -> value
reduce(function, iterable, /, initial=...) -> value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PEP 661:)

Copy link
Contributor

@nineteendo nineteendo Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the sentinel is private and doesn't even exist in the C implementation. Ellipsis is frequently used for unspecified default values in typeshed. We could use multiple signatures though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the sentinel is private and doesn't even exist in the C implementation.

It's easy to add, see #125917 (comment)

Ellipsis is frequently used for unspecified default values in typeshed.

@Eclips4?

We could use multiple signatures though.

Yes, I think it's fine for the sphinx docs. But help will looks like this (as for pure-Python version):

>>> help(functools.reduce)
Help on built-in function reduce in module _functools:

reduce(function, iterable, /,
       initial=_functools._initial_missing)
    Apply a function of two arguments cumulatively to an iterable, from left to right.

    [...]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've ever seen =... in the docs. Do we have precedent for that?

Copy link
Author

@sayandipdutta sayandipdutta Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the signature is giving inspect a hard time. But it is autogenerated by AC. Did I do something wrong?

reduce(function, iterable, /,
       initial=_functools._initial_missing)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the sentinel is private and doesn't even exist in the C implementation. Ellipsis is frequently used for unspecified default values in typeshed. We could use multiple signatures though.

Multiple signatures for a docs sounds like a good solution.
Using ... for default values is essentially the same as using None, and it's just wrong since users can pass ... as the initial value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've ever seen =... in the docs. Do we have precedent for that?

Yeah, e.g. for the int.from_bytes, for example.

it seems like the signature is giving inspect a hard time. But it is autogenerated by AC. Did I do something wrong?

First, note that reduce() has no correct signature in the current main.

Now AC adds one, but it can't be parsed by inspect._signature_fromstr(): this helper has own opinion on what can be specified as a default value (e.g. it can't be a complex number).

Modules/_functoolsmodule.c Outdated Show resolved Hide resolved
Modules/_functoolsmodule.c Outdated Show resolved Hide resolved
Doc/library/functools.rst Outdated Show resolved Hide resolved
@nineteendo
Copy link
Contributor

Do you update this test?

def test_functools_module_has_signatures(self):
no_signature = {'reduce'}
self._test_module_has_signatures(functools, no_signature)

@erlend-aasland erlend-aasland changed the title gh-125916: Allow functools.reduces 'initial' to be a keyword argument gh-125916: Allow functools.reduce 'initial' to be a keyword argument Oct 25, 2024
@Eclips4
Copy link
Member

Eclips4 commented Oct 25, 2024

For what it's worth I'm not convinced we need to make all parameters pos-and-keyword. I think there's a much stronger use case for passing initial= as a keyword argument, since it's an optional argument and even people who understand what reduce does in general might not immediately recognize it.

+1

Moreover, I think that this change should be broken up in two steps:

  1. Adapt reduce() to Argument Clinic (including benchmarks)
  2. Allow initial to be a keyword argument

We know that when adapting functions and methods to Argument Clinic, it is easy to introduce subtle bugs, so I would really like this to be a separate change.

I support the opinion that having all parameters as pos-and-keyword is not very useful, but if we only make initial pos-and keyword, then we need to decide what to do about Python implementation, which currently treats all parameters like pos-and-keyword arguments. Maybe we should start raising a warning if function and sequence arguments are passed as keyword arguments?

@erlend-aasland
Copy link
Contributor

[...] then we need to decide what to do about Python implementation [...]

Differences between functools.py and _functoolsmodule.c is out of scope for this issue/change.

@skirpichev
Copy link
Member

@erlend-aasland as AC changes coming from my patch, probably 1) point - is my job. I think that @sayandipdutta can keep this PR as-is for a while. I'll make a separate PR with AC-related changes.

@skirpichev
Copy link
Member

PR, that switch to AC: #125999

@sayandipdutta
Copy link
Author

Thanks a lot @skirpichev! It seems all I have to do is wait for your PR to be merged and then merge main into my PR. Will do so.

@erlend-aasland erlend-aasland marked this pull request as draft October 26, 2024 20:42
@erlend-aasland
Copy link
Contributor

Meta: I'll marked this as draft until Sergey's PR has landed.

@erlend-aasland
Copy link
Contributor

Meta: I'll marked this as draft until Sergey's PR has landed.

Sergey's Argument Clinic adaption landed just now. Please resolve conflicts, regenerate clinic, and mark the PR ready for review again.

@erlend-aasland
Copy link
Contributor

Can you post updated benchmarks vs. current main?

@sayandipdutta sayandipdutta marked this pull request as ready for review November 1, 2024 22:18
@sayandipdutta
Copy link
Author

I have made the requested changes; please review again

@bedevere-app

This comment was marked as outdated.

@sayandipdutta
Copy link
Author

sayandipdutta commented Nov 1, 2024

Checked against debug build. Followed script from #125917 (comment)

call Main branch This branch
reduce(f, lst) 3.59 us +- 0.13 us 3.51 us +- 0.14 us
reduce(f, lst, initial) 3.82 us +- 0.25 us 3.78 us +- 0.13 us

@erlend-aasland

EDIT: On release:

call Main branch This branch
reduce(f, lst) 912 ns +- 51 ns 915 ns +- 50 ns
reduce(f, lst, initial) 987 ns +- 49 ns 979 ns +- 34 ns

Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@erlend-aasland
Copy link
Contributor

BTW, you need a What's New entry for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants