-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-45885: Specialize COMPARE_OP #29734
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
You microbenchmark is only testing |
Python/ceval.c
Outdated
PyObject *left = SECOND(); | ||
DEOPT_IF(!PyUnicode_CheckExact(left), COMPARE_OP); | ||
DEOPT_IF(!PyUnicode_CheckExact(right), COMPARE_OP); | ||
DEOPT_IF(!PyUnicode_IS_READY(left), COMPARE_OP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? If left == right
then they are equal regardless of whether the string is ready or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that almost all strings are ready. I suppose the alternative is
DEOPT_IF(!PyUnicode_CheckExact(left), COMPARE_OP);
DEOPT_IF(!PyUnicode_CheckExact(right), COMPARE_OP);
int res = 1;
if (left != right) {
DEOPT_IF(!PyUnicode_IS_READY(left), COMPARE_OP);
DEOPT_IF(!PyUnicode_IS_READY(left), COMPARE_OP);
res = comare_the_strings_assuming_theyre_ready(left, right);
}
is that what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Assuming that the identity shortcut is worthwhile, then it makes sense to shortcut as much work as possible.
You are also missing out of the quick inequality test using lengths len(a) != len(b)
implies that a != b
.
Maybe wrap unicode_compare_eq
in an equality function that readies the strings (I'm surprised that no such thing exists) and use that?
int res = 1;
if (left != right) {
res = _PyUnicode_Equal(left, right);
}
There are a lot of branches in the int and float specializations, which is not good for performance. Because neither comparison can fail (ints have a total ordering, and floats have a total ordering plus a special not-equal-if-either-is-a-nan result) we can avoid the branching by masking the result of a comparison with a code for the operator. Also see my comment on the issue about avoiding the need to push and pop objects. |
This comment has been minimized.
This comment has been minimized.
Here's one idea for ints:
Then the mask can be chosen depending on whether the following instruction is pop_jump_if_true/false. There would still be the branch depending on whether Py_SIZE(left) - Py_SIZE(right) == 0 and a loop over digits and a branch to check for Py_SIZE(left) < 0, but at least it would save the switch over oparg, the push/pop/incref/decref, and one DISPATCH(). Maybe the float equivalent would deopt if either arg is NaN and then do |
These three combined opcodes have slightly worse hit percentages on some benchmarks. |
Newer Pyperformance numbers (Note that my setup is not very stable): Slower (20):
Faster (17):
Benchmark hidden because not significant (18): chaos, crypto_pyaes, dulwich_log, json_dumps, json_loads, logging_format, raytrace, regex_compile, regex_v8, richards, sympy_expand, sympy_integrate, sympy_sum, sympy_str, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process Geometric mean: 1.00x slower |
Newest microbenchmarks: pretty good, even including more comparisons from pyperf import Runner
runner = Runner()
runner.timeit(
"float_loop",
setup="arr = [float(x) for x in range(5_000_000)]",
stmt="for x in arr:\n"
" if x > 1e6 or x == 1e6 or x < 0.0:\n"
" break\n",
)
runner.timeit(
"int_loop",
setup="arr = list(range(5_000_000))",
stmt="for i in arr:\n"
" if i > 6_000_000 or i == 6_000_000 or i < 0:\n"
" break\n",
)
runner.timeit(
"str_loop",
setup="arr = ['Py'] * 5_000_000",
stmt="for word in arr:\n"
" if word == 'Python' or word != 'Py':\n"
" break\n",
)
"""
float_loop: Mean +- std dev: [micro_main] 41.5 ms +- 0.4 ms -> [micro_combined3] 23.7 ms +- 0.8 ms: 1.75x faster
int_loop: Mean +- std dev: [micro_main] 202 ms +- 2 ms -> [micro_combined3] 133 ms +- 1 ms: 1.52x faster
str_loop: Mean +- std dev: [micro_main] 140 ms +- 7 ms -> [micro_combined3] 88.1 ms +- 3.9 ms: 1.59x faster
Geometric mean: 1.62x faster
""" |
This comment has been minimized.
This comment has been minimized.
Slower (19):
Faster (20):
Benchmark hidden because not significant (16): chaos, float, logging_format, pickle, pidigits, python_startup_no_site, regex_compile, regex_dna, richards, sympy_integrate, telco, tornado_http, unpickle, xml_etree_parse, xml_etree_generate, xml_etree_process Geometric mean: 1.00x slower |
My results are less noisy and show a ~1% speedup. Given that some of the largest improvements are from notoriously noisy benchmarks, 1% may be overstating the speedup but it does seem to be real. |
double dleft = PyFloat_AS_DOUBLE(left); | ||
double dright = PyFloat_AS_DOUBLE(right); | ||
int sign = (dleft > dright) - (dleft < dright); | ||
DEOPT_IF(isnan(dleft), COMPARE_OP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially we could use the fact that nans are not equal to everything including themselves to add a fourth bit to the mask. Probably leave that for another PR as the maths starts getting a bit convoluted.
goto success; | ||
} | ||
} | ||
SPECIALIZATION_FAIL(COMPARE_OP, SPEC_FAIL_OTHER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like more information here.
Is this a builtin class, or a Python class?
Does it override the comparison, or (in the case of ==/!==
) does it rely on the default identity comparison?
Doesn't need to be in this PR, though.
Looks good in general. The approach seems sound and gives a speedup. There a couple of things that need to be tightened up, and a couple of possible enhancements (for future PRs). |
Updated results are good. |
👍 |
|
@sweeneyde |
https://bugs.python.org/issue45885