gh-91432: Replace JUMP+FOR_ITER with FOR_END #70016
Below are some benchmarks. My machine is not the most stable, but I believe there is some consistent measurable speedup.
PyPerformance:
Slower (9):
Faster (20):
Benchmark hidden because not significant (29): 2to3, chameleon, crypto_pyaes, dulwich_log, fannkuch, go, hexiom, json_dumps, json_loads, logging_format, mako, meteor_contest, pidigits, pyflate, python_startup, python_startup_no_site, richards, scimark_fft, sqlalchemy_declarative, sqlite_synth, sympy_expand, sympy_integrate, sympy_str, tornado_http, unpack_sequence, unpickle, xml_etree_parse, xml_etree_generate, xml_etree_process
Geometric mean: 1.01x faster
Microbenchmarks:
benchmark code:
from itertools import repeat
from pyperf import Runner, perf_counter

runner = Runner()

def time_this(func):
    runner.bench_time_func(func.__name__, func)
    return func

###############################

@time_this
def range_sum(loops):
    s = 0
    r = iter(range(loops))
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

@time_this
def list_sum(loops):
    s = 0.0
    r = iter([1.0] * loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

@time_this
def repeat_sum(loops):
    s = 0.0
    r = repeat(1.0, loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

###############################

@time_this
def range_all(loops):
    r = iter(range(1, loops + 1))
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

@time_this
def list_all(loops):
    r = iter([True] * loops)
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

@time_this
def repeat_all(loops):
    r = repeat(True, loops)
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

Faster (6):
Geometric mean: 1.03x faster
These don't make too much sense (some macro-benchmarks speeding up more than the tightest micro-benchmarks?), so if someone with a stable machine is willing to re-measure, that would be appreciated.
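For context on what the loops in these benchmarks exercise, here is a minimal sketch that disassembles a small `for` loop on whichever interpreter runs it and lists the loop-control instructions. The names `sum_range` and `loop_opcodes` are illustrative and not part of this PR. Opcode names differ between CPython versions, and `FOR_END` is the name proposed here, so it should not be expected in the output of a stock build.

```python
import dis


def sum_range(n):
    # Sample loop with the same shape as the *_sum micro-benchmarks.
    s = 0
    for x in range(n):
        s += x
    return s


def loop_opcodes(func):
    """Return (offset, opname) pairs for iteration and jump instructions."""
    return [
        (ins.offset, ins.opname)
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("FOR_") or "JUMP" in ins.opname
    ]


if __name__ == "__main__":
    for offset, opname in loop_opcodes(sum_range):
        print(offset, opname)
```

Comparing this listing between a baseline build and a patched build is one way to confirm which instructions a loop iteration actually executes before interpreting the timings.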
int prev_op = _PyOpcode_Deopt[_Py_OPCODE(code[instr_prev])];
int next_op = _PyOpcode_Deopt[_Py_OPCODE(*frame->prev_instr)];
// Trace before FOR_END, not after, even though a backwards
// jump happens after. However, don't trace on the first FOR_END
// of the for loop, since we're staying on the same line.
// Also don't trace JUMP_NO_INTERRUPT --> SEND.
bool line_number_changed = (line != lastline);
bool first_iteration = (next_op == FOR_END &&
                        prev_op == JUMP_FORWARD &&
                        !line_number_changed);
bool for_loop_end = (next_op == FOR_END && !first_iteration);
bool back_jump = (_PyInterpreterFrame_LASTI(frame) < instr_prev &&
                  next_op != SEND && prev_op != FOR_END);
if (line_number_changed || for_loop_end || back_jump) {
I'm not yet totally convinced of the robustness of the first_iteration check. It works on the whole test suite, and I couldn't come up with any failing cases, but it feels fragile.
Something like this correctly traces the backwards jumps because the POP_TOP has line number -1:
def for_loop_same_line_with_jump():  # line 0
    for x in (0, 1, 2): +x if x >= 0 else -x  # line 1
I don't know of any way to get JUMP_FORWARD targeted at FOR_END on the same line, unless it's the first loop iteration.
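The line events under discussion can be observed from Python with `sys.settrace`. The sketch below is only an illustration: `trace_lines` and `simple_loop` are hypothetical names, not part of the PR or the test suite. It records the `line` events a function produces, relative to its `def` line, so the number of times the loop header is reported can be compared between builds.

```python
import sys


def trace_lines(func):
    """Run func() and return its 'line' events, relative to the def line."""
    code = func.__code__
    events = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames other than func's
        if event == "line":
            events.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        # Restore whatever trace function was installed before.
        sys.settrace(previous)
    return events


def simple_loop():               # line 0
    total = 0                    # line 1
    for x in (0, 1, 2):          # line 2
        total += x               # line 3
    return total                 # line 4


if __name__ == "__main__":
    print(trace_lines(simple_loop))
```

Passing `for_loop_same_line_with_jump` to the same helper shows how a loop whose header and body share one line is reported on a given build.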
I could add something like this if preferred:
#ifdef Py_DEBUG
    if (first_iteration) {
        int prevprev_op = _PyOpcode_Deopt[_Py_OPCODE(code[instr_prev-1])];
        assert(prevprev_op == GET_ITER || prevprev_op == LOAD_FAST);
        if (prevprev_op == LOAD_FAST) {
            assert(_Py_OPARG(code[instr_prev-1]) == 0);
            PyObject *names = frame->f_code->co_localsplusnames;
            PyObject *name = PyTuple_GET_ITEM(names, 0);
            assert(_PyUnicode_EqualToASCIIString(name, ".0"));
        }
    }
#endif
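The `LOAD_FAST` branch of these assertions appears to target code objects that receive their iterator as an implicit first argument named `.0`, as generator expressions do. The following sketch checks that naming from Python; `genexpr_code` and `first_local_name` are illustrative helpers, and the exact instruction sequence printed depends on the CPython version.

```python
import dis


def genexpr_code():
    """Return the code object of a simple generator expression."""
    gen = (x for x in range(3))
    return gen.gi_code


def first_local_name(code):
    # co_varnames lists arguments first; a generator expression's code
    # object takes its iterator as a single positional argument.
    return code.co_varnames[0]


if __name__ == "__main__":
    code = genexpr_code()
    print(code.co_argcount, first_local_name(code))
    for ins in dis.get_instructions(code):
        print(ins.offset, ins.opname, ins.argrepr)
```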
@sweeneyde thanks for trying this, but it looks like we want to make all branches go forward (except |
Is this an older version of the same idea? #46711
#91432
TODO: