
gh-91432: Replace JUMP+FOR_ITER with FOR_END #70016

Closed
wants to merge 31 commits

Conversation


@sweeneyde sweeneyde commented Apr 10, 2022

#91432

TODO:

  • Get test_sys_settrace passing
  • Get test_trace passing
  • Get test_bdb passing
  • Get test_pdb passing
  • Refactor/comment maybe_call_line_trace in ceval.c
  • issue
  • NEWS
  • DOCS
  • WhatsNew
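
For readers skimming the diff: per the title (and the tracing discussion below), the change replaces the FOR_ITER at the top of a for loop plus the backward JUMP at the bottom with a single FOR_END at the bottom. As a quick way to see the layout being replaced, here is a small sketch that is not part of the PR; opcode names vary by interpreter version (JUMP_ABSOLUTE on 3.10, JUMP_BACKWARD on later 3.11 builds):

import dis

def count_items(it):
    # On a pre-patch interpreter, the loop below disassembles to FOR_ITER at
    # the top of the loop body and a backward jump at the bottom; this PR
    # collapses that pair into a single FOR_END.
    n = 0
    for _ in it:
        n += 1
    return n

dis.dis(count_items)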


sweeneyde commented Apr 10, 2022

Below are some benchmarks. My machine is not the most stable, but I believe there is a consistent, measurable speedup.

PyPerformance:

Slower (9):

  • scimark_sparse_mat_mult: 4.87 ms +- 0.23 ms -> 5.04 ms +- 0.19 ms: 1.04x slower
  • spectral_norm: 96.6 ms +- 1.4 ms -> 100.0 ms +- 1.7 ms: 1.03x slower
  • regex_v8: 23.3 ms +- 0.4 ms -> 24.0 ms +- 0.5 ms: 1.03x slower
  • pathlib: 19.7 ms +- 0.4 ms -> 20.2 ms +- 0.8 ms: 1.03x slower
  • sympy_sum: 184 ms +- 2 ms -> 188 ms +- 6 ms: 1.02x slower
  • telco: 6.46 ms +- 0.12 ms -> 6.56 ms +- 0.19 ms: 1.02x slower
  • sqlalchemy_imperative: 24.5 ms +- 1.0 ms -> 24.8 ms +- 0.7 ms: 1.02x slower
  • regex_dna: 209 ms +- 3 ms -> 212 ms +- 3 ms: 1.01x slower
  • logging_simple: 6.55 us +- 0.12 us -> 6.64 us +- 0.28 us: 1.01x slower

Faster (20):

  • pickle_list: 4.77 us +- 0.17 us -> 4.44 us +- 0.15 us: 1.07x faster
  • logging_silent: 105 ns +- 5 ns -> 98.0 ns +- 2.0 ns: 1.07x faster
  • nbody: 105 ms +- 5 ms -> 99.1 ms +- 3.7 ms: 1.06x faster
  • pickle_dict: 28.6 us +- 0.4 us -> 26.9 us +- 0.5 us: 1.06x faster
  • scimark_lu: 111 ms +- 2 ms -> 105 ms +- 3 ms: 1.05x faster
  • pickle: 12.6 us +- 1.1 us -> 12.0 us +- 1.4 us: 1.05x faster
  • xml_etree_iterparse: 116 ms +- 16 ms -> 110 ms +- 2 ms: 1.05x faster
  • deltablue: 4.27 ms +- 0.18 ms -> 4.12 ms +- 0.09 ms: 1.04x faster
  • regex_effbot: 3.19 ms +- 0.09 ms -> 3.08 ms +- 0.04 ms: 1.04x faster
  • scimark_monte_carlo: 72.9 ms +- 1.5 ms -> 70.4 ms +- 1.2 ms: 1.03x faster
  • django_template: 42.1 ms +- 2.2 ms -> 40.9 ms +- 2.4 ms: 1.03x faster
  • float: 81.6 ms +- 1.5 ms -> 79.7 ms +- 1.3 ms: 1.02x faster
  • pickle_pure_python: 351 us +- 10 us -> 343 us +- 8 us: 1.02x faster
  • chaos: 80.3 ms +- 1.8 ms -> 78.8 ms +- 2.1 ms: 1.02x faster
  • unpickle_pure_python: 263 us +- 6 us -> 259 us +- 7 us: 1.02x faster
  • nqueens: 87.0 ms +- 1.4 ms -> 85.5 ms +- 2.2 ms: 1.02x faster
  • raytrace: 329 ms +- 9 ms -> 324 ms +- 10 ms: 1.01x faster
  • unpickle_list: 5.11 us +- 0.12 us -> 5.05 us +- 0.08 us: 1.01x faster
  • regex_compile: 152 ms +- 3 ms -> 151 ms +- 3 ms: 1.01x faster
  • scimark_sor: 119 ms +- 2 ms -> 118 ms +- 1 ms: 1.01x faster

Benchmark hidden because not significant (29): 2to3, chameleon, crypto_pyaes, dulwich_log, fannkuch, go, hexiom, json_dumps, json_loads, logging_format, mako, meteor_contest, pidigits, pyflate, python_startup, python_startup_no_site, richards, scimark_fft, sqlalchemy_declarative, sqlite_synth, sympy_expand, sympy_integrate, sympy_str, tornado_http, unpack_sequence, unpickle, xml_etree_parse, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster

Microbenchmarks:

benchmark code:

from itertools import repeat
from pyperf import Runner, perf_counter

runner = Runner()
def time_this(func):
    runner.bench_time_func(func.__name__, func)
    return func

###############################

@time_this
def range_sum(loops):
    s = 0
    r = iter(range(loops))
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

@time_this
def list_sum(loops):
    s = 0.0
    r = iter([1.0] * loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

@time_this
def repeat_sum(loops):
    s = 0.0
    r = repeat(1.0, loops)
    t0 = perf_counter()
    for x in r:
        s += x
    return perf_counter() - t0

###############################

@time_this
def range_all(loops):
    r = iter(range(1, loops + 1))
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

@time_this
def list_all(loops):
    r = iter([True] * loops)
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0

@time_this
def repeat_all(loops):
    r = repeat(True, loops)
    t0 = perf_counter()
    for x in r:
        if not x:
            break
    return perf_counter() - t0
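
(For anyone re-measuring: assuming the usual pyperf workflow, the script above can be run once per interpreter with -o base.json / -o patched.json and the two result files compared with python -m pyperf compare_to base.json patched.json.)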

Faster (6):

  • range_all: 17.5 ns +- 0.1 ns -> 16.6 ns +- 0.3 ns: 1.05x faster
  • list_all: 10.6 ns +- 0.7 ns -> 10.3 ns +- 0.4 ns: 1.03x faster
  • repeat_sum: 15.2 ns +- 0.2 ns -> 14.8 ns +- 0.5 ns: 1.03x faster
  • list_sum: 17.3 ns +- 0.2 ns -> 16.9 ns +- 0.3 ns: 1.03x faster
  • range_sum: 31.5 ns +- 0.3 ns -> 30.9 ns +- 0.3 ns: 1.02x faster
  • repeat_all: 7.98 ns +- 0.06 ns -> 7.90 ns +- 0.10 ns: 1.01x faster

Geometric mean: 1.03x faster

These don't make too much sense (some macro-benchmarks speeding up more than the tightest micro-benchmarks?), so if someone with a stable machine is willing to re-measure, that would be appreciated.

@sweeneyde sweeneyde marked this pull request as ready for review April 10, 2022 23:29
@sweeneyde sweeneyde changed the title from "Replace JUMP_ABSOLUTE/FOR_ITER with FOR_END" to "gh-91432: Replace JUMP+FOR_ITER with FOR_END" Apr 10, 2022
Comment on lines +6760 to +6773
int prev_op = _PyOpcode_Deopt[_Py_OPCODE(code[instr_prev])];
int next_op = _PyOpcode_Deopt[_Py_OPCODE(*frame->prev_instr)];
// Trace before FOR_END, not after, even though a backwards
// jump happens after. However, don't trace on the first FOR_END
// of the for loop, since we're staying on the same line.
// Also don't trace JUMP_NO_INTERRUPT --> SEND.
bool line_number_changed = (line != lastline);
bool first_iteration = (next_op == FOR_END &&
                        prev_op == JUMP_FORWARD &&
                        !line_number_changed);
bool for_loop_end = (next_op == FOR_END && !first_iteration);
bool back_jump = (_PyInterpreterFrame_LASTI(frame) < instr_prev &&
                  next_op != SEND && prev_op != FOR_END);
if (line_number_changed || for_loop_end || back_jump) {

@sweeneyde sweeneyde Apr 11, 2022

I'm not yet totally convinced of the robustness of the first_iteration check. It works on the whole test suite, and I couldn't come up with any failing cases, but it feels fragile.

Something like this correctly traces the backwards jumps because the POP_TOP has line number -1:

def for_loop_same_line_with_jump():           # line 0
    for x in (0, 1, 2): +x if x >= 0 else -x  # line 1

I don't know of any way to get JUMP_FORWARD targeted at FOR_END on the same line, unless it's the first loop iteration.
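
As a side note, here is a minimal sketch (not from the PR; the helper name collect_line_events is made up) for observing which line events fire for a one-line loop body, reusing the example function from this comment:

import sys

def collect_line_events(func):
    # Record the line numbers of "line" trace events raised while func() runs.
    lines = []
    def tracer(frame, event, arg):
        if event == "line":
            lines.append(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return lines

def for_loop_same_line_with_jump():
    for x in (0, 1, 2): +x if x >= 0 else -x

print(collect_line_events(for_loop_same_line_with_jump))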

@sweeneyde sweeneyde (Member Author)

I could add something like this if preferred:

#ifdef Py_DEBUG
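        // Debug-only sanity check: the JUMP_FORWARD we arrived from should
        // have been emitted by the loop setup, i.e. be preceded by GET_ITER
        // (ordinary for loop) or by LOAD_FAST of the implicit ".0"
        // comprehension argument.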
        if (first_iteration) {
            int prevprev_op = _PyOpcode_Deopt[_Py_OPCODE(code[instr_prev-1])];
            assert(prevprev_op == GET_ITER || prevprev_op == LOAD_FAST);
            if (prevprev_op == LOAD_FAST) {
                assert(_Py_OPARG(code[instr_prev-1]) == 0);
                PyObject *names = frame->f_code->co_localsplusnames;
                PyObject *name = PyTuple_GET_ITEM(names, 0);
                assert(_PyUnicode_EqualToASCIIString(name, ".0"));
            }
        }
#endif


@ambv ambv closed this Apr 11, 2022
@ambv ambv reopened this Apr 11, 2022
@markshannon
Member

@sweeneyde thanks for trying this, but it looks like we want to make all branches go forward (except JUMP_BACKWARD and JUMP_BACKWARD_NO_INTERRUPT).

faster-cpython/ideas#297 (comment)

@sweeneyde sweeneyde closed this May 20, 2022
@iritkatriel
Member

Is this an older version of the same idea? #46711
