gh-106529: Make FOR_ITER a viable uop #112134

gvanrossum · 2023-11-15T23:43:22Z

Also clean up a few nits in the code generator, and add back an important Make dependency for ceval.o.

Issue: Branching design for Tier 2 (uops) interpreter #106529

gvanrossum · 2023-11-15T23:44:34Z

~~Hold on, need to remove merge conflicts from commit history.~~

gvanrossum · 2023-11-16T00:02:53Z

Tools/cases_generator/flags.py

@@ -176,7 +175,7 @@ def variable_used_unspecialized(node: parsing.Node, name: str) -> bool:
    tokens: list[lx.Token] = []
    skipping = False
    for i, token in enumerate(node.tokens):
-        if token.kind == "MACRO":
+        if token.kind == "CMACRO":


NOTE: This fix resulted in _SPECIALIZE_UNPACK_SEQUENCE becoming a viable uop. It was missing a TIER_ONE_ONLY marker; I've added it back. (The fix is needed to restore the feature that this function doesn't look inside #if TIER_ONE.)

gvanrossum · 2023-11-16T00:09:36Z

@brandtbucher Let me know how badly this interferes with the JIT branch. The stuff I put in _FOR_ITER_TIER_TWO probably shouldn't be compiled literally into the template. Maybe I should make part of the code at the deoptimize label into a macro? (But the setting of frame->instr_ptr must differ -- we can't use target since that's also used by error exits, since GH-112065.)

gvanrossum · 2023-11-16T17:23:28Z

Benchmark results:
https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20231115-3.13.0a1+-5c5d8bd-PYTHON_UOPS

brandtbucher · 2023-11-16T21:45:34Z

> @brandtbucher Let me know how badly this interferes with the JIT branch. The stuff I put in _FOR_ITER_TIER_TWO probably shouldn't be compiled literally into the template. Maybe I should make part of the code at the deoptimize label into a macro? (But the setting of frame->instr_ptr must differ -- we can't use target since that's also used by error exits, since GH-112065.)

So, this doesn't actually look too bad. I think with a tiny bit of reworking, this could be merged into the JIT branch with no pain:

- Move the line that sets frame->instr_ptr from the exit_trace label into the _EXIT_TRACE instruction, using a macro (like CURRENT_TARGET() or something) to access the target member.
- Remove the code in _FOR_ITER_TIER_TWO that updates the stats, saves the stack pointer, and decrefs the executor; instead, just `goto exit_trace;` after updating frame->instr_ptr.
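
To make the first suggestion concrete, here is a minimal, self-contained sketch of the idea, not the actual CPython code. The `toy_uop` and `toy_frame` types, the `TOY_*` opcodes, and `toy_run_trace()` are all hypothetical stand-ins; only the macro name `CURRENT_TARGET()` comes from the discussion. The exit instruction sets `frame->instr_ptr` itself and reads the target only through the macro, so a different build (such as a JIT template) could redefine the macro without touching the instruction body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real uop and frame structures. */
typedef struct { int opcode; int oparg; int target; } toy_uop;
typedef struct { const int *instr_ptr; } toy_frame;

/* Interpreter-side definition: read the target from the uop just fetched.
 * Another build could predefine this differently, e.g. as (target). */
#ifndef CURRENT_TARGET
#define CURRENT_TARGET() (next_uop[-1].target)
#endif

enum { TOY_NOP = 0, TOY_EXIT_TRACE = 1 };

/* Run a toy trace until TOY_EXIT_TRACE; return the number of uops executed,
 * or -1 if an unknown opcode is seen. */
static int
toy_run_trace(const toy_uop *trace, const int *code, toy_frame *frame)
{
    const toy_uop *next_uop = trace;
    int executed = 0;
    for (;;) {
        int opcode = next_uop->opcode;
        next_uop++;
        executed++;
        switch (opcode) {
        case TOY_NOP:
            break;
        case TOY_EXIT_TRACE:
            /* The instruction itself records where Tier 1 resumes. */
            frame->instr_ptr = code + CURRENT_TARGET();
            goto exit_trace;
        default:
            assert(0 && "unknown toy uop");
            return -1;
        }
    }
exit_trace:
    /* Shared cleanup would go here; instr_ptr is already set. */
    return executed;
}
```

Because the label no longer touches `frame->instr_ptr`, any other uop that has already set it can jump to `exit_trace` directly.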

gvanrossum · 2023-11-16T23:41:53Z

Okay, I think I managed to do that. I can now merge into the justin branch with only a single trivial merge conflict (we both added something to the end of ceval_macros.h). Next up of course, how would I redefine CURRENT_TARGET() in template.c?

brandtbucher · 2023-11-16T23:43:58Z

#define CURRENT_TARGET() (target)

gvanrossum · 2023-11-16T23:54:12Z

> #define CURRENT_TARGET() (target)

Yup, and I also needed to move the exit_trace: label one line down (to avoid setting frame->instr_ptr twice).

It seems to work, but unsure how to prove it (it compiles and passes tests, but it would too if the JIT was never invoked).

Anyway, we should probably wait until we've decided that making Tier 2 5% slower is a good idea.

markshannon · 2023-11-17T10:17:49Z

There seem to be a few unrelated changes in this PR.
Could you make another PR for the increase to the uop buffer size, extra debugging and change to UNPACK_SEQUENCE?

markshannon · 2023-11-17T10:25:02Z

Python/bytecodes.c

+                /* iterator ended normally */
+                Py_DECREF(iter);
+                STACK_SHRINK(1);
+                /* HACK: Emulate DEOPT_IF to jump over END_FOR */


No hacks, please 🙂

The code should look like this:

if (next == NULL) {
    if (_PyErr_Occurred(tstate)) {
        if (!_PyErr_ExceptionMatches(tstate, PyExc_StopIteration)) {
            GOTO_ERROR(error);
        }
        _PyErr_Clear(tstate);
    }
    /* iterator ended normally */
    Py_DECREF(iter);
    STACK_SHRINK(1);
    DEOPT_IF(true);
}

The trace generator can adjust the target, so it points after the END_FOR.
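
As a rough illustration of that target adjustment, the sketch below computes where execution would resume once the iterator is exhausted. Everything here is a toy model: the `toy_instr` type, the `TOY_*` opcodes, and the assumed layout (a `FOR_ITER` at index `i` with oparg `n` jumps to index `i + 1 + n`, which holds `END_FOR`) are assumptions for illustration, and real bytecode also has inline cache entries that this ignores.

```c
#include <assert.h>

enum { TOY_FOR_ITER = 1, TOY_END_FOR = 2, TOY_OTHER = 3 };

/* Hypothetical, cache-free instruction format. */
typedef struct { int opcode; int oparg; } toy_instr;

/* Return the index at which Tier 1 should resume after the loop at
 * code[i] is exhausted. Assumes the toy layout described above. */
static int
toy_for_iter_exit_target(const toy_instr *code, int i)
{
    assert(code[i].opcode == TOY_FOR_ITER);
    int end_for = i + 1 + code[i].oparg;
    assert(code[end_for].opcode == TOY_END_FOR);
    /* The uop has already dropped the iterator from the stack, so
     * resume just past END_FOR rather than at it. */
    return end_for + 1;
}
```

Computing this once while building the trace lets the uop body stay a plain `DEOPT_IF(true)` with no knowledge of `END_FOR`.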

gvanrossum · 2023-11-17T19:13:47Z

> Could you make another PR for the increase to the uop buffer size, extra debugging and change to UNPACK_SEQUENCE?

Sure, see gh-112214. I've merged that into main, since the tests pass, next I'll merge it into this PR and do the other thing you requested.

- Double max trace size to 256
- Add a dependency on executor_cases.c.h for ceval.o
- Mark `_SPECIALIZE_UNPACK_SEQUENCE` as `TIER_ONE_ONLY`
- Add debug output back showing the optimized trace
- Bunch of cleanups to Tools/cases_generator/

gvanrossum · 2023-11-17T21:17:52Z

Okay, here's the new version. @markshannon Want to review one more time?

gvanrossum · 2023-11-17T22:25:23Z

I've applied Brandt's suggestions.

I've also split off some more cleanups and debug improvements into gh-112218 and gh-112220. Once those land I'll merge main once more and this should shrink a bit more (the cases_generator tweaks will be separated out).

gvanrossum · 2023-11-18T04:57:38Z

Benchmark says 4% slower (with Tier 2):
https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20231117-3.13.0a1+-14aea56-PYTHON_UOPS

- For spectral_norm, one out of five traces is still too long (that must be one hell of a trace :-)
- Also, spectral_norm no longer uses any unsupported opcodes (nice!)
- The pystats diff shows about 8% more traces executed and 8% more uops executed
- Overall, FOR_ITER_GEN is now by far the most-used unsupported opcode (74k, vs. 9k for the next most-used, CALL)

gvanrossum · 2023-11-19T23:56:00Z

FWIW, I have a branch on top of this that makes the uop interpreter 3% faster. The approach is to let the code generator generate code that extracts oparg and operand from the instruction stream as needed (i.e., only for opcodes that use them) rather than the interpreter loop preemptively extracting them for every instruction. I'm not completely happy with it, and it probably would make life a little harder for the JIT template, which doesn't like directly referencing next_uop -- this should probably wait until we're ready to generate the JIT template code separately from the Tier 2 interpreter code.

brandtbucher · 2023-11-20T00:14:29Z

Honestly, as long as it’s done in macros, it shouldn’t be too bad. Macros are a great escape hatch in the short term before we start generating bespoke “JIT cases”.

gvanrossum · 2023-11-20T00:28:03Z

> Honestly, as long as it’s done in macros, it shouldn’t be too bad. Macros are a great escape hatch in the short term before we start generating bespoke “JIT cases”.

So it could generate something like

case _FOO_UOP: {
    oparg = CURRENT_OPARG();
    operand = CURRENT_OPERAND();
    ...
}

where the default definitions for those (in ceval_macros.h) would be

#define CURRENT_OPARG() next_uop[-1].oparg
#define CURRENT_OPERAND() next_uop[-1].operand
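
For anyone who wants to try the pattern in isolation, here is a small compilable sketch, assuming a made-up uop format. The `toy_uop` type, the `TOY_*` opcodes, and `toy_execute()` are hypothetical; only the two macro names and their `next_uop[-1]` definitions come from the snippet above. Each case fetches `oparg` or `operand` only if it uses it, instead of the loop extracting both for every instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical uop layout for illustration only. */
typedef struct { int opcode; int oparg; uint64_t operand; } toy_uop;

/* next_uop has already been advanced, so the current uop is at [-1]. */
#define CURRENT_OPARG()   (next_uop[-1].oparg)
#define CURRENT_OPERAND() (next_uop[-1].operand)

enum { TOY_ADD_OPARG = 0, TOY_ADD_OPERAND = 1, TOY_INCR = 2, TOY_STOP = 3 };

/* Execute a toy trace and return the accumulated value. */
static uint64_t
toy_execute(const toy_uop *trace)
{
    const toy_uop *next_uop = trace;
    uint64_t acc = 0;
    for (;;) {
        int opcode = next_uop->opcode;
        next_uop++;
        switch (opcode) {
        case TOY_ADD_OPARG: {
            int oparg = CURRENT_OPARG();            /* fetched on demand */
            acc += (uint64_t)oparg;
            break;
        }
        case TOY_ADD_OPERAND: {
            uint64_t operand = CURRENT_OPERAND();   /* fetched on demand */
            acc += operand;
            break;
        }
        case TOY_INCR:
            acc += 1;                               /* uses neither field */
            break;
        case TOY_STOP:
            return acc;
        default:
            assert(0 && "unknown toy uop");
            return acc;
        }
    }
}
```

Keeping the field access behind macros means the case bodies never mention `next_uop` directly, so a different backend could redefine both macros without changing the generated cases.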

brandtbucher · 2023-11-20T00:32:38Z

Yup!

This uses the new mechanism whereby certain uops are replaced by others during translation, using the `_PyUop_Replacements` table. We further special-case the `_FOR_ITER_TIER_TWO` uop to update the deoptimization target to point just past the corresponding `END_FOR` opcode. Two tiny code cleanups are also part of this PR.
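
A replacement table of this kind can be sketched as a sparse array indexed by opcode. This is a toy model under stated assumptions, not the real `_PyUop_Replacements` definition: the `TOY_*` opcode values, the `toy_replacements` table, `toy_translate()`, and the convention that 0 means "no replacement" are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    TOY_FOR_ITER = 1,
    TOY_FOR_ITER_TIER_TWO = 2,
    TOY_LOAD_FAST = 3,
    TOY_NUM_OPCODES = 4
};

/* Sparse table: entries not listed are zero-initialized, which this
 * sketch treats as "keep the opcode unchanged". */
static const uint16_t toy_replacements[TOY_NUM_OPCODES] = {
    [TOY_FOR_ITER] = TOY_FOR_ITER_TIER_TWO,
};

/* Return the opcode to emit into the trace for a given input opcode. */
static int
toy_translate(int opcode)
{
    assert(opcode >= 0 && opcode < TOY_NUM_OPCODES);
    uint16_t replacement = toy_replacements[opcode];
    return replacement != 0 ? (int)replacement : opcode;
}
```

With a table like this, the translator needs no per-opcode branches for the common case; only the extra target fix-up for the Tier 2 `FOR_ITER` variant would remain a special case.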

bedevere-app bot added the awaiting core review label Nov 15, 2023

bedevere-app bot mentioned this pull request Nov 15, 2023

Branching design for Tier 2 (uops) interpreter #106529

gvanrossum marked this pull request as draft November 15, 2023 23:44

bedevere-app bot removed the awaiting core review label Nov 15, 2023

gvanrossum added 5 commits November 15, 2023 15:54

Add executor_cases.c.h dependency for ceval.o

a08909d

Clean up flags.py

4c2914b

Clean up parsing.py

053a0a2

Add back printing optimized uops

b838435

Hacky way to make FOR_ITER a viable uop

b28effa

gvanrossum force-pushed the for-iter-uop branch from e909a83 to b28effa Compare November 15, 2023 23:58

gvanrossum marked this pull request as ready for review November 15, 2023 23:58

bedevere-app bot added the awaiting core review label Nov 15, 2023

_SPECIALIZE_UNPACK_SEQUENCE is TIER_ONE_ONLY

de8f199

NEWS

5c5d8bd

Double max trace length to 256

36e9ada

Move stuff around to suit the JIT branch

def1830

gvanrossum requested a review from markshannon as a code owner November 16, 2023 23:40

markshannon reviewed Nov 17, 2023

Merge remote-tracking branch 'origin/main' into for-iter-uop

ce19637

gvanrossum added 2 commits November 17, 2023 12:59

Clean up _FOR_ITER_TIER_TWO using DEOPT_IF(true)

7096818

Add test

5852105

brandtbucher reviewed Nov 17, 2023

Python/bytecodes.c (outdated, resolved)

brandtbucher reviewed Nov 17, 2023

Python/optimizer.c (outdated, resolved)

brandtbucher reviewed Nov 17, 2023

Tools/cases_generator/instructions.py (outdated, resolved)

gvanrossum added 3 commits November 17, 2023 14:06

Revert debug change to is_viable_uop()

4ac68b3

Avoid debug-only local variable 'word'

95b1a01

Revert changes to _EXIT_TRACE logic

4c72028

Merge remote-tracking branch 'origin/main' into for-iter-uop

14aea56

Merge branch 'main' into for-iter-uop

88c1701

gvanrossum merged commit 1995955 into python:main Nov 20, 2023
30 checks passed

bedevere-app bot removed the awaiting core review label Nov 20, 2023

gvanrossum deleted the for-iter-uop branch November 20, 2023 18:09
