Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-46939: Specialize calls to Python classes (POSTCALL edition) #31936

Closed

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Mar 16, 2022

@Fidget-Spinner
Copy link
Member Author

Final alternative method, if we don't want all the stack manipulation: PRECALL_PY_CLASS shall incref self, then set oparg of POST_CALL to 0 to signal it to return self. Then POST_CALL accesses into above the TOS (into the already decrefed arguments) to find arg[0] (self).

Downside is that decref during error handling is tricky. Current method of pushing things onto the stack handles that for us.

@markshannon
Copy link
Member

Have you benchmarked this?
I'm worried that the extra 6 bytes per call, plus the extra code, is going to cost more than the specialization gains.

@Fidget-Spinner
Copy link
Member Author

Have you benchmarked this? I'm worried that the extra 6 bytes per call, plus the extra code, is going to cost more than the specialization gains.

Not yet. I plan to do so tomorrow (your time). The results should be available by friday.

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Mar 18, 2022

compare_to RETURN_VALUE hack PR -> POSTCALL:

Slower (13):
- scimark_lu: 184 ms +- 1 ms -> 191 ms +- 3 ms: 1.04x slower
- go: 228 ms +- 2 ms -> 235 ms +- 2 ms: 1.03x slower
- pathlib: 30.1 ms +- 0.5 ms -> 30.9 ms +- 0.4 ms: 1.03x slower
- pidigits: 302 ms +- 1 ms -> 309 ms +- 1 ms: 1.02x slower
- scimark_sor: 184 ms +- 2 ms -> 187 ms +- 4 ms: 1.02x slower
- logging_simple: 9.09 us +- 0.14 us -> 9.22 us +- 0.14 us: 1.01x slower
- json_dumps: 19.2 ms +- 0.2 ms -> 19.4 ms +- 0.2 ms: 1.01x slower
- hexiom: 11.0 ms +- 0.1 ms -> 11.1 ms +- 0.1 ms: 1.01x slower
- richards: 76.1 ms +- 1.6 ms -> 77.1 ms +- 1.8 ms: 1.01x slower
- logging_silent: 171 ns +- 4 ns -> 173 ns +- 4 ns: 1.01x slower
- telco: 10.1 ms +- 0.2 ms -> 10.2 ms +- 0.1 ms: 1.01x slower
- dulwich_log: 119 ms +- 1 ms -> 120 ms +- 1 ms: 1.01x slower
- deltablue: 6.38 ms +- 0.06 ms -> 6.44 ms +- 0.08 ms: 1.01x slower

Faster (11):
- unpickle: 20.1 us +- 0.2 us -> 19.6 us +- 0.1 us: 1.02x faster
- sqlite_synth: 3.49 us +- 0.13 us -> 3.41 us +- 0.07 us: 1.02x faster
- regex_v8: 37.6 ms +- 0.8 ms -> 36.9 ms +- 0.3 ms: 1.02x faster
- nqueens: 133 ms +- 1 ms -> 130 ms +- 1 ms: 1.02x faster
- unpickle_list: 7.63 us +- 0.15 us -> 7.50 us +- 0.14 us: 1.02x faster
- spectral_norm: 165 ms +- 1 ms -> 162 ms +- 4 ms: 1.02x faster
- django_template: 56.8 ms +- 2.0 ms -> 56.0 ms +- 0.5 ms: 1.01x faster
- unpack_sequence: 74.0 ns +- 0.7 ns -> 73.0 ns +- 0.8 ns: 1.01x faster
- nbody: 147 ms +- 2 ms -> 145 ms +- 1 ms: 1.01x faster
- regex_dna: 334 ms +- 1 ms -> 330 ms +- 2 ms: 1.01x faster
- pickle: 16.1 us +- 0.2 us -> 15.9 us +- 0.3 us: 1.01x faster

Benchmark hidden because not significant (33): 2to3, chameleon, chaos, crypto_pyaes, fannkuch, float, html5lib, json_loads, logging_format, mako, meteor_contest, pickle_dict, pickle_list, pickle_pure_python, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, regex_effbot, scimark_fft, scimark_monte_carlo, scimark_sparse_mat_mult, sympy_expand, sympy_integrate, sympy_sum, sympy_str, tornado_http, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

Geometric mean: 1.00x slower

compare_to main->POSTCALL:

Slower (9):
- pidigits: 297 ms +- 1 ms -> 309 ms +- 1 ms: 1.04x slower
- fannkuch: 603 ms +- 6 ms -> 626 ms +- 5 ms: 1.04x slower
- logging_simple: 8.91 us +- 0.12 us -> 9.22 us +- 0.14 us: 1.03x slower
- richards: 75.4 ms +- 1.8 ms -> 77.1 ms +- 1.8 ms: 1.02x slower
- logging_format: 10.1 us +- 0.2 us -> 10.2 us +- 0.2 us: 1.02x slower
- pickle_dict: 47.6 us +- 0.2 us -> 48.3 us +- 0.2 us: 1.02x slower
- mako: 17.7 ms +- 0.1 ms -> 18.0 ms +- 0.1 ms: 1.02x slower
- go: 232 ms +- 2 ms -> 235 ms +- 2 ms: 1.01x slower
- unpickle_list: 7.42 us +- 0.15 us -> 7.50 us +- 0.14 us: 1.01x slower

Faster (21):
- raytrace: 486 ms +- 4 ms -> 445 ms +- 3 ms: 1.09x faster
- chaos: 113 ms +- 1 ms -> 105 ms +- 1 ms: 1.07x faster
- float: 126 ms +- 1 ms -> 118 ms +- 1 ms: 1.07x faster
- nbody: 150 ms +- 2 ms -> 145 ms +- 1 ms: 1.03x faster
- chameleon: 11.3 ms +- 0.2 ms -> 10.9 ms +- 0.1 ms: 1.03x faster
- unpack_sequence: 75.2 ns +- 0.9 ns -> 73.0 ns +- 0.8 ns: 1.03x faster
- scimark_fft: 538 ms +- 12 ms -> 523 ms +- 3 ms: 1.03x faster
- regex_effbot: 5.32 ms +- 0.06 ms -> 5.21 ms +- 0.04 ms: 1.02x faster
- scimark_sor: 191 ms +- 2 ms -> 187 ms +- 4 ms: 1.02x faster
- deltablue: 6.57 ms +- 0.07 ms -> 6.44 ms +- 0.08 ms: 1.02x faster
- regex_dna: 336 ms +- 12 ms -> 330 ms +- 2 ms: 1.02x faster
- telco: 10.4 ms +- 0.4 ms -> 10.2 ms +- 0.1 ms: 1.02x faster
- dulwich_log: 123 ms +- 1 ms -> 120 ms +- 1 ms: 1.02x faster
- json_loads: 42.9 us +- 0.4 us -> 42.2 us +- 0.3 us: 1.02x faster
- regex_v8: 37.5 ms +- 0.8 ms -> 36.9 ms +- 0.3 ms: 1.02x faster
- pyflate: 711 ms +- 9 ms -> 702 ms +- 6 ms: 1.01x faster
- unpickle: 19.9 us +- 0.3 us -> 19.6 us +- 0.1 us: 1.01x faster
- regex_compile: 224 ms +- 1 ms -> 221 ms +- 1 ms: 1.01x faster
- spectral_norm: 164 ms +- 3 ms -> 162 ms +- 4 ms: 1.01x faster
- json_dumps: 19.6 ms +- 0.3 ms -> 19.4 ms +- 0.2 ms: 1.01x faster
- tornado_http: 207 ms +- 4 ms -> 205 ms +- 4 ms: 1.01x faster

Benchmark hidden because not significant (27): 2to3, crypto_pyaes, django_template, hexiom, html5lib, logging_silent, meteor_contest, nqueens, pathlib, pickle, pickle_list, pickle_pure_python, python_startup, python_startup_no_site, scimark_lu, scimark_monte_carlo, scimark_sparse_mat_mult, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Mar 24, 2022

@markshannon in short, POSTCALL is slightly slower than the RETURN_VALUE hack. But both approaches are still 1.01x faster than main.

Shall we go ahead with POSTCALL then? I can update this PR with main if you're satisfied.

@Fidget-Spinner
Copy link
Member Author

test_dis and _bootstrap_external are core developer merge race conditions :(.

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Apr 6, 2022

If this is accepted, I have one more specialization in mind before 3.11 beta. Namely, calls to Python classes with non-Python __init__. E.g this common code pattern:

class MyOwnException(Exception): ...
...
raise MyOwnException()   # Optimize this
...

@brandtbucher
Copy link
Member

Should this be closed in favor of #99331?

@Fidget-Spinner Fidget-Spinner deleted the specialize_py_class_postcall branch June 22, 2023 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants