bpo-46841: Update adaptive.md for inline caching #31817

Merged (1 commit), Mar 11, 2022
29 changes: 20 additions & 9 deletions Python/adaptive.md
@@ -14,16 +14,16 @@ A family of instructions has the following fundamental properties:
it executes the non-adaptive instruction.
* It has at least one specialized form of the instruction that is tailored
for a particular value or set of values at runtime.
- * All members of the family have access to the same number of cache entries.
- Individual family members do not need to use all of the entries.
+ * All members of the family must have the same number of inline cache entries,
+ to ensure correct execution.
+ Individual family members do not need to use all of the entries,
+ but must skip over any unused entries when executing.

The current implementation also requires the following,
although these are not fundamental and may change:

- * If a family uses one or more entries, then the first entry must be a
- `_PyAdaptiveEntry` entry.
- * If a family uses no cache entries, then the `oparg` is used as the
- counter for the adaptive instruction.
+ * All families use one or more inline cache entries;
+ the first entry is always the counter.
* All instruction names should start with the name of the non-adaptive
instruction.
* The adaptive instruction should end in `_ADAPTIVE`.
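
The layout rule above can be sketched in C as follows. This is a minimal illustration, not code from the CPython sources: the names `code_unit`, `example_cache`, `EXAMPLE_CACHE_ENTRIES` and `next_instruction` are hypothetical, and the sketch assumes 16-bit code units and a family with three inline cache entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one bytecode code unit is 16 bits wide. */
typedef uint16_t code_unit;

/* Hypothetical cache layout shared by every member of one family.
   The counter is always the first entry. A member that never reads
   `index` or `version` still reserves the space for them. */
typedef struct {
    code_unit counter;
    code_unit index;
    code_unit version;
} example_cache;

/* Number of inline cache entries that follow the instruction. */
#define EXAMPLE_CACHE_ENTRIES (sizeof(example_cache) / sizeof(code_unit))

/* Position of the next instruction after the one at `pc`.
   Every family member applies the same skip, whether or not it
   uses all of the entries. */
static size_t next_instruction(size_t pc)
{
    return pc + 1 + EXAMPLE_CACHE_ENTRIES;
}
```

Because the skip is derived from the shared layout rather than from what an individual member reads, replacing one family member with another in place never shifts the instructions that follow.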
@@ -76,6 +76,10 @@ keeping `Ti` low which means minimizing branches and dependent memory
accesses (pointer chasing). These two objectives may be in conflict,
requiring judgement and experimentation to design the family of instructions.

+ The size of the inline cache should be as small as possible,
+ without impairing performance, to reduce the number of
+ `EXTENDED_ARG` jumps, and to reduce pressure on the CPU's data cache.
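
To see why cache size affects `EXTENDED_ARG` usage, the sketch below counts how many `EXTENDED_ARG` prefixes an argument needs. It assumes that each instruction carries 8 bits of `oparg`, that each prefix contributes 8 more bits, and that inline cache entries count toward jump distances; the function name `extended_args_needed` is hypothetical.

```c
#include <stdint.h>

/* Number of EXTENDED_ARG prefixes needed to encode `oparg`,
   assuming each instruction holds 8 bits of argument and each
   prefix supplies 8 further bits. */
static int extended_args_needed(uint32_t oparg)
{
    int prefixes = 0;
    while (oparg > 0xFF) {
        oparg >>= 8;   /* one prefix absorbs the next 8 bits */
        prefixes++;
    }
    return prefixes;
}
```

Under these assumptions, a jump of 255 code units needs no prefix while a jump of 256 needs one, so every extra cache entry between a jump and its target moves more jumps over that threshold.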

### Gathering data

Before choosing how to specialize an instruction, it is important to gather
@@ -106,7 +110,7 @@ This can be tested quickly:
* `globals->keys->dk_version == expected_version`

and the operation can be performed quickly:
- * `value = globals->keys->entries[index].value`.
+ * `value = entries[cache->index].me_value;`.
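
The test and the operation can be put together in a small model. This is a sketch under stated assumptions, not the interpreter's code: the types `toy_entry`, `toy_keys` and `toy_cache` and the function `toy_load_cached` are hypothetical stand-ins, and values are plain `int`s rather than objects.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the dictionary keys
   object and the inline cache. */
typedef struct { const char *me_key; int me_value; } toy_entry;
typedef struct { uint32_t dk_version; toy_entry entries[4]; } toy_keys;
typedef struct { uint16_t index; uint32_t version; } toy_cache;

/* Returns 1 and stores the value on a cache hit.
   Returns 0 if the version guard fails, in which case the caller
   would fall back to the general lookup. */
static int toy_load_cached(const toy_keys *keys, const toy_cache *cache,
                           int *value)
{
    /* Guard: a single comparison against the cached version. */
    if (keys->dk_version != cache->version) {
        return 0;
    }
    /* Operation: one indexed load, no hashing and no key comparison. */
    *value = keys->entries[cache->index].me_value;
    return 1;
}
```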

Because it is impossible to measure the performance of an instruction without
also measuring unrelated factors, the assessment of the quality of a
@@ -119,8 +123,7 @@ base instruction.

In general, specialized instructions should be implemented in two parts:
1. A sequence of guards, each of the form
- `DEOPT_IF(guard-condition-is-false, BASE_NAME)`,
- followed by a `record_cache_hit()`.
+ `DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
2. The operation, which should ideally have no branches and
a minimum number of dependent memory accesses.
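
The two-part shape can be sketched with a toy specialized integer addition. Everything here is hypothetical: `TOY_DEOPT_IF` is a simplified one-argument stand-in for `DEOPT_IF` (it omits the base instruction name), and `toy_value` is an invented tagged value, not a Python object. Overflow handling is left out for brevity.

```c
#include <stdbool.h>

typedef enum { TOY_INT, TOY_FLOAT } toy_tag;
typedef struct { toy_tag tag; long i; double f; } toy_value;

/* Simplified stand-in for DEOPT_IF: abandon the specialized path
   when the guard condition does not hold. */
#define TOY_DEOPT_IF(cond) do { if (cond) goto deopt; } while (0)

/* Specialized "add two ints". Returns true on success, false if the
   caller must execute the base instruction instead. */
static bool toy_add_int(const toy_value *a, const toy_value *b,
                        toy_value *out)
{
    /* Part 1: guards. */
    TOY_DEOPT_IF(a->tag != TOY_INT);
    TOY_DEOPT_IF(b->tag != TOY_INT);
    /* Part 2: the operation, with no branches. */
    out->tag = TOY_INT;
    out->i = a->i + b->i;
    out->f = 0.0;
    return true;
deopt:
    return false;
}
```

Keeping all guards ahead of the operation means the operation never has to re-check its inputs, which is what lets it stay branch-free.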

@@ -129,3 +132,11 @@ can be re-used in the operation.

If there are branches in the operation, then consider further specialization
to eliminate the branches.

+ ### Maintaining stats
+
+ Finally, take care that stats are gathered correctly.
+ After the last `DEOPT_IF` has passed, a hit should be recorded with
+ `STAT_INC(BASE_INSTRUCTION, hit)`.
+ After an optimization has been deferred in the `ADAPTIVE` form,
+ that should be recorded with `STAT_INC(BASE_INSTRUCTION, deferred)`.
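
As a rough illustration of where the two counters are bumped, consider the model below. It is a sketch only: `TOY_STAT_INC`, `toy_stats`, `toy_specialized` and `toy_adaptive` are hypothetical names, the real `STAT_INC` may be defined differently, and the countdown in `toy_adaptive` is an assumed way of deciding when to defer.

```c
/* Hypothetical per-instruction counters, indexed by base instruction. */
enum { TOY_LOAD_GLOBAL, TOY_OPCODE_COUNT };

typedef struct {
    unsigned long hit;
    unsigned long deferred;
} toy_specialization_stats;

static toy_specialization_stats toy_stats[TOY_OPCODE_COUNT];

/* Simplified stand-in for STAT_INC(BASE_INSTRUCTION, field). */
#define TOY_STAT_INC(opname, field) ((void)(toy_stats[opname].field++))

/* Specialized form: record a hit only after every guard has passed. */
static int toy_specialized(int guards_ok)
{
    if (!guards_ok) {
        return 0;                       /* deopt: no hit recorded */
    }
    TOY_STAT_INC(TOY_LOAD_GLOBAL, hit);
    return 1;
}

/* Adaptive form: while the counter is non-zero, specialization is
   deferred and that is recorded. Returns 1 when it is time to
   attempt specialization. */
static int toy_adaptive(unsigned *counter)
{
    if (*counter == 0) {
        return 1;
    }
    TOY_STAT_INC(TOY_LOAD_GLOBAL, deferred);
    (*counter)--;
    return 0;
}
```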