Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the cycle GC with a tracing one holding one extra ref in the reference counter #15

Open
arhadthedev opened this issue Oct 26, 2022 · 0 comments

Comments

@arhadthedev
Copy link
Owner

arhadthedev commented Oct 26, 2022

This means that we want:

[...]

  • To remove other overhead due to poor memory layout, reference counting and cycle GC.

faster-cpython/ideas#376


pep-9999-hybrid-garbage-collector.txt

Сборка мусора: гибридный подход между счётчиком ссылок и подходом без него.

Идея: вводим

Идея: счётчик ссылок остаётся, но его ненулевое значение в

Hybrid Garbage Collector

This PEP proposes to add the tracing tri-color garbage collector with accounting for nonzero refcount.

Abstract

Currently the CPython runtime manages object lifetime using the reference counter.

The proposal covers CPython internals only. It does not define any public API; it may be done by a separate PEP after:

  1. Tuning of the implementation to achieve maximum performance; internal API may be breken here
  2. Wide public discussion across maintainers of third party modules and alternative Python implementations.

Implementation Summary

This PEP introduces two parallel systems to track a lifetime of :class:PyObject\ s:

  • the tracing garbage collector with tri-color marking as a primary one
  • the reference counter for compatibility with existing third-party code

To not allow a false positive destruction of an object, these systems are kept in sync:

  • when an object is created, the collector clones a strong reference with Py_NewRef and writes the clone into its reachability forest

  • if another run of the collector states that the object is no longer reachable

    • ... and refcount==1 (the collector is the only holder of the object), the collector calls Py_DECREF thus destructing the object
    • ... and refcount>1 (there is some other C-API holder), the collector reassigns the object to a special Python object thus keeping the object and its referents available for next scans
  • while the garbage collector system sees the object as reachable, the collector keeps an extra reference in the reference counter.

  • a verdict of the collector is maintained as an extra "reference" in the reference counter

  • any other reference in the reference counter is maintained as an ingoing reference from some ephemeral global object

  • the garbage collector aquires an extra refcount reference when the object becomes reachable in its tree, and releases the refer

  • for objects seen by the new garbage collector as reachable, the old reference counter is increased by one. It is required for :func:Py_INCREF macro u both

However, there is code both outside and inside Python that use for low-level decrease
of the reference counter and destruction. To accomodate it, the following synchronization i

  • for objects with the refcounter

  • reachability in the garbage collector tree affects the refcounter as an extra reference

  • any other refcounter references are treated as an ingoing reference from some ephemeral global object

  • an object reachable through the garbage collector tree

For the whole

Here’s a high-level look at the implementation:

A lifetime is determined using tracing garbage collection with tri-color marking:

  • each object belongs to one of three states: unreachable ("while"), reachable with no outgoing references ("black"), and reachable with no outgoing references ("gray")
  • .. further description of the algorithm

!!!ref1:
To account third party code directly accessing the reference counter, each object also has a reference count with...

if we initialize with zero, we get destruction bu third party code and use-after-free reports

so we MUST initialize with 1 meaning "referr".
!!!

For CPython, a non-zero value of a reference counter is treated as an ingoing reference from some ephemeral global object. , its value .

We maintain three per-interpreter sets: white (for removal), black (reachable from a current python code scope),

The collector shall be not called when execution is inside module or object C code. In other words, the collector
may be called from inside of an interpreter loop only.

The reachability tree starts from the current function scope. As a result, all literals need to be fields of this global object to not be garbage collected.

What this PEP does not cover

  • removal of a refcount field
  • memory mobility and generations
  • retirement of the :opt:--with-trace-refs build; objects created by the old way still need to be accounted

It's just about determination whether an object can be destroyed, i. e. lifetime determination.

Motivation

  • cyclic references (plus the whole Lib/test/leakers can be removed)
  • faster garbage collection: show numbers
  • removal of multiple PY_INCREF and PY_DECREF
  • removal of the whole concept of immortal objects - both objects and the corresponding affinity is untouched if we ling them
  • removal of the whole concept of a borrowed link
    from an invisible field of a python context
  • can postpone memory reclamation to free memory outside of tight, performance critical loops, in bulk
    • check if incref/decref takes the GIL
    • check when we may reclaim resources
      • freeing memory is not cheap
    • ideally we make the reclamation once, when the program ends
      • find a balance between used memory included free heap blocks and invocation
    • maybe: may move to fixed-chunk linked-list custom memory allocator
  • starting with Python 4:
    • :opt:--with-trace-refs build gets simplified by reusing the reference tree instead of maintaining its own
      PyObject._ob_prev/PyObject._ob_next machinery

Specification

The approach involves these fundamental changes:

  • (add a sweeper class)
  • (add a timer to run the sweep in a main thread)
  • (extend :opt:--with-trace-refs to scan both its own PyObject._ob_next/PyObject._ob_prev and the reachability tree)

Reachability tree is a child-sibling tree with the following fields:

struct _Py_ReachabilityTreeNode {
	uint32_t first_child_id;      // array ID to allow tree expansion with reallocation
	uint32_t next_sibling_id;     // array ID to allow tree expansion with reallocation
	PyObject *object_sibling;
};

The tree nodes are stored in an array of the following structure:

union {
	_Py_ReachabilityTreeNode node;
	uint32_t free_node_next;
}

modifications of https://docs.python.org/3/library/gc.html here

For C API objects we have no access to their internal structure but have a visitor
slot tp_traverse PyTypeObject.structseq_traverse that we can use instead. Indeed,
we need to store status of each object (white/black/gray). The white set can be stored
as a single linked list between all elements to be removed; the black set is not
explicitly marked; the gray list can be stored in the same single linked list but with
another start pointer.

Backward compatibility

No API or ABI change of the public interface.

Compatible changes of the private interface -- what's about interpreter class?

Changes in observable behavior:

  • debuggers and leak detectors report underestimated reference count; however a living object will always have a positive refcount

Reference implementation

Migration plan

Right after this PEP is implemented, nothing changes:

  • all objects including CPython-created ones continue to rely on a reference counter
  • the garbage collector works in the background; it immediately binds the objects to the special root node
  • the safety net of one-off reference counter has no use yet

Then, as <refcounting methods https://docs.python.org/3/c-api/refcounting.html>_ and comments on borrowed refs are removed,
the safety net of the one-off reference starts to take more and more load until CPython stops touching the reference counter altogether.

However, C API working with stolen references must be reconsidered:

  • reduce usage of stolen references as mush as possible
  • call :func:Py_DECREF after passing the stolen reference to the garbage collector

Todo: externally created objects that do not support traverse

Rejected ideas

Getting rid of reference counters

Using PyObject._ob_prev/PyObject._ob_next -

  1. Available in :opt:--with-trace-refs builds only that are binary incompatible with the release build
  2. Until the reference count is retired, :opt:--with-trace-refs further need a way to account them

https://twitter.com/VictorStinner/status/1481783100191428611

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant