Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ContextVars are not thread-safe in the free-threaded build #121546

Closed
colesbury opened this issue Jul 9, 2024 · 6 comments
Closed

ContextVars are not thread-safe in the free-threaded build #121546

colesbury opened this issue Jul 9, 2024 · 6 comments
Assignees
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Copy link
Contributor

colesbury commented Jul 9, 2024

Bug report

The ContextVar implementation is not thread-safe in the free-threaded build:

In particular, the caching is not thread-safe, although there may be other thread-safety issues as well:

cpython/Python/context.c

Lines 206 to 212 in 9c08f40

if (var->var_cached != NULL &&
var->var_cached_tsid == ts->id &&
var->var_cached_tsver == ts->context_ver)
{
*val = var->var_cached;
goto found;
}

Example Test Cases:

  • ./python -m test test_context -m test_context_threads_1
  • ./python -m test test_decimal -m test_threading

Linked PRs

@colesbury colesbury added type-bug An unexpected behavior, bug, or error 3.13 bugs and security fixes topic-free-threading 3.14 new features, bugs and security fixes labels Jul 9, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Jul 9, 2024
ContextVars are not currently thread-safe in the free-threaded build. We
will want to fix that soon, but in the meantime add a suppression for
the TSan reported races.
colesbury added a commit to colesbury/cpython that referenced this issue Jul 9, 2024
ContextVars are not currently thread-safe in the free-threaded build. We
will want to fix that soon, but in the meantime add a suppression for
the TSan reported races.
colesbury added a commit to colesbury/cpython that referenced this issue Jul 11, 2024
ContextVars are not currently thread-safe in the free-threaded build. We
will want to fix that soon, but in the meantime add a suppression for
the TSan reported races.
@Fidget-Spinner Fidget-Spinner self-assigned this Jul 13, 2024
@Fidget-Spinner
Copy link
Member

Is throwing a lock on all modifications of the cache a valid solution? Or would that be too slow?

The other option is to "register" a cache in a linked list in thread state. Then have that be cleared on shutdown. I like this better.

@colesbury
Copy link
Contributor Author

A lock would mean that concurrent accesses to ContextVars would not scale well. I'd be more tempted to just disable the cache in the free-threaded build and pay the increased access cost, but still scale well.

Can you explain your linked list idea more?

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jul 13, 2024

Change
struct _pycontextvarobject {
    PyObject_HEAD
    PyObject *var_name;
    PyObject *var_default;
    PyObject *var_cached;
    uint64_t var_cached_tsid;
    uint64_t var_cached_tsver;
    Py_hash_t var_hash;
};
to
struct _pycontextvarobject {
    PyObject_HEAD
    PyObject *var_name;
    PyObject *var_default;
    _PyCtxVarCacheEntry *var_entry;
    Py_hash_t var_hash;
};
where var_entry points to an entry in an array (something like a freelist).
Said array is managed similar to pycore_freelists.h.
The array looks like this:
| Entry 1          | | Entry 2          | | Free Entry 1  | |  Entry 3         | | Free Entry 2  |
| var_cached       | | var_cached       | | ---> next     | | var_cached       | | ---> NULL     |
| var_cached_tsid  | | var_cached_tsid  | |               | | var_cached_tsid  | |               |
| var_cached_tsver | | var_cached_tsver | |               | | var_cached_tsver | |               |

Flow:

  1. When PyContextVar_Get finds a valid object and sets the cache, we need to read the cache. A cache hit is lockless and returns directly the entry in the array. A cache miss needs to be locked to modify the array to update the cache.
  2. ContextVar_Set needs to lock when setting.

Therefore, most reads (cache hits) should scale well, only cache misses and writes (sets) scale badly.
I'm not too sure how much of the HAMT code is thread-safe. So I have no clue whether we need additional locks there too. A cursory read suggests to me we should at least lock _PyHamt_Assoc in contextvar_set if I understand the code correctly. While I don't think it's easy to cause any races in the HAMT implementation from Python, it seems easier with the C API introduced in PEP 567.

@colesbury
Copy link
Contributor Author

This seems like a complicated scheme and would still not scale well for typical usage patterns. The assumption that most accesses are cache hits is probably not true for a multi-threaded program. It's a plausible assumption with the GIL because you have coarse grained accesses: one thread runs at a time and may do lots of context var accesses within that time slice. Without the GIL, that's probably not true: a context var is likely to have many threads access it in a highly interleaved (or concurrent) pattern.

Contexts are per-thread. The lookup is on the current thread's context, and a context can only be active on a single thread at a time. HAMTs are persistent, in other words the contents are immutable, and mutations create a new data structure (with efficient sharing).

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jul 13, 2024

HAMTs are persistent, in other words the contents are immutable, and mutations create a new data structure (with efficient sharing)

Oops yea I forgot/didn't know that HAMTs are copy on write. Thanks for pointing that out. In that case I'll send a PR to disable the cache on free-threaded builds.

@colesbury
Copy link
Contributor Author

I think this was addressed by @Fidget-Spinner in #121740

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants