Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial thread-safety #82

Merged
merged 2 commits into from
Mar 10, 2023
Merged

Partial thread-safety #82

merged 2 commits into from
Mar 10, 2023

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Feb 28, 2023

Make the library partially thread-safe.
Crucially, this makes distributed.Worker.data.fast.__getitem__() thread-safe, which means that Worker.execute and get_data will only need to offload to a thread if any of the keys are actually spilled.

Actual thread-offloadling mechanics will be in later PRs (both here and in distributed).

@crusaderky crusaderky self-assigned this Feb 28, 2023
@crusaderky crusaderky marked this pull request as ready for review February 28, 2023 14:46
Copy link

@milesgranger milesgranger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is good. I'm only not convinced that ignoring KeyErrors caused by thread unsafe operations, is a way to make it thread-safe. If there exists a condition when we care about a KeyError other than when caused by threads, then it's not clear we would know how to determine its cause.

I made a prototype PR using threading.RLock and based on some basic benchmarking the added test here test_getitem_isthreasafe it seems to be negligible in performance impact, if just a smidgen faster but ya, basically the same:

# Ignoring KeyErrors
Benchmark 1: python zict/tests/test_lru.py
  Time (mean ± σ):      1.251 s ±  0.054 s    [User: 1.142 s, System: 0.148 s]
  Range (min … max):    1.168 s …  1.380 s    25 runs

# Re-entrant locking
Benchmark 1: python zict/tests/test_lru.py
  Time (mean ± σ):      1.214 s ±  0.121 s    [User: 1.105 s, System: 0.160 s]
  Range (min … max):    1.080 s …  1.531 s    25 runs

@crusaderky
Copy link
Collaborator Author

crusaderky commented Mar 10, 2023

@milesgranger that's because test_getitem_isthreasafe spends most of its time sending messages across the thread pool, which is itself wrapped in locks.

I modified the test:

def test_getitem_is_threasafe():
    lru = LRU(100, {})
    lru["x"] = 1

    def f(_):
        barrier.wait()
        for _ in range(5_000_000):
            assert lru["x"] == 1

    n = cpu_count()
    barrier = Barrier(n)
    with ThreadPoolExecutor(n) as ex:
        for _ in ex.map(f, range(n)):
            pass

On my 12-core Linux box, this PR takes 9.9s to run, while your version takes 18.9s.

@milesgranger
Copy link

Sorry, my comment wasn't about either implementation being faster or not, but that the alternative, given a test which was assumed to be representative of the use case would be unaffected in terms of time while improving the thread-safety aspect.

If the original test was not representative to how it would be used in practice, and discerning if a KeyError is raised from legitimate misuse or threading is not important; then making it 'thread-safe' by ignoring KeyErrors is perfectly fine by me.

Apologies if I've misunderstood anything here.

@crusaderky
Copy link
Collaborator Author

crusaderky commented Mar 10, 2023

If the original test was not representative to how it would be used in practice

The original test was representative of how it would be used, functionally.
But unit tests are never designed to be performance-efficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Asynchronous Disk Access in Workers
2 participants