-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread safety of printing and matmul #26562
Conversation
This approach seems fine with me for now. We'll have to carefully think this through with the new threading runtime, since that'll start scheduling tasks on different threads, so thread-local will no longer be right. cc @kpamnany |
Right, in general we need task-local storage, but I expect thread-local to be much more efficient (since there's a fixed number of threads, among other things). So we should continue to use this technique for internal buffers where we know that tasks won't block while using them (they won't, right? right? 😄 ). |
One possible model would be to an explicit acquire/release for task local buffers, such that they get automatically copied along with the task if switching does happen. |
CI failures are unrelated, I think. MacOS Travis is failing with |
Would be good to spot check that float to string conversion and printf performance are unaffected. I expect there would be no difference, but you never know. |
old ( julia> foo(x) = @sprintf("%3.4f %10.3g", x, x);
foo(io, x) = @printf(io, "%3.4f %10.3g", x, x);
julia> @btime string(3.7234);
338.630 ns (4 allocations: 192 bytes)
julia> @btime foo(3.7234);
738.968 ns (10 allocations: 576 bytes)
julia> buf = IOBuffer(sizehint=100); @btime print($buf, 3.7234);
255.677 ns (0 allocations: 0 bytes)
julia> buf = IOBuffer(sizehint=100); @btime foo($buf, 3.7234);
717.572 ns (5 allocations: 288 bytes) new thread-safe version: julia> @btime string(3.7234);
332.072 ns (4 allocations: 192 bytes)
julia> @btime foo(3.7234);
760.429 ns (10 allocations: 576 bytes)
julia> buf = IOBuffer(sizehint=100); @btime print($buf, 3.7234);
257.676 ns (0 allocations: 0 bytes)
julia> buf = IOBuffer(sizehint=100); @btime foo($buf, 3.7234);
745.630 ns (5 allocations: 288 bytes) So there seems to be at most a 4% slowdown, usually less. This seems a worthwhile price to pay for thread safety. If you care about |
Okay to merge? |
I think I would prefer we drop this commit unless it is absolutely conclusive that it is essential. We can refactor some of this code to move the computation of the buffer outside hot loops if necessary. |
This reverts commit c7cba84.
@vtjnash, reverted. I just tested and it makes no measurable difference in performance. |
Unrelated timeout error on AppVeyor 64-bit |
Do we need to do the same thing for @quinnj do you know? |
Hmmmm, not sure. I think I remember @vtjnash fixing that a while ago though. |
No, I managed to trigger an issue. See #27826 |
This PR uses a per-thread buffer to make the generic-matmul thread-safe (fixes #22581), floating-point printing thread-safe (fixes #25727), and printf thread-safe (fixes #21335).
In particular, the old
DIGITS
array in grisu is supplemented by aDIGITSs = [DIGITS]
array that is expanded by__init__
into a per-thread buffer. I kept the oldDIGITS
array to avoid breaking the few external packages that use this — they can migrate gradually to the thread-safe version.The basic matmul buffer structure is not changed, so #26008 is neither better nor worse.
As I mentioned in #10441, it would be good to have better support for thread-local variables in the compiler. Ideally I could just put
@threadlocal const DIGITS=[....]
and it would do the right thing under the hood.