Thread safety tracker #10421

ViralBShah · 2015-03-06T15:10:20Z

One of the first things we need to do is make the runtime thread safe. This work is on the threads branch, and this tracker predates the new GC. I thought it is worth capturing a tracker that @StefanKarpinski prepared earlier in an issue to ease the thread safety work.

This list is organized as Variable; Approach

builtins.c

extern size_t jl_page_size; constant
extern int jl_in_inference; lock
extern int jl_boot_file_loaded; constant
int in_jl_ = 0; thread-local

ccall.cpp

cgutils.cpp

static std::map<const std::stringGlobalVariable*> stringConstants; lock
static std::map<void*jl_value_llvm> jl_value_to_llvm; lock
static std::map<Value void> llvm_to_jl_value; lock
static std::vector<Constant*> jl_sysimg_gvars; lock
static std::map<intjl_value_t*> typeIdToType; lock
jl_array_t *typeToTypeId; lock
static int cur_type_id = 1; lock

codegen.cpp

void *__stack_chk_guard = NULL; thread-local (jwn: why is this on the list? it's a constant and not thread local)

debuginfo.cpp

extern "C" volatile int jl_in_stackwalk;
JuliaJITEventListener *jl_jit_events;
static obfiletype objfilemap;
extern char *jl_sysimage_name; constant
static logdata_t coverageData;
static logdata_t mallocData;

dump.c

static jl_array_t *tree_literal_values=NULL; thread-local
static jl_value_t *jl_idtable_type=NULL; constant
static jl_array_t *datatype_list=NULL; thread 0 only
jl_value_t ***sysimg_gvars = NULL; thread 0 only
extern int globalUnique; thread 0 only
static size_t delayed_fptrs_n = 0; thread 0 only
static size_t delayed_fptrs_max = 0; thread 0 only

gc.c

init.c

char *jl_stack_lo; thread-local
char *jl_stack_hi; thread-local
volatile sig_atomic_t jl_signal_pending = 0; thread-local
volatile sig_atomic_t jl_defer_signal = 0; thread-local
uv_loop_t *jl_io_loop; I/O thread ?
static void *signal_stack; thread-local (see Random CI failures rather intense right now #9763 (comment))
static mach_port_t segv_port = 0; constant
extern void * __stack_chk_guard; thread-local (duplicate of above)

jltypes.c

int inside_typedef = 0; thread-local
static int match_intersection_mode = 0; thread-local
static int has_ntuple_intersect_tuple = 0; thread-local
static int t_uid_ctr = 1; lock

llvm-simdloop.cpp

static unsigned simd_loop_mdkind = 0; constant
static MDNode* simd_loop_md = NULL; constant
char LowerSIMDLoop::ID = 0; lock

module.c

jl_module_t *jl_main_module=NULL; constant
jl_module_t *jl_core_module=NULL; constant
jl_module_t *jl_base_module=NULL; constant
jl_module_t *jl_current_module=NULL; thread-local
jl_array_t *jl_module_init_order = NULL; lock (this code is bady broken anyways: module init order is wrong and can cause segfaults #9799)

profile.c

sys.c

JL_STREAM *JL_STDIN=0; constant
JL_STREAM *JL_STDOUT=0; constant
JL_STREAM *JL_STDERR=0; constant

task.c

toplevel.c

int jl_lineno = 0; thread-local
jl_module_t *jl_old_base_module = NULL; constant
jl_module_t *jl_internal_main_module = NULL; constant
extern int jl_in_inference; lock

The text was updated successfully, but these errors were encountered:

StefanKarpinski · 2015-03-06T15:12:25Z

Don't we have a spreadsheet of this somewhere? Maybe make it public?

ViralBShah · 2015-03-06T15:14:15Z

https://docs.google.com/a/mayin.org/spreadsheets/d/1FLrB90u0ORvBDmJZtxUfXhSt8HVdKMM0rqYjoOeMvGo/edit#gid=0

Don't know if it is public - but I thought it would be easier to track the work here in the issue.

StefanKarpinski · 2015-03-06T16:13:05Z

I fancied up the spreadsheet a bit so that it's easier to organize and sort.

vtjnash · 2015-03-06T22:20:38Z

i've tried to help knock a few off the list based off existing issues and upcoming changes

JeffBezanson · 2015-03-06T22:45:01Z

That's true, #9986 helps with this.

ViralBShah · 2015-03-09T05:41:42Z

It would be nice to get basic printing to work:

julia> using Base.Threading
julia> @threads all for i=1:100; println(i) ; end

signal (11): Segmentation fault
Segmentation fault (core dumped)

ArchRobison · 2015-03-11T19:15:19Z

Are these changes being made in the master, or a branch? I see items checked off for llvm-simdloop.cpp, but don't see them in the master branch.

StefanKarpinski · 2015-03-11T19:30:09Z

I think this is for the threading branch, right, @JeffBezanson?

tknopp · 2015-03-11T19:38:31Z

would also be interesting which is the current threading branch, "threading" or "threads"

StefanKarpinski · 2015-03-11T19:41:22Z

"threading" is newer than "threads"; not sure why we changed branches.

tknopp · 2015-03-11T19:49:56Z

that why I asked :-) I have seen changes being made to both branches at a time and thus was not sure about it.

Would be also interesting to have an issue what the blockers are for merging into master. IIUC at some point llvm svn was required but this might be solved if we switch to llvm3.6? Further there is still the pthreads dependency.

ArchRobison · 2015-03-11T22:40:08Z

Moving to C11/C++11 threads would remove the pthreads dependency. Using C11/C++11 would also give us portable atomic ops with good memory consistency controls. LLVM is moving aggressively to depend on C++11 anyway.

tknopp · 2015-03-11T22:57:28Z

I am not 100% sure why the pthread dependency was added but it seems to be required to set thread affinity. Is this possible with C++11? For atomic ops we have written some macros that use the compiler dependent intrinsics.

ViralBShah · 2015-03-12T06:58:33Z

@kpamnany had suggested that I use the threads branch and not threading as one of the macros was not quite working. If that is now fixed, I'd rather delete the threads branch and just have everything on threading.

ArchRobison · 2015-03-12T15:16:03Z

As far as I know, C++11 has no notion of thread affinity. So we still have to write platform-specific code for that. There is a hook in C++11 for getting at the platform-specific thread handle, so I think it's possible to write most of the threading in C++11/C11 and interface to platform-specific stuff for affinity.

tknopp · 2015-03-12T23:24:55Z

Not sure what exactly we gain when using C++11 threads. The Julia code is currently largely C based and only uses C++ where necessary (i.e. when interfacing with LLVM). This is a design decision. One could simplify quite some code when using C++ and could e.g. use STL containers instead of the self-written ones in libsupport. But the C vs C++ topic can be quite subjective. Its up to Jeff to give a direction here.

ArchRobison · 2015-03-13T14:25:43Z

C11 threading would work too. It's supposed to interoperate with C++ threads, i.e. in principle it's just difference spellings of the key operations. Though we'd need to check on Windows. Microsoft has been good about implementing the latest versions of C++, but has shown indifference to implementing even C99.

ArchRobison · 2015-03-13T14:27:27Z

Another route is to figure out the threading model first, and then write platform specific implementations of the core operations. That's the way the Intel Cilk Plus run-time is written, since it requires stack-switching capabilities beyond PThreads.

eschnett · 2015-03-13T21:03:29Z

To handle thread affinity I recommend using the hwloc library, already nicely wrapped in https://github.com/JuliaParallel/hwloc.jl. hwloc provides a platform-independent API to query number of cores, sockets, caches, etc., and to define the threads' affinity to them.

I find that C++11 threading implementations are based on pthreads, and thus slow. It would e.g. not be efficient to run each Julia task on a new C++11 thread. This makes C++11 threading facilities (the thread class, async method, future objects) rather useless if one is interested in more fine-grained threading. Most languages that are serious about threading define their own thread abstraction and do not rely on pthreads.

@ArchRobison: What stack-switching capabilities does Cilk Plus have? I had the impression that there are certain limitations. I'm currently using Qthreads https://github.com/Qthreads/qthreads, since there each thread has its own stack, and threads can switch arbitrarily.

ArchRobison · 2015-03-13T22:21:30Z

Yes, direct use of C11/C++11 threading for tasks would be rather useless, but it at least provides portable access to OS-level threads. I would imagine the Julia run-time would have several layers for threading:

Platform specific services. E.g. create a thread.
Unsafe abstractions implemented directly on top of (1), and hence an implementation per platform.
Portable unsafe abstractions built on top of (2)
Safe services for mere mortals, built on top of (2) and (3).

Cilk implementations (including Cilk Plus) implement a cactus stack. This paper gives a good overview of different ways to implement cactus stacks. The limitation to cactus stacks and the "busy leaves" property (each leaf has a thread running on it) is essential to Cilk's strong space and time guarantees, which may or may not be worth the limitations depending on perspective.

For level 2 above, the internals of the Intel implementation use an abstraction similar to Windows fibers, essentially arbitrary stack switching. The file runtime/cilk_fiber-unix.cpp has the Linux version -- about 144 lines of code.

@eschnett How does QThreads deal with stack overflow? I recall that Rust gave up on segmented stacks. Go copies the entire stack, which requires some care with code generation and stack maps, and takes on a little overhead in function prologs. Though overall I like its design since users don't have to worry about stack overflow and are not tied to cactus stacks.

eschnett · 2015-03-14T00:10:04Z

@ArchRobison QThread stack overflows are handled via good, old signal 11, if you enable stack overflow detection. Otherwise they remain unhandled.

eschnett · 2015-03-14T17:44:08Z

@ArchRobison: Regarding Cilk Plus: I have the impression -- please correct me if I am wrong -- that Cilk offers only a limited kind of parallelism: A routine can spawn children, but all children need to exit before the routine itself exits. This is less powerful than e.g. C++11, where the spawned thread's state is typically captured in a future that can be returned, stored, and passed around. Thread execution in C++11 thus doesn't form a tree structure, and cactus stacks (as described in the paper you mention) are not relevant. Is this what you meant by "busy leaves property"?

eschnett · 2015-03-14T17:50:36Z

@ArchRobison: You said "I recall that Rust gave up on segmented stacks." Do you have a pointer to what Rust tried, or how they failed?

kpamnany · 2015-03-16T05:54:59Z

To clarify on the threads vs. threading branches: there are no macro issues. The latter branch is newer and we should throw away the former, but #10527.

kpamnany · 2015-03-16T06:19:14Z

The threading code uses pthreads to start threads, set thread affinities, and for a mutex and condition variable pair to let threads sleep rather than spin when they aren't working. The other platform dependency is atomics, which would also be used in the runtime for spin locks and other synchronization constructs.

I've begun looking at C11 for threads and atomics, but if platform-dependent code is required anyways (for thread affinitization), and moving to C11 doesn't give us Windows, then I'm not sure it's worth the effort. hwloc looks cool but very elaborate for just thread affinity control.

To maximize parallel performance, at least on HSW-EP and KNL (i.e. 72 to >250 threads), plenty of platform-specific code will be needed anyway (e.g. the "best" barrier algorithm itself is different).

vtjnash · 2015-03-16T12:20:16Z

libuv also provides threading primitives: https://github.com/joyent/node/blob/a995a6a776f1b2d01946702190c6ff837c338577/deps/uv/include/uv.h#L1331-L1378

tknopp · 2015-03-16T12:36:05Z

@vtjnash this is what is used but libuv misses the thread affinity code. I would say extending libuv would be a Good appoach

ArchRobison · 2015-03-16T14:56:59Z

@eschnett: Here is a link to Rust giving up on segmented stacks.

"Busy leaves" property says that all leaves in the cactus stack have a thread actively running. E.g., on a P-thread system there will be no more than P leaves, and each leaf will have a thread running it. Each path from a leaf to the root corresponds to a stack that would have occurred in the sequential version of the program, hence the total space is bounded by P*(space for sequential execution). An example of a system without the busy leave property is TBB, where a leaf can stall because of the way TBB maps tasks to threads internally. (Alas a trade off we made for easy of portability.)

It looks like libuv's threading support is essentially the same as pthreads. Atomic operations and user-level scheduling are missing. So libuv seems like a good way to break dependence on pthreads, but we'll still need at least macros for atomics and something for decoupling stack<-->thread bindings.

JeffBezanson · 2015-12-03T17:18:38Z

I'll start a checklist of steps needed before we can turn threading on by default:

CI setup and tests pass
Improve stability (e.g. Segfault in threading: complex numbers #13380)
Make Base thread safe (some functions use global or let-bound temp buffers, should be stack allocated instead)
Stop-the-world GC

Bonus items:

Integrate task system (and remove yieldto)
Implement some approach to I/O

StefanKarpinski · 2015-12-03T17:31:57Z

That's a good list for things that need to be done before threading can be used but I think far less needs to be done before it can be enabled but not used. In fact I think that none of those are necessary for threading support to be enambled by default.

tkelman · 2015-12-03T17:38:33Z

@yuyichao's opinion was that at least the GC item should be done before enabling it by default. Does the current setup when threads are enabled result in worse performance in the single threaded case?

ViralBShah · 2015-12-03T17:41:12Z

I do agree that some performance testing would be great to have and we turn on threading by default carefully.

@jrevels Is this something that your perf testing setup you are planning, be used to track?

JeffBezanson · 2015-12-03T17:49:43Z

Ok, I agree not all of those are necessary. I lowered the priority of some.

I don't think there should be an intermediate state where threading is enabled by default, but shouldn't be used. To me, enabling it by default sends a signal that everybody should feel free to use it.

StefanKarpinski · 2015-12-03T17:50:14Z

Why does the GC need to be changed before compiling thread support by default? My understanding of the GC situation is that it can only cause problems in cases where threading is actually used. The main issue with turning threading support on currently is that on LLVM 3.7 there's a massive regression in compile time, but @Keno's hard work on LLVM 3.7.1 should fix that once we switch LLVM versions.

StefanKarpinski · 2015-12-03T17:51:56Z

The point of enabling it by default as a strictly experimental features is so that we can make sure it actually compiles everywhere and can start getting bug reports on crashes from people trying it out but not relying on it as a stable feature. We are not going fix all the potentially crashes in a vacuum.

yuyichao · 2015-12-03T17:52:38Z

Does the current setup when threads are enabled result in worse performance in the single threaded case?

The only performance regression now is probably the tls access for every GC frame. One main challenge for #14190 is to make sure I don't insert too many GC safepoint / transitions so a performance benchmark would be good.

My understanding of the GC situation is that it can only cause problems in cases where threading is actually used.

If this is what we want, I'm fine with enabling threading for now. I just don't want to encourage people to use threading yet.

The main issue with turning threading support on currently is that on LLVM 3.7 there's a massive regression in compile time,

Threading works fine on LLVM 3.3 now.

ViralBShah · 2015-12-03T17:53:07Z

I thought threading worked on LLVM 3.3 as well as 3.7 now with @yuyichao's work

JeffBezanson · 2015-12-03T17:54:08Z

start getting bug reports on crashes

No shortage of those! As it is, it crashes almost constantly. We should at least get to a state where we can say "seems to work ok".

yuyichao · 2015-12-03T17:54:11Z

Also the stop-the-world is also not the only thing that needs to be fixed in the GC. Many GC internal datastructures are still not thread safe (mostly importantly the remset (for write barrier) and the allocation counter).

ViralBShah · 2015-12-03T17:54:13Z

@yuyichao The point is to enable threading so that we detect various stability issues in the rest of the codebase, and not necessarily encourage people to use threading. In fact @ArchRobison's PR helps discourage some of the current thread APIs.

ViralBShah · 2015-12-03T17:55:40Z

Yes - at least all sequential tests need to pass and CI needs to be set up before we can enable threading by default. Ideally, also no performance regressions, but as long as those are identified, I guess it would be ok.

StefanKarpinski · 2015-12-03T17:56:52Z

We should at least get to a state where we can say "seems to work ok".

I vehemently disagree. We can fix compilation issues and other problems concurrently with making it more stable. I do agree that our test suite needs to pass with threading enabled, but AFAIK, that's already true. If there are platforms where that's not the case, we want to know about them so we can fix them.

ViralBShah · 2015-12-03T17:58:53Z

@yuyichao Now that things work with LLVM 3.3, I guess we don't need to worry about requiring any updates to the CI system (which was the case before), and threading can get tested. Is that correct? If so, we can at least have a PR with threading enabled, and see if we can get a green.

yuyichao · 2015-12-03T18:01:02Z

If so, we can at least have a PR with threading enabled, and see if we can get a green.

Yep. As long as we make it clear that it will almost certainly crash for non-trival interaction with the runtime, I'm totally fine with enabling it by default. And we can certainly start by seeing how does Travis and Appveyor like it.

JeffBezanson · 2015-12-03T18:06:42Z

I don't see the point of shipping something that crashes so much, and where we know we have simple library functions like modf using global buffers. If you want to try something this early-stage, you can set JULIA_THREADS=1. That seems perfectly appropriate to me. Of course we could flip the switch and simply warn people not to use it, but I think the switch-flip sends a stronger signal than anything we post (which not everybody will necessarily read).

JeffBezanson · 2015-12-03T18:10:13Z

Perhaps a good intermediate state is to enable everything else that JULIA_THREADS=1 enables, but fix nthreads==1 by default. I would be fine with that.

StefanKarpinski · 2015-12-03T18:17:12Z

People want to try out experimental features without having to compile a special version of Julia. That said, I would be fine with JULIA_THREADS=1 and nthreads == 1, but I still think having it be an experimental feature is better. What are you worried about if we enable it?

JeffBezanson · 2015-12-03T18:24:06Z

Ok, then this will be fine. All you'll need to do is set an environment variable. What I'm worried about is the appearance of "shipping crap". We need to balance releasing new features ASAP with maintaining rigorous quality standards.

tkelman · 2015-12-03T18:27:10Z

It's dev master, we're already telling people not to use it for real work. It isn't shipping until it's a stable release branch, and we're still a ways away from that on array and other stdlib work.

JeffBezanson · 2015-12-03T18:32:04Z

True, but this is a different level of brokenness. Let's at least see if we can fix some of the more flagrant crashes over the next few days.

jrevels · 2015-12-03T18:44:26Z

@jrevels Is this something that your perf testing setup you are planning, be used to track?

The "testable unit" of the perf tracking framework is just a function call, so as long as a benchmark can be coerced into that form, then the framework should be able to measure its execution.

I'd have to explicitly test out a multithreaded function call to be sure that it doesn't induce any weird breakage, but I can't think of any reason why it shouldn't work in theory.

ViralBShah · 2016-04-29T09:41:29Z

Should we close this issue for now, or update it for 0.5.0? I am tagging 0.5.0 to make sure this gets reviewed for release, but perhaps 0.5.x is the more suitable tag.

vtjnash · 2016-05-03T01:52:14Z

this issue doesn't really list anything

ViralBShah added the multithreading Base.Threads and related functionality label Mar 6, 2015

ViralBShah mentioned this issue Mar 6, 2015

Making Generational GC Thread-Safe #10317

Closed

tkelman mentioned this issue May 6, 2015

calling julia from C #11153

Closed

yuyichao mentioned this issue May 17, 2015

Print filename and line number for deprecation warnings #11311

Merged

yuyichao mentioned this issue Dec 10, 2015

flisp is not thread safe #14354

Closed

ViralBShah added this to the 0.5.0 milestone Apr 29, 2016

vtjnash closed this as completed May 3, 2016

Thread safety tracker #10421

Thread safety tracker #10421

Comments

ViralBShah commented Mar 6, 2015

builtins.c

ccall.cpp

cgutils.cpp

codegen.cpp

debuginfo.cpp

dump.c

gc.c

init.c

jltypes.c

llvm-simdloop.cpp

module.c

profile.c

sys.c

task.c

toplevel.c

StefanKarpinski commented Mar 6, 2015

ViralBShah commented Mar 6, 2015

StefanKarpinski commented Mar 6, 2015

vtjnash commented Mar 6, 2015

JeffBezanson commented Mar 6, 2015

ViralBShah commented Mar 9, 2015

ArchRobison commented Mar 11, 2015

StefanKarpinski commented Mar 11, 2015

tknopp commented Mar 11, 2015

StefanKarpinski commented Mar 11, 2015

tknopp commented Mar 11, 2015

ArchRobison commented Mar 11, 2015

tknopp commented Mar 11, 2015

ViralBShah commented Mar 12, 2015

ArchRobison commented Mar 12, 2015

tknopp commented Mar 12, 2015

ArchRobison commented Mar 13, 2015

ArchRobison commented Mar 13, 2015

eschnett commented Mar 13, 2015

ArchRobison commented Mar 13, 2015

eschnett commented Mar 14, 2015

eschnett commented Mar 14, 2015

eschnett commented Mar 14, 2015

kpamnany commented Mar 16, 2015

kpamnany commented Mar 16, 2015

vtjnash commented Mar 16, 2015

tknopp commented Mar 16, 2015

ArchRobison commented Mar 16, 2015

JeffBezanson commented Dec 3, 2015

StefanKarpinski commented Dec 3, 2015

tkelman commented Dec 3, 2015

ViralBShah commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

StefanKarpinski commented Dec 3, 2015

StefanKarpinski commented Dec 3, 2015

yuyichao commented Dec 3, 2015

ViralBShah commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

yuyichao commented Dec 3, 2015

ViralBShah commented Dec 3, 2015

ViralBShah commented Dec 3, 2015

StefanKarpinski commented Dec 3, 2015

ViralBShah commented Dec 3, 2015

yuyichao commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

StefanKarpinski commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

tkelman commented Dec 3, 2015

JeffBezanson commented Dec 3, 2015

jrevels commented Dec 3, 2015

ViralBShah commented Apr 29, 2016 • edited Loading

vtjnash commented May 3, 2016

ViralBShah commented Apr 29, 2016 •

edited

Loading