avoiding races with native module initializers #2267

erickt · 2012-04-22T01:04:23Z

The three SSL libraries I've looked at so far, OpenSSL, GnuTLS, and NSS, each require a function to be called (and in some cases, mutexes to be passed in) to initialize their global data structures (ugh). Even better, these functions aren't thread safe, so initialization needs to be pushed up to the executable in order for libraries to not stomp on each other's toes. Now, if a user happens to forget to initialize the library, then they are exposed to race conditions and segfaults.

I'd like to avoid that. Here's what I came up with to avoid these races:

We allow native modules to declare a static initializer that runs before main, and a destructor that runs after.
Add global (Add global variables to unsafe #553), statically initialized mutexes. These then can be used by the library to protect the library initializer functions. To protect against the user not actually initializing the functions, our wrapper could return an abstract handle that's required by the bindings.
Do nothing! These other options are worse than just assuming that people will read and follow the documentation.

brson · 2012-04-22T02:36:26Z

TL;DR I agree we need global data, possibly a mutex type. Let's push libraries as far as we can before adding too many unloved language features.

This is similar to the problem that @olsonjeffery is encountering with the uv bindings. With uv, the standard library needs access to a single global resource (a uv loop) that is available to any task, must be initialized zero or one times, and shutdown when the runtime exits.

The solution we are currently using uses two library functions to create a lazily-initialized, globally accessible task.

What uv is trying to do

This is how the resource is initialized:

Whenever the uv library gets access to the global uv loop it does so by sending messages to another task
It gets access to that task by requesting a channel from the chan_from_global_ptr function, which takes a pointer to a globally-known memory location and returns a channel
If chan_from_global_ptr does not find an existing channel handle at the global ptr location it creates a task and provides it a port to listen on, returning the channel. It's guaranteed to only do this once ever, across all threads.

This is how the resource is deinitialized:

Once the uv task is spawned it converts itself into a "weak" task using the weaken_task function
While weakened the task no longer counts toward the kernel's list of live tasks
When all normal tasks exit the uv task is told to exit via a channel, it cleans up the global resource, and the runtime exits

This could work for other cases too. For instance, we discussed earlier on IRC how it could be used for a getenv/setenv daemon. It may also work for initializing crypto libraries and serializing access to them if necessary.

What ssl could do similarly

Here's how we could possibly use these functions to provide safe access to a crypto library that requires initialization, deinitialization and isn't thread safe if we allow for one rule

All native crypto calls are done on a special-purpose single-threaded scheduler.
We may use any number of tasks, but they all must run on this scheduler.

We would guarantee single initialization and deinitialization the same way as uv. And I could imagine a crypto API that takes a unique closure and lets you execute arbitrary calls within the context of the crypto scheduler, then further APIs that dress that up in synchronous function calls.

If being forced into another task to do crypto is truly intolerable then we would need some kind of mutex as you suggest.

Global data

Even the scheme I'm using requires access to global data; there is clearly a need. The way uv handles it is by stashing a word-sized global in the runtime kernel, but this isn't a solution for everybody. Probably unsafe global variables is inevitable.

The one thing that really weirds me out about this is that this language will allow rust data to "leak" out of the runtime's control in a way that feels different from any other part of the language. In particular, if there are two Rust runtimes running at the same time, then they both will see the same global data. For the purpose of most libraries this means they will also need to coordinate truly global initialization and deinitialization between runtimes and shield runtimes' data from each other.

Of course we can't actually run multiple runtimes, but the runtime will be embeddable, and I don't want to be one of these lame libraries that isn't threadsafe.

Rust mutexes

It's come up a few times now where it would be nice to have a mutex in Rust. We can't just use pthread mutexes because they don't understand the Rust scheduler (if the scheduler deschedules a task while it holds a lock then ... I can't even imagine the consequences but surely they aren't good).

I have not put much thought into it yet because I think we should push actor-based programming as far as we can. Most of the time when you want a lock the code can be expressed with actors. Tasks and messages are the bet we've made. We can create inefficient locks on top of tasks where synchronization is really easier to express with locks. If we find that we must have fast mutexes then it will require serious thinking.

Static initialization

Please let's resist adding this to the language with our very most. I believe we can solve this in the library for many use cases.

erickt · 2012-04-22T03:54:32Z

Great response. I agree I'd hate to have static initialization.

The globally accessible, lazily created task would be perfect for these SSL libraries. I believe all I'd need really is something like:

export ctx::{};

enum ctx { token }
fn gnutls_init() -> ctx {
    let ch = spawn_global_unique { |token_port|
        weaken_task() { |weak_exit_port|
            libgnutls::gnutls_global_init();

            let mut continue = true;
            while continue {
                continue = either::either(
                    { |exit| false },
                    { |token_chan| comm::send(token_chan, token) },
                    comm::select2(weak_exit_port, token_port);
            }

            libgnutls::gnutls_global_deinit();
        }
    };

    comm::recv(ch)
}

fn gnutls_certificate_allocate_credentials(ctx: ctx) { ... }

Where spawn_global_unique is my hypothetical function that only runs once. Rust's runtime hide away allocating a the global pointer for me.

I don't think we need to run the SSL functions inside their own thread though. Beyond synchronizing the init functions, the rest of the API is thread safe, with one exception. The SSL socket wrappers can't be used by more than one thread at a time, but we can already protect against that in rust.

brson · 2013-04-24T00:02:01Z

I think with unsafe global data the lazy initialization problem is solved trivially with a global mutex and a flag. Global shutdown is trickier, especially if you assume that a single process may host multiple Rust 'runtime' instances (which I do).

brson · 2013-04-24T00:05:14Z

I'm going to close this because I think any customizable before-main initialization is a non-starter, and the other possible solutions either exist or are planned. core::unstable::global can be used to do lazy initialization per-runtime (which is usually the same as per-process), and unsafe globals (and therefore global mutexes) are on the drawing board.

Feel free to reopen if you disagree.

rustup I cannot reproduce rust-lang#98493 so let's see what CI says.

…default unwind (rust-lang#2267) I made a few improvements while trying to use assess: - Allow assess to keep building packages even when one or more targets failed to build. Instead, just report how many failed and how many succeeded. - Fix test result reporting for cases where CBMC crashes. We were counting them as success. Now we report the error code. - Allow users to set the default unwind value using --default-unwind. By default, we still use value 1. - Only report as analyzed crates that have any kani-metadata.json file.

brson closed this as completed Apr 24, 2013

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 22, 2022

Auto merge of rust-lang#2267 - RalfJung:rustup, r=RalfJung

54a495c

rustup I cannot reproduce rust-lang#98493 so let's see what CI says.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

avoiding races with native module initializers #2267

avoiding races with native module initializers #2267

erickt commented Apr 22, 2012

brson commented Apr 22, 2012

erickt commented Apr 22, 2012

brson commented Apr 24, 2013

brson commented Apr 24, 2013

avoiding races with native module initializers #2267

avoiding races with native module initializers #2267

Comments

erickt commented Apr 22, 2012

brson commented Apr 22, 2012

What uv is trying to do

What ssl could do similarly

Global data

Rust mutexes

Static initialization

erickt commented Apr 22, 2012

brson commented Apr 24, 2013

brson commented Apr 24, 2013