-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reading from Pool is not thread-safe when pool_alloctmpspace() is used #471
Comments
I think more correct approach (which will also make it "safe" for multiple threads, assuming that you don't touch string pool, e.g. by creating new packages) is to mimic pool_solvable2str() yourself and use your temporary memory for those. I think we could probably have 2 variations of these functions, so that caller can choose which one he wants (thread-safe or fast)… They way current function is designed is to use as less memory as possible and to be as fast as possible. |
I don't think it's as easy as you suggested. |
@mlschroe I know multi threading is not an easy topic but I would like to ask you for your opinion. We are preparing a new DNF daemon and multi threading support in libsolv is a feature that is critical for the design. Do you think whether is it possible or feasible to make libsolv thread safe? Right now we see an issue with |
I don't think you can easily make libsolv multi-thread safe. You can do this for the tmpspace, but there's more to it. For example provider lookup caches the result is the whatprovides hash. If we want to go this route we need to go through all the API functions and find out if they need locking. The original design was "a pool must only be accessed by one thread at a time", so there might be some ugly corner cases... |
Thank you very much for information. OK, we will try to not use libsolv in multi-thread. |
In general, libsolv isn't thread-safe.[[1]] There are multiple functions which return pointers that can be invalidated by other threads. It's hard to follow the libsolv side of this, but I think `solvable_lookup_bin_checksum()` (called by `dnf_package_get_chksum()`) is such a case. Because we do importing in bulk using multi-threading, rpm-ostree triggers this bug. This is the root cause of coreos#4565, where during the `cosa fetch` we embed the wrong repodata chksum_repr in the cached RPM's OSTree commit. So then `cosa build` considers it a cache miss when it sees the mismatch and wants to redownload it. A trivial way to reproduce coreos#4565 more easily is with: ```diff diff --git a/src/libpriv/rpmostree-rpm-util.cxx b/src/libpriv/rpmostree-rpm-util.cxx index 658a5fd8..ad8dc104 100644 --- a/src/libpriv/rpmostree-rpm-util.cxx +++ b/src/libpriv/rpmostree-rpm-util.cxx @@ -1142,6 +1142,7 @@ get_repodata_chksum_repr (DnfPackage &pkg) chksum_raw = dnf_package_get_hdr_chksum (&pkg, &chksum_type); else chksum_raw = dnf_package_get_chksum (&pkg, &chksum_type); + g_usleep (5000); if (chksum_raw) chksum = hy_chksum_str (chksum_raw, chksum_type); ``` Fix this by locking down this critical section. As mentioned in the comment, a more correct fix would be to not call unsafe libsolv APIs in the importer, but that'd require a larger rework. The locking added here doesn't appear to affect performance. I've checked that we don't do any other unsafe libsolv calls in the importer path that could be subject to this. Fixes: coreos#4565 [1]: openSUSE/libsolv#471
In general, libsolv isn't thread-safe.[[1]] There are multiple functions which return pointers that can be invalidated by other threads. It's hard to follow the libsolv side of this, but I think `solvable_lookup_bin_checksum()` (called by `dnf_package_get_chksum()`) is such a case. Because we do importing in bulk using multi-threading, rpm-ostree triggers this bug. This is the root cause of #4565, where during the `cosa fetch` we embed the wrong repodata chksum_repr in the cached RPM's OSTree commit. So then `cosa build` considers it a cache miss when it sees the mismatch and wants to redownload it. A trivial way to reproduce #4565 more easily is with: ```diff diff --git a/src/libpriv/rpmostree-rpm-util.cxx b/src/libpriv/rpmostree-rpm-util.cxx index 658a5fd8..ad8dc104 100644 --- a/src/libpriv/rpmostree-rpm-util.cxx +++ b/src/libpriv/rpmostree-rpm-util.cxx @@ -1142,6 +1142,7 @@ get_repodata_chksum_repr (DnfPackage &pkg) chksum_raw = dnf_package_get_hdr_chksum (&pkg, &chksum_type); else chksum_raw = dnf_package_get_chksum (&pkg, &chksum_type); + g_usleep (5000); if (chksum_raw) chksum = hy_chksum_str (chksum_raw, chksum_type); ``` Fix this by locking down this critical section. As mentioned in the comment, a more correct fix would be to not call unsafe libsolv APIs in the importer, but that'd require a larger rework. The locking added here doesn't appear to affect performance. I've checked that we don't do any other unsafe libsolv calls in the importer path that could be subject to this. Fixes: #4565 [1]: openSUSE/libsolv#471
Several functions use pool_alloctmpspace() to return a string allocated in Pool's temporary area.
Such strings must be processed and discarded immediately.
Everything works fine as long as the code runs in a single thread.
We're currently writing new dnfdaemon - a DBus interface that processes incoming requests in an infinite loop and turns them into asynchronous threads. We are aware of potential concurrency issues, designing the code to avoid them as much as possible.
Our use case is: load repositories into Pool and let users run multiple queries at a time.
We believe that pool_alloctmpspace() might become an issue when used in threads and that impacts all libsolv functions that use it, incl. pool_solvable2str() for example.
Locking doesn't seem to be a good idea, because a user would have to always unlock it explicitly after the value is consumed and it also decreases performance.
I'm wondering if
thread_local
annotation onpool->tmpspace
couldn't be used to eliminate race conditions among threads.Maybe also returning a newly allocated string might be an option, but I don't know the performance impact.
Or do you have any better idea?
The text was updated successfully, but these errors were encountered: