Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use of private zstd POOL_* APIs prevents linking against shared zstd library #106

Closed
rathann opened this issue Jun 3, 2020 · 5 comments
Closed

Comments

@rathann
Copy link
Contributor

rathann commented Jun 3, 2020

This module has a --system-zstd option in setup.py, but it doesn't work as expected, because the bundled zstd copy (#48) is compiled anyway. I found that it's almost possible to delete the bundled zstd, but one has to remove all references to the internal zstd POOL_* APIs. Does this module really need to (ab)use internal zstd APIs? Could the multi-threaded mode be implemented in some other way? Maybe using the standard Pool interface with multiprocessing.dummy?

My use-case is packaging zstandard for use in Fedora and we have, like many other distributions, specific provisions against bundling, mostly for security reasons in our packaging guidelines. I'd really appreciate it if I could simply rm -rf zstd and still have the module build with --system-zstd and link to the shared system libzstd then.

@indygreg
Copy link
Owner

We currently make use of the private zstandard APIs unconditionally. And we don't make attempts to support zstandard library versions that don't match the bundled version.

I concede that both of these traits are sub-optimal.

zstandard uses its own threads internally for C code and we cannot use Python's multiprocessing code here.

The proper solution here is to make the build system and run-time code more adaptive to varying system environments. This is on the roadmap for a 1.0 release. It just hasn't been a priority so far.

@terrelln
Copy link
Contributor

terrelln commented Jul 9, 2020

This issue is also problematic for people who use source-based build systems like buck. Now that zstd has switched to using relative includes for non-public headers like pool.h, we can no longer successfully build python-zstandard with our build system.

As a short-term fix, would it be possible to copy zstd's pool.* into python-zstandard and use your own version?

@indygreg
Copy link
Owner

@terrelln this came up again after the 0.15.0 release of this package and the transition to the single file zstd library.

The problem is that some C code in this Python package makes use of private APIs in pool.h. Whether this is appropriate, I'm not sure.

From my perspective, the easy fix is for libzstd to expose these pool APIs from the public .h files. It currently has some pool APIs. But symbols like POOL_function and POOL_add aren't exported.

Long term, I can work around this by removing the C backend and reimplementing things in Rust. The reason I'm using the pool.h symbols in the first place is so I can have an easy-to-use thread pool without having to write a bunch of C or manage another 3rd party thread pool library. With Rust, adding a crate providing a simple thread pool is trivial to do and the overhead is a non-issue.

Anyway, I plan to work around this in 0.15.1 by distributing the pool.h file again. Unfortunate, it depends on zstd_deps.h, so I have to ship that too. And by shipping my own versions of these files, I run the risk of mixing headers from different zstd library versions. It's a rather brittle situation...

@indygreg
Copy link
Owner

I think I'm going to make a backwards-incompatible change to disable the experimental APIs relying on pool.h to only be available when linking against the single file bundled zstd C source file. That's likely the easiest way out of this mess. But this will require a new major version, as it is a significant backwards-incompatible change. Oh well.

indygreg added a commit that referenced this issue Dec 31, 2020
This restores the prior behavior in 0.14. I'm not a fan of the behavior.
But it does unblock building/linking against the system libzstd.

Closes #129. But the solution is not ideal. CC #106.
indygreg added a commit that referenced this issue Dec 31, 2020
This restores the prior behavior in 0.14. I'm not a fan of the behavior.
But it does unblock building/linking against the system libzstd.

Closes #129. But the solution is not ideal. CC #106.
indygreg added a commit that referenced this issue Dec 31, 2020
This restores the prior behavior in 0.14. I'm not a fan of the behavior.
But it does unblock building/linking against the system libzstd.

Closes #129. But the solution is not ideal. CC #106.
indygreg added a commit that referenced this issue Dec 31, 2020
This restores the prior behavior in 0.14. I'm not a fan of the behavior.
But it does unblock building/linking against the system libzstd.

Closes #129. But the solution is not ideal. CC #106.
indygreg added a commit that referenced this issue Dec 31, 2020
The pool APIs are not exported from the library nor are they present
in public headers.

This commit changes behavior of the C backend to conditionally enable
features relying on these symbols. The practical effect is that these
features will only be available if building against the single file /
bundled libzstd and will no longer be available when building against a
system libzstd.

This effectively closes #106.

A side-effect of this change is that --system-zstd likely regressed
due to inability to find zstd headers. We'll need to add a setup.py
argument to plumb the header search path through. But this was a
pre-existing condition.
@terrelln
Copy link
Contributor

terrelln commented Jan 4, 2021

I think I'm going to make a backwards-incompatible change to disable the experimental APIs relying on pool.h to only be available when linking against the single file bundled zstd C source file. That's likely the easiest way out of this mess. But this will require a new major version, as it is a significant backwards-incompatible change. Oh well.

I think that is the best way forward. pool.h isn't meant to be a public header. So we don't guarantee any stability between releases, or guarantee that it will continue to work.

Alternatively, you could copy pool.{h,c}, and threading.{h,c}, remove the zstd_deps.h dependency, and rename the symbols so they don't collide with zstd's implementations.

Though, if you're willing to add a rust (or C++) toolchain to the build process, it would be much easier to just use a rust implementation, because it already has cross-platform threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants