Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: Cannot use cyrptography in embedded wsgi #2299

Closed
Naddiseo opened this issue Aug 27, 2015 · 45 comments
Closed

Regression: Cannot use cyrptography in embedded wsgi #2299

Naddiseo opened this issue Aug 27, 2015 · 45 comments

Comments

@Naddiseo
Copy link

We have discovered at $work that the 1.0 cryptography release does not work with apache2+mod_wsgi. The error log spits out "Truncated or oversized response headers received from daemon process", which seems to happen if the wsgi process dies. We've tracked the issue down to an upgrade with cryptography, and something to do with threading and/or the Python GIL (a work around is in the wsgi config to set WSGIApplicationGroup to %{GLOBAL}).

I did a git bisect from 0.9.3 and 1.0, and found that the issue was introduced in commit b3d37a5.

@reaperhulk
Copy link
Member

Wow, would not have expected it to come from that...

/cc @glyph

@Naddiseo
Copy link
Author

Me either, the commit itself doesn't look like it could cause something like this, I think that it may have been an earlier commit that introduced code paths that weren't called until the triggering commit.

@reaperhulk
Copy link
Member

Out of curiosity... Calling backend.activate_builtin_random() should disable this code path (you'll need to have a backend available in that scope of course. from cryptography.hazmat.backends.openssl.backend import backend will do it). Does this resolve your issue?

@Naddiseo
Copy link
Author

Yes, those two lines also resolve the issue.

@glyph
Copy link
Contributor

glyph commented Aug 28, 2015

OK, so here's a hypothesis:

This is a problem because sub-interpreters have an entirely different Python namespace (different modules, etc) but the same C namespace (same GIL, same C-level global variables, shared libraries, etc). I'm not sure exactly why initializing the engine a second time causes a problem, but it's C-level runtime initialization stuff, so ¯_(ツ)_/¯. I'm sure that cryptography being surprised to discover an already-initialized OpenSSL causes all kinds of exciting issues.

@reaperhulk - you should like this one since it would also explain some of the other concurrency issues you're seeing at import time.

(The main hypothesis I guess I'm describing here is that subinterpreters are just a completely broken concept nobody should use, but maybe cryptography could be made to work around these issues.)

@reaperhulk
Copy link
Member

If we work on that assumption we can make engine addition idempotent at the C layer (see #2293) but since the random engine has callbacks in Python (that potentially point to functions not in the "current" sub-interpreter) it would still fail in the sub-interpreter...correct?

@glyph
Copy link
Contributor

glyph commented Aug 29, 2015

There's a pretty good explanation here - http://emptysqua.re/blog/python-c-extensions-and-mod-wsgi/#but-c-extensions-are-not-isolated - of how C extensions are treated by mod_wsgi.

@glyph
Copy link
Contributor

glyph commented Aug 29, 2015

I am pretty sure that you can call between different sub-interpreters just fine, since they share a GIL.

@glyph
Copy link
Contributor

glyph commented Aug 30, 2015

I flied a cffi bug to ask for a way to keep the GIL while calling C: https://bitbucket.org/cffi/cffi/issues/220/some-way-to-annotate-a-call-to-ask-it-not

@glyph
Copy link
Contributor

glyph commented Aug 30, 2015

One perhaps-obvious solution here would be to move all this library initialization to the module scope so that it's protected by the import lock, rather than trying to protect it with an ad-hoc lock. It seems counter to the way that cryptography is constructed, to avoid shared mutable state, but in this case there is just the illusion of such avoidance: the mutable state does in fact exist and attempting to abstract it away is hard in a multi-interpreter environment.

@public
Copy link
Member

public commented Aug 30, 2015

Sub interpreters could definitely trigger a race in the engine setup code. #2293 proposes making this process idempotent which seems like a fairly reasonable solution to me.

@reaperhulk
Copy link
Member

@glyph I'm not opposed to that solution if it resolves our problem, but does it given the change in behavior in python 3.4?

@glyph
Copy link
Contributor

glyph commented Aug 30, 2015

Possibly not; I can't figure out at a glance how the per-module locks are initialized. @cyli suggests that cryptography could just take out a filesystem lock in /tmp ;-)

@reaperhulk
Copy link
Member

Okay, reviving this conversation. Proposal for the sake of discussion: To avoid issues with sub-interpreters we should do the following...

  • Move os random engine back to C
  • Make os random engine registration idempotent (we can hold a lock at the C layer which will prevent races with the sub-interpreters)
  • Move the locking callbacks to C (since they are presumably subject to the same issues)
  • Make locking callback registration idempotent (this part might be tricky if we want to detect locking callbacks already being registered by things like import _ssl)

Do people agree that this would resolve the issue? If not, why not? And if so, is there a way to do this without doing it all in C because obviously we'd prefer to not do that...

@alex
Copy link
Member

alex commented Sep 8, 2015

FWIW, I'm strictly opposed to moving this code back itno C.

On Tue, Sep 8, 2015 at 11:05 AM, Paul Kehrer [email protected]
wrote:

Okay, reviving this conversation. Proposal for the sake of discussion: To
avoid issues with sub-interpreters we should do the following:

  • Move os random engine back to C
  • Make os random engine registration idempotent (we can hold a lock at
    the C layer which will prevent races with the sub-interpreters)
  • Move the locking callbacks to C (since they are presumably subject
    to the same issues)
  • Make locking callback registration idempotent (this part might be
    tricky if we want to detect locking callbacks already being registered by
    things like import _ssl)

Do people agree that this would resolve the issue? If not, why not? And if
so, is there a way to do this without doing it all in C because obviously
we'd prefer to not do that...


Reply to this email directly or view it on GitHub
#2299 (comment).

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

@glyph
Copy link
Contributor

glyph commented Sep 8, 2015

FWIW, I'm strictly opposed to moving this code back itno C.

Just because having more code in C is bad, or...?

@alex
Copy link
Member

alex commented Sep 8, 2015

Pretty much :-)

C is bad, and going back to our C code is a regression in functionality and
security.

On Tue, Sep 8, 2015 at 3:41 PM, Glyph [email protected] wrote:

FWIW, I'm strictly opposed to moving this code back itno C.

Just because having more code in C is bad, or...?


Reply to this email directly or view it on GitHub
#2299 (comment).

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

@simo5
Copy link
Contributor

simo5 commented Sep 8, 2015

Alex what regression in security do you see ?
Is that specific to random or does it include locking ?
Isn't handling low level locking in Python backwards ?

@glyph
Copy link
Contributor

glyph commented Sep 8, 2015

In order to properly initialize the random engine and locking callbacks, Cryptography needs to hold a lock so that these initializations are mutually exclusive between interpreters. I'm going to ignore the potential for concurrency among backends (import _ssl, M2Crypto, etc) because I am trying to hold on to a few shreds of sanity.

Here are some of the strategies I've discussed on IRC or in discussions with Cryptography maintainers:

Use the GIL

CFFI presently offers no capability of holding the GIL; we would really like at least a temporary solution before that is likely to land. Not to mention that every new-CFFI-version requirement means a new pypy requirement, practically speaking, which is pretty aggressive.

Plus, this only works on the C side of things. In Python, every bytecode executed might release the GIL, so it's no help there.

Allocate our own locks in Cryptography

This is what Cryptography is trying to do now. The problem with acquiring such a lock is that there is nowhere to share such a lock. And a lock you can't share isn't exactly a lock. Any lock allocated in Python code is going to be allocated in multiple interpreters.

If there were an API in C for allocating locks, then we could use that on the C side and then acquire that from the Python side. But there is no such API in C; POSIX locks and Windows locks are fairly different. We could of course use Python's API for this, but starting to rely on cpyext at this point in cryptography's history would seem like even more of a backslide than going back to C code.

Re-use the global import lock

As I mentioned above, Python has a global import lock. What I didn't mention above (although I did on another ticket) is that there is an API, imp.acquire_lock and imp.release_lock to acquire and release that lock.

The API is deprecated externally, but the very definitely not deprecated importlib makes use of the same lock during its bootstrap initialization. So it should not be going away any time soon; it needs to be available in some form.

Conclusion

I think that the best way forward would be to use the import lock in the near term, and in the long term work with CFFI to have logic specifically dedicated to dealing with sub-interpreter concurrency by exposing some form of portable lock initialization, and then using that. Or maybe pursuade CPython that exposing the import lock is necessary for the long term. Or possibly even allowing Python code to run in initmodule somehow. For now, as a bonus, the import lock will probably protect loading C modules as well, since initmodule is called with the GIL held and I doubt many extension modules release it; I think this means we should be OK with _ssl for the time being, for example.

@glyph
Copy link
Contributor

glyph commented Sep 8, 2015

Isn't handling low level locking in Python backwards ?

Python provides APIs for portable locking; C doesn't. So it seems like a reasonable place to handle it, if we can manage it.

@public
Copy link
Member

public commented Sep 9, 2015

Using the import lock like that sounds like a reasonable safeguard to me, at least while it remains available in implementations we support.

It's not entirely obvious to me that the current approach to idempotency is in fact invalid but I guess that doesn't really matter if we have a real mutex we can use.

On a slightly different note, this discussion made me look at

def _lock_cb(cls, mode, n, file, line):
and I suspect it might also be quite wrong in the presence of sub interpreters as each interpreter will get a totally isolated set of locks to use.

@glyph
Copy link
Contributor

glyph commented Sep 9, 2015

On a slightly different note, this discussion made me look at

def _lock_cb(cls, mode, n, file, line):
and I suspect it might also be quite wrong in the presence of sub interpreters as each interpreter will get a totally isolated set of locks to use.

My reading of that code is that one subinterpreter will "win" and initialize the locks with its lock object, and then all other subinterpreters will get a reference to that via the pointers in openssl itself. So I think it's still correct, inasmuch as any code can be "correct" in the face of this mess…

@public
Copy link
Member

public commented Sep 10, 2015

@glyph I think you are right about how the callbacks works means that whichever one wins the race will be the one that the callbacks are executed in. Unfortunately I think there's still a problem since sub interpreters have independent lifetimes. If the one that won the race exits the other interpreter will no longer have a valid locking callback to use.

This same problem applies to osrandom too. It's not really very obvious to me that merely getting initial allocation of the callbacks safe solves the sub interpreter problem.

@glyph
Copy link
Contributor

glyph commented Sep 10, 2015

@public I don't think that tearing down a subinterpreter will automatically destroy all objects within that subinterpreter. It might do something like clearing module dictionaries, but it relies upon GC to free objects. I suppose someone could write a proof of concept C program 😩

@ncoghlan
Copy link

As @glyph notes, we actually still need the global import lock to protect the management of the per-module locks, so using that as at least a temporary workaround is reasonable.

For Python 3.5+, cffi would ideally support using multi-phase initialization which has been designed to improve extension module subinterpreter support: https://docs.python.org/3.5/c-api/module.html#multi-phase-initialization

That unfortunately won't be immediately helpful in cryptography's case, since the problem you have is one that is still an open question for multi-phase init: making it easy to handle external singleton libraries like OpenSSL in a subinterpreter compatible way (or at least in a way that fails noisily rather than introducing subtle data sharing problems - Cython opts for the latter approach by explicitly not supporting subinterpreters at all).

Further work on this is planned for 3.6, so if anyone from cryptography and/or cffi has time to participate, we'd love to hear from you on import-sig: https://mail.python.org/mailman/listinfo/import-sig

@ericsnowcurrently
Copy link

The "global" import lock is actually a C global:

https://hg.python.org/cpython/file/tip/Python/import.c#l157

...but that may change. While the motivation for the lock is thread races in the same interpreter, C extensions benefit (perhaps incidentally) in the case of subinterpreters. I'm glad this discussion came up because I expect at some point de-globalizing the lock between interpreters may happen as we try to further isolate interpreters in the same process. It's clear that the C extension case will have to be satisfied safely. That may not be the 3.6 timeframe, but I could see it happening by 3.7 (as we make headway on isolation efforts).

Also...

@ericsnowcurrently
Copy link

...the C extension support in subinterpreters to which Nick refers should help. Currently C extension modules can define C-level state (instead of using C globals); see https://www.python.org/dev/peps/pep-3121/. Perhaps it would help to support some form of process-global per-module state.

@reaperhulk
Copy link
Member

@Naddiseo Would it be possible to test the change in #2336 in your environment?

@Naddiseo
Copy link
Author

@reaperhulk, I'll check on Monday when I get to work.

@Naddiseo
Copy link
Author

@reaperhulk, it appears #2336 does not fix my issue. I'm also having trouble reducing the project to a minimal test case: I can reproduce the bug on the full project, but not on a single file which imports cryptography.

@reaperhulk
Copy link
Member

@Naddiseo d'oh :( Is the project something where you could build a docker container we could use to test potential solutions or is it something proprietary?

Subinterpreters are very frustrating...

@Naddiseo
Copy link
Author

It is proprietary, so anything other than a minimal test case is a no-go. I'll continue to try and make a test case in my spare cycles. In the mean time, if you do have other changes you'd like to test, you can point me to a branch/commit and I'll give it a test.

@reaperhulk
Copy link
Member

@Naddiseo if you get a chance could you test #2293? I made some changes to that approach.

@toabctl
Copy link

toabctl commented Sep 29, 2015

@Naddiseo if you get a chance could you test #2293? I made some changes to that approach.

@reaperhulk I have a similiar problem (running on py27, cryptography 1.0 in a apache2 wsgi env and getting RuntimeError: osrandom engine already registered) and the patch you proposed in #2293 fixes this problem.

@reaperhulk
Copy link
Member

@toabctl that's great -- any chance you can provide a docker container that replicates the bug? We haven't been willing to land #2293 thus far because it relies on locking callbacks that are registered in Python. I think #2293 is really just narrowing the window for the race condition and not fixing the underlying issue, but I'd like to experiment with an environment that triggers the initial bug.

@toabctl
Copy link

toabctl commented Sep 29, 2015

any chance you can provide a docker container that replicates the bug?

No. Sorry for that. I can paste the wsgi file and the apache config if that helps.

@Naddiseo
Copy link
Author

@reaperhulk, no joy with #2293

@jobec
Copy link

jobec commented Feb 11, 2016

I noticed a similar issue with to combination of apache, mod_wsgi and the Requests module.

When doing a request to a HTTPS site with Requests, the function call never returns.
The backend.activate_builtin_random() line of code mentioned in reply 4 also did the trick.

I eventually figured it out because I saw the following error in my apache error log:
extern "Python": function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern(). Returning 0.

For the code that was causing the issue, see: snok/django-auth-adfs@28936fb

--- UPDATE 2016/11/29 ---
I ran into this bug again (v1.4 of cryptography).
It appears this backend.activate_builtin_random() MUST be called before doing any HTTPS request with the requests module. If it's not called before it, it will hang the wsgi process.
First with an error like: Timeout when reading response headers from daemon process and after a while (the moment the daemon process gets cleaned up) with function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern()

jobec referenced this issue in snok/django-auth-adfs Feb 11, 2016
…ause of a bug in the cryptography module.

ref pyca/cryptography/#2299
@reaperhulk
Copy link
Member

I'm going to close this as our current solution for mod_wsgi is the WSGIApplicationGroup %{GLOBAL} fix and we've recently documented it in our FAQ.

Hopefully at some point in the future this won't be required, but for now it's the best we can do.

@glyph
Copy link
Contributor

glyph commented Mar 16, 2016

Should the FAQ also mention #2537 since that is the (eventual) path forward on this issue? Or am I confused?

@jobec
Copy link

jobec commented Mar 16, 2016

Are there any downsides in using backend.activate_builtin_random() ? Or is it also a safe workaround?

@reaperhulk
Copy link
Member

@glyph until we merge it I'm going to be somewhat quiet about it (other than to perhaps point people that direction).

@jobec The random initialization issue appears to be mostly handled by some code we put in the 1.2 release, so it shouldn't be necessary to separately call activate_builtin_random (although doing so shouldn't cause problems). We've been receiving sporadic reports of other issues though, so that's why we've stated that the official way to handle this is the ApplicationGroup directive.

@chmarr
Copy link

chmarr commented Sep 9, 2016

Adding a note here as I ran into this problem and the documented workaround - setting WSGIProgressGroup %{global} - didn't work for me. I hope someone else will find this useful.

From what I can glean, setting %{GLOBAL} doesn't assign the reference (e.g. WSGISCriptAlias) to the first sub-interpreter, but simply to the first process group. Sub-interpreters are created purely by the number of different wsgi applications that are in that process group, with "different application" is roughly equivalent to a different wsgi.py file. So, assuming that diagnosis true, the documented solution doesn't work at all and is in fact completely wrong.

I added some code to the wsgi.py file to syslog out id(os) and os.getpid(), as a way of detecting additional sub-interpreters being generated. Using id(builtins) didn't seem to be useful.

The fix was to ensure each application runs in a different process group, I.e., a different WSGIDaemonProcess for every different wsgi.py file you have. When I did this, this problem, along with the 'isinstance' problem the WSGI documentation referred to, completely went away.

(As an aside, I only have a single wsgi.py file, but I'm referring to them from two different VirtualServers. I think that whatever code detects references to the same wsgi.py file breaks across the virtualserver boundary. I have multiple references to the wsgi.py file inside one of the VirtualServers, and that doesn't appear to cause additional sub-interpreters.)

I hope this helps!

@jobec
Copy link

jobec commented Nov 29, 2016

I ran into this bug again (v1.4 of cryptography).
It appears backend.activate_builtin_random() MUST be called before doing any HTTPS request with the requests module. If it's not called before it, it will hang the wsgi process.

First with an error like: Timeout when reading response headers from daemon process and after a while (the moment the daemon process gets cleaned up) with function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern()

@reaperhulk
Copy link
Member

This issue will be resolved with the next cryptography release. See #3229

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests