Regression: Cannot use cyrptography in embedded wsgi #2299

Naddiseo · 2015-08-27T18:17:28Z

We have discovered at $work that the 1.0 cryptography release does not work with apache2+mod_wsgi. The error log spits out "Truncated or oversized response headers received from daemon process", which seems to happen if the wsgi process dies. We've tracked the issue down to an upgrade with cryptography, and something to do with threading and/or the Python GIL (a work around is in the wsgi config to set WSGIApplicationGroup to %{GLOBAL}).

I did a git bisect from 0.9.3 and 1.0, and found that the issue was introduced in commit b3d37a5.

The text was updated successfully, but these errors were encountered:

reaperhulk · 2015-08-27T18:20:48Z

Wow, would not have expected it to come from that...

/cc @glyph

Naddiseo · 2015-08-27T18:34:58Z

Me either, the commit itself doesn't look like it could cause something like this, I think that it may have been an earlier commit that introduced code paths that weren't called until the triggering commit.

reaperhulk · 2015-08-27T18:41:01Z

Out of curiosity... Calling backend.activate_builtin_random() should disable this code path (you'll need to have a backend available in that scope of course. from cryptography.hazmat.backends.openssl.backend import backend will do it). Does this resolve your issue?

Naddiseo · 2015-08-27T19:07:00Z

Yes, those two lines also resolve the issue.

glyph · 2015-08-28T04:20:33Z

OK, so here's a hypothesis:

This is a problem because sub-interpreters have an entirely different Python namespace (different modules, etc) but the same C namespace (same GIL, same C-level global variables, shared libraries, etc). I'm not sure exactly why initializing the engine a second time causes a problem, but it's C-level runtime initialization stuff, so ¯_(ツ)_/¯. I'm sure that cryptography being surprised to discover an already-initialized OpenSSL causes all kinds of exciting issues.

@reaperhulk - you should like this one since it would also explain some of the other concurrency issues you're seeing at import time.

(The main hypothesis I guess I'm describing here is that subinterpreters are just a completely broken concept nobody should use, but maybe cryptography could be made to work around these issues.)

reaperhulk · 2015-08-28T13:54:25Z

If we work on that assumption we can make engine addition idempotent at the C layer (see #2293) but since the random engine has callbacks in Python (that potentially point to functions not in the "current" sub-interpreter) it would still fail in the sub-interpreter...correct?

glyph · 2015-08-29T23:36:17Z

There's a pretty good explanation here - http://emptysqua.re/blog/python-c-extensions-and-mod-wsgi/#but-c-extensions-are-not-isolated - of how C extensions are treated by mod_wsgi.

glyph · 2015-08-29T23:37:22Z

I am pretty sure that you can call between different sub-interpreters just fine, since they share a GIL.

glyph · 2015-08-30T02:24:00Z

I flied a cffi bug to ask for a way to keep the GIL while calling C: https://bitbucket.org/cffi/cffi/issues/220/some-way-to-annotate-a-call-to-ask-it-not

glyph · 2015-08-30T08:03:52Z

One perhaps-obvious solution here would be to move all this library initialization to the module scope so that it's protected by the import lock, rather than trying to protect it with an ad-hoc lock. It seems counter to the way that cryptography is constructed, to avoid shared mutable state, but in this case there is just the illusion of such avoidance: the mutable state does in fact exist and attempting to abstract it away is hard in a multi-interpreter environment.

public · 2015-08-30T09:51:21Z

Sub interpreters could definitely trigger a race in the engine setup code. #2293 proposes making this process idempotent which seems like a fairly reasonable solution to me.

reaperhulk · 2015-08-30T19:40:02Z

@glyph I'm not opposed to that solution if it resolves our problem, but does it given the change in behavior in python 3.4?

glyph · 2015-08-30T22:15:43Z

Possibly not; I can't figure out at a glance how the per-module locks are initialized. @cyli suggests that cryptography could just take out a filesystem lock in /tmp ;-)

reaperhulk · 2015-09-08T15:05:47Z

Okay, reviving this conversation. Proposal for the sake of discussion: To avoid issues with sub-interpreters we should do the following...

Move os random engine back to C
Make os random engine registration idempotent (we can hold a lock at the C layer which will prevent races with the sub-interpreters)
Move the locking callbacks to C (since they are presumably subject to the same issues)
Make locking callback registration idempotent (this part might be tricky if we want to detect locking callbacks already being registered by things like import _ssl)

Do people agree that this would resolve the issue? If not, why not? And if so, is there a way to do this without doing it all in C because obviously we'd prefer to not do that...

alex · 2015-09-08T15:07:01Z

FWIW, I'm strictly opposed to moving this code back itno C.

On Tue, Sep 8, 2015 at 11:05 AM, Paul Kehrer [email protected]
wrote:

Okay, reviving this conversation. Proposal for the sake of discussion: To
avoid issues with sub-interpreters we should do the following:

Move os random engine back to C

Make os random engine registration idempotent (we can hold a lock at
the C layer which will prevent races with the sub-interpreters)

Move the locking callbacks to C (since they are presumably subject
to the same issues)

Make locking callback registration idempotent (this part might be
tricky if we want to detect locking callbacks already being registered by
things like import _ssl)

Do people agree that this would resolve the issue? If not, why not? And if
so, is there a way to do this without doing it all in C because obviously
we'd prefer to not do that...

—
Reply to this email directly or view it on GitHub
#2299 (comment).

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

glyph · 2015-09-08T19:41:15Z

FWIW, I'm strictly opposed to moving this code back itno C.

Just because having more code in C is bad, or...?

alex · 2015-09-08T19:57:51Z

Pretty much :-)

C is bad, and going back to our C code is a regression in functionality and
security.

On Tue, Sep 8, 2015 at 3:41 PM, Glyph [email protected] wrote:

FWIW, I'm strictly opposed to moving this code back itno C.

Just because having more code in C is bad, or...?

—
Reply to this email directly or view it on GitHub
#2299 (comment).

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

simo5 · 2015-09-08T20:29:03Z

Alex what regression in security do you see ?
Is that specific to random or does it include locking ?
Isn't handling low level locking in Python backwards ?

glyph · 2015-09-08T20:43:53Z

In order to properly initialize the random engine and locking callbacks, Cryptography needs to hold a lock so that these initializations are mutually exclusive between interpreters. I'm going to ignore the potential for concurrency among backends (import _ssl, M2Crypto, etc) because I am trying to hold on to a few shreds of sanity.

Here are some of the strategies I've discussed on IRC or in discussions with Cryptography maintainers:

Use the GIL

CFFI presently offers no capability of holding the GIL; we would really like at least a temporary solution before that is likely to land. Not to mention that every new-CFFI-version requirement means a new pypy requirement, practically speaking, which is pretty aggressive.

Plus, this only works on the C side of things. In Python, every bytecode executed might release the GIL, so it's no help there.

Allocate our own locks in Cryptography

This is what Cryptography is trying to do now. The problem with acquiring such a lock is that there is nowhere to share such a lock. And a lock you can't share isn't exactly a lock. Any lock allocated in Python code is going to be allocated in multiple interpreters.

If there were an API in C for allocating locks, then we could use that on the C side and then acquire that from the Python side. But there is no such API in C; POSIX locks and Windows locks are fairly different. We could of course use Python's API for this, but starting to rely on cpyext at this point in cryptography's history would seem like even more of a backslide than going back to C code.

Re-use the global import lock

As I mentioned above, Python has a global import lock. What I didn't mention above (although I did on another ticket) is that there is an API, imp.acquire_lock and imp.release_lock to acquire and release that lock.

The API is deprecated externally, but the very definitely not deprecated importlib makes use of the same lock during its bootstrap initialization. So it should not be going away any time soon; it needs to be available in some form.

Conclusion

I think that the best way forward would be to use the import lock in the near term, and in the long term work with CFFI to have logic specifically dedicated to dealing with sub-interpreter concurrency by exposing some form of portable lock initialization, and then using that. Or maybe pursuade CPython that exposing the import lock is necessary for the long term. Or possibly even allowing Python code to run in initmodule somehow. For now, as a bonus, the import lock will probably protect loading C modules as well, since initmodule is called with the GIL held and I doubt many extension modules release it; I think this means we should be OK with _ssl for the time being, for example.

glyph · 2015-09-08T20:45:04Z

Isn't handling low level locking in Python backwards ?

Python provides APIs for portable locking; C doesn't. So it seems like a reasonable place to handle it, if we can manage it.

public · 2015-09-09T14:41:28Z

Using the import lock like that sounds like a reasonable safeguard to me, at least while it remains available in implementations we support.

It's not entirely obvious to me that the current approach to idempotency is in fact invalid but I guess that doesn't really matter if we have a real mutex we can use.

On a slightly different note, this discussion made me look at

cryptography/src/cryptography/hazmat/bindings/openssl/binding.py

Line 158 in b5bb49d

def _lock_cb(cls, mode, n, file, line):

and I suspect it might also be quite wrong in the presence of sub interpreters as each interpreter will get a totally isolated set of locks to use.

glyph · 2015-09-09T18:27:53Z

On a slightly different note, this discussion made me look at

cryptography/src/cryptography/hazmat/bindings/openssl/binding.py

Line 158 in b5bb49d

def _lock_cb(cls, mode, n, file, line):

and I suspect it might also be quite wrong in the presence of sub interpreters as each interpreter will get a totally isolated set of locks to use.

My reading of that code is that one subinterpreter will "win" and initialize the locks with its lock object, and then all other subinterpreters will get a reference to that via the pointers in openssl itself. So I think it's still correct, inasmuch as any code can be "correct" in the face of this mess…

public · 2015-09-10T09:10:00Z

@glyph I think you are right about how the callbacks works means that whichever one wins the race will be the one that the callbacks are executed in. Unfortunately I think there's still a problem since sub interpreters have independent lifetimes. If the one that won the race exits the other interpreter will no longer have a valid locking callback to use.

This same problem applies to osrandom too. It's not really very obvious to me that merely getting initial allocation of the callbacks safe solves the sub interpreter problem.

glyph · 2015-09-10T18:35:51Z

@public I don't think that tearing down a subinterpreter will automatically destroy all objects within that subinterpreter. It might do something like clearing module dictionaries, but it relies upon GC to free objects. I suppose someone could write a proof of concept C program 😩

ncoghlan · 2015-09-11T04:14:53Z

As @glyph notes, we actually still need the global import lock to protect the management of the per-module locks, so using that as at least a temporary workaround is reasonable.

For Python 3.5+, cffi would ideally support using multi-phase initialization which has been designed to improve extension module subinterpreter support: https://docs.python.org/3.5/c-api/module.html#multi-phase-initialization

That unfortunately won't be immediately helpful in cryptography's case, since the problem you have is one that is still an open question for multi-phase init: making it easy to handle external singleton libraries like OpenSSL in a subinterpreter compatible way (or at least in a way that fails noisily rather than introducing subtle data sharing problems - Cython opts for the latter approach by explicitly not supporting subinterpreters at all).

Further work on this is planned for 3.6, so if anyone from cryptography and/or cffi has time to participate, we'd love to hear from you on import-sig: https://mail.python.org/mailman/listinfo/import-sig

ericsnowcurrently · 2015-09-11T05:08:03Z

The "global" import lock is actually a C global:

https://hg.python.org/cpython/file/tip/Python/import.c#l157

...but that may change. While the motivation for the lock is thread races in the same interpreter, C extensions benefit (perhaps incidentally) in the case of subinterpreters. I'm glad this discussion came up because I expect at some point de-globalizing the lock between interpreters may happen as we try to further isolate interpreters in the same process. It's clear that the C extension case will have to be satisfied safely. That may not be the 3.6 timeframe, but I could see it happening by 3.7 (as we make headway on isolation efforts).

Also...

ericsnowcurrently · 2015-09-11T05:08:07Z

...the C extension support in subinterpreters to which Nick refers should help. Currently C extension modules can define C-level state (instead of using C globals); see https://www.python.org/dev/peps/pep-3121/. Perhaps it would help to support some form of process-global per-module state.

reaperhulk · 2015-09-27T04:02:09Z

@Naddiseo Would it be possible to test the change in #2336 in your environment?

Naddiseo · 2015-09-27T06:04:17Z

@reaperhulk, I'll check on Monday when I get to work.

Naddiseo · 2015-09-28T16:29:15Z

@reaperhulk, it appears #2336 does not fix my issue. I'm also having trouble reducing the project to a minimal test case: I can reproduce the bug on the full project, but not on a single file which imports cryptography.

reaperhulk · 2015-09-28T16:32:07Z

@Naddiseo d'oh :( Is the project something where you could build a docker container we could use to test potential solutions or is it something proprietary?

Subinterpreters are very frustrating...

Naddiseo · 2015-09-28T16:50:31Z

It is proprietary, so anything other than a minimal test case is a no-go. I'll continue to try and make a test case in my spare cycles. In the mean time, if you do have other changes you'd like to test, you can point me to a branch/commit and I'll give it a test.

reaperhulk · 2015-09-29T13:27:41Z

@Naddiseo if you get a chance could you test #2293? I made some changes to that approach.

toabctl · 2015-09-29T13:50:00Z

@Naddiseo if you get a chance could you test #2293? I made some changes to that approach.

@reaperhulk I have a similiar problem (running on py27, cryptography 1.0 in a apache2 wsgi env and getting RuntimeError: osrandom engine already registered) and the patch you proposed in #2293 fixes this problem.

reaperhulk · 2015-09-29T15:22:11Z

@toabctl that's great -- any chance you can provide a docker container that replicates the bug? We haven't been willing to land #2293 thus far because it relies on locking callbacks that are registered in Python. I think #2293 is really just narrowing the window for the race condition and not fixing the underlying issue, but I'd like to experiment with an environment that triggers the initial bug.

toabctl · 2015-09-29T16:49:32Z

any chance you can provide a docker container that replicates the bug?

No. Sorry for that. I can paste the wsgi file and the apache config if that helps.

Naddiseo · 2015-09-30T15:39:23Z

@reaperhulk, no joy with #2293

jobec · 2016-02-11T12:23:08Z

I noticed a similar issue with to combination of apache, mod_wsgi and the Requests module.

When doing a request to a HTTPS site with Requests, the function call never returns.
The backend.activate_builtin_random() line of code mentioned in reply 4 also did the trick.

I eventually figured it out because I saw the following error in my apache error log:
extern "Python": function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern(). Returning 0.

For the code that was causing the issue, see: snok/django-auth-adfs@28936fb

--- UPDATE 2016/11/29 ---
I ran into this bug again (v1.4 of cryptography).
It appears this backend.activate_builtin_random() MUST be called before doing any HTTPS request with the requests module. If it's not called before it, it will hang the wsgi process.
First with an error like: Timeout when reading response headers from daemon process and after a while (the moment the daemon process gets cleaned up) with function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern()

…ause of a bug in the cryptography module. ref pyca/cryptography/#2299

reaperhulk · 2016-03-16T00:11:34Z

I'm going to close this as our current solution for mod_wsgi is the WSGIApplicationGroup %{GLOBAL} fix and we've recently documented it in our FAQ.

Hopefully at some point in the future this won't be required, but for now it's the best we can do.

glyph · 2016-03-16T05:43:54Z

Should the FAQ also mention #2537 since that is the (eventual) path forward on this issue? Or am I confused?

jobec · 2016-03-16T10:19:29Z

Are there any downsides in using backend.activate_builtin_random() ? Or is it also a safe workaround?

reaperhulk · 2016-03-16T12:41:58Z

@glyph until we merge it I'm going to be somewhat quiet about it (other than to perhaps point people that direction).

@jobec The random initialization issue appears to be mostly handled by some code we put in the 1.2 release, so it shouldn't be necessary to separately call activate_builtin_random (although doing so shouldn't cause problems). We've been receiving sporadic reports of other issues though, so that's why we've stated that the official way to handle this is the ApplicationGroup directive.

chmarr · 2016-09-09T23:54:57Z

Adding a note here as I ran into this problem and the documented workaround - setting WSGIProgressGroup %{global} - didn't work for me. I hope someone else will find this useful.

From what I can glean, setting %{GLOBAL} doesn't assign the reference (e.g. WSGISCriptAlias) to the first sub-interpreter, but simply to the first process group. Sub-interpreters are created purely by the number of different wsgi applications that are in that process group, with "different application" is roughly equivalent to a different wsgi.py file. So, assuming that diagnosis true, the documented solution doesn't work at all and is in fact completely wrong.

I added some code to the wsgi.py file to syslog out id(os) and os.getpid(), as a way of detecting additional sub-interpreters being generated. Using id(builtins) didn't seem to be useful.

The fix was to ensure each application runs in a different process group, I.e., a different WSGIDaemonProcess for every different wsgi.py file you have. When I did this, this problem, along with the 'isinstance' problem the WSGI documentation referred to, completely went away.

(As an aside, I only have a single wsgi.py file, but I'm referring to them from two different VirtualServers. I think that whatever code detects references to the same wsgi.py file breaks across the virtualserver boundary. I have multiple references to the wsgi.py file inside one of the VirtualServers, and that doesn't appear to cause additional sub-interpreters.)

I hope this helps!

jobec · 2016-11-29T13:35:39Z

I ran into this bug again (v1.4 of cryptography).
It appears backend.activate_builtin_random() MUST be called before doing any HTTPS request with the requests module. If it's not called before it, it will hang the wsgi process.

First with an error like: Timeout when reading response headers from daemon process and after a while (the moment the daemon process gets cleaned up) with function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern()

reaperhulk · 2016-11-29T13:59:56Z

This issue will be resolved with the next cryptography release. See #3229

This was referenced Sep 12, 2015

osrandom engine already registered #2287

Closed

idempotent engine add #2293

Merged

acquire a global lock to init the OpenSSL singleton locks #2336

Closed

reaperhulk mentioned this issue Nov 9, 2015

osrandom engine already registered #2473

Closed

jobec referenced this issue in snok/django-auth-adfs Feb 11, 2016

Fix for an issue where a call to the Requests.post never returned bec…

28936fb

…ause of a bug in the cryptography module. ref pyca/cryptography/#2299

reaperhulk closed this as completed Mar 16, 2016

porridge mentioned this issue Aug 29, 2016

Regression: M2Crypto.m2.rand_bytes() hangs on Apache+mod_wsgi with pyOpenSSL 16.1.10 pyca/pyopenssl#520

Closed

sanderjo mentioned this issue Mar 5, 2017

SAB 2.0.0 Alpha on Debian 9.0 on ARM: "SSLError: [SSL: CA_MD_TOO_WEAK] ca md too weak (_ssl.c:2699)" sabnzbd/sabnzbd#845

Closed

ericsnowcurrently mentioned this issue Aug 31, 2018

C extension init races under subinterpreter ericsnowcurrently/multi-core-python#39

Open

github-actions bot locked as resolved and limited conversation to collaborators Aug 13, 2020

Regression: Cannot use cyrptography in embedded wsgi #2299

Regression: Cannot use cyrptography in embedded wsgi #2299

Comments

Naddiseo commented Aug 27, 2015

reaperhulk commented Aug 27, 2015

Naddiseo commented Aug 27, 2015

reaperhulk commented Aug 27, 2015

Naddiseo commented Aug 27, 2015

glyph commented Aug 28, 2015

reaperhulk commented Aug 28, 2015

glyph commented Aug 29, 2015

glyph commented Aug 29, 2015

glyph commented Aug 30, 2015

glyph commented Aug 30, 2015

public commented Aug 30, 2015

reaperhulk commented Aug 30, 2015

glyph commented Aug 30, 2015

reaperhulk commented Sep 8, 2015

alex commented Sep 8, 2015

glyph commented Sep 8, 2015

alex commented Sep 8, 2015

simo5 commented Sep 8, 2015

glyph commented Sep 8, 2015

Use the GIL

Allocate our own locks in Cryptography

Re-use the global import lock

Conclusion

glyph commented Sep 8, 2015

public commented Sep 9, 2015

glyph commented Sep 9, 2015

public commented Sep 10, 2015

glyph commented Sep 10, 2015

ncoghlan commented Sep 11, 2015

ericsnowcurrently commented Sep 11, 2015

ericsnowcurrently commented Sep 11, 2015

reaperhulk commented Sep 27, 2015

Naddiseo commented Sep 27, 2015

Naddiseo commented Sep 28, 2015

reaperhulk commented Sep 28, 2015

Naddiseo commented Sep 28, 2015

reaperhulk commented Sep 29, 2015

toabctl commented Sep 29, 2015

reaperhulk commented Sep 29, 2015

toabctl commented Sep 29, 2015

Naddiseo commented Sep 30, 2015

jobec commented Feb 11, 2016 • edited Loading

reaperhulk commented Mar 16, 2016

glyph commented Mar 16, 2016

jobec commented Mar 16, 2016

reaperhulk commented Mar 16, 2016

chmarr commented Sep 9, 2016

jobec commented Nov 29, 2016

reaperhulk commented Nov 29, 2016

jobec commented Feb 11, 2016 •

edited

Loading