Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear OpenSSL error queue after SSL_shutdown #11736

Closed
wants to merge 1 commit into from

Conversation

jhawthorn
Copy link
Contributor

In production we've observed from other libraries errors left in the OpenSSL error queue (specifically, "shutdown while in init"). By adding some logging to the errors being pushed it revealed that the source was this file.

Since we call SSL_read and SSL_shutdown here, but don't check the return code for an error, we should clear the OpenSSL error queue in case one was raised.

This didn't affect curl because we call ERR_clear_error before every write operation (a0dd9df), but when libcurl is used in a process with other OpenSSL users, they may detect an OpenSSL error pushed by libcurl's SSL_shutdown as if it was their own.

Our use of libcurl is through ethon, a Ruby wrapper, so I'm not 100% sure the use of libcurl is correct, but this does seem like a place we must clear errors, since we don't handle them.

@github-actions github-actions bot added the TLS label Aug 25, 2023
@bagder
Copy link
Member

bagder commented Aug 25, 2023

seem like a place we must clear errors, since we don't handle them.

It sounds reasonable but it is unfortunate that the docs for SSL_read() and SSL_shutdown() unsurprisingly mention nothing of this.

If I want to reproduce the problem to also verify the fix, do you have any advice?

@pudiva
Copy link

pudiva commented Aug 25, 2023

Hiiii @bagder! 👋

It sounds reasonable but it is unfortunate that the docs for SSL_read() and SSL_shutdown() unsurprisingly mention nothing of this.

They seem to say something about it on the manpage (or am I mistaken?):

Note that SSL_shutdown() must not be called if a previous fatal error has occurred on a connection i.e. if SSL_get_error() has returned SSL_ERROR_SYSCALL or SSL_ERROR_SSL.

Source: https://www.openssl.org/docs/man1.1.1/man3/SSL_shutdown.html

If I want to reproduce the problem to also verify the fix, do you have any advice?

We observed the problem when libcurl executed and then we got the error seemingly raised in our code. We managed to find out that it was raised by libcurl by patching OpenSSL to include a stack trace whenever it raised an error:

If you want to reproduce it, you may try adding some errors to the OpenSSL queue and then call SSL_shutdown().

✨ ✨ ✨

@bagder
Copy link
Member

bagder commented Aug 25, 2023

They seem to say something about it on the manpage (or am I mistaken?):

That talks about when not to call SSL_shutdown, it says nothing about adding data to or leaving lingering entries in the error queue. Are you saying libcurl might be calling SSL_shutdown() in a situation where it shouldn't and that is why it adds this crap the the error queue?

I'm only asking to educate myself. I can't imagine that it can be bad to clear the error queue there and since you have done this research I believe this helps.

@jhawthorn
Copy link
Contributor Author

Just found a reproduction using curl's test suite, thanks for prompting, I'll document it below.

As for docs (which agreed could be better), from SSL_shutdown's RETURN VALUES section:

<0

The shutdown was not successful. Call SSL_get_error(3) with the return value ret to find out the reason. It can occur if an action is needed to continue the operation for nonblocking BIOs.

It can also occur when not all data was read using SSL_read().

Following that through to SSL_get_error:

SSL_ERROR_SSL

A non-recoverable, fatal error in the SSL library occurred, usually a protocol error. The OpenSSL error queue contains more information on the error.

So anything using OpenSSL's API, which may cause an error, may add things to the thread-local error queue. It's definitely an unfortunate API, but it's how almost all OpenSSL methods work. I suppose there's some ambiguity over whether the queue should be cleared before any operation (which was added to curl in a0dd9df, but not all libraries do this) or after (which generally libraries do just by virtue of handling errors, I think it should be a requirement).


Here's a reproduction:
I made a branch where we call exit(1) and do some printing if anything is left in the queue jhawthorn@9b32abc (this is just to test, maybe we could put an assertion in the test file or something to commit?)

With that if we run test 405 we can see the "shutdown while in init" error:

$ ./runtests.pl -n -k 405
********* System characteristics ********
* curl 8.3.0-DEV (x86_64-pc-linux-gnu)
* libcurl/8.3.0-DEV OpenSSL/3.1.2 zlib/1.3 brotli/1.0.9 zstd/1.5.5 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.4) nghttp2/1.55.1 librtmp/2.3 OpenLDAP/2.6.6
* Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB PSL SSL threadsafe TLS-SRP UnixSockets zstd
* Disabled:
* Host: zergling
* System: Linux zergling 6.4.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 01 Jul 2023 16:17:21 +0000 x86_64 GNU/Linux
* OS: linux
* Env:
* Seed: 239600
* Servers: HTTP-IPv6 HTTP-unix FTP-IPv6
*****************************************
test 0405...[FTPS operation to FTP port]

curl returned 1, when expecting 35,28
 405: exit FAILED

 - abort tests
TESTDONE: 1 tests were considered during 1 seconds.
TESTDONE: 0 tests out of 1 reported OK: 0%

TESTFAIL: These test cases failed: 405

$ cat log/stderr405
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
SSL error in queue
4099245E6E7F0000:error:0A000197:SSL routines:SSL_shutdown:shutdown while in init:ssl/ssl_lib.c:2260:

So this probably indicates that SSL_shutdown is called where it shouldn't be (likely as @pudiva shared, because it's not valid to call after the connection has had a previous error). Even if that were to be fixed, clearing the error queue should be done as it could still report a network or error which comes from legitimate usage.

@jay
Copy link
Member

jay commented Aug 26, 2023

Clearing error states is not our responsibility. For example we don't guarantee errno is 0 after libcurl returns. Anyhow I think your request is fine. It should be changed to a one-liner, the comments are unnecessary.

We've seen errors left in the OpenSSL error queue (specifically,
"shutdown while in init") by adding some logging it revealed that the
source was this file.

Since we call SSL_read and SSL_shutdown here, but don't check the return
code for an error, we should clear the OpenSSL error queue in case one
was raised.

This didn't affect curl because we call ERR_clear_error before every
write operation (a0dd9df), but when
libcurl is used in a process with other OpenSSL users, they may detect
an OpenSSL error pushed by libcurl's SSL_shutdown as if it was their
own.

Co-authored-by: Satana de Sant'Ana <[email protected]>
@jhawthorn
Copy link
Contributor Author

jhawthorn commented Aug 26, 2023

Clearing error states is not our responsibility. For example we don't guarantee errno is 0 after libcurl returns

I'd argue if we want to coexist with other .so linked against OpenSSL it is our responsibility. errno is quite different: it only stores one value, which is clobbered by future errors, and API functions which set errno are able to clearly indicate via their return value whether or no errno has been set.

OpenSSL unfortunately does not work this way: you must check SSL_get_error on the return code to determine if an error was raised and that return is based on whether or not there is an existing value in the thread-local queue. This is most obvious in the following example: it's fine to call write with an existing value in errno, but if you call SSL_write with an error in the OpenSSL error queue, you won't be able to tell whether or not it has failed because the abandoned error in the queue affects SSL_get_error (if there's an error in the queue, we'll think there's an error on this write!). For every error pushed by OpenSSL, someone must handle it. Most libraries handle all of their own errors, but not spurious errors from other libraries. Some libraries (including libcurl) hack around other misbehaving libraries by clearing errors before performing SSL_write and similar (which is probably why issue wasn't previously noticed in libcurl), but most libraries don't do this and shouldn't be expected to.

Again, no argument from me that that this is in any way a good API, but if you want to use OpenSSL in a shared environment, clearing the error queue is necessary.

Anyhow I think your request is fine. It should be changed to a one-liner, the comments are unnecessary.

Great 🙇‍♂️! I've force-pushed with the comment removed.

@jay
Copy link
Member

jay commented Aug 26, 2023

Some libraries (including libcurl) hack around other misbehaving libraries by clearing errors before performing SSL_write and similar (which is probably why issue wasn't previously noticed in libcurl), but most libraries don't do this and shouldn't be expected to.

I disagree. That's not a hack. See here:

SSL_get_error() must be used in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably.

The only way to ensure that is clear the queue before the call.

@bagder
Copy link
Member

bagder commented Aug 26, 2023

I agree with @jay that, exactly as this PR proves, it seems overly fragile for OpenSSL users to rely on all other users do clear the queue. The only reasonable action is for everyone to make sure it is cleared for their own use. Like the errno comparison.

I will still merge this because it seems clears things up and does not hurt.

@bagder
Copy link
Member

bagder commented Aug 26, 2023

Thanks!

@bagder bagder closed this in 6d44625 Aug 26, 2023
ptitSeb pushed a commit to wasix-org/curl that referenced this pull request Sep 25, 2023
We've seen errors left in the OpenSSL error queue (specifically,
"shutdown while in init") by adding some logging it revealed that the
source was this file.

Since we call SSL_read and SSL_shutdown here, but don't check the return
code for an error, we should clear the OpenSSL error queue in case one
was raised.

This didn't affect curl because we call ERR_clear_error before every
write operation (a0dd9df), but when
libcurl is used in a process with other OpenSSL users, they may detect
an OpenSSL error pushed by libcurl's SSL_shutdown as if it was their
own.

Co-authored-by: Satana de Sant'Ana <[email protected]>

Closes curl#11736
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

4 participants