Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python][S3] Segfault when using S3FileSystem in uwsgi #44071

Closed
lucasmo opened this issue Sep 11, 2024 · 7 comments
Closed

[Python][S3] Segfault when using S3FileSystem in uwsgi #44071

lucasmo opened this issue Sep 11, 2024 · 7 comments
Assignees
Labels
Component: Python Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug
Milestone

Comments

@lucasmo
Copy link

lucasmo commented Sep 11, 2024

Describe the bug, including details regarding any error messages, version, and platform.

Simply instantiating S3FileSystem within a uwsgi container causes a segfault on process exit.

Backtrace

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007f5a4669826e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007f5a4667b8ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007f5a4667c7b6 in __libc_message_impl (fmt=fmt@entry=0x7f5a468218d7 "%s\n") at ../sysdeps/posix/libc_fatal.c:132
#6  0x00007f5a466fbfe5 in malloc_printerr (str=str@entry=0x7f5a468248d8 "malloc_consolidate(): invalid chunk size") at ./malloc/malloc.c:5772
#7  0x00007f5a466fcd58 in malloc_consolidate (av=av@entry=0x7f5a46856ac0 <main_arena>) at ./malloc/malloc.c:4851
#8  0x00007f5a466fea80 in _int_malloc (av=av@entry=0x7f5a46856ac0 <main_arena>, bytes=bytes@entry=1808) at ./malloc/malloc.c:4041
#9  0x00007f5a467006e4 in __GI___libc_malloc (bytes=bytes@entry=1808) at ./malloc/malloc.c:3336
#10 0x00007f5a4678906b in __backtrace_symbols (array=0x55f7e1600c20, size=<optimized out>) at ./debug/backtracesyms.c:68
#11 0x000055f7dfd72b61 in uwsgi_backtrace ()
#12 0x000055f7dfd72f87 in uwsgi_segfault ()
#13 <signal handler called>
#14 0x00007f5a40a59d27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use_cold() () from /home/ubuntu/.cache/pypoetry/virtualenvs/uwsgiboom-qBOj13LC-py3.11/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#15 0x00007f5a418b1d15 in std::unique_ptr<arrow::fs::(anonymous namespace)::AwsInstance, std::default_delete<arrow::fs::(anonymous namespace)::AwsInstance> >::~unique_ptr() ()
   from /home/ubuntu/.cache/pypoetry/virtualenvs/uwsgiboom-qBOj13LC-py3.11/lib/python3.11/site-packages/pyarrow/libarrow.so.1400
#16 0x00007f5a4669aa66 in __run_exit_handlers (status=30, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:108
#17 0x00007f5a4669abae in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:138
#18 0x000055f7dfd252c5 in uwsgi_exit ()
#19 0x000055f7dfd6f82b in end_me ()
#20 <signal handler called>
#21 0x00007f5a4677d042 in epoll_wait (epfd=7, events=0x7ffc1469aa0c, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#22 0x000055f7dfd65428 in event_queue_wait ()
#23 0x000055f7dfd22a0e in wsgi_req_accept ()
#24 0x000055f7dfd6e6ae in simple_loop_run ()
#25 0x000055f7dfd6e79b in simple_loop ()
#26 0x000055f7dfd732d9 in uwsgi_ignition ()
#27 0x000055f7dfd779ab in uwsgi_worker_run ()
#28 0x000055f7dfd77f2a in uwsgi_run ()
#29 0x000055f7dfd21e64 in main ()

Reproducing

I tried this with pyarrow 17.0.0 and 16.1.0. I tried with the minimum versions of each package that didn't have critical CVEs and worked with python 3.11 or 3.12.

In a virtualenv, install uwsgi==2.0.23 and pyarrow==17.0.0.

Create the following program called app.py:

from pyarrow.fs import S3FileSystem

def application(env, start_response):
    start_response('200 OK', [('Content-Type','text/html')])
    fs = S3FileSystem()
    return [b"Hello World"]

Run uwsgi --http :9090 --wsgi-file app.py
In another shell, run curl http://localhost:9090/
Press ^C to terminate the uwsgi container (this also happens after the container serves several requests, or if the container sits idle for 15 minutes).

See:

/arrow/cpp/src/arrow/filesystem/s3fs.cc:2913: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
!!! uWSGI process 36031 got Segmentation Fault !!!
malloc_consolidate(): invalid chunk size
Aborted (core dumped)

Component(s)

Python

@pitrou
Copy link
Member

pitrou commented Sep 12, 2024

We have an atexit handler that finalizes S3 at process shutdown:

# GH-38364: we don't initialize S3 eagerly as that could lead
# to crashes at shutdown even when S3 isn't used.
# Instead, S3 is initialized lazily using `ensure_s3_initialized`
# in assorted places.
import atexit
atexit.register(ensure_s3_finalized)

Does uwsgi call atexit handlers at shutdown? @methane

@lucasmo
Copy link
Author

lucasmo commented Sep 12, 2024

It does look like that is called, from the core dump:

__run_exit_handlers (...) at ./stdlib/exit.c:108

This is the line before the pointer free happens. Is it possible the finalize is called twice somehow, resulting in a double free?

@pitrou
Copy link
Member

pitrou commented Sep 12, 2024

It does look like that is called, from the core dump:

No, this is the libc exit, and it is often too late at this point to finalize S3. Python atexit handlers are called before, in Py_Finalize.

@pitrou
Copy link
Member

pitrou commented Sep 12, 2024

Ok, I think I understand what happens. uwsgi finalizes its plugins thanks to a C-level atexit hook:
https://github.com/unbit/uwsgi/blob/b9618b619a73f2fc9680094eba7c202f5422fa42/core/uwsgi.c#L3202

The atexit handler for the Python handler then does call Py_Finalize at the end:
https://github.com/unbit/uwsgi/blob/b9618b619a73f2fc9680094eba7c202f5422fa42/plugins/python/python_plugin.c#L388-L446

So we have this:

  1. uwsgi receives a Ctrl-C (or decides to shutdown a worker for another reason)
  2. for this, it calls the libc's exit function
  3. the libc's exit function shuts down the process
  4. at some point the process shutdown triggers C-level atexit handlers
  5. this calls back into uwsgi, especially uwsgi_plugins_atexit
  6. uwsgi_plugins_atexit calls the Python plugin's uwsgi_python_atexit
  7. uwsgi_python_atexit calls Py_Finalize which calls the Python atexit handlers
  8. our atexit handler is then executed to finalize S3:
    import atexit
    atexit.register(ensure_s3_finalized)

However, at this point, it's already too late, because calling the libc's exit function has already deallocated critical resources in the AWS SDK. S3 finalization fails and crashes the process.

@pitrou
Copy link
Member

pitrou commented Sep 12, 2024

So, to solve this, we'd like to find a way for uwsgi to call Py_Finalize before calling the C exit function.
I'm not sure that is possible @xrmx @methane @niol ?

pitrou added a commit to pitrou/arrow that referenced this issue Sep 12, 2024
pitrou added a commit to pitrou/arrow that referenced this issue Sep 13, 2024
pitrou added a commit to pitrou/arrow that referenced this issue Sep 16, 2024
pitrou added a commit to pitrou/arrow that referenced this issue Sep 16, 2024
pitrou added a commit to pitrou/arrow that referenced this issue Sep 17, 2024
@kou kou changed the title Segfault when using S3FileSystem in uwsgi [Python][S3] Segfault when using S3FileSystem in uwsgi Sep 18, 2024
pitrou added a commit to pitrou/arrow that referenced this issue Sep 19, 2024
pitrou added a commit that referenced this issue Sep 23, 2024
…44090)

### Rationale for this change

Leaking S3 structures at shutdown can be better than inducing a segfault because those structures' destructors run too late at process exit.

This seems to avoid the crash when run under `uwsgi` in #44071

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Hopefully not.

* GitHub Issue: #44071

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
@pitrou pitrou added this to the 18.0.0 milestone Sep 23, 2024
@pitrou
Copy link
Member

pitrou commented Sep 23, 2024

Issue resolved by pull request 44090
#44090

@pitrou pitrou closed this as completed Sep 23, 2024
@pitrou
Copy link
Member

pitrou commented Sep 23, 2024

@lucasmo Ok, I pushed a workaround for it that will be available in 18.0.0. However, do note that this not totally eliminate the risk of crashes at shutdown, due to how uwsgi implements interpreter finalization (see messages above). Unfortunately, uwsgi seems maintenance-only and it's not obvious there will be any change on that front.

@amoeba amoeba added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Python Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug
Projects
None yet
Development

No branches or pull requests

3 participants