Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully exiting a Python process running Bjoern #91

Open
saley89 opened this issue May 31, 2016 · 37 comments · May be fixed by #236
Open

Gracefully exiting a Python process running Bjoern #91

saley89 opened this issue May 31, 2016 · 37 comments · May be fixed by #236

Comments

@saley89
Copy link

saley89 commented May 31, 2016

Hi,

I have a simple Python script wrapping a Bjoern web server around a WSGI app. In essence, it looks like this:

def signal_term_handler(signal, frame):
    logging.info('got SIGTERM')
    sys.exit(0)

def main():
    signal.signal(signal.SIGTERM, signal_term_handler)
    logging.info(" Starting Bjoern with WSGI app at http://{}:{}/".format(HOST, PORT))
    bjoern.run(router, HOST, PORT)

The server works just fine. The problem is that when sending a SIGTERM signal (e.g. using kill -15), the signal handler doesn't get called and the process doesn't die. When sending SIGKILL, the process does die, without calling the signal handler (this is expected).

How can I intercept SIGTERM?

@jonashaag
Copy link
Owner

jonashaag commented May 31, 2016

Hm... the way Python implements signals makes it impossible to use them in this situation (C event-loop based server), because they are put into in an internal signal queue and the handlers are only called the next time the Python interpreter is executed, which is when a new request is being made to the server or an existing request is being processed.

I'm not sure how other event-loop based servers solve this problem, maybe by periodically invoking the Python interpreter.

Maybe you can do aside with signals altogether?

@jonashaag
Copy link
Owner

This seems to be possible to work around with periodically calling into https://docs.python.org/2/c-api/exceptions.html#c.PyErr_CheckSignals

@saley89
Copy link
Author

saley89 commented Jun 1, 2016

Thanks for the quick reply and confirming our suspicions that it may be related to the C side of the library. We will look into an alternative solution.

It would be great if you could build in the CheckSignals code you mentioned above if that would allow us to handle signals correctly. I can leave this open if you would like to look into it, otherwise I am happy to close the issue.

@vmandke
Copy link

vmandke commented Dec 16, 2016

Shouldn't SIGTERM be handled the same way SIGINT is ? For graceful shutdown ?

@jonashaag
Copy link
Owner

That's correct, if it doesn't work like that then that's a separate issue.

@vmandke
Copy link

vmandke commented Dec 16, 2016

@jonashaag SIGTERM is not handled like SIGINT as
https://github.com/jonashaag/bjoern/blob/master/bjoern/server.c#L76 only handles SIGINT.

A follow-up question:

https://github.com/jonashaag/bjoern/blob/master/bjoern/server.c#L91 places a SIGINT on the signal queue, would

PyErr_SetInterrupt(); PyErr_CheckSignals();
ensure that the signal is propagated as soon as it occurs ?

@jonashaag
Copy link
Owner

I don't know, maybe the other way round (first check signals, then set interrupt), but in either case we'd have to make sure we hold the GIL

@Auha
Copy link

Auha commented May 8, 2017

Are there any work arounds? I have an app that should not have any downtime.

@ninjatrench
Copy link

Maybe you can run this as a subprocess and kill it whenever required, I guess

@bulletmark
Copy link

To gracefully shut down services, systemd sends a SIGTERM followed 90 secs later by a SIGKILL if necessary (see here). Due to this issue there is no easy way to capture SIGTERM when using bjoern. I find that meinheld just returns from the main event loop on any interrupting signal so I don't need a signal handler at all. Perhaps that approach could be considered?

So I have will have to move back to meinheld at least temporarily although my benchmarks show that bjoern performs slightly faster.

@jonashaag
Copy link
Owner

Can you guys try the https://github.com/jonashaag/bjoern/tree/signal-watcher branch

@bulletmark
Copy link

Sorry, but I really don't want to create a signal handler (it is asynchronous and there are plenty of spin-off issues, e.g. when using multiprocess the children inherit the handler) and was trying to use one simply to catch systemd terminating my service. I suspect this is probably also the case for the OP @saley89 here.

Now that I have seen that meinheld simply returns from the run() event loop for the SIGTERM case it makes a lot of sense to me so I will stick with that. As I said, perhaps you could consider that approach also for bjoern for better systemd compatibility.

@jonashaag
Copy link
Owner

@bulletmark The patch in that branch should make this work. It simply checks every 100ms if there's a new signal in the Python signal queue and executes that Python signal handler.

@thefab
Copy link
Contributor

thefab commented May 7, 2019

The https://github.com/jonashaag/bjoern/tree/signal-watcher branch works perfectly for me with SIGTERM.

My use case is a on_sigterm python handler which do some "cleaning things" then send a SIGINT to the same process to force bjoern to exit.

@bulletmark
Copy link

I still think that simply returning from the event loop as I describe in my 2 posts above (and how meinheld does it) is a better way to go, and most compatible with systemd termination. @jonashaag, please reconsider that approach.

@jonashaag
Copy link
Owner

I just released 3.0.0 with the approach taken in the signal-watcher branch.

@bulletmark could you elaborate on what's better in the meinheld implementation for your use case?

@thefab
Copy link
Contributor

thefab commented May 7, 2019

@bulletmark @jonashaag a complete solution could be to add to the bjoern package a kind of bjoern_wrapper which could be used as a "bjoern cli" (launched by systemd, supervisor, circus or other process managers). Included features would be:

  • gracefull and automatic exit in case of SIGTERM
  • timeout management
  • CLI (argparse) to set host, port, unix_socket and (of course) wsgi application path (with import thanks to importlib)

I use something like that internally. With a little bit of cleaning, I could propose this as a PR if your are interested.

@jonashaag
Copy link
Owner

jonashaag commented May 7, 2019

I use something like that internally. With a little bit of cleaning, I could propose this as a PR if your are interested.

Definitely! Could also paste it here uncleaned so we can discuss without having you spend time on it

@thefab
Copy link
Contributor

thefab commented May 7, 2019

internal script hardly cleaned (but not completely), not even tested, just here to discuss

i think we need to add :

  • argparse options for "port" (I use only unix socket)
  • some "hook" features to be able to provide python callables (thanks to importlib) for some events (on_sigterm, before_start...) like in gunicorn
#!/usr/bin/env python3

import argparse
import sys
import bjoern
import importlib
import signal
import datetime
import time
import os
import threading
from mflog import get_logger
from mfutil import kill_process_and_children


LOGGER = get_logger("bjoern_wrapper")


def send_sigint(pid):
    LOGGER.info("sending SIGINT to %i", pid)
    os.kill(pid, signal.SIGINT)


def on_signal(signum, frame):
    if signum == 15:
        LOGGER.info("received SIGTERM => smart closing...")
        # [...]
        # Real stop
        send_sigint(os.getpid())


def get_wsgi_application(path):
    if len(path.split(':')) != 2:
        LOGGER.warning("main_arg must follow module.submodule:func_name")
        sys.exit(1)
    module_path, func_name = path.split(':')
    mod = importlib.import_module(module_path)
    try:
        return getattr(mod, func_name)
    except Exception:
        LOGGER.warning("can't find: %s func_name in module: %s" % (
            func_name, mod))
        sys.exit(1)


class TimeoutWsgiMiddleware(object):

    def __init__(self, app, timeout):
        self.app = app
        self.timeout = timeout
        self.started = None
        if self.timeout > 0:
            x = threading.Thread(target=self.timeout_handler, daemon=True)
            x.start()

    def timeout_handler(self):
        now = datetime.datetime.now
        while True:
            if self.started:
                if (now() - self.started).total_seconds() > self.timeout:
                    LOGGER.warning("Request Timeout => exit")
                    # Self-Kill
                    kill_process_and_children(os.getpid())
            time.sleep(1)

    def __call__(self, environ, start_response):
        if self.timeout > 0:
            self.started = datetime.datetime.now()
        try:
            # see http://blog.dscpl.com.au/2012/10/
            #     obligations-for-calling-close-on.html
            iterable = self.app(environ, start_response)
            for data in iterable:
                yield data
        finally:
            self.started = None
            if hasattr(iterable, 'close'):
                iterable.close()

def main():
    parser = argparse.ArgumentParser(description="bjoern wrapper")
    parser.add_argument("main_arg", help="wsgi application path")
    parser.add_argument("unix_socket", help="unix socket to listen path")
    parser.add_argument("--timeout", default=60, type=int,
                        help="one request execution timeout (in seconds)")
    args = parser.parse_args()
    signal.signal(signal.SIGTERM, on_signal)
    wsgi_app = get_wsgi_application(args.main_arg)
    try:
        os.unlink(args.unix_socket)
    except Exception:
        pass
    try:
        app = TimeoutWsgiMiddleware(wsgi_app, args.timeout)
        bjoern.run(app, 'unix:%s' % args.unix_socket)
    except KeyboardInterrupt:
        LOGGER.info("process shutdown")


if __name__ == '__main__':
    main()

@bulletmark
Copy link

bulletmark commented May 7, 2019

@jonashaag, what I am talking about is so simple it is embarrassing that I apparently have not explained it clearly enough in my first 2 posts here! :)

Normally code blocks essentially forever in run() and all I am suggesting is that run() should simply return if a SIGTERM is received. SIGTERM is the standard way systemd shuts down a program. No need for the user to write any signal handlers (with the associated complications that often entails as I mention above).

Meinheld does it this way and it works well.

@thefab
Copy link
Contributor

thefab commented May 8, 2019

@bulletmark I understand what you are saying ;-) But IMHO, today, bjoern is a kind of library and not a "full featured daemon". So, you can also want to get "full control" of the stop process (particularly if you provide a python socket(-like) object by yourself).

So we have two different use cases:

  • a "gunicorn like" use case (when you use bjoern as a regular daemon with SIGTERM standard features)
  • a "library" use case (where you want full control on signals)

I propose to add to existing bjoern a wrapper to have both use cases.

With the wrapper, systemd will launch your app with:

bjoern_cli --port=8080 --timeout=60 yourmod.yoursubmod:app

and you will get a regular SIGTERM feature (without anything to implement)

What do you think about this compromise ?

@bulletmark
Copy link

I have a python program which just calls bjoern's run(). Actually it does that via bottles run(..., server='bjoern') which is even more trivial as I can just change server='meinheld' etc. After the run() command I call some cleanup stuff which I need to do before termination. How would I do that with your proposal?

@thefab
Copy link
Contributor

thefab commented May 9, 2019

My plan:

bjoern_cli --port=8080 --timeout=60 --hook-after-stop=yourmod.after_stop yourmod:app

with "user provided" hook:

def after_stop(wsgi_app, **kwargs)
    # [...] clean what you want
    return True

I plan to implement other hooks:

  • before_start(wsgi_app, **kwargs) (if return False the cli exit)
  • on_signal(wsgi_app, signum, frame, **kwargs) (if return False is returned, a SIGINT is sent to bjoern to exit the main loop), with of course a default/regular SIGTERM handling

@bulletmark
Copy link

That would work of course but nowhere near as simple or intuitive as just returning from run() when terminating.

@jonashaag
Copy link
Owner

Stupid question, if I send sigterm/sigint to a Python program, it will raise KeyboardInterrupt, right? So shouldn't that be what happens when you try to interrupt bjoern as well?

@thefab
Copy link
Contributor

thefab commented May 10, 2019

With this code:

import time

try:
    time.sleep(60)
except KeyboardInterrupt:
    print("catched KeyboardInterrupt")
except:
    print("catched something else")
  • with SIGINT, you get: catched KeyboardInterrupt (with return code 0)
  • with SIGTERM, you get an uncatched Terminated (with return code 143)

@jonashaag
Copy link
Owner

jonashaag commented May 18, 2019

From #145

So, here's how Gunicorn works:

  • SIGINT shuts everything down momentarily.
  • SIGTERM waits for all active connections to finish, with a timeout (30 secs by default). It ignores the backlog though, and there doesn't seem to be a way to change that through configuration.

I agree that we should not throw away the backlog when doing a graceful shutdown.

Also I think it's fine to just stop bjoern without any kind of graceful shutdown for SIGINT (but not for SIGTERM).

Bjoern currently works as follows:

  • SIGINT waits for active connections to finish, without a timeout, and ignores the backlog.
  • SIGTERM simply terminates the program immediately.

So, going forward, my suggestion is we change Bjoern behaviour as follows:

  • SIGINT simply shuts down all connections immediately, and raises a KeyboardInterrupt in the program.
  • SIGTERM waits for active connections to finish, and the backlog to be empty (while still accepting new connections), with a timeout, and exits the program. (?)

We can make the SIGTERM behaviour configurable, and allow for custom shutdown behaviours, by exporting some shutdown primitives from the C code.

@jonashaag
Copy link
Owner

I'm still not sure how to meet both the expectations of people thinking of bjoern as a "library" that doesn't fiddle with signal handling and people thinking of bjoern as a server that takes care of graceful loop shutdown.

Maybe we actually have to implement something along the lines of what @thefab suggested.

@thefab
Copy link
Contributor

thefab commented May 19, 2019

I'm not sure it's a typo or not but you wrote:

"SIGTERM waits for active connections to finish, and the backlog to be empty (while still accepting new connections), with a timeout, and exits the program."

IMHO, after a SIGTERM, we don't accept new connections !

For my "library use case", it's perfectly ok to have another signal for custom shutdown. All I need is the python signal handler called from C like in 3.0.

@akuzia
Copy link

akuzia commented Dec 21, 2019

is it a correct way to gracefully exit?

import bjoern
import sys
import signal
import os


def sigterm_handler(_signo, _stack_frame):
    sys.stdout.write(' [*] SIGTERM: Stop \n')
    os.kill(os.getpid(), signal.SIGINT)


if __name__ == '__main__':
    signal.signal(signal.SIGTERM, sigterm_handler)
    sys.stdout.write(' [*] Starting bjoern on port 5000...! \n')
    try:
        bjoern.run(app, '0.0.0.0', 5000)
    except KeyboardInterrupt:
        sys.stdout.write(' [*] SIGINT: Stop \n')
    finally:
        sys.stdout.write(' [*] Exiting \n')

@jf
Copy link

jf commented Sep 27, 2020

I'm not sure it's a typo or not but you wrote:

"SIGTERM waits for active connections to finish, and the backlog to be empty (while still accepting new connections), with a timeout, and exits the program."

IMHO, after a SIGTERM, we don't accept new connections !

Yup. I would agree. Without the use of an external coordinator (to route new connections elsewhere), the server might potentially never have a chance to gracefully shut down, cos it'll keep accepting new connections.

For my "library use case", it's perfectly ok to have another signal for custom shutdown. All I need is the python signal handler called from C like in 3.0.

This "library vs server" thing is interesting. Would the following work for a simple compromise? This supposes that you have the ability to modify (and why shouldn't you?) whatever mechanism you're using to send the SIGTERM to bjoern:

  • before sending the SIGTERM, your external system (a script?) can start up another instance of bjoern to continue listening and accepting new connections (this assumes SO_REUSEPORT is available)
  • on SIGTERM, bjoern will shut the listening socket down so that it no longer accepts new connections
  • for existing connections, the older instance of bjoern can take as much time as it needs to finish the existing requests without any pressure to shut anything down quickly, because the new listening socket is already up and serving new traffic. We should probably define a timeout before the older instance of bjoern shuts down... but the key thing here is that we don't have to worry about needing to keep this short.

This setup would work for keeping the server up without any interruption in serving connections, and also allow for graceful connection draining. Does this make sense?

@ak4nv
Copy link

ak4nv commented Oct 20, 2020

Here are my two cents...
What if in most cases we use a systemd manager for our services it would be graceful to add a systemd-socket into bjoern out-of-the-box. In this case, all socket issues with opening or closing are gone.

Offtopic: I was confused a little bit. If you run bjoern with unix socket it works as wsgi server, otherwise, it works as http. Is it possible to manage via params? I would like to have the ability to run wsgi server which binds to tcp socket.

@jonashaag
Copy link
Owner

It always works as a WSGI server (which is a special kind of HTTP server)

@ak4nv

This comment has been minimized.

@bulletmark
Copy link

As above, I left bjoern and went to meinheld given this issue but am now back to bjoern for other reasons. My problem here was that bjoern does not return from run() when systemd sends the process a SIGTERM which is the default manner by which systemd terminates an application. However, I have now found this is a trivial issue because all I had to do was define KillSignal=SIGINT in the systemd service file and that works as I want because bjoern run() does return on SIGINT. Just noting this here for the benefit of others using systemd to manage their application, and want a graceful way to catch termination.

@nhoad nhoad linked a pull request Aug 28, 2022 that will close this issue
@ally-ben
Copy link

Hi folks, inspire by @bulletmark, I do something like this. (GracefulKiller from StackOverflow)
A few things need to be handled during the graceful shutdown.

  1. Close the socket (stop receving the request)
  2. Finish the request, and close the multithread
  3. Exit with 0
#!/usr/bin/env python
import os
from flask import Flask
import bjoern
import signal
import time
import sys

app = Flask(__name__)

@app.route("/")
def hello_world():
    time.sleep(30)
    return "<p>Hello, World!</p>"


class GracefulKiller:
    kill_now = False
    def __init__(self):
        # not handle sigint, since the bjoern will raise it.
        # signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, *args):
        sock, wsgi_app = bjoern._default_instance
        # Once the sock close, the request still in buffer will be close, and get the error 'Connection reset by peer'
        # The request already in process will be handle completed. 
        sock.close() 
        print('you can close the background jobs here, or the in main KeyboardInterrupt exception')
        os.kill(os.getpid(), signal.SIGINT)


if __name__ == '__main__':
    GracefulKiller()
    try:
        bjoern.run(app, '0.0.0.0', 8080)
    except KeyboardInterrupt:
        print('If you have some background jobs with multithreading, you can handle it here')
        print('start process graceful shutdown')
        time.sleep(3)
        print('Ready to exit')
        sys.exit(0)

@slashburygin
Copy link

Greetings.
In my env (python 3.8, bjoern 3.1.0) I test script by @ally-ben. Bjoern stops after sending SIGTERM only after starting so far it has not accepted any request. But if I send to app on bjoern one or more any requests, it's not responding to SIGTERM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.