last release (21.1.0) sends a lot of errors #3032

Closed
wind-shift opened this issue Jul 18, 2023 · 23 comments · Fixed by #3033

@wind-shift

Gunicorn is throwing random errors:

[2023-07-18 13:04:10 +0000] [3150] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 95, in init_process
    super().init_process()
  File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/base.py", line 142, in init_process
    self.run()
  File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 232, in run
    self.murder_keepalived()
  File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 176, in murder_keepalived
    self.poller.unregister(conn.sock)
  File "/usr/local/lib/python3.11/selectors.py", line 366, in unregister
    key = super().unregister(fileobj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/selectors.py", line 249, in unregister
    key = self._fd_to_key.pop(self._fileobj_lookup(fileobj))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/selectors.py", line 225, in _fileobj_lookup
    return _fileobj_to_fd(fileobj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/selectors.py", line 42, in _fileobj_to_fd
    raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1
[2023-07-18 13:04:10 +0000] [3150] [INFO] Worker exiting (pid: 3150)
[2023-07-18 13:04:11 +0000] [1] [ERROR] Worker (pid:3150) exited with code 255

The previous versions did not have this issue at all. It was triggered by our latest build, so we will roll back the version, but the problem remains in your latest release.
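
For anyone digging into this, here is a minimal standalone sketch (my guess at the mechanism, not gunicorn's actual code) that reproduces the same ValueError outside gunicorn: asking a selector to unregister a socket that is already closed, and no longer (or never) registered, fails because fileno() returns -1 after close().

# repro_sketch.py — hypothetical repro of the ValueError above
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()

a.close()          # after close(), a.fileno() returns -1
sel.unregister(a)  # raises ValueError: Invalid file descriptor: -1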

@rafaeljuzwiak

I was about to create a ticket regarding this update as well. I manage an API that processes around 2k-3k calls/second, distributed across a couple of pods inside Kubernetes, with an average response time of 150 ms.

Yesterday I applied a hotfix to the API and didn't notice that I hadn't frozen gunicorn in the requirements.txt file. As a result, the latest gunicorn version got installed. I didn't notice at the time, but after applying the patch my response times were averaging 600 ms. It was a prod workload, so I reverted the PR ASAP, but couldn't get the response times back to 150 ms regardless of what I did.

Only today did I compare the packages and notice the change in gunicorn. I put it back to 20.1.0 and now it looks beautiful again. Maybe that is related to the "sends a lot of errors" above taking too much CPU time away, I don't know, but performance in 21.1.0 compared to 20.1.0 went down by a LOT.

@benoitc
Owner

benoitc commented Jul 18, 2023

@dkr-sahar 21.1.0 (out today) fixes this error, which was in 21.0.1 (out last night). Look at #3026. Are you sure about the version? I can't reproduce it anymore after applying the fix.

Please confirm. @rafaeljuzwiak same question :)

@rafaeljuzwiak

rafaeljuzwiak commented Jul 18, 2023

Positive, I restarted my pods and built again (on version 21) this morning and I still had issues.
Only after locking down requirements.txt to 20.1.0 did it get fixed.
You can see the attachment: purple is stage, yellow sandbox, and blue production. The lines go down as soon as I locked requirements.txt and restarted each environment.

I had to add 4 pods to get production to around 250 ms, and even then: before, with 4 fewer pods, I had around 150 ms.

I really don't want to test this on prod now as it already caused some fuss, but I can update sandbox later today to give it another try and get feedback to you.

(attachment: response-times graph)

@rafaeljuzwiak

PS: by morning I mean around 8 AM UTC. I'm not sure what timezone you are in, or whether I still had the previous unpatched version when I made the rebuild.

@benoitc
Owner

benoitc commented Jul 18, 2023

I can't reproduce it. I don't see how it can be reproduced in the current version, with the error caught. Can you provide an example, or more context? What does the command gunicorn --version return? Gunicorn 21.1.0 was released 2 hours ago (14:00 UTC).

@xkobal

xkobal commented Jul 18, 2023

I had the same problem with the latest gunicorn version. A rollback to 20.1.0 fixed the problem.

# gunicorn --version
gunicorn (version 21.1.0)

@benoitc
Owner

benoitc commented Jul 18, 2023

Please provide a way to reproduce it. I can't anymore since the patch :(

@xkobal

xkobal commented Jul 18, 2023

I will try.

Meanwhile, I am pinned to gunicorn[gthread]==20.1.0.

@wind-shift
Author

wind-shift commented Jul 18, 2023

> @dkr-sahar 21.1.0 (out today) fixes this error, which was in 21.0.1 (out last night). Look at #3026. Are you sure about the version? I can't reproduce it anymore after applying the fix.
>
> Please confirm. @rafaeljuzwiak same question :)

@benoitc I confirm that 21.1.0 fixed our issue (I did not notice that it was released 2 hours ago).

However, we still have this problem, which just appeared:

[2023-07-18 15:33:24 +0000] [17] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 280, in handle
    req = next(conn.parser)
          ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/parser.py", line 42, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.source_addr, self.req_count)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/message.py", line 184, in __init__
    super().__init__(cfg, unreader, peer_addr)
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/message.py", line 55, in __init__
    unused = self.parse(self.unreader)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/message.py", line 196, in parse
    self.get_data(unreader, buf, stop=True)
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/message.py", line 187, in get_data
    data = unreader.read()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
        ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 9] Bad file descriptor

It seems to appear when a 500 error is raised (backend: Django).
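
A minimal sketch to try to reproduce this without Django (app.py and the error text are hypothetical stand-ins, not our real backend):

# app.py — WSGI app whose handler always raises, to provoke the 500 path
def application(environ, start_response):
    raise RuntimeError("simulated backend failure")

# Run under the threaded worker and hit it a few times:
#   gunicorn -k gthread --threads 4 app:application
#   curl -v http://127.0.0.1:8000/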

@xkobal

xkobal commented Jul 18, 2023

I confirm that it seems to happen only when my backend (Django too) raises an exception.

@wind-shift
Author

wind-shift commented Jul 18, 2023

> I confirm that it seems to happen only when my backend (Django too) raises an exception.

It could also arise in the following sequence (see the sketch below):

  • the backend takes too much time to answer (but still succeeds in processing the HTTP request)
  • gunicorn raises this exception with the bad file descriptor
  • a 500 is returned anyway.

Not totally sure which way round it works.
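
A sketch to test that sequence (slow_app.py is hypothetical): a handler that answers successfully, but only after sleeping past gunicorn's default keepalive window (--keepalive defaults to 2 seconds).

# slow_app.py — answers correctly, but slower than the keepalive window
import time

def application(environ, start_response):
    time.sleep(5)  # longer than the default --keepalive of 2 s
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"slow but successful\n"]

#   gunicorn -k gthread --threads 4 slow_app:application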

@xkobal

xkobal commented Jul 18, 2023

Same as @dkr-sahar.
And when I call the same endpoint a second time, it works; I get a normal 500 error.

All of this is happening in Docker in my case.

@mihalikv

Hi, same issue here.

Gunicorn version: (screenshot)
Error stack trace: (screenshot)
Gunicorn.conf: (screenshot)

Env is production.

Let me know if I can help you with more information.

@geudrik

geudrik commented Jul 18, 2023

Our CI (shame on us) installed 21.1.0 at 10:45 EST this morning, upgrading us from 20.1.0, and we get tons of the following, causing 502s to be returned from nginx (sitting in front of gunicorn):

[2023-07-18 17:33:38 +0000] [55] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8/lib/python3.8/site-packages/gunicorn/workers/gthread.py", line 285, in handle
    keepalive = self.handle_request(req, conn)
  File "/root/.pyenv/versions/3.8/lib/python3.8/site-packages/gunicorn/workers/gthread.py", line 357, in handle_request
    util.reraise(*sys.exc_info())
  File "/root/.pyenv/versions/3.8/lib/python3.8/site-packages/gunicorn/util.py", line 641, in reraise
    raise value
  File "/root/.pyenv/versions/3.8/lib/python3.8/site-packages/gunicorn/workers/gthread.py", line 343, in handle_request
    resp.write(item)
  File "/root/.pyenv/versions/3.8/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 326, in write
    self.send_headers()
  File "/root/.pyenv/versions/3.8/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
    util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
  File "/root/.pyenv/versions/3.8/lib/python3.8/site-packages/gunicorn/util.py", line 299, in write
    sock.sendall(data)
OSError: [Errno 9] Bad file descriptor

@isra17

isra17 commented Jul 18, 2023

Same here; we have Emissary in front of gunicorn. My logs show around 10% of "upstream connection closed", with the following gunicorn logs:

2023-07-18T17:52:40Z: [error] Socket error processing request.
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 280, in handle
    req = next(conn.parser)
          ^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/http/parser.py", line 42, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.source_addr, self.req_count)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/http/message.py", line 184, in __init__
    super().__init__(cfg, unreader, peer_addr)
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/http/message.py", line 55, in __init__
    unused = self.parse(self.unreader)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/http/message.py", line 196, in parse
    self.get_data(unreader, buf, stop=True)
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/http/message.py", line 187, in get_data
    data = unreader.read()
           ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
        ^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 9] Bad file descriptor

gunicorn is run as gunicorn -b 0.0.0.0:80 --workers 4 --threads 4 --timeout 300 --access-logfile - inside Docker.
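
For reference, the same invocation expressed as a gunicorn.conf.py, in case a config file makes it easier for anyone to reproduce (equivalent settings, nothing extra):

# gunicorn.conf.py — same settings as the command line above
bind = "0.0.0.0:80"
workers = 4
threads = 4
timeout = 300
accesslog = "-"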

@benoitc
Owner

benoitc commented Jul 18, 2023

Can someone test with the fix-gthread branch? #3033

@alpden550

Python 3.10.12 and the latest gunicorn still has this error:

OSError: [Errno 9] Bad file descriptor

@revanthmahesh

@alpden550 you can use pip install git+https://github.com/benoitc/gunicorn.git@fix-gthread or downgrade the gunicorn version for now, until the fix is deployed by @benoitc.
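
For example, in requirements.txt, one of these lines (the pin is the known-good fallback, the git reference is the pending fix):

# requirements.txt — pick one
gunicorn[gthread]==20.1.0
# git+https://github.com/benoitc/gunicorn.git@fix-gthread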

@alpden550

@revanthmahesh thanks, I know; I pinned gunicorn as gunicorn==20.1.0.

@oxtime

oxtime commented Jul 19, 2023

Python 3.6.7

OSError: [Errno 9] Bad file descriptor

@benoitc
Owner

benoitc commented Jul 19, 2023

Can someone test with the fix-gthread branch? #3033 Complaining that the current release has an issue doesn't help much; I can't reproduce the issue myself. Thanks :)

@geudrik

geudrik commented Jul 19, 2023

If it helps others down the road, we had a Flask endpoint that would reliably fail / cause the bad-file-descriptor stack using 21.1.0. All it did was use requests to make an API call upstream and return the results (a sketch of the shape is below).
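
A hypothetical reconstruction of that endpoint (the route name and upstream URL are placeholders):

# proxy.py — Flask view that calls upstream and relays the result
import requests
from flask import Flask

app = Flask(__name__)

@app.route("/lookup")
def lookup():
    upstream = requests.get("https://example.com/api", timeout=10)
    return upstream.text, upstream.status_code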

I can't easily test with the gthread branch, @benoitc, but maybe this info is enough for you to update your test and check?

@benoitc
Owner

benoitc commented Jul 19, 2023

Gunicorn 21.2.0 has been released, which fixes this issue.
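
Upgrading picks up the fix:

pip install --upgrade "gunicorn>=21.2.0"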
