Python exceptions raised in a wait_callback do not always propagate #410
Hi, can you please run your tests with the current …? If you manage to put together some testing rig to reproduce the issue, let us know.
No, I'm afraid that neither branch seems to solve the issue. Still no luck with a test script, but I'll post one when I can.
Seems like we're still bumping into the same issue with …
We've tried bumping the versions of all these dependencies and still had the issue with …
@underyx I don't have a way to reproduce this issue; I am still waiting for one.
@dvarrazzo Right, I was working on that, and I came up with a rather silly script that does reproduce the issue, but depends on random chance a bit. Install the following packages: …

Run this code:

```python
from gevent import monkey; monkey.patch_all()
from psycogreen.gevent import patch_psycopg; patch_psycopg()

from gevent import spawn
from gevent.pool import Pool

import psycopg2
from sqlalchemy import create_engine
import time

engine = create_engine('postgresql://localhost')

def sleepy():
    try:
        conn = engine.raw_connection()
        cur = conn.cursor()
        cur.execute('SELECT pg_sleep(0.1)')
        cur.close()
        conn.close()
    except:  # bare except on purpose: GreenletExit does not subclass Exception
        import traceback; traceback.print_exc()

def killy(greenlets):
    for greenlet in greenlets:
        time.sleep(0.001)
        greenlet.kill()

def main():
    pool = Pool()
    for _ in range(100):
        greenlets = [pool.apply_async(sleepy) for _ in range(100)]
        spawn(killy, greenlets)
        pool.join()

if __name__ == '__main__':
    main()
```

Out of 10000 queries this results in around 80 of these `unknown error` failures.
@underyx I'll give it a run, thank you very much.
@underyx I've tried running your script, then I've tried running it with 10000 instead of 100 in the outer loop. Twice. So it ran >2M queries, and I didn't see a single unknown error :( Maybe a source of difference is the libpq version? I've tried mine with 9.6.1.
For completeness, I've run my test on Python 2.7 on Ubuntu 16.04, 64 bits. Tests ran on psycopg2 2.7.1 rather than on master, to verify whether this is a case of #539, but even on 2.7.1 I don't seem able to reproduce it.
That's fascinating. The three environments where I've seen this error are: …

I've published this repo with a Dockerfile and a docker-compose.yml you can use to launch a database and run the script in a way that triggers the failure. I got 500 …

PS: I'm sure you didn't make this mistake, but just in case: if …

PPS: I timed my script execution and it came out to 20 seconds for 100 loops. That'd be 33 minutes for 10000 loops, which is a lot more than what it took for you. Since the timing is important here (you need to raise …)
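The repo itself is linked above; purely as an illustration of the shape such a setup takes (the service names, image version, and file names here are my assumptions, not the actual repo's contents), a docker-compose.yml for this kind of repro might look like:

```yaml
version: "2"
services:
  db:
    image: postgres:9.4   # any server version exhibiting the issue
  repro:
    build: .
    command: python repro.py   # the script from the comment above
    depends_on:
      - db
```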
A Docker setup I could test is exactly what I would have asked you for: thank you very much for providing that, I appreciate the effort and will try to look into it again. I used …
Ok, that got me thinking about differences between network/unix socket connections... tested better, and my run of the script was just failing to connect because of a missing password :\ Fixed. I'll try to look a bit better to see if I can understand what causes the errors.
At least I'll be able to sleep properly tonight! :D
> The 'unknown error' happens on query.

Ok, nailed it. I managed to set up a test not based on concurrency and added it to the test suite.
> The 'unknown error' happens on query.

Wow, that's amazing, thank you! We'll test this commit in our production environment to confirm the bug is fixed.
That would be great: let us know, thank you very much.
Yesterday we got 66 of these errors over the span of around an hour. We just ran for 40 minutes with psycopg2 at 4b4d279 and we got …

So, I can confirm that this issue is fixed! Thanks again 🙃
Perfect, thank you for testing!
I haven't been able to put together a minimal example of this, so unfortunately I have only been able to get this problem to manifest within the scope of a project I am on. I will continue to see if I can put together an example, but in the meantime the best I can offer is some context.
The project is using SQLAlchemy on top of psycopg2, using psycogreen to provide this gevent wait_callback: …
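The callback itself was lost from this scrape; it is essentially psycogreen's `gevent_wait_callback`. Purely as an illustration of its control flow, here is a runnable sketch in which the psycopg2 poll constants, the gevent waiter functions, and the connection are all stand-ins, so it runs without psycopg2 or gevent installed:

```python
# Stand-ins for psycopg2.extensions.POLL_* (values are arbitrary) and for
# gevent.socket.wait_read/wait_write; the real callback uses those.
POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2

def wait_read(fd):   # gevent would yield to the hub here -- GreenletExit
    pass             # can be raised at this point when the greenlet is killed

def wait_write(fd):
    pass

def gevent_wait_callback(conn):
    """Drive conn.poll() until the query completes, yielding whenever the
    socket is not ready. An exception raised while waiting is expected to
    propagate to the Python code that issued the query."""
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        elif state == POLL_READ:
            wait_read(conn.fileno())
        elif state == POLL_WRITE:
            wait_write(conn.fileno())
        else:
            raise RuntimeError("bad poll state: %r" % state)

class FakeConn:
    """Minimal fake connection: one write, one read, then done."""
    def __init__(self):
        self._states = [POLL_WRITE, POLL_READ, POLL_OK]
    def poll(self):
        return self._states.pop(0)
    def fileno(self):
        return 0

gevent_wait_callback(FakeConn())
print("query completed")
```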
The problem I'm seeing is that an exception raised within this callback is sometimes not propagated to the Python code that is making calls to psycopg2 via SQLAlchemy. Instead, I get an `OperationalError('unknown error')`.

To be specific, I am killing greenlets, which raises a `GreenletExit` exception in the greenlet when its execution resumes. In some cases, `GreenletExit` is raised in the wait_callback. Usually, `GreenletExit` is then also raised in the code that makes an SQLAlchemy call (e.g. `session.query(Model).filter().one()`). The problem is that sometimes I get the `OperationalError` instead.

I've toyed around in the C layer of psycopg2, and I can see that `PyErr_Occurred()` returns `GreenletExit` after the callback is called in `green.c:psyco_wait()`. That is somehow getting lost in some cases before the C layer returns control to Python, and it ends up in `pqpath.c:pq_complete_error()`, which raises `OperationalError('unknown error')`.

This seems to me like a bug along some specific code path, but I haven't been able to trace exactly where it goes astray and the exception gets lost.
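The failure mode described above can be modeled in plain Python. This is only a sketch, not psycopg2's actual C code: `pending_exc` stands in for CPython's per-thread error indicator (what `PyErr_Occurred()` reads), and the `buggy` branch stands in for the code path through `pqpath.c:pq_complete_error()` that drops the exception:

```python
class OperationalError(Exception):
    pass

class GreenletExit(BaseException):
    pass

pending_exc = None  # models the C-level error indicator

def wait_callback():
    # models the gevent wait_callback being killed mid-wait:
    # it leaves an exception set in the error indicator
    global pending_exc
    pending_exc = GreenletExit("killed")

def pq_execute(buggy):
    global pending_exc
    wait_callback()
    if pending_exc is not None and not buggy:
        # correct path: notice the pending exception and propagate it
        exc, pending_exc = pending_exc, None
        raise exc
    # buggy path: the pending exception is discarded, and the
    # generic fallback error is raised in its place
    pending_exc = None
    raise OperationalError("unknown error")

try:
    pq_execute(buggy=False)
except GreenletExit:
    print("GreenletExit propagated")  # the expected behaviour

try:
    pq_execute(buggy=True)
except OperationalError as e:
    print(e)  # the symptom: unknown error
```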