-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More robust listener close in riverpgxv5
#246
Conversation
@bgentry Seriously tried to add a test case for this, but couldn't find a way to make |
Thanks for the fix! I've deployed a version of my worker with riverpgxv5 from this branch and will report on Monday whether the issue appeared again. So far it looks good. Enjoy your weekend! |
There is this package from pgx https://github.com/jackc/pgx/blob/master/pgxtest/pgxtest.go. You can overwrite the CloseFunc on the test runner I believe. |
8ba4e4c
to
f566e2c
Compare
@mfrister Excellent. Thanks! @Rshep3087 Hm, just reading through this module, I believe |
cc2c92e
to
45d7f15
Compare
As part of #239, we're observing the possibility of the notifier going into a hot loop where it's trying to reopen a listener connection, but the listener won't let it because it thinks the connection is already open. A suspect is the listener's `Close` implementation, which in the event of an error, returns the error and fails to release an underlying connection, putting it into a state where it's never reusable. Here, modify `Close` so that it always releases and unsets an underlying connection regardless of the error state returned. I tried to add a test case for this, but reading through pgx and net source code, I couldn't find any way to simulate an error from `Close` (I thought a cancelled context would do it, but it does not), so I had to leave it.
45d7f15
to
02267da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only ways I can think of to test this are (a) use an interface over the conn here, which feels excessive, or (b) somehow force the server to uncleanly terminate the conn or reply with an error upon close.
I'm fine with not going forward with one of those approaches at this time, they seem to heavy for the benefit at this stage. Thanks for the quick fix
if err := l.conn.Conn().Close(ctx); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Man, I noticed this change when I reviewed the mega-PR, should have said something 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh. Well, I was going to say "next time", but hopefully nothing like that PR ever happens again :)
Yeah, I had the same reactions. I think (a) is plausible, but has the downside that it makes the code a little more painful to use. Rather than being able to easily jump easily into concrete implementations, you now have to jump through the interface, which is always a bit annoying. And RE: (b), I read through a lot of pgx source around
Sweet. Yeah, that was my conclusion as well. Possible maybe to add some form of testing, but maybe more trouble than it's worth, and getting the fix out the door is the best compromise for the time being. Thanks! |
Contains an important fix from #246 which resolves a problem wherein a listener from the `riverpgxv5` driver couldn't be reused in cases where `Close` returned an error.
Contains an important fix from #246 which resolves a problem wherein a listener from the `riverpgxv5` driver couldn't be reused in cases where `Close` returned an error.
Regarding unit testing, I have a little bit of experience here. I hunted down a bug in From what I understand, your unit tests hit a REAL PostgreSQL instance, so you could run a proxy within a Go unit test and put that proxy in front of the PostgreSQL instance available to the unit tests. 🤞 I'd hope a plain old TCP error over a non-TLS connection would do the trick, but maybe not. In my log capture around the issue that this fixed (#248), it was actually a TLS protocol issue that showed up in the error:
However, if I had to guess I'd imagine the TCP connection is what had already been closed (e.g. via RST). |
Here, follow up #246 by adding a few more tests that verify a listener's state after `Close` has been invoked, including if it returned an error, which we're able to simulate by overriding pgx's `DialFunc` and returning a stand-in stub for an underlying `net.Conn`. Also, remove the explicit `Close` call on an underlying connection in favor of just invoking the pool's `Release` function. In case of an error condition, `Release` will detect that and do the right thing, and pgx is better tested/vetted to make sure that right thing happens.
Here, follow up #246 by adding a few more tests that verify a listener's state after `Close` has been invoked, including if it returned an error, which we're able to simulate by overriding pgx's `DialFunc` and returning a stand-in stub for an underlying `net.Conn`. Also, remove the explicit `Close` call on an underlying connection in favor of just invoking the pool's `Release` function. In case of an error condition, `Release` will detect that and do the right thing, and pgx is better tested/vetted to make sure that right thing happens.
Here, follow up #246 by adding a few more tests that verify a listener's state after `Close` has been invoked, including if it returned an error, which we're able to simulate by overriding pgx's `DialFunc` and returning a stand-in stub for an underlying `net.Conn`. Also, remove the explicit `Close` call on an underlying connection in favor of just invoking the pool's `Release` function. In case of an error condition, `Release` will detect that and do the right thing, and pgx is better tested/vetted to make sure that right thing happens.
@dhermes Thanks for the added detail there! I'm sort of hoping to avoid a full-blown proxy for the time being if we can possibly do it (looked at yours — looks good, but definitely a substantial amount of new code). You gave me the idea though of getting something more like an "in code" proxy going, and I found a fairly low touch way of simulating a type connStub struct {
net.Conn
closeFunc func() error
}
func newConnStub(conn net.Conn) *connStub {
return &connStub{
Conn: conn,
closeFunc: conn.Close,
}
}
func (c *connStub) Close() error {
return c.closeFunc()
} var connStub *connStub
config := testPoolConfig()
config.ConnConfig.DialFunc = func(ctx context.Context, network, addr string) (net.Conn, error) {
// Dialer settings come from pgx's default internal one (not exported unfortunately).
conn, err := (&net.Dialer{KeepAlive: 5 * time.Minute}).Dial(network, addr)
if err != nil {
return nil, err
}
connStub = newConnStub(conn)
return connStub, nil
} |
Here, follow up #246 by adding a few more tests that verify a listener's state after `Close` has been invoked, including if it returned an error, which we're able to simulate by overriding pgx's `DialFunc` and returning a stand-in stub for an underlying `net.Conn`. Also, remove the explicit `Close` call on an underlying connection in favor of just invoking the pool's `Release` function. In case of an error condition, `Release` will detect that and do the right thing, and pgx is better tested/vetted to make sure that right thing happens.
Here, follow up #246 by adding a few more tests that verify a listener's state after `Close` has been invoked, including if it returned an error, which we're able to simulate by overriding pgx's `DialFunc` and returning a stand-in stub for an underlying `net.Conn`. Also, remove the explicit `Close` call on an underlying connection in favor of just invoking the pool's `Release` function. In case of an error condition, `Release` will detect that and do the right thing, and pgx is better tested/vetted to make sure that right thing happens.
Here, follow up #246 by adding a few more tests that verify a listener's state after `Close` has been invoked, including if it returned an error, which we're able to simulate by overriding pgx's `DialFunc` and returning a stand-in stub for an underlying `net.Conn`. Also, remove the explicit `Close` call on an underlying connection in favor of just invoking the pool's `Release` function. In case of an error condition, `Release` will detect that and do the right thing, and pgx is better tested/vetted to make sure that right thing happens.
As promised above:
The problem hasn't reappeared on my worker over the weekend, as expected. |
@mfrister That's great to hear!! Thanks for checking in about it. |
As part of #239, we're observing the possibility of the notifier going
into a hot loop where it's trying to reopen a listener connection, but
the listener won't let it because it thinks the connection is already
open.
A suspect is the listener's
Close
implementation, which in the eventof an error, returns the error and fails to release an underlying
connection, putting it into a state where it's never reusable.
Here, modify
Close
so that it always releases and unsets an underlyingconnection regardless of the error state returned.
I tried to add a test case for this, but reading through pgx and net
source code, I couldn't find any way to simulate an error from
Close
(I thought a cancelled context would do it, but it does not), so I had
to leave it.
Thanks @mfrister for pointing this one out!
Fixes #248.