Return rows as an AsyncStream instead of buffering.
#128
This allows streaming large result sets instead of buffering them. The change is fairly invasive because the state machine has to be adapted to allow returning a `PgResponse` before the client is allowed to dispatch other requests on the connection. This is handled in the dispatcher, which expects such `PgResponse`s to provide a signal to release the connection. NOTE: this doesn't apply the strategy to prepared statements yet. Looking for feedback before moving forward.
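A minimal sketch of the idea described above, assuming Twitter's util-core (`com.twitter.concurrent.AsyncStream`, `com.twitter.util.Future`/`Promise`) is on the classpath. `StreamingResponse`, `StreamingSketch.stream` and the `nextRow` callback are hypothetical names, not the PR's actual `PgResponse` or dispatcher code. Rows are pulled lazily one at a time, and a `released` future completes once the end of the result set has been read, which is roughly the point at which a dispatcher could let other requests use the connection.

```scala
import com.twitter.concurrent.AsyncStream
import com.twitter.util.{Await, Future, Promise}

// Hypothetical types and names, for illustration only (not finagle-postgres code).
final case class StreamingResponse[T](rows: AsyncStream[T], released: Future[Unit])

object StreamingSketch {
  // `nextRow` stands in for reading one row message off the connection;
  // it yields None once the server signals the end of the result set.
  def stream[T](nextRow: () => Future[Option[T]]): StreamingResponse[T] = {
    val done = Promise[Unit]()

    // Each call reads exactly one row. The recursive tail is passed by name
    // to AsyncStream.mk, so the next read happens only when a consumer asks.
    def loop(): AsyncStream[T] =
      AsyncStream.fromFuture(nextRow()).flatMap {
        case Some(row) => AsyncStream.mk(row, loop())
        case None =>
          done.setValue(()) // end of rows: the connection could now be released
          AsyncStream.empty[T]
      }

    // Error handling is omitted: if nextRow fails, `done` is never satisfied,
    // whereas a real dispatcher would also need to release on failure.
    StreamingResponse(loop(), done)
  }
}
```

Because the tail given to `AsyncStream.mk` is by-name, memory use is bounded by what the consumer keeps a reference to rather than by the size of the whole result set.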
@jeremyrsmith I'm not sure if this is something the project wants, but I believe it better represents Postgres' protocol, since the server sends rows one at a time on the wire. This is also why I took the liberty of breaking the API: it should probably represent the protocol as closely as possible. The breakage can obviously be avoided, but it seemed more appropriate to me. Could you let me know whether this has a chance of getting merged (once complete) before I continue with the prepared statement version?
@plaflamme In general, I think it's good to support streaming results. My only concern would be the overhead of … I'd also defer to the more active maintainers (i.e., @leonmaia), because I haven't used (or really worked on) this library for a while.
@jeremyrsmith Thanks for the feedback. I pinged you because you're listed as the current maintainer in the … As for the overhead, what exactly are you thinking of? I can see that … What could be done is configurable buffering when building the … @leonmaia, please let me know if this has any chance of getting merged, and I'll continue the implementation.
@plaflamme I meant overhead both in the performance sense and in the API sense. The API is relatively straightforward with … I agree with the motivation, though, and FWIW I think it's the right way to go. It's going to be a bit more complicated to do it right, since there are some new design decisions to make:
Etc.
@jeremyrsmith I agree that the stream is cumbersome in the simple case. I added this flavour to the high-level API: https://github.com/finagle/finagle-postgres/pull/128/files#diff-107a2fd6c9e304147a40852aea34c910R46 It's not much, but it's not nothing :) Similar helper methods could be provided for other common access patterns.
@leonmaia Does this have any chance of getting merged? I'd rather close it otherwise.
@leonmaia From the emoji reaction on the previous comment, it looks like you may have taken a look. Any updates on this?
This also applies row streaming to the extended query mode.
Branch updated from 3bc8d2f to 90b1c4f.
@jeremyrsmith I've added the remaining parts to also stream rows in the "extended query" mode. I've also undone the API breakage. Please take a look.
def select[T](sql: String)(f: Row => T): Future[Seq[T]] =
  selectToStream(sql)(f).flatMap(_.toSeq)
def selectToStream[T](sql: String)(f: Row => T): Future[AsyncStream[T]]
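A sketch of how buffered and other convenience methods can be layered on top of the streaming primitive, in the spirit of the helper discussed in this thread. Only `select` and `selectToStream` come from the diff; the `Streaming` trait and `selectFirst` are hypothetical, and util-core is assumed to be on the classpath.

```scala
import com.twitter.concurrent.AsyncStream
import com.twitter.util.{Await, Future}

// Hypothetical trait for illustration; `select` mirrors the diff, while
// `selectFirst` is an invented helper showing another access pattern.
trait Streaming[Row] {
  def selectToStream[T](sql: String)(f: Row => T): Future[AsyncStream[T]]

  // Buffered variant: drain the whole stream into memory.
  def select[T](sql: String)(f: Row => T): Future[Seq[T]] =
    selectToStream(sql)(f).flatMap(_.toSeq())

  // Only the first row is needed; `head` does not force the rest of the stream.
  def selectFirst[T](sql: String)(f: Row => T): Future[Option[T]] =
    selectToStream(sql)(f).flatMap(_.head)
}
```

One caveat for a helper like `selectFirst`: in a real client the remaining rows would presumably still have to be read off the wire before the connection could be released, which is one of the design questions streaming raises.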
Nit: Can this be called `selectStream`?
Sure thing... Though should we also rename `prepareAndQueryToStream` to `prepareAndQueryStream`? I find that somewhat questionable, but I don't really have a strong opinion.
Hmm, I agree with you about that... I'm not totally sure TBH. I don't want to bikeshed the API too much, as long as it works people probably don't care all that much 😀
Let me give it some thought.
Overall I think this looks quite nice. It does change some of the internal machinery (most likely for the better 😄), so there's some risk of breakage in real-world scenarios. I'd be in favour of releasing this as a new minor version. It would be good to hear from @leonmaia (... ping 😄), since I don't use Finagle or Postgres these days and thus don't really have a dog in this fight.
@@ -26,8 +27,9 @@ class QuerySpec extends FreeSpec with Matchers with MockFactory {

  val client = new PostgresClient {

    def prepareAndQuery[T](sql: String, params: Param[_]*)(f: (Row) => T): Future[Seq[T]] =
    def prepareAndQueryToStream[T](sql: String, params: Param[_]*)(f: (Row) => T): Future[AsyncStream[T]] =
Hmm, `AsyncStream` already incorporates a `Future`. It's possible to construct an `AsyncStream[T]` directly from a `Future[Seq[T]]` without wrapping it in the extra future.
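A minimal sketch of the reviewer's point, assuming util-core: `AsyncStream.fromFuture` and `AsyncStream.fromSeq` turn a `Future[Seq[T]]` into an `AsyncStream[T]` with no outer `Future`, which could be used, for example, by a mock client in a test. The `FromBuffered` object and its `toStream` method are illustrative names only.

```scala
import com.twitter.concurrent.AsyncStream
import com.twitter.util.{Await, Future}

object FromBuffered {
  // The stream waits on the future for its first step, then emits the
  // buffered elements one by one; no Future[AsyncStream[T]] is needed.
  def toStream[T](rows: Future[Seq[T]]): AsyncStream[T] =
    AsyncStream.fromFuture(rows).flatMap(seq => AsyncStream.fromSeq(seq))
}
```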
👍 from me. I've tested this on an internal project and haven't seen any regressions. I think it would make sense to flatten `Future[AsyncStream[T]]` to just `AsyncStream[T]`. It would also be good to add a streaming method to `Query`, but that could be done later.
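Flattening a `Future[AsyncStream[T]]` into an `AsyncStream[T]` can be sketched with util-core's `AsyncStream.fromFuture` followed by `flatMap`; `Flatten.flatten` is a hypothetical helper name, not part of the PR. The outer future becomes the first step of the stream, so a failed query would surface as a failed stream rather than as a failed outer `Future`.

```scala
import com.twitter.concurrent.AsyncStream
import com.twitter.util.{Await, Future}

object Flatten {
  // Wrap the future as a one-element stream whose element is itself a stream,
  // then flatMap it away; nothing is awaited eagerly.
  def flatten[T](f: Future[AsyncStream[T]]): AsyncStream[T] =
    AsyncStream.fromFuture(f).flatMap(stream => stream)
}
```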
CHANGELOG.md (outdated)
## <Next release>

* Select results are now exposed as `AsyncStream[DataRow]` and result sets as `AsyncStream[Row]`
* incompatible change: `PostgresClient.select` now returns `Future[AsyncStream[T]]` instead of `Future[Seq[T]]`
This is now out of date.
@YarekTyshchenko Thanks for the feedback, I've added a commit to flatten out the … @dangerousben Thanks for the feedback and for testing this! I've added commits to address your comments; please take a look at 7035ea7 and 4c5703b.
All looks good. @leonmaia, I'll merge this unless I hear an objection.