
Proposal: Streaming results #12188

Closed
nik9000 opened this issue Jul 10, 2015 · 6 comments
Labels: discuss, :Search/Search (Search-related issues that do not fall into other categories)

Comments

@nik9000 (Member) commented Jul 10, 2015

Maybe this is a crazy idea, but I've spent some time working with another system (Blazegraph) that supports sending large result sets over its API using HTTP 1.1's chunked encoding. The results stream back to the client, and the client can close the TCP connection when it has enough results, at which point the server stops producing them. I was wondering if it might make sense to do something similar in Elasticsearch. The advantage it'd have over scan/scroll is that it's simpler to reason about when server-side resources are in use: only as long as the TCP connection to the client is open.

I don't know enough about the overhead of scan/scroll to know if it's worth doing. It doesn't solve the infinite scroll problem either - for that you need an efficient way for clients to poll deeply, and this just isn't it.
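To make it concrete, here's a rough sketch of the client side, assuming a hypothetical _search/stream endpoint that returns one JSON hit per line over a chunked response (no such endpoint exists today):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StreamingSearchClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint -- Elasticsearch has no _search/stream API today.
        URL url = new URL("http://localhost:9200/my-index/_search/stream?q=user:kimchy");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/x-ndjson");

        int wanted = 1000;
        int received = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String hit;
            // Each chunk carries one JSON hit per line; stop once we have enough.
            while (received < wanted && (hit = reader.readLine()) != null) {
                System.out.println(hit);
                received++;
            }
        } finally {
            // Dropping the connection is the "cancel" signal: the server sees the
            // reset and stops producing results, so nothing lingers like a scroll does.
            conn.disconnect();
        }
    }
}
```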

@jprante (Contributor) commented Jul 10, 2015

Not sure how blocking chunked transfer can solve challenges like back pressure, but it should be possible to write RxJava (https://github.com/ReactiveX/RxJava) based code that implements reactive streams (http://www.reactive-streams.org/), similar to http://mongodb.github.io/mongo-java-driver-reactivestreams/
This would be easier if the Observer pattern for actions and Java 8 lambdas could be used. An example of what can be done is "HTTP tail" for the JVM: https://github.com/myfreeweb/rxjava-http-tail
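For illustration, a minimal consumer against the bare org.reactivestreams interfaces might look like this; the publisher that produces the hits is assumed, and back pressure comes from only requesting another batch once the current one has been processed:

```java
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

// Sketch of a consumer for a hypothetical reactive-streams search API: back
// pressure is expressed by only request()-ing another batch of hits once the
// previous batch has been handled.
public class HitSubscriber implements Subscriber<String> {
    private static final int BATCH = 100;
    private Subscription subscription;
    private int remainingInBatch;

    @Override
    public void onSubscribe(Subscription s) {
        subscription = s;
        remainingInBatch = BATCH;
        s.request(BATCH);               // ask for the first batch only
    }

    @Override
    public void onNext(String hit) {
        System.out.println(hit);        // handle one hit
        if (--remainingInBatch == 0) {  // batch drained: pull the next one
            remainingInBatch = BATCH;
            subscription.request(BATCH);
        }
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onComplete() {
        System.out.println("stream finished");
    }
}
```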

@nik9000 (Member, Author) commented Jul 20, 2015

Not sure how blocking chunked transfer can solve challenges like back pressure

It kind of can. There isn't anything that I know of like the triple-ack of TCP, but there are buffers, and you could in theory check how full they are and only try to fill them when they get below a certain point.
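As a rough sketch of that idea on the server side (Netty exposes exactly that kind of buffer-fullness signal via channel writability; fetchNextChunk() here is a made-up placeholder):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Illustration only: use Netty's outbound-buffer writability as the "how full
// is the buffer" signal. Real plumbing (request parsing, chunk encoding,
// kicking off the first writes) is omitted.
public class ChunkedResultsHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) {
        // Netty flips writability when the outbound buffer crosses its
        // high/low water marks; only produce more results while there is room.
        while (ctx.channel().isWritable()) {
            Object chunk = fetchNextChunk();
            if (chunk == null) {
                ctx.close();            // no more results to send
                return;
            }
            ctx.writeAndFlush(chunk);
        }
        ctx.fireChannelWritabilityChanged();
    }

    private Object fetchNextChunk() {
        return null;                    // placeholder: next HTTP chunk of hits
    }
}
```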

@clintongormley commented:

@nik9000 is this still something you want to investigate?

clintongormley added the discuss and :Search/Search labels on Jan 18, 2016
@nik9000 (Member, Author) commented Jan 18, 2016

@nik9000 is this still something you want to investigate?

I think it's a neat idea and it might be useful for something someday, but it just doesn't have the crazy +1 train that some other proposals have accumulated. I'm going to close it. Maybe someone can revive it when they have some super awesome use case.

Honestly, the flip side might be more useful: implement bulk indexing using chunked uploads. That has really simple back pressure on the uploading thread and would be simpler to implement. @mikemccand and I talked about it many months ago. The neat thing about it is that Elasticsearch can better manage its memory if the user uploads in chunks - they can continue sending chunks until they want to make sure the translog has fsynced - then they send the last chunk, we consider the bulk request complete, and we run the fsync. Rather than having to load the whole bulk request, we get to rely on TCP's back pressure to slow the client down, so we can have as much of the bulk request "in flight" as we think is appropriate.
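For concreteness, a sketch of what the uploading side could look like with plain HTTP chunked encoding, assuming a hypothetical chunk-aware _bulk endpoint (the real _bulk API doesn't work this way today):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ChunkedBulkUploader {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9200/_bulk");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/x-ndjson");
        conn.setChunkedStreamingMode(8192); // stream the body instead of buffering it all

        try (OutputStream out = conn.getOutputStream()) {
            for (int i = 0; i < 1_000_000; i++) {
                String action = "{\"index\":{\"_index\":\"logs\"}}\n"
                        + "{\"message\":\"doc " + i + "\"}\n";
                // write() blocks when the socket buffers fill up, i.e. when the
                // server stops reading -- that's the back pressure on this thread.
                out.write(action.getBytes(StandardCharsets.UTF_8));
            }
            // Closing the stream sends the terminating zero-length chunk; the
            // server would treat that as "bulk complete", run the fsync, and reply.
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```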

It's a neat idea but I dunno if it's actually worth implementing.

nik9000 closed this as completed on Jan 18, 2016
@mikemccand (Contributor) commented:

implement bulk indexing using chunked uploads.

+1

The neat thing about it is that Elasticsearch can better manage its memory if the user is uploading using chunks

Manage its memory and also manage appropriate concurrency to bring to bear. Plus the client gets much simpler, not having to play games with proper item count per bulk request, how many client threads to use, dealing w/ rejected exceptions, etc.

@honzakral recently added some nice sugar to the ES python client APIs that does some of this for the user, so the user feels like they're using a single streaming bulk indexing API, and under the hood the Python ES client breaks it into chunks using N threads ...

@Bargs (Contributor) commented Apr 12, 2016

+1 on chunked uploads. This would be a huge benefit to the CSV upload functionality I'm building into Kibana. Right now I have to make educated guesses about what bulk size will be best for the largest number of users, and it just won't be a good experience for some people. If ES supported chunked uploads, the entire thing could be implemented as one big stream from the user's browser, to Kibana's Node backend, to ES and back.

(elastic/kibana#6541 and elastic/kibana#6844)
