
Streams using Paginators/Kotlin coroutines #206

Closed
mattbdean opened this issue Dec 11, 2017 · 2 comments
@mattbdean
Owner

mattbdean commented Dec 11, 2017

The purpose of this feature is to continuously poll an API endpoint that returns a Listing (for example, /r/{subreddit}/comments), and notify the user when that listing provides new data. This is similar to PRAW's SubredditStream.

A Stream is an object that can be iterated infinitely. Each new model detected is yielded to the Stream's consumer.

Mock usage:

Stream&lt;Comment&gt; stream = redditClient.subreddit("redditdev").commentStream();

for (Comment newComment : stream) {
  // do something
}

Here are a few strategies for managing network requests:

  • Constant rate/no backoff: Each request is sent at a constant, user-configurable rate
  • Exponential backoff: Each request that does not yield new data doubles the delay before the next request. For example, the third consecutive request that yields no new data will be followed by a delay of 2³ = 8 seconds. The maximum delay should be user-configurable.
  • Short-term learned backoff: The first few requests are executed at the maximum OAuth2 rate, 1 req/s. The number of new models per request is tracked. After a user-configurable number of requests, the average number of models per request is used to calculate the new request rate. Each request updates the average and, by extension, the request rate. There should be some user-configurable margin of error on the request rate, so that if there's a 10% increase in comments we won't miss out on all of them. If the request rate proves too low, fall back to exponential backoff.
  • Long-term learned backoff: Like short-term learned backoff, but taking more variables into account, such as the time of day and the day of the week. This is a machine learning problem at its core. The recorded data should be able to be saved/loaded so that when the stream shuts down it doesn't have to build up all of its data again. This option is a bit optimistic and I'm not too sure how much use it'll actually get.
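As a rough sketch of how the exponential-backoff strategy above might compute delays (all names here are hypothetical, not part of any existing JRAW API):

```java
// Hypothetical sketch of the exponential-backoff strategy: each
// consecutive request with no new data doubles the delay, capped at
// a user-configurable maximum. The nth empty response yields a delay
// of base * 2^n, so with a 1-second base the third empty response is
// followed by 2^3 = 8 seconds, matching the description above.
final class ExponentialBackoff {
    private final long baseMillis;   // delay unit (e.g. 1000 ms)
    private final long maxMillis;    // user-configurable ceiling
    private int emptyResponses = 0;  // consecutive requests with no new data

    ExponentialBackoff(long baseMillis, long maxMillis) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    /** Call after each request; pass how many new models it yielded. */
    long nextDelayMillis(int newModels) {
        if (newModels > 0) {
            // New data resets the backoff to the base rate
            emptyResponses = 0;
            return baseMillis;
        }
        emptyResponses++;
        // Cap the shift amount to avoid long overflow on long dry spells
        long delay = baseMillis * (1L << Math.min(emptyResponses, 30));
        return Math.min(delay, maxMillis);
    }
}
```

The constant-rate strategy is just the degenerate case where `nextDelayMillis` always returns `baseMillis`.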

Streams would ideally be implemented using Kotlin coroutines, specifically buildSequence.

A very basic example:

val seq = buildSequence {
  while (true) {
    // drop models we've already yielded
    val data: List<T> = fetchLatestData().filter { /* if we haven't seen this model */ }
    if (data.isEmpty()) {
      // delay the next request (per the chosen backoff strategy)
    } else {
      yieldAll(data)
    }
  }
}
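The "seen this model" filter needs bounded memory for an infinite stream. One possible sketch (the class and method names are hypothetical) is an access-ordered LinkedHashMap that evicts the oldest IDs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: remembers the last `capacity` model IDs so the
// stream can discard models it has already yielded without growing
// memory unboundedly.
final class BoundedSeenSet {
    private final Map<String, Boolean> seen;

    BoundedSeenSet(int capacity) {
        // accessOrder=true makes this an LRU map; removeEldestEntry
        // evicts the least-recently-seen ID once we exceed capacity.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true if the ID was NOT seen before (i.e. the model is new). */
    boolean markSeen(String id) {
        return seen.put(id, Boolean.TRUE) == null;
    }
}
```

The capacity should comfortably exceed one Listing's page size, otherwise old models could be re-yielded after eviction.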

Here is PRAW's implementation for reference.

All this is subject to change, of course. Any feedback is welcome!

Related: #200

@mattbdean mattbdean added this to the v1.1.0 milestone Dec 11, 2017
@mattbdean mattbdean self-assigned this Dec 11, 2017
@mattbdean mattbdean changed the title feat: Streams using Paginators/Kotlin coroutines Streams using Paginators/Kotlin coroutines Dec 11, 2017
@eduard-netsajev
Contributor

Constant rate is a valid strategy as well; at least I haven't had any problems using it for smaller subreddits.

@mattbdean
Owner Author

@eduard-netsajev Agreed, I've added it to the list.
