-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add configuration settings for PubSub source #374
base: develop
Are you sure you want to change the base?
Conversation
max_outstanding_messages = 2000 | ||
|
||
# Maximum size of unprocessed messages (default 1e9) | ||
max_outstanding_bytes = 2e9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with the go sdk for pubsub, but I am very familiar with the java sdk for pubsub.
In the java sdk, these settings apply per streaming pull. E.g. if the app opens 50 streaming pulls, then your default setting will allow 50 * 2e9
outstanding bytes in total across all streaming pulls, i.e. 100 GB.
In Snowbridge the number of streaming pulls is set on this line (NumGoRoutines
). Snowbridge sets it equal to the value of concurrent writes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment, @istreeter, it made me look a bit deeper.
To start with what this PR does, before exposing these settings, we were implicitly using these defaults from the client library (1000 and 1e9) already. In this PR these are not changed, only made explicit, so that they can be configured.
This particular file shows 2000
and 2e9
as example values, the defaults have been and still are half of that. Still, as example values are valid (based on these recommendations).
Indeed NumGoRoutines
is the number of streaming pull connections maintained by the client.
However as far as i can tell so far, the MaxOutstanding*
settings are meant to throttle the throughput no matter how many goroutines for pulling messages there are.
So my understanding is that these flow control settings do limit the outstanding messages/bytes per subscriber client. For example, if max_outstanding_bytes
is set to 2e9
then, as long as the setting is respected(caveats seem to exist), it will allow up to 2GB of outstanding size across all streaming pulls.
Here is how the client flow control acquires a message. This happens for each message, that was received for each goroutine/streaming pull.
Having said that, i'd be surprised if the java and go clients are different by such a margin in memory settings, which makes me doubt my understanding. Please do share any more pointers!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the java sdk, it also does exactly what you describe if "legacy flow control" is enabled, i.e. it uses a local FlowController
which is shared across all streaming pulls.
But if it's using non-legacy flow control then it's slightly different. In the java sdk if you pick non-legacy flow control, then instead of using the local FlowController
, it instead sets the options .setMaxOutstandingBytes
on the streaming pull request. (Edit: Forgot to add link)
I notice the Go sdk also has an option UseLegacyFlowControl
here on the ReceiveSettings
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate your comment about the purpose of this PR, and how this just makes explicit a setting that was previously non-configurable. That all makes sense 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these comments Ian - very helpful in my current investigation of pubsub issues and how these settings should be configured!
Jira ref: PDP-1480