Add configuration settings for PubSub source #374

adatzer · 2024-10-08T05:29:16Z

istreeter · 2024-10-08T13:20:11Z

assets/docs/configuration/sources/pubsub-full-example.hcl

+    max_outstanding_messages = 2000
+
+    # Maximum size of unprocessed messages (default 1e9)
+    max_outstanding_bytes = 2e9


I am not familiar with the Go SDK for pubsub, but I am very familiar with the Java SDK for pubsub.

In the Java SDK, these settings apply per streaming pull. E.g. if the app opens 50 streaming pulls, then your default setting will allow 50 * 2e9 outstanding bytes in total across all streaming pulls, i.e. 100 GB.

In Snowbridge the number of streaming pulls is set on this line (NumGoRoutines). Snowbridge sets it equal to the value of concurrent writes.
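The arithmetic in the comment above can be checked with a small sketch. It assumes, as described for the Java SDK, that the byte limit applies independently to each streaming pull; the function name `worstCaseOutstandingBytes` is hypothetical and is not part of any Pub/Sub client.

```go
package main

import "fmt"

// worstCaseOutstandingBytes returns the total number of bytes that may be
// outstanding when the byte limit applies independently to each streaming pull.
// Hypothetical helper for illustration only.
func worstCaseOutstandingBytes(streamingPulls int, perPullLimit int64) int64 {
	return int64(streamingPulls) * perPullLimit
}

func main() {
	const perPullLimit = 2_000_000_000 // 2e9, the example value shown in the diff
	total := worstCaseOutstandingBytes(50, perPullLimit)
	fmt.Printf("%d bytes (%d GB)\n", total, total/1_000_000_000)
}
```

If the limit is instead shared by all streaming pulls of one client, the bound is simply the configured value, independent of the number of pulls.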

Thanks for the comment, @istreeter, it made me look a bit deeper.

To start with what this PR does: before exposing these settings, we were already implicitly using the client library's defaults (1000 and 1e9). This PR does not change them; it only makes them explicit so that they can be configured.

This particular file shows 2000 and 2e9 as example values; the defaults have been, and still are, half of that. Still, the example values are valid (based on these recommendations).
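As a rough illustration of "explicit but unchanged" defaults, here is a minimal sketch. The struct `flowControlConfig` and its `withDefaults` method are hypothetical stand-ins, not Snowbridge's actual code; only the values 1000 and 1e9 (defaults) and 2000 and 2e9 (example values) come from the discussion.

```go
package main

import "fmt"

// Defaults quoted in the discussion; previously they were implicit in the client library.
const (
	defaultMaxOutstandingMessages = 1000
	defaultMaxOutstandingBytes    = 1_000_000_000 // 1e9
)

// flowControlConfig is a hypothetical stand-in for the source's flow control settings.
type flowControlConfig struct {
	MaxOutstandingMessages int
	MaxOutstandingBytes    int
}

// withDefaults fills any unset (zero) field with the default,
// so omitting a setting behaves exactly as before.
func (c flowControlConfig) withDefaults() flowControlConfig {
	if c.MaxOutstandingMessages == 0 {
		c.MaxOutstandingMessages = defaultMaxOutstandingMessages
	}
	if c.MaxOutstandingBytes == 0 {
		c.MaxOutstandingBytes = defaultMaxOutstandingBytes
	}
	return c
}

func main() {
	unset := flowControlConfig{}.withDefaults()
	example := flowControlConfig{
		MaxOutstandingMessages: 2000,
		MaxOutstandingBytes:    2_000_000_000, // 2e9, as in the example file
	}.withDefaults()
	fmt.Printf("unset:   %+v\n", unset)
	fmt.Printf("example: %+v\n", example)
}
```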

Indeed NumGoRoutines is the number of streaming pull connections maintained by the client.
However, as far as I can tell so far, the MaxOutstanding* settings are meant to throttle throughput no matter how many goroutines are pulling messages.

So my understanding is that these flow control settings limit the outstanding messages/bytes per subscriber client. For example, if max_outstanding_bytes is set to 2e9 then, as long as the setting is respected (caveats seem to exist), it will allow up to 2 GB of outstanding data across all streaming pulls.

Here is how the client flow control acquires a message. This happens for every message received, on every goroutine/streaming pull.
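To make the "one limit shared by all streaming pulls" reading concrete, here is a minimal sketch of a shared flow controller. It is not the client library's implementation; every name in it (`sharedFlowController`, `acquire`, `release`, `simulate`) is hypothetical, and it assumes no single message is larger than the limit.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedFlowController caps outstanding bytes across every goroutine that uses it.
type sharedFlowController struct {
	mu          sync.Mutex
	cond        *sync.Cond
	maxBytes    int
	outstanding int
	peak        int // highest outstanding value observed
}

func newSharedFlowController(maxBytes int) *sharedFlowController {
	fc := &sharedFlowController{maxBytes: maxBytes}
	fc.cond = sync.NewCond(&fc.mu)
	return fc
}

// acquire blocks until size bytes fit under the shared limit.
// Assumes size <= maxBytes; otherwise it would block forever.
func (fc *sharedFlowController) acquire(size int) {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	for fc.outstanding+size > fc.maxBytes {
		fc.cond.Wait()
	}
	fc.outstanding += size
	if fc.outstanding > fc.peak {
		fc.peak = fc.outstanding
	}
}

// release returns size bytes to the budget and wakes any waiting goroutines.
func (fc *sharedFlowController) release(size int) {
	fc.mu.Lock()
	fc.outstanding -= size
	fc.mu.Unlock()
	fc.cond.Broadcast()
}

// simulate runs `pulls` goroutines, each passing `perPull` messages of msgSize
// bytes through one shared controller, and returns the peak outstanding bytes.
func simulate(pulls, perPull, msgSize, maxBytes int) int {
	fc := newSharedFlowController(maxBytes)
	var wg sync.WaitGroup
	for i := 0; i < pulls; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perPull; j++ {
				fc.acquire(msgSize) // per message, on every goroutine
				fc.release(msgSize) // stands in for "message processed"
			}
		}()
	}
	wg.Wait()
	return fc.peak
}

func main() {
	peak := simulate(8, 100, 10, 25)
	fmt.Printf("peak outstanding bytes: %d (limit 25)\n", peak)
}
```

Because every goroutine acquires from the same controller, the peak never exceeds the limit regardless of how many goroutines are pulling.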

Having said that, I'd be surprised if the Java and Go clients differed by such a margin in memory settings, which makes me doubt my understanding. Please do share any more pointers!

The Java SDK also does exactly what you describe if "legacy flow control" is enabled, i.e. it uses a local FlowController which is shared across all streaming pulls.

But with non-legacy flow control it is slightly different: instead of using the local FlowController, the Java SDK sets the .setMaxOutstandingBytes option on the streaming pull request. (Edit: Forgot to add link)

I notice the Go SDK also has a UseLegacyFlowControl option here on the ReceiveSettings.

I appreciate your comment about the purpose of this PR, and how this just makes explicit a setting that was previously non-configurable. That all makes sense 👍

Thanks for these comments Ian - very helpful in my current investigation of pubsub issues and how these settings should be configured!

Add configuration settings for PubSub source

e5780ea

istreeter reviewed Oct 8, 2024

pondzix approved these changes Oct 9, 2024

adatzer commented Oct 8, 2024 (edited by jira bot)

istreeter Oct 8, 2024

adatzer Oct 8, 2024 (edited)

istreeter Oct 9, 2024 (edited)

istreeter Oct 9, 2024

colmsnowplow Oct 25, 2024
