Share buffer pool across all partitions #310

Merged
3 commits merged into apache:master on Jul 9, 2020

Conversation

Contributor

@merlimat merlimat commented Jul 6, 2020

Motivation

When a producer is publishing on many partitions, maintaining a per-partition buffer pool can add significant memory overhead. Switching to a single shared buffer pool has no significant performance impact.
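
As a rough illustration (not the client's actual internals; `sharedBufferPool`, `getBuffer` and `putBuffer` are made-up names), the idea is that every partition producer borrows from one process-wide sync.Pool instead of keeping a pool of its own:

    package main

    import (
        "fmt"
        "sync"
    )

    // One buffer pool shared by all partition producers, instead of one pool
    // per partition. (Illustrative sketch only; the real client pools its own
    // internal Buffer type.)
    var sharedBufferPool = sync.Pool{
        New: func() interface{} {
            b := make([]byte, 0, 4096) // pre-sized default capacity
            return &b
        },
    }

    func getBuffer() *[]byte  { return sharedBufferPool.Get().(*[]byte) }
    func putBuffer(b *[]byte) { *b = (*b)[:0]; sharedBufferPool.Put(b) }

    func main() {
        // Any partition producer borrows a buffer and returns it once the
        // message has been written to the wire.
        buf := getBuffer()
        *buf = append(*buf, []byte("payload for partition-7")...)
        fmt.Println("buffered", len(*buf), "bytes")
        putBuffer(buf)
    }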

@merlimat merlimat added this to the 0.2.0 milestone Jul 6, 2020
@merlimat merlimat self-assigned this Jul 6, 2020
Member

@wolfstudy wolfstudy left a comment


Cool, the change LGTM +1. My only concern is whether the use of sync.Pool here will cause GC pressure, since sync.Pool cannot be given a size limit and is only subject to the GC threshold.

@@ -62,6 +63,8 @@ func newProducerCommand() *cobra.Command {
"Publish rate. Set to 0 to go unthrottled")
flags.IntVarP(&produceArgs.BatchingTimeMillis, "batching-time", "b", 1,
"Batching grouping time in millis")
flags.IntVarP(&produceArgs.BatchingMaxSize, "batching-max-size", "", 128,
"Max size of a batch (in KB)")
Member

@wolfstudy wolfstudy Jul 7, 2020


In pulsar-perf produce, --batch-max-messages has an abbreviation and a default value as follows. Can we consider keeping the two consistent?

   -bm, --batch-max-messages
       Maximum number of messages per batch
       Default: 1000

Contributor Author


These are 2 different settings. One is the number of messages in a batch and the other is the max size of the batch; see the sketch below.
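
A hedged sketch of the distinction (cobra flags, mirroring the diff above; the --batch-max-messages flag is borrowed from pulsar-perf for illustration and may not exist in this tool): a batch is closed when either cap is reached.

    package main

    import (
        "fmt"

        "github.com/spf13/cobra"
    )

    func main() {
        var maxMessages, maxSizeKB int

        cmd := &cobra.Command{
            Use: "produce",
            Run: func(cmd *cobra.Command, args []string) {
                fmt.Printf("a batch closes at %d messages or %d KB, whichever comes first\n",
                    maxMessages, maxSizeKB)
            },
        }

        flags := cmd.Flags()
        // Count-based cap: how many messages one batch may contain.
        flags.IntVar(&maxMessages, "batch-max-messages", 1000,
            "Maximum number of messages per batch")
        // Size-based cap: how large (in KB) one batch may grow.
        flags.IntVar(&maxSizeKB, "batching-max-size", 128,
            "Max size of a batch (in KB)")

        _ = cmd.Execute()
    }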

Contributor Author

merlimat commented Jul 8, 2020

My only concern is whether the use of sync.Pool here will cause GC pressure, since sync.Pool cannot be given a size limit and is only subject to the GC threshold.

Using the pool will not cause GC pressure by itself: it is actually there to avoid GC pressure.

The pool having no max size is not a big problem: the maximum amount of memory is still bounded by the "pending" messages, whose payloads stay buffered until they are acknowledged by the broker.

This change uses a single pool to avoid the situation where each per-partition pool holds a few buffers that cannot be immediately reused. After this change, the memory usage is the same as without pooling.
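
For what it's worth, a small standard-library sketch (not the client's code) of why the pool itself does not pin memory: sync.Pool holds its objects only weakly, and the garbage collector is free to reclaim them within a cycle or two.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        pool := sync.Pool{
            New: func() interface{} {
                b := make([]byte, 1<<20) // a 1 MiB buffer
                return &b
            },
        }

        // Return a large buffer to the pool...
        pool.Put(pool.Get())

        // ...the GC may drop pooled objects (within a cycle or two), so the
        // pool never pins memory indefinitely; only pending message payloads do.
        runtime.GC()
        runtime.GC()

        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        fmt.Printf("heap in use after GC: %d KiB\n", ms.HeapInuse/1024)
    }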

Member

@wolfstudy wolfstudy left a comment


LGTM +1

@wolfstudy wolfstudy merged commit 9cbe36f into apache:master Jul 9, 2020