Add requests-in-flight metric #1539
Conversation
@slaunay this looks amazing! Looking forward to seeing the final results!
Looking into why the build failed, it seems that toxiproxy fails to establish a connection to the target broker. Here are the logs from the failed build:
The failed initial metadata request led to a connection to a different broker. In that test case the metrics are validated against the cluster (i.e. all brokers) and against a specific broker. Looking into that build, it seems that there are plenty of these errors:
My guess is that broker number 1 stopped somehow. Note that this failure is not related to the current PR but is something that probably happens from time to time, so I will submit a separate PR to address it.
👋 I'm about to cut a new release when Kafka 2.4.0 is released in a few days. Any further thoughts on this PR?
You are probably asking the community, but I would love to see this integrated as I would then be able to propose some changes to fully honour `config.MaxOpenRequests`. I believe it will be a great addition to improve performance.
Last build failed with configuration:
It seems that there is a race condition between closing the TCP listener and creating a new one on the same port. Not sure why, though, as there is some synchronization in place. It is possible but unlikely that another test or process has reused that port before it is used by the new one. I have not seen that failure before, but I do not believe it is related to this PR.
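For context, here is a minimal sketch (plain Go, not Sarama test code) of the close-then-re-listen pattern that can hit this kind of race; the fixed port below is purely illustrative:

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Bind a TCP listener to a fixed port (illustrative only).
	ln, err := net.Listen("tcp", "127.0.0.1:9093")
	if err != nil {
		log.Fatal(err)
	}
	ln.Close()

	// Racy: nothing guarantees the port is free again at this exact moment,
	// or that another process has not grabbed it in between.
	ln2, err := net.Listen("tcp", "127.0.0.1:9093")
	if err != nil {
		log.Fatalf("re-listen failed: %v", err)
	}
	defer ln2.Close()
}
```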
It's just a flaky test; I restarted the affected build.
- add requests-in-flight and requests-in-flight-for-broker-<brokerID> metrics to the broker
- move some b.updateOutgoingCommunicationMetrics(...) just after b.write(...) for consistency
- add -max-open-requests flag to kafka-producer-performance
- update kafka-producer-performance to monitor total requests in flight
- update unit tests
- document new metrics

This is a squash of the official PR based on more recent Sarama master: IBM#1539
SGTM, there are some conflicts though.
- add requests-in-flight and requests-in-flight-for-broker-<brokerID> metrics to the broker
- move some b.updateOutgoingCommunicationMetrics(...) just after b.write(...) for consistency
- add -max-open-requests flag to kafka-producer-performance
- update kafka-producer-performance to monitor total requests in flight
- update unit tests
- validate new metric in TestFuncProducing*
- document new metrics
force-pushed from 0acbf44 to 2db8d8a
^ I had to rebase as recent changes (#1573) were indeed conflicting, and I took the opportunity to squash the original three commits into one.
💯
Feature

Add a `requests-in-flight` metric similar to the one described in the Kafka Java client. This is an interesting metric to monitor for optimizing throughput when writing to a Kafka cluster over a high-latency network link, and it is something that is recommended when implementing the Kafka protocol, as described in its network section.

The related configuration in Sarama is `config.MaxOpenRequests`, which defaults to `5` like the Java Kafka client (`max.in.flight.requests.per.connection`). This configuration is actually not fully honoured when using the `AsyncProducer`, more on that below.
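For illustration (this is not code from the PR), here is a minimal sketch of how a Sarama user could set the limit and read the new counter, assuming the metric is registered as a go-metrics `Counter` named `requests-in-flight` in `config.MetricRegistry`; the broker address is a placeholder and `Net.MaxOpenRequests` is the config field referred to above:

```go
package main

import (
	"fmt"
	"log"

	"github.com/Shopify/sarama"
	metrics "github.com/rcrowley/go-metrics"
)

func main() {
	config := sarama.NewConfig()
	// Counterpart of the Java client's max.in.flight.requests.per.connection (default 5).
	config.Net.MaxOpenRequests = 5

	producer, err := sarama.NewAsyncProducer([]string{"localhost:9092"}, config)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// ... produce messages ...

	// The registry holds the broker metrics, including the new counter.
	if m := config.MetricRegistry.Get("requests-in-flight"); m != nil {
		if c, ok := m.(metrics.Counter); ok {
			fmt.Println("requests in flight:", c.Count())
		}
	}
}
```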
Changes

- add `requests-in-flight` and `requests-in-flight-for-broker-<brokerID>` metrics to the broker (sketched below)
- move some `b.updateOutgoingCommunicationMetrics(...)` just after `b.write(...)` for consistency
- add `-max-open-requests` flag to `kafka-producer-performance`
- update `kafka-producer-performance` to monitor total requests in flight
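To make the first two bullets concrete, here is an illustrative sketch (not the actual diff) of the kind of bookkeeping involved: the cluster-wide and per-broker counters are incremented when a request is written and decremented once a response or an error comes back. The function name and callbacks are made up for the example:

```go
package example

import metrics "github.com/rcrowley/go-metrics"

// sendWithInFlightMetrics wraps a request/response cycle with in-flight
// accounting. "total" stands for requests-in-flight and "perBroker" for
// requests-in-flight-for-broker-<brokerID>.
func sendWithInFlightMetrics(
	total, perBroker metrics.Counter,
	send func() error,
	awaitResponse func() error,
) error {
	total.Inc(1)
	perBroker.Inc(1)
	// Always decrement, whether the call ends with a response or an error.
	defer func() {
		total.Dec(1)
		perBroker.Dec(1)
	}()

	if err := send(); err != nil {
		return err
	}
	return awaitResponse()
}
```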
Testing done

Running the updated tests:

Using the updated `kafka-producer-performance` to see how many total requests are in flight when writing to 5 brokers (with a network latency of ~70ms):

Note that the maximum number of requests in flight is `5`, which corresponds to the number of brokers (the topic has 64 partitions spread mostly evenly between those 5 brokers).

And when writing to a single broker/topic partition:
As you can see, the number of requests in flight seems to never go above `1` per broker despite having `config.MaxOpenRequests = 5`.

This can be explained by a few things:

- the metric reported by `kafka-producer-performance` is captured every 5 seconds and there might not be many requests in flight at that time (it could even be `0` when accumulating records or building a `ProduceRequest` prior to sending it)
- produce requests wait for acknowledgement from all in-sync replicas (`-required-acks=-1`)
- the `AsyncProducer` only allows a single `ProduceRequest` in flight at a time to a broker

The last point can be seen when looking at the following section of the `brokerProducer`, where the "bridge" goroutine builds and forwards `ProduceRequest`s one at a time:

https://github.com/Shopify/sarama/blob/675b0b1ff204c259877004140a540d6adf38db17/async_producer.go#L660-L674

and the fact that the `Produce` receiver on a `Broker` blocks until a response is received or an error occurs:

https://github.com/Shopify/sarama/blob/675b0b1ff204c259877004140a540d6adf38db17/broker.go#L336-L355

I have another contribution pending to address that limitation without creating more goroutines, which results in better throughput in such a scenario.

Nevertheless, having such a metric is one way to check how many requests are awaiting a response for a given broker or for the whole cluster.
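For anyone who wants to watch these values the way the updated `kafka-producer-performance` does, here is a small sketch that dumps the counters every few seconds; `monitorInFlight` and the `done` channel are made up for the example, and the registry would be the one from `sarama.NewConfig().MetricRegistry`:

```go
package example

import (
	"log"
	"strings"
	"time"

	metrics "github.com/rcrowley/go-metrics"
)

// monitorInFlight periodically logs the cluster-wide counter and every
// per-broker counter until the done channel is closed.
func monitorInFlight(registry metrics.Registry, interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			registry.Each(func(name string, m interface{}) {
				if !strings.HasPrefix(name, "requests-in-flight") {
					return
				}
				if c, ok := m.(metrics.Counter); ok {
					log.Printf("%s = %d", name, c.Count())
				}
			})
		}
	}
}
```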