Skip to content
This repository has been archived by the owner on Sep 12, 2024. It is now read-only.

disable GooglePubSubSender if pubsub cannot be reached #1030

Closed
wants to merge 1 commit into from

Conversation

mattnworb
Copy link
Member

Adds a isHealthy() method to the EventSender interface that is used to
filter out any unhealthy senders at startup.

This logic is done in a new class named EventSenderFactory, which
extracts the identical logic for constructing Lists of EventSenders from
the MasterService and the AgentService. To make this common class
possible, also extracted a class for the common configuration fields
between AgentConfig and MasterConfig. This new CommonConfiguration class
is incomplete - more fields can be moved to this class to avoid
duplication, but I left that for future commits.

I created EventSenderFactory in hopes of writing a test for the logic
that unhealthy senders are removed from the List, but the nature of this
class makes this test to impractical - since the
List-of-EventSender-building involves constructing new instances of
KafkaProvider/GooglePubSubProvider/etc and calling methods on instances
that those instances return, which is not really feasible to be mocked
out in a test.

@mattnworb
Copy link
Member Author

@davidxia @rohansingh @lndbrg @zalenski

this aims to avoid issues like googleapis/google-cloud-java#1432

Adds a `isHealthy()` method to the EventSender interface that is used to
filter out any unhealthy senders at startup.

This logic is done in a new class named EventSenderFactory, which
extracts the identical logic for constructing Lists of EventSenders from
the MasterService and the AgentService. To make this common class
possible, also extracted a class for the common configuration fields
between AgentConfig and MasterConfig. This new CommonConfiguration class
is incomplete - more fields can be moved to this class to avoid
duplication, but I left that for future commits.

I created EventSenderFactory in hopes of writing a test for the logic
that unhealthy senders are removed from the List, but the nature of this
class makes this test to impractical - since the
List-of-EventSender-building involves constructing new instances of
KafkaProvider/GooglePubSubProvider/etc and calling methods on instances
that those instances return, which is not really feasible to be mocked
out in a test.
@codecov-io
Copy link

Current coverage is 51.25% (diff: 40.81%)

Merging #1030 into master will increase coverage by 0.04%

@@             master      #1030   diff @@
==========================================
  Files           274        276     +2   
  Lines         13132      13135     +3   
  Methods           0          0          
  Messages          0          0          
  Branches       1700       1700          
==========================================
+ Hits           6725       6733     +8   
+ Misses         5903       5899     -4   
+ Partials        504        503     -1   

Powered by Codecov. Last update 8cd160e...f14c677

@lndbrg
Copy link

lndbrg commented Dec 2, 2016

@mattnworb i believe it would be beneficial to have a cache of senders that expires after n minutes and retries connection, in case the targeted event bus is down intermittently.

@mattnworb
Copy link
Member Author

I think there are two approaches we could take to improve this so that an intermittent network problem at startup does not cause the pubsub sender to be disabled for as long as the agent is up:

  1. Add a periodic out-of-band check (similar to getTopic here) that checks connectivity, and enables/disables the sender based on the result.

  2. Remove the healthcheck-at-startup added here and modify the failure listener in the FutureCallback added to the future returned by pubsub.publishAsync(..) to set a flag that disables sending for some configured period of time. When this time elapses, subsequent publish attempts would be made again.

I am leaning towards 1 as 2 would still cause publishing attempts if the agent had total unconnectivity to the pubsub service, and so the ThresholdBundler will still fill up with requests, but just at a slower rate.

@lndbrg
Copy link

lndbrg commented Dec 2, 2016

1 sounds like a good idea.

public void send(final KafkaRecord kafkaRecord) {
@Override
public boolean isHealthy() {
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to PR, but should we always return true here? Is there value in checking we can talk to kafka by using `KafkaProducer.waitOnMetadata() for example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we don't have any issues with the kafka sender and bad behavior when messages can't be published, I opted to leave that out here to avoid changing anything that is working ok today.

@mattnworb mattnworb closed this Dec 2, 2016
@mattnworb mattnworb deleted the pubsub-healthcheck branch January 12, 2017 20:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants