Invalid bootstrap servers list blocks during startup #1473
Comments
It will continue to check DNS until the name resolves. This helps maintain continuous performance during temporary network outages. What were you expecting?
…On Thu, Apr 12, 2018, 8:52 AM mmodenesi ***@***.***> wrote:
>>> kafka.KafkaProducer(bootstrap_servers='can-not-resolve:9092')  # never returns
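For illustration only, here is a minimal sketch of the "keep checking DNS until the name resolves" behavior described above. This is not kafka-python's actual code, just the general shape of a lookup retried with a fixed backoff:

```python
import socket
import time

def resolve_with_retry(host, port, backoff_s=1.0):
    """Block until `host` resolves, sleeping between failed attempts."""
    while True:
        try:
            # Resolves to one or more (family, type, proto, canonname, sockaddr) tuples.
            return socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
        except socket.gaierror:
            time.sleep(backoff_s)  # name did not resolve yet; retry after a backoff
```

With a hostname that never resolves, a loop like this never returns, which is the blocking behavior the original report observed during bootstrap.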
I see your point. So this is a good thing when there is a temporary problem and the name will eventually resolve properly again. But what if the provided name is plain wrong (say, for example, you edited some configuration files and misspelled the server FQDN)? I thought "there should be an optional argument here" while looking at line 288 in c0fddbd (the presence of the named argument there made me think I was on the right track), which is called from kafka-python/kafka/client_async.py line 245 in c0fddbd, definitely without any arguments. In my opinion, this should not enter the infinite loop unavoidably. I am only starting with both Kafka and your code, so I don't know the implications of this (maybe this function is called on bootstrapping and then every 10 seconds for metadata refreshing; I really don't know whether what I am asking is way off). This is what I would expect:
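Roughly, the expectation is something like the following sketch, where `resolve_timeout` is a hypothetical keyword (not an actual kafka-python option) that would bound the DNS retries and raise instead of blocking forever:

```python
import kafka

try:
    producer = kafka.KafkaProducer(
        bootstrap_servers='can-not-resolve:9092',
        resolve_timeout=10,  # hypothetical keyword: give up on DNS after 10 seconds
    )
except Exception as exc:
    # Expected behavior: fail with an error once the timeout is exhausted,
    # instead of the constructor never returning.
    print('could not bootstrap:', exc)
```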
The two options here would be: (1) validate that bootstrap_servers resolve via DNS to at least one IP address; (2) validate that bootstrap succeeds and we are able to get initial cluster metadata.
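A minimal sketch of option (1), done on the caller's side before constructing the client; `validate_bootstrap_servers` is an illustrative helper, not library code:

```python
import socket

def validate_bootstrap_servers(servers):
    """Return the subset of 'host:port' entries that resolve via DNS."""
    resolvable = []
    for server in servers:
        host, _, port = server.rpartition(':')
        try:
            socket.getaddrinfo(host, int(port), socket.AF_UNSPEC, socket.SOCK_STREAM)
            resolvable.append(server)
        except socket.gaierror:
            pass  # this entry does not resolve; skip it
    if not resolvable:
        raise ValueError('no bootstrap server resolved: %r' % (servers,))
    return resolvable
```

Option (2) is stricter: it catches not only typos but also reachable-but-misconfigured brokers, at the cost of requiring a full metadata round trip before reporting success.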
I agree with @mmodenesi that this should be configurable, either with a timeout or a max number of retries, and should raise an exception on exhaustion of retries. Ultimately the caller should be able to make a choice about whether to reconnect ad nauseam or handle the exception in a different way. Also, I think the
I've started on these changes here: svvitale@e60129c. @dpkp, let me know if you agree with this approach and I'd be happy to add documentation for these new parameters and file a pull request.
@dpkp, I saw your PR disabled the DNS retries. As @svvitale noted, what do you think about making this configurable? I understand the perspective of folks who want this to fail loudly and immediately in case they typo'd a hostname. However, my day job has a flaky DNS server that's owned by another team, and every few hours it will drop some queries; I'd prefer not to make my kafka-python wrapper more complex just to handle these retries. Furthermore, while ours is worse than most, I would still assert that in all production environments DNS cannot be assumed to be reliable 100% of the time, so adding the option of retries is useful. So would it be possible to add a timeout config that defaults to failing immediately, but could be passed a larger value to keep retrying for a period of time? Or, as @svvitale suggested, it could just inherit this retry timeout value from an existing configuration setting. If you want a PR for this, I'm more than happy to send one, but I wanted to discuss the API first.
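Until such a config exists, one caller-side stopgap is to bound the blocking constructor with a deadline by running it in a daemon thread. This is only a sketch; `create_producer_with_deadline` is a hypothetical wrapper, and the blocked worker thread is abandoned rather than cancelled:

```python
import threading
import kafka

def create_producer_with_deadline(servers, deadline_s=30.0):
    """Build a KafkaProducer, but give up after `deadline_s` seconds."""
    result = {}

    def _build():
        try:
            result['producer'] = kafka.KafkaProducer(bootstrap_servers=servers)
        except Exception as exc:
            result['error'] = exc

    worker = threading.Thread(target=_build, daemon=True)
    worker.start()
    worker.join(timeout=deadline_s)

    if 'producer' in result:
        return result['producer']
    if 'error' in result:
        raise result['error']
    # The constructor is still blocking (e.g. on DNS); the daemon thread is
    # left behind and will not prevent the process from exiting.
    raise RuntimeError('bootstrap did not complete within %.1fs' % deadline_s)
```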
That seems like a good thing to make configurable. To be clear, it currently only affects bootstrapping (once an initial metadata response is received, all future "reconnects" should continue retrying after a backoff even if DNS fails). But I do think folks would prefer the default behavior to be to assume DNS is functioning properly and not retry at all.