talos_cluster_health apparently requires control_plane_nodes to be IP addresses #143
Comments
I ran into the same issue and was wondering what the best practice is for the endpoint and nodes arguments. In my talosconfig, I use FQDNs (at home) or IPs (in the cloud) as endpoints and hostnames as nodes. It's easier to distinguish nodes by name than to remember IPs, e.g. with talosctl.
I just found out that the IP addresses given in [...]. I can imagine that this might be a bit too strict in some cases. E.g. when control plane nodes fail but etcd still has quorum, this health check will fail and potentially block applying the changes needed to fix the situation. In my case, the health check took forever and failed because I used the control plane's IPv4 addresses while the etcd members use IPv6 addresses. Wouldn't it be more appropriate to relax the check and only verify that etcd has quorum? (This also wouldn't require knowing the exact etcd member IPs.) Or maybe make it optionally possible to check for quorum only (e.g. by not giving control plane IPs or by setting an option)?
The checks are currently designed for full cluster-wide health ("cluster" here does not mean Kubernetes, but the whole Talos cluster). The etcd advertised subnets can be user-defined to specify which addresses to listen on, so this is entirely user-customizable; otherwise Talos just tries to pick a default.
@frezbo I think the problem is that the underlying Talos health check is not flexible enough for multi-homed clusters: it assumes a single IP per node. This could be fixed, of course.
FYI, I was able to work around my issue (control plane using IPv6 addresses not known to Terraform) by using [...]. (Yes, etcd advertised subnets can be configured. I intentionally set them to 2000::/3 on my cloud servers since IPv4 might not always be available on all of them.)
I see an error like the following when using `talos_cluster_health`:

The error message isn't clear, but a little testing shows this is coming from the `control_plane_nodes` value. This requirement is in contrast to other data sources that take `nodes`, like `talos_client_configuration` or `talos_cluster_kubeconfig`.

Additionally, I think examples of exactly how to use `talos_cluster_health` to do what `talos_cluster_kubeconfig.wait` did would be helpful. Replacing an argument with a data source deserves explanation.