Method to remove decommissioned datacenter from catalog #5881
Comments
@robn fully agree on this. Having a way to de-provision/disconnect a remote DC would be a great additional feature.
We already have WIP on force-reaping Serf members, which I think is exactly the same issue as here, just for the WAN pool rather than the LAN pool. There is an internal RFC being worked on currently that should address that. This isn't a dupe, as the use case is different, but hopefully it will be fixed by the same thing as #2981.
This should be resolved with #6582 🤞
I think this issue should be fixed now that #6420 has been merged. Left servers no longer appear, and thus inaccessible datacenters are no longer in the catalog.
Ok, I tested it on master and this is what happened:
Before #6420
Based on your output, yes, that will definitely take care of it. I do not intend to be decommissioning another datacentre within the next five years, if ever, but I am glad to know I will not run into this again! 😉 Thanks!
Feature Description
Some method to remove a decommissioned datacenter from the catalog. I suggest something like `consul operator raft remove-peer` for the WAN pool.
Use Case(s)
When a datacenter is decommissioned, all Consul nodes are shut down, so its server pool is effectively destroyed. In other datacenters, those nodes still appear in the WAN pool in the "left" state, waiting for the `reconnect_timeout_wan` time period to pass, after which they would be cleaned up.
Since those nodes still "exist", the datacenter is still listed in the catalog. Any tools that then want to operate across datacenters (many of our internal tools) pull this list, then try to execute an operation in each datacenter. The operation for the decommissioned datacenter fails with `500 Internal Server Error: No path to datacenter`, which is correct but not helpful, as you can't programmatically tell whether the datacenter is legitimately unavailable or whether there is a genuine quorum loss or something deeper (network damage).
When I decommissioned a datacenter last week, I looked for an equivalent of `consul operator raft remove-peer` for the WAN pool, but didn't find one. Something like that is probably all that's needed, since this is such a rare situation and automatically handling it is likely complicated.
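For anyone hitting the same thing in cross-datacenter tooling, here is a rough Python sketch of the pattern described in the use case: pull the datacenter list from the catalog, try one request per datacenter, and set aside the ones that come back with "No path to datacenter". The agent address, the choice of `/v1/catalog/nodes` as the probe request, and the helper names (`http_get`, `classify`, `survey_datacenters`) are all assumptions for illustration, not something taken from this issue or from Consul itself.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: a local Consul agent listening on the default HTTP port.
CONSUL_ADDR = "http://127.0.0.1:8500"
# Error text quoted in the issue above.
NO_PATH = "No path to datacenter"


def http_get(path):
    """GET a path from the agent and return (status_code, body_text)."""
    try:
        with urllib.request.urlopen(CONSUL_ADDR + path, timeout=5) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # HTTP error responses (such as the 500 above) still carry a body.
        return err.code, err.read().decode()


def classify(status, body):
    """Map one response to 'ok', 'no-path', or 'error'."""
    if status == 200:
        return "ok"
    if status == 500 and NO_PATH in body:
        return "no-path"
    return "error"


def survey_datacenters(get=http_get):
    """Probe every datacenter listed in the catalog.

    `get` is injectable so the logic can be exercised without an agent.
    """
    status, body = get("/v1/catalog/datacenters")
    if status != 200:
        raise RuntimeError("could not list datacenters: HTTP %d" % status)
    results = {}
    for dc in json.loads(body):
        probe = "/v1/catalog/nodes?dc=" + urllib.parse.quote(dc)
        status, body = get(probe)
        results[dc] = classify(status, body)
    return results
```

Note that a `no-path` result only says there is currently no route to that datacenter. As the issue points out, string-matching the error still cannot tell a decommissioned datacenter apart from one that is unreachable for other reasons, so this is a workaround for tooling rather than a fix.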