Container not restarted on remaining nodes after reboot #1621
Comments
To me there seems to be a race condition when fleet is starting after reboot. It seems to try to start services before all the unit files are loaded. Extracting the relevant parts of the logs from @jeanfabrice - following the grafana.service and sidekick grafana-announce.service.
I see the same behavior with sidekick services where both services are needed. I run the latest stable CoreOS (1122.3.0), which ships with fleet 0.11.7. I followed the guide at https://www.digitalocean.com/community/tutorials/how-to-create-flexible-services-for-a-coreos-cluster-with-fleet-unit-files to set up sidekick services (adjusted with corrected variable names in the X-Fleet sections).
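For readers unfamiliar with the sidekick pattern, here is a minimal sketch of what a sidekick unit for grafana.service could look like. It is not the unit from this report: the etcd key path, TTL, and loop interval are assumptions chosen for illustration. Only the unit names and the use of an X-Fleet section come from the discussion.

```ini
# grafana-announce.service (illustrative sketch, not the reporter's unit)
[Unit]
Description=Announce grafana in etcd
# Stop this sidekick whenever grafana.service stops, and start it after it.
BindsTo=grafana.service
After=grafana.service

[Service]
# Hypothetical key path and TTL; refresh the key before it expires.
ExecStart=/bin/sh -c "while true; do etcdctl set /services/grafana/%H running --ttl 60; sleep 45; done"
ExecStop=/usr/bin/etcdctl rm /services/grafana/%H

[X-Fleet]
# Schedule the sidekick on the same machine as the main unit.
# Older guides use X-ConditionMachineOf, the deprecated name for this option.
MachineOf=grafana.service
```

Because the sidekick binds to grafana.service, systemd needs both unit files on disk when either one is started, which is why the load order after a reboot matters here.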
@jeanfabrice have you started or just loaded grafana-announce.service with fleet, i.e. is the DSTATE for grafana-announce.service loaded or launched when you run
@cskarby Thanks for the heads up. Apart from a workaround, I think fleet should be able to avoid such races. I'll try to look into that.
First we need to check whether it's already fixed. If the issue is still there on the master branch, we need to think about other possibilities.
Hi,
I'm trying to have my containers survive a node failure by being rescheduled and restarted on the surviving nodes of a 3-node CoreOS beta channel cluster (1068.3.0).
I'm facing the following issue when shutting down a cluster member: some containers get restarted, seemingly at random, while others don't. According to the fleet log, the root cause seems to be that the corresponding sidekick unit is not available on disk when fleet decides to start a service unit.
Here is a unit and its sidekick counterpart (I'm using -announce as the suffix for the sidekick unit name):
And here is the fleet log:
I have read many issues about similar behaviours, but all are now closed and seem to be related to fleet v0.9.1.
I'm running the following:
Is there some misconfiguration in my units?