improve rescue of VM when node fail is detected #70


atodorov-storpool
Contributor

  • Introduce new LCM states PROLOG_MIGRATE_UNKNOWN and PROLOG_MIGRATE_UNKNOWN_FAILURE
  • Change VM migrate logic when state is ACTIVE and lcm_state is UNKNOWN to
    call TM's PROLOG_MIGR action before VMM's BOOT

All non-empty core TM drivers skip disks in this case, so there is no impact on
the default behavior.

Datastore addon drivers that implement access to raw block devices should check
whether LCM_STATE == 60 (PROLOG_MIGRATE_UNKNOWN) and, if so, remove block device access
from the failed node and provide access to the current node.
A simple script function, lcm_state, is added to get LCM_STATE. It can be used as follows:

LCM_STATE=$(lcm_state)
if [ "$LCM_STATE" = "60" ]
then
# remove access from SRC
# add access for DST
fi
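
To make the branch above concrete, here is a minimal sketch of how an addon driver might structure the check. It is an illustration under stated assumptions, not code from this patch: `lcm_state` is stubbed here (the real helper reads the state from OpenNebula), and `detach_volume`, `attach_volume`, `rescue_volume` and `FAKE_LCM_STATE` are hypothetical names standing in for whatever the storage backend actually provides.

```bash
#!/usr/bin/env bash

# LCM state number given in this pull request for PROLOG_MIGRATE_UNKNOWN.
PROLOG_MIGRATE_UNKNOWN=60

# ASSUMPTION: stub for the lcm_state helper added by this patch.
# FAKE_LCM_STATE is a hypothetical variable used only to drive the sketch.
lcm_state() {
    echo "${FAKE_LCM_STATE:-60}"
}

# ASSUMPTION: placeholders for storage-specific commands; a real addon
# would call its backend tooling here instead of echoing.
detach_volume() { echo "detach $2 from $1"; }
attach_volume() { echo "attach $2 to $1"; }

# Move block device access from the failed node to the current node,
# but only when the VM is being rescued from an UNKNOWN state.
rescue_volume() {
    local src_host=$1 dst_host=$2 volume=$3

    if [ "$(lcm_state)" = "$PROLOG_MIGRATE_UNKNOWN" ]; then
        detach_volume "$src_host" "$volume"   # remove access from SRC
        attach_volume "$dst_host" "$volume"   # add access for DST
    else
        echo "normal migrate path for $volume"
    fi
}

rescue_volume "failed-node" "current-node" "vm-disk-0"
```

Keeping the state check in one function means the regular migrate path stays untouched for every other LCM state, which matches the intent that default behavior does not change.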

@atodorov-storpool
Contributor Author

This pull request should be linked to feature request #3958, but I messed up the "Pull request" field there.

atodorov-storpool added a commit to OpenNebula/addon-storpool that referenced this pull request Aug 31, 2015
When we are called in the PROLOG_MIGRATE_UNKNOWN LCM state, the SRC_HOST
is most probably DOWN, so run the detach command on the DST host.

Hopefully the proposed patch to call tm/mv before trying to boot a VM
from a failed host will be merged ([pull request #70](OpenNebula/one#70))
so that this feature can be used.
@rsmontero
Member

This is now merged in master. THANKS!

@rsmontero rsmontero closed this Mar 4, 2016
rsmontero pushed a commit that referenced this pull request Jul 2, 2020
rsmontero pushed a commit that referenced this pull request Jan 20, 2021