author     Matt Riedemann <mriedem@us.ibm.com>    2015-04-15 11:51:26 -0700
committer  Matt Riedemann <mriedem@us.ibm.com>    2015-04-16 17:32:53 +0000
commit     b19764d2c6a8160102a806c1d6811c4182a8bac8 (patch)
tree       6a641a9c36bf734b94269f38ae2afc39f1616f3a
parent     22d7547c6b62fb9dabd861e4941edd34eedabfc6 (diff)
compute: stop handling virt lifecycle events in cleanup_host()
When a compute host is rebooted, guest VMs may be shut down automatically
by the hypervisor, and the virt driver sends lifecycle events to the
compute manager to handle. If the compute service is still up while this
happens, it will call the stop API to power off the instance and update
the database to show the instance as stopped.
When the compute service comes back up and the virt driver reports that
the guest VMs are running again, nova sees that the vm_state on the
instance in the nova database is STOPPED and shuts the instance down by
calling the stop API, essentially ignoring the guest VM state that the
virt driver / hypervisor reports.
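
To make that failure mode concrete, here is a minimal, self-contained
sketch of the sync decision described above. It is not Nova's actual
code; the Instance, stop_api and sync_power_state names are illustrative
only. The point is that the database vm_state wins over the power state
reported by the hypervisor, so a running guest gets stopped again.

# Minimal sketch (not Nova's real code) of a DB-over-hypervisor sync.
from dataclasses import dataclass

RUNNING, SHUTDOWN = "running", "shutdown"   # hypervisor power states
ACTIVE, STOPPED = "active", "stopped"       # database vm_states


@dataclass
class Instance:
    uuid: str
    vm_state: str        # what the nova database believes


def stop_api(instance):
    """Stand-in for the compute stop API cast."""
    print("powering off %s" % instance.uuid)


def sync_power_state(instance, hypervisor_state):
    # Pre-fix behavior: if the DB says STOPPED, shut the guest down even
    # though the hypervisor just reported it as running.
    if instance.vm_state == STOPPED and hypervisor_state == RUNNING:
        stop_api(instance)


# After a host reboot the guest is back up, but the DB still says STOPPED
# because the shutdown event was handled while the service was going down.
sync_power_state(Instance("fake-uuid", STOPPED), RUNNING)
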
Alternatively, if the compute service shuts down after changing the
instance task_state to 'powering-off' but before the stop API cast
completes, the instance can be left in a strange vm_state/task_state
combination that requires the admin to manually reset the task_state to
recover the instance.
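
The window that produces that stuck combination can be sketched as
follows. This is a hypothetical simplification, not Nova's real stop
API; the only point it illustrates is that the task_state is recorded
before the asynchronous cast, so a service shutdown in between strands
the instance.

# Hedged sketch: task_state is persisted before the cast, so a crash in
# between leaves vm_state='active' with task_state='powering-off'.
class Instance:
    def __init__(self, uuid):
        self.uuid = uuid
        self.vm_state = "active"
        self.task_state = None


def rpc_cast_stop(instance):
    """Stand-in for the asynchronous stop cast to the compute service."""
    raise RuntimeError("compute service shut down before the cast landed")


def stop(instance):
    instance.task_state = "powering-off"   # recorded first
    rpc_cast_stop(instance)                # may never complete
    # Only the (never-reached) compute side would set vm_state='stopped'
    # and clear task_state again.


inst = Instance("fake-uuid")
try:
    stop(inst)
except RuntimeError:
    pass
# Stuck combination an operator has to reset manually:
print(inst.vm_state, inst.task_state)
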
Let's try to avoid some of this mess by disconnecting the event handling
when the compute service is shutting down, as we already do for neutron
VIF plugging events. Races are still possible if the compute service
shuts down after the hypervisor (e.g. libvirtd), but this is at least a
best effort to mitigate the potential damage.
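
For reference, the fix relies on the virt driver dropping events once no
listener is registered. A simplified sketch of that pattern, assuming the
driver dispatches lifecycle events through a single registered callback;
FakeDriver and its internals are illustrative, not the real driver API.

# Simplified sketch of listener registration/deregistration.
class FakeDriver:
    def __init__(self):
        self._event_callback = None

    def register_event_listener(self, callback):
        # Called with the compute manager's handler on startup and with
        # None from cleanup_host while the service is shutting down.
        self._event_callback = callback

    def emit_event(self, event):
        if self._event_callback is None:
            print("discarding event during shutdown: %s" % event)
            return
        self._event_callback(event)


driver = FakeDriver()
driver.register_event_listener(lambda e: print("handling %s" % e))
driver.emit_event("LIFECYCLE_STOPPED")   # handled normally

driver.register_event_listener(None)     # what cleanup_host now does
driver.emit_event("LIFECYCLE_STOPPED")   # dropped instead of acted upon
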
Closes-Bug: #1444630
Related-Bug: #1293480
Related-Bug: #1408176
Change-Id: I1a321371dff7933cdd11d31d9f9c2a2f850fd8d9
(cherry picked from commit d1fb8d0fbdd6cb95c43b02f754409f1c728e8cd0)
-rw-r--r--  nova/compute/manager.py                      | 1
-rw-r--r--  nova/tests/unit/compute/test_compute_mgr.py  | 4
2 files changed, 5 insertions, 0 deletions
diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index aa56396613..d8a04c9b47 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -1276,6 +1276,7 @@ class ComputeManager(manager.Manager):
         self._update_scheduler_instance_info(context, instances)
 
     def cleanup_host(self):
+        self.driver.register_event_listener(None)
         self.instance_events.cancel_all_events()
         self.driver.cleanup_host(host=self.host)
 
diff --git a/nova/tests/unit/compute/test_compute_mgr.py b/nova/tests/unit/compute/test_compute_mgr.py
index cd30ab0880..0379306e02 100644
--- a/nova/tests/unit/compute/test_compute_mgr.py
+++ b/nova/tests/unit/compute/test_compute_mgr.py
@@ -455,6 +455,10 @@ class ComputeManagerUnitTestCase(test.NoDBTestCase):
         mock_driver.init_host.assert_called_once_with(host='fake-mini')
 
         self.compute.cleanup_host()
+        # register_event_listener is called on startup (init_host) and
+        # in cleanup_host
+        mock_driver.register_event_listener.assert_has_calls([
+            mock.call(self.compute.handle_events), mock.call(None)])
         mock_driver.cleanup_host.assert_called_once_with(host='fake-mini')
 
     def test_init_host_with_deleted_migration(self):