author     Matt Riedemann <mriedem@us.ibm.com>    2015-04-15 11:51:26 -0700
committer  Matt Riedemann <mriedem@us.ibm.com>    2015-04-16 17:32:53 +0000
commit     b19764d2c6a8160102a806c1d6811c4182a8bac8 (patch)
tree       6a641a9c36bf734b94269f38ae2afc39f1616f3a
parent     22d7547c6b62fb9dabd861e4941edd34eedabfc6 (diff)
compute: stop handling virt lifecycle events in cleanup_host()
When a compute host is rebooted, guest VMs may be shut down automatically
by the hypervisor, and the virt driver sends lifecycle events to the
compute manager to handle. If the compute service is still up while this
happens, it will call the stop API to power off the instance and update
the database to show the instance as stopped.
When the compute service comes back up and the virt driver reports that
the guest VMs are running again, nova sees that the vm_state on the
instance in the nova database is STOPPED and shuts the instance down by
calling the stop API, essentially ignoring the guest VM state that the
virt driver / hypervisor reports.
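
To make that failure mode concrete, here is a minimal, self-contained
sketch of the sync decision described above. It is not Nova's actual
code; the Instance, stop_api and sync_power_state names are illustrative
only. The point is that the database vm_state wins over the power state
reported by the hypervisor, so a running guest gets stopped again.

# Minimal sketch (not Nova's real code) of a DB-over-hypervisor sync.
from dataclasses import dataclass

RUNNING, SHUTDOWN = "running", "shutdown"   # hypervisor power states
ACTIVE, STOPPED = "active", "stopped"       # database vm_states


@dataclass
class Instance:
    uuid: str
    vm_state: str        # what the nova database believes


def stop_api(instance):
    """Stand-in for the compute stop API cast."""
    print("powering off %s" % instance.uuid)


def sync_power_state(instance, hypervisor_state):
    # Pre-fix behavior: if the DB says STOPPED, shut the guest down even
    # though the hypervisor just reported it as running.
    if instance.vm_state == STOPPED and hypervisor_state == RUNNING:
        stop_api(instance)


# After a host reboot the guest is back up, but the DB still says STOPPED
# because the shutdown event was handled while the service was going down.
sync_power_state(Instance("fake-uuid", STOPPED), RUNNING)
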
Alternatively, if the compute service shuts down after changing the
instance task_state to 'powering-off' but before the stop API cast
completes, the instance can be left in a strange vm_state/task_state
combination that requires the admin to manually reset the task_state to
recover the instance.
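
The window that produces that stuck combination can be sketched as
follows. This is a hypothetical simplification, not Nova's real stop
API; the only point it illustrates is that the task_state is recorded
before the asynchronous cast, so a service shutdown in between strands
the instance.

# Hedged sketch: task_state is persisted before the cast, so a crash in
# between leaves vm_state='active' with task_state='powering-off'.
class Instance:
    def __init__(self, uuid):
        self.uuid = uuid
        self.vm_state = "active"
        self.task_state = None


def rpc_cast_stop(instance):
    """Stand-in for the asynchronous stop cast to the compute service."""
    raise RuntimeError("compute service shut down before the cast landed")


def stop(instance):
    instance.task_state = "powering-off"   # recorded first
    rpc_cast_stop(instance)                # may never complete
    # Only the (never-reached) compute side would set vm_state='stopped'
    # and clear task_state again.


inst = Instance("fake-uuid")
try:
    stop(inst)
except RuntimeError:
    pass
# Stuck combination an operator has to reset manually:
print(inst.vm_state, inst.task_state)
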
Let's try to avoid some of this mess by disconnecting the event handling
when the compute service is shutting down, as we already do for neutron
VIF plugging events. Races are still possible if the compute service
shuts down after the hypervisor (e.g. libvirtd), but this is at least a
best effort to mitigate the potential damage.
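
For reference, the fix relies on the virt driver dropping events once no
listener is registered. A simplified sketch of that pattern, assuming the
driver dispatches lifecycle events through a single registered callback;
FakeDriver and its internals are illustrative, not the real driver API.

# Simplified sketch of listener registration/deregistration.
class FakeDriver:
    def __init__(self):
        self._event_callback = None

    def register_event_listener(self, callback):
        # Called with the compute manager's handler on startup and with
        # None from cleanup_host while the service is shutting down.
        self._event_callback = callback

    def emit_event(self, event):
        if self._event_callback is None:
            print("discarding event during shutdown: %s" % event)
            return
        self._event_callback(event)


driver = FakeDriver()
driver.register_event_listener(lambda e: print("handling %s" % e))
driver.emit_event("LIFECYCLE_STOPPED")   # handled normally

driver.register_event_listener(None)     # what cleanup_host now does
driver.emit_event("LIFECYCLE_STOPPED")   # dropped instead of acted upon
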
Closes-Bug: #1444630
Related-Bug: #1293480
Related-Bug: #1408176
Change-Id: I1a321371dff7933cdd11d31d9f9c2a2f850fd8d9
(cherry picked from commit d1fb8d0fbdd6cb95c43b02f754409f1c728e8cd0)
-rw-r--r--  nova/compute/manager.py                      | 1
-rw-r--r--  nova/tests/unit/compute/test_compute_mgr.py  | 4
2 files changed, 5 insertions, 0 deletions
diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index aa56396613..d8a04c9b47 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -1276,6 +1276,7 @@ class ComputeManager(manager.Manager):
         self._update_scheduler_instance_info(context, instances)
 
     def cleanup_host(self):
+        self.driver.register_event_listener(None)
         self.instance_events.cancel_all_events()
         self.driver.cleanup_host(host=self.host)
 
diff --git a/nova/tests/unit/compute/test_compute_mgr.py b/nova/tests/unit/compute/test_compute_mgr.py
index cd30ab0880..0379306e02 100644
--- a/nova/tests/unit/compute/test_compute_mgr.py
+++ b/nova/tests/unit/compute/test_compute_mgr.py
@@ -455,6 +455,10 @@ class ComputeManagerUnitTestCase(test.NoDBTestCase):
         mock_driver.init_host.assert_called_once_with(host='fake-mini')
 
         self.compute.cleanup_host()
+        # register_event_listener is called on startup (init_host) and
+        # in cleanup_host
+        mock_driver.register_event_listener.assert_has_calls([
+            mock.call(self.compute.handle_events), mock.call(None)])
         mock_driver.cleanup_host.assert_called_once_with(host='fake-mini')
 
     def test_init_host_with_deleted_migration(self):