path: root/nova/virt
* libvirt: Do not reference VIR_ERR_DEVICE_MISSING when libvirt is < v4.1.0 (Lee Yarwood, 2021-01-22; 2 files, -8/+25)
  I7eb86edc130d186a66c04b229d46347ec5c0b625 introduced VIR_ERR_DEVICE_MISSING
  into the hot unplug libvirt error code list within detach_device_with_retry.
  While the change correctly referenced that the error code was introduced in
  v4.1.0, it made no attempt to handle versions prior to this. With
  MIN_LIBVIRT_VERSION currently pinned to v4.0.0 we need to handle
  libvirt < v4.1.0 to avoid referencing the non-existent error code within
  the libvirt module.

  Closes-Bug: #1891547
  Change-Id: I32908b77c18f8ec08211dd67be49bbf903611c34
  (cherry picked from commit bc96af565937072c04dea31781d86d2073b77ed4)
  (cherry picked from commit 3f3b889f4e7e204a140d32d71201c4f23dd54c24)
  (cherry picked from commit c61f4c8e20d712ba84a8965cbe0cba90c7d27d0b)
  (cherry picked from commit 334a479ae2f4ce3d48dcc4c1b9e14d0cb9822272)
  (cherry picked from commit 9c885b67a9e6c30570084f1f78218defa0278d83)
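The version guard this commit describes can be sketched as follows. This is an illustrative sketch, not Nova's actual code: the fake classes stand in for the python libvirt bindings, and the numeric error-code values are illustrative.

```python
# Sketch: build the retryable device-detach error-code list without
# assuming VIR_ERR_DEVICE_MISSING exists (it only appears in libvirt
# >= v4.1.0). The fake classes below stand in for the libvirt module.

class FakeLibvirtOld:
    """Bindings built against libvirt < v4.1.0: no VIR_ERR_DEVICE_MISSING."""
    VIR_ERR_INTERNAL_ERROR = 1
    VIR_ERR_INVALID_ARG = 8
    VIR_ERR_OPERATION_FAILED = 9


class FakeLibvirtNew(FakeLibvirtOld):
    """Bindings built against libvirt >= v4.1.0."""
    VIR_ERR_DEVICE_MISSING = 99


def unplug_error_codes(libvirt_mod):
    """Return the error codes a detach retry loop should tolerate."""
    codes = [
        libvirt_mod.VIR_ERR_OPERATION_FAILED,
        libvirt_mod.VIR_ERR_INTERNAL_ERROR,
        libvirt_mod.VIR_ERR_INVALID_ARG,
    ]
    # getattr() with a default avoids an AttributeError on older bindings.
    missing = getattr(libvirt_mod, 'VIR_ERR_DEVICE_MISSING', None)
    if missing is not None:
        codes.append(missing)
    return codes


old_codes = unplug_error_codes(FakeLibvirtOld())
new_codes = unplug_error_codes(FakeLibvirtNew())
print(old_codes)  # [9, 1, 8]
print(new_codes)  # [9, 1, 8, 99]
```

The key point is that the constant is looked up with `getattr()` rather than referenced directly, so importing or running against libvirt v4.0.x bindings cannot raise AttributeError.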
* libvirt: Handle VIR_ERR_DEVICE_MISSING when detaching devices (Lee Yarwood, 2021-01-21; 1 file, -1/+9)
  Introduced in libvirt v4.1.0 [1], this error code replaces the previously
  raised VIR_ERR_OPERATION_FAILED, VIR_ERR_INTERNAL_ERROR and
  VIR_ERR_INVALID_ARG codes [2][3].

  VIR_ERR_OPERATION_FAILED was introduced and tested as an active/live/hot
  unplug config device detach error code in
  I131aaf28d2f5d5d964d4045e3d7d62207079cfb0.

  VIR_ERR_INTERNAL_ERROR was introduced and tested as an active/live/hot
  unplug config device detach error code in
  I3055cd7641de92ab188de73733ca9288a9ca730a.

  VIR_ERR_INVALID_ARG was introduced and tested as an
  inactive/persistent/cold unplug config device detach error code in
  I09230fc47b0950aa5a3db839a070613c9c817576.

  This change introduces support for the new VIR_ERR_DEVICE_MISSING error
  code while also retaining coverage for these codes until
  MIN_LIBVIRT_VERSION is bumped past v4.1.0.

  The majority of this change is test code motion, with the existing tests
  modified to run against either the active or inactive versions of the
  above error codes for the time being.
  test_detach_device_with_retry_operation_internal and
  test_detach_device_with_retry_invalid_argument_no_live have been removed
  as they duplicate the logic within the now refactored
  _test_detach_device_with_retry_second_detach_failure.
  [1] https://libvirt.org/git/?p=libvirt.git;a=commit;h=bb189c8e8c93f115c13fa3bfffdf64498f3f0ce1
  [2] https://libvirt.org/git/?p=libvirt.git;a=commit;h=126db34a81bc9f9f9710408f88cceaa1e34bbbd7
  [3] https://libvirt.org/git/?p=libvirt.git;a=commit;h=2f54eab7c7c618811de23c60a51e910274cf30de

  Closes-Bug: #1887946
  Change-Id: I7eb86edc130d186a66c04b229d46347ec5c0b625
  (cherry picked from commit 902f09af251d2b2e56fb2f2900a3510baf38a508)
  (cherry picked from commit 93058ae1b8bc1b1728f08b9e606b68318751fc3b)
  (cherry picked from commit 863d6ef7601302901fa3368ea8457b3564eeb501)
  (cherry picked from commit 76428c1a6a7796391957a3e83207f85cfe924505)
  (cherry picked from commit 74b053f47a659a0250d051020d6c8b4e3c256e7d)
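A minimal sketch of treating the new error code as "device already gone" inside a detach retry loop. All names and the error-code value here are local stand-ins, not Nova's real detach_device_with_retry:

```python
# Stand-in value for libvirt >= v4.1.0's VIR_ERR_DEVICE_MISSING.
VIR_ERR_DEVICE_MISSING = 99


class FakeLibvirtError(Exception):
    """Local stand-in for libvirt.libvirtError."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code

    def get_error_code(self):
        return self.code


def detach_until_gone(detach_fn, max_attempts=5):
    """Retry detach_fn until libvirt reports the device missing."""
    for attempt in range(max_attempts):
        try:
            detach_fn()
        except FakeLibvirtError as exc:
            if exc.get_error_code() == VIR_ERR_DEVICE_MISSING:
                # Newer libvirt tells us directly: the device is gone,
                # so the detach has completed. Treat this as success.
                return attempt + 1
            raise
    raise RuntimeError('device still attached after %d attempts'
                       % max_attempts)


calls = []

def fake_detach():
    # Pretend the device disappears on the third detach attempt.
    calls.append(1)
    if len(calls) >= 3:
        raise FakeLibvirtError(VIR_ERR_DEVICE_MISSING)


attempts = detach_until_gone(fake_detach)
print(attempts)  # 3
```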
* libvirt: Remove reference to transient domain when detaching devices (Lee Yarwood, 2021-01-21; 1 file, -6/+8)
  When detaching a device from a domain we first attempt to remove the
  device from both the persistent and live configs before looping to ensure
  the device has really been detached from the running live config.

  Previously, when this failed we logged an error message suggesting that
  the failure was due to issues detaching the device from a transient
  domain, however this is not the case as the domain is persistent. This
  change simply updates the error and associated comments to only reference
  the live config of the domain.

  Additionally, a DEBUG line claiming that a device has been successfully
  detached is now only logged once the device is removed from the live
  config, hopefully avoiding any confusion from this line being logged each
  time an attempt is made to detach the device.

  Change-Id: If869470216600c303d47cf79f12c4fc88abcf813
  (cherry picked from commit 636c7461dee4002571da6e99986eb17e9a28b0f4)
* sync_guest_time: use the proper errno (Chen Hanxiao, 2021-01-21; 1 file, -1/+1)
  In qemuDomainSetTime, VIR_ERR_OPERATION_UNSUPPORTED is used to report
  that QEMU does not support the operation. [1]

  [1]: https://github.com/libvirt/libvirt/blob/228ae70938d0cb85353e35f744fbc494de619481/src/qemu/qemu_driver.c#L19437

  Change-Id: I84ddb9c434625fd4a57a4f54d0856044e1c56f3f
  Signed-off-by: Chen Hanxiao <chenhx@certusnet.com.cn>
  (cherry picked from commit a991471f3e14298a8b32d1b5d566c895cea1c8e4)
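A hedged sketch of what "using the proper errno" looks like from the caller's side: tolerate VIR_ERR_OPERATION_UNSUPPORTED and treat it as "the guest agent cannot do this". The exception class, the numeric code and the function names are local stand-ins for the real libvirt bindings and Nova driver code:

```python
# Illustrative stand-in value; the real constant lives in the libvirt
# bindings.
VIR_ERR_OPERATION_UNSUPPORTED = 84


class FakeLibvirtError(Exception):
    """Local stand-in for libvirt.libvirtError."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code

    def get_error_code(self):
        return self.code


def sync_guest_time(set_time_fn, seconds):
    """Try to set the guest time, skipping quietly when unsupported."""
    try:
        set_time_fn(seconds)
    except FakeLibvirtError as exc:
        if exc.get_error_code() == VIR_ERR_OPERATION_UNSUPPORTED:
            # QEMU (via the guest agent) can't set the time for this
            # guest; that's not an error worth failing the operation for.
            return 'skipped'
        raise
    return 'synced'


def unsupported(_seconds):
    raise FakeLibvirtError(VIR_ERR_OPERATION_UNSUPPORTED)


result_skip = sync_guest_time(unsupported, 12345)
result_ok = sync_guest_time(lambda s: None, 12345)
print(result_skip, result_ok)  # skipped synced
```

Matching on the specific error code (rather than swallowing every libvirtError) is the point of the fix: genuinely unexpected failures still propagate.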
* libvirt: Provide VIR_MIGRATE_PARAM_PERSIST_XML during live migration (Lee Yarwood, 2020-09-18; 1 file, -0/+11)
  The VIR_MIGRATE_PARAM_PERSIST_XML parameter was introduced in libvirt
  v1.3.4 and is used to provide the new persistent configuration for the
  destination during a live migration:

  https://libvirt.org/html/libvirt-libvirt-domain.html#VIR_MIGRATE_PARAM_PERSIST_XML

  Without this parameter the persistent configuration on the destination
  will be the same as the original persistent configuration on the source
  when the VIR_MIGRATE_PERSIST_DEST flag is provided.

  As Nova does not currently provide the VIR_MIGRATE_PARAM_PERSIST_XML
  param but does provide the VIR_MIGRATE_PERSIST_DEST flag, a soft reboot
  by Nova of the instance after a live migration can revert the domain back
  to the original persistent configuration from the source.

  Note that this is only possible in Nova as a soft reboot actually results
  in the virDomainShutdown and virDomainLaunch libvirt APIs being called,
  which recreate the domain using the persistent configuration.
  virDomainReboot does not result in this but is not called at this time.

  The impact of this on the instance after the soft reboot is pretty
  severe: host devices referenced in the original persistent configuration
  on the source may not exist or could even be used by other users on the
  destination. CPU and NUMA affinity could also differ drastically between
  the two hosts, resulting in the instance being unable to start etc.

  As MIN_LIBVIRT_VERSION is now > v1.3.4 this change simply includes the
  VIR_MIGRATE_PARAM_PERSIST_XML param using the same updated XML for the
  destination as is already provided to VIR_MIGRATE_PARAM_DEST_XML.
  Conflicts:
      nova/tests/unit/virt/libvirt/test_driver.py
      nova/tests/unit/virt/test_virt_drivers.py
      nova/virt/libvirt/driver.py
      nova/virt/libvirt/guest.py

  NOTE(lyarwood): Conflicts as If0a091a7441f2c3269148e40ececc3696d69684c
  (libvirt: Bump MIN_{LIBVIRT,QEMU}_VERSION for "Rocky"),
  Id9ee1feeadf612fa79c3d280cee3a614a74a00a7 (libvirt: Remove usage of
  migrateToURI{2} APIs) and I3af68f745ffb23ef2b5407ccec0bebf4b2645734
  (Remove mox in test_virt_drivers.py) are not present on stable/queens.
  As a result we can now add the parameter directly in
  _live_migration_operation before calling down into guest.migrate.

  Co-authored-by: Tadayoshi Hosoya <tad-hosoya@wr.jp.nec.com>
  Closes-Bug: #1890501
  Change-Id: Ia3f1d8e83cbc574ce5cb440032e12bbcb1e10e98
  (cherry picked from commit 1bb8ee95d4c3ddc3f607ac57526b75af1b7fbcff)
  (cherry picked from commit bbf9d1de06e9991acd968fceee899a8df3776d60)
  (cherry picked from commit 6a07edb4b29d8bfb5c86ed14263f7cd7525958c1)
  (cherry picked from commit b9ea91d17703f5b324a50727b6503ace0f4e95eb)
  (cherry picked from commit c438fd9a0eb1903306a53ab44e3ae80660d8a429)
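The fix can be sketched as follows, assuming stand-in names for libvirt's typed migration params: the same updated XML is supplied for both the running destination config and the persistent config, so a later soft reboot on the destination rebuilds the guest from the correct configuration.

```python
# Stand-in keys; the real values are typed-param names defined by the
# libvirt migration API.
VIR_MIGRATE_PARAM_DEST_XML = 'destination_xml'
VIR_MIGRATE_PARAM_PERSIST_XML = 'persistent_xml'


def build_migrate_params(updated_dest_xml):
    """Reuse the updated destination XML for the persistent config too.

    Without the persist entry, VIR_MIGRATE_PERSIST_DEST would copy the
    *source's* old persistent config to the destination, which a soft
    reboot (virDomainShutdown + virDomainLaunch) would then resurrect.
    """
    return {
        VIR_MIGRATE_PARAM_DEST_XML: updated_dest_xml,
        VIR_MIGRATE_PARAM_PERSIST_XML: updated_dest_xml,
    }


params = build_migrate_params('<domain><name>guest</name></domain>')
print(sorted(params))  # ['destination_xml', 'persistent_xml']
```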
* Removed the host FQDN from the exception message (Praharshitha Metla, 2020-09-03; 2 files, -2/+2)
  Deletion of an instance after disabling the hypervisor by a non-admin
  user leaks the host FQDN in the fault message of the instance. Remove the
  'host' field from the error message of HypervisorUnavailable because it
  leaks the host FQDN to non-admin users. The admin user will still see the
  hypervisor unavailable exception message and will be able to figure out
  which compute host the guest is on and that the connection is broken.

  Change-Id: I0eae19399670f59c17c9a1a24e1bfcbf1b514e7b
  Closes-Bug: #1851587
  (cherry picked from commit a89ffab83261060bbb9dedb2b8de6297b2d07efd)
  (cherry picked from commit ff82601204e9d724b3032dc94c49fa5c8de2699b)
  (cherry picked from commit c5abbd17b5552209e53ad61713c4787f47f463c6)
  (cherry picked from commit d5ff9f87c8af335e1f83476319a2540fead5224c)
  (cherry picked from commit 8c4af53d7754737f6857c25820a256487c45e676)
* ironic: add instance_uuid before any other spawn activity (Jim Rollenhagen, 2020-07-28; 2 files, -4/+69)
  In the baremetal use case, it is necessary to allow a remote source of
  truth to assert information about a physical node, as opposed to a client
  asserting desired configuration. As such, the
  prepare_networks_before_block_device_mapping virt driver method was
  added; however, in doing so, network VIF attaching was moved before
  actual hardware reservation. As ironic only allows users to have a number
  of VIFs limited by the number of network interfaces in the physical node,
  VIF attach actions cannot be performed without first asserting control
  over the node.

  Adding an "instance_uuid" upfront allows other ironic API consumers to be
  aware that a node is now in use. Alternatively, it also allows nova to
  become aware, prior to adding network information on the node, that the
  node may already be in use by an external user.

  Co-Authored-By: Julia Kreger <juliaashleykreger@gmail.com>

  Conflicts:
      nova/tests/unit/compute/test_compute_mgr.py

  NOTE(melwitt): The conflict is because change
  I755b6fdddc9d754326cd9c81b6880581641f73e8 is not in Queens.

  Closes-Bug: #1766301
  Change-Id: I87f085589bb663c519650f307f25d087c88bbdb1
  (cherry picked from commit e45c5ec819cfba3a45367bfe7dd853769c80d816)
* Merge "fix scsi disk unit number of the attaching volume when cdrom bus is scsi" into stable/queens (Zuul, 2020-07-21; 1 file, -5/+5)
* fix scsi disk unit number of the attaching volume when cdrom bus is scsi (Kevin Zhao, 2020-07-20; 1 file, -5/+5)
  When the image meta property hw_cdrom_bus=scsi is used together with the
  virtio-scsi model, the cdrom also needs a disk address. So we need to
  calculate the disk address when calling the function that gets the next
  unit of the SCSI controller.

  Closes-Bug: #1867075
  Change-Id: Ifd8b249de3e8f96fa13db252f0abe2b1bd950de0
  Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
  (cherry picked from commit c8d6767cf8baaf3cc81496c83db10c8ae72fce06)
  (cherry picked from commit 11b2b7f0b3a8c09216cd8ebfea8b4cd059605290)
  (cherry picked from commit 86328d11468af66d95587b53ce28f65ed92c46d7)
  (cherry picked from commit 3b2c6ccf261ddb810473954559fa1dd1454e9f09)
* libvirt: Don't delete disks on shared storage during evacuate (Matthew Booth, 2020-07-16; 1 file, -44/+137)
  When evacuating an instance between compute hosts on shared storage,
  during the rebuild operation we call spawn() on the destination compute.
  spawn() currently assumes that it should cleanup all resources on
  failure, which results in user data being deleted in the evacuate case.

  This change modifies spawn in the libvirt driver such that it only cleans
  up resources it created.

  Conflicts:
      nova/virt/libvirt/driver.py

  NOTE(lyarwood): Conflicts due to I51673e58fc8d5f051df911630f6d7a928d123a5b
  ("Revert resize: wait for events according to hybrid plug") not being
  present in stable/queens.

  Co-Authored-By: Lee Yarwood <lyarwood@redhat.com>
  Closes-Bug: #1550919
  Change-Id: I764481966c96a67d993da6e902dc9fc3ad29ee36
  (cherry picked from commit 497360b0ea970f1e68912be8229ef8c3f5454e9e)
  (cherry picked from commit 8b48ca672d9c0eb108c71b7f9f3f089d9ecf688a)
  (cherry picked from commit 1a320f2a0e0918de6afcce5cf23b7de178ec3a49)
  (cherry picked from commit a7d8aa699793e1d60ece4e03920e6041337f9a43)
  (cherry picked from commit ae7602c1206de23439d6c3609b5872831138aa99)
* Merge "libvirt: Fix misleading debug msg "Instance is running"" into stable/queens (Zuul, 2020-07-15; 1 file, -1/+1)
* libvirt: Fix misleading debug msg "Instance is running" (Matthew Booth, 2020-07-15; 1 file, -1/+1)
  We were logging "Instance is running" after guest creation, but before
  the instance is running. The log message we emit when the instance is
  actually running is "Instance spawned successfully".

  Change-Id: I53ef1fb6a612fc55fa60f3a50f8710c8bf5caba4
  (cherry picked from commit 339c29f98051edac3f3ee63f1fd6887c5fc08d2e)
* Merge "Make RBD imagebackend flatten method idempotent" into stable/queens (Zuul, 2020-07-10; 1 file, -1/+11)
* Make RBD imagebackend flatten method idempotent (Vladyslav Drok, 2020-02-19; 1 file, -1/+11)
  If glance and nova are both configured with the RBD backend, but glance
  does not return location information from the API, nova will fail to
  clone the image from the glance pool and will download it from the API.
  In this case, the image will already be flat, and the subsequent flatten
  call will fail. This commit makes the flatten call idempotent, so that it
  ignores already flat images by catching ImageUnacceptable when requesting
  parent info from ceph.

  Closes-Bug: 1860990
  Change-Id: Ia6c184c31a980e4728b7309b2afaec4d9f494ac3
  (cherry picked from commit 65825ebfbd58920adac5e8594891eec8e9cec41f)
  (cherry picked from commit 03d59e289369df4980bc1e7350e7f52a6f6aa828)
  (cherry picked from commit dd3c17216cdf2814cbefc83371c712b3dd9d9147)
  (cherry picked from commit 5d44052fedc9914aed4de4af3dcae4de3a03a856)
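A minimal sketch of the idempotent-flatten pattern, with a local stand-in for Nova's ImageUnacceptable exception and injected callables in place of the real RBD driver methods:

```python
class ImageUnacceptable(Exception):
    """Local stand-in for nova.exception.ImageUnacceptable, raised here
    when Ceph reports the image has no parent."""


def flatten(get_parent_info, do_flatten, volume):
    """Flatten a volume, treating 'already flat' as a successful no-op."""
    try:
        get_parent_info(volume)
    except ImageUnacceptable:
        # No parent means the image is already flat; flattening again
        # would fail, so just return instead of raising.
        return False
    do_flatten(volume)
    return True


flattened = []

def raise_no_parent(vol):
    raise ImageUnacceptable(vol)


result_flat = flatten(raise_no_parent, flattened.append, 'vol1')
result_cloned = flatten(lambda v: ('pool', 'parent'), flattened.append, 'vol2')
print(result_flat, result_cloned, flattened)  # False True ['vol2']
```

Checking for the parent first (and interpreting "no parent" as "nothing to do") is what makes the method safe to call on both cloned and directly downloaded images.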
* Merge "Revert "nova shared storage: rbd is always shared storage"" into stable/queens (Zuul, 2020-05-25; 1 file, -4/+0)
* Revert "nova shared storage: rbd is always shared storage" (hutianhao27, 2020-05-11; 1 file, -4/+0)
  This reverts commit 05b7f63a42e3512da6fe45d2e6639fb47ed8102b.

  The _is_storage_shared_with method is specifically checking if the
  instance directory is shared. It is not checking if the actual instance
  disks are shared, and as a result assumptions cannot be made based on the
  value of images_type.

  Closes-Bug: 1824858
  Change-Id: I52293b6ce3e1ce77fa31b382d0067fb3bc68b21f
  (cherry picked from commit 404932f82130445472837095b3ad9089c75e2660)
  (cherry picked from commit 890882ebbf74db14a7c1904cca96cd7f5907493b)
  (cherry picked from commit a63c97fd2d13de532330523587894163e32a892f)
  (cherry picked from commit 44609c5847dfe591caf3f40497f1abc7c7aa7384)
* Lowercase ironic driver hash ring and ignore case in cache (melanie witt, 2020-05-01; 1 file, -3/+4)
  Recently we had a customer case where attempts to add new ironic nodes to
  an existing undercloud resulted in half of the nodes failing to be
  detected and added to nova. Ironic API returned all of the newly added
  nodes when called by the driver, but half of the nodes were not returned
  to the compute manager by the driver. There was only one nova-compute
  service managing all of the ironic nodes of the all-in-one typical
  undercloud deployment.

  After days of investigation and examination of a database dump from the
  customer, we noticed that at some point the customer had changed the
  hostname of the machine from something containing uppercase letters to
  the same name but all lowercase. The nova-compute service record had the
  mixed case name and the CONF.host (socket.gethostname()) had the
  lowercase name.

  The hash ring logic adds all of the nova-compute service hostnames plus
  CONF.host to the hash ring, then the ironic driver reports only the nodes
  it owns by retrieving a service hostname from the ring based on a hash of
  each ironic node UUID. Because of the machine hostname change, the hash
  ring contained, for example: {'MachineHostName', 'machinehostname'} when
  it should have contained only one hostname. And because the hash ring
  contained two hostnames, the driver was able to retrieve only half of the
  nodes as nodes that it owned. So half of the new nodes were excluded and
  not added as new compute nodes.

  This adds lowercasing of hosts that are added to the hash ring and
  ignores case when comparing the CONF.host to the hash ring members, to
  avoid unnecessary pain and confusion for users that make hostname changes
  that are otherwise functionally harmless.
  This also adds logging of the set of hash ring members at level DEBUG to
  help enable easier debugging of hash ring related situations.

  Closes-Bug: #1866380
  Change-Id: I617fd59de327de05a198f12b75a381f21945afb0
  (cherry picked from commit 7145100ee4e732caa532d614e2149ef2a545287a)
  (cherry picked from commit 588b0484bf6f5fe41514f1428aeaf5613635e35a)
  (cherry picked from commit 8f8667a8dd0e453eaef8f75a3fff25db62d4cc17)
  (cherry picked from commit 019e3da75bc6fb171b32a012ce339075fe690ca7)
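The lowercasing fix can be illustrated with a deliberately simplistic stand-in ring. Note this is not the real implementation: the actual driver uses a proper consistent hash ring, while the class below just hashes into a sorted member list to show how case normalization collapses duplicate members:

```python
import hashlib


class SimpleHashRing:
    """Toy stand-in for a hash ring, to show the case-normalization fix."""

    def __init__(self, hosts):
        # Lowercase members on the way in, so 'MachineHostName' and
        # 'machinehostname' collapse to a single ring member instead of
        # two members that silently split node ownership.
        self.hosts = sorted({h.lower() for h in hosts})

    def get_host(self, key):
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.hosts[digest % len(self.hosts)]

    def contains(self, hostname):
        # Ignore case when comparing a local hostname (e.g. CONF.host)
        # against the ring membership.
        return hostname.lower() in self.hosts


ring = SimpleHashRing(['MachineHostName', 'machinehostname'])
print(ring.hosts)  # ['machinehostname'] - one member, not two
print(ring.contains('MachineHostName'))  # True
```

With a single member, every node UUID hashes to the one nova-compute host, so no nodes are silently assigned to a phantom mixed-case hostname.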
* Include only required fields in ironic node cache (Mark Goddard, 2020-05-01; 1 file, -1/+1)
  The ironic virt driver maintains a cache of ironic nodes to avoid
  continually polling the ironic API. Code paths requiring a specific node
  use a limited set of fields, _NODE_FIELDS, when querying the ironic API
  for the node. This reduces the memory footprint required by the cache,
  and the network traffic required to populate it. However, in most cases
  the cache is populated using a detailed node list operation in
  _refresh_cache(), which includes all node fields.

  This change specifies _NODE_FIELDS in the node list operation in
  _refresh_cache(). We also modify the unit tests to use fake node objects
  that are representative of the nodes in the cache.

  Change-Id: Id96e7e513f469b87992ddae1431cce714e91ed16
  Related-Bug: #1746209
  (cherry picked from commit 8bbad196a7f6a6e2ea093aeee87dfde2154c9358)
* Unplug VIFs as part of cleanup of networks (Stephen Finucane, 2020-03-27; 1 file, -3/+0)
  If an instance fails to build, which is possible for a variety of
  reasons, we may end up in a situation where we have remnants of a plugged
  VIF (typically files) left on the host. This is because we cleanup from
  the neutron perspective but don't attempt to unplug the VIF, a call which
  may have many side-effects depending on the VIF driver. Resolve this by
  always attempting to unplug VIFs as part of the network cleanup. A now
  invalid note is also removed and a unit test corrected.

  Conflicts:
      nova/tests/unit/compute/test_compute_mgr.py

  NOTE(stephenfin): Conflict is because we're missing change
  Ic5cab99944df9e501ba2032eb96911c36304494d ("Port binding based on events
  during live migration") which we don't want to backport.

  Closes-Bug: #1831771
  Related-Bug: #1830081
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
  Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
  (cherry picked from commit b3e14931d6aac6ee5776ce1e6974c75a5a6b1823)
  (cherry picked from commit 3e935325a88bf7a0206ec07bc67383e8be846f15)
  (cherry picked from commit 265fd4f6bd56c711d7827c2defc993e19a541770)
  (cherry picked from commit 85521691a843b9606d4a8aa050f4452ba025eb02)
* libvirt: Ignore DiskNotFound during update_available_resource (Matthew Booth, 2020-03-04; 1 file, -27/+10)
  There was a previous attempt to fix this in change
  Id687e11e235fd6c2f99bb647184310dfdce9a08d. However, there were 2 problems
  with the previous fix:

  1. The handling of missing volumes and disks, while typically having the
     same cause, was inconsistent.
  2. It failed to consider the very wide race opportunity in
     _get_disk_over_committed_size_total between initially fetching the
     instance list from the DB and later getting disk sizes.

  Because _get_disk_over_committed_size_total() can be a very long
  operation, we found that we were reliably hitting this race in CI. It
  might be possible to fix the race, but this would add unnecessary
  complication to code which isn't critical. It's far more robust just to
  log it and ignore it, which is also consistent with the handling of
  missing volumes.

  Closes-Bug: #1774249
  Change-Id: I48719c02713113a41176b8f5cc3c5831f1284a39
  (cherry picked from commit 6198f317be549e6d2bd324a48f226b379556e945)
  (cherry picked from commit 73d9b6e5f622dc645ac6ad322c836ffbe4045072)
  (cherry picked from commit 4700b3658e5983a731d0da259365317e230c4a52)
  (cherry picked from commit 1962633328dc7227dd040c1cf3a9cbe97b36ea37)
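The "log and ignore" approach can be sketched like this, with DiskNotFound as a local stand-in exception and an injected size function in place of the real disk inspection:

```python
class DiskNotFound(Exception):
    """Local stand-in for nova.exception.DiskNotFound."""


def total_disk_over_committed(instances, disk_size_fn):
    """Sum per-instance disk usage, skipping disks that vanished.

    An instance can be deleted or migrated between the DB listing and the
    later disk-size lookup; racing with that is expected, so a missing
    disk is skipped rather than failing the whole periodic task.
    """
    total = 0
    for inst in instances:
        try:
            total += disk_size_fn(inst)
        except DiskNotFound:
            # The next periodic run will see a consistent view; in the
            # real driver this branch also logs the event.
            continue
    return total


sizes = {'a': 10, 'c': 5}

def disk_size(inst):
    if inst not in sizes:
        raise DiskNotFound(inst)
    return sizes[inst]


total = total_disk_over_committed(['a', 'b', 'c'], disk_size)
print(total)  # 15: 'b' disappeared mid-scan and is simply skipped
```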
* Do not update root_device_name during guest config (Alexandre Arents, 2019-12-10; 1 file, -5/+0)
  _get_guest_config() is currently updating instance.root_device_name and
  is called in many ways, like: _hard_reboot(), rescue(), spawn(),
  resume(), finish_migration(), finish_revert_migration().

  It is an issue because root_device_name is initially set during instance
  build and should remain the same after:

  manager.py: _do_build_and_run_instance()
    .. _default_block_device_names()  <- here
    .. driver.spawn()

  This may lead to edge cases, like in rescue, where this value can be
  mistakenly updated to reflect the disk bus property of the rescue image
  (hw_disk_bus). Furthermore, a _get* method should not modify the instance
  object.

  Note that test test_get_guest_config_bug_1118829 is removed because it is
  no longer relevant with the current code.

  Conflicts:
      nova/virt/libvirt/driver.py

  NOTE: conflict is due to small comment removal patch:
  I08916cf57d50f766126a99a479d79a27a1bca36f

  Change-Id: I1787f9717618d0837208844e8065840d30341cf7
  Closes-Bug: #1835926
  (cherry picked from commit 5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b)
  (cherry picked from commit 5e858d0cbd672639318543201e251ed00324a9c2)
  (cherry picked from commit 9f9f8d330a50aec188e55ab8ae921db710e6cc83)
  (cherry picked from commit c075e3a76d07b8d8ccf201756810567ddf04db60)
* libvirt: Ignore volume exceptions during post_live_migration (Lee Yarwood, 2019-11-01; 1 file, -6/+16)
  Previously, errors while disconnecting volumes from the source host
  during post_live_migration within LibvirtDriver would result in the
  overall failure of the migration. This would also mean that while the
  instance would be running on the destination, it would still be listed as
  running on the source within the db.

  This change simply ignores any exceptions raised while attempting to
  disconnect volumes on the source. These errors can be safely ignored as
  they will have no impact on the running instance on the destination. In
  the future Nova could wire up the force and ignore_errors kwargs when
  calling down into the associated os-brick connectors to help avoid this.

  Closes-Bug: #1843639
  Change-Id: Ieff5243854321ec40f642845e87a0faecaca8721
  (cherry picked from commit ac68cffd43a2f5103c28a2d4b31e087c3f5c24b9)
  (cherry picked from commit ff36b6d97ff289ddc34d7776f6a9141b09eb3ad9)
  (cherry picked from commit 022ea2819425b5ab3001791455dda36ed638c22d)
  (cherry picked from commit a07c612ea6fb8553effecef7454caa179589e916)
* Merge "Fixes multi-registry config in Quobyte driver" into stable/queens (Zuul, 2019-10-29; 1 file, -26/+28) [tags: queens-em, 17.0.13]
* Fixes multi-registry config in Quobyte driver (Silvan Kaiser, 2019-10-07; 1 file, -26/+28)
  This closes a bug concerning multi-registry configurations for Quobyte
  volumes by no longer using the is_mounted() method, which failed in that
  case. Besides, this adds exception handling for the unmount call that is
  issued on trying to mount an already mounted volume.

  NOTE: The original commit also added a new feature (fs type based
  validation) which is omitted in this backport.

  Closes-Bug: #1737131
  Change-Id: Ia5a23ce1123a68608ee2ec6f2ac5dca02da67c59
  (cherry picked from commit 05a73c0f3a9f8edf9024f9870279bc6fb7bba2e7)
  (cherry picked from commit 656aa1cd40570154df606484d31616989b5296aa)
  (cherry picked from commit c958ad8a68ea9a8ea465d9e3dd889248d9f42481)
* Merge "Stop sending bad values from libosinfo to libvirt" into stable/queens (Zuul, 2019-10-22; 1 file, -8/+19)
* Stop sending bad values from libosinfo to libvirt (John Garbutt, 2019-10-11; 1 file, -8/+19)
  When we try to use either virtio1.0-block or virtio1.0-net, it is
  correctly rejected by libvirt. We get these returned from libosinfo for
  newer operating systems that support virtio1.0. As we want to support
  libvirts older than 5.2.0, it's best we just request "virtio"; please
  see: https://libvirt.org/formatdomain.html#elementsVirtioTransitional

  You can see virtio1.0-net and virtio1.0-block being added here:
  https://gitlab.com/libosinfo/osinfo-db/blob/master/data/os/fedoraproject.org/fedora-23.xml.in#L31

  Change-Id: I633faae47ad5a33b27f5e2eef6e0107f60335146
  Closes-Bug: #1835400
  (cherry picked from commit 6be668e51992df53a4d871bea70bc738a9beacb8)
  (cherry picked from commit a06922d546a26c9e6550a93cbe0718cf841b6b9f)
  (cherry picked from commit 89d2a764d75aad6876551663e8350d301336eb59)
* Merge "Fix rebuild of baremetal instance when vm_state is ERROR" into stable/queens (Zuul, 2019-10-18; 1 file, -2/+4)
* Fix rebuild of baremetal instance when vm_state is ERROR (Mathieu Gagné, 2019-09-08; 1 file, -2/+4)
  Nova allows rebuild of an instance when vm_state is ERROR. [1] The
  vm_state is restored to ACTIVE only after a successful build. This means
  rebuilding a baremetal instance using the Ironic driver is impossible
  because wait_for_active fails if vm_state=ERROR is found.

  This is a regression introduced in a previous change which added the
  ability to delete an instance in spawning state. [2] This present change
  will skip the abort installation logic if task_state is REBUILD_SPAWNING
  while preserving the previous logic.

  [1] https://bugs.launchpad.net/nova/+bug/1183946
  [2] https://bugs.launchpad.net/nova/+bug/1455000

  Change-Id: I857ad7264f1a7ef1263d8a9d4eca491d6c8dce0f
  Closes-bug: #1735009
  (cherry picked from commit 1819718e798fb904644391badc3beb40c181ac39)
  (cherry picked from commit c21cbf296495b0604cc995d5d17ed164ae8562c5)
  (cherry picked from commit e8f418909eb0f6c319e28d6a1eac0471a0a9cee8)
* Merge "Add functional recreate test for regression bug 1825537" into stable/queens (Zuul, 2019-10-17; 1 file, -0/+7)
* Add functional recreate test for regression bug 1825537 (Matt Riedemann, 2019-08-08; 1 file, -0/+7)
  Change I2d9ab06b485f76550dbbff46f79f40ff4c97d12f in Rocky (and backported
  through to Pike) added error handling to the resize_instance and
  finish_resize methods to revert allocations in placement when a failure
  occurs.

  This is OK for resize_instance, which runs on the source compute, as long
  as the instance.host/node values have not yet been changed to the dest
  host/node before RPC casting to the finish_resize method on the dest
  compute. It's OK because the instance is still on the source compute and
  the DB says so, so any attempt to recover the instance via hard reboot or
  rebuild will be on the source host.

  This is not OK for finish_resize because if we fail there and revert the
  allocations, the instance host/node values are already pointing at the
  dest compute, and by reverting the allocations in placement, placement
  will be incorrectly tracking the instance usage with the old flavor
  against the source node resource provider rather than the new flavor
  against the dest node resource provider - where the instance is actually
  running and the nova DB says the instance lives.

  This change adds a simple functional regression test to recreate the bug
  with a multi-host resize. There is already a same-host resize functional
  test marked here which will need to be fixed as well.

  NOTE(mriedem): The import in the test is changed because
  Ia69fabce8e7fd7de101e291fe133c6f5f5f7056a is not in Queens.

  Change-Id: Ie9e294db7e24d0e3cbe83eee847f0fbfb7478900
  Related-Bug: #1825537
  (cherry picked from commit f4bb67210602914e1b9a678419cf22cfbeaf1431)
  (cherry picked from commit eaa1fc6159ca4437a1e0cbaa77a3da779afb8cb2)
  (cherry picked from commit 9a977cb28c1c4f2e8d950476fa373326f636dfd6)
* Merge "lxc: make use of filter python3 compatible" into stable/queens (Zuul, 2019-10-17; 1 file, -2/+3)
* lxc: make use of filter python3 compatible (Sean Mooney, 2019-10-02; 1 file, -2/+3)
  _detect_nbd_devices uses the filter builtin internally to filter valid
  devices. In Python 2, filter returns a list. In Python 3, filter returns
  an iterable or generator function. This change eagerly converts the
  result of calling filter to a list to preserve the Python 2 behaviour
  under Python 3.

  Closes-Bug: #1840068
  Change-Id: I25616c5761ea625a15d725777ae58175651558f8
  (cherry picked from commit fc9fb383c16ecb98b1b546f21e7fabb5f00a42ac)
  (cherry picked from commit e135afec851e33148644d024a9d78e56f962efd4)
  (cherry picked from commit 944c08ff764c1cb598dbebbad8aa51bbdd0a692c)
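The fix is essentially one list() call. The snippet below (illustrative device names, not the real _detect_nbd_devices body) shows both the eager conversion and why the lazy Python 3 filter object would misbehave if indexed or iterated more than once:

```python
devices = ['nbd0', 'loop0', 'nbd1', 'sr0']

# Python 2 behaviour, preserved on Python 3: a real list you can index,
# measure, and iterate repeatedly.
nbd_list = list(filter(lambda d: d.startswith('nbd'), devices))
print(nbd_list)  # ['nbd0', 'nbd1']

# Without list(), Python 3 returns a one-shot iterator: the second pass
# over it yields nothing, which silently breaks code expecting a list.
lazy = filter(lambda d: d.startswith('nbd'), devices)
first_pass = list(lazy)
second_pass = list(lazy)  # iterator already exhausted
print(first_pass, second_pass)  # ['nbd0', 'nbd1'] []
```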
* Merge "libvirt: Rework 'EBUSY' (SIGKILL) error handling code path" into stable/queens (Zuul, 2019-10-14; 1 file, -11/+34)
* libvirt: Rework 'EBUSY' (SIGKILL) error handling code path (Kashyap Chamarthy, 2019-10-10; 1 file, -11/+34)
  Change ID I128bf6b939 (libvirt: handle code=38 + sigkill (ebusy) in
  _destroy()) handled the case where a QEMU process "refuses to die" within
  a given timeout period set by libvirt. Originally, libvirt sent SIGTERM
  (allowing the process to clean up resources), then waited 10 seconds if
  the guest didn't go away. Then it sent the more lethal SIGKILL and waited
  another 5 seconds for it to take effect.

  From libvirt v4.7.0 onwards, libvirt increased[1][2] the time it waits
  for a guest hard shutdown to complete. It now waits 30 seconds for
  SIGKILL to work (instead of 5). Also, additional wait time is added if
  there are assigned PCI devices, as some of those tend to slow things
  down.

  In this change:

  - Increment the counter to retry the _destroy() call from 3 to 6, thus
    increasing the total time from 15 to 30 seconds before SIGKILL takes
    effect. This matches the (more graceful) behaviour of libvirt v4.7.0.
    It also gives breathing room for Nova instances running in
    environments with large compute nodes with high instance creation or
    delete churn, where the current timeout may not be sufficient.

  - Retry the _destroy() API call _only_ if MIN_LIBVIRT_VERSION is lower
    than 4.7.0.

  [1] https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=9a4e4b9
      (process: wait longer 5->30s on hard shutdown)
  [2] https://libvirt.org/git/?p=libvirt.git;a=commit;h=be2ca04
      ("process: wait longer on kill per assigned Hostdev")

  Conflicts:
      nova/virt/libvirt/driver.py
      (Trivial conflict: Rocky didn't have the QEMU-native TLS feature
      yet.)
  Conflicts (stable/queens):
      nova/tests/unit/virt/libvirt/test_driver.py

  Related-bug: #1353939
  Change-Id: If2035cac931c42c440d61ba97ebc7e9e92141a28
  Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
  (cherry picked from commit 10d50ca4e210039aeae84cb9bd5d18895948af54)
  (cherry picked from commit 75985e25bc147369efb90d4fa9f046631766c14c)
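The version-gated retry policy can be sketched as below. The function name is hypothetical and this is only a sketch of the policy the commit message describes (6 attempts at roughly 5 seconds each, and retrying only on libvirt older than v4.7.0, which waits the full 30 seconds itself):

```python
def max_destroy_attempts(libvirt_version):
    """How many times to retry _destroy() after an EBUSY-style failure.

    libvirt_version is a (major, minor, micro) tuple. From v4.7.0,
    libvirt itself waits ~30s for SIGKILL to take effect, so there is
    nothing to gain from Nova-side retries.
    """
    if libvirt_version >= (4, 7, 0):
        return 1   # single call; libvirt already waits long enough
    # 6 attempts * ~5s SIGKILL wait per attempt ~= 30s total, matching
    # the newer libvirt behaviour (previously 3 attempts ~= 15s).
    return 6


print(max_destroy_attempts((4, 0, 0)))  # 6
print(max_destroy_attempts((4, 7, 0)))  # 1
```

Python's tuple comparison makes the version check concise: `(4, 0, 0) >= (4, 7, 0)` is False because the second elements differ.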
* Merge "libvirt: move checking CONF.my_ip to init_host()" into stable/queens (Zuul, 2019-08-12; 1 file, -5/+9) [tag: 17.0.12]
| * | | libvirt: move checking CONF.my_ip to init_host()Artom Lifshitz2019-07-221-5/+9
Migrations use the libvirt driver's get_host_ip_addr() method to determine the dest_host field of the migration object. get_host_ip_addr() checks whether CONF.my_ip is actually assigned to one of the host's interfaces. It does so by calling get_machine_ips(), which iterates over all of the host's interfaces. If the host has many interfaces, this can take a long time, and introduces needless delays in processing the migration.

get_machine_ips() is only used to print a warning, so this patch moves the get_machine_ips() call into a single check in init_host(). This way, a warning is still emitted at compute service startup, and migration progress is not needlessly slowed down.

NOTE(artom) While the following paragraph still applies, the poison patch will not be backported. Stubbing out use of netifaces.interfaces() is still a good thing to do, however.

This patch also has a chicken-and-egg problem with the patch on top of it, which poisons use of netifaces.interfaces() in tests. While this patch fixes all the tests that break with that poison, it starts breaking different tests because of the move of get_machine_ips() into init_host(). Therefore, while not directly related to the bug, this patch also preventatively mocks or stubs out any use of get_machine_ips() that will get poisoned by the subsequent patch.

(cherry picked from commit 30d8159d4ee51a26a03de1cb134ea64c6c07ffb2)
(cherry picked from commit 560317c766afc4a4c4c5017ecfc9ce432fe63ea7)
(cherry picked from commit 65d2e455e323a627ef228ed57a1f0c86d8252665)

Conflicts: nova/tests/unit/virt/libvirt/fakelibvirt.py
Due to 23fd6c2287f1e68336e7752246999de739b9f7c0 which mocked out get_fs_info() at the same place as this patch mocks out get_machine_ips().
nova/virt/libvirt/driver.py Due to cbc28f0d15287dcf24a07f835210affa41c38993 which added _check_file_backed_memory_support() to init_host() at the same place this patch added _check_my_ip(). Closes-bug: 1837075 Change-Id: I58a4038b04d5a9c28927d914e71609e4deea3d9f
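A minimal sketch of the idea above: scan the host's interfaces once at service startup and keep the migration hot path free of the scan. `check_my_ip`, `get_machine_ips` and `log_warning` are stand-ins here, not Nova's actual signatures.

```python
def check_my_ip(my_ip, get_machine_ips, log_warning):
    """One-time startup check, as done from init_host() in the patch.

    get_machine_ips() is the expensive call that iterates over every
    host interface; it now runs exactly once per service start.
    """
    if my_ip not in get_machine_ips():
        log_warning('my_ip address (%s) was not found on any of the '
                    'interfaces of this host' % my_ip)


def get_host_ip_addr(my_ip):
    # Hot path used by migrations: no interface scan any more, just
    # return the configured address.
    return my_ip
```

The warning behaviour is preserved while get_host_ip_addr() becomes a constant-time lookup regardless of how many interfaces the host has.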
* | | Merge "Fix type error on call to mount device" into stable/queensZuul2019-08-101-1/+1
|\ \ \
| * | | Fix type error on call to mount deviceMiguel Herranz2019-08-091-1/+1
The call in nova.virt.disk.mount.api.Mount.mnt_dev() to nova.privsep.fs.mount() should include the `options` argument to conform to the method signature.

The test test_do_mount_need_to_specify_fs_type has been modified to check that the caller uses the correct signature.

Closes-Bug: 1829506
Change-Id: Id14993db6ea33b2da14caa4b58671fc57c182706
Signed-off-by: Miguel Herranz <miguel@midokura.com>
(cherry picked from commit d2ef1ce309c28a5416af4cc00662ad6925574004)
(cherry picked from commit 8371073ac7a3439ea5b6ce9091a91e77cf88020e)
(cherry picked from commit e3cd1d9baa74181cbac1765291ad67654b35721d)
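The fix amounts to making the call site match the callee's positional signature. The pair below is a simplified stand-in for nova.privsep.fs.mount() and Mount.mnt_dev() (the real helpers run the command under privsep rather than returning it):

```python
def mount(fstype, device, mountpoint, options):
    """Build a mount command; a None options simply adds no flags."""
    cmd = ['mount']
    if fstype:
        cmd += ['-t', fstype]
    if options is not None:
        cmd += options
    cmd += [device, mountpoint]
    return cmd  # the real implementation executes this instead


def mnt_dev(device, mountpoint, fstype=None, options=None):
    # The bug was an equivalent call site passing only three arguments,
    # which raises TypeError against a four-argument signature.
    return mount(fstype, device, mountpoint, options)
```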
* | | libvirt: flatten rbd images when unshelving an instancersritesh2019-07-292-0/+31
Previously attempts to remove the shelved snapshot of an unshelved instance when using the rbd backends for both Nova and Glance would fail. This was due to the instance disk being cloned from, and still referencing, the shelved snapshot image in Glance, blocking any attempt to remove this image later in the unshelve process.

After much debate this change attempts to fix this issue by flattening the instance disk while the instance is being spawned as part of an unshelve. For the rbd imagebackend this removes any reference to the shelved snapshot in Glance, allowing this image to be removed. For all other imagebackends the call to flatten the image is currently a no-op.

Co-Authored-By: Lee Yarwood <lyarwood@redhat.com>
Co-Authored-By: Vladyslav Drok <vdrok@mirantis.com>

NOTE(lyarwood): Test conflicts due to Ie3130e104d7ca80289f1bd9f0fee9a7a198c263c and I407034374fe17c4795762aa32575ba72d3a46fe8 not being present in stable/queens. Note that the latter was backported but then reverted via Ibf2b5eeafd962e93ae4ab6290015d58c33024132, resulting in this conflict.

Conflicts: nova/tests/unit/virt/libvirt/test_driver.py

Closes-Bug: #1653953
Change-Id: If3c9d1de3ce0fe394405bd1e1f0fa08ce2baeda8
(cherry picked from commit d89e7d7857e0ab56c3b088338272c24d0618c07f)
(cherry picked from commit e802ede4b30b21c7590620abc142300a57bcf349)
(cherry picked from commit e93bc57a73d8642012f759a4ffbe5289112ba490)
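The flatten-as-no-op dispatch described above can be modelled roughly as below. The class and attribute names are invented for illustration; Nova's real imagebackend classes are considerably more involved, and rbd flattening happens via librbd, not an attribute assignment.

```python
class Image:
    """Base image backend: flatten() is a deliberate no-op."""

    def flatten(self):
        # Backends whose disks never reference a parent image (qcow2,
        # raw, lvm, ...) have nothing to do here.
        pass


class RbdImage(Image):
    """rbd-backed disk cloned from the shelved snapshot in Glance."""

    def __init__(self):
        # Hypothetical marker for the clone's parent reference.
        self.parent = 'glance-shelved-snapshot'

    def flatten(self):
        # Copy up the parent's data so the Glance snapshot image can
        # be deleted without the clone still referencing it.
        self.parent = None


def spawn_from_unshelve(image):
    image.flatten()  # safe for every backend; only rbd does real work
    return image
```

Calling flatten() unconditionally during spawn keeps the unshelve path backend-agnostic while fixing the rbd-specific deletion failure.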
* | Merge "libvirt: Avoid using os-brick encryptors when device_path isn't ↵Zuul2019-07-031-0/+8
|\ \ | | | | | | | | | provided" into stable/queens
| * | libvirt: Avoid using os-brick encryptors when device_path isn't providedLee Yarwood2019-04-301-0/+8
When disconnecting an encrypted volume the Libvirt driver uses the presence of a Libvirt secret associated with the volume to determine if the new-style native QEMU LUKS decryption or the original decryption method using os-brick encryptors is used.

While this works well in most deployments, some issues have been observed in Kolla based environments where the Libvirt secrets are not fully persisted between host reboots or container upgrades. This can lead to _detach_encryptor attempting to build an encryptor which will fail if the associated connection_info for the volume does not contain a device_path, such as in the case of encrypted rbd volumes.

This change adds a simple conditional to _detach_encryptor to ensure we return when device_path is not present in connection_info and native QEMU LUKS decryption is available. This handles the specific use case where we are certain that the encrypted volume was never decrypted using the os-brick encryptors, as these require a local block device on the compute host and have thus never supported rbd.

It is still safe to build an encryptor and call detach_volume when a device_path is present, however, as change I9f52f89b8466d036 made such calls idempotent within os-brick.

Change-Id: Id670f13a7f197e71c77dc91276fc2fba2fc5f314
Closes-bug: #1821696
(cherry picked from commit 56ca4d32ddf944b541b8a6c46f07275e7d8472bc)
(cherry picked from commit c6432ac0212d15b6d8f1620b42937b2abcb66d46)
(cherry picked from commit 2c6e59e835b123d6040e2a059aaa98bf9cced392)
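The guard added by this commit can be modelled as below. The function and its parameters are illustrative (Nova's _detach_encryptor takes different arguments); only the `connection_info['data']['device_path']` layout follows the convention the message describes.

```python
def detach_encryptor(connection_info, qemu_native_luks_available,
                     build_encryptor):
    """Skip the os-brick encryptor when no device_path exists.

    Without a device_path (e.g. encrypted rbd volumes) the volume can
    never have been attached via an os-brick encryptor, so when native
    QEMU LUKS decryption is available there is nothing to detach.
    """
    connection_data = connection_info.get('data', {})
    if 'device_path' not in connection_data and qemu_native_luks_available:
        return None  # QEMU handled decryption natively; nothing to do
    # Safe even if the volume was never encrypted via os-brick:
    # detach_volume() calls are idempotent there.
    encryptor = build_encryptor(connection_info)
    encryptor.detach_volume()
    return encryptor
```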
* | | Merge "Fix live-migration when glance image deleted" into stable/queensZuul2019-07-021-1/+1
|\ \ \
| * | | Fix live-migration when glance image deletedAlexandre Arents2019-05-301-1/+1
When block live-migration is run on an instance with a deleted glance image, image.cache() is called without specifying the instance disk size parameter, preventing the resize of the disk on the target host.

Change-Id: Id0f05bb1275cc816d98b662820e02eae25dc57a3
Closes-Bug: #1829000
(cherry picked from commit c1782bacd8461bdd8c833792864e61228fa451f1)
(cherry picked from commit b45f47c7577f7b9694ba0a6060312f1a0ec06abd)
(cherry picked from commit e06a66fe9f9cb05c2d05cae336e2676fb5e5f3d2)
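The effect of the missing size argument can be illustrated with a toy cache() helper. This is not the real imagebackend.Image.cache() signature; `fetch` and `resize_to` are invented callbacks standing in for fetching the backing file and growing it.

```python
def cache(fetch, resize_to, size=None):
    """Fetch a backing disk and optionally grow it to the flavor size.

    When the caller omits size -- as the block-live-migration path did
    for instances with a deleted Glance image -- the disk on the
    destination host keeps its original (too small) virtual size.
    """
    disk = fetch()
    if size is not None and size > disk['virtual_size']:
        resize_to(disk, size)
    return disk
```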
* | | Merge "Include all network devices in nova diagnostics" into stable/queensZuul2019-07-011-18/+27
|\ \ \
| * | | Include all network devices in nova diagnosticsFrancois Palin2019-06-181-18/+27
get_instance_diagnostics expected all interfaces to have a <target> element with a "dev" attribute in the instance XML. This is not the case for VFIO interfaces (<interface type="hostdev">). This caused an IndexError when looping over the interfaces.

This patch fixes this issue by retrieving interfaces data directly from the guest XML and adding nics appropriately to the diagnostics object.

The new functional test has been left out of this cherry-pick, since a lot of the test code that supports the test is missing and would have to be back-ported just for that one test, including a ramification of other commit dependencies. The functional code change itself is rather simple, and not having this functional test present in Queens is considered to be low risk.

Change-Id: I8ef852d449e9e637d45e4ac92ffc5d1abd8d31c5
Closes-Bug: #1821798
(cherry picked from commit ab7c968b6f66404c032f62a952e353f94d3be165)
(cherry picked from commit 1d4f64b190afc60b0c2a56de718209869c41cfb3)
(cherry picked from commit 19ca8bcc2232e1d81efc349948a21cc1c3fc811d)
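Reading NIC data straight from the guest XML, tolerating interfaces without a <target> element, can be sketched with ElementTree. `get_nic_diagnostics` is an invented helper, not Nova's method, and the returned dict is a simplification of the diagnostics object.

```python
import xml.etree.ElementTree as ET


def get_nic_diagnostics(domain_xml):
    """Collect per-interface data without assuming <target dev=...>.

    VFIO interfaces (<interface type='hostdev'>) carry a MAC but no
    target device, which previously triggered an IndexError.
    """
    nics = []
    for iface in ET.fromstring(domain_xml).findall('./devices/interface'):
        mac = iface.find('mac')
        target = iface.find('target')
        nics.append({
            'mac_address': mac.get('address') if mac is not None else None,
            # None instead of a crash for hostdev interfaces
            'dev': target.get('dev') if target is not None else None,
        })
    return nics
```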
* | | Merge "Fix {min|max}_version in ironic Adapter setup" into stable/queensZuul2019-07-011-1/+7
|\ \ \
| * | | Fix {min|max}_version in ironic Adapter setupEric Fried2019-04-301-1/+7
Change If625411f40be0ba642baeb02950f568f43673655 introduced nova.utils.get_ksa_adapter, which accepts min_version and max_version kwargs to be passed through to the ksa Adapter constructor. These are supposed to represent minimum and maximum *major* API versions, but min_version was erroneously set to a *microversion* when setting up the Adapter for ironicclient. This commit changes it to a major version. (Microversion negotiation is done within ironicclient itself.)

Also, this bug went latent for several releases because a) it only seems to be triggered when region_name is given in the conf; but also b) ironicclient has code to discover a reasonable endpoint if passed None. So this change also adds a warning log if we try and fail to discover the endpoint via ksa.

Conflicts: nova/tests/unit/virt/ironic/test_client_wrapper.py
This was just because the old microversion was 1.37 instead of 1.38. The patch still changes it to 1.0.

Change-Id: I34a3f8d4a496217eb01790e2d124111625bf5f85
Closes-Bug: #1825583
(cherry picked from commit 13278be9f265e237fc68ee60acfacaa1df68522e)
(cherry picked from commit 35bda4ec385e4c2b3d4cee07467f5077b13b1dd9)
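A sketch of the two fixes together, assuming invented stand-ins: `make_adapter` plays the role of nova.utils.get_ksa_adapter, `log_warning` that of LOG.warning, and the kwargs shown are a simplification of the real call.

```python
def build_ironic_adapter(make_adapter):
    """Pass a *major* API version range to the adapter.

    Before the fix the equivalent call passed a microversion (e.g.
    '1.37') as min_version; microversion negotiation belongs inside
    ironicclient, not the keystoneauth Adapter.
    """
    return make_adapter(service_type='baremetal',
                        min_version='1.0', max_version=None)


def get_ironic_endpoint(adapter, log_warning):
    """Warn when ksa endpoint discovery comes up empty.

    ironicclient can still discover a reasonable endpoint itself, which
    is why this bug stayed latent for several releases.
    """
    endpoint = adapter.get_endpoint()
    if endpoint is None:
        log_warning('Could not discover an ironic endpoint via '
                    'keystoneauth; falling back to ironicclient discovery.')
    return endpoint
```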
* | | Merge "xenapi/agent: Change openssl error handling" into stable/queensZuul2019-07-011-5/+13
|\ \ \
| * | | xenapi/agent: Change openssl error handlingCorey Bryant2019-04-301-5/+13
Prior to this patch, if the openssl command returned a zero exit code and wrote details to stderr, nova would raise a RuntimeError exception. This patch changes the behavior to only raise a RuntimeError exception when openssl returns a non-zero exit code. Regardless of the exit code, a warning will always be logged with stderr details if stderr is not None. Note that processutils.execute will now raise a processutils.ProcessExecutionError exception for any non-zero exit code since we are passing check_exit_code=True, which we convert to a RuntimeError.

Thanks to Dimitri John Ledkov <xnox@ubuntu.com> and Eric Fried <openstack@fried.cc> for helping with this patch.

Conflicts: nova/virt/xenapi/agent.py
NOTE(coreycb): The conflict is due to Ibe2f478288db42f8168b52dfc14d85ab92ace74b not being in stable/queens.

Change-Id: I212ac2b5ccd93e00adb7b9fe102fcb70857c6073
Partial-Bug: #1771506
(cherry picked from commit 1da71fa4ab1d7d0f580cd5cbc97f2dfd2e1c378a)
(cherry picked from commit 64793cf6f77c5ba7c9ea51662d936c7545ffce8c)
(cherry picked from commit 82de38ad4ce86c5398538a8635713a86407216d0)
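The new behaviour can be modelled as: warn on any stderr, raise only on non-zero exit. `run_openssl` and its `execute` callback (returning `(stdout, stderr, returncode)`) are assumptions of this sketch, not Nova's agent code, where the equivalent raising is done by processutils.execute itself via check_exit_code=True.

```python
def run_openssl(execute, args, log_warning):
    """Run openssl via execute(); warn on stderr, raise on failure.

    openssl routinely writes informational output to stderr even on
    success, so stderr alone is no longer treated as an error.
    """
    stdout, stderr, returncode = execute(args)
    if stderr:
        log_warning('openssl stderr: %s' % stderr)
    if returncode != 0:
        # mirrors converting ProcessExecutionError into a RuntimeError
        raise RuntimeError('openssl failed with exit code %d' % returncode)
    return stdout
```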
* | | libvirt: Do not reraise DiskNotFound exceptions during resizejichenjc2019-06-281-3/+9
When an instance has VERIFY_RESIZE status, the instance disk on the source compute host has moved to the <instance_path>/<instance_uuid>_resize folder, which leads to disk-not-found errors if the update available resource periodic task on the source compute runs before the resize is actually confirmed.

Icec2769bf42455853cbe686fb30fda73df791b25 almost fixed this issue, but it only sets reraise to False when task_state is not None, which isn't the case when an instance is resized but the resize is not yet confirmed. This patch adds a condition based on vm_state to ensure we don't reraise DiskNotFound exceptions while the resize is not confirmed.

Closes-Bug: 1774249
Co-Authored-By: Vladyslav Drok <vdrok@mirantis.com>
Change-Id: Id687e11e235fd6c2f99bb647184310dfdce9a08d
(cherry picked from commit 966192704c20d1b4e9faf384c8dafac8ea6e06ea)
(cherry picked from commit f1280ab849d20819791f7c4030f570a917d3e91d)
(cherry picked from commit fd5c45473823105d8572d7940980163c6f09169c)
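The added condition reduces to a small predicate. The helper name and string constants below are illustrative only (Nova uses vm_states.RESIZED, which corresponds to the user-facing VERIFY_RESIZE status):

```python
RESIZED = 'resized'  # vm_state while a resize awaits confirmation


def should_reraise_disk_not_found(task_state, vm_state):
    """Decide whether a DiskNotFound during the periodic task is real.

    While a resize awaits confirmation the disk legitimately lives in
    <instance_path>/<instance_uuid>_resize, so DiskNotFound is expected
    and must not be reraised.
    """
    if task_state is not None:
        return False  # pre-existing check from Icec2769bf4...
    if vm_state == RESIZED:
        return False  # new check: resize not yet confirmed
    return True
```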