Commit message  Author  Age  Files  Lines
* Merge "libvirt: Skip fetching the virtual size of block devices" into stable/ocata  [15.1.3]  Zuul  2018-06-07  2  -3/+57
|\
| * libvirt: Skip fetching the virtual size of block devicesLee Yarwood2018-06-012-3/+57
| | In this latest episode of `Which CI job has lyarwood broken today?!`
| | we find that I464bc2b88123a012cd12213beac4b572c3c20a56 introduced a
| | regression in the nova-lvm experimental job as n-cpu attempted to run
| | qemu-img info against block devices as an unprivileged user.
| |
| | For the time being we should skip any attempt to use this command
| | against block devices until the disk_api layer can make privileged
| | calls using privsep.
| |
| | Conflicts:
| |     nova/virt/libvirt/driver.py
| |     nova/tests/unit/virt/libvirt/test_driver.py
| |
| | NOTE(lyarwood): Conflicts due to the substantial refactoring of
| | _get_instance_disk_info via I9616a602ee0605f7f1dc1f47b6416f01895e025b,
| | for this change the test has been extended to provide valid XML via
| | the config classes.
| |
| | Closes-bug: #1771700
| | Change-Id: I9653f81ec716f80eb638810f65e2d3cdfeedaa22
| | (cherry picked from commit fda48219a378d09a9a363078ba161d7f54e32c0a)
| | (cherry picked from commit 8ea98c56b647526aae7a786531e934eeee7a90a2)
| | (cherry picked from commit 43cac615f6a0a4399c7bf3dda6c2595749f27ace)
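The skip described in this message can be sketched roughly as follows. This is a minimal illustration, not the code from the change: `is_block_device`, `get_virtual_size` and the `query_image_info` callable are hypothetical names standing in for the driver's own helpers.

```python
import os
import stat


def is_block_device(path):
    """Return True if path exists and is a block device node."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        # A missing or unreadable path is simply "not a block device" here.
        return False


def get_virtual_size(path, query_image_info):
    """Return the virtual size for path, or None for block devices.

    query_image_info is a hypothetical callable standing in for an
    unprivileged `qemu-img info` wrapper.
    """
    if is_block_device(path):
        # Skip the unprivileged call entirely for block devices.
        return None
    return query_image_info(path)
```

Returning `None` for block devices is an assumption made for the sketch; the caller would need to decide what size to report in that case.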
* | Merge "libvirt: handle DiskNotFound during update_available_resource" into stable/ocata  Zuul  2018-06-05  4  -2/+87
|\ \
| |/
| * libvirt: handle DiskNotFound during update_available_resourceMatt Riedemann2018-06-014-2/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The update_available_resource periodic task in the compute manager eventually calls through to the resource tracker and virt driver get_available_resource method, which gets the guests running on the hypervisor, and builds up a set of information about the host. This includes disk information for the active domains. However, the periodic task can race with instances being deleted concurrently and the hypervisor can report the domain but the driver has already deleted the backing files as part of deleting the instance, and this leads to failures when running "qemu-img info" on the disk path which is now gone. When that happens, the entire periodic update fails. This change simply tries to detect the specific failure from 'qemu-img info' and translate it into a DiskNotFound exception which the driver can handle. In this case, if the associated instance is undergoing a task state transition such as moving to another host or being deleted, we log a message and continue. If the instance is in steady state (task_state is not set), then we consider it a failure and re-raise it up. Note that we could add the deleted=False filter to the instance query in _get_disk_over_committed_size_total but that doesn't help us in this case because the hypervisor says the domain is still active and the instance is not actually considered deleted in the DB yet. 
Conflicts: nova/virt/libvirt/driver.py nova/tests/unit/virt/libvirt/test_driver.py NOTE(lyarwood): Conflicts due to the substantial refactoring of _get_instance_disk_info via I9616a602ee0605f7f1dc1f47b6416f01895e025b and removal of _LW etc during Pike, Change-Id: Icec2769bf42455853cbe686fb30fda73df791b25 Closes-Bug: #1662867 (cherry picked from commit 5f16e714f58336344752305f94451e7c7c55742c) (cherry picked from commit 5a4c6913a37f912489543abd5e12a54feeeb89e2) (cherry picked from commit d251b95083731829ba104dc5c7f642dd5097d510)
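The handling described above (skip instances that are mid-transition, re-raise for instances in steady state) might look roughly like this. It is a sketch under stated assumptions: the `DiskNotFound` class below is a local stand-in, instances are plain dicts, and `get_over_committed` is a hypothetical callable rather than the driver's real method.

```python
import logging

LOG = logging.getLogger(__name__)


class DiskNotFound(Exception):
    """Local stand-in for the exception named in the commit message."""


def total_over_committed(instances, get_over_committed):
    """Sum per-instance over-commit, tolerating disks deleted mid-task."""
    total = 0
    for inst in instances:
        try:
            total += get_over_committed(inst)
        except DiskNotFound:
            if inst.get("task_state") is not None:
                # The instance is being moved or deleted: log and continue.
                LOG.info("Disk for %s is gone during task_state %s; skipping",
                         inst.get("name"), inst["task_state"])
                continue
            # Steady state (task_state not set): treat as a real failure.
            raise
    return total
```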
* | Merge "libvirt: Report the virtual size of RAW disks" into stable/ocataZuul2018-06-052-40/+32
|\ \ | |/
| * libvirt: Report the virtual size of RAW disksLee Yarwood2018-05-152-40/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If642e51a4e186833349a8e30b04224a3687f5594 started to correctly report the actual size of preallocated file based disks but missed that this value was later used as the virtual disk size for RAW disks. This is an issue as nova.virt.libvirt.utils.create_image creates these disks as sparse files with a reported actual size much smaller than the virtual size. During block based LM this then results in disks on the destination being created with a much smaller virtual size than the disk should have leading to errors during the transfer. Conflicts: nova/virt/libvirt/driver.py NOTE(mriedem): The conflict is due to not having change I9616a602ee0605f7f1dc1f47b6416f01895e025b in Ocata. Closes-Bug: #1770640 Change-Id: I464bc2b88123a012cd12213beac4b572c3c20a56 (cherry picked from commit 016986f4706e881fed16c85c8790af4d26a7c351) (cherry picked from commit 2dc9795a0df40c51ef3c79f99e57725db99019c0) (cherry picked from commit 938c0a745325fa73d098c6d5ddd20b2a599f9624)
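To illustrate why the actual size cannot stand in for the virtual size of a sparse RAW file, the following sketch compares the two numbers for a sparse file. The helper names are hypothetical, and `st_blocks` is assumed to be available (POSIX systems), counted in 512-byte units.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent_size, allocated_size) in bytes for a file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512


def make_sparse_file(size):
    """Create a sparse temporary file of the given length; return its path."""
    fd, path = tempfile.mkstemp(suffix=".img")
    try:
        os.ftruncate(fd, size)  # extends the file without writing any data
    finally:
        os.close(fd)
    return path
```

For a RAW image the apparent length is what a guest would see as the disk size, while the allocated figure is typically far smaller for a freshly created sparse file. The exact allocated value depends on the filesystem.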
* | Merge "Avoid showing password in log" into stable/ocataZuul2018-05-311-8/+11
|\ \
| * | Avoid showing password in logjichenjc2018-05-291-8/+11
| |/
| | As reported in the bug, the password is shown in the log.
| | https://github.com/openstack/oslo.utils/blob/master/oslo_utils/strutils.py#L295
| | indicates that auth_password can be masked with the mask_password
| | method.
| |
| | Conflicts:
| |     nova/compute/manager.py
| |
| | NOTE(lyarwood): Conflicts caused by
| | Ica323b87fa85a454fca9d46ada3677f18fe50022 and
| | Ifc01dbf98545104c998ab96f65ff8623a6db0f28 not being present in Pike.
| | Additionally If12e7860baad2899380f06144a0270784a5466b8 was not present
| | in Queens but landed in Pike and Ocata as a stable only change.
| |
| | Change-Id: I725eea1866642b40cc6b065ed0e8aefb91ca2889
| | Closes-Bug: 1761054
| | (cherry picked from commit 1b61d6c08c7c86834acab45320230824b88d529c)
| | (cherry picked from commit df90dfd5cdf76c65b8d8a539d79e384c82c8428c)
| | (cherry picked from commit 978066fe31a5331f143a05e1fd753c729b2dcf09)
* | Merge "Add ssbd and virt-ssbd flags to cpu_model_extra_flags whitelist" into stable/ocata  Zuul  2018-05-31  2  -7/+28
|\ \
| * | Add ssbd and virt-ssbd flags to cpu_model_extra_flags whitelistDan Smith2018-05-252-7/+28
| |/ | | | | | | | | | | | | | | | | | | This adds two other flags to the whitelist of available options to the cpu_model_extra_flags variable related to further variants of Meltdown/Spectre recently published. Related-Bug: #1750829 Change-Id: I72085016c8756ff88a4da722368f62359bcd7080 (cherry picked from commit a27ea0f9100d0061c1cf3b20407095d3cd04df26)
* | Merge "Fix shelving a paused instance" into stable/ocataZuul2018-05-312-6/+19
|\ \ | |/ |/|
| * Fix shelving a paused instanceLeopardMa2018-05-202-6/+19
| | It is possible to shelve a paused instance, but in that case the
| | guest is already shut down, and some hypervisors will fail when
| | trying to perform a clean shutdown of a non-running guest. For
| | example, attempting to shelve a paused libvirt instance will result
| | in this error:
| |
| |     libvirtError: Requested operation is not valid: domain is not running
| |
| | Therefore, if the instance is paused, we don't attempt a clean
| | shutdown while shelving.
| |
| | Related Tempest test: https://review.openstack.org/564127/
| |
| | Closes-Bug: #1745529
| | Change-Id: I8ca25d9847d50001fbe8513a6c1dba8b697c24e4
| | (cherry picked from commit d5901f613cf98f61b5253a1568b22af1d9dd1a08)
* | Merge "libvirt: check image type before removing snapshots in _cleanup_resize" into stable/ocata  [15.1.2]  Zuul  2018-05-12  2  -32/+25
|\ \
| * | libvirt: check image type before removing snapshots in _cleanup_resizeMatt Riedemann2018-05-102-32/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change Ic683f83e428106df64be42287e2c5f3b40e73da4 added some disk cleanup logic to _cleanup_resize because some image backends (Qcow2, Flat and Ploop) will re-create the instance directory and disk.info file when initializing the image backend object. However, that change did not take into account volume-backed instances being resized will not have a root disk *and* if the local disk is shared storage, removing the instance directory effectively deletes the instance files, like the console.log, on the destination host as well. Change I29fac80d08baf64bf69e54cf673e55123174de2a was made to resolve that issue. However (see the pattern?), if you're doing a resize of a volume-backed instance that is not on shared storage, we won't remove the instance directory from the source host in _cleanup_resize. If the admin then later tries to live migrate the instance back to that host, it will fail with DestinationDiskExists in the pre_live_migration() method. This change is essentially a revert of I29fac80d08baf64bf69e54cf673e55123174de2a and alternate fix for Ic683f83e428106df64be42287e2c5f3b40e73da4. Since the root problem is that creating certain imagebackend objects will recreate the instance directory and disk.info on the source host, we simply need to avoid creating the imagebackend object. The only reason we are getting an imagebackend object in _cleanup_resize is to remove image snapshot clones, which is only implemented by the Rbd image backend. Therefore, we can check to see if the image type supports clones and if not, don't go through the imagebackend init routine that, for some, will recreate the disk. 
Change-Id: Ib10081150e125961cba19cfa821bddfac4614408 Closes-Bug: #1769131 Related-Bug: #1666831 Related-Bug: #1728603 (cherry picked from commit 8e3385707cb1ced55cd12b1314d8c0b68d354c38) (cherry picked from commit 174764340d3c965d31143b39af4ab2e8ecefe594) (cherry picked from commit c72a0a7665e96219f0301525edc513dda07b320b)
* | | Migrate tempest-dsvm-multinode-live-migration job in-treemelanie witt2018-05-103-0/+94
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This defines the nova-live-migration job, based on the tempest-dsvm-multinode-live-migration job from openstack-zuul-jobs. The branch override parts of the job definition are removed since the job will be defined per-branch now. Conflicts: .zuul.yaml NOTE(mriedem): The conflict is due to not having job nova-tox-functional-py35 in stable/ocata. Change-Id: Idea86d6bb648b1e6fef8813dbe569724ce81a750 (cherry picked from commit 7d8246244db56420e8c3512f991604ffda9bcc12) (cherry picked from commit 0db59f717275a004c2f6c12f1248110dbb425587) (cherry picked from commit a4adf3b7a36c01cf741313290b8994c34cc866bb)
* | libvirt: Make `cpu_model_extra_flags` case-insensitive for realKashyap Chamarthy2018-05-042-1/+31
| | When we introduced the `cpu_model_extra_flags` config attribute (in
| | commit: 6b601b7 -- "libvirt: Allow to specify granular CPU feature
| | flags"), we said it was case-insensitive; but unfortunately I failed
| | to _really_ make it so (despite proposing code for it in one of the
| | revisions).
| |
| | Address that mistake by making `cpu_model_extra_flags`
| | case-insensitive for real, from Nova's point of view.
| |
| | NB: Internally, this patch normalizes 'extra_flags' to _lower_ case,
| | because CPU flags _must_ be lower case from libvirt's point of view.
| | Nova must honour that; otherwise, launching instances with an upper
| | case CPU flag, 'FOO', will fail with: "libvirtError: internal error:
| | Unknown CPU feature FOO".
| |
| | Related-Bug: #1750829
| | Change-Id: Ia7ff0566a5109c76c009f3a0c6199c4ba419cfb1
| | Reported-by: Daniel P. Berrangé <berrange@redhat.com>
| | Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
| | (cherry picked from commit 8e438eda9bb16cdd3b627b93da2435572275b921)
| | (cherry picked from commit f9ab466c6d9bb657409356f29139a9edfbb98747)
| | (cherry picked from commit efa26e8dc24b5e828447fb78d0f54004054ae8b9)
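The lower-case normalization mentioned in this message could be sketched as below. The function name `normalize_extra_flags` is hypothetical, and de-duplicating the flags is an extra assumption made for the illustration, not something the message states.

```python
def normalize_extra_flags(flags):
    """Lower-case CPU flags and drop blanks and duplicates, keeping order."""
    normalized = []
    for flag in flags:
        lowered = flag.strip().lower()
        # libvirt expects lower-case feature names, so 'PCID' becomes 'pcid'.
        if lowered and lowered not in normalized:
            normalized.append(lowered)
    return normalized
```

With this in place an operator could write `PCID` or `pcid` in the configuration and the same lower-case name would be handed on.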
* | Merge "only increment disk address unit for scsi devices" into stable/ocata15.1.1Zuul2018-04-242-3/+167
|\ \
| * | only increment disk address unit for scsi devicesJay Pipes2018-04-192-3/+167
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were erroneously incrementing the disk address unit attribute for non-scsi devices, which resulted in inconsistent disk device naming and addresses when SCSI devices were used along with non-SCSI devices (like configdrive devices). Also, we ensure that we assign unit number 0 for the boot volume of a boot-from-volume instance. Co-authored-by: Mehdi Abaakouk <sileht@sileht.net> Closes-bug: #1729584 Closes-bug: #1753394 Change-Id: Ia91e2f9c316e25394a0f41dc341d903dfcff6921 (cherry picked from commit 2616b384e642b6eb58eef7da87b6e893f25a949e) (cherry picked from commit f9c66434eea245ae05a449059391515376f5a456) (cherry picked from commit b255e16bd93d9891caff8ffc84b8d7bc2991f90a)
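The unit assignment described here, where only SCSI devices consume a unit number, can be sketched as follows. This is an illustration only: `assign_scsi_units` is a hypothetical helper, disks are simplified to `(name, bus)` tuples, and the boot volume is assumed to come first in the list so that it receives unit 0.

```python
def assign_scsi_units(disks):
    """Assign increasing unit numbers to SCSI disks only.

    disks is a list of (name, bus) tuples in attach order, boot disk first.
    Non-SCSI devices get None and do not advance the counter.
    """
    next_unit = 0
    assigned = []
    for name, bus in disks:
        if bus == "scsi":
            assigned.append((name, bus, next_unit))
            next_unit += 1
        else:
            assigned.append((name, bus, None))
    return assigned
```

A config drive on another bus sitting between two SCSI disks therefore leaves the SCSI units contiguous (0, 1) instead of producing a gap.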
* | | Merge "Don't persist RequestSpec.retry" into stable/ocataZuul2018-04-245-16/+36
|\ \ \
| * | | Don't persist RequestSpec.retryMatt Riedemann2018-04-125-16/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During a resize, the RequestSpec.flavor is updated to the new flavor. If the resize failed on one host and was rescheduled, the RequestSpec.retry is updated for that failed host and mistakenly persisted, which can affect later move operations, like if an admin targets one of those previously failed hosts for a live migration or evacuate operation. This change fixes the problem by not ever persisting the RequestSpec.retry field to the database, since retries are per-request/operation and not something that needs to be persisted. Alternative to this, we could reset the retry field in the RequestSpec.reset_forced_destinations method but that would be slightly overloading the meaning of that method, and the approach taken in this patch is arguably cleaner since retries shouldn't ever be persisted. It should be noted, however, that one advantage to resetting the 'retry' field in the RequestSpec.reset_forced_destinations method would be to avoid this issue for any existing DB entries that have this problem. The related functional regression test is updated to show the bug is now fixed. NOTE(mriedem): This backport also includes change I61f745667f4c003d7e3ca6f2f9a99194930ac892 squashed into it in order to not re-introduce that bug. On Ocata it must be adjusted slightly to pass a string rather than list to _wait_for_migration_status since I752617066bb2167b49239ab9d17b0c89754a3e12 is not in Ocata. NOTE(mriedem): This patch has to pull some changes from two other patches to make live migration work in the fake virt driver: ce893e37f and b97c433f7. 
Change-Id: Iadbf8ec935565a6d4ccf6f36ef630ab6bf1bea5d Closes-Bug: #1718512 (cherry picked from commit 6647f11dc1aba89f9b0e2703f236a43f31d88079) (cherry picked from commit 757dbd17cf37aecea005dfdc954bf50bbddedd95) (cherry picked from commit 878e99d1f82fbeec840b2dbab8e40c27127d88ba)
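The idea of never writing the `retry` field to the database can be illustrated with a small sketch. It assumes the request spec is a plain dict and uses a hypothetical `to_persisted_spec` helper; the real object and its save path are not shown in this log.

```python
import copy


def to_persisted_spec(spec):
    """Return a copy of the spec suitable for saving, without 'retry'.

    Retries are per-request state, so they are dropped before persisting
    while the in-memory spec keeps them for the current operation.
    """
    persisted = copy.deepcopy(spec)
    persisted.pop("retry", None)
    return persisted
```

Because the original dict is left untouched, a reschedule in the current request would still see its failed hosts, while a later live migration or evacuate would load a spec with no stale retry data.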
* | | | Merge "Add regression test for persisted RequestSpec.retry from failed resize" into stable/ocata  Zuul  2018-04-24  1  -0/+168
|\ \ \ \
| |/ / /
| | | /
| |_|/
|/| |
| * | Add regression test for persisted RequestSpec.retry from failed resizeMatt Riedemann2018-04-101-0/+168
| | | Commit 74ab427d4796d8a386f84a15cc49188c2a60f8f1 in Newton added code
| | | to persist changes to the RequestSpec during a resize since the
| | | flavor changes. That change inadvertently also persisted any failed
| | | hosts during the resize that are stored in the RequestSpec.retry
| | | field during a reschedule.
| | |
| | | The problem is that later those persisted failed hosts are rejected
| | | by the RetryFilter, which can be confusing if an admin is trying to
| | | live migrate or evacuate the instance to one of those specific hosts.
| | |
| | | This adds a functional regression test to show the failure, which
| | | will be fixed in a separate change that then modifies the assertions.
| | |
| | | NOTE(mriedem): There are two changes in this backport:
| | |
| | | 1. The functional test needed to change slightly to disable the
| | |    DiskFilter since 2fe96819c24eff5a9493a6559f3e8d5b4624a8c9 is not
| | |    in Ocata.
| | | 2. The test needs to use 'api_post' directly on the API client for
| | |    the confirmResize call since the check_response_status kwarg
| | |    wasn't in post_server_action until 8dd11ca1b34e1ed58b4 in Pike.
| | |
| | | Change-Id: Ib8a23db838b0bbf2cfb8123cf6aaa39d00ff0640
| | | Related-Bug: #1718512
| | | (cherry picked from commit 89448bea577b30c40ce39185d14fe14f9c61a0c2)
| | | (cherry picked from commit c2dc902e39eb345ebf674ad47422f1e72ec170e6)
| | | (cherry picked from commit 004e9acf99964ac78f85d3efbd0a04404bd9a3ef)
* | | Merge "libvirt: Allow to specify granular CPU feature flags" into stable/ocataZuul2018-04-215-2/+200
|\ \ \
| * | | libvirt: Allow to specify granular CPU feature flagsKashyap Chamarthy2018-04-205-2/+200
| |/ /
| | | The recent "Meltdown" CVE fixes have resulted in a critical
| | | performance penalty[*] that will impact every Nova guest with certain
| | | CPU models.
| | |
| | | I.e. assume you have applied all the "Meltdown" CVE fixes, and
| | | performed a cold reboot (explicit stop & start) of all Nova guests,
| | | for the updates to take effect. Now, any guests that are booted with
| | | certain named virtual CPU models (e.g. "IvyBridge", "Westmere", etc.)
| | | will incur noticeable performance degradation[*], while being
| | | protected from the CVE itself.
| | |
| | | To alleviate this guest performance impact, it is now important to
| | | specify an obscure Intel CPU feature flag, 'PCID' (Process-Context
| | | ID) -- for the virtual CPU models that don't already include it (more
| | | on this below). To that end, this change will allow Nova to
| | | explicitly specify CPU feature flags via a new configuration
| | | attribute, `cpu_model_extra_flags`, e.g. in `nova.conf`:
| | |
| | |     ...
| | |     [libvirt]
| | |     cpu_mode = custom
| | |     cpu_model = IvyBridge
| | |     cpu_model_extra_flags = pcid
| | |     ...
| | |
| | | NB: In the first iteration, the choices for `cpu_model_extra_flags`
| | | are restricted to only 'pcid' (the option is case-insensitive) -- to
| | | address the earlier mentioned guest performance degradation. A future
| | | patch will remove this restriction, allowing multiple CPU feature
| | | flags to be added or removed, thus making way for other useful
| | | features.
| | |
| | | Some have asked: "Why not simply hardcode the 'PCID' CPU feature flag
| | | into Nova?" That's not graceful, and more importantly, impractical:
| | |
| | | (1) Not every Intel CPU model has 'PCID':
| | |
| | |     - The only Intel CPU models that include the 'PCID' capability
| | |       are: "Haswell", "Broadwell", and "Skylake" variants.
| | |
| | |     - The libvirt / QEMU Intel CPU models: "Nehalem", "Westmere",
| | |       "SandyBridge", and "IvyBridge" will *not* expose the 'PCID'
| | |       capability, even if the host CPUs by the same name include it.
| | |       I.e. 'PCID' needs to be explicitly specified when using the
| | |       said virtual CPU models.
| | |
| | | (2) Magically adding new CPU feature flags under the user's feet
| | |     impacts live migration.
| | |
| | | [*] https://groups.google.com/forum/m/#!topic/mechanical-sympathy/L9mHTbeQLNU
| | |
| | | Conflicts:
| | |     nova/virt/libvirt/driver.py
| | |
| | | NOTE(lyarwood): The above is a trivial warning log translation
| | | conflict required prior to stable/pike.
| | |
| | | Closes-Bug: #1750829
| | | Change-Id: I6bb956808aa3df58747c865c92e5b276e61aff44
| | | (cherry picked from commit 6b601b7cf6e7f23077f428353a3a4e81084eb3a1)
| | | (cherry picked from commit 98eb85f29c5f0775de480d5ea2946dcbba85fe8a)
| | | (cherry picked from commit 56350b977e412d59da96a79290d80c6422fa44b1)
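The restricted set of choices described in this message can be sketched with a small validator. This is an illustration under loud assumptions: Nova reads its configuration through its own option machinery, not `configparser`; the names `ALLOWED_EXTRA_FLAGS` and `read_extra_flags` are hypothetical; and treating the value as a comma-separated list is assumed for the example.

```python
import configparser

# Assumption for the sketch: the first iteration only permits 'pcid'.
ALLOWED_EXTRA_FLAGS = {"pcid"}

NOVA_CONF = """
[libvirt]
cpu_mode = custom
cpu_model = IvyBridge
cpu_model_extra_flags = pcid
"""


def read_extra_flags(conf_text):
    """Parse cpu_model_extra_flags and reject flags outside the whitelist."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("libvirt", "cpu_model_extra_flags", fallback="")
    flags = [f.strip().lower() for f in raw.split(",") if f.strip()]
    unknown = set(flags) - ALLOWED_EXTRA_FLAGS
    if unknown:
        raise ValueError(
            "unsupported CPU flags: %s" % ", ".join(sorted(unknown)))
    return flags
```

Keeping the whitelist as a single set means that widening the allowed choices later only touches one line of the sketch.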
* | | Merge "libvirt: log vm and task state when vif plugging times out" into stable/ocata  Zuul  2018-04-20  2  -3/+7
|\ \ \
| * | | libvirt: log vm and task state when vif plugging times outMatt Riedemann2018-04-052-3/+7
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This enhances the warning we log when we timeout waiting for the network-vif-plugged event from Neutron. It includes the vm_state and task_state for context on the instance operation since this is done for more than just the initial create, it's also for things like rebuild. The explicit instance uuid is also removed from the message since sending the instance kwarg to LOG.warning logs the instance uuid automatically. And _LW is also dropped since we no longer translate log messages. Conflicts: nova/virt/libvirt/driver.py NOTE(lyarwood): Conflict is the result of _LW being required in Ocata. Change-Id: I6daf1569cba2cfcb4e8da0b46c91d5251c9c6740 Related-Bug: #1694371 (cherry picked from commit 82ecd93a20f580a3bbec96bf570f2fcc1dc99f33)
* | | Merge "Detach volumes when VM creation fails" into stable/ocataZuul2018-04-203-21/+77
|\ \ \
| * | | Detach volumes when VM creation failsAmeed Ashour2018-04-183-21/+77
| | | | If the boot-volume creation fails, the data volume is left in state
| | | | "in-use", attached to the server which is now in "error" state. The
| | | | user can't detach the volume because of the server's error state.
| | | |
| | | | They can delete the server, which then leaves the volume apparently
| | | | attached to a server that no longer exists, which is being fixed
| | | | separately: https://review.openstack.org/#/c/340614/
| | | |
| | | | The only way out of this is to ask an administrator to reset the
| | | | state of the data volume (this option is not available to regular
| | | | users by default policy).
| | | |
| | | | This change fixes the problem in the compute service such that when
| | | | the creation fails, the compute manager detaches the created volumes
| | | | before putting the VM into error state. Then you can delete the
| | | | instance without worrying about attached volumes.
| | | |
| | | | Conflicts:
| | | |     nova/compute/manager.py
| | | |
| | | | NOTE(mriedem): The conflict in _delete_instance is due to
| | | | restructuring the method in
| | | | I9269ffa2b80e48db96c622d0dc0817738854f602 in Pike. Also note that _LW
| | | | has to be used for the warning message since those translation
| | | | markers are still required in Ocata.
| | | |
| | | | Change-Id: I8b1c05317734e14ea73dc868941351bb31210bf0
| | | | Closes-bug: #1633249
| | | | (cherry picked from commit 61f6751a1807d3c3ee76d0351d17a82c6e1a915a)
| | | | (cherry picked from commit 22164d5118ea04321432432d89877aae91097e81)
| | | | (cherry picked from commit 4dbe72f976a67d442fd0e0489cadc3bc605ed012)
* | | | Merge "libvirt: Report the allocated size of preallocated file based disks" into stable/ocata  Zuul  2018-04-20  3  -13/+39
|\ \ \ \
| * | | | libvirt: Report the allocated size of preallocated file based disksLee Yarwood2018-04-173-13/+39
| | | | | At present the Libvirt driver can preallocate file based disks using
| | | | | the fallocate command and importantly the `-n` option. This option
| | | | | allocates blocks on the filesystem past the initial EOF of a given
| | | | | file:
| | | | |
| | | | | ```
| | | | | $ touch test.img ; fallocate -n -l $(( 1024 * 1024 )) test.img
| | | | | $ ll -lah test.img
| | | | | -rw-rw-r--. 1 stack stack 0 Apr 16 13:28 test.img
| | | | | $ du -h test.img
| | | | | 1.0M test.img
| | | | | ```
| | | | |
| | | | | This results in a miscalculation of the total disk overcommit for
| | | | | file based (excluding ploop) disks as os.path.getsize is currently
| | | | | used to determine the allocated size of these disks:
| | | | |
| | | | | ```
| | | | | >>> import os
| | | | | >>> os.path.getsize('test.img')
| | | | | 0
| | | | | ```
| | | | |
| | | | | Using the above example the disk overcommit would be reported as 1.0M
| | | | | as the disk appears empty yet will report a potential (virtual) size
| | | | | of 1.0M. However as the required blocks have already been allocated
| | | | | on the filesystem the host will report disk_available_least as
| | | | | missing an additional 1.0M, essentially doubling the allocation for
| | | | | each disk.
| | | | |
| | | | | To correct this the allocated size of file based (excluding ploop)
| | | | | disks is reported using `disk_size` from the `qemu-img info` command.
| | | | | This should ensure blocks allocated past the EOF of the file are
| | | | | taken into account and correctly reported as allocated.
| | | | |
| | | | | A future change should ultimately remove the use of the `-n` option
| | | | | with fallocate, however as this would not help disks that have
| | | | | already been allocated this has not been included in this change to
| | | | | simplify backports.
| | | | |
| | | | | Conflicts:
| | | | |     nova/tests/unit/virt/libvirt/test_driver.py
| | | | |
| | | | | NOTE(lyarwood): I11e329ac5f5fe4b9819fefbcc32ff1ee504fc58b made
| | | | | get_domain private in Queens.
| | | | |
| | | | | Change-Id: If642e51a4e186833349a8e30b04224a3687f5594
| | | | | Closes-bug: #1764489
| | | | | (cherry picked from commit 23bd8f62634707fc9896a38ff4dae606c89c6c4b)
| | | | | (cherry picked from commit 2d50f6e7854543849c4cb7641ae6b88fe04cb6f6)
| | | | | (cherry picked from commit d88b75e81eabfbd463007f6a4f27e6966a466530)
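The arithmetic behind the miscalculation can be worked through with the 1.0M example from the message. This sketch assumes overcommit is computed as virtual size minus allocated size; the `over_committed` helper is a hypothetical name used only for the illustration.

```python
MIB = 1024 * 1024


def over_committed(virtual_size, allocated_size):
    """Bytes the disk could still grow by on the host."""
    return virtual_size - allocated_size


# Before the fix: os.path.getsize() reports 0 for the `fallocate -n` file,
# so the whole 1 MiB is counted as overcommit even though the blocks are
# already allocated on the filesystem.
before = over_committed(1 * MIB, 0)

# After the fix: disk_size from `qemu-img info` reports the allocated
# 1 MiB, so nothing further is counted against the host.
after = over_committed(1 * MIB, 1 * MIB)

print(before, after)  # 1048576 0
```

The first figure is the 1.0M that was being counted twice: once as overcommit and once in the space the filesystem already shows as used.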
* | | | | Merge "Increase cpu time for image conversion" into stable/ocataZuul2018-04-201-1/+1
|\ \ \ \ \
| * | | | | Increase cpu time for image conversionSean Dague2018-04-181-1/+1
| | | | | | Apparently the current 8 second timeout on `qemu-img info` may not be
| | | | | | sufficient if snapshot images are > 120G in size. This bumps that to
| | | | | | 30s instead to provide a backstop without hurting people with large
| | | | | | snapshots.
| | | | | |
| | | | | | Change-Id: I877b9401a671904a13bb07bae3636b72d7d20df8
| | | | | | Closes-Bug: #1705340
| | | | | | (cherry picked from commit 011ae614d5c5fb35b2e9c22a9c4c99158f6aee20)
* | | | | | Merge "Handle spawning error on unshelving" into stable/ocataZuul2018-04-202-0/+91
|\ \ \ \ \ \
| * | | | | | Handle spawning error on unshelvingShoham Peller2018-04-182-0/+91
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If spawning fails when unshelving, terminate the volumes' connections with the node, and remove the node reference from the instance entry. Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com> Conflicts: nova/compute/manager.py nova/tests/unit/compute/test_shelve.py NOTE(mriedem): Conflicts are due to the following changes not in Ocata: I2394b0588bc7210efd16456af2fc12dc7071681c Id2c7b7b3b4abda8a3b878fdee6806bcfe096e12e I3a3caa4c566ecc132aa2699f8c7e5987bbcc863a Closes-Bug: 1627694 Change-Id: I8cfb2280d956d452ccad1fc711bd814b7258147f (cherry picked from commit dcdd2c9832c7c60fe9163cd744ca2b5acfe16bcc) (cherry picked from commit 73c3e4969c108f2eefc77b57021d9b3e17afdc8e) (cherry picked from commit 06946b7f8ad310e01eff895362ad0f3a164636f5)
* | | | | | Merge "Set error state after failed evacuation" into stable/ocataZuul2018-04-203-12/+29
|\ \ \ \ \ \
| * | | | | | Set error state after failed evacuationElőd Illés2018-04-173-12/+29
| | | | | | | When evacuation fails with NoValidHost, the migration status remains
| | | | | | | 'accepted' instead of 'error'. This causes a problem when the compute
| | | | | | | service starts up again and looks for evacuations with status
| | | | | | | 'accepted', as it then removes the local instances for those
| | | | | | | evacuations even though the instance was never actually evacuated to
| | | | | | | another host.
| | | | | | |
| | | | | | | Conflicts:
| | | | | | |     nova/conductor/manager.py
| | | | | | |
| | | | | | | NOTE(mriedem): The conflict is due to not having change
| | | | | | | I6590f0eda4ec4996543ad40d8c2640b83fc3dd9d in Ocata.
| | | | | | |
| | | | | | | Change-Id: I06d78c744fa75ae5f34c5cfa76bc3c9460767b84
| | | | | | | Closes-Bug: #1713783
| | | | | | | (cherry picked from commit a8ebf5f1aac080854704e27146e8c98b053c6224)
| | | | | | | (cherry picked from commit a3f286f43d866cd343d26d9bafadecab1c225e4b)
* | | | | | | Merge "Modify incorrect debug meaasge in _inject_data" into stable/ocataZuul2018-04-201-3/+3
|\ \ \ \ \ \ \
| * | | | | | | Modify incorrect debug meaasge in _inject_dataguanzuoyu2018-04-181-3/+3
| | | | | | | | The debug message attempts to show inject_info but cannot format it
| | | | | | | | correctly with %(info), so we use the string directly.
| | | | | | | |
| | | | | | | | Change-Id: Ib9c67757e0b1d833cf722eb8dfcc47a4414b6fb6
| | | | | | | | Closes-Bug: #1729294
| | | | | | | | (cherry picked from commit 76a4e42e8bb98c8d5f30d8fef7b0fafd3976b626)
| | | | | | | | (cherry picked from commit 4402da85cd5fe03809989373b4670adc7beb40f0)
* | | | | | | | Merge "libvirt: Block swap volume attempts with encrypted volumes prior to Queens" into stable/ocata  Zuul  2018-04-20  10  -14/+106
|\ \ \ \ \ \ \ \
| * | | | | | | | libvirt: Block swap volume attempts with encrypted volumes prior to QueensLee Yarwood2018-04-1810-14/+106
| | | | | | | | | Prior to Queens any attempt to swap between encrypted volumes would
| | | | | | | | | result in unencrypted data being written to the new volume. This
| | | | | | | | | unencrypted data would then be overwritten the next time the volume
| | | | | | | | | was attached to an instance as Nova no longer identified the volume
| | | | | | | | | as encrypted, resulting in the volume being reformatted.
| | | | | | | | |
| | | | | | | | | This stable only change uses limited parts of the following changes
| | | | | | | | | to block all swap_volume attempts with encrypted volumes prior to
| | | | | | | | | Queens where this was resolved by Ica323b87fa85a454fca9d46ada3677f18
| | | | | | | | | and also blocked when using QEMU to decrypt LUKS volumes by
| | | | | | | | | Ibfa64f18bbd2fb70db7791330ed1a64fe61c1.
| | | | | | | | |
| | | | | | | | | Ica323b87fa85a454fca9d46ada3677f18fe50022
| | | | | | | | |     The request context is provided to swap_volume in order to look
| | | | | | | | |     up the encryption metadata of a volume.
| | | | | | | | |
| | | | | | | | | Ibfa64f18bbd2fb70db7791330ed1a64fe61c1355
| | | | | | | | |     Attempts to swap from an encrypted volume are blocked with a
| | | | | | | | |     NotImplementedError exception raised.
| | | | | | | | |
| | | | | | | | | I258127fdcd011ccec721d5ff62eb7f128f130336
| | | | | | | | |     Attempts to swap from an unencrypted volume to an encrypted
| | | | | | | | |     volume are also blocked with a NotImplementedError exception
| | | | | | | | |     raised.
| | | | | | | | |
| | | | | | | | | Ie02d298cd92d5b5ebcbbcd2b0e8be01f197bfafb
| | | | | | | | |     The serial of a volume is used as the id if connection_info for
| | | | | | | | |     the volume doesn't contain the volume_id key. Required to avoid
| | | | | | | | |     bug #1746609.
| | | | | | | | |
| | | | | | | | | Conflicts:
| | | | | | | | |     nova/tests/unit/compute/test_compute_mgr.py
| | | | | | | | |     nova/tests/unit/virt/libvirt/test_driver.py
| | | | | | | | |
| | | | | | | | | NOTE(lyarwood): Conflict due to cinderv3 support for swap_volume not
| | | | | | | | | being present in stable/ocata via
| | | | | | | | | I4b8bd01f1ffe2640fe7313213bf853d2e1bef9dd.
| | | | | | | | |
| | | | | | | | | Closes-bug: #1739593
| | | | | | | | | Change-Id: If12e7860baad2899380f06144a0270784a5466b8
| | | | | | | | | (cherry picked from commit 5b64a1936122eeb35f37a09f9d38159e1a224c58)
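The blocking behaviour listed above amounts to refusing the swap when either side is encrypted. A minimal sketch, assuming the encryption state of each volume has already been looked up and is passed in as booleans; `check_swap_allowed` is a hypothetical name, not the driver's method.

```python
def check_swap_allowed(old_encrypted, new_encrypted):
    """Raise NotImplementedError if either volume in the swap is encrypted."""
    if old_encrypted:
        # Swapping from an encrypted volume is blocked.
        raise NotImplementedError(
            "swap_volume from an encrypted volume is not supported")
    if new_encrypted:
        # Swapping from an unencrypted to an encrypted volume is blocked too.
        raise NotImplementedError(
            "swap_volume to an encrypted volume is not supported")
```

Raising before any data is copied is the point of the guard: the failure mode described in the message is silent data loss, so refusing early is the safer default.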
* | | | | | | | | Merge "Pass the correct image to build_request_spec in conductor.rebuild_instance" into stable/ocata  Zuul  2018-04-20  2  -1/+7
|\ \ \ \ \ \ \ \ \
| | |_|/ / / / / /
| |/| | | | | | |
| * | | | | | | | Pass the correct image to build_request_spec in conductor.rebuild_instance  Matt Riedemann  2018-04-17  2  -1/+7
| | |_|_|/ / / /
| |/| | | | | |
If we're calling build_request_spec in conductor.rebuild_instance, it's because we are evacuating and the instance is so old it does not have a request spec. We need the request_spec to pass to the scheduler to pick a destination host for the evacuation.

For evacuate, nova-api does not pass any image reference parameters, and even if it did, those are image IDs, not an image meta dict that build_request_spec expects, so this code has just always been wrong.

This change fixes the problem by passing a primitive version of the instance.image_meta which build_request_spec will then return back to conductor and that gets used to build a RequestSpec object from primitives.

It's important to use the correct image meta so that the scheduler can properly filter hosts using things like the AggregateImagePropertiesIsolation and ImagePropertiesFilter filters.

Conflicts:
  nova/conductor/manager.py

NOTE(mriedem): Conflict is due to e211fca55a11c80058d5d78e31dc3ad466d7edfd not being in Ocata.

Change-Id: I0c8ce65016287de7be921c312493667a8c7f762e
Closes-Bug: #1727855
(cherry picked from commit d2690d6b038e200efed05bf7773898a0a8bb01d7)
(cherry picked from commit dc44c48943f8ce66bbbdc2050ed2dc47778cf477)
* | | | | | | | Merge "Revert "Proper error handling by _ensure_resource_provider"" into stable/ocata  Zuul  2018-04-20  8  -75/+37
|\ \ \ \ \ \ \ \
| * | | | | | | | Revert "Proper error handling by _ensure_resource_provider"  Lee Yarwood  2018-04-19  8  -75/+37
This reverts commit 44912b5fefd94d3cbff1e308fecb0589e7928932.

Change-Id: I92a9b97be2afea06a2ea6f05749e9b32d7514b15
* | | | | | | | | Merge "unquiesce instance on volume snapshot failure" into stable/ocata  Zuul  2018-04-20  2  -26/+61
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | unquiesce instance on volume snapshot failure  Eric M Gonzalez  2018-02-19  2  -26/+61
This patch adds an exception catch to "snapshot_volume_backed()" of compute/api.py that catches (at the moment) _all_ exceptions from the underlying cinderclient. Previously, if the instance was quiesced (frozen filesystem), the exception would break execution of the function, skipping the needed unquiesce and leaving the instance in a frozen state. Now, the exception catch will unquiesce the instance if it was quiesced prior to the failure.

Got a unit test in place with the help of Matt Riedemann: test_snapshot_volume_backed_with_quiesce_create_snap_fails

NOTE(mriedem): There is a small change in Ocata since we have to use the _LI translation markers for the new INFO log level messages.

Change-Id: I60de179c72eede6746696f29462ee9d805dace47
Closes-bug: #1731986
(cherry picked from commit bca425a33f52584051348a3ace832be8151299a7)
(cherry picked from commit 7ab98b5345f4a023bd209e714cd0aa60b3a31d48)
(cherry picked from commit 17b9b900a249f6f432552fe27a9cdd54c1495b99)
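The pattern described in the commit above (catch the snapshot failure, thaw the guest if it had been frozen, then let the error propagate) can be sketched roughly as follows. This is an illustrative sketch only, not the nova code: `snapshot_volumes` and the `create_snapshot`, `quiesce` and `unquiesce` callables are hypothetical stand-ins, and re-raising the original exception is an assumption about the intended behaviour.

```python
def snapshot_volumes(volume_ids, create_snapshot, quiesce, unquiesce):
    """Snapshot each volume, thawing the guest again even if a snapshot fails.

    All parameters are hypothetical stand-ins for illustration:
      create_snapshot(volume_id) -> snapshot, may raise
      quiesce() -> bool, True if the guest filesystem was actually frozen
      unquiesce() -> None, thaws the guest filesystem
    """
    quiesced = quiesce()
    snapshots = []
    try:
        for volume_id in volume_ids:
            snapshots.append(create_snapshot(volume_id))
    except Exception:
        # Broad catch on purpose, as in the commit message: any failure
        # from the volume service must not leave the guest frozen.
        if quiesced:
            unquiesce()
        raise  # assumption: the caller still sees the original error
    if quiesced:
        unquiesce()
    return snapshots
```

Tracking `quiesced` separately matters because an instance that was never frozen should not receive an unquiesce call on failure.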
* | | | | | | | | | Merge "Refactor a test method including 3 test cases" into stable/ocata  Zuul  2018-04-19  1  -117/+114
|\ \ \ \ \ \ \ \ \ \
| | |_|_|/ / / / / /
| |/| | | | | | | |
| * | | | | | | | | Refactor a test method including 3 test cases  Takashi NATSUME  2018-04-16  1  -117/+114
| | |_|_|_|_|_|/ /
| |/| | | | | | |
The test_swap_volume_volume_api_usage method in test_compute_mgr.py has 3 test cases (1 normal, 2 errors), so divide it into the following 3 test methods:

* test_swap_volume_volume_api_usage
* test_swap_volume_with_compute_driver_exception
* test_swap_volume_with_initialize_connection_exception

Change-Id: I08278c10104786a12835ab64a3602503901285bc
(cherry picked from commit 1a55fad16a90599119a5106a7c7014f81ecee845)
* | | | | | | | | Merge "Clean up volumes on boot failure" into stable/ocata  Zuul  2018-04-19  2  -0/+5
|\ \ \ \ \ \ \ \ \
| | |_|_|_|_|_|/ /
| |/| | | | | | |
| * | | | | | | | Clean up volumes on boot failure  yuanyue  2018-04-18  2  -0/+5
| | |_|_|_|_|/ /
| |/| | | | | |
When an instance fails to spawn, nova just shuts down the instance, detaching its volumes in the process. If the build is not retried, volumes created by the boot task are left behind. These volumes should be cleaned up, and this patch does so.

Change-Id: I877d8eff8d2fecde0cd16b01e80bff41bdb8d88a
Closes-Bug: #1699469
(cherry picked from commit 1a32bfd2ca1541af935b13b86728f124bba3d2c4)