Commit log for openstack/nova stable/train, newest first. Each entry: subject [release tag, if any] (author, date; files changed, -lines removed/+lines added).
* Update pci stat pools based on PCI device changes [20.5.0] (Hemanth Nakkina, 2021-01-11; 6 files, -33/+222)

At start up of the nova-compute service, the PCI stat pools are populated based on the information in the pci_devices table in the Nova database. The pools are updated only when a device is added or removed, but not on any device change such as its device type. If an existing device is reconfigured as SRIOV and nova-compute is restarted, the pci_devices table gets updated, but the device is still listed under the old pool in pci_tracker.stats.pool (the in-memory object).

This patch looks for device type updates in existing devices and updates the pools accordingly.

Conflicts:
    nova/tests/functional/libvirt/test_pci_sriov_servers.py
    nova/tests/unit/virt/libvirt/fakelibvirt.py
    nova/tests/functional/libvirt/base.py

To avoid the conflicts and make the new functional test execute, the following changes were made:

- Modified the test case to use the flavor extra spec pci_passthrough:alias to create a server with an SRIOV port, instead of creating an SRIOV port and passing the port information during server creation.
- Removed the changes in nova/tests/functional/libvirt/base.py, as they are only required if a neutron SRIOV port is created in the test case.

Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
Closes-Bug: #1892361
(cherry picked from commit b8695de6da56db42b83b9d9d4c330148766644be)
(cherry picked from commit d8b8a8193b6b8228f6e7d6bde68b5ea6bb53dd8b)
(cherry picked from commit f58399cf496566e39d11f82a61e0b47900f2eafa)
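A minimal sketch of the approach, assuming a simplified pool structure keyed on (vendor_id, product_id, dev_type); the names here are illustrative and do not mirror Nova's actual PciDeviceStats implementation:

```
def sync_device_pools(pools, devices):
    """Rebuild pool membership, moving devices whose dev_type changed.

    pools: dict mapping (vendor_id, product_id, dev_type) -> list of devices
    devices: current rows from the pci_devices table
    """
    for dev in devices:
        key = (dev['vendor_id'], dev['product_id'], dev['dev_type'])
        for pool_key, members in list(pools.items()):
            if dev in members and pool_key != key:
                # The device exists but under a stale pool (e.g. its
                # dev_type changed from type-PCI to type-PF after SRIOV was
                # enabled): drop it from the old pool before re-adding it.
                members.remove(dev)
                if not members:
                    del pools[pool_key]
        pools.setdefault(key, [])
        if dev not in pools[key]:
            pools[key].append(dev)
    return pools
```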
* [stable-only] Cap bandit to 1.6.2 and raise hacking, flake8 and stestr (Lee Yarwood, 2020-12-23; 2 files, -5/+5)

The 1.6.3 [1] release has dropped support for py2 [2], so cap to 1.6.2 when using py2.

This change also raises hacking to 1.1.0 in lower-constraints.txt after it was bumped by I35c654bd39f343417e0a1124263ff31dcd0b05c9. This also means that flake8 is bumped to 2.6.0. stestr is also bumped to 2.0.0 as required by oslotest 3.8.0. All of these changes are squashed into a single change to pass the gate.

[1] https://github.com/PyCQA/bandit/releases/tag/1.6.3
[2] https://github.com/PyCQA/bandit/pull/615

Depends-On: https://review.opendev.org/c/openstack/devstack/+/768256
Depends-On: https://review.opendev.org/c/openstack/swift/+/766214
Closes-Bug: #1907438
Closes-Bug: #1907756
Change-Id: Ie5221bf37c6ed9268a4aa0737ffcdd811e39360a
* Merge "Change default num_retries for glance to 3" into stable/train (Zuul, 2020-12-01; 4 files, -8/+19)

* Change default num_retries for glance to 3 (Keigo Noha, 2020-11-13; 4 files, -8/+19)

Previously, the default value of num_retries for glance was 0, which means a request to glance was sent only once. The neutron and cinder clients, on the other hand, set their default to 3. To align the retry default with those components, change the default value to 3.

Closes-Bug: #1888168
Change-Id: Ibbd4bd26408328b9e1a1128b3794721405631193
(cherry picked from commit 662af9fab6eacb46bcaee38d076d33c2c0f82b9b)
(cherry picked from commit 1f9dd694b937cc55a81a64fdce442829f009afb3)
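A sketch of the new default expressed as an oslo.config option definition; the default of 3 comes from the commit, while the other kwargs and help text are illustrative rather than copied from Nova:

```
from oslo_config import cfg

glance_opts = [
    cfg.IntOpt('num_retries',
               default=3,  # previously 0, i.e. a single attempt
               min=0,
               help='Number of retries for requests to the glance API '
                    'before giving up.'),
]

CONF = cfg.CONF
CONF.register_opts(glance_opts, group='glance')
```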
* Merge "Validate id as integer for os-aggregates" into stable/train (Zuul, 2020-11-28; 2 files, -21/+85)

* Validate id as integer for os-aggregates (Johannes Kulik, 2020-11-27; 2 files, -21/+85)

According to the api-ref, the id passed to calls in os-aggregates is supposed to be an integer. No function validated this, so any value passed to these functions would directly reach the DB. While this is fine for SQLite, making a query with a string for an integer column on other databases like PostgreSQL results in a DBError exception and thus a HTTP 500 instead of 400 or 404.

This commit adds validation for the id parameter the same way it's already done for other endpoints.

Conflicts:
    nova/api/openstack/compute/aggregates.py
Changes:
    nova/tests/unit/api/openstack/compute/test_aggregates.py

NOTE(stephenfin): Conflicts are due to absence of change I4ab96095106b38737ed355fcad07e758f8b5a9b0 ("Add image caching API for aggregates") which we don't want to backport. A test related to this feature must also be removed.

Change-Id: I83817f7301680801beaee375825f02eda526eda1
Closes-Bug: 1865040
(cherry picked from commit 2e70a1717f25652912886cbefa3f40e6df908c00)
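A hedged sketch of the validation; Nova's actual helper and exception types differ, but the shape is the same: coerce early and turn failure into a 400 rather than letting the DB layer raise a 500.

```
import webob.exc

def _validate_aggregate_id(aggregate_id):
    # Coerce to int before the value can reach the DB layer; PostgreSQL
    # raises DBError (-> HTTP 500) for a string in an integer column,
    # while SQLite silently accepts it.
    try:
        return int(aggregate_id)
    except (ValueError, TypeError):
        raise webob.exc.HTTPBadRequest(
            explanation='The aggregate id must be an integer.')
```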
* | | docs: Clarify configuration steps for PF devicesStephen Finucane2020-11-262-5/+27
|/ / | | | | | | | | | | | | | | | | | | | | | | Devices that report SR-IOV capabilities cannot be used without special configuration - namely, the addition of "'device_type': 'type-PF'" or "'device_type': 'type-VF'" to the '[pci] alias' configuration option. Spell this out in the docs. Change-Id: I4abbe30505a5e4ccba16027addd6d5f45066e31b Signed-off-by: Stephen Finucane <sfinucan@redhat.com> Closes-Bug: #1852727 (cherry picked from commit 810aafc5ec9a7d25b33cf6c137c47b117c91269a)
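A sketch of the alias value such a configuration needs, built here in Python for clarity; the vendor and product ids are illustrative (an Intel X540 PF), not taken from the commit, and the resulting JSON document is what an operator would set as the '[pci] alias' value in nova.conf:

```
import json

alias = json.dumps({
    "vendor_id": "8086",       # illustrative values, not from the commit
    "product_id": "1528",
    "device_type": "type-PF",  # the key addition this doc change calls out
    "name": "a1",
})
print(alias)  # paste the printed JSON as:  [pci] alias = <value>
```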
* docs: Change order of PCI configuration steps (Stephen Finucane, 2020-11-26; 1 file, -72/+70)

It doesn't really make sense to describe the "higher level" configuration steps necessary for PCI passthrough before describing things like BIOS configuration. Simply switch the ordering.

Change-Id: I4ea1d9a332d6585ce2c0d5a531fa3c4ad9c89482
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Related-Bug: #1852727
(cherry picked from commit 557728abaf0c822f2b1a5cdd4fb2e11e19d8ead7)
* docs: Rework the PCI passthrough guides (Stephen Finucane, 2020-11-26; 1 file, -75/+88)

Rewrite the document, making the following changes:

- Remove use of bullet points in favour of more descriptive steps
- Cross-reference various configuration options
- Emphasise that ``[pci] alias`` must be set on both controller and compute node
- Style nits, such as fixing the header style

Change-Id: I2ac7df7d235f0af25f5a99bc8f6abddbae2cb3af
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Related-Bug: #1852727
(cherry picked from commit d5259abfe163058b13ad943ad16a5c281c2080e7)
* Merge "add [libvirt]/max_queues config option" into stable/train (Zuul, 2020-11-13; 4 files, -0/+45)

* add [libvirt]/max_queues config option (Sean Mooney, 2020-07-08; 4 files, -0/+45)

This change adds a max_queues config option to allow operators to set the maximum number of virtio queue pairs that can be allocated to a virtio network interface.

Change-Id: I9abe783a9a9443c799e7c74a57cc30835f679a01
Closes-Bug: #1847367
(cherry picked from commit 0e6aac3c2d97c999451da50537df6a0cbddeb4a6)
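A sketch of the option as an oslo.config definition; the exact constraints and help text are illustrative, not copied from Nova:

```
from oslo_config import cfg

libvirt_opts = [
    cfg.IntOpt('max_queues',
               default=None,
               min=1,
               help='Maximum number of virtio queue pairs that can be '
                    'allocated to a virtio network interface; unset means '
                    'no cap beyond the existing vCPU-based sizing.'),
]

cfg.CONF.register_opts(libvirt_opts, group='libvirt')
```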
* Merge "Follow up for cherry-pick check for merge patch" into stable/train [20.4.1] (Zuul, 2020-10-30; 1 file, -1/+1)

* Follow up for cherry-pick check for merge patch (melanie witt, 2020-10-21; 1 file, -1/+1)

This is a follow up to change I8e4e5afc773d53dee9c1c24951bb07a45ddc2f1a which fixed an issue with validation when the topmost patch after a Zuul rebase is a merge patch.

We need to also use the $commit_hash variable for the check for stable-only patches, else it will incorrectly fail because it is checking the merge patch's commit message.

Change-Id: Ia725346b65dd5e2f16aa049c74b45d99e22b3524
(cherry picked from commit 1e10461c71cb78226824988b8c903448ba7a8a76)
(cherry picked from commit f1e4f6b078baf72e83cd7341c380aa0fc511519e)
(cherry picked from commit e676a480544b3fa71fcaa984a658e2131b7538c5)
* Merge "compute: Validate a BDMs disk_bus when provided" into stable/train (Zuul, 2020-10-23; 7 files, -4/+37)

* compute: Validate a BDMs disk_bus when provided (Lee Yarwood, 2020-09-03; 7 files, -4/+37)

Previously disk_bus values were never validated and could easily end up being ignored by the underlying virt driver and hypervisor. For example, a common mistake made by users is to request a virtio-scsi disk_bus when using the libvirt virt driver. This however isn't a valid bus and is ignored, defaulting back to the virtio (virtio-blk) bus.

This change adds a simple validation in the compute API using the potential disk_bus values provided by the DiskBus field class as used when validating the hw_*_bus image properties.

Conflicts:
    nova/tests/unit/compute/test_compute_api.py

NOTE(lyarwood): Conflict as If9c459a9a0aa752c478949e4240286cbdb146494 is not present in stable/train. test_validate_bdm_disk_bus is also updated as Ib31ba2cbff0ebb22503172d8801b6e0c3d2aa68a is not present in stable/train.

Closes-Bug: #1876301
Change-Id: I77b28b9cc8f99b159f628f4655d85ff305a71db8
(cherry picked from commit 5913bd889f9d3dfc8d154415e666c821054c229d)
(cherry picked from commit fb31ae430a2e4f8869e77e31ea0d6a9478f6aa61)
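A sketch of the validation, assuming the DiskBus enum field exposes its valid values via an ALL tuple as Nova enum fields generally do; the exception raised here is illustrative, not Nova's actual type:

```
from nova.objects import fields

def _validate_bdm_disk_bus(bdm):
    # Reject unknown bus names (e.g. the common 'virtio-scsi' mistake)
    # up front, instead of letting the hypervisor silently fall back to
    # virtio-blk.
    bus = bdm.get('disk_bus')
    if bus is not None and bus not in fields.DiskBus.ALL:
        raise ValueError("disk_bus '%s' is not a valid bus" % bus)
```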
* Merge "Removes the delta file once image is extracted" into stable/train (Zuul, 2020-10-17; 2 files, -1/+7)

* Removes the delta file once image is extracted (esubramanian, 2020-09-11; 2 files, -1/+7)

When creating a live snapshot of an instance, nova creates a copy of the instance disk using a QEMU shallow rebase. This copy - the delta file - is then extracted and uploaded. The delta file will eventually be deleted, when the temporary working directory nova is using for the live snapshot is discarded. However, until this happens, we will use 3x the size of the image of host disk space: the original disk, the delta file, and the extracted file. This can be problematic when concurrent snapshots of multiple instances are requested at once.

The solution is simple: delete the delta file after it has been extracted and is no longer necessary.

Change-Id: I15e9975fa516d81e7d34206e5a4069db5431caa9
Closes-Bug: #1881727
(cherry picked from commit d2af7ca7a5c862f53f18c00ac76fc85336fa79e6)
(cherry picked from commit e51555b3f0324b8b72a2b3280a1c30e104b6d8ea)
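The fix amounts to a one-line cleanup after extraction; a sketch, assuming a helper along the lines of Nova's libvirt snapshot extraction (the extract_snapshot signature is an assumption, not verified against stable/train):

```
import os

from nova.virt.libvirt import utils as libvirt_utils  # assumed helper

def _extract_and_drop_delta(delta_path, out_path, out_format):
    # Convert the shallow-rebase delta into the upload format, then delete
    # the delta immediately: peak disk usage drops from ~3x the image size
    # (original disk + delta + extracted file) to ~2x.
    libvirt_utils.extract_snapshot(delta_path, 'qcow2', out_path, out_format)
    os.unlink(delta_path)
```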
* Merge "Allow tap interface with multiqueue" into stable/train (Zuul, 2020-10-16; 3 files, -18/+111)

* Allow tap interface with multiqueue (Rodrigo Barbieri, 2020-10-13; 3 files, -18/+111)

When vif_type="tap" (such as when using calico), attempting to create an instance using an image that has the property hw_vif_multiqueue_enabled=True fails, because the interface is always being created without multiqueue flags.

This change checks if the property is defined and passes the multiqueue parameter to create the tap interface accordingly.

In case the multiqueue parameter is passed but the vif_model is not virtio (or unspecified), the old behavior is maintained.

Change-Id: I0307c43dcd0cace1620d2ac75925651d4ee2e96c
Closes-bug: #1893263
(cherry picked from commit 84cfc8e9ab1396ec17abcfc9646c7d40f1d966ae)
(cherry picked from commit a69845f3732843ee1451b2e4ebf547d9801e898d)
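What the multiqueue flag amounts to at the kernel interface; a self-contained sketch using the <linux/if_tun.h> constants directly rather than Nova's privsep helpers:

```
import fcntl
import struct

# Constants from <linux/if_tun.h>.
TUNSETIFF = 0x400454ca
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFF_MULTI_QUEUE = 0x0100

def open_tap(name, multiqueue=False):
    # The fix passes multiqueue=True only when the image property
    # hw_vif_multiqueue_enabled is set and the vif model is virtio (or
    # unspecified); otherwise the old single-queue behaviour is kept.
    flags = IFF_TAP | IFF_NO_PI
    if multiqueue:
        flags |= IFF_MULTI_QUEUE
    tun = open('/dev/net/tun', 'rb+', buffering=0)
    fcntl.ioctl(tun, TUNSETIFF, struct.pack('16sH', name.encode(), flags))
    return tun
```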
* Merge "api: Set min, maxItems for server_group.policies field" into stable/train (Zuul, 2020-10-15; 3 files, -9/+22)

* api: Set min, maxItems for server_group.policies field (Stephen Finucane, 2020-09-18; 3 files, -9/+22)

As noted inline, the 'policies' field is a list, but it expects exactly one item, chosen from the supported policies.

Change-Id: I34c68df1e6330dab1524aa0abec733610211a407
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1894966
(cherry picked from commit 32c43fc8017ee89d4e6cdf79086d87735a00f0c0)
(cherry picked from commit 781210bd598c3e0ee9bd6a7db5d25688b5fc0131)
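A sketch of the resulting JSON-schema constraint; the enum shows the two base policies (soft variants arrived in a later microversion), and the exact schema layout is illustrative:

```
policies_schema = {
    'type': 'array',
    # A list, but it must carry exactly one of the supported policies,
    # hence minItems and maxItems of 1.
    'items': {'type': 'string', 'enum': ['affinity', 'anti-affinity']},
    'minItems': 1,
    'maxItems': 1,
}
```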
* Merge "tests: Add regression test for bug 1894966" into stable/train (Zuul, 2020-10-14; 1 file, -0/+41)

* tests: Add regression test for bug 1894966 (Stephen Finucane, 2020-09-18; 1 file, -0/+41)

You must specify the 'policies' field. Currently, not doing so will result in a HTTP 500 error code. This should be a 4xx error. Add a test to demonstrate the bug before we provide a fix.

Changes:
    nova/tests/functional/regressions/test_bug_1894966.py

NOTE(stephenfin): Need to update 'super' call to Python 2-compatible variant.

Change-Id: I72e85855f621d3a51cd58d14247abd302dcd958b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Related-Bug: #1894966
(cherry picked from commit 2c66962c7a40d8ef4fab54324e06edcdec1bd716)
(cherry picked from commit 94d24e3e8d04488abdebd4969daf98b780125297)
* Merge "Set different VirtualDevice.key" into stable/train (Zuul, 2020-10-06; 2 files, -4/+94)

* Set different VirtualDevice.key (yingjisun, 2020-09-22; 2 files, -4/+94)

In vSphere 7.0, VirtualDevice.key values can no longer be the same, so set a different value for each VirtualDevice.key.

Change-Id: I574ed88729d2f0760ea4065cc0e542eea8d20cc2
Closes-Bug: #1892961
(cherry picked from commit a5d153a4c64f6947531823c0df91be5cbc491977)
(cherry picked from commit 0ea5bcca9d7bebf835b173c5e75dc89e666bcb99)
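A sketch of handing out unique device keys; the negative starting value mirrors the conventional placeholder keys in vSphere config specs, but the exact values are illustrative:

```
import itertools

# vSphere 7.0 rejects a config spec in which several newly added devices
# share the same VirtualDevice.key; older versions tolerated a constant
# placeholder. Hand out a distinct (negative) key per device instead.
_device_keys = itertools.count(-101, -1)

def next_device_key():
    return next(_device_keys)
```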
* Merge "Sanity check instance mapping during scheduling" into stable/train (Zuul, 2020-10-06; 2 files, -17/+120)

* Sanity check instance mapping during scheduling (Matt Riedemann, 2020-09-16; 2 files, -17/+120)

mnaser reported a weird case where an instance was found in both cell0 (deleted there) and in cell1 (not deleted there, but in error state from a failed build). It's unclear how this could happen besides some weird clustered rabbitmq issue where maybe the schedule and build request to conductor happens twice for the same instance and one picks a host and tries to build and the other fails during scheduling and is buried in cell0.

To avoid a split brain situation like this, we add a sanity check in _bury_in_cell0 to make sure the instance mapping is not pointing at a cell when we go to update it to cell0. Similarly, a check is added in the schedule_and_build_instances flow (the code is moved to a private method to make it easier to test).

Worst case is this is unnecessary but doesn't hurt anything; best case is this helps avoid split brain clustered rabbit issues.

Closes-Bug: #1775934
Change-Id: I335113f0ec59516cb337d34b6fc9078ea202130f
(cherry picked from commit 5b552518e1abdc63fb33c633661e30e4b2fe775e)
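A hedged sketch of the _bury_in_cell0 guard; the object and method names are simplified from Nova's actual conductor code:

```
def bury_in_cell0(inst_mapping, cell0_mapping):
    # If a racing duplicate request already mapped the instance to a real
    # cell, refuse to also bury it in cell0: that is exactly the split
    # brain (deleted in cell0, error state in cell1) being guarded against.
    if inst_mapping.cell_mapping is not None:
        return False
    inst_mapping.cell_mapping = cell0_mapping
    inst_mapping.save()
    return True
```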
* Merge "Correctly disable greendns" into stable/train (Zuul, 2020-09-21; 1 file, -10/+13)

* Correctly disable greendns (Artom Lifshitz, 2020-09-13; 1 file, -10/+13)

Previously, we were setting the environment variable to disable greendns in eventlet *after* importing eventlet. This has no effect, as eventlet processes environment variables at import time. This patch moves the setting of EVENTLET_NO_GREENDNS to before the eventlet import in order to correctly disable greendns.

Closes-bug: 1895322
Change-Id: I4deed815c8984df095019a7f61d089f233f1fc66
(cherry picked from commit 7c1d964faab33a02fe2366b5194611252be045fc)
(cherry picked from commit 79e6b7fd30a04cdb2374abcaf496b6b5b76084ff)
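The fix is purely an ordering change; a sketch of the corrected import sequence:

```
import os

# eventlet reads this variable at import time, so it must be set before
# eventlet is first imported anywhere in the process; setting it after the
# import is a no-op.
os.environ.setdefault('EVENTLET_NO_GREENDNS', 'yes')

import eventlet  # noqa: E402
```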
* Merge "libvirt:driver:Disallow AIO=native when 'O_DIRECT' is not available" into stable/train (Zuul, 2020-09-17; 3 files, -0/+75)

* libvirt:driver:Disallow AIO=native when 'O_DIRECT' is not available (Arthur Dayne, 2020-09-08; 3 files, -0/+75)

Because of the libvirt issue[1], there is a bug[2]: if we set a cache mode whose write semantic is not O_DIRECT (i.e. unsafe, writeback or writethrough), there will be a problem with the volume drivers (i.e. nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver, nova.virt.libvirt.volume.LibvirtNFSVolumeDriver and so on) which designate native io explicitly.

That problem will generate a libvirt xml for the instance whose content contains:

```
...
<disk ... >
  <driver ... cache='unsafe/writeback/writethrough' io='native' />
</disk>
...
```

In turn, it will fail to start the instance or attach the disk.

> When qemu is configured with a block device that has aio=native set, but
> the cache mode doesn't use O_DIRECT (i.e. isn't cache=none/directsync or any
> unnamed mode with explicit cache.direct=on), then the raw-posix block driver
> for local files and block devices will silently fall back to aio=threads.
> The blockdev-add interface rejects such combinations, but qemu can't
> change the existing legacy interfaces that libvirt uses today.

[1]: https://github.com/libvirt/libvirt/commit/058384003db776c580d0e5a3016a6384e8eb7b92
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1086704

Closes-Bug: #1841363
Change-Id: If9acc054100a6733f3659a15dd9fc2d462e84d64
(cherry picked from commit af2405e1181d70cdf60bcd0e40b3e80f2db2e3a6)
(cherry picked from commit 0bd58921a1fcaffcc4fac25f63434c9cab93b061)
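A sketch of the resulting guard; the helper name is illustrative:

```
def pick_driver_io(cache_mode):
    # io='native' requires the cache mode to use O_DIRECT, which only
    # 'none' and 'directsync' do; for unsafe/writeback/writethrough fall
    # back to io='threads' instead of emitting XML that qemu will reject
    # or silently degrade.
    if cache_mode in ('none', 'directsync'):
        return 'native'
    return 'threads'
```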
* Merge "post live migration: don't call Neutron needlessly" into stable/train (Zuul, 2020-09-16; 3 files, -5/+18)

* post live migration: don't call Neutron needlessly (Artom Lifshitz, 2020-09-09; 3 files, -5/+18)

In bug 1879787, the call to network_api.get_instance_nw_info() in _post_live_migration() on the source compute manager eventually calls out to the Neutron REST API. If this fails, the exception is unhandled, and the migrating instance - which is fully running on the destination at this point - will never be updated in the database. This update normally happens later in post_live_migration_at_destination().

The network_info variable obtained from get_instance_nw_info() is used for two things: notifications - which aren't critical - and unplugging the instance's vifs on the source - which is very important!

It turns out that at the time of the get_instance_nw_info() call, the network info in the instance info cache is still valid for unplugging the source vifs. The port bindings on the destination are only activated by the network_api.migrate_instance_start() [1] call that happens shortly *after* the problematic get_instance_nw_info() call. In other words, get_instance_nw_info() will always return the source ports. Because of that, we can replace it with a call to instance.get_network_info().

NOTE(artom) The functional test has been excised, as in stable/train the NeutronFixture does not properly support live migration with ports, making the test worthless. The work to support this was done as part of bp/support-move-ops-with-qos-ports-ussuri, and starts at commit b2734b5a9ae8b869fc9e8e229826343da3b47fcb.

NOTE(artom) The test_post_live_migration_no_shared_storage_working_correctly and test_post_live_migration_cinder_v3_api unit tests had to be adjusted as part of the backport to pass with the new code.

[1] https://opendev.org/openstack/nova/src/commit/d9e04c4ff0b1a9c3383f1848dc846e93030d83cb/nova/network/neutronv2/api.py#L2493-L2522

Change-Id: If0fbae33ce2af198188c91638afef939256c2556
Closes-bug: 1879787
(cherry picked from commit 6488a5dfb293831a448596e2084f484dd0bfa916)
(cherry picked from commit 2c949cb3eea9cd9282060da12d32771582953aa2)
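The core of the change, sketched as a compute-manager method; it assumes Nova's Instance.get_network_info() accessor (which reads the local info cache) and the virt driver's post_live_migration_at_source() hook, with the surrounding method simplified:

```
def _unplug_source_vifs(self, ctxt, instance):
    # Read the info cache rather than round-tripping to Neutron. At this
    # point the cache still describes the source ports, since the
    # destination bindings are only activated later by
    # migrate_instance_start() - exactly what unplugging the source needs.
    network_info = instance.get_network_info()
    self.driver.post_live_migration_at_source(ctxt, instance, network_info)
```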
* Merge "Add note and daxio version to the vPMEM document" into stable/train (Zuul, 2020-09-15; 1 file, -0/+11)

* Add note and daxio version to the vPMEM document (zhangbailin, 2020-09-08; 1 file, -0/+11)

Make the spec of virtual persistent memory consistent with the contents of the admin manual, update the daxio dependency for virtual persistent memory, and add a NOTE about the tested kernel version.

Closes-Bug: #1894022
Change-Id: I30539bb47c98a588b95c066a394949d60af9c520
(cherry picked from commit a8b0c6b456a9afdbdfab69daf8c0d3685f8e3084)
(cherry picked from commit eae463ca1541dacdc7507899d25e7d3505194363)
* Merge "hardware: Reject requests for no hyperthreads on hosts with HT" into stable/train (Zuul, 2020-09-12; 4 files, -16/+99)

* hardware: Reject requests for no hyperthreads on hosts with HT (Stephen Finucane, 2020-08-26; 4 files, -16/+99)

Attempting to boot an instance with 'hw:cpu_policy=dedicated' will result in a request from nova-scheduler to placement for allocation candidates with $flavor.vcpu 'PCPU' inventory. Similarly, booting an instance with 'hw:cpu_thread_policy=isolate' will result in a request for allocation candidates with 'HW_CPU_HYPERTHREADING=forbidden', i.e. hosts without hyperthreading. This has been the case since the cpu-resources feature was implemented in Train.

However, as part of that work and to enable upgrades from hosts that predated Train, we also make a second request for candidates with $flavor.vcpu 'VCPU' inventory. The idea behind this is that old compute nodes would only report 'VCPU' and should be useable, and any new compute nodes that got caught up in this second request could never actually be scheduled to since there wouldn't be enough cores from 'ComputeNode.numa_topology.cells.[*].pcpuset' available to schedule to, resulting in rejection by the 'NUMATopologyFilter'. However, if a host was rejected in the first query because it reported the 'HW_CPU_HYPERTHREADING' trait, it could get picked up by the second query and would happily be scheduled to, resulting in an instance consuming 'VCPU' inventory from a host that properly supported 'PCPU' inventory.

The solution is simple, though also a huge hack. If we detect that the host is using new-style configuration and should be able to report 'PCPU', check if the instance asked for no hyperthreading and whether the host has it. If all are True, reject the request.

Change-Id: Id39aaaac09585ca1a754b669351c86e234b89dd9
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1889633
(cherry picked from commit 9c270332041d6b98951c0b57d7b344fd551a413c)
(cherry picked from commit 7ddab327675d36a4ba59d02d22d042d418236336)
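A sketch of the three-way check; the predicate names are illustrative:

```
def should_reject(host_reports_pcpu, host_has_hyperthreading, wants_isolate):
    # A host with new-style config (able to report PCPU) that still
    # carries the HW_CPU_HYPERTHREADING trait must not serve an 'isolate'
    # request that slipped in through the legacy VCPU fallback query.
    return host_reports_pcpu and host_has_hyperthreading and wants_isolate
```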
* Merge "Add checks for volume status when rebuilding" into stable/train (Zuul, 2020-09-07; 4 files, -4/+95)

* Add checks for volume status when rebuilding (sunhao, 2020-08-29; 4 files, -4/+95)

When rebuilding, we should only allow detaching volumes with 'in-use' status; volumes in a status such as 'retyping' should not be allowed.

Conflicts:
    nova/api/openstack/compute/servers.py
    nova/compute/api.py
    nova/tests/unit/api/openstack/compute/test_server_actions.py
Modified:
    nova/tests/unit/compute/test_compute_api.py

NOTE(elod.illes):
* conflicts in servers.py and test_server_actions.py are due to bug fixing patch I25eff0271c856a8d3e83867b448e1dec6f6732ab not being backported to stable/train
* the api.py conflict is due to Ic2ad1468d31b7707b7f8f2b845a9cf47d9d076d5 being part of a feature introduced in Ussuri
* the modification of test_compute_api.py is also required because patch I25eff0271c856a8d3e83867b448e1dec6f6732ab is not backported and another patch, Ide8eb9e09d22f20165474d499ef0524aefc67854, cannot be backported to stable/train

Change-Id: I7f93cfd18f948134c9cb429dea55740d2cf97994
Closes-Bug: #1489304
(cherry picked from commit 10e9a9b9fc62a3cf72c3717e3621ed95d3cf5519)
(cherry picked from commit bcbeae2c605f4ab4ad805dddccac802928a180b6)
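A sketch of the check; the function name and exception type are illustrative:

```
def _check_volume_detachable_for_rebuild(volume):
    # Only an 'in-use' volume may be detached as part of a rebuild;
    # transient states such as 'retyping' must fail fast instead of being
    # passed through to Cinder.
    if volume['status'] != 'in-use':
        raise ValueError(
            "volume %s is in status '%s', expected 'in-use'"
            % (volume['id'], volume['status']))
```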
* Merge "tests: Add reproducer for bug #1889633" into stable/train (Zuul, 2020-09-07; 1 file, -0/+71)

* tests: Add reproducer for bug #1889633 (Stephen Finucane, 2020-08-26; 1 file, -0/+71)

With the introduction of the cpu-resources work [1], (libvirt) hosts can now report 'PCPU' inventory separate from 'VCPU' inventory, which is consumed by instances with pinned CPUs ('hw:cpu_policy=dedicated'). As part of that effort, we had to drop support for the ability to boot instances with 'hw:cpu_thread_policy=isolate' (i.e. I don't want hyperthreads) on hosts with hyperthreading. This had been previously implemented by marking thread siblings of the host cores used by such an instance as reserved and unusable by other instances, but such a design wasn't possible in a world where we had to track resource consumption in placement before landing on the host. Instead, the 'isolate' policy now simply means "give me a host without hyperthreads". This is enforced by hosts with hyperthreads reporting the 'HW_CPU_HYPERTHREADING' trait, and instances with the 'isolate' policy requesting 'HW_CPU_HYPERTHREADING=forbidden'.

Or at least, that's how it should work. We also have a fallback query for placement to find hosts with 'VCPU' inventory that doesn't care about the 'HW_CPU_HYPERTHREADING' trait. This was envisioned to ensure hosts with old-style configuration ('[DEFAULT] vcpu_pin_set') could continue to be scheduled to. We figured that this second fallback query could accidentally pick up hosts with new-style configuration, but we are also tracking the available and used cores from those listed in the '[compute] cpu_dedicated_set' as part of the host 'NUMATopology' objects (specifically, via the 'pcpuset' and 'cpu_pinning' fields of the 'NUMACell' child objects). These are validated by both the 'NUMATopologyFilter' and the virt driver itself, which means hosts with new-style configuration that got caught up in this second query would be rejected by this filter or by a late failure on the host. (Hint: there's much more detail on this in the spec.)

Unfortunately we didn't think about hyperthreading. If a host gets picked up in the second request, it might well have enough PCPU inventory but simply have been rejected in the first query since it had hyperthreads. In this case, because it has enough free cores available for pinning, neither the filter nor the virt driver will reject the request, resulting in a situation whereby the instance ends up falling back to the old code paths and consuming $flavor.vcpu host cores, plus the thread siblings for each of these cores. Despite this, it will be marked as consuming $flavor.vcpu VCPU (not PCPU) inventory in placement.

This patch proves this to be the case, allowing us to resolve the issue later.

[1] https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html

Change-Id: I87cd4d14192b1a40cbdca6e3af0f818f2cab613e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Related-Bug: #1889633
(cherry picked from commit 737e0c0111acd364d1481bdabd9d23bc8d5d6a2e)
(cherry picked from commit 49a793c8ee7a9be26e4e3d6ddd097a6ee6fea29d)
* Merge "Removed the host FQDN from the exception message" into stable/train (Zuul, 2020-09-07; 3 files, -3/+3)

* Removed the host FQDN from the exception message (Praharshitha Metla, 2020-09-03; 3 files, -3/+3)

Deleting an instance after disabling the hypervisor as a non-admin user leaks the host FQDN in the instance's fault message. Remove the 'host' field from the error message of HypervisorUnavailable, because it leaks the host FQDN to non-admin users. The admin user will still see the hypervisor unavailable exception message and will be able to figure out which compute host the guest is on and that the connection is broken.

Change-Id: I0eae19399670f59c17c9a1a24e1bfcbf1b514e7b
Closes-Bug: #1851587
(cherry picked from commit a89ffab83261060bbb9dedb2b8de6297b2d07efd)
(cherry picked from commit ff82601204e9d724b3032dc94c49fa5c8de2699b)
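A sketch of the message change; the before/after wording approximates Nova's msg_fmt rather than quoting it:

```
class HypervisorUnavailable(Exception):
    # Before (approximate): 'Connection to the hypervisor is broken on
    # host: %(host)s' - interpolating the FQDN into a fault record any
    # non-admin owner of the instance could read.
    # After: no host interpolation at all.
    msg_fmt = "Connection to the hypervisor is broken on host"
```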
* Merge "compute: Don't delete the original attachment during pre LM rollback" into stable/train (Zuul, 2020-09-03; 3 files, -8/+24)

* compute: Don't delete the original attachment during pre LM rollback (Lee Yarwood, 2020-08-27; 3 files, -8/+24)

I0bfb11296430dfffe9b091ae7c3a793617bd9d0d introduced support for live migration with cinderv3 volume attachments during Queens. This initial support handled failures in pre_live_migration directly by removing any attachments created on the destination and reverting to the original attachment ids before re-raising the caught exception to the source compute. It also added rollback code within the main _rollback_live_migration method but missed that this would also be called during a pre_live_migration rollback.

As a result, after a failure in pre_live_migration, _rollback_live_migration will attempt to delete the source host volume attachments referenced by the bdm before updating the bdms with the now non-existent attachment ids, leaving the volumes in an `available` state in Cinder as they have no attachment records associated with them anymore.

This change aims to resolve this within _rollback_volume_bdms by ensuring that the current and original attachment_ids are not equal before requesting that the current attachment referenced by the bdm is deleted. When called after a failure in pre_live_migration this should result in no attempt being made to remove the original source host attachments from Cinder.

Note that the following changes muddy the waters slightly here but introduced no actual changes to the logic within _rollback_live_migration:

* I0f3ab6604d8b79bdb75cf67571e359cfecc039d8 reworked some of the error handling in Rocky but isn't the source of the issue here.
* Ibe9215c07a1ee00e0e121c69bcf7ee1b1b80fae0 reworked _rollback_live_migration to use the provided source_bdms.
* I6bc73e8c8f98d9955f33f309beb8a7c56981b553 then refactored _rollback_live_migration, moving the logic into a self contained _rollback_volume_bdms method.

Closes-Bug: #1889108
Change-Id: I9edb36c4df1cc0d8b529e669f06540de71766085
(cherry picked from commit 2102f1834a6ac9fd870bfb457b28a2172f33e281)
(cherry picked from commit 034b2fa8fea0e34fed95a2ba728e4387ce4e78de)
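A sketch of the guard added to _rollback_volume_bdms; the pairing of bdms and the volume API call shapes are simplified from Nova's actual code:

```
def rollback_volume_bdms(context, volume_api, bdms, original_bdms):
    originals = {bdm.volume_id: bdm for bdm in original_bdms}
    for bdm in bdms:
        original = originals[bdm.volume_id]
        # Only delete the current attachment when it differs from the
        # original, i.e. when pre_live_migration really created a new
        # attachment on the destination. After a pre_live_migration
        # failure the two are equal, and deleting it would strand the
        # volume as 'available' in Cinder.
        if bdm.attachment_id != original.attachment_id:
            volume_api.attachment_delete(context, bdm.attachment_id)
        bdm.attachment_id = original.attachment_id
        bdm.save()
```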
* Merge "compute: refactor volume bdm rollback error handling" into stable/train (Zuul, 2020-09-03; 2 files, -27/+178)

* compute: refactor volume bdm rollback error handling (Lee Yarwood, 2020-08-27; 2 files, -27/+178)

Previously any exception while rolling back the connection_info and attachment_id of volume bdms would result in the overall attempt to rollback a LM failing. This change refactors this specific bdm rollback logic into two self contained methods that by default ignore errors where possible, allowing the LM rollback attempt to continue.

Change-Id: I6bc73e8c8f98d9955f33f309beb8a7c56981b553
(cherry picked from commit 9524a5a1b5745f6064f88cbfbf5bbfae3a973bef)
* Merge "Add regression tests for bug #1889108" into stable/train (Zuul, 2020-09-03; 1 file, -0/+113)

* Add regression tests for bug #1889108 (Lee Yarwood, 2020-08-27; 1 file, -0/+113)

NOTE(lyarwood): Various changes were required to get this to work on stable/train without backporting a considerable number of changes to the func tests, including the following:

- Adding TestVolAttachmentsDuringPreLiveMigration to super() as functional tests run against py27
- Adding USE_NEUTRON=True
- Adding api_major_version='v2.1'
- Adding self.api to self._wait_for_state_change calls
- Removing the use of _build_server and crafting the server creation request by hand
- Removing the use of _live_migrate and crafting the live migration request by hand

Related-Bug: #1889108
Change-Id: Ib9dbc792dc918e7ea45915e2c1dbd96be82ef562
(cherry picked from commit 4c970f499c31370495d84c91a10319d308d13fb9)
(cherry picked from commit 6db72002a65f30ac44b8df0a642b400ea272247e)
* Merge "libvirt: Do not reference VIR_ERR_DEVICE_MISSING when libvirt is < v4.1.0" into stable/train [20.4.0] (Zuul, 2020-08-28; 4 files, -21/+141)