Commit messages
This change adds a new _post_live_migration_update_host
function that wraps _post_live_migration and simply ensures
that, if we exit due to an exception, instance.host is set
to the destination host.
By the time we are in _post_live_migration, the guest has
already started running on the destination host and we cannot
revert. Sometimes admins or users will hard reboot the instance,
expecting that to fix everything, when the VM enters the error
state after the failed migration. Previously this would end up
recreating the instance on the source node, leading to possible
data corruption if the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
(cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)
(cherry picked from commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e)
(cherry picked from commit 15502ddedc23e6591ace4e73fa8ce5b18b5644b0)
(cherry picked from commit 43c0e40d288960760a6eaad05cb9670e01ef40d0)
(cherry picked from commit 0ac64bba8b7aba2fb358e00e970e88b32d26ef7e)
Adds a regression test, or reproducer, for a post live migration
failure at the destination; possible causes include failing to get
the instance network info or block device info.
Changes:
* _live_migrate in _integrated_helpers now returns the server
NOTE(auniyal): Differences
* Replaced GlanceFixture with fake.stub_out_image_service in regression test, as GlanceFixture does not exist in Ussuri
Related-Bug: #1628606
Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
(cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
(cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)
(cherry picked from commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db)
(cherry picked from commit 5efcc3f695e02d61cb8b881e009308c2fef3aa58)
(cherry picked from commit ed1ea71489b60c0f95d76ab05f554cd046c60bac)
(cherry picked from commit 6dda4f7ca3f25a11cd0178352ad24fe2e8b74785)
Ignore the instance task state and continue with VM evacuation.
Closes-Bug: #1978983
Change-Id: I5540df6c7497956219c06cff6f15b51c2c8bc29d
(cherry picked from commit db919aa15f24c0d74f3c5c0e8341fad3f2392e57)
(cherry picked from commit 6d61fccb8455367aaa37ae7bddf3b8befd3c3d88)
(cherry picked from commit 8e9aa71e1a4d3074a94911db920cae44334ba2c3)
(cherry picked from commit 0b8124b99601e1aba492be8ed564f769438bd93d)
(cherry picked from commit 3224ceb3fffc57d2375e5163d8ffbbb77529bc38)
This change adds a reproducer test for evacuating
a VM in the powering-off state.
Conflicts:
nova/tests/functional/integrated_helpers.py
nova/tests/functional/test_servers.py
Difference:
nova/tests/functional/regressions/test_bug_1978983.py
NOTE(auniyal): Conflicts are due to the following changes that are not in Ussuri:
* I147bf4d95e6d86ff1f967a8ce37260730f21d236 (Cyborg evacuate support)
* Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 (Functional tests for NUMA live migration)
NOTE(auniyal): Differences
* Replaced GlanceFixture with fake.stub_out_image_service in regression test, as GlanceFixture does not exist in Ussuri
Related-Bug: #1978983
Change-Id: I5540df6c7497956219c06cff6f15b51c2c8bc299
(cherry picked from commit 5904c7f993ac737d68456fc05adf0aaa7a6f3018)
(cherry picked from commit 6bd0bf00fca6ac6460d70c855eded3898cfe2401)
(cherry picked from commit 1e0af92e17f878ce64bd16e428cb3c10904b0877)
(cherry picked from commit b57b0eef218fd7604658842c9277aad782d11b45)
(cherry picked from commit b6c877377f58ccaa797af3384b199002726745ea)
Change I4d3193d8401614311010ed0e055fcb3aaeeebaed added some
additional local delete cleanup to prevent leaking of placement
allocations. The change introduced a regression in our "delete while
booting" handling as the _local_delete_cleanup required a valid
instance object to do its work and in two cases, we could have
instance = None from _lookup_instance if we are racing with a create
request and the conductor has deleted the instance record while we
are in the middle of processing the delete request.
This handles those scenarios by doing two things:
(1) Changing the _local_delete_cleanup and
_update_queued_for_deletion methods to take an instance UUID
instead of a full instance object as they really only need the
UUID to do their work
(2) Saving a copy of the instance UUID before doing another instance
lookup which might return None and passing that UUID to the
_local_delete_cleanup and _update_queued_for_deletion methods
Closes-Bug: #1914777
Change-Id: I03cf285ad83e09d88cdb702a88dfed53c01610f8
(cherry picked from commit 123f6262f63477d3f50dfad09688978e044bd9e0)
(cherry picked from commit 35112d7667cee9fc33db660ee241164b38468c31)
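A minimal sketch of the two-step fix described above, with stub helpers and a plain dict standing in for the real compute API internals and database:

```python
_DB = {}  # uuid -> instance record; empty, to simulate the raced delete


def _lookup_instance(instance_uuid):
    # May return None if conductor deleted the record mid-request.
    return _DB.get(instance_uuid)


def _local_delete_cleanup(instance_uuid, log):
    # Needs only the UUID, not a full instance object (change 1).
    log.append(("cleanup", instance_uuid))


def _update_queued_for_deletion(instance_uuid, log):
    log.append(("queued", instance_uuid))


def delete(instance, log):
    # Save the UUID before re-looking the instance up (change 2): the
    # second lookup can return None when racing with a create request.
    instance_uuid = instance["uuid"]
    instance = _lookup_instance(instance_uuid)
    if instance is None:
        _local_delete_cleanup(instance_uuid, log)
        _update_queued_for_deletion(instance_uuid, log)
        return
    # ... normal delete path would continue here ...
```

Because both cleanup helpers take the saved UUID, the `instance is None` branch no longer dereferences a missing object.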
This adds two tests to cover a regression where racing create and
delete requests could result in a user receiving a 500 error when
attempting to delete an instance:
Unexpected exception in API method: AttributeError: 'NoneType' object
has no attribute 'uuid'
Related-Bug: #1914777
NOTE(melwitt): The difference from the cherry picked change is because
I6daea47988181dfa6dde3d9c42004c0ecf6ae87a is not in Ussuri.
Change-Id: I8249c572c6f727ef4ca434843106b9b57c47e585
(cherry picked from commit f7975d640ce1e9fa06d045d35177f07451716f0c)
(cherry picked from commit 4f17ea2f7dedb67802c2a8bc0caa64ff66fd2c9c)
The Quotas.check_deltas method needs a user_id keyword arg in order to
scope a quota check to a particular user. However, when we call
check_num_instances_quota we don't pass a project_id or user_id because
at the time of the quota check, we have not yet created an instance
record and thus cannot use it to determine the appropriate project
and user.
Instead, we should rely on the RequestContext.project_id and
RequestContext.user_id as defaults in this case, but
check_num_instances_quota only defaults project_id and not user_id.
check_num_instances_quota should also default user_id to the
RequestContext.user_id when user_id is not explicitly passed.
check_num_instances_quota should also check whether any per-user quota
limits are defined for instance-related resources before passing along
the user_id to scope resource counting and limit checking. Counting
resources across a user is costly, so we should avoid it if it's not
needed.
Closes-Bug: #1893284
Change-Id: I3cfb1edc30b0bda4671e0d2cc2a8993055dcc9ff
(cherry picked from commit 4c11d5467a30506a82dd5d32dd22b8958a187c0b)
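The defaulting and scoping behavior can be sketched as follows; PER_USER_LIMITS and per_user_limits_defined are hypothetical stand-ins for the quota-driver lookup the commit refers to:

```python
class RequestContext:
    """Minimal stand-in for nova.context.RequestContext."""
    def __init__(self, project_id, user_id):
        self.project_id = project_id
        self.user_id = user_id


# Projects that have per-user quota limits defined (illustrative).
PER_USER_LIMITS = {"p1"}


def per_user_limits_defined(project_id):
    # Hypothetical helper; the real code queries the quota driver.
    return project_id in PER_USER_LIMITS


def check_num_instances_quota(context, project_id=None, user_id=None):
    # Default BOTH ids from the request context, not just project_id.
    if project_id is None:
        project_id = context.project_id
    if user_id is None:
        user_id = context.user_id
    # Only scope counting per-user when per-user limits actually
    # exist, since counting resources across a user is costly.
    if not per_user_limits_defined(project_id):
        user_id = None
    return project_id, user_id
```
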
This adds a regression test for a bug where quota limit checking during
server creates is not properly scoped per-user when per-user quota has
been defined.
As a result, users who should be able to create a server are rejected
with a 403 "quota exceeded" error when they should be allowed to create
a server because servers owned by other users in the project are
incorrectly being counted for the current user.
Related-Bug: #1893284
Change-Id: I615ada45ffcbac081474c0a0cf005afdb8eec953
(cherry picked from commit 38bc8b871a4b954f6de84f10dc1e8beb21a7c9ed)
In the Rocky cycle nova was enhanced to support the multiple
port binding live migration workflow when neutron supports
the binding-extended API extension.
When the migration_data object was extended to support
multiple port bindings, populating the vifs field was used
as a sentinel to indicate that the new workflow should
be used.
In the Train release,
I734cc01dce13f9e75a16639faf890ddb1661b7eb
(SR-IOV Live migration indirect port support)
broke the semantics of the migrate_data object by
unconditionally populating the vifs field.
This change restores the Rocky semantics, which several parts
of the code base depend on, by only populating vifs if neutron
supports multiple port bindings.
Changes to patch:
- unit/virt/libvirt/fakelibvirt.py: Include partial pick from
  change Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 to add
  jobStats, complete_job and fail_job to fakelibvirt. The full
  change was not cherry-picked as it was part of the NUMA-aware
  live migration feature in Victoria.
Co-Authored-By: Sean Mooney <work@seanmooney.info>
Change-Id: Ia00277ac8a68a635db85f9e0ce2c6d8df396e0d8
Closes-Bug: #1888395
(cherry picked from commit b8f3be6b3c5af91d215b4a0cecb9be098e8d8799)
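The restored sentinel semantics amount to a conditional assignment; MigrateData here is a minimal stand-in for the real migrate_data object:

```python
class MigrateData:
    """Minimal stand-in for a LibvirtLiveMigrateData object."""
    def __init__(self):
        self.vifs = None  # unset means: use the old workflow


def populate_migrate_data(migrate_data, binding_extended_supported, vifs):
    # A populated 'vifs' field is the sentinel that selects the
    # multiple port binding workflow, so only set it when neutron
    # actually supports the binding-extended API extension.
    if binding_extended_supported:
        migrate_data.vifs = vifs
    return migrate_data
```

Leaving `vifs` unset against an older neutron keeps the pre-Train single-binding path intact.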
This change adds a functional regression test that
asserts the broken behavior when trying to live migrate
with a neutron backend that does not support multiple port
bindings.
Conflicts/Changes:
nova/tests/functional/regressions/test_bug_1888395.py:
- specify api major version to allow block_migration 'auto'
- use TempDir fixture for instances path
nova/tests/unit/virt/libvirt/fake_imagebackend.py:
- include portion of change Ia3d7351c1805d98bcb799ab0375673c7f1cb8848
which stubs out the is_file_in_instance_path method. That was
included in a feature patch set so just pulling the necessary
bit.
Change-Id: I470a016d35afe69809321bd67359f466c3feb90a
Partial-Bug: #1888395
(cherry picked from commit 71bc6fc9b89535679252ffe5a737eddad60e4102)
The _update_available_resources periodic task makes resource
allocation adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE,
based on the list of instances assigned to the resource tracker's
host and on the migrations where the source or the target host is
the resource tracker's host. So if instance.host or the migration
context changes without holding the COMPUTE_RESOURCE_SEMAPHORE while
the _update_available_resources task is running, there will be data
inconsistency in the resource tracker.
This patch makes sure that during evacuation instance.host and the
migration context are changed while holding the semaphore.
Conflicts:
nova/tests/unit/compute/test_compute_mgr.py due to
I147bf4d95e6d86ff1f967a8ce37260730f21d236 is not in stable/ussuri
Change-Id: Ica180165184b319651d22fe77e076af036228860
Closes-Bug: #1896463
(cherry picked from commit 7675964af81472f3dd57db952d704539e61a3e6e)
(cherry picked from commit 3ecc098d28addd8f3e1da16b29940f4337209e62)
There is a race condition between the rebuild and the
_update_available_resource periodic task on the compute. This patch adds
a reproducer functional test. Unfortunately it needs some injected sleep
to make the race happen in a stable way. This is suboptimal but only
adds 3 seconds of slowness to the test execution.
stable/ussuri specific changes:
* GlanceFixture does not exist as
I6daea47988181dfa6dde3d9c42004c0ecf6ae87a is not in ussuri
* MigrationList.get_in_progress_and_error was added in
I422a907056543f9bf95acbffdd2658438febf801 which is not in ussuri
Change-Id: Id0577bceed9808b52da4acc352cf9c204f6c8861
Related-Bug: #1896463
(cherry picked from commit 3f348602ae4a40c52c7135b2cb48deaa6052c488)
(cherry picked from commit d768cdbb88d0b0b3ca38623c4bb26d5eabdf1596)
Idf84ccff254d26fa13473fe9741ddac21cbcf321 added this check in order for
Nova to avoid booting encrypted images created by Cinder as there is
currently no support for using such images (rotating keys etc).
The check however missed the slightly convoluted use case where this
image property is found against a volume after the volume in question is
created using an encrypted image created by cinder from an encrypted
volume. In other words:
- Cinder creates an encrypted volume A
- Glance creates an encrypted image A from volume A
- Cinder creates an encrypted volume B from image A
- Nova attempts to boot an instance using volume B
Note that Nova may request the creation of volume B or a user could also
do this directly through Cinder.
As such this change simply ensures that the instance isn't booting
from a volume when performing the check, as the check is only valid
when booting from an image.
Closes-Bug: #1895696
Change-Id: Ic92cab7362fa25050e5bbef5c3e360108365b5c7
(cherry picked from commit f9b67893acf94c06fd41be36b80b99788dc77e48)
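The guard can be sketched as below; should_reject_image is an illustrative name, and the sketch assumes the cinder_encryption_key_id image property that the original check (Idf84ccff254d26fa13473fe9741ddac21cbcf321) keys on:

```python
def should_reject_image(is_boot_from_volume, image_properties):
    # The encrypted-image check only makes sense when booting directly
    # from an image. A volume created from such an image (volume B in
    # the commit message) is decrypted by Cinder, so boot-from-volume
    # must skip the check entirely.
    if is_boot_from_volume:
        return False
    return 'cinder_encryption_key_id' in image_properties
```
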
NOTE(lyarwood): I6daea47988181dfa6dde3d9c42004c0ecf6ae87a isn't present
on stable/ussuri so this test now uses the fake_image_service.
1f81c086579a0867f9aecd2d5d7cc531b4d3347f also isn't present on
stable/ussuri so this test also uses the CinderFixture directly.
Related-Bug: #1895696
Change-Id: I15271fb0b8de7f1184acddd607d605837e2eb7d4
(cherry picked from commit e76cccddd3ac64d8cc3e7e63e1772d5d23f20669)
As noted inline, the 'policies' field may be a list but it expects one
of two items.
Change-Id: I34c68df1e6330dab1524aa0abec733610211a407
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1894966
(cherry picked from commit 32c43fc8017ee89d4e6cdf79086d87735a00f0c0)
You must specify the 'policies' field. Currently, not doing so will
result in an HTTP 500 error code. This should be a 4xx error. Add a
test to demonstrate the bug before we provide a fix.
Change-Id: I72e85855f621d3a51cd58d14247abd302dcd958b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Related-Bug: #1894966
(cherry picked from commit 2c66962c7a40d8ef4fab54324e06edcdec1bd716)
As discussed in change I26b050c402f5721fc490126e9becb643af9279b4, the
resource tracker's periodic task is reliant on the status of migrations
to determine whether to include usage from these migrations in the
total, and races between setting the migration status and decrementing
resource usage via 'drop_move_claim' can result in incorrect usage.
That change tackled the confirm resize operation. This one changes the
revert resize operation, and is a little trickier due to kinks in how
both the same-cell and cross-cell resize revert operations work.
For same-cell resize revert, the 'ComputeManager.revert_resize'
function, running on the destination host, sets the migration status to
'reverted' before dropping the move claim. This exposes the same race
that we previously saw with the confirm resize operation. It then calls
back to 'ComputeManager.finish_revert_resize' on the source host to boot
up the instance itself. This is kind of weird, because, even ignoring
the race, we're marking the migration as 'reverted' before we've done
any of the necessary work on the source host.
The cross-cell resize revert splits dropping of the move claim and
setting of the migration status between the source and destination host
tasks. Specifically, we do cleanup on the destination and drop the move
claim first, via 'ComputeManager.revert_snapshot_based_resize_at_dest'
before resuming the instance and setting the migration status on the
source via
'ComputeManager.finish_revert_snapshot_based_resize_at_source'. This
would appear to avoid the weird quirk of same-cell migration, however,
in typical weird cross-cell fashion, these are actually different
instances and different migration records.
The solution is once again to move the setting of the migration status
and the dropping of the claim under 'COMPUTE_RESOURCE_SEMAPHORE'. This
introduces the weird setting of migration status before completion to
the cross-cell resize case and perpetuates it in the same-cell case, but
this seems like a suitable compromise to avoid attempts to do things
like unplugging already unplugged PCI devices or unpinning already
unpinned CPUs. From an end-user perspective, instance state changes are
what really matter and once a revert is completed on the destination
host and the instance has been marked as having returned to the source
host, hard reboots can help us resolve any remaining issues.
Change-Id: I29d6f4a78c0206385a550967ce244794e71cef6d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1879878
(cherry picked from commit dc9c7a5ebf11253f86127238d33dff7401465155)
The 'ResourceTracker.update_available_resource' periodic task builds
usage information for the current host by inspecting instances and
in-progress migrations, combining the two. Specifically, it finds all
instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state,
calculates the usage from these, then finds all in-progress migrations
for the host that don't have an associated instance (to prevent double
accounting) and includes the usage for these.
In addition to the periodic task, the 'ResourceTracker' class has a
number of helper functions to make or drop claims for the inventory
generated by the 'update_available_resource' periodic task as part of
the various instance operations. These helpers naturally assume that
when making a claim for a particular instance or migration, there
shouldn't already be resources allocated for same. Conversely, when
dropping claims, the resources should currently be allocated. However,
the check for *active* instances and *in-progress* migrations in the
periodic task means we have to be careful in how we make changes to a
given instance or migration record. Running the periodic task between
such an operation and an attempt to make or drop a claim can result in
TOCTOU-like races.
This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE'
semaphore to prevent the periodic task running while we're claiming
resources in helpers like 'ResourceTracker.instance_claim' and we make
our changes to the instances and migrations within this context. There
is one exception though: the 'drop_move_claim' helper. This function is
used when dropping a claim for either a cold migration, a resize or a
live migration, and will drop usage from either the source host (based
on the "old" flavor) for a resize confirm or the destination host (based
on the "new" flavor) for a resize revert or live migration rollback.
Unfortunately, while the function itself is wrapped in the semaphore, no
changes to the state of the instance or migration in question are
protected by it.
Consider the confirm resize case, which we're addressing here. If we
mark the migration as 'confirmed' before running 'drop_move_claim', then
the periodic task running between these steps will not account for the
usage on the source since the migration is allegedly 'confirmed'. The
call to 'drop_move_claim' will then result in the tracker dropping usage
that we're no longer accounting for. This "set migration status before
dropping usage" is the current behaviour for both same-cell and
cross-cell resize, via the 'ComputeManager.confirm_resize' and
'ComputeManager.confirm_snapshot_based_resize_at_source' functions,
respectively. We could reverse those calls and run 'drop_move_claim'
before marking the migration as 'confirmed', but while our usage will be
momentarily correct, the periodic task running between these steps will
re-add the usage we just dropped since the migration isn't yet
'confirmed'. The correct solution is to close this gap between setting
the migration status and dropping the move claim to zero. We do this by
putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just
like the claim operations.
Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
(cherry picked from commit a57800d3825ef2fb833d8024c6f7e5c550f55e2f)
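The fix pattern — status flip and claim drop under the same lock — can be sketched with a plain threading.Lock standing in for nova's COMPUTE_RESOURCE_SEMAPHORE (which in the real code is applied via an oslo.concurrency synchronized decorator), and toy Migration/ResourceTracker classes:

```python
import threading

COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()


class Migration:
    def __init__(self):
        self.status = 'confirming'


class ResourceTracker:
    def __init__(self):
        self.usage = 1  # one in-flight move claim on the source host

    def drop_move_claim(self, migration):
        self.usage -= 1


def confirm_resize(migration, tracker):
    # Setting the status and dropping the claim happen under the same
    # semaphore the periodic task takes, so the task can never observe
    # one without the other -- closing the TOCTOU window described
    # above, where a 'confirmed' migration still holds usage.
    with COMPUTE_RESOURCE_SEMAPHORE:
        migration.status = 'confirmed'
        tracker.drop_move_claim(migration)
```

The periodic task, also serialized on the semaphore, sees either (confirming, usage=1) or (confirmed, usage=0), never a mixed state.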
It's not necessary for this test case since we don't query placement
stuff.
Change-Id: I5136b1cab8d704f55d01e5490c42ead6e3e14d98
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
(cherry picked from commit e91b07dc522db6b6e4df8972437f78ef2697790b)
No need for the libvirt driver in all its complexity here.
Change-Id: Ifea9a15fb01c0b25e9973024f4f61faecc56e1cd
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
(cherry picked from commit e1adbced92453329f7285ec38c1dc7821ebb52c7)
I0bfb11296430dfffe9b091ae7c3a793617bd9d0d introduced support for live
migration with cinderv3 volume attachments during Queens. This initial
support handled failures in pre_live_migration directly by removing any
attachments created on the destination and reverting to the original
attachment ids before re-raising the caught exception to the source
compute. It also added rollback code within the main
_rollback_live_migration method but missed that this would also be
called during a pre_live_migration rollback.
As a result after a failure in pre_live_migration
_rollback_live_migration will attempt to delete the source host volume
attachments referenced by the bdm before updating the bdms with the now
non-existent attachment ids, leaving the volumes in an `available` state
in Cinder as they have no attachment records associated with them
anymore.
This change aims to resolve this within _rollback_volume_bdms by
ensuring that the current and original attachment_ids are not equal
before requesting that the current attachment referenced by the bdm is
deleted. When called after a failure in pre_live_migration this should
result in no attempt being made to remove the original source host
attachments from Cinder.
Note that the following changes muddy the waters slightly here but
introduced no actual changes to the logic within
_rollback_live_migration:
* I0f3ab6604d8b79bdb75cf67571e359cfecc039d8 reworked some of the error
handling in Rocky but isn't the source of the issue here.
* Ibe9215c07a1ee00e0e121c69bcf7ee1b1b80fae0 reworked
_rollback_live_migration to use the provided source_bdms.
* I6bc73e8c8f98d9955f33f309beb8a7c56981b553 then refactored
_rollback_live_migration, moving the logic into a self contained
_rollback_volume_bdms method.
Closes-Bug: #1889108
Change-Id: I9edb36c4df1cc0d8b529e669f06540de71766085
(cherry picked from commit 2102f1834a6ac9fd870bfb457b28a2172f33e281)
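The attachment-id guard described above, sketched with plain dicts standing in for BDM objects and a list recording the Cinder attachment-delete calls:

```python
def rollback_volume_bdms(bdms, original_attachment_ids, deleted):
    # Only delete the current attachment when it differs from the
    # original one. After a pre_live_migration failure the bdm still
    # references the source attachment, so the two are equal and the
    # source attachment must be left alone.
    for bdm in bdms:
        original = original_attachment_ids[bdm['volume_id']]
        if bdm['attachment_id'] != original:
            deleted.append(bdm['attachment_id'])  # cinder delete call
        bdm['attachment_id'] = original
```
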
Related-Bug: #1889108
Change-Id: Ib9dbc792dc918e7ea45915e2c1dbd96be82ef562
(cherry picked from commit 4c970f499c31370495d84c91a10319d308d13fb9)
The keypair of a server is updated when rebuilding the server with a
keypair. This functionality was added in API microversion 2.54.
However, the 'keypairs' field of the instance object is currently
not saved when the instance object is saved.
Make the instance object update the 'keypairs' field when saving
the instance object.
Change-Id: I8a2726b39d0444de8c35480024078a97430f5d0c
Closes-Bug: #1843708
Co-authored-by: Stephen Finucane <stephenfin@redhat.com>
(cherry picked from commit 086796021b189c3ac64805ed8f6bde833906d284)
When evacuating an instance between compute hosts on shared storage,
during the rebuild operation we call spawn() on the destination
compute. spawn() currently assumes that it should cleanup all
resources on failure, which results in user data being deleted in the
evacuate case.
This change modifies spawn in the libvirt driver such that it only
cleans up resources it created.
Co-Authored-By: Lee Yarwood <lyarwood@redhat.com>
Closes-Bug: #1550919
Change-Id: I764481966c96a67d993da6e902dc9fc3ad29ee36
(cherry picked from commit 497360b0ea970f1e68912be8229ef8c3f5454e9e)
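The "only clean up what you created" pattern can be sketched generically; steps and cleanups are illustrative, not the libvirt driver's real structure:

```python
def spawn(steps, cleanups):
    """Run named (name, fn) creation steps; on failure, undo only
    the resources this call actually created.

    Tracking our own creations is what keeps an evacuation rebuild
    from deleting pre-existing user data on shared storage.
    """
    created = []
    try:
        for name, fn in steps:
            fn()
            created.append(name)
    except Exception:
        # Roll back in reverse order, but only our own creations.
        for name in reversed(created):
            cleanups[name]()
        raise
```
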
1. Claim allocations from placement first, then claim specific
   resources in the Resource Tracker on the destination to populate
   migration_context.new_resources
2. Clean up specific resources when live migration succeeds/fails
Because we store specific resources in migration_context during
live migration, to ensure cleanup happens correctly we can't drop
migration_context before cleanup is complete:
a) in post live migration, we move source host cleanup before
   destination cleanup (post_live_migration_at_destination will
   apply migration_context and drop it)
b) when rolling back live migration, we drop migration_context
   after rollback operations are complete
For different specific resources, we might need driver-specific
support, such as for vpmem. This change just ensures that newly
claimed specific resources are populated to migration_context and
that migration_context is not dropped before cleanup is complete.
Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory
There are two almost identical implementations of the _run_periodics()
helper - and a third one would have joined them in a subsequent patch,
if not for this patch. This patch moves the _run_periodics() to the
base test class. In addition, _run_periodics() depends on the
self.computes dict used for compute service tracking. The method that
populates that dict, _start_compute(), is therefore also moved to the
base class.
This enables some light refactoring of existing tests that need
either the _run_periodics() helper, or the compute service tracking.
In addition, a needless override of _start_compute() in
test_aggregates that provided no added value is removed. This is done
to avoid any potential confusion around _start_compute()'s role.
Change-Id: I36dd64dc272ea1743995b3b696323a9431666489
Change-Id: I33d8ac0a1cae0b2d275a21287d5e44c008a68122
The test_unshelve_offloaded_fails_due_to_neutron test could fail due
to a race condition. The test case only waits for the first
instance.save() call at [1], but the allocation delete happens after
it. As a result, the test case can still see the allocation of the
offloaded server in placement.
The fix makes sure that the test waits for the second instance.save()
by checking the host of the instance.
[1] https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L5274-L5288
Related-Bug: #1862633
Change-Id: Ic1c3d35749fbdc7f5b6f6ec1e16b8fcf37c10de8
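The fix amounts to polling for the host value written by the second instance.save(); a generic sketch of such a wait helper (not the actual fixture code):

```python
import time


def wait_for_instance_host(get_host, expected_host, timeout=5.0):
    # Poll until the host change written by the *second*
    # instance.save() is visible, which also means the allocation
    # delete that precedes it has completed.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_host() == expected_host:
            return True
        time.sleep(0.05)
    return False
```
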
This test would previously only attempt to invoke a race between
instance.save(expected_task_state=task_states.SPAWNING) and a parallel
attempt to delete an instance when the instance also has a vm_state of
ACTIVE and task_state of None. However vm_state and task_state would
often be different within the test resulting in no attempt to invoke the
test being made.
As instance.save is only called with expected_task_state set to
task_states.SPAWNING by _unshelve_instance and _build_and_run_instance
we should just check for this and avoid any state races within the test.
Additionally when attempting to invoke the race this test would call
_wait_for_server_parameter and provide self.api. This change removes
this argument as since I8c96b337f32148f8f5899c9b87af331b1fa41424 this is
no longer required and will result in a `TypeError:
'TestOpenStackClient' object is not subscriptable` error.
Closes-Bug: #1866072
Change-Id: I36da36cc5b099174eece0dfba29485fc20b2867b
If an instance fails to build, which is possible for a variety of
reasons, we may end up in a situation where we have remnants of a
plugged VIF (typically files) left on the host. This is because we
cleanup from the neutron perspective but don't attempt to unplug the
VIF, a call which may have many side-effects depending on the VIF
driver. Resolve this by always attempting to unplug VIFs as part of the
network cleanup.
A now invalid note is also removed and a unit test corrected.
Closes-Bug: #1831771
Related-Bug: #1830081
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
Adds a regression-style test for two cleanup bugs when
'UnexpectedDeletingTaskStateError' is raised during build.
Change-Id: Ief1dfbb6cc9d67b73dfab4c7b63358e76e12866b
Related-Bug: #1848666
Related-Bug: #1831771
When the port binding update fails during unshelve of a shelve
offloaded instance, the compute manager has to catch the exception
and clean up the destination host allocation.
Change-Id: I4c3fbb213e023ac16efc0b8561f975a659311684
Closes-Bug: #1862633
If the port update fails during unshelve of an offloaded server,
then the placement allocation on the target host is leaked.
Change-Id: I7be32e4fc2e69f805535e0a437931516f491e5cb
Related-Bug: #1862633
This is a *very* common pattern. Centralize it. Only a few users are
added for now, but we can migrate more later (it's rather tedious work).
Change-Id: I84c58de90dad6d86271767363aef90ddac0f1730
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Another broadly useful helper.
Change-Id: I67f5b00541c9f72d50c6fe626987872a0930e982
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
'_IntegratedTestBase' has subclassed 'InstanceHelperMixin' since change
I0d21cb94c932e6e556eca964c57868c705b2d120, which means both now provide
a '_build_minimal_create_server_request' function. However, only
'_IntegratedTestBase' provides a '_build_server' function. The
'_build_minimal_create_server_request' and '_build_server' functions do
pretty much the same thing but there are some differences. Combine these
under the '_build_server' alias.
Change-Id: I91fa2f73185fef48e9aae9b7f61389c374e06676
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This means we can drop many duplicate implementations of things
- nova.tests.function.integrated_helpers._IntegratedTestBase
- _build_minimal_create_server_request
- nova.tests.functional.test_servers.ServersTestBase
- _wait_for_server_parameter
- _wait_for_state_change
- _wait_until_deleted
Change-Id: I0d21cb94c932e6e556eca964c57868c705b2d120
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Change I8742071b55f018f864f5a382de20075a5b444a79 in Ocata
moved the creation of the instance record from the API to
conductor. As a result, the "create" instance action was
only being created in conductor when the instance is created
in a non-cell0 database. This is a regression because before
that change when a server create would fail during scheduling
you could still list instance actions for the server and see
the "create" action but that was lost once we started burying
those instances in cell0.
This fixes the bug by creating the "create" action in the cell0
database when burying an instance there. It goes a step further
and also creates and finishes an event so the overall action
message shows up as "Error" with the details about where the
failure occurred in the event traceback.
A short release note is added since a new action event is
added here (conductor_schedule_and_build_instances) rather than
re-use some kind of event that we could generate from the
compute service, e.g. compute__do_build_and_run_instance.
Change-Id: I1e9431e739adfbcfc1ca34b87e826a516a4b18e2
Closes-Bug: #1852458
Pretty much every test case in 'nova.tests.functional' defines an 'api'
attribute, and many define an 'admin_api' attribute. We can pull these
from the class rather than explicitly passing them to helpers. Rework
things so this happens.
Note that the bulk of the changes here are in the
'nova/tests/functional/integrated_helpers.py' file. The rest of the
changes were auto-generated using the following script (my sed-fu is
non-existent):
$ cd nova/tests/functional
$ python3
>>> import glob
>>> import re
>>> pattern = r'_state_change\((\n\s+)?self\.(admin_)?api,\s+'
>>> replace = r'_state_change(\1'
>>> for path in glob.glob('*.py') + glob.glob('*/*.py'):
... with open(path) as fh:
... data = fh.read()
... new = re.sub(pattern, replace, data, flags=re.MULTILINE)
... if new != data:
... with open(path, 'w') as fh:
... fh.write(new)
...
>>> quit()
(ditto for the other substitutions)
Some manual fixups were required after, which pre-commit highlighted :)
Change-Id: I8c96b337f32148f8f5899c9b87af331b1fa41424
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Most of this is just removal of unnecessary 'use_neutron=True' conf
overrides. The sole exception is the conversion of the metadata API
tests, which turned out to be far less effort to convert than I
expected.
Change-Id: I5114bf13e7df95ed02be70f29d453116387d4f9c
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
There is a bug in the oslo.messaging fake RPC driver, bug 1529084.
This driver is used in the nova functional tests, and the bug is
currently worked around in the nova production code.
As bug 1529084 is now fixed in oslo.messaging 10.3.0, we can remove
the workaround in the nova code by bumping the minimum version of
oslo.messaging.
Change-Id: I4a32a688c7ceb05c263a0e93a91fb9b8ff0c65d4
Related-Bug: #1529084