Commit message (author, date; files changed, -lines removed/+lines added)
* Merge "[compute] always set instance.host in post_livemigration" into ↵stable/trainZuul2023-04-033-9/+68
|\ | | | | | | stable/train
| * [compute] always set instance.host in post_livemigration (Sean Mooney, 2023-01-16; 3 files, -9/+68)
    This change adds a new _post_live_migration_update_host function that
    wraps _post_live_migration and ensures that if we exit due to an
    exception, instance.host is set to the destination host.

    When we are in _post_live_migration, the guest has already started
    running on the destination host and we cannot revert. Sometimes admins
    or users will hard reboot the instance, expecting that to fix
    everything, when the VM enters the error state after the failed
    migration. Previously this would end up recreating the instance on the
    source node, leading to possible data corruption if the instance used
    shared storage.

    NOTE(auniyal): Differences from ussuri to train:
    * nova/tests/unit/compute/test_compute_mgr.py
      * Set the instance.migration_context value to None, as fake_instance
        does not have this property in train.

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606
    (cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
    (cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)
    (cherry picked from commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e)
    (cherry picked from commit 15502ddedc23e6591ace4e73fa8ce5b18b5644b0)
    (cherry picked from commit 43c0e40d288960760a6eaad05cb9670e01ef40d0)
    (cherry picked from commit 0ac64bba8b7aba2fb358e00e970e88b32d26ef7e)
    (cherry picked from commit 3885f983c358e5a5f0b10f603633193ac335a45f)
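    A minimal Python sketch of the wrapper pattern the message describes
    (the function body and the use of oslo's save_and_reraise_exception
    are illustrative assumptions, not the literal backport):

        from oslo_utils import excutils

        def _post_live_migration_update_host(self, ctxt, instance, dest,
                                             *args, **kwargs):
            try:
                self._post_live_migration(ctxt, instance, dest,
                                          *args, **kwargs)
            except Exception:
                # The guest is already running on the destination at this
                # point, so a later hard reboot of the errored instance
                # must target the destination rather than recreate the
                # guest on the source.
                with excutils.save_and_reraise_exception():
                    instance.host = dest
                    instance.save()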
* | Merge "Adds a repoducer for post live migration fail" into stable/trainZuul2023-03-022-3/+76
|\ \ | |/
| * Adds a reproducer for post live migration fail (Amit Uniyal, 2023-01-16; 2 files, -3/+76)
    Adds a regression test, or reproducer, for a post live migration
    failure at the destination; possible causes are a failure to get
    instance network info or block device info.

    Changes: adds returning the server from _live_migrate in
    _integrated_helpers.

    NOTE(auniyal): Differences
    * Replaced GlanceFixture with fake.stub_out_image_service in the
      regression test, as GlanceFixture does not exist in Ussuri

    NOTE(auniyal): Differences from ussuri to train
    * integrated_helpers: added the self.api parameter when calling
      wait_for_state_change.
    * regression: imported the mock module, as unittest.mock was added
      after the train release. As _create_server is not present in train,
      used _build_minimal_create_server instead of _create_server.

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
    (cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
    (cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)
    (cherry picked from commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db)
    (cherry picked from commit 5efcc3f695e02d61cb8b881e009308c2fef3aa58)
    (cherry picked from commit ed1ea71489b60c0f95d76ab05f554cd046c60bac)
    (cherry picked from commit 6dda4f7ca3f25a11cd0178352ad24fe2e8b74785)
    (cherry picked from commit 5e955b62fa63b72816369a21af283a2b64f4af27)
* | Merge "Refactor volume connection cleanup out of _post_live_migration" into ↵Zuul2023-03-023-90/+53
|\ \ | |/ | | | | stable/train
| * Refactor volume connection cleanup out of _post_live_migration (Matt Riedemann, 2022-11-24; 3 files, -90/+53)
    The _post_live_migration method is huge and hard to review, test and
    maintain in isolation, so we need to break down the complexity. To
    that end, this change refactors the source host volume connection
    cleanup code into another method, which also allows us to simplify a
    couple of unit tests that target that specific set of code.

    The error handling in the new method is tested by the functional test
    class LiveMigrationCinderFailure.

    Change-Id: Id0e8b1c32600d53382e5ac938e403258c80221a0
    (cherry picked from commit e6916ab11469dde8fb4a8a2936f23b3f8647e24d)
* | [stable-only] Add binary test dependency "python3-devel" for py3 based RPM distros (Jorge San Emeterio, 2023-02-21; 1 file, -1/+2)

    Coming from RHEL 8, the dependency "python-devel [platform:rpm test]"
    indicated in "bindep.txt" cannot be satisfied by the distribution;
    RHEL 8 expects the package name to be "python3-devel" instead. For
    that reason, this change adds a conditional that selects the proper
    package name depending on whether the distro is based on Python 2 or
    3. Other RPM distributions like CentOS should also benefit from this
    change.

    The conditionals for the Python version are inverted ("!") because
    trying to filter them positively resulted in both conditions
    triggering at the same time. This works for RHEL 8, although it would
    still need testing on something RHEL 7 based.

    Closes-Bug: #2007959
    Change-Id: I0aac20be976e687229f4759e1364718aa663cf27
* | Merge "func: Introduce a server_expected_state kwarg to ↵Zuul2023-01-241-3/+4
|\ \ | |/ | | | | InstanceHelperMixin._live_migrate" into stable/train
| * func: Introduce a server_expected_state kwarg to InstanceHelperMixin._live_migrate (Lee Yarwood, 2022-11-23; 1 file, -3/+4)

    Useful when testing live migration failures that leave the server in
    a non-ACTIVE state. This change also renames the
    migration_final_status arg to migration_expected_state within the
    method, to keep it in line with _create_server.

    NOTE(artom): This is to facilitate subsequent backports of live
    migration regression tests and bug fixes.

    Partial-Bug: #1628606
    Change-Id: Ie0852a89fc9423a92baa7c29a8806c0628cae220
    (cherry picked from commit e70ddd621cb59a8845a4241387d8a49e443b7b69)
    (cherry picked from commit 2b0cf8edf88c5f81696d72b04098aa12d1137e90)
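    A sketch of what the updated helper might look like (defaults and the
    _wait_for_migration_status helper are assumptions based on the
    message, not the exact backport):

        def _live_migrate(self, server, migration_expected_state='completed',
                          server_expected_state='ACTIVE'):
            self.api.post_server_action(
                server['id'],
                {'os-migrateLive': {'host': None,
                                    'block_migration': 'auto'}})
            # Allow tests to assert a failed migration that leaves the
            # server in, e.g., ERROR rather than ACTIVE.
            self._wait_for_state_change(self.api, server,
                                        server_expected_state)
            self._wait_for_migration_status(server,
                                            [migration_expected_state])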
* | Merge "Adapt websocketproxy tests for SimpleHTTPServer fix" into stable/trainZuul2023-01-111-33/+26
|\ \
| * | Adapt websocketproxy tests for SimpleHTTPServer fix (melanie witt, 2022-12-01; 1 file, -33/+26)
    In response to bug 1927677 we added a workaround to
    NovaProxyRequestHandler to respond with a 400 Bad Request if an open
    redirect is attempted:

      Ie36401c782f023d1d5f2623732619105dc2cfa24
      I95f68be76330ff09e5eabb5ef8dd9a18f5547866

    Recently, in Python 3.10.6, a fix has landed in cpython to respond
    with a 301 Moved Permanently to a sanitized URL that has had extra
    leading '/' characters removed. This breaks our existing unit tests,
    which assume a 400 Bad Request as the only expected response.

    This adds handling of a 301 Moved Permanently response and asserts
    that the redirect location is the expected sanitized URL. Doing this
    instead of checking for a given Python version will enable the tests
    to continue to work if and when the cpython fix gets backported to
    older Python versions.

    While updating the tests, the opportunity was taken to commonize the
    code of two unit tests that were nearly identical.

    Conflicts:
        nova/tests/unit/console/test_websocketproxy.py

    NOTE(melwitt): The conflict is because change
    I23ac1cc79482d0fabb359486a4b934463854cae5 (Allow TLS
    ciphers/protocols to be configurable for console proxies) is not in
    Train. The difference from the cherry picked change is because the
    flake8 version on the stable/train branch does not support
    f-strings [1].

    Related-Bug: #1927677
    Closes-Bug: #1986545

    [1] https://lists.openstack.org/pipermail/openstack-discuss/2019-November/011027.html

    Change-Id: I27441d15cc6fa2ff7715ba15aa900961aadbf54a
    (cherry picked from commit 15769b883ed4a86d62b141ea30d3f1590565d8e0)
    (cherry picked from commit 4a2b44c7cf55d1d79d5a2dd638bd0def3af0f5af)
    (cherry picked from commit 0e4a257e8636a979605c614a35e79ba47b74d870)
    (cherry picked from commit 3023e162e1a415ddaa70b4b8fbe24b1771dbe424)
    (cherry picked from commit 77bc3f004e7fe4077ea035c659630bedef1cfea1)
    (cherry picked from commit 746d654c23d75f084b6f0c70e6c32b97eebf419c)
* | Merge "Set instance host and drop migration under lock" into stable/trainZuul2022-12-204-37/+40
|\ \
| * | Set instance host and drop migration under lock (Balazs Gibizer, 2022-11-22; 4 files, -37/+40)
    The _update_available_resource periodic task makes resource
    allocation adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE,
    based on the list of instances assigned to the host of the resource
    tracker and on the migrations where the source or the target host is
    the host of the resource tracker. So if the instance.host or the
    migration context changes without holding the
    COMPUTE_RESOURCE_SEMAPHORE while the _update_available_resource task
    is running, there will be data inconsistency in the resource tracker.

    This patch makes sure that during evacuation the instance.host and
    the migration context are changed while holding the semaphore.

    stable/train specific change: fair locking was introduced from ussuri
    forward, see Ia5e521e0f0c7a78b5ace5de9f343e84d872553f9. So the train
    backport does not add fair=True to the synchronized decorator of
    finish_evacuation, as that would interfere with the other functions
    taking the same lock without fair=True.

    Change-Id: Ica180165184b319651d22fe77e076af036228860
    Closes-Bug: #1896463
    (cherry picked from commit 7675964af81472f3dd57db952d704539e61a3e6e)
    (cherry picked from commit 3ecc098d28addd8f3e1da16b29940f4337209e62)
    (cherry picked from commit 3b497ad7d41bc84ec50124c4d55f4138c4878751)
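    A sketch of the locking pattern (the function body is illustrative;
    only the decorator placement reflects the described fix):

        from nova import utils

        COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

        @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)  # train: no fair=True
        def finish_evacuation(instance, node, migration):
            # Update instance.host and apply/drop the migration context
            # only while holding the same lock the
            # _update_available_resource periodic task takes, so the
            # periodic never accounts against a half-updated instance.
            instance.apply_migration_context()
            instance.drop_migration_context()
            instance.host = node
            instance.save()
            if migration:
                migration.status = 'done'
                migration.save()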
* | | Merge "Reproduce bug 1896463 in func env" into stable/trainZuul2022-12-201-0/+243
|\ \ \ | |/ /
| * | Reproduce bug 1896463 in func env (Balazs Gibizer, 2022-11-22; 1 file, -0/+243)
    There is a race condition between the rebuild and the
    _update_available_resource periodic task on the compute. This patch
    adds a reproducer functional test. Unfortunately it needs some
    injected sleeps to make the race happen in a stable way. This is
    suboptimal, but it only adds 3 seconds of slowness to the test
    execution.

    Conflicts:
        nova/tests/functional/regressions/test_bug_1896463.py
    due to I84c58de90dad6d86271767363aef90ddac0f1730 and
    I8c96b337f32148f8f5899c9b87af331b1fa41424 not being in stable/train.
    Also the test code had to be made Python 2 compatible.

    Change-Id: Id0577bceed9808b52da4acc352cf9c204f6c8861
    Related-Bug: #1896463
    (cherry picked from commit 3f348602ae4a40c52c7135b2cb48deaa6052c488)
    (cherry picked from commit d768cdbb88d0b0b3ca38623c4bb26d5eabdf1596)
    (cherry picked from commit 02114a9d7f2e8b62d3a7091ca3fde251dfffa860)
* | | For evacuation, ignore if task_state is not None (Amit Uniyal, 2022-11-29; 4 files, -16/+34)
    Ignore the instance task state and continue with the VM evacuation.

    Closes-Bug: #1978983
    Change-Id: I5540df6c7497956219c06cff6f15b51c2c8bc29d
    (cherry picked from commit db919aa15f24c0d74f3c5c0e8341fad3f2392e57)
    (cherry picked from commit 6d61fccb8455367aaa37ae7bddf3b8befd3c3d88)
    (cherry picked from commit 8e9aa71e1a4d3074a94911db920cae44334ba2c3)
    (cherry picked from commit 0b8124b99601e1aba492be8ed564f769438bd93d)
    (cherry picked from commit 3224ceb3fffc57d2375e5163d8ffbbb77529bc38)
    (cherry picked from commit 90e65365ab608792c4b8d8c4c3a87798fccadeec)
* | | add regression test case for bug 1978983 (Amit Uniyal, 2022-11-29; 2 files, -0/+130)
    This change adds a reproducer test for evacuating a VM in the
    powering-off state.

    Conflicts:
        nova/tests/functional/integrated_helpers.py
        nova/tests/functional/test_servers.py
    Difference:
        nova/tests/functional/regressions/test_bug_1978983.py

    NOTE(auniyal): Conflicts are due to the following changes that are
    not in Ussuri:
    * I147bf4d95e6d86ff1f967a8ce37260730f21d236 (Cyborg evacuate support)
    * Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 (Functional tests for
      NUMA live migration)

    NOTE(auniyal): Differences
    * Replaced GlanceFixture with fake.stub_out_image_service in the
      regression test, as GlanceFixture does not exist in Ussuri

    NOTE(auniyal): Differences from ussuri to train
    * regression: as _create_server is not present in train, used
      _build_minimal_create_server instead to create the server

    Related-Bug: #1978983
    Change-Id: I5540df6c7497956219c06cff6f15b51c2c8bc299
    (cherry picked from commit 5904c7f993ac737d68456fc05adf0aaa7a6f3018)
    (cherry picked from commit 6bd0bf00fca6ac6460d70c855eded3898cfe2401)
    (cherry picked from commit 1e0af92e17f878ce64bd16e428cb3c10904b0877)
    (cherry picked from commit b57b0eef218fd7604658842c9277aad782d11b45)
    (cherry picked from commit b6c877377f58ccaa797af3384b199002726745ea)
    (cherry picked from commit 9015c3b663a7b46192c106ef065f93e82f0ab8be)
* | func: Add _live_migrate helper to InstanceHelperMixin (Artom Lifshitz, 2022-11-23; 1 file, -0/+8)
    This is a partial backport of
    I70c4715de05d64fabc498b02d5c757af9450fbe9, which introduced this
    helper while addressing feedback on
    Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 and
    I78e79112a9c803fb45d828cfb4641456da66364a, and landed in Victoria.

    Follow-up for NUMA live migration functional tests

    This patch addresses outstanding feedback on
    Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 and
    I78e79112a9c803fb45d828cfb4641456da66364a.

    Related-Bug: #1628606
    Change-Id: I70c4715de05d64fabc498b02d5c757af9450fbe9
    (cherry picked from commit ca8f1f422298b0a26cf30165595d256f4fa71135)
    (cherry picked from commit 726ca4aec5ccea96748de88b2c2a2fd1a078cfc5)
* Ignore plug_vifs on the ironic driver (Julia Kreger, 2022-09-07; 3 files, -18/+47)
    When the nova-compute service starts, by default it attempts to start
    up instance configuration states for aspects such as networking. This
    is fine in most cases, and makes a lot of sense if the nova-compute
    service is just managing virtual machines on a hypervisor. This is
    done one instance at a time.

    However, when the compute driver is ironic, the networking is managed
    as part of the physical machine lifecycle, potentially all the way
    into committed switch configurations. As such, there is no need to
    attempt to call ``plug_vifs`` on every single instance managed by a
    nova-compute process which is backed by Ironic.

    Additionally, ironic tends to manage far more physical machines per
    nova-compute service instance than when operating co-installed with a
    hypervisor. Often this means a cluster of a thousand machines, with
    three controllers, will see thousands of un-needed API calls upon
    service start, which elongates the entire process and negatively
    impacts operations.

    In essence, nova.virt.ironic's plug_vifs call now does nothing, and
    merely issues a debug LOG entry when called.

    Closes-Bug: #1777608
    Change-Id: Iba87cef50238c5b02ab313f2311b826081d5b4ab
    (cherry picked from commit 7f81cf28bf21ad2afa98accfde3087c83b8e269b)
    (cherry picked from commit eb6d70f02daa14920a2522e5c734a3775ea2ea7c)
    (cherry picked from commit f210115bcba3436b957a609cd388a13e6d77a638)
    (cherry picked from commit 35fb52f53fbd3f8290f775760a842d70f583fa67)
    (cherry picked from commit 2b8c1cffe409af1e1da6597ea0b7b96931a035f7)
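    The resulting no-op is simple; a sketch of the shape the message
    describes (surrounding class trimmed down for illustration):

        from oslo_log import log as logging

        LOG = logging.getLogger(__name__)

        class IronicDriver(object):
            def plug_vifs(self, instance, network_info):
                # Ironic attaches VIFs as part of the node lifecycle, so
                # there is nothing for nova-compute to do per instance at
                # startup; logging at debug avoids thousands of needless
                # Ironic API calls.
                LOG.debug('VIF plug called on the Ironic driver; no-op.',
                          instance=instance)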
* [ironic] Minimize window for a resource provider to be lost (Julia Kreger, 2022-08-23; 3 files, -1/+40)
    This patch is based upon a downstream patch which came up in
    discussion amongst the ironic community when some operators began
    discussing a case where resource providers had disappeared from a
    running deployment with several thousand baremetal nodes. Discussion
    amongst operators and developers ensued, and we were able to
    determine that this was still an issue in the current upstream code
    and that the time difference between collecting data and then
    reconciling the records was a source of the issue. Per Arun, they
    have been running this change downstream and had not seen any
    reoccurrences of the issue since the patch was applied.

    This patch was originally authored by Arun S A G, and below is his
    original commit message.

    An instance could be launched and scheduled to a compute node between
    the get_uuids_by_host() call and the _get_node_list() call. If that
    happens, the ironic node.instance_uuid may not be None, but the
    instance_uuid will be missing from the instance list returned by the
    get_uuids_by_host() method. This is possible because _get_node_list()
    takes several minutes to return in large baremetal clusters, and a
    lot can happen in that time. This causes the compute node to be
    orphaned and the associated resource provider to be deleted from
    placement. Once the resource provider is deleted it is never created
    again until the service restarts. Since the resource provider is
    deleted, subsequent boots/rebuilds to the same host will fail. This
    behaviour is visible on VM-booter nodes because they constantly
    launch and delete instances, thereby increasing the likelihood of
    this race condition happening in large ironic clusters.

    To reduce the chance of this race condition we call _get_node_list()
    first, followed by the get_uuids_by_host() method.

    Change-Id: I55bde8dd33154e17bbdb3c4b0e7a83a20e8487e8
    Co-Authored-By: Arun S A G <saga@yahoo-inc.com>
    Related-Bug: #1841481
    (cherry picked from commit f84d5917c6fb045f03645d9f80eafbc6e5f94bdd)
    (cherry picked from commit 0c36bd28ebd05ec0b1dbae950a24a2ecf339be00)
    (cherry picked from commit 67be896e0f70ac3f4efc4c87fc03395b7029e345)
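    A sketch of the reordering (the helper wrapper and exact object call
    are assumptions; only the call order reflects the fix):

        from nova import objects

        def snapshot_nodes_and_instances(driver, context, host):
            # Take the slow ironic node listing first, then the cheap
            # instance-UUID snapshot, so an instance scheduled while
            # _get_node_list() runs is still seen by the reconciliation.
            node_list = driver._get_node_list()
            instance_uuids = objects.InstanceList.get_uuids_by_host(
                context, host)
            return node_list, instance_uuids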
* Merge "Don't unset Instance.old_flavor, new_flavor until necessary" into ↵Zuul2022-08-203-33/+45
|\ | | | | | | stable/train
| * Don't unset Instance.old_flavor, new_flavor until necessary (Stephen Finucane, 2021-02-04; 3 files, -33/+45)
    Since change Ia6d8a7909081b0b856bd7e290e234af7e42a2b38, the resource
    tracker's 'drop_move_claim' method has been capable of freeing up
    resource usage. However, this relies on accurate resource reporting.
    It transpires that there's a race whereby the resource tracker's
    'update_available_resource' periodic task can end up not accounting
    for usage from migrations that are in the process of being completed.

    The root cause is the resource tracker's reliance on the stashed
    flavor in a given migration record [1]. Previously, this information
    was deleted by the compute manager at the start of the confirm
    migration operation [2]. The compute manager would then call the
    virt driver [3], which could take a not insignificant amount of time
    to return, before finally dropping the move claim. If the periodic
    task ran between the clearing of the stashed flavor and the return
    of the virt driver, it would find a migration record with no stashed
    flavor and would therefore ignore this record for accounting
    purposes [4], resulting in an incorrect record for the compute node,
    and an exception when the 'drop_move_claim' attempts to free up the
    resources that aren't being tracked.

    The solution to this issue is pretty simple. Instead of unsetting
    the old flavor record from the migration at the start of the various
    move operations, do it afterwards.

    Conflicts:
        nova/compute/manager.py

    NOTE(stephenfin): Conflicts are due to a number of missing cross-cell
    resize changes.

    [1] https://github.com/openstack/nova/blob/6557d67/nova/compute/resource_tracker.py#L1288
    [2] https://github.com/openstack/nova/blob/6557d67/nova/compute/manager.py#L4310-L4315
    [3] https://github.com/openstack/nova/blob/6557d67/nova/compute/manager.py#L4330-L4331
    [4] https://github.com/openstack/nova/blob/6557d67/nova/compute/resource_tracker.py#L1300

    Change-Id: I4760b01b695c94fa371b72216d398388cf981d28
    Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
    Partial-Bug: #1879878
    Related-Bug: #1834349
    Related-Bug: #1818914
    (cherry picked from commit 44376d2e212e0f9405a58dc7fc4d5b38d70ac42e)
    (cherry picked from commit ce95af2caf69cb1b650459718fd4fa5f00ff28f5)
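    Sketched as an ordering illustration (method names and signatures are
    assumptions based on the message, not the exact backport):

        def _confirm_resize(self, context, instance, migration,
                            network_info):
            # 1. Let the (potentially slow) virt driver clean up first.
            self.driver.confirm_migration(context, migration, instance,
                                          network_info)
            # 2. Free the move claim while the flavor is still stashed on
            #    the migration, so the periodic task keeps accounting for
            #    the usage in the meantime.
            self.rt.drop_move_claim(context, instance,
                                    migration.source_node)
            # 3. Only now unset the flavors; nothing needs them any more.
            instance.old_flavor = None
            instance.new_flavor = None
            instance.save()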
* | [CI] Fix gate by using zuulv3 live migration and grenade jobs (Lee Yarwood, 2022-08-16; 16 files, -523/+253)
    This patch is a combination of several legacy-to-zuulv3 job patches
    to unblock the gate: with the latest Ceph release the legacy grenade
    jobs started to fail with the following error (back to ussuri):

      'Error EPERM: configuring pool size as 1 is disabled by default.'

    The patch contains an almost clean backport of the job configuration.

    Conflicts:
        .zuul.yaml
        gate/live_migration/hooks/ceph.sh
        gate/live_migration/hooks/run_tests.sh
        gate/live_migration/hooks/utils.sh
        playbooks/legacy/nova-grenade-multinode/run.yaml
        playbooks/legacy/nova-live-migration/run.yaml

    NOTE(melwitt): The .zuul.yaml conflict is because change
    I4b2d321b7243ec149e9445035d1feb7a425e9a4b (Skip to run all
    integration jobs for policy-only changes.) and change
    I86f56b0238c72d2784e62f199cfc7704b95bbcf2 (FUP to
    Ie1a0cbd82a617dbcc15729647218ac3e9cd0e5a9) are not in Train. The
    gate/live_migration/hooks/ceph.sh conflict is because change
    Id565a20ba3ebe2ea1a72b879bd2762ba3e655658 (Convert legacy
    nova-live-migration and nova-multinode-grenade to py3) is not in
    Train and change I1d029ebe78b16ed2d4345201b515baf3701533d5
    ([stable-only] gate: Pin CEPH_RELEASE to nautilus in LM hook) is
    only in Train. The gate/live_migration/hooks/run_tests.sh conflict
    is because change Id565a20ba3ebe2ea1a72b879bd2762ba3e655658 (Convert
    legacy nova-live-migration and nova-multinode-grenade to py3) is not
    in Train. The gate/live_migration/hooks/utils.sh conflict is because
    change Iad2d198c58512b26dc2733b97bedeffc00108656 was added only in
    Train. The playbooks/legacy/nova-grenade-multinode/run.yaml conflict
    is because change Id565a20ba3ebe2ea1a72b879bd2762ba3e655658 (Convert
    legacy nova-live-migration and nova-multinode-grenade to py3) and
    change Icac785eec824da5146efe0ea8ecd01383f18398e (Drop
    neutron-grenade-multinode job) are not in Train. The
    playbooks/legacy/nova-live-migration/run.yaml conflict is because
    change Id565a20ba3ebe2ea1a72b879bd2762ba3e655658 (Convert legacy
    nova-live-migration and nova-multinode-grenade to py3) is not in
    Train.

    NOTE(lyarwood): An additional change was required to the
    run-evacuate-hook, as we are now running against Bionic based hosts.
    These hosts only have a single libvirtd service running, so stop and
    start only this service during an evacuation run.

    List of included patches:

    1. zuul: Start to migrate nova-live-migration to zuulv3
    2. zuul: Replace nova-live-migration with zuulv3 jobs

       Closes-Bug: #1901739
       Change-Id: Ib342e2d3c395830b4667a60de7e492d3b9de2f0a
       (cherry picked from commit 4ac4a04d1843b0450e8d6d80189ce3e85253dcd0)
       (cherry picked from commit 478be6f4fbbbc7b05becd5dd92a27f0c4e8f8ef8)

    3. zuul: Replace grenade and nova-grenade-multinode with
       grenade-multinode

       Change-Id: I02b2b851a74f24816d2f782a66d94de81ee527b0
       (cherry picked from commit 91e53e4c2b90ea57aeac4ec522dd7c8c54961d09)
       (cherry picked from commit c45bedd98d50af865d727b7456c974c8e27bff8b)
       (cherry picked from commit 2af08fb5ead8ca1fa4d6b8ea00f3c5c3d26e562c)

    Change-Id: Ibbb3930a6e629e93a424b3ae048f599f11923be3
    (cherry picked from commit 1c733d973015999ee692ed48fb10a282c50fdc49)
    (cherry picked from commit 341ba7aa175a0a082fec6e5360ae3afa2596ca95)
* | Merge "Add generic reproducer for bug #1879878" into stable/trainZuul2022-06-151-0/+173
|\ \ | |/
| * Add generic reproducer for bug #1879878 (Stephen Finucane, 2021-02-04; 1 file, -0/+173)
    No need for the libvirt driver in all its complexity here.

    Changes:
        nova/tests/functional/regressions/test_bug_1879878.py

    NOTE(stephenfin): The '_create_server' and '_delete_server' helpers
    don't exist here, so we need to use the older helpers like
    '_build_minimal_create_server_request'. Also, we need to support
    Python 2 here, so we can't use 'super' without arguments :(

    Change-Id: Ifea9a15fb01c0b25e9973024f4f61faecc56e1cd
    Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
    (cherry picked from commit e1adbced92453329f7285ec38c1dc7821ebb52c7)
    (cherry picked from commit 15da841ecb6cc6152f5e6dbcd63bf09725cac6fc)
* | Merge "tests: Add reproducer for bug #1879878" into stable/trainZuul2022-06-151-0/+86
|\ \ | |/
| * tests: Add reproducer for bug #1879878 (Stephen Finucane, 2021-02-04; 1 file, -0/+86)
    When one resizes a pinned instance, the instance claims host CPUs for
    pinning purposes on the destination. However, the host CPUs on the
    source are not immediately relinquished. Rather, they are held by the
    migration record, to handle the event that the resize is reverted. It
    is only when one confirms this resize that the old cores are finally
    relinquished.

    It appears there is a potential race between the resource tracker's
    periodic task and the freeing of these resources, resulting in
    attempts to unpin host cores that have already been unpinned. This
    test highlights that bug, pending a fix.

    Changes:
        nova/tests/functional/libvirt/test_numa_servers.py

    NOTE(stephenfin): We don't yet have the '_create_server' helper or
    the more sensible '_wait_for_state_change' behavior on
    'stable/train', so we have to revert to '_build_server' and checking
    for the state before the one we want.

    Change-Id: Ie092628ac71eb87c9dfa7220255a2953ada9e04d
    Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
    Related-Bug: #1879878
    (cherry picked from commit 10f0a42de162c90c701f70c9c28dc31bfada87db)
    (cherry picked from commit 8ffaac493288c73badfa4f1ec6021ecb4f3137b7)
* | Merge "Helper to start computes with different HostInfos" into stable/trainZuul2022-06-152-64/+56
|\ \ | |/
| * Helper to start computes with different HostInfos (Artom Lifshitz, 2021-02-04; 2 files, -64/+56)
    Sometimes functional tests need to start multiple compute hosts with
    different HostInfo objects (representing, for example, different NUMA
    topologies). In the future this will also be needed by both the
    NUMA-aware live migration functional tests and a regression test for
    bug 1843639. This patch adds a helper function to the base libvirt
    functional test class that takes a hostname-to-HostInfo dict and
    starts a compute for each host. Existing tests that can make use of
    this new helper are refactored.

    Change-Id: Id3f77c4ecccfdc4caa6dbf120c3df4fbdfce9d0f
    (cherry picked from commit 607307c1d8a5a7fbcae96ddbb4cf8e944f46f42b)
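    A usage sketch of such a helper (the helper name and the HostInfo
    keyword arguments are assumptions for illustration):

        def start_two_numa_hosts(self):
            # Map hostname -> HostInfo; the helper starts one compute
            # service per entry.
            self.start_computes({
                'host_a': fakelibvirt.HostInfo(cpu_nodes=1, cpu_sockets=1,
                                               cpu_cores=4, cpu_threads=2),
                'host_b': fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
                                               cpu_cores=2, cpu_threads=2),
            })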
* | Merge "func tests: move _run_periodics() into base class" into stable/trainZuul2022-06-1512-98/+56
|\ \ | |/
| * func tests: move _run_periodics() into base class (Artom Lifshitz, 2021-02-04; 12 files, -98/+56)
    There are two almost identical implementations of the
    _run_periodics() helper, and a third one would have joined them in a
    subsequent patch, if not for this patch. This patch moves
    _run_periodics() to the base test class.

    In addition, _run_periodics() depends on the self.computes dict used
    for compute service tracking. The method that populates that dict,
    _start_compute(), is therefore also moved to the base class.

    This enables some light refactoring of existing tests that need
    either the _run_periodics() helper or the compute service tracking.
    In addition, a needless override of _start_compute() in
    test_aggregates that provided no added value is removed. This is
    done to avoid any potential confusion around _start_compute()'s
    role.

    Conflicts:
        nova/tests/functional/compute/test_cache_image.py
        nova/tests/functional/integrated_helpers.py
        nova/tests/functional/test_scheduler.py

    NOTE(stephenfin): Conflicts are due to the two test files not
    existing yet and the large number of helpers missing from the
    'integrated_helpers' class.

    Changes:
        nova/tests/functional/regressions/test_bug_1889108.py

    NOTE(stephenfin): Changes are due to a backport that was modified to
    remove use of the functions we're adding here.

    Change-Id: I33d8ac0a1cae0b2d275a21287d5e44c008a68122
    (cherry picked from commit ee05cd8b9e73f8a9a43dbe9ce028e6a6beaec2bd)
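    A sketch of the shared helper (the body is an assumption based on
    what the message describes):

        from oslo_log import log as logging

        from nova import context

        LOG = logging.getLogger(__name__)

        def _run_periodics(self):
            # Run the update_available_resource periodic on every tracked
            # compute so a test can force the resource tracker to
            # reconcile state at a deterministic point.
            ctxt = context.get_admin_context()
            for host, compute in self.computes.items():
                LOG.info('Running periodic for compute %s', host)
                compute.manager.update_available_resource(ctxt)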
* | Merge "Only allow one scheduler service in tests" into stable/trainZuul2022-06-151-0/+21
|\ \ | |/
| * Only allow one scheduler service in tests (Eric Fried, 2021-02-04; 1 file, -0/+21)
    There have been two recent issues [1][2] caused by starting multiple
    instances of the same service in tests. This can cause races when
    the services' (shared) state conflicts.

    With this patch, the nexus of nova service starting,
    nova.test.TestCase.start_service, is instrumented to keep track of
    how many of each service we are running. If we try to run the
    scheduler service more than once, we fail.

    We could probably do the same thing for conductor, though that's
    less important (for now) because conductor is stateless (for now).

    [1] https://bugs.launchpad.net/nova/+bug/1844174
    [2] https://review.opendev.org/#/c/681059/ (not a nova service, but
        same class of problem)

    Change-Id: I56d3cb17260dad8b88f03c0a7b9688efb3258d6f
    (cherry picked from commit fe05d004b51dc9801749b0f5572e1a2392004830)
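    The guard might look roughly like this (the module-level counter and
    the _really_start_service stand-in are illustrative; the real patch
    instruments nova.test.TestCase.start_service):

        import collections

        _service_counts = collections.defaultdict(int)

        def start_service(name, host=None, **kwargs):
            # The scheduler keeps shared in-memory state, so two
            # scheduler services in one test can race; fail fast instead.
            if name == 'scheduler' and _service_counts[name] >= 1:
                raise Exception('Only one scheduler service may be '
                                'started in a test')
            _service_counts[name] += 1
            return _really_start_service(name, host, **kwargs)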
* | Merge "Define new functional test tox env for placement gate to run" into ↵Zuul2022-06-011-0/+10
|\ \ | | | | | | | | | stable/train
| * | Define new functional test tox env for placement gate to run (Ghanshyam Mann, 2022-06-01; 1 file, -0/+10)
    We have a placement-nova-tox-functional-py38 job defined and run on
    the placement gate [1] to run the nova functional tests, excluding
    the api and notification _sample_tests and db-related tests, but
    that job skips those tests via tox_extra_args, which is not the
    right way to do it, as we currently face an error when
    tox_extra_args is included in the tox siblings task:

    - https://opendev.org/zuul/zuul-jobs/commit/c02c28a982da8d5a9e7b4ca38d30967f6cd1531d
    - https://zuul.openstack.org/build/a8c186b2c7124856ae32477f10e2b9a4

    Let's define a new tox env which can exclude the required tests in
    the stestr command itself.

    Conflicts:
        tox.ini

    NOTE(melwitt): The conflict is because change
    I1d6a2986fcb0435cfabdd104d202b65329909d2b (Moving functional jobs to
    Victoria testing runtime) is not in Ussuri. The stestr option for
    the exclude regex also had to be changed because --exclude-regex is
    not in stestr 3.0.1, the version installed in Ussuri.

    [1] https://opendev.org/openstack/placement/src/commit/bd5b19c00e1ab293fc157f4699bc4f4719731c25/.zuul.yaml#L83

    Change-Id: I20d6339a5203aed058f432f68e2ec1af57030401
    (cherry picked from commit 7b063e4d0518af3e57872bc0288a94edcd33c19d)
    (cherry picked from commit 64f5c1cfb0e7223603c06e22a204716919d05294)
    (cherry picked from commit baf0d93e0fafcd992d37543aa9df3f6dc248a738)
    (cherry picked from commit d218250eb53791012f49825140e2592dab89e69c)
    (cherry picked from commit 6e6c69f2f4a917b63ed5636d386d2c908268a7f0)
* | | [stable-only] Use Tempest's run upper constraints from devstack (Ghanshyam Mann, 2022-05-23; 1 file, -1/+11)
    Devstack creates the tempest virtual env with the upper constraints
    set by TEMPEST_VENV_UPPER_CONSTRAINTS [1]. For stable/train, those
    are the stable/train constraints, but when nova runs tempest in
    run_tests.sh it does not set/pass the upper constraints, so the
    master constraints are used and it ends up recreating the tempest
    virtual env and failing the job. We should make sure that the upper
    constraints used in the nova tempest run and in devstack are the
    same.

    - https://zuul.opendev.org/t/openstack/build/f50f83571d4348e996e175ea5aad97f7/log/job-output.txt#5100

    [1] https://github.com/openstack/devstack/blob/stable/train/stackrc#L320

    Change-Id: Iad2d198c58512b26dc2733b97bedeffc00108656
* | | [stable-only] Drop lower-constraints job (Elod Illes, 2022-04-22; 3 files, -188/+4)
    During the PTG the TC discussed the topic and decided to drop the
    job completely. Since the latest job configuration broke all stable
    gates for nova (older than yoga), this is needed there to unblock
    our gates. For dropping the job on master let's wait for the
    resolution, as the gate is not broken there; hence this patch is
    stable-only.

    Conflicts:
        .zuul.yaml
        lower-constraints.txt
        tox.ini

    NOTE(elod.illes): the conflict is due to branch specific settings
    (the job was set to non-voting, lower constraints changes, tox
    targets were refactored in ussuri). Another change in .zuul.yaml is
    needed because the requirements-check job now runs against
    ubuntu-focal, which breaks the tools/test_setup.sh script (a fix
    exists in victoria: I97b0dcbb88c6ef7c22e3c55970211bed792bbd0d). This
    patch pins the job locally to the ubuntu-bionic nodeset.

    Change-Id: I514f6b337ffefef90a0ce9ab0b4afd083caa277e
    (cherry picked from commit 15b72717f2f3bd79791b913f1b294a19ced47ca7)
    (cherry picked from commit ba3c5b81abce49fb86981bdcc0013068b54d4f61)
    (cherry picked from commit 327693af402e4dd0c03fe247c4cee7beaedd2852)
    (cherry picked from commit 8ff36f184dd7aedf9adfdbdf8845504557e2bef5)
    (cherry picked from commit e0b030a1d2b7c63157fd2c621c1830f0867c53f8)
* | Ensure MAC address characters are in the same case (Dmitriy Rabotyagov, 2022-01-21; 2 files, -3/+51)
    Currently neutron can report ports as having MAC addresses in upper
    case when they're created like that. Meanwhile, the libvirt
    configuration file always stores MACs in lower case, which leads to
    a KeyError while trying to retrieve the migrate_vif.

    Closes-Bug: #1945646
    Change-Id: Ie3129ee395427337e9abcef2f938012608f643e1
    (cherry picked from commit 6a15169ed9f16672c2cde1d7f27178bb7868c41f)
    (cherry picked from commit 63a6388f6a0265f84232731aba8aec1bff3c6d18)
    (cherry picked from commit 6c3d5de659e558e8f6ee353475b54ff3ca7240ee)
    (cherry picked from commit 28d0059c1f52e51add31bff50f1f6e443c938792)
    (cherry picked from commit 184a3c976faed38907af148a533bc6e9faa410f5)
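    A sketch of the normalization (the lookup helper is an assumption;
    only the .lower() on both sides reflects the fix):

        def get_migrate_vif_by_mac(migrate_data, guest_mac):
            # Normalize both sides to lower case before matching: neutron
            # may report an upper-case MAC while libvirt's domain XML
            # stores it lower-cased.
            vifs_by_mac = {vif.source_vif['address'].lower(): vif
                           for vif in migrate_data.vifs}
            return vifs_by_mac[guest_mac.lower()]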
* | address open redirect with 3 forward slashes (Sean Mooney, 2021-10-08; 2 files, -6/+33)
    Ie36401c782f023d1d5f2623732619105dc2cfa24 was intended to address
    OSSA-2021-002 (CVE-2021-3654); however, after its release it was
    discovered that the fix only worked for URLs with 2 leading slashes
    or more than 4. This change addresses the missing edge case of 3
    leading slashes and also maintains support for rejecting 2+.

    Conflicts:
        nova/console/websocketproxy.py
        nova/tests/unit/console/test_websocketproxy.py

    NOTE(melwitt): The conflict and difference in websocketproxy.py from
    the cherry picked change (HTTPStatus.BAD_REQUEST => 400) is due to
    the fact that HTTPStatus does not exist in Python 2.7. The conflict
    in test_websocketproxy.py is because change
    I23ac1cc79482d0fabb359486a4b934463854cae5 (Allow TLS
    ciphers/protocols to be configurable for console proxies) is not in
    Train. The difference in test_websocketproxy.py from the cherry
    picked change is due to a difference in internal implementation [1]
    in Python < 3.6. See change
    I546d376869a992601b443fb95acf1034da2a8f36 for reference.

    [1] https://github.com/python/cpython/commit/34eeed42901666fce099947f93dfdfc05411f286

    Change-Id: I95f68be76330ff09e5eabb5ef8dd9a18f5547866
    co-authored-by: Matteo Pozza
    Closes-Bug: #1927677
    (cherry picked from commit 6fbd0b758dcac71323f3be179b1a9d1c17a4acc5)
    (cherry picked from commit 47dad4836a26292e9d34e516e1525ecf00be127c)
    (cherry picked from commit 9588cdbfd4649ea53d60303f2d10c5d62a070a07)
    (cherry picked from commit 0997043f459ac616b594363b5b253bd0ae6ed9eb)
* | Reject open redirection in the console proxy (melanie witt, 2021-10-08; 3 files, -0/+74)
    NOTE(melwitt): This is the combination of two commits: the bug fix
    and a followup change to the unit test to enable it to also run on
    Python < 3.6.

    Our console proxies (novnc, serial, spice) run in a websockify
    server whose request handler inherits from the Python standard
    SimpleHTTPRequestHandler. There is a known issue [1] in the
    SimpleHTTPRequestHandler which allows open redirects by way of URLs
    in the following format:

      http://vncproxy.my.domain.com//example.com/%2F..

    which, if visited, will redirect a user to example.com.

    We can intercept a request and reject requests that pass a
    redirection URL beginning with "//" by implementing the
    SimpleHTTPRequestHandler.send_head() method containing the
    vulnerability, to reject such requests with a 400 Bad Request.

    This code is copied from a patch suggested in one of the issue
    comments [2].

    Closes-Bug: #1927677

    [1] https://bugs.python.org/issue32084
    [2] https://bugs.python.org/issue32084#msg306545

    Conflicts:
        nova/tests/unit/console/test_websocketproxy.py

    NOTE(melwitt): The conflict is because change
    I23ac1cc79482d0fabb359486a4b934463854cae5 (Allow TLS
    ciphers/protocols to be configurable for console proxies) is not in
    Train. The difference from the cherry picked change
    (HTTPStatus.BAD_REQUEST => 400) is due to the fact that HTTPStatus
    does not exist in Python 2.7.

    Reduce mocking in test_reject_open_redirect for compat

    This is a followup for change
    Ie36401c782f023d1d5f2623732619105dc2cfa24 to reduce mocking in the
    unit test coverage for it.

    While backporting the bug fix, it was found to be incompatible with
    earlier versions of Python < 3.6 due to a difference in internal
    implementation [1].

    This reduces the mocking in the unit test to be more agnostic to the
    internals of the StreamRequestHandler (ancestor of
    SimpleHTTPRequestHandler) and work across Python versions >= 2.7.

    Related-Bug: #1927677

    [1] https://github.com/python/cpython/commit/34eeed42901666fce099947f93dfdfc05411f286

    Change-Id: I546d376869a992601b443fb95acf1034da2a8f36
    (cherry picked from commit 214cabe6848a1fdb4f5941d994c6cc11107fc4af)
    (cherry picked from commit 9c2f29783734cb5f9cb05a08d328c10e1d16c4f1)
    (cherry picked from commit 94e265f3ca615aa18de0081a76975019997b8709)
    (cherry picked from commit d43b88a33407b1253e7bce70f720a44f7688141f)

    Change-Id: Ie36401c782f023d1d5f2623732619105dc2cfa24
    (cherry picked from commit 781612b33282ed298f742c85dab58a075c8b793e)
    (cherry picked from commit 470925614223c8dd9b1233f54f5a96c02b2d4f70)
    (cherry picked from commit 6b70350bdcf59a9712f88b6435ba2c6500133e5b)
    (cherry picked from commit 719e651e6be277950632e0c2cf5cc9a018344e7b)
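    A sketch of the guard, adapted from the patch suggested in the
    cpython issue comment referenced above [2] (surrounding class and
    imports trimmed for illustration):

        import os
        from http.server import SimpleHTTPRequestHandler
        from urllib.parse import urlsplit

        class NovaProxyRequestHandler(SimpleHTTPRequestHandler):
            def send_head(self):
                # A request path like //example.com/%2F.. would otherwise
                # be echoed back in a redirect Location header, producing
                # an open redirect. The backport uses a literal 400
                # rather than HTTPStatus.BAD_REQUEST for Python 2.7
                # compatibility.
                path = self.translate_path(self.path)
                if os.path.isdir(path):
                    parts = urlsplit(self.path)
                    if not parts.path.endswith('/'):
                        new_url = parts.path + '/'
                        if new_url.startswith('//'):
                            self.send_error(
                                400, 'URI must not start with //')
                            return None
                return super(NovaProxyRequestHandler, self).send_head()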
* | [stable-only] Pin virtualenv and setuptools (Balazs Gibizer, 2021-10-05; 1 file, -1/+5)
    Setuptools 58.0 (bundled in virtualenv 20.8) breaks the installation
    of decorator 3.4.0, so this patch pins virtualenv to avoid the
    break. As the 'requires' feature used here was introduced in tox
    3.2 [1], the required minversion has to be bumped too.

    [1] https://tox.readthedocs.io/en/latest/config.html#conf-requires

    Conflicts:
        tox.ini

    NOTE(melwitt): The conflict is because change
    Ie1a0cbd82a617dbcc15729647218ac3e9cd0e5a9 (Stop testing Python 2) is
    not in Train.

    Change-Id: I26b2a14e0b91c0ab77299c3e4fbed5f7916fe8cf
    (cherry picked from commit b27f8e9adfcf2db3c83722c42e055ba5065ad06e)
* | Merge "Raise InstanceMappingNotFound if StaleDataError is encountered" into ↵Zuul2021-08-262-2/+20
|\ \ | | | | | | | | | stable/train
| * | Raise InstanceMappingNotFound if StaleDataError is encountered (melanie witt, 2021-02-23; 2 files, -2/+20)
    We have a race where, if a user issues a delete request while an
    instance is in the middle of booting, we could fail to update the
    'queued_for_delete' field on the instance mapping with:

      sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table
      'instance_mappings' expected to update 1 row(s); 0 were matched.

    This happens if we've retrieved the instance mapping record from the
    database and then it gets deleted by nova-conductor before we
    attempt to save() it.

    This handles the situation by adding a try-except around the update
    call to catch StaleDataError and raise InstanceMappingNotFound
    instead, which the caller does know how to handle.

    Closes-Bug: #1882608
    Change-Id: I2cdcad7226312ed81f4242c8d9ac919715524b48
    (cherry picked from commit 16df22dcd57a73fe3be15c64c41b4081b4826ef2)
    (cherry picked from commit 812ce632d50bfc32de62d544746e0b9a83d90ab7)
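    A sketch of the guarded update (the helper wrapper is illustrative;
    the try-except conversion is the described fix):

        from sqlalchemy.orm import exc as orm_exc

        from nova import exception

        def set_queued_for_delete(instance_mapping, instance_uuid):
            # If nova-conductor deleted the mapping after we fetched it,
            # save() raises StaleDataError; convert it to the exception
            # callers already know how to handle.
            try:
                instance_mapping.queued_for_delete = True
                instance_mapping.save()
            except orm_exc.StaleDataError:
                raise exception.InstanceMappingNotFound(uuid=instance_uuid)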
* | | Merge "Move 'check-cherry-picks' test to gate, n-v check" into stable/trainZuul2021-08-153-8/+23
|\ \ \
| * | | Move 'check-cherry-picks' test to gate, n-v check (Stephen Finucane, 2021-06-18; 3 files, -8/+23)
    This currently runs in the 'check' pipeline, as part of the pep8
    job, which causes otherwise perfectly valid backports to report as
    failing CI. There's no reason a stable core shouldn't be encouraged
    to review these patches: we simply want to prevent them *merging*
    before their parent(s). Resolve this conflict by moving the check to
    a separate voting job in the 'gate' pipeline, as well as a
    non-voting job in the 'check' pipeline to catch more obvious issues.

    Change-Id: Id3e4452883f6a3cf44ff58b39ded82e882e28c23
    Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
    (cherry picked from commit 98b01c9a59df4912f5a162c2c52d1f00c84d24c2)
    (cherry picked from commit fef0305abefbf165fecb883f03bce97f525a790a)
    (cherry picked from commit b7677ae08ae151858ecb0e67039e54bb3df89700)
    (cherry picked from commit 91314f7fbba312d4438fa446804f692d316512a8)
* | | | only wait for plug-time events in pre-live-migration (Sean Mooney, 2021-08-12; 3 files, -9/+39)
    This change modifies _get_neutron_events_for_live_migration to
    filter the events to just the subset that will be sent at plug time.

    Currently neutron has a bug whereby the dhcp agent sends a
    network-vif-plugged event during live migration after we update the
    port profile with "migrating-to:". This causes a network-vif-plugged
    event to be sent for configurations where vif plugging in
    nova/os-vif is a no-op. When that is corrected, the current logic in
    nova causes the migration to time out, as it is waiting for an event
    that will never arrive.

    This change filters the set of events we wait for to just the
    plug-time events.

    Conflicts:
        nova/compute/manager.py
        nova/tests/unit/compute/test_compute_mgr.py

    Related-Bug: #1815989
    Closes-Bug: #1901707
    Change-Id: Id2d8d72d30075200d2b07b847c4e5568599b0d3b
    (cherry picked from commit 8b33ac064456482158b23c2a2d52f819ebb4c60e)
    (cherry picked from commit ef348c4eb3379189f290217c9351157b1ebf0adb)
    (cherry picked from commit d9c833d5a404dfa206e08c97543e80cb613b3f0b)
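    A sketch of the filtering (the has_live_migration_plug_time_event
    predicate is an assumption about the filtering condition):

        def _get_neutron_events_for_live_migration(instance):
            # Wait only for events that arrive when the VIF is actually
            # plugged on the destination; backends that emit the event at
            # bind time would otherwise make the wait time out.
            vifs = instance.get_network_info()
            return [('network-vif-plugged', vif['id']) for vif in vifs
                    if vif.has_live_migration_plug_time_event]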
* | | | [neutron] Get only ID and name of the SGs from Neutron (Slawek Kaplonski, 2021-07-09; 1 file, -1/+7)
    During the VM booting process, Nova asks Neutron for the security
    groups of the project. If no fields are specified, Neutron will
    prepare the list of security groups with all fields, including
    rules. In case the project has many SGs, this may take a long time,
    as rules need to be loaded separately for each SG on Neutron's side.

    During booting of the VM, Nova really needs only the "id" and "name"
    of the security groups, so this patch limits the request to only
    those 2 fields.

    This lazy loading of the SG rules was introduced in Neutron in [1]
    and [2].

    [1] https://review.opendev.org/#/c/630401/
    [2] https://review.opendev.org/#/c/637407/

    Related-Bug: #1865223
    Change-Id: I15c3119857970c38541f4de270bd561166334548
    (cherry picked from commit 388498ac5fa15ed8deef06ec061ea47e4a1b7377)
    (cherry picked from commit 4f49545afaf3cd453796d48ba96b9a82d11c01bf)
    (cherry picked from commit f7d84db5876b30d6849877799c08ebc65ac077ca)
    (cherry picked from commit be4a514c8aea073a9188cfc878c9afcc9b03cb28)
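    A sketch of the limited query, assuming a neutronclient handle (the
    wrapper function is illustrative; the fields parameter is the fix):

        def list_boot_security_groups(neutron, project_id):
            # Request only the two fields nova needs at boot; without
            # 'fields', neutron loads every rule of every SG, which is
            # slow for projects with many groups.
            return neutron.list_security_groups(
                tenant_id=project_id,
                fields=['id', 'name'])['security_groups']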
* | | | Error anti-affinity violation on migrations (Rodrigo Barbieri, 2021-06-29; 3 files, -24/+211)
    Error out the migrations (cold and live) whenever the anti-affinity
    policy is violated. This addresses violations when multiple
    concurrent migrations are requested.

    Added detection on:
    - prep_resize
    - check_can_live_migration_destination
    - pre_live_migration

    The improved method of detection now locks based on group_id and
    considers other migrations in progress as well. A sketch of the
    locking follows below.

    Closes-Bug: #1821755
    Change-Id: I32e6214568bb57f7613ddeba2c2c46da0320fabc
    (cherry picked from commit 33c8af1f8c46c9c37fcc28fb3409fbd3a78ae39f)
    (cherry picked from commit 8b62a4ec9bf617dfb2da046c25a9f76b33516508)
    (cherry picked from commit 6ede6df7f41db809de19e124d3d4994180598f19)
    (cherry picked from commit bf90a1e06181f6b328b967124e538c6e2579b2e5)
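    Locking sketch (function, lock name and attribute names are
    assumptions based on the message, not the actual backport):

        from nova import utils

        def validate_anti_affinity(group, dest_host, in_progress_hosts):
            # Serialize validation per server group so two concurrent
            # migrations cannot both pass the check and land members on
            # the same host.
            @utils.synchronized('server-group-%s' % group.uuid)
            def _do_validation():
                # Consider hosts that already run members *and* hosts
                # that are the target of in-progress migrations.
                busy_hosts = set(group.hosts) | set(in_progress_hosts)
                if ('anti-affinity' in group.policies
                        and dest_host in busy_hosts):
                    # The real change raises a nova exception that errors
                    # out the migration; a generic error stands in here.
                    raise RuntimeError(
                        'anti-affinity violated on %s' % dest_host)
            _do_validation()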
* | | | Merge "Use absolute path during qemu img rebase" into stable/trainZuul2021-06-252-6/+33
|\ \ \ \
| * | | | Use absolute path during qemu img rebase (Balazs Gibizer, 2021-06-25; 2 files, -6/+33)
    During an assisted volume snapshot delete request from Cinder, nova
    removes the snapshot from the backing file chain. During that, nova
    checks the existence of such a file. However, in some cases (see the
    bug report) the path is relative, and therefore os.path.exists
    fails.

    This patch makes sure that nova uses the volume's absolute path to
    make the backing file path absolute as well.

    Closes-Bug: #1885528
    Change-Id: I58dca95251b607eaff602783fee2fc38e2421944
    (cherry picked from commit b9333125790682f9d60bc74fdbb12a098565e7c2)
    (cherry picked from commit 831abc9f83a2d3f517030f881e7da724417fea93)
    (cherry picked from commit c2044d4bd0919860aa2d49687ba9c6ef6f7d37e8)
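    A sketch of the path handling (the helper is illustrative; anchoring
    the relative backing file at the volume's directory is the fix):

        import os

        def absolute_backing_path(volume_path, backing_file):
            # qemu-img can report the backing file relative to the
            # volume's directory; anchor it there so os.path.exists does
            # not depend on the process working directory.
            if backing_file and not os.path.isabs(backing_file):
                backing_file = os.path.join(os.path.dirname(volume_path),
                                            backing_file)
            return backing_file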