Commit message | Author | Age | Files | Lines
* Merge "Lowercase ironic driver hash ring and ignore case in cache" into stable/pike  [pike-eol, stable/pike]  (Zuul, 2021-04-10, 2 files, -4/+36)

  * Lowercase ironic driver hash ring and ignore case in cache  (melanie witt, 2020-11-04, 2 files, -4/+36)

    Recently we had a customer case where attempts to add new ironic nodes
    to an existing undercloud resulted in half of the nodes failing to be
    detected and added to nova. The Ironic API returned all of the newly
    added nodes when called by the driver, but half of the nodes were not
    returned to the compute manager by the driver. There was only one
    nova-compute service managing all of the ironic nodes in the typical
    all-in-one undercloud deployment.

    After days of investigation and examination of a database dump from the
    customer, we noticed that at some point the customer had changed the
    hostname of the machine from something containing uppercase letters to
    the same name but all lowercase. The nova-compute service record had
    the mixed case name and CONF.host (socket.gethostname()) had the
    lowercase name.

    The hash ring logic adds all of the nova-compute service hostnames plus
    CONF.host to the hash ring, then the ironic driver reports only the
    nodes it owns by retrieving a service hostname from the ring based on a
    hash of each ironic node UUID. Because of the machine hostname change,
    the hash ring contained, for example: {'MachineHostName',
    'machinehostname'} when it should have contained only one hostname. And
    because the hash ring contained two hostnames, the driver was able to
    retrieve only half of the nodes as nodes that it owned, so half of the
    new nodes were excluded and not added as new compute nodes.

    This adds lowercasing of hosts that are added to the hash ring and
    ignores case when comparing CONF.host to the hash ring members, to
    avoid unnecessary pain and confusion for users who make hostname
    changes that are otherwise functionally harmless. This also adds
    logging of the set of hash ring members at DEBUG level to help enable
    easier debugging of hash ring related situations.

    Closes-Bug: #1866380
    Change-Id: I617fd59de327de05a198f12b75a381f21945afb0
    (cherry picked from commit 7145100ee4e732caa532d614e2149ef2a545287a)
    (cherry picked from commit 588b0484bf6f5fe41514f1428aeaf5613635e35a)
    (cherry picked from commit 8f8667a8dd0e453eaef8f75a3fff25db62d4cc17)
    (cherry picked from commit 019e3da75bc6fb171b32a012ce339075fe690ca7)
    (cherry picked from commit 620e5da840e50aa8a61030b10081821dc7653b94)
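The split-ownership failure mode described above can be reproduced with a toy stand-in for the hash ring (a simple rendezvous hash here, not nova's actual HashRing class; all names are illustrative):

```python
import hashlib

def ring_owner(members, node_uuid):
    # Toy rendezvous hash: a node is owned by the member with the highest
    # hash of (member, node). A stand-in for the real consistent hash ring.
    return max(members,
               key=lambda m: hashlib.md5(f"{m}:{node_uuid}".encode()).hexdigest())

nodes = [f"node-{i}" for i in range(100)]

# The stale service record and CONF.host differ only in case, so the ring
# ends up with two members and each owns roughly half of the nodes.
stale_ring = {"MachineHostName", "machinehostname"}
owned = [n for n in nodes if ring_owner(stale_ring, n) == "machinehostname"]
assert 0 < len(owned) < len(nodes)

# The fix: lowercase members on insertion (and compare CONF.host
# case-insensitively), so the ring collapses back to a single member.
fixed_ring = {m.lower() for m in stale_ring}
owned = [n for n in nodes if ring_owner(fixed_ring, n) == "machinehostname"]
assert owned == nodes
```

With one logical member duplicated under two spellings, the driver's "which nodes are mine?" query silently loses about half the fleet, which matches the symptom in the bug report.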
* Merge "Include only required fields in ironic node cache" into stable/pike  (Zuul, 2021-04-10, 3 files, -159/+175)

  * Include only required fields in ironic node cache  (Mark Goddard, 2020-10-01, 3 files, -159/+175)

    The ironic virt driver maintains a cache of ironic nodes to avoid
    continually polling the ironic API. Code paths requiring a specific
    node use a limited set of fields, _NODE_FIELDS, when querying the
    ironic API for the node. This reduces the memory footprint required by
    the cache, and the network traffic required to populate it.

    However, in most cases the cache is populated using a detailed node
    list operation in _refresh_cache(), which includes all node fields.
    This change specifies _NODE_FIELDS in the node list operation in
    _refresh_cache(). We also modify the unit tests to use fake node
    objects that are representative of the nodes in the cache.

    Conflicts:
        nova/tests/unit/virt/ironic/test_driver.py
        nova/tests/unit/virt/ironic/utils.py

    NOTE(melwitt): The conflicts are because the following changes:
        I4065b61edff8bfd66a163c9ccf19833316fdca8e (Implement get_traits()
        for the ironic virt driver)
        I1f9056f66519b9ca2f4e23143559735f2bff8943 (Regenerate and pass
        configdrive when rebuild Ironic nodes)
    are not in Pike.

    Change-Id: Id96e7e513f469b87992ddae1431cce714e91ed16
    Related-Bug: #1746209
    (cherry picked from commit 8bbad196a7f6a6e2ea093aeee87dfde2154c9358)
    (cherry picked from commit f5b6dc603c5f6f72ab35ef21ab9e35ec982e3219)
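The effect of a field-limited query can be sketched as trimming a detailed node payload down to the cached field set (the _NODE_FIELDS subset shown here is illustrative, not nova's exact tuple):

```python
# Illustrative subset; nova's actual _NODE_FIELDS tuple differs.
_NODE_FIELDS = ('uuid', 'power_state', 'provision_state', 'properties',
                'instance_uuid', 'resource_class')

def trim_node(detailed_node):
    """Keep only the fields the cache needs, shrinking each entry."""
    return {k: v for k, v in detailed_node.items() if k in _NODE_FIELDS}

detailed = {
    'uuid': 'f00d',
    'power_state': 'power on',
    'provision_state': 'active',
    'properties': {'cpus': 8},
    'instance_uuid': None,
    'resource_class': 'baremetal',
    'driver_info': {'ipmi_address': '10.0.0.5'},  # dropped from the cache
    'last_error': None,                           # dropped from the cache
}
cached = trim_node(detailed)
assert set(cached) == set(_NODE_FIELDS)
```

The change moves this trimming to the server side by passing the field list to the ironic list call, so the extra fields never cross the wire at all.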
* Merge "Add resource_class to fields in ironic node cache" into stable/pike  (Zuul, 2021-04-10, 1 file, -1/+1)

  * Add resource_class to fields in ironic node cache  (Mark Goddard, 2020-10-01, 1 file, -1/+1)

    Per the discussion in [1], the ironic nodes added to the node cache in
    the ironic virt driver may be missing the required field
    resource_class, as this field is not in _NODE_FIELDS. In practice, this
    is typically not an issue (possibly never), as the normal code path
    uses a detailed list to sync all ironic nodes, which includes all
    fields (including resource_class). However, some code paths use a
    single node query with the fields limited to _NODE_FIELDS, so could
    result in a node in the cache without a resource_class. This change
    adds resource_class to _NODE_FIELDS.

    [1] https://review.openstack.org/#/c/532288/9/nova/virt/ironic/driver.py@79

    Conflicts:
        nova/virt/ironic/driver.py

    NOTE(melwitt): Conflict is because change
    I4065b61edff8bfd66a163c9ccf19833316fdca8e (Implement get_traits() for
    the ironic virt driver) is not in Pike.

    Change-Id: Id84b4a47d05532d341a9b6ca2de7e9e66e1930da
    Closes-Bug: #1746209
    (cherry picked from commit 5895566a428be4c30c31ae94070282566a6cc568)
    (cherry picked from commit 6b11eb794f5881c9c8ed284b2edfa4c31ed62b2b)
* Merge "Update resources once in update_available_resource" into stable/pike  (Zuul, 2021-03-18, 2 files, -13/+13)

  * Update resources once in update_available_resource  (Maciej Józefczyk, 2021-03-16, 2 files, -13/+13)

    This change ensures that resources are updated only once per
    update_available_resource() call. Compute resources were previously
    updated during host object initialization and again at the end of
    update_available_resource(). This could cause inconsistencies in
    resource tracking between the compute host and the DB for a couple of
    seconds while the final _update() at the end of
    update_available_resource() is being called.

    For example: nova-api shows that a host uses 10GB of RAM, but in fact
    it's 12GB, because the DB doesn't have the resources that belong to a
    shutdown instance. Because of that, nova-scheduler (CachingScheduler)
    could choose (based on incomplete information) a host which is already
    full. For more information please see the related bug: #1729621

    Conflicts:
        nova/tests/unit/compute/test_resource_tracker.py

    NOTE: The conflict is due to the backporting order of this patch and
    I9fa1d509a3de405d6246fb8670612c65c10cc93b to Pike having changed. This
    caused the conflict in the first place, and also the modification
    needed in test_compute_node_create_fail_retry_works(), which is now
    exactly the same as it was originally implemented in Queens by the
    patch mentioned above.

    Change-Id: I120a98cc4c11772f24099081ef3ac44a50daf71d
    Closes-Bug: #1729621
    (cherry picked from commit c9b74bcfa09d11c2046ce1bfb6dd8463b3a2f3b0)
    (cherry picked from commit 36d93675d9a6bf903ed64c216243c74a639a2087)
* Merge "rt: Make resource tracker always invoking get_inventory()" into stable/pike  (Zuul, 2021-03-18, 2 files, -15/+32)

  * rt: Make resource tracker always invoking get_inventory()  (Jianghua Wang, 2021-03-01, 2 files, -15/+32)

    This is a stable/pike-only backport from Queens. This change was
    originally made as part of the feature bp add-support-for-vgpu, but it
    is actually fixing a bug in the resource tracker and not adding any
    VGPU specific code. I amended the commit message and the code comments
    to adapt it to Pike, where VGPU is not a thing.

    The ironic custom resource stat is not saved in compute_nodes. Instead
    it is reported by the drivers' get_inventory() interface and the
    inventory data is saved via placement. But the resource tracker will
    skip invoking get_inventory() if the resources in compute_node have not
    changed. This causes the ironic custom resource to not be added to
    placement if the other compute node resources are unchanged.

    This commit changes the resource tracker to always invoke
    get_inventory() and use the scheduler_client interfaces to update
    inventory in placement. The scheduler_client will ensure the update
    request to placement only happens when the inventory has changed, by
    comparing against the locally cached inventory.

    Change-Id: I6e204fe8e7c003246c9d8bebf484323700737093
    (cherry picked from commit e2a18a37190e4c7b7697a8811553d331e208182c)
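The dedup behavior attributed to scheduler_client above follows a common pattern: always compute the inventory, but only push it out when it differs from a local cache. A minimal sketch (class and attribute names are illustrative):

```python
class InventoryReporter:
    """Sketch: the tracker now always computes inventory; this
    scheduler_client-like layer only sends an update to placement
    when the inventory actually changed."""

    def __init__(self):
        self._cached = None      # last inventory successfully sent
        self.updates_sent = 0    # stands in for calls to placement

    def set_inventory(self, inventory):
        if inventory == self._cached:
            return  # no-op: placement already has this inventory
        self._cached = dict(inventory)
        self.updates_sent += 1

reporter = InventoryReporter()
inv = {'CUSTOM_BAREMETAL': {'total': 1}}
reporter.set_inventory(inv)   # first report: sent
reporter.set_inventory(inv)   # unchanged: skipped
assert reporter.updates_sent == 1
```

Because the comparison lives in the reporting layer, calling get_inventory() on every periodic run is cheap, while placement traffic stays proportional to actual changes.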
* [stable-only] Move grenade jobs to experimental  (Elod Illes, 2021-03-17, 1 file, -29/+14)

    Grenade jobs in stable/pike are failing, as stable/ocata's devstack is
    broken and no fix is available. This patch moves the grenade jobs out
    of the check and gate queues and into experimental. This way we are not
    wasting resources, but there is still the option to run the grenade
    jobs in case ocata is somehow fixed.

    Change-Id: I8fceeb6dd3ce1a1e888dfe68f4e0910009d506b3
* [stable-only] gate: Pin CEPH_RELEASE to nautilus in LM hook  (Lee Yarwood, 2021-03-16, 2 files, -0/+6)

    I1edd5a50079f325fa143a7e0d51b3aa3bb5ed45d moved the branchless
    devstack-plugin-ceph project to the Octopus release of Ceph, which
    drops support for py2. As py2 was still the default on stable/train,
    this breaks the nova-live-migration and nova-grenade jobs. This change
    works around this by pinning CEPH_RELEASE to nautilus within the LM
    hook, as was used prior to the above landing.

    Note that the devstack-plugin-ceph-tempest job from the plugin repo
    continues to pass, as it is correctly pinned to the Luminous release
    that supports py2. If anything, the above enforces the need to move
    away from these hook scripts and instead inherit our base ceph jobs
    from this repo in the future, to avoid the Ceph release jumping around
    like this.

    Change-Id: I1d029ebe78b16ed2d4345201b515baf3701533d5
    (cherry picked from commit ff570d1b4e9b9777405ae75cc09eae2ce255bf19)
    (cherry picked from commit 436e8172f65193e177a4a12780f752dbc7e88b39)
    (cherry picked from commit 238c83a2f778a9c2f0abbe318dda97b715926565)
    (cherry picked from commit 5aa7f3b4e6241e21c69dc103bb7be847b500429c)
* Merge "[placement] Add status and links fields to version document at /" into stable/pike  (Zuul, 2021-02-25, 5 files, -1/+37)

  * [placement] Add status and links fields to version document at /  (Chris Dent, 2020-09-11, 5 files, -1/+37)

    According to the spec [1] the version discovery doc must have a status
    and links for each version. For the primary version the status value
    should be 'CURRENT'. For placement the version discovery doc and "self"
    are the same thing, so the provided "self" href looks redundant, but it
    makes keystoneauth1 happy when doing version discovery.

    In placement, since there is only one version at the moment, set status
    to CURRENT. Add a gabbi test that verifies the presence of both fields
    and values. Without these fields, use of placement with a client that
    follows the documented version discovery process will fail to work.

    As the version doc is not considered microversioned [2] and in any case
    this is making version discovery work where it didn't before, this is
    not a candidate for a microversion and can be backported to the
    beginning of placement's history if we like.

    I've updated the api-ref docs. In the process I made the max
    microversion in the sample discovery doc a bit more realistic and in
    alignment with these modern times.

    Changes:
        placement-api-ref/source/get-root.json

    NOTE(stephenfin): Modified the root API sample to reflect the max
    placement API version in Pike.

    [1] http://specs.openstack.org/openstack/api-wg/guidelines/microversion_specification.html#version-discovery
    [2] http://eavesdrop.openstack.org/irclogs/%23openstack-sdks/%23openstack-sdks.2018-06-13.log.html#t2018-06-13T13:40:12

    Change-Id: Ie602ab1768efbf103563d8f6b9d28965fc81021a
    Closes-Bug: #1776668
    (cherry picked from commit 1a5a3a9bc8409349ab817b4858ee54bf2a036dab)
    (cherry picked from commit df1542686349a97ab1527f80c81861d89aaf4f78)
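The discovery document shape implied by the spec can be sketched as follows (the microversion numbers are illustrative, not Pike's exact values):

```python
import json

# Hedged sketch of a version discovery document with the fields the
# api-wg spec requires; numbers and href are placeholders.
version_doc = {
    "versions": [{
        "id": "v1.0",
        "status": "CURRENT",                      # required for primary version
        "min_version": "1.0",
        "max_version": "1.10",
        "links": [{"rel": "self", "href": ""}],   # "self" is the root doc itself
    }]
}

doc = json.loads(json.dumps(version_doc))
assert doc["versions"][0]["status"] == "CURRENT"
assert any(l["rel"] == "self" for l in doc["versions"][0]["links"])
```

A keystoneauth1-style client walks this structure during discovery, which is why the seemingly redundant "self" link matters.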
* [stable-only] Cap bandit to 1.6.2  (Lee Yarwood, 2020-12-15, 1 file, -1/+1)

    The 1.6.3 release [1] has dropped support for py2 [2], so cap to 1.6.2
    when using py2.

    [1] https://github.com/PyCQA/bandit/releases/tag/1.6.3
    [2] https://github.com/PyCQA/bandit/pull/615

    Depends-On: https://review.opendev.org/c/openstack/swift/+/766495

    Conflicts:
        test-requirements.txt

    Closes-Bug: #1907438
    Change-Id: Ie5221bf37c6ed9268a4aa0737ffcdd811e39360a
* Follow up for cherry-pick check for merge patch  (melanie witt, 2020-11-02, 1 file, -1/+1)

    This is a follow up to change I8e4e5afc773d53dee9c1c24951bb07a45ddc2f1a
    which fixed an issue with validation when the topmost patch after a
    Zuul rebase is a merge patch. We need to also use the $commit_hash
    variable for the check for stable-only patches, else it will
    incorrectly fail because it is checking the merge patch's commit
    message.

    Change-Id: Ia725346b65dd5e2f16aa049c74b45d99e22b3524
    (cherry picked from commit 1e10461c71cb78226824988b8c903448ba7a8a76)
    (cherry picked from commit f1e4f6b078baf72e83cd7341c380aa0fc511519e)
    (cherry picked from commit e676a480544b3fa71fcaa984a658e2131b7538c5)
    (cherry picked from commit 115b43ed3e9514d9e4fb41da5582f0b185ecd10a)
    (cherry picked from commit cde42879a497cd2b91f0cf926e0417fda07b3c31)
    (cherry picked from commit 3c774435502a339f202e94ae15d637e49a19d4ce)
    (cherry picked from commit 70cbd1535a62a2323726b84229b78d0b1d09d710)
* Merge "Removed the host FQDN from the exception message" into stable/pike  (Zuul, 2020-10-06, 3 files, -3/+3)

  * Removed the host FQDN from the exception message  (Praharshitha Metla, 2020-09-17, 3 files, -3/+3)

    Deletion of an instance after disabling the hypervisor by a non-admin
    user leaks the host FQDN in the fault message of the instance. Remove
    the 'host' field from the error message of HypervisorUnavailable,
    because it leaks the host FQDN to non-admin users. The admin user will
    still see the HypervisorUnavailable exception message and will be able
    to figure out which compute host the guest is on and that the
    connection is broken.

    Change-Id: I0eae19399670f59c17c9a1a24e1bfcbf1b514e7b
    Closes-Bug: #1851587
    (cherry picked from commit a89ffab83261060bbb9dedb2b8de6297b2d07efd)
    (cherry picked from commit ff82601204e9d724b3032dc94c49fa5c8de2699b)
    (cherry picked from commit c5abbd17b5552209e53ad61713c4787f47f463c6)
    (cherry picked from commit d5ff9f87c8af335e1f83476319a2540fead5224c)
    (cherry picked from commit 8c4af53d7754737f6857c25820a256487c45e676)
    (cherry picked from commit 4efdf632bcafb65d9725e1077bb249529db40015)
* libvirt: Provide VIR_MIGRATE_PARAM_PERSIST_XML during live migration  (Lee Yarwood, 2020-09-25, 2 files, -6/+93)

    The VIR_MIGRATE_PARAM_PERSIST_XML parameter was introduced in libvirt
    v1.3.4 and is used to provide the new persistent configuration for the
    destination during a live migration:

    https://libvirt.org/html/libvirt-libvirt-domain.html#VIR_MIGRATE_PARAM_PERSIST_XML

    Without this parameter the persistent configuration on the destination
    will be the same as the original persistent configuration on the source
    when the VIR_MIGRATE_PERSIST_DEST flag is provided.

    As Nova does not currently provide the VIR_MIGRATE_PARAM_PERSIST_XML
    param but does provide the VIR_MIGRATE_PERSIST_DEST flag, this means
    that a soft reboot by Nova of the instance after a live migration can
    revert the domain back to the original persistent configuration from
    the source.

    Note that this is only possible in Nova because a soft reboot actually
    results in the virDomainShutdown and virDomainLaunch libvirt APIs being
    called, which recreate the domain using the persistent configuration.
    virDomainReboot does not result in this, but is not called at this
    time.

    The impact of this on the instance after the soft reboot is pretty
    severe: host devices referenced in the original persistent
    configuration on the source may not exist or could even be used by
    other users on the destination. CPU and NUMA affinity could also differ
    drastically between the two hosts, resulting in the instance being
    unable to start, etc.

    As MIN_LIBVIRT_VERSION is now > v1.3.4, this change simply includes the
    VIR_MIGRATE_PARAM_PERSIST_XML param using the same updated XML for the
    destination as is already provided to VIR_MIGRATE_PARAM_DEST_XML.

    Conflicts:
        nova/tests/unit/virt/libvirt/test_driver.py
        nova/virt/libvirt/driver.py

    NOTE(melwitt): Conflicts in driver.py are because the following changes
    are not in Pike:
        I6ac601e633ab2b0a67b4802ff880865255188a93 (libvirt: Provide VGPU
        inventory for a single GPU type)
        I947bf0ad34a48e9182a3dc016f47f0c9f71c9d7b ([libvirt] Allow multiple
        volume attachments)
        Ibfa64f18bbd2fb70db7791330ed1a64fe61c1355 (libvirt: QEMU native
        LUKS decryption for encrypted volumes)
        If2035cac931c42c440d61ba97ebc7e9e92141a28 (libvirt: Rework 'EBUSY'
        (SIGKILL) error handling code path)
        Ibf210dd27972fed2651d6c9bd73a0bcf352c8bab (libvirt: create vGPU for
        instance)

    The conflict in test_driver.py is because the Pike backport of change
    I9b545ca8aa6dd7b41ddea2d333190c9fbed19bc1 explicitly asserts a byte
    string destination_xml in _test_live_migration_block_migration_flags,
    and that change is not in Queens, where this is being backported from.

    Co-authored-by: Tadayoshi Hosoya <tad-hosoya@wr.jp.nec.com>
    Closes-Bug: #1890501
    Change-Id: Ia3f1d8e83cbc574ce5cb440032e12bbcb1e10e98
    (cherry picked from commit 1bb8ee95d4c3ddc3f607ac57526b75af1b7fbcff)
    (cherry picked from commit bbf9d1de06e9991acd968fceee899a8df3776d60)
    (cherry picked from commit 6a07edb4b29d8bfb5c86ed14263f7cd7525958c1)
    (cherry picked from commit b9ea91d17703f5b324a50727b6503ace0f4e95eb)
    (cherry picked from commit c438fd9a0eb1903306a53ab44e3ae80660d8a429)
    (cherry picked from commit a721ca5f510ce3c8ef24f22dac9e475b3d7651db)
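How the two params fit together can be sketched as below. This is a hedged illustration, not nova's driver code: the string keys stand in for the libvirt-python constants (libvirt.VIR_MIGRATE_PARAM_DEST_XML and libvirt.VIR_MIGRATE_PARAM_PERSIST_XML) so the sketch runs without a libvirt host; the real params dict is eventually handed to dom.migrateToURI3(uri, params, flags).

```python
def build_migrate_params(updated_xml, persist_dest):
    """Assemble a migrate params dict like the one passed to
    dom.migrateToURI3(). Keys are placeholders for the libvirt
    constants named in the commit message above."""
    params = {"destination_xml": updated_xml}
    if persist_dest:
        # Without this, VIR_MIGRATE_PERSIST_DEST persists the *source's*
        # old XML on the destination, and a later soft reboot
        # (virDomainShutdown + virDomainLaunch) reverts the domain to it.
        params["persistent_xml"] = updated_xml
    return params

params = build_migrate_params("<domain>...</domain>", persist_dest=True)
assert params["destination_xml"] == params["persistent_xml"]
```

The fix is exactly this symmetry: whatever updated XML goes to the live domain on the destination also becomes its persistent configuration.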
* Merge "Fix os-simple-tenant-usage result order" into stable/pike  (Zuul, 2020-08-18, 8 files, -16/+204)

  * Fix os-simple-tenant-usage result order  (Lucian Petrut, 2020-08-03, 8 files, -16/+204)

    nova usage-list can return incorrect results, having resources counted
    twice. This only occurs when using the 2.40 microversion or later.

    This microversion introduced pagination, which doesn't work properly.
    Nova API will sort the instances using the tenant id and instance uuid,
    but 'os-simple-tenant-usage' will not preserve the order when returning
    the results. For this reason, subsequent API calls made by the client
    will use the wrong marker (which is supposed to be the last instance
    id), ending up counting the same instances twice.

    NOTE(melwitt): The differences from the Queens change in the sample
    .tpl and .json files are because change
    I3b25debb0bcfd4e211734307c8d363f2b5dbc655 is not in Pike, so there are
    only two generated UUIDs per server (instance UUID and vif UUID)
    instead of three (instance UUID, vif UUID, and bdm UUID).

    Change-Id: I6c7a67b23ec49aa207c33c38580acd834bb27e3c
    Closes-Bug: #1796689
    (cherry picked from commit afc3a16ce3364c233e6e1cffc9f38987d1d65318)
    (cherry picked from commit 133b194ba079abe84900d09a5c3c74ef9f464bab)
    (cherry picked from commit 70b4cdce68f9b1543c032aa700e4f0f4289d90a6)
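The double-counting mechanism can be shown with a toy marker-paginated API (illustrative names, not nova's code): if the server pages over a stable sort but the response is reordered before the client picks its marker, the next page overlaps the previous one.

```python
def server_page(sorted_ids, marker, limit):
    """Server side: marker-based paging over a stably sorted id list."""
    start = sorted_ids.index(marker) + 1 if marker else 0
    return sorted_ids[start:start + limit]

ids = ['i1', 'i2', 'i3', 'i4']          # sorted by (tenant id, uuid)

page = server_page(ids, None, 2)        # API pages in order: ['i1', 'i2']
reordered = list(reversed(page))        # the usage view loses the order
bad_marker = reordered[-1]              # client takes the "last" id: 'i1'

next_page = server_page(ids, bad_marker, 2)
assert 'i2' in next_page                # 'i2' gets counted a second time
```

With the fix, the response preserves the server's sort order, the client's marker really is the last instance processed, and pages never overlap.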
* Merge "Fix false ERROR message at compute restart" into stable/pike  (Zuul, 2020-08-03, 3 files, -2/+76)

  * Fix false ERROR message at compute restart  (Balazs Gibizer, 2020-04-18, 3 files, -2/+76)

    If an empty compute is restarted, a false ERROR message was printed in
    the log, as the placement report client does not distinguish an error
    from placement from an empty allocation dict returned by placement.

    This patch changes get_allocations_for_resource_provider to return None
    in case of error instead of an empty dict. This is in line with
    @safe_connect, which would make the call return None as well. The
    _error_out_instances_whose_build_was_interrupted method is also changed
    to check for None instead of an empty dict before reporting the ERROR.
    The only other caller of get_allocations_for_resource_provider was
    already checking for None and converting it to an empty dict, so from
    that caller's perspective this is a compatible change to the report
    client.

    This is a stable-only change, as
    get_allocations_for_resource_provider was improved during Stein [1] to
    raise on placement errors.

    [1] I020e7dc47efc79f8907b7bfb753ec779a8da69a1

    Conflicts:
        nova/compute/manager.py

    NOTE(mriedem): The conflict and changes to test_compute_mgr.py are due
    to not having change I7891b98f225f97ad47f189afb9110ef31c810717 in Pike,
    which added the context argument to the method
    get_allocations_for_resource_provider.

    Change-Id: I6042e493144d4d5a29ec6ab23ffed6b3e7f385fe
    Closes-Bug: #1852759
    (cherry picked from commit 64f797a0514b0276540d4f6c28cb290383088e35)
    (cherry picked from commit 4fcb7816bc88fd513debe70b95aa60bff74e37fb)
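The None-versus-empty-dict distinction at the heart of the fix can be sketched as follows (function names mirror the commit message, but the bodies are illustrative, not nova's code):

```python
def get_allocations_for_resource_provider(placement_call):
    """After the fix: None signals a failed placement call, while {}
    means the provider genuinely has no allocations."""
    try:
        return placement_call()
    except ConnectionError:
        return None  # matches the @safe_connect behavior

def check(allocations):
    # Only a failed call is an ERROR; an empty compute is fine.
    if allocations is None:
        return 'ERROR: could not reach placement'
    return 'ok: %d allocations' % len(allocations)

# Empty compute: placement answers with {} and no ERROR is logged.
assert check(get_allocations_for_resource_provider(dict)) == 'ok: 0 allocations'

# Real failure: the sentinel None triggers the ERROR path.
def boom():
    raise ConnectionError
assert check(get_allocations_for_resource_provider(boom)).startswith('ERROR')
```

Collapsing both cases into {} was exactly what produced the false ERROR on restart of an empty compute.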
* Merge "Check cherry-pick hashes in pep8 tox target" into stable/pike  (Zuul, 2020-07-27, 2 files, -0/+43)

  * Check cherry-pick hashes in pep8 tox target  (Dan Smith, 2020-07-21, 2 files, -0/+43)

    NOTE(elod.illes): This is a combination of 2 commits: the cherry-pick
    hash checker script and a fix for the script.

    1. Check cherry-pick hashes in pep8 tox target

       This adds a tools/ script that checks any cherry-picked hashes on
       the current commit (or a provided commit) to make sure that all the
       hashes exist on at least master or stable/.* branches. This should
       help avoid accidentally merging stable backports where one of the
       hashes along the line has changed due to conflicts.

    2. Fix cherry-pick check for merge patch

       The cherry-pick check script validates the proposed patch's commit
       message. If a patch is not on top of the given branch then Zuul
       rebases it to the top and the patch becomes a merge patch. In this
       case the script validates the merge patch's commit message instead
       of the original patch's commit message and fails. This fix selects
       the parent of the patch if it is a merge patch.

    (cherry picked from commit c7c48c6f52c9159767b60a4576ba37726156a5f7)
    (cherry picked from commit 02f213b831d8e1d4a1d8ebb18d1260571fe20b84)
    (cherry picked from commit 7a5111ba2943014b6fd53a5fe7adcd9bc445315e)

    Change-Id: I4afaa0808b75cc31a8dd14663912c162281a1a42
    (cherry picked from commit aebc829c4e0d39a160eaaa5ad949c1256c8179e6)
    (cherry picked from commit 5cacfaab82853241022d3a2a0734f82dae59a34b)
    (cherry picked from commit d307b964ce380f2fa57debc6c4c8346ac8736afe)
    (cherry picked from commit c3dd9f86f13a86c901443054de5ad3ab57953901)
    (cherry picked from commit f1ab10b8252bab4123c55a9fc10583710f347cef)
    (cherry picked from commit e605600d9e45f1b88ecc5c68c1ec4e24ceea6bd5)
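The extraction step such a checker performs can be sketched in a few lines (a minimal illustration, not the actual tools/ script, which is shell-based and handles more cases): pull every cherry-pick hash out of the commit message, then verify each one, e.g. with `git branch -a --contains <hash>`.

```python
import re

# "(cherry picked from commit <40-hex-char sha>)" trailer lines
PICK_RE = re.compile(r'cherry picked from commit ([0-9a-f]{40})')

def picked_hashes(commit_message):
    """Return every cherry-picked hash named in a commit message."""
    return PICK_RE.findall(commit_message)

msg = """Fix something

Change-Id: I4afaa0808b75cc31a8dd14663912c162281a1a42
(cherry picked from commit aebc829c4e0d39a160eaaa5ad949c1256c8179e6)
(cherry picked from commit 5cacfaab82853241022d3a2a0734f82dae59a34b)
"""
assert picked_hashes(msg) == ['aebc829c4e0d39a160eaaa5ad949c1256c8179e6',
                              '5cacfaab82853241022d3a2a0734f82dae59a34b']
```

A hash that no longer exists on master or any stable/.* branch is the tell-tale sign that a backport in the chain was amended, which is exactly what the pep8-time check is designed to catch.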
* libvirt: Do not reraise DiskNotFound exceptions during resize  (jichenjc, 2020-07-21, 2 files, -5/+37)

    When an instance has VERIFY_RESIZE status, the instance disk on the
    source compute host has moved to the
    <instance_path>/<instance_uuid>_resize folder, which leads to disk not
    found errors if the update available resource periodic task on the
    source compute runs before the resize is actually confirmed.

    Icec2769bf42455853cbe686fb30fda73df791b25 almost fixed this issue, but
    it only sets reraise to False when task_state is not None, which isn't
    the case when an instance is resized but the resize is not yet
    confirmed. This patch adds a condition based on vm_state to ensure we
    don't reraise DiskNotFound exceptions while the resize is not
    confirmed.

    Closes-Bug: 1774249
    Co-Authored-By: Vladyslav Drok <vdrok@mirantis.com>
    Change-Id: Id687e11e235fd6c2f99bb647184310dfdce9a08d
    (cherry picked from commit 966192704c20d1b4e9faf384c8dafac8ea6e06ea)
    (cherry picked from commit f1280ab849d20819791f7c4030f570a917d3e91d)
    (cherry picked from commit fd5c45473823105d8572d7940980163c6f09169c)
    (cherry picked from commit d3bdeb26155c2d3b53850b790d3800a2dd78cada)
* Clean up allocation if unshelve fails due to neutron  (Balazs Gibizer, 2020-07-13, 2 files, -11/+9)

    When the port binding update fails during unshelve of a
    shelve-offloaded instance, the compute manager has to catch the
    exception and clean up the destination host allocation.

    Change-Id: I4c3fbb213e023ac16efc0b8561f975a659311684
    Closes-Bug: #1862633
    (cherry picked from commit e65d4a131a7ebc02261f5df69fa1b394a502f268)
    (cherry picked from commit e6b749dbdd735e2cd0054654b5da7a02280a080b)
    (cherry picked from commit 405a35587a2291e3cf9eb4efc8f102c91bb4ef76)
    (cherry picked from commit aeeab5d064492e112cd626a2988a6808250fb029)
    (cherry picked from commit 9a073c9edc993525e896f67eeda1639a248fe2df)
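The catch-and-roll-back pattern the commit describes can be sketched like this (all class and method names are illustrative stand-ins, not nova's or neutron's actual APIs):

```python
class PortBindingFailed(Exception):
    pass

class FakePlacement:
    """Minimal stand-in that just records allocations."""
    def __init__(self):
        self.allocations = set()
    def claim(self, instance, host):
        self.allocations.add((instance, host))
    def remove_allocation(self, instance, host):
        self.allocations.discard((instance, host))

class FailingNeutron:
    def update_port_bindings(self, instance, host):
        raise PortBindingFailed

def unshelve(instance, placement, neutron, dest_host):
    """Claim resources first; roll the claim back if the port
    binding update fails, instead of leaking it."""
    placement.claim(instance, dest_host)
    try:
        neutron.update_port_bindings(instance, dest_host)
    except PortBindingFailed:
        # Without this cleanup the allocation on dest_host is leaked.
        placement.remove_allocation(instance, dest_host)
        raise

placement = FakePlacement()
try:
    unshelve('inst-1', placement, FailingNeutron(), 'dest')
except PortBindingFailed:
    pass
assert placement.allocations == set()   # nothing leaked
```

The exception is re-raised after cleanup so the unshelve still fails visibly; only the orphaned allocation is prevented.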
* Reproduce bug 1862633  (Balazs Gibizer, 2020-07-13, 1 file, -0/+100)

    If the port update fails during unshelve of an offloaded server, then
    the placement allocation on the target host is leaked.

    Changes in test_bug_1862633.py are due to:

    * the NeutronFixture improvement done in
      Id8d2c48c9c864554a917596e377d30515465fec1 is missing from
      stable/pike, therefore the fault injection mock needed to be moved to
      a higher level function.
    * the Ie4676eed0039c927b35af7573f0b57fd762adbaa refactor is also
      missing, causing the name change of wait_for_versioned_notification.

    Change-Id: I7be32e4fc2e69f805535e0a437931516f491e5cb
    Related-Bug: #1862633
    (cherry picked from commit c33ebdafbd633578a0a4b6f1b118c756510acea6)
    (cherry picked from commit bd1bfc13d7e2c418afc409871ab56da454a1334d)
    (cherry picked from commit f960d1751d752d559ea18604bfd1fcaf1a3283cd)
    (cherry picked from commit 5e452f8eb743c226af4f4998835ece8dd142a011)
    (cherry picked from commit eb4f0a5aa93feb2dc7730987207f052a04bc33db)
* Merge "Init HostState.failed_builds" into stable/pike  (Zuul, 2020-07-12, 2 files, -0/+3)

  * Init HostState.failed_builds  (Matt Riedemann, 2020-04-18, 2 files, -0/+3)

    If _update_from_compute_node returns early and the HostState is not
    filtered out, we can hit an AttributeError in the BuildFailureWeigher
    because the failed_builds attribute is not set. This simply initializes
    the attribute like the other stats fields.

    Change-Id: I5f8e4d32c6a1d6b61396b4fa11c5d776f432df0c
    Closes-Bug: #1834691
    (cherry picked from commit d540903463aa9b0cf69cefac7cc60e5b70e40a1c)
    (cherry picked from commit 725b37f515e4ad01e1f4491b6d8137ce1416f6d6)
    (cherry picked from commit 5acbea506a137492511a762e454d785810365bd8)
    (cherry picked from commit 9bc95675325498a7d30b67089e19b5b953d77e75)
* Fix os_CODENAME detection and repo refresh during ceph tests  (Lee Yarwood, 2020-07-10, 1 file, -0/+1)

    This is a partial backport of Iea6288fe6d341ee92f87a35e0b0a59fe564ab96c,
    a change introduced in Stein that fixed OS detection in the
    nova-live-migration job while installing Ceph. This is required after
    the devstack-plugin-ceph job was recently refactored by
    I51c90e592070b99422e692d5e9e97083d93146e8, which introduced various OS
    specific conditionals that require os_CODENAME to be set.

    NOTE(lyarwood): Conflicts due to the partial backport of
    I902e459093af9b82f9033d58cffcb2a628f5ec39 in stable/queens.

    Conflicts:
        nova/tests/live_migration/hooks/run_tests.sh

    Change-Id: Iea6288fe6d341ee92f87a35e0b0a59fe564ab96c
    (cherry picked from commit 9b2a7f9e7c9c24ad5b698f78681a1de1593b4a53)
    (cherry picked from commit 7207b9ee6cff002c02b8cd46ce4088d86de8afdc)
    (cherry picked from commit 97cc4783aa9dfb5cfb550968222fd04050bc64a3)
* Merge "Improve metadata server performance with large security groups" into stable/pike  (Zuul, 2020-03-25, 2 files, -11/+45)

  * Improve metadata server performance with large security groups  (Doug Wiegley, 2020-03-24, 2 files, -11/+45)

    Don't include the rules in the SG fetch in the metadata server, since
    we don't need them there, and with >1000 rules it starts to get really
    slow, especially in Pike and later.

    Closes-Bug: #1851430
    Co-Authored-By: Doug Wiegley <dougwig@parkside.io>
    Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>

    Conflicts:
        nova/tests/unit/network/security_group/test_neutron_driver.py

    NOTE(mriedem): The conflict is due to not having change
    I31c9ea8628c6f3985f8e9118d9687bbfb8789b68 in Pike.

    Change-Id: I7de14456d04370c842b4c35597dca3a628a826a2
    (cherry picked from commit eaf16fdde59a14fb38df669b21a911a0c2d2576f)
    (cherry picked from commit 418af2d865809cfa907678f883dae07f4f31baa2)
    (cherry picked from commit fec95a2e4f763e15193504483383f918feb3e636)
    (cherry picked from commit 38b2f68a17533e839819e654825613aefd4effd4)
    (cherry picked from commit 00d438adb325610a04af9f8f18cdb1c622df5418)
* Merge "Remove exp legacy-tempest-dsvm-full-devstack-plugin-nfs" into stable/pike  (Zuul, 2020-03-25, 1 file, -15/+0)

  * Remove exp legacy-tempest-dsvm-full-devstack-plugin-nfs  (Luigi Toscano, 2020-03-24, 1 file, -15/+0)

    It is a legacy experimental job on an EM branch. See also the changeset
    Ifd4387a02b3103e1258e146e63c73be1ad10030c.

    Change-Id: Ib4f1cfe12bbd1172dcf2b413332f9a1e7fb0d1b0
    (cherry picked from commit c68e22f32cff8ef3a6e7fe40a544736ab2928340)
* Merge "Mask the token used to allow access to consoles" into stable/pike  (Zuul, 2020-03-25, 4 files, -7/+32)

  * Mask the token used to allow access to consoles  (Balazs Gibizer, 2020-03-25, 4 files, -7/+32)

    Hide the novncproxy token from the logs.

    Conflicts:
        nova/tests/unit/consoleauth/test_consoleauth.py

    NOTE: The conflict is due to Iffdd4e251bfa2bac1bfd49498e32b738843709de
    being backported only as far as Queens.

    Co-Authored-By: paul-carlton2 <paul.carlton2@hp.com>
    Co-Authored-By: Tristan Cacqueray <tdecacqu@redhat.com>
    Change-Id: I5b8fa4233d297722c3af08176901d12887bae3de
    Closes-Bug: #1492140
    (cherry picked from commit 26d4047e17eba9bc271f8868f1d0ffeec97b555e)
    (cherry picked from commit d7826bcd761af035f3f76f67c607dde2a1d04e48)
    (cherry picked from commit d8fbf04f325f593836f8d44b6bbf42b85bde94e3)
    (cherry picked from commit 08f1f914cc219cf526adfb08c46b8f40b4e78232)
    (cherry picked from commit 366515dcd1090ca2f9f303009c78394b5665ce1f)
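The kind of masking this implies can be sketched with a small helper (not nova's actual masking utility; the regex and token value are illustrative):

```python
import re

def mask_token(message):
    """Blank out token values in a URL or query string before the
    message reaches the logs. A sketch, not nova's real helper."""
    return re.sub(r'(token=)[^&\s"]+', r'\1***', message)

line = ('GET /vnc_auto.html?token=f9b47b75-5f99-4597-8f61-5c3edeb06e3d '
        'HTTP/1.1')
masked = mask_token(line)
assert 'token=***' in masked
assert 'f9b47b75' not in masked
```

The point of masking at log time is that the console token is a short-lived credential: anyone reading the logs before it expires could replay it to reach the guest console.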
* Avoid circular reference during serialization  (Balazs Gibizer, 2020-03-25, 2 files, -1/+54)

    When an instance with a NUMA topology is re-scheduled, the conductor
    migrate task blows up with a circular reference error during request
    spec serialization. It happens because there are ovos in the request
    spec that jsonutils.dumps only serializes if explicitly requested. This
    patch makes the explicit request.

    This is a stable-only bug fix, as the broken code was removed in Stein
    by the feature patch I4244f7dd8fe74565180f73684678027067b4506e.

    Conflicts:
        nova/tests/unit/conductor/tasks/test_migrate.py

    The unit test case was re-implemented, as the test refactoring in
    I57568e9a01664ee373ea00a8db3164109c982909 is missing from Pike.

    Closes-Bug: #1864665
    Change-Id: I1942bfa9bd1baf8738d34c287216db7b59000a36
    (cherry picked from commit 3871b38fe03aee7a1ffbbdfdf8a60b8c09e0ba76)
    (cherry picked from commit 54ca5d9afb11867ea022464d7ecad9f1ce13e453)
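The failure and the fix can be illustrated with the stdlib json module standing in for oslo's jsonutils, and a fake object standing in for an oslo versioned object (ovo); everything here is an illustrative stand-in, not nova's code:

```python
import json

class FakeOvo:
    """Stand-in for an ovo nested in the request spec."""
    def __init__(self):
        self.parent = self          # back-reference, like nested ovos

    def obj_to_primitive(self):
        # Real ovos serialize themselves to plain dicts on request.
        return {'cells': []}

spec = {'numa_topology': FakeOvo()}

# Dumping the raw object fails: json cannot serialize it (and chasing
# its attributes would recurse forever through the back-reference).
try:
    json.dumps(spec)
    raised = False
except (TypeError, ValueError):
    raised = True
assert raised

# The fix: explicitly convert ovos to primitives before dumping.
primitive = {k: v.obj_to_primitive() for k, v in spec.items()}
assert json.dumps(primitive)
```

This mirrors the patch: request the primitive form explicitly instead of handing live objects with internal back-references to the serializer.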
* | Merge "pike-only: remove broken non-voting ceph jobs" into stable/pikeZuul2020-03-251-31/+0
|\ \
| * | pike-only: remove broken non-voting ceph jobsMatt Riedemann2020-03-241-31/+0
| |/
The ceph job is broken on pike and the dependent fix in devstack [1] has not gotten attention after several months. Since pike is in extended maintenance mode, the job would be non-voting even if it were working, and there does not seem to be much desire to make this work again (it would still be non-voting and could easily break again), so we might as well just remove it from pike testing.

Related mailing list thread: [2]

[1] https://review.opendev.org/684756/
[2] http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011632.html

Change-Id: I9e153f86c81ed6d9f8d9682b66d6d5c7f7b25296
Closes-Bug: #1835627
* | Merge "rt: only map compute node if we created it" into stable/pikeZuul2020-03-252-33/+65
|\ \ | |/ |/|
| * rt: only map compute node if we created itMatt Riedemann2019-12-052-33/+65
| |
If ComputeNode.create() fails, the update_available_resource periodic task will not try to create the node again because it will be mapped in the compute_nodes dict, and _init_compute_node will return early. However, trying to save changes to that ComputeNode object later will fail because there is no id on the object, since we failed to create it in the DB.

This simply reverses the logic such that we only map the compute node if we successfully created it.

Some existing _init_compute_node testing had to be changed since it relied on the order of when the ComputeNode object is created and put into the compute_nodes dict in order to pass the object along to some much lower-level PCI tracker code, which was arguably mocking too deep for a unit test. That is changed to avoid the low-level mocking and assertions and just assert that _setup_pci_tracker is called as expected.

Conflicts:
	nova/tests/unit/compute/test_resource_tracker.py

NOTE(mriedem): The conflict and slight change to the test test_compute_node_create_fail_retry_works are because I120a98cc4c11772f24099081ef3ac44a50daf71d is not in Pike.

Change-Id: I9fa1d509a3de405d6246fb8670612c65c10cc93b
Closes-Bug: #1839674
(cherry picked from commit f578146f372386e1889561cba33e95495e66ce97)
(cherry picked from commit 648770bd6897aa2ec95df3ec55344d5803543f07)
(cherry picked from commit 35273a844ab2dc2494f0166d9b8228ee302acd4f)
(cherry picked from commit 5a3430983ab37aaed6bfc7ead0eb14121ffe69d3)
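The reversed ordering can be sketched like this. All names are illustrative; the real logic lives in the ResourceTracker's _init_compute_node and is considerably more involved.

```python
compute_nodes = {}

def init_compute_node(nodename, create):
    """Sketch: cache the node only after create() succeeds, so a failed
    create is retried by the next periodic run instead of being masked
    by a stale cache entry with no database id."""
    if nodename in compute_nodes:
        return compute_nodes[nodename]
    node = create(nodename)          # may raise; nothing is cached then
    compute_nodes[nodename] = node   # map only on success
    return node
```

The key property is that an exception in `create` leaves the cache untouched, so the periodic task gets another chance.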
* | nova-live-migration: Wait for n-cpu services to come up after configuring CephMatt Riedemann2020-03-243-61/+98
| |
Previously the ceph.sh script used during the nova-live-migration job would only grep for a `compute` process when checking if the services had been restarted. This check was bogus and would always return 0 as it would always match itself. For example:

2020-03-13 21:06:47.682073 | primary | 2020-03-13 21:06:47.681 | root 29529 0.0 0.0 4500 736 pts/0 S+ 21:06 0:00 /bin/sh -c ps aux | grep compute
2020-03-13 21:06:47.683964 | primary | 2020-03-13 21:06:47.683 | root 29531 0.0 0.0 14616 944 pts/0 S+ 21:06 0:00 grep compute

Failures of this job were seen on the stable/pike branch where slower CI nodes appeared to struggle to allow Libvirt to report to n-cpu in time before Tempest was started. This in turn caused instance build failures and the overall failure of the job.

This change resolves the issue by switching to pgrep and ensuring n-cpu services are reported as fully up after a cold restart before starting the Tempest test run.

NOTE(lyarwood): The following change is squashed here to avoid endless retries in the gate due to bug #1867380.

Replace ansible --sudo with --become in live_migration/hooks scripts

Ansible deprecated --sudo in 1.9 so this change replaces it with --become.

NOTE(lyarwood): Conflict due to Ifbadce909393268b340b7a08c78a6faa2d7888b2 not being present in stable/pike.

Conflicts:
	nova/tests/live_migration/hooks/ceph.sh

Change-Id: I40f40766a7b84423c1dcf9d5ed58476b86d61cc4
(cherry picked from commit 7f16800f71f6124736382be51d9da234800f7618)
(cherry picked from commit 18931544d8a57953c6ce9ee4bf4bcc7a4e9e4295)
(cherry picked from commit 1a09f753559aa7ed617192853215c5b0ace7756a)

Closes-Bug: #1867380
Change-Id: Icd7ab2ca4ddbed92c7e883a63a23245920d961e7
(cherry picked from commit e23c3c2c8df3843c5853c87ef684bd21c4af95d8)
(cherry picked from commit 70447bca2f4f33c6872eaf94a2e4351bb257c22a)
(cherry picked from commit 373c4ffde2053c7ff11bd38339b88d144cd442f2)
(cherry picked from commit 63ed32ef49adcb6830ef3b5329a561542bddf656)
(cherry picked from commit 0718015f3fd2899720613bfef789f7023f112e30)
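The self-match problem is easy to reproduce with plain string matching over ps-style output like the log lines above. This is a minimal Python sketch of the flaw, not the job's actual shell code.

```python
# Lines as they appear in "ps aux" while the broken check runs: the
# pipeline's own grep process contains the word "compute".
ps_lines = [
    "root 29529 ... /bin/sh -c ps aux | grep compute",
    "root 29531 ... grep compute",
]

# Broken check: a bare substring match always succeeds because it
# matches the grep process itself, even with no nova-compute running.
broken_check = any("compute" in line for line in ps_lines)

# pgrep-style check: match the real service name and exclude the grep
# process, so the result reflects whether n-cpu is actually running.
fixed_check = any("nova-compute" in line and "grep" not in line
                  for line in ps_lines)
```

`pgrep` avoids the problem by design: it matches other processes' names and never its own command line.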
* | Use stable constraint for Tempest pinned stable branchesGhanshyam Mann2020-02-102-1/+22
| |
Stable branches up to stable/rocky use Python versions < py3.6. Tempest tests those branches in a venv, but the Tempest tox config uses the master upper-constraints file [1], which blocks installation due to dependencies requiring >= py3.6. For example, oslo.concurrency 4.0.0 is not compatible with < py3.6.

Since we pin Tempest for EM stable branches, we should be able to use the stable constraints for the Tempest installation as well as while running during the run-tempest playbook. tox.ini is hard coded to use the master constraints [1], which forces run-tempest to recreate the tox env and use the master constraints. Fix for that: https://review.opendev.org/#/c/705870/

The nova-live-migration test hook run_test.sh needs to use the stable u-c so that the Tempest installation in the venv will use the stable branch constraints.

Modify the irrelevant-files for the nova-live-migration job so it runs for the run_test.sh script.

[1] https://opendev.org/openstack/tempest/src/commit/bc9fe8eca801f54915ff3eafa418e6e18ac2df63/tox.ini#L14

Change-Id: I8190f93e0a754fa59ed848a3a230d1ef63a06abc
(cherry picked from commit 48a66c56441861a206f9369b8c242cfd4dffd80d)
* | Avoid redundant initialize_connection on source post live migrationMatthew Booth2020-01-225-75/+67
| |
During live migration we update bdm.connection_info for attached volumes in pre_live_migration to reflect the new connection on the destination node. This means that after migration completes the BDM no longer has a reference to the original connection_info to do the detach on the source host. To address this, change I3dfb75eb added a second call to initialize_connection on the source host to re-fetch the source host connection_info before calling disconnect.

Unfortunately the cinder driver interface does not strictly require that multiple calls to initialize_connection will return consistent results. Although they normally do in practice, there is at least one cinder driver (delliscsi) which doesn't. This results in a failure to disconnect on the source host post migration.

This change avoids the issue entirely by fetching the BDMs prior to modification on the destination node. As well as working around this specific issue, it also avoids a redundant cinder call in all cases.

Note that this massively simplifies post_live_migration in the libvirt driver. The complexity removed was concerned with reconstructing the original connection_info. This required considering the cinder v2 and v3 use cases, and reconstructing the multipath_id which was written to connection_info by the libvirt fibrechannel volume connector on connection. These things are not necessary when we just use the original data unmodified.

Other drivers affected are XenAPI and Hyper-V. XenAPI doesn't touch volumes in post_live_migration, so is unaffected. Hyper-V did not previously account for differences in connection_info between source and destination, so was likely previously broken. This change should fix it.

Conflicts:
	nova/compute/manager.py
	nova/objects/migrate_data.py
	nova/tests/unit/compute/test_compute.py
	nova/tests/unit/compute/test_compute_mgr.py
	nova/tests/unit/virt/libvirt/test_driver.py
	nova/virt/libvirt/driver.py

NOTE(mriedem): The conflicts are primarily due to not having change I0bfb11296430dfffe9b091ae7c3a793617bd9d0d in Pike. In addition, the libvirt driver conflicts are also due to not having change I61a0bee9e71e9a67f6a7c04a7bfd6e77fe818a77 nor change Ica323b87fa85a454fca9d46ada3677f18fe50022 in Pike.

NOTE(melwitt): The difference from the Queens change in nova/compute/manager.py to be able to treat the instance variable as a dict is because _do_live_migration is expected to be able to handle an old-style instance in Pike (this is exposed in the unit test). Other sources of conflict in nova/tests/unit/compute/test_compute.py are because change I9068a5a5b47cef565802a6d58f37777464644100 is not in Pike. The difference from the Queens change in nova/tests/unit/compute/test_compute_mgr.py to add a mock for the new get_by_instance_uuid database call is needed because in Queens the test was using a MagicMock as the compute manager whereas in Pike the test is using a real compute manager. The mock needs to be added to avoid a test error accessing the database in a NoDBTestCase. Another source of conflicts in nova/tests/unit/compute/test_compute_mgr.py is because changes I0f3ab6604d8b79bdb75cf67571e359cfecc039d8 and I9068a5a5b47cef565802a6d58f37777464644100 are not in Pike.

Closes-Bug: #1754716
Closes-Bug: #1814245
Change-Id: I0390c9ff51f49b063f736ca6ef868a4fa782ede5
(cherry picked from commit b626c0dc7b113365002e743e6de2aeb40121fc81)
(cherry picked from commit 75e0f5a9b18293546db0ddf0fb073854e6704115)
(cherry picked from commit 013f421bca4067bd430a9fac1e3b290cf1388ee4)
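The ordering change amounts to snapshotting the source connection_info before pre_live_migration overwrites it, then using the snapshot for the source-side detach. All names and signatures below are illustrative, not nova's actual API.

```python
def live_migrate(bdms, pre_live_migration, disconnect_on_source):
    """Sketch: keep the original source connection_info so the source
    detach never needs a second initialize_connection call to cinder."""
    # Snapshot before anything rewrites the BDMs for the destination.
    source_infos = {b["id"]: dict(b["connection_info"]) for b in bdms}
    pre_live_migration(bdms)  # rewrites connection_info for the destination
    for b in bdms:
        # Detach on the source with the original, unmodified data.
        disconnect_on_source(source_infos[b["id"]])
```

Because the snapshot is taken up front, it does not matter whether a driver's initialize_connection is idempotent across calls.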
* | Merge "Error out interrupted builds" into stable/pikeZuul2020-01-193-27/+274
|\ \
| * | Error out interrupted buildsBalazs Gibizer2020-01-163-27/+274
| | |
If the compute service is restarted while build requests are executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE, then those instances will be stuck forever in BUILDING state. If the instance has already finished instance_claim then instance.host is set, and when the compute restarts the instance is put into ERROR state.

This patch changes compute service startup to put instances into ERROR state if they a) are in the BUILDING state, and b) have allocations on the compute resource provider, but c) do not have instance.host set to that compute.

Note: changes in manager.py and test_compute_mgr.py compared to Queens:

* the signature change of the get_allocations_for_resource_provider call is due to I7891b98f225f97ad47f189afb9110ef31c810717 missing from stable/pike.
* the VirtDriverNotReady exception does not exist in pike as Ib0ec1012b74e9a9e74c8879f3feed5f9332b711f is missing. In pike ironic returns an empty node list instead of raising an exception so the bugfix and the test are adapted accordingly.

Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
Closes-Bug: #1833581
(cherry picked from commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e)
(cherry picked from commit 06fd7c730172190d7bf7d52bc9062eecba8d7d27)
(cherry picked from commit cb951cbcb246221e04a063cd7b5ae2e83ddfe6dd)
(cherry picked from commit 13bb7ed701121955ba015103c2e44429927e78d4)
(cherry picked from commit 4164b96de9f62fdc35a12adf514d767460187d55)
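The a)/b)/c) conditions above can be sketched as a single predicate evaluated per instance at startup. The function and parameter names are illustrative, not the compute manager's actual code.

```python
def should_error_out(instance, provider_allocations, our_host):
    """Sketch of the startup check: an interrupted build is (a) still
    BUILDING, (b) holds allocations on this compute's resource provider,
    but (c) never got instance.host set to this compute."""
    return (instance["vm_state"] == "building"
            and instance["uuid"] in provider_allocations
            and instance.get("host") != our_host)
```

An instance that already finished instance_claim has its host set, so condition (c) excludes it from this path; it is handled by the existing restart logic instead.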
* | | Merge "lxc: make use of filter python3 compatible" into stable/pikeZuul2020-01-162-5/+29
|\ \ \ | |/ / |/| |
| * | lxc: make use of filter python3 compatibleSean Mooney2019-12-172-5/+29
| | |
_detect_nbd_devices uses the filter builtin internally to filter valid devices. In Python 2, filter returns a list. In Python 3, filter returns an iterable or generator function. This change eagerly converts the result of calling filter to a list to preserve the Python 2 behaviour under Python 3.

NOTE(mriedem): In this backport the test module needs a mock import since change Ib5e585fa4bfb99617cd3ca983674114d323a3cce is not in Pike.

Closes-Bug: #1840068
Change-Id: I25616c5761ea625a15d725777ae58175651558f8
(cherry picked from commit fc9fb383c16ecb98b1b546f21e7fabb5f00a42ac)
(cherry picked from commit e135afec851e33148644d024a9d78e56f962efd4)
(cherry picked from commit 944c08ff764c1cb598dbebbad8aa51bbdd0a692c)
(cherry picked from commit 04bcb98678c1289810f5a8542b5bf9fe7aeeaa12)
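The py2/py3 difference the fix papers over is easy to demonstrate in plain Python, independent of the nova code (the device names here are made up):

```python
devices = ["nbd0", "sda", "nbd1"]

# Eagerly materializing filter() restores the Python 2 behaviour: the
# result is a list that can be indexed, sized, and iterated repeatedly.
valid = list(filter(lambda d: d.startswith("nbd"), devices))

# Under Python 3 a bare filter() is a one-shot iterator: a second pass
# sees nothing, which is the pitfall the backport avoids.
lazy = filter(lambda d: d.startswith("nbd"), devices)
first_pass = list(lazy)
second_pass = list(lazy)
```

Any code that checks `len()` of the result or iterates it twice silently breaks on the lazy form, which is why the eager `list(...)` conversion is the minimal safe fix.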
* | | Merge "Functional reproduce for bug 1833581" into stable/pikeZuul2020-01-151-0/+73
|\ \ \