path: root/releasenotes
Commit message — Author, Date — Files changed, Lines (+/-)
* Use force=True for os-brick disconnect during delete — melanie witt, 2023-05-10 — 1 file, +11/-0

  The 'force' parameter of os-brick's disconnect_volume() method allows
  callers to ignore flushing errors and ensure that devices are removed
  from the host. We should use force=True when we are going to delete an
  instance, to avoid leaving leftover devices connected to the compute
  host which could potentially be reused to map volumes to an instance
  that should not have access to those volumes.

  We can use force=True even when disconnecting a volume that will not be
  deleted on termination, because os-brick will always attempt to flush
  and disconnect gracefully before forcefully removing devices.

  Closes-Bug: #2004555
  Change-Id: I3629b84d3255a8fe9d8a7cea8c6131d7c40899e8
  (cherry picked from commit db455548a12beac1153ce04eca5e728d7b773901)
  (cherry picked from commit efb01985db88d6333897018174649b425feaa1b4)
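The disconnect policy this note describes can be sketched as follows. This is an illustrative stand-in, not Nova's actual driver code: `FakeBrickConnector` and `disconnect_on_delete` are hypothetical names; only the `force=True` keyword mirrors the real os-brick `disconnect_volume()` parameter.

```python
class FakeBrickConnector:
    """Stand-in for an os-brick connector object; records call kwargs."""

    def __init__(self):
        self.calls = []

    def disconnect_volume(self, connection_info, device_info, force=False):
        # Real os-brick flushes and disconnects gracefully first, and only
        # forcefully removes devices when force=True and the flush fails.
        self.calls.append({"force": force})


def disconnect_on_delete(connector, connection_info, device_info):
    # On instance delete we must not leave devices behind on the compute
    # host, so flushing errors are ignored via force=True.
    connector.disconnect_volume(connection_info, device_info, force=True)
```

Because os-brick always tries the graceful path first, forcing is safe even for volumes that will outlive the instance.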
* Fix rescue of volume-based instances — Rajesh Tailor, 2023-01-30 — 1 file, +6/-0

  Currently, attempting to rescue a volume-based instance using an image
  without the hw_rescue_device and/or hw_rescue_bus properties set makes
  the rescue API call fail (as non-stable rescue of volume-based
  instances is not supported), leaving the instance in an error state.

  This change checks for the hw_rescue_device/hw_rescue_bus image
  properties before attempting the rescue; if they are not set, it fails
  with a proper error message, without changing the instance state.

  Related-Bug: #1978958
  Closes-Bug: #1926601
  Change-Id: Id4c8c5f3b32985ac7d3d7c833b82e0876f7367c1
  (cherry picked from commit 6eed55bf55469f4ceaa7d4d4eb1be635e14bc73b)
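The pre-flight check described above might look like this minimal sketch. Function and exception names are hypothetical; the image property names (`hw_rescue_device`, `hw_rescue_bus`) come from the commit message.

```python
class UnsupportedRescueImage(Exception):
    """Raised before touching the instance, so its state is unchanged."""


def validate_rescue_image(image_properties, volume_backed):
    # Volume-backed instances only support stable-device rescue, which
    # requires hw_rescue_device and/or hw_rescue_bus on the rescue image.
    if not volume_backed:
        return
    if not (image_properties.get("hw_rescue_device")
            or image_properties.get("hw_rescue_bus")):
        raise UnsupportedRescueImage(
            "Cannot rescue a volume-backed instance with an image that "
            "does not set hw_rescue_device/hw_rescue_bus")
```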
* Ironic nodes with instance reserved in placement — John Garbutt, 2022-12-15 — 1 file, +11/-0

  Currently, when you delete an ironic instance, we trigger an undeploy
  in ironic and release our allocation in placement, well before the
  ironic node is actually available again. We have attempted to fix this
  by marking unavailable nodes as reserved in placement. This works great
  until you try to re-image lots of nodes: it turns out that ironic nodes
  waiting for their automatic clean to finish are returned as valid
  allocation candidates for quite some time, and we only eventually mark
  them as reserved.

  This patch takes a strange approach: if we mark all nodes as reserved
  as soon as the instance lands, we close the race. That is, when the
  allocation is removed, the node stays unavailable until the next
  placement update notices that the node has become available, which may
  or may not be after automatic cleaning. The trade-off is that when you
  don't have automatic cleaning, we wait a bit longer to notice the node
  is available again.

  Note this is also useful when a broken ironic node is marked as in
  maintenance while it is in use by a nova instance. In a similar way, we
  mark the node as reserved immediately, rather than first waiting for
  the instance to be deleted before reserving the resources in placement.

  Closes-Bug: #1974070
  Change-Id: Iab92124b5776a799c7f90d07281d28fcf191c8fe
  (cherry picked from commit 3c022e968375c1b2eadf3c2dd7190b9434c6d4c1)
* Support multiple config files with mod_wsgi — Sean Mooney, 2022-12-12 — 1 file, +14/-0

  Unlike uwsgi, apache mod_wsgi does not support passing command-line
  arguments to the python wsgi script it invokes. As a result, while you
  can pass --config-file when hosting the api and metadata wsgi
  applications with uwsgi, there is no way to use multiple config files
  with mod_wsgi.

  This change mirrors how keystone supports this today by introducing a
  new OS_NOVA_CONFIG_FILES env var that allows operators to optionally
  pass a ';'-delimited list of config files to load. It also adds docs
  for this env var and for the existing undocumented OS_NOVA_CONFIG_DIR.

  Closes-Bug: 1994056
  Change-Id: I8e3ccd75cbb7f2e132b403cb38022787c2c0a37b
  (cherry picked from commit 73fe84fa0ea6f7c7fa55544f6bce5326d87743a6)
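The lookup described above can be sketched like this. The env var names come from the commit message; joining relative file names against OS_NOVA_CONFIG_DIR (with a /etc/nova default) is an assumption made for illustration, not a statement of Nova's exact behaviour.

```python
import os


def config_files_from_env(environ):
    # ';'-delimited list of config file names, resolved relative to the
    # config dir. Falls back to a single nova.conf when the var is unset
    # (fallback behaviour is an illustrative assumption).
    config_dir = environ.get("OS_NOVA_CONFIG_DIR", "/etc/nova")
    raw = environ.get("OS_NOVA_CONFIG_FILES")
    if not raw:
        return [os.path.join(config_dir, "nova.conf")]
    return [os.path.join(config_dir, name) for name in raw.split(";")]
```

In a real deployment the dict passed in would be `os.environ`, set for the mod_wsgi process via something like apache's `SetEnv`.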
* Handle "no RAM info was set" migration case — Brett Milford, 2022-10-07 — 1 file, +11/-0

  This handles the case where the live migration monitoring thread races
  with migration completion and calls jobStats() after the migration has
  finished, resulting in the following error:

      libvirt.libvirtError: internal error: migration was active,
      but no RAM info was set

  Closes-Bug: #1982284
  Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267
  (cherry picked from commit 9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a)
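The shape of the fix can be sketched as a guarded call around jobStats(). This is a self-contained illustration: `FakeLibvirtError` stands in for `libvirt.libvirtError`, and the error text is taken verbatim from the release note; returning an empty stats dict for a finished job is an assumption.

```python
class FakeLibvirtError(Exception):
    """Stands in for libvirt.libvirtError in this sketch."""


def get_live_migration_job_stats(domain):
    # The monitor thread can race with migration completion; if libvirt
    # reports the "no RAM info was set" internal error, treat the job as
    # already finished instead of letting the exception propagate.
    try:
        return domain.jobStats()
    except FakeLibvirtError as exc:
        if "no RAM info was set" in str(exc):
            return {}
        raise


class FinishedDomain:
    """A domain whose migration has already completed."""

    def jobStats(self):
        raise FakeLibvirtError(
            "internal error: migration was active, but no RAM info was set")
```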
* Prelude section for Zed release — Sylvain Bauza, 2022-09-14 — 1 file, +46/-0

  We need it before RC1.

  Change-Id: Ib674ca6a13f7c5d0254b222effa20d1948a80fe5
* Gracefully ERROR in _init_instance if vnic_type changed — Balazs Gibizer, 2022-09-08 — 1 file, +9/-0

  If the vnic_type of a bound port changes from "direct" to "macvtap" and
  the compute service is then restarted, _init_instance tries to plug the
  vif of the changed port. However, as the port now has the macvtap
  vnic_type, nova tries to look up the netdev of the parent VF. That VF
  is still consumed by the instance, so there is no such netdev on the
  host OS. This error killed the compute service at startup due to an
  unhandled exception.

  This patch adds the exception handler, logs an ERROR and continues
  initializing the other instances on the host. It also adds a detailed
  ERROR log when nova detects that the vnic_type changed during the
  _heal_instance_info_cache periodic task.

  Closes-Bug: #1981813
  Change-Id: I1719f8eda04e8d15a3b01f0612977164c4e55e85
* Merge "Doc follow up for PCI in placement" — Zuul, 2022-09-06 — 1 file, +1/-1

  * Doc follow up for PCI in placement — Balazs Gibizer, 2022-09-02 — 1 file, +1/-1

    This fixes the doc comments for the already merged (or being merged)
    patches in the series.

    blueprint: pci-device-tracking-in-placement
    Change-Id: Ia99138d603722a66c9a6ac61b035384d86ccca75
* Merge "libvirt: Add vIOMMU device to guest" — Zuul, 2022-09-01 — 1 file, +21/-0

  * libvirt: Add vIOMMU device to guest — Stephen Finucane, 2022-09-01 — 1 file, +21/-0

    Implementation for the libvirt-viommu-device blueprint. Providing the
    hw:viommu_model property in flavor extra_specs, or hw_viommu_model as
    an image property, enables a vIOMMU device for the libvirt guest.

    [1] https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/
    [2] https://review.opendev.org/c/openstack/nova-specs/+/840310

    Implements: blueprint libvirt-viommu-device
    Change-Id: Ief9c550292788160433a28a7a1c36ba38a6bc849
    Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* Merge "Allow enabling PCI tracking in Placement" — Zuul, 2022-09-01 — 1 file, +9/-0

  * Allow enabling PCI tracking in Placement — Balazs Gibizer, 2022-08-27 — 1 file, +9/-0

    This patch introduces the [pci]report_in_placement config option,
    which is False by default; if set to True, it enables reporting of
    the PCI passthrough inventories to Placement.

    blueprint: pci-device-tracking-in-placement
    Change-Id: I49a3dbf4c5708d2d92dedd29a9dc3ef25b6cd66c
* Merge "Add API support for rebuilding BFV instances" — Zuul, 2022-09-01 — 1 file, +10/-0

  * Add API support for rebuilding BFV instances — Dan Smith, 2022-08-31 — 1 file, +10/-0

    This adds a microversion and API support for triggering a rebuild of
    volume-backed instances by leveraging cinder functionality to do so.

    Implements: blueprint volume-backed-server-rebuild
    Closes-Bug: #1482040
    Co-Authored-By: Rajat Dhasmana <rajatdhasmana@gmail.com>
    Change-Id: I211ad6b8aa7856eb94bfd40e4fdb7376a7f5c358
* Add documentation and release notes for RBAC change — ghanshyam mann, 2022-08-30 — 1 file, +36/-0

  We have dropped the system scope from Nova policy while keeping the
  legacy admin behaviour the same. This commit adds the release notes and
  updates the policy configuration documentation accordingly. It also
  removes the upgrade check for policy that was added for the system
  scope configuration protection.

  Change-Id: I127cc4da689a82dbde07059de90c451eb09ea4cf
* Merge "Add locked_memory extra spec and image property" — Zuul, 2022-08-26 — 1 file, +13/-0

  * Add locked_memory extra spec and image property — Sean Mooney, 2022-08-24 — 1 file, +13/-0

    This change adds a new hw:locked_memory extra spec and a
    hw_locked_memory image property to control preventing guest memory
    from swapping. It adds docs and extends the flavor validators for the
    new extra spec, and also adds the new image property.

    Blueprint: libvirt-viommu-device
    Change-Id: Id3779594f0078a5045031aded2ed68ee4301abbd
* Trigger reschedule if PCI consumption fails on compute — Balazs Gibizer, 2022-08-25 — 1 file, +8/-0

  The PciPassthroughFilter logic checks each InstancePCIRequest
  individually against the available PCI pools of a given host for a
  given boot request. So it is possible that the scheduler accepts a host
  that has only a single PCI device available even when two devices are
  requested for a single instance via two separate PCI aliases. The PCI
  claim on the compute detects this, but it did not stop the boot; it
  only logged an ERROR, resulting in the instance booting without any PCI
  device.

  This patch does two things:

  1) Changes the PCI claim to fail with an exception and trigger a
     re-schedule instead of just logging an ERROR.

  2) Changes PciDeviceStats.support_requests, which is called during
     scheduling, to not just filter pools for individual requests but
     also consume each request from the pools within the scope of a
     single boot request.

  The fix in #2 alone would not be enough, as two parallel scheduling
  requests could still race for a single device on the same host. #1 is
  the ultimate place where we consume devices under a compute-global
  lock, so we need the fix there too.

  Closes-Bug: #1986838
  Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
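Fix #2 above boils down to consuming from a scratch copy of the pools across all requests of one boot request, instead of checking each request against the full pools independently. A minimal sketch, with pool/request shapes simplified to dictionaries and tuples (not Nova's actual data structures):

```python
def supports_all_requests(pools, requests):
    """Return True only if every request can be satisfied simultaneously.

    pools: mapping of pool key -> number of free devices.
    requests: list of (pool_key, device_count) tuples, one per PCI alias.
    """
    remaining = dict(pools)  # scratch copy; never mutate the real pools
    for pool_key, count in requests:
        if remaining.get(pool_key, 0) < count:
            return False
        # Consume within the scope of this single boot request, so two
        # aliases cannot both be satisfied by the same single device.
        remaining[pool_key] -= count
    return True
```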
* Add VDPA support for suspend and live-migrate — Sean Mooney, 2022-08-23 — 1 file, +25/-0

  This change appends the vdpa vnic-type to the list of passthrough vnic
  types and removes the API blocks. This enables the existing suspend and
  live migrate code to properly manage vDPA interfaces, allowing
  "hot plug" live migration similar to direct SR-IOV.

  Implements: blueprint vdpa-suspend-detach-and-live-migrate
  Change-Id: I878a9609ce0d84f7e3c2fef99e369b34d627a0df
* Merge "enable blocked VDPA move operations" — Zuul, 2022-08-20 — 1 file, +11/-0

  * enable blocked VDPA move operations — Sean Mooney, 2022-08-16 — 1 file, +11/-0

    This change adds functional tests for operations on servers with VDPA
    devices that are expected to work but were blocked due to lack of
    testing or qemu bugs. Cold-migrate, resize, evacuate, and shelve are
    enabled and tested by this patch.

    Closes-Bug: #1970467
    Change-Id: I6e220cf3231670d156632e075fcf7701df744773
* Add reno for fixing bug 1941005 — Balazs Gibizer, 2022-08-16 — 1 file, +6/-0

  Related-Bug: #1941005
  Related-Bug: #1983753
  Change-Id: I16ed1143ead3779c87698aa29bac005678db2993
* Merge "Optimize numa_fit_instance_to_host" — Zuul, 2022-08-11 — 1 file, +9/-0

  * Optimize numa_fit_instance_to_host — Balazs Gibizer, 2022-06-15 — 1 file, +9/-0

    The numa_fit_instance_to_host algorithm tries all possible host cell
    permutations to fit the instance cells, so in the worst case it makes
    n! / (n-k)! calls to _numa_fit_instance_cell (n = len(host_cells),
    k = len(instance_cells)) to decide whether the instance fits on the
    host. With a 16-NUMA-node host and 8-NUMA-node guests this means 500
    million calls to _numa_fit_instance_cell, which takes excessive time.

    However, going through these permutations, many host_cell,
    instance_cell pairs are tried repeatedly. E.g. with
    host_cells=[H1, H2, H3] and instance_cells=[G1, G2], the pairings

      * H1 <- G1 and H2 <- G2
      * H1 <- G1 and H3 <- G2
      ...

    check G1 against H1 twice. But if it does not fit the first time, we
    know it will not fit the second time either. So we can cache the
    result of the first check and use that cache for the later
    permutations.

    This patch adds two caches to the algorithm: a fit_cache holding
    host_cell.id, instance_cell.id pairs that we know fit, and a
    no_fit_cache for pairs that we know don't. This significantly boosts
    the performance of the algorithm: the reproduction provided in bug
    1978372 took 6 minutes on my local machine without the optimization
    and 3 seconds with it.

    The change increases the memory usage of the algorithm by the two
    caches. The caches are sets of integer two-tuples, and their total
    size is bounded by the number of possible host_cell, instance_cell
    pairs, i.e. len(host_cells) * len(instance_cells). For the above
    example (16 host, 8 instance NUMA nodes) that is 128 pairs of
    integers, which is not a significant memory increase.

    Closes-Bug: #1978372
    Change-Id: Ibcf27d741429a239d13f0404348c61e2668b4ce4
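The caching strategy described above can be sketched as follows. This is an illustration of the technique, not Nova's actual implementation: `fits` stands in for the expensive `_numa_fit_instance_cell` check, and cells are plain hashable values rather than cell objects.

```python
from itertools import permutations


def fit_instance_to_host(host_cells, instance_cells, fits):
    """Find host<-guest cell pairings, caching per-pair fit results."""
    fit_cache, no_fit_cache = set(), set()

    def cached_fits(host_cell, guest_cell):
        pair = (host_cell, guest_cell)
        if pair in fit_cache:
            return True
        if pair in no_fit_cache:
            return False
        ok = fits(host_cell, guest_cell)  # the expensive check
        (fit_cache if ok else no_fit_cache).add(pair)
        return ok

    # Still iterates permutations, but repeated (host, guest) pairings
    # across permutations are answered from the caches.
    for perm in permutations(host_cells, len(instance_cells)):
        if all(cached_fits(h, g) for h, g in zip(perm, instance_cells)):
            return list(zip(perm, instance_cells))
    return None
```

The number of expensive checks is now bounded by len(host_cells) * len(instance_cells) instead of growing with the number of permutations.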
* Merge "update default numa allocation strategy" — Zuul, 2022-08-10 — 1 file, +12/-0

  * update default numa allocation strategy — Sean Mooney, 2022-08-10 — 1 file, +12/-0

    This change updates the default of
    [compute]/packing_host_numa_cells_allocation_strategy to False,
    making nova spread VMs across NUMA nodes by default. This should
    significantly improve scheduling performance when there is a large
    number of host and guest NUMA nodes and non-empty hosts. See bug
    1978372 for details.

    Related-Bug: #1978372
    Change-Id: I6fcd2c6b58dd36674be57eee70894ce04335955a
* Rename [pci]passthrough_whitelist to device_spec — Balazs Gibizer, 2022-08-10 — 1 file, +6/-0

  A later patch in the pci-device-tracking-in-placement work will extend
  the existing [pci]passthrough_whitelist config syntax, so we take the
  opportunity here to deprecate the old, non-inclusive
  passthrough_whitelist name and introduce a better one. All usage of
  CONF.pci.passthrough_whitelist is changed over to the new device_spec
  config, and the in-tree documentation is updated accordingly. The nova
  code still has a number of references to the "whitelist" terminology;
  those will be handled in subsequent patches.

  blueprint: pci-device-tracking-in-placement
  Change-Id: I843032e113642416114f169069eebf6a56ed78dd
* Merge "Update libvirt enlightenments for Windows" — Zuul, 2022-08-09 — 1 file, +21/-0

  * Update libvirt enlightenments for Windows — Artom Lifshitz, 2022-08-02 — 1 file, +21/-0

    libvirt has a set of enlightenments in the domain XML that make it
    friendlier for Windows guests. We already enabled a few of these;
    this patch completes the list. All of them are available in libvirt
    4.7.0 (QEMU 3.0) [1], which is well below our current minimum libvirt
    and QEMU versions, so we don't need any extra checks.

    [1] https://libvirt.org/formatdomain.html#hypervisor-features

    Implements: bp/libvirt-update-windows-englightenments
    Change-Id: I008841988547573878c4e06e82f0fa55084e51b5
* Merge "For evacuation, ignore if task_state is not None" — Zuul, 2022-08-04 — 1 file, +11/-0

  * For evacuation, ignore if task_state is not None — Amit Uniyal, 2022-08-03 — 1 file, +11/-0

    Ignore the instance task state and continue with the VM evacuation.

    Closes-Bug: #1978983
    Change-Id: I5540df6c7497956219c06cff6f15b51c2c8bc29d
* Remove the PowerVM driver — Stephen Finucane, 2022-08-02 — 1 file, +6/-0

  The PowerVM driver was deprecated in November 2021 as part of change
  Icdef0a03c3c6f56b08ec9685c6958d6917bc88cb. As noted there, all
  indications suggest that this driver is no longer maintained and may be
  abandonware. It's been some time and there's still no activity here, so
  it's time to abandon this for real.

  This isn't as tied into the codebase as the old XenAPI driver was, so
  removal is mostly a case of deleting large swathes of code. Lovely.

  Change-Id: Ibf4f36136f2c65adad64f75d665c00cf2de4b400
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* Merge "api: Drop generating a keypair and add special chars to naming" — Zuul, 2022-07-28 — 1 file, +10/-0

  * api: Drop generating a keypair and add special chars to naming — Sylvain Bauza, 2022-07-28 — 1 file, +10/-0

    As agreed in the spec, we drop keypair generation support and also
    accept the @ (at) and . (dot) characters in the key name, both in the
    same API microversion. Rebased the work from
    I5de15935e83823afa545a250cf84f6a7a37036b4.

    APIImpact
    Implements: blueprint keypair-generation-removal
    Co-Authored-By: Nicolas Parquet <nicolas.parquet@gandi.net>
    Change-Id: I6a7c71fb4385348c87067543d0454f302907395e
* Merge "Add a workaround to skip hypervisor version check on LM" — Zuul, 2022-07-27 — 1 file, +13/-0

  * Add a workaround to skip hypervisor version check on LM — Kashyap Chamarthy, 2022-07-27 — 1 file, +13/-0

    When turned on, this disables the version-checking of hypervisors
    during live migration. This can be useful for operators in certain
    scenarios when upgrading, e.g. if you want to relocate all instances
    off a compute node due to an emergency hardware issue and you only
    have another old compute node ready at the time.

    Note, though: libvirt will do its own internal compatibility checks,
    and might still reject live migration if the destination is
    incompatible.

    Closes-Bug: #1982853
    Change-Id: Iec387dcbc49ddb91ebf5cfd188224eaf6021c0e1
    Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
* Allow unshelve to a specific host (REST API part) — René Ribaud, 2022-07-22 — 1 file, +10/-0

  This adds support to the REST API, in a new microversion, for
  specifying a destination host in the unshelve server action when the
  server is shelved offloaded. It also supports unpinning the
  availability_zone of an instance that is bound to it.

  Note that the functional test changes are due to those tests using the
  "latest" microversion 2.91.

  Implements: blueprint unshelve-to-host
  Change-Id: I9e95428c208582741e6cd99bd3260d6742fcc6b7
* Merge "Adds link in releasenotes for hw machine type bug" — Zuul, 2022-07-19 — 1 file, +2/-1

  * Adds link in releasenotes for hw machine type bug — Amit Uniyal, 2022-07-12 — 1 file, +2/-1

    Change-Id: Icdc96b1773bfaf224b9adf1a82cc1ebb75af67e3
* Merge "libvirt: remove default cputune shares value" — Zuul, 2022-07-15 — 1 file, +15/-0

  * libvirt: remove default cputune shares value — Artom Lifshitz, 2022-07-14 — 1 file, +15/-0

    Previously, the libvirt driver defaulted to 1024 * (# of CPUs) for
    the value of domain/cputune/shares in the libvirt XML. This value is
    passed directly by libvirt to the cgroups API. Cgroups v2 imposes a
    maximum value of 10000, which made Nova unable to launch instances
    with more than 9 CPUs on hosts that run cgroups v2, like Ubuntu Jammy
    or RHEL 9.

    Fix this by removing the default entirely. Because there is no longer
    a guarantee that domain/cputune will contain at least a shares
    element, we can stop always generating the former and only generate
    it if it will actually contain something.

    We can also make operators' lives easier by leveraging the fact that
    we update the XML during live migration, so this patch adds a method
    to remove the shares value from the live migration XML if one was not
    set via the quota:cpu_shares flavor extra spec. Operators that *have*
    set this extra spec to something greater than 10000 will have to
    update their flavors and resize their instances.

    Partial-bug: 1978489
    Change-Id: I49d757f5f261b3562ada27e6cf57284f615ca395
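The new behaviour reduces to "only emit shares when the flavor asks for them". A minimal sketch, assuming a simplified extra-specs dict and an illustrative return shape (the real driver builds libvirt XML config objects):

```python
def build_cputune_shares(extra_specs):
    # No default shares value is generated any more; a shares element
    # appears only when quota:cpu_shares is explicitly set. This keeps
    # large guests bootable on cgroups v2 hosts (shares capped at 10000).
    shares = extra_specs.get("quota:cpu_shares")
    if shares is None:
        return None  # emit no cputune/shares element at all
    return {"shares": int(shares)}
```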
* Merge "Retry attachment delete API call for 504 Gateway Timeout" — Zuul, 2022-07-08 — 1 file, +7/-0

  * Retry attachment delete API call for 504 Gateway Timeout — Takashi Kajinami, 2022-06-13 — 1 file, +7/-0

    When cinder-api runs behind a load balancer (e.g. haproxy), the load
    balancer can return 504 Gateway Timeout when cinder-api does not
    respond within the timeout. This change makes nova retry deleting a
    volume attachment in that case.

    This change also makes nova ignore 404 in the API call. This is
    required because cinder might continue deleting the attachment even
    though the load balancer returned 504. It also helps in the situation
    where the volume attachment was accidentally removed by users.

    Closes-Bug: #1978444
    Change-Id: I593011d9f4c43cdae7a3d53b556c6e2a2b939989
* Merge "Adds validation for hw machine type in host caps" — Zuul, 2022-07-01 — 1 file, +9/-0

  * Adds validation for hw machine type in host caps — Amit Uniyal, 2022-06-27 — 1 file, +9/-0

    Added a function '_check_machine_type' which accepts host
    capabilities (caps) and a machine type as parameters and looks for
    the machine type in the host caps object; if the machine type is not
    found, it raises the InvalidMachineType exception.

    Closes-Bug: #1933097
    Change-Id: I59d22c0342d6b0f3c0398ce62ec177dae39b5677
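The validation can be sketched like this. For illustration the host capabilities are reduced to a set of machine type strings; the real code walks a libvirt host capabilities object.

```python
class InvalidMachineType(Exception):
    pass


def check_machine_type(host_caps_machine_types, machine_type):
    # host_caps_machine_types stands in for the machine types advertised
    # in the host capabilities; raise if the requested type is unknown.
    if machine_type not in host_caps_machine_types:
        raise InvalidMachineType(
            "machine type %s is not supported by this host" % machine_type)
```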
* Merge "ignore deleted server groups in validation" — Zuul, 2022-06-30 — 1 file, +13/-0

  * ignore deleted server groups in validation — Sean Mooney, 2022-06-21 — 1 file, +13/-0

    This change simply catches the exception raised when we look up a
    server group via a hint and the validation upcall is enabled.

    Change-Id: I858b4da35382a9f4dcf88f4b6db340e1f34eb82d
    Closes-Bug: #1890244
* Merge "Change TooOldComputeService upgrade check code to failure" — Zuul, 2022-06-23 — 1 file, +6/-0

  * Change TooOldComputeService upgrade check code to failure — Pierre Riteau, 2022-06-09 — 1 file, +6/-0

    The TooOldComputeService upgrade check currently produces a warning,
    which may be missed if the upgrade process only checks the exit code
    of the upgrade check command. Since this condition can lead to Nova
    control services failing to start, make the upgrade check a failure
    instead, so it results in a non-zero exit code.

    Closes-Bug: #1956983
    Change-Id: Ia3ce6a0b0b810667ac0a66502a43038fe43c5aed
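The change amounts to returning a failure code instead of a warning code from the check. A minimal sketch, assuming nova-status-style result codes and illustrative integer service versions (function and parameter names are hypothetical):

```python
# nova-status upgrade check result codes: 0 success, 1 warning, 2 failure.
SUCCESS, WARNING, FAILURE = 0, 1, 2


def check_too_old_compute(oldest_service_version, minimum_required):
    # Returning FAILURE rather than WARNING means scripted upgrade
    # pipelines that gate on the exit code will reliably stop before
    # control services fail to start.
    if oldest_service_version < minimum_required:
        return FAILURE
    return SUCCESS
```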