| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 'force' parameter of os-brick's disconnect_volume() method allows
callers to ignore flushing errors and ensure that devices are being
removed from the host.
We should use force=True when we are going to delete an instance to
avoid leaving leftover devices connected to the compute host which
could then potentially be reused to map to volumes to an instance that
should not have access to those volumes.
We can use force=True even when disconnecting a volume that will not be
deleted on termination because os-brick will always attempt to flush
and disconnect gracefully before forcefully removing devices.
Closes-Bug: #2004555
Change-Id: I3629b84d3255a8fe9d8a7cea8c6131d7c40899e8
(cherry picked from commit db455548a12beac1153ce04eca5e728d7b773901)
(cherry picked from commit efb01985db88d6333897018174649b425feaa1b4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of now, when attempting to rescue a volume-based instance
using an image without the hw_rescue_device and/or hw_rescue_bus
properties set, the rescue api call fails (as non-stable rescue
for volume-based instances are not supported) leaving the instance
in error state.
This change checks for hw_rescue_device/hw_rescue_bus image
properties before attempting to rescue and if the property
is not set, then fail with proper error message, without changing
instance state.
Related-Bug: #1978958
Closes-Bug: #1926601
Change-Id: Id4c8c5f3b32985ac7d3d7c833b82e0876f7367c1
(cherry picked from commit 6eed55bf55469f4ceaa7d4d4eb1be635e14bc73b)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when you delete an ironic instance, we trigger
and undeploy in ironic and we release our allocation in placement.
We do this well before the ironic node is actually available.
We have attempted to fix this my marking unavailable nodes
as reserved in placement. This works great until you try
and re-image lots of nodes.
It turns out, ironic nodes that are waiting for their automatic
clean to finish, are returned as a valid allocation candidates
for quite some time. Eventually we mark then as reserved.
This patch takes a strange approach, if we mark all nodes as
reserved as soon as the instance lands, we close the race.
That is, when the allocation is removed the node is still
unavailable until the next update of placement is done and
notices that the node has become available. That may or may
not have been after automatic cleaning. The trade off is
that when you don't have automatic cleaning, we wait a bit
longer to notice the node is available again.
Note, this is also useful when a broken Ironic node is
marked as in-maintainance while it is in-use by a nova
instance. In a similar way, we mark the Nova as reserved
immmeidately, rather than first waiting for the instance to be
deleted before reserving the resources in Placement.
Closes-Bug: #1974070
Change-Id: Iab92124b5776a799c7f90d07281d28fcf191c8fe
(cherry picked from commit 3c022e968375c1b2eadf3c2dd7190b9434c6d4c1)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike uwsgi, apache mod_wsgi does not support passing
commandline arguments to the python wsgi script it invokes.
As a result while you can pass --config-file when hosting the
api and metadata wsgi applications with uwsgi there is no
way to use multiple config files with mod_wsgi.
This change mirrors how this is supported in keystone today
by intoducing a new OS_NOVA_CONFIG_FILES env var to allow
operators to optional pass a ';' delimited list of config
files to load.
This change also add docs for this env var and the existing
undocumented OS_NOVA_CONFIG_DIR.
Closes-Bug: 1994056
Change-Id: I8e3ccd75cbb7f2e132b403cb38022787c2c0a37b
(cherry picked from commit 73fe84fa0ea6f7c7fa55544f6bce5326d87743a6)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This handles the case where the live migration monitoring thread may
race and call jobStats() after the migration has completed resulting in
the following error:
libvirt.libvirtError: internal error: migration was active, but no
RAM info was set
Closes-Bug: #1982284
Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267
(cherry picked from commit 9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a)
|
|
|
|
|
|
| |
We need it before RC1.
Change-Id: Ib674ca6a13f7c5d0254b222effa20d1948a80fe5
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the vnic_type of a bound port changes from "direct" to "macvtap" and
then the compute service is restarted then during _init_instance nova
tries to plug the vif of the changed port. However as it now has macvtap
vnic_type nova tries to look up the netdev of the parent VF. Still that
VF is consumed by the instance so there is no such netdev on the host
OS. This error killed the compute service at startup due to unhandled
exception. This patch adds the exception handler, logs an ERROR and
continue initializing other instances on the host.
Also this patch adds a detailed ERROR log when nova detects that the
vnic_type changed during _heal_instance_info_cache periodic.
Closes-Bug: #1981813
Change-Id: I1719f8eda04e8d15a3b01f0612977164c4e55e85
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes the doc comments for the already merged (or being merged)
patches in the series.
blueprint: pci-device-tracking-in-placement
Change-Id: Ia99138d603722a66c9a6ac61b035384d86ccca75
|
|\ \ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Implementation for BP/libvirt-viommu-device.
With provide `hw:viommu_model` property to extra_specs or
`hw_viommu_model` to image property. will enable viommu to libvirt
guest.
[1] https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/
[2] https://review.opendev.org/c/openstack/nova-specs/+/840310
Implements: blueprint libvirt-viommu-device
Change-Id: Ief9c550292788160433a28a7a1c36ba38a6bc849
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
|
|\ \ \
| | |/
| |/| |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch introduces the [pci]report_in_placement config option that is
False by default but if set to True will enable reporting of the PCI
passthrough inventories to Placement.
blueprint: pci-device-tracking-in-placement
Change-Id: I49a3dbf4c5708d2d92dedd29a9dc3ef25b6cd66c
|
|\ \ \ |
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This adds a microversion and API support for triggering a rebuild
of volume-backed instances by leveraging cinder functionality to
do so.
Implements: blueprint volume-backed-server-rebuild
Closes-Bug: #1482040
Co-Authored-By: Rajat Dhasmana <rajatdhasmana@gmail.com>
Change-Id: I211ad6b8aa7856eb94bfd40e4fdb7376a7f5c358
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We have droped the system scope from Nova policy
and keeping the legacy admin behaviour same. This
commit adds the releasenotes and update the policy
configuration documentation accordingly.
Also, remove the upgrade check for policy which was
added for the system scope configuration protection.
Change-Id: I127cc4da689a82dbde07059de90c451eb09ea4cf
|
|\ \
| |/
|/| |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This change adds a new hw:locked_memory extra spec and hw_locked_memory
image property to contol preventing guest memory from swapping.
This change adds docs and extend the flavor
validators for the new extra spec.
Also add new image property.
Blueprint: libvirt-viommu-device
Change-Id: Id3779594f0078a5045031aded2ed68ee4301abbd
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The PciPassthroughFilter logic checks each InstancePCIRequest
individually against the available PCI pools of a given host and given
boot request. So it is possible that the scheduler accepts a host that
has a single PCI device available even if two devices are requested for
a single instance via two separate PCI aliases. Then the PCI claim on
the compute detects this but does not stop the boot just logs an ERROR.
This results in the instance booted without any PCI device.
This patch does two things:
1) changes the PCI claim to fail with an exception and trigger a
re-schedule instead of just logging an ERROR.
2) change the PciDeviceStats.support_requests that is called during
scheduling to not just filter pools for individual requests but also
consume the request from the pool within the scope of a single boot
request.
The fix in #2) would not be enough alone as two parallel scheduling
request could race for a single device on the same host. #1) is the
ultimate place where we consume devices under a compute global lock so
we need the fix there too.
Closes-Bug: #1986838
Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This change append vnic-type vdpa to the list
of passthough vnic types and removes the api blocks
This should enable the existing suspend and live migrate
code to properly manage vdpa interfaces enabling
"hot plug" live migrations similar to direct sr-iov.
Implements: blueprint vdpa-suspend-detach-and-live-migrate
Change-Id: I878a9609ce0d84f7e3c2fef99e369b34d627a0df
|
|\ \
| |/
|/| |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This change adds functional test for operations on servers with VDPA
devices that are expected to work but currently blocked due to lack
of testing or qemu bugs.
cold-migrate, resize, evacuate,and shelve are enabled
and tested by this patch
Closes-Bug: #1970467
Change-Id: I6e220cf3231670d156632e075fcf7701df744773
|
|/
|
|
|
|
| |
Related-Bug: #1941005
Related-Bug: #1983753
Change-Id: I16ed1143ead3779c87698aa29bac005678db2993
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The numa_fit_instance_to_host algorithm tries all the possible
host cell permutations to fit the instance cells. So in worst case
scenario it does n! / (n-k)! _numa_fit_instance_cell calls
(n=len(host_cells) k=len(instance_cells)) to find if the instance can be
fit to the host. With 16 NUMA nodes host and 8 NUMA node guests this
means 500 million calls to _numa_fit_instance_cell. This takes excessive
time.
However going through these permutations there are many repetitive
host_cell, instance_cell pairs to try to fit.
E.g.
host_cells=[H1, H2, H2]
instance_cells=[G1, G2]
Produces pairings:
* H1 <- G1 and H2 <- G2
* H1 <- G1 and H3 <- G2
...
Here G1 is checked to fit H1 twice. But if it does not fit in the first
time then we know that it will not fit in the second time either. So we
can cache the result of the first check and use that cache for the later
permutations.
This patch adds two caches to the algo. A fit_cache to hold
host_cell.id, instance_cell.id pairs that we know fit, and a
no_fit_cache for those pairs that we already know that doesn't fit.
This change significantly boost the performance of the algorithm. The
reproduction provided in the bug 1978372 took 6 minutes on my local
machine to run without the optimization. With the optimization it run in
3 seconds.
This change increase the memory usage of the algorithm with the two
caches. Those caches are sets of integer two tuples. And the total size
of the cache is the total number of possible host_cell, instance_cell
pairs which is len(host_cell) * len(instance_cells). So form the above
example (16 host, 8 instance NUMA) it is 128 pairs of integers in the
cache. That will not cause a significant memory increase.
Closes-Bug: #1978372
Change-Id: Ibcf27d741429a239d13f0404348c61e2668b4ce4
|
|\ \ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This change updated the default of
[compute]/packing_host_numa_cells_allocation_strategy
to False making nova spread vms across numa nodes by defualt.
This should significantly improve scheduling performace
when there are a large number of host and guest numa node
and non empty hosts. see bug 1978372 for details.
Related-Bug: #1978372
Change-Id: I6fcd2c6b58dd36674be57eee70894ce04335955a
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A later patch in the pci-device-tracking-in-placement work
will extend the existing [pci]passthrough_whitelist config syntax.
So we take the opportunity here to deprecate the old non inclusive
passthrough_whitelist name and introduce a better one.
All the usage of CONF.pci.passthrough_whitelist is now changed over to
the new device_spec config. Also the in tree documentation is updated
accordinly.
However the nova code still has a bunch of references to the
"whitelist" terminology. That will be handled in subsequent patches.
blueprint: pci-device-tracking-in-placement
Change-Id: I843032e113642416114f169069eebf6a56ed78dd
|
|\ \ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
libvirt has a set of enlightenments in domain XML that make it
friendlier for Windows guests. We already enabled a few of these, this
patch just completes the list. All of them are available in libvirt
4.7.0 (QEMU 3.0) [1], which is way below our current minimum libvirt
and QEMU versions, so we don't need any extra checks.
[1] https://libvirt.org/formatdomain.html#hypervisor-features
Implements: bp/libvirt-update-windows-englightenments
Change-Id: I008841988547573878c4e06e82f0fa55084e51b5
|
|\ \ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
ignore instance task state and continue with vm evacutaion
Closes-Bug: #1978983
Change-Id: I5540df6c7497956219c06cff6f15b51c2c8bc29d
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The PowerVM driver was deprecated in November 2021 as part of change
Icdef0a03c3c6f56b08ec9685c6958d6917bc88cb. As noted there, all
indications suggest that this driver is no longer maintained and may be
abandonware. It's been some time and there's still no activity here so
it's time to abandon this for real.
This isn't as tied into the codebase as the old XenAPI driver was, so
removal is mostly a case of deleting large swathes of code. Lovely.
Change-Id: Ibf4f36136f2c65adad64f75d665c00cf2de4b400
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
|
|\ \ \ \ |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As agreed in the spec, we will both drop the generation support for a keypair
but we'll also accept @ (at) and . (dot) chars in the keyname, all of them in
the same API microversion.
Rebased the work from I5de15935e83823afa545a250cf84f6a7a37036b4
APIImpact
Implements: blueprint keypair-generation-removal
Co-Authored-By: Nicolas Parquet <nicolas.parquet@gandi.net>
Change-Id: I6a7c71fb4385348c87067543d0454f302907395e
|
|\ \ \ \ \
| |/ / / /
|/| | | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When turned on, this will disable the version-checking of hypervisors
during live-migration. This can be useful for operators in certain
scenarios when upgrading. E.g. if you want to relocate all instances
off a compute node due to an emergency hardware issue, and you only have
another old compute node ready at the time.
Note, though: libvirt will do its own internal compatibility checks, and
might still reject live migration if the destination is incompatible.
Closes-Bug: #1982853
Change-Id: Iec387dcbc49ddb91ebf5cfd188224eaf6021c0e1
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
|
| |/ / /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This adds support to the REST API, in a new microversion, for specifying
a destination host to unshelve server action when the server
is shelved offloaded.
This patch also supports the ability to unpin the availability_zone of an
instance that is bound to it.
Note that the functional test changes are due to those tests using the
"latest" microversion 2.91.
Implements: blueprint unshelve-to-host
Change-Id: I9e95428c208582741e6cd99bd3260d6742fcc6b7
|
|\ \ \ \ |
|
| |/ / /
| | | |
| | | |
| | | | |
Change-Id: Icdc96b1773bfaf224b9adf1a82cc1ebb75af67e3
|
|\ \ \ \
| |/ / /
|/| | | |
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Previously, the libvirt driver defaulted to 1024 * (# of CPUs) for the
value of domain/cputune/shares in the libvirt XML. This value is then
passed directly by libvirt to the cgroups API. Cgroups v2 imposes a
maximum value of 10000 that can be passed in. This makes Nova
unable to launch instances with more than 9 CPUs on hosts that run
cgroups v2, like Ubuntu Jammy or RHEL 9.
Fix this by just removing the default entirely. Because there is no
longer a guarantee that domain/cputune will contain at least a shares
element, we can stop always generating the former, and only generate
it if it will actually contain something.
We can also make operators's lives easier by leveraging the fact that
we update the XML during live migration, so this patch also adds a
method to remove the shares value from the live migration XML if one
was not set as the quota:cpu_shares flavor extra spec.
For operators that *have* set this extra spec to something greater
than 10000, their flavors will have to get updates, and their
instances resized.
Partial-bug: 1978489
Change-Id: I49d757f5f261b3562ada27e6cf57284f615ca395
|
|\ \ \ |
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When cinder-api runs behind a load balancer(eg haproxy), the load
balancer can return 504 Gateway Timeout when cinder-api does not
respond within timeout. This change ensures nova retries deleting
a volume attachment in that case.
Also this change makes nova ignore 404 in the API call. This is
required because cinder might continue deleting the attachment even if
the load balancer returns 504. This also helps us in the situation
where the volume attachment was accidentally removed by users.
Closes-Bug: #1978444
Change-Id: I593011d9f4c43cdae7a3d53b556c6e2a2b939989
|
|\ \ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Added function '_check_machine_type' which accept host
capabilities (caps) and machine type as param and look
for machine type in host caps object, if machine type
is not found raises exception InvalidMachineType
Closes-Bug: #1933097
Change-Id: I59d22c0342d6b0f3c0398ce62ec177dae39b5677
|
|\ \ \ \
| |_|_|/
|/| | | |
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This change simply catches the exception raised when
we lookup a servergroup via a hint and the validation
upcall is enabled.
Change-Id: I858b4da35382a9f4dcf88f4b6db340e1f34eb82d
Closes-Bug: #1890244
|
|\ \ \ |
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The TooOldComputeService upgrade check currently produces a warning,
which may be missed if the upgrade process only checks the exit code of
the upgrade check command.
Since this can lead to Nova control services failing to start, make the
upgrade check a failure instead, so it results in a non-zero exit code.
Closes-Bug: #1956983
Change-Id: Ia3ce6a0b0b810667ac0a66502a43038fe43c5aed
|