| Commit message | Author | Age | Files | Lines |
At startup of the nova-compute service, the PCI stat pools are
populated based on information in the pci_devices table in the Nova
database. The pools are updated only when a new device is added
or removed, but not on device changes such as a change of device
type. If an existing device is reconfigured as SRIOV and
nova-compute is restarted, the pci_devices table gets updated but
the device is still listed under the old pool in
pci_tracker.stats.pool (an in-memory object).
This patch looks for device type updates in existing devices
and updates the pools accordingly.
Conflicts:
nova/tests/functional/libvirt/test_pci_sriov_servers.py
nova/tests/unit/virt/libvirt/fakelibvirt.py
nova/tests/functional/libvirt/base.py
To avoid the conflicts and make the new functional test run, the
following changes were made:
- Modified the test case to use the flavor extra spec
  pci_passthrough:alias to create a server with an SRIOV port
  instead of creating an SRIOV port and passing the port
  information during server creation.
- Removed the changes in nova/tests/functional/libvirt/base.py as
  they are required only if a neutron SRIOV port is created in the
  test case.
Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
Closes-Bug: #1892361
(cherry picked from commit b8695de6da56db42b83b9d9d4c330148766644be)
(cherry picked from commit d8b8a8193b6b8228f6e7d6bde68b5ea6bb53dd8b)
(cherry picked from commit f58399cf496566e39d11f82a61e0b47900f2eafa)
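The fix described above amounts to reconciling pool membership against the current device type on restart. A minimal sketch of the idea (hypothetical helper and data shapes, not Nova's actual PciDeviceStats code):

```python
def refresh_pools(pools, devices):
    """Move devices whose dev_type changed into the right pool.

    pools: dict mapping dev_type -> set of PCI addresses (in-memory pools)
    devices: iterable of (address, dev_type) rows from the pci_devices table
    """
    for address, dev_type in devices:
        # Snapshot the items so we can add a new pool key while iterating.
        for pool_type, members in list(pools.items()):
            if address in members and pool_type != dev_type:
                # Device was reconfigured (e.g. to SR-IOV type-PF/type-VF):
                # drop it from the stale pool...
                members.discard(address)
                # ...and account for it under its current type.
                pools.setdefault(dev_type, set()).add(address)
    return pools
```

With this, a device that moved from 'type-PCI' to 'type-PF' in the database is no longer double-counted under its old pool after a service restart.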
The 1.6.3 [1] release has dropped support for py2 [2], so cap to
1.6.2 when using py2.
This change also raises hacking to 1.1.0 in lower-constraints.txt after
it was bumped by I35c654bd39f343417e0a1124263ff31dcd0b05c9. This also
means that flake8 is bumped to 2.6.0. stestr is also bumped to 2.0.0 as
required by oslotest 3.8.0.
All of these changes are squashed into a single change to pass the gate.
[1] https://github.com/PyCQA/bandit/releases/tag/1.6.3
[2] https://github.com/PyCQA/bandit/pull/615
Depends-On: https://review.opendev.org/c/openstack/devstack/+/768256
Depends-On: https://review.opendev.org/c/openstack/swift/+/766214
Closes-Bug: #1907438
Closes-Bug: #1907756
Change-Id: Ie5221bf37c6ed9268a4aa0737ffcdd811e39360a
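A cap like this is normally expressed with environment markers in the requirements files. A sketch of the shape such an entry could take (the exact file, bounds and marker syntax are assumptions, not copied from the change):

```text
# test-requirements.txt / lower-constraints.txt sketch:
bandit>=1.1.0,<1.6.3;python_version<'3.0'   # 1.6.3 dropped py2 support
bandit>=1.1.0;python_version>='3.0'
hacking>=1.1.0
```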
Previously, the default value of num_retries for glance was 0,
meaning the request to glance was sent only once. The neutron
and cinder clients, on the other hand, set the default value
to 3.
To align the retry default with these other components, change
the default value to 3.
Closes-Bug: #1888168
Change-Id: Ibbd4bd26408328b9e1a1128b3794721405631193
(cherry picked from commit 662af9fab6eacb46bcaee38d076d33c2c0f82b9b)
(cherry picked from commit 1f9dd694b937cc55a81a64fdce442829f009afb3)
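Operators who previously set this explicitly can now rely on the default. The resulting nova.conf behaviour, sketched (option name per the commit; the comment wording is mine):

```ini
[glance]
# Number of retries for a failed glance request; now defaults to 3,
# matching the neutron and cinder client defaults. 0 meant "fail on
# the first error".
num_retries = 3
```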
According to the api-ref, the id passed to calls in os-aggregates is
supposed to be an integer. No function validated this, so any value
passed to these functions would directly reach the DB. While this is
fine for SQLite, making a query with a string for an integer column on
other databases like PostgreSQL results in a DBError exception and thus
an HTTP 500 instead of a 400 or 404.
This commit adds validation for the id parameter the same way it's
already done for other endpoints.
Conflicts:
nova/api/openstack/compute/aggregates.py
Changes:
nova/tests/unit/api/openstack/compute/test_aggregates.py
NOTE(stephenfin): Conflicts are due to absence of change
I4ab96095106b38737ed355fcad07e758f8b5a9b0 ("Add image caching API for
aggregates") which we don't want to backport. A test related to this
feature must also be removed.
Change-Id: I83817f7301680801beaee375825f02eda526eda1
Closes-Bug: 1865040
(cherry picked from commit 2e70a1717f25652912886cbefa3f40e6df908c00)
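The essence of the validation is to coerce the path parameter to an integer before it reaches the DB layer. A minimal sketch under assumed names (Nova's real implementation uses its API validation machinery, not this helper):

```python
def validate_aggregate_id(agg_id):
    """Reject non-integer aggregate ids before they reach the DB layer."""
    try:
        return int(agg_id)
    except (TypeError, ValueError):
        # Raising a client-side error here maps to HTTP 400/404, avoiding
        # the DBError -> HTTP 500 path seen on PostgreSQL when a string
        # value hits an integer column.
        raise ValueError("Invalid aggregate id: %r" % (agg_id,))
```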
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Devices that report SR-IOV capabilities cannot be used without special
configuration - namely, the addition of "'device_type': 'type-PF'" or
"'device_type': 'type-VF'" to the '[pci] alias' configuration option.
Spell this out in the docs.
Change-Id: I4abbe30505a5e4ccba16027addd6d5f45066e31b
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Closes-Bug: #1852727
(cherry picked from commit 810aafc5ec9a7d25b33cf6c137c47b117c91269a)
| | |
It doesn't really make sense to describe the "higher level"
configuration steps necessary for PCI passthrough before describing
things like BIOS configuration. Simply switch the ordering.
Change-Id: I4ea1d9a332d6585ce2c0d5a531fa3c4ad9c89482
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Related-Bug: #1852727
(cherry picked from commit 557728abaf0c822f2b1a5cdd4fb2e11e19d8ead7)
Rewrite the document, making the following changes:
- Remove use of bullet points in favour of more descriptive steps
- Cross-reference various configuration options
- Emphasise that ``[pci] alias`` must be set on both controller and
compute node
- Style nits, such as fixing the header style
Change-Id: I2ac7df7d235f0af25f5a99bc8f6abddbae2cb3af
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Related-Bug: #1852727
(cherry picked from commit d5259abfe163058b13ad943ad16a5c281c2080e7)
This change adds a max_queues config option to allow
operators to set the maximum number of virtio queue
pairs that can be allocated to a virtio network
interface.
Change-Id: I9abe783a9a9443c799e7c74a57cc30835f679a01
Closes-Bug: #1847367
(cherry picked from commit 0e6aac3c2d97c999451da50537df6a0cbddeb4a6)
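The resulting operator-facing knob, sketched as a nova.conf fragment (the option name comes from the commit; the value 8 is an arbitrary example, and the comment is my paraphrase):

```ini
[libvirt]
# Cap the number of virtio queue pairs allocated to a virtio network
# interface; without a cap the driver may size queues from the guest
# vCPU count.
max_queues = 8
```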
This is a follow up to change
I8e4e5afc773d53dee9c1c24951bb07a45ddc2f1a which fixed an issue with
validation when the topmost patch after a Zuul rebase is a merge
patch.
We need to also use the $commit_hash variable for the check for
stable-only patches, else it will incorrectly fail because it is
checking the merge patch's commit message.
Change-Id: Ia725346b65dd5e2f16aa049c74b45d99e22b3524
(cherry picked from commit 1e10461c71cb78226824988b8c903448ba7a8a76)
(cherry picked from commit f1e4f6b078baf72e83cd7341c380aa0fc511519e)
(cherry picked from commit e676a480544b3fa71fcaa984a658e2131b7538c5)
|
|\ \ \
| |/ /
|/| | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Previously disk_bus values were never validated and could easily end up
being ignored by the underlying virt driver and hypervisor.
For example, a common mistake made by users is to request a virtio-scsi
disk_bus when using the libvirt virt driver. This however isn't a valid
bus and is ignored, defaulting back to the virtio (virtio-blk) bus.
This change adds a simple validation in the compute API using the
potential disk_bus values provided by the DiskBus field class as used
when validating the hw_*_bus image properties.
Conflicts:
nova/tests/unit/compute/test_compute_api.py
NOTE(lyarwood): Conflict as If9c459a9a0aa752c478949e4240286cbdb146494 is
not present in stable/train. test_validate_bdm_disk_bus is also updated
as Ib31ba2cbff0ebb22503172d8801b6e0c3d2aa68a is not present in
stable/train.
Closes-Bug: #1876301
Change-Id: I77b28b9cc8f99b159f628f4655d85ff305a71db8
(cherry picked from commit 5913bd889f9d3dfc8d154415e666c821054c229d)
(cherry picked from commit fb31ae430a2e4f8869e77e31ea0d6a9478f6aa61)
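The validation boils down to checking the requested bus against the known-valid set. A sketch with hypothetical names (Nova's actual check uses the DiskBus field's valid values, and the bus list below is an illustrative subset):

```python
# Illustrative subset of valid bus names; nova's DiskBus field class is
# the real source of truth.
VALID_DISK_BUSES = ('ide', 'sata', 'scsi', 'usb', 'virtio')

def validate_disk_bus(disk_bus):
    """Reject invalid disk_bus values instead of silently ignoring them."""
    if disk_bus not in VALID_DISK_BUSES:
        # 'virtio-scsi' is the classic mistake: the bus is 'scsi',
        # combined with the hw_scsi_model=virtio-scsi image property.
        raise ValueError("Invalid disk_bus: %s" % disk_bus)
    return disk_bus
```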
|
|\ \ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When creating a live snapshot of an instance, nova creates a
copy of the instance disk using a QEMU shallow rebase. This
copy - the delta file - is then extracted and uploaded. The
delta file will eventually be deleted, when the temporary
working directory nova is using for the live snapshot is
discarded, however, until this happens, we will use 3x the
size of the image of host disk space: the original disk,
the delta file, and the extracted file. This can be problematic
when concurrent snapshots of multiple instances are requested
at once.
The solution is simple: delete the delta file after it has
been extracted and is no longer necessary.
Change-Id: I15e9975fa516d81e7d34206e5a4069db5431caa9
Closes-Bug: #1881727
(cherry picked from commit d2af7ca7a5c862f53f18c00ac76fc85336fa79e6)
(cherry picked from commit e51555b3f0324b8b72a2b3280a1c30e104b6d8ea)
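The fix is a one-line ordering change: unlink the delta as soon as extraction finishes. A self-contained sketch (the extraction step here is a trivial stand-in for nova's qemu-img based extract, and the function names are mine):

```python
import os

def snapshot_extract(delta_path, target_path):
    # Stand-in for the real extraction (qemu-img convert in nova).
    with open(delta_path, 'rb') as src, open(target_path, 'wb') as dst:
        dst.write(src.read())

def live_snapshot(delta_path, target_path):
    snapshot_extract(delta_path, target_path)
    # The delta is no longer needed once extracted; deleting it here
    # drops peak disk usage from 3x to 2x the image size, instead of
    # waiting for the temporary working directory to be discarded.
    os.unlink(delta_path)
```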
|
|\ \ \ \ |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When vif_type="tap" (such as when using calico),
attempting to create an instance using an image that has
the property hw_vif_multiqueue_enabled=True fails, because
the interface is always being created without multiqueue
flags.
This change checks if the property is defined and passes
the multiqueue parameter to create the tap interface
accordingly.
In case the multiqueue parameter is passed but the
vif_model is not virtio (or unspecified), the old
behavior is maintained.
Change-Id: I0307c43dcd0cace1620d2ac75925651d4ee2e96c
Closes-bug: #1893263
(cherry picked from commit 84cfc8e9ab1396ec17abcfc9646c7d40f1d966ae)
(cherry picked from commit a69845f3732843ee1451b2e4ebf547d9801e898d)
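The decision logic described above is small enough to sketch directly (hypothetical helper; the real change plumbs a multiqueue flag into the tap-device creation call):

```python
def tap_wants_multiqueue(vif_model, image_meta_props):
    """Decide whether to create the tap device with multiqueue flags."""
    multiqueue = image_meta_props.get('hw_vif_multiqueue_enabled', False)
    # Multiqueue only applies to virtio; any other (or unspecified)
    # vif_model keeps the old single-queue behaviour.
    return bool(multiqueue) and vif_model == 'virtio'
```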
|
|\ \ \ \ \ |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
As noted inline, the 'policies' field may be a list but it expects one
of two items.
Change-Id: I34c68df1e6330dab1524aa0abec733610211a407
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1894966
(cherry picked from commit 32c43fc8017ee89d4e6cdf79086d87735a00f0c0)
(cherry picked from commit 781210bd598c3e0ee9bd6a7db5d25688b5fc0131)
|
|\ \ \ \ \ \
| |/ / / / /
| | / / / /
| |/ / / /
|/| | | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
You must specify the 'policies' field. Currently, not doing so will
result in an HTTP 500 error code. This should be a 4xx error. Add a test
to demonstrate the bug before we provide a fix.
Changes:
nova/tests/functional/regressions/test_bug_1894966.py
NOTE(stephenfin): Need to update 'super' call to Python 2-compatible
variant.
Change-Id: I72e85855f621d3a51cd58d14247abd302dcd958b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Related-Bug: #1894966
(cherry picked from commit 2c66962c7a40d8ef4fab54324e06edcdec1bd716)
(cherry picked from commit 94d24e3e8d04488abdebd4969daf98b780125297)
|
|\ \ \ \ \ |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In vSphere 7.0, VirtualDevice.key values can no longer be the same,
so set a different value for each VirtualDevice.key.
Change-Id: I574ed88729d2f0760ea4065cc0e542eea8d20cc2
Closes-Bug: #1892961
(cherry picked from commit a5d153a4c64f6947531823c0df91be5cbc491977)
(cherry picked from commit 0ea5bcca9d7bebf835b173c5e75dc89e666bcb99)
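One simple way to guarantee distinct keys is a counter handing out a fresh placeholder per device, instead of a shared constant. A sketch under assumptions (in the vSphere API, negative keys on newly added devices are temporary placeholders the server reassigns; the helper name is mine):

```python
import itertools

# vSphere 7.0 rejects duplicate VirtualDevice.key values within one
# reconfigure request, so hand out a distinct negative placeholder key
# per new device rather than reusing a single constant.
_key_counter = itertools.count(start=-101, step=-1)

def next_device_key():
    return next(_key_counter)
```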
mnaser reported a weird case where an instance was found
in both cell0 (deleted there) and in cell1 (not deleted
there, but in an error state from a failed build). It's unclear
how this could happen, short of some weird clustered rabbitmq
issue where the schedule and build request to conductor
happens twice for the same instance: one picks a host and
tries to build, while the other fails during scheduling and is
buried in cell0.
To avoid a split brain situation like this, we add a sanity
check in _bury_in_cell0 to make sure the instance mapping is
not pointing at a cell when we go to update it to cell0.
Similarly a check is added in the schedule_and_build_instances
flow (the code is moved to a private method to make it easier
to test).
Worst case, this is unnecessary but doesn't hurt anything;
best case, this helps avoid split-brain clustered rabbit
issues.
Closes-Bug: #1775934
Change-Id: I335113f0ec59516cb337d34b6fc9078ea202130f
(cherry picked from commit 5b552518e1abdc63fb33c633661e30e4b2fe775e)
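The sanity check reduces to: never repoint a mapping at cell0 if it already points at a real cell. A minimal sketch (dict-based stand-ins for the InstanceMapping object; names are mine, not Nova's):

```python
def bury_in_cell0(instance_mapping, cell0):
    """Map an instance to cell0, refusing if it is already mapped."""
    if instance_mapping.get('cell_mapping') is not None:
        # Another worker already mapped this instance to a real cell;
        # burying it in cell0 now would create the split-brain state
        # described above.
        raise RuntimeError('instance already mapped to a cell; '
                           'refusing to bury in cell0')
    instance_mapping['cell_mapping'] = cell0
    return instance_mapping
```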
|
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | | |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Previously, we were setting the environment variable to disable
greendns in eventlet *after* importing eventlet. This has no effect, as
eventlet processes environment variables at import time. This patch
moves the setting of EVENTLET_NO_GREENDNS before importing eventlet in
order to correctly disable greendns.
Closes-bug: 1895322
Change-Id: I4deed815c8984df095019a7f61d089f233f1fc66
(cherry picked from commit 7c1d964faab33a02fe2366b5194611252be045fc)
(cherry picked from commit 79e6b7fd30a04cdb2374abcaf496b6b5b76084ff)
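The ordering constraint is the whole fix, so it is worth seeing in code. A sketch of the corrected pattern (the guard around the import is mine, so the snippet runs even where eventlet isn't installed):

```python
import os

# The variable must be in the environment *before* eventlet is imported,
# because eventlet reads it at import time; setting it afterwards is a
# no-op.
os.environ['EVENTLET_NO_GREENDNS'] = 'yes'

try:
    import eventlet  # noqa: E402  (deliberately after the env var is set)
except ImportError:
    eventlet = None  # keeps this sketch runnable without eventlet
```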
Because of the libvirt issue[1], there is a bug[2]: if we set a cache
mode whose write semantics are not O_DIRECT (i.e. unsafe, writeback or
writethrough), there will be a problem with the volume drivers that
designate native io explicitly
(e.g. nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver,
nova.virt.libvirt.volume.LibvirtNFSVolumeDriver and so on).
That problem will generate a libvirt xml for the instance,
whose content contains
```
...
<disk ... >
<driver ... cache='unsafe/writeback/writethrough' io='native' />
</disk>
...
```
In turn, it will fail to start the instance or attach the disk.
> When qemu is configured with a block device that has aio=native set, but
> the cache mode doesn't use O_DIRECT (i.e. isn't cache=none/directsync or any
> unnamed mode with explicit cache.direct=on), then the raw-posix block driver
> for local files and block devices will silently fall back to aio=threads.
> The blockdev-add interface rejects such combinations, but qemu can't
> change the existing legacy interfaces that libvirt uses today.
[1]: https://github.com/libvirt/libvirt/commit/058384003db776c580d0e5a3016a6384e8eb7b92
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1086704
Closes-Bug: #1841363
Change-Id: If9acc054100a6733f3659a15dd9fc2d462e84d64
(cherry picked from commit af2405e1181d70cdf60bcd0e40b3e80f2db2e3a6)
(cherry picked from commit 0bd58921a1fcaffcc4fac25f63434c9cab93b061)
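The rule the quoted qemu text describes can be captured in a few lines. A sketch (hypothetical helper; the real fix adjusts how the libvirt driver chooses the io= attribute for the generated disk XML):

```python
def pick_driver_io(cache_mode):
    """Choose the libvirt <driver io=.../> value for a given cache mode.

    qemu's aio=native requires O_DIRECT write semantics; only the 'none'
    and 'directsync' cache modes provide them, so every other mode must
    fall back to threaded io rather than emitting io='native'.
    """
    return 'native' if cache_mode in ('none', 'directsync') else 'threads'
```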
|
|\ \ \ \ \ |
|
| |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In bug 1879787, the call to network_api.get_instance_nw_info() in
_post_live_migration() on the source compute manager eventually calls
out to the Neutron REST API. If this fails, the exception is
unhandled, and the migrating instance - which is fully running on the
destination at this point - will never be updated in the database.
This update normally happens later in
post_live_migration_at_destination().
The network_info variable obtained from get_instance_nw_info() is used
for two things: notifications - which aren't critical - and unplugging
the instance's vifs on the source - which is very important!
It turns out that at the time of the get_instance_nw_info() call, the
network info in the instance info cache is still valid for unplugging
the source vifs. The port bindings on the destination are only
activated by the network_api.migrate_instance_start() [1] call that
happens shortly *after* the problematic get_instance_nw_info() call.
In other words, get_instance_nw_info() will always return the source
ports. Because of that, we can replace it with a call to
instance.get_network_info().
NOTE(artom) The functional test has been excised, as in stable/train
the NeutronFixture does not properly support live migration with
ports, making the test worthless. The work to support this was done as
part of bp/support-move-ops-with-qos-ports-ussuri, and starts at
commit b2734b5a9ae8b869fc9e8e229826343da3b47fcb.
NOTE(artom) The
test_post_live_migration_no_shared_storage_working_correctly and
test_post_live_migration_cinder_v3_api unit tests had to be adjusted
as part of the backport to pass with the new code.
[1] https://opendev.org/openstack/nova/src/commit/d9e04c4ff0b1a9c3383f1848dc846e93030d83cb/nova/network/neutronv2/api.py#L2493-L2522
Change-Id: If0fbae33ce2af198188c91638afef939256c2556
Closes-bug: 1879787
(cherry picked from commit 6488a5dfb293831a448596e2084f484dd0bfa916)
(cherry picked from commit 2c949cb3eea9cd9282060da12d32771582953aa2)
|
|\ \ \ \ \
| |_|/ / /
|/| | | | |
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Make the spec of virtual persistent memory consistent with
the contents of the admin manual, update the daxio dependency
of virtual persistent memory, and add a NOTE for the tested
kernel version.
Closes-Bug: #1894022
Change-Id: I30539bb47c98a588b95c066a394949d60af9c520
(cherry picked from commit a8b0c6b456a9afdbdfab69daf8c0d3685f8e3084)
(cherry picked from commit eae463ca1541dacdc7507899d25e7d3505194363)
|
|\ \ \ \
| |/ / /
|/| | |
| | | | |
stable/train
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Attempting to boot an instance with 'hw:cpu_policy=dedicated' will
result in a request from nova-scheduler to placement for allocation
candidates with $flavor.vcpu 'PCPU' inventory. Similarly, booting an
instance with 'hw:cpu_thread_policy=isolate' will result in a request
for allocation candidates with 'HW_CPU_HYPERTHREADING=forbidden', i.e.
hosts without hyperthreading. This has been the case since the
cpu-resources feature was implemented in Train. However, as part of that
work and to enable upgrades from hosts that predated Train, we also make
a second request for candidates with $flavor.vcpu 'VCPU' inventory. The
idea behind this is that old compute nodes would only report 'VCPU' and
should be useable, and any new compute nodes that got caught up in this
second request could never actually be scheduled to since there wouldn't
be enough cores from 'ComputeNode.numa_topology.cells.[*].pcpuset'
available to schedule to, resulting in rejection by the
'NUMATopologyFilter'. However, if a host was rejected in the first
query because it reported the 'HW_CPU_HYPERTHREADING' trait, it could
get picked up by the second query and would happily be scheduled to,
resulting in an instance consuming 'VCPU' inventory from a host that
properly supported 'PCPU' inventory.
The solution is simple, though also a huge hack. If we detect that the
host is using new-style configuration and should be able to report
'PCPU', check if the instance asked for no hyperthreading and whether
the host has it. If all are true, reject the request.
Change-Id: Id39aaaac09585ca1a754b669351c86e234b89dd9
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1889633
(cherry picked from commit 9c270332041d6b98951c0b57d7b344fd551a413c)
(cherry picked from commit 7ddab327675d36a4ba59d02d22d042d418236336)
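That three-way check is compact enough to sketch directly (hypothetical helper and boolean inputs; the real change lives in the scheduler/filter path):

```python
def reject_fallback_candidate(host_reports_pcpu, host_has_hyperthreading,
                              isolate_requested):
    """True if the VCPU fallback query must skip this host.

    A new-style (PCPU-reporting) host that was filtered out of the first
    placement query for having hyperthreads must not sneak back in via
    the second, trait-blind fallback query.
    """
    return (host_reports_pcpu and isolate_requested
            and host_has_hyperthreading)
```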
|
|\ \ \ \ |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When rebuilding, we should only allow detaching volumes
with 'in-use' status; volumes in a status such as
'retyping' should not be allowed.
Conflicts:
nova/api/openstack/compute/servers.py
nova/compute/api.py
nova/tests/unit/api/openstack/compute/test_server_actions.py
Modified:
nova/tests/unit/compute/test_compute_api.py
NOTE(elod.illes):
* conflicts in servers.py and test_server_actions.py are due to bug
fixing patch I25eff0271c856a8d3e83867b448e1dec6f6732ab is not
backported to stable/train
* api.py conflict is due to Ic2ad1468d31b7707b7f8f2b845a9cf47d9d076d5
is part of a feature introduced in Ussuri
* modification of test_compute_api.py is also required due to patch
I25eff0271c856a8d3e83867b448e1dec6f6732ab is not backported and
another patch, Ide8eb9e09d22f20165474d499ef0524aefc67854, that
cannot be backported to stable/train
Change-Id: I7f93cfd18f948134c9cb429dea55740d2cf97994
Closes-Bug: #1489304
(cherry picked from commit 10e9a9b9fc62a3cf72c3717e3621ed95d3cf5519)
(cherry picked from commit bcbeae2c605f4ab4ad805dddccac802928a180b6)
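The guard itself is a single status comparison. A sketch with assumed names (the real check sits in the rebuild path of the compute API):

```python
def check_volume_detachable(volume):
    """Refuse to detach a volume during rebuild unless it is 'in-use'.

    Transient states such as 'retyping' mean another operation owns the
    volume; detaching mid-flight would race with it.
    """
    if volume['status'] != 'in-use':
        raise ValueError("volume %s is in status %r and cannot be "
                         "detached" % (volume['id'], volume['status']))
```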
|
|\ \ \ \ \
| | |/ / /
| |/| | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
With the introduction of the cpu-resources work [1], (libvirt) hosts can
now report 'PCPU' inventory separate from 'VCPU' inventory, which is
consumed by instances with pinned CPUs ('hw:cpu_policy=dedicated'). As
part of that effort, we had to drop support for the ability to boot
instances with 'hw:cpu_thread_policy=isolate' (i.e. I don't want
hyperthreads) on hosts with hyperthreading. This had been previously
implemented by marking thread siblings of the host cores used by such an
instance as reserved and unusable by other instances, but such a design
wasn't possible in a world where we had to track resource consumption in
placement before landing in the host. Instead, the 'isolate' policy now
simply means "give me a host without hyperthreads". This is enforced by
hosts with hyperthreads reporting the 'HW_CPU_HYPERTHREADING' trait, and
instances with the 'isolate' policy requesting
'HW_CPU_HYPERTHREADING=forbidden'.
Or at least, that's how it should work. We also have a fallback query
for placement to find hosts with 'VCPU' inventory and that doesn't care
about the 'HW_CPU_HYPERTHREADING' trait. This was envisioned to ensure
hosts with old style configuration ('[DEFAULT] vcpu_pin_set') could
continue to be scheduled to. We figured that this second fallback query
could accidentally pick up hosts with new-style configuration, but we
are also tracking the available and used cores from those listed in the
'[compute] cpu_dedicated_set' as part of the host 'NUMATopology' objects
(specifically, via the 'pcpuset' and 'cpu_pinning' fields of the
'NUMACell' child objects). These are validated by both the
'NUMATopologyFilter' and the virt driver itself, which means hosts with
new style configuration that got caught up in this second query would be
rejected by this filter or by a late failure on the host. (Hint: there's
much more detail on this in the spec).
Unfortunately we didn't think about hyperthreading. If a host gets
picked up in the second request, it might well have enough PCPU
inventory but simply be rejected in the first query since it had
hyperthreads. In this case, because it has enough free cores available
for pinning, neither the filter nor the virt driver will reject the
request, resulting in a situation whereby the instance ends up falling
back to the old code paths and consuming $flavor.vcpu host cores, plus
the thread siblings for each of these cores. Despite this, it will be
marked as consuming $flavor.vcpu VCPU (not PCPU) inventory in placement.
This patch proves this to be the case, allowing us to resolve the issue
later.
[1] https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html
Change-Id: I87cd4d14192b1a40cbdca6e3af0f818f2cab613e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Related-Bug: #1889633
(cherry picked from commit 737e0c0111acd364d1481bdabd9d23bc8d5d6a2e)
(cherry picked from commit 49a793c8ee7a9be26e4e3d6ddd097a6ee6fea29d)
|
|\ \ \ \ \
| |_|_|/ /
|/| | | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Deleting an instance as a non-admin user after the hypervisor has
been disabled leaks the host fqdn in the instance's fault message.
Remove the 'host' field from the error message of
HypervisorUnavailable because it leaks the host fqdn to non-admin
users. The admin user will still see the hypervisor unavailable
exception message, and will be able to figure out on which compute
host the guest is and that the connection is broken.
Change-Id: I0eae19399670f59c17c9a1a24e1bfcbf1b514e7b
Closes-Bug: #1851587
(cherry picked from commit a89ffab83261060bbb9dedb2b8de6297b2d07efd)
(cherry picked from commit ff82601204e9d724b3032dc94c49fa5c8de2699b)
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | | |
into stable/train
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
I0bfb11296430dfffe9b091ae7c3a793617bd9d0d introduced support for live
migration with cinderv3 volume attachments during Queens. This initial
support handled failures in pre_live_migration directly by removing any
attachments created on the destination and reverting to the original
attachment ids before re-raising the caught exception to the source
compute. It also added rollback code within the main
_rollback_live_migration method but missed that this would also be
called during a pre_live_migration rollback.
As a result after a failure in pre_live_migration
_rollback_live_migration will attempt to delete the source host volume
attachments referenced by the bdm before updating the bdms with the now
non-existent attachment ids, leaving the volumes in an `available` state
in Cinder as they have no attachment records associated with them
anymore.
This change aims to resolve this within _rollback_volume_bdms by
ensuring that the current and original attachment_ids are not equal
before requesting that the current attachment referenced by the bdm is
deleted. When called after a failure in pre_live_migration this should
result in no attempt being made to remove the original source host
attachments from Cinder.
Note that the following changes muddy the waters slightly here but
introduced no actual changes to the logic within
_rollback_live_migration:
* I0f3ab6604d8b79bdb75cf67571e359cfecc039d8 reworked some of the error
handling in Rocky but isn't the source of the issue here.
* Ibe9215c07a1ee00e0e121c69bcf7ee1b1b80fae0 reworked
_rollback_live_migration to use the provided source_bdms.
* I6bc73e8c8f98d9955f33f309beb8a7c56981b553 then refactored
_rollback_live_migration, moving the logic into a self contained
_rollback_volume_bdms method.
Closes-Bug: #1889108
Change-Id: I9edb36c4df1cc0d8b529e669f06540de71766085
(cherry picked from commit 2102f1834a6ac9fd870bfb457b28a2172f33e281)
(cherry picked from commit 034b2fa8fea0e34fed95a2ba728e4387ce4e78de)
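The core of the fix is the equality guard before deleting an attachment. A sketch under assumed names and dict-shaped bdms (Cinder's real attachment API is invoked via nova's volume API object; the fake shapes here are for illustration):

```python
def rollback_volume_bdm(volume_api, ctxt, bdm, original_bdm):
    """Roll one bdm back to its original attachment.

    Only delete the attachment currently on the bdm if it differs from
    the original: after a pre_live_migration failure the bdm already
    points back at the original source attachment, which must survive
    the rollback (otherwise the volume ends up 'available' in Cinder).
    """
    if bdm['attachment_id'] != original_bdm['attachment_id']:
        volume_api.attachment_delete(ctxt, bdm['attachment_id'])
    bdm['attachment_id'] = original_bdm['attachment_id']
```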
|
|\ \ \ \ \ \
| |/ / / / / |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Previously any exception while rolling back the connection_info and
attachment_id of volume bdms would result in the overall attempt to
rollback a LM failing. This change refactors this specific bdm rollback
logic into two self-contained methods that by default ignore
errors where possible, allowing the LM rollback attempt to continue.
Change-Id: I6bc73e8c8f98d9955f33f309beb8a7c56981b553
(cherry picked from commit 9524a5a1b5745f6064f88cbfbf5bbfae3a973bef)
|
|\ \ \ \ \ \
| |/ / / / /
| | | | / /
| |_|_|/ /
|/| | | | |
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
NOTE(lyarwood): Various changes were required to get this to work on
stable/train without backporting a considerable number of changes to the
func tests including the following:
- Adding TestVolAttachmentsDuringPreLiveMigration to super() as
functional tests run against py27
- Adding USE_NEUTRON=True
- Adding api_major_version='v2.1'
- Adding self.api to self._wait_for_state_change calls
- Removing the use of _build_server and crafting the server creation
request by hand
- Removing the use of _live_migrate and crafting the live migration
request by hand
Related-Bug: #1889108
Change-Id: Ib9dbc792dc918e7ea45915e2c1dbd96be82ef562
(cherry picked from commit 4c970f499c31370495d84c91a10319d308d13fb9)
(cherry picked from commit 6db72002a65f30ac44b8df0a642b400ea272247e)
|
|\ \ \ \
| | | | |
| | | | |
| | | | | |
v4.1.0" into stable/train
|