Commit messages
This change adds a new _post_live_migration_update_host
function that wraps _post_live_migration and simply ensures
that, if we exit due to an exception, instance.host is set
to the destination host.
By the time we are in _post_live_migration, the guest has
already started running on the destination host and we cannot
revert. Sometimes admins or users will hard reboot the instance,
expecting that to fix everything, when the VM enters the error
state after the failed migration. Previously this would end up
recreating the instance on the source node, leading to possible
data corruption if the instance used shared storage.
Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
Partial-Bug: #1628606
(cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
(cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)
(cherry picked from commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e)
(cherry picked from commit 15502ddedc23e6591ace4e73fa8ce5b18b5644b0)
(cherry picked from commit 43c0e40d288960760a6eaad05cb9670e01ef40d0)
(cherry picked from commit 0ac64bba8b7aba2fb358e00e970e88b32d26ef7e)
Adds a regression test, or reproducer, for a post live migration
failure at the destination; possible causes include failing to get
the instance network info or block device info.
Changes:
* _live_migrate in _integrated_helpers now returns the server
NOTE(auniyal): Differences
* Replaced GlanceFixture with fake.stub_out_image_service in regression test, as GlanceFixture does not exist in Ussuri
Related-Bug: #1628606
Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
(cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
(cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)
(cherry picked from commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db)
(cherry picked from commit 5efcc3f695e02d61cb8b881e009308c2fef3aa58)
(cherry picked from commit ed1ea71489b60c0f95d76ab05f554cd046c60bac)
(cherry picked from commit 6dda4f7ca3f25a11cd0178352ad24fe2e8b74785)
Ignore the instance task state and continue with VM evacuation.
Closes-Bug: #1978983
Change-Id: I5540df6c7497956219c06cff6f15b51c2c8bc29d
(cherry picked from commit db919aa15f24c0d74f3c5c0e8341fad3f2392e57)
(cherry picked from commit 6d61fccb8455367aaa37ae7bddf3b8befd3c3d88)
(cherry picked from commit 8e9aa71e1a4d3074a94911db920cae44334ba2c3)
(cherry picked from commit 0b8124b99601e1aba492be8ed564f769438bd93d)
(cherry picked from commit 3224ceb3fffc57d2375e5163d8ffbbb77529bc38)
This change adds a reproducer test for evacuating
a VM in the powering-off state.
Conflicts:
nova/tests/functional/integrated_helpers.py
nova/tests/functional/test_servers.py
Difference:
nova/tests/functional/regressions/test_bug_1978983.py
NOTE(auniyal): Conflicts are due to the following changes that are not in Ussuri:
* I147bf4d95e6d86ff1f967a8ce37260730f21d236 (Cyborg evacuate support)
* Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 (Functional tests for NUMA live migration)
NOTE(auniyal): Differences
* Replaced GlanceFixture with fake.stub_out_image_service in regression test, as GlanceFixture does not exist in Ussuri
Related-Bug: #1978983
Change-Id: I5540df6c7497956219c06cff6f15b51c2c8bc299
(cherry picked from commit 5904c7f993ac737d68456fc05adf0aaa7a6f3018)
(cherry picked from commit 6bd0bf00fca6ac6460d70c855eded3898cfe2401)
(cherry picked from commit 1e0af92e17f878ce64bd16e428cb3c10904b0877)
(cherry picked from commit b57b0eef218fd7604658842c9277aad782d11b45)
(cherry picked from commit b6c877377f58ccaa797af3384b199002726745ea)
Change I4d3193d8401614311010ed0e055fcb3aaeeebaed added some
additional local delete cleanup to prevent leaking of placement
allocations. The change introduced a regression in our "delete while
booting" handling as the _local_delete_cleanup required a valid
instance object to do its work and in two cases, we could have
instance = None from _lookup_instance if we are racing with a create
request and the conductor has deleted the instance record while we
are in the middle of processing the delete request.
This handles those scenarios by doing two things:
(1) Changing the _local_delete_cleanup and
_update_queued_for_deletion methods to take an instance UUID
instead of a full instance object as they really only need the
UUID to do their work
(2) Saving a copy of the instance UUID before doing another instance
lookup which might return None and passing that UUID to the
_local_delete_cleanup and _update_queued_for_deletion methods
Closes-Bug: #1914777
Change-Id: I03cf285ad83e09d88cdb702a88dfed53c01610f8
(cherry picked from commit 123f6262f63477d3f50dfad09688978e044bd9e0)
(cherry picked from commit 35112d7667cee9fc33db660ee241164b38468c31)
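A minimal sketch of the two-step fix described above, with stub helpers and a plain dict standing in for the real compute API internals and database:

```python
_DB = {}  # uuid -> instance record; empty, to simulate the raced delete


def _lookup_instance(instance_uuid):
    # May return None if conductor deleted the record mid-request.
    return _DB.get(instance_uuid)


def _local_delete_cleanup(instance_uuid, log):
    # Needs only the UUID, not a full instance object (change 1).
    log.append(("cleanup", instance_uuid))


def _update_queued_for_deletion(instance_uuid, log):
    log.append(("queued", instance_uuid))


def delete(instance, log):
    # Save the UUID before re-looking the instance up (change 2): the
    # second lookup can return None when racing with a create request.
    instance_uuid = instance["uuid"]
    instance = _lookup_instance(instance_uuid)
    if instance is None:
        _local_delete_cleanup(instance_uuid, log)
        _update_queued_for_deletion(instance_uuid, log)
        return
    # ... normal delete path would continue here ...
```

Because both cleanup helpers take the saved UUID, the `instance is None` branch no longer dereferences a missing object.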
This adds two tests to cover a regression where racing create and
delete requests could result in a user receiving a 500 error when
attempting to delete an instance:
Unexpected exception in API method: AttributeError: 'NoneType' object
has no attribute 'uuid'
Related-Bug: #1914777
NOTE(melwitt): The difference from the cherry picked change is because
I6daea47988181dfa6dde3d9c42004c0ecf6ae87a is not in Ussuri.
Change-Id: I8249c572c6f727ef4ca434843106b9b57c47e585
(cherry picked from commit f7975d640ce1e9fa06d045d35177f07451716f0c)
(cherry picked from commit 4f17ea2f7dedb67802c2a8bc0caa64ff66fd2c9c)
The Quotas.check_deltas method needs a user_id keyword arg in order to
scope a quota check to a particular user. However, when we call
check_num_instances_quota we don't pass a project_id or user_id because
at the time of the quota check, we have not yet created an instance
record and thus cannot use it to determine the appropriate project
and user.
Instead, we should rely on the RequestContext.project_id and
RequestContext.user_id as defaults in this case, but
check_num_instances_quota only defaults project_id and not user_id.
check_num_instances_quota should also default user_id to the
RequestContext.user_id when user_id is not explicitly passed.
check_num_instances_quota should also check whether any per-user quota
limits are defined for instance-related resources before passing along
the user_id to scope resource counting and limit checking. Counting
resources across a user is costly, so we should avoid it if it's not
needed.
Closes-Bug: #1893284
Change-Id: I3cfb1edc30b0bda4671e0d2cc2a8993055dcc9ff
(cherry picked from commit 4c11d5467a30506a82dd5d32dd22b8958a187c0b)
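The defaulting and scoping behavior can be sketched as follows; PER_USER_LIMITS and per_user_limits_defined are hypothetical stand-ins for the quota-driver lookup the commit refers to:

```python
class RequestContext:
    """Minimal stand-in for nova.context.RequestContext."""
    def __init__(self, project_id, user_id):
        self.project_id = project_id
        self.user_id = user_id


# Projects that have per-user quota limits defined (illustrative).
PER_USER_LIMITS = {"p1"}


def per_user_limits_defined(project_id):
    # Hypothetical helper; the real code queries the quota driver.
    return project_id in PER_USER_LIMITS


def check_num_instances_quota(context, project_id=None, user_id=None):
    # Default BOTH ids from the request context, not just project_id.
    if project_id is None:
        project_id = context.project_id
    if user_id is None:
        user_id = context.user_id
    # Only scope counting per-user when per-user limits actually
    # exist, since counting resources across a user is costly.
    if not per_user_limits_defined(project_id):
        user_id = None
    return project_id, user_id
```
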
This adds a regression test for a bug where quota limit checking during
server creates is not properly scoped per-user when per-user quota has
been defined.
As a result, users who should be able to create a server are rejected
with a 403 "quota exceeded" error when they should be allowed to create
a server because servers owned by other users in the project are
incorrectly being counted for the current user.
Related-Bug: #1893284
Change-Id: I615ada45ffcbac081474c0a0cf005afdb8eec953
(cherry picked from commit 38bc8b871a4b954f6de84f10dc1e8beb21a7c9ed)
In the Rocky cycle nova was enhanced to support the multiple
port binding live migration workflow when neutron supports
the binding-extended API extension.
When the migration_data object was extended to support
multiple port bindings, populating the vifs field was used
as a sentinel to indicate that the new workflow should
be used.
In the Train release,
I734cc01dce13f9e75a16639faf890ddb1661b7eb
(SR-IOV Live migration indirect port support)
broke the semantics of the migrate_data object by
unconditionally populating the vifs field.
This change restores the Rocky semantics, which several parts
of the code base depend on, by only populating vifs if neutron
supports multiple port bindings.
Changes to patch:
- unit/virt/libvirt/fakelibvirt.py: Include partial pick from
  change Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 to add
  jobStats, complete_job and fail_job to fakelibvirt. The full
  change was not cherry-picked as it was part of the NUMA-aware
  live migration feature in Victoria.
Co-Authored-By: Sean Mooney <work@seanmooney.info>
Change-Id: Ia00277ac8a68a635db85f9e0ce2c6d8df396e0d8
Closes-Bug: #1888395
(cherry picked from commit b8f3be6b3c5af91d215b4a0cecb9be098e8d8799)
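The restored sentinel semantics amount to a conditional assignment; MigrateData here is a minimal stand-in for the real migrate_data object:

```python
class MigrateData:
    """Minimal stand-in for a LibvirtLiveMigrateData object."""
    def __init__(self):
        self.vifs = None  # unset means: use the old workflow


def populate_migrate_data(migrate_data, binding_extended_supported, vifs):
    # A populated 'vifs' field is the sentinel that selects the
    # multiple port binding workflow, so only set it when neutron
    # actually supports the binding-extended API extension.
    if binding_extended_supported:
        migrate_data.vifs = vifs
    return migrate_data
```

Leaving `vifs` unset against an older neutron keeps the pre-Train single-binding path intact.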
This change adds a functional regression test that
asserts the broken behavior when trying to live migrate
with a neutron backend that does not support multiple port
bindings.
Conflicts/Changes:
nova/tests/functional/regressions/test_bug_1888395.py:
- specify api major version to allow block_migration 'auto'
- use TempDir fixture for instances path
nova/tests/unit/virt/libvirt/fake_imagebackend.py:
- include portion of change Ia3d7351c1805d98bcb799ab0375673c7f1cb8848
which stubs out the is_file_in_instance_path method. That was
included in a feature patch set so just pulling the necessary
bit.
Change-Id: I470a016d35afe69809321bd67359f466c3feb90a
Partial-Bug: #1888395
(cherry picked from commit 71bc6fc9b89535679252ffe5a737eddad60e4102)
The _update_available_resources periodic task makes resource
allocation adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE,
based on the list of instances assigned to the resource tracker's
host and on the migrations where the source or the target host is
the resource tracker's host. So if instance.host or the migration
context changes without holding the COMPUTE_RESOURCE_SEMAPHORE while
the _update_available_resources task is running, there will be data
inconsistency in the resource tracker.
This patch makes sure that during evacuation instance.host and the
migration context are changed while holding the semaphore.
Conflicts:
nova/tests/unit/compute/test_compute_mgr.py due to
I147bf4d95e6d86ff1f967a8ce37260730f21d236 is not in stable/ussuri
Change-Id: Ica180165184b319651d22fe77e076af036228860
Closes-Bug: #1896463
(cherry picked from commit 7675964af81472f3dd57db952d704539e61a3e6e)
(cherry picked from commit 3ecc098d28addd8f3e1da16b29940f4337209e62)
There is a race condition between the rebuild and the
_update_available_resource periodic task on the compute. This patch adds
a reproducer functional test. Unfortunately it needs some injected sleep
to make the race happen in a stable way. This is suboptimal but only
adds 3 seconds of slowness to the test execution.
stable/ussuri specific changes:
* GlanceFixture does not exist as
I6daea47988181dfa6dde3d9c42004c0ecf6ae87a is not in ussuri
* MigrationList.get_in_progress_and_error was added in
I422a907056543f9bf95acbffdd2658438febf801 which is not in ussuri
Change-Id: Id0577bceed9808b52da4acc352cf9c204f6c8861
Related-Bug: #1896463
(cherry picked from commit 3f348602ae4a40c52c7135b2cb48deaa6052c488)
(cherry picked from commit d768cdbb88d0b0b3ca38623c4bb26d5eabdf1596)
Idf84ccff254d26fa13473fe9741ddac21cbcf321 added this check in order for
Nova to avoid booting encrypted images created by Cinder as there is
currently no support for using such images (rotating keys etc).
The check however missed the slightly convoluted use case where this
image property is found against a volume after the volume in question is
created using an encrypted image created by cinder from an encrypted
volume. In other words:
- Cinder creates an encrypted volume A
- Glance creates an encrypted image A from volume A
- Cinder creates an encrypted volume B from image A
- Nova attempts to boot an instance using volume B
Note that Nova may request the creation of volume B or a user could also
do this directly through Cinder.
As such this change simply ensures that the instance isn't booting
from a volume when performing the check, as the check is only valid
when booting from an image.
Closes-Bug: #1895696
Change-Id: Ic92cab7362fa25050e5bbef5c3e360108365b5c7
(cherry picked from commit f9b67893acf94c06fd41be36b80b99788dc77e48)
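The guard can be sketched as below; should_reject_image is an illustrative name, and the sketch assumes the cinder_encryption_key_id image property that the original check (Idf84ccff254d26fa13473fe9741ddac21cbcf321) keys on:

```python
def should_reject_image(is_boot_from_volume, image_properties):
    # The encrypted-image check only makes sense when booting directly
    # from an image. A volume created from such an image (volume B in
    # the commit message) is decrypted by Cinder, so boot-from-volume
    # must skip the check entirely.
    if is_boot_from_volume:
        return False
    return 'cinder_encryption_key_id' in image_properties
```
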
NOTE(lyarwood): I6daea47988181dfa6dde3d9c42004c0ecf6ae87a isn't present
on stable/ussuri so this test now uses the fake_image_service.
1f81c086579a0867f9aecd2d5d7cc531b4d3347f also isn't present on
stable/ussuri so this test also uses the CinderFixture directly.
Related-Bug: #1895696
Change-Id: I15271fb0b8de7f1184acddd607d605837e2eb7d4
(cherry picked from commit e76cccddd3ac64d8cc3e7e63e1772d5d23f20669)
As noted inline, the 'policies' field may be a list but it expects one
of two items.
Change-Id: I34c68df1e6330dab1524aa0abec733610211a407
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1894966
(cherry picked from commit 32c43fc8017ee89d4e6cdf79086d87735a00f0c0)
You must specify the 'policies' field. Currently, not doing so will
result in an HTTP 500 error code. This should be a 4xx error. Add a
test to demonstrate the bug before we provide a fix.
Change-Id: I72e85855f621d3a51cd58d14247abd302dcd958b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Related-Bug: #1894966
(cherry picked from commit 2c66962c7a40d8ef4fab54324e06edcdec1bd716)
As discussed in change I26b050c402f5721fc490126e9becb643af9279b4, the
resource tracker's periodic task is reliant on the status of migrations
to determine whether to include usage from these migrations in the
total, and races between setting the migration status and decrementing
resource usage via 'drop_move_claim' can result in incorrect usage.
That change tackled the confirm resize operation. This one changes the
revert resize operation, and is a little trickier due to kinks in how
both the same-cell and cross-cell resize revert operations work.
For same-cell resize revert, the 'ComputeManager.revert_resize'
function, running on the destination host, sets the migration status to
'reverted' before dropping the move claim. This exposes the same race
that we previously saw with the confirm resize operation. It then calls
back to 'ComputeManager.finish_revert_resize' on the source host to boot
up the instance itself. This is kind of weird, because, even ignoring
the race, we're marking the migration as 'reverted' before we've done
any of the necessary work on the source host.
The cross-cell resize revert splits dropping of the move claim and
setting of the migration status between the source and destination host
tasks. Specifically, we do cleanup on the destination and drop the move
claim first, via 'ComputeManager.revert_snapshot_based_resize_at_dest'
before resuming the instance and setting the migration status on the
source via
'ComputeManager.finish_revert_snapshot_based_resize_at_source'. This
would appear to avoid the weird quirk of same-cell migration, however,
in typical weird cross-cell fashion, these are actually different
instances and different migration records.
The solution is once again to move the setting of the migration status
and the dropping of the claim under 'COMPUTE_RESOURCE_SEMAPHORE'. This
introduces the weird setting of migration status before completion to
the cross-cell resize case and perpetuates it in the same-cell case, but
this seems like a suitable compromise to avoid attempts to do things
like unplugging already unplugged PCI devices or unpinning already
unpinned CPUs. From an end-user perspective, instance state changes are
what really matter and once a revert is completed on the destination
host and the instance has been marked as having returned to the source
host, hard reboots can help us resolve any remaining issues.
Change-Id: I29d6f4a78c0206385a550967ce244794e71cef6d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1879878
(cherry picked from commit dc9c7a5ebf11253f86127238d33dff7401465155)
The 'ResourceTracker.update_available_resource' periodic task builds
usage information for the current host by inspecting instances and
in-progress migrations, combining the two. Specifically, it finds all
instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state,
calculates the usage from these, then finds all in-progress migrations
for the host that don't have an associated instance (to prevent double
accounting) and includes the usage for these.
In addition to the periodic task, the 'ResourceTracker' class has a
number of helper functions to make or drop claims for the inventory
generated by the 'update_available_resource' periodic task as part of
the various instance operations. These helpers naturally assume that
when making a claim for a particular instance or migration, there
shouldn't already be resources allocated for same. Conversely, when
dropping claims, the resources should currently be allocated. However,
the check for *active* instances and *in-progress* migrations in the
periodic task means we have to be careful in how we make changes to a
given instance or migration record. Running the periodic task between
such an operation and an attempt to make or drop a claim can result in
TOCTOU-like races.
This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE'
semaphore to prevent the periodic task running while we're claiming
resources in helpers like 'ResourceTracker.instance_claim' and we make
our changes to the instances and migrations within this context. There
is one exception though: the 'drop_move_claim' helper. This function is
used when dropping a claim for either a cold migration, a resize or a
live migration, and will drop usage from either the source host (based
on the "old" flavor) for a resize confirm or the destination host (based
on the "new" flavor) for a resize revert or live migration rollback.
Unfortunately, while the function itself is wrapped in the semaphore, no
changes to the state of the instance or migration in question are
protected by it.
Consider the confirm resize case, which we're addressing here. If we
mark the migration as 'confirmed' before running 'drop_move_claim', then
the periodic task running between these steps will not account for the
usage on the source since the migration is allegedly 'confirmed'. The
call to 'drop_move_claim' will then result in the tracker dropping usage
that we're no longer accounting for. This "set migration status before
dropping usage" is the current behaviour for both same-cell and
cross-cell resize, via the 'ComputeManager.confirm_resize' and
'ComputeManager.confirm_snapshot_based_resize_at_source' functions,
respectively. We could reverse those calls and run 'drop_move_claim'
before marking the migration as 'confirmed', but while our usage will be
momentarily correct, the periodic task running between these steps will
re-add the usage we just dropped since the migration isn't yet
'confirmed'. The correct solution is to close this gap between setting
the migration status and dropping the move claim to zero. We do this by
putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just
like the claim operations.
Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
(cherry picked from commit a57800d3825ef2fb833d8024c6f7e5c550f55e2f)
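The fix pattern — status flip and claim drop under the same lock — can be sketched with a plain threading.Lock standing in for nova's COMPUTE_RESOURCE_SEMAPHORE (which in the real code is applied via an oslo.concurrency synchronized decorator), and toy Migration/ResourceTracker classes:

```python
import threading

COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()


class Migration:
    def __init__(self):
        self.status = 'confirming'


class ResourceTracker:
    def __init__(self):
        self.usage = 1  # one in-flight move claim on the source host

    def drop_move_claim(self, migration):
        self.usage -= 1


def confirm_resize(migration, tracker):
    # Setting the status and dropping the claim happen under the same
    # semaphore the periodic task takes, so the task can never observe
    # one without the other -- closing the TOCTOU window described
    # above, where a 'confirmed' migration still holds usage.
    with COMPUTE_RESOURCE_SEMAPHORE:
        migration.status = 'confirmed'
        tracker.drop_move_claim(migration)
```

The periodic task, also serialized on the semaphore, sees either (confirming, usage=1) or (confirmed, usage=0), never a mixed state.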
It's not necessary for this test case since we don't query placement
stuff.
Change-Id: I5136b1cab8d704f55d01e5490c42ead6e3e14d98
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
(cherry picked from commit e91b07dc522db6b6e4df8972437f78ef2697790b)
No need for the libvirt driver in all its complexity here.
Change-Id: Ifea9a15fb01c0b25e9973024f4f61faecc56e1cd
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
(cherry picked from commit e1adbced92453329f7285ec38c1dc7821ebb52c7)
I0bfb11296430dfffe9b091ae7c3a793617bd9d0d introduced support for live
migration with cinderv3 volume attachments during Queens. This initial
support handled failures in pre_live_migration directly by removing any
attachments created on the destination and reverting to the original
attachment ids before re-raising the caught exception to the source
compute. It also added rollback code within the main
_rollback_live_migration method but missed that this would also be
called during a pre_live_migration rollback.
As a result after a failure in pre_live_migration
_rollback_live_migration will attempt to delete the source host volume
attachments referenced by the bdm before updating the bdms with the now
non-existent attachment ids, leaving the volumes in an `available` state
in Cinder as they have no attachment records associated with them
anymore.
This change aims to resolve this within _rollback_volume_bdms by
ensuring that the current and original attachment_ids are not equal
before requesting that the current attachment referenced by the bdm is
deleted. When called after a failure in pre_live_migration this should
result in no attempt being made to remove the original source host
attachments from Cinder.
Note that the following changes muddy the waters slightly here but
introduced no actual changes to the logic within
_rollback_live_migration:
* I0f3ab6604d8b79bdb75cf67571e359cfecc039d8 reworked some of the error
handling in Rocky but isn't the source of the issue here.
* Ibe9215c07a1ee00e0e121c69bcf7ee1b1b80fae0 reworked
_rollback_live_migration to use the provided source_bdms.
* I6bc73e8c8f98d9955f33f309beb8a7c56981b553 then refactored
_rollback_live_migration, moving the logic into a self contained
_rollback_volume_bdms method.
Closes-Bug: #1889108
Change-Id: I9edb36c4df1cc0d8b529e669f06540de71766085
(cherry picked from commit 2102f1834a6ac9fd870bfb457b28a2172f33e281)
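The attachment-id guard described above, sketched with plain dicts standing in for BDM objects and a list recording the Cinder attachment-delete calls:

```python
def rollback_volume_bdms(bdms, original_attachment_ids, deleted):
    # Only delete the current attachment when it differs from the
    # original one. After a pre_live_migration failure the bdm still
    # references the source attachment, so the two are equal and the
    # source attachment must be left alone.
    for bdm in bdms:
        original = original_attachment_ids[bdm['volume_id']]
        if bdm['attachment_id'] != original:
            deleted.append(bdm['attachment_id'])  # cinder delete call
        bdm['attachment_id'] = original
```
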
Related-Bug: #1889108
Change-Id: Ib9dbc792dc918e7ea45915e2c1dbd96be82ef562
(cherry picked from commit 4c970f499c31370495d84c91a10319d308d13fb9)
The keypair of a server is updated when rebuilding the server with a
keypair. This functionality was added in API microversion 2.54.
However, the 'keypairs' field of the instance object is currently
not saved when the instance object is saved.
Make the instance object update the 'keypairs' field when saving
the instance object.
Change-Id: I8a2726b39d0444de8c35480024078a97430f5d0c
Closes-Bug: #1843708
Co-authored-by: Stephen Finucane <stephenfin@redhat.com>
(cherry picked from commit 086796021b189c3ac64805ed8f6bde833906d284)
When evacuating an instance between compute hosts on shared storage,
during the rebuild operation we call spawn() on the destination
compute. spawn() currently assumes that it should cleanup all
resources on failure, which results in user data being deleted in the
evacuate case.
This change modifies spawn in the libvirt driver such that it only
cleans up resources it created.
Co-Authored-By: Lee Yarwood <lyarwood@redhat.com>
Closes-Bug: #1550919
Change-Id: I764481966c96a67d993da6e902dc9fc3ad29ee36
(cherry picked from commit 497360b0ea970f1e68912be8229ef8c3f5454e9e)
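The "only clean up what you created" pattern can be sketched generically; steps and cleanups are illustrative, not the libvirt driver's real structure:

```python
def spawn(steps, cleanups):
    """Run named (name, fn) creation steps; on failure, undo only
    the resources this call actually created.

    Tracking our own creations is what keeps an evacuation rebuild
    from deleting pre-existing user data on shared storage.
    """
    created = []
    try:
        for name, fn in steps:
            fn()
            created.append(name)
    except Exception:
        # Roll back in reverse order, but only our own creations.
        for name in reversed(created):
            cleanups[name]()
        raise
```
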
1. Claim allocations from placement first, then claim specific
   resources in the Resource Tracker on the destination to populate
   migration_context.new_resources
2. Clean up specific resources when live migration succeeds/fails
Because we store specific resources in migration_context during
live migration, to ensure cleanup happens correctly we can't drop
migration_context before cleanup is complete:
a) in post live migration, we move source host cleanup before
   destination cleanup (post_live_migration_at_destination will
   apply migration_context and drop it)
b) when rolling back live migration, we drop migration_context
   after rollback operations are complete
For different specific resources, we might need driver-specific
support, such as for vpmem. This change just ensures that newly
claimed specific resources are populated to migration_context and
that migration_context is not dropped before cleanup is complete.
Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory
There are two almost identical implementations of the _run_periodics()
helper - and a third one would have joined them in a subsequent patch,
if not for this patch. This patch moves the _run_periodics() to the
base test class. In addition, _run_periodics() depends on the
self.computes dict used for compute service tracking. The method that
populates that dict, _start_compute(), is therefore also moved to the
base class.
This enables some light refactoring of existing tests that need
either the _run_periodics() helper, or the compute service tracking.
In addition, a needless override of _start_compute() in
test_aggregates that provided no added value is removed. This is done
to avoid any potential confusion around _start_compute()'s role.
Change-Id: I36dd64dc272ea1743995b3b696323a9431666489
Change-Id: I33d8ac0a1cae0b2d275a21287d5e44c008a68122
The test_unshelve_offloaded_fails_due_to_neutron test could fail due
to a race condition. The test case only waits for the first
instance.save() call at [1], but the allocation delete happens after
it. As a result, the test case can still see the allocation of the
offloaded server in placement.
The fix makes sure that the test waits for the second instance.save()
by checking the host of the instance.
[1] https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L5274-L5288
Related-Bug: #1862633
Change-Id: Ic1c3d35749fbdc7f5b6f6ec1e16b8fcf37c10de8
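The fix amounts to polling for the host value written by the second instance.save(); a generic sketch of such a wait helper (not the actual fixture code):

```python
import time


def wait_for_instance_host(get_host, expected_host, timeout=5.0):
    # Poll until the host change written by the *second*
    # instance.save() is visible, which also means the allocation
    # delete that precedes it has completed.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_host() == expected_host:
            return True
        time.sleep(0.05)
    return False
```
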
This test would previously only attempt to invoke a race between
instance.save(expected_task_state=task_states.SPAWNING) and a parallel
attempt to delete an instance when the instance also has a vm_state of
ACTIVE and task_state of None. However vm_state and task_state would
often be different within the test resulting in no attempt to invoke the
test being made.
As instance.save is only called with expected_task_state set to
task_states.SPAWNING by _unshelve_instance and _build_and_run_instance
we should just check for this and avoid any state races within the test.
Additionally when attempting to invoke the race this test would call
_wait_for_server_parameter and provide self.api. This change removes
this argument as since I8c96b337f32148f8f5899c9b87af331b1fa41424 this is
no longer required and will result in a `TypeError:
'TestOpenStackClient' object is not subscriptable` error.
Closes-Bug: #1866072
Change-Id: I36da36cc5b099174eece0dfba29485fc20b2867b
If an instance fails to build, which is possible for a variety of
reasons, we may end up in a situation where we have remnants of a
plugged VIF (typically files) left on the host. This is because we
cleanup from the neutron perspective but don't attempt to unplug the
VIF, a call which may have many side-effects depending on the VIF
driver. Resolve this by always attempting to unplug VIFs as part of the
network cleanup.
A now invalid note is also removed and a unit test corrected.
Closes-Bug: #1831771
Related-Bug: #1830081
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Change-Id: Ibdbde4ed460a99b0cbe0d6b76e0e5b3c0650f9d9
Adds a regression-style test for two cleanup bugs when
'UnexpectedDeletingTaskStateError' is raised during build.
Change-Id: Ief1dfbb6cc9d67b73dfab4c7b63358e76e12866b
Related-Bug: #1848666
Related-Bug: #1831771
When the port binding update fails during unshelve of a shelve
offloaded instance, the compute manager has to catch the exception
and clean up the destination host allocation.
Change-Id: I4c3fbb213e023ac16efc0b8561f975a659311684
Closes-Bug: #1862633
If the port update fails during unshelve of an offloaded server,
then the placement allocation on the target host is leaked.
Change-Id: I7be32e4fc2e69f805535e0a437931516f491e5cb
Related-Bug: #1862633
This is a *very* common pattern. Centralize it. Only a few users are
added for now, but we can migrate more later (it's rather tedious work).
Change-Id: I84c58de90dad6d86271767363aef90ddac0f1730
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Another broadly useful helper.
Change-Id: I67f5b00541c9f72d50c6fe626987872a0930e982
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
'_IntegratedTestBase' has subclassed 'InstanceHelperMixin' since change
I0d21cb94c932e6e556eca964c57868c705b2d120, which means both now provide
a '_build_minimal_create_server_request' function. However, only
'_IntegratedTestBase' provides a '_build_server' function. The
'_build_minimal_create_server_request' and '_build_server' functions do
pretty much the same thing but there are some differences. Combine these
under the '_build_server' alias.
Change-Id: I91fa2f73185fef48e9aae9b7f61389c374e06676
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This means we can drop many duplicate implementations of things
- nova.tests.function.integrated_helpers._IntegratedTestBase
- _build_minimal_create_server_request
- nova.tests.functional.test_servers.ServersTestBase
- _wait_for_server_parameter
- _wait_for_state_change
- _wait_until_deleted
Change-Id: I0d21cb94c932e6e556eca964c57868c705b2d120
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Change I8742071b55f018f864f5a382de20075a5b444a79 in Ocata
moved the creation of the instance record from the API to
conductor. As a result, the "create" instance action was
only being created in conductor when the instance is created
in a non-cell0 database. This is a regression because before
that change when a server create would fail during scheduling
you could still list instance actions for the server and see
the "create" action but that was lost once we started burying
those instances in cell0.
This fixes the bug by creating the "create" action in the cell0
database when burying an instance there. It goes a step further
and also creates and finishes an event so the overall action
message shows up as "Error" with the details about where the
failure occurred in the event traceback.
A short release note is added since a new action event is
added here (conductor_schedule_and_build_instances) rather than
re-use some kind of event that we could generate from the
compute service, e.g. compute__do_build_and_run_instance.
Change-Id: I1e9431e739adfbcfc1ca34b87e826a516a4b18e2
Closes-Bug: #1852458
Pretty much every test case in 'nova.tests.functional' defines an 'api'
attribute, and many define an 'admin_api' attribute. We can pull these
from the class rather than explicitly passing them to helpers. Rework
things so this happens.
Note that the bulk of the changes here are in the
'nova/tests/functional/integrated_helpers.py' file. The rest of the
changes were auto-generated using the following script (my sed-fu is
non-existent):
$ cd nova/tests/functional
$ python3
>>> import glob
>>> import re
>>> pattern = r'_state_change\((\n\s+)?self\.(admin_)?api,\s+'
>>> replace = r'_state_change(\1'
>>> for path in glob.glob('*.py') + glob.glob('*/*.py'):
... with open(path) as fh:
... data = fh.read()
... new = re.sub(pattern, replace, data, flags=re.MULTILINE)
... if new != data:
... with open(path, 'w') as fh:
... fh.write(new)
...
>>> quit()
(ditto for the other substitutions)
Some manual fixups were required after, which pre-commit highlighted :)
Change-Id: I8c96b337f32148f8f5899c9b87af331b1fa41424
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Most of this is just removal of unnecessary 'use_neutron=True' conf
overrides. The sole exception is the conversion of the metadata API
tests, which turned out to be far less effort to convert than I
expected.
Change-Id: I5114bf13e7df95ed02be70f29d453116387d4f9c
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
There is a bug in the oslo.messaging fake RPC driver, bug 1529084.
This driver is used in the nova functional tests, and the bug is
currently worked around in the nova production code.
As bug 1529084 is now fixed in oslo.messaging 10.3.0, we can remove
the workaround in the nova code by bumping the minimum version of
oslo.messaging.
Change-Id: I4a32a688c7ceb05c263a0e93a91fb9b8ff0c65d4
Related-Bug: #1529084