Commit message
The resource tracker will silently ignore attempts to claim resources
when the requested node is not managed by this host. The misleading
"self.disabled(nodename)" check will fail if the nodename is not known
to the resource tracker, causing us to bail early with a NopClaim.
That means we also skip additional setup such as creating a migration
context for the instance, claiming resources in placement, and
handling PCI/NUMA details. This behavior is quite old and clearly
doesn't make sense in a world with things like placement. The bulk of
the test changes here are due to the fact that a lot of tests were
relying on this silent ignoring of a mismatching node, because they
were passing node names that weren't even tracked.
This change makes us raise an error in that case so that we can
actually catch it, instead of silently continuing with no resource
claim.
Change-Id: I416126ee5d10428c296fe618aa877cca0e8dffcf
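A minimal standalone sketch of the new behavior, with illustrative
class and exception names rather than nova's actual code:

    class ComputeHostNotFound(Exception):
        """Raised when a claim targets a node this host does not track."""


    class ResourceTrackerSketch:
        def __init__(self, host, compute_nodes):
            self.host = host
            # nodename -> compute node record tracked by this host
            self.compute_nodes = compute_nodes

        def instance_claim(self, context, instance, nodename):
            if nodename not in self.compute_nodes:
                # Previously this silently returned a NopClaim; now it
                # fails loudly so the caller can actually react.
                raise ComputeHostNotFound(
                    'node %s is not tracked by host %s'
                    % (nodename, self.host))
            # ... migration context, placement claim, PCI/NUMA setup ...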
We do run update_available_resource() synchronously during service
startup, but we only allow certain exceptions to abort startup. This
makes us abort for InvalidConfiguration, and makes the resource
tracker raise that when creating the compute node record fails due to
a duplicate entry.
This also modifies the object to raise a nova-specific error for that
condition, so the compute code does not need to import oslo_db just to
be able to catch it.
Change-Id: I5de98e6fe52e45996bc2e1014fa8a09a2de53682
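As a sketch of the intent, with placeholder exception names standing
in for the real oslo_db and nova classes:

    class DuplicateRecord(Exception):
        """Stand-in for the DB layer's duplicate-entry error."""


    class InvalidConfiguration(Exception):
        """Aborts service startup when raised during the initial
        update_available_resource run."""


    def create_compute_node_record(db_create):
        try:
            return db_create()
        except DuplicateRecord as exc:
            # Translate to a nova-level error so callers don't need to
            # import oslo_db just to catch this condition.
            raise InvalidConfiguration(
                'compute node record already exists: %s' % exc) from exc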
This makes the resource tracker look up and create ComputeNode objects
by uuid instead of nodename. For drivers like ironic that already
provide 'uuid' in the resources dict, we can use that. For those
that do not, we force the uuid to be the locally-persisted node
uuid, and use that to find/create the ComputeNode object.
A (happy) side-effect of this is that if we find a deleted compute
node object that matches our hypervisor, we undelete it instead of
re-creating one with a new uuid, which could clash with the old one.
This means we remove some of the special-casing of the ironic
rebalance, although the tests for that still largely stay the same.
Change-Id: I6a582a38c302fd1554a49abc38cfeda7c324d911
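A standalone sketch of the find-or-create-by-uuid idea, including the
undelete path; the data shapes are simplified stand-ins, not nova's
object model:

    def get_or_create_compute_node(records, node_uuid, nodename):
        """records: dict mapping compute node uuid -> record dict."""
        node = records.get(node_uuid)
        if node is not None:
            if node.get('deleted'):
                # Revive the old record instead of creating a duplicate
                # with a fresh uuid that could clash with the old one.
                node['deleted'] = False
            return node
        node = {'uuid': node_uuid, 'nodename': nodename, 'deleted': False}
        records[node_uuid] = node
        return node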
Id02e445c55fc956965b7d725f0260876d42422f2 added a special case to the
healing logic for same-host resize. Now that the scheduler also
creates allocations on the destination host during resize, we need to
make sure that the drop_move_claim code that runs during revert and
confirm drops the tracked migration from the resource tracker only
after the healing logic has run, as the migrations being confirmed or
reverted still affect PciDevices at that point.
blueprint: pci-device-tracking-in-placement
Change-Id: I6241965fe6c1cc1f2560fcce65d5e32ef308d502
Nova's PCI scheduling (and the PCI claim) works based on PCI device
pools in which similar available PCI devices are grouped. The PCI
devices are now represented in placement as RPs, and both the
allocation candidates during scheduling and the allocation after
scheduling now contain PCI devices. This information needs to affect
the PCI scheduling and the PCI claim. To be able to do that we need to
map PCI device pools to RPs. We achieve that here by first mapping
PciDevice objects to RPs during placement PCI inventory reporting, and
then mapping pools to RPs based on the PCI devices assigned to the
pools.
Also, because the ResourceTracker._update_to_placement() call now
updates the PCI device pools, the sequence of events in the
ResourceTracker needed to change to:
1) run _update_to_placement()
2) copy the pools to the ComputeNode object
3) save the compute node to the DB
4) save the PCI tracker
blueprint: pci-device-tracking-in-placement
Change-Id: I9bb450ac235ab72ff0d8078635e7a11c04ff6c1e
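The four steps above, expressed as a rough ordering sketch; the
attribute and method names follow the commit text or are assumed, and
the bodies are stand-ins:

    class ResourceTrackerOrderingSketch:
        def __init__(self, compute_node, pci_tracker):
            self.compute_node = compute_node
            self.pci_tracker = pci_tracker

        def _update(self, context):
            # 1) report inventories; now that pools are mapped to RPs,
            #    this can also refresh the PCI device pools
            self._update_to_placement(context)
            # 2) copy the (possibly updated) pools to the ComputeNode
            self.compute_node.pci_device_pools = self.pci_tracker.pools
            # 3) save the compute node to the DB
            self.compute_node.save(context)
            # 4) save the PCI tracker
            self.pci_tracker.save(context)

        def _update_to_placement(self, context):
            pass  # placement reporting elided in this sketch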
PCI devices which are allocated to instances can be removed from the
[pci]device_spec configuration or can be removed from the hypervisor
directly. The existing PciTracker code handles these cases by keeping
the PciDevice in the nova DB as existing and allocated, and by issuing
a warning in the logs during compute service startup that nova is in
an inconsistent state. Similar behavior is now added to the PCI
placement tracking code, so the PCI inventories and allocations in
placement are kept in such situations.
There is one case where we cannot simply accept the PCI device
reconfiguration by keeping the existing allocations and applying the
new config: when a PF that is configured and allocated is removed and
VFs from that PF are now configured in the [pci]device_spec, or vice
versa, when VFs are removed and their parent PF is configured. In
this case, keeping the existing inventory and allocations and adding
the new inventory to placement would result in a placement model where
a single PCI device provides both PF and VF inventories. This
dependent device configuration is not supported as it could lead to
double consumption. In such a situation the compute service will
refuse to start.
blueprint: pci-device-tracking-in-placement
Change-Id: Id130893de650cc2d38953cea7cf9f53af71ced93
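An illustrative check for the dependent PF/VF case described above;
this is only a sketch over assumed data shapes, not nova's actual
implementation:

    class DependentDeviceConfigError(Exception):
        pass


    def check_dependent_reconfiguration(allocated, configured, parent_of):
        """allocated/configured: sets of PCI addresses; parent_of maps a
        VF address to its parent PF address."""
        for addr in allocated - configured:
            # The allocated device disappeared from the config; refuse
            # to start if its VF children or its parent PF took its
            # place, since one physical device must not expose both PF
            # and VF inventories.
            children = {vf for vf, pf in parent_of.items() if pf == addr}
            if children & configured or parent_of.get(addr) in configured:
                raise DependentDeviceConfigError(
                    'dependent PF/VF reconfiguration of %s is not '
                    'supported' % addr)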
Same-host resize needs special handling in the allocation healing
logic, as both the source and the dest host PCI devices are visible to
the healing code, since PciDevice.instance_uuid points to the healed
instance in both cases.
blueprint: pci-device-tracking-in-placement
Change-Id: Id02e445c55fc956965b7d725f0260876d42422f2
During a normal update_available_resources run, if the local provider
tree cache is invalid (e.g. because the scheduler made an allocation
that bumped the generation of the RPs) and the virt driver tries to
update the inventory of an RP based on the cache, Placement will
report a conflict, the report client will invalidate the cache, and
the retry decorator on ResourceTracker._update_to_placement will
re-drive the update on top of the fresh RP data.
However, the same thing can happen during a reshape as well, but the
retry mechanism is missing from that code path, so a stale cache can
cause reshape failures.
This patch adds specific error handling in the reshape code path to
implement the same retry mechanism as exists for the inventory update.
blueprint: pci-device-tracking-in-placement
Change-Id: Ieb954a04e6aba827611765f7f401124a1fe298f3
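A sketch of the retry idea with placeholder exception and client
method names; the real code reuses the report client's conflict
handling:

    class PlacementConflict(Exception):
        pass


    def reshape_with_retry(reportclient, build_reshape_request, attempts=4):
        for attempt in range(attempts):
            try:
                return reportclient.reshape(build_reshape_request())
            except PlacementConflict:
                if attempt == attempts - 1:
                    raise
                # Drop the stale provider tree cache so the next attempt
                # rebuilds the request from fresh RP generations.
                reportclient.clear_provider_cache()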
A new PCI resource handler is added to the update_available_resources
code path to update the ProviderTree with PCI device RPs, inventories
and traits.
It is a bit different from the other Placement inventory reporters. It
does not run at the virt driver level, as PCI is tracked in a generic
way by the PCI tracker in the resource tracker, so the virt-specific
information is already parsed and abstracted by the resource tracker.
Another difference is that, to support rolling upgrades, the PCI
handler code needs to be prepared for situations where the scheduler
does not create PCI allocations even after some of the computes have
already started reporting inventories and healing PCI allocations. So
the code is not designed to do a single, one-shot reshape at startup,
but instead does a continuous healing of the allocations. We can
remove this continuous healing once the PCI prefilter is made
mandatory in a future release.
The whole PCI placement reporting behavior is disabled by default
while it is incomplete. When it is functionally complete, a new
[pci]report_in_placement config option will be added to allow enabling
the feature. That config option is intentionally not added by this
patch, as we don't want to allow enabling this logic yet.
blueprint: pci-device-tracking-in-placement
Change-Id: If975c3ec09ffa95f647eb4419874aa8417a59721
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, it may also cause us
to initialize the client fewer times, and it allows emitting a common
set of error messages about expected failures, for better
troubleshooting.
Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
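A minimal sketch of the shared-singleton pattern this describes,
assuming a `factory` callable that builds the real placement client;
the helper name here is illustrative:

    import logging
    import threading

    LOG = logging.getLogger(__name__)

    _client = None
    _lock = threading.Lock()


    def report_client_singleton(factory):
        """Return one shared placement client per process, built lazily."""
        global _client
        with _lock:
            if _client is None:
                try:
                    _client = factory()
                except Exception:
                    # One common, well-known message for expected failures
                    # (keystone down, bad endpoint, ...) aids troubleshooting.
                    LOG.exception('Failed to create the placement client')
                    raise
        return _client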
This change fixes typos in conf parameter help messages
and in an error log message.
Change-Id: Iedc268072d77771b208603e663b0ce9b94215eb8
`binding:profile` updates are handled differently for migration than
for instance creation, which was not taken into account previously.
The relevant fields (card_serial_number, pf_mac_address, vf_num) are
now added to the `binding:profile` after a new remote-managed PCI
device is determined at the destination node.
Likewise, the unshelve operation needs special handling, which is
fixed too.
Func testing:
* Allow the generated device XML to contain the PCI VPD capability;
* Add test cases for basic operations on instances with remote-managed
ports (tunnel or physical);
* Add a live migration test case similar to how it is done for
non-remote-managed SR-IOV ports but taking remote-managed port related
specifics into account;
* Add evacuate, shelve/unshelve, cold migration test cases.
Change-Id: I9a1532e9a98f89db69b9ae3b41b06318a43519b3
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED
ports: hosts that have neither the relevant compute driver capability
(COMPUTE_REMOTE_MANAGED_PORTS) nor PCI device pools with
"remote_managed" devices are filtered out early. The presence of
devices actually available for allocation is checked at a later point
by the PciPassthroughFilter.
Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
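A simplified request-filter sketch; real nova prefilters operate on a
RequestSpec and adjust the placement query, so the function shape here
is an assumption, while COMPUTE_REMOTE_MANAGED_PORTS is the capability
trait named above:

    def remote_managed_ports_prefilter(has_remote_managed_port,
                                       required_traits):
        """Require the capability trait when the request carries a
        VNIC_TYPE_REMOTE_MANAGED port, so hosts lacking it are filtered
        out by placement before the scheduler filters run."""
        if has_remote_managed_port:
            required_traits.add('COMPUTE_REMOTE_MANAGED_PORTS')
        return required_traits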
There is a race condition between an incoming resize and the
update_available_resource periodic in the resource tracker. The race
window starts when the resize_instance RPC finishes and ends when the
finish_resize compute RPC finally applies the migration context on the
instance.
In the race window, if the update_available_resource periodic runs on
the destination node, it will see the instance as being tracked on
this host, as instance.node already points to the dest. But
instance.numa_topology still points to the source host topology, as
the migration context is not applied yet. This leads to a CPU pinning
error if the source topology does not fit the dest topology. It also
stops the periodic task and leaves the tracker in an inconsistent
state. The inconsistent state is only cleaned up after the periodic
runs outside of the race window.
This patch applies the migration context temporarily to the affected
instances during the periodic to keep resource accounting correct.
Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
Closes-Bug: #1953359
Closes-Bug: #1952915
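The "apply temporarily" idea as a standalone sketch, using a plain
dict for the instance; nova has a similar helper on its Instance
object, but the names and shapes below are simplified assumptions:

    import contextlib


    @contextlib.contextmanager
    def applied_migration_context(instance, migration_context):
        """Swap in the dest-side NUMA topology for the duration of the
        periodic accounting, then restore the original value."""
        original = instance.get('numa_topology')
        if migration_context is not None:
            instance['numa_topology'] = migration_context['new_numa_topology']
        try:
            yield instance
        finally:
            instance['numa_topology'] = original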
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).
When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the
method GETs the allocations and, before it PUTs the empty allocations
to remove them, something changes, resulting in a conflict and the
server delete failing with a 409 error.
It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].
Closes-Bug: #1836754
[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html
Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
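The shape of the two code paths, sketched against a generic HTTP-style
client; the method names on `client` and the payload details are
assumptions rather than the exact report client API:

    def delete_allocation_for_instance(client, consumer_uuid, force=True):
        url = '/allocations/%s' % consumer_uuid
        if force:
            # Unconditional DELETE: no consumer generation is sent, so
            # there is no window for a 409 conflict.
            return client.delete(url)
        # Non-forced path: read-modify-write with the consumer
        # generation, which can legitimately fail with 409 if something
        # changes between the GET and the PUT.
        current = client.get(url)
        return client.put(url, {
            'allocations': {},
            'consumer_generation': current['consumer_generation'],
        })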
There is a race condition in nova-compute with the ironic virt driver
as nodes get rebalanced. It can lead to compute nodes being removed in
the DB and not repopulated. Ultimately this prevents these nodes from
being scheduled to.
The issue being addressed here is that if a compute node is deleted by a
host which thinks it is an orphan, then the resource provider for that
node might also be deleted. The compute host that owns the node might
not recreate the resource provider if it exists in the provider tree
cache.
This change fixes the issue by clearing resource providers from the
provider tree cache for which a compute node entry does not exist. Then,
when the available resource for the node is updated, the resource
providers are not found in the cache and get recreated in placement.
Change-Id: Ia53ff43e6964963cdf295604ba0fb7171389606e
Related-Bug: #1853009
Related-Bug: #1841481
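The cleanup step described above, reduced to a sketch; the callables
and collections are placeholders rather than the real ProviderTree or
report client API:

    def prune_stale_cached_providers(cached_rp_uuids, compute_node_uuids,
                                     remove_from_cache):
        """Forget cached providers whose compute node record is gone, so
        the next update_available_resource recreates them in placement."""
        for rp_uuid in set(cached_rp_uuids) - set(compute_node_uuids):
            remove_from_cache(rp_uuid)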
There is a race condition in nova-compute with the ironic virt driver as
nodes get rebalanced. It can lead to compute nodes being removed in the
DB and not repopulated. Ultimately this prevents these nodes from being
scheduled to.
The issue being addressed here is that if a compute node is deleted by a host
which thinks it is an orphan, then the compute host that actually owns the node
might not recreate it if the node is already in its resource tracker cache.
This change fixes the issue by clearing nodes from the resource tracker cache
for which a compute node entry does not exist. Then, when the available
resource for the node is updated, the compute node object is not found in the
cache and gets recreated.
Change-Id: I39241223b447fcc671161c370dbf16e1773b684a
Partial-Bug: #1853009
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.
Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
To implement the `socket` PCI NUMA affinity policy, we'll need to
track the host NUMA topology in the PCI stats code. To achieve this,
PCI stats will need to know the compute node it's running on. Prepare
for this by replacing the node_id parameter with compute_node. The
node_id parameter was previously optional, but that looks to have been
only to facilitate testing, as that's the only place where it was not
passed in. We use compute_node (instead of just making node_id
mandatory)
because it allows for an optimization later on wherein the PCI manager
does not need to pull the ComputeNode object from the database
needlessly.
Implements: blueprint pci-socket-affinity
Change-Id: Idc839312d1449e9327ee7e3793d53ed080a44d0c
NUMA-aware live migration and SRIOV live migration were implemented
as two separate features. As a consequence, the case where both SRIOV
and NUMA are present in the instance was missed. When the PCI device
is claimed on the destination host, the NUMA topology of the instance
needs to be passed to the claim call.
Change-Id: If469762b22d687151198468f0291821cebdf26b2
Closes-Bug: #1893221
The _update_available_resources periodic makes resource allocation
adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE, based on the
list of instances assigned to the host of the resource tracker and on
the migrations where the source or the target host is the host of the
resource tracker. So if the instance.host or the migration context
changes without holding the COMPUTE_RESOURCE_SEMAPHORE while the
_update_available_resources task is running, there will be data
inconsistency in the resource tracker.
This patch makes sure that during evacuation the instance.host and the
migration context are changed while holding the semaphore.
Change-Id: Ica180165184b319651d22fe77e076af036228860
Closes-Bug: #1896463
Another pretty trivial one. This one was intended to provide an overview
of instances that weren't properly tracked but were running on the host.
It was only ever implemented for the XenAPI driver so remove it now.
Change-Id: Icaba3fc89e3295200e3d165722a5c24ee070002c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Part of blueprint sriov-interface-attach-detach
Change-Id: Ifc5417a8eddf62ad49d898fa6c9c1da71c6e0bb3
As discussed in change I26b050c402f5721fc490126e9becb643af9279b4, the
resource tracker's periodic task is reliant on the status of migrations
to determine whether to include usage from these migrations in the
total, and races between setting the migration status and decrementing
resource usage via 'drop_move_claim' can result in incorrect usage.
That change tackled the confirm resize operation. This one changes the
revert resize operation, and is a little trickier due to kinks in how
both the same-cell and cross-cell resize revert operations work.
For same-cell resize revert, the 'ComputeManager.revert_resize'
function, running on the destination host, sets the migration status to
'reverted' before dropping the move claim. This exposes the same race
that we previously saw with the confirm resize operation. It then calls
back to 'ComputeManager.finish_revert_resize' on the source host to boot
up the instance itself. This is kind of weird, because, even ignoring
the race, we're marking the migration as 'reverted' before we've done
any of the necessary work on the source host.
The cross-cell resize revert splits dropping of the move claim and
setting of the migration status between the source and destination host
tasks. Specifically, we do cleanup on the destination and drop the move
claim first, via 'ComputeManager.revert_snapshot_based_resize_at_dest'
before resuming the instance and setting the migration status on the
source via
'ComputeManager.finish_revert_snapshot_based_resize_at_source'. This
would appear to avoid the weird quirk of same-cell migration, however,
in typical weird cross-cell fashion, these are actually different
instances and different migration records.
The solution is once again to move the setting of the migration status
and the dropping of the claim under 'COMPUTE_RESOURCE_SEMAPHORE'. This
introduces the weird setting of migration status before completion to
the cross-cell resize case and perpetuates it in the same-cell case, but
this seems like a suitable compromise to avoid attempts to do things
like unplugging already unplugged PCI devices or unpinning already
unpinned CPUs. From an end-user perspective, instance state changes are
what really matter and once a revert is completed on the destination
host and the instance has been marked as having returned to the source
host, hard reboots can help us resolve any remaining issues.
Change-Id: I29d6f4a78c0206385a550967ce244794e71cef6d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1879878
If rollback_live_migration failed, the migration status is set to
'error', and there might be some resources, like vpmem, that are not
cleaned up since the rollback did not complete. So we propose to track
those 'error' migrations in the resource tracker until they are
cleaned up by the periodic task '_cleanup_incomplete_migrations'.
Conversely, if rollback_live_migration succeeds, we need to set the
migration status to 'failed', which is not tracked in the resource
tracker. The 'failed' status is already used by resize to indicate a
migration that has finished its cleanup.
'_cleanup_incomplete_migrations' now also handles failed
rollback_live_migration cleanup, in addition to failed
resize/revert-resize cleanup.
In addition, we introduce a new 'cleanup_lingering_instance_resources'
virt driver interface to handle lingering instance resource cleanup,
including vpmem cleanup and whatever we add in the future.
Change-Id: I422a907056543f9bf95acbffdd2658438febf801
Partially-Implements: blueprint vpmem-enhancement
For attach:
* Generates an InstancePciRequest for SRIOV interface attach requests
* Claims and allocates a PciDevice for such a request
For detach:
* Frees the PciDevice and deletes the InstancePciRequests
On the libvirt driver side the following small fixes were necessary:
* Fixes PCI address generation to avoid double 0x prefixes in
  LibvirtConfigGuestHostdevPCI
* Adds support for comparing LibvirtConfigGuestHostdevPCI objects
* Extends the comparison of LibvirtConfigGuestInterface to support
  macvtap interfaces where target_dev is only known by libvirt but not
  by nova
* Generalizes guest.get_interface_by_cfg() to work with both
  LibvirtConfigGuest[Interface|HostdevPCI] objects
Implements: blueprint sriov-interface-attach-detach
Change-Id: I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481
The 'ResourceTracker.update_available_resource' periodic task builds
usage information for the current host by inspecting instances and
in-progress migrations, combining the two. Specifically, it finds all
instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state,
calculates the usage from these, then finds all in-progress migrations
for the host that don't have an associated instance (to prevent double
accounting) and includes the usage for these.
In addition to the periodic task, the 'ResourceTracker' class has a
number of helper functions to make or drop claims for the inventory
generated by the 'update_available_resource' periodic task as part of
the various instance operations. These helpers naturally assume that
when making a claim for a particular instance or migration, there
shouldn't already be resources allocated for same. Conversely, when
dropping claims, the resources should currently be allocated. However,
the check for *active* instances and *in-progress* migrations in the
periodic task means we have to be careful in how we make changes to a
given instance or migration record. Running the periodic task between
such an operation and an attempt to make or drop a claim can result in
TOCTOU-like races.
This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE'
semaphore to prevent the periodic task running while we're claiming
resources in helpers like 'ResourceTracker.instance_claim' and we make
our changes to the instances and migrations within this context. There
is one exception though: the 'drop_move_claim' helper. This function is
used when dropping a claim for either a cold migration, a resize or a
live migration, and will drop usage from either the source host (based
on the "old" flavor) for a resize confirm or the destination host (based
on the "new" flavor) for a resize revert or live migration rollback.
Unfortunately, while the function itself is wrapped in the semaphore,
no changes to the state of the instance or migration in question are
protected by it.
Consider the confirm resize case, which we're addressing here. If we
mark the migration as 'confirmed' before running 'drop_move_claim', then
the periodic task running between these steps will not account for the
usage on the source since the migration is allegedly 'confirmed'. The
call to 'drop_move_claim' will then result in the tracker dropping usage
that we're no longer accounting for. This "set migration status before
dropping usage" is the current behaviour for both same-cell and
cross-cell resize, via the 'ComputeManager.confirm_resize' and
'ComputeManager.confirm_snapshot_based_resize_at_source' functions,
respectively. We could reverse those calls and run 'drop_move_claim'
before marking the migration as 'confirmed', but while our usage will be
momentarily correct, the periodic task running between these steps will
re-add the usage we just dropped since the migration isn't yet
'confirmed'. The correct solution is to close this gap between setting
the migration status and dropping the move claim to zero. We do this by
putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just
like the claim operations.
Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
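A sketch of the resulting shape, using oslo.concurrency locking
directly and simplified attributes; the real change just makes sure
both steps happen while the resource tracker's semaphore is held:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'


    def confirm_and_drop(migration, tracked_usage, usage_to_drop):
        # Both the status change and the usage drop happen while
        # holding the same lock the periodic task takes, so the
        # periodic can never observe a 'confirmed' migration whose
        # old-flavor usage is still being tracked (or vice versa).
        with lockutils.lock(COMPUTE_RESOURCE_SEMAPHORE):
            migration['status'] = 'confirmed'
            tracked_usage -= usage_to_drop
            return tracked_usage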
This series implements the referenced blueprint to allow for specifying
custom resource provider traits and inventories via yaml config files.
This fourth commit adds the config option, release notes, documentation,
functional tests, and calls to the previously implemented functions in
order to load provider config files and merge them to the provider tree.
Change-Id: I59c5758c570acccb629f7010d3104e00d79976e4
Blueprint: provider-config-file
| |
This series implements the referenced blueprint to allow for specifying
custom resource provider traits and inventories via yaml config files.
This third commit includes functions on the provider tree to merge
additional inventories and traits to resource providers and update
those providers on the provider tree. Those functions are not currently
being called, but will be in a future commit.
Co-Author: Tony Su <tao.su@intel.com>
Author: Dustin Cowles <dustin.cowles@intel.com>
Blueprint: provider-config-file
Change-Id: I142a1f24ff2219cf308578f0236259d183785cff
We use these things in many places in the code, and it would be good
to have constants to reference. Do just that.
Note that this results in a change in the object hash. However, there
are no actual changes in the output object so that's okay.
Change-Id: If02567ce0a3431dda5b2bf6d398bbf7cc954eed0
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
1. Claim allocations from placement first, then claim specific
resources in Resource Tracker on destination to populate
migration_context.new_resources
3. cleanup specific resources when live migration succeeds/fails
Because we store specific resources in the migration_context during
live migration, to ensure correct cleanup we can't drop the
migration_context before cleanup is complete:
a) in post live migration, we move the source host cleanup before the
destination cleanup (post_live_migration_at_destination will apply the
migration_context and drop it)
b) when rolling back live migration, we drop the migration_context
after the rollback operations are complete
For different specific resources we might need driver-specific
support, such as vpmem. This change just ensures that newly claimed
specific resources are populated into the migration_context and that
the migration_context is not dropped before cleanup is complete.
Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory
When the resource tracker has to lock a compute host for updates or
inspection, it uses a single semaphore. In most cases this is fine, as
a compute process is only tracking one hypervisor. However, with
Ironic it is possible for one compute process to track many
hypervisors. In that case, wait queues for instance claims can get
"stuck" briefly behind longer processing loops such as the
update_resources periodic job. The reason this is possible is that the
oslo.concurrency 'lockutils.synchronized' helper does not use fair
locks by default. When a lock is released, one of the threads waiting
for the lock is randomly allowed to take it next. A fair lock ensures
that threads acquire the lock in the order in which they requested it.
This should ensure that instance claim requests do not have a chance of
losing the lock contest, which should ensure that instance build
requests do not queue unnecessarily behind long-running tasks.
This includes bumping the oslo.concurrency dependency; fair locks were
added in 3.29.0 (I37577becff4978bf643c65fa9bc2d78d342ea35a).
Change-Id: Ia5e521e0f0c7a78b5ace5de9f343e84d872553f9
Related-Bug: #1864122
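The gist of the change as a sketch; COMPUTE_RESOURCE_SEMAPHORE is the
lock name used throughout the resource tracker, and the decorated
function here is only a placeholder:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'


    # fair=True (available since oslo.concurrency 3.29.0) hands the lock
    # to waiters in FIFO order instead of waking an arbitrary waiter.
    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE, fair=True)
    def tracked_operation(do_work):
        # Instance claims queued behind a long periodic run are now
        # served in the order in which they arrived.
        return do_work()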
During creating or moving of an instance with qos SRIOV port the PCI
device claim on the destination compute needs to be restricted to select
PCI VFs from the same PF where the bandwidth for the qos port is
allocated from. This is achieved by updating the spec part of the
InstancePCIRequest with the device name of the PF by calling
update_pci_request_spec_with_allocated_interface_name(). Until now
such update of the instance object was directly persisted by the call.
During code review it came up that the instance.save() in the util is
not appropriate, as the caller has a lot more context to decide when
to persist the changes.
The original eager instance.save was introduced when support was added
to the server create flow. Now I realized that the need for such a
save was due to a mistake in the original
ResourceTracker.instance_claim() call that loads the
InstancePCIRequest from the DB instead of using the requests through
the passed-in instance object. By removing the extra DB call, the need
for eagerly persisting the PCI spec update is also removed. It turned
out that both the server create code path and every server move code
path eventually persist the instance object, either at the end of the
claim process or, in the case of live migration, in the
post_live_migration_at_destination compute manager call. This means
the code can now be simplified, especially for the live migration
cases.
In the live migration abort case we don't need to roll back the
eagerly persisted PCI change, as such a change is now only persisted
at the end of the migration, but we still need to refresh the
pci_requests field of the instance object during the rollback, as that
field might be stale and contain dest host related PCI information.
Also, in the case of rescheduling during live migration, if the
rescheduling failed, the PCI change had to be rolled back to the
source host by specific code. But now those changes are never
persisted until the migration finishes, so this rollback code can be
removed too.
Change-Id: Ied8f96b4e67f79498519931cb6b35dad5288bbb8
blueprint: support-move-ops-with-qos-ports-ussuri
When reverting a cross-cell resize, conductor will:
1. clean up the destination host
2. set instance.hidden=True and destroy the instance in the
target cell database
3. finish the revert on the source host which will revert the
allocations on the source host held by the migration record
so the instance will hold those again and drop the allocations
against the dest host which were held by the instance.
If the ResourceTracker.update_available_resource periodic task runs
between steps 2 and 3 it could see that the instance is deleted
from the target cell but there are still allocations held by it and
delete them. Step 3 is what handles deleting those allocations for
the destination node, so we want to leave it that way and take the
ResourceTracker out of the flow.
This change simply checks the instance.hidden value on the deleted
instance and, if hidden=True, assumes the allocations will be cleaned
up elsewhere (finish_revert_snapshot_based_resize_at_source).
Ultimately this is probably not something we *have* to have since
finish_revert_snapshot_based_resize_at_source is going to drop the
destination node allocations anyway, but it is good to keep clear
which actor is doing what in this process.
Part of blueprint cross-cell-resize
Change-Id: Idb82b056c39fd167864cadd205d624cb87cbe9cb
We have at least one use case [1] for identifying resource providers
which represent compute nodes. There are a few ways we could do that
hackishly (e.g. [2], [3]) but the clean way is to have nova-compute mark
the provider with a trait, since nova-compute knows which one it is
anyway.
This commit uses the COMPUTE_NODE trait for this purpose, and bumps the
os-traits requirement to 1.1.0 where it is introduced.
Arguably this is a no-op until something starts using it, but a release
note is added anyway warning that all compute nodes should be upgraded
to ussuri (or the trait added manually) for the trait to be useful.
[1] https://review.opendev.org/#/c/670112/7/nova/cmd/manage.py@2921
[2] Assume a provider with a certain resource class, like MEMORY_MB, is
always a compute node. This is not necessarily future-proof (maybe all
MEMORY_MB will someday reside on NUMA node providers; similar for other
resource classes) and isn't necessarily true in all cases today anyway
(ironic nodes don't have MEMORY_MB inventory) and there's also currently
no easy way to query for that (GET /resource_providers?MEMORY_MB:1 won't
return "full" providers, and you can't ask for :0).
[3] Assume a root provider without the MISC_SHARES_VIA_AGGREGATE trait
is a compute node. This assumes you're only using placement for nova-ish
things.
Change-Id: I4cb9cbe1e02c3f6c6148f73a38d10e8db7e61b1a
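A sketch of how a compute could mark its own root provider;
os_traits.COMPUTE_NODE is the trait introduced above, while the
provider-tree call shape here is a simplified assumption:

    import os_traits


    def tag_compute_node_provider(provider_tree, nodename):
        traits = set(provider_tree.data(nodename).traits)
        traits.add(os_traits.COMPUTE_NODE)
        provider_tree.update_traits(nodename, traits)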
This addresses some nits from that review related to
the tense in the docs and no longer valid code comments
in the resource tracker.
Change-Id: Idde7ef4e91d516b8f225118862e36feda4c8a9d4
With Ib62ac0b692eb92a2ed364ec9f486ded05def39ad and the
get_inventory method gone nothing uses this so we can
remove it now.
Change-Id: I3f55e09641465279b8b92551a2302219fe6fc5ca
In Train [1] we deprecated support for compute drivers
that did not implement the update_provider_tree method.
That compat code is now removed along with the get_inventory
method definition and (most) references to it.
As a result there are more things we can remove but those
will come in separate changes.
[1] I1eae47bce08f6292d38e893a2122289bcd6f4b58
Change-Id: Ib62ac0b692eb92a2ed364ec9f486ded05def39ad
During the instance claim the resource tracker sets the
instance host, node and launched_on values. If the build
fails the compute manager resets the host and node values
but was not clearing the launched_on field, so that is done
in this change.
Change-Id: I37c5475e66570415b46d0b75edc91547225fd818
|