path: root/nova/compute/resource_tracker.py
Commit log, newest first. Each entry lists the commit subject, author, commit date, and diffstat (files changed, -deleted/+added lines).
* Stop ignoring missing compute nodes in claims (Dan Smith, 2023-04-24; 1 file, -13/+25)
The resource tracker will silently ignore attempts to claim resources when the requested node is not managed by this host. The misleading "self.disabled(nodename)" check fails if the nodename is not known to the resource tracker, causing us to bail out early with a NopClaim. That means we also skip additional setup like creating a migration context for the instance, claiming resources in placement, and handling PCI/NUMA things. This behavior is quite old and clearly doesn't make sense in a world with things like placement.

The bulk of the test changes here are due to the fact that a lot of tests were relying on this silent ignoring of a mismatching node, because they were passing node names that weren't even tracked. This change makes us raise an error if this happens so that we can actually catch it, and avoid silently continuing with no resource claim.

Change-Id: I416126ee5d10428c296fe618aa877cca0e8dffcf
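A minimal sketch of the behavior change described above; the class and exception names here are illustrative, not the actual Nova patch:

    class NodeNotTracked(Exception):
        """Raised when a claim targets a node this tracker does not manage."""

    class ResourceTrackerSketch:
        def __init__(self):
            self.compute_nodes = {}  # nodename -> ComputeNode-like object

        def instance_claim(self, instance, nodename):
            if nodename not in self.compute_nodes:
                # Old behavior: silently return a NopClaim and skip the
                # migration context, placement claim and PCI/NUMA handling.
                # New behavior: fail loudly so callers can catch it.
                raise NodeNotTracked(nodename)
            # ... perform the real claim against self.compute_nodes[nodename]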
* Abort startup if nodename conflict is detected (Dan Smith, 2023-02-01; 1 file, -1/+7)
We do run update_available_resource() synchronously during service startup, but we only allow certain exceptions to abort startup. This makes us abort for InvalidConfiguration, and makes the resource tracker raise that when the compute node create fails due to a duplicate entry. This also modifies the object to raise a nova-specific error for that condition, to avoid the compute node object needing to import oslo_db just to be able to catch it.

Change-Id: I5de98e6fe52e45996bc2e1014fa8a09a2de53682
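A hedged sketch of the exception-translation pattern described above; apart from oslo_db's DBDuplicateEntry, the names (DuplicateRecord, InvalidConfiguration, db_api.compute_node_create) are illustrative stand-ins:

    from oslo_db import exception as db_exc

    class DuplicateRecord(Exception):
        """Nova-level stand-in so callers need not import oslo_db."""

    class InvalidConfiguration(Exception):
        """Raised to abort compute service startup."""

    def create_compute_node(db_api, values):
        try:
            # The object layer translates the DB-layer duplicate error.
            return db_api.compute_node_create(values)
        except db_exc.DBDuplicateEntry:
            raise DuplicateRecord(values['hypervisor_hostname'])

    def init_compute_node(db_api, values):
        try:
            return create_compute_node(db_api, values)
        except DuplicateRecord as exc:
            # During startup this aborts the service instead of being logged
            # and swallowed like most periodic-task errors.
            raise InvalidConfiguration(str(exc))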
* Make resource tracker use UUIDs instead of names (Dan Smith, 2023-01-30; 1 file, -52/+31)
This makes the resource tracker look up and create ComputeNode objects by uuid instead of nodename. For drivers like ironic that already provide 'uuid' in the resources dict, we can use that. For those that do not, we force the uuid to be the locally-persisted node uuid, and use that to find/create the ComputeNode object.

A (happy) side-effect of this is that if we find a deleted compute node object that matches that of our hypervisor, we undelete it instead of re-creating one with a new uuid, which may clash with our old one. This means we remove some of the special-casing of ironic rebalance, although the tests for that still largely stay the same.

Change-Id: I6a582a38c302fd1554a49abc38cfeda7c324d911
* Support same host resize with PCI in placement (Balazs Gibizer, 2022-12-21; 1 file, -11/+13)
Id02e445c55fc956965b7d725f0260876d42422f2 added a special case to the healing logic for same host resize. Now that the scheduler also creates allocations on the destination host during resize, we need to make sure that the drop_move_claim code that runs during revert and confirm drops the tracked migration from the resource tracker only after the healing logic has run, as the migrations being confirmed / reverted are still affecting PciDevices at that point.

blueprint: pci-device-tracking-in-placement
Change-Id: I6241965fe6c1cc1f2560fcce65d5e32ef308d502
* Map PCI pools to RP UUIDs (Balazs Gibizer, 2022-10-17; 1 file, -5/+11)
Nova's PCI scheduling (and the PCI claim) works on PCI device pools into which the similar available PCI devices are grouped. The PCI devices are now represented in placement as RPs, and the allocation candidates during scheduling and the allocation after scheduling now contain PCI devices. This information needs to affect the PCI scheduling and the PCI claim. To be able to do that we need to map PCI device pools to RPs. We achieve that here by first mapping PciDevice objects to RPs during placement PCI inventory reporting, and then mapping pools to RPs based on the PCI devices assigned to the pools.

Also, because the ResourceTracker._update_to_placement() call now updates the PCI device pools, the sequence of events in the ResourceTracker needed to change to (sketched below):
1) run _update_to_placement()
2) copy the pools to the ComputeNode object
3) save the compute node to the DB
4) save the PCI tracker

blueprint: pci-device-tracking-in-placement
Change-Id: I9bb450ac235ab72ff0d8078635e7a11c04ff6c1e
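A rough sketch of that re-ordered sequence, assuming a resource tracker object `rt` with a PCI tracker; the method and attribute names are approximations rather than Nova's exact API:

    def update_and_persist(rt, context, compute_node):
        # 1) report inventories (including PCI RPs) to placement; as a side
        #    effect the PCI pools get annotated with their RP UUIDs
        rt._update_to_placement(context, compute_node)
        # 2) copy the RP-annotated pools onto the ComputeNode object
        compute_node.pci_device_pools = rt.pci_tracker.stats.to_device_pools_obj()
        # 3) persist the compute node
        compute_node.save()
        # 4) persist the PCI tracker state
        rt.pci_tracker.save(context)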
* Handle PCI dev reconf with allocations (Balazs Gibizer, 2022-08-26; 1 file, -8/+16)
PCI devices which are allocated to instances can be removed from the [pci]device_spec configuration or can be removed from the hypervisor directly. The existing PciTracker code handles these cases by keeping the PciDevice in the nova DB as existing and allocated, and by issuing a warning in the logs during compute service startup that nova is in an inconsistent state. Similar behavior is now added to the PCI placement tracking code, so the PCI inventories and allocations in placement are kept in such a situation.

There is one case where we cannot simply accept the PCI device reconfiguration by keeping the existing allocations and applying the new config: when a PF that is configured and allocated is removed and VFs from this PF are now configured in the [pci]device_spec, and vice versa, when VFs are removed and their parent PF is configured. In this case, keeping the existing inventory and allocations and adding the new inventory to placement would result in a placement model where a single PCI device provides both PF and VF inventories. This dependent device configuration is not supported as it could lead to double consumption. In such a situation the compute service will refuse to start.

blueprint: pci-device-tracking-in-placement
Change-Id: Id130893de650cc2d38953cea7cf9f53af71ced93
* Heal allocation for same host resize (Balazs Gibizer, 2022-08-26; 1 file, -1/+11)
Same host resize needs special handling in the allocation healing logic, as both the source and the dest host PCI devices are visible to the healing code: PciDevice.instance_uuid points to the healed instance in both cases.

blueprint: pci-device-tracking-in-placement
Change-Id: Id02e445c55fc956965b7d725f0260876d42422f2
* Retry /reshape at provider generation conflict (Balazs Gibizer, 2022-08-25; 1 file, -3/+10)
During a normal update_available_resource run, if the local provider tree cache is invalid (e.g. because the scheduler made an allocation, bumping the generation of the RPs) and the virt driver tries to update the inventory of an RP based on the cache, Placement will report a conflict, the report client will invalidate the cache, and the retry decorator on ResourceTracker._update_to_placement will re-drive the update on top of the fresh RP data.

However, the same thing can happen during a reshape as well, but the retry mechanism is missing in that code path, so the stale cache can cause reshape failures. This patch adds specific error handling in the reshape code path to implement the same retry mechanism as exists for inventory updates.

blueprint: pci-device-tracking-in-placement
Change-Id: Ieb954a04e6aba827611765f7f401124a1fe298f3
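A simplified sketch of the retry-on-conflict pattern described above; the exception class and the `_reshape` helper are illustrative, not Nova's exact names:

    class PlacementGenerationConflict(Exception):
        """Placement returned 409 because a provider generation was stale."""

    def reshape_with_retry(report_client, reshape_request, max_attempts=4):
        for attempt in range(1, max_attempts + 1):
            try:
                # Push the reshaped inventories/allocations to placement.
                return report_client._reshape(reshape_request)
            except PlacementGenerationConflict:
                # The report client has already invalidated its provider tree
                # cache; retry so the next attempt uses fresh generations.
                if attempt == max_attempts:
                    raise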
* Basics for PCI Placement reporting (Balazs Gibizer, 2022-08-25; 1 file, -9/+35)
A new PCI resource handler is added to the update_available_resources code path to update the ProviderTree with PCI device RPs, inventories and traits. It is a bit different from the other Placement inventory reporters. It does not run at the virt driver level, as PCI is tracked in a generic way in the PCI tracker in the resource tracker, so the virt-specific information is already parsed and abstracted by the resource tracker.

Another difference is that, to support rolling upgrade, the PCI handler code needs to be prepared for situations where the scheduler does not create PCI allocations even after some of the computes have already started reporting inventories and healing PCI allocations. So the code does not do a single, one-shot reshape at startup, but instead does continuous healing of the allocations. We can remove this continuous healing after the PCI prefilter is made mandatory in a future release.

The whole PCI placement reporting behavior is disabled by default while it is incomplete. When it is functionally complete, a new [pci]report_in_placement config option will be added to allow enabling the feature. This config is intentionally not added by this patch as we don't want to allow enabling this logic yet.

blueprint: pci-device-tracking-in-placement
Change-Id: If975c3ec09ffa95f647eb4419874aa8417a59721
* Unify placement client singleton implementations (Dan Smith, 2022-08-18; 1 file, -1/+1)
We have many places where we implement singleton behavior for the placement client. This unifies them into a single place and implementation. Not only does this DRY things up, but it may cause us to initialize it fewer times and also allows for emitting a common set of error messages about expected failures for better troubleshooting.

Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
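A minimal sketch of one common way to implement such a singleton; the factory argument stands in for the real report client constructor and this is not Nova's actual helper:

    import threading

    _placement_client = None
    _placement_lock = threading.Lock()

    def get_placement_client(report_client_factory):
        """Return a shared placement client, creating it on first use."""
        global _placement_client
        if _placement_client is None:
            with _placement_lock:
                if _placement_client is None:
                    _placement_client = report_client_factory()
        return _placement_client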
* Fix typos in help messages (Rajesh Tailor, 2022-05-30; 1 file, -1/+1)
This change fixes typos in conf parameter help messages and in an error log message.

Change-Id: Iedc268072d77771b208603e663b0ce9b94215eb8
* Fix migration with remote-managed ports & add FT (Dmitrii Shcherbakov, 2022-03-04; 1 file, -1/+6)
`binding:profile` updates are handled differently for migration than for instance creation, which was not taken into account previously. Relevant fields (card_serial_number, pf_mac_address, vf_num) are now added to the `binding:profile` after a new remote-managed PCI device is determined at the destination node. Likewise, there is special handling for the unshelve operation, which is fixed too.

Func testing:
* Allow the generated device XML to contain the PCI VPD capability;
* Add test cases for basic operations on instances with remote-managed ports (tunnel or physical);
* Add a live migration test case similar to how it is done for non-remote-managed SR-IOV ports, but taking remote-managed port related specifics into account;
* Add evacuate, shelve/unshelve and cold migration test cases.

Change-Id: I9a1532e9a98f89db69b9ae3b41b06318a43519b3
* Filter computes without remote-managed ports early (Dmitrii Shcherbakov, 2022-02-09; 1 file, -1/+27)
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED ports: hosts that do not have either the relevant compute driver capability COMPUTE_REMOTE_MANAGED_PORTS or PCI device pools with "remote_managed" devices are filtered out early. The presence of devices actually available for allocation is checked at a later point by the PciPassthroughFilter.

Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
* [rt] Apply migration context for incoming migrations (Balazs Gibizer, 2021-12-07; 1 file, -4/+31)
There is a race condition between an incoming resize and the update_available_resource periodic in the resource tracker. The race window starts when the resize_instance RPC finishes and ends when the finish_resize compute RPC finally applies the migration context on the instance.

In the race window, if the update_available_resource periodic is run on the destination node, then it will see the instance as being tracked on this host as instance.node already points to the dest. But instance.numa_topology still points to the source host topology as the migration context is not applied yet. This leads to a CPU pinning error if the source topology does not fit the dest topology. It also stops the periodic task and leaves the tracker in an inconsistent state. The inconsistent state is only cleaned up after the periodic is run outside of the race window.

This patch applies the migration context temporarily to the specific instances during the periodic to keep resource accounting correct.

Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
Closes-Bug: #1953359
Closes-Bug: #1952915
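One way to picture "applying the migration context temporarily" is a context manager that swaps in the incoming NUMA topology for the duration of the accounting pass; this is a hedged sketch with assumed attribute names, not the actual Nova helper:

    import contextlib

    @contextlib.contextmanager
    def applied_migration_context(instance):
        """Temporarily use the NUMA topology from the migration context."""
        original = instance.numa_topology
        if instance.migration_context is not None:
            instance.numa_topology = instance.migration_context.new_numa_topology
        try:
            yield instance
        finally:
            # Restore the persisted topology once accounting is done.
            instance.numa_topology = original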
* Add force kwarg to delete_allocation_for_instance (Matt Riedemann, 2021-08-30; 1 file, -3/+7)
This adds a force kwarg to delete_allocation_for_instance which defaults to True because that was found to be the most common use case by a significant margin during implementation of this patch. In most cases, this method is called when we want to delete the allocations because they should be gone, e.g. server delete, failed build, or shelve offload. The alternative in these cases is the caller could trap the conflict error and retry, but we might as well just force the delete in that case (it's cleaner).

When force=True, it will DELETE the consumer allocations rather than GET and PUT with an empty allocations dict and the consumer generation, which can result in a 409 conflict from Placement. For example, bug 1836754 shows that in one tempest test that creates a server and then immediately deletes it, we can hit a very tight window where the method GETs the allocations and, before it PUTs the empty allocations to remove them, something changes which results in a conflict and the server delete fails with a 409 error.

It's worth noting that delete_allocation_for_instance used to just DELETE the allocations before Stein [1] when we started taking consumer generations into account. There was also a related mailing list thread [2].

Closes-Bug: #1836754
[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html
Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
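A hedged sketch of the two code paths described above, written against a generic HTTP client rather than Nova's report client; the payload shape is illustrative:

    def delete_allocation_for_instance(client, consumer_uuid, force=True):
        if force:
            # Unconditional DELETE of the consumer's allocations; no consumer
            # generation is sent, so no 409 conflict is possible.
            client.delete('/allocations/%s' % consumer_uuid)
        else:
            # GET the allocations (including the consumer generation), then
            # PUT an empty allocation set; a concurrent change to the consumer
            # results in a 409 conflict from Placement.
            current = client.get('/allocations/%s' % consumer_uuid).json()
            client.put('/allocations/%s' % consumer_uuid,
                       json={'allocations': {},
                             'consumer_generation': current['consumer_generation'],
                             'project_id': current['project_id'],
                             'user_id': current['user_id']})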
* Invalidate provider tree when compute node disappears (Mark Goddard, 2021-08-12; 1 file, -0/+1)
There is a race condition in nova-compute with the ironic virt driver as nodes get rebalanced. It can lead to compute nodes being removed in the DB and not repopulated. Ultimately this prevents these nodes from being scheduled to.

The issue being addressed here is that if a compute node is deleted by a host which thinks it is an orphan, then the resource provider for that node might also be deleted. The compute host that owns the node might not recreate the resource provider if it exists in the provider tree cache.

This change fixes the issue by clearing resource providers from the provider tree cache for which a compute node entry does not exist. Then, when the available resource for the node is updated, the resource providers are not found in the cache and get recreated in placement.

Change-Id: Ia53ff43e6964963cdf295604ba0fb7171389606e
Related-Bug: #1853009
Related-Bug: #1841481
* Clear rebalanced compute nodes from resource tracker (Stephen Finucane, 2021-08-12; 1 file, -0/+17)
There is a race condition in nova-compute with the ironic virt driver as nodes get rebalanced. It can lead to compute nodes being removed in the DB and not repopulated. Ultimately this prevents these nodes from being scheduled to.

The issue being addressed here is that if a compute node is deleted by a host which thinks it is an orphan, then the compute host that actually owns the node might not recreate it if the node is already in its resource tracker cache.

This change fixes the issue by clearing nodes from the resource tracker cache for which a compute node entry does not exist. Then, when the available resource for the node is updated, the compute node object is not found in the cache and gets recreated.

Change-Id: I39241223b447fcc671161c370dbf16e1773b684a
Partial-Bug: #1853009
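A small sketch of the cache-pruning idea from this and the previous entry; treating `rt.compute_nodes` as a nodename-keyed dict is an assumption here:

    def prune_stale_compute_nodes(rt, nodenames_in_db):
        """Drop cached ComputeNode entries whose DB records no longer exist."""
        for nodename in set(rt.compute_nodes) - set(nodenames_in_db):
            # On the next update_available_resource run the node is missing
            # from the cache, so its ComputeNode record gets recreated.
            del rt.compute_nodes[nodename]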
* Remove (almost) all references to 'instance_type' (Stephen Finucane, 2021-03-29; 1 file, -56/+55)
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and removes all other references to 'instance_type' where it's possible to do so. The only things left are DB columns, o.vo fields, some unversioned objects, and RPC API methods. If we want to remove these, we can, but it's a lot more work.

Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* pci manager: replace node_id parameter with compute_node (Artom Lifshitz, 2021-03-08; 1 file, -2/+1)
To implement the `socket` PCI NUMA affinity policy, we'll need to track the host NUMA topology in the PCI stats code. To achieve this, PCI stats will need to know the compute node it's running on. Prepare for this by replacing the node_id parameter with compute_node. node_id was previously optional, but that looks to have been only to facilitate testing, as that's the only place where it was not passed in. We use compute_node (instead of just making node_id mandatory) because it allows for an optimization later on wherein the PCI manager does not need to pull the ComputeNode object from the database needlessly.

Implements: blueprint pci-socket-affinity
Change-Id: Idc839312d1449e9327ee7e3793d53ed080a44d0c
* Make PCI claim NUMA aware during live migration (Balazs Gibizer, 2020-11-24; 1 file, -3/+1)
NUMA aware live migration and SRIOV live migration were implemented as two separate features. As a consequence, the case when both SRIOV and NUMA are present in the instance was missed. When the PCI device is claimed on the destination host, the NUMA topology of the instance needs to be passed to the claim call.

Change-Id: If469762b22d687151198468f0291821cebdf26b2
Closes-Bug: #1893221
* Merge "Set instance host and drop migration under lock" (Zuul, 2020-11-18; 1 file, -0/+18)
* Set instance host and drop migration under lock (Balazs Gibizer, 2020-11-04; 1 file, -0/+18)
The _update_available_resources periodic makes resource allocation adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE, based on the list of instances assigned to the resource tracker's host and on the migrations where the source or the target host is the resource tracker's host. So if the instance.host or the migration context changes without holding the COMPUTE_RESOURCE_SEMAPHORE while the _update_available_resources task is running, there will be data inconsistency in the resource tracker.

This patch makes sure that during evacuation the instance.host and the migration context are changed while holding the semaphore.

Change-Id: Ica180165184b319651d22fe77e076af036228860
Closes-Bug: #1896463
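A hedged sketch of doing that bookkeeping under the same lock the periodic task takes, using oslo.concurrency; the semaphore name and the function are illustrative, not the exact Nova code:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
    def finish_evacuation(instance, host, node, migration):
        # Because _update_available_resources takes the same semaphore, it can
        # never observe a half-updated instance/migration pair.
        instance.host = host
        instance.node = node
        instance.save()
        if migration is not None:
            migration.status = 'done'
            migration.save()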
* Merge "virt: Remove 'get_per_instance_usage' API" (Zuul, 2020-11-09; 1 file, -40/+1)
* virt: Remove 'get_per_instance_usage' API (Stephen Finucane, 2020-09-11; 1 file, -40/+1)
Another pretty trivial one. This one was intended to provide an overview of instances that weren't properly tracked but were running on the host. It was only ever implemented for the XenAPI driver, so remove it now.

Change-Id: Icaba3fc89e3295200e3d165722a5c24ee070002c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* Merge "Follow up for I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481" (Zuul, 2020-09-11; 1 file, -1/+2)
* Follow up for I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481 (Balazs Gibizer, 2020-09-11; 1 file, -1/+2)
Part of blueprint sriov-interface-attach-detach

Change-Id: Ifc5417a8eddf62ad49d898fa6c9c1da71c6e0bb3
* Merge "Move revert resize under semaphore" (Zuul, 2020-09-11; 1 file, -0/+18)
* Move revert resize under semaphore (Stephen Finucane, 2020-09-03; 1 file, -0/+18)
As discussed in change I26b050c402f5721fc490126e9becb643af9279b4, the resource tracker's periodic task is reliant on the status of migrations to determine whether to include usage from these migrations in the total, and races between setting the migration status and decrementing resource usage via 'drop_move_claim' can result in incorrect usage. That change tackled the confirm resize operation. This one changes the revert resize operation, and is a little trickier due to kinks in how both the same-cell and cross-cell resize revert operations work.

For same-cell resize revert, the 'ComputeManager.revert_resize' function, running on the destination host, sets the migration status to 'reverted' before dropping the move claim. This exposes the same race that we previously saw with the confirm resize operation. It then calls back to 'ComputeManager.finish_revert_resize' on the source host to boot up the instance itself. This is kind of weird, because, even ignoring the race, we're marking the migration as 'reverted' before we've done any of the necessary work on the source host.

The cross-cell resize revert splits dropping of the move claim and setting of the migration status between the source and destination host tasks. Specifically, we do cleanup on the destination and drop the move claim first, via 'ComputeManager.revert_snapshot_based_resize_at_dest', before resuming the instance and setting the migration status on the source via 'ComputeManager.finish_revert_snapshot_based_resize_at_source'. This would appear to avoid the weird quirk of same-cell migration, however, in typical weird cross-cell fashion, these are actually different instances and different migration records.

The solution is once again to move the setting of the migration status and the dropping of the claim under 'COMPUTE_RESOURCE_SEMAPHORE'. This introduces the weird setting of migration status before completion to the cross-cell resize case and perpetuates it in the same-cell case, but this seems like a suitable compromise to avoid attempts to do things like unplugging already unplugged PCI devices or unpinning already unpinned CPUs. From an end-user perspective, instance state changes are what really matter and once a revert is completed on the destination host and the instance has been marked as having returned to the source host, hard reboots can help us resolve any remaining issues.

Change-Id: I29d6f4a78c0206385a550967ce244794e71cef6d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1879878
* Merge "Track error migrations in resource tracker" (Zuul, 2020-09-11; 1 file, -3/+8)
* Track error migrations in resource tracker (LuyaoZhong, 2020-09-10; 1 file, -3/+8)
If rollback_live_migration failed, the migration status is set to 'error', and there might be some resources that are not cleaned up, like vpmem, since the rollback is not complete. So we propose to track those 'error' migrations in the resource tracker until they are cleaned up by the periodic task '_cleanup_incomplete_migrations'.

So if rollback_live_migration succeeds, we need to set the migration status to 'failed', which will not be tracked in the resource tracker. The 'failed' status is already used for resize to indicate a migration that has finished its cleanup. '_cleanup_incomplete_migrations' will also handle failed rollback_live_migration cleanup, except for failed resize/revert-resize.

Besides, we introduce a new 'cleanup_lingering_instance_resources' virt driver interface to handle lingering instance resource cleanup, including vpmem cleanup and whatever we add in the future.

Change-Id: I422a907056543f9bf95acbffdd2658438febf801
Partially-Implements: blueprint vpmem-enhancement
* Merge "Support SRIOV interface attach and detach" (Zuul, 2020-09-10; 1 file, -2/+18)
* Support SRIOV interface attach and detach (Balazs Gibizer, 2020-09-10; 1 file, -2/+18)
For attach:
* Generates an InstancePciRequest for SRIOV interface attach requests
* Claims and allocates a PciDevice for such a request

For detach:
* Frees the PciDevice and deletes the InstancePciRequest

On the libvirt driver side the following small fixes were necessary:
* Fixes PCI address generation to avoid double 0x prefixes in LibvirtConfigGuestHostdevPCI
* Adds support for comparing LibvirtConfigGuestHostdevPCI objects
* Extends the comparison of LibvirtConfigGuestInterface to support macvtap interfaces where target_dev is only known by libvirt but not by nova
* Generalizes guest.get_interface_by_cfg() to work with both LibvirtConfigGuest[Interface|HostdevPCI] objects

Implements: blueprint sriov-interface-attach-detach
Change-Id: I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481
* Move confirm resize under semaphore (Stephen Finucane, 2020-09-03; 1 file, -0/+22)
The 'ResourceTracker.update_available_resource' periodic task builds usage information for the current host by inspecting instances and in-progress migrations, combining the two. Specifically, it finds all instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state, calculates the usage from these, then finds all in-progress migrations for the host that don't have an associated instance (to prevent double accounting) and includes the usage for these.

In addition to the periodic task, the 'ResourceTracker' class has a number of helper functions to make or drop claims for the inventory generated by the 'update_available_resource' periodic task as part of the various instance operations. These helpers naturally assume that when making a claim for a particular instance or migration, there shouldn't already be resources allocated for same. Conversely, when dropping claims, the resources should currently be allocated.

However, the check for *active* instances and *in-progress* migrations in the periodic task means we have to be careful in how we make changes to a given instance or migration record. Running the periodic task between such an operation and an attempt to make or drop a claim can result in TOCTOU-like races.

This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE' semaphore to prevent the periodic task running while we're claiming resources in helpers like 'ResourceTracker.instance_claim', and we make our changes to the instances and migrations within this context. There is one exception though: the 'drop_move_claim' helper. This function is used when dropping a claim for either a cold migration, a resize or a live migration, and will drop usage from either the source host (based on the "old" flavor) for a resize confirm or the destination host (based on the "new" flavor) for a resize revert or live migration rollback. Unfortunately, while the function itself is wrapped in the semaphore, no changes to the state of the instance or migration in question are protected by it.

Consider the confirm resize case, which we're addressing here. If we mark the migration as 'confirmed' before running 'drop_move_claim', then the periodic task running between these steps will not account for the usage on the source since the migration is allegedly 'confirmed'. The call to 'drop_move_claim' will then result in the tracker dropping usage that we're no longer accounting for. This "set migration status before dropping usage" is the current behaviour for both same-cell and cross-cell resize, via the 'ComputeManager.confirm_resize' and 'ComputeManager.confirm_snapshot_based_resize_at_source' functions, respectively. We could reverse those calls and run 'drop_move_claim' before marking the migration as 'confirmed', but while our usage will be momentarily correct, the periodic task running between these steps will re-add the usage we just dropped since the migration isn't yet 'confirmed'.

The correct solution is to close the gap between setting the migration status and dropping the move claim to zero. We do this by putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just like the claim operations.

Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
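A condensed sketch of the fix described above: the status change and the usage drop happen inside one critical section guarded by the same semaphore the periodic task uses. Apart from that idea, the names are illustrative (`_drop_move_claim` stands in for the unlocked internal helper):

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
    def confirm_resize_usage(rt, context, instance, migration, old_flavor):
        # The periodic task cannot run between these two steps, so it never
        # sees a 'confirmed' migration whose usage has not yet been dropped.
        migration.status = 'confirmed'
        migration.save()
        rt._drop_move_claim(context, instance, migration.source_node, old_flavor)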
* Provider Config File: Enable loading and merging of provider configs (Dustin Cowles, 2020-08-26; 1 file, -1/+26)
This series implements the referenced blueprint to allow for specifying custom resource provider traits and inventories via yaml config files. This fourth commit adds the config option, release notes, documentation, functional tests, and calls to the previously implemented functions in order to load provider config files and merge them into the provider tree.

Change-Id: I59c5758c570acccb629f7010d3104e00d79976e4
Blueprint: provider-config-file
* Provider Config File: Functions to merge provider configs to provider tree (Dustin Cowles, 2020-08-26; 1 file, -0/+134)
This series implements the referenced blueprint to allow for specifying custom resource provider traits and inventories via yaml config files. This third commit includes functions on the provider tree to merge additional inventories and traits to resource providers and update those providers on the provider tree. Those functions are not currently being called, but will be in a future commit.

Co-Author: Tony Su <tao.su@intel.com>
Author: Dustin Cowles <dustin.cowles@intel.com>
Blueprint: provider-config-file
Change-Id: I142a1f24ff2219cf308578f0236259d183785cff
* objects: Add MigrationTypeField (Stephen Finucane, 2020-05-08; 1 file, -10/+13)
We use these things in many places in the code and it would be good to have constants to reference. Do just that. Note that this results in a change in the object hash. However, there are no actual changes in the output object, so that's okay.

Change-Id: If02567ce0a3431dda5b2bf6d398bbf7cc954eed0
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* partial support for live migration with specific resources (LuyaoZhong, 2020-04-07; 1 file, -4/+3)
1. Claim allocations from placement first, then claim specific resources in the Resource Tracker on the destination to populate migration_context.new_resources
3. Clean up specific resources when live migration succeeds/fails

Because we store specific resources in the migration_context during live migration, to ensure correct cleanup we can't drop the migration_context before cleanup is complete:
a) in post live migration, we move source host cleanup before destination cleanup (post_live_migration_at_destination will apply the migration_context and drop it)
b) when rolling back live migration, we drop the migration_context after the rollback operations are complete

For different specific resources we might need driver-specific support, such as vpmem. This change just ensures that newly claimed specific resources are populated into the migration_context and that the migration_context is not dropped before cleanup is complete.

Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory
* Use fair locks in resource tracker (Jason Anderson, 2020-03-09; 1 file, -12/+12)
When the resource tracker has to lock a compute host for updates or inspection, it uses a single semaphore. In most cases this is fine, as a compute process is only tracking one hypervisor. However, in Ironic, it's possible for one compute process to track many hypervisors. In this case, wait queues for instance claims can get "stuck" briefly behind longer processing loops such as the update_resources periodic job.

The reason this is possible is that the oslo.concurrency lockutils 'synchronized' helper does not use fair locks by default. When a lock is released, one of the threads waiting for the lock is randomly allowed to take the lock next. A fair lock ensures that the thread that next requested the lock will be allowed to take it. This should ensure that instance claim requests do not have a chance of losing the lock contest, which should ensure that instance build requests do not queue unnecessarily behind long-running tasks.

This includes bumping the oslo.concurrency dependency; fair locks were added in 3.29.0 (I37577becff4978bf643c65fa9bc2d78d342ea35a).

Change-Id: Ia5e521e0f0c7a78b5ace5de9f343e84d872553f9
Related-Bug: #1864122
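A small sketch of what the fair-lock switch looks like with oslo.concurrency (fair=True has been supported since 3.29.0, as noted above); the function body and semaphore name are illustrative:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE, fair=True)
    def instance_claim(rt, context, instance, nodename):
        # With fair=True, waiters acquire the lock in FIFO order, so a claim
        # queued behind a long update_available_resource run is served next
        # rather than randomly losing the wake-up to another waiter.
        return rt._do_instance_claim(context, instance, nodename)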
* Remove extra instance.save() calls related to qos SRIOV ports (Balazs Gibizer, 2020-02-03; 1 file, -2/+1)
During creation or moving of an instance with a qos SRIOV port, the PCI device claim on the destination compute needs to be restricted to select PCI VFs from the same PF that the bandwidth for the qos port is allocated from. This is achieved by updating the spec part of the InstancePCIRequest with the device name of the PF by calling update_pci_request_spec_with_allocated_interface_name(). Until now, such an update of the instance object was directly persisted by that call.

During code review it came up that the instance.save() in the util is not appropriate, as the caller has a lot more context to decide when to persist the changes. The original eager instance.save() was introduced when support was added to the server create flow. Now I realized that the need for such a save was due to a mistake in the original ResourceTracker.instance_claim() call that loads the InstancePCIRequest from the DB instead of using the requests through the passed-in instance object. By removing the extra DB call, the need for eagerly persisting the PCI spec update is also removed.

It turned out that both the server create code path and every server move code path eventually persist the instance object, either at the end of the claim process or, in the case of live migration, in the post_live_migration_at_destination compute manager call. This means that the code can now be simplified, especially the live migration cases.

In the live migrate abort case we don't need to roll back the eagerly persisted PCI change, as such a change is now only persisted at the end of the migration, but we still need to refresh the pci_requests field of the instance object during the rollback as that field might be stale, containing dest host related PCI information.

Also, in the case of rescheduling during live migration, if the rescheduling failed the PCI change needed to be rolled back to the source host by specific code. Now those changes are never persisted until the migration finishes, so this rollback code can be removed too.

Change-Id: Ied8f96b4e67f79498519931cb6b35dad5288bbb8
blueprint: support-move-ops-with-qos-ports-ussuri
* Deal with cross-cell resize in _remove_deleted_instances_allocations (Matt Riedemann, 2019-12-12; 1 file, -3/+9)
When reverting a cross-cell resize, conductor will:

1. clean up the destination host
2. set instance.hidden=True and destroy the instance in the target cell database
3. finish the revert on the source host, which will revert the allocations on the source host held by the migration record (so the instance will hold those again) and drop the allocations against the dest host which were held by the instance

If the ResourceTracker.update_available_resource periodic task runs between steps 2 and 3, it could see that the instance is deleted from the target cell but there are still allocations held by it, and delete them. Step 3 is what handles deleting those allocations for the destination node, so we want to leave it that way and take the ResourceTracker out of the flow.

This change simply checks the instance.hidden value on the deleted instance and, if hidden=True, assumes the allocations will be cleaned up elsewhere (finish_revert_snapshot_based_resize_at_source).

Ultimately this is probably not something we *have* to have since finish_revert_snapshot_based_resize_at_source is going to drop the destination node allocations anyway, but it is good to keep clear which actor is doing what in this process.

Part of blueprint cross-cell-resize
Change-Id: Idb82b056c39fd167864cadd205d624cb87cbe9cb
* Merge "Always trait the compute node RP with COMPUTE_NODE" (Zuul, 2019-11-15; 1 file, -0/+5)
* Always trait the compute node RP with COMPUTE_NODE (Eric Fried, 2019-10-21; 1 file, -0/+5)
We have at least one use case [1] for identifying resource providers which represent compute nodes. There are a few ways we could do that hackishly (e.g. [2], [3]) but the clean way is to have nova-compute mark the provider with a trait, since nova-compute knows which one it is anyway. This commit uses the COMPUTE_NODE trait for this purpose, and bumps the os-traits requirement to 1.1.0 where it is introduced.

Arguably this is a no-op until something starts using it, but a release note is added anyway warning that all compute nodes should be upgraded to ussuri (or the trait added manually) for the trait to be useful.

[1] https://review.opendev.org/#/c/670112/7/nova/cmd/manage.py@2921
[2] Assume a provider with a certain resource class, like MEMORY_MB, is always a compute node. This is not necessarily future-proof (maybe all MEMORY_MB will someday reside on NUMA node providers; similar for other resource classes) and isn't necessarily true in all cases today anyway (ironic nodes don't have MEMORY_MB inventory), and there's also currently no easy way to query for that (GET /resource_providers?MEMORY_MB:1 won't return "full" providers, and you can't ask for :0).
[3] Assume a root provider without the MISC_SHARES_VIA_AGGREGATE trait is a compute node. This assumes you're only using placement for nova-ish things.

Change-Id: I4cb9cbe1e02c3f6c6148f73a38d10e8db7e61b1a
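A hedged sketch of stamping the compute node provider with the trait during a provider tree update; it assumes a ProviderTree-like object exposing data() and update_traits(), which may differ from Nova's exact internal API:

    import os_traits

    def ensure_compute_node_trait(provider_tree, nodename):
        # Add COMPUTE_NODE to whatever traits the root provider already has,
        # so other services can identify compute node RPs in placement.
        current = set(provider_tree.data(nodename).traits)
        provider_tree.update_traits(nodename, current | {os_traits.COMPUTE_NODE})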
* Merge "FUP for Ib62ac0b692eb92a2ed364ec9f486ded05def39ad" (Zuul, 2019-11-15; 1 file, -7/+7)
* FUP for Ib62ac0b692eb92a2ed364ec9f486ded05def39ad (Matt Riedemann, 2019-11-08; 1 file, -7/+7)
This addresses some nits from that review related to the tense in the docs and to no-longer-valid code comments in the resource tracker.

Change-Id: Idde7ef4e91d516b8f225118862e36feda4c8a9d4
* Merge "Delete _normalize_inventory_from_cn_obj" (Zuul, 2019-11-14; 1 file, -47/+0)
* Delete _normalize_inventory_from_cn_obj (Matt Riedemann, 2019-11-07; 1 file, -47/+0)
With Ib62ac0b692eb92a2ed364ec9f486ded05def39ad and the get_inventory method gone, nothing uses this, so we can remove it now.

Change-Id: I3f55e09641465279b8b92551a2302219fe6fc5ca
* Merge "Drop compat for non-update_provider_tree code paths" (Zuul, 2019-11-14; 1 file, -53/+33)
* Drop compat for non-update_provider_tree code paths (Matt Riedemann, 2019-11-07; 1 file, -53/+33)
In Train [1] we deprecated support for compute drivers that did not implement the update_provider_tree method. That compat code is now removed along with the get_inventory method definition and (most) references to it. As a result there are more things we can remove, but those will come in separate changes.

[1] I1eae47bce08f6292d38e893a2122289bcd6f4b58

Change-Id: Ib62ac0b692eb92a2ed364ec9f486ded05def39ad
* Merge "Clear instance.launched_on when build fails" (Zuul, 2019-11-13; 1 file, -0/+3)
* Clear instance.launched_on when build fails (Matt Riedemann, 2019-09-20; 1 file, -0/+3)
During the instance claim the resource tracker sets the instance host, node and launched_on values. If the build fails, the compute manager resets the host and node values but was not clearing the launched_on field, so that is done in this change.

Change-Id: I37c5475e66570415b46d0b75edc91547225fd818