path: root/nova/exception.py
Commit message | Author | Date | Files | Lines
...
* ec2: Remove unused functions from 'ec2utils' (Stephen Finucane, 2019-07-10; 1 file, -4/+0)

  Remove a large number of util functions from the aforementioned modules. These were identified by searching for matches to functions bottom up, removing those with no matches, then repeating until no matches were found. Most of what's left is used by the metadata API and could be moved there in the future. An exception that is no longer used is removed. There are some unused objects as well, but these will be removed in a follow-up as their removal is significantly more involved.

  Change-Id: I852648a975745d0327378ffc30f1469a7ce82ca7
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* Merge "Handle Invalid exceptions as expected in attach_interface" (Zuul, 2019-07-23; 1 file, -2/+2)
* Handle Invalid exceptions as expected in attach_interface (Matt Riedemann, 2019-04-04; 1 file, -2/+2)

  The bug prompting this is a tempest test which requests a port attachment for a server without specifying a port or network to use, so nova-compute looks for a valid network, finds there are two, and raises NetworkAmbiguous. This is treated as a 400 error in the API, but because this is a synchronous RPC call from nova-api to nova-compute, oslo.messaging logs an exception traceback for the unexpected error. That traceback is pretty gross in the compute logs for something that is a user error the cloud operator can do nothing to fix.

  We can avoid the traceback by registering our expected exceptions for the attach_interface method with oslo.messaging, which is what this change does. While looking to just add NetworkAmbiguous, it became clear that lots of different user errors can be raised from this method and none of them should result in a traceback, so this change simply expects Invalid and its subclasses.

  The one exception is InterfaceAttachFailed, which is raised when something in allocate_port_for_instance or driver.attach_interface fails. That is an unexpected situation, so the parent class of InterfaceAttachFailed is changed from Invalid to NovaException so that it continues to be logged with a traceback. InterfaceAttachFailedNoNetwork is kept as Invalid since it is a user error (trying to attach an interface when the user has no access to any networks).

  test_tagged_attach_interface_raises is adjusted to show the ExpectedException handling for one of the Invalid cases.

  Change-Id: I927ff1d8c8f45405833d6012b7d7af37b98b10a0
  Closes-Bug: #1823198
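The expected-exceptions idea above can be sketched in plain Python. This is a simplified illustration of the pattern, not the real oslo.messaging implementation; all names below (`ExpectedException`, `expected_exceptions`, the toy `attach_interface`) are illustrative stand-ins:

```python
# Sketch of the "expected exceptions" pattern: exceptions registered as
# expected are wrapped so the RPC layer can return them to the caller
# without logging a full traceback. Hypothetical simplified version.

class Invalid(Exception):
    """User-error base class (maps to HTTP 400 in the API)."""

class NetworkAmbiguous(Invalid):
    """More than one network is available; the user must pick one."""

class ExpectedException(Exception):
    """Marker wrapper: carries the real exception, signalling 'no traceback'."""
    def __init__(self, wrapped):
        self.wrapped = wrapped

def expected_exceptions(*exc_classes):
    """Decorator: wrap the listed exception types instead of letting them escape."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_classes as exc:
                raise ExpectedException(exc)
        return wrapper
    return decorator

@expected_exceptions(Invalid)  # covers Invalid and all its subclasses
def attach_interface(network_id=None):
    if network_id is None:
        raise NetworkAmbiguous("multiple networks found; specify one")
    return "attached to %s" % network_id
```

Because the decorator matches `Invalid`, every user-error subclass is caught in one place, while anything deriving from a different base (like the repurposed `InterfaceAttachFailed`) still escapes and gets a traceback.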
* Merge "nova-manage: heal port allocations" (Zuul, 2019-07-22; 1 file, -0/+50)
* nova-manage: heal port allocations (Balazs Gibizer, 2019-07-15; 1 file, -0/+50)

  Before I97f06d0ec34cbd75c182caaa686b8de5c777a576 it was possible to create servers with neutron ports which had a resource_request (e.g. a port with a QoS minimum bandwidth policy rule) without allocating the requested resources in placement. So there could be servers for which the allocation needs to be healed in placement.

  This patch extends the nova-manage heal_allocations CLI to create the missing port allocations in placement and update the port in neutron with the resource provider uuid that is used for the allocation.

  There are known limitations of this patch. It does not try to reimplement placement's allocation candidate functionality, therefore it cannot handle the situation where there is more than one RP in the compute tree which provides the required traits for a port. Deciding which RP to use in that situation would require 1) the in_tree allocation candidate support from placement, which is not available yet, and 2) information about which PCI PF an SR-IOV port's VF is allocated from and which RP represents that PCI device in placement; this information is only available on the compute hosts. For the unsupported cases the command fails gracefully. As soon as migration support for such servers is implemented in the blueprint support-move-ops-with-qos-ports, the admin can heal the allocations of such servers by migrating them.

  During healing, both placement and neutron need to be updated. If either update fails, the code tries to roll back the previous updates for the instance so that the healing can be re-run later without issue. However, if the rollback itself fails, the script terminates with an error message pointing to documentation that describes how to recover from such a partially healed situation manually.

  Closes-Bug: #1819923
  Change-Id: I4b2b1688822eb2f0174df0c8c6c16d554781af85
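The heal-then-roll-back flow described above can be illustrated with a small sketch. The helper and the `UpdateFailed` exception are hypothetical stand-ins for the real placement/neutron client calls, not Nova code:

```python
# Illustrative sketch of healing with rollback: apply updates in order; if
# one fails, undo the ones that succeeded so healing can be re-run later.
# If a rollback call itself raises, the exception propagates, leaving a
# partially healed state that needs manual recovery (as the commit notes).

class UpdateFailed(Exception):
    pass

def heal_port_allocation(updates, rollbacks):
    """`updates` and `rollbacks` are parallel lists of callables.

    Returns True if healing succeeded, False if it was rolled back.
    """
    done = []
    for step, undo in zip(updates, rollbacks):
        try:
            step()
        except UpdateFailed:
            for prev_undo in reversed(done):
                prev_undo()  # may itself raise: partially healed state
            return False
        done.append(undo)
    return True
```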
* Remove Rocky-era min compute trusted certs compat check (Matt Riedemann, 2019-07-05; 1 file, -6/+0)

  Compute service version 31 was added in Rocky [1], so we can now remove the compatibility code for trusted certs support from the API.

  [1] Ie3130e104d7ca80289f1bd9f0fee9a7a198c263c

  Change-Id: I58e9933d910a40a138de7a1c12cc643745e1cc47
* Remove 'MultiattachSupportNotYetAvailable' exception (Stephen Finucane, 2019-06-27; 1 file, -8/+0)

  With the removal of cinder < v3.44 support and cells v1 support, there is no longer anything raising the aforementioned exception. Remove references to it.

  Part of blueprint remove-cells-v1
  Change-Id: I79bc1cdab28474d8e979ef9b7a07b674013d2ac3
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* Merge "Remove 'InstanceUnknownCell' exception" (Zuul, 2019-06-15; 1 file, -4/+0)
* Remove 'InstanceUnknownCell' exception (Stephen Finucane, 2019-06-12; 1 file, -4/+0)

  This should have been removed in change I1dd6abcc2be17ff76f108e7ff3771314f33259c6 but was not. Remove it now.

  Part of blueprint remove-cells-v1
  Change-Id: I52f3e4b191d14a2808315081b79474ebe0ce4f79
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* Warn for duplicate host mappings during discover_hosts (melanie witt, 2019-06-13; 1 file, -0/+4)

  When the 'nova-manage cellv2 discover_hosts' command is run in parallel during a deployment, simultaneous attempts are made to map the same compute or service hosts at the same time, resulting in tracebacks:

    DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry 'compute-0.localdomain' for key 'uniq_host_mappings0host'") [SQL: u'INSERT INTO host_mappings (created_at, updated_at, cell_id, host) VALUES (%(created_at)s, %(updated_at)s, %(cell_id)s, %(host)s)'] [parameters: {'host': u'compute-0.localdomain', 'cell_id': 5, 'created_at': datetime.datetime(2019, 4, 10, 15, 20, 50, 527925), 'updated_at': None}]

  This adds more information to the command help and adds a warning message when duplicate host mappings are detected, with guidance about how to run the command. The command returns 2 if a duplicate host mapping is encountered, and the documentation is updated to explain this. This also adds a warning to the scheduler periodic task recommending that the periodic be enabled on only one scheduler to prevent collisions.

  We choose to warn and stop instead of ignoring DBDuplicateEntry because there could be a large number of parallel tasks competing to insert duplicate records, of which only one can succeed. If we ignored the error and continued to the next record, the tasks would repeatedly collide in a tight loop until all got through the entire list of compute hosts being mapped. So we instead stop the colliding task and emit a message.

  Closes-Bug: #1824445
  Change-Id: Ia7718ce099294e94309103feb9cc2397ff8f5188
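The warn-and-stop behaviour can be shown with a toy model. The in-memory set stands in for the real host_mappings DB table; `map_host` and `discover_hosts` are illustrative names, not Nova's actual functions:

```python
# Toy illustration of warn-and-stop: when a second worker tries to map a
# host a parallel run already mapped, the command warns and returns exit
# code 2 instead of fighting the other worker in a collision loop.

class DBDuplicateEntry(Exception):
    pass

host_mappings = set()  # stand-in for the host_mappings table's unique index

def map_host(host):
    if host in host_mappings:
        raise DBDuplicateEntry(host)
    host_mappings.add(host)

def discover_hosts(hosts):
    """Return 0 on success, 2 if a duplicate mapping was encountered."""
    for host in hosts:
        try:
            map_host(host)
        except DBDuplicateEntry:
            print("WARNING: %s already mapped; another discover_hosts run "
                  "may be executing in parallel" % host)
            return 2  # stop rather than loop on further collisions
    return 0
```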
* Merge "db: Remove cell APIs" (Zuul, 2019-06-04; 1 file, -8/+0)
* db: Remove cell APIs (Stephen Finucane, 2019-05-29; 1 file, -8/+0)

  These are no longer needed. We can't actually remove the 'cells' table yet (that has to wait another release or two) but a TODO is added to ensure this eventually happens. The 'CellExists' and 'CellNotFound' exceptions, which were only raised by these APIs, are removed.

  Part of blueprint remove-cells-v1
  Change-Id: Ibc402b446c9b92ce03a1dd98f41ec6cf5db20642
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* Merge "Add get_usages_counts_for_quota to SchedulerReportClient" (Zuul, 2019-05-30; 1 file, -0/+5)
* Add get_usages_counts_for_quota to SchedulerReportClient (melanie witt, 2019-04-18; 1 file, -0/+5)

  This adds a method for requesting /usages from placement for the purpose of counting quota usage for cores and ram. It is used in the next patch in the series.

  Part of blueprint count-quota-usage-from-placement
  Change-Id: I35f98f88f8353602e1bfc135f35d1b7bc9ba42a4
* Merge "Block swap volume on volumes with >1 rw attachment" (Zuul, 2019-05-30; 1 file, -0/+5)
* Block swap volume on volumes with >1 rw attachment (Matt Riedemann, 2019-05-22; 1 file, -0/+5)

  If we're swapping from a multiattach volume that has more than one read/write attachment, another server on the secondary attachment could be writing to the volume, and those writes would not be copied into the volume to which we're swapping, so we could have data loss during the swap.

  This change counts the read/write attachments on the volume we're swapping from, and if there is more than one read/write attachment the swap volume operation fails with a 400 BadRequest error.

  Depends-On: https://review.openstack.org/573025/
  Closes-Bug: #1775418
  Change-Id: Icd7fcb87a09c35a13e4e14235feb30a289d22778
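The attachment-counting check can be sketched as follows, assuming a Cinder-like volume dict with an `attachments` list where each attachment carries an `attach_mode` of `'rw'` or `'ro'`. The field names, the `check_swap_allowed` helper, and the `BadRequest` class are illustrative assumptions, not Nova's actual code:

```python
# Sketch of the read/write attachment counting guard described above:
# refuse to swap from a multiattach volume with more than one rw attachment,
# since a writer on the other attachment could cause data loss.

class BadRequest(Exception):
    """Stand-in for the API's 400 BadRequest response."""

def check_swap_allowed(volume):
    rw = sum(1 for a in volume.get('attachments', [])
             if a.get('attach_mode', 'rw') == 'rw')
    if volume.get('multiattach') and rw > 1:
        raise BadRequest(
            "volume %s has %d read/write attachments; swapping could lose "
            "writes made via the other attachments" % (volume['id'], rw))
```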
* Merge "Remove cells code" (Zuul, 2019-05-29; 1 file, -20/+0)
* Remove cells code (Stephen Finucane, 2019-05-20; 1 file, -20/+0)

  Thankfully the bulk of this is neatly organized in a single directory and can be removed, now that the bulk of the references to it have been removed. The only complicated area is the tests, though effort has been taken to minimise the diff here wherever possible.

  Part of blueprint remove-cells-v1
  Change-Id: Ib0e0b708c46e4330e51f8f8fdfbb02d45aaf0f44
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* Fix failure to boot instances with qcow2 format images (zhu.boxiang, 2019-05-20; 1 file, -0/+4)

  Ceph doesn't support QCOW2 for hosting a virtual machine disk: http://docs.ceph.com/docs/master/rbd/rbd-openstack/

  When image_type is set to rbd and force_raw_images is False, and an instance is not booted from a volume, the instance is spawned with a qcow2 root disk but fails to boot because the data is accessed as raw. To fix this, we raise an error and refuse to start the nova-compute service when force_raw_images and image_type are incompatible.

  When importing an image into rbd, check the format of the cached image. If the format is not raw, remove the cached copy and fetch it again, so that it is then in raw format.

  Change-Id: I1aa471e8df69fbb6f5d9aeb35651bd32c7123d78
  Closes-Bug: 1816686
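The fail-fast startup check can be sketched like this. The `validate_image_config` helper and `InvalidConfiguration` exception are illustrative names; the option names mirror the ones the commit mentions:

```python
# Sketch of the fail-fast startup check: rbd-backed disks must be raw, so
# an incompatible configuration refuses to start the service rather than
# producing instances that cannot boot later.

class InvalidConfiguration(Exception):
    pass

def validate_image_config(image_type, force_raw_images):
    if image_type == 'rbd' and not force_raw_images:
        raise InvalidConfiguration(
            "image_type = rbd requires force_raw_images = True: Ceph "
            "cannot boot from a qcow2 root disk")
```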
* Merge "Added mount fstype based validation of Quobyte mounts" (Zuul, 2019-04-05; 1 file, -0/+4)
* Added mount fstype based validation of Quobyte mounts (Silvan Kaiser, 2019-02-21; 1 file, -0/+4)

  The validation of Quobyte mounts is extended to check that a mount's file system type is set to "fuse.quobyte". This includes adding a new StaleVolumeMount exception type to the Nova exceptions. This also closes a bug concerning multi-registry configurations for Quobyte volumes, as the is_mounted() method that failed in that case is no longer used. Finally, this adds exception handling for the unmount call that is issued when trying to mount an already mounted volume.

  Closes-Bug: #1730933
  Closes-Bug: #1737131
  Change-Id: Ia5a23ce1123a68608ee2ec6f2ac5dca02da67c59
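The fstype-based validation can be illustrated by parsing `/proc/mounts`-style lines. The parsing helper below is a simplified sketch; only the `StaleVolumeMount` name and the `fuse.quobyte` fstype come from the commit above:

```python
# Sketch of fstype-based mount validation: accept a path only if it is
# mounted with fstype 'fuse.quobyte'; any other fstype at that path means
# a stale or foreign mount.

class StaleVolumeMount(Exception):
    pass

def validate_quobyte_mount(mount_path, proc_mounts):
    """proc_mounts: iterable of '/proc/mounts' lines (device path fstype ...).

    Returns True if mounted as fuse.quobyte, False if not mounted at all.
    Raises StaleVolumeMount if mounted with a different fstype.
    """
    for line in proc_mounts:
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_path:
            if fields[2] == 'fuse.quobyte':
                return True
            raise StaleVolumeMount(
                "%s is mounted, but as %s rather than fuse.quobyte"
                % (mount_path, fields[2]))
    return False
```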
* Add get_instance_pci_request_from_vif (Adrian Chiris, 2019-03-07; 1 file, -0/+5)

  This change extends nova/pci/request.py with a method that retrieves an instance's PCI request from a given VIF, if the given VIF required a PCI allocation during instance creation. The PCI request, if retrieved, belongs to a PCI device in the compute node where the instance is running.

  The change is required to facilitate SR-IOV live migration, allowing VIF-related PCI resources to be claimed on the destination node.

  Change-Id: I9ba475e91b8283f063db446de74d3e4b2de002c5
  Partial-Implements: blueprint libvirt-neutron-sriov-livemigration
* Support server create with ports having resource request (Balazs Gibizer, 2019-03-05; 1 file, -3/+1)

  A new API microversion, 2.72, is added that enables support for Neutron ports having a resource request during server create. Note that server delete and port detach operations already handle such ports and will clean up the allocation properly.

  Change-Id: I7555914473e16782d8ba4513a09ce45679962c14
  blueprint: bandwidth-resource-provider
* Merge "Improve existing flavor and image metadata validation" (Zuul, 2019-03-05; 1 file, -0/+10)
* Improve existing flavor and image metadata validation (Chris Friesen, 2019-03-04; 1 file, -0/+10)

  This improves the existing validation of flavor extra-specs and image properties in hardware.py, in preparation for calling the validation from more places in a follow-on patch.

  get_cpu_topology_constraints() becomes public as we'll need to call it from the API code in the next patch in the series. If the CPU topology is not valid, raise an InvalidRequest exception with a useful error message instead of a ValueError message without any context. Add checks to ensure that the CPU and CPU thread policies are valid, and if not, raise newly-added exceptions with useful error messages. Tweak various docstrings to more accurately reflect what exceptions they might raise.

  Change-Id: I20854134c80b8f4598f375eae137fd2920114891
  blueprint: flavor-extra-spec-image-property-validation
  Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
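The "specific exception with a useful message" idea can be sketched for the CPU policy check. The exception name, helper, and valid-value list below are illustrative assumptions (the real valid values for `hw:cpu_policy` are documented in Nova, but this block does not claim to reproduce hardware.py):

```python
# Sketch: validate a flavor extra-spec value and raise a dedicated,
# actionable exception instead of a context-free ValueError.

class InvalidCPUPolicy(Exception):
    pass

VALID_CPU_POLICIES = ('shared', 'dedicated')  # illustrative subset

def validate_cpu_policy(flavor_extra_specs):
    policy = flavor_extra_specs.get('hw:cpu_policy')
    if policy is not None and policy not in VALID_CPU_POLICIES:
        raise InvalidCPUPolicy(
            "CPU policy %r is not valid; must be one of: %s"
            % (policy, ', '.join(VALID_CPU_POLICIES)))
    return policy
```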
* Merge "ironic: partition compute services by conductor group" (Zuul, 2019-02-28; 1 file, -0/+5)
* ironic: partition compute services by conductor group (Jim Rollenhagen, 2019-02-27; 1 file, -0/+5)

  This uses ironic's conductor group feature to limit the subset of nodes which a nova-compute service will manage. This allows for partitioning nova-compute services to a particular location (building, aisle, rack, etc.), and provides a way for operators to manage the failure domain of a given nova-compute service.

  This adds two config options to the [ironic] config group:
  * partition_key: the key to match with the conductor_group property on ironic nodes to limit the subset of nodes which can be managed.
  * peer_list: a list of other compute service hostnames which manage the same partition_key, used when building the hash ring.

  Change-Id: I1b184ff37948dc403fe38874613cd4d870c644fd
  Implements: blueprint ironic-conductor-groups
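The partitioning scheme can be shown with a toy sketch: a service only manages nodes whose `conductor_group` matches its `partition_key`, and the hash ring is built from itself plus `peer_list`. Both helper names are hypothetical; the real implementation lives in Nova's ironic driver:

```python
# Toy illustration of conductor-group partitioning. The node dicts mimic
# the shape of ironic node records; only the two fields used here matter.

def nodes_for_service(all_nodes, partition_key):
    """Filter ironic nodes down to this service's partition."""
    return [n['uuid'] for n in all_nodes
            if n.get('conductor_group') == partition_key]

def hash_ring_members(my_hostname, peer_list):
    """Hosts sharing this partition_key, i.e. the hash ring membership."""
    return sorted({my_hostname} | set(peer_list))
```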
* Merge "Use placement.inventory.inuse in report client" (Zuul, 2019-02-25; 1 file, -4/+1)
* Use placement.inventory.inuse in report client (Eric Fried, 2019-02-11; 1 file, -4/+1)

  Since I9a833aa35d474caa35e640bbad6c436a3b16ac5e we've had the framework for placement to return specific error codes, allowing us to differentiate among error conditions for oft-repeated status codes. That change also included, as its proof of concept, a specific code for the placement side of InventoryInUse, i.e. an attempt to delete an inventory record for which there are existing allocations. SchedulerReportClient was previously identifying this error condition by parsing the text of the 409 response. With this change, it instead uses the provided error code.

  Change-Id: Ic621adcadf10cc607455eba48c4cb1882bde23fa
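The move from text matching to error codes can be sketched like this. The `placement.inventory.inuse` code string comes from the commit title; the response body shape (`errors` list with `code` entries) follows placement's errors format, and the helper name is illustrative:

```python
# Sketch: match a machine-readable error code in a 409 response body
# instead of grepping the human-readable message text.

import json

INVENTORY_INUSE_CODE = 'placement.inventory.inuse'

def is_inventory_in_use(status, body):
    """True if a 409 response carries the inventory-in-use error code."""
    if status != 409:
        return False
    errors = json.loads(body).get('errors', [])
    return any(e.get('code') == INVENTORY_INUSE_CODE for e in errors)
```

Matching on the code is robust against rewording of the error message, which is exactly why the report client switched to it.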
* Fup for the bandwidth resource provider series (Balazs Gibizer, 2019-02-04; 1 file, -3/+6)

  This patch fixes minor comments found in the following patches:
  I7e1edede827cf8469771c0496b1dce55c627cf5d
  I4473cb192447b5bfa9d1dfcc0dd5216c536caf73
  I97f06d0ec34cbd75c182caaa686b8de5c777a576

  Change-Id: I8c51654eb0744c400a1b55a9d6b8594a103dadcc
  blueprint: bandwidth-resource-provider
* Merge "Reject server create with port having resource request" (Zuul, 2019-02-03; 1 file, -0/+5)
* Reject server create with port having resource request (Balazs Gibizer, 2019-01-28; 1 file, -0/+5)

  Nova does not yet consider the resource request of a Neutron port, so this patch makes sure that a server create request is rejected if it involves a port that has a resource request. When the feature is ready on the Nova side, this limitation will be lifted.

  Change-Id: I97f06d0ec34cbd75c182caaa686b8de5c777a576
  blueprint: bandwidth-resource-provider
* Merge "Reject networks with QoS policy" (Zuul, 2019-02-02; 1 file, -0/+5)
* Reject networks with QoS policy (Balazs Gibizer, 2019-01-28; 1 file, -0/+5)

  When Nova needs to create ports in Neutron in a network that has a minimum bandwidth policy, Nova would need to create allocations for the bandwidth resources. Port creation happens in the compute manager after scheduling and resource claiming, and supporting this is considered out of scope for the first iteration of this feature. To avoid resource allocation inconsistencies, this patch rejects such requests.

  This rejection does not break any existing use case: the minimum bandwidth policy rule is only supported by the SR-IOV Neutron backend, and Nova only supports booting with SR-IOV ports if those ports are pre-created in Neutron.

  Co-Authored-By: Elod Illes <elod.illes@ericsson.com>
  Change-Id: I7e1edede827cf8469771c0496b1dce55c627cf5d
  blueprint: bandwidth-resource-provider
* Merge "Reject interface attach with QoS aware port" (Zuul, 2019-01-30; 1 file, -0/+5)
* Reject interface attach with QoS aware port (Balazs Gibizer, 2019-01-24; 1 file, -0/+5)

  Attaching a port with a minimum bandwidth policy would require updating the allocations of the server, but for that Nova would need to select the proper networking resource provider under the compute resource provider the server is running on. For the first iteration of the feature we consider this out of scope. To avoid resource allocation inconsistencies, this patch rejects such interface attach requests.

  Rejecting such an interface attach does not break existing functionality, as today only the SR-IOV Neutron backend supports the minimum bandwidth policy, and Nova does not support interface attach with SR-IOV interfaces.

  A subsequent patch will handle attaching a network that has a QoS policy.

  Co-Authored-By: Elod Illes <elod.illes@ericsson.com>
  Change-Id: Id8b5c48a6e8cf65dc0a7dc13a80a0a72684f70d9
  blueprint: bandwidth-resource-provider
* Raise 403 instead of 500 error from attach volume API (melanie witt, 2019-01-25; 1 file, -0/+6)

  Currently, the libvirt driver limits the number of disk devices attached to a single instance to 26. If a user attempts to attach a volume which would bring the total number of attached disk devices above 26 for the instance, the user receives a 500 error from the API.

  This adds a new exception type TooManyDiskDevices and raises it for the "No free disk devices names" condition instead of InternalError, and handles it in the attach volume API. We raise TooManyDiskDevices directly from the libvirt driver because InternalError is ambiguous and can be raised for different error reasons within the same method call.

  Closes-Bug: #1770527
  Change-Id: I1b08ed6826d7eb41ecdfc7102e5e8fcf3d1eb2e1
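The exhaustion condition can be sketched with a toy device-name allocator: with a single-letter suffix there are only 26 names (vda..vdz), and the 27th attachment raises the dedicated exception. The allocator function is an illustrative stand-in, not Nova's device-naming code:

```python
# Sketch of the "no free device names" condition: allocate the next unused
# /dev/vd* name, and raise a dedicated TooManyDiskDevices (rather than a
# generic internal error) when all 26 single-letter names are taken.

import string

class TooManyDiskDevices(Exception):
    pass

def next_device_name(used, prefix='vd'):
    for letter in string.ascii_lowercase:
        dev = prefix + letter
        if dev not in used:
            return dev
    raise TooManyDiskDevices(
        "all %d '%s*' device names are in use"
        % (len(string.ascii_lowercase), prefix))
```

Having a specific exception type lets the API layer translate it to a client-facing status (403 per the commit above) instead of leaking a 500.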
* Merge "Convert exception messages to strings" (Zuul, 2018-12-24; 1 file, -8/+10)
* Convert exception messages to strings (Lucian Petrut, 2018-11-05; 1 file, -8/+10)

  Exception messages are expected to be strings, but that's not always the case. Some exceptions are wrapped and passed as new exception messages, which affects some exception handlers. Finding all the occurrences may be difficult. For now, it's easier to just convert the exception messages to strings in the NovaException class.

  Closes-Bug: #1801713
  Change-Id: Ic470791aa2b08ca9911bf70cb3cc68652d3647f2
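The fix can be sketched as a minimal base class that coerces whatever was passed as a message (including another exception instance) to a string, so handlers can rely on getting text. This is a simplified illustration, not the real `NovaException` from nova/exception.py:

```python
# Minimal sketch: stringify the message in the base exception class so a
# wrapped inner exception passed as the message doesn't break handlers
# that expect text.

class NovaException(Exception):
    msg_fmt = "An unknown exception occurred."

    def __init__(self, message=None, **kwargs):
        if message is None:
            message = self.msg_fmt % kwargs
        elif not isinstance(message, str):
            message = str(message)  # e.g. a wrapped inner exception
        self.message = message
        super().__init__(message)
```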
* Merge "Add compute API validation for when a volume_type is requested" (Zuul, 2018-10-12; 1 file, -0/+12)
* Add compute API validation for when a volume_type is requested (zhangbailin, 2018-10-11; 1 file, -0/+12)

  This adds validation in the compute API for when a block_device_mapping_v2 request specifies a volume_type. Note that this is all noop code right now, since users cannot specify that field in the API until the microversion which enables the feature is added. In other words, this is just plumbing.

  Part of bp boot-instance-specific-storage-backend
  Change-Id: I45bd42908d44a0f05e1231febab926b23232b57b
* consumer gen: move_allocations (Balazs Gibizer, 2018-09-25; 1 file, -0/+6)

  This patch renames the set_and_clear_allocations function in the scheduler report client to move_allocations and adds handling of consumer generation conflicts for it. The call now moves everything from one consumer to another and raises AllocationMoveFailed to the caller if the move fails due to a consumer generation conflict.

  When a migration or resize fails to move the source host allocation to the migration_uuid, the API returns HTTP 409 and the migration is aborted. If reverting a migration, a resize, or a resize to the same host fails to move the source host allocation back to the instance_uuid due to a consumer generation conflict, the instance will be put into ERROR state. The instance still has two allocations in this state, and deleting the instance only deletes the one held by the instance_uuid. This patch logs an ERROR describing that, in this case, the allocation held by the migration_uuid is leaked.

  Blueprint: use-nested-allocation-candidates
  Change-Id: Ie991d4b53e9bb5e7ec26da99219178ab7695abf6
* Consumer gen support for put allocations (Balazs Gibizer, 2018-09-25; 1 file, -1/+1)

  The placement API version 1.28 introduced consumer generations as a way to make updating allocations safe even when done from multiple places. This patch changes the scheduler report client put_allocations function to raise AllocationUpdateFailed in case of a generation conflict. The only direct user of this call is the nova-manage heal_allocations CLI, which will simply fail to heal the allocation for this instance.

  Blueprint: use-nested-allocation-candidates
  Change-Id: Iba230201803ef3d33bccaaf83eb10453eea43f20
* Consumer gen support for delete instance allocations (Balazs Gibizer, 2018-09-25; 1 file, -0/+5)

  The placement API version 1.28 introduced consumer generations as a way to make updating allocations safe even when done from multiple places. This patch changes delete_allocation_for_instance to use PUT /allocations instead of DELETE /allocations to benefit from the consumer generation handling.

  With this patch, the report client GETs the current allocations of the instance, including the consumer generation, and then tries to PUT an empty allocation with that generation. If this fails due to a consumer generation conflict, meaning something modified the allocations of the instance between the GET and the PUT, the report client raises an AllocationDeleteFailed exception. This causes the instance to go to ERROR state.

  This patch only detects a small portion of the possible cases where allocations are modified outside of the delete code path. The rest can only be detected if Nova caches at least the consumer generation of the instance.

  To be able to put the instance into ERROR state, the instance.destroy() call is moved to the end of the deletion call path. To keep the instance.delete.end notification behavior consistent with this move (e.g. the deleted_at field is filled), the notification sending needed to be moved too.

  Blueprint: use-nested-allocation-candidates
  Change-Id: I77f34788dd7ab8fdf60d668a4f76452e03cf9888
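The GET-then-PUT-empty pattern can be modeled with an in-memory stand-in for placement. Everything below is an illustrative sketch of the optimistic-concurrency idea (placement rejects a PUT whose generation doesn't match the current one); the function names are hypothetical, only `AllocationDeleteFailed` comes from the commit above:

```python
# Sketch of consumer-generation handling: the "server" checks the
# generation on every PUT; the "client" deletes by PUTting an empty
# allocation set with the generation it last saw.

class GenerationConflict(Exception):
    pass

class AllocationDeleteFailed(Exception):
    pass

def put_allocations(placement, consumer_uuid, allocations, generation):
    """Server side: placement's consumer-generation check on PUT."""
    record = placement.setdefault(consumer_uuid,
                                  {'allocations': {}, 'generation': 0})
    if generation != record['generation']:
        raise GenerationConflict(consumer_uuid)
    record['allocations'] = allocations
    record['generation'] += 1

def delete_allocations(placement, consumer_uuid):
    """Client side: GET the current generation, then PUT an empty allocation."""
    generation = placement[consumer_uuid]['generation']  # the GET step
    try:
        put_allocations(placement, consumer_uuid, {}, generation)
    except GenerationConflict:
        # Something updated the allocations between our GET and our PUT.
        raise AllocationDeleteFailed(consumer_uuid)
```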
* Merge "Explicitly fail if trying to attach SR-IOV port" (Zuul, 2018-09-18; 1 file, -0/+6)
* Explicitly fail if trying to attach SR-IOV port (Matt Riedemann, 2018-08-21; 1 file, -0/+6)

  Attaching SR-IOV ports to existing instances is not supported, since the compute service does not perform any kind of PCI device allocation, so we should fail fast with a clear error if it is attempted. Note that the compute RPC API "attach_interface" method is an RPC call from nova-api to nova-compute, so the error raised here results in a 400 response to the user.

  Blueprint sriov-interface-attach-detach would need to be implemented to support this use case, and could arguably involve a microversion to indicate when the feature was made available.

  A related neutron docs patch https://review.openstack.org/594325 is posted to mention the limitation with SR-IOV port attach as well.

  Change-Id: Ibbf2bd3cdd45bcd61eebff883c30ded525b2495d
  Closes-Bug: #1708433
* Compute: Handle reshaped provider trees (Eric Fried, 2018-08-28; 1 file, -0/+5)

  This is the compute-side "plumbing" for the referenced blueprint, implementing changes in the compute manager, resource tracker, and virt driver interface.

  - The `startup` parameter (indicating compute service startup vs. periodic task) is percolated from the compute manager's update_available_resource all the way down the call stack to the resource tracker's _update method, where it is used to determine whether to allow the virt driver's update_provider_tree to indicate that a reshape is required.
  - The update_provider_tree method gets a new `allocations` parameter. When None, the method should raise ReshapeNeeded (see below) if inventories need to be moved, whereupon the resource tracker will populate the `allocations` parameter and reinvoke update_provider_tree with it.
  - A new ReshapeNeeded exception is introduced. It is used as a signal by the virt driver that inventories need to be moved. It should *only* be raised by the virt driver a) on startup, and b) when the `allocations` parameter is None.
  - The compute manager's _update_available_resource_for_node gets special exception handling clauses for ReshapeNeeded and ReshapeFailed, both of which blow up the compute service on startup. (On periodic, where this should "never" happen, they just make the logs red. We may later want to disable the compute service or similar to make this unusual catastrophic event more noticeable.)

  Change-Id: Ic062446e5c620c89aec3065b34bcdc6bf5966275
  blueprint: reshape-provider-tree
* Report client: _reshape helper, placement min bump (Eric Fried, 2018-08-24; 1 file, -0/+5)

  Add a thin wrapper to invoke the POST /reshaper placement API with appropriate error checking. This bumps the placement minimum to the reshaper microversion, 1.30.

  Change-Id: Idf8997d5efdfdfca6967899a0882ffb9ecf96915
  blueprint: reshape-provider-tree
* Report client: Real get_allocs_for_consumer (Eric Fried, 2018-08-24; 1 file, -0/+5)

  In preparation for reshaper work, implement a superior method to retrieve allocations for a consumer. The new get_allocs_for_consumer:

  - Uses the microversion that returns consumer generations (1.28).
  - Doesn't hide error conditions:
    - If the request returns non-200, instead of returning {}, it raises a new ConsumerAllocationRetrievalFailed exception.
    - If we fail to communicate with the placement API, instead of returning None, it raises (a subclass of) ksa ClientException.
  - Returns the entire payload rather than just the 'allocations' dict.

  The existing get_allocations_for_consumer is refactored to behave compatibly (except it logs warnings for the previously silently-hidden error conditions). In a subsequent patch, we should rework all callers of this method to use the new one and get rid of the old one.

  Change-Id: I0e9a804ae7717252175f7fe409223f5eb8f50013
  blueprint: reshape-provider-tree
* Make get_allocations_for_resource_provider raise (Eric Fried, 2018-08-23; 1 file, -0/+5)

  In preparation for reshaper work, and because it's The Right Thing To Do, this patch makes the report client's get_allocations_for_resource_provider method stop acting as if nothing went wrong when its placement API request fails. We remove the @safe_connect decorator so it raises (a subclass of) ksa's ClientException when something goes wrong with the API communication. The method now raises a new ResourceProviderAllocationRetrievalFailed exception on non-200. In the spirit of the aggregate and trait retrieval methods, it now returns a namedtuple containing the allocation information. Unit tests, which were entirely absent, are added for the method.

  The resource tracker's _remove_deleted_instances_allocations, which is get_allocations_for_resource_provider's only consumer (for now, until reshaper work starts using it), is refactored to behave the same way it used to, which is to no-op if the placement API request fails. However, a) we take an earlier short-circuit out of the method, which saves a little work copying context stuff; and b) we now emit a warning log message if the no-op is due to the newly-raised exceptions.

  Change-Id: I020e7dc47efc79f8907b7bfb753ec779a8da69a1
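The "raise instead of returning an empty dict" change can be sketched as follows. The `ProviderAllocInfo` namedtuple and `ResourceProviderAllocationRetrievalFailed` names come from the commit above; the function signature here (taking a status and a parsed body instead of making a real HTTP request) is a simplified stand-in for the report client method:

```python
# Sketch: on a failed placement request, raise a dedicated exception
# instead of silently returning {}; on success, return a small namedtuple
# rather than the raw response body.

from collections import namedtuple

ProviderAllocInfo = namedtuple('ProviderAllocInfo', ['allocations'])

class ResourceProviderAllocationRetrievalFailed(Exception):
    pass

def get_allocations_for_resource_provider(status, body):
    if status != 200:
        raise ResourceProviderAllocationRetrievalFailed(
            "placement returned %d" % status)
    return ProviderAllocInfo(allocations=body.get('allocations', {}))
```

The caller that genuinely wants to ignore failures (here, the resource tracker) can still catch the exception and no-op, but the failure is now visible and loggable instead of indistinguishable from "no allocations".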