path: root/nova/tests/functional/libvirt/test_numa_servers.py
Commit message | Author | Age | Files | Lines
* Handle zero pinned CPU in a cell with mixed policy (Balazs Gibizer, 2022-12-13; 1 file, -23/+19)

  When cpu_policy is mixed the scheduler tries to find a valid CPU pinning
  for each instance NUMA cell. However, if there is an instance NUMA cell
  that does not request any pinned CPUs, then such logic will calculate
  empty pinning information for that cell. The scheduler logic then wrongly
  assumes that an empty pinning result means there was no valid pinning.
  However, there is a difference between a None result, meaning no valid
  pinning was found, and an empty result [], which means there was nothing
  to pin. This patch makes sure that pinning == None is differentiated from
  pinning == [].

  Closes-Bug: #1994526
  Change-Id: I5a35a45abfcfbbb858a94927853777f112e73e5b
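  The distinction the fix relies on can be shown with a minimal sketch;
  this is illustrative Python only, not nova's actual scheduler code:

      def has_valid_pinning(pinning):
          """Interpret the pinning result for one instance NUMA cell."""
          if pinning is None:
              return False   # no valid pinning could be found
          return True        # [] is fine: the cell had nothing to pin

      assert has_valid_pinning([]) is True             # mixed cell, no pinned CPUs requested
      assert has_valid_pinning(None) is False          # pinning genuinely failed
      assert has_valid_pinning([(0, 2), (1, 3)]) is True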
* Reproduce asym NUMA mixed CPU policy bug (Balazs Gibizer, 2022-12-13; 1 file, -0/+74)

  Related-Bug: #1994526
  Change-Id: I52ee068377cc48ef4b4cdcb4b05fdc8d926faddf
* Add compute restart capability for libvirt func tests (Balazs Gibizer, 2022-08-10; 1 file, -4/+2)

  The existing generic restart_compute_service() call in the nova test base
  class is not appropriate for the libvirt functional tests that need to
  reconfigure the libvirt connection, as it is not aware of the libvirt
  specific mocking needed when a compute service is started. So this patch
  adds a specific restart_compute_service() call to
  nova.tests.functional.libvirt.base.ServersTestBase. This will be used by
  a later patch testing [pci]device_spec reconfiguration scenarios.

  This change showed that some of the existing libvirt functional tests
  used the incomplete restart_compute_service from the base class. Others
  used local mocking to inject new pci config into the restart. I moved all
  of these to the new function and removed the local mocking.

  Change-Id: Ic717dc42ac6b6cace59d344acaf12f9d1ee35564
* Use unittest.mock instead of third party mock (Stephen Finucane, 2022-08-01; 1 file, -2/+2)

  Now that we no longer support py27, we can use the standard library
  unittest.mock module instead of the third party mock lib. Most of this is
  autogenerated, as described below, but there is one manual change
  necessary:

    nova/tests/functional/regressions/test_bug_1781286.py

  We need to avoid using 'fixtures.MockPatch' since fixtures is using
  'mock' (the library) under the hood and a call to 'mock.patch.stop' found
  in that test will now "stop" mocks from the wrong library. We have
  discussed making this configurable but the option proposed isn't that
  pretty [1] so this is better.

  The remainder was auto-generated with the following (hacky) script, with
  one or two manual tweaks after the fact:

      import glob

      for path in glob.glob('nova/tests/**/*.py', recursive=True):
          with open(path) as fh:
              lines = fh.readlines()

          if 'import mock\n' not in lines:
              continue

          import_group_found = False
          create_first_party_group = False

          for num, line in enumerate(lines):
              line = line.strip()

              if line.startswith('import ') or line.startswith('from '):
                  tokens = line.split()
                  for lib in (
                      'ddt', 'six', 'webob', 'fixtures', 'testtools'
                      'neutron', 'cinder', 'ironic', 'keystone', 'oslo',
                  ):
                      if lib in tokens[1]:
                          create_first_party_group = True
                          break

                  if create_first_party_group:
                      break

                  import_group_found = True

              if not import_group_found:
                  continue

              if line.startswith('import ') or line.startswith('from '):
                  tokens = line.split()
                  if tokens[1] > 'unittest':
                      break
                  elif tokens[1] == 'unittest' and (
                      len(tokens) == 2 or tokens[4] > 'mock'
                  ):
                      break
              elif not line:
                  break

          if create_first_party_group:
              lines.insert(num, 'from unittest import mock\n\n')
          else:
              lines.insert(num, 'from unittest import mock\n')

          del lines[lines.index('import mock\n')]

          with open(path, 'w+') as fh:
              fh.writelines(lines)

  Note that we cannot remove mock from our requirements files yet due to
  importing pypowervm unit test code in nova unit tests. This library still
  uses the mock lib, and since we are importing test code and that lib
  (correctly) only declares mock in its test-requirements.txt, mock would
  not otherwise be installed and would cause errors while loading nova unit
  test code.

  [1] https://github.com/testing-cabal/fixtures/pull/49

  Change-Id: Id5b04cf2f6ca24af8e366d23f15cf0e5cac8e1cc
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* Fix typos (Rajesh Tailor, 2022-05-30; 1 file, -1/+1)

  This change fixes some of the typos in unit tests as well as in the nova
  code base.

  Change-Id: I209bbb270baf889fcb2b9a4d1ce0ab4a962d0d0e
* [rt] Apply migration context for incoming migrations (Balazs Gibizer, 2021-12-07; 1 file, -20/+9)

  There is a race condition between an incoming resize and an
  update_available_resource periodic in the resource tracker. The race
  window starts when the resize_instance RPC finishes and ends when the
  finish_resize compute RPC finally applies the migration context on the
  instance.

  In the race window, if the update_available_resource periodic is run on
  the destination node, then it will see the instance as being tracked on
  this host, as instance.node is already pointing to the dest. But
  instance.numa_topology still points to the source host topology, as the
  migration context is not applied yet. This leads to a CPU pinning error
  if the source topology does not fit the dest topology. It also stops the
  periodic task and leaves the tracker in an inconsistent state. The
  inconsistent state is only cleaned up after the periodic is run outside
  of the race window.

  This patch applies the migration context temporarily to the specific
  instances during the periodic to keep resource accounting correct.

  Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
  Closes-Bug: #1953359
  Closes-Bug: #1952915
* Extend the reproducer for 1953359 and 1952915 (Balazs Gibizer, 2021-12-07; 1 file, -17/+45)

  This patch extends the original reproduction
  I4be429c56aaa15ee12f448978c38214e741eae63 to cover bug 1952915 as well,
  as they have a common root cause.

  Change-Id: I57982131768d87e067d1413012b96f1baa68052b
  Related-Bug: #1953359
  Related-Bug: #1952915
* Reproduce bug 1953359 (Balazs Gibizer, 2021-12-06; 1 file, -0/+81)

  This patch adds a functional test that reproduces a race between an
  incoming migration and the update_available_resource periodic.

  Change-Id: I4be429c56aaa15ee12f448978c38214e741eae63
  Related-Bug: #1953359
* Store old_flavor already on source host during resize (Balazs Gibizer, 2021-09-27; 1 file, -27/+13)

  During resize, on the source host, in resize_instance(), the
  instance.host and .node are updated to point to the destination host.
  This indicates to the source host's resource tracker that the allocation
  of this instance does not need to be tracked as an instance but as an
  outbound migration instead. However, for the source host's resource
  tracker to do that, it needs to use the instance.old_flavor.
  Unfortunately, the instance.old_flavor is only set during finish_resize()
  on the destination host. (resize_instance casts to finish_resize.) So it
  is possible that a running resize_instance() sets instance.host to point
  to the destination and then, before finish_resize could set the
  old_flavor, an update_available_resources periodic runs on the source
  host.

  In that case the allocation of this instance is not tracked as an
  instance, as instance.host points to the destination, but it is not
  tracked as a migration either, as instance.old_flavor is not yet set. So
  the allocation on the source host is simply dropped by the periodic job.

  When such a migration is confirmed, confirm_resize() tries to drop the
  same resource allocation again but fails, as the pinned CPUs of the
  instance have already been freed. When such a migration is reverted
  instead, the revert succeeds, but the source host resource allocation
  will not contain the resource allocation of the instance until the next
  update_available_resources periodic runs and corrects it.

  This does not affect resources tracked exclusively in placement (e.g.
  VCPU, MEMORY_MB, DISK_GB) but it does affect NUMA related resources that
  are still tracked in the resource tracker (e.g. huge pages, pinned CPUs).

  This patch moves the setting of instance.old_flavor on the source node
  into the same transaction that sets instance.host to point to the
  destination host, hence solving the race condition.

  Change-Id: Ic0d6c59147abe5e094e8f13e0d3523b178daeba9
  Closes-Bug: #1944759
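  A minimal sketch of the ordering the fix describes, using the usual nova
  object fields; this is illustrative only, not the literal nova code:

      # On the source host, stash the old flavor in the same save() that
      # repoints the instance at the destination.
      instance.old_flavor = instance.flavor
      instance.host = migration.dest_compute
      instance.node = migration.dest_node
      instance.save()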
* Reproduce bug 1944759 (Balazs Gibizer, 2021-09-24; 1 file, -0/+121)

  Add functional tests to reproduce the race between resize_instance() and
  update_available_resources().

  Related-Bug: #1944759
  Change-Id: Icb7e3379248fe00f9a94f9860181b5de44902379
* scheduler: Merge driver into manager (Stephen Finucane, 2021-08-23; 1 file, -1/+1)

  There's only one driver now, which means there isn't really a driver at
  all. Move the code into the manager altogether and avoid a useless layer
  of abstraction.

  Change-Id: I609df5b707e05ea70c8a738701423ca751682575
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* Fix max cpu topologies with numa affinity (Sean Mooney, 2021-06-18; 1 file, -18/+9)

  Nova has never supported specifying per NUMA node CPU topologies.
  Logically, the CPU topology of a guest is independent of its NUMA
  topology, and there is no way to model different CPU topologies per NUMA
  node or implement that in hardware.

  The code in nova that allowed the generation of these invalid
  configurations has now been removed, as it broke the automatic selection
  of CPU topologies based on the hw:cpu_max_[sockets|cores|threads] flavor
  and image properties. This change removes the incorrect code and the
  related unit tests that asserted nova could generate such invalid
  topologies.

  Closes-Bug: #1910466
  Change-Id: Ia81a0fdbd950b51dbcc70c65ba492549a224ce2b
* Test numa and vcpu topologies bug: #1910466 (Sean Mooney, 2021-06-18; 1 file, -2/+37)

  This change reproduces bug #1910466. When
  hw:cpu_max_[sockets|cores|threads] is configured in addition to an
  explicit NUMA topology and CPU pinning, nova is currently incapable of
  generating the correct virtual CPU topology, resulting in an index out of
  range error as we attempt to retrieve the first topology from an empty
  list. This change reproduces the error via a new functional test.

  Related-Bug: #1910466
  Change-Id: I333b3d85deed971678141307dd06545e308cf989
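  For context, the combination being exercised looks roughly like the
  following flavor extra specs; the values are illustrative, not taken from
  the test itself:

      extra_spec = {
          'hw:cpu_policy': 'dedicated',   # CPU pinning
          'hw:numa_nodes': '2',           # explicit NUMA topology
          'hw:cpu_max_sockets': '8',      # constrain the generated
          'hw:cpu_max_cores': '2',        # virtual CPU topology
          'hw:cpu_max_threads': '1',
      }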
* Create a fixture around fake_notifier (Balazs Gibizer, 2021-05-24; 1 file, -4/+0)

  The fake_notifier uses module globals and also needs careful stub and
  reset calls to work properly. This patch wraps the fake_notifier into a
  proper Fixture that automates the complexity. This is a fairly large
  patch but it does not change any logic; it just redirects calls from the
  fake_notifier to the new NotificationFixture.

  Change-Id: I456f685f480b8de71014cf232a8f08c731605ad8
* tests: Move libvirt-specific fixtures (Stephen Finucane, 2021-05-24; 1 file, -1/+1)

  These were left to last since there's a bit of cleanup necessary to move
  everything across.

  Change-Id: I921c812ac03f7d32eec31200772020c17f292851
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* tests: Move remaining non-libvirt fixtures (Stephen Finucane, 2021-05-12; 1 file, -2/+2)

  Move these to the central place. There's a large amount of test damage
  but it's pretty trivial.

  Change-Id: If581eb7aa463c9dde13714f34f0f1b41549a7130
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* fakelibvirt: make kB_mem default not laughable (Artom Lifshitz, 2021-03-09; 1 file, -38/+19)

  Previously, the default for kB_mem was 2048. That's 2MB of RAM. Change
  this to 16GB, and clean up tests that were forced to explicitly set a
  sane value by removing the kB_mem argument. In cases where tests were
  using the default CPU configuration, arguments to HostInfo are removed
  entirely.

  Change-Id: I1c97776583f9e43c4a284fb9ef59bd6293730a28
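  A sketch of what the simplified test setup looks like after this change;
  the keyword arguments shown are assumptions about fakelibvirt.HostInfo
  rather than its exact signature:

      # kB_mem no longer needs to be passed: the default is now a sane
      # 16 GiB instead of 2 MiB.
      host_info = fakelibvirt.HostInfo(
          cpu_nodes=2, cpu_sockets=1, cpu_cores=4, cpu_threads=2)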
* Merge "hardware: Check inventory of shared CPUs for 'mixed' policy" (Zuul, 2021-02-17; 1 file, -5/+2)
* hardware: Check inventory of shared CPUs for 'mixed' policy (Stephen Finucane, 2020-10-12; 1 file, -5/+2)

  If using the 'mixed' CPU policy then, by design, the instance is
  consuming both "shared" and "dedicated" host CPUs. However, we were only
  checking the latter. Fix this.

  Change-Id: Ic21c918736e7046ad32a2b8a3330496ce42950b0
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
  Closes-Bug: #1898272
* Merge "functional: Add test for #1898272" (Zuul, 2021-02-16; 1 file, -1/+52)
* functional: Add test for #1898272 (Stephen Finucane, 2020-10-12; 1 file, -1/+52)

  If you have two host NUMA nodes and you mark the entirety of one node for
  shared vCPUs (VCPU) and the other for dedicated vCPUs (PCPU), then it
  should not be possible to boot an instance with a 'mixed' CPU policy and
  a single NUMA node, since satisfying that request would necessitate
  splitting across host nodes. This is not currently the case. Prove that
  out.

  Change-Id: I72afe21d117e932cc00c9f616618fb3722bf8ca4
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
  Related-Bug: #1898272
* Merge "functional: Add tests for mixed CPU policy" (Zuul, 2021-02-04; 1 file, -7/+64)
* functional: Add tests for mixed CPU policy (Stephen Finucane, 2020-10-12; 1 file, -7/+64)

  I'm not sure why these weren't included in the series, but they help
  prove out what this is doing. This highlights a small "bug" in the code,
  whereby the topology object associated with the instance's NUMA cell
  doesn't have the correct number of CPUs. This has no adverse effects,
  since that attribute isn't actually used except to indicate a minimum
  thread count necessary for the cell, but it is wrong, so we fix it all
  the same. Some tests are renamed now that we're testing more than one
  "legacy" policy.

  Part of blueprint use-pcpu-and-vcpu-in-one-instance
  Change-Id: I30a040d549f48d53cab2c59c00bb269f821ace88
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
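  For reference, the 'mixed' policy under test is driven by flavor extra
  specs along these lines; the exact values are illustrative:

      extra_spec = {
          'hw:cpu_policy': 'mixed',
          # CPUs 0-1 are dedicated (pinned); the remaining vCPUs are shared
          'hw:cpu_dedicated_mask': '0-1',
      }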
* Remove six.text_type (2/2) (Takashi Natsume, 2020-12-13; 1 file, -4/+2)

  Replace six.text_type with str. This patch completes six removal.

  Change-Id: I779bd1446dc1f070fa5100ccccda7881fa508d79
  Implements: blueprint six-removal
  Signed-off-by: Takashi Natsume <takanattie@gmail.com>
* functional: Add and use 'GlanceFixture' (Stephen Finucane, 2020-09-16; 1 file, -1/+1)

  This rather beefy (but also quite simple) patch replaces the
  'stub_out_image_service' call and associated cleanup in all functional
  tests with a new 'GlanceFixture', based on the old 'FakeImageService'.
  The use of a fixture means we don't have to worry about teardown and
  allows us to stub Glance in the same manner as Cinder, Neutron, Placement
  etc. Unit test cleanup is handled in a later patch.

  Change-Id: I6daea47988181dfa6dde3d9c42004c0ecf6ae87a
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* functional: Enable real policy fixture by default (Stephen Finucane, 2020-09-16; 1 file, -3/+3)

  Enable the policy fixture by default, which should yield more realistic
  functional tests. We need to update some tests to use admin APIs where
  policy dictates they are necessary. Note that we're currently testing the
  legacy policy - not the updated, scoped policy - since the legacy policy
  is the default one currently. Note that we also need to modify the
  'SingleCellSimple' fixture in this change to use the same project ID as
  the 'OSAPIFixture'.

  Change-Id: Ia3dea78f16cb3c7081714c4db36e20d5ee76ed7d
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* functional: Add 'start_compute' helper (Stephen Finucane, 2020-09-10; 1 file, -72/+31)

  Change Id3f77c4ecccfdc4caa6dbf120c3df4fbdfce9d0f added the
  'start_computes' function as a helper to start multiple compute hosts
  with different 'HostInfo' objects. Unfortunately, there are cases where
  you might also want to start multiple computes with e.g. different
  'PCIInfo' objects, and this can't handle those cases. We could expand the
  'host_info_dict' parameter to transition from a mapping of hostnames ->
  HostInfo objects to a mapping of hostnames -> HostInfo and PCIInfo
  objects, but after a while that gets frankly quite ridiculous. Instead,
  replace 'start_computes' with 'start_compute', a new helper that can
  handle creating a new service with its own unique 'HostInfo' and various
  other attributes currently accepted by the '_get_connection' helper. This
  allows us to remove 'start_computes' in its entirety.

  Change-Id: I79a16a0a62c6060cd3062174ce68fd8cbde9f3fc
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
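  The intended usage pattern reads roughly as follows; the keyword
  arguments are assumptions based on the description above, not the
  helper's exact signature:

      self.start_compute(
          hostname='compute1',
          host_info=fakelibvirt.HostInfo(cpu_nodes=2, cpu_cores=2))
      self.start_compute(
          hostname='compute2',
          host_info=fakelibvirt.HostInfo(cpu_nodes=1, cpu_cores=4))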
* Don't unset Instance.old_flavor, new_flavor until necessary (Stephen Finucane, 2020-09-01; 1 file, -9/+4)

  Since change Ia6d8a7909081b0b856bd7e290e234af7e42a2b38, the resource
  tracker's 'drop_move_claim' method has been capable of freeing up
  resource usage. However, this relies on accurate resource reporting. It
  transpires that there's a race whereby the resource tracker's
  'update_available_resource' periodic task can end up not accounting for
  usage from migrations that are in the process of being completed.

  The root cause is the resource tracker's reliance on the stashed flavor
  in a given migration record [1]. Previously, this information was deleted
  by the compute manager at the start of the confirm migration operation
  [2]. The compute manager would then call the virt driver [3], which could
  take a not insignificant amount of time to return, before finally
  dropping the move claim. If the periodic task ran between the clearing of
  the stashed flavor and the return of the virt driver, it would find a
  migration record with no stashed flavor and would therefore ignore this
  record for accounting purposes [4], resulting in an incorrect record for
  the compute node, and an exception when the 'drop_move_claim' attempts to
  free up the resources that aren't being tracked.

  The solution to this issue is pretty simple. Instead of unsetting the old
  flavor record from the migration at the start of the various move
  operations, do it afterwards.

  [1] https://github.com/openstack/nova/blob/6557d67/nova/compute/resource_tracker.py#L1288
  [2] https://github.com/openstack/nova/blob/6557d67/nova/compute/manager.py#L4310-L4315
  [3] https://github.com/openstack/nova/blob/6557d67/nova/compute/manager.py#L4330-L4331
  [4] https://github.com/openstack/nova/blob/6557d67/nova/compute/resource_tracker.py#L1300

  Change-Id: I4760b01b695c94fa371b72216d398388cf981d28
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
  Partial-Bug: #1879878
  Related-Bug: #1834349
  Related-Bug: #1818914
* Merge "hardware: Reject requests for no hyperthreads on hosts with HT" (Zuul, 2020-08-26; 1 file, -6/+1)
* hardware: Reject requests for no hyperthreads on hosts with HT (Stephen Finucane, 2020-07-31; 1 file, -6/+1)

  Attempting to boot an instance with 'hw:cpu_policy=dedicated' will result
  in a request from nova-scheduler to placement for allocation candidates
  with $flavor.vcpu 'PCPU' inventory. Similarly, booting an instance with
  'hw:cpu_thread_policy=isolate' will result in a request for allocation
  candidates with 'HW_CPU_HYPERTHREADING=forbidden', i.e. hosts without
  hyperthreading. This has been the case since the cpu-resources feature
  was implemented in Train.

  However, as part of that work, and to enable upgrades from hosts that
  predated Train, we also make a second request for candidates with
  $flavor.vcpu 'VCPU' inventory. The idea behind this is that old compute
  nodes would only report 'VCPU' and should be usable, and any new compute
  nodes that got caught up in this second request could never actually be
  scheduled to, since there wouldn't be enough cores from
  'ComputeNode.numa_topology.cells.[*].pcpuset' available to schedule to,
  resulting in rejection by the 'NUMATopologyFilter'. However, if a host
  was rejected in the first query because it reported the
  'HW_CPU_HYPERTHREADING' trait, it could get picked up by the second query
  and would happily be scheduled to, resulting in an instance consuming
  'VCPU' inventory from a host that properly supported 'PCPU' inventory.

  The solution is simple, though also a huge hack. If we detect that the
  host is using new style configuration and should be able to report
  'PCPU', check if the instance asked for no hyperthreading and whether the
  host has it. If all are True, reject the request.

  Change-Id: Id39aaaac09585ca1a754b669351c86e234b89dd9
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
  Closes-Bug: #1889633
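  To make the flavor-to-placement translation concrete, the request
  described above corresponds roughly to the following; the query string in
  the comment is a simplified illustration of the placement call, not a
  literal trace:

      extra_spec = {
          'hw:cpu_policy': 'dedicated',
          'hw:cpu_thread_policy': 'isolate',
      }
      # Roughly: GET /allocation_candidates
      #              ?resources=PCPU:<flavor.vcpus>
      #              &required=!HW_CPU_HYPERTHREADING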
* Merge "tests: Add reproducer for bug #1889633" (Zuul, 2020-08-26; 1 file, -0/+71)
* tests: Add reproducer for bug #1889633 (Stephen Finucane, 2020-07-31; 1 file, -0/+71)

  With the introduction of the cpu-resources work [1], (libvirt) hosts can
  now report 'PCPU' inventory separate from 'VCPU' inventory, which is
  consumed by instances with pinned CPUs ('hw:cpu_policy=dedicated'). As
  part of that effort, we had to drop support for the ability to boot
  instances with 'hw:cpu_thread_policy=isolate' (i.e. I don't want
  hyperthreads) on hosts with hyperthreading. This had been previously
  implemented by marking thread siblings of the host cores used by such an
  instance as reserved and unusable by other instances, but such a design
  wasn't possible in a world where we had to track resource consumption in
  placement before landing on the host. Instead, the 'isolate' policy now
  simply means "give me a host without hyperthreads". This is enforced by
  hosts with hyperthreads reporting the 'HW_CPU_HYPERTHREADING' trait, and
  instances with the 'isolate' policy requesting
  'HW_CPU_HYPERTHREADING=forbidden'.

  Or at least, that's how it should work. We also have a fallback query for
  placement to find hosts with 'VCPU' inventory, one that doesn't care
  about the 'HW_CPU_HYPERTHREADING' trait. This was envisioned to ensure
  hosts with old style configuration ('[DEFAULT] vcpu_pin_set') could
  continue to be scheduled to. We figured that this second fallback query
  could accidentally pick up hosts with new-style configuration, but we are
  also tracking the available and used cores from those listed in the
  '[compute] cpu_dedicated_set' as part of the host 'NUMATopology' objects
  (specifically, via the 'pcpuset' and 'cpu_pinning' fields of the
  'NUMACell' child objects). These are validated by both the
  'NUMATopologyFilter' and the virt driver itself, which means hosts with
  new style configuration that got caught up in this second query would be
  rejected by this filter or by a late failure on the host. (Hint: there's
  much more detail on this in the spec.)

  Unfortunately we didn't think about hyperthreading. If a host gets picked
  up in the second request, it might well have enough PCPU inventory but
  simply have been rejected in the first query since it had hyperthreads.
  In this case, because it has enough free cores available for pinning,
  neither the filter nor the virt driver will reject the request, resulting
  in a situation whereby the instance ends up falling back to the old code
  paths and consuming $flavor.vcpu host cores, plus the thread siblings for
  each of these cores. Despite this, it will be marked as consuming
  $flavor.vcpu VCPU (not PCPU) inventory in placement.

  This patch proves this to be the case, allowing us to resolve the issue
  later.

  [1] https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html

  Change-Id: I87cd4d14192b1a40cbdca6e3af0f818f2cab613e
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
  Related-Bug: #1889633
* tests: Add helpers for rebuild, cold migrate, and shelve/unshelve (Stephen Finucane, 2020-08-25; 1 file, -7/+0)

  These are also useful for vTPM tests.

  Change-Id: I5d6db24a440397e588ba69f98a9cd2b8a846adc2
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* Merge "functional: Drop '_api' suffix from placement fixture" (Zuul, 2020-08-21; 1 file, -25/+25)
* functional: Drop '_api' suffix from placement fixture (Stephen Finucane, 2020-08-19; 1 file, -25/+25)

  It's unnecessary, particularly when none of the other service fixtures
  use it.

  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
  Change-Id: If849f80c0372872b2de57b20e8b63c069a54ccff
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* tests: Add reproducer for bug #1879878 (Stephen Finucane, 2020-08-19; 1 file, -0/+83)

  When one resizes a pinned instance, the instance claims host CPUs for
  pinning purposes on the destination. However, the host CPUs on the source
  are not immediately relinquished. Rather, they are held by the migration
  record, to handle the event that the resize is reverted. It is only when
  one confirms this resize that the old cores are finally relinquished.

  It appears there is a potential race between the resource tracker's
  periodic task and the freeing of these resources, resulting in attempts
  to unpin host cores that have already been unpinned. This test highlights
  that bug pending a fix.

  Change-Id: Ie092628ac71eb87c9dfa7220255a2953ada9e04d
  Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
  Related-Bug: #1879878
* Increase code reuse in test_numa_servers (Artom Lifshitz, 2020-03-20; 1 file, -35/+6)

  While working on I36dd64dc272ea1743995b3b696323a9431666489, it was
  noticed that test_numa_servers.py was needlessly duplicating code around
  compute service startup and restarting. This patch fixes this.

  Change-Id: I00cd09fac0a8cb88191cc4cb3cbaf306987a97d6
* Merge "tests: Validate huge pages" (Zuul, 2020-02-19; 1 file, -0/+58)
* tests: Validate huge pages (Stephen Finucane, 2020-01-31; 1 file, -0/+58)

  Validate basic huge page assignment to an instance. When assigning huge
  pages to an instance, a NUMA topology will automatically be added, thus
  this is a subclass of the NUMA tests. This will prove useful as a
  starting base for functional tests if we ever do manage to get memory
  pages modelled in placement.

  Change-Id: I8c760ed242a8ffd9ad963a5f51364f541909cd4c
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
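  Huge page assignment of the kind being validated is requested through a
  flavor extra spec such as the following; the page size shown is one
  example, and symbolic values like 'large' or 'small' are also accepted:

      extra_spec = {'hw:mem_page_size': '2048'}  # 2 MiB pages, expressed in KiB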
* Recalculate 'RequestSpec.numa_topology' on resize (Stephen Finucane, 2020-01-31; 1 file, -2/+108)

  When resizing, it's possible to change the NUMA topology of an instance,
  or remove it entirely, due to different extra specs in the new flavor.
  Unfortunately we cache the instance's NUMA topology object in
  'RequestSpec.numa_topology' and don't update it when resizing. This means
  if a given host doesn't have enough free CPUs or mempages of the size
  requested by the *old* flavor, that host can be rejected by the filter.

  Correct this by regenerating the 'RequestSpec.numa_topology' field as
  part of the resize operation, ensuring that we revert to the old field
  value in the case of a resize-revert.

  Change-Id: I0ca50665b86b9fdb4618192d4d6a3bcaa6ea2291
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
  Co-Authored-By: He Jie Xu <hejie.xu@intel.com>
  Closes-bug: #1805767
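  The regeneration the commit describes amounts to recomputing the
  constraints from the new flavor and image, along these lines; this is a
  sketch, not the literal change, and the variable names are assumptions:

      from nova.virt import hardware

      # Rebuild the cached topology from the *new* flavor rather than
      # trusting the value cached at boot time.
      request_spec.numa_topology = hardware.numa_get_constraints(
          new_flavor, image_meta)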
* functional: Add '_create_server' helper (Stephen Finucane, 2020-01-20; 1 file, -43/+13)

  This is a *very* common pattern. Centralize it. Only a few users are
  added for now, but we can migrate more later (it's rather tedious work).

  Change-Id: I84c58de90dad6d86271767363aef90ddac0f1730
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* functional: Add unified '_build_server' helper function (Stephen Finucane, 2020-01-15; 1 file, -7/+7)

  '_IntegratedTestBase' has subclassed 'InstanceHelperMixin' since change
  I0d21cb94c932e6e556eca964c57868c705b2d120, which means both now provide a
  '_build_minimal_create_server_request' function. However, only
  '_IntegratedTestBase' provides a '_build_server' function. The
  '_build_minimal_create_server_request' and '_build_server' functions do
  pretty much the same thing but there are some differences. Combine these
  under the '_build_server' alias.

  Change-Id: I91fa2f73185fef48e9aae9b7f61389c374e06676
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* FUP for in-place numa rebuild (Sean Mooney, 2019-12-19; 1 file, -10/+9)

  This patch addresses a number of typos and minor issues raised during
  review of [1][2]. A summary of the changes: corrections to typos in
  comments, a correction to the exception message, an update to the release
  note, and the addition of debug logging.

  [1] I0322d872bdff68936033a6f5a54e8296a6fb3434
  [2] I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132

  Related-Bug: #1804502
  Related-Bug: #1763766
  Change-Id: I8975e524cd5a9c7dfb065bb2dc8ceb03f1b89e7b
* Disable NUMATopologyFilter on rebuild (Sean Mooney, 2019-12-09; 1 file, -8/+21)

  This change leverages the new NUMA constraint checking added in
  I0322d872bdff68936033a6f5a54e8296a6fb3434 to allow the
  NUMATopologyFilter to be skipped on rebuild. As the new behavior of
  rebuild enforces that no changes to the NUMA constraints are allowed on
  rebuild, we no longer need to execute the NUMATopologyFilter.

  Previously, the NUMATopologyFilter would process the rebuild request as
  if it was a request to spawn a new instance, as the
  numa_fit_instance_to_host function is not rebuild aware. As such, prior
  to this change a rebuild would only succeed if a host had enough
  additional capacity for a second instance on the same host meeting the
  requirements of the new image and existing flavor. This behavior was
  incorrect on two counts, as a rebuild uses a noop claim. First, the
  resource usage cannot change, so it was incorrect to require the
  additional capacity to rebuild an instance. Secondly, it was incorrect
  not to assert that the resource usage remained the same.

  I0322d872bdff68936033a6f5a54e8296a6fb3434 addressed guarding the rebuild
  against altering the resource usage, and this change allows in-place
  rebuild. This change found a latent bug that will be addressed in a
  follow up change and updated the functional tests to note the incorrect
  behavior.

  Change-Id: I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132
  Closes-Bug: #1804502
  Implements: blueprint inplace-rebuild-of-numa-instances
* Block rebuild when NUMA topology changed (Sean Mooney, 2019-12-05; 1 file, -4/+128)

  If the image changes during a rebuild, it's possible for the requested
  NUMA topology to change. As a rebuild uses a noop claim in the resource
  tracker, the NUMA topology will not be updated as part of a rebuild. If
  the NUMA constraints do not change, a rebuild will continue as normal. If
  the new constraints conflict with the existing NUMA constraints of the
  instance, the rebuild will be rejected without altering the status of the
  instance.

  This change introduces an API check to block rebuild when the NUMA
  requirements for the new image do not match the existing NUMA
  constraints. This is in line with the previous check introduced to
  prevent the rebuild of volume-backed instances, which similarly is not
  supported.

  This change adds functional tests to assert the expected behaviour of
  rebuilding NUMA instances with new images. It also asserts that in-place
  rebuild of NUMA instances is currently not supported.

  Closes-Bug: #1763766
  Partial-implements: blueprint inplace-rebuild-of-numa-instances
  Change-Id: I0322d872bdff68936033a6f5a54e8296a6fb3434
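  The API-level check boils down to comparing the constraints derived from
  the old and new images; a rough sketch, assuming nova's
  hardware.numa_get_constraints helper and an illustrative exception name:

      old = hardware.numa_get_constraints(flavor, old_image_meta)
      new = hardware.numa_get_constraints(flavor, new_image_meta)
      if old != new:
          # Reject the rebuild before touching the instance.
          raise exception.ImageNUMATopologyRebuildConflict()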
* functional: Make '_wait_for_state_change' behave consistently (Stephen Finucane, 2019-11-19; 1 file, -44/+16)

  Both 'ServersTestBase' and 'InstanceHelperMixin' provide implementations
  of '_wait_for_state_change' but they behave differently. The former waits
  for an instance to transition *from* the provided state, while the
  latter, somewhat more sanely, waits for the transition *to* the provided
  state. Switch to using the latter everywhere and make the necessary
  changes. Due to class hierarchies, we end up with two nearly identical
  implementations, but these will be merged in a future change.

  Change-Id: I80cdc0a33ec27b1389130c22f9c3a8ff69f6b1a0
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
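  After this change, a test waits for the target state directly; the call
  below is how the helper is typically used and should be treated as
  illustrative rather than an exact signature:

      server = self._wait_for_state_change(server, 'ACTIVE')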
* functional: Rework '_delete_server' (Stephen Finucane, 2019-11-15; 1 file, -4/+4)

  We want 'ServersTestBase' to inherit from 'InstanceHelperMixin'. They
  both have implementations of a "wait until server is deleted" function.
  The 'ServersTestBase' implementation is called '_wait_for_deletion' and
  takes a 'server_id', while the 'InstanceHelperMixin' implementation is
  called '_wait_until_deleted' and takes a full server JSON-y dict.

  A later change, I0c56841d098d3e9d72db65be3143f3c893f0b6ba, will rework
  the 'ServersTestBase' version to bring it in line with the
  'InstanceHelperMixin' version. However, 'ServersTestBase._delete_server'
  currently calls a '_wait_for_deletion' function and passes it a
  'server_id'. As such, '_delete_server' itself is only passed a server_id.
  After the future change, this 'server_id' will no longer be enough, as
  '_delete_server' will need to call the newly merged '_wait_until_deleted'
  that takes a full server JSON-y dict, so '_delete_server' itself needs to
  receive a full server JSON-y dict. Do this work now to simplify the
  future patch.

  Change-Id: Iceafa5ff2b7abff7c6d974ba49036ef03fb1c85f
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* Helper to start computes with different HostInfos (Artom Lifshitz, 2019-11-12; 1 file, -64/+3)

  Sometimes functional tests need to start multiple compute hosts with
  different HostInfo objects (representing, for example, different NUMA
  topologies). In the future this will also be needed by both NUMA-aware
  live migration functional tests and a regression test for bug 1843639.
  This patch adds a helper function to the base libvirt functional test
  class that takes a hostname to HostInfo dict and starts a compute for
  each host. Existing tests that can make use of this new helper are
  refactored.

  Change-Id: Id3f77c4ecccfdc4caa6dbf120c3df4fbdfce9d0f
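  Based on the description above, usage looks roughly like the following;
  the helper name and the HostInfo arguments are assumptions made for
  illustration:

      self.start_computes({
          'host_a': fakelibvirt.HostInfo(cpu_nodes=2, cpu_cores=2),
          'host_b': fakelibvirt.HostInfo(cpu_nodes=1, cpu_cores=4),
      })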
* Merge "Remove 'test_cold_migrate_with_physnet_fails' test" (Zuul, 2019-10-30; 1 file, -58/+0)
* Remove 'test_cold_migrate_with_physnet_fails' test (Stephen Finucane, 2019-09-23; 1 file, -58/+0)

  This never actually worked. What we wanted to do was test a setup where
  we had two hosts, one with the necessary configuration needed for a
  specific request and one without, and attempt to cold migrate the
  instance from the former to the latter, resulting in a failure. However,
  because it's not possible to use different configuration for different
  hosts, we were attempting to "break" the configuration on one host.
  Unfortunately, the driver correctly detects this broken behavior,
  resulting in an error message like so:

      ERROR [nova.compute.manager] Error updating resources for node test_compute1.
      Traceback (most recent call last):
        File "nova/compute/manager.py", line 8524, in _update_available_resource_for_node
          startup=startup)
        File "nova/compute/resource_tracker.py", line 867, in update_available_resource
          resources = self.driver.get_available_resource(nodename)
        File "nova/virt/libvirt/driver.py", line 7907, in get_available_resource
          numa_topology = self._get_host_numa_topology()
        File "nova/virt/libvirt/driver.py", line 7057, in _get_host_numa_topology
          physnet_affinities = _get_physnet_numa_affinity()
        File "nova/virt/libvirt/driver.py", line 7039, in _get_physnet_numa_affinity
          raise exception.InvalidNetworkNUMAAffinity(reason=msg)
      InvalidNetworkNUMAAffinity: Invalid NUMA network affinity configured:
      node 1 for physnet foo is not present in host affinity set {0: set([])}

  There isn't really an alternative. We can't configure compute nodes
  separately, so that's ruled out, and the virt driver will correctly
  detect any other attempts to break the configuration. Since the test
  never actually worked, the correct thing to do seems to be to remove it.

  Change-Id: I14637d788205408dcf9a007d7727358c03033dcd
  Signed-off-by: Stephen Finucane <sfinucan@redhat.com>