This patch is a combination of several legacy-to-zuulv3 job patches to
unblock the gate: with the latest Ceph release, the legacy grenade jobs
started to fail (on all branches back to stable/ussuri) with the
following error:
'Error EPERM: configuring pool size as 1 is disabled by default.'
This patch is almost a clean backport of the job configuration.
Conflicts:
gate/live_migration/hooks/run_tests.sh
roles/run-evacuate-hook/tasks/main.yaml
NOTE(melwitt): The conflict is because change
I67255fa1b919a27e92028da95d71ddd4bf53edc1 (lower-constraints: Bump
packaging to 20.4) is not in Ussuri.
NOTE(lyarwood): An additional change was required to the
run-evacuate-hook as we are now running against Bionic-based hosts.
These hosts only have a single libvirtd service running, so stop and
start only this service during an evacuation run.
List of included patches:
1. zuul: Start to migrate nova-live-migration to zuulv3
2. zuul: Replace nova-live-migration with zuulv3 jobs
Closes-Bug: #1901739
Change-Id: Ib342e2d3c395830b4667a60de7e492d3b9de2f0a
(cherry picked from commit 4ac4a04d1843b0450e8d6d80189ce3e85253dcd0)
(cherry picked from commit 478be6f4fbbbc7b05becd5dd92a27f0c4e8f8ef8)
3. zuul: Replace grenade and nova-grenade-multinode with grenade-multinode
Change-Id: I02b2b851a74f24816d2f782a66d94de81ee527b0
(cherry picked from commit 91e53e4c2b90ea57aeac4ec522dd7c8c54961d09)
(cherry picked from commit c45bedd98d50af865d727b7456c974c8e27bff8b)
(cherry picked from commit 2af08fb5ead8ca1fa4d6b8ea00f3c5c3d26e562c)
Change-Id: Ibbb3930a6e629e93a424b3ae048f599f11923be3
(cherry picked from commit 1c733d973015999ee692ed48fb10a282c50fdc49)
I49dc963ada17a595232d3eb329d94632d07b874b missed that
call_hook_if_defined will actually cause the entire run to fail [1] if
we attempt to stop the non-existent libvirt-bin service, so just remove
it now that we are using the Train UCA.
[1] https://opendev.org/openstack/devstack-gate/src/commit/7a70f559c559e22b498d735b4ed20aadc71b7f39/functions.sh#L74
NOTE(lyarwood): The following change is also squashed into this one to
unblock the nova-live-migration job on stable/ussuri.
test_evacuate.sh: Support libvirt-bin and libvirtd systemd services
The systemd service unit for libvirtd has changed name from libvirt-bin
to libvirtd, as such the evacuation test script needs to be changed to
support both as we move between these versions.
Change-Id: I49dc963ada17a595232d3eb329d94632d07b874b
(cherry picked from commit 6c62830ae802379e20651ffe14b10809d1122792)
Change-Id: Ife26f1ceb6208e12328ccdccbab0681ee55d5a2a
(cherry picked from commit 5ab9b28161291047b8de2cc9c27edc87b319a7bc)
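The unit-name compatibility the squashed change describes can be sketched as a small helper, assuming a systemd host (the unit names come from the message; the helper name itself is invented):

```shell
# Resolve the libvirt systemd unit name: libvirt-bin on older Ubuntu,
# libvirtd from Bionic on. Falls back to libvirtd when systemctl is
# unavailable or lists neither unit.
libvirt_unit() {
    if systemctl list-unit-files 2>/dev/null | grep -q '^libvirt-bin'; then
        echo "libvirt-bin"
    else
        echo "libvirtd"
    fi
}

# The evacuation hook can then stop/start symmetrically, e.g.:
#   sudo systemctl stop "$(libvirt_unit)"
#   ... run the evacuation test ...
#   sudo systemctl start "$(libvirt_unit)"
```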
I8af2ad741ca08c3d88efb9aa817c4d1470491a23 started to correctly fence the
subnode ahead of evacuation testing but missed that c-vol and g-api
were also running on the host. As a result the BFV evacuation test will
fail if the volume being used is created on the c-vol backend hosted on
the subnode.
This change now avoids this by limiting the services stopped ahead of
the evacuation on the subnode to n-cpu and q-agt.
Change-Id: Ia7c317e373e4037495d379d06eda19a71412d409
Closes-Bug: #1868234
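The narrowed fencing can be sketched as a filter over the subnode's service list (the service names come from the message; the function name and input convention are assumptions):

```shell
# From the services running on the subnode, select only the compute and
# networking agents to stop, leaving c-vol and g-api running so the BFV
# test's volume backend stays reachable.
services_to_stop() {
    printf '%s\n' "$@" | grep -E '^(n-cpu|q-agt)$'
}
```

For example, `services_to_stop n-cpu q-agt c-vol g-api` prints only `n-cpu` and `q-agt`.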
As stated in the forced-down API [1]:
> Setting a service forced down without completely fencing it will
> likely result in the corruption of VMs on that host.
Previously only the libvirtd service was stopped on the subnode prior to
calling this API, allowing n-cpu, q-agt and the underlying guest domains
to continue running on the host.
This change now ensures all devstack services are stopped on the subnode
and all active domains destroyed.
It is hoped that this will resolve bug #1813789 where evacuations have
timed out due to VIF plugging issues on the new destination host.
[1] https://docs.openstack.org/api-ref/compute/?expanded=update-forced-down-detail#update-forced-down
Related-Bug: #1813789
Change-Id: I8af2ad741ca08c3d88efb9aa817c4d1470491a23
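A dry-run sketch of the fuller fencing this change describes: stop all devstack units and destroy every active libvirt domain. The function only emits the commands (so the logic can be exercised without a real host); the unit glob and domain names are illustrative assumptions:

```shell
# Emit the commands needed to fully fence a subnode before calling the
# forced-down API: stop every devstack service, then hard-destroy each
# guest domain passed in (e.g. from `virsh list --name`).
fence_subnode_cmds() {
    echo "systemctl stop devstack@*"
    local dom
    for dom in "$@"; do
        echo "virsh destroy $dom"
    done
}
```

Piping the output through `sudo sh` on the subnode would execute the sequence.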
Previously the ceph.sh script used during the nova-live-migration job
would only grep for a `compute` process when checking if the services
had been restarted. This check was bogus and would always return 0 as it
would always match itself. For example:
2020-03-13 21:06:47.682073 | primary | 2020-03-13 21:06:47.681 | root 29529 0.0 0.0 4500 736 pts/0 S+ 21:06 0:00 /bin/sh -c ps aux | grep compute
2020-03-13 21:06:47.683964 | primary | 2020-03-13 21:06:47.683 | root 29531 0.0 0.0 14616 944 pts/0 S+ 21:06 0:00 grep compute
Failures of this job were seen on the stable/pike branch where slower CI
nodes appeared to struggle to allow Libvirt to report to n-cpu in time
before Tempest was started. This in turn caused instance build failures
and the overall failure of the job.
This change resolves this issue by switching to pgrep and ensuring
n-cpu services are reported as fully up after a cold restart before
starting the Tempest test run.
Closes-Bug: 1867380
Change-Id: Icd7ab2ca4ddbed92c7e883a63a23245920d961e7
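The difference between the two checks can be shown with a minimal sketch (the process name `nova-compute` is illustrative):

```shell
# The old check always "succeeds" because the grep in
#   ps aux | grep compute
# can match its own "grep compute" command line, so the pipeline finds
# at least one line even when no compute process exists.
#
# pgrep matches against process names and never matches its own
# invocation, so it reports the truth:
if pgrep nova-compute >/dev/null; then
    echo "n-cpu is running"
else
    echo "n-cpu is NOT running"
fi
```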
This makes these legacy devstack-gate-based jobs run with python3.
Change-Id: Id565a20ba3ebe2ea1a72b879bd2762ba3e655658
With OSC 4.0.0 we can now use the --boot-from-volume option
to create a volume-backed server from the provided image. However,
that option leaves the created root volume around since
delete_on_termination defaults to false in the API. So while we
could use that option and convert from nova boot to openstack
server create, it would mean we'd have to find and manually delete
the created volume after the server is created, which is more work
than it's worth, so just remove the TODO.
Change-Id: I0b70b19d74007041fc2da55a4edb1c636af691d6
The instance_create DB API method ensures the (legacy) default
security group gets created for the specified project_id if it does
not already exist. If the security group does not exist, it is created
in a separate transaction.
Later in the instance_create method, it reads the default security group
back that it wrote earlier (via the same ensure default security group
code). But since it was written in a separate transaction, the current
transaction will not be able to see it and will get back 0 rows. So, it
creates a duplicate default security group record if project_id=NULL
(which it will be, if running nova-manage db online_data_migrations,
which uses an anonymous RequestContext with project_id=NULL). This
succeeds despite the unique constraint on project_id because in MySQL,
unique constraints are only enforced on non-NULL values [1].
To avoid creation of a duplicate default security group for
project_id=NULL, we can use the default security group object that was
returned from the first security_group_ensure_default call earlier in
instance_create method and remove the second, redundant call.
This also breaks out the security groups setup code from a nested
method as it was causing confusion during code review and is not being
used for any particular purpose. Inspection of the original commit
where it was added in 2012 [2] did not contain any comments about the
nested method and it appeared to either be a way to organize the code
or a way to reuse the 'models' module name as a local variable name.
Closes-Bug: #1824435
[1] https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-unique
[2] https://review.opendev.org/#/c/8973/2/nova/db/sqlalchemy/api.py@1339
Change-Id: Idb205ab5b16bbf96965418cd544016fa9cc92de9
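The NULL-uniqueness point can be illustrated from the shell with the sqlite3 CLI, since SQLite, like MySQL, enforces UNIQUE only on non-NULL values; the toy table here is a stand-in for security_groups, not the real schema:

```shell
# Two rows with project_id NULL both insert successfully despite the
# unique constraint, mirroring the duplicate default-security-group bug.
db=$(mktemp)
sqlite3 "$db" "CREATE TABLE secgroups (id INTEGER PRIMARY KEY, project_id TEXT, name TEXT, UNIQUE(project_id, name));"
sqlite3 "$db" "INSERT INTO secgroups (project_id, name) VALUES (NULL, 'default');"
sqlite3 "$db" "INSERT INTO secgroups (project_id, name) VALUES (NULL, 'default');"  # no constraint error
sqlite3 "$db" "SELECT COUNT(*) FROM secgroups;"  # prints 2
rm -f "$db"
```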
This adds a regression test in our post test hook. We are not able to
do a similar test in the unit or functional tests because SQLite does
not provide any isolation between transactions on the same database
connection [1] and the bug can only be reproduced with the isolation
that is present when using a real MySQL database.
Related-Bug: #1824435
[1] https://www.sqlite.org/isolation.html
Change-Id: I204361d6ff7c2323bc744878d8a9fa2d20a480b1
This patch extends the existing integration test for
heal_allocations to test the recently implemented port
allocation healing functionality.
Change-Id: I993c9661c37da012cc975ee8c04daa0eb9216744
Related-Bug: #1819923
Changes in [1] could potentially break a mixed-compute-version
environment as we don't have grenade coverage for cold migrate and
resize. This adds that coverage to the nova-grenade-multinode
job.
[1]https://review.opendev.org/#/c/655721/10
Change-Id: I81372d610ddf8abb473621deb6e7cb68eb000fee
This patch resolves a TODO in the .zuul.yaml about using common
irrelevant files in our dsvm jobs. To be able to do that we need to move
the test hooks from nova/tests/live_migration under gate/.
Change-Id: I4e5352fd1a99ff2b4134a734eac6626be772caf1
As of commit 1c9de9c7779b1faf9d9542b3e5bd20da70067365, we no longer
pass any args to the archive_deleted_rows function, so we can remove
the argument list from the function.
Change-Id: I73b2f716908088b137102631f9360939a1d7341a
We are already running archive_deleted_rows in the gate, but we are
not verifying whether all instance records, for example, were actually
successfully removed from the databases (cell0 and cell1).
This adds the --all-cells option to our archive_deleted_rows runs and
verifies that instance records were successfully removed from all cell
databases.
It is not sufficient to check only for return code 0 because
archive_deleted_rows will still return 0 when it misses archiving
records in cell databases.
Related-Bug: #1719487
Change-Id: If133b12bf02d708c099504a88b474dce0bdb0f00
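The verification idea can be sketched as a small checker: a zero return code from archive_deleted_rows is not enough, so count the leftover soft-deleted instance rows per cell database. `count_deleted_instances` is a hypothetical helper the caller supplies (e.g. wrapping a mysql query); the cell names are illustrative:

```shell
# Fail if any cell database still reports unarchived (soft-deleted)
# instance rows after archive_deleted_rows --all-cells has run.
verify_archived() {
    local cell failed=0
    for cell in "$@"; do
        if [ "$(count_deleted_instances "$cell")" -ne 0 ]; then
            echo "cell $cell still has unarchived instance rows"
            failed=1
        fi
    done
    return $failed
}

# e.g.: verify_archived nova_cell0 nova_cell1
```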
Currently, the 'purge_db' call occurs before 'set -e', so if and when
the database purge fails (return non-zero) it does not cause the script
to exit with a failure.
This moves the call after 'set -e' to make the script exit with a
failure if the database purge step fails.
Closes-Bug: #1840967
Change-Id: I6ae27c4e11acafdc0bba8813f47059d084758b4e
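The ordering bug is easy to reproduce in miniature: a failing command placed before `set -e` is silently ignored, while the same failure after it aborts the script.

```shell
# Write and run a tiny script demonstrating why purge_db had to move
# after `set -e` for its failure to fail the job.
cat > /tmp/set_e_demo.sh <<'EOF'
#!/bin/sh
false               # ignored: set -e is not active yet
set -e
echo "after set -e"
false               # fatal: the script exits non-zero here
echo "never printed"
EOF
sh /tmp/set_e_demo.sh || echo "script exited with failure"
```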
For the most part this should be a pretty straight-forward
port of the run.yaml. The most complicated thing is executing
the post_test_hook.sh script. For that, a new post-run playbook
and role are added.
The relative path to devstack scripts in post_test_hook.sh itself
had to drop the 'new' directory: since we are no longer executing
the script through devstack-gate, the 'new' path does not exist.
Change-Id: Ie3dc90862c895a8bd9bff4511a16254945f45478
This adds a simple scenario for the heal_allocations CLI
to the post_test_hook script run at the end of the nova-next
job. The functional testing in-tree is pretty extensive but
it's always good to have real integration testing.
Change-Id: If86e4796a9db3020d4fdb751e8bc771c6f98aa47
Related-Bug: #1819923
Change-Id: I4fbd0cb73c73ab680af3f341d6069addb57393fb
Devstack change Ib2e7096175c991acf35de04e840ac188752d3c17 started
creating a second network which is shared when tempest is enabled.
This causes the "openstack server create" and "nova boot" commands
in test_evacuate.sh to fail with:
Multiple possible networks found, use a Network ID to be more specific.
This change selects the non-shared network and uses it to create
the servers during evacuate testing.
Change-Id: I2085a306e4d6565df4a641efabd009a3bc182e87
Closes-Bug: #1822605
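The selection logic can be sketched as a filter over "name shared" pairs, as one might derive from `openstack network list` output (the field layout and network names here are assumptions):

```shell
# Pick the first network whose "shared" flag is False, so server
# creates can pass an unambiguous network instead of tripping over
# "Multiple possible networks found".
pick_nonshared_net() {
    awk '$2 == "False" { print $1; exit }'
}

printf 'shared-net True\nprivate False\n' | pick_nonshared_net  # prints "private"
```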
Microversion 2.68 removed force evacuation, so this change
updates gate/test_evacuate.sh to use microversion 2.67.
Closes-Bug: #1819166
Change-Id: I44a3514b4b0ba1648aa96f92e896729c823b151c
gate/post_test_perf_check.sh did some simplistic performance testing of
placement. With the extraction of placement we want it to happen during
openstack/placement CI changes so we remove it here.
The depends-on is to the placement change that turns it on there, using
an independent (and very small) job.
Depends-On: I93875e3ce1f77fdb237e339b7b3e38abe3dad8f7
Change-Id: I30a7bc9a0148fd3ed15ddd997d8dab11e4fb1fe1
Waiting 30 seconds for an evacuate to complete is not enough
time on some slower CI test nodes. This change uses the
same build timeout configuration from tempest to determine
the overall evacuate timeout in our evacuate tests.
Change-Id: Ie5935ae54d2cbf1a4272e93815ee5f67d3ffe2eb
Closes-Bug: #1806925
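The change amounts to bounding the poll loop by a configurable timeout instead of a hardcoded 30 seconds. A minimal sketch, where `get_server_status` is a hypothetical helper (e.g. wrapping `openstack server show -f value -c status`) and the 300s default is illustrative rather than tempest's actual value:

```shell
# Poll until the server reaches the wanted status or BUILD_TIMEOUT
# (seconds) elapses. Returns 0 on success, 1 on timeout.
wait_for_status() {
    local want=$1 timeout=${BUILD_TIMEOUT:-300} elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ "$(get_server_status)" = "$want" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```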
This adds a volume-backed instance evacuate scenario
to the test_evacuate post-test script.
Change-Id: I37120d9ce02de6dadbd279de195d2f289c891123
This adds a post-test bash script to test evacuate
in a multinode job.
This performs two tests:
1. A negative test where we inject a fault by stopping
libvirt prior to the evacuation and wait for the
server to go to ERROR status.
2. A positive where we restart libvirt, wait for the
compute service to be enabled and then evacuate
the server and wait for it to be ACTIVE.
For now we hack this into the nova-live-migration
job, but it should probably live in a different job
long-term.
Change-Id: I9b7c9ad6b0ab167ba4583681efbbce4b18941178
This updates the EXPLANATION and sets the pinned version of placeload
to the just-released 0.3.0. This ought to hold us for a while. If
we need to do this again, we should probably switch to using
requirements files in some fashion, but I'm hoping we can avoid
that until later, potentially even after placement extraction,
when we will have to move and change this anyway.
Change-Id: Ia3383c5dbbf8445254df774dc6ad23f2b9a3721e
The pirate on crack output of placeload can be confusing
so this change adds a prefix to the placement-perf.txt log
file so that it is somewhat more self-explanatory.
This change also pins the version of placeload because the
explanation is version dependent.
Change-Id: I055adb5f6004c93109b17db8313a7fef85538217
This change adds a post test hook to the nova-next job to report
timing of a query to GET /allocation_candidates when there are 1000
resource providers with the same inventory.
A summary of the work ends up in logs/placement-perf.txt
Change-Id: Idc446347cd8773f579b23c96235348d8e10ea3f6
This makes purge iterate over all cells if requested. This also makes our
post_test_hook.sh use the --all-cells variant with just the base config
file.
Related to blueprint purge-db
Change-Id: I7eb5ed05224838cdba18e96724162cc930f4422e
This adds a simple purge command to nova-manage. It either deletes all
shadow archived data, or data older than a date if provided.
This also adds a post-test hook to run purge after archive to validate
that it at least works on data generated by a gate run.
Related to blueprint purge-db
Change-Id: I6f87cf03d49be6bfad2c5e6f0c8accf0fab4e6ee
Change-Id: I4af326fe66f0cf24ede8a8b7a8ce0e528c4f437c
The post_test_hook.sh runs in the nova-next CI job. The 1.0.0
version of the osc-placement plugin adds the CLIs to show consumer
resource allocations.
This adds some sanity check code to the post_test_hook.sh script
to look for any resource provider (compute nodes) that have allocations
against them, which shouldn't be the case for successful test runs
where servers are cleaned up properly.
Change-Id: I9801ad04eedf2fede24f3eb104715dcc8e20063d
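The sanity check can be sketched as a filter over "rp_uuid allocation_count" lines, as one might assemble from osc-placement output (the input layout is an assumption): any provider that still has allocations after the test run is suspect.

```shell
# Report every resource provider whose allocation count is non-zero;
# a clean run should produce no output.
rps_with_allocations() {
    awk '$2 > 0 { print $1 }'
}

# e.g.: printf 'rp-a 0\nrp-b 3\n' | rps_with_allocations  -> rp-b
```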
We prevent a lot of tests from getting run on tools/ changes given
that most of that is unrelated to running any tests. Having the
gate hooks in that directory made for a somewhat odd separation of
what is test-sensitive and what is not.
This moves things to the gate/ top level directory, and puts a symlink
in place to handle project-config compatibility until that can be
updated.
Change-Id: Iec9e89f0380256c1ae8df2d19c547d67bbdebd65