author: Zuul <zuul@review.opendev.org> 2019-06-06 18:50:50 +0000
committer: Gerrit Code Review <review@openstack.org> 2019-06-06 18:50:50 +0000
commit: 0979ea2954ec4dc09252aaf3f4b19bb9621ac4a5
tree: 775e020223500bef0e255ba7908bb8225735382b /doc/source/contributor
parent: 520fed1d917f2a6632b6ddc5810036f1fe496106
parent: 47fd7e332493fedef3a31a90bcb2b0099a486930
Merge "Add testing guide for down cells"
Diffstat (limited to 'doc/source/contributor')
-rw-r--r-- doc/source/contributor/index.rst | 2
-rw-r--r-- doc/source/contributor/testing/down-cell.rst | 238
2 files changed, 240 insertions, 0 deletions
diff --git a/doc/source/contributor/index.rst b/doc/source/contributor/index.rst
index 84c78255cf..5995413f5b 100644
--- a/doc/source/contributor/index.rst
+++ b/doc/source/contributor/index.rst
@@ -78,6 +78,8 @@ be Python code. All new code needs to be validated somehow.
* :doc:`/contributor/testing/zero-downtime-upgrade`
+ * :doc:`/contributor/testing/down-cell`
+
The Nova API
============
diff --git a/doc/source/contributor/testing/down-cell.rst b/doc/source/contributor/testing/down-cell.rst
new file mode 100644
index 0000000000..98065f4bda
--- /dev/null
+++ b/doc/source/contributor/testing/down-cell.rst
@@ -0,0 +1,238 @@
+==================
+Testing Down Cells
+==================
+
+This document describes how to recreate a down-cell scenario in a single-node
+devstack environment. This can be useful for testing the reliability of the
+controller services when a cell in the deployment is down.
+
+
+Setup
+=====
+
+DevStack config
+---------------
+
+This guide is based on a devstack install from the Train release on an
+Ubuntu Bionic 18.04 VM with 8 vCPUs, 8 GB of RAM and 200 GB of disk, set up
+following the `All-In-One Single Machine`_ guide.
+
+The following minimal local.conf was used:
+
+.. code-block:: ini
+
+ [[local|localrc]]
+ # Define passwords
+ OS_PASSWORD=openstack1
+ SERVICE_TOKEN=$OS_PASSWORD
+ ADMIN_PASSWORD=$OS_PASSWORD
+ MYSQL_PASSWORD=$OS_PASSWORD
+ RABBIT_PASSWORD=$OS_PASSWORD
+ SERVICE_PASSWORD=$OS_PASSWORD
+ # Logging config
+ LOGFILE=$DEST/logs/stack.sh.log
+ LOGDAYS=2
+ # Disable non-essential services
+ disable_service horizon tempest
+
+.. _All-In-One Single Machine: https://docs.openstack.org/devstack/latest/guides/single-machine.html
+
+Populate cell1
+--------------
+
+Create a test server first so there is something in cell1:
+
+.. code-block:: console
+
+ $ source openrc admin admin
+ $ IMAGE=$(openstack image list -f value -c ID)
+ $ openstack server create --wait --flavor m1.tiny --image $IMAGE cell1-server
+
+
+Take down cell1
+===============
+
+Break the connection to the cell1 database by changing its
+``database_connection`` URL in the ``cell_mappings`` table of the API database
+(``nova_api``), in this case to an invalid host IP:
+
+.. code-block:: console
+
+ mysql> select database_connection from cell_mappings where name='cell1';
+ +-------------------------------------------------------------------+
+ | database_connection |
+ +-------------------------------------------------------------------+
+ | mysql+pymysql://root:openstack1@127.0.0.1/nova_cell1?charset=utf8 |
+ +-------------------------------------------------------------------+
+ 1 row in set (0.00 sec)
+
+ mysql> update cell_mappings set database_connection='mysql+pymysql://root:openstack1@192.0.0.1/nova_cell1?charset=utf8' where name='cell1';
+ Query OK, 1 row affected (0.01 sec)
+ Rows matched: 1 Changed: 1 Warnings: 0
+
+
+Update controller services
+==========================
+
+Prepare the controller services for the down cell. See
+:ref:`Handling cell failures <handling-cell-failures>` for details.
+
+Modify nova.conf
+----------------
+
+Configure the API to avoid long timeouts and slow start times due to
+`bug 1815697`_ by modifying ``/etc/nova/nova.conf``:
+
+.. code-block:: ini
+
+ [database]
+ ...
+ max_retries = 1
+ retry_interval = 1
+
+ [upgrade_levels]
+ ...
+ # N-1 from train release, just something other than "auto"
+ compute = stein
+
+.. _bug 1815697: https://bugs.launchpad.net/nova/+bug/1815697
+
+Restart services
+----------------
+
+.. note:: It is useful to tail the n-api service logs in another screen to
+ watch for errors and warnings caused by down cells:
+
+ .. code-block:: console
+
+ $ sudo journalctl -f -a -u devstack@n-api.service
+
+Restart controller services to flush the cell cache:
+
+.. code-block:: console
+
+ $ sudo systemctl restart devstack@n-api.service devstack@n-super-cond.service devstack@n-sch.service
+
+
+Test cases
+==========
+
+1. Try to create a server. The request should fail and the server should be
+ recorded in cell0.
+
+ .. code-block:: console
+
+ $ openstack server create --wait --flavor m1.tiny --image $IMAGE cell0-server
+
+ You can expect to see errors like this in the n-api logs:
+
+ .. code-block:: console
+
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context [None req-fdaff415-48b9-44a7-b4c3-015214e80b90 None None] Error gathering result from cell 4f495a21-294a-4051-9a3d-8b34a250bbb4: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on u'192.0.0.1' ([Errno 101] ENETUNREACH)") (Background on this error at: http://sqlalche.me/e/e3q8)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context Traceback (most recent call last):
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/opt/stack/nova/nova/context.py", line 441, in gather_result
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context result = fn(cctxt, *args, **kwargs)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/opt/stack/nova/nova/db/sqlalchemy/api.py", line 211, in wrapper
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context with reader_mode.using(context):
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context return self.gen.next()
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1061, in _transaction_scope
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context context=context) as resource:
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context return self.gen.next()
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 659, in _session
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context bind=self.connection, mode=self.mode)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 418, in _create_session
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context self._start()
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 510, in _start
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context engine_args, maker_args)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 534, in _setup_for_connection
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context sql_connection=sql_connection, **engine_kwargs)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/local/lib/python2.7/dist-packages/debtcollector/renames.py", line 43, in decorator
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context return wrapped(*args, **kwargs)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 201, in create_engine
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context test_conn = _test_connection(engine, max_retries, retry_interval)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 387, in _test_connection
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context six.reraise(type(de_ref), de_ref)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context File "<string>", line 3, in reraise
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on u'192.0.0.1' ([Errno 101] ENETUNREACH)") (Background on this error at: http://sqlalche.me/e/e3q8)
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: ERROR nova.context
+ Apr 04 20:48:22 train devstack@n-api.service[10884]: WARNING nova.objects.service [None req-1cf4bf5c-2f74-4be0-a18d-51ff81df57dd admin admin] Failed to get minimum service version for cell 4f495a21-294a-4051-9a3d-8b34a250bbb4
+
+2. List servers with the 2.69 microversion for down cells.
+
+ .. note:: Requires python-openstackclient >= 3.18.0 for v2.69 support.
+
+ The server in cell1 (which is down) will show up with status UNKNOWN:
+
+ .. code-block:: console
+
+ $ openstack --os-compute-api-version 2.69 server list
+ +--------------------------------------+--------------+---------+----------+--------------------------+--------+
+ | ID | Name | Status | Networks | Image | Flavor |
+ +--------------------------------------+--------------+---------+----------+--------------------------+--------+
+ | 8e90f1f0-e8dd-4783-8bb3-ec8d594e60f1 | | UNKNOWN | | | |
+ | afd45d84-2bd7-4e49-9dff-93359f742bc1 | cell0-server | ERROR | | cirros-0.4.0-x86_64-disk | |
+ +--------------------------------------+--------------+---------+----------+--------------------------+--------+
+
+3. Using v2.1 the UNKNOWN server is filtered out by default due to
+ :oslo.config:option:`api.list_records_by_skipping_down_cells`:
+
+ .. code-block:: console
+
+ $ openstack --os-compute-api-version 2.1 server list
+ +--------------------------------------+--------------+--------+----------+--------------------------+---------+
+ | ID | Name | Status | Networks | Image | Flavor |
+ +--------------------------------------+--------------+--------+----------+--------------------------+---------+
+ | afd45d84-2bd7-4e49-9dff-93359f742bc1 | cell0-server | ERROR | | cirros-0.4.0-x86_64-disk | m1.tiny |
+ +--------------------------------------+--------------+--------+----------+--------------------------+---------+
+
+4. Configure nova-api with ``list_records_by_skipping_down_cells=False``:
+
+ .. code-block:: ini
+
+ [api]
+ list_records_by_skipping_down_cells = False
+
+5. Restart nova-api. Listing servers should now fail:
+
+ .. code-block:: console
+
+ $ sudo systemctl restart devstack@n-api.service
+ $ openstack --os-compute-api-version 2.1 server list
+ Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
+ <class 'nova.exception.NovaException'> (HTTP 500) (Request-ID: req-e2264d67-5b6c-4f17-ae3d-16c7562f1b69)
+
+6. Try listing compute services with a down cell.
+
+ The services from the down cell are skipped:
+
+ .. code-block:: console
+
+ $ openstack --os-compute-api-version 2.1 compute service list
+ +----+------------------+-------+----------+---------+-------+----------------------------+
+ | ID | Binary | Host | Zone | Status | State | Updated At |
+ +----+------------------+-------+----------+---------+-------+----------------------------+
+ | 2 | nova-scheduler | train | internal | enabled | up | 2019-04-04T21:12:47.000000 |
+ | 6 | nova-consoleauth | train | internal | enabled | up | 2019-04-04T21:12:38.000000 |
+ | 7 | nova-conductor | train | internal | enabled | up | 2019-04-04T21:12:47.000000 |
+ +----+------------------+-------+----------+---------+-------+----------------------------+
+
+ With 2.69 the nova-compute service from cell1 is shown with status UNKNOWN:
+
+ .. code-block:: console
+
+ $ openstack --os-compute-api-version 2.69 compute service list
+ +--------------------------------------+------------------+-------+----------+---------+-------+----------------------------+
+ | ID | Binary | Host | Zone | Status | State | Updated At |
+ +--------------------------------------+------------------+-------+----------+---------+-------+----------------------------+
+ | f68a96d9-d994-4122-a8f9-1b0f68ed69c2 | nova-scheduler | train | internal | enabled | up | 2019-04-04T21:13:47.000000 |
+ | 70cd668a-6d60-4a9a-ad83-f863920d4c44 | nova-consoleauth | train | internal | enabled | up | 2019-04-04T21:13:38.000000 |
+ | ca88f023-1de4-49e0-90b0-581e16bebaed | nova-conductor | train | internal | enabled | up | 2019-04-04T21:13:47.000000 |
+ | | nova-compute | train | | UNKNOWN | | |
+ +--------------------------------------+------------------+-------+----------+---------+-------+----------------------------+
+
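+Once testing is done, cell1 can be brought back by reversing the change made
+in `Take down cell1`_. The statement below is only a sketch: it assumes the
+original URL was the one shown in that section and that it is run against the
+``nova_api`` database of a default devstack install.
+
+.. code-block:: console
+
+ mysql> update cell_mappings set database_connection='mysql+pymysql://root:openstack1@127.0.0.1/nova_cell1?charset=utf8' where name='cell1';
+
+The controller services will likely need another restart, as described in
+`Restart services`_, so that the cached cell mapping is refreshed.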
+
+Future
+======
+
+This guide could be expanded to cover deployments with multiple non-cell0
+cells, where one cell is down while another is available, and to walk through
+scenarios where the down cell is marked as disabled to take it out of
+scheduling consideration.