====================
Troubleshoot Compute
====================

Common problems for Compute typically involve misconfigured networking or
credentials that are not sourced properly in the environment. Also, most flat
networking configurations do not enable :command:`ping` or :command:`ssh` from
a compute node to the instances that run on that node. Another common problem
is trying to run 32-bit images on a 64-bit compute node.  This section shows
you how to troubleshoot Compute.

.. todo:: Move the sections below into sub-pages for readability.

.. toctree::
   :maxdepth: 1

   troubleshooting/orphaned-allocations.rst
   troubleshooting/rebuild-placement-db.rst
   troubleshooting/affinity-policy-violated.rst


Compute service logging
-----------------------

Compute stores a log file for each service in ``/var/log/nova``. For example,
``nova-compute.log`` is the log for the ``nova-compute`` service. You can set
the following options to format log strings for the ``nova.log`` module in the
``nova.conf`` file:

* ``logging_context_format_string``

* ``logging_default_format_string``

If the log level is set to ``debug``, you can also specify
``logging_debug_format_suffix`` to append extra formatting.  For information
about what variables are available for the formatter, see `Formatter Objects
<https://docs.python.org/library/logging.html#formatter-objects>`_.

You have two logging options for OpenStack Compute based on configuration
settings. In ``nova.conf``, include the ``log_file`` option (formerly
``logfile``) to log to a file. Alternatively, you can set ``use_syslog = 1`` so
that the nova daemon logs to syslog.


Guru Meditation reports
-----------------------

A Guru Meditation report is sent by the Compute service upon receipt of the
``SIGUSR2`` signal (``SIGUSR1`` before Mitaka). This report is a
general-purpose error report that includes details about the current state of
the service. The error report is sent to ``stderr``.

For example, if you redirect error output to ``nova-api-err.log`` using
:command:`nova-api 2>/var/log/nova/nova-api-err.log`, resulting in the process
ID 8675, you can then run:

.. code-block:: console

   # kill -USR2 8675

This command triggers the Guru Meditation report to be printed to
``/var/log/nova/nova-api-err.log``.

The report has the following sections:

* Package: Displays information about the package to which the process belongs,
  including version information.

* Threads: Displays stack traces and thread IDs for each of the threads within
  the process.

* Green Threads: Displays stack traces for each of the green threads within the
  process (green threads do not have thread IDs).

* Configuration: Lists all configuration options currently accessible through
  the CONF object for the current process.

For more information, see :doc:`/reference/gmr`.


.. _compute-common-errors-and-fixes:

Common errors and fixes for Compute
-----------------------------------

The `ask.openstack.org <http://ask.openstack.org>`_ site offers a place to ask
and answer questions, and you can also mark questions as frequently asked
questions. This section describes some errors people have posted previously.
Bugs are constantly being fixed, so online resources are a great way to get the
most up-to-date errors and fixes.


Credential errors, 401, and 403 forbidden errors
------------------------------------------------

Problem
~~~~~~~

Missing credentials cause a ``403 forbidden`` error.

Solution
~~~~~~~~

To resolve this issue, use one of these methods:

#. Manual method

   Gets the ``novarc`` file from the project ZIP file, saves existing
   credentials in case of override, and manually sources the ``novarc`` file.

#. Script method

   Generates ``novarc`` from the project ZIP file and sources it for you.

When you run ``nova-api`` the first time, it generates the certificate
authority information, including ``openssl.cnf``. If you start the CA services
before this, you might not be able to create your ZIP file. Restart the
services.  When your CA information is available, create your ZIP file.

Also, check your HTTP proxy settings to see whether they cause problems with
``novarc`` creation.


Live migration permission issues
--------------------------------

Problem
~~~~~~~

When live migrating an instance, you may see errors like the following:

.. code-block:: shell

    libvirtError: operation failed: Failed to connect to remote libvirt URI
    qemu+ssh://stack@cld6b16/system: Cannot recv data: Host key verification
    failed.: Connection reset by peer

Solution
~~~~~~~~

Ensure you have completed all the steps outlined in
:doc:`/admin/ssh-configuration`. In particular, it's important to note
that the ``libvirt`` process runs as ``root`` even though it may be connecting
to a different user (``stack`` in the above example). You can ensure everything
is correctly configured by attempting to connect to the remote host via the
``root`` user. Using the above example once again:

.. code-block:: shell

    $ su - -c 'ssh stack@cld6b16'


Instance errors
---------------

Problem
~~~~~~~

Sometimes a particular instance shows ``pending`` or you cannot SSH to it.
Sometimes the image itself is the problem. For example, when you use flat
manager networking, you do not have a DHCP server and certain images do not
support interface injection; you cannot connect to them.

Solution
~~~~~~~~

To fix instance errors, use an image that does support interface injection,
such as Ubuntu, which obtains an IP address correctly with FlatManager network
settings.

To troubleshoot other possible problems with an instance, such as an instance
that stays in a spawning state, check the directory for the particular instance
under ``/var/lib/nova/instances`` on the ``nova-compute`` host and make sure
that these files are present:

* ``libvirt.xml``
* ``disk``
* ``disk-raw``
* ``kernel``
* ``ramdisk``
* ``console.log``, after the instance starts.

If any files are missing, empty, or very small, the ``nova-compute`` service
did not successfully download the images from the Image service.
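
The file check above can be scripted. The following is a minimal sketch, not
part of Compute itself: the function name ``check_instance_files`` is
hypothetical, and ``console.log`` is left out because it appears only after the
instance starts.

.. code-block:: shell

   # Report which expected files are missing or empty in an instance
   # directory. The file names come from the list above; the function
   # name is an illustrative assumption.
   check_instance_files() {
       dir=$1
       status=0
       for name in libvirt.xml disk disk-raw kernel ramdisk; do
           if [ ! -e "$dir/$name" ]; then
               echo "missing: $name"
               status=1
           elif [ ! -s "$dir/$name" ]; then
               echo "empty: $name"
               status=1
           fi
       done
       # Non-zero exit status when at least one file is missing or empty.
       return "$status"
   }

Call the function with the path of the instance directory under
``/var/lib/nova/instances`` as its only argument. It does not detect files that
are present but unusually small, so inspect the sizes by hand as well.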

Also check ``nova-compute.log`` for exceptions. Sometimes they do not appear in
the console output.
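
One way to scan the log is to filter on the ``ERROR`` and ``TRACE`` level
markers that appear in the log excerpts in this document. This is a sketch
under that assumption; the function name ``log_exceptions`` is hypothetical.

.. code-block:: shell

   # Print ERROR and TRACE lines from a log file, prefixed with their
   # line numbers. The level must appear as a whole word.
   log_exceptions() {
       grep -n -E '(^|[[:space:]])(ERROR|TRACE)[[:space:]]' "$1"
   }

For example, ``log_exceptions /var/log/nova/nova-compute.log`` lists the
matching lines. The function exits with a non-zero status when nothing matches.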

Next, check the log file for the instance in the ``/var/log/libvirt/qemu``
directory to see if it exists and has any useful error messages in it.

Finally, from the ``/var/lib/nova/instances`` directory for the instance, see
if this command returns an error:

.. code-block:: console

   # virsh create libvirt.xml


Empty log output for Linux instances
------------------------------------

Problem
~~~~~~~

You can view the log output of running instances from either the
:guilabel:`Log` tab of the dashboard or the output of :command:`nova
console-log`. In some cases, the log output of a running Linux instance will be
empty or only display a single character (for example, the ``?`` character).

This occurs when the Compute service attempts to retrieve the log output of the
instance via a serial console while the instance itself is not configured to
send output to the console.

Solution
~~~~~~~~

To rectify this, append the following parameters to kernel arguments specified
in the instance's boot loader:

.. code-block:: ini

   console=tty0 console=ttyS0,115200n8

Upon rebooting, the instance will be configured to send output to the Compute
service.
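
To confirm that the change took effect, you can inspect the kernel command line
from inside the instance. The following is a minimal sketch: the function name
``has_serial_console`` is hypothetical, and the optional file argument exists
only so that the check can be tried against a saved copy of the command line.

.. code-block:: shell

   # Succeed if the kernel command line sends console output to ttyS0.
   # Reads /proc/cmdline unless another file is given as the first argument.
   has_serial_console() {
       grep -q -E '(^|[[:space:]])console=ttyS0(,|[[:space:]]|$)' \
           "${1:-/proc/cmdline}"
   }

Run ``has_serial_console && echo configured`` in the guest after the reboot. A
non-zero exit status means that the ``console=ttyS0`` argument is not present.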


Reset the state of an instance
------------------------------

Problem
~~~~~~~

Instances can remain in an intermediate state, such as ``deleting``.

Solution
~~~~~~~~

You can use the :command:`nova reset-state` command to manually reset the state
of an instance to an error state. You can then delete the instance. For
example:

.. code-block:: console

   $ nova reset-state c6bbbf26-b40a-47e7-8d5c-eb17bf65c485
   $ openstack server delete c6bbbf26-b40a-47e7-8d5c-eb17bf65c485

You can also use the ``--active`` parameter to force the instance back to an
active state instead of an error state. For example:

.. code-block:: console

   $ nova reset-state --active c6bbbf26-b40a-47e7-8d5c-eb17bf65c485


Injection problems
------------------

Problem
~~~~~~~

Instances may boot slowly or fail to boot. File injection can cause this
problem.

Solution
~~~~~~~~

To disable injection in libvirt, set the following in ``nova.conf``:

.. code-block:: ini

   [libvirt]
   inject_partition = -2

.. note::

   If you have not enabled the config drive and you want to make user-specified
   files available from the metadata server to improve performance and avoid
   boot failure if injection fails, you must disable injection.


Cannot find suitable emulator for x86_64
----------------------------------------

Problem
~~~~~~~

When you attempt to create a VM, it enters the ``BUILD`` state and then the
``ERROR`` state.

Solution
~~~~~~~~

On the KVM host, run :command:`cat /proc/cpuinfo`. Make sure the ``vmx`` or
``svm`` flags are set.

Follow the instructions in the :ref:`enable-kvm`
section in the Nova Configuration Reference to enable hardware
virtualization support in your BIOS.
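
The flag check can be scripted instead of reading the ``/proc/cpuinfo`` output
by eye. This is a sketch only: the function name ``has_virt_flags`` is
hypothetical, and the optional file argument is an assumption added so that the
check can be run against a saved copy of the file.

.. code-block:: shell

   # Succeed if any "flags" line lists vmx (Intel) or svm (AMD) as a
   # whole word. Reads /proc/cpuinfo unless another file is given.
   has_virt_flags() {
       grep -q -E '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' \
           "${1:-/proc/cpuinfo}"
   }

On the KVM host, ``has_virt_flags || echo "no vmx or svm flag"`` prints the
message only when neither flag is present.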


Failed to attach volume after detaching
---------------------------------------

Problem
~~~~~~~

Failed to attach a volume after detaching the same volume.

Solution
~~~~~~~~

You must change the device name on the :command:`nova volume-attach` command.
The VM might not clean up after a :command:`nova volume-detach` command runs.
This example shows how the :command:`nova volume-attach` command fails when you
use the ``vdb``, ``vdc``, or ``vdd`` device names:

.. code-block:: console

   # ls -al /dev/disk/by-path/
   total 0
   drwxr-xr-x 2 root root 200 2012-08-29 17:33 .
   drwxr-xr-x 5 root root 100 2012-08-29 17:33 ..
   lrwxrwxrwx 1 root root 9 2012-08-29 17:33 pci-0000:00:04.0-virtio-pci-virtio0 -> ../../vda
   lrwxrwxrwx 1 root root 10 2012-08-29 17:33 pci-0000:00:04.0-virtio-pci-virtio0-part1 -> ../../vda1
   lrwxrwxrwx 1 root root 10 2012-08-29 17:33 pci-0000:00:04.0-virtio-pci-virtio0-part2 -> ../../vda2
   lrwxrwxrwx 1 root root 10 2012-08-29 17:33 pci-0000:00:04.0-virtio-pci-virtio0-part5 -> ../../vda5
   lrwxrwxrwx 1 root root 9 2012-08-29 17:33 pci-0000:00:06.0-virtio-pci-virtio2 -> ../../vdb
   lrwxrwxrwx 1 root root 9 2012-08-29 17:33 pci-0000:00:08.0-virtio-pci-virtio3 -> ../../vdc
   lrwxrwxrwx 1 root root 9 2012-08-29 17:33 pci-0000:00:09.0-virtio-pci-virtio4 -> ../../vdd
   lrwxrwxrwx 1 root root 10 2012-08-29 17:33 pci-0000:00:09.0-virtio-pci-virtio4-part1 -> ../../vdd1

You might also have this problem after attaching and detaching the same volume
from the same VM with the same mount point multiple times. In this case,
restart the KVM host.


Failed to attach volume, systool is not installed
-------------------------------------------------

Problem
~~~~~~~

This warning and error occur if you do not have the required ``sysfsutils``
package installed on the compute node:

.. code-block:: console

   WARNING nova.virt.libvirt.utils [req-1200f887-c82b-4e7c-a891-fac2e3735dbb\
   admin admin|req-1200f887-c82b-4e7c-a891-fac2e3735dbb admin admin] systool\
   is not installed
   ERROR nova.compute.manager [req-1200f887-c82b-4e7c-a891-fac2e3735dbb admin\
   admin|req-1200f887-c82b-4e7c-a891-fac2e3735dbb admin admin]
   [instance: df834b5a-8c3f-477a-be9b-47c97626555c|instance: df834b5a-8c3f-47\
   7a-be9b-47c97626555c]
   Failed to attach volume 13d5c633-903a-4764-a5a0-3336945b1db1 at /dev/vdk.

Solution
~~~~~~~~

Install the ``sysfsutils`` package on the compute node. For example:

.. code-block:: console

   # apt-get install sysfsutils


Failed to connect volume in FC SAN
----------------------------------

Problem
~~~~~~~

The compute node failed to connect to a volume in a Fibre Channel (FC) SAN
configuration. The WWN may not be zoned correctly in your FC SAN that links the
compute host to the storage array:

.. code-block:: console

   ERROR nova.compute.manager [req-2ddd5297-e405-44ab-aed3-152cd2cfb8c2 admin\
   demo|req-2ddd5297-e405-44ab-aed3-152cd2cfb8c2 admin demo] [instance: 60ebd\
   6c7-c1e3-4bf0-8ef0-f07aa4c3d5f3|instance: 60ebd6c7-c1e3-4bf0-8ef0-f07aa4c3\
   d5f3]
   Failed to connect to volume 6f6a6a9c-dfcf-4c8d-b1a8-4445ff883200 while\
   attaching at /dev/vdjTRACE nova.compute.manager [instance: 60ebd6c7-c1e3-4\
   bf0-8ef0-f07aa4c3d5f3|instance: 60ebd6c7-c1e3-4bf0-8ef0-f07aa4c3d5f3]
   Traceback (most recent call last):...f07aa4c3d5f3\] ClientException: The\
   server has either erred or is incapable of performing the requested\
   operation.(HTTP 500)(Request-ID: req-71e5132b-21aa-46ee-b3cc-19b5b4ab2f00)

Solution
~~~~~~~~

The network administrator must configure the FC SAN fabric by correctly zoning
the WWN (port names) from your compute node HBAs.


Multipath call failed exit
--------------------------

Problem
~~~~~~~

Multipath call failed exit. This warning occurs in the Compute log if you do
not have the optional ``multipath-tools`` package installed on the compute
node. This is an optional package and the volume attachment does work without
the multipath tools installed.  If the ``multipath-tools`` package is installed
on the compute node, it is used to perform the volume attachment.  The IDs in
your message are unique to your system.

.. code-block:: console

   WARNING nova.storage.linuxscsi [req-cac861e3-8b29-4143-8f1b-705d0084e571 \
   admin admin|req-cac861e3-8b29-4143-8f1b-705d0084e571 admin admin] \
   Multipath call failed exit (96)

Solution
~~~~~~~~

Install the ``multipath-tools`` package on the compute node. For example:

.. code-block:: console

   # apt-get install multipath-tools


Failed to attach volume, missing sg_scan
----------------------------------------

Problem
~~~~~~~

Failed to attach volume to an instance, ``sg_scan`` file not found. This error
occurs when the ``sg3-utils`` package is not installed on the compute node.
The IDs in your message are unique to your system:

.. code-block:: console

   ERROR nova.compute.manager [req-cf2679fd-dd9e-4909-807f-48fe9bda3642 admin admin|req-cf2679fd-dd9e-4909-807f-48fe9bda3642 admin admin]
   [instance: 7d7c92e0-49fa-4a8e-87c7-73f22a9585d5|instance:  7d7c92e0-49fa-4a8e-87c7-73f22a9585d5]
   Failed to attach volume  4cc104c4-ac92-4bd6-9b95-c6686746414a at /dev/vdcTRACE nova.compute.manager
   [instance:  7d7c92e0-49fa-4a8e-87c7-73f22a9585d5|instance: 7d7c92e0-49fa-4a8e-87c7-73f22a9585d5]
   Stdout: '/usr/local/bin/nova-rootwrap: Executable not found: /usr/bin/sg_scan'

Solution
~~~~~~~~

Install the ``sg3-utils`` package on the compute node. For example:

.. code-block:: console

   # apt-get install sg3-utils


Requested microversions are ignored
-----------------------------------

Problem
~~~~~~~

When making a request with a microversion beyond 2.1, for example:

.. code-block:: console

  $ openstack --os-compute-api-version 2.15 server group create \
    --policy soft-anti-affinity my-soft-anti-group

The request fails, reporting that ``soft-anti-affinity`` is not a valid policy,
even though it is allowed with the `2.15 microversion`_.

.. _2.15 microversion: https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#id13

Solution
~~~~~~~~

Ensure the ``compute`` endpoint in the identity service catalog is pointing
at ``/v2.1`` instead of ``/v2``. The former route supports microversions,
while the latter route is considered the legacy v2.0 compatibility-mode
route which renders all requests as if they were made on the legacy v2.0 API.
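
The rule above can be expressed as a small check on the endpoint URL. This is
an illustrative sketch: the function name ``compute_endpoint_kind`` is
hypothetical, and it inspects only the version segment of the URL path.

.. code-block:: shell

   # Classify a compute endpoint URL by its version path segment.
   compute_endpoint_kind() {
       # Strip one trailing slash so that ".../v2.1/" is handled too.
       case "${1%/}" in
           */v2.1|*/v2.1/*) echo "microversions supported" ;;
           */v2|*/v2/*)     echo "legacy v2.0 compatibility mode" ;;
           *)               echo "unknown" ;;
       esac
   }

You can list the registered URLs with :command:`openstack endpoint list
--service compute` and pass each one to the function. For a placeholder URL
such as ``http://controller:8774/v2.1``, the function prints ``microversions
supported``.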


.. _user_token_timeout:

User token times out during long-running operations
---------------------------------------------------

Problem
~~~~~~~

Long-running operations such as live migration or snapshot can sometimes
overrun the expiry of the user token. In such cases, post-operation steps such
as cleaning up after a live migration can fail when the nova-compute service
needs to clean up resources in other services, such as the block-storage
(cinder) or networking (neutron) services.

For example:

.. code-block:: console

  2018-12-17 13:47:29.591 16987 WARNING nova.virt.libvirt.migration [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Live migration not completed after 2400 sec
  2018-12-17 13:47:30.097 16987 WARNING nova.virt.libvirt.driver [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Migration operation was cancelled
  2018-12-17 13:47:30.299 16987 ERROR nova.virt.libvirt.driver [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Live Migration failure: operation aborted: migration job: canceled by client: libvirtError: operation aborted: migration job: canceled by client
  2018-12-17 13:47:30.685 16987 INFO nova.compute.manager [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Swapping old allocation on 3e32d595-bd1f-4136-a7f4-c6703d2fbe18 held by migration 17bec61d-544d-47e0-a1c1-37f9d7385286 for instance
  2018-12-17 13:47:32.450 16987 ERROR nova.volume.cinder [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] Delete attachment failed for attachment 58997d5b-24f0-4073-819e-97916fb1ee19. Error: The request you have made requires authentication. (HTTP 401) Code: 401: Unauthorized: The request you have made requires authentication. (HTTP 401)

Solution
~~~~~~~~

Configure nova to use service user tokens to supplement the regular user token
used to initiate the operation. The identity service (keystone) will then
authenticate a request using the service user token if the user token has
already expired.

To use this feature, create a service user in the identity service, much as you
would create the ``nova`` service user.

Then configure the :oslo.config:group:`service_user` section of the nova
configuration file, for example:

.. code-block:: ini

  [service_user]
  send_service_user_token = True
  auth_type = password
  project_domain_name = Default
  project_name = service
  user_domain_name = Default
  password = secretservice
  username = nova
  auth_url = https://104.130.216.102/identity
  ...

Configure the other identity options as necessary for the service user,
much like you would configure nova to work with the image service (glance)
or networking service.

.. note::

  The roles of the :oslo.config:group:`service_user` you configure must be a
  superset of :oslo.config:option:`keystone_authtoken.service_token_roles`.
  The :oslo.config:option:`keystone_authtoken.service_token_roles` option is
  configured in cinder, glance and neutron.