<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/openstack/nova.git/nova/context.py, branch master</title>
<subtitle>opendev.org: openstack/nova.git
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/'/>
<entry>
<title>Log the exception returned from a cell during API.get()</title>
<updated>2022-05-03T02:03:26+00:00</updated>
<author>
<name>melanie witt</name>
<email>melwittt@gmail.com</email>
</author>
<published>2022-05-03T01:48:36+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=1d4dbfd4680d32d0619e84dbe563deed892e0506'/>
<id>1d4dbfd4680d32d0619e84dbe563deed892e0506</id>
<content type='text'>
When getting an instance using the compute.API we call
scatter_gather_single_cell() to be able to capture details when we fail
to retrieve a result from a cell such as timeouts and exceptions.

Currently, however, we aren't logging the content of an exception if
scatter_gather_single_cell() returns an exception as the result. The
scatter gather method itself logs exceptions that are not of type
NovaException, as these represent definite unexpected errors such as
database errors, but handling of a NovaException is left to the caller,
who decides whether to log it, re-raise it, and so on.
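
The behaviour described above can be sketched roughly as follows. This
is an illustration only, not Nova's implementation: it uses plain
threads and queue.Queue, and the names scatter_gather, NovaException and
did_not_respond_sentinel are local stand-ins defined in the sketch.

```python
import logging
import queue
import threading

LOG = logging.getLogger(__name__)

# Hypothetical stand-in: marks a cell that never answered in time.
did_not_respond_sentinel = object()


class NovaException(Exception):
    """Placeholder for an expected, application-level error."""


def scatter_gather(cells, fn, timeout=1.0):
    """Run fn(cell) in one thread per cell and gather results by cell."""
    results_q = queue.Queue()

    def worker(cell):
        try:
            result = fn(cell)
        except NovaException as exc:
            # Expected error type: return it and let the caller decide
            # whether to log it or re-raise it.
            result = exc
        except Exception as exc:
            # Unexpected error: log it here, then return it as well.
            LOG.exception('Error gathering result from cell %s', cell)
            result = exc
        results_q.put((cell, result))

    for cell in cells:
        threading.Thread(target=worker, args=(cell,), daemon=True).start()

    # Every cell starts as "did not respond" until a result arrives.
    results = {cell: did_not_respond_sentinel for cell in cells}
    for _ in cells:
        try:
            # Assumption: the timeout applies to each wait, not overall.
            cell, result = results_q.get(timeout=timeout)
        except queue.Empty:
            LOG.warning('Timed out waiting for response from cells')
            break
        results[cell] = result
    return results
```

A caller can then tell the three outcomes apart: a normal value, an
exception instance, or the sentinel for a cell that timed out.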

It can be difficult to debug a situation where a cell is returning a
NovaException result so this adds logging of the exception content in
the compute API when we encounter an unexpected NovaException.

The existing log message has been updated to more accurately reflect
what has happened (did not respond vs exception). The assignment of the
exception object in scatter gather has also been updated to not
unnecessarily construct a new exception object because it (a) wasn't
necessary and (b) made asserting the LOG.exception() call argument in
the unit test difficult.

Related-Bug: #1970087

Change-Id: Iae1c61c72be5b6017b934293e3dc079a24eeb0e7
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When getting an instance using the compute.API we call
scatter_gather_single_cell() to be able to capture details when we fail
to retrieve a result from a cell such as timeouts and exceptions.

Currently, however, we aren't logging the content of an exception if
scatter_gather_single_cell() returns an exception as the result. The
scatter gather method itself logs exceptions that are not of type
NovaException, as these represent definite unexpected errors such as
database errors, but handling of a NovaException is left to the caller,
who decides whether to log it, re-raise it, and so on.

It can be difficult to debug a situation where a cell is returning a
NovaException result so this adds logging of the exception content in
the compute API when we encounter an unexpected NovaException.

The existing log message has been updated to more accurately reflect
what has happened (did not respond vs exception). The assignment of the
exception object in scatter gather has also been updated to not
unnecessarily construct a new exception object because it (a) wasn't
necessary and (b) made asserting the LOG.exception() call argument in
the unit test difficult.

Related-Bug: #1970087

Change-Id: Iae1c61c72be5b6017b934293e3dc079a24eeb0e7
</pre>
</div>
</content>
</entry>
<entry>
<title>db: Unify 'nova.db.api', 'nova.db.sqlalchemy.api'</title>
<updated>2021-08-09T14:34:40+00:00</updated>
<author>
<name>Stephen Finucane</name>
<email>stephenfin@redhat.com</email>
</author>
<published>2021-04-01T16:49:02+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=100b9dc62c0ec9f7b38739837c06646122c818d5'/>
<id>100b9dc62c0ec9f7b38739837c06646122c818d5</id>
<content type='text'>
Merge these, removing an unnecessary layer of abstraction, and place
them in the new 'nova.db.main' directory. The resulting change is huge,
but it's mainly the result of 's/sqlalchemy import api/main import api/'
and 's/nova.db.api/nova.db.main.api/' with some necessary cleanup. We
also need to rework how we do the blocking of API calls since we no
longer have a 'DBAPI' object that we can monkey patch as we were doing
before. This is now done via a global variable that is set by the 'main'
function of 'nova.cmd.compute'.

The main impact of this change is that it's no longer possible to set
'[database] use_db_reconnect' and have all APIs automatically wrapped in
a DB retry. Seeing as this behavior is experimental, isn't applied to
any of the API DB methods (which don't use oslo.db's 'DBAPI' helper),
and is used explicitly in what would appear to be the critical cases
(via the explicit 'oslo_db.api.wrap_db_retry' decorator), this doesn't
seem like a huge loss.

Change-Id: Iad2e4da4546b80a016e477577d23accb2606a6e4
Signed-off-by: Stephen Finucane &lt;stephenfin@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge these, removing an unnecessary layer of abstraction, and place
them in the new 'nova.db.main' directory. The resulting change is huge,
but it's mainly the result of 's/sqlalchemy import api/main import api/'
and 's/nova.db.api/nova.db.main.api/' with some necessary cleanup. We
also need to rework how we do the blocking of API calls since we no
longer have a 'DBAPI' object that we can monkey patch as we were doing
before. This is now done via a global variable that is set by the 'main'
function of 'nova.cmd.compute'.

The main impact of this change is that it's no longer possible to set
'[database] use_db_reconnect' and have all APIs automatically wrapped in
a DB retry. Seeing as this behavior is experimental, isn't applied to
any of the API DB methods (which don't use oslo.db's 'DBAPI' helper),
and is used explicitly in what would appear to be the critical cases
(via the explicit 'oslo_db.api.wrap_db_retry' decorator), this doesn't
seem like a huge loss.

Change-Id: Iad2e4da4546b80a016e477577d23accb2606a6e4
Signed-off-by: Stephen Finucane &lt;stephenfin@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove six.binary_type/integer_types/string_types</title>
<updated>2020-12-13T11:25:14+00:00</updated>
<author>
<name>Takashi Natsume</name>
<email>takanattie@gmail.com</email>
</author>
<published>2020-05-14T12:04:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=07462dd0050fbfea89e517759b312b67a368e279'/>
<id>07462dd0050fbfea89e517759b312b67a368e279</id>
<content type='text'>
Replace the following items with Python 3 style code.

- six.binary_type
- six.integer_types
- six.string_types

Subsequent patches will replace other six usages.
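
For reference, on Python 3 these three names correspond to bytes, int
and str. A small sketch of the substitution, using a hypothetical helper
(classify is not part of Nova):

```python
def classify(value):
    """Return a short label for the built-in type of value."""
    if isinstance(value, bytes):   # previously six.binary_type
        return 'bytes'
    if isinstance(value, int):     # previously six.integer_types
        return 'int'
    if isinstance(value, str):     # previously six.string_types
        return 'str'
    return 'other'
```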

Change-Id: Ide65686cf02463045f5c32771ca949802b19636f
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume &lt;takanattie@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace the following items with Python 3 style code.

- six.binary_type
- six.integer_types
- six.string_types

Subsequent patches will replace other six usages.

Change-Id: Ide65686cf02463045f5c32771ca949802b19636f
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume &lt;takanattie@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Reset the cell cache for database access in Service</title>
<updated>2020-04-08T17:48:18+00:00</updated>
<author>
<name>melanie witt</name>
<email>melwittt@gmail.com</email>
</author>
<published>2020-04-03T21:22:27+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=941559042f609ee43ff3160c0f0d0c45187be17f'/>
<id>941559042f609ee43ff3160c0f0d0c45187be17f</id>
<content type='text'>
We have had a gate bug for a long time where occasionally the scheduler
service gets into a state where many requests fail in it with
CellTimeout errors. Example:

  Timed out waiting for response from cell &lt;cell uuid&gt;

Through the use of much DNM patch debug logging in oslo.db, it was
revealed that service child processes (workers) were sometimes starting
off with already locked internal oslo.db locks. This is a known issue
in python [1] where if a parent process forks a child process while a
lock is held, the child will inherit the held lock which can never be
acquired.

The python issue is not considered a bug, and the recommended way to
handle it is to use os.register_at_fork() in oslo.db to reinitialize
its lock. That function is new in python 3.7, so as long as we still
support python 3.6, we must handle the situation outside of oslo.db.

We can do this by clearing the cell cache that holds oslo.db database
transaction context manager objects during service start(). This way,
we get fresh oslo.db locks that are in an unlocked state when a child
process begins.

We can also take this opportunity to resolve part of a TODO to clear
the same cell cache during service reset() (SIGHUP) since it is another
case where we intended to clear it. The rest of the TODO related to
periodic clearing of the cache is removed after discussion on the
review, as such clearing would be unsynchronized among multiple
services and for periods of time each service might have a different
view of cached cells than another.

Closes-Bug: #1844929

[1] https://bugs.python.org/issue6721

Change-Id: Id233f673a57461cc312e304873a41442d732c051
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have had a gate bug for a long time where occasionally the scheduler
service gets into a state where many requests fail in it with
CellTimeout errors. Example:

  Timed out waiting for response from cell &lt;cell uuid&gt;

Through the use of much DNM patch debug logging in oslo.db, it was
revealed that service child processes (workers) were sometimes starting
off with already locked internal oslo.db locks. This is a known issue
in python [1] where if a parent process forks a child process while a
lock is held, the child will inherit the held lock which can never be
acquired.

The python issue is not considered a bug, and the recommended way to
handle it is to use os.register_at_fork() in oslo.db to reinitialize
its lock. That function is new in python 3.7, so as long as we still
support python 3.6, we must handle the situation outside of oslo.db.

We can do this by clearing the cell cache that holds oslo.db database
transaction context manager objects during service start(). This way,
we get fresh oslo.db locks that are in an unlocked state when a child
process begins.

We can also take this opportunity to resolve part of a TODO to clear
the same cell cache during service reset() (SIGHUP) since it is another
case where we intended to clear it. The rest of the TODO related to
periodic clearing of the cache is removed after discussion on the
review, as such clearing would be unsynchronized among multiple
services and for periods of time each service might have a different
view of cached cells than another.

Closes-Bug: #1844929

[1] https://bugs.python.org/issue6721

Change-Id: Id233f673a57461cc312e304873a41442d732c051
</pre>
</div>
</content>
</entry>
<entry>
<title>ksa auth conf and client for Cyborg access</title>
<updated>2020-03-21T19:03:37+00:00</updated>
<author>
<name>Sundar Nadathur</name>
<email>sundar.nadathur@intel.com</email>
</author>
<published>2019-01-16T08:31:01+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=c071741d565950ba0a6b43f7b66aad0bdbaf1dff'/>
<id>c071741d565950ba0a6b43f7b66aad0bdbaf1dff</id>
<content type='text'>
Framework for communication with the Cyborg API.

- Standard keystoneauth1 config options for setting up authentication in
the [cyborg] section of nova*.conf.
- A new nova.accelerator.cyborg module containing a get_client method to
return a client containing a keystoneauth1 adapter pointing
to the Cyborg service with user- and service- based authentication.
- Requirements updates to pull in the os-service-types release
containing the 'accelerator' service type.

Change-Id: Iee0766269d61948ad701911e8b0e5e24d3d6eb04
Blueprint: nova-cyborg-interaction
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Framework for communication with the Cyborg API.

- Standard keystoneauth1 config options for setting up authentication in
the [cyborg] section of nova*.conf.
- A new nova.accelerator.cyborg module containing a get_client method to
return a client containing a keystoneauth1 adapter pointing
to the Cyborg service with user- and service- based authentication.
- Requirements updates to pull in the os-service-types release
containing the 'accelerator' service type.

Change-Id: Iee0766269d61948ad701911e8b0e5e24d3d6eb04
Blueprint: nova-cyborg-interaction
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "Log CellTimeout traceback in scatter_gather_cells"</title>
<updated>2019-10-22T21:12:28+00:00</updated>
<author>
<name>Matt Riedemann</name>
<email>mriedem.os@gmail.com</email>
</author>
<published>2019-10-22T21:10:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=9377d00ccf7a73071b4fb75d66ce5ad7bd321174'/>
<id>9377d00ccf7a73071b4fb75d66ce5ad7bd321174</id>
<content type='text'>
This reverts commit 0436a95f37df086ddc99017376cb9a312e40517a.

This was meant to get us more debug details when hitting the
failure but the results are not helpful [1] so revert this
and the fix for the resulting regression [2].

[1] http://paste.openstack.org/show/782116/
[2] I7f9edc9a4b4930f4dce98df271888fa8082a1701

Change-Id: Iab8029f081a654278ea7dbbec79a766aea6764ae
Related-Bug: #1844929
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 0436a95f37df086ddc99017376cb9a312e40517a.

This was meant to get us more debug details when hitting the
failure but the results are not helpful [1] so revert this
and the fix for the resulting regression [2].

[1] http://paste.openstack.org/show/782116/
[2] I7f9edc9a4b4930f4dce98df271888fa8082a1701

Change-Id: Iab8029f081a654278ea7dbbec79a766aea6764ae
Related-Bug: #1844929
</pre>
</div>
</content>
</entry>
<entry>
<title>[Gate fix] Avoid use cell_uuid before assignment</title>
<updated>2019-10-07T18:40:39+00:00</updated>
<author>
<name>ricolin</name>
<email>rico.lin.guanyu@gmail.com</email>
</author>
<published>2019-10-07T10:32:24+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=35d76c7cf5b122c9a8b24e62adc73510bbd6d94c'/>
<id>35d76c7cf5b122c9a8b24e62adc73510bbd6d94c</id>
<content type='text'>
Found this error in the Heat gate while running the grenade test job.

Do not reference cell_uuid when queue.get times out, since it has not
been assigned at that point.

Closes-Bug: #1847131

Change-Id: I7f9edc9a4b4930f4dce98df271888fa8082a1701
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Found this error in the Heat gate while running the grenade test job.

Do not reference cell_uuid when queue.get times out, since it has not
been assigned at that point.

Closes-Bug: #1847131

Change-Id: I7f9edc9a4b4930f4dce98df271888fa8082a1701
</pre>
</div>
</content>
</entry>
<entry>
<title>Log CellTimeout traceback in scatter_gather_cells</title>
<updated>2019-09-23T18:57:44+00:00</updated>
<author>
<name>Matt Riedemann</name>
<email>mriedem.os@gmail.com</email>
</author>
<published>2019-09-23T18:57:44+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=0436a95f37df086ddc99017376cb9a312e40517a'/>
<id>0436a95f37df086ddc99017376cb9a312e40517a</id>
<content type='text'>
When a call to a cell in scatter_gather_cells times out
we log a warning and set the did_not_respond_sentinel for
that cell but it would be useful if we logged the traceback
with the warning for debugging where the call is happening.

Change-Id: I8f4069474a3955eea6c967d3090f2960e739224c
Related-Bug: #1844929
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a call to a cell in scatter_gather_cells times out
we log a warning and set the did_not_respond_sentinel for
that cell but it would be useful if we logged the traceback
with the warning for debugging where the call is happening.

Change-Id: I8f4069474a3955eea6c967d3090f2960e739224c
Related-Bug: #1844929
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge "Move default policy target"</title>
<updated>2019-07-27T00:32:20+00:00</updated>
<author>
<name>Zuul</name>
<email>zuul@review.opendev.org</email>
</author>
<published>2019-07-27T00:32:20+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=4e58d4c10b79457718bff1e3e851ec7440c20da4'/>
<id>4e58d4c10b79457718bff1e3e851ec7440c20da4</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Make RequestContext(instance_lock_checked) fail</title>
<updated>2019-06-13T15:36:03+00:00</updated>
<author>
<name>Eric Fried</name>
<email>openstack@fried.cc</email>
</author>
<published>2019-06-12T19:07:15+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/openstack/nova.git/commit/?id=bcd9c4f048d5574ef0b0c927d8832d48cd08c7cc'/>
<id>bcd9c4f048d5574ef0b0c927d8832d48cd08c7cc</id>
<content type='text'>
The instance_lock_checked kwarg was added to RequestContext in [1], no
longer used in Nova since [2], and deprecated with a warning in [3].
This commit removes all handling of this kwarg, which will cause using
it to blow up with a TypeError in the super().__init__.
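
The resulting failure mode can be sketched as below. BaseContext is a
hypothetical stand-in for the real parent class, and the argument names
are assumptions chosen for illustration:

```python
class BaseContext:
    """Hypothetical parent class that accepts only known arguments."""

    def __init__(self, user_id=None, project_id=None):
        self.user_id = user_id
        self.project_id = project_id


class RequestContext(BaseContext):
    def __init__(self, user_id=None, project_id=None, **kwargs):
        # No special handling of removed kwargs: anything unknown is
        # passed through and rejected by the parent's __init__.
        super().__init__(user_id=user_id, project_id=project_id, **kwargs)
```

With this shape, RequestContext(instance_lock_checked=True) raises a
TypeError from the parent's __init__ instead of being silently accepted.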

[1] I1127e31d86a061a93a64ee1eb4a4d900d8bf49b5
[2] Ic18017a16c5bffee85a43db65ff17283599a27ba
[3] Ib0b23a47c2e833073108af6700b16f8026631a83

Change-Id: Ie5f5a0ae018ce601c588a466399515159d8f58c0
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The instance_lock_checked kwarg was added to RequestContext in [1], no
longer used in Nova since [2], and deprecated with a warning in [3].
This commit removes all handling of this kwarg, which will cause using
it to blow up with a TypeError in the super().__init__.

[1] I1127e31d86a061a93a64ee1eb4a4d900d8bf49b5
[2] Ic18017a16c5bffee85a43db65ff17283599a27ba
[3] Ib0b23a47c2e833073108af6700b16f8026631a83

Change-Id: Ie5f5a0ae018ce601c588a466399515159d8f58c0
</pre>
</div>
</content>
</entry>
</feed>
