<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/python-packages/qpid-python.git/cpp/src/tests/ha_tests.py, branch QPID-6125-ProtocolRefactoring</title>
<subtitle>git.apache.org: qpid.git
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/'/>
<entry>
<title>NO-JIRA: HA Fix ha_tests.py failures with SWIG 0.10 client.</title>
<updated>2014-09-05T01:39:01+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-09-05T01:39:01+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=edfe5c920983f99f0167f849bb0e9d0ac77c3f38'/>
<id>edfe5c920983f99f0167f849bb0e9d0ac77c3f38</id>
<content type='text'>
- Fix un-necessary re-sends in amqp0_10::SenderImpl::replay.
- Throw NotFound and UnauthorizedAccess correctly from amqp0_10::SessionImpl and ConnectionImpl
- Fix ha_test wait_address and valid_address re-using a session after it is closed by NotFound.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1622592 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Fix un-necessary re-sends in amqp0_10::SenderImpl::replay.
- Throw NotFound and UnauthorizedAccess correctly from amqp0_10::SessionImpl and ConnectionImpl
- Fix ha_test wait_address and valid_address re-using a session after it is closed by NotFound.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1622592 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>NO-JIRA: Fix qpid/cpp/src/tests/ha_tests.py to work on python 2.4.</title>
<updated>2014-09-04T20:48:27+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-09-04T20:48:27+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=4add739d004d6c1eb0d92904fc0f2a8b4b600b9a'/>
<id>4add739d004d6c1eb0d92904fc0f2a8b4b600b9a</id>
<content type='text'>
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1622560 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1622560 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>QPID-5975: HA extra/missing messages when running qpid-txtest2 in a loop with failover.</title>
<updated>2014-08-28T21:47:44+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-08-28T21:47:44+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=b93b20dd123757b208f9e78ef778e3648c3438a0'/>
<id>b93b20dd123757b208f9e78ef778e3648c3438a0</id>
<content type='text'>
This is partly not-a-bug, there is a client error handling issue that has been
corrected.

qpid-txtest2 initializes a queue with messages at the start and drains the
queues at the end. These operations are *not transactional*. Therefore
duplicates are expected if there is a failover during initialization or
draining. When duplicates were observed, there was indeed a failover at one of
these times.

Making these operations transactional is not enough to pass, now we see the test
fail with "no messages to fetch". This is explained as follows:

If there is a failover during a transaction, TransactionAborted is raised. The
client assumes the transaction was rolled back and re-plays it. However, if the
failover occurs at a critical point *after* the client has sent commit
but *before* it has received a response, then the the client *does not know*
whether the transaction was committed or rolled-back on the new primary.

Re-playing in this case can duplicate the transaction. Each transaction moves
messages from one queue to another so as long as transactions are atomic the
total number of messages will not change. However, if transactions are
duplicated, a transactional session may try to move more messages than exist on
the queue, hence "no messages to fetch". For example if thread 1 moves N
messages from q1 to q2, and thread 2 tries to move N+M messages back, then
thread 2 will fail.

This problem has been corrected as follows: C++ and python clients now raise the
following exceptions:

- TransactionAborted: The transaction has definitely been rolled back due to a
  connection failure before commit or a broker error (e.g. a store error) during commit.
  It can safely be replayed.

- TransactionUnknown: The transaction outcome is unknown because the connection
  failed at the critical time. There's no simple automatic way to know what
  happened without examining the state of the broker queues.

Unfortunately With this fix qpid-txtest2 is no longer useful test for TX
failover because it regularly raises TransactionUnknown and there's not much we
can do with that.

A better test of TX atomicity with failover is to run a pair of
qpid-send/qpid-receive with fail-over and verify that the number of
enqueues/dequeues and message depth are a multiple of the transaction size. See
the JIRA for such a test. (Note these test also sometimes raise
TransactionUnknown but it doesn't matter since all we are checking is that
messages go on and off the queues in multiple of the TX size.)  )

Note: the original bug also reported seeing missing messages from
qpid-txtest2. I don't have a good explanation for that but since the
qpid-send/receive test shows that transactions are atomic I am going to let that
go for now.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1621211 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is partly not-a-bug, there is a client error handling issue that has been
corrected.

qpid-txtest2 initializes a queue with messages at the start and drains the
queues at the end. These operations are *not transactional*. Therefore
duplicates are expected if there is a failover during initialization or
draining. When duplicates were observed, there was indeed a failover at one of
these times.

Making these operations transactional is not enough to pass, now we see the test
fail with "no messages to fetch". This is explained as follows:

If there is a failover during a transaction, TransactionAborted is raised. The
client assumes the transaction was rolled back and re-plays it. However, if the
failover occurs at a critical point *after* the client has sent commit
but *before* it has received a response, then the the client *does not know*
whether the transaction was committed or rolled-back on the new primary.

Re-playing in this case can duplicate the transaction. Each transaction moves
messages from one queue to another so as long as transactions are atomic the
total number of messages will not change. However, if transactions are
duplicated, a transactional session may try to move more messages than exist on
the queue, hence "no messages to fetch". For example if thread 1 moves N
messages from q1 to q2, and thread 2 tries to move N+M messages back, then
thread 2 will fail.

This problem has been corrected as follows: C++ and python clients now raise the
following exceptions:

- TransactionAborted: The transaction has definitely been rolled back due to a
  connection failure before commit or a broker error (e.g. a store error) during commit.
  It can safely be replayed.

- TransactionUnknown: The transaction outcome is unknown because the connection
  failed at the critical time. There's no simple automatic way to know what
  happened without examining the state of the broker queues.

Unfortunately With this fix qpid-txtest2 is no longer useful test for TX
failover because it regularly raises TransactionUnknown and there's not much we
can do with that.

A better test of TX atomicity with failover is to run a pair of
qpid-send/qpid-receive with fail-over and verify that the number of
enqueues/dequeues and message depth are a multiple of the transaction size. See
the JIRA for such a test. (Note these test also sometimes raise
TransactionUnknown but it doesn't matter since all we are checking is that
messages go on and off the queues in multiple of the TX size.)  )

Note: the original bug also reported seeing missing messages from
qpid-txtest2. I don't have a good explanation for that but since the
qpid-send/receive test shows that transactions are atomic I am going to let that
go for now.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1621211 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>NO-JIRA: Clean up test_store.cpp async functionality.</title>
<updated>2014-08-22T14:13:09+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-08-22T14:13:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=f7d0aff1958f3cf2836c4759940c4e55b86a7020'/>
<id>f7d0aff1958f3cf2836c4759940c4e55b86a7020</id>
<content type='text'>
Clean up test_store.cpp to allow control over async completion of messsages.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1619814 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Clean up test_store.cpp to allow control over async completion of messsages.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1619814 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>QPID-5966: HA mixing tx enqueue and non-tx dequeue leaves extra messages on backup.</title>
<updated>2014-08-08T09:24:15+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-08-08T09:24:15+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=0ef26b59c9a070ebcc56f451c9944d0ea69a584d'/>
<id>0ef26b59c9a070ebcc56f451c9944d0ea69a584d</id>
<content type='text'>
There were several problems:

1. Positions of transactionally enqueued messages not known to QueueReplicator, so not dequeued
   on backup if dequeued outside a TX on primary.

2. Race condition if tx created immediately after queue could cause duplication of TX message.

3. Replication IDs were not being set during recovery from store (regression, store change?)

Fix:

1. Update positions QueueReplicator positions via QueueObserver::enqueued to see all enqueues.
2. Check for duplicate replication-ids on backup in QueueReplicator::route.
3. Set replication-id in publish() if not already set in record().

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1616704 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There were several problems:

1. Positions of transactionally enqueued messages not known to QueueReplicator, so not dequeued
   on backup if dequeued outside a TX on primary.

2. Race condition if tx created immediately after queue could cause duplication of TX message.

3. Replication IDs were not being set during recovery from store (regression, store change?)

Fix:

1. Update positions QueueReplicator positions via QueueObserver::enqueued to see all enqueues.
2. Check for duplicate replication-ids on backup in QueueReplicator::route.
3. Set replication-id in publish() if not already set in record().

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1616704 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>QPID-5973: HA cluster state may get stuck in recovering</title>
<updated>2014-08-08T09:23:54+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-08-08T09:23:54+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=fe4c2fde825a90dd1ff5abf20206cd3b95509a28'/>
<id>fe4c2fde825a90dd1ff5abf20206cd3b95509a28</id>
<content type='text'>
A backup queue is considered "ready" when all messages up to the first guarded
position have either been replicated and acknowledged or dequeued.

Previously this was implemented by waiting for the replicationg subscription to
advance to the first guarded position and wating for all expected acks. However
if messages are dequeued out-of-order (which happens with transactions) there
can be a gap at the tail of the queue. The replicating subscription will not
advance past this gap because it only advances when there are messages to
consume. This resulted in backups stuck in catch-up. The recovering primary has
a time-out for backups that never re-connect, but if they connect sucessfully
and don't disconnect, the primary assumes they will become ready and waits -
causing the primary to be stuck in "recovering".

The fixes is to notify a replicating subscription if it becomes "stopped"
because there are no more messages available on the queue. This implies that
either it is at the tail OR there are no more messags until the tail. Either way
we should consider this "ready" from the point of view of HA catch-up.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1616702 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A backup queue is considered "ready" when all messages up to the first guarded
position have either been replicated and acknowledged or dequeued.

Previously this was implemented by waiting for the replicationg subscription to
advance to the first guarded position and wating for all expected acks. However
if messages are dequeued out-of-order (which happens with transactions) there
can be a gap at the tail of the queue. The replicating subscription will not
advance past this gap because it only advances when there are messages to
consume. This resulted in backups stuck in catch-up. The recovering primary has
a time-out for backups that never re-connect, but if they connect sucessfully
and don't disconnect, the primary assumes they will become ready and waits -
causing the primary to be stuck in "recovering".

The fixes is to notify a replicating subscription if it becomes "stopped"
because there are no more messages available on the queue. This implies that
either it is at the tail OR there are no more messags until the tail. Either way
we should consider this "ready" from the point of view of HA catch-up.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1616702 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>QPID-5888: Fix test to ignore occasional TransactionAborted failures on connection.close.</title>
<updated>2014-07-29T12:56:40+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-07-29T12:56:40+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=2b9bd980678fc07dca4ac1d524a387fbdd7b97d5'/>
<id>2b9bd980678fc07dca4ac1d524a387fbdd7b97d5</id>
<content type='text'>
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1614331 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1614331 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>QPID-5888: transaction should always be aborted on failover</title>
<updated>2014-07-18T18:18:42+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-07-18T18:18:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=20735b99c966b71511c3094afaa4bbdccd66a2e3'/>
<id>20735b99c966b71511c3094afaa4bbdccd66a2e3</id>
<content type='text'>
C++ and python clients were attempting to continue the transation transparently
after failover which is in correct. They were re-sending messages in the
transaction but there is no way to re-do transactional receives. The transaction
must be aborted.

The C++ and python clients have been modified to kill a transactional session
with a TransactionAborted exception if there is a failover.

Note the Java client already behaves correctly but not identically.
It defers raising an exception until commit rather than failing
immediately on failover, and the session can still be used.

The following commits are involved:

r1611349 QPID-5887: revised approach to implict abort
r1610959 QPID-5887: allow qpid-txtest2 to be run by make test
r1610958 QPID-5887: fix to new txtest2, acknowledge messages in the check phase to ensure queues remain drained for any subsequent runs
r1609748 QPID-5887: abort transactional session on failover; added equivalent of txtest using messaging API

This commit does the following:

- Update ha_tests.py tx_simpler_failover test to expect transaction aborted.
- Minor improvements to qpid-txtest2
- Fix native (non-swig) python client.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1611748 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
C++ and python clients were attempting to continue the transation transparently
after failover which is in correct. They were re-sending messages in the
transaction but there is no way to re-do transactional receives. The transaction
must be aborted.

The C++ and python clients have been modified to kill a transactional session
with a TransactionAborted exception if there is a failover.

Note the Java client already behaves correctly but not identically.
It defers raising an exception until commit rather than failing
immediately on failover, and the session can still be used.

The following commits are involved:

r1611349 QPID-5887: revised approach to implict abort
r1610959 QPID-5887: allow qpid-txtest2 to be run by make test
r1610958 QPID-5887: fix to new txtest2, acknowledge messages in the check phase to ensure queues remain drained for any subsequent runs
r1609748 QPID-5887: abort transactional session on failover; added equivalent of txtest using messaging API

This commit does the following:

- Update ha_tests.py tx_simpler_failover test to expect transaction aborted.
- Minor improvements to qpid-txtest2
- Fix native (non-swig) python client.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1611748 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>QPID-5817: [C++ broker] Improve ACL authorisation of QMF methods and queries</title>
<updated>2014-06-18T07:40:22+00:00</updated>
<author>
<name>Pavel Moravec</name>
<email>pmoravec@apache.org</email>
</author>
<published>2014-06-18T07:40:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=192742ada1386af87047662cb4a7f3d414984738'/>
<id>192742ada1386af87047662cb4a7f3d414984738</id>
<content type='text'>
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1603364 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1603364 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
<entry>
<title>NO-JIRA: HA minor cleanup of qpid-ha tool</title>
<updated>2014-04-24T18:59:03+00:00</updated>
<author>
<name>Alan Conway</name>
<email>aconway@apache.org</email>
</author>
<published>2014-04-24T18:59:03+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/python-packages/qpid-python.git/commit/?id=e8deb749c9669fedb1c073b7b20fbd3ce2b42f0f'/>
<id>e8deb749c9669fedb1c073b7b20fbd3ce2b42f0f</id>
<content type='text'>
- Remove some dead code.
- Removed "set" command - not ready for production. All settings in qpidd.conf.
  - Removed related tests in ha_tests
- Improved help on promote command.
- Made option group for common broker connection options.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1589834 13f79535-47bb-0310-9956-ffa450edef68
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Remove some dead code.
- Removed "set" command - not ready for production. All settings in qpidd.conf.
  - Removed related tests in ha_tests
- Improved help on promote command.
- Made option group for common broker connection options.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1589834 13f79535-47bb-0310-9956-ffa450edef68
</pre>
</div>
</content>
</entry>
</feed>
