-rw-r--r-- qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml | 84
1 files changed, 62 insertions, 22 deletions
diff --git a/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml b/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml
index 6e0225a2af..246a0a4ab5 100644
--- a/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml
+++ b/qpid/doc/book/src/cpp-broker/Active-Passive-Cluster.xml
@@ -219,6 +219,12 @@ under the License.
The broker must load the <filename>ha</filename> module, it is loaded by
default. The following broker options are available for the HA module.
</para>
+ <note>
+ <para>
+ Incorrect security settings are a common cause of problems when
+ getting started; see <xref linkend="ha-security"/>.
+ </para>
+ </note>
<table frame="all" id="ha-broker-options">
<title>Broker Options for High Availability Messaging Cluster</title>
<tgroup align="left" cols="2" colsep="1" rowsep="1">
@@ -822,8 +828,22 @@ connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconn
Please see <xref linkend="chap-Messaging_User_Guide-Security"/> for
more details on enabling authentication and setting up Access Control Lists.
</para>
+ <note>
+ <para>
+ Unless you disable authentication with <literal>auth=no</literal> in
+ your configuration, you <emphasis>must</emphasis> set the options below
+ and you <emphasis>must</emphasis> have an ACL file with at least the
+ entry described below.
+ </para>
+ <para>
+ Backups will be <emphasis>unable to connect to the primary</emphasis> if
+ the security configuration is incorrect. See also <xref
+ linkend="ha-troubleshoot-security"/>.
+ </para>
+ </note>
<para>
- When authentication is enabled, HA brokers use the credentials set by the following options:
+ When authentication is enabled, you must set the credentials used by HA
+ brokers with the following options:
</para>
<table frame="all" id="ha-security-options">
<title>HA Security Options</title>
@@ -848,7 +868,13 @@ connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconn
</row>
<row>
<entry><para><literal>ha-mechanism</literal> <replaceable>MECHANISM</replaceable></para></entry>
- <entry><para>Mechanism for HA brokers.</para></entry>
+ <entry>
+ <para>
+ Mechanism for HA brokers. Any mechanism you enable for
+ broker-to-broker communication can also be used by a client, so
+ do not use <literal>ha-mechanism=ANONYMOUS</literal> in a secure environment.
+ </para>
+ </entry>
</row>
</tbody>
</tgroup>
@@ -922,27 +948,41 @@ qpid-ha -b <replaceable>broker-address</replaceable> promote
This section applies to clusters that are using rgmanager as the
cluster manager.
</para>
- <section id="authentication-failures">
- <title>Authentication failures</title>
+ <section id="ha-troubleshoot-no-primary">
+ <title>No primary broker</title>
+ <para>
+ When you initially start an HA cluster, all brokers are in
+ <literal>joining</literal> mode. The brokers do not automatically select
+ a primary; they rely on the cluster manager <literal>rgmanager</literal>
+ to do so. If <literal>rgmanager</literal> is not running or is not
+ configured correctly, brokers will remain in the
+ <literal>joining</literal> state. See <xref linkend="ha-rm-config"/>.
+ </para>
+ </section>
+ <section id="ha-troubleshoot-security">
+ <title>Authentication and ACL failures</title>
<para>
- If a broker is unable to establish a connection to another broker
- in the cluster due to authentication problems, the log will
- contain SASL errors, for example:
+ If a broker is unable to establish a connection to another broker in the
+ cluster due to authentication or ACL problems, the logs may contain
+ errors like the following:
+ <programlisting>
+info SASL: Authentication failed: SASL(-13): user not found: Password verification failed
+ </programlisting>
+ <programlisting>
+warning Client closed connection with 320: User anonymous@QPID federation connection denied. Systems with authentication enabled must specify ACL create link rules.
+ </programlisting>
<programlisting>
-2012-aug-04 10:17:37 info SASL: Authentication failed: SASL(-13): user not found: Password verification failed
+warning Client closed connection with 320: ACL denied anonymous@QPID creating a federation link.
</programlisting>
</para>
<para>
- Set the SASL user name and password used to connect to other
- brokers using the ha-username and ha-password properties when you
- start the broker. Set the SASL mode using ha-mechanism. Any
- mechanism you enable for broker-to-broker communication can also
- be used by a client, so do not enable ha-mechanism=ANONYMOUS in a
- secure environment. Once the cluster is running, run qpid-ha to
- make sure that the brokers are running as one cluster.
+ Set the HA security configuration and ACL file as described in <xref
+ linkend="ha-security"/>. Once the cluster is running and the primary is
+ promoted, run <literal>qpid-ha</literal> to make sure that the brokers
+ are running as one cluster.
</para>
</section>
- <section id="slow-recovery-times">
+ <section id="ha-troubleshoot-slow-recovery">
<title>Slow recovery times</title>
<para>
The following configuration settings affect recovery time. The
@@ -950,7 +990,7 @@ qpid-ha -b <replaceable>broker-address</replaceable> promote
loaded system. You should run tests to determine if the values are
appropriate for your system and load conditions.
</para>
- <section id="cluster.conf">
+ <section id="ha-troubleshoot-cluster.conf">
<title>cluster.conf:</title>
<programlisting>
&lt;rm status_poll_interval=1&gt;
@@ -970,7 +1010,7 @@ qpid-ha -b <replaceable>broker-address</replaceable> promote
failing over the VIP to a new address.
</para>
</section>
- <section id="qpidd.conf">
+ <section id="ha-troubleshoot-qpidd.conf">
<title>qpidd.conf</title>
<programlisting>
link-maintenance-interval=0.1
@@ -1006,7 +1046,7 @@ link-heartbeat-interval=5
</para>
</section>
</section>
- <section id="total-cluster-failure">
+ <section id="ha-troubleshoot-total-cluster-failure">
<title>Total cluster failure</title>
<para>
The cluster can only guarantee availability as long as there is at
@@ -1047,7 +1087,7 @@ link-heartbeat-interval=5
If the surviving broker fails before that the cluster will fail in
one of two modes (depending on the exact timing of failures)
</para>
- <section id="the-cluster-hangs">
+ <section id="ha-troubleshoot-the-cluster-hangs">
<title>1. The cluster hangs</title>
<para>
All brokers are in joining or catch-up mode. rgmanager tries to
@@ -1080,7 +1120,7 @@ service:qpidd-primary-service (20.0.10.33) stopped
with clusvcadm, then restart (primary last)
</para>
</section>
- <section id="the-cluster-reboots">
+ <section id="ha-troubleshoot-the-cluster-reboots">
<title>2. The cluster reboots</title>
<para>
A new primary is promoted and the cluster is functional but all
@@ -1088,7 +1128,7 @@ service:qpidd-primary-service (20.0.10.33) stopped
</para>
</section>
</section>
- <section id="fencing-and-network-partitions">
+ <section id="ha-troubleshoot-fencing-and-network-partitions">
<title>Fencing and network partitions</title>
<para>
 A network partition is a network failure that divides the