Changes in CTDB 2.5.1
=====================

Important bug fixes
-------------------

* The locking code now correctly implements a per-database active
  locks limit.  Whole database lock requests can no longer be denied
  because there are too many active locks - this is particularly
  important for freezing databases during recovery.

* The debug_locks.sh script locks against itself.  If it is already
  running then subsequent invocations will exit immediately.

* ctdb tool commands that operate on databases now work correctly when
  a database ID is given.

* Various code fixes for issues found by Coverity.
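
The per-database limit described in the first item above can be
pictured with a toy model.  This is a minimal Python sketch, not
CTDB's C implementation: the LockScheduler class, its method names
and the limit of 2 are hypothetical, chosen only to show record lock
requests being capped per database while whole-database requests are
never refused.

```python
from collections import defaultdict

MAX_ACTIVE_LOCKS_PER_DB = 2  # hypothetical limit, kept small for the demo


class LockScheduler:
    """Toy model: record locks are capped per database, whole-db locks are not."""

    def __init__(self, limit=MAX_ACTIVE_LOCKS_PER_DB):
        self.limit = limit
        self.active = defaultdict(int)  # database name -> active record locks

    def request(self, db, whole_db=False):
        """Return True if the lock may start now, False if it has to wait."""
        if whole_db:
            # Whole-database requests (e.g. freezing for recovery) bypass
            # the limit, so they can never be denied for this reason.
            return True
        if self.active[db] >= self.limit:
            return False
        self.active[db] += 1
        return True

    def release(self, db):
        """Mark one record lock on db as finished."""
        if self.active[db] > 0:
            self.active[db] -= 1
```

Because the counter is kept per database, a busy database cannot
starve lock requests against another one.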

Important internal changes
--------------------------

* statd-callout has been updated so that statd client information is
  always up-to-date across the cluster.  This is implemented by
  storing the client information in a persistent database using a new
  "ctdb ptrans" command.

* The transaction code for persistent databases now retries until it
  is able to take the transaction lock.  This makes the transaction
  semantics compatible with Samba's implementation.

* Locking helpers are created with vfork(2) instead of fork(2),
  providing a performance improvement.

* config.guess has been updated to the latest upstream version so CTDB
  should build on more platforms.
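
The "retry until the transaction lock is taken" behaviour could look
roughly like the following Python sketch.  The function name, the
try_lock callable and the delay value are assumptions made for
illustration; ctdbd is written in C and its actual retry policy may
differ.

```python
import time


def take_transaction_lock(try_lock, delay=0.01, sleep=time.sleep):
    """Call try_lock() until it returns True; return the number of attempts.

    try_lock is a hypothetical callable that makes one non-blocking
    attempt to take the lock and reports whether it succeeded.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_lock():
            return attempts
        # Lock is held elsewhere: wait briefly, then try again rather
        # than failing the transaction.
        sleep(delay)
```

The loop has no attempt limit on purpose: the point of the change is
that the caller waits for the lock instead of giving up.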


Changes in CTDB 2.5
===================

User-visible changes
--------------------

* The default location of the ctdbd socket is now:

    /var/run/ctdb/ctdbd.socket

  If you currently set CTDB_SOCKET in configuration then unsetting it
  will probably do what you want.

* The default location of CTDB TDB databases is now:

    /var/lib/ctdb

  If you only set CTDB_DBDIR (to the old default of /var/ctdb) then
  you probably want to move your databases to /var/lib/ctdb, drop your
  setting of CTDB_DBDIR and just use the default.

  To maintain the database files in /var/ctdb you will need to set
  CTDB_DBDIR, CTDB_DBDIR_PERSISTENT and CTDB_DBDIR_STATE, since all of
  these have moved.

* Use of CTDB_OPTIONS to set ctdbd command-line options is no longer
  supported.  Please use individual configuration variables instead.

* Obsolete tunables VacuumDefaultInterval, VacuumMinInterval and
  VacuumMaxInterval have been removed.  Setting them had no effect but
  if you now try to set them in a configuration file via CTDB_SET_X=Y
  then CTDB will not start.

* Much improved manual pages.  Added new manpages ctdb(7),
  ctdbd.conf(5), ctdb-tunables(7).  Still some work to do.

* Most CTDB-specific configuration can now be set in
  /etc/ctdb/ctdbd.conf.

  This avoids cluttering distribution-specific configuration files,
  such as /etc/sysconfig/ctdb.  It also means that we can say: see
  ctdbd.conf(5) for more details.  :-)

* Configuration variable NFS_SERVER_MODE is deprecated and has been
  replaced by CTDB_NFS_SERVER_MODE.  See ctdbd.conf(5) for more
  details.

* "ctdb reloadips" is much improved and should be used for reloading
  the public IP configuration.

  This command attempts to yield much more predictable IP allocations
  than using sequences of delip and addip commands.  See ctdb(1) for
  details.

* The ability to pass a comma-separated string to ctdb(1) tool
  commands via the -n option is now documented and works for most
  commands.  See ctdb(1) for details.

* "ctdb rebalancenode" is now a debugging command and should not be
  used in normal operation.  See ctdb(1) for details.

* "ctdb ban 0" is now invalid.

  This was documented as causing a permanent ban.  However, this was
  not implemented and caused an "unban" instead.  To avoid confusion,
  0 is now an invalid ban duration.  To administratively "ban" a node
  use "ctdb stop" instead.

* The systemd configuration now puts the PID file in /run/ctdb (rather
  than /run/ctdbd) for consistency with the initscript and other uses
  of /var/run/ctdb.
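
The new handling of "ctdb ban 0" amounts to validating the duration
before acting on it.  A minimal Python sketch of that check follows;
the function name is hypothetical, and rejecting negative values is
an assumption not stated above.

```python
def parse_ban_duration(arg):
    """Return the ban duration as a positive integer, or raise ValueError."""
    duration = int(arg)  # non-numeric input raises ValueError here
    if duration <= 0:
        # 0 was documented as "permanent" but behaved as an unban, so it
        # is rejected outright.  Negative values are rejected by assumption.
        raise ValueError("ban duration must be greater than 0")
    return duration
```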

Important bug fixes
-------------------

* Traverse regression fixed.

* The default recovery method for persistent databases has been
  changed to use database sequence numbers instead of doing
  record-by-record recovery (using record sequence numbers).  This
  fixes issues including registry corruption.

* Banned nodes are no longer told to run the "ipreallocated" event
  during a takeover run, when in fallback mode with nodes that don't
  support the IPREALLOCATED control.
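
The difference between the two recovery methods for persistent
databases can be shown with a toy model.  The Python sketch below is
an illustration only: the dictionary layout, function names and
sample data are hypothetical.  It shows one way a per-record merge can
combine records from different nodes into a state that no node ever
held, while choosing by database sequence number always yields one
node's complete copy.

```python
def recover_by_record(copies):
    """Merge per key, keeping the record with the highest record seqnum."""
    merged = {}
    for copy in copies:
        for key, (rsn, value) in copy["records"].items():
            if key not in merged or rsn > merged[key][0]:
                merged[key] = (rsn, value)
    return merged


def recover_by_db_seqnum(copies):
    """Take the whole database from the copy with the highest db seqnum."""
    best = max(copies, key=lambda c: c["seqnum"])
    return dict(best["records"])


# Two nodes' copies of one persistent database: key -> (record seqnum, value)
copies = [
    {"seqnum": 5, "records": {"a": (1, "new-a"), "b": (1, "new-b")}},
    {"seqnum": 3, "records": {"a": (2, "old-a"), "b": (1, "old-b")}},
]

mixed = recover_by_record(copies)      # "a" from node 1, "b" from node 0
whole = recover_by_db_seqnum(copies)   # node 0's copy, unchanged
```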

Important internal changes
--------------------------

* Persistent transactions are now compatible with Samba and work
  reliably.

* The recovery master role has been made more stable by resetting the
  priority time each time a node becomes inactive.  This means that
  nodes that are active for a long time are more likely to retain the
  recovery master role.

* The incomplete libctdb library has been removed.

* Test suite now starts ctdbd with the --sloppy-start option to speed
  up startup.  However, this should not be done in production.
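
The effect of resetting the priority time can be sketched as follows.
This Python model is a simplification under stated assumptions: it
treats the election as "earliest priority time wins", with the node
number as a tie-breaker, and the Node class and elect function are
hypothetical.  The real election may weigh other factors.

```python
class Node:
    """Toy cluster node carrying only what the sketch needs."""

    def __init__(self, pnn, now):
        self.pnn = pnn
        self.priority_time = now  # when the node last became active

    def became_inactive(self, now):
        # Reset on every transition to inactive, so a node that drops
        # out loses its seniority.
        self.priority_time = now


def elect(nodes):
    """Pick the node with the earliest priority time (lowest pnn on ties)."""
    return min(nodes, key=lambda n: (n.priority_time, n.pnn))
```

A node that has stayed active since time 100 beats one that rejoined
at time 200, but loses to it after going inactive itself at time 300.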


Changes in CTDB 2.4
===================

User-visible changes
--------------------

* A missing network interface now causes monitoring to fail and the
  node to become unhealthy.

* Changed the ctdb command's default control timeout from 3s to 10s.

* debug-hung-script.sh now includes the output of "ctdb scriptstatus"
  to provide more information.

Important bug fixes
-------------------

* Starting the CTDB daemon by running ctdbd directly no longer removes
  an existing Unix socket unconditionally.

* ctdbd once again successfully kills client processes on releasing
  public IPs.  It was checking for them as tracked child processes
  and not finding them, so wasn't killing them.

* ctdbd_wrapper now exports CTDB_SOCKET so that child processes of
  ctdbd (such as uses of ctdb in eventscripts) use the correct socket.

* Always use Jenkins hash when creating volatile databases.  There
  were a few places where TDBs would be attached with the wrong flags.

* Vacuuming code fixes in CTDB 2.2 introduced bugs that led to header
  corruption for empty records.  This resulted in inconsistent headers
  on two nodes, so a request for such a record would bounce between
  nodes indefinitely, logging "High hopcount" messages and degrading
  performance.  These bugs have been fixed.

* ctdbd was losing log messages at shutdown because they weren't being
  given time to flush.  ctdbd now sleeps for a second during shutdown
  to allow time to flush log messages.

* Improved socket handling introduced in CTDB 2.2 caused ctdbd to
  process a large number of packets available on a single FD before
  polling other FDs.  ctdbd now uses fixed-size queue buffers to allow
  fair scheduling across multiple FDs.
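
The fairness problem in the last item can be illustrated with a small
model.  The Python sketch below is not ctdbd's code: it represents the
fixed-size buffer as a per-pass packet budget, and the function name
and data layout are hypothetical.

```python
from collections import deque


def drain_fairly(queues, budget):
    """Visit FDs in turn, handling at most `budget` packets from each per pass.

    queues maps an FD number to a deque of pending packets.  Returns the
    order in which (fd, packet) pairs were handled.
    """
    order = []
    while any(queues.values()):
        for fd, pending in queues.items():
            # The bounded budget stands in for a fixed-size read buffer:
            # one busy FD cannot monopolise the loop.
            for _ in range(min(budget, len(pending))):
                order.append((fd, pending.popleft()))
    return order
```

With a budget of 2, five packets on FD 3 and two on FD 4 are handled
as two from FD 3, two from FD 4, then the remaining three from FD 3.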

Important internal changes
--------------------------

* A node that fails to take/release multiple IPs will only incur a
  single banning credit.  This makes a brief failure less likely to
  cause a node to be banned.

* ctdb killtcp has been changed to read connections from stdin and
  10.interface now uses this feature to improve the time taken to kill
  connections.

* Improvements to hot records statistics in ctdb dbstatistics.

* Recovery daemon now assembles up-to-date node flags information
  from remote nodes before checking if any flags are inconsistent and
  forcing a recovery.

* ctdbd no longer creates multiple lock sub-processes for the same
  key.  This reduces the number of lock sub-processes substantially.

* Changed the nfsd RPC check failure policy to fail over quickly
  instead of trying to repair a node first by restarting NFS.  Such
  restarts would often hang if the cause of the RPC check failure was
  the cluster filesystem or storage.

* Logging improvements relating to high hopcounts and sticky records.

* Lower-level tdb messages are now logged correctly.

* CTDB commands disable/enable/stop/continue are now resilient to
  individual control failures and retry in case of failures.


Changes in CTDB 2.3
===================

User-visible changes
--------------------

* Two new configuration variables for the 60.nfs eventscript:

  - CTDB_MONITOR_NFS_THREAD_COUNT
  - CTDB_NFS_DUMP_STUCK_THREADS

  See ctdb.sysconfig for details.

* Removed DeadlockTimeout tunable.  To enable debugging of locking
  issues, set:

   CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

* In overall statistics and database statistics, lock buckets have
  been updated to use the following timings:

   < 1ms, < 10ms, < 100ms, < 1s, < 2s, < 4s, < 8s, < 16s, < 32s, < 64s, >= 64s

* Initscript is now simplified with most CTDB-specific functionality
  split out to ctdbd_wrapper, which is used to start and stop ctdbd.

* Add systemd support.

* CTDB subprocesses are now given informative names to allow them to
  be easily distinguished when using programs like "top" or "perf".
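
The lock bucket boundaries listed above map a lock latency to a bucket
as in the following Python sketch.  The bucket labels come from the
list above; the function and constant names are hypothetical.

```python
from bisect import bisect_right

# Upper bounds in seconds for every bucket except the last (">= 64s").
BOUNDS = [0.001, 0.01, 0.1, 1, 2, 4, 8, 16, 32, 64]
LABELS = ["< 1ms", "< 10ms", "< 100ms", "< 1s", "< 2s", "< 4s",
          "< 8s", "< 16s", "< 32s", "< 64s", ">= 64s"]


def lock_bucket(seconds):
    """Return the label of the bucket that a lock latency falls into."""
    # bisect_right counts the bounds that are <= seconds, so a latency
    # exactly on a boundary lands in the next bucket up.
    return LABELS[bisect_right(BOUNDS, seconds)]
```

For example, a latency of exactly 1 second is not "< 1s" and is
therefore counted in the "< 2s" bucket.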

Important bug fixes
-------------------

* The ctdb tool no longer exits from a retry loop if a control times
  out (e.g. under high load).  This simple fix stops the tool exiting
  from the retry loop on any error.

* When updating flags on all nodes, use the correct updated flags.  This
  should avoid wrong flag change messages in the logs.

* The recovery daemon will not ban other nodes if the current node
  is banned.

* ctdb dbstatistics command now correctly outputs database statistics.

* Fixed a panic with overlapping shutdowns (regression in 2.2).

* Fixed 60.ganesha "monitor" event (regression in 2.2).

* Fixed a buffer overflow in the "reloadips" implementation.

* Fixed segmentation faults in ping_pong (called with incorrect
  argument) and test binaries (called when ctdbd not running).

Important internal changes
--------------------------

* The recovery daemon on a stopped or banned node will stop
  participating in any cluster activity.

* Improved cluster-wide database traverse by sending records directly
  from the traverse child process to the requesting node.

* TDB checking and dropping of all IPs moved from initscript to "init"
  event in 00.ctdb.

* To avoid "rogue IPs" the release IP callback now fails if the
  released IP is still present on an interface.


Changes in CTDB 2.2
===================

User-visible changes
--------------------

* The "stopped" event has been removed.

  The "ipreallocated" event is now run when a node is stopped.  Use
  this instead of "stopped".

* New --pidfile option for ctdbd, used by the initscript.

* The 60.nfs eventscript now uses configuration files in
  /etc/ctdb/nfs-rpc-checks.d/ for timeouts and actions instead of
  hardcoding them into the script.

* Notification handler scripts can now be dropped into /etc/ctdb/notify.d/.

* The NoIPTakeoverOnDisabled tunable has been renamed to
  NoIPHostOnAllDisabled and now works properly when set on individual
  nodes.

* New ctdb subcommand "runstate" prints the current internal runstate.
  Runstates are used for serialising startup.

Important bug fixes
-------------------

* The Unix domain socket is now set to non-blocking after the
  connection succeeds.  This avoids connections failing with EAGAIN
  and not being retried.

* Fetching from the log ringbuffer now succeeds if the buffer is full.

* Fix a severe recovery bug that can lead to data corruption for SMB clients.

* The statd-callout script now runs as root via sudo.

* "ctdb delip" no longer fails if it is unable to move the IP.

* A race in the ctdb tool's ipreallocate code was fixed.  This fixes
  potential bugs in the "disable", "enable", "stop", "continue",
  "ban", "unban", "ipreallocate" and "sync" commands.

* The monitor cancellation code could sometimes hang indefinitely.
  This could cause "ctdb stop" and "ctdb shutdown" to fail.
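
The ordering fix for the Unix domain socket (connect first, then
switch to non-blocking) can be sketched in Python as below.  This is
an illustration of the ordering only, not the ctdb client code, and
the function name is hypothetical.

```python
import socket


def connect_unix(path):
    """Connect to a Unix domain socket, then make it non-blocking."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Blocking connect: it either completes or raises, so the caller
        # never sees a spurious EAGAIN from a half-finished connection.
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    # Only after the connection is established is the socket switched
    # to non-blocking mode for subsequent I/O.
    sock.setblocking(False)
    return sock
```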

Important internal changes
--------------------------

* The socket I/O handling has been optimised to improve performance.

* IPs will not be assigned to nodes during CTDB initialisation.  They
  will only be assigned to nodes that are in the "running" runstate.

* Improved database locking code.  One improvement is to use a
  standalone locking helper executable - this avoids creating many
  forked copies of ctdbd and potentially running a node out of memory.

* New control CTDB_CONTROL_IPREALLOCATED is now used to generate
  "ipreallocated" events.

* Message handlers are now indexed, providing a significant
  performance improvement.
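
Why indexing message handlers helps can be seen in a small model.  The
Python sketch below is an assumption-laden illustration: it keys
handlers by a numeric service ID (called srvid here) in a dictionary,
so dispatch is a single lookup instead of a scan over every registered
handler.  The class and method names are hypothetical.

```python
from collections import defaultdict


class MessageHandlers:
    """Toy registry of message handlers indexed by service ID."""

    def __init__(self):
        self._by_srvid = defaultdict(list)  # srvid -> list of handlers

    def register(self, srvid, handler):
        """Add a handler for messages sent to srvid."""
        self._by_srvid[srvid].append(handler)

    def dispatch(self, srvid, data):
        """Call every handler for srvid; return how many were called."""
        # .get() avoids creating empty entries for unknown IDs.
        handlers = self._by_srvid.get(srvid, ())
        for handler in handlers:
            handler(data)
        return len(handlers)
```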