Commit message [Author, Age, Files, Lines]
* Well, this seems to work. (bug19662) [Matthew Sackman, 2009-06-30, 1 file, -4/+13]
|
* and now clustering seems to work again... [Matthew Sackman, 2009-06-30, 1 file, -4/+4]
|
* Removed the dumb timer:sleep, and after testing, properly sorted out the mnesia clustering details. [Matthew Sackman, 2009-06-24, 2 files, -13/+22]
|     This means that wait_for_tables now waits for _all_ tables, which means the bug that was requiring the timer:sleep has gone away. The solution to the clustering issue was to make sure that tables which are local content only are created explicitly on each node before you call wait_for_tables. All tests pass.
* acktags in ack or tx_commit do not need to be ordered. [Matthew Sackman, 2009-06-23, 2 files, -8/+7]
|     Messages in tx_cancel do not need to be ordered either. Hence removal of quite a lot of lists:reverse.
* without this, rabbit_disk_queue seems to refuse to start up if there are messages to be recovered, because mnesia is not up and running fast enough. [Matthew Sackman, 2009-06-23, 1 file, -1/+1]
|
* These two fixes were done in 20980 but should really be in this branch. [Matthew Sackman, 2009-06-22, 2 files, -3/+3]
|
* substantially bulked up the tests for this bug. All tests pass. [Matthew Sackman, 2009-06-22, 2 files, -48/+94]
|
* fixed. [Matthew Sackman, 2009-06-22, 3 files, -18/+53]
|     There was a choice here: either push all the txn accountancy into the mixed_queue and take it out of queue_process, or just pass all the txn pending messages in to the mode switch. I chose the latter because the queue_process is already the more readable of the two modules and I didn't want to further complicate the mixed_queue. Also, this way is a smaller API change and really not that much code. Tests pass, but I'm about to rewrite the test and bulk it up a bit. Also, running the previous tests:
|         rabbitmq-java-client/build/dist$ sh runjava.sh com/rabbitmq/examples/MulticastMain -y 50 -r 100 -s 104857 -m 100 -z 120
|     whilst running (reduce|increase)_memory_footprint is a good thing to do.
* A test. The problem really does exist. Not worked out how to fix it yet. [Matthew Sackman, 2009-06-20, 1 file, -0/+49]
|
* fixed bug documented in preceding comment [Matthew Sackman, 2009-06-20, 1 file, -4/+5]
|
* comment typo [Matthew Sackman, 2009-06-19, 1 file, -1/+1]
|
* get_cache_info ==> cache_info. [Matthew Sackman, 2009-06-19, 1 file, -5/+5]
|     An even better test (see parent commit message) is:
|         rabbitmq-java-client/build/dist$ sh runjava.sh com/rabbitmq/examples/MulticastMain -y 50 -r 100 -s 1048576 -m 100 -z 120
|     Rabbit will now happily just sit there and work away (again, run reduce_memory_footprint twice first) even though it's seeing 100MB of new data a second, which is going to 50 consumers, so 5GB a second. Needless to say, go back a few revisions, and it blows up within seconds.
* Just added a means to get the ets:info out for the cache. [Matthew Sackman, 2009-06-19, 1 file, -2/+8]
|     Testing shows that it does seem to get emptied successfully. So, using this revision, if you run:
|         rabbitmq-java-client/build/dist$ sh runjava.sh com/rabbitmq/examples/MulticastMain -y 10 -r 50 -s 1048576 -m 100 -z 120
|     then over the two mins, I see beam take between about 30% and 45% of my memory, once it's up and running. Using the revision right after the API change, i.e. 9f0ee0399838, the same test tries to take between about 45% and 60% of my memory. Don't forget to run:
|         rabbitmq-server$ ./scripts/rabbitmqctl reduce_memory_footprint
|         rabbitmq-server$ ./scripts/rabbitmqctl reduce_memory_footprint
|     before running the above test.
* fixes [Matthew Sackman, 2009-06-19, 1 file, -2/+3]
|
* Added caching layer using ets which, when a message is shared between multiple queues, eliminates the need for multiple reads, provided the /next/ copy of the message is requested before the previous copy of the message has been acked. [Matthew Sackman, 2009-06-19, 1 file, -10/+61]
|     Should reduce memory pressure.
* Altered API so that the disk_queue understands about #basic_message. [Matthew Sackman, 2009-06-19, 4 files, -79/+109]
|     This means that the mixed_queue avoids unnecessary term_to_binary calls. Tests adjusted and whole test suite still passes.
* removing two unused functions: publish_with_seq and tx_commit_with_seq [Matthew Sackman, 2009-06-18, 1 file, -28/+2]
|
* just merging in identical change from default [Matthew Sackman, 2009-06-18, 18 files, -872/+3079]
|\
| * fixing up my issues with tmp dir, hopefully once and for all. [Matthew Sackman, 2009-06-18, 1 file, -2/+6]
| |     TMPDIR is a standard unix variable which should be honoured.
| * removal of two unused functions from disk_queue. [Matthew Sackman, 2009-06-17, 2 files, -19/+8]
| |     There are two more unused functions which I can't work out what to do about... Also cosmetic.
| * preemptive tidying [Matthew Sackman, 2009-06-17, 1 file, -18/+12]
| |
| * sorted out specs. [Matthew Sackman, 2009-06-17, 6 files, -9/+62]
| |
| * comment [Matthew Sackman, 2009-06-17, 1 file, -1/+2]
| |
| * FunAcc0 ==> FunAcc [Matthew Sackman, 2009-06-17, 1 file, -6/+6]
| |
| * merging in default [Matthew Sackman, 2009-06-17, 7 files, -22/+23]
| |\
| * | more renaming and mnesia change to bat file [Matthew Sackman, 2009-06-17, 3 files, -17/+18]
| | |
| * | Renaming variables. All tests still pass [Matthew Sackman, 2009-06-17, 3 files, -99/+95]
| | |
| * | added batching for autoacks for general run_message_queue [Matthew Sackman, 2009-06-17, 1 file, -16/+15]
| | |
| * | adjusted HO-ness in deliver queue beautifully. [Matthew Sackman, 2009-06-17, 1 file, -42/+45]
| | |     Thus in the deliver_from_queue case, we now reduce n calls to mixed_queue:is_empty to 1 call and pass around the remaining count as the acc. l33t
| * | More tidying [Matthew Sackman, 2009-06-17, 4 files, -23/+19]
| | |
| * | further discussion [Matthew Sackman, 2009-06-17, 1 file, -18/+19]
| | |
| * | post case/if discussion [Matthew Sackman, 2009-06-17, 1 file, -9/+6]
| | |
| * | mainly if ==> case in suitable places, but also some formatting [Matthew Sackman, 2009-06-17, 5 files, -69/+88]
| | |
| * | tabs and line length [Matthew Sackman, 2009-06-17, 1 file, -2/+3]
| | |
| * | fixed line lengths [Matthew Sackman, 2009-06-17, 10 files, -116/+177]
| | |
| * | Removing the failed experiment that was the odbc db queue [Matthew Sackman, 2009-06-17, 2 files, -476/+0]
| | |
| * | just removing tabs [Matthew Sackman, 2009-06-17, 4 files, -219/+219]
| | |
| * | Yep, as I'd thought, the next_seq_id field was totally unused for anything useful. [Matthew Sackman, 2009-06-12, 1 file, -81/+54]
| | |     The code is thus now a good bit simpler.
| * | Made mixed_queue track its length by itself. [Matthew Sackman, 2009-06-12, 4 files, -81/+106]
| | |     This avoids synchronous calls to the disk_queue when operating in disk only mode and seems to have substantially improved performance (in addition to avoiding a sync call, repeatedly asking for the length of a queue (erlang stdlib) with a million+ items in it can't have been cheap). It now seems to be very much the case that when coming out of disk only mode, huge backlogs are recovered reliably.
| | |     Also, added reduce_memory_footprint and increase_memory_footprint to control. Both can be run twice and alter whether the disk_queue changes mode or the individual queues.
| * | merging in from default [Matthew Sackman, 2009-06-11, 18 files, -845/+3419]
| |\ \
| | * | And suddenly it works. [Matthew Sackman, 2009-06-11, 3 files, -14/+17]
| | | |     Testing showed that removing the crude limit UNSENT_MESSAGE_LIMIT made performance better. This then made me wonder if the unblock and notify_sent messages weren't getting through fast enough, and sure enough, using pcast is much better there. Also, turning on dbg:tpl showed that the common path in mixed_queue was to call publish_delivered (i.e. the message has been delivered to a consumer, we just need to record this fact). Making sure everything in there for the non-persistent, non-durable but disk-only mode is asynchronous also helped performance massively.
| | * | well, I've made the acking for messages which are on disk but are not persistent/durable async, and it has improved some issues. [Matthew Sackman, 2009-06-10, 2 files, -14/+32]
| | | |     But if you switch to disk only mode, then allow, say, 10k messages to build up (use MulticastMain), then switch back to ram mode, it won't recover: the receive rate will stay very low, and rabbitmqctl list_queues will continue to grow insanely. This is very very odd, because querying the disk_queue directly for the queue length shows it drops to 0, but at least one CPU is maxed out at 100% use, messages continue to arrive, and the delivery rate never goes back up. Mysterious.
| | * | Took advantage of the gen_server2 priorities. [Matthew Sackman, 2009-06-10, 4 files, -8/+9]
| | | |     Reversed order - i.e. now when swapping out, the first thing is to alter the disk_queue, and the 2nd thing is to alter the queues. And vice versa. The reasoning is as follows:
| | | |     Changing the disk_queue is a BIG operation because it affects every message in there, from all queues. In order to minimise the impact of this operation, we must do it first, not second, because if we do it first, only persistent messages from durable queues will be in there, whereas if we do it second, then all messages from all queues will be in there.
| | | |     Similarly, when swapping in, altering the individual queues is the first thing to do because it prevents the disk queue from growing further (i.e. only persistent messages to durable queues then make it to the disk queue), and each queue pulls out from the disk queue all the messages in there and so subsequent delivery from the mixed queue becomes very fast (actually, this is a total lie because of the call to rabbit_disk_queue:phantom_deliver in rabbit_mixed_queue:deliver - if I could get rid of this or at least make it async then that would greatly improve matters).
| | * | Added means to alter all queues and switch to disk_only mode in the disk queue. [Matthew Sackman, 2009-06-10, 6 files, -2/+131]
| | | |         rabbit_queue_mode_manager:change_memory_usage(undef, true).
| | | |     This will first ask all queues to switch from mixed to disk mode, and will, on a 2nd call, ask the disk queue to switch to disk only mode.
| | | |         rabbit_queue_mode_manager:change_memory_usage(undef, false).
| | | |     moves the other way.
| | | |     This all works, e.g. set MulticastMain pushing in messages and switch modes, and it's fine. One immediate problem is that as soon as everything becomes disk only, the performance suffers, so as a result messages build up. This is as expected. Then, going back to the middle mode (i.e. disk queue in ram_disk mode and queues in disk mode), the switch in the disk queue eats up a lot of memory. I suspect this is the effect of converting the mnesia table from disc_only_copies to disc_copies when there are 40k+ messages in there (one row per message). As a result, this conversion on its own is very dangerous to make.
| | | |     It might be more sensible to use the "weird" mode, where the queues are in mixed mode and the disk queue is in disk_only mode so as to try and get the queues to drain as fast as possible, reducing the size of the mnesia table so that when it is finally converted back, it's small. More experimentation is needed. I'll hook the above commands into rabbitmqctl soon.
| | * | just merging in default. [Matthew Sackman, 2009-06-10, 17 files, -849/+3275]
| | |\ \
| | | * | Two things have happened here. [Matthew Sackman, 2009-06-10, 3 files, -124/+245]
| | | | |     Firstly, the mixed_queue now functions correctly when being run in disk_only mode. This is _much_ more complicated than I had thought because of the fact that the presence of a message on disk has nothing to do with whether it is persistent or not. As a result early acking is required and requeuing operations are horrendous to say the least. When going from disk-only mode to mixed mode, we don't ack anything at all. It's arguable that we should ack non-persistent messages at this point, but the problem there is that if the conversion fails then we lose messages. Therefore, we then arrive at the situation where we're in mixed mode, and we have messages held in ram that are not persistent, but are still on disk, and require early acking when being delivered (again, requeue is hell). The conversion to and from disk-only and mixed mode now seems to work well. When starting up, non-persistent messages on disk are deleted.
| | | | |     Finally, in disk_queue, publish now takes an IsDelivered flag. This allows you to publish messages and mark them delivered in one go. However, the message is still available for delivery (i.e. it's not waiting for an ack). Also in disk_queue, requeue_with_seqs is now [{AckTag, {NewSeqId, NewIsDelivered}}], which allows you to requeue and unset the delivered flag. Note however, that it is still not safe to requeue a message which isn't waiting for an ack.
| | | | |     (Please note, it's now very important to distinguish between messages which "AreDelivered" _and_ are waiting for an ack _and_ are not going to appear if you call deliver(Q), VERSUS messages which "AreDelivered" but are not waiting for an ack and will appear (eventually) if you call deliver(Q).)
| | | * | just committing as need to work from home tomorrow. [Matthew Sackman, 2009-06-09, 3 files, -52/+130]
| | | | |     Code in "interesting" state of flux. disk mode to mixed mode in the mixed_queue is annoyingly hard.
| | | * | Using delayed_write batches together small writes and reduces the number of OS calls. [Matthew Sackman, 2009-06-09, 1 file, -10/+32]
| | | | |     This is a good thing and makes writing to disk much faster. However, we can have the situation where we are trying to read a message off disk before that message has been fully written out to disk. Therefore, we need to fsync at choice times. Because fsync is quite expensive, we want to call fsync no more than absolutely necessary. Thus we now have a 'dirty' flag which tracks whether the current file has been written to since the last fsync, and we call fsync whenever it is dirty and the file to read from is the current file. This has also had some similar changes elsewhere in the disk queue. In short however, it seems this does work as I'm no longer able to reproduce reads of messages which return all blanks.
| | | * | Logic failure which only came to light when trying to run the consumers as documented in bug 20470 [Matthew Sackman, 2009-06-09, 1 file, -3/+5]
| | | | |
| | | * | can now switch the mixed queue between modes [Matthew Sackman, 2009-06-08, 1 file, -1/+48]
| | | | |