author	Marko Mäkelä <marko.makela@mariadb.com>	2023-03-16 17:19:58 +0200
committer	Marko Mäkelä <marko.makela@mariadb.com>	2023-03-16 17:19:58 +0200
commit	a55b951e6082a4ce9a1f2ed5ee176ea7dbbaf1f2 (patch)
tree	bbc01052f654499f11d4ee04bb17cf7480ae6e96 /storage
parent	9593cccf285ee348fc9a2743c1ed7d24c768439b (diff)
download	mariadb-git-a55b951e6082a4ce9a1f2ed5ee176ea7dbbaf1f2.tar.gz
MDEV-26827 Make page flushing even faster
For more convenient monitoring of something that could greatly affect
the volume of page writes, we add the status variable
Innodb_buffer_pool_pages_split that was previously only available
via information_schema.innodb_metrics as "innodb_page_splits".
This was suggested by Axel Schwenke.

buf_flush_page_count: Replaced with buf_pool.stat.n_pages_written.
We protect buf_pool.stat (except n_page_gets) with buf_pool.mutex
and remove unnecessary export_vars indirection.

buf_pool.flush_list_bytes: Moved from buf_pool.stat.flush_list_bytes.
Protected by buf_pool.flush_list_mutex.

buf_pool_t::page_cleaner_status: Replaces buf_pool_t::n_flush_LRU_,
buf_pool_t::n_flush_list_, and buf_pool_t::page_cleaner_is_idle.
Protected by buf_pool.flush_list_mutex. buf_pool.done_flush_list will
be broadcast exclusively by the buf_flush_page_cleaner thread, and we
will only wait for it when communicating with buf_flush_page_cleaner.
There is no need to keep a count of pending writes by the
buf_pool.flush_list processing; a single flag suffices for that.
Waits for page write completion can be performed by simply waiting on
block->page.lock, or by invoking buf_dblwr.wait_for_page_writes().

buf_LRU_block_free_non_file_page(): Broadcast buf_pool.done_free and
set buf_pool.try_LRU_scan when freeing a page. This would be executed
also as part of buf_page_write_complete().

buf_page_write_complete(): Do not broadcast buf_pool.done_flush_list,
and do not acquire buf_pool.mutex unless buf_pool.LRU eviction is
needed. Let buf_dblwr count all writes to persistent pages and
broadcast a condition variable when no outstanding writes remain.

buf_flush_page_cleaner(): Prioritize LRU flushing and eviction right
after "furious flushing" (lsn_limit). Simplify the conditions and
reduce the hold time of buf_pool.flush_list_mutex. Refuse to shut
down or sleep if buf_pool.ran_out(), that is, LRU eviction is needed.

buf_pool_t::page_cleaner_wakeup(): Add the optional parameter for_LRU.

buf_LRU_get_free_block(): Protect buf_lru_free_blocks_error_printed
with buf_pool.mutex. Invoke buf_pool.page_cleaner_wakeup(true) to
ensure that buf_flush_page_cleaner() will process the LRU flush
request.

buf_do_LRU_batch(), buf_flush_list(), buf_flush_list_space():
Update buf_pool.stat.n_pages_written when submitting writes
(while holding buf_pool.mutex), not when completing them.

buf_page_t::flush(), buf_flush_discard_page(): Require that the page
U-latch be acquired upfront, and remove buf_page_t::ready_for_flush().

buf_pool_t::delete_from_flush_list(): Remove the parameter "bool clear".

buf_flush_page(): Count pending page writes via buf_dblwr.

buf_flush_try_neighbors(): Take the block of page_id as a parameter.
If the tablespace is dropped before our page has been written out,
release the page U-latch.

buf_pool_invalidate(): Let the caller ensure that there are no
outstanding writes.

buf_flush_wait_batch_end(false),
buf_flush_wait_batch_end_acquiring_mutex(false):
Replaced with buf_dblwr.wait_for_page_writes().

buf_flush_wait_LRU_batch_end(): Replaces buf_flush_wait_batch_end(true).

buf_flush_list(): Remove some broadcasts of buf_pool.done_flush_list.

buf_flush_buffer_pool(): Invoke also buf_dblwr.wait_for_page_writes().

buf_pool_t::io_pending(), buf_pool_t::n_flush_list(): Remove.
Outstanding writes are reflected by buf_dblwr.pending_writes().

buf_dblwr_t::init(): New function, to initialize the mutex and the
condition variables, but not the backing store.

buf_dblwr_t::is_created(): Replaces buf_dblwr_t::is_initialised().

buf_dblwr_t::pending_writes(), buf_dblwr_t::writes_pending:
Keep track of writes of persistent data pages.

buf_flush_LRU(): Allow calls while LRU flushing may be in progress
in another thread.

Tested by Matthias Leich (correctness) and Axel Schwenke (performance)
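[Editorial note] The central mechanism described above is that outstanding
page writes are counted by the doublewrite subsystem (buf_dblwr.writes_pending)
instead of the former per-batch counters in buf_pool, with a condition variable
broadcast when the count drops to zero. The following is only a minimal
illustrative sketch of that counter-plus-condition-variable pattern, not the
actual InnoDB code: the class name pending_page_writes and the use of
std::mutex/std::condition_variable are assumptions chosen for brevity, whereas
the real buf_dblwr_t uses mysql_mutex_t and pthread_cond_t, as visible in the
diff below.

// Illustrative sketch only (hypothetical class, not buf_dblwr_t itself):
// a counter of submitted page writes plus a condition variable that is
// broadcast when the counter reaches zero.
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

class pending_page_writes
{
  std::mutex mutex;
  std::condition_variable write_cond;
  size_t writes_pending= 0;
public:
  // Register one submitted page write (cf. buf_dblwr_t::add_to_batch()).
  void submit()
  {
    std::lock_guard<std::mutex> g{mutex};
    writes_pending++;
  }
  // Complete one page write; wake all waiters when none remain
  // (cf. buf_dblwr_t::write_completed()).
  void write_completed()
  {
    std::lock_guard<std::mutex> g{mutex};
    assert(writes_pending > 0);
    if (!--writes_pending)
      write_cond.notify_all();
  }
  // Block until every submitted write has completed
  // (cf. buf_dblwr_t::wait_for_page_writes()).
  void wait_for_page_writes()
  {
    std::unique_lock<std::mutex> lk{mutex};
    write_cond.wait(lk, [this] { return writes_pending == 0; });
  }
  // Number of submitted but not yet completed writes
  // (cf. buf_dblwr_t::pending_writes()).
  size_t pending_writes()
  {
    std::lock_guard<std::mutex> g{mutex};
    return writes_pending;
  }
};

Because write completion only needs to touch this counter, the commit can
avoid acquiring buf_pool.mutex in buf_page_write_complete() unless LRU
eviction is needed, and waiters such as buf_flush_buffer_pool() simply call
wait_for_page_writes().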
Diffstat (limited to 'storage')
-rw-r--r--storage/innobase/btr/btr0btr.cc4
-rw-r--r--storage/innobase/buf/buf0buf.cc51
-rw-r--r--storage/innobase/buf/buf0dblwr.cc66
-rw-r--r--storage/innobase/buf/buf0flu.cc731
-rw-r--r--storage/innobase/buf/buf0lru.cc42
-rw-r--r--storage/innobase/buf/buf0rea.cc105
-rw-r--r--storage/innobase/gis/gis0rtree.cc2
-rw-r--r--storage/innobase/handler/ha_innodb.cc34
-rw-r--r--storage/innobase/include/buf0buf.h159
-rw-r--r--storage/innobase/include/buf0dblwr.h69
-rw-r--r--storage/innobase/include/buf0flu.h9
-rw-r--r--storage/innobase/include/buf0rea.h9
-rw-r--r--storage/innobase/include/fil0fil.h2
-rw-r--r--storage/innobase/include/srv0srv.h16
-rw-r--r--storage/innobase/log/log0log.cc8
-rw-r--r--storage/innobase/srv/srv0mon.cc12
-rw-r--r--storage/innobase/srv/srv0srv.cc48
-rw-r--r--storage/innobase/srv/srv0start.cc2
-rw-r--r--storage/rocksdb/mysql-test/rocksdb/r/innodb_i_s_tables_disabled.result2
19 files changed, 703 insertions, 668 deletions
diff --git a/storage/innobase/btr/btr0btr.cc b/storage/innobase/btr/btr0btr.cc
index 1b69f4c7170..e54c2a101b8 100644
--- a/storage/innobase/btr/btr0btr.cc
+++ b/storage/innobase/btr/btr0btr.cc
@@ -2975,6 +2975,8 @@ btr_page_split_and_insert(
ut_ad(*err == DB_SUCCESS);
ut_ad(dtuple_check_typed(tuple));
+ buf_pool.pages_split++;
+
if (cursor->index()->is_spatial()) {
/* Split rtree page and update parent */
return rtr_page_split_and_insert(flags, cursor, offsets, heap,
@@ -3371,8 +3373,6 @@ func_exit:
left_block, right_block, mtr);
}
- MONITOR_INC(MONITOR_INDEX_SPLIT);
-
ut_ad(page_validate(buf_block_get_frame(left_block),
page_cursor->index));
ut_ad(page_validate(buf_block_get_frame(right_block),
diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
index 510872c142e..106569f74b2 100644
--- a/storage/innobase/buf/buf0buf.cc
+++ b/storage/innobase/buf/buf0buf.cc
@@ -1401,8 +1401,10 @@ inline bool buf_pool_t::withdraw_blocks()
true);
mysql_mutex_unlock(&buf_pool.mutex);
buf_dblwr.flush_buffered_writes();
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ buf_flush_wait_LRU_batch_end();
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
mysql_mutex_lock(&buf_pool.mutex);
- buf_flush_wait_batch_end(true);
}
/* relocate blocks/buddies in withdrawn area */
@@ -2265,13 +2267,15 @@ lookup:
return bpage;
must_read_page:
- if (dberr_t err= buf_read_page(page_id, zip_size))
- {
+ switch (dberr_t err= buf_read_page(page_id, zip_size)) {
+ case DB_SUCCESS:
+ case DB_SUCCESS_LOCKED_REC:
+ goto lookup;
+ default:
ib::error() << "Reading compressed page " << page_id
<< " failed with error: " << err;
return nullptr;
}
- goto lookup;
}
/********************************************************************//**
@@ -2511,20 +2515,23 @@ loop:
corrupted, or if an encrypted page with a valid
checksum cannot be decypted. */
- if (dberr_t local_err = buf_read_page(page_id, zip_size)) {
- if (local_err != DB_CORRUPTION
- && mode != BUF_GET_POSSIBLY_FREED
+ switch (dberr_t local_err = buf_read_page(page_id, zip_size)) {
+ case DB_SUCCESS:
+ case DB_SUCCESS_LOCKED_REC:
+ buf_read_ahead_random(page_id, zip_size, ibuf_inside(mtr));
+ break;
+ default:
+ if (mode != BUF_GET_POSSIBLY_FREED
&& retries++ < BUF_PAGE_READ_MAX_RETRIES) {
DBUG_EXECUTE_IF("intermittent_read_failure",
retries = BUF_PAGE_READ_MAX_RETRIES;);
- } else {
- if (err) {
- *err = local_err;
- }
- return nullptr;
}
- } else {
- buf_read_ahead_random(page_id, zip_size, ibuf_inside(mtr));
+ /* fall through */
+ case DB_PAGE_CORRUPTED:
+ if (err) {
+ *err = local_err;
+ }
+ return nullptr;
}
ut_d(if (!(++buf_dbg_counter % 5771)) buf_pool.validate());
@@ -3279,12 +3286,12 @@ retry:
buf_unzip_LRU_add_block(reinterpret_cast<buf_block_t*>(bpage), FALSE);
}
+ buf_pool.stat.n_pages_created++;
mysql_mutex_unlock(&buf_pool.mutex);
mtr->memo_push(reinterpret_cast<buf_block_t*>(bpage), MTR_MEMO_PAGE_X_FIX);
bpage->set_accessed();
- buf_pool.stat.n_pages_created++;
/* Delete possible entries for the page from the insert buffer:
such can exist if the page belonged to an index which was dropped */
@@ -3534,7 +3541,6 @@ dberr_t buf_page_t::read_complete(const fil_node_t &node)
ut_d(auto n=) buf_pool.n_pend_reads--;
ut_ad(n > 0);
- buf_pool.stat.n_pages_read++;
const byte *read_frame= zip.data ? zip.data : frame;
ut_ad(read_frame);
@@ -3686,9 +3692,6 @@ void buf_pool_invalidate()
{
mysql_mutex_lock(&buf_pool.mutex);
- buf_flush_wait_batch_end(true);
- buf_flush_wait_batch_end(false);
-
/* It is possible that a write batch that has been posted
earlier is still not complete. For buffer pool invalidation to
proceed we must ensure there is NO write activity happening. */
@@ -3839,8 +3842,8 @@ void buf_pool_t::print()
<< UT_LIST_GET_LEN(flush_list)
<< ", n pending decompressions=" << n_pend_unzip
<< ", n pending reads=" << n_pend_reads
- << ", n pending flush LRU=" << n_flush_LRU_
- << " list=" << n_flush_list_
+ << ", n pending flush LRU=" << n_flush()
+ << " list=" << buf_dblwr.pending_writes()
<< ", pages made young=" << stat.n_pages_made_young
<< ", not young=" << stat.n_pages_not_made_young
<< ", pages read=" << stat.n_pages_read
@@ -3952,13 +3955,13 @@ void buf_stats_get_pool_info(buf_pool_info_t *pool_info)
pool_info->flush_list_len = UT_LIST_GET_LEN(buf_pool.flush_list);
pool_info->n_pend_unzip = UT_LIST_GET_LEN(buf_pool.unzip_LRU);
- mysql_mutex_unlock(&buf_pool.flush_list_mutex);
pool_info->n_pend_reads = buf_pool.n_pend_reads;
- pool_info->n_pending_flush_lru = buf_pool.n_flush_LRU_;
+ pool_info->n_pending_flush_lru = buf_pool.n_flush();
- pool_info->n_pending_flush_list = buf_pool.n_flush_list_;
+ pool_info->n_pending_flush_list = buf_dblwr.pending_writes();
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
current_time = time(NULL);
time_elapsed = 0.001 + difftime(current_time,
diff --git a/storage/innobase/buf/buf0dblwr.cc b/storage/innobase/buf/buf0dblwr.cc
index c71fd8df068..72b1ba5ca2b 100644
--- a/storage/innobase/buf/buf0dblwr.cc
+++ b/storage/innobase/buf/buf0dblwr.cc
@@ -46,7 +46,17 @@ inline buf_block_t *buf_dblwr_trx_sys_get(mtr_t *mtr)
0, RW_X_LATCH, mtr);
}
-/** Initialize the doublewrite buffer data structure.
+void buf_dblwr_t::init()
+{
+ if (!active_slot)
+ {
+ active_slot= &slots[0];
+ mysql_mutex_init(buf_dblwr_mutex_key, &mutex, nullptr);
+ pthread_cond_init(&cond, nullptr);
+ }
+}
+
+/** Initialise the persistent storage of the doublewrite buffer.
@param header doublewrite page header in the TRX_SYS page */
inline void buf_dblwr_t::init(const byte *header)
{
@@ -54,8 +64,6 @@ inline void buf_dblwr_t::init(const byte *header)
ut_ad(!active_slot->reserved);
ut_ad(!batch_running);
- mysql_mutex_init(buf_dblwr_mutex_key, &mutex, nullptr);
- pthread_cond_init(&cond, nullptr);
block1= page_id_t(0, mach_read_from_4(header + TRX_SYS_DOUBLEWRITE_BLOCK1));
block2= page_id_t(0, mach_read_from_4(header + TRX_SYS_DOUBLEWRITE_BLOCK2));
@@ -74,7 +82,7 @@ inline void buf_dblwr_t::init(const byte *header)
@return whether the operation succeeded */
bool buf_dblwr_t::create()
{
- if (is_initialised())
+ if (is_created())
return true;
mtr_t mtr;
@@ -343,7 +351,7 @@ func_exit:
void buf_dblwr_t::recover()
{
ut_ad(recv_sys.parse_start_lsn);
- if (!is_initialised())
+ if (!is_created())
return;
uint32_t page_no_dblwr= 0;
@@ -452,10 +460,9 @@ next_page:
/** Free the doublewrite buffer. */
void buf_dblwr_t::close()
{
- if (!is_initialised())
+ if (!active_slot)
return;
- /* Free the double write data structures. */
ut_ad(!active_slot->reserved);
ut_ad(!active_slot->first_free);
ut_ad(!batch_running);
@@ -469,35 +476,41 @@ void buf_dblwr_t::close()
mysql_mutex_destroy(&mutex);
memset((void*) this, 0, sizeof *this);
- active_slot= &slots[0];
}
/** Update the doublewrite buffer on write completion. */
-void buf_dblwr_t::write_completed()
+void buf_dblwr_t::write_completed(bool with_doublewrite)
{
ut_ad(this == &buf_dblwr);
- ut_ad(srv_use_doublewrite_buf);
- ut_ad(is_initialised());
ut_ad(!srv_read_only_mode);
mysql_mutex_lock(&mutex);
- ut_ad(batch_running);
- slot *flush_slot= active_slot == &slots[0] ? &slots[1] : &slots[0];
- ut_ad(flush_slot->reserved);
- ut_ad(flush_slot->reserved <= flush_slot->first_free);
+ ut_ad(writes_pending);
+ if (!--writes_pending)
+ pthread_cond_broadcast(&write_cond);
- if (!--flush_slot->reserved)
+ if (with_doublewrite)
{
- mysql_mutex_unlock(&mutex);
- /* This will finish the batch. Sync data files to the disk. */
- fil_flush_file_spaces();
- mysql_mutex_lock(&mutex);
+ ut_ad(is_created());
+ ut_ad(srv_use_doublewrite_buf);
+ ut_ad(batch_running);
+ slot *flush_slot= active_slot == &slots[0] ? &slots[1] : &slots[0];
+ ut_ad(flush_slot->reserved);
+ ut_ad(flush_slot->reserved <= flush_slot->first_free);
+
+ if (!--flush_slot->reserved)
+ {
+ mysql_mutex_unlock(&mutex);
+ /* This will finish the batch. Sync data files to the disk. */
+ fil_flush_file_spaces();
+ mysql_mutex_lock(&mutex);
- /* We can now reuse the doublewrite memory buffer: */
- flush_slot->first_free= 0;
- batch_running= false;
- pthread_cond_broadcast(&cond);
+ /* We can now reuse the doublewrite memory buffer: */
+ flush_slot->first_free= 0;
+ batch_running= false;
+ pthread_cond_broadcast(&cond);
+ }
}
mysql_mutex_unlock(&mutex);
@@ -642,7 +655,7 @@ void buf_dblwr_t::flush_buffered_writes_completed(const IORequest &request)
{
ut_ad(this == &buf_dblwr);
ut_ad(srv_use_doublewrite_buf);
- ut_ad(is_initialised());
+ ut_ad(is_created());
ut_ad(!srv_read_only_mode);
ut_ad(!request.bpage);
ut_ad(request.node == fil_system.sys_space->chain.start);
@@ -708,7 +721,7 @@ posted, and also when we may have to wait for a page latch!
Otherwise a deadlock of threads can occur. */
void buf_dblwr_t::flush_buffered_writes()
{
- if (!is_initialised() || !srv_use_doublewrite_buf)
+ if (!is_created() || !srv_use_doublewrite_buf)
{
fil_flush_file_spaces();
return;
@@ -741,6 +754,7 @@ void buf_dblwr_t::add_to_batch(const IORequest &request, size_t size)
const ulint buf_size= 2 * block_size();
mysql_mutex_lock(&mutex);
+ writes_pending++;
for (;;)
{
diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index 70e1595e00e..326636e0c4d 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -47,15 +47,12 @@ Created 11/11/1995 Heikki Tuuri
#endif
/** Number of pages flushed via LRU. Protected by buf_pool.mutex.
-Also included in buf_flush_page_count. */
+Also included in buf_pool.stat.n_pages_written. */
ulint buf_lru_flush_page_count;
/** Number of pages freed without flushing. Protected by buf_pool.mutex. */
ulint buf_lru_freed_page_count;
-/** Number of pages flushed. Protected by buf_pool.mutex. */
-ulint buf_flush_page_count;
-
/** Flag indicating if the page_cleaner is in active state. */
Atomic_relaxed<bool> buf_page_cleaner_is_active;
@@ -115,8 +112,7 @@ static void buf_flush_validate_skip()
}
#endif /* UNIV_DEBUG */
-/** Wake up the page cleaner if needed */
-void buf_pool_t::page_cleaner_wakeup()
+void buf_pool_t::page_cleaner_wakeup(bool for_LRU)
{
if (!page_cleaner_idle())
return;
@@ -149,11 +145,12 @@ void buf_pool_t::page_cleaner_wakeup()
- by allowing last_activity_count to updated when page-cleaner is made
active and has work to do. This ensures that the last_activity signal
is consumed by the page-cleaner before the next one is generated. */
- if ((pct_lwm != 0.0 && pct_lwm <= dirty_pct) ||
- (pct_lwm != 0.0 && last_activity_count == srv_get_activity_count()) ||
+ if (for_LRU ||
+ (pct_lwm != 0.0 && (pct_lwm <= dirty_pct ||
+ last_activity_count == srv_get_activity_count())) ||
srv_max_buf_pool_modified_pct <= dirty_pct)
{
- page_cleaner_is_idle= false;
+ page_cleaner_status-= PAGE_CLEANER_IDLE;
pthread_cond_signal(&do_flush_list);
}
}
@@ -183,8 +180,8 @@ void buf_pool_t::insert_into_flush_list(buf_block_t *block, lsn_t lsn)
delete_from_flush_list_low(&block->page);
}
else
- stat.flush_list_bytes+= block->physical_size();
- ut_ad(stat.flush_list_bytes <= curr_pool_size);
+ flush_list_bytes+= block->physical_size();
+ ut_ad(flush_list_bytes <= curr_pool_size);
block->page.set_oldest_modification(lsn);
MEM_CHECK_DEFINED(block->page.zip.data
@@ -197,14 +194,12 @@ void buf_pool_t::insert_into_flush_list(buf_block_t *block, lsn_t lsn)
}
/** Remove a block from flush_list.
-@param bpage buffer pool page
-@param clear whether to invoke buf_page_t::clear_oldest_modification() */
-void buf_pool_t::delete_from_flush_list(buf_page_t *bpage, bool clear)
+@param bpage buffer pool page */
+void buf_pool_t::delete_from_flush_list(buf_page_t *bpage)
{
delete_from_flush_list_low(bpage);
- stat.flush_list_bytes-= bpage->physical_size();
- if (clear)
- bpage->clear_oldest_modification();
+ flush_list_bytes-= bpage->physical_size();
+ bpage->clear_oldest_modification();
#ifdef UNIV_DEBUG
buf_flush_validate_skip();
#endif /* UNIV_DEBUG */
@@ -219,10 +214,10 @@ void buf_flush_remove_pages(ulint id)
{
const page_id_t first(id, 0), end(id + 1, 0);
ut_ad(id);
- mysql_mutex_lock(&buf_pool.mutex);
for (;;)
{
+ mysql_mutex_lock(&buf_pool.mutex);
bool deferred= false;
mysql_mutex_lock(&buf_pool.flush_list_mutex);
@@ -245,18 +240,14 @@ void buf_flush_remove_pages(ulint id)
bpage= prev;
}
+ mysql_mutex_unlock(&buf_pool.mutex);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
if (!deferred)
break;
- mysql_mutex_unlock(&buf_pool.mutex);
- std::this_thread::yield();
- mysql_mutex_lock(&buf_pool.mutex);
- buf_flush_wait_batch_end(false);
+ buf_dblwr.wait_for_page_writes();
}
-
- mysql_mutex_unlock(&buf_pool.mutex);
}
/*******************************************************************//**
@@ -301,7 +292,7 @@ buf_flush_relocate_on_flush_list(
bpage->clear_oldest_modification();
if (lsn == 1) {
- buf_pool.stat.flush_list_bytes -= dpage->physical_size();
+ buf_pool.flush_list_bytes -= dpage->physical_size();
dpage->list.prev = nullptr;
dpage->list.next = nullptr;
dpage->clear_oldest_modification();
@@ -341,6 +332,21 @@ inline void buf_page_t::write_complete(bool temporary)
lock.u_unlock(true);
}
+inline void buf_pool_t::n_flush_inc()
+{
+ mysql_mutex_assert_owner(&flush_list_mutex);
+ page_cleaner_status+= LRU_FLUSH;
+}
+
+inline void buf_pool_t::n_flush_dec()
+{
+ mysql_mutex_lock(&flush_list_mutex);
+ ut_ad(page_cleaner_status >= LRU_FLUSH);
+ if ((page_cleaner_status-= LRU_FLUSH) < LRU_FLUSH)
+ pthread_cond_broadcast(&done_flush_LRU);
+ mysql_mutex_unlock(&flush_list_mutex);
+}
+
/** Complete write of a file page from buf_pool.
@param request write request */
void buf_page_write_complete(const IORequest &request)
@@ -356,13 +362,6 @@ void buf_page_write_complete(const IORequest &request)
ut_ad(!buf_dblwr.is_inside(bpage->id()));
ut_ad(request.node->space->id == bpage->id().space());
- if (state < buf_page_t::WRITE_FIX_REINIT &&
- request.node->space->use_doublewrite())
- {
- ut_ad(request.node->space != fil_system.temp_space);
- buf_dblwr.write_completed();
- }
-
if (request.slot)
request.slot->release();
@@ -370,32 +369,31 @@ void buf_page_write_complete(const IORequest &request)
buf_page_monitor(*bpage, false);
DBUG_PRINT("ib_buf", ("write page %u:%u",
bpage->id().space(), bpage->id().page_no()));
- const bool temp= fsp_is_system_temporary(bpage->id().space());
- mysql_mutex_lock(&buf_pool.mutex);
+ mysql_mutex_assert_not_owner(&buf_pool.mutex);
mysql_mutex_assert_not_owner(&buf_pool.flush_list_mutex);
- buf_pool.stat.n_pages_written++;
- bpage->write_complete(temp);
if (request.is_LRU())
{
+ const bool temp= bpage->oldest_modification() == 2;
+ if (!temp)
+ buf_dblwr.write_completed(state < buf_page_t::WRITE_FIX_REINIT &&
+ request.node->space->use_doublewrite());
+ /* We must hold buf_pool.mutex while releasing the block, so that
+ no other thread can access it before we have freed it. */
+ mysql_mutex_lock(&buf_pool.mutex);
+ bpage->write_complete(temp);
buf_LRU_free_page(bpage, true);
- buf_pool.try_LRU_scan= true;
- pthread_cond_signal(&buf_pool.done_free);
+ mysql_mutex_unlock(&buf_pool.mutex);
- ut_ad(buf_pool.n_flush_LRU_);
- if (!--buf_pool.n_flush_LRU_)
- pthread_cond_broadcast(&buf_pool.done_flush_LRU);
+ buf_pool.n_flush_dec();
}
else
{
- ut_ad(!temp);
- ut_ad(buf_pool.n_flush_list_);
- if (!--buf_pool.n_flush_list_)
- pthread_cond_broadcast(&buf_pool.done_flush_list);
+ buf_dblwr.write_completed(state < buf_page_t::WRITE_FIX_REINIT &&
+ request.node->space->use_doublewrite());
+ bpage->write_complete(false);
}
-
- mysql_mutex_unlock(&buf_pool.mutex);
}
/** Calculate a ROW_FORMAT=COMPRESSED page checksum and update the page.
@@ -739,43 +737,41 @@ not_compressed:
}
/** Free a page whose underlying file page has been freed. */
-inline void buf_pool_t::release_freed_page(buf_page_t *bpage)
+ATTRIBUTE_COLD void buf_pool_t::release_freed_page(buf_page_t *bpage)
{
mysql_mutex_assert_owner(&mutex);
- mysql_mutex_lock(&flush_list_mutex);
ut_d(const lsn_t oldest_modification= bpage->oldest_modification();)
if (fsp_is_system_temporary(bpage->id().space()))
{
ut_ad(bpage->frame);
ut_ad(oldest_modification == 2);
+ bpage->clear_oldest_modification();
}
else
{
+ mysql_mutex_lock(&flush_list_mutex);
ut_ad(oldest_modification > 2);
- delete_from_flush_list(bpage, false);
+ delete_from_flush_list(bpage);
+ mysql_mutex_unlock(&flush_list_mutex);
}
- bpage->clear_oldest_modification();
- mysql_mutex_unlock(&flush_list_mutex);
- bpage->lock.u_unlock(true);
+ bpage->lock.u_unlock(true);
buf_LRU_free_page(bpage, true);
}
-/** Write a flushable page to a file. buf_pool.mutex must be held.
+/** Write a flushable page to a file or free a freeable block.
@param evict whether to evict the page on write completion
@param space tablespace
-@return whether the page was flushed and buf_pool.mutex was released */
-inline bool buf_page_t::flush(bool evict, fil_space_t *space)
+@return whether a page write was initiated and buf_pool.mutex released */
+bool buf_page_t::flush(bool evict, fil_space_t *space)
{
+ mysql_mutex_assert_not_owner(&buf_pool.flush_list_mutex);
ut_ad(in_file());
ut_ad(in_LRU_list);
ut_ad((space->purpose == FIL_TYPE_TEMPORARY) ==
(space == fil_system.temp_space));
- ut_ad(space->referenced());
ut_ad(evict || space != fil_system.temp_space);
-
- if (!lock.u_lock_try(true))
- return false;
+ ut_ad(space->referenced());
const auto s= state();
ut_a(s >= FREED);
@@ -783,18 +779,29 @@ inline bool buf_page_t::flush(bool evict, fil_space_t *space)
if (s < UNFIXED)
{
buf_pool.release_freed_page(this);
- mysql_mutex_unlock(&buf_pool.mutex);
- return true;
+ return false;
}
- if (s >= READ_FIX || oldest_modification() < 2)
+ ut_d(const auto f=) zip.fix.fetch_add(WRITE_FIX - UNFIXED);
+ ut_ad(f >= UNFIXED);
+ ut_ad(f < READ_FIX);
+ ut_ad((space == fil_system.temp_space)
+ ? oldest_modification() == 2
+ : oldest_modification() > 2);
+
+ /* Increment the I/O operation count used for selecting LRU policy. */
+ buf_LRU_stat_inc_io();
+ mysql_mutex_unlock(&buf_pool.mutex);
+
+ IORequest::Type type= IORequest::WRITE_ASYNC;
+ if (UNIV_UNLIKELY(evict))
{
- lock.u_unlock(true);
- return false;
+ type= IORequest::WRITE_LRU;
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ buf_pool.n_flush_inc();
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
}
- mysql_mutex_assert_not_owner(&buf_pool.flush_list_mutex);
-
/* Apart from the U-lock, this block will also be protected by
is_write_fixed() and oldest_modification()>1.
Thus, it cannot be relocated or removed. */
@@ -802,25 +809,6 @@ inline bool buf_page_t::flush(bool evict, fil_space_t *space)
DBUG_PRINT("ib_buf", ("%s %u page %u:%u",
evict ? "LRU" : "flush_list",
id().space(), id().page_no()));
- ut_d(const auto f=) zip.fix.fetch_add(WRITE_FIX - UNFIXED);
- ut_ad(f >= UNFIXED);
- ut_ad(f < READ_FIX);
- ut_ad(space == fil_system.temp_space
- ? oldest_modification() == 2
- : oldest_modification() > 2);
- if (evict)
- {
- ut_ad(buf_pool.n_flush_LRU_ < ULINT_UNDEFINED);
- buf_pool.n_flush_LRU_++;
- }
- else
- {
- ut_ad(buf_pool.n_flush_list_ < ULINT_UNDEFINED);
- buf_pool.n_flush_list_++;
- }
- buf_flush_page_count++;
-
- mysql_mutex_unlock(&buf_pool.mutex);
buf_block_t *block= reinterpret_cast<buf_block_t*>(this);
page_t *write_frame= zip.data;
@@ -830,7 +818,6 @@ inline bool buf_page_t::flush(bool evict, fil_space_t *space)
#if defined HAVE_FALLOC_PUNCH_HOLE_AND_KEEP_SIZE || defined _WIN32
size_t orig_size;
#endif
- IORequest::Type type= evict ? IORequest::WRITE_LRU : IORequest::WRITE_ASYNC;
buf_tmp_buffer_t *slot= nullptr;
if (UNIV_UNLIKELY(!frame)) /* ROW_FORMAT=COMPRESSED */
@@ -874,7 +861,10 @@ inline bool buf_page_t::flush(bool evict, fil_space_t *space)
{
switch (space->chain.start->punch_hole) {
case 1:
- type= evict ? IORequest::PUNCH_LRU : IORequest::PUNCH;
+ static_assert(IORequest::PUNCH_LRU - IORequest::PUNCH ==
+ IORequest::WRITE_LRU - IORequest::WRITE_ASYNC, "");
+ type=
+ IORequest::Type(type + (IORequest::PUNCH - IORequest::WRITE_ASYNC));
break;
case 2:
size= orig_size;
@@ -896,15 +886,14 @@ inline bool buf_page_t::flush(bool evict, fil_space_t *space)
if (lsn > log_sys.get_flushed_lsn())
log_write_up_to(lsn, true);
}
+ if (UNIV_LIKELY(space->purpose != FIL_TYPE_TEMPORARY))
+ buf_dblwr.add_unbuffered();
space->io(IORequest{type, this, slot}, physical_offset(), size,
write_frame, this);
}
else
buf_dblwr.add_to_batch(IORequest{this, slot, space->chain.start, type},
size);
-
- /* Increment the I/O operation count used for selecting LRU policy. */
- buf_LRU_stat_inc_io();
return true;
}
@@ -931,7 +920,7 @@ static bool buf_flush_check_neighbor(const page_id_t id, ulint fold,
if (evict && !bpage->is_old())
return false;
- return bpage->oldest_modification() > 1 && bpage->ready_for_flush();
+ return bpage->oldest_modification() > 1 && !bpage->is_io_fixed();
}
/** Check which neighbors of a page can be flushed from the buf_pool.
@@ -1058,6 +1047,7 @@ uint32_t fil_space_t::flush_freed(bool writable)
and also write zeroes or punch the hole for the freed ranges of pages.
@param space tablespace
@param page_id page identifier
+@param bpage buffer page
@param contiguous whether to consider contiguous areas of pages
@param evict true=buf_pool.LRU; false=buf_pool.flush_list
@param n_flushed number of pages flushed so far in this batch
@@ -1065,10 +1055,12 @@ and also write zeroes or punch the hole for the freed ranges of pages.
@return number of pages flushed */
static ulint buf_flush_try_neighbors(fil_space_t *space,
const page_id_t page_id,
+ buf_page_t *bpage,
bool contiguous, bool evict,
ulint n_flushed, ulint n_to_flush)
{
ut_ad(space->id == page_id.space());
+ ut_ad(bpage->id() == page_id);
ulint count= 0;
page_id_t id= page_id;
@@ -1077,9 +1069,15 @@ static ulint buf_flush_try_neighbors(fil_space_t *space,
ut_ad(page_id >= id);
ut_ad(page_id < high);
- for (ulint id_fold= id.fold(); id < high && !space->is_stopping();
- ++id, ++id_fold)
+ for (ulint id_fold= id.fold(); id < high; ++id, ++id_fold)
{
+ if (UNIV_UNLIKELY(space->is_stopping()))
+ {
+ if (bpage)
+ bpage->lock.u_unlock(true);
+ break;
+ }
+
if (count + n_flushed >= n_to_flush)
{
if (id > page_id)
@@ -1093,26 +1091,39 @@ static ulint buf_flush_try_neighbors(fil_space_t *space,
const buf_pool_t::hash_chain &chain= buf_pool.page_hash.cell_get(id_fold);
mysql_mutex_lock(&buf_pool.mutex);
- if (buf_page_t *bpage= buf_pool.page_hash.get(id, chain))
+ if (buf_page_t *b= buf_pool.page_hash.get(id, chain))
{
- ut_ad(bpage->in_file());
- /* We avoid flushing 'non-old' blocks in an eviction flush,
- because the flushed blocks are soon freed */
- if (!evict || id == page_id || bpage->is_old())
+ ut_ad(b->in_file());
+ if (id == page_id)
{
- if (!buf_pool.watch_is_sentinel(*bpage) &&
- bpage->oldest_modification() > 1 && bpage->ready_for_flush() &&
- bpage->flush(evict, space))
+ ut_ad(bpage == b);
+ bpage= nullptr;
+ ut_ad(!buf_pool.watch_is_sentinel(*b));
+ ut_ad(b->oldest_modification() > 1);
+ flush:
+ if (b->flush(evict, space))
{
++count;
continue;
}
}
+ /* We avoid flushing 'non-old' blocks in an eviction flush,
+ because the flushed blocks are soon freed */
+ else if ((!evict || b->is_old()) && !buf_pool.watch_is_sentinel(*b) &&
+ b->oldest_modification() > 1 && b->lock.u_lock_try(true))
+ {
+ if (b->oldest_modification() < 2)
+ b->lock.u_unlock(true);
+ else
+ goto flush;
+ }
}
mysql_mutex_unlock(&buf_pool.mutex);
}
+ ut_ad(!bpage);
+
if (auto n= count - 1)
{
MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_NEIGHBOR_TOTAL_PAGE,
@@ -1185,27 +1196,20 @@ struct flush_counters_t
ulint evicted;
};
-/** Try to discard a dirty page.
+/** Discard a dirty page, and release buf_pool.flush_list_mutex.
@param bpage dirty page whose tablespace is not accessible */
static void buf_flush_discard_page(buf_page_t *bpage)
{
- mysql_mutex_assert_owner(&buf_pool.mutex);
- mysql_mutex_assert_not_owner(&buf_pool.flush_list_mutex);
ut_ad(bpage->in_file());
ut_ad(bpage->oldest_modification());
- if (!bpage->lock.u_lock_try(false))
- return;
-
- mysql_mutex_lock(&buf_pool.flush_list_mutex);
buf_pool.delete_from_flush_list(bpage);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
ut_d(const auto state= bpage->state());
ut_ad(state == buf_page_t::FREED || state == buf_page_t::UNFIXED ||
state == buf_page_t::IBUF_EXIST || state == buf_page_t::REINIT);
- bpage->lock.u_unlock();
-
+ bpage->lock.u_unlock(true);
buf_LRU_free_page(bpage, true);
}
@@ -1227,7 +1231,6 @@ static void buf_flush_LRU_list_batch(ulint max, bool evict,
const auto neighbors= UT_LIST_GET_LEN(buf_pool.LRU) < BUF_LRU_OLD_MIN_LEN
? 0 : srv_flush_neighbors;
fil_space_t *space= nullptr;
- bool do_evict= evict;
uint32_t last_space_id= FIL_NULL;
static_assert(FIL_NULL > SRV_TMP_SPACE_ID, "consistency");
static_assert(FIL_NULL > SRV_SPACE_ID_UPPER_BOUND, "consistency");
@@ -1236,27 +1239,47 @@ static void buf_flush_LRU_list_batch(ulint max, bool evict,
bpage &&
((UT_LIST_GET_LEN(buf_pool.LRU) > BUF_LRU_MIN_LEN &&
UT_LIST_GET_LEN(buf_pool.free) < free_limit) ||
- recv_recovery_is_on()); ++scanned)
+ recv_recovery_is_on());
+ ++scanned, bpage= buf_pool.lru_hp.get())
{
buf_page_t *prev= UT_LIST_GET_PREV(LRU, bpage);
- const lsn_t oldest_modification= bpage->oldest_modification();
buf_pool.lru_hp.set(prev);
- const auto state= bpage->state();
+ auto state= bpage->state();
ut_ad(state >= buf_page_t::FREED);
ut_ad(bpage->in_LRU_list);
- if (oldest_modification <= 1)
- {
+ switch (bpage->oldest_modification()) {
+ case 0:
+ evict:
if (state != buf_page_t::FREED &&
(state >= buf_page_t::READ_FIX || (~buf_page_t::LRU_MASK & state)))
- goto must_skip;
- if (buf_LRU_free_page(bpage, true))
- ++n->evicted;
+ continue;
+ buf_LRU_free_page(bpage, true);
+ ++n->evicted;
+ /* fall through */
+ case 1:
+ continue;
}
- else if (state < buf_page_t::READ_FIX)
+
+ if (state < buf_page_t::READ_FIX && bpage->lock.u_lock_try(true))
{
+ ut_ad(!bpage->is_io_fixed());
+ bool do_evict= evict;
+ switch (bpage->oldest_modification()) {
+ case 1:
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ buf_pool.delete_from_flush_list(bpage);
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ /* fall through */
+ case 0:
+ bpage->lock.u_unlock(true);
+ goto evict;
+ case 2:
+ /* LRU flushing will always evict pages of the temporary tablespace. */
+ do_evict= true;
+ }
/* Block is ready for flush. Dispatch an IO request.
- If evict=true, the page will be evicted by buf_page_write_complete(). */
+ If do_evict, the page may be evicted by buf_page_write_complete(). */
const page_id_t page_id(bpage->id());
const uint32_t space_id= page_id.space();
if (!space || space->id != space_id)
@@ -1269,14 +1292,10 @@ static void buf_flush_LRU_list_batch(ulint max, bool evict,
space->release();
auto p= buf_flush_space(space_id);
space= p.first;
- /* For the temporary tablespace, LRU flushing will always
- evict pages upon completing the write. */
- do_evict= evict || space == fil_system.temp_space;
last_space_id= space_id;
mysql_mutex_lock(&buf_pool.mutex);
if (p.second)
buf_pool.stat.n_pages_written+= p.second;
- goto retry;
}
else
ut_ad(!space);
@@ -1288,17 +1307,24 @@ static void buf_flush_LRU_list_batch(ulint max, bool evict,
}
if (!space)
+ {
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
buf_flush_discard_page(bpage);
+ }
else if (neighbors && space->is_rotational())
{
mysql_mutex_unlock(&buf_pool.mutex);
- n->flushed+= buf_flush_try_neighbors(space, page_id, neighbors == 1,
+ n->flushed+= buf_flush_try_neighbors(space, page_id, bpage,
+ neighbors == 1,
do_evict, n->flushed, max);
reacquire_mutex:
mysql_mutex_lock(&buf_pool.mutex);
}
else if (n->flushed >= max && !recv_recovery_is_on())
+ {
+ bpage->lock.u_unlock(true);
break;
+ }
else if (bpage->flush(do_evict, space))
{
++n->flushed;
@@ -1306,11 +1332,8 @@ reacquire_mutex:
}
}
else
- must_skip:
/* Can't evict or dispatch this block. Go to previous. */
ut_ad(buf_pool.lru_hp.is_hp(prev));
- retry:
- bpage= buf_pool.lru_hp.get();
}
buf_pool.lru_hp.set(nullptr);
@@ -1341,6 +1364,7 @@ static void buf_do_LRU_batch(ulint max, bool evict, flush_counters_t *n)
mysql_mutex_assert_owner(&buf_pool.mutex);
buf_lru_freed_page_count+= n->evicted;
buf_lru_flush_page_count+= n->flushed;
+ buf_pool.stat.n_pages_written+= n->flushed;
}
/** This utility flushes dirty blocks from the end of the flush_list.
@@ -1354,6 +1378,7 @@ static ulint buf_do_flush_list_batch(ulint max_n, lsn_t lsn)
ulint scanned= 0;
mysql_mutex_assert_owner(&buf_pool.mutex);
+ mysql_mutex_assert_owner(&buf_pool.flush_list_mutex);
const auto neighbors= UT_LIST_GET_LEN(buf_pool.LRU) < BUF_LRU_OLD_MIN_LEN
? 0 : srv_flush_neighbors;
@@ -1364,7 +1389,6 @@ static ulint buf_do_flush_list_batch(ulint max_n, lsn_t lsn)
/* Start from the end of the list looking for a suitable block to be
flushed. */
- mysql_mutex_lock(&buf_pool.flush_list_mutex);
ulint len= UT_LIST_GET_LEN(buf_pool.flush_list);
for (buf_page_t *bpage= UT_LIST_GET_LAST(buf_pool.flush_list);
@@ -1375,32 +1399,42 @@ static ulint buf_do_flush_list_batch(ulint max_n, lsn_t lsn)
break;
ut_ad(bpage->in_file());
- buf_page_t *prev= UT_LIST_GET_PREV(list, bpage);
-
- if (oldest_modification == 1)
{
- buf_pool.delete_from_flush_list(bpage);
- skip:
- bpage= prev;
- continue;
- }
+ buf_page_t *prev= UT_LIST_GET_PREV(list, bpage);
- ut_ad(oldest_modification > 2);
+ if (oldest_modification == 1)
+ {
+ clear:
+ buf_pool.delete_from_flush_list(bpage);
+ skip:
+ bpage= prev;
+ continue;
+ }
- if (!bpage->ready_for_flush())
- goto skip;
+ ut_ad(oldest_modification > 2);
- /* In order not to degenerate this scan to O(n*n) we attempt to
- preserve the pointer position. Any thread that would remove 'prev'
- from buf_pool.flush_list must adjust the hazard pointer.
+ if (!bpage->lock.u_lock_try(true))
+ goto skip;
- Note: A concurrent execution of buf_flush_list_space() may
- terminate this scan prematurely. The buf_pool.n_flush_list_
- should prevent multiple threads from executing
- buf_do_flush_list_batch() concurrently,
- but buf_flush_list_space() is ignoring that. */
- buf_pool.flush_hp.set(prev);
- mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ ut_ad(!bpage->is_io_fixed());
+
+ if (bpage->oldest_modification() == 1)
+ {
+ bpage->lock.u_unlock(true);
+ goto clear;
+ }
+
+ /* In order not to degenerate this scan to O(n*n) we attempt to
+ preserve the pointer position. Any thread that would remove 'prev'
+ from buf_pool.flush_list must adjust the hazard pointer.
+
+ Note: A concurrent execution of buf_flush_list_space() may
+ terminate this scan prematurely. The buf_pool.flush_list_active
+ should prevent multiple threads from executing
+ buf_do_flush_list_batch() concurrently,
+ but buf_flush_list_space() is ignoring that. */
+ buf_pool.flush_hp.set(prev);
+ }
const page_id_t page_id(bpage->id());
const uint32_t space_id= page_id.space();
@@ -1408,8 +1442,6 @@ static ulint buf_do_flush_list_batch(ulint max_n, lsn_t lsn)
{
if (last_space_id != space_id)
{
- mysql_mutex_lock(&buf_pool.flush_list_mutex);
- buf_pool.flush_hp.set(bpage);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
mysql_mutex_unlock(&buf_pool.mutex);
if (space)
@@ -1418,18 +1450,8 @@ static ulint buf_do_flush_list_batch(ulint max_n, lsn_t lsn)
space= p.first;
last_space_id= space_id;
mysql_mutex_lock(&buf_pool.mutex);
- if (p.second)
- buf_pool.stat.n_pages_written+= p.second;
+ buf_pool.stat.n_pages_written+= p.second;
mysql_mutex_lock(&buf_pool.flush_list_mutex);
- bpage= buf_pool.flush_hp.get();
- if (!bpage)
- break;
- if (bpage->id() != page_id)
- continue;
- buf_pool.flush_hp.set(UT_LIST_GET_PREV(list, bpage));
- if (bpage->oldest_modification() <= 1 || !bpage->ready_for_flush())
- goto next;
- mysql_mutex_unlock(&buf_pool.flush_list_mutex);
}
else
ut_ad(!space);
@@ -1442,27 +1464,29 @@ static ulint buf_do_flush_list_batch(ulint max_n, lsn_t lsn)
if (!space)
buf_flush_discard_page(bpage);
- else if (neighbors && space->is_rotational())
- {
- mysql_mutex_unlock(&buf_pool.mutex);
- count+= buf_flush_try_neighbors(space, page_id, neighbors == 1,
- false, count, max_n);
- reacquire_mutex:
- mysql_mutex_lock(&buf_pool.mutex);
- }
- else if (bpage->flush(false, space))
+ else
{
- ++count;
- goto reacquire_mutex;
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ if (neighbors && space->is_rotational())
+ {
+ mysql_mutex_unlock(&buf_pool.mutex);
+ count+= buf_flush_try_neighbors(space, page_id, bpage, neighbors == 1,
+ false, count, max_n);
+ reacquire_mutex:
+ mysql_mutex_lock(&buf_pool.mutex);
+ }
+ else if (bpage->flush(false, space))
+ {
+ ++count;
+ goto reacquire_mutex;
+ }
}
mysql_mutex_lock(&buf_pool.flush_list_mutex);
- next:
bpage= buf_pool.flush_hp.get();
}
buf_pool.flush_hp.set(nullptr);
- mysql_mutex_unlock(&buf_pool.flush_list_mutex);
if (space)
space->release();
@@ -1472,32 +1496,25 @@ static ulint buf_do_flush_list_batch(ulint max_n, lsn_t lsn)
MONITOR_FLUSH_BATCH_SCANNED_NUM_CALL,
MONITOR_FLUSH_BATCH_SCANNED_PER_CALL,
scanned);
- if (count)
- MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_BATCH_TOTAL_PAGE,
- MONITOR_FLUSH_BATCH_COUNT,
- MONITOR_FLUSH_BATCH_PAGES,
- count);
- mysql_mutex_assert_owner(&buf_pool.mutex);
return count;
}
-/** Wait until a flush batch ends.
-@param lru true=buf_pool.LRU; false=buf_pool.flush_list */
-void buf_flush_wait_batch_end(bool lru)
+/** Wait until a LRU flush batch ends. */
+void buf_flush_wait_LRU_batch_end()
{
- const auto &n_flush= lru ? buf_pool.n_flush_LRU_ : buf_pool.n_flush_list_;
+ mysql_mutex_assert_owner(&buf_pool.flush_list_mutex);
+ mysql_mutex_assert_not_owner(&buf_pool.mutex);
- if (n_flush)
+ if (buf_pool.n_flush())
{
- auto cond= lru ? &buf_pool.done_flush_LRU : &buf_pool.done_flush_list;
tpool::tpool_wait_begin();
thd_wait_begin(nullptr, THD_WAIT_DISKIO);
do
- my_cond_wait(cond, &buf_pool.mutex.m_mutex);
- while (n_flush);
+ my_cond_wait(&buf_pool.done_flush_LRU,
+ &buf_pool.flush_list_mutex.m_mutex);
+ while (buf_pool.n_flush());
tpool::tpool_wait_end();
thd_wait_end(nullptr);
- pthread_cond_broadcast(cond);
}
}
@@ -1514,21 +1531,31 @@ static ulint buf_flush_list_holding_mutex(ulint max_n= ULINT_UNDEFINED,
ut_ad(lsn);
mysql_mutex_assert_owner(&buf_pool.mutex);
- if (buf_pool.n_flush_list_)
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ if (buf_pool.flush_list_active())
+ {
+nothing_to_do:
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
return 0;
-
- /* FIXME: we are performing a dirty read of buf_pool.flush_list.count
- while not holding buf_pool.flush_list_mutex */
- if (!UT_LIST_GET_LEN(buf_pool.flush_list))
+ }
+ if (!buf_pool.get_oldest_modification(0))
{
pthread_cond_broadcast(&buf_pool.done_flush_list);
- return 0;
+ goto nothing_to_do;
}
-
- buf_pool.n_flush_list_++;
+ buf_pool.flush_list_set_active();
const ulint n_flushed= buf_do_flush_list_batch(max_n, lsn);
- if (!--buf_pool.n_flush_list_)
- pthread_cond_broadcast(&buf_pool.done_flush_list);
+ if (n_flushed)
+ buf_pool.stat.n_pages_written+= n_flushed;
+ buf_pool.flush_list_set_inactive();
+ pthread_cond_broadcast(&buf_pool.done_flush_list);
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+
+ if (n_flushed)
+ MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_BATCH_TOTAL_PAGE,
+ MONITOR_FLUSH_BATCH_COUNT,
+ MONITOR_FLUSH_BATCH_PAGES,
+ n_flushed);
DBUG_PRINT("ib_buf", ("flush_list completed, " ULINTPF " pages", n_flushed));
return n_flushed;
@@ -1560,6 +1587,7 @@ bool buf_flush_list_space(fil_space_t *space, ulint *n_flushed)
bool may_have_skipped= false;
ulint max_n_flush= srv_io_capacity;
+ ulint n_flush= 0;
bool acquired= space->acquire();
{
@@ -1576,11 +1604,17 @@ bool buf_flush_list_space(fil_space_t *space, ulint *n_flushed)
ut_ad(bpage->in_file());
buf_page_t *prev= UT_LIST_GET_PREV(list, bpage);
- if (bpage->id().space() != space_id);
- else if (bpage->oldest_modification() == 1)
+ if (bpage->oldest_modification() == 1)
+ clear:
buf_pool.delete_from_flush_list(bpage);
- else if (!bpage->ready_for_flush())
+ else if (bpage->id().space() != space_id);
+ else if (!bpage->lock.u_lock_try(true))
may_have_skipped= true;
+ else if (bpage->oldest_modification() == 1)
+ {
+ bpage->lock.u_unlock(true);
+ goto clear;
+ }
else
{
/* In order not to degenerate this scan to O(n*n) we attempt to
@@ -1592,13 +1626,10 @@ bool buf_flush_list_space(fil_space_t *space, ulint *n_flushed)
concurrently. This may terminate our iteration prematurely,
leading us to return may_have_skipped=true. */
buf_pool.flush_hp.set(prev);
- mysql_mutex_unlock(&buf_pool.flush_list_mutex);
if (!acquired)
- {
was_freed:
buf_flush_discard_page(bpage);
- }
else
{
if (space->is_stopping())
@@ -1607,28 +1638,24 @@ bool buf_flush_list_space(fil_space_t *space, ulint *n_flushed)
acquired= false;
goto was_freed;
}
- if (!bpage->flush(false, space))
- {
- may_have_skipped= true;
- mysql_mutex_lock(&buf_pool.flush_list_mutex);
- goto next_after_skip;
- }
- if (n_flushed)
- ++*n_flushed;
- if (!--max_n_flush)
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ if (bpage->flush(false, space))
{
+ ++n_flush;
+ if (!--max_n_flush)
+ {
+ mysql_mutex_lock(&buf_pool.mutex);
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ may_have_skipped= true;
+ goto done;
+ }
mysql_mutex_lock(&buf_pool.mutex);
- mysql_mutex_lock(&buf_pool.flush_list_mutex);
- may_have_skipped= true;
- break;
}
- mysql_mutex_lock(&buf_pool.mutex);
}
mysql_mutex_lock(&buf_pool.flush_list_mutex);
if (!buf_pool.flush_hp.is_hp(prev))
may_have_skipped= true;
- next_after_skip:
bpage= buf_pool.flush_hp.get();
continue;
}
@@ -1641,14 +1668,19 @@ bool buf_flush_list_space(fil_space_t *space, ulint *n_flushed)
buf_flush_list_space(). We should always return true from
buf_flush_list_space() if that should be the case; in
buf_do_flush_list_batch() we will simply perform less work. */
-
+done:
buf_pool.flush_hp.set(nullptr);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
buf_pool.try_LRU_scan= true;
pthread_cond_broadcast(&buf_pool.done_free);
+
+ buf_pool.stat.n_pages_written+= n_flush;
mysql_mutex_unlock(&buf_pool.mutex);
+ if (n_flushed)
+ *n_flushed= n_flush;
+
if (acquired)
space->release();
@@ -1672,29 +1704,20 @@ ulint buf_flush_LRU(ulint max_n, bool evict)
{
mysql_mutex_assert_owner(&buf_pool.mutex);
- if (evict)
- {
- if (buf_pool.n_flush_LRU_)
- return 0;
- buf_pool.n_flush_LRU_= 1;
- }
-
flush_counters_t n;
buf_do_LRU_batch(max_n, evict, &n);
+ ulint pages= n.flushed;
+
if (n.evicted)
{
+ if (evict)
+ pages+= n.evicted;
buf_pool.try_LRU_scan= true;
- pthread_cond_signal(&buf_pool.done_free);
+ pthread_cond_broadcast(&buf_pool.done_free);
}
- if (!evict)
- return n.flushed;
-
- if (!--buf_pool.n_flush_LRU_)
- pthread_cond_broadcast(&buf_pool.done_flush_LRU);
-
- return n.evicted + n.flushed;
+ return pages;
}
/** Initiate a log checkpoint, discarding the start of the log.
@@ -1826,9 +1849,14 @@ static void buf_flush_wait(lsn_t lsn)
buf_flush_sync_lsn= lsn;
buf_pool.page_cleaner_set_idle(false);
pthread_cond_signal(&buf_pool.do_flush_list);
+ my_cond_wait(&buf_pool.done_flush_list,
+ &buf_pool.flush_list_mutex.m_mutex);
+ if (buf_pool.get_oldest_modification(lsn) >= lsn)
+ break;
}
- my_cond_wait(&buf_pool.done_flush_list,
- &buf_pool.flush_list_mutex.m_mutex);
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ buf_dblwr.wait_for_page_writes();
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
}
}
@@ -1849,6 +1877,9 @@ ATTRIBUTE_COLD void buf_flush_wait_flushed(lsn_t sync_lsn)
if (buf_pool.get_oldest_modification(sync_lsn) < sync_lsn)
{
MONITOR_INC(MONITOR_FLUSH_SYNC_WAITS);
+ thd_wait_begin(nullptr, THD_WAIT_DISKIO);
+ tpool::tpool_wait_begin();
+
#if 1 /* FIXME: remove this, and guarantee that the page cleaner serves us */
if (UNIV_UNLIKELY(!buf_page_cleaner_is_active))
{
@@ -1856,28 +1887,23 @@ ATTRIBUTE_COLD void buf_flush_wait_flushed(lsn_t sync_lsn)
{
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
ulint n_pages= buf_flush_list(srv_max_io_capacity, sync_lsn);
- mysql_mutex_lock(&buf_pool.mutex);
- buf_flush_wait_batch_end(false);
- mysql_mutex_unlock(&buf_pool.mutex);
if (n_pages)
{
MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_SYNC_TOTAL_PAGE,
MONITOR_FLUSH_SYNC_COUNT,
MONITOR_FLUSH_SYNC_PAGES, n_pages);
}
+ buf_dblwr.wait_for_page_writes();
mysql_mutex_lock(&buf_pool.flush_list_mutex);
}
while (buf_pool.get_oldest_modification(sync_lsn) < sync_lsn);
}
else
#endif
- {
- thd_wait_begin(nullptr, THD_WAIT_DISKIO);
- tpool::tpool_wait_begin();
buf_flush_wait(sync_lsn);
- tpool::tpool_wait_end();
- thd_wait_end(nullptr);
- }
+
+ tpool::tpool_wait_end();
+ thd_wait_end(nullptr);
}
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
@@ -1930,11 +1956,10 @@ and try to initiate checkpoints until the target is met.
ATTRIBUTE_COLD static void buf_flush_sync_for_checkpoint(lsn_t lsn)
{
ut_ad(!srv_read_only_mode);
+ mysql_mutex_assert_not_owner(&buf_pool.flush_list_mutex);
for (;;)
{
- mysql_mutex_unlock(&buf_pool.flush_list_mutex);
-
if (ulint n_flushed= buf_flush_list(srv_max_io_capacity, lsn))
{
MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_SYNC_TOTAL_PAGE,
@@ -1985,6 +2010,7 @@ ATTRIBUTE_COLD static void buf_flush_sync_for_checkpoint(lsn_t lsn)
/* wake up buf_flush_wait() */
pthread_cond_broadcast(&buf_pool.done_flush_list);
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
lsn= std::max(lsn, target);
@@ -2179,8 +2205,6 @@ static void buf_flush_page_cleaner()
timespec abstime;
set_timespec(abstime, 1);
- mysql_mutex_lock(&buf_pool.flush_list_mutex);
-
lsn_t lsn_limit;
ulint last_activity_count= srv_get_activity_count();
@@ -2188,45 +2212,34 @@ static void buf_flush_page_cleaner()
{
lsn_limit= buf_flush_sync_lsn;
- if (UNIV_UNLIKELY(lsn_limit != 0))
+ if (UNIV_UNLIKELY(lsn_limit != 0) && UNIV_LIKELY(srv_flush_sync))
{
furious_flush:
- if (UNIV_LIKELY(srv_flush_sync))
- {
- buf_flush_sync_for_checkpoint(lsn_limit);
- last_pages= 0;
- set_timespec(abstime, 1);
- continue;
- }
+ buf_flush_sync_for_checkpoint(lsn_limit);
+ last_pages= 0;
+ set_timespec(abstime, 1);
+ continue;
}
+
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ if (buf_pool.ran_out())
+ goto no_wait;
else if (srv_shutdown_state > SRV_SHUTDOWN_INITIATED)
break;
- /* If buf pager cleaner is idle and there is no work
- (either dirty pages are all flushed or adaptive flushing
- is not enabled) then opt for non-timed wait */
if (buf_pool.page_cleaner_idle() &&
(!UT_LIST_GET_LEN(buf_pool.flush_list) ||
srv_max_dirty_pages_pct_lwm == 0.0))
+ /* We are idle; wait for buf_pool.page_cleaner_wakeup() */
my_cond_wait(&buf_pool.do_flush_list,
&buf_pool.flush_list_mutex.m_mutex);
else
my_cond_timedwait(&buf_pool.do_flush_list,
&buf_pool.flush_list_mutex.m_mutex, &abstime);
-
+ no_wait:
set_timespec(abstime, 1);
- lsn_t soft_lsn_limit= buf_flush_async_lsn;
lsn_limit= buf_flush_sync_lsn;
-
- if (UNIV_UNLIKELY(lsn_limit != 0))
- {
- if (UNIV_LIKELY(srv_flush_sync))
- goto furious_flush;
- }
- else if (srv_shutdown_state > SRV_SHUTDOWN_INITIATED)
- break;
-
const lsn_t oldest_lsn= buf_pool.get_oldest_modification(0);
if (!oldest_lsn)
@@ -2241,6 +2254,8 @@ static void buf_flush_page_cleaner()
buf_flush_async_lsn= 0;
set_idle:
buf_pool.page_cleaner_set_idle(true);
+ if (UNIV_UNLIKELY(srv_shutdown_state > SRV_SHUTDOWN_INITIATED))
+ break;
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
end_of_batch:
buf_dblwr.flush_buffered_writes();
@@ -2257,10 +2272,57 @@ static void buf_flush_page_cleaner()
}
while (false);
+ if (!buf_pool.ran_out())
+ continue;
mysql_mutex_lock(&buf_pool.flush_list_mutex);
- continue;
}
+ lsn_t soft_lsn_limit= buf_flush_async_lsn;
+
+ if (UNIV_UNLIKELY(lsn_limit != 0))
+ {
+ if (srv_flush_sync)
+ goto do_furious_flush;
+ if (oldest_lsn >= lsn_limit)
+ {
+ buf_flush_sync_lsn= 0;
+ pthread_cond_broadcast(&buf_pool.done_flush_list);
+ }
+ else if (lsn_limit > soft_lsn_limit)
+ soft_lsn_limit= lsn_limit;
+ }
+
+ bool idle_flush= false;
+ ulint n_flushed= 0, n;
+
+ if (UNIV_UNLIKELY(soft_lsn_limit != 0))
+ {
+ if (oldest_lsn >= soft_lsn_limit)
+ buf_flush_async_lsn= soft_lsn_limit= 0;
+ }
+ else if (buf_pool.ran_out())
+ {
+ buf_pool.page_cleaner_set_idle(false);
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ n= srv_max_io_capacity;
+ mysql_mutex_lock(&buf_pool.mutex);
+ LRU_flush:
+ n= buf_flush_LRU(n, false);
+ mysql_mutex_unlock(&buf_pool.mutex);
+ last_pages+= n;
+
+ if (!idle_flush)
+ goto end_of_batch;
+
+ /* when idle flushing kicks in page_cleaner is marked active.
+ reset it back to idle since the it was made active as part of
+ idle flushing stage. */
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ goto set_idle;
+ }
+ else if (UNIV_UNLIKELY(srv_shutdown_state > SRV_SHUTDOWN_INITIATED))
+ break;
+
const ulint dirty_blocks= UT_LIST_GET_LEN(buf_pool.flush_list);
ut_ad(dirty_blocks);
/* We perform dirty reads of the LRU+free list lengths here.
@@ -2268,60 +2330,53 @@ static void buf_flush_page_cleaner()
guaranteed to be nonempty, and it is a subset of buf_pool.LRU. */
const double dirty_pct= double(dirty_blocks) * 100.0 /
double(UT_LIST_GET_LEN(buf_pool.LRU) + UT_LIST_GET_LEN(buf_pool.free));
-
- bool idle_flush= false;
-
- if (lsn_limit || soft_lsn_limit);
- else if (af_needed_for_redo(oldest_lsn));
- else if (srv_max_dirty_pages_pct_lwm != 0.0)
+ if (srv_max_dirty_pages_pct_lwm != 0.0)
{
const ulint activity_count= srv_get_activity_count();
if (activity_count != last_activity_count)
+ {
last_activity_count= activity_count;
+ goto maybe_unemployed;
+ }
else if (buf_pool.page_cleaner_idle() && buf_pool.n_pend_reads == 0)
{
- /* reaching here means 3 things:
- - last_activity_count == activity_count: suggesting server is idle
- (no trx_t::commit activity)
- - page cleaner is idle (dirty_pct < srv_max_dirty_pages_pct_lwm)
- - there are no pending reads but there are dirty pages to flush */
- idle_flush= true;
+ /* reaching here means 3 things:
+ - last_activity_count == activity_count: suggesting server is idle
+ (no trx_t::commit() activity)
+ - page cleaner is idle (dirty_pct < srv_max_dirty_pages_pct_lwm)
+ - there are no pending reads but there are dirty pages to flush */
buf_pool.update_last_activity_count(activity_count);
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ idle_flush= true;
+ goto idle_flush;
}
-
- if (!idle_flush && dirty_pct < srv_max_dirty_pages_pct_lwm)
- goto unemployed;
+ else
+ maybe_unemployed:
+ if (dirty_pct < srv_max_dirty_pages_pct_lwm)
+ goto possibly_unemployed;
}
else if (dirty_pct < srv_max_buf_pool_modified_pct)
- goto unemployed;
-
- if (UNIV_UNLIKELY(lsn_limit != 0) && oldest_lsn >= lsn_limit)
- lsn_limit= buf_flush_sync_lsn= 0;
- if (UNIV_UNLIKELY(soft_lsn_limit != 0) && oldest_lsn >= soft_lsn_limit)
- soft_lsn_limit= buf_flush_async_lsn= 0;
+ possibly_unemployed:
+ if (!soft_lsn_limit && !af_needed_for_redo(oldest_lsn))
+ goto unemployed;
buf_pool.page_cleaner_set_idle(false);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
- if (!lsn_limit)
- lsn_limit= soft_lsn_limit;
-
- ulint n_flushed= 0, n;
-
- if (UNIV_UNLIKELY(lsn_limit != 0))
+ if (UNIV_UNLIKELY(soft_lsn_limit != 0))
{
n= srv_max_io_capacity;
goto background_flush;
}
- else if (idle_flush || !srv_adaptive_flushing)
+
+ if (!srv_adaptive_flushing)
{
+ idle_flush:
n= srv_io_capacity;
- lsn_limit= LSN_MAX;
+ soft_lsn_limit= LSN_MAX;
background_flush:
mysql_mutex_lock(&buf_pool.mutex);
- n_flushed= buf_flush_list_holding_mutex(n, lsn_limit);
- /* wake up buf_flush_wait() */
- pthread_cond_broadcast(&buf_pool.done_flush_list);
+ n_flushed= buf_flush_list_holding_mutex(n, soft_lsn_limit);
MONITOR_INC_VALUE_CUMULATIVE(MONITOR_FLUSH_BACKGROUND_TOTAL_PAGE,
MONITOR_FLUSH_BACKGROUND_COUNT,
MONITOR_FLUSH_BACKGROUND_PAGES,
@@ -2347,18 +2402,8 @@ static void buf_flush_page_cleaner()
goto unemployed;
}
- n= buf_flush_LRU(n >= n_flushed ? n - n_flushed : 0, false);
- mysql_mutex_unlock(&buf_pool.mutex);
- last_pages+= n;
-
- if (!idle_flush)
- goto end_of_batch;
-
- /* when idle flushing kicks in page_cleaner is marked active.
- reset it back to idle since the it was made active as part of
- idle flushing stage. */
- mysql_mutex_lock(&buf_pool.flush_list_mutex);
- goto set_idle;
+ n= n >= n_flushed ? n - n_flushed : 0;
+ goto LRU_flush;
}
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
@@ -2366,16 +2411,20 @@ static void buf_flush_page_cleaner()
if (srv_fast_shutdown != 2)
{
buf_dblwr.flush_buffered_writes();
- mysql_mutex_lock(&buf_pool.mutex);
- buf_flush_wait_batch_end(true);
- buf_flush_wait_batch_end(false);
- mysql_mutex_unlock(&buf_pool.mutex);
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ buf_flush_wait_LRU_batch_end();
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ buf_dblwr.wait_for_page_writes();
}
mysql_mutex_lock(&buf_pool.flush_list_mutex);
lsn_limit= buf_flush_sync_lsn;
if (UNIV_UNLIKELY(lsn_limit != 0))
+ {
+ do_furious_flush:
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
goto furious_flush;
+ }
buf_page_cleaner_is_active= false;
pthread_cond_broadcast(&buf_pool.done_flush_list);
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
@@ -2400,17 +2449,6 @@ ATTRIBUTE_COLD void buf_flush_page_cleaner_init()
std::thread(buf_flush_page_cleaner).detach();
}
-#if defined(HAVE_SYSTEMD) && !defined(EMBEDDED_LIBRARY)
-/** @return the number of dirty pages in the buffer pool */
-static ulint buf_flush_list_length()
-{
- mysql_mutex_lock(&buf_pool.flush_list_mutex);
- const ulint len= UT_LIST_GET_LEN(buf_pool.flush_list);
- mysql_mutex_unlock(&buf_pool.flush_list_mutex);
- return len;
-}
-#endif
-
/** Flush the buffer pool on shutdown. */
ATTRIBUTE_COLD void buf_flush_buffer_pool()
{
@@ -2425,24 +2463,20 @@ ATTRIBUTE_COLD void buf_flush_buffer_pool()
while (buf_pool.get_oldest_modification(0))
{
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
- mysql_mutex_lock(&buf_pool.mutex);
- buf_flush_list_holding_mutex(srv_max_io_capacity);
- if (buf_pool.n_flush_list_)
+ buf_flush_list(srv_max_io_capacity);
+ if (const size_t pending= buf_dblwr.pending_writes())
{
- mysql_mutex_unlock(&buf_pool.mutex);
timespec abstime;
service_manager_extend_timeout(INNODB_EXTEND_TIMEOUT_INTERVAL,
- "Waiting to flush " ULINTPF " pages",
- buf_flush_list_length());
+ "Waiting to write %zu pages", pending);
set_timespec(abstime, INNODB_EXTEND_TIMEOUT_INTERVAL / 2);
- buf_dblwr.flush_buffered_writes();
- mysql_mutex_lock(&buf_pool.mutex);
- while (buf_pool.n_flush_list_)
- my_cond_timedwait(&buf_pool.done_flush_list, &buf_pool.mutex.m_mutex,
- &abstime);
+ buf_dblwr.wait_for_page_writes(abstime);
}
- mysql_mutex_unlock(&buf_pool.mutex);
+
mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ service_manager_extend_timeout(INNODB_EXTEND_TIMEOUT_INTERVAL,
+ "Waiting to flush " ULINTPF " pages",
+ UT_LIST_GET_LEN(buf_pool.flush_list));
}
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
@@ -2483,6 +2517,7 @@ void buf_flush_sync()
if (lsn == log_sys.get_lsn())
break;
}
+
mysql_mutex_unlock(&buf_pool.flush_list_mutex);
tpool::tpool_wait_end();
thd_wait_end(nullptr);
diff --git a/storage/innobase/buf/buf0lru.cc b/storage/innobase/buf/buf0lru.cc
index 9fa6492d525..1947dfaeeb4 100644
--- a/storage/innobase/buf/buf0lru.cc
+++ b/storage/innobase/buf/buf0lru.cc
@@ -136,7 +136,6 @@ static void buf_LRU_block_free_hashed_page(buf_block_t *block)
@param[in] bpage control block */
static inline void incr_LRU_size_in_bytes(const buf_page_t* bpage)
{
- /* FIXME: use atomics, not mutex */
mysql_mutex_assert_owner(&buf_pool.mutex);
buf_pool.stat.LRU_bytes += bpage->physical_size();
@@ -400,6 +399,7 @@ buf_block_t *buf_LRU_get_free_block(bool have_mutex)
DBUG_EXECUTE_IF("recv_ran_out_of_buffer",
if (recv_recovery_is_on()
&& recv_sys.apply_log_recs) {
+ mysql_mutex_lock(&buf_pool.mutex);
goto flush_lru;
});
get_mutex:
@@ -445,20 +445,32 @@ got_block:
if ((block = buf_LRU_get_free_only()) != nullptr) {
goto got_block;
}
- if (!buf_pool.n_flush_LRU_) {
- break;
+ mysql_mutex_unlock(&buf_pool.mutex);
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ const auto n_flush = buf_pool.n_flush();
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ mysql_mutex_lock(&buf_pool.mutex);
+ if (!n_flush) {
+ goto not_found;
+ }
+ if (!buf_pool.try_LRU_scan) {
+ mysql_mutex_lock(&buf_pool.flush_list_mutex);
+ buf_pool.page_cleaner_wakeup(true);
+ mysql_mutex_unlock(&buf_pool.flush_list_mutex);
+ my_cond_wait(&buf_pool.done_free,
+ &buf_pool.mutex.m_mutex);
}
- my_cond_wait(&buf_pool.done_free, &buf_pool.mutex.m_mutex);
}
-#ifndef DBUG_OFF
not_found:
-#endif
- mysql_mutex_unlock(&buf_pool.mutex);
+ if (n_iterations > 1) {
+ MONITOR_INC( MONITOR_LRU_GET_FREE_WAITS );
+ }
- if (n_iterations > 20 && !buf_lru_free_blocks_error_printed
+ if (n_iterations == 21 && !buf_lru_free_blocks_error_printed
&& srv_buf_pool_old_size == srv_buf_pool_size) {
-
+ buf_lru_free_blocks_error_printed = true;
+ mysql_mutex_unlock(&buf_pool.mutex);
ib::warn() << "Difficult to find free blocks in the buffer pool"
" (" << n_iterations << " search iterations)! "
<< flush_failures << " failed attempts to"
@@ -472,12 +484,7 @@ not_found:
<< os_n_file_writes << " OS file writes, "
<< os_n_fsyncs
<< " OS fsyncs.";
-
- buf_lru_free_blocks_error_printed = true;
- }
-
- if (n_iterations > 1) {
- MONITOR_INC( MONITOR_LRU_GET_FREE_WAITS );
+ mysql_mutex_lock(&buf_pool.mutex);
}
/* No free block was found: try to flush the LRU list.
@@ -491,8 +498,6 @@ not_found:
#ifndef DBUG_OFF
flush_lru:
#endif
- mysql_mutex_lock(&buf_pool.mutex);
-
if (!buf_flush_LRU(innodb_lru_flush_size, true)) {
MONITOR_INC(MONITOR_LRU_SINGLE_FLUSH_FAILURE_COUNT);
++flush_failures;
@@ -1039,7 +1044,8 @@ buf_LRU_block_free_non_file_page(
} else {
UT_LIST_ADD_FIRST(buf_pool.free, &block->page);
ut_d(block->page.in_free_list = true);
- pthread_cond_signal(&buf_pool.done_free);
+ buf_pool.try_LRU_scan= true;
+ pthread_cond_broadcast(&buf_pool.done_free);
}
MEM_NOACCESS(block->page.frame, srv_page_size);
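
The buf0lru.cc hunks above pair the try_LRU_scan flag with a broadcast of buf_pool.done_free: the freeing side sets the flag and wakes every waiter, and buf_LRU_get_free_block() only sleeps while the free list is empty and no fresh scan looks worthwhile. A minimal stand-alone sketch of that wait/notify shape, using std::mutex and std::condition_variable instead of the InnoDB primitives (block_freed, get_free_block and free_count are invented names for illustration):

    #include <atomic>
    #include <chrono>
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <thread>

    std::mutex pool_mutex;                 // stands in for buf_pool.mutex
    std::condition_variable done_free;     // stands in for buf_pool.done_free
    std::atomic<bool> try_LRU_scan{false}; // cleared when a scan finds nothing
    int free_count = 0;                    // length of the free list

    // Producer side, as in buf_LRU_block_free_non_file_page() above: grow the
    // free list, set try_LRU_scan, and broadcast so every waiter re-checks.
    void block_freed()
    {
      std::lock_guard<std::mutex> g(pool_mutex);
      ++free_count;
      try_LRU_scan = true;
      done_free.notify_all();
    }

    // Consumer side, the retry-loop shape of buf_LRU_get_free_block(): sleep
    // only while the free list is empty and no new LRU scan looks promising.
    int get_free_block()
    {
      std::unique_lock<std::mutex> lk(pool_mutex);
      while (!free_count)
      {
        if (try_LRU_scan)
        {
          try_LRU_scan = false;   // simulate an LRU scan that found nothing
          continue;
        }
        done_free.wait(lk, [] { return free_count > 0 || try_LRU_scan.load(); });
      }
      return --free_count;
    }

    int main()
    {
      std::thread waiter([] { get_free_block(); std::puts("got a free block"); });
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      block_freed();
      waiter.join();
    }
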
diff --git a/storage/innobase/buf/buf0rea.cc b/storage/innobase/buf/buf0rea.cc
index b20b105a4c4..b39a8f49133 100644
--- a/storage/innobase/buf/buf0rea.cc
+++ b/storage/innobase/buf/buf0rea.cc
@@ -226,6 +226,7 @@ static buf_page_t* buf_page_init_for_read(ulint mode, const page_id_t page_id,
buf_LRU_add_block(bpage, true/* to old blocks */);
}
+ buf_pool.stat.n_pages_read++;
mysql_mutex_unlock(&buf_pool.mutex);
buf_pool.n_pend_reads++;
goto func_exit_no_mutex;
@@ -245,20 +246,18 @@ buffer buf_pool if it is not already there, in which case does nothing.
Sets the io_fix flag and sets an exclusive lock on the buffer frame. The
flag is cleared and the x-lock released by an i/o-handler thread.
-@param[out] err DB_SUCCESS or DB_TABLESPACE_DELETED
- if we are trying
- to read from a non-existent tablespace
@param[in,out] space tablespace
@param[in] sync true if synchronous aio is desired
@param[in] mode BUF_READ_IBUF_PAGES_ONLY, ...,
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
@param[in] unzip true=request uncompressed page
-@return whether a read request was queued */
+@return error code
+@retval DB_SUCCESS if the page was read
+@retval DB_SUCCESS_LOCKED_REC if the page exists in the buffer pool already */
static
-bool
+dberr_t
buf_read_page_low(
- dberr_t* err,
fil_space_t* space,
bool sync,
ulint mode,
@@ -268,15 +267,12 @@ buf_read_page_low(
{
buf_page_t* bpage;
- *err = DB_SUCCESS;
-
if (buf_dblwr.is_inside(page_id)) {
ib::error() << "Trying to read doublewrite buffer page "
<< page_id;
ut_ad(0);
-nothing_read:
space->release();
- return false;
+ return DB_PAGE_CORRUPTED;
}
if (sync) {
@@ -299,8 +295,9 @@ nothing_read:
completed */
bpage = buf_page_init_for_read(mode, page_id, zip_size, unzip);
- if (bpage == NULL) {
- goto nothing_read;
+ if (!bpage) {
+ space->release();
+ return DB_SUCCESS_LOCKED_REC;
}
ut_ad(bpage->in_file());
@@ -320,7 +317,6 @@ nothing_read:
? IORequest::READ_SYNC
: IORequest::READ_ASYNC),
page_id.page_no() * len, len, dst, bpage);
- *err = fio.err;
if (UNIV_UNLIKELY(fio.err != DB_SUCCESS)) {
ut_d(auto n=) buf_pool.n_pend_reads--;
@@ -329,14 +325,14 @@ nothing_read:
} else if (sync) {
thd_wait_end(NULL);
/* The i/o was already completed in space->io() */
- *err = bpage->read_complete(*fio.node);
+ fio.err = bpage->read_complete(*fio.node);
space->release();
- if (*err == DB_FAIL) {
- *err = DB_PAGE_CORRUPTED;
+ if (fio.err == DB_FAIL) {
+ fio.err = DB_PAGE_CORRUPTED;
}
}
- return true;
+ return fio.err;
}
/** Applies a random read-ahead in buf_pool if there are at least a threshold
@@ -414,24 +410,26 @@ read_ahead:
continue;
if (space->is_stopping())
break;
- dberr_t err;
space->reacquire();
- if (buf_read_page_low(&err, space, false, ibuf_mode, i, zip_size, false))
+ if (buf_read_page_low(space, false, ibuf_mode, i, zip_size, false) ==
+ DB_SUCCESS)
count++;
}
if (count)
+ {
DBUG_PRINT("ib_buf", ("random read-ahead %zu pages from %s: %u",
count, space->chain.start->name,
low.page_no()));
- space->release();
-
- /* Read ahead is considered one I/O operation for the purpose of
- LRU policy decision. */
- buf_LRU_stat_inc_io();
+ mysql_mutex_lock(&buf_pool.mutex);
+ /* Read ahead is considered one I/O operation for the purpose of
+ LRU policy decision. */
+ buf_LRU_stat_inc_io();
+ buf_pool.stat.n_ra_pages_read_rnd+= count;
+ mysql_mutex_unlock(&buf_pool.mutex);
+ }
- buf_pool.stat.n_ra_pages_read_rnd+= count;
- srv_stats.buf_pool_reads.add(count);
+ space->release();
return count;
}
@@ -441,8 +439,9 @@ on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread.
@param[in] page_id page id
@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
-@retval DB_SUCCESS if the page was read and is not corrupted,
-@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
+@retval DB_SUCCESS if the page was read and is not corrupted
+@retval DB_SUCCESS_LOCKED_REC if the page was not read
+@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted
@retval DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match.
@retval DB_TABLESPACE_DELETED if tablespace .ibd file is missing */
@@ -456,13 +455,9 @@ dberr_t buf_read_page(const page_id_t page_id, ulint zip_size)
return DB_TABLESPACE_DELETED;
}
- dberr_t err;
- if (buf_read_page_low(&err, space, true, BUF_READ_ANY_PAGE,
- page_id, zip_size, false))
- srv_stats.buf_pool_reads.add(1);
-
- buf_LRU_stat_inc_io();
- return err;
+ buf_LRU_stat_inc_io(); /* NOT protected by buf_pool.mutex */
+ return buf_read_page_low(space, true, BUF_READ_ANY_PAGE,
+ page_id, zip_size, false);
}
/** High-level function which reads a page asynchronously from a file to the
@@ -475,12 +470,8 @@ released by the i/o-handler thread.
void buf_read_page_background(fil_space_t *space, const page_id_t page_id,
ulint zip_size)
{
- dberr_t err;
-
- if (buf_read_page_low(&err, space, false, BUF_READ_ANY_PAGE,
- page_id, zip_size, false)) {
- srv_stats.buf_pool_reads.add(1);
- }
+ buf_read_page_low(space, false, BUF_READ_ANY_PAGE,
+ page_id, zip_size, false);
/* We do not increment number of I/O operations used for LRU policy
here (buf_LRU_stat_inc_io()). We use this in heuristics to decide
@@ -638,23 +629,26 @@ failed:
continue;
if (space->is_stopping())
break;
- dberr_t err;
space->reacquire();
- count+= buf_read_page_low(&err, space, false, ibuf_mode, new_low, zip_size,
- false);
+ if (buf_read_page_low(space, false, ibuf_mode, new_low, zip_size, false) ==
+ DB_SUCCESS)
+ count++;
}
if (count)
+ {
DBUG_PRINT("ib_buf", ("random read-ahead %zu pages from %s: %u",
count, space->chain.start->name,
new_low.page_no()));
- space->release();
-
- /* Read ahead is considered one I/O operation for the purpose of
- LRU policy decision. */
- buf_LRU_stat_inc_io();
+ mysql_mutex_lock(&buf_pool.mutex);
+ /* Read ahead is considered one I/O operation for the purpose of
+ LRU policy decision. */
+ buf_LRU_stat_inc_io();
+ buf_pool.stat.n_ra_pages_read+= count;
+ mysql_mutex_unlock(&buf_pool.mutex);
+ }
- buf_pool.stat.n_ra_pages_read+= count;
+ space->release();
return count;
}
@@ -709,13 +703,12 @@ void buf_read_recv_pages(ulint space_id, const uint32_t* page_nos, ulint n)
}
}
- dberr_t err;
space->reacquire();
- buf_read_page_low(&err, space, false,
- BUF_READ_ANY_PAGE, cur_page_id, zip_size,
- true);
-
- if (err != DB_SUCCESS) {
+ switch (buf_read_page_low(space, false, BUF_READ_ANY_PAGE,
+ cur_page_id, zip_size, true)) {
+ case DB_SUCCESS: case DB_SUCCESS_LOCKED_REC:
+ break;
+ default:
sql_print_error("InnoDB: Recovery failed to read page "
UINT32PF " from %s",
cur_page_id.page_no(),
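
With the *err output parameter gone, buf_read_page_low() now reports everything through its dberr_t return value, and DB_SUCCESS_LOCKED_REC doubles as "the page is already in the buffer pool, so nothing was read". A hedged sketch of the two caller patterns visible above, with a stand-in enum instead of the real dberr_t (read_page_low, read_one_page and read_ahead are invented names):

    #include <cstdio>

    // Stand-ins for the InnoDB error codes involved (illustration only).
    enum err_t { SUCCESS, SUCCESS_LOCKED_REC, PAGE_CORRUPTED, TABLESPACE_DELETED };

    // Hypothetical low-level reader with the new calling convention:
    // SUCCESS            -> a read was submitted (or completed synchronously)
    // SUCCESS_LOCKED_REC -> the page already lives in the buffer pool; no I/O
    // anything else      -> a hard error that the caller must handle
    err_t read_page_low(unsigned page_no)
    {
      return page_no % 2 ? SUCCESS_LOCKED_REC : SUCCESS;
    }

    // Caller pattern of buf_read_recv_pages() above: both success-like codes
    // fall through, and only real errors are reported.
    bool read_one_page(unsigned page_no)
    {
      switch (read_page_low(page_no)) {
      case SUCCESS:
      case SUCCESS_LOCKED_REC:
        return true;
      default:
        std::fprintf(stderr, "failed to read page %u\n", page_no);
        return false;
      }
    }

    // Caller pattern of the read-ahead functions above: only count pages for
    // which an actual read was queued.
    unsigned read_ahead(unsigned first, unsigned n)
    {
      unsigned count = 0;
      for (unsigned i = first; i < first + n; i++)
        if (read_page_low(i) == SUCCESS)
          count++;
      return count;
    }

    int main()
    {
      read_one_page(1);
      std::printf("queued %u read-ahead pages\n", read_ahead(0, 8));
    }
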
diff --git a/storage/innobase/gis/gis0rtree.cc b/storage/innobase/gis/gis0rtree.cc
index 59d77c9c5fc..83afd732b21 100644
--- a/storage/innobase/gis/gis0rtree.cc
+++ b/storage/innobase/gis/gis0rtree.cc
@@ -1209,8 +1209,6 @@ after_insert:
ut_ad(!rec || rec_offs_validate(rec, cursor->index(), *offsets));
#endif
- MONITOR_INC(MONITOR_INDEX_SPLIT);
-
return(rec);
}
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index aa2fb7c38eb..cac20c70e02 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -915,43 +915,37 @@ static SHOW_VAR innodb_status_variables[]= {
(char*) &export_vars.innodb_buffer_pool_resize_status, SHOW_CHAR},
{"buffer_pool_load_incomplete",
&export_vars.innodb_buffer_pool_load_incomplete, SHOW_BOOL},
- {"buffer_pool_pages_data",
- &export_vars.innodb_buffer_pool_pages_data, SHOW_SIZE_T},
+ {"buffer_pool_pages_data", &UT_LIST_GET_LEN(buf_pool.LRU), SHOW_SIZE_T},
{"buffer_pool_bytes_data",
&export_vars.innodb_buffer_pool_bytes_data, SHOW_SIZE_T},
{"buffer_pool_pages_dirty",
- &export_vars.innodb_buffer_pool_pages_dirty, SHOW_SIZE_T},
- {"buffer_pool_bytes_dirty",
- &export_vars.innodb_buffer_pool_bytes_dirty, SHOW_SIZE_T},
- {"buffer_pool_pages_flushed", &buf_flush_page_count, SHOW_SIZE_T},
- {"buffer_pool_pages_free",
- &export_vars.innodb_buffer_pool_pages_free, SHOW_SIZE_T},
+ &UT_LIST_GET_LEN(buf_pool.flush_list), SHOW_SIZE_T},
+ {"buffer_pool_bytes_dirty", &buf_pool.flush_list_bytes, SHOW_SIZE_T},
+ {"buffer_pool_pages_flushed", &buf_pool.stat.n_pages_written, SHOW_SIZE_T},
+ {"buffer_pool_pages_free", &UT_LIST_GET_LEN(buf_pool.free), SHOW_SIZE_T},
#ifdef UNIV_DEBUG
{"buffer_pool_pages_latched",
&export_vars.innodb_buffer_pool_pages_latched, SHOW_SIZE_T},
#endif /* UNIV_DEBUG */
{"buffer_pool_pages_made_not_young",
- &export_vars.innodb_buffer_pool_pages_made_not_young, SHOW_SIZE_T},
+ &buf_pool.stat.n_pages_not_made_young, SHOW_SIZE_T},
{"buffer_pool_pages_made_young",
- &export_vars.innodb_buffer_pool_pages_made_young, SHOW_SIZE_T},
+ &buf_pool.stat.n_pages_made_young, SHOW_SIZE_T},
{"buffer_pool_pages_misc",
&export_vars.innodb_buffer_pool_pages_misc, SHOW_SIZE_T},
- {"buffer_pool_pages_old",
- &export_vars.innodb_buffer_pool_pages_old, SHOW_SIZE_T},
+ {"buffer_pool_pages_old", &buf_pool.LRU_old_len, SHOW_SIZE_T},
{"buffer_pool_pages_total",
&export_vars.innodb_buffer_pool_pages_total, SHOW_SIZE_T},
{"buffer_pool_pages_LRU_flushed", &buf_lru_flush_page_count, SHOW_SIZE_T},
{"buffer_pool_pages_LRU_freed", &buf_lru_freed_page_count, SHOW_SIZE_T},
+ {"buffer_pool_pages_split", &buf_pool.pages_split, SHOW_SIZE_T},
{"buffer_pool_read_ahead_rnd",
- &export_vars.innodb_buffer_pool_read_ahead_rnd, SHOW_SIZE_T},
- {"buffer_pool_read_ahead",
- &export_vars.innodb_buffer_pool_read_ahead, SHOW_SIZE_T},
+ &buf_pool.stat.n_ra_pages_read_rnd, SHOW_SIZE_T},
+ {"buffer_pool_read_ahead", &buf_pool.stat.n_ra_pages_read, SHOW_SIZE_T},
{"buffer_pool_read_ahead_evicted",
- &export_vars.innodb_buffer_pool_read_ahead_evicted, SHOW_SIZE_T},
- {"buffer_pool_read_requests",
- &export_vars.innodb_buffer_pool_read_requests, SHOW_SIZE_T},
- {"buffer_pool_reads",
- &export_vars.innodb_buffer_pool_reads, SHOW_SIZE_T},
+ &buf_pool.stat.n_ra_pages_evicted, SHOW_SIZE_T},
+ {"buffer_pool_read_requests", &buf_pool.stat.n_page_gets, SHOW_SIZE_T},
+ {"buffer_pool_reads", &buf_pool.stat.n_pages_read, SHOW_SIZE_T},
{"buffer_pool_wait_free", &buf_pool.stat.LRU_waits, SHOW_SIZE_T},
{"buffer_pool_write_requests",
&export_vars.innodb_buffer_pool_write_requests, SHOW_SIZE_T},
diff --git a/storage/innobase/include/buf0buf.h b/storage/innobase/include/buf0buf.h
index e79cbdadcd6..94f8dc2badb 100644
--- a/storage/innobase/include/buf0buf.h
+++ b/storage/innobase/include/buf0buf.h
@@ -782,11 +782,11 @@ public:
it from buf_pool.flush_list */
inline void write_complete(bool temporary);
- /** Write a flushable page to a file. buf_pool.mutex must be held.
+ /** Write a flushable page to a file or free a freeable block.
@param evict whether to evict the page on write completion
@param space tablespace
- @return whether the page was flushed and buf_pool.mutex was released */
- inline bool flush(bool evict, fil_space_t *space);
+ @return whether a page write was initiated and buf_pool.mutex released */
+ bool flush(bool evict, fil_space_t *space);
/** Notify that a page in a temporary tablespace has been modified. */
void set_temp_modified()
@@ -856,8 +856,6 @@ public:
/** @return whether the block is mapped to a data file */
bool in_file() const { return state() >= FREED; }
- /** @return whether the block is modified and ready for flushing */
- inline bool ready_for_flush() const;
/** @return whether the block can be relocated in memory.
The block can be dirty, but it must not be I/O-fixed or bufferfixed. */
inline bool can_relocate() const;
@@ -1030,10 +1028,10 @@ Compute the hash fold value for blocks in buf_pool.zip_hash. */
#define BUF_POOL_ZIP_FOLD_BPAGE(b) BUF_POOL_ZIP_FOLD((buf_block_t*) (b))
/* @} */
-/** A "Hazard Pointer" class used to iterate over page lists
-inside the buffer pool. A hazard pointer is a buf_page_t pointer
+/** A "Hazard Pointer" class used to iterate over buf_pool.LRU or
+buf_pool.flush_list. A hazard pointer is a buf_page_t pointer
 which we intend to iterate over next and we want it to remain valid
-even after we release the buffer pool mutex. */
+even after we release the mutex that protects the list. */
class HazardPointer
{
public:
@@ -1148,7 +1146,8 @@ struct buf_buddy_free_t {
/*!< Node of zip_free list */
};
-/** @brief The buffer pool statistics structure. */
+/** @brief The buffer pool statistics structure;
+protected by buf_pool.mutex unless otherwise noted. */
struct buf_pool_stat_t{
/** Initialize the counters */
void init() { memset((void*) this, 0, sizeof *this); }
@@ -1157,9 +1156,8 @@ struct buf_pool_stat_t{
/*!< number of page gets performed;
also successful searches through
the adaptive hash index are
- counted as page gets; this field
- is NOT protected by the buffer
- pool mutex */
+ counted as page gets;
+ NOT protected by buf_pool.mutex */
ulint n_pages_read; /*!< number read operations */
ulint n_pages_written;/*!< number write operations */
ulint n_pages_created;/*!< number of pages created
@@ -1177,10 +1175,9 @@ struct buf_pool_stat_t{
young because the first access
was not long enough ago, in
buf_page_peek_if_too_old() */
- /** number of waits for eviction; writes protected by buf_pool.mutex */
+ /** number of waits for eviction */
ulint LRU_waits;
ulint LRU_bytes; /*!< LRU size in bytes */
- ulint flush_list_bytes;/*!< flush_list size in bytes */
};
/** Statistics of buddy blocks of a given size. */
@@ -1501,6 +1498,11 @@ public:
n_chunks_new / 4 * chunks->size;
}
+ /** @return whether the buffer pool has run out */
+ TPOOL_SUPPRESS_TSAN
+ bool ran_out() const
+ { return UNIV_UNLIKELY(!try_LRU_scan || !UT_LIST_GET_LEN(free)); }
+
/** @return whether the buffer pool is shrinking */
inline bool is_shrinking() const
{
@@ -1538,14 +1540,10 @@ public:
/** Buffer pool mutex */
alignas(CPU_LEVEL1_DCACHE_LINESIZE) mysql_mutex_t mutex;
- /** Number of pending LRU flush; protected by mutex. */
- ulint n_flush_LRU_;
- /** broadcast when n_flush_LRU reaches 0; protected by mutex */
- pthread_cond_t done_flush_LRU;
- /** Number of pending flush_list flush; protected by mutex */
- ulint n_flush_list_;
- /** broadcast when n_flush_list reaches 0; protected by mutex */
- pthread_cond_t done_flush_list;
+ /** current statistics; protected by mutex */
+ buf_pool_stat_t stat;
+ /** old statistics; protected by mutex */
+ buf_pool_stat_t old_stat;
/** @name General fields */
/* @{ */
@@ -1706,11 +1704,12 @@ public:
buf_buddy_stat_t buddy_stat[BUF_BUDDY_SIZES_MAX + 1];
/*!< Statistics of buddy system,
indexed by block size */
- buf_pool_stat_t stat; /*!< current statistics */
- buf_pool_stat_t old_stat; /*!< old statistics */
/* @} */
+ /** number of index page splits */
+ Atomic_counter<ulint> pages_split;
+
/** @name Page flushing algorithm fields */
/* @{ */
@@ -1719,31 +1718,76 @@ public:
alignas(CPU_LEVEL1_DCACHE_LINESIZE) mysql_mutex_t flush_list_mutex;
/** "hazard pointer" for flush_list scans; protected by flush_list_mutex */
FlushHp flush_hp;
- /** modified blocks (a subset of LRU) */
+ /** flush_list size in bytes; protected by flush_list_mutex */
+ ulint flush_list_bytes;
+ /** possibly modified persistent pages (a subset of LRU);
+ buf_dblwr.pending_writes() is approximately COUNT(is_write_fixed()) */
UT_LIST_BASE_NODE_T(buf_page_t) flush_list;
private:
- /** whether the page cleaner needs wakeup from indefinite sleep */
- bool page_cleaner_is_idle;
+ static constexpr unsigned PAGE_CLEANER_IDLE= 1;
+ static constexpr unsigned FLUSH_LIST_ACTIVE= 2;
+ static constexpr unsigned LRU_FLUSH= 4;
+
+  /** Number of pending LRU flushes, multiplied by LRU_FLUSH,
+  combined with the PAGE_CLEANER_IDLE and FLUSH_LIST_ACTIVE flags */
+ unsigned page_cleaner_status;
/** track server activity count for signaling idle flushing */
ulint last_activity_count;
public:
/** signalled to wake up the page_cleaner; protected by flush_list_mutex */
pthread_cond_t do_flush_list;
+ /** broadcast when !n_flush(); protected by flush_list_mutex */
+ pthread_cond_t done_flush_LRU;
+ /** broadcast when a batch completes; protected by flush_list_mutex */
+ pthread_cond_t done_flush_list;
+
+ /** @return number of pending LRU flush */
+ unsigned n_flush() const
+ {
+ mysql_mutex_assert_owner(&flush_list_mutex);
+ return page_cleaner_status / LRU_FLUSH;
+ }
+
+ /** Increment the number of pending LRU flush */
+ inline void n_flush_inc();
+
+ /** Decrement the number of pending LRU flush */
+ inline void n_flush_dec();
+
+ /** @return whether flush_list flushing is active */
+ bool flush_list_active() const
+ {
+ mysql_mutex_assert_owner(&flush_list_mutex);
+ return page_cleaner_status & FLUSH_LIST_ACTIVE;
+ }
+
+ void flush_list_set_active()
+ {
+ ut_ad(!flush_list_active());
+ page_cleaner_status+= FLUSH_LIST_ACTIVE;
+ }
+ void flush_list_set_inactive()
+ {
+ ut_ad(flush_list_active());
+ page_cleaner_status-= FLUSH_LIST_ACTIVE;
+ }
/** @return whether the page cleaner must sleep due to being idle */
bool page_cleaner_idle() const
{
mysql_mutex_assert_owner(&flush_list_mutex);
- return page_cleaner_is_idle;
+ return page_cleaner_status & PAGE_CLEANER_IDLE;
}
- /** Wake up the page cleaner if needed */
- void page_cleaner_wakeup();
+ /** Wake up the page cleaner if needed.
+ @param for_LRU whether to wake up for LRU eviction */
+ void page_cleaner_wakeup(bool for_LRU= false);
/** Register whether an explicit wakeup of the page cleaner is needed */
void page_cleaner_set_idle(bool deep_sleep)
{
mysql_mutex_assert_owner(&flush_list_mutex);
- page_cleaner_is_idle= deep_sleep;
+ page_cleaner_status= (page_cleaner_status & ~PAGE_CLEANER_IDLE) |
+ (PAGE_CLEANER_IDLE * deep_sleep);
}
/** Update server last activity count */
@@ -1753,9 +1797,6 @@ public:
last_activity_count= activity_count;
}
- // n_flush_LRU_ + n_flush_list_
- // is approximately COUNT(is_write_fixed()) in flush_list
-
unsigned freed_page_clock;/*!< a sequence number used
to count the number of buffer
blocks removed from the end of
@@ -1765,16 +1806,10 @@ public:
to read this for heuristic
purposes without holding any
mutex or latch */
- bool try_LRU_scan; /*!< Cleared when an LRU
- scan for free block fails. This
- flag is used to avoid repeated
- scans of LRU list when we know
- that there is no free block
- available in the scan depth for
- eviction. Set whenever
- we flush a batch from the
- buffer pool. Protected by the
- buf_pool.mutex */
+ /** Cleared when buf_LRU_get_free_block() fails.
+ Set whenever the free list grows, along with a broadcast of done_free.
+ Protected by buf_pool.mutex. */
+ Atomic_relaxed<bool> try_LRU_scan;
/* @} */
/** @name LRU replacement algorithm fields */
@@ -1783,8 +1818,8 @@ public:
UT_LIST_BASE_NODE_T(buf_page_t) free;
/*!< base node of the free
block list */
- /** signaled each time when the free list grows and
- broadcast each time try_LRU_scan is set; protected by mutex */
+  /** broadcast each time the free list grows or try_LRU_scan is set;
+ protected by mutex */
pthread_cond_t done_free;
UT_LIST_BASE_NODE_T(buf_page_t) withdraw;
@@ -1844,29 +1879,20 @@ public:
{
if (n_pend_reads)
return true;
- mysql_mutex_lock(&mutex);
- const bool any_pending{n_flush_LRU_ || n_flush_list_};
- mysql_mutex_unlock(&mutex);
+ mysql_mutex_lock(&flush_list_mutex);
+ const bool any_pending= page_cleaner_status > PAGE_CLEANER_IDLE ||
+ buf_dblwr.pending_writes();
+ mysql_mutex_unlock(&flush_list_mutex);
return any_pending;
}
- /** @return total amount of pending I/O */
- TPOOL_SUPPRESS_TSAN ulint io_pending() const
- {
- return n_pend_reads + n_flush_LRU_ + n_flush_list_;
- }
private:
/** Remove a block from the flush list. */
inline void delete_from_flush_list_low(buf_page_t *bpage);
- /** Remove a block from flush_list.
- @param bpage buffer pool page
- @param clear whether to invoke buf_page_t::clear_oldest_modification() */
- void delete_from_flush_list(buf_page_t *bpage, bool clear);
public:
/** Remove a block from flush_list.
@param bpage buffer pool page */
- void delete_from_flush_list(buf_page_t *bpage)
- { delete_from_flush_list(bpage, true); }
+ void delete_from_flush_list(buf_page_t *bpage);
/** Insert a modified block into the flush list.
@param block modified block
@@ -1874,7 +1900,7 @@ public:
void insert_into_flush_list(buf_block_t *block, lsn_t lsn);
/** Free a page whose underlying file page has been freed. */
- inline void release_freed_page(buf_page_t *bpage);
+ ATTRIBUTE_COLD void release_freed_page(buf_page_t *bpage);
private:
/** Temporary memory for page_compressed and encrypted I/O */
@@ -1994,17 +2020,6 @@ inline void buf_page_t::clear_oldest_modification()
oldest_modification_.store(0, std::memory_order_release);
}
-/** @return whether the block is modified and ready for flushing */
-inline bool buf_page_t::ready_for_flush() const
-{
- mysql_mutex_assert_owner(&buf_pool.mutex);
- ut_ad(in_LRU_list);
- const auto s= state();
- ut_a(s >= FREED);
- ut_ad(!fsp_is_system_temporary(id().space()) || oldest_modification() == 2);
- return s < READ_FIX;
-}
-
/** @return whether the block can be relocated in memory.
The block can be dirty, but it must not be I/O-fixed or bufferfixed. */
inline bool buf_page_t::can_relocate() const
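
buf_pool_t::page_cleaner_status packs two flag bits and a counter into one word: bit 0 is PAGE_CLEANER_IDLE, bit 1 is FLUSH_LIST_ACTIVE, and every multiple of LRU_FLUSH (4) above that counts one pending LRU flush, which is why n_flush() simply divides by LRU_FLUSH. A small stand-alone illustration of just that encoding, leaving out the flush_list_mutex protection and the wakeups the real members also perform (the class name status_word is invented):

    #include <cassert>

    // Mirrors the encoding of buf_pool_t::page_cleaner_status: two low flag
    // bits, and the pending LRU flush count stored in the remaining bits.
    class status_word
    {
      static constexpr unsigned PAGE_CLEANER_IDLE = 1;
      static constexpr unsigned FLUSH_LIST_ACTIVE = 2;
      static constexpr unsigned LRU_FLUSH = 4;
      unsigned status = 0;
    public:
      unsigned n_flush() const { return status / LRU_FLUSH; }
      bool idle() const { return status & PAGE_CLEANER_IDLE; }
      bool flush_list_active() const { return status & FLUSH_LIST_ACTIVE; }

      void n_flush_inc() { status += LRU_FLUSH; }
      void n_flush_dec() { assert(n_flush()); status -= LRU_FLUSH; }
      void set_idle(bool deep_sleep)
      { status = (status & ~PAGE_CLEANER_IDLE) | (PAGE_CLEANER_IDLE * deep_sleep); }
      void flush_list_set_active()
      { assert(!flush_list_active()); status += FLUSH_LIST_ACTIVE; }
      void flush_list_set_inactive()
      { assert(flush_list_active()); status -= FLUSH_LIST_ACTIVE; }
    };

    int main()
    {
      status_word s;
      s.set_idle(true);
      s.n_flush_inc();
      s.n_flush_inc();
      s.flush_list_set_active();
      // 2 * LRU_FLUSH + FLUSH_LIST_ACTIVE + PAGE_CLEANER_IDLE = 11:
      // the flag bits never leak into the counter and vice versa.
      assert(s.n_flush() == 2 && s.idle() && s.flush_list_active());
      s.n_flush_dec();
      s.flush_list_set_inactive();
      assert(s.n_flush() == 1 && !s.flush_list_active());
    }

With this layout, n_flush_inc() and n_flush_dec() can add or subtract LRU_FLUSH without disturbing the flag bits, so a single word under flush_list_mutex replaces the separate n_flush_LRU_, n_flush_list_ and page_cleaner_is_idle fields removed above.
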
diff --git a/storage/innobase/include/buf0dblwr.h b/storage/innobase/include/buf0dblwr.h
index fb9df55504c..d9c9239c0b4 100644
--- a/storage/innobase/include/buf0dblwr.h
+++ b/storage/innobase/include/buf0dblwr.h
@@ -1,7 +1,7 @@
/*****************************************************************************
Copyright (c) 1995, 2017, Oracle and/or its affiliates. All Rights Reserved.
-Copyright (c) 2017, 2020, MariaDB Corporation.
+Copyright (c) 2017, 2022, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
@@ -54,9 +54,9 @@ class buf_dblwr_t
};
/** the page number of the first doublewrite block (block_size() pages) */
- page_id_t block1= page_id_t(0, 0);
+ page_id_t block1{0, 0};
/** the page number of the second doublewrite block (block_size() pages) */
- page_id_t block2= page_id_t(0, 0);
+ page_id_t block2{0, 0};
/** mutex protecting the data members below */
mysql_mutex_t mutex;
@@ -72,11 +72,15 @@ class buf_dblwr_t
ulint writes_completed;
/** number of pages written by flush_buffered_writes_completed() */
ulint pages_written;
+ /** condition variable for !writes_pending */
+ pthread_cond_t write_cond;
+ /** number of pending page writes */
+ size_t writes_pending;
slot slots[2];
- slot *active_slot= &slots[0];
+ slot *active_slot;
- /** Initialize the doublewrite buffer data structure.
+ /** Initialise the persistent storage of the doublewrite buffer.
@param header doublewrite page header in the TRX_SYS page */
inline void init(const byte *header);
@@ -84,6 +88,8 @@ class buf_dblwr_t
bool flush_buffered_writes(const ulint size);
public:
+ /** Initialise the doublewrite buffer data structures. */
+ void init();
/** Create or restore the doublewrite buffer in the TRX_SYS page.
@return whether the operation succeeded */
bool create();
@@ -118,7 +124,7 @@ public:
void recover();
/** Update the doublewrite buffer on data page write completion. */
- void write_completed();
+ void write_completed(bool with_doublewrite);
/** Flush possible buffered writes to persistent storage.
It is very important to call this function after a batch of writes has been
posted, and also when we may have to wait for a page latch!
@@ -137,14 +143,14 @@ public:
@param size payload size in bytes */
void add_to_batch(const IORequest &request, size_t size);
- /** Determine whether the doublewrite buffer is initialized */
- bool is_initialised() const
+ /** Determine whether the doublewrite buffer has been created */
+ bool is_created() const
{ return UNIV_LIKELY(block1 != page_id_t(0, 0)); }
/** @return whether a page identifier is part of the doublewrite buffer */
bool is_inside(const page_id_t id) const
{
- if (!is_initialised())
+ if (!is_created())
return false;
ut_ad(block1 < block2);
if (id < block1)
@@ -156,13 +162,44 @@ public:
/** Wait for flush_buffered_writes() to be fully completed */
void wait_flush_buffered_writes()
{
- if (is_initialised())
- {
- mysql_mutex_lock(&mutex);
- while (batch_running)
- my_cond_wait(&cond, &mutex.m_mutex);
- mysql_mutex_unlock(&mutex);
- }
+ mysql_mutex_lock(&mutex);
+ while (batch_running)
+ my_cond_wait(&cond, &mutex.m_mutex);
+ mysql_mutex_unlock(&mutex);
+ }
+
+ /** Register an unbuffered page write */
+ void add_unbuffered()
+ {
+ mysql_mutex_lock(&mutex);
+ writes_pending++;
+ mysql_mutex_unlock(&mutex);
+ }
+
+ size_t pending_writes()
+ {
+ mysql_mutex_lock(&mutex);
+ const size_t pending{writes_pending};
+ mysql_mutex_unlock(&mutex);
+ return pending;
+ }
+
+ /** Wait for writes_pending to reach 0 */
+ void wait_for_page_writes()
+ {
+ mysql_mutex_lock(&mutex);
+ while (writes_pending)
+ my_cond_wait(&write_cond, &mutex.m_mutex);
+ mysql_mutex_unlock(&mutex);
+ }
+
+  /** Wait for writes_pending to reach 0, or for abstime to be reached */
+ void wait_for_page_writes(const timespec &abstime)
+ {
+ mysql_mutex_lock(&mutex);
+ while (writes_pending)
+ my_cond_timedwait(&write_cond, &mutex.m_mutex, &abstime);
+ mysql_mutex_unlock(&mutex);
}
};
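
buf_dblwr_t now tracks every persistent page write in writes_pending and broadcasts write_cond when the count returns to zero, which is what add_unbuffered(), write_completed() and wait_for_page_writes() above implement. A condensed sketch of the same bookkeeping with std::mutex and std::condition_variable; the type name write_tracker is invented, and the real member functions also maintain doublewrite batch statistics:

    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <vector>

    // The accounting behind buf_dblwr_t::add_unbuffered(), write_completed()
    // and wait_for_page_writes(): a counter of in-flight page writes and a
    // condition variable broadcast when it drops back to zero.
    class write_tracker
    {
      std::mutex mutex;
      std::condition_variable write_cond;
      size_t writes_pending = 0;
    public:
      void add_unbuffered()               // called before submitting a write
      {
        std::lock_guard<std::mutex> g(mutex);
        ++writes_pending;
      }
      void write_completed()              // called from the I/O completion path
      {
        std::lock_guard<std::mutex> g(mutex);
        if (!--writes_pending)
          write_cond.notify_all();
      }
      size_t pending_writes()
      {
        std::lock_guard<std::mutex> g(mutex);
        return writes_pending;
      }
      void wait_for_page_writes()         // e.g. before shutdown or invalidate
      {
        std::unique_lock<std::mutex> lk(mutex);
        write_cond.wait(lk, [this] { return writes_pending == 0; });
      }
    };

    int main()
    {
      write_tracker dblwr;
      std::vector<std::thread> writers;
      for (int i = 0; i < 4; i++)
      {
        dblwr.add_unbuffered();           // register before the "write"
        writers.emplace_back([&dblwr] { dblwr.write_completed(); });
      }
      dblwr.wait_for_page_writes();       // returns once all 4 have completed
      for (auto &t : writers) t.join();
    }
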
diff --git a/storage/innobase/include/buf0flu.h b/storage/innobase/include/buf0flu.h
index d71a05c0ec9..13a9363922b 100644
--- a/storage/innobase/include/buf0flu.h
+++ b/storage/innobase/include/buf0flu.h
@@ -30,10 +30,8 @@ Created 11/5/1995 Heikki Tuuri
#include "log0log.h"
#include "buf0buf.h"
-/** Number of pages flushed. Protected by buf_pool.mutex. */
-extern ulint buf_flush_page_count;
/** Number of pages flushed via LRU. Protected by buf_pool.mutex.
-Also included in buf_flush_page_count. */
+Also included in buf_pool.stat.n_pages_written. */
extern ulint buf_lru_flush_page_count;
/** Number of pages freed without flushing. Protected by buf_pool.mutex. */
extern ulint buf_lru_freed_page_count;
@@ -96,9 +94,8 @@ after releasing buf_pool.mutex.
@retval 0 if a buf_pool.LRU batch is already running */
ulint buf_flush_LRU(ulint max_n, bool evict);
-/** Wait until a flush batch ends.
-@param lru true=buf_pool.LRU; false=buf_pool.flush_list */
-void buf_flush_wait_batch_end(bool lru);
+/** Wait until a LRU flush batch ends. */
+void buf_flush_wait_LRU_batch_end();
/** Wait until all persistent pages are flushed up to a limit.
@param sync_lsn buf_pool.get_oldest_modification(LSN_MAX) to wait for */
ATTRIBUTE_COLD void buf_flush_wait_flushed(lsn_t sync_lsn);
diff --git a/storage/innobase/include/buf0rea.h b/storage/innobase/include/buf0rea.h
index 8d6b28194dc..d898c5efc63 100644
--- a/storage/innobase/include/buf0rea.h
+++ b/storage/innobase/include/buf0rea.h
@@ -33,10 +33,11 @@ Created 11/5/1995 Heikki Tuuri
buffer buf_pool if it is not already there. Sets the io_fix flag and sets
an exclusive lock on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread.
-@param[in] page_id page id
-@param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
-@retval DB_SUCCESS if the page was read and is not corrupted,
-@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
+@param page_id page id
+@param zip_size ROW_FORMAT=COMPRESSED page size, or 0
+@retval DB_SUCCESS if the page was read and is not corrupted
+@retval DB_SUCCESS_LOCKED_REC if the page was not read
+@retval DB_PAGE_CORRUPTED if page based on checksum check is corrupted
@retval DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match.
@retval DB_TABLESPACE_DELETED if tablespace .ibd file is missing */
diff --git a/storage/innobase/include/fil0fil.h b/storage/innobase/include/fil0fil.h
index 533f595c852..ff6ece8a360 100644
--- a/storage/innobase/include/fil0fil.h
+++ b/storage/innobase/include/fil0fil.h
@@ -1170,7 +1170,7 @@ private:
inline bool fil_space_t::use_doublewrite() const
{
return !UT_LIST_GET_FIRST(chain)->atomic_write && srv_use_doublewrite_buf &&
- buf_dblwr.is_initialised();
+ buf_dblwr.is_created();
}
inline void fil_space_t::set_imported()
diff --git a/storage/innobase/include/srv0srv.h b/storage/innobase/include/srv0srv.h
index 9807d9cd9a4..90d3a21f761 100644
--- a/storage/innobase/include/srv0srv.h
+++ b/storage/innobase/include/srv0srv.h
@@ -108,10 +108,6 @@ struct srv_stats_t
/** Store the number of write requests issued */
ulint_ctr_1_t buf_pool_write_requests;
- /** Number of buffer pool reads that led to the reading of
- a disk page */
- ulint_ctr_1_t buf_pool_reads;
-
/** Number of bytes saved by page compression */
ulint_ctr_n_t page_compression_saved;
/* Number of pages compressed with page compression */
@@ -670,24 +666,12 @@ struct export_var_t{
char innodb_buffer_pool_resize_status[512];/*!< Buf pool resize status */
my_bool innodb_buffer_pool_load_incomplete;/*!< Buf pool load incomplete */
ulint innodb_buffer_pool_pages_total; /*!< Buffer pool size */
- ulint innodb_buffer_pool_pages_data; /*!< Data pages */
ulint innodb_buffer_pool_bytes_data; /*!< File bytes used */
- ulint innodb_buffer_pool_pages_dirty; /*!< Dirty data pages */
- ulint innodb_buffer_pool_bytes_dirty; /*!< File bytes modified */
ulint innodb_buffer_pool_pages_misc; /*!< Miscellanous pages */
- ulint innodb_buffer_pool_pages_free; /*!< Free pages */
#ifdef UNIV_DEBUG
ulint innodb_buffer_pool_pages_latched; /*!< Latched pages */
#endif /* UNIV_DEBUG */
- ulint innodb_buffer_pool_pages_made_not_young;
- ulint innodb_buffer_pool_pages_made_young;
- ulint innodb_buffer_pool_pages_old;
- ulint innodb_buffer_pool_read_requests; /*!< buf_pool.stat.n_page_gets */
- ulint innodb_buffer_pool_reads; /*!< srv_buf_pool_reads */
ulint innodb_buffer_pool_write_requests;/*!< srv_stats.buf_pool_write_requests */
- ulint innodb_buffer_pool_read_ahead_rnd;/*!< srv_read_ahead_rnd */
- ulint innodb_buffer_pool_read_ahead; /*!< srv_read_ahead */
- ulint innodb_buffer_pool_read_ahead_evicted;/*!< srv_read_ahead evicted*/
ulint innodb_checkpoint_age;
ulint innodb_checkpoint_max_age;
ulint innodb_data_pending_reads; /*!< Pending reads */
diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
index 70f561280d9..c53e2fd5074 100644
--- a/storage/innobase/log/log0log.cc
+++ b/storage/innobase/log/log0log.cc
@@ -1173,14 +1173,6 @@ wait_suspend_loop:
if (!buf_pool.is_initialised()) {
ut_ad(!srv_was_started);
- } else if (ulint pending_io = buf_pool.io_pending()) {
- if (srv_print_verbose_log && count > 600) {
- ib::info() << "Waiting for " << pending_io << " buffer"
- " page I/Os to complete";
- count = 0;
- }
-
- goto loop;
} else {
buf_flush_buffer_pool();
}
diff --git a/storage/innobase/srv/srv0mon.cc b/storage/innobase/srv/srv0mon.cc
index 60fef24d183..b6496d03908 100644
--- a/storage/innobase/srv/srv0mon.cc
+++ b/storage/innobase/srv/srv0mon.cc
@@ -909,7 +909,7 @@ static monitor_info_t innodb_counter_info[] =
MONITOR_DEFAULT_START, MONITOR_MODULE_INDEX},
{"index_page_splits", "index", "Number of index page splits",
- MONITOR_NONE,
+ MONITOR_EXISTING,
MONITOR_DEFAULT_START, MONITOR_INDEX_SPLIT},
{"index_page_merge_attempts", "index",
@@ -1411,10 +1411,12 @@ srv_mon_process_existing_counter(
/* Get the value from corresponding global variable */
switch (monitor_id) {
- /* export_vars.innodb_buffer_pool_reads. Num Reads from
- disk (page not in buffer) */
+ case MONITOR_INDEX_SPLIT:
+ value = buf_pool.pages_split;
+ break;
+
case MONITOR_OVLD_BUF_POOL_READS:
- value = srv_stats.buf_pool_reads;
+ value = buf_pool.stat.n_pages_read;
break;
/* innodb_buffer_pool_read_requests, the number of logical
@@ -1475,7 +1477,7 @@ srv_mon_process_existing_counter(
/* innodb_buffer_pool_bytes_dirty */
case MONITOR_OVLD_BUF_POOL_BYTES_DIRTY:
- value = buf_pool.stat.flush_list_bytes;
+ value = buf_pool.flush_list_bytes;
break;
/* innodb_buffer_pool_pages_free */
diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc
index c16868b5cf5..2e9f5a0eff8 100644
--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -675,6 +675,7 @@ void srv_boot()
if (transactional_lock_enabled())
sql_print_information("InnoDB: Using transactional memory");
#endif
+ buf_dblwr.init();
srv_thread_pool_init();
trx_pool_init();
srv_init();
@@ -1001,59 +1002,22 @@ srv_export_innodb_status(void)
export_vars.innodb_data_writes = os_n_file_writes;
- ulint dblwr = 0;
-
- if (buf_dblwr.is_initialised()) {
- buf_dblwr.lock();
- dblwr = buf_dblwr.submitted();
- export_vars.innodb_dblwr_pages_written = buf_dblwr.written();
- export_vars.innodb_dblwr_writes = buf_dblwr.batches();
- buf_dblwr.unlock();
- }
+ buf_dblwr.lock();
+ ulint dblwr = buf_dblwr.submitted();
+ export_vars.innodb_dblwr_pages_written = buf_dblwr.written();
+ export_vars.innodb_dblwr_writes = buf_dblwr.batches();
+ buf_dblwr.unlock();
export_vars.innodb_data_written = srv_stats.data_written + dblwr;
- export_vars.innodb_buffer_pool_read_requests
- = buf_pool.stat.n_page_gets;
-
export_vars.innodb_buffer_pool_write_requests =
srv_stats.buf_pool_write_requests;
- export_vars.innodb_buffer_pool_reads = srv_stats.buf_pool_reads;
-
- export_vars.innodb_buffer_pool_read_ahead_rnd =
- buf_pool.stat.n_ra_pages_read_rnd;
-
- export_vars.innodb_buffer_pool_read_ahead =
- buf_pool.stat.n_ra_pages_read;
-
- export_vars.innodb_buffer_pool_read_ahead_evicted =
- buf_pool.stat.n_ra_pages_evicted;
-
- export_vars.innodb_buffer_pool_pages_data =
- UT_LIST_GET_LEN(buf_pool.LRU);
-
export_vars.innodb_buffer_pool_bytes_data =
buf_pool.stat.LRU_bytes
+ (UT_LIST_GET_LEN(buf_pool.unzip_LRU)
<< srv_page_size_shift);
- export_vars.innodb_buffer_pool_pages_dirty =
- UT_LIST_GET_LEN(buf_pool.flush_list);
-
- export_vars.innodb_buffer_pool_pages_made_young
- = buf_pool.stat.n_pages_made_young;
- export_vars.innodb_buffer_pool_pages_made_not_young
- = buf_pool.stat.n_pages_not_made_young;
-
- export_vars.innodb_buffer_pool_pages_old = buf_pool.LRU_old_len;
-
- export_vars.innodb_buffer_pool_bytes_dirty =
- buf_pool.stat.flush_list_bytes;
-
- export_vars.innodb_buffer_pool_pages_free =
- UT_LIST_GET_LEN(buf_pool.free);
-
#ifdef UNIV_DEBUG
export_vars.innodb_buffer_pool_pages_latched =
buf_get_latched_pages_number();
diff --git a/storage/innobase/srv/srv0start.cc b/storage/innobase/srv/srv0start.cc
index b0adc15300c..a881ae0ad6a 100644
--- a/storage/innobase/srv/srv0start.cc
+++ b/storage/innobase/srv/srv0start.cc
@@ -1997,7 +1997,7 @@ void innodb_shutdown()
ut_ad(dict_sys.is_initialised() || !srv_was_started);
ut_ad(trx_sys.is_initialised() || !srv_was_started);
- ut_ad(buf_dblwr.is_initialised() || !srv_was_started
+ ut_ad(buf_dblwr.is_created() || !srv_was_started
|| srv_read_only_mode
|| srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO);
ut_ad(lock_sys.is_initialised() || !srv_was_started);
diff --git a/storage/rocksdb/mysql-test/rocksdb/r/innodb_i_s_tables_disabled.result b/storage/rocksdb/mysql-test/rocksdb/r/innodb_i_s_tables_disabled.result
index 1b3b43c0304..d3f0ee3bcd9 100644
--- a/storage/rocksdb/mysql-test/rocksdb/r/innodb_i_s_tables_disabled.result
+++ b/storage/rocksdb/mysql-test/rocksdb/r/innodb_i_s_tables_disabled.result
@@ -181,7 +181,7 @@ compress_pages_page_decompressed compression 0 NULL NULL NULL 0 NULL NULL NULL N
compress_pages_page_compression_error compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of page compression errors
compress_pages_encrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages encrypted
compress_pages_decrypted compression 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of pages decrypted
-index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page splits
+index_page_splits index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 status_counter Number of index page splits
index_page_merge_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page merge attempts
index_page_merge_successful index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of successful index page merges
index_page_reorg_attempts index 0 NULL NULL NULL 0 NULL NULL NULL NULL NULL NULL NULL 0 counter Number of index page reorganization attempts