path: root/storage/innobase/ibuf
Commit message (author, date, files changed, lines -removed/+added)
...
| | * | Merge 10.2 into 10.3 (Marko Mäkelä, 2020-11-12, 1 file, -81/+44)
| | |\ \
| | | |/
| | | * MDEV-24182 ibuf_merge_or_delete_for_page() contains dead code (Marko Mäkelä, 2020-11-11, 1 file, -79/+44)
| | | |
| | | |   The function ibuf_merge_or_delete_for_page() has always been invoked with update_ibuf_bitmap=true ever since commit cd623508dff53c210154392da6c0f65b7b6bcf4c fixed up something after MDEV-9566.
| | | |
| | | |   Furthermore, the parameter page_size is never passed as a null pointer, and therefore it is better declared as a reference to a constant object.
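The pointer-to-reference change described above can be illustrated with a minimal sketch. The type PageSize and both function names are hypothetical stand-ins, not InnoDB declarations; the point is only that a reference to const removes the null case from the callee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a page size descriptor (not an InnoDB type).
struct PageSize {
  std::size_t physical;  // page size in bytes
};

// Before: a pointer parameter forces the callee to consider nullptr,
// even if no caller ever passes it.
inline std::size_t bytes_to_read_ptr(const PageSize* page_size) {
  return page_size ? page_size->physical : 0;
}

// After: a reference to const states that an object is always supplied,
// so the null check (dead code) disappears.
inline std::size_t bytes_to_read_ref(const PageSize& page_size) {
  return page_size.physical;
}
```

With the reference form, a caller cannot express "no page size" at all, which is what makes the removed branch provably dead.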
| | * | Merge 10.2 into 10.3 (Marko Mäkelä, 2020-10-28, 1 file, -1/+1)
| | |\ \
| | | |/
| | | * MDEV-23991 dict_table_stats_lock() has unnecessarily long scope (Eugene Kosov, 2020-10-27, 1 file, -1/+1)
| | | |
| | | |   This patch removes dict_index_t::stats_latch. Table and index statistics are now protected by dict_sys->mutex. That way, statistics computation can happen in parallel in several threads, and dict_sys->mutex will be locked only for a short period of time.
| | | |
| | | |   This patch is joint work with Marko Mäkelä.
| | | |
| | | |   dict_index_t::lock: Make mutable, which allows passing a const pointer when only the lock is touched in an object.
| | | |
| | | |   btr_height_get(), btr_get_size(): Make the index argument const for better type safety.
| | | |
| | | |   btr_estimate_number_of_different_key_vals(): Now returns computed values instead of setting fields in dict_index_t directly.
| | | |
| | | |   Remove everything related to dict_index_t::stats_latch.
| | | |
| | | |   dict_stats_index_set_n_diff(): Now returns computed values instead of setting fields in dict_index_t directly.
| | | |
| | | |   dict_stats_analyze_index(): Now returns computed values instead of setting fields in dict_index_t directly.
| | | |
| | | |   Reviewed by: Marko Mäkelä
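The "compute first, publish under a short lock" pattern that this message describes might look roughly as follows. Everything here is a hypothetical sketch: IndexStats, Index, cache_mutex (standing in for dict_sys->mutex), analyze() and publish() are illustrative names, not the functions touched by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical statistics record; not the dict_index_t layout.
struct IndexStats {
  std::uint64_t n_diff_keys = 0;
  std::uint64_t n_leaf_pages = 0;
};

struct Index {
  IndexStats stats;  // protected by cache_mutex
};

std::mutex cache_mutex;  // stands in for a global dictionary mutex

// Pure computation: returns the values and touches no shared state,
// so several threads can run it in parallel without any latch.
IndexStats analyze(const std::vector<int>& sorted_keys,
                   std::uint64_t leaf_pages) {
  IndexStats s;
  s.n_leaf_pages = leaf_pages;
  for (std::size_t i = 0; i < sorted_keys.size(); i++) {
    if (i == 0 || sorted_keys[i] != sorted_keys[i - 1]) s.n_diff_keys++;
  }
  return s;
}

// The mutex is held only for the assignment, not for the analysis.
void publish(Index& index, const IndexStats& s) {
  std::lock_guard<std::mutex> guard(cache_mutex);
  index.stats = s;
}
```

Returning the computed values instead of writing fields directly is what lets the expensive part run outside the critical section.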
| | * | Merge 10.2 into 10.3 (Marko Mäkelä, 2020-10-28, 1 file, -1/+1)
| | |\ \
| | | |/
| | | * MDEV-23370 innodb_fts.innodb_fts_misc failed in buildbot, server crashed in dict_table_autoinc_destroy [bb-10.2-MDEV-23370] (Thirunarayanan Balathandayuthapani, 2020-10-25, 1 file, -1/+1)
| | | |
| | | |   This issue is caused by MDEV-22456 ad6171b91cac33e70bb28fa6865488b2c65e858c. The fix involves the backported version of the 10.4 patch MDEV-22778 5f2628d1eea21d9732f582b77782b072e5e04014 and a few parts of MDEV-17441 (e9a5f288f21c15ec6b4d2dd3d654a320904bb1bf).
| | | |
| | | |   dict_table_t::stats_latch_created: Removed
| | | |
| | | |   dict_table_t::stats_latch: Make it a value member and always lock it for simplicity, even for a stats cloned table.
| | | |
| | | |   zip_pad_info_t::mutex_created: Removed
| | | |
| | | |   zip_pad_info_t::mutex: Make it a value member instead of a pointer.
| | | |
| | | |   os0once.h: Removed
| | | |
| | | |   dict_table_remove_from_cache_low(): Ensure that fts_free() is always called, even if dict_mem_table_free() is deferred until btr_search_lazy_free().
| | | |
| | | |   InnoDB would always initialize zip_pad_info_t::mutex and dict_table_t::autoinc_mutex, even for tables that are not in ROW_FORMAT=COMPRESSED and do not include any AUTO_INCREMENT column.
* | | | Merge 10.4 into 10.5 (Marko Mäkelä, 2020-10-30, 1 file, -1/+1)
|\ \ \ \
| |/ / /
| * | | Merge 10.3 into 10.4 (Marko Mäkelä, 2020-10-01, 1 file, -1/+4)
| |\ \ \
| | |/ /
| | * | Merge 10.2 into 10.3 (Marko Mäkelä, 2020-09-30, 1 file, -1/+4)
| | |\ \
| | | |/
| | | * MDEV-23839 innodb_fast_shutdown=0 hang on change buffer merge (Marko Mäkelä, 2020-09-29, 1 file, -1/+4)
| | | |
| | | |   ibuf_merge_or_delete_for_page(): Do not attempt to invoke ibuf_delete_recs() on a page of the change buffer itself. The caller could already be holding ibuf->index->lock, and an attempt to acquire it in S mode would hang the release server or cause an assertion failure in rw_lock_s_lock_func() in a debug server.
| | | |
| | | |   This problem was reproducible on 1 out of 2 runs of the following:
| | | |
| | | |       ./mtr --no-reorder \
| | | |         innodb.innodb-page_compression_default \
| | | |         innodb.innodb-page_compression_snappy \
| | | |         innodb.innodb-page_compression_zip \
| | | |         innodb.innodb_wl6326_big innodb.xa_recovery
| | | * Merge 10.1 into 10.2 (Marko Mäkelä, 2020-07-20, 1 file, -2/+2)
| | | |\
| | | | * MDEV-23190 InnoDB data file extension is not crash-safe (Marko Mäkelä, 2020-07-20, 1 file, -3/+3)
| | | | |
| | | | |   When InnoDB is extending a data file, it is updating the FSP_SIZE field in the first page of the data file.
| | | | |
| | | | |   In commit 8451e09073e8b1a300f177d74a9e3a530776640a (MDEV-11556) we removed a work-around for this bug and made recovery stricter, by making it track changes to FSP_SIZE via redo log records, and extend the data files before any changes are being applied to them.
| | | | |
| | | | |   It turns out that the function fsp_fill_free_list() is not crash-safe with respect to this when it is initializing the change buffer bitmap page (page 1, or generally, N*innodb_page_size+1). It uses a separate mini-transaction that is committed (and will be written to the redo log file) before the mini-transaction that actually extended the data file. Hence, recovery can observe a reference to a page that is beyond the current end of the data file.
| | | | |
| | | | |   fsp_fill_free_list(): Initialize the change buffer bitmap page in the same mini-transaction.
| | | | |
| | | | |   The rest of the changes are fixing a bug that the use of the separate mini-transaction was attempting to work around. Namely, we must ensure that no other thread will access the change buffer bitmap page before our mini-transaction has been committed and all page latches have been released. That is, for read-ahead as well as neighbour flushing, we must avoid accessing pages that might not yet be durably part of the tablespace.
| | | | |
| | | | |   fil_space_t::committed_size: The size of the tablespace as persisted by mtr_commit().
| | | | |
| | | | |   fil_space_t::max_page_number_for_io(): Limit the highest page number for I/O batches to committed_size.
| | | | |
| | | | |   MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch.
| | | | |
| | | | |   mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch.
| | | | |
| | | | |   mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK, copy space->size to space->committed_size. In this way, read-ahead or flushing will never be invoked on pages that do not yet exist according to FSP_SIZE.
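The relationship between the in-progress size and the committed size can be sketched as below. This is a simplified model under stated assumptions: the struct Tablespace, its extend() and commit() members, and the exact signature of max_page_number_for_io() are illustrative, and latching is omitted entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified model of a tablespace; field names follow the commit message,
// but the member functions and their signatures are assumptions.
struct Tablespace {
  std::uint32_t size = 0;            // pages, including an uncommitted extension
  std::uint32_t committed_size = 0;  // pages as of the last commit

  // A mini-transaction extends the file while holding the tablespace latch.
  void extend(std::uint32_t pages) { size += pages; }

  // On releasing the latch at commit, the new size becomes visible to I/O.
  void commit() { committed_size = size; }

  // Clamp the upper bound of a read-ahead or flush batch so that it never
  // covers pages that are not yet durably part of the tablespace.
  std::uint32_t max_page_number_for_io(std::uint32_t n) const {
    return std::min(n, committed_size);
  }
};
```

Between extend() and commit(), batches are limited to the old size, which is the window in which the bitmap page must not be touched by other threads.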
* | | | | MDEV-23855: Shrink fil_space_t (Marko Mäkelä, 2020-10-26, 1 file, -4/+4)
| | | | |
| | | | |   Merge n_pending_ios, n_pending_ops to std::atomic<uint32_t> n_pending.
| | | | |
| | | | |   Change some more fil_space_t members to uint32_t to reduce the memory footprint.
| | | | |
| | | | |   fil_space_t::add(), fil_ibd_create(): Attach the already opened handle to the tablespace, and enforce the fil_system.n_open limit.
| | | | |
| | | | |   dict_boot(): Initialize fil_system.max_assigned_id.
| | | | |
| | | | |   srv_boot(): Call srv_thread_pool_init() before anything else, so that files should be opened in the correct mode on Windows.
| | | | |
| | | | |   fil_ibd_create(): Create the file in OS_FILE_AIO mode, just like fil_node_open_file_low() does it.
| | | | |
| | | | |   dict_table_t::is_accessible(): Replaces fil_table_accessible().
| | | | |
| | | | |   Reviewed by: Vladislav Vaintroub
* | | | | MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention (Marko Mäkelä, 2020-10-26, 1 file, -19/+7)
| | | | |
| | | | |   Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored for system tablespace on SSD
| | | | |
| | | | |   When the maximum configured number of files is exceeded, InnoDB will close data files. We used to maintain a fil_system.LRU list and a counter fil_node_t::n_pending to achieve this, at the huge cost of multiple fil_system.mutex operations per I/O operation.
| | | | |
| | | | |   fil_node_open_file_low(): Implement a FIFO replacement policy: The last opened file will be moved to the end of fil_system.space_list, and files will be closed from the start of the list. However, we will not move tablespaces in fil_system.space_list while i_s_tablespaces_encryption_fill_table() is executing (producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION) because it may cause information of some tablespaces to go missing. We also avoid this in mariabackup --backup because datafiles_iter_next() assumes that the ordering is not changed.
| | | | |
| | | | |   IORequest: Fold more parameters to IORequest::type.
| | | | |
| | | | |   fil_space_t::io(): Replaces fil_io().
| | | | |
| | | | |   fil_space_t::flush(): Replaces fil_flush().
| | | | |
| | | | |   OS_AIO_IBUF: Remove. We will always issue synchronous reads of the change buffer pages in buf_read_page_low().
| | | | |
| | | | |   We will always ignore some errors for background reads.
| | | | |
| | | | |   This should reduce fil_system.mutex contention a little.
| | | | |
| | | | |   fil_node_t::complete_write(): Replaces fil_node_t::complete_io(). On both read and write completion, fil_space_t::release_for_io() will have to be called.
| | | | |
| | | | |   fil_space_t::io(): Do not acquire fil_system.mutex in the normal code path.
| | | | |
| | | | |   xb_delta_open_matching_space(): Do not try to open the system tablespace which was already opened. This fixes a file sharing violation in mariabackup --prepare --incremental.
| | | | |
| | | | |   Reviewed by: Vladislav Vaintroub
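A FIFO policy of the kind described for fil_node_open_file_low() can be sketched with a plain list. The class OpenFiles and its members are hypothetical; the real code works on fil_system.space_list, must skip files with pending I/O, and has the two exceptions noted in the message, none of which is modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical model of a bounded set of open files with FIFO replacement.
class OpenFiles {
 public:
  explicit OpenFiles(std::size_t limit) : limit_(limit) {}

  // Opens a file and appends it to the end of the list. If the limit was
  // already reached, the file at the start of the list is closed first.
  // Returns the name of the closed file, or an empty string if none.
  std::string open(const std::string& name) {
    std::string closed;
    if (!fifo_.empty() && fifo_.size() >= limit_) {
      closed = fifo_.front();
      fifo_.pop_front();
    }
    fifo_.push_back(name);
    return closed;
  }

  std::size_t n_open() const { return fifo_.size(); }

 private:
  std::size_t limit_;
  std::list<std::string> fifo_;  // oldest opened file first
};
```

Unlike an LRU list, nothing is reordered on each I/O, so the bookkeeping cost is paid only when a file is opened.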
* | | | | Cleanup: Make InnoDB page numbers uint32_t (Marko Mäkelä, 2020-10-15, 1 file, -86/+65)
| | | | |
| | | | |   InnoDB stores a 32-bit page number in page headers and in some data structures, such as FIL_ADDR (consisting of a 32-bit page number and a 16-bit byte offset within a page). For better compile-time error detection and to reduce the memory footprint in some data structures, let us use a uint32_t for the page number, instead of ulint (size_t) which can be 64 bits.
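To make the size argument concrete, here is a small sketch of a file address with fixed-width fields next to a size_t-based one. The struct names and the helper functions are hypothetical; the 4-byte page number followed by a 2-byte offset, stored most significant byte first, is assumed here to mirror the FIL_ADDR layout mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical "wide" form using size_t, which is 64 bits on most platforms.
struct FileAddrWide {
  std::size_t page;
  std::size_t boffset;
};

// Fixed-width form: 32-bit page number, 16-bit byte offset within the page.
struct FileAddr {
  std::uint32_t page;
  std::uint16_t boffset;
};

static_assert(sizeof(FileAddr) <= sizeof(FileAddrWide),
              "fixed-width fields never need more memory");

// Serialize into 6 bytes, most significant byte first (assumed layout).
inline void write_addr(std::uint8_t* out, FileAddr a) {
  out[0] = std::uint8_t(a.page >> 24);
  out[1] = std::uint8_t(a.page >> 16);
  out[2] = std::uint8_t(a.page >> 8);
  out[3] = std::uint8_t(a.page);
  out[4] = std::uint8_t(a.boffset >> 8);
  out[5] = std::uint8_t(a.boffset);
}

inline FileAddr read_addr(const std::uint8_t* in) {
  FileAddr a;
  a.page = (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16) |
           (std::uint32_t(in[2]) << 8) | std::uint32_t(in[3]);
  a.boffset = std::uint16_t((in[4] << 8) | in[5]);
  return a;
}
```

Because the in-memory types match the on-disk widths, an accidental truncation shows up as a compiler diagnostic rather than at run time.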
* | | | | Cleanup: Compare page_id_t directly (Marko Mäkelä, 2020-10-15, 1 file, -2/+1)
| | | | |
* | | | | MDEV-23719: Make lock_sys use page_id_t (Marko Mäkelä, 2020-09-17, 1 file, -7/+11)
| | | | |
| | | | |   Since commit 8ccb3caafb7cba0fca12e89c5c9b67a740364fdd it should be more efficient to use page_id_t rather than two separate variables for tablespace identifier and page number.
| | | | |
| | | | |   lock_rec_fold(): Replaced with page_id_t::fold().
| | | | |
| | | | |   lock_rec_hash(): Replaced with lock_sys.hash(page_id).
| | | | |
| | | | |   lock_rec_expl_exist_on_page(), lock_rec_get_first_on_page_addr(), lock_rec_get_first_on_page(): Replaced with lock_sys.get_first().
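Why a single page identifier can be cheaper than two separate variables may be easier to see in a sketch. The class PageId below is hypothetical: packing both parts into one 64-bit word and the particular fold() formula are assumptions made for illustration, not a copy of page_id_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical page identifier: tablespace id in the high 32 bits,
// page number in the low 32 bits of one 64-bit word.
class PageId {
 public:
  constexpr PageId(std::uint32_t space, std::uint32_t page_no)
      : id_(std::uint64_t{space} << 32 | page_no) {}

  constexpr std::uint32_t space() const { return std::uint32_t(id_ >> 32); }
  constexpr std::uint32_t page_no() const { return std::uint32_t(id_); }

  // Direct comparison is a single 64-bit compare.
  constexpr bool operator==(PageId other) const { return id_ == other.id_; }

  // Illustrative mixing function for hash table lookups (an assumption).
  constexpr std::uint64_t fold() const {
    return (std::uint64_t{space()} << 20) + space() + page_no();
  }

 private:
  std::uint64_t id_;
};
```

A lock hash table keyed this way needs one argument per lookup, and callers can compute fold() once and pass it down.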
* | | | | MDEV-22930 Unnecessary contention on rw_lock_list_mutex in ibuf_dummy_index_create() (Eugene Kosov, 2020-07-08, 1 file, -1/+1)
| | | | |
| | | | |   1. Do not initialize dict_table_t::stats_latch in ibuf
| | | | |   2. Remove overengineering in GenericPolicy to speed up things
| | | | |
| | | | |   dict_mem_table_create(): add new argument init_stats_latch
| | | | |
| | | | |   ibuf_dummy_index_create(): do not initialize dict_table_t::stats_latch
| | | | |
| | | | |   GenericPolicy: add new members m_filename and m_line
| | | | |
| | | | |   sync_file_create_register(), sync_file_created_deregister(), sync_file_created_get(), CreateTracker: remove
| | | | |
| | | | |   rw_lock_t::created: a new debug member
* | | | | MDEV-22877 Avoid unnecessary buf_pool.page_hash S-latch acquisition (Marko Mäkelä, 2020-06-12, 1 file, -3/+3)
| | | | |
| | | | |   MDEV-15053 did not remove all unnecessary buf_pool.page_hash S-latch acquisition. There are code paths where we are holding buf_pool.mutex (which will sufficiently protect buf_pool.page_hash against changes) and unnecessarily acquire the latch. Many invocations of buf_page_hash_get_locked() can be replaced with the much simpler buf_pool.page_hash_get_low().
| | | | |
| | | | |   In the worst case the thread that is holding buf_pool.mutex will become a victim of MDEV-22871, suffering from a spurious reader-reader conflict with another thread that genuinely needs to acquire a buf_pool.page_hash S-latch.
| | | | |
| | | | |   In many places, we were also evaluating page_id_t::fold() while holding buf_pool.mutex. Low-level functions such as buf_pool.page_hash_get_low() must get the page_id_t::fold() as a parameter.
| | | | |
| | | | |   buf_buddy_relocate(): Defer the hash_lock acquisition to the critical section that starts by calling buf_page_t::can_relocate().
* | | | | MDEV-22110 preparation: Remove mtr_memo_contains macros (Marko Mäkelä, 2020-06-10, 1 file, -22/+21)
| | | | |
| | | | |   Let us invoke the debug member functions of mtr_t directly.
| | | | |
| | | | |   mtr_t::memo_contains(): Change the parameter type to const rw_lock_t&. This function cannot be invoked on buf_block_t::lock.
| | | | |
| | | | |   The function mtr_t::memo_contains_flagged() is intended to be invoked on buf_block_t* or rw_lock_t*, and it along with mtr_t::memo_contains_page_flagged() are the way to check whether a buffer pool page has been latched within a mini-transaction.
* | | | | Merge 10.4 into 10.5 (Marko Mäkelä, 2020-06-05, 1 file, -1/+1)
|\ \ \ \ \
| |/ / / /
| * | | | Merge 10.3 into 10.4 (Marko Mäkelä, 2020-06-05, 1 file, -1/+1)
| |\ \ \ \
| | |/ / /
| | * | | Merge 10.2 into 10.3 (Marko Mäkelä, 2020-06-05, 1 file, -1/+1)
| | |\ \ \
| | | |/ /
| | | * | MDEV-22769 Shutdown hang or crash due to XA breaking locks (Marko Mäkelä, 2020-06-05, 1 file, -1/+1)
| | | | |
| | | | |   The background drop table queue in InnoDB is a work-around for cases where the SQL layer is requesting DDL on tables on which transactional locks exist.
| | | | |
| | | | |   One such case is XA transactions. Our test case exploits the fact that the recovery of XA PREPARE transactions will only resurrect InnoDB table locks, but not MDL that should block any concurrent DDL.
| | | | |
| | | | |   srv_shutdown_t: Introduce the srv_shutdown_state=SRV_SHUTDOWN_INITIATED for the initial part of shutdown, to wait for the background drop table queue to be emptied.
| | | | |
| | | | |   srv_shutdown_bg_undo_sources(): Assign srv_shutdown_state=SRV_SHUTDOWN_INITIATED before waiting for the background drop table queue to be emptied.
| | | | |
| | | | |   row_drop_tables_for_mysql_in_background(): On slow shutdown, if no active transactions exist (excluding ones that are in XA PREPARE state), skip any tables on which locks exist.
| | | | |
| | | | |   row_drop_table_for_mysql(): Do not unnecessarily attempt to drop InnoDB persistent statistics for tables that have already been added to the background drop table queue.
| | | | |
| | | | |   row_mysql_close(): Relax an assertion, and free all memory even if innodb_force_recovery=2 would prevent the background drop table queue from being emptied.
* | | | | MDEV-15053 Reduce buf_pool_t::mutex contention (Marko Mäkelä, 2020-06-05, 1 file, -37/+38)
| | | | |
| | | | |   User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0 and will no longer report the PAGE_STATE value READY_FOR_USE.
| | | | |
| | | | |   We will remove some fields from buf_page_t and move much code to member functions of buf_pool_t and buf_page_t, so that the access rules of data members can be enforced consistently.
| | | | |
| | | | |   Evicting or adding pages in buf_pool.LRU will remain covered by buf_pool.mutex.
| | | | |
| | | | |   Evicting or adding pages in buf_pool.page_hash will remain covered by both buf_pool.mutex and the buf_pool.page_hash X-latch.
| | | | |
| | | | |   After this fix, buf_pool.page_hash lookups can entirely avoid acquiring buf_pool.mutex, only relying on buf_pool.hash_lock_get() S-latch.
| | | | |
| | | | |   Similarly, buf_flush_check_neighbors() will rely solely on buf_pool.mutex, with no buf_pool.page_hash latch at all.
| | | | |
| | | | |   The buf_pool.mutex is rather contended in I/O heavy benchmarks, especially when the workload does not fit in the buffer pool.
| | | | |
| | | | |   The first attempt to alleviate the contention was the buf_pool_t::mutex split in commit 4ed7082eefe56b3e97e0edefb3df76dd7ef5e858 which introduced buf_block_t::mutex, which we are now removing.
| | | | |
| | | | |   Later, multiple instances of buf_pool_t were introduced in commit c18084f71b02ea707c6461353e6cfc15d7553bc6 and recently removed by us in commit 1a6f708ec594ac0ae2dd30db926ab07b100fa24b (MDEV-15058).
| | | | |
| | | | |   UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool related debugging in otherwise non-debug builds has not been used for years. Instead, we have been using UNIV_DEBUG, which is enabled in CMAKE_BUILD_TYPE=Debug.
| | | | |
| | | | |   buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on std::atomic and the buf_pool.page_hash latches, and in some cases depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before. We must always release buf_block_t::lock before invoking unfix() or io_unfix(), to prevent a glitch where a block that was added to the buf_pool.free list would appear X-latched. See commit c5883debd6ef440a037011c11873b396923e93c5 for how this glitch was finally caught in a debug environment.
| | | | |
| | | | |   We move some buf_pool_t::page_hash specific code from the ha and hash modules to buf_pool, for improved readability.
| | | | |
| | | | |   buf_pool_t::close(): Assert that all blocks are clean, except on aborted startup or crash-like shutdown.
| | | | |
| | | | |   buf_pool_t::validate(): No longer attempt to validate n_flush[] against the number of BUF_IO_WRITE fixed blocks, because buf_page_t::flush_type no longer exists.
| | | | |
| | | | |   buf_pool_t::watch_set(): Replaces buf_pool_watch_set(). Reduce mutex contention by separating the buf_pool.watch[] allocation and the insert into buf_pool.page_hash.
| | | | |
| | | | |   buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a buf_pool.page_hash latch. Replaces and extends buf_page_hash_lock_s_confirm() and buf_page_hash_lock_x_confirm().
| | | | |
| | | | |   buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES.
| | | | |
| | | | |   buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads: Use Atomic_counter.
| | | | |
| | | | |   buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out().
| | | | |
| | | | |   buf_pool_t::LRU_remove(): Remove a block from the LRU list and return its predecessor. Incorporates buf_LRU_adjust_hp(), which was removed.
| | | | |
| | | | |   buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(), for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by BTR_DELETE_OP (purge), which is never invoked on temporary tables.
| | | | |
| | | | |   buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments.
| | | | |
| | | | |   buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition.
| | | | |
| | | | |   buf_LRU_free_page(): Clarify the function comment.
| | | | |
| | | | |   buf_flush_check_neighbor(), buf_flush_check_neighbors(): Rewrite the construction of the page hash range. We will hold the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64) consecutive lookups of buf_pool.page_hash.
| | | | |
| | | | |   buf_flush_page_and_try_neighbors(): Remove. Merge to its only callers, and remove redundant operations in buf_flush_LRU_list_batch().
| | | | |
| | | | |   buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite. Do not acquire buf_pool.mutex, and iterate directly with page_id_t.
| | | | |
| | | | |   ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined and avoids any loops.
| | | | |
| | | | |   fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove.
| | | | |
| | | | |   buf_flush_page(): Add a fil_space_t* parameter. Minimize the buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated atomically with the io_fix, and we will protect most buf_block_t fields with buf_block_t::lock. The function buf_flush_write_block_low() is removed and merged here.
| | | | |
| | | | |   buf_page_init_for_read(): Use static linkage. Initialize the newly allocated block and acquire the exclusive buf_block_t::lock while not holding any mutex.
| | | | |
| | | | |   IORequest::IORequest(): Remove the body. We only need to invoke set_punch_hole() in buf_flush_page() and nowhere else.
| | | | |
| | | | |   buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type. This field is only used during a fil_io() call. That function already takes IORequest as a parameter, so we had better introduce it there for the rarely changing field.
| | | | |
| | | | |   buf_block_t::init(): Replaces buf_page_init().
| | | | |
| | | | |   buf_page_t::init(): Replaces buf_page_init_low().
| | | | |
| | | | |   buf_block_t::initialise(): Initialise many fields, but keep the buf_page_t::state(). Both buf_pool_t::validate() and buf_page_optimistic_get() require that buf_page_t::in_file() be protected atomically with buf_page_t::in_page_hash and buf_page_t::in_LRU_list.
| | | | |
| | | | |   buf_page_optimistic_get(): Now that buf_block_t::mutex no longer exists, we must check buf_page_t::io_fix() after acquiring the buf_pool.page_hash lock, to detect whether buf_page_init_for_read() has been initiated. We will also check the io_fix() before acquiring hash_lock in order to avoid unnecessary computation. The field buf_block_t::modify_clock (protected by buf_block_t::lock) allows buf_page_optimistic_get() to validate the block.
| | | | |
| | | | |   buf_page_t::real_size: Remove. It was only used while flushing pages of page_compressed tables.
| | | | |
| | | | |   buf_page_encrypt(): Add an output parameter that allows us to eliminate buf_page_t::real_size. Replace a condition with debug assertion.
| | | | |
| | | | |   buf_page_should_punch_hole(): Remove.
| | | | |
| | | | |   buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch(). Add the parameter size (to replace buf_page_t::real_size).
| | | | |
| | | | |   buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page(). Add the parameter size (to replace buf_page_t::real_size).
| | | | |
| | | | |   fil_system_t::detach(): Replaces fil_space_detach(). Ensure that fil_validate() will not be violated even if fil_system.mutex is released and reacquired.
| | | | |
| | | | |   fil_node_t::complete_io(): Renamed from fil_node_complete_io().
| | | | |
| | | | |   fil_node_t::close_to_free(): Replaces fil_node_close_to_free(). Avoid invoking fil_node_t::close() because fil_system.n_open has already been decremented in fil_space_t::detach().
| | | | |
| | | | |   BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY.
| | | | |
| | | | |   BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE, and distinguish dirty pages by buf_page_t::oldest_modification().
| | | | |
| | | | |   BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead. This state was only being used for buf_page_t that are in buf_pool.watch.
| | | | |
| | | | |   buf_pool_t::watch[]: Remove pointer indirection.
| | | | |
| | | | |   buf_page_t::in_flush_list: Remove. It was set if and only if buf_page_t::oldest_modification() is nonzero.
| | | | |
| | | | |   buf_page_decrypt_after_read(), buf_corrupt_page_release(), buf_page_check_corrupt(): Change the const fil_space_t* parameter to const fil_node_t& so that we can report the correct file name.
| | | | |
| | | | |   buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function.
| | | | |
| | | | |   buf_page_io_complete(): Split to buf_page_read_complete() and buf_page_write_complete().
| | | | |
| | | | |   buf_dblwr_t::in_use: Remove.
| | | | |
| | | | |   buf_dblwr_t::buf_block_array: Add IORequest::flush_t.
| | | | |
| | | | |   buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of os_aio_wait_until_no_pending_writes().
| | | | |
| | | | |   buf_flush_write_complete(): Declare static, not global. Add the parameter IORequest::flush_t.
| | | | |
| | | | |   buf_flush_freed_page(): Simplify the code.
| | | | |
| | | | |   recv_sys_t::flush_lru: Renamed from flush_type and changed to bool.
| | | | |
| | | | |   fil_read(), fil_write(): Replaced with direct use of fil_io().
| | | | |
| | | | |   fil_buffering_disabled(): Remove. Check srv_file_flush_method directly.
| | | | |
| | | | |   fil_mutex_enter_and_prepare_for_io(): Return the resolved fil_space_t* to avoid a duplicated lookup in the caller.
| | | | |
| | | | |   fil_report_invalid_page_access(): Clean up the parameters.
| | | | |
| | | | |   fil_io(): Return fil_io_t, which comprises fil_node_t and error code. Always invoke fil_space_t::acquire_for_io() and let either the sync=true caller or fil_aio_callback() invoke fil_space_t::release_for_io().
| | | | |
| | | | |   fil_aio_callback(): Rewrite to replace buf_page_io_complete().
| | | | |
| | | | |   fil_check_pending_operations(): Remove a parameter, and remove some redundant lookups.
| | | | |
| | | | |   fil_node_close_to_free(): Wait for n_pending==0. Because we no longer do an extra lookup of the tablespace between fil_io() and the completion of the operation, we must give fil_node_t::complete_io() a chance to decrement the counter.
| | | | |
| | | | |   fil_close_tablespace(): Remove unused parameter trx, and document that this is only invoked during the error handling of IMPORT TABLESPACE.
| | | | |
| | | | |   row_import_discard_changes(): Merged with the only caller, row_import_cleanup(). Do not lock up the data dictionary while invoking fil_close_tablespace().
| | | | |
| | | | |   logs_empty_and_mark_files_at_shutdown(): Do not invoke fil_close_all_files(), to avoid a !needs_flush assertion failure on fil_node_t::close().
| | | | |
| | | | |   innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files().
| | | | |
| | | | |   fil_close_all_files(): Invoke fil_flush_file_spaces() to ensure proper durability.
| | | | |
| | | | |   thread_pool::unbind(): Fix a crash that would occur on Windows after srv_thread_pool->disable_aio() and os_file_close(). This fix was submitted by Vladislav Vaintroub.
| | | | |
| | | | |   Thanks to Matthias Leich and Axel Schwenke for extensive testing, Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.
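The remark that rounding up to a power of two can avoid any loops refers to a well-known bit-smearing technique. The function round_up_pow2() below is a hypothetical sketch of that technique for 32-bit values; it is written for illustration and is not a copy of my_round_up_to_next_power().

```cpp
#include <cassert>
#include <cstdint>

// Round v up to the next power of two without a loop. After subtracting 1,
// the highest set bit is copied into every lower position, and adding 1
// then carries into the next power of two.
// Note: the result wraps to 0 for v == 0 and for v > 2^31.
inline std::uint32_t round_up_pow2(std::uint32_t v) {
  v--;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v++;
  return v;
}
```

The fixed number of shift-and-or steps makes the cost independent of the input, which is the advantage over a loop that doubles a counter.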
* | | | | Merge 10.4 into 10.5 (Marko Mäkelä, 2020-05-18, 1 file, -1/+1)
|\ \ \ \ \
| |/ / / /
| * | | | Merge 10.3 into 10.4 (Marko Mäkelä, 2020-05-16, 1 file, -1/+1)
| |\ \ \ \
| | |/ / /
| | | | |
| | | | |   We will expose some more std::atomic internals in Atomic_counter, so that dict_index_t::lock will support the default assignment operator.
| | * | | Merge 10.2 into 10.3 (Marko Mäkelä, 2020-05-15, 1 file, -1/+1)
| | |\ \ \
| | | |/ /
| | | * | MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB (Marko Mäkelä, 2020-05-15, 1 file, -1/+1)
| | | | |
| | | | |   If the InnoDB buffer pool contains many pages for a table or index that is being dropped or rebuilt, and if many of such pages are pointed to by the adaptive hash index, dropping the adaptive hash index may consume a lot of time.
| | | | |
| | | | |   The time-consuming operation of dropping the adaptive hash index entries is being executed while the InnoDB data dictionary cache dict_sys is exclusively locked.
| | | | |
| | | | |   It is not actually necessary to drop all adaptive hash index entries at the time a table or index is being dropped or rebuilt. We can let the LRU replacement policy of the buffer pool take care of this gradually. For this to work, we must detach the dict_table_t and dict_index_t objects from the main dict_sys cache, and once the last adaptive hash index entry for the detached table is removed (when the garbage page is evicted from the buffer pool) we can free the dict_table_t and dict_index_t object.
| | | | |
| | | | |   Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE skip both the buffer pool eviction and the drop of the adaptive hash index. We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE. We can remove the eviction from DROP TABLE. We must retain the eviction in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the discarded table is being re-imported with the same tablespace identifier, the fresh data from the imported tablespace will replace any stale pages in the buffer pool.
| | | | |
| | | | |   rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can no longer be interrupted inside InnoDB.
| | | | |
| | | | |   fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(), fseg_free_page_low(), fseg_free_extent(): Remove the parameter that specifies whether the adaptive hash index should be dropped.
| | | | |
| | | | |   btr_search_lazy_free(): Lazily free an index when the last reference to it is dropped from the adaptive hash index.
| | | | |
| | | | |   buf_pool_clear_hash_index(): Declare static, and move to the same compilation unit with the bulk of the adaptive hash index code.
| | | | |
| | | | |   dict_index_t::clone(), dict_index_t::clone_if_needed(): Clone an index that is being rebuilt while adaptive hash index entries exist. The original index will be inserted into dict_table_t::freed_indexes and dict_index_t::set_freed() will be called.
| | | | |
| | | | |   dict_index_t::set_freed(), dict_index_t::freed(): Note that or check whether the index has been freed. We will use the impossible page number 1 to denote this condition.
| | | | |
| | | | |   dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count().
| | | | |
| | | | |   dict_index_t::detach_columns(): Move the assignment n_fields=0 to ha_innobase_inplace_ctx::clear_added_indexes(). We must have access to the columns when freeing the adaptive hash index. Note: dict_table_t::v_cols[] will remain valid. If virtual columns are dropped or added, the table definition will be reloaded in ha_innobase::commit_inplace_alter_table().
| | | | |
| | | | |   buf_page_mtr_lock(): Drop a stale adaptive hash index if needed.
| | | | |
| | | | |   We will also reduce the number of btr_get_search_latch() calls and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT in order to benefit cmake -DWITH_INNODB_AHI=OFF.
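The lazy-free idea, where a detached index is released only when its last hash-index reference goes away, can be sketched with a simple counter. The struct DetachedIndex, its fields, and release_page() are hypothetical names; the real code tracks this state inside dict_index_t and under latches that are not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical model of an index that has been dropped from the dictionary
// cache but may still be referenced by hash-index entries on cached pages.
struct DetachedIndex {
  std::uint32_t n_ahi_pages = 0;  // pages that still hold hash entries
  bool freed = false;             // true once detached from the cache
};

// Called when one buffer pool page drops its hash entries for the index,
// for example on eviction. Frees the index object when the last reference
// of a detached index disappears. Returns true if the object was freed.
inline bool release_page(std::unique_ptr<DetachedIndex>& index) {
  assert(index && index->n_ahi_pages > 0);
  if (--index->n_ahi_pages == 0 && index->freed) {
    index.reset();
    return true;
  }
  return false;
}
```

The DDL statement itself only sets the detached flag, so its cost no longer depends on how many pages are cached.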
* | | | | Merge 10.4 into 10.5 (Marko Mäkelä, 2020-05-13, 1 file, -40/+22)
|\ \ \ \ \
| |/ / / /
| * | | | Merge 10.3 into 10.4 (Marko Mäkelä, 2020-05-13, 1 file, -40/+22)
| |\ \ \ \
| | |/ / /
| | * | | Merge 10.2 into 10.3 (Marko Mäkelä, 2020-05-13, 1 file, -41/+23)
| | |\ \ \
| | | |/ /
| | | * | Merge 10.1 into 10.2 (Marko Mäkelä, 2020-05-13, 1 file, -41/+23)
| | | |\ \
| | | | |/
| | | | * MDEV-22497 [ERROR] InnoDB: Unable to purge a record (Marko Mäkelä, 2020-05-07, 1 file, -41/+24)
| | | | | The InnoDB insert buffer was upgraded in MySQL 5.5 into a change buffer that also covers delete-mark and delete (purge) operations.
| | | | | There is an important constraint for delete operations: a B-tree leaf page must not become empty unless the entire tree becomes empty, consisting of an empty root page. Because change buffer merges only occur on a single leaf page at a time, delete operations must not be buffered if it is possible that the last record of the page could be deleted. (In that case, we would refuse to use the change buffer, and if we really delete the last record, we would shrink the index tree.)
| | | | | The function ibuf_get_volume_buffered_hash() is part of our insurance that the page would not become empty. It is supposed to map each buffered INSERT or DELETE_MARK record payload into a hash value. We will only count each such record as a distinct key if there is no hash collision. DELETE operations will always decrement the predicted number of records in the page.
| | | | | Due to a bug in the function, we would actually compute the hash value not only on the record payload, but also on some following bytes, in case the record contains NULL values. In MySQL Bug #61104, we had some examples of this dating back to 2012. But back then, we failed to reproduce the bug, and in commit d84c95579ba1eca2f9bf5b0be9f14040e4441227 we simply demoted the hard assertion to a message printout and a debug assertion failure.
| | | | | ibuf_get_volume_buffered_hash(): Correctly compute the hash value of the payload bytes only.
| | | | | Note: we will consider ('foo','bar'),(NULL,'foobar'),('foob','ar') to be equal, but this is not a problem, because in case of a hash collision, we could also consider ('boo','far') to be equal, and underestimate the number of records in the page, leading to refusing to buffer a DELETE.
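The note about ('foo','bar'), (NULL,'foobar') and ('foob','ar') can be illustrated with a small sketch. It is not ibuf_get_volume_buffered_hash(): Field, Record and payload_hash() are hypothetical names, and FNV-1a is chosen only for illustration. What it shares with the description is that only the payload bytes are hashed, so NULL fields and field boundaries do not influence the result.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record model: each field is either NULL or a byte string.
using Field = std::optional<std::string>;
using Record = std::vector<Field>;

// Hash the payload bytes only (32-bit FNV-1a, an arbitrary choice here).
// NULL fields contribute no bytes, and field boundaries are not hashed.
inline std::uint32_t payload_hash(const Record& rec)
{
  std::uint32_t h = 2166136261u;
  for (const Field& f : rec) {
    if (!f) continue;  // NULL: no payload bytes to hash
    for (unsigned char c : *f) {
      h ^= c;
      h *= 16777619u;
    }
  }
  return h;
}
```

All three records from the note concatenate to the bytes "foobar", so they hash identically. As the message explains, treating them as one key can only lower the predicted record count, which errs on the side of refusing to buffer a DELETE.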
* | | | | MDEV-22495 Assertion ...status != buf_page_t::FREED in ibuf_read_merge_pages() (Marko Mäkelä, 2020-05-07, 1 file, -3/+4)
| | | | | ibuf_read_merge_pages(): Request a possibly freed page. The change buffer is discarded lazily for freed pages either by this function or when buf_page_create() reuses a page.
| | | | | buf_page_get_low(): Relax a debug assertion. Do not attempt change buffer merge on freed pages.
| | | | | ibuf_merge_or_delete_for_page(): Assert that the page state is NORMAL. INIT_ON_FLUSH is not possible, because in that case buf_page_create() should have removed any buffered changes for the page.
| | | | | buf_page_get_gen(): Apply buffered changes also in the case when we can avoid reading the page based on buffered redo log records. This addresses a hard-to-reproduce scenario that was broken in commit 6697135c6d03935118c3dfa1c97faea7fa76afa6.
* | | | | Merge 10.4 into 10.5 (Marko Mäkelä, 2020-05-05, 1 file, -5/+5)
|\ \ \ \ \
| |/ / / /
| * | | | Merge 10.3 into 10.4 (Marko Mäkelä, 2020-05-05, 1 file, -5/+5)
| |\ \ \ \
| | |/ / /
| | * | | Merge branch '10.2' into 10.3 (Oleksandr Byelkin, 2020-05-04, 1 file, -5/+5)
| | |\ \ \
| | | |/ /
| | | * | MDEV-21595: innodb offset_t rename to rec_offs (Daniel Black, 2020-04-29, 1 file, -5/+5)
| | | | | thanks to: perl -i -pe 's/\boffset_t\b/rec_offs/g' $(git grep -lw offset_t storage/innobase)
* | | | | MDEV-22126 Rename confusing constant mtr_t::OPT (Marko Mäkelä, 2020-04-03, 1 file, -1/+1)
| | | | | The template parameter mtr_t::OPT refers to optional, not optimized. Also the default parameter mtr_t::NORMAL refers to optimized writes. The name MAYBE_NOP would be more descriptive, conveying the idea that a write to a durable page might not actually have any effect.
* | | | | Merge 10.4 into 10.5 (Marko Mäkelä, 2020-03-30, 1 file, -2/+5)
|\ \ \ \ \
| |/ / / /
| * | | | Merge 10.3 into 10.4 (Marko Mäkelä, 2020-03-30, 1 file, -3/+8)
| |\ \ \ \
| | |/ / /
| | * | | Merge 10.2 into 10.3 (Marko Mäkelä, 2020-03-30, 1 file, -4/+7)
| | |\ \ \
| | | |/ /
| | | * | remove fishy reinterpret_cast from buf_page_is_zeroes() (Eugene Kosov, 2020-03-20, 1 file, -4/+7)
| | | | | In my micro-benchmarks, memcmp(4196) is 3 times faster than the old implementation. Also, it's generally better to use as few reinterpret_casts<> as possible.
| | | | | buf_is_zeroes(): renamed from buf_page_is_zeroes() and argument changed to span<> for convenience.
| | | | | st_::span<T>::const_iterator: fixed
| | | | | page_zip_verify_checksum(): make argument byte* instead of void*
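A memcmp()-based zero check of the kind this message describes might look like the sketch below. It is an assumption-laden illustration rather than buf_is_zeroes() itself: zero_block and is_all_zero() are hypothetical names, a pointer and a length stand in for the span<> argument, and the chunked loop is only one way to handle buffers larger than the reference block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical zero-filled reference block (value-initialized to all zeros).
static const unsigned char zero_block[4096] = {};

// Return true if all `size` bytes starting at `data` are zero.
// The buffer is compared with zero_block one chunk at a time using memcmp(),
// so no pointer needs to be reinterpreted as a wider integer type.
inline bool is_all_zero(const unsigned char* data, std::size_t size)
{
  while (size > 0) {
    const std::size_t n = std::min(size, sizeof zero_block);
    if (std::memcmp(data, zero_block, n) != 0) return false;
    data += n;
    size -= n;
  }
  return true;
}
```

Whether this beats a hand-written loop depends on the C library and the hardware; the speedup quoted above is the author's own measurement and is not reproduced by this sketch.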
* | | | | MDEV-21907: Fix or disable -Wconversion on GCC 5.3.0 i386 (Marko Mäkelä, 2020-03-13, 1 file, -0/+7)
| | | | | Fix or disable those -Wconversion that were missed by GCC 5.4.0 targeting AMD64.
* | | | | MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC (Marko Mäkelä, 2020-03-12, 1 file, -8/+8)
| | | | | The -Wconversion in GCC seems to be stricter than in clang. GCC at least since version 4.4.7 issues truncation warnings for assignments to bitfields, while clang 10 appears to only issue warnings when the sizes in bytes rounded to the nearest integer powers of 2 are different.
| | | | | Before GCC 10.0.0, -Wconversion required more casts and would not allow some operations, such as x<<=1 or x+=1 on a data type that is narrower than int.
| | | | | GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining about x|=y even when x and y are compatible types that are narrower than int. Hence, we must rewrite some x|=y as x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.
| | | | | In GCC 6 and later, the warning for assigning wider to bitfields that are narrower than 8, 16, or 32 bits can be suppressed by applying a bitwise & with the exact bitmask of the bitfield. For older GCC, we must disable -Wconversion for GCC 4 or 5 in such cases.
| | | | | The bitwise negation operator appears to promote short integers to a wider type, and hence we must add explicit truncation casts around them. Microsoft Visual C does not allow a static_cast to truncate a constant, such as static_cast<byte>(1) truncating int. Hence, we will use the constructor-style cast byte(~1) for such cases.
| | | | | This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0, clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019) on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
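The rewrites mentioned in this message can be sketched as follows. The helper names set_bits(), clear_low_bit(), set_kind() and the flags_t struct are hypothetical and exist only to show the three patterns; the byte typedef is assumed to be an unsigned char, and which compilers warn about which form is as described in the message, not verified here.

```cpp
#include <cassert>

typedef unsigned char byte;  // assumed 8-bit unsigned type for this sketch

// x |= y promotes both operands to int, so storing the result back into a
// byte is an implicit truncation. The cast makes the truncation explicit.
inline byte set_bits(byte x, byte y)
{
  return static_cast<byte>(x | y);
}

// ~1 has type int (value -2). The constructor-style cast byte(~1) truncates
// it to 0xFE, which is the form the message suggests for constants.
inline byte clear_low_bit(byte x)
{
  return static_cast<byte>(x & byte(~1));
}

// Hypothetical 3-bit bitfield.
struct flags_t {
  unsigned kind : 3;
};

// Masking with the exact bitmask of the bitfield (7 for 3 bits) documents
// that only the low 3 bits are kept.
inline void set_kind(flags_t& f, unsigned v)
{
  f.kind = v & 7;
}
```

In each case the computed value is the same as with the shorter compound assignment; only the narrowing step is spelled out.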
* | | | | MDEV-21907: Fix most clang -Wconversion in InnoDB (Marko Mäkelä, 2020-03-11, 1 file, -1/+1)
| | | | | Declare innodb_purge_threads as 4-byte integer (UINT) instead of 4-or-8-byte (ULONG) and adjust the documentation string.
* | | | | MDEV-15528 Punch holes when pages are freed (Thirunarayanan Balathandayuthapani, 2020-03-10, 1 file, -3/+1)
| | | | | When an InnoDB data file page is freed, its contents become garbage, and any storage allocated in the data file is wasted. During flushing, InnoDB initializes the page with zeros if scrubbing is enabled. If the tablespace is compressed, InnoDB should punch a hole; otherwise, it should skip flushing the freed page.
| | | | | buf_page_t:
| | | | | - Replaced the variables file_page_was_freed and init_on_flush in buf_page_t with a status enum variable.
| | | | | - Changed all debug asserts of file_page_was_freed to DBUG_ASSERT of buf_page_t::status.
| | | | | Removed buf_page_set_file_page_was_freed(), buf_page_reset_file_page_was_freed().
| | | | | buf_page_free(): Newly added function that takes an X-lock on the page before marking the status as FREED, so that the InnoDB flush handler can avoid a concurrent flush of the freed page. Also, while flushing the page, InnoDB makes sure that the redo log record that frees the page is also written to disk. Currently, this function only marks the page as FREED if it is in the buffer pool.
| | | | | buf_flush_freed_page(): Newly added function that writes zeros asynchronously if innodb_immediate_scrub_data_uncompressed is enabled, and punches a hole in the file synchronously if page_compressed is enabled. Reset the io_fix to NORMAL. Release the block from the flush list and the associated mutex before writing zeros or punching a hole in the file.
| | | | | buf_flush_page(): Removed the unnecessary usage of the temporary variable "flush".
| | | | | fil_io(): Introduce a new parameter called punch_hole. It allows fil_io() to punch a hole in the file at the given offset.
| | | | | buf_page_create(): Let the callers assign buf_page_t::status. Every caller should eventually invoke mtr_t::init().
| | | | | fsp_page_create(): Remove the unused mtr_t parameter.
| | | | | In all other callers of buf_page_create() except fsp_page_create(), before invoking mtr_t::init(), invoke mtr_t::sx_latch_at_savepoint() or mtr_t::x_latch_at_savepoint().
| | | | | mtr_t::init(): Initialize buf_page_t::status also for the temporary tablespace (when redo logging is disabled), to avoid assertion failures.
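The decision that this message describes for a freed page at flush time can be summarized in a short sketch. The three status names NORMAL, INIT_ON_FLUSH and FREED appear in the commit messages on this page; everything else (page_status, flush_action, choose_flush_action() and its parameters) is hypothetical. The order in which the two settings are checked is an assumption, since the message does not say which one wins when both are enabled.

```cpp
#include <cassert>

// Status values named in the commit messages; the enum itself is a stand-in.
enum class page_status { NORMAL, INIT_ON_FLUSH, FREED };

// Hypothetical outcomes of flushing one page.
enum class flush_action { WRITE_PAGE, PUNCH_HOLE, WRITE_ZEROS, SKIP };

// Decide what a flush should do with a page.
// page_compressed: the tablespace uses page compression.
// immediate_scrub: stands for innodb_immediate_scrub_data_uncompressed.
inline flush_action choose_flush_action(page_status status,
                                        bool page_compressed,
                                        bool immediate_scrub)
{
  if (status != page_status::FREED)
    return flush_action::WRITE_PAGE;   // ordinary page: write its contents
  if (page_compressed)
    return flush_action::PUNCH_HOLE;   // give the storage back to the file system
  if (immediate_scrub)
    return flush_action::WRITE_ZEROS;  // overwrite the garbage with zeros
  return flush_action::SKIP;           // freed page: nothing worth writing
}
```

A single status value replaces two independent flags, so combinations that make no sense (for example, freed and init-on-flush at once) cannot be represented.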
* | | | | MDEV-12353: Remove support for crash-upgrade [bb-10.5-MDEV-12353] (Marko Mäkelä, 2020-02-13, 1 file, -19/+0)
| | | | | We tighten some assertions regarding dict_index_t::is_dummy and crash recovery, now that redo log processing will no longer create dummy objects.
* | | | | Cleanup ibuf_page_exists(): Take simpler parameters (Marko Mäkelä, 2020-02-13, 1 file, -18/+11)
| | | | |