delta/mariadb-git.git - github.com: MariaDB/server.git

	Commit message	Author	Age	Files	Lines
*	Merge 10.5 into 10.6	Marko Mäkelä	2023-02-27	1	-0/+4
\|\
\| *	MDEV-30671 InnoDB undo log truncation fails to wait for purge of history	Marko Mäkelä	2023-02-24	1	-0/+4
	It is not safe to invoke trx_purge_free_segment() or execute innodb_undo_log_truncate=ON before all undo log records in the rollback segment have been processed.
	A prominent failure that would occur due to premature freeing of undo log pages is that trx_undo_get_undo_rec() would crash when trying to copy an undo log record to fetch the previous version of a record. If trx_undo_get_undo_rec() was not invoked in the unlucky time frame, then the symptom would be that some committed transaction history is never removed. This would be detected by CHECK TABLE...EXTENDED that was implemented in commit ab0190101b0587e0e03b2d75a967050b9a85fd1b. Such a garbage collection leak should be possible even when using innodb_undo_log_truncate=OFF, just involving trx_purge_free_segment().
	trx_rseg_t::needs_purge: Change the type from Boolean to a transaction identifier, noting the most recent non-purged transaction, or 0 if everything has been purged. On transaction start, we initialize this to 1 more than the transaction start ID. On recovery, the field may be adjusted to the transaction end ID (TRX_UNDO_TRX_NO) if it is larger.
	The field TRX_UNDO_NEEDS_PURGE becomes write-only; only some debug assertions would validate the value. The field reflects the old inaccurate Boolean field trx_rseg_t::needs_purge.
	trx_undo_mem_create_at_db_start(), trx_undo_lists_init(), trx_rseg_mem_restore(): Remove the parameter max_trx_id. Instead, store the maximum in trx_rseg_t::needs_purge, where trx_rseg_array_init() will find it.
	trx_purge_free_segment(): Continuously hold a lock on trx_rseg_t to prevent any concurrent allocation of undo log.
	trx_purge_truncate_rseg_history(): Only invoke trx_purge_free_segment() if the rollback segment is empty and there are no pending transactions associated with it.
	trx_purge_truncate_history(): Only proceed with innodb_undo_log_truncate=ON if trx_rseg_t::needs_purge indicates that all history has been purged.
	Tested by: Matthias Leich
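The change of trx_rseg_t::needs_purge from a Boolean to a transaction identifier can be illustrated with a minimal sketch. RsegSketch, on_trx_start(), on_recovery(), history_purged() and purge_limit are hypothetical names invented for this illustration and are not the InnoDB definitions. Taking the maximum on transaction start and the meaning of purge_limit (every transaction with a smaller ID has been purged) are assumptions as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for trx_rseg_t; not the real structure.
using trx_id_t = std::uint64_t;

struct RsegSketch {
  // 0 if everything has been purged; otherwise a bound just above the
  // most recent transaction whose history may still need purging.
  trx_id_t needs_purge = 0;

  // On transaction start: remember 1 more than the transaction start ID.
  // Keeping the maximum is an assumption of this sketch.
  void on_trx_start(trx_id_t start_id) {
    assert(start_id != 0);  // ID 0 is reserved to mean "nothing" here
    needs_purge = std::max(needs_purge, start_id + 1);
  }

  // On recovery: adjust to the transaction end ID if it is larger.
  void on_recovery(trx_id_t trx_no) {
    needs_purge = std::max(needs_purge, trx_no);
  }

  // Assumed check: truncation may proceed only if every transaction
  // below purge_limit has been purged and needs_purge does not exceed it.
  bool history_purged(trx_id_t purge_limit) const {
    return needs_purge <= purge_limit;
  }
};
```

Unlike a Boolean flag, the identifier lets the truncation check compare against how far purge has actually progressed, which is the point of the type change described above.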
* \|	Merge 10.5 into 10.6	Marko Mäkelä	2023-01-03	1	-0/+4
\|\ \ \| \|/
\| *	MDEV-30225 RR isolation violation with locking unique search	Vlad Lesin	2022-12-20	1	-0/+4
	Before the fix, a next-key lock was requested only if a record was delete-marked for a locking unique search in the RR isolation level. There can be several delete-marked records for the same unique key; that is why InnoDB scans the records until either a non-delete-marked record is reached or all delete-marked records with the same unique key are scanned.
	For a range scan, next-key locks are used in RR to protect the scanned range from insertion of new records by other transactions. This is why next-key locks are used for delete-marked records in unique searches.
	If a record is not delete-marked, the requested lock type was "not-gap". When a record is not delete-marked during a lock request by trx 1, and some other transaction holds a conflicting lock, trx 1 creates a waiting not-gap lock on the record and suspends. While trx 1 is suspended, the record can be delete-marked. When the lock is granted on commit or rollback of the conflicting transaction, its type is still "not-gap". So we have a "not-gap" lock on a delete-marked record in RR. This lets some other transaction insert a record with the same unique key while trx 1 is not committed, which can violate the isolation level.
	The fix is to set next-key locks for both delete-marked and non-delete-marked records for unique search in RR.
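The scenario described in this commit message can be modelled in a few lines. This is only a sketch: LockMode, Record, ModeChooser and all function names are hypothetical and do not correspond to InnoDB identifiers. The model assumes that a waiting lock keeps the type chosen when it was enqueued.

```cpp
#include <cassert>

// Hypothetical lock types for this sketch only.
enum class LockMode { NextKey, RecNotGap };

struct Record { bool delete_marked; };

using ModeChooser = LockMode (*)(bool delete_marked);

// Before the fix: next-key lock only for delete-marked records.
inline LockMode choose_before_fix(bool delete_marked) {
  return delete_marked ? LockMode::NextKey : LockMode::RecNotGap;
}

// After the fix: next-key lock regardless of the delete mark (RR).
inline LockMode choose_after_fix(bool /*delete_marked*/) {
  return LockMode::NextKey;
}

// trx 1 enqueues a waiting lock while the record is not delete-marked,
// the record becomes delete-marked during the wait, and the lock is
// then granted with the type that was originally requested.
inline LockMode granted_mode_after_wait(ModeChooser choose) {
  assert(choose != nullptr);
  Record rec{false};
  LockMode requested = choose(rec.delete_marked);
  rec.delete_marked = true;  // another transaction delete-marks it
  return requested;
}

// Only a next-key lock also protects the gap before the record.
inline bool protects_gap(LockMode mode) {
  return mode == LockMode::NextKey;
}
```

With choose_before_fix the granted lock does not protect the gap, so another transaction could insert a record with the same unique key; with choose_after_fix the gap is covered.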
\| *	MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration	Vlad Lesin	2022-02-21	1	-0/+79
	Backported from 10.5 20e9e804c131c6522bc7c469e4863e8d1eaa3ee0 and 5948d7602ec7f61937c368dcb134e6ec226a2990.
	sel_restore_position_for_mysql() moves the persistent cursor position forward after the btr_pcur_restore_position() call if the cursor relative position is BTR_PCUR_ON and the cursor points to a record whose field values are NOT the same as in the stored record (and some other conditions that are not important for this case hold). This was done because btr_pcur_restore_position() sets the page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON before opening the cursor. So we are searching for a record less than or equal to the stored one. If the found record is not equal to the stored one, then it is less, and we need to move the cursor forward.
	But there can be a situation where the stored record was purged, and a new one with the same key but a different value was inserted while row_search_mvcc() was suspended. In this case, when the thread is awakened, it will invoke sel_restore_position_for_mysql(), which, in turn, invokes btr_pcur_restore_position(), which will return false because the found record does not match the stored record, and sel_restore_position_for_mysql() will move the cursor position forward.
	The above can lead to the case where the awakened row_search_mvcc() does not see records inserted by other transactions while it slept. The mtr test case shows an example of how this can happen.
	The fix is to return a special value from the persistent cursor restoring function, which notifies its caller that the unique fields of the restored record and the stored record are the same; in this case sel_restore_position_for_mysql() does not move the cursor forward.
	Delete-marked records are correctly processed in row_search_mvcc(). Non-unique secondary indexes are "uniquified" by adding the PK, so index->n_uniq should then be index->n_fields. So there is no need for additional checks in the fix.
	If the transaction's read view cannot see the changes made in a secondary index record, it requests the clustered index record in row_search_mvcc() to check its transaction id and get the corresponding record version. After this, row_search_mvcc() commits the mtr to preserve the clustered index latching order, and starts a new mtr. Between that mtr commit and start, secondary index pages are unlatched, and purge is able to remove the record stored in the cursor, which causes duplicated rows in the result set for non-locking reads, as the cursor position is restored to the previously visited record.
	To solve this, the changes are simply switched off for non-locking reads. This is quite a simple solution; besides, the changes do not make sense for non-locking reads.
	A more complex solution that is more efficient from the performance perspective is to create an mtr savepoint before requesting the clustered index record and to roll back to that savepoint afterwards. See MDEV-27557.
	One more solution is to have a per-record transaction id for secondary indexes. See MDEV-17598.
	If any of those is implemented, just remove the select_lock_type argument in sel_restore_position_for_mysql().
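The cursor restoration problem and the fix can be sketched on a toy sorted index. Everything here is hypothetical and simplified: Rec, RestoreStatus, Cursor, restore_position() and sel_restore() are illustration names, a record is reduced to one unique key field plus one other field, and the search emulates PAGE_CUR_LE with std::upper_bound. The function returns where the cursor ends up after restoration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// {unique key field, remaining field}; ordered lexicographically.
using Rec = std::pair<int, int>;

// Hypothetical three-way result of restoring the cursor.
enum class RestoreStatus { SameAll, SameUniq, NotSame };

struct Cursor {
  std::size_t pos;
  RestoreStatus status;
};

// Emulates a PAGE_CUR_LE search: position on the last record that is
// less than or equal to the stored one. Assumes such a record exists.
inline Cursor restore_position(const std::vector<Rec>& index,
                               const Rec& stored) {
  auto it = std::upper_bound(index.begin(), index.end(), stored);
  assert(it != index.begin());
  --it;
  RestoreStatus st = (*it == stored) ? RestoreStatus::SameAll
                     : (it->first == stored.first)
                         ? RestoreStatus::SameUniq
                         : RestoreStatus::NotSame;
  return Cursor{static_cast<std::size_t>(it - index.begin()), st};
}

// Returns the cursor position after restoration.
// fixed == false: move forward whenever the record is not identical.
// fixed == true:  do not move if the unique fields still match.
inline std::size_t sel_restore(const std::vector<Rec>& index,
                               const Rec& stored, bool fixed) {
  Cursor c = restore_position(index, stored);
  bool move_forward = fixed ? c.status == RestoreStatus::NotSame
                            : c.status != RestoreStatus::SameAll;
  return move_forward ? c.pos + 1 : c.pos;
}
```

For example, if the stored record {10, 5} was purged and {10, 3} was inserted in the meantime, the old logic steps past {10, 3}, while the fixed logic stays on it so that the caller can process it.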
*	MDEV-29081 trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end()	Vlad Lesin	2022-08-24	1	-3/+3
	The issue is that trx_t::lock.was_chosen_as_deadlock_victim can be reset before the transaction checks it and sets trx_t::error_state.
	The fix is to reset trx_t::lock.was_chosen_as_deadlock_victim only in trx_t::commit_in_memory(), which is invoked on full rollback. There is also no need for a separate bit in trx_t::lock.was_chosen_as_deadlock_victim to flag that the transaction was chosen as a victim of Galera conflict resolution; the same variable can be used for both cases, except in debug builds. In debug builds we need to distinguish deadlock victims from Galera abort victims for debug checks. Also, there is no need to check for deadlock in lock_table_enqueue_waiting() for Galera, as the corresponding check is present in lock_wait().
	The local variable "error_state" in lock_wait() was replaced with trx->error_state, because before the replacement lock_sys_t::cancel<false>(trx, lock) and lock_sys.deadlock_check() could change trx->error_state, which could then be overwritten with the value of the local "error_state" variable.
	The lock_wait_suspend_thread_enter DEBUG_SYNC point name is misleading, because lock_wait_suspend_thread was eliminated in e71e613. It was renamed to lock_wait_start.
	Reviewed by: Marko Mäkelä, Jan Lindström.
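The overwrite of trx->error_state by a stale local copy can be reproduced in a reduced model. This is a sketch under stated assumptions: Err, TrxSketch, deadlock_check_sketch() and both lock_wait variants are hypothetical names, and the deadlock check is assumed to always pick the transaction as the victim.

```cpp
#include <cassert>

// Hypothetical error codes and transaction object for this sketch.
enum class Err { Success, Deadlock };

struct TrxSketch {
  Err error_state = Err::Success;
};

// Stand-in for a callee that may change trx.error_state, here by
// always choosing the transaction as the deadlock victim.
inline void deadlock_check_sketch(TrxSketch& trx) {
  trx.error_state = Err::Deadlock;
}

// Old pattern: a local snapshot is written back after the callee ran,
// so the update made by the callee is lost.
inline Err lock_wait_with_local(TrxSketch& trx) {
  Err error_state = trx.error_state;  // snapshot taken too early
  deadlock_check_sketch(trx);
  trx.error_state = error_state;      // stale value overwrites Deadlock
  return trx.error_state;
}

// New pattern: work on trx.error_state directly, without a local copy.
inline Err lock_wait_direct(TrxSketch& trx) {
  assert(trx.error_state == Err::Success);  // no error pending on entry
  deadlock_check_sketch(trx);
  return trx.error_state;
}
```

In this model lock_wait_with_local() reports success although the transaction was chosen as a victim, while lock_wait_direct() reports the deadlock.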
*	MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration	Vlad Lesin	2022-02-14	1	-0/+79
	sel_restore_position_for_mysql() moves the persistent cursor position forward after the btr_pcur_restore_position() call if the cursor relative position is BTR_PCUR_ON and the cursor points to a record whose field values are NOT the same as in the stored record (and some other conditions that are not important for this case hold). This was done because btr_pcur_restore_position() sets the page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON before opening the cursor. So we are searching for a record less than or equal to the stored one. If the found record is not equal to the stored one, then it is less, and we need to move the cursor forward.
	But there can be a situation where the stored record was purged, and a new one with the same key but a different value was inserted while row_search_mvcc() was suspended. In this case, when the thread is awakened, it will invoke sel_restore_position_for_mysql(), which, in turn, invokes btr_pcur_restore_position(), which will return false because the found record does not match the stored record, and sel_restore_position_for_mysql() will move the cursor position forward.
	The above can lead to the case where the awakened row_search_mvcc() does not see records inserted by other transactions while it slept. The mtr test case shows an example of how this can happen.
	The fix is to return a special value from the persistent cursor restoring function, which notifies its caller that the unique fields of the restored record and the stored record are the same; in this case sel_restore_position_for_mysql() does not move the cursor forward.
	Delete-marked records are correctly processed in row_search_mvcc(). Non-unique secondary indexes are "uniquified" by adding the PK, so index->n_uniq should then be index->n_fields. So there is no need for additional checks in the fix.
	If the transaction's read view cannot see the changes made in a secondary index record, it requests the clustered index record in row_search_mvcc() to check its transaction id and get the corresponding record version. After this, row_search_mvcc() commits the mtr to preserve the clustered index latching order, and starts a new mtr. Between that mtr commit and start, secondary index pages are unlatched, and purge is able to remove the record stored in the cursor, which causes duplicated rows in the result set for non-locking reads, as the cursor position is restored to the previously visited record.
	To solve this, the changes are simply switched off for non-locking reads. This is quite a simple solution; besides, the changes do not make sense for non-locking reads.
	A more complex solution that is more efficient from the performance perspective is to create an mtr savepoint before requesting the clustered index record and to roll back to that savepoint afterwards. See MDEV-27557.
	One more solution is to have a per-record transaction id for secondary indexes. See MDEV-17598.
	If any of those is implemented, just remove the select_lock_type argument in sel_restore_position_for_mysql().