diff options
author | Peter Geoghegan <pg@bowt.ie> | 2021-03-10 16:27:01 -0800 |
---|---|---|
committer | Peter Geoghegan <pg@bowt.ie> | 2021-03-10 16:27:01 -0800 |
commit | 9f3665fbfc34b963933e51778c7feaa8134ac885 (patch) | |
tree | f06201f72fc31718beb0b17f48facb270c28df26 /src/backend/access/nbtree/nbtree.c | |
parent | 845ac7f847a25505e91f30dca4e0330b25785ee0 (diff) | |
download | postgresql-9f3665fbfc34b963933e51778c7feaa8134ac885.tar.gz |
Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples). Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).
The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated. This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient. The core code owns that. (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)
This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so. This has been tied to
a regression with an append-only insert benchmark [1].
Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate. This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete). This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.
[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
Diffstat (limited to 'src/backend/access/nbtree/nbtree.c')
-rw-r--r-- | src/backend/access/nbtree/nbtree.c | 70 |
1 files changed, 21 insertions, 49 deletions
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index 504f5bef17..c70647d6f3 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -789,11 +789,8 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info) Buffer metabuf; Page metapg; BTMetaPageData *metad; - BTOptions *relopts; - float8 cleanup_scale_factor; uint32 btm_version; BlockNumber prev_num_delpages; - float8 prev_num_heap_tuples; /* * Copy details from metapage to local variables quickly. @@ -816,33 +813,9 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info) } prev_num_delpages = metad->btm_last_cleanup_num_delpages; - prev_num_heap_tuples = metad->btm_last_cleanup_num_heap_tuples; _bt_relbuf(info->index, metabuf); /* - * If the underlying table has received a sufficiently high number of - * insertions since the last VACUUM operation that called btvacuumscan(), - * then have the current VACUUM operation call btvacuumscan() now. This - * happens when the statistics are deemed stale. - * - * XXX: We should have a more principled way of determining what - * "staleness" means. The vacuum_cleanup_index_scale_factor GUC (and the - * index-level storage param) seem hard to tune in a principled way. - */ - relopts = (BTOptions *) info->index->rd_options; - cleanup_scale_factor = (relopts && - relopts->vacuum_cleanup_index_scale_factor >= 0) - ? relopts->vacuum_cleanup_index_scale_factor - : vacuum_cleanup_index_scale_factor; - - if (cleanup_scale_factor <= 0 || - info->num_heap_tuples < 0 || - prev_num_heap_tuples <= 0 || - (info->num_heap_tuples - prev_num_heap_tuples) / - prev_num_heap_tuples >= cleanup_scale_factor) - return true; - - /* * Trigger cleanup in rare cases where prev_num_delpages exceeds 5% of the * total size of the index. We can reasonably expect (though are not * guaranteed) to be able to recycle this many pages if we decide to do a @@ -925,48 +898,45 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats) /* * Since we aren't going to actually delete any leaf items, there's no - * need to go through all the vacuum-cycle-ID pushups here + * need to go through all the vacuum-cycle-ID pushups here. + * + * Posting list tuples are a source of inaccuracy for cleanup-only + * scans. btvacuumscan() will assume that the number of index tuples + * from each page can be used as num_index_tuples, even though + * num_index_tuples is supposed to represent the number of TIDs in the + * index. This naive approach can underestimate the number of tuples + * in the index significantly. + * + * We handle the problem by making num_index_tuples an estimate in + * cleanup-only case. */ stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult)); btvacuumscan(info, stats, NULL, NULL, 0); + stats->estimated_count = true; } /* * By here, we know for sure that this VACUUM operation won't be skipping - * its btvacuumscan() call. Maintain the count of the current number of - * heap tuples in the metapage. Also maintain the num_delpages value. + * its btvacuumscan() call. Maintain num_delpages value in metapage. * This information will be used by _bt_vacuum_needs_cleanup() during * future VACUUM operations that don't need to call btbulkdelete(). * * num_delpages is the number of deleted pages now in the index that were * not safe to place in the FSM to be recycled just yet. We expect that * it will almost certainly be possible to place all of these pages in the - * FSM during the next VACUUM operation. That factor alone might cause - * _bt_vacuum_needs_cleanup() to force the next VACUUM to proceed with a - * btvacuumscan() call. - * - * Note: We must delay the _bt_set_cleanup_info() call until this late - * stage of VACUUM (the btvacuumcleanup() phase), to keep num_heap_tuples - * accurate. The btbulkdelete()-time num_heap_tuples value is generally - * just pg_class.reltuples for the heap relation _before_ VACUUM began. - * In general cleanup info should describe the state of the index/table - * _after_ VACUUM finishes. + * FSM during the next VACUUM operation. _bt_vacuum_needs_cleanup() will + * force the next VACUUM to consider this before allowing btvacuumscan() + * to be skipped entirely. */ Assert(stats->pages_deleted >= stats->pages_free); num_delpages = stats->pages_deleted - stats->pages_free; - _bt_set_cleanup_info(info->index, num_delpages, info->num_heap_tuples); + _bt_set_cleanup_info(info->index, num_delpages); /* * It's quite possible for us to be fooled by concurrent page splits into * double-counting some index tuples, so disbelieve any total that exceeds * the underlying heap's count ... if we know that accurately. Otherwise * this might just make matters worse. - * - * Posting list tuples are another source of inaccuracy. Cleanup-only - * btvacuumscan calls assume that the number of index tuples can be used - * as num_index_tuples, even though num_index_tuples is supposed to - * represent the number of TIDs in the index. This naive approach can - * underestimate the number of tuples in the index. */ if (!info->estimated_count) { @@ -1016,7 +986,6 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats, * pages in the index at the end of the VACUUM command.) */ stats->num_pages = 0; - stats->estimated_count = false; stats->num_index_tuples = 0; stats->pages_deleted = 0; stats->pages_free = 0; @@ -1421,7 +1390,10 @@ backtrack: * We don't count the number of live TIDs during cleanup-only calls to * btvacuumscan (i.e. when callback is not set). We count the number * of index tuples directly instead. This avoids the expense of - * directly examining all of the tuples on each page. + * directly examining all of the tuples on each page. VACUUM will + * treat num_index_tuples as an estimate in cleanup-only case, so it + * doesn't matter that this underestimates num_index_tuples + * significantly in some cases. */ if (minoff > maxoff) attempt_pagedel = (blkno == scanblkno); |