author | Alvaro Herrera <alvherre@alvh.no-ip.org> | 2013-12-16 11:29:50 -0300 |
---|---|---|
committer | Alvaro Herrera <alvherre@alvh.no-ip.org> | 2013-12-16 11:29:50 -0300 |
commit | 3b97e6823b949624afdc3ce4c92b29a80429715f (patch) | |
tree | a17cfce57aa3d963b0f7ab09c4b2649ed0a9eb50 /src/backend/access/rmgrdesc | |
parent | 30b96549ab41ce23399256d4ea9723a05c139558 (diff) | |
Rework tuple freezing protocol

Tuple freezing was broken in connection to MultiXactIds; commit
8e53ae025de9 tried to fix it, but didn't go far enough. As noted by
Noah Misch, freezing a tuple whose Xmax is a multi containing an aborted
update might cause locks in the multi to go ignored by later
transactions. This is because the code depended on a multixact above
their cutoff point not having any lock-only member older than the cutoff
point for Xids, which is easily defeated in READ COMMITTED transactions.

The fix for this involves creating a new MultiXactId when necessary.
But this cannot be done during WAL replay, and moreover multixact
examination requires using CLOG access routines which are not supposed
to be used during WAL replay either; so tuple freezing cannot be done
with the old freeze WAL record. Therefore, separate the freezing
computation from its execution, and change the WAL record to carry all
necessary information. At WAL replay time, it's easy to re-execute
freezing because we don't need to re-compute the new infomask/Xmax
values but just take them from the WAL record.

While at it, restructure the coding to ensure all page changes occur in
a single critical section without much room for failures. The previous
coding wasn't using a critical section, without any explanation as to
why this was acceptable.

In replication scenarios using the 9.3 branch, standby servers must be
upgraded before their master, so that they are prepared to deal with the
new WAL record once the master is upgraded; failure to do so will cause
WAL replay to die with a PANIC message. Later upgrade of the standby
will allow the process to continue where it left off, so there's no
disruption of the data in the standby in any case. Standbys know how to
deal with the old WAL record, so it's okay to keep the master running
the old code for a while.

In master, the old freeze WAL record is gone, for cleanliness' sake;
there's no compatibility concern there.

Backpatch to 9.3, where the original bug was introduced and where the
previous fix was backpatched.

Álvaro Herrera and Andres Freund
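
The split between computing a freeze and executing it, as described in the commit message, can be illustrated with a small standalone sketch. This is not the PostgreSQL code: `MockTuple`, `FreezePlan`, `prepare_freeze`, `execute_freeze`, `freeze_page`, `replay_freeze`, and the flag and XID constants are all hypothetical stand-ins, XID comparison ignores wraparound, and MultiXactIds are not modelled at all. It only shows the shape of the idea: the decision is made once, recorded as a plan, and replay applies the recorded plan without re-deriving anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the real PostgreSQL definitions. */
typedef uint32_t Xid;

#define INVALID_XID       ((Xid) 0)
#define FROZEN_XID        ((Xid) 2)
#define MASK_XMAX_INVALID 0x0001u

typedef struct
{
    Xid      xmin;
    Xid      xmax;
    uint16_t infomask;
} MockTuple;

/* Everything replay needs to know; this is what the "WAL record" carries. */
typedef struct
{
    uint16_t offset;        /* which tuple on the page */
    bool     freeze_xmin;   /* replace xmin with FROZEN_XID? */
    Xid      new_xmax;      /* precomputed xmax */
    uint16_t new_infomask;  /* precomputed infomask */
} FreezePlan;

/*
 * Phase 1: decide what to do, without touching the tuple.
 * Plain "<" is used for XID comparison; real code must handle wraparound.
 * Returns true if the tuple needs any change.
 */
static bool
prepare_freeze(const MockTuple *tup, uint16_t offset, Xid cutoff,
               FreezePlan *plan)
{
    bool changed = false;

    plan->offset = offset;
    plan->freeze_xmin = false;
    plan->new_xmax = tup->xmax;
    plan->new_infomask = tup->infomask;

    if (tup->xmin != FROZEN_XID && tup->xmin < cutoff)
    {
        plan->freeze_xmin = true;
        changed = true;
    }
    if (tup->xmax != INVALID_XID && tup->xmax < cutoff)
    {
        plan->new_xmax = INVALID_XID;
        plan->new_infomask |= MASK_XMAX_INVALID;
        changed = true;
    }
    return changed;
}

/* Phase 2: apply a plan. No lookups, no decisions, nothing that can fail. */
static void
execute_freeze(MockTuple *tup, const FreezePlan *plan)
{
    if (plan->freeze_xmin)
        tup->xmin = FROZEN_XID;
    tup->xmax = plan->new_xmax;
    tup->infomask = plan->new_infomask;
}

/* Primary side: compute all plans first, then apply them in one pass. */
static size_t
freeze_page(MockTuple *page, size_t ntuples, Xid cutoff, FreezePlan *plans)
{
    size_t nplans = 0;

    for (size_t i = 0; i < ntuples; i++)
        if (prepare_freeze(&page[i], (uint16_t) i, cutoff, &plans[nplans]))
            nplans++;

    /* In a real system the page changes and the WAL insert would sit here,
     * inside a single critical section. */
    for (size_t i = 0; i < nplans; i++)
        execute_freeze(&page[plans[i].offset], &plans[i]);

    return nplans;
}

/* Replay side: apply the recorded plans verbatim. */
static void
replay_freeze(MockTuple *page, size_t ntuples,
              const FreezePlan *plans, size_t nplans)
{
    for (size_t i = 0; i < nplans; i++)
    {
        assert(plans[i].offset < ntuples);
        execute_freeze(&page[plans[i].offset], &plans[i]);
    }
}
```

Because `replay_freeze` only copies precomputed values, it needs neither a cutoff nor any transaction-status lookup, which mirrors why the reworked record can be replayed where the old one could not.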
Diffstat (limited to 'src/backend/access/rmgrdesc')
-rw-r--r-- | src/backend/access/rmgrdesc/heapdesc.c | 16 |
1 files changed, 8 insertions, 8 deletions
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 39c53d0022..4a86b8527d 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -131,23 +131,23 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
 	uint8		info = xl_info & ~XLR_INFO_MASK;
 
 	info &= XLOG_HEAP_OPMASK;
-	if (info == XLOG_HEAP2_FREEZE)
+	if (info == XLOG_HEAP2_CLEAN)
 	{
-		xl_heap_freeze *xlrec = (xl_heap_freeze *) rec;
+		xl_heap_clean *xlrec = (xl_heap_clean *) rec;
 
-		appendStringInfo(buf, "freeze: rel %u/%u/%u; blk %u; cutoff xid %u multi %u",
+		appendStringInfo(buf, "clean: rel %u/%u/%u; blk %u remxid %u",
 						 xlrec->node.spcNode, xlrec->node.dbNode,
 						 xlrec->node.relNode, xlrec->block,
-						 xlrec->cutoff_xid, xlrec->cutoff_multi);
+						 xlrec->latestRemovedXid);
 	}
-	else if (info == XLOG_HEAP2_CLEAN)
+	else if (info == XLOG_HEAP2_FREEZE_PAGE)
 	{
-		xl_heap_clean *xlrec = (xl_heap_clean *) rec;
+		xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
 
-		appendStringInfo(buf, "clean: rel %u/%u/%u; blk %u remxid %u",
+		appendStringInfo(buf, "freeze_page: rel %u/%u/%u; blk %u; cutoff xid %u ntuples %u",
 						 xlrec->node.spcNode, xlrec->node.dbNode,
 						 xlrec->node.relNode, xlrec->block,
-						 xlrec->latestRemovedXid);
+						 xlrec->cutoff_xid, xlrec->ntuples);
 	}
 	else if (info == XLOG_HEAP2_CLEANUP_INFO)
 	{
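
To see what the new `freeze_page` description line looks like, the format string from the hunk above can be exercised outside the backend. The sketch below is an approximation: `MockRelFileNode`, `MockFreezePageRecord`, and `describe_freeze_page` are hypothetical names, the field widths are assumptions, and `snprintf` stands in for `appendStringInfo`. Only the format string and the field names are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mock-ups; field names follow the diff, widths are assumed. */
typedef struct
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} MockRelFileNode;

typedef struct
{
    MockRelFileNode node;
    uint32_t        block;
    uint32_t        cutoff_xid;
    uint16_t        ntuples;
} MockFreezePageRecord;

/*
 * Format a record the way the new heap2_desc() branch does, writing into a
 * caller-supplied buffer. Returns snprintf's result: the number of characters
 * the full description needs, excluding the terminating NUL.
 */
static int
describe_freeze_page(char *buf, size_t buflen,
                     const MockFreezePageRecord *xlrec)
{
    assert(buf != NULL && xlrec != NULL);

    return snprintf(buf, buflen,
                    "freeze_page: rel %u/%u/%u; blk %u; cutoff xid %u ntuples %u",
                    (unsigned) xlrec->node.spcNode,
                    (unsigned) xlrec->node.dbNode,
                    (unsigned) xlrec->node.relNode,
                    (unsigned) xlrec->block,
                    (unsigned) xlrec->cutoff_xid,
                    (unsigned) xlrec->ntuples);
}
```

Compared with the old `freeze:` line, the description no longer prints a multixact cutoff and instead reports how many tuples the record freezes.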