summaryrefslogtreecommitdiff
path: root/src/include
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2019-05-11 21:27:13 -0400
committerTom Lane <tgl@sss.pgh.pa.us>2019-05-11 21:27:13 -0400
commit27d3394b0d8979e632d562927f1438f1befcb84a (patch)
treeb393cac6f726fa706306cff1ef7a4490389d51d6 /src/include
parent8dab099bb49c9f8784842c246f04c0e0a0f7c8f9 (diff)
downloadpostgresql-27d3394b0d8979e632d562927f1438f1befcb84a.tar.gz
Rearrange pgstat_bestart() to avoid failures within its critical section.
We long ago decided to design the shared PgBackendStatus data structure to minimize the cost of writing status updates, which means that writers just have to increment the st_changecount field twice. That isn't hooked into any sort of resource management mechanism, which means that if something were to throw error between the two increments, the st_changecount field would be left odd indefinitely. That would cause readers to lock up. Now, since it's also a bad idea to leave the field odd for longer than absolutely necessary (because readers will spin while we have it set), the expectation was that we'd treat these segments like spinlock critical sections, with only short, more or less straight-line, code in them. That was fine as originally designed, but commit 9029f4b37 broke it by inserting a significant amount of non-straight-line code into pgstat_bestart(), code that is very capable of throwing errors, not to mention taking a significant amount of time during which readers will spin. We have a report from Neeraj Kumar of readers actually locking up, which I suspect was due to an encoding conversion error in X509_NAME_to_cstring, though conceivably it was just a garden-variety OOM failure. Subsequent commits have loaded even more dubious code into pgstat_bestart's critical section (and commit fc70a4b0d deserves some kind of booby prize for managing to miss the critical section entirely, although the negative consequences seem minimal given that the PgBackendStatus entry should be seen by readers as inactive at that point). The right way to fix this mess seems to be to compute all these values into a local copy of the process' PgBackendStatus struct, and then just copy the data back within the critical section proper. This plan can't be implemented completely cleanly because of the struct's heavy reliance on out-of-line strings, which we must initialize separately within the critical section. But still, the critical section is far smaller and safer than it was before. In hopes of forestalling future errors of the same ilk, rename the macros for st_changecount management to make it more apparent that the writer-side macros create a critical section. And to prevent the worst consequences if we nonetheless manage to mess it up anyway, adjust those macros so that they really are a critical section, ie they now bump CritSectionCount. That doesn't add much overhead, and it guarantees that if we do somehow throw an error while the counter is odd, it will lead to PANIC and a database restart to reset shared memory. Back-patch to 9.5 where the problem was introduced. In HEAD, also fix an oversight in commit b0b39f72b: it failed to teach pgstat_read_current_status to copy st_gssstatus data from shared memory to local memory. Hence, subsequent use of that data within the transaction would potentially see changing data that it shouldn't see. Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
Diffstat (limited to 'src/include')
-rw-r--r--src/include/pgstat.h91
1 files changed, 63 insertions, 28 deletions
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 410b7a8791..d689b52719 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -780,11 +780,12 @@ typedef struct PgBackendStatus
* the copy is valid; otherwise start over. This makes updates cheap
* while reads are potentially expensive, but that's the tradeoff we want.
*
- * The above protocol needs the memory barriers to ensure that the
- * apparent order of execution is as it desires. Otherwise, for example,
- * the CPU might rearrange the code so that st_changecount is incremented
- * twice before the modification on a machine with weak memory ordering.
- * This surprising result can lead to bugs.
+ * The above protocol needs memory barriers to ensure that the apparent
+ * order of execution is as it desires. Otherwise, for example, the CPU
+ * might rearrange the code so that st_changecount is incremented twice
+ * before the modification on a machine with weak memory ordering. Hence,
+ * use the macros defined below for manipulating st_changecount, rather
+ * than touching it directly.
*/
int st_changecount;
@@ -831,42 +832,76 @@ typedef struct PgBackendStatus
} PgBackendStatus;
/*
- * Macros to load and store st_changecount with the memory barriers.
+ * Macros to load and store st_changecount with appropriate memory barriers.
*
- * pgstat_increment_changecount_before() and
- * pgstat_increment_changecount_after() need to be called before and after
- * PgBackendStatus entries are modified, respectively. This makes sure that
- * st_changecount is incremented around the modification.
+ * Use PGSTAT_BEGIN_WRITE_ACTIVITY() before, and PGSTAT_END_WRITE_ACTIVITY()
+ * after, modifying the current process's PgBackendStatus data. Note that,
+ * since there is no mechanism for cleaning up st_changecount after an error,
+ * THESE MACROS FORM A CRITICAL SECTION. Any error between them will be
+ * promoted to PANIC, causing a database restart to clean up shared memory!
+ * Hence, keep the critical section as short and straight-line as possible.
+ * Aside from being safer, that minimizes the window in which readers will
+ * have to loop.
*
- * Also pgstat_save_changecount_before() and pgstat_save_changecount_after()
- * need to be called before and after PgBackendStatus entries are copied into
- * private memory, respectively.
+ * Reader logic should follow this sketch:
+ *
+ * for (;;)
+ * {
+ * int before_ct, after_ct;
+ *
+ * pgstat_begin_read_activity(beentry, before_ct);
+ * ... copy beentry data to local memory ...
+ * pgstat_end_read_activity(beentry, after_ct);
+ * if (pgstat_read_activity_complete(before_ct, after_ct))
+ * break;
+ * CHECK_FOR_INTERRUPTS();
+ * }
+ *
+ * For extra safety, we generally use volatile beentry pointers, although
+ * the memory barriers should theoretically be sufficient.
*/
-#define pgstat_increment_changecount_before(beentry) \
- do { \
- beentry->st_changecount++; \
+#define PGSTAT_BEGIN_WRITE_ACTIVITY(beentry) \
+ do { \
+ START_CRIT_SECTION(); \
+ (beentry)->st_changecount++; \
pg_write_barrier(); \
} while (0)
-#define pgstat_increment_changecount_after(beentry) \
- do { \
+#define PGSTAT_END_WRITE_ACTIVITY(beentry) \
+ do { \
pg_write_barrier(); \
- beentry->st_changecount++; \
- Assert((beentry->st_changecount & 1) == 0); \
+ (beentry)->st_changecount++; \
+ Assert(((beentry)->st_changecount & 1) == 0); \
+ END_CRIT_SECTION(); \
} while (0)
-#define pgstat_save_changecount_before(beentry, save_changecount) \
- do { \
- save_changecount = beentry->st_changecount; \
- pg_read_barrier(); \
+#define pgstat_begin_read_activity(beentry, before_changecount) \
+ do { \
+ (before_changecount) = (beentry)->st_changecount; \
+ pg_read_barrier(); \
} while (0)
-#define pgstat_save_changecount_after(beentry, save_changecount) \
- do { \
- pg_read_barrier(); \
- save_changecount = beentry->st_changecount; \
+#define pgstat_end_read_activity(beentry, after_changecount) \
+ do { \
+ pg_read_barrier(); \
+ (after_changecount) = (beentry)->st_changecount; \
} while (0)
+#define pgstat_read_activity_complete(before_changecount, after_changecount) \
+ ((before_changecount) == (after_changecount) && \
+ ((before_changecount) & 1) == 0)
+
+/* deprecated names for above macros; these are gone in v12 */
+#define pgstat_increment_changecount_before(beentry) \
+ PGSTAT_BEGIN_WRITE_ACTIVITY(beentry)
+#define pgstat_increment_changecount_after(beentry) \
+ PGSTAT_END_WRITE_ACTIVITY(beentry)
+#define pgstat_save_changecount_before(beentry, save_changecount) \
+ pgstat_begin_read_activity(beentry, save_changecount)
+#define pgstat_save_changecount_after(beentry, save_changecount) \
+ pgstat_end_read_activity(beentry, save_changecount)
+
+
/* ----------
* LocalPgBackendStatus
*