1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
|
---
stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Understanding EXPLAIN plans
PostgreSQL allows you to obtain query plans using the `EXPLAIN` command. This
command can be invaluable when trying to determine how a query performs.
You can use this command directly in your SQL query, as long as the query starts
with it:
```sql
EXPLAIN
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```
When running this on GitLab.com, we are presented with the following output:
```sql
Aggregate (cost=922411.76..922411.77 rows=1 width=8)
-> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
```
When using _just_ `EXPLAIN`, PostgreSQL does not actually execute our query,
instead it produces an _estimated_ execution plan based on the available
statistics. This means the actual plan can differ quite a bit. Fortunately,
PostgreSQL provides us with the option to execute the query as well. To do so,
we need to use `EXPLAIN ANALYZE` instead of just `EXPLAIN`:
```sql
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```
This produces:
```sql
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
-> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
Planning time: 2.861 ms
Execution time: 3428.596 ms
```
As we can see this plan is quite different, and includes a lot more data. Let's
discuss this step by step.
Because `EXPLAIN ANALYZE` executes the query, care should be taken when using a
query that writes data or might time out. If the query modifies data,
consider wrapping it in a transaction that rolls back automatically like so:
```sql
BEGIN;
EXPLAIN ANALYZE
DELETE FROM users WHERE id = 1;
ROLLBACK;
```
The `EXPLAIN` command also takes additional options, such as `BUFFERS`:
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```
This then produces:
```sql
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
Buffers: shared hit=208846
-> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
Buffers: shared hit=208846
Planning time: 2.861 ms
Execution time: 3428.596 ms
```
For more information, refer to the official
[`EXPLAIN` documentation](https://www.postgresql.org/docs/current/sql-explain.html)
and [using `EXPLAIN` guide](https://www.postgresql.org/docs/current/using-explain.html).
## Nodes
Every query plan consists of nodes. Nodes can be nested, and are executed from
the inside out. This means that the innermost node is executed before an outer
node. This can be best thought of as nested function calls, returning their
results as they unwind. For example, a plan starting with an `Aggregate`
followed by a `Nested Loop`, followed by an `Index Only scan` can be thought of
as the following Ruby code:
```ruby
aggregate(
nested_loop(
index_only_scan()
index_only_scan()
)
)
```
Nodes are indicated using a `->` followed by the type of node taken. For
example:
```sql
Aggregate (cost=922411.76..922411.77 rows=1 width=8)
-> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
```
Here the first node executed is `Seq scan on projects`. The `Filter:` is an
additional filter applied to the results of the node. A filter is very similar
to Ruby's `Array#select`: it takes the input rows, applies the filter, and
produces a new list of rows. After the node is done, we perform the `Aggregate`
above it.
Nested nodes look like this:
```sql
Aggregate (cost=176.97..176.98 rows=1 width=8) (actual time=0.252..0.252 rows=1 loops=1)
Buffers: shared hit=155
-> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
Buffers: shared hit=155
-> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
Index Cond: (id < 100)
Heap Fetches: 0
-> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
Index Cond: (id = users_1.id)
Heap Fetches: 0
Planning time: 2.585 ms
Execution time: 0.310 ms
```
Here we first perform two separate "Index Only" scans, followed by performing a
"Nested Loop" on the result of these two scans.
## Node statistics
Each node in a plan has a set of associated statistics, such as the cost, the
number of rows produced, the number of loops performed, and more. For example:
```sql
Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0)
```
Here we can see that our cost ranges from `0.00..908044.47` (we cover this in
a moment), and we estimate (since we're using `EXPLAIN` and not `EXPLAIN
ANALYZE`) a total of 5,746,914 rows to be produced by this node. The `width`
statistics describes the estimated width of each row, in bytes.
The `costs` field specifies how expensive a node was. The cost is measured in
arbitrary units determined by the query planner's cost parameters. What
influences the costs depends on a variety of settings, such as `seq_page_cost`,
`cpu_tuple_cost`, and various others.
The format of the costs field is as follows:
```sql
STARTUP COST..TOTAL COST
```
The startup cost states how expensive it was to start the node, with the total
cost describing how expensive the entire node was. In general: the greater the
values, the more expensive the node.
When using `EXPLAIN ANALYZE`, these statistics also include the actual time
(in milliseconds) spent, and other runtime statistics (for example, the actual number of
produced rows):
```sql
Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
```
Here we can see we estimated 5,746,969 rows to be returned, but in reality we
returned 5,746,940 rows. We can also see that _just_ this sequential scan took
2.98 seconds to run.
Using `EXPLAIN (ANALYZE, BUFFERS)` also gives us information about the
number of rows removed by a filter, the number of buffers used, and more. For
example:
```sql
Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
Buffers: shared hit=208846
```
Here we can see that our filter has to remove 65,677 rows, and that we use
208,846 buffers. Each buffer in PostgreSQL is 8 KB (8192 bytes), meaning our
above node uses *1.6 GB of buffers*. That's a lot!
Keep in mind that some statistics are per-loop averages, while others are total values:
| Field name | Value type |
| --- | --- |
| Actual Total Time | per-loop average |
| Actual Rows | per-loop average |
| Buffers Shared Hit | total value |
| Buffers Shared Read | total value |
| Buffers Shared Dirtied | total value |
| Buffers Shared Written | total value |
| I/O Read Time | total value |
| I/O Read Write | total value |
For example:
```sql
-> Index Scan using users_pkey on public.users (cost=0.43..3.44 rows=1 width=1318) (actual time=0.025..0.025 rows=1 loops=888)
Index Cond: (users.id = issues.author_id)
Buffers: shared hit=3543 read=9
I/O Timings: read=17.760 write=0.000
```
Here we can see that this node used 3552 buffers (3543 + 9), returned 888 rows (`888 * 1`), and the actual duration was 22.2 milliseconds (`888 * 0.025`).
17.76 milliseconds of the total duration was spent in reading from disk, to retrieve data that was not in the cache.
## Node types
There are quite a few different types of nodes, so we only cover some of the
more common ones here.
A full list of all the available nodes and their descriptions can be found in
the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h).
pgMustard's [EXPLAIN docs](https://www.pgmustard.com/docs/explain) also offer detailed look into nodes and their fields.
### Seq Scan
A sequential scan over (a chunk of) a database table. This is like using
`Array#each`, but on a database table. Sequential scans can be quite slow when
retrieving lots of rows, so it's best to avoid these for large tables.
### Index Only Scan
A scan on an index that did not require fetching anything from the table. In
certain cases an index only scan may still fetch data from the table, in this
case the node includes a `Heap Fetches:` statistic.
### Index Scan
A scan on an index that required retrieving some data from the table.
### Bitmap Index Scan and Bitmap Heap scan
Bitmap scans fall between sequential scans and index scans. These are typically
used when we would read too much data from an index scan, but too little to
perform a sequential scan. A bitmap scan uses what is known as a
[bitmap index](https://en.wikipedia.org/wiki/Bitmap_index) to perform its work.
The [source code of PostgreSQL](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441)
states the following on bitmap scans:
> Bitmap Index Scan delivers a bitmap of potential tuple locations; it does not
> access the heap itself. The bitmap is used by an ancestor Bitmap Heap Scan
> node, possibly after passing through intermediate Bitmap And and/or Bitmap Or
> nodes to combine it with the results of other Bitmap Index Scans.
### Limit
Applies a `LIMIT` on the input rows.
### Sort
Sorts the input rows as specified using an `ORDER BY` statement.
### Nested Loop
A nested loop executes its child nodes for every row produced by a node that
precedes it. For example:
```sql
-> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
Buffers: shared hit=155
-> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
Index Cond: (id < 100)
Heap Fetches: 0
-> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
Index Cond: (id = users_1.id)
Heap Fetches: 0
```
Here the first child node (`Index Only Scan using users_pkey on users users_1`)
produces 36 rows, and is executed once (`rows=36 loops=1`). The next node
produces 1 row (`rows=1`), but is repeated 36 times (`loops=36`). This is
because the previous node produced 36 rows.
This means that nested loops can quickly slow the query down if the various
child nodes keep producing many rows.
## Optimising queries
With that out of the way, let's see how we can optimise a query. Let's use the
following query as an example:
```sql
SELECT COUNT(*)
FROM users
WHERE twitter != '';
```
This query counts the number of users that have a Twitter profile set.
Let's run this using `EXPLAIN (ANALYZE, BUFFERS)`:
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM users
WHERE twitter != '';
```
This produces the following plan:
```sql
Aggregate (cost=845110.21..845110.22 rows=1 width=8) (actual time=1271.157..1271.158 rows=1 loops=1)
Buffers: shared hit=202662
-> Seq Scan on users (cost=0.00..844969.99 rows=56087 width=0) (actual time=0.019..1265.883 rows=51833 loops=1)
Filter: ((twitter)::text <> ''::text)
Rows Removed by Filter: 2487813
Buffers: shared hit=202662
Planning time: 0.390 ms
Execution time: 1271.180 ms
```
From this query plan we can see the following:
1. We need to perform a sequential scan on the `users` table.
1. This sequential scan filters out 2,487,813 rows using a `Filter`.
1. We use 202,622 buffers, which equals 1.58 GB of memory.
1. It takes us 1.2 seconds to do all of this.
Considering we are just counting users, that's quite expensive!
Before we start making any changes, let's see if there are any existing indexes
on the `users` table that we might be able to use. We can obtain this
information by running `\d users` in a `psql` console, then scrolling down to
the `Indexes:` section:
```sql
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"index_users_on_confirmation_token" UNIQUE, btree (confirmation_token)
"index_users_on_email" UNIQUE, btree (email)
"index_users_on_reset_password_token" UNIQUE, btree (reset_password_token)
"index_users_on_static_object_token" UNIQUE, btree (static_object_token)
"index_users_on_unlock_token" UNIQUE, btree (unlock_token)
"index_on_users_name_lower" btree (lower(name::text))
"index_users_on_accepted_term_id" btree (accepted_term_id)
"index_users_on_admin" btree (admin)
"index_users_on_created_at" btree (created_at)
"index_users_on_email_trigram" gin (email gin_trgm_ops)
"index_users_on_feed_token" btree (feed_token)
"index_users_on_group_view" btree (group_view)
"index_users_on_incoming_email_token" btree (incoming_email_token)
"index_users_on_managing_group_id" btree (managing_group_id)
"index_users_on_name" btree (name)
"index_users_on_name_trigram" gin (name gin_trgm_ops)
"index_users_on_public_email" btree (public_email) WHERE public_email::text <> ''::text
"index_users_on_state" btree (state)
"index_users_on_state_and_user_type" btree (state, user_type)
"index_users_on_unconfirmed_email" btree (unconfirmed_email) WHERE unconfirmed_email IS NOT NULL
"index_users_on_user_type" btree (user_type)
"index_users_on_username" btree (username)
"index_users_on_username_trigram" gin (username gin_trgm_ops)
"tmp_idx_on_user_id_where_bio_is_filled" btree (id) WHERE COALESCE(bio, ''::character varying)::text IS DISTINCT FROM ''::text
```
Here we can see there is no index on the `twitter` column, which means
PostgreSQL has to perform a sequential scan in this case. Let's try to fix this
by adding the following index:
```sql
CREATE INDEX CONCURRENTLY twitter_test ON users (twitter);
```
If we now re-run our query using `EXPLAIN (ANALYZE, BUFFERS)` we get the
following plan:
```sql
Aggregate (cost=61002.82..61002.83 rows=1 width=8) (actual time=297.311..297.312 rows=1 loops=1)
Buffers: shared hit=51854 dirtied=19
-> Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1)
Filter: ((twitter)::text <> ''::text)
Rows Removed by Filter: 2487830
Heap Fetches: 26037
Buffers: shared hit=51854 dirtied=19
Planning time: 0.191 ms
Execution time: 297.334 ms
```
Now it takes just under 300 milliseconds to get our data, instead of 1.2
seconds. However, we still use 51,854 buffers, which is about 400 MB of memory.
300 milliseconds is also quite slow for such a simple query. To understand why
this query is still expensive, let's take a look at the following:
```sql
Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1)
Filter: ((twitter)::text <> ''::text)
Rows Removed by Filter: 2487830
```
We start with an index only scan on our index, but we somehow still apply a
`Filter` that filters out 2,487,830 rows. Why is that? Well, let's look at how
we created the index:
```sql
CREATE INDEX CONCURRENTLY twitter_test ON users (twitter);
```
We told PostgreSQL to index all possible values of the `twitter` column,
even empty strings. Our query in turn uses `WHERE twitter != ''`. This means
that the index does improve things, as we don't need to do a sequential scan,
but we may still encounter empty strings. This means PostgreSQL _has_ to apply a
Filter on the index results to get rid of those values.
Fortunately, we can improve this even further using "partial indexes". Partial
indexes are indexes with a `WHERE` condition that is applied when indexing data.
For example:
```sql
CREATE INDEX CONCURRENTLY some_index ON users (email) WHERE id < 100
```
This index would only index the `email` value of rows that match `WHERE id <
100`. We can use partial indexes to change our Twitter index to the following:
```sql
CREATE INDEX CONCURRENTLY twitter_test ON users (twitter) WHERE twitter != '';
```
After being created, if we run our query again we are given the following plan:
```sql
Aggregate (cost=1608.26..1608.27 rows=1 width=8) (actual time=19.821..19.821 rows=1 loops=1)
Buffers: shared hit=44036
-> Index Only Scan using twitter_test on users (cost=0.41..1479.71 rows=51420 width=0) (actual time=0.023..15.514 rows=51833 loops=1)
Heap Fetches: 1208
Buffers: shared hit=44036
Planning time: 0.123 ms
Execution time: 19.848 ms
```
That's _a lot_ better! Now it only takes 20 milliseconds to get the data, and we
only use about 344 MB of buffers (instead of the original 1.58 GB). The reason
this works is that now PostgreSQL no longer needs to apply a `Filter`, as the
index only contains `twitter` values that are not empty.
Keep in mind that you shouldn't just add partial indexes every time you want to
optimise a query. Every index has to be updated for every write, and they may
require quite a bit of space, depending on the amount of indexed data. As a
result, first check if there are any existing indexes you may be able to reuse.
If there aren't any, check if you can perhaps slightly change an existing one to
fit both the existing and new queries. Only add a new index if none of the
existing indexes can be used in any way.
When comparing execution plans, don't take timing as the only important metric.
Good timing is the main goal of any optimization, but it can be too volatile to
be used for comparison (for example, it depends a lot on the state of cache).
When optimizing a query, we usually need to reduce the amount of data we're
dealing with. Indexes are the way to work with fewer pages (buffers) to get the
result, so, during optimization, look at the number of buffers used (read and hit),
and work on reducing these numbers. Reduced timing is the consequence of reduced
buffer numbers. [Database Lab Engine](#database-lab-engine) guarantees that the plan is structurally
identical to production (and overall number of buffers is the same as on production),
but difference in cache state and I/O speed may lead to different timings.
## Queries that can't be optimised
Now that we have seen how to optimise a query, let's look at another query that
we might not be able to optimise:
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```
The output of `EXPLAIN (ANALYZE, BUFFERS)` is as follows:
```sql
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
Buffers: shared hit=208846
-> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
Buffers: shared hit=208846
Planning time: 2.861 ms
Execution time: 3428.596 ms
```
Looking at the output we see the following Filter:
```sql
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
```
Looking at the number of rows removed by the filter, we may be tempted to add an
index on `projects.visibility_level` to somehow turn this Sequential scan +
filter into an index-only scan.
Unfortunately, doing so is unlikely to improve anything. Contrary to what some
might believe, an index being present _does not guarantee_ that PostgreSQL
actually uses it. For example, when doing a `SELECT * FROM projects` it is much
cheaper to just scan the entire table, instead of using an index and then
fetching data from the table. In such cases PostgreSQL may decide to not use an
index.
Second, let's think for a moment what our query does: it gets all projects with
visibility level 0 or 20. In the above plan we can see this produces quite a lot
of rows (5,745,940), but how much is that relative to the total? Let's find out
by running the following query:
```sql
SELECT visibility_level, count(*) AS amount
FROM projects
GROUP BY visibility_level
ORDER BY visibility_level ASC;
```
For GitLab.com this produces:
```sql
visibility_level | amount
------------------+---------
0 | 5071325
10 | 65678
20 | 674801
```
Here the total number of projects is 5,811,804, and 5,746,126 of those are of
level 0 or 20. That's 98% of the entire table!
So no matter what we do, this query retrieves 98% of the entire table. Since
most time is spent doing exactly that, there isn't really much we can do to
improve this query, other than _not_ running it at all.
What is important here is that while some may recommend to straight up add an
index the moment you see a sequential scan, it is _much more important_ to first
understand what your query does, how much data it retrieves, and so on. After
all, you can not optimise something you do not understand.
### Cardinality and selectivity
Earlier we saw that our query had to retrieve 98% of the rows in the table.
There are two terms commonly used for databases: cardinality, and selectivity.
Cardinality refers to the number of unique values in a particular column in a
table.
Selectivity is the number of unique values produced by an operation (for example, an
index scan or filter), relative to the total number of rows. The higher the
selectivity, the more likely PostgreSQL is able to use an index.
In the above example, there are only 3 unique values: 0, 10, and 20. This means
the cardinality is 3. The selectivity in turn is also very low: 0.0000003% (2 /
5,811,804), because our `Filter` only filters using two values (`0` and `20`).
With such a low selectivity value it's not surprising that PostgreSQL decides
using an index is not worth it, because it would produce almost no unique rows.
## Rewriting queries
So the above query can't really be optimised as-is, or at least not much. But
what if we slightly change the purpose of it? What if instead of retrieving all
projects with `visibility_level` 0 or 20, we retrieve those that a user
interacted with somehow?
Fortunately, GitLab has an answer for this, and it's a table called
`user_interacted_projects`. This table has the following schema:
```sql
Table "public.user_interacted_projects"
Column | Type | Modifiers
------------+---------+-----------
user_id | integer | not null
project_id | integer | not null
Indexes:
"index_user_interacted_projects_on_project_id_and_user_id" UNIQUE, btree (project_id, user_id)
"index_user_interacted_projects_on_user_id" btree (user_id)
Foreign-key constraints:
"fk_rails_0894651f08" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
"fk_rails_722ceba4f7" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
```
Let's rewrite our query to `JOIN` this table onto our projects, and get the
projects for a specific user:
```sql
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM projects
INNER JOIN user_interacted_projects ON user_interacted_projects.project_id = projects.id
WHERE projects.visibility_level IN (0, 20)
AND user_interacted_projects.user_id = 1;
```
What we do here is the following:
1. Get our projects.
1. `INNER JOIN` `user_interacted_projects`, meaning we're only left with rows in
`projects` that have a corresponding row in `user_interacted_projects`.
1. Limit this to the projects with `visibility_level` of 0 or 20, and to
projects that the user with ID 1 interacted with.
If we run this query we get the following plan:
```sql
Aggregate (cost=871.03..871.04 rows=1 width=8) (actual time=9.763..9.763 rows=1 loops=1)
-> Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
-> Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
Index Cond: (user_id = 1)
-> Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
Index Cond: (id = user_interacted_projects.project_id)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 0
Planning time: 2.614 ms
Execution time: 9.809 ms
```
Here it only took us just under 10 milliseconds to get the data. We can also see
we're retrieving far fewer projects:
```sql
Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
Index Cond: (id = user_interacted_projects.project_id)
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 0
```
Here we see we perform 145 loops (`loops=145`), with every loop producing 1 row
(`rows=1`). This is much less than before, and our query performs much better!
If we look at the plan we also see our costs are very low:
```sql
Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
```
Here our cost is only 3.45, and it takes us 7.25 milliseconds to do so (0.05 * 145).
The next index scan is a bit more expensive:
```sql
Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
```
Here the cost is 160.71 (`cost=0.43..160.71`), taking about 2.5 milliseconds
(based on the output of `actual time=....`).
The most expensive part here is the "Nested Loop" that acts upon the result of
these two index scans:
```sql
Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
```
Here we had to perform 870.52 disk page fetches for 203 rows, 9.748
milliseconds, producing 143 rows in a single loop.
The key takeaway here is that sometimes you have to rewrite (parts of) a query
to make it better. Sometimes that means having to slightly change your feature
to accommodate for better performance.
## What makes a bad plan
This is a bit of a difficult question to answer, because the definition of "bad"
is relative to the problem you are trying to solve. However, some patterns are
best avoided in most cases, such as:
- Sequential scans on large tables
- Filters that remove a lot of rows
- Performing a certain step that requires _a lot_ of
buffers (for example, an index scan for GitLab.com that requires more than 512 MB).
As a general guideline, aim for a query that:
1. Takes no more than 10 milliseconds. Our target time spent in SQL per request
is around 100 milliseconds, so every query should be as fast as possible.
1. Does not use an excessive number of buffers, relative to the workload. For
example, retrieving ten rows shouldn't require 1 GB of buffers.
1. Does not spend a long amount of time performing disk IO operations. The
setting `track_io_timing` must be enabled for this data to be included in the
output of `EXPLAIN ANALYZE`.
1. Applies a `LIMIT` when retrieving rows without aggregating them, such as
`SELECT * FROM users`.
1. Doesn't use a `Filter` to filter out too many rows, especially if the query
does not use a `LIMIT` to limit the number of returned rows. Filters can
usually be removed by adding a (partial) index.
These are _guidelines_ and not hard requirements, as different needs may require
different queries. The only _rule_ is that you _must always measure_ your query
(preferably using a production-like database) using `EXPLAIN (ANALYZE, BUFFERS)`
and related tools such as:
- [`explain.depesz.com`](https://explain.depesz.com/).
- [`explain.dalibo.com/`](https://explain.dalibo.com/).
## Producing query plans
There are a few ways to get the output of a query plan. Of course you
can directly run the `EXPLAIN` query in the `psql` console, or you can
follow one of the other options below.
### Database Lab Engine
GitLab team members can use [Database Lab Engine](https://gitlab.com/postgres-ai/database-lab), and the companion
SQL optimization tool - [Joe Bot](https://gitlab.com/postgres-ai/joe).
Database Lab Engine provides developers with their own clone of the production database, while Joe Bot helps with exploring execution plans.
Joe Bot is available in the [`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack,
and through its [web interface](https://console.postgres.ai/gitlab/joe-instances).
With Joe Bot you can execute DDL statements (like creating indexes, tables, and columns) and get query plans for `SELECT`, `UPDATE`, and `DELETE` statements.
For example, in order to test new index on a column that is not existing on production yet, you can do the following:
Create the column:
```sql
exec ALTER TABLE projects ADD COLUMN last_at timestamp without time zone
```
Create the index:
```sql
exec CREATE INDEX index_projects_last_activity ON projects (last_activity_at) WHERE last_activity_at IS NOT NULL
```
Analyze the table to update its statistics:
```sql
exec ANALYZE projects
```
Get the query plan:
```sql
explain SELECT * FROM projects WHERE last_activity_at < CURRENT_DATE
```
Once done you can rollback your changes:
```sql
reset
```
For more information about the available options, run:
```sql
help
```
The web interface comes with the following execution plan visualizers included:
- [Depesz](https://explain.depesz.com/)
- [PEV2](https://github.com/dalibo/pev2)
- [FlameGraph](https://github.com/mgartner/pg_flame)
#### Tips & Tricks
The database connection is now maintained during your whole session, so you can use `exec set ...` for any session variables (such as `enable_seqscan` or `work_mem`). These settings are applied to all subsequent commands until you reset them. For example you can disable parallel queries with
```sql
exec SET max_parallel_workers_per_gather = 0
```
### Rails console
Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
you can directly generate the query plan from the Rails console:
```ruby
pry(main)> require 'activerecord-explain-analyze'
=> true
pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
↳ (pry):12
=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
Filter: (projects.build_timeout > 3600)
Rows Removed by Filter: 14
Buffers: shared hit=2
Planning time: 0.411 ms
Execution time: 0.113 ms
```
### ChatOps
GitLab team members can also use our ChatOps solution, available in Slack
using the [`/chatops` slash command](../chatops_on_gitlabcom.md).
NOTE:
While ChatOps is still available, the recommended way to generate execution plans is to use [Database Lab Engine](#database-lab-engine).
You can use ChatOps to get a query plan by running the following:
```sql
/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```
Visualising the plan using <https://explain.depesz.com/> is also supported:
```sql
/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```
Quoting the query is not necessary.
For more information about the available options, run:
```sql
/chatops run explain --help
```
## Further reading
A more extensive guide on understanding query plans can be found in
the [presentation](https://public.dalibo.com/exports/conferences/_archives/_2012/201211_explain/understanding_explain.pdf)
from [Dalibo.org](https://www.dalibo.com/en/).
Depesz's blog also has a good [section](https://www.depesz.com/tag/unexplainable/) dedicated to query plans.
|