vacuumlazy.c
1/*-------------------------------------------------------------------------
2 *
3 * vacuumlazy.c
4 * Concurrent ("lazy") vacuuming.
5 *
6 * Heap relations are vacuumed in three main phases. In phase I, vacuum scans
7 * relation pages, pruning and freezing tuples and saving dead tuples' TIDs in
8 * a TID store. If that TID store fills up or vacuum finishes scanning the
9 * relation, it progresses to phase II: index vacuuming. Index vacuuming
10 * deletes the dead index entries referenced in the TID store. In phase III,
11 * vacuum scans the blocks of the relation referred to by the TIDs in the TID
12 * store and reaps the corresponding dead items, freeing that space for future
13 * tuples.
14 *
15 * If there are no indexes or index scanning is disabled, phase II may be
16 * skipped. If phase I identified very few dead index entries or if vacuum's
17 * failsafe mechanism has triggered (to avoid transaction ID wraparound),
18 * vacuum may skip phases II and III.
19 *
20 * If the TID store fills up in phase I, vacuum suspends phase I and proceeds
21 * to phases II and III, cleaning up the dead tuples referenced in the current
22 * TID store. This empties the TID store, allowing vacuum to resume phase I.
23 *
24 * In a way, the phases are more like states in a state machine, but they have
25 * been referred to colloquially as phases for so long that they are referred
26 * to as such here.
27 *
28 * Manually invoked VACUUMs may scan indexes during phase II in parallel. For
29 * more information on this, see the comment at the top of vacuumparallel.c.
30 *
31 * In between phases, vacuum updates the freespace map (every
32 * VACUUM_FSM_EVERY_PAGES).
33 *
34 * After completing all three phases, vacuum may truncate the relation if it
35 * has emptied pages at the end. Finally, vacuum updates relation statistics
36 * in pg_class and the cumulative statistics subsystem.
37 *
38 * Relation Scanning:
39 *
40 * Vacuum scans the heap relation, starting at the beginning and progressing
41 * to the end, skipping pages as permitted by their visibility status, vacuum
42 * options, and various other requirements.
43 *
44 * Vacuums are either aggressive or normal. Aggressive vacuums must scan every
45 * unfrozen tuple in order to advance relfrozenxid and avoid transaction ID
46 * wraparound. Normal vacuums may scan otherwise skippable pages for one of
47 * two reasons:
48 *
49 * When page skipping is not disabled, a normal vacuum may scan pages that are
50 * marked all-visible (and even all-frozen) in the visibility map if the range
51 * of skippable pages is below SKIP_PAGES_THRESHOLD. This is primarily for the
52 * benefit of kernel readahead (see comment in heap_vac_scan_next_block()).
53 *
54 * A normal vacuum may also scan skippable pages in an effort to freeze them
55 * and decrease the backlog of all-visible but not all-frozen pages that have
56 * to be processed by the next aggressive vacuum. These are referred to as
57 * eagerly scanned pages. Pages scanned due to SKIP_PAGES_THRESHOLD do not
58 * count as eagerly scanned pages.
59 *
60 * Eagerly scanned pages that are set all-frozen in the VM are successful
61 * eager freezes and those not set all-frozen in the VM are failed eager
62 * freezes.
63 *
64 * Because we want to amortize the overhead of freezing pages over multiple
65 * vacuums, normal vacuums cap the number of successful eager freezes to
66 * MAX_EAGER_FREEZE_SUCCESS_RATE of the number of all-visible but not
67 * all-frozen pages at the beginning of the vacuum. Since eagerly frozen pages
68 * may be unfrozen before the next aggressive vacuum, capping the number of
69 * successful eager freezes also caps the downside of eager freezing:
70 * potentially wasted work.
71 *
72 * Once the success cap has been hit, eager scanning is disabled for the
73 * remainder of the vacuum of the relation.
74 *
75 * Success is capped globally because we don't want to limit our successes if
76 * old data happens to be concentrated in a particular part of the table. This
77 * is especially likely to happen for append-mostly workloads where the oldest
78 * data is at the beginning of the unfrozen portion of the relation.
79 *
80 * On the assumption that different regions of the table are likely to contain
81 * similarly aged data, normal vacuums use a localized eager freeze failure
82 * cap. The failure count is reset for each region of the table -- comprised
83 * of EAGER_SCAN_REGION_SIZE blocks. In each region, we tolerate
84 * vacuum_max_eager_freeze_failure_rate of EAGER_SCAN_REGION_SIZE failures
85 * before suspending eager scanning until the end of the region.
86 * vacuum_max_eager_freeze_failure_rate is configurable both globally and per
87 * table.
88 *
89 * Aggressive vacuums must examine every unfrozen tuple and thus are not
90 * subject to any of the limits imposed by the eager scanning algorithm.
91 *
92 * Once vacuum has decided to scan a given block, it must read the block and
93 * obtain a cleanup lock to prune tuples on the page. A non-aggressive vacuum
94 * may choose to skip pruning and freezing if it cannot acquire a cleanup lock
95 * on the buffer right away. In this case, it may miss cleaning up dead tuples
96 * and their associated index entries (though it is free to reap any existing
97 * dead items on the page).
98 *
99 * After pruning and freezing, pages that are newly all-visible and all-frozen
100 * are marked as such in the visibility map.
101 *
102 * Dead TID Storage:
103 *
104 * The major space usage for vacuuming is storage for the dead tuple IDs that
105 * are to be removed from indexes. We want to ensure we can vacuum even the
106 * very largest relations with finite memory space usage. To do that, we set
107 * upper bounds on the memory that can be used for keeping track of dead TIDs
108 * at once.
109 *
110 * We are willing to use at most maintenance_work_mem (or perhaps
111 * autovacuum_work_mem) memory space to keep track of dead TIDs. If the
112 * TID store is full, we must call lazy_vacuum to vacuum indexes (and to vacuum
113 * the pages that we've pruned). This frees up the memory space dedicated to
114 * store dead TIDs.
115 *
116 * In practice VACUUM will often complete its initial pass over the target
117 * heap relation without ever running out of space to store TIDs. This means
118 * that there only needs to be one call to lazy_vacuum, after the initial pass
119 * completes.
120 *
121 * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
122 * Portions Copyright (c) 1994, Regents of the University of California
123 *
124 *
125 * IDENTIFICATION
126 * src/backend/access/heap/vacuumlazy.c
127 *
128 *-------------------------------------------------------------------------
129 */
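
/*
 * Editor's illustration (not part of this file): a standalone, simplified
 * sketch of the phase I/II/III cycle described in the header comment above.
 * All names and counters below are hypothetical stand-ins, not this file's
 * real API (the real work happens in lazy_scan_heap, lazy_vacuum_all_indexes,
 * and lazy_vacuum_heap_rel).
 */
#include <stdbool.h>
#include <stdio.h>

static int  tids_stored = 0;     /* pretend TID store contents */
static int  tid_store_limit = 4; /* pretend memory cap on the TID store */
static int  blocks_left = 10;    /* pretend relation size in blocks */

/* "Phase I" work for one block: prune and remember dead tuple TIDs. */
static bool
scan_and_prune_next_block(void)
{
    if (blocks_left == 0)
        return false;
    blocks_left--;
    tids_stored++;               /* pretend each block had a dead tuple */
    return true;
}

static void vacuum_indexes(void)  { printf("phase II: index vacuuming\n"); }
static void reap_dead_items(void) { printf("phase III: heap vacuuming\n"); }

int
main(void)
{
    bool more = true;

    while (more)
    {
        more = scan_and_prune_next_block();             /* phase I */

        /* TID store full, or scan finished: run phases II and III */
        if (tids_stored > 0 && (tids_stored >= tid_store_limit || !more))
        {
            vacuum_indexes();
            reap_dead_items();
            tids_stored = 0;                            /* empty the TID store */
        }
    }
    return 0;
}
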
130#include "postgres.h"
131
132#include <math.h>
133
134#include "access/genam.h"
135#include "access/heapam.h"
136#include "access/htup_details.h"
137#include "access/multixact.h"
138#include "access/tidstore.h"
139#include "access/transam.h"
140#include "access/visibilitymap.h"
141#include "access/xloginsert.h"
142#include "catalog/storage.h"
143#include "commands/progress.h"
144#include "commands/vacuum.h"
145#include "common/int.h"
146#include "common/pg_prng.h"
147#include "executor/instrument.h"
148#include "miscadmin.h"
149#include "pgstat.h"
152#include "storage/bufmgr.h"
153#include "storage/freespace.h"
154#include "storage/lmgr.h"
155#include "storage/read_stream.h"
156#include "utils/lsyscache.h"
157#include "utils/pg_rusage.h"
158#include "utils/timestamp.h"
159
160
161/*
162 * Space/time tradeoff parameters: do these need to be user-tunable?
163 *
164 * To consider truncating the relation, we want there to be at least
165 * REL_TRUNCATE_MINIMUM or (relsize / REL_TRUNCATE_FRACTION) (whichever
166 * is less) potentially-freeable pages.
167 */
168#define REL_TRUNCATE_MINIMUM 1000
169#define REL_TRUNCATE_FRACTION 16
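
/*
 * Editor's illustration (standalone, not part of this file): the arithmetic
 * implied by the comment above, using made-up page counts. Truncation is
 * worth considering once the potentially-freeable pages reach the lesser of
 * REL_TRUNCATE_MINIMUM and relsize / REL_TRUNCATE_FRACTION.
 */
#include <stdio.h>

#define REL_TRUNCATE_MINIMUM 1000
#define REL_TRUNCATE_FRACTION 16

static int
truncation_worth_considering(unsigned rel_pages, unsigned possibly_freeable)
{
    unsigned threshold = rel_pages / REL_TRUNCATE_FRACTION;

    if (threshold > REL_TRUNCATE_MINIMUM)
        threshold = REL_TRUNCATE_MINIMUM;   /* whichever is less */

    return possibly_freeable >= threshold;
}

int
main(void)
{
    printf("%d\n", truncation_worth_considering(100000, 900)); /* 0: needs 1000 */
    printf("%d\n", truncation_worth_considering(8000, 600));   /* 1: needs 500 */
    return 0;
}
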
170
171/*
172 * Timing parameters for truncate locking heuristics.
173 *
174 * These were not exposed as user tunable GUC values because it didn't seem
175 * that the potential for improvement was great enough to merit the cost of
176 * supporting them.
177 */
178#define VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL 20 /* ms */
179#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL 50 /* ms */
180#define VACUUM_TRUNCATE_LOCK_TIMEOUT 5000 /* ms */
181
182/*
183 * Threshold that controls whether we bypass index vacuuming and heap
184 * vacuuming as an optimization
185 */
186#define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */
187
188/*
189 * Perform a failsafe check each time we scan another 4GB of pages.
190 * (Note that this is deliberately kept to a power-of-two, usually 2^19.)
191 */
192#define FAILSAFE_EVERY_PAGES \
193 ((BlockNumber) (((uint64) 4 * 1024 * 1024 * 1024) / BLCKSZ))
194
195/*
196 * When a table has no indexes, vacuum the FSM after every 8GB, approximately
197 * (it won't be exact because we only vacuum FSM after processing a heap page
198 * that has some removable tuples). When there are indexes, this is ignored,
199 * and we vacuum FSM after each index/heap cleaning pass.
200 */
201#define VACUUM_FSM_EVERY_PAGES \
202 ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
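
/*
 * Editor's illustration (standalone, not part of this file): what the three
 * thresholds above work out to, assuming the default 8 kB block size. The
 * 1M-page table used for the bypass threshold is just an example figure.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    const uint64_t blcksz = 8192;   /* assumed default BLCKSZ */

    uint64_t failsafe_every = ((uint64_t) 4 * 1024 * 1024 * 1024) / blcksz;
    uint64_t fsm_every      = ((uint64_t) 8 * 1024 * 1024 * 1024) / blcksz;
    double   bypass_pages   = 0.02 * 1000000;   /* 2% of a 1M-page table */

    printf("failsafe check every %llu blocks (2^19)\n",
           (unsigned long long) failsafe_every);
    printf("FSM vacuumed every %llu blocks when there are no indexes\n",
           (unsigned long long) fsm_every);
    printf("index vacuuming bypass threshold for a 1M-page table: %.0f pages\n",
           bypass_pages);
    return 0;
}
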
203
204/*
205 * Before we consider skipping a page that's marked as clean in
206 * visibility map, we must've seen at least this many clean pages.
207 */
208#define SKIP_PAGES_THRESHOLD ((BlockNumber) 32)
209
210/*
211 * Size of the prefetch window for lazy vacuum backwards truncation scan.
212 * Needs to be a power of 2.
213 */
214#define PREFETCH_SIZE ((BlockNumber) 32)
215
216/*
217 * Macro to check if we are in a parallel vacuum. If true, we are in the
218 * parallel mode and the DSM segment is initialized.
219 */
220#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
221
222/* Phases of vacuum during which we report error context. */
223typedef enum
224{
232
233/*
234 * An eager scan of a page that is set all-frozen in the VM is considered
235 * "successful". To spread out freezing overhead across multiple normal
236 * vacuums, we limit the number of successful eager page freezes. The maximum
237 * number of eager page freezes is calculated as a ratio of the all-visible
238 * but not all-frozen pages at the beginning of the vacuum.
239 */
240#define MAX_EAGER_FREEZE_SUCCESS_RATE 0.2
241
242/*
243 * On the assumption that different regions of the table tend to have
244 * similarly aged data, once vacuum fails to freeze
245 * vacuum_max_eager_freeze_failure_rate of the blocks in a region of size
246 * EAGER_SCAN_REGION_SIZE, it suspends eager scanning until it has progressed
247 * to another region of the table with potentially older data.
248 */
249#define EAGER_SCAN_REGION_SIZE 4096
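
/*
 * Editor's illustration (standalone, not part of this file): how the two caps
 * above translate into block counts. The visibility-map counts and the 0.03
 * setting for vacuum_max_eager_freeze_failure_rate are assumed example values.
 */
#include <stdio.h>

#define MAX_EAGER_FREEZE_SUCCESS_RATE 0.2
#define EAGER_SCAN_REGION_SIZE 4096

int
main(void)
{
    unsigned allvisible_not_frozen = 100000; /* assumed VM state at vacuum start */
    double   failure_rate = 0.03;            /* assumed GUC / reloption setting */

    unsigned success_cap = (unsigned)
        (MAX_EAGER_FREEZE_SUCCESS_RATE * allvisible_not_frozen);
    unsigned fails_per_region = (unsigned)
        (failure_rate * EAGER_SCAN_REGION_SIZE);

    printf("eager freeze success cap: %u blocks (whole relation)\n",
           success_cap);
    printf("eager freeze failures tolerated per %u-block region: %u\n",
           EAGER_SCAN_REGION_SIZE, fails_per_region);
    return 0;
}
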
250
251/*
252 * heap_vac_scan_next_block() sets these flags to communicate information
253 * about the block it read to the caller.
254 */
255#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
256#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
257
258typedef struct LVRelState
259{
260 /* Target heap relation and its indexes */
264
265 /* Buffer access strategy and parallel vacuum state */
268
269 /* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
271 /* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
273 /* Consider index vacuuming bypass optimization? */
275
276 /* Doing index vacuuming, index cleanup, rel truncation? */
280
281 /* VACUUM operation's cutoffs for freezing and pruning */
284 /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
288
289 /* Error reporting state */
290 char *dbname;
292 char *relname;
293 char *indname; /* Current index name */
294 BlockNumber blkno; /* used only for heap operations */
295 OffsetNumber offnum; /* used only for heap operations */
297 bool verbose; /* VACUUM VERBOSE? */
298
299 /*
300 * dead_items stores TIDs whose index tuples are deleted by index
301 * vacuuming. Each TID points to an LP_DEAD line pointer from a heap page
302 * that has been processed by lazy_scan_prune. Also needed by
303 * lazy_vacuum_heap_rel, which marks the same LP_DEAD line pointers as
304 * LP_UNUSED during second heap pass.
305 *
306 * Both dead_items and dead_items_info are allocated in shared memory in
307 * parallel vacuum cases.
308 */
309 TidStore *dead_items; /* TIDs whose index tuples we'll delete */
311
312 BlockNumber rel_pages; /* total number of pages */
313 BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
314
315 /*
316 * Count of all-visible blocks eagerly scanned (for logging only). This
317 * does not include skippable blocks scanned due to SKIP_PAGES_THRESHOLD.
318 */
320
321 BlockNumber removed_pages; /* # pages removed by relation truncation */
322 BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */
323
324 /* # pages newly set all-visible in the VM */
326
327 /*
328 * # pages newly set all-visible and all-frozen in the VM. This is a
329 * subset of vm_new_visible_pages. That is, vm_new_visible_pages includes
330 * all pages set all-visible, but vm_new_visible_frozen_pages includes
331 * only those which were also set all-frozen.
332 */
334
335 /* # all-visible pages newly set all-frozen in the VM */
337
338 BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
339 BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
340 BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
341
342 /* Statistics output by us, for table */
343 double new_rel_tuples; /* new estimated total # of tuples */
344 double new_live_tuples; /* new estimated total # of live tuples */
345 /* Statistics output by index AMs */
347
348 /* Instrumentation counters */
350 /* Counters that follow are only for scanned_pages */
351 int64 tuples_deleted; /* # deleted from table */
352 int64 tuples_frozen; /* # newly frozen */
353 int64 lpdead_items; /* # deleted from indexes */
354 int64 live_tuples; /* # live tuples remaining */
355 int64 recently_dead_tuples; /* # dead, but not yet removable */
356 int64 missed_dead_tuples; /* # removable, but not removed */
357
358 /* State maintained by heap_vac_scan_next_block() */
359 BlockNumber current_block; /* last block returned */
360 BlockNumber next_unskippable_block; /* next unskippable block */
361 bool next_unskippable_allvis; /* its visibility status */
362 bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
363 Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
364
365 /* State related to managing eager scanning of all-visible pages */
366
367 /*
368 * A normal vacuum that has failed to freeze too many eagerly scanned
369 * blocks in a region suspends eager scanning.
370 * next_eager_scan_region_start is the block number of the first block
371 * eligible for resumed eager scanning.
372 *
373 * When eager scanning is permanently disabled, either initially
374 * (including for aggressive vacuum) or due to hitting the success cap,
375 * this is set to InvalidBlockNumber.
376 */
378
379 /*
380 * The remaining number of blocks a normal vacuum will consider eager
381 * scanning when it is successful. When eager scanning is enabled, this is
382 * initialized to MAX_EAGER_FREEZE_SUCCESS_RATE of the total number of
383 * all-visible but not all-frozen pages. For each eager freeze success,
384 * this is decremented. Once it hits 0, eager scanning is permanently
385 * disabled. It is initialized to 0 if eager scanning starts out disabled
386 * (including for aggressive vacuum).
387 */
389
390 /*
391 * The maximum number of blocks which may be eagerly scanned and not
392 * frozen before eager scanning is temporarily suspended. This is
393 * configurable both globally, via the
394 * vacuum_max_eager_freeze_failure_rate GUC, and per table, with a table
395 * storage parameter of the same name. It is calculated as
396 * vacuum_max_eager_freeze_failure_rate of EAGER_SCAN_REGION_SIZE blocks.
397 * It is 0 when eager scanning is disabled.
398 */
400
401 /*
402 * The number of eagerly scanned blocks vacuum failed to freeze (due to
403 * age) in the current eager scan region. Vacuum resets it to
404 * eager_scan_max_fails_per_region each time it enters a new region of the
405 * relation. If eager_scan_remaining_fails hits 0, eager scanning is
406 * suspended until the next region. It is also 0 if eager scanning has
407 * been permanently disabled.
408 */
411
412
413/* Struct for saving and restoring vacuum error information. */
414typedef struct LVSavedErrInfo
415{
420
421
422/* non-export function prototypes */
423static void lazy_scan_heap(LVRelState *vacrel);
424static void heap_vacuum_eager_scan_setup(LVRelState *vacrel,
425 const VacuumParams params);
427 void *callback_private_data,
428 void *per_buffer_data);
429static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
430static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
431 BlockNumber blkno, Page page,
432 bool sharelock, Buffer vmbuffer);
433static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
434 BlockNumber blkno, Page page,
435 Buffer vmbuffer, bool all_visible_according_to_vm,
436 bool *has_lpdead_items, bool *vm_page_frozen);
437static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
438 BlockNumber blkno, Page page,
439 bool *has_lpdead_items);
440static void lazy_vacuum(LVRelState *vacrel);
441static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
442static void lazy_vacuum_heap_rel(LVRelState *vacrel);
443static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
444 Buffer buffer, OffsetNumber *deadoffsets,
445 int num_offsets, Buffer vmbuffer);
446static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
447static void lazy_cleanup_all_indexes(LVRelState *vacrel);
450 double reltuples,
451 LVRelState *vacrel);
454 double reltuples,
455 bool estimated_count,
456 LVRelState *vacrel);
457static bool should_attempt_truncation(LVRelState *vacrel);
458static void lazy_truncate_heap(LVRelState *vacrel);
460 bool *lock_waiter_detected);
461static void dead_items_alloc(LVRelState *vacrel, int nworkers);
462static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
463 int num_offsets);
464static void dead_items_reset(LVRelState *vacrel);
465static void dead_items_cleanup(LVRelState *vacrel);
466
467#ifdef USE_ASSERT_CHECKING
468static bool heap_page_is_all_visible(Relation rel, Buffer buf,
469 TransactionId OldestXmin,
470 bool *all_frozen,
471 TransactionId *visibility_cutoff_xid,
472 OffsetNumber *logging_offnum);
473#endif
475 TransactionId OldestXmin,
476 OffsetNumber *deadoffsets,
477 int ndeadoffsets,
478 bool *all_frozen,
479 TransactionId *visibility_cutoff_xid,
480 OffsetNumber *logging_offnum);
481static void update_relstats_all_indexes(LVRelState *vacrel);
482static void vacuum_error_callback(void *arg);
483static void update_vacuum_error_info(LVRelState *vacrel,
484 LVSavedErrInfo *saved_vacrel,
485 int phase, BlockNumber blkno,
486 OffsetNumber offnum);
487static void restore_vacuum_error_info(LVRelState *vacrel,
488 const LVSavedErrInfo *saved_vacrel);
489
490
491
492/*
493 * Helper to set up the eager scanning state for vacuuming a single relation.
494 * Initializes the eager scan management related members of the LVRelState.
495 *
496 * Caller provides whether or not an aggressive vacuum is required due to
497 * vacuum options or for relfrozenxid/relminmxid advancement.
498 */
499static void
501{
502 uint32 randseed;
503 BlockNumber allvisible;
504 BlockNumber allfrozen;
505 float first_region_ratio;
506 bool oldest_unfrozen_before_cutoff = false;
507
508 /*
509 * Initialize eager scan management fields to their disabled values.
510 * Aggressive vacuums, normal vacuums of small tables, and normal vacuums
511 * of tables without sufficiently old tuples disable eager scanning.
512 */
515 vacrel->eager_scan_remaining_fails = 0;
517
518 /* If eager scanning is explicitly disabled, just return. */
519 if (params.max_eager_freeze_failure_rate == 0)
520 return;
521
522 /*
523 * The caller will have determined whether or not an aggressive vacuum is
524 * required by either the vacuum parameters or the relative age of the
525 * oldest unfrozen transaction IDs. An aggressive vacuum must scan every
526 * all-visible page to safely advance the relfrozenxid and/or relminmxid,
527 * so scans of all-visible pages are not considered eager.
528 */
529 if (vacrel->aggressive)
530 return;
531
532 /*
533 * Aggressively vacuuming a small relation shouldn't take long, so it
534 * isn't worth amortizing. We use two times the region size as the size
535 * cutoff because the eager scan start block is a random spot somewhere in
536 * the first region, making the second region the first to be eager
537 * scanned normally.
538 */
539 if (vacrel->rel_pages < 2 * EAGER_SCAN_REGION_SIZE)
540 return;
541
542 /*
543 * We only want to enable eager scanning if we are likely to be able to
544 * freeze some of the pages in the relation.
545 *
546 * Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact
547 * are technically freezable, but we won't freeze them unless the criteria
 548 * for opportunistic freezing are met. Only tuples with XIDs/MXIDs older
549 * than the FreezeLimit/MultiXactCutoff are frozen in the common case.
550 *
551 * So, as a heuristic, we wait until the FreezeLimit has advanced past the
552 * relfrozenxid or the MultiXactCutoff has advanced past the relminmxid to
553 * enable eager scanning.
554 */
557 vacrel->cutoffs.FreezeLimit))
558 oldest_unfrozen_before_cutoff = true;
559
560 if (!oldest_unfrozen_before_cutoff &&
563 vacrel->cutoffs.MultiXactCutoff))
564 oldest_unfrozen_before_cutoff = true;
565
566 if (!oldest_unfrozen_before_cutoff)
567 return;
568
569 /* We have met the criteria to eagerly scan some pages. */
570
571 /*
572 * Our success cap is MAX_EAGER_FREEZE_SUCCESS_RATE of the number of
573 * all-visible but not all-frozen blocks in the relation.
574 */
575 visibilitymap_count(vacrel->rel, &allvisible, &allfrozen);
576
579 (allvisible - allfrozen));
580
581 /* If every all-visible page is frozen, eager scanning is disabled. */
582 if (vacrel->eager_scan_remaining_successes == 0)
583 return;
584
585 /*
586 * Now calculate the bounds of the first eager scan region. Its end block
587 * will be a random spot somewhere in the first EAGER_SCAN_REGION_SIZE
588 * blocks. This affects the bounds of all subsequent regions and avoids
589 * eager scanning and failing to freeze the same blocks each vacuum of the
590 * relation.
591 */
593
595
598
602
603 /*
604 * The first region will be smaller than subsequent regions. As such,
605 * adjust the eager freeze failures tolerated for this region.
606 */
607 first_region_ratio = 1 - (float) vacrel->next_eager_scan_region_start /
609
612 first_region_ratio;
613}
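
/*
 * Editor's illustration (standalone, not part of this file): the first-region
 * adjustment computed at the end of heap_vacuum_eager_scan_setup() above. The
 * random start block (1024) and the per-region failure allowance (122, from
 * the assumed 0.03 failure rate) are example values only.
 */
#include <stdio.h>

#define EAGER_SCAN_REGION_SIZE 4096

int
main(void)
{
    unsigned start_block = 1024;     /* assumed random value in [0, 4096) */
    unsigned fails_per_region = 122; /* assumed eager_scan_max_fails_per_region */

    /* first_region_ratio = 1 - start / EAGER_SCAN_REGION_SIZE, as above */
    double first_region_ratio =
        1.0 - (double) start_block / EAGER_SCAN_REGION_SIZE;
    unsigned first_region_fails =
        (unsigned) (fails_per_region * first_region_ratio);

    printf("first eager scan region boundary offset: block %u\n", start_block);
    printf("failure allowance scaled for first region: %u (ratio %.2f)\n",
           first_region_fails, first_region_ratio);
    return 0;
}
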
614
615/*
616 * heap_vacuum_rel() -- perform VACUUM for one heap relation
617 *
618 * This routine sets things up for and then calls lazy_scan_heap, where
619 * almost all work actually takes place. Finalizes everything after call
620 * returns by managing relation truncation and updating rel's pg_class
621 * entry. (Also updates pg_class entries for any indexes that need it.)
622 *
623 * At entry, we have already established a transaction and opened
624 * and locked the relation.
625 */
626void
628 BufferAccessStrategy bstrategy)
629{
630 LVRelState *vacrel;
631 bool verbose,
632 instrument,
633 skipwithvm,
634 frozenxid_updated,
635 minmulti_updated;
636 BlockNumber orig_rel_pages,
637 new_rel_pages,
638 new_rel_allvisible,
639 new_rel_allfrozen;
640 PGRUsage ru0;
641 TimestampTz starttime = 0;
642 PgStat_Counter startreadtime = 0,
643 startwritetime = 0;
644 WalUsage startwalusage = pgWalUsage;
645 BufferUsage startbufferusage = pgBufferUsage;
646 ErrorContextCallback errcallback;
647 char **indnames = NULL;
648
649 verbose = (params.options & VACOPT_VERBOSE) != 0;
650 instrument = (verbose || (AmAutoVacuumWorkerProcess() &&
651 params.log_vacuum_min_duration >= 0));
652 if (instrument)
653 {
654 pg_rusage_init(&ru0);
655 if (track_io_timing)
656 {
657 startreadtime = pgStatBlockReadTime;
658 startwritetime = pgStatBlockWriteTime;
659 }
660 }
661
662 /* Used for instrumentation and stats report */
663 starttime = GetCurrentTimestamp();
664
666 RelationGetRelid(rel));
667
668 /*
669 * Setup error traceback support for ereport() first. The idea is to set
670 * up an error context callback to display additional information on any
671 * error during a vacuum. During different phases of vacuum, we update
 672 * the state so that the error context callback always displays current
673 * information.
674 *
675 * Copy the names of heap rel into local memory for error reporting
676 * purposes, too. It isn't always safe to assume that we can get the name
677 * of each rel. It's convenient for code in lazy_scan_heap to always use
678 * these temp copies.
679 */
680 vacrel = (LVRelState *) palloc0(sizeof(LVRelState));
684 vacrel->indname = NULL;
686 vacrel->verbose = verbose;
687 errcallback.callback = vacuum_error_callback;
688 errcallback.arg = vacrel;
689 errcallback.previous = error_context_stack;
690 error_context_stack = &errcallback;
691
692 /* Set up high level stuff about rel and its indexes */
693 vacrel->rel = rel;
695 &vacrel->indrels);
696 vacrel->bstrategy = bstrategy;
697 if (instrument && vacrel->nindexes > 0)
698 {
699 /* Copy index names used by instrumentation (not error reporting) */
700 indnames = palloc(sizeof(char *) * vacrel->nindexes);
701 for (int i = 0; i < vacrel->nindexes; i++)
702 indnames[i] = pstrdup(RelationGetRelationName(vacrel->indrels[i]));
703 }
704
705 /*
706 * The index_cleanup param either disables index vacuuming and cleanup or
707 * forces it to go ahead when we would otherwise apply the index bypass
708 * optimization. The default is 'auto', which leaves the final decision
709 * up to lazy_vacuum().
710 *
 711 * The truncate param allows the user to avoid attempting relation truncation,
712 * though it can't force truncation to happen.
713 */
716 params.truncate != VACOPTVALUE_AUTO);
717
718 /*
719 * While VacuumFailSafeActive is reset to false before calling this, we
720 * still need to reset it here due to recursive calls.
721 */
722 VacuumFailsafeActive = false;
723 vacrel->consider_bypass_optimization = true;
724 vacrel->do_index_vacuuming = true;
725 vacrel->do_index_cleanup = true;
726 vacrel->do_rel_truncate = (params.truncate != VACOPTVALUE_DISABLED);
728 {
729 /* Force disable index vacuuming up-front */
730 vacrel->do_index_vacuuming = false;
731 vacrel->do_index_cleanup = false;
732 }
733 else if (params.index_cleanup == VACOPTVALUE_ENABLED)
734 {
735 /* Force index vacuuming. Note that failsafe can still bypass. */
736 vacrel->consider_bypass_optimization = false;
737 }
738 else
739 {
740 /* Default/auto, make all decisions dynamically */
742 }
743
744 /* Initialize page counters explicitly (be tidy) */
745 vacrel->scanned_pages = 0;
746 vacrel->eager_scanned_pages = 0;
747 vacrel->removed_pages = 0;
748 vacrel->new_frozen_tuple_pages = 0;
749 vacrel->lpdead_item_pages = 0;
750 vacrel->missed_dead_pages = 0;
751 vacrel->nonempty_pages = 0;
752 /* dead_items_alloc allocates vacrel->dead_items later on */
753
754 /* Allocate/initialize output statistics state */
755 vacrel->new_rel_tuples = 0;
756 vacrel->new_live_tuples = 0;
757 vacrel->indstats = (IndexBulkDeleteResult **)
758 palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
759
760 /* Initialize remaining counters (be tidy) */
761 vacrel->num_index_scans = 0;
762 vacrel->tuples_deleted = 0;
763 vacrel->tuples_frozen = 0;
764 vacrel->lpdead_items = 0;
765 vacrel->live_tuples = 0;
766 vacrel->recently_dead_tuples = 0;
767 vacrel->missed_dead_tuples = 0;
768
769 vacrel->vm_new_visible_pages = 0;
770 vacrel->vm_new_visible_frozen_pages = 0;
771 vacrel->vm_new_frozen_pages = 0;
772
773 /*
774 * Get cutoffs that determine which deleted tuples are considered DEAD,
775 * not just RECENTLY_DEAD, and which XIDs/MXIDs to freeze. Then determine
776 * the extent of the blocks that we'll scan in lazy_scan_heap. It has to
777 * happen in this order to ensure that the OldestXmin cutoff field works
778 * as an upper bound on the XIDs stored in the pages we'll actually scan
779 * (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
780 *
781 * Next acquire vistest, a related cutoff that's used in pruning. We use
782 * vistest in combination with OldestXmin to ensure that
783 * heap_page_prune_and_freeze() always removes any deleted tuple whose
784 * xmax is < OldestXmin. lazy_scan_prune must never become confused about
785 * whether a tuple should be frozen or removed. (In the future we might
786 * want to teach lazy_scan_prune to recompute vistest from time to time,
787 * to increase the number of dead tuples it can prune away.)
788 */
789 vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
790 vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
791 vacrel->vistest = GlobalVisTestFor(rel);
792
793 /* Initialize state used to track oldest extant XID/MXID */
794 vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
795 vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
796
797 /*
798 * Initialize state related to tracking all-visible page skipping. This is
799 * very important to determine whether or not it is safe to advance the
800 * relfrozenxid/relminmxid.
801 */
802 vacrel->skippedallvis = false;
803 skipwithvm = true;
805 {
806 /*
807 * Force aggressive mode, and disable skipping blocks using the
808 * visibility map (even those set all-frozen)
809 */
810 vacrel->aggressive = true;
811 skipwithvm = false;
812 }
813
814 vacrel->skipwithvm = skipwithvm;
815
816 /*
817 * Set up eager scan tracking state. This must happen after determining
818 * whether or not the vacuum must be aggressive, because only normal
819 * vacuums use the eager scan algorithm.
820 */
821 heap_vacuum_eager_scan_setup(vacrel, params);
822
823 if (verbose)
824 {
825 if (vacrel->aggressive)
827 (errmsg("aggressively vacuuming \"%s.%s.%s\"",
828 vacrel->dbname, vacrel->relnamespace,
829 vacrel->relname)));
830 else
832 (errmsg("vacuuming \"%s.%s.%s\"",
833 vacrel->dbname, vacrel->relnamespace,
834 vacrel->relname)));
835 }
836
837 /*
838 * Allocate dead_items memory using dead_items_alloc. This handles
839 * parallel VACUUM initialization as part of allocating shared memory
840 * space used for dead_items. (But do a failsafe precheck first, to
841 * ensure that parallel VACUUM won't be attempted at all when relfrozenxid
842 * is already dangerously old.)
843 */
845 dead_items_alloc(vacrel, params.nworkers);
846
847 /*
848 * Call lazy_scan_heap to perform all required heap pruning, index
849 * vacuuming, and heap vacuuming (plus related processing)
850 */
851 lazy_scan_heap(vacrel);
852
853 /*
854 * Free resources managed by dead_items_alloc. This ends parallel mode in
855 * passing when necessary.
856 */
857 dead_items_cleanup(vacrel);
859
860 /*
861 * Update pg_class entries for each of rel's indexes where appropriate.
862 *
863 * Unlike the later update to rel's pg_class entry, this is not critical.
864 * Maintains relpages/reltuples statistics used by the planner only.
865 */
866 if (vacrel->do_index_cleanup)
868
869 /* Done with rel's indexes */
870 vac_close_indexes(vacrel->nindexes, vacrel->indrels, NoLock);
871
872 /* Optionally truncate rel */
873 if (should_attempt_truncation(vacrel))
874 lazy_truncate_heap(vacrel);
875
876 /* Pop the error context stack */
877 error_context_stack = errcallback.previous;
878
879 /* Report that we are now doing final cleanup */
882
883 /*
884 * Prepare to update rel's pg_class entry.
885 *
886 * Aggressive VACUUMs must always be able to advance relfrozenxid to a
887 * value >= FreezeLimit, and relminmxid to a value >= MultiXactCutoff.
888 * Non-aggressive VACUUMs may advance them by any amount, or not at all.
889 */
890 Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
892 vacrel->cutoffs.relfrozenxid,
893 vacrel->NewRelfrozenXid));
894 Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
896 vacrel->cutoffs.relminmxid,
897 vacrel->NewRelminMxid));
898 if (vacrel->skippedallvis)
899 {
900 /*
901 * Must keep original relfrozenxid in a non-aggressive VACUUM that
902 * chose to skip an all-visible page range. The state that tracks new
903 * values will have missed unfrozen XIDs from the pages we skipped.
904 */
905 Assert(!vacrel->aggressive);
908 }
909
910 /*
911 * For safety, clamp relallvisible to be not more than what we're setting
912 * pg_class.relpages to
913 */
914 new_rel_pages = vacrel->rel_pages; /* After possible rel truncation */
915 visibilitymap_count(rel, &new_rel_allvisible, &new_rel_allfrozen);
916 if (new_rel_allvisible > new_rel_pages)
917 new_rel_allvisible = new_rel_pages;
918
919 /*
920 * An all-frozen block _must_ be all-visible. As such, clamp the count of
921 * all-frozen blocks to the count of all-visible blocks. This matches the
922 * clamping of relallvisible above.
923 */
924 if (new_rel_allfrozen > new_rel_allvisible)
925 new_rel_allfrozen = new_rel_allvisible;
926
927 /*
928 * Now actually update rel's pg_class entry.
929 *
930 * In principle new_live_tuples could be -1 indicating that we (still)
931 * don't know the tuple count. In practice that can't happen, since we
932 * scan every page that isn't skipped using the visibility map.
933 */
934 vac_update_relstats(rel, new_rel_pages, vacrel->new_live_tuples,
935 new_rel_allvisible, new_rel_allfrozen,
936 vacrel->nindexes > 0,
937 vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
938 &frozenxid_updated, &minmulti_updated, false);
939
940 /*
941 * Report results to the cumulative stats system, too.
942 *
943 * Deliberately avoid telling the stats system about LP_DEAD items that
944 * remain in the table due to VACUUM bypassing index and heap vacuuming.
945 * ANALYZE will consider the remaining LP_DEAD items to be dead "tuples".
946 * It seems like a good idea to err on the side of not vacuuming again too
947 * soon in cases where the failsafe prevented significant amounts of heap
948 * vacuuming.
949 */
951 rel->rd_rel->relisshared,
952 Max(vacrel->new_live_tuples, 0),
953 vacrel->recently_dead_tuples +
954 vacrel->missed_dead_tuples,
955 starttime);
957
958 if (instrument)
959 {
961
962 if (verbose || params.log_vacuum_min_duration == 0 ||
963 TimestampDifferenceExceeds(starttime, endtime,
965 {
966 long secs_dur;
967 int usecs_dur;
968 WalUsage walusage;
969 BufferUsage bufferusage;
971 char *msgfmt;
972 int32 diff;
973 double read_rate = 0,
974 write_rate = 0;
975 int64 total_blks_hit;
976 int64 total_blks_read;
977 int64 total_blks_dirtied;
978
979 TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
980 memset(&walusage, 0, sizeof(WalUsage));
981 WalUsageAccumDiff(&walusage, &pgWalUsage, &startwalusage);
982 memset(&bufferusage, 0, sizeof(BufferUsage));
983 BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
984
985 total_blks_hit = bufferusage.shared_blks_hit +
986 bufferusage.local_blks_hit;
987 total_blks_read = bufferusage.shared_blks_read +
988 bufferusage.local_blks_read;
989 total_blks_dirtied = bufferusage.shared_blks_dirtied +
990 bufferusage.local_blks_dirtied;
991
993 if (verbose)
994 {
995 /*
996 * Aggressiveness already reported earlier, in dedicated
997 * VACUUM VERBOSE ereport
998 */
999 Assert(!params.is_wraparound);
1000 msgfmt = _("finished vacuuming \"%s.%s.%s\": index scans: %d\n");
1001 }
1002 else if (params.is_wraparound)
1003 {
1004 /*
1005 * While it's possible for a VACUUM to be both is_wraparound
1006 * and !aggressive, that's just a corner-case -- is_wraparound
1007 * implies aggressive. Produce distinct output for the corner
1008 * case all the same, just in case.
1009 */
1010 if (vacrel->aggressive)
1011 msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
1012 else
1013 msgfmt = _("automatic vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
1014 }
1015 else
1016 {
1017 if (vacrel->aggressive)
1018 msgfmt = _("automatic aggressive vacuum of table \"%s.%s.%s\": index scans: %d\n");
1019 else
1020 msgfmt = _("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n");
1021 }
1022 appendStringInfo(&buf, msgfmt,
1023 vacrel->dbname,
1024 vacrel->relnamespace,
1025 vacrel->relname,
1026 vacrel->num_index_scans);
1027 appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total), %u eagerly scanned\n"),
1028 vacrel->removed_pages,
1029 new_rel_pages,
1030 vacrel->scanned_pages,
1031 orig_rel_pages == 0 ? 100.0 :
1032 100.0 * vacrel->scanned_pages /
1033 orig_rel_pages,
1034 vacrel->eager_scanned_pages);
1036 _("tuples: %" PRId64 " removed, %" PRId64 " remain, %" PRId64 " are dead but not yet removable\n"),
1037 vacrel->tuples_deleted,
1038 (int64) vacrel->new_rel_tuples,
1039 vacrel->recently_dead_tuples);
1040 if (vacrel->missed_dead_tuples > 0)
1042 _("tuples missed: %" PRId64 " dead from %u pages not removed due to cleanup lock contention\n"),
1043 vacrel->missed_dead_tuples,
1044 vacrel->missed_dead_pages);
1045 diff = (int32) (ReadNextTransactionId() -
1046 vacrel->cutoffs.OldestXmin);
1048 _("removable cutoff: %u, which was %d XIDs old when operation ended\n"),
1049 vacrel->cutoffs.OldestXmin, diff);
1050 if (frozenxid_updated)
1051 {
1052 diff = (int32) (vacrel->NewRelfrozenXid -
1053 vacrel->cutoffs.relfrozenxid);
1055 _("new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"),
1056 vacrel->NewRelfrozenXid, diff);
1057 }
1058 if (minmulti_updated)
1059 {
1060 diff = (int32) (vacrel->NewRelminMxid -
1061 vacrel->cutoffs.relminmxid);
1063 _("new relminmxid: %u, which is %d MXIDs ahead of previous value\n"),
1064 vacrel->NewRelminMxid, diff);
1065 }
1066 appendStringInfo(&buf, _("frozen: %u pages from table (%.2f%% of total) had %" PRId64 " tuples frozen\n"),
1067 vacrel->new_frozen_tuple_pages,
1068 orig_rel_pages == 0 ? 100.0 :
1069 100.0 * vacrel->new_frozen_tuple_pages /
1070 orig_rel_pages,
1071 vacrel->tuples_frozen);
1072
1074 _("visibility map: %u pages set all-visible, %u pages set all-frozen (%u were all-visible)\n"),
1075 vacrel->vm_new_visible_pages,
1077 vacrel->vm_new_frozen_pages,
1078 vacrel->vm_new_frozen_pages);
1079 if (vacrel->do_index_vacuuming)
1080 {
1081 if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
1082 appendStringInfoString(&buf, _("index scan not needed: "));
1083 else
1084 appendStringInfoString(&buf, _("index scan needed: "));
1085
1086 msgfmt = _("%u pages from table (%.2f%% of total) had %" PRId64 " dead item identifiers removed\n");
1087 }
1088 else
1089 {
1091 appendStringInfoString(&buf, _("index scan bypassed: "));
1092 else
1093 appendStringInfoString(&buf, _("index scan bypassed by failsafe: "));
1094
1095 msgfmt = _("%u pages from table (%.2f%% of total) have %" PRId64 " dead item identifiers\n");
1096 }
1097 appendStringInfo(&buf, msgfmt,
1098 vacrel->lpdead_item_pages,
1099 orig_rel_pages == 0 ? 100.0 :
1100 100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
1101 vacrel->lpdead_items);
1102 for (int i = 0; i < vacrel->nindexes; i++)
1103 {
1104 IndexBulkDeleteResult *istat = vacrel->indstats[i];
1105
1106 if (!istat)
1107 continue;
1108
1110 _("index \"%s\": pages: %u in total, %u newly deleted, %u currently deleted, %u reusable\n"),
1111 indnames[i],
1112 istat->num_pages,
1113 istat->pages_newly_deleted,
1114 istat->pages_deleted,
1115 istat->pages_free);
1116 }
1118 {
1119 /*
1120 * We bypass the changecount mechanism because this value is
1121 * only updated by the calling process. We also rely on the
1122 * above call to pgstat_progress_end_command() to not clear
1123 * the st_progress_param array.
1124 */
1125 appendStringInfo(&buf, _("delay time: %.3f ms\n"),
1127 }
1128 if (track_io_timing)
1129 {
1130 double read_ms = (double) (pgStatBlockReadTime - startreadtime) / 1000;
1131 double write_ms = (double) (pgStatBlockWriteTime - startwritetime) / 1000;
1132
1133 appendStringInfo(&buf, _("I/O timings: read: %.3f ms, write: %.3f ms\n"),
1134 read_ms, write_ms);
1135 }
1136 if (secs_dur > 0 || usecs_dur > 0)
1137 {
1138 read_rate = (double) BLCKSZ * total_blks_read /
1139 (1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
1140 write_rate = (double) BLCKSZ * total_blks_dirtied /
1141 (1024 * 1024) / (secs_dur + usecs_dur / 1000000.0);
1142 }
1143 appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
1144 read_rate, write_rate);
1146 _("buffer usage: %" PRId64 " hits, %" PRId64 " reads, %" PRId64 " dirtied\n"),
1147 total_blks_hit,
1148 total_blks_read,
1149 total_blks_dirtied);
1151 _("WAL usage: %" PRId64 " records, %" PRId64 " full page images, %" PRIu64 " bytes, %" PRIu64 " full page image bytes, %" PRId64 " buffers full\n"),
1152 walusage.wal_records,
1153 walusage.wal_fpi,
1154 walusage.wal_bytes,
1155 walusage.wal_fpi_bytes,
1156 walusage.wal_buffers_full);
1157 appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
1158
1159 ereport(verbose ? INFO : LOG,
1160 (errmsg_internal("%s", buf.data)));
1161 pfree(buf.data);
1162 }
1163 }
1164
1165 /* Cleanup index statistics and index names */
1166 for (int i = 0; i < vacrel->nindexes; i++)
1167 {
1168 if (vacrel->indstats[i])
1169 pfree(vacrel->indstats[i]);
1170
1171 if (instrument)
1172 pfree(indnames[i]);
1173 }
1174}
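
/*
 * Editor's illustration (standalone, not part of this file): the read/write
 * rate arithmetic used for the "avg read rate / avg write rate" log line in
 * heap_vacuum_rel() above. Counters and elapsed time are made-up values; an
 * 8 kB block size is assumed.
 */
#include <stdio.h>

int
main(void)
{
    const double blcksz = 8192;        /* assumed default BLCKSZ */
    double total_blks_read = 25600;    /* made-up buffer usage counters */
    double total_blks_dirtied = 6400;
    long   secs_dur = 2;
    int    usecs_dur = 500000;         /* 2.5 s elapsed */

    double elapsed = secs_dur + usecs_dur / 1000000.0;
    double read_rate = blcksz * total_blks_read / (1024 * 1024) / elapsed;
    double write_rate = blcksz * total_blks_dirtied / (1024 * 1024) / elapsed;

    /* 25600 blocks * 8 kB = 200 MB read over 2.5 s -> 80.000 MB/s */
    printf("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n",
           read_rate, write_rate);
    return 0;
}
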
1175
1176/*
1177 * lazy_scan_heap() -- workhorse function for VACUUM
1178 *
1179 * This routine prunes each page in the heap, and considers the need to
1180 * freeze remaining tuples with storage (not including pages that can be
1181 * skipped using the visibility map). Also performs related maintenance
1182 * of the FSM and visibility map. These steps all take place during an
1183 * initial pass over the target heap relation.
1184 *
1185 * Also invokes lazy_vacuum_all_indexes to vacuum indexes, which largely
1186 * consists of deleting index tuples that point to LP_DEAD items left in
1187 * heap pages following pruning. Earlier initial pass over the heap will
1188 * have collected the TIDs whose index tuples need to be removed.
1189 *
1190 * Finally, invokes lazy_vacuum_heap_rel to vacuum heap pages, which
1191 * largely consists of marking LP_DEAD items (from vacrel->dead_items)
1192 * as LP_UNUSED. This has to happen in a second, final pass over the
1193 * heap, to preserve a basic invariant that all index AMs rely on: no
1194 * extant index tuple can ever be allowed to contain a TID that points to
1195 * an LP_UNUSED line pointer in the heap. We must disallow premature
1196 * recycling of line pointers to avoid index scans that get confused
1197 * about which TID points to which tuple immediately after recycling.
1198 * (Actually, this isn't a concern when target heap relation happens to
1199 * have no indexes, which allows us to safely apply the one-pass strategy
1200 * as an optimization).
1201 *
1202 * In practice we often have enough space to fit all TIDs, and so won't
1203 * need to call lazy_vacuum more than once, after our initial pass over
1204 * the heap has totally finished. Otherwise things are slightly more
1205 * complicated: our "initial pass" over the heap applies only to those
1206 * pages that were pruned before we needed to call lazy_vacuum, and our
1207 * "final pass" over the heap only vacuums these same heap pages.
1208 * However, we process indexes in full every time lazy_vacuum is called,
1209 * which makes index processing very inefficient when memory is in short
1210 * supply.
1211 */
1212static void
1214{
1215 ReadStream *stream;
1216 BlockNumber rel_pages = vacrel->rel_pages,
1217 blkno = 0,
1218 next_fsm_block_to_vacuum = 0;
1219 BlockNumber orig_eager_scan_success_limit =
1220 vacrel->eager_scan_remaining_successes; /* for logging */
1221 Buffer vmbuffer = InvalidBuffer;
1222 const int initprog_index[] = {
1226 };
1227 int64 initprog_val[3];
1228
1229 /* Report that we're scanning the heap, advertising total # of blocks */
1230 initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
1231 initprog_val[1] = rel_pages;
1232 initprog_val[2] = vacrel->dead_items_info->max_bytes;
1233 pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
1234
1235 /* Initialize for the first heap_vac_scan_next_block() call */
1238 vacrel->next_unskippable_allvis = false;
1239 vacrel->next_unskippable_eager_scanned = false;
1241
1242 /*
1243 * Set up the read stream for vacuum's first pass through the heap.
1244 *
1245 * This could be made safe for READ_STREAM_USE_BATCHING, but only with
1246 * explicit work in heap_vac_scan_next_block.
1247 */
1249 vacrel->bstrategy,
1250 vacrel->rel,
1253 vacrel,
1254 sizeof(uint8));
1255
1256 while (true)
1257 {
1258 Buffer buf;
1259 Page page;
1260 uint8 blk_info = 0;
1261 int ndeleted = 0;
1262 bool has_lpdead_items;
1263 void *per_buffer_data = NULL;
1264 bool vm_page_frozen = false;
1265 bool got_cleanup_lock = false;
1266
1267 vacuum_delay_point(false);
1268
1269 /*
1270 * Regularly check if wraparound failsafe should trigger.
1271 *
1272 * There is a similar check inside lazy_vacuum_all_indexes(), but
1273 * relfrozenxid might start to look dangerously old before we reach
1274 * that point. This check also provides failsafe coverage for the
1275 * one-pass strategy, and the two-pass strategy with the index_cleanup
1276 * param set to 'off'.
1277 */
1278 if (vacrel->scanned_pages > 0 &&
1279 vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
1281
1282 /*
1283 * Consider if we definitely have enough space to process TIDs on page
1284 * already. If we are close to overrunning the available space for
1285 * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
1286 * this page. However, let's force at least one page-worth of tuples
 1287 * to be stored so as to ensure we do at least some work when the memory
1288 * configured is so low that we run out before storing anything.
1289 */
1290 if (vacrel->dead_items_info->num_items > 0 &&
1292 {
1293 /*
1294 * Before beginning index vacuuming, we release any pin we may
1295 * hold on the visibility map page. This isn't necessary for
1296 * correctness, but we do it anyway to avoid holding the pin
1297 * across a lengthy, unrelated operation.
1298 */
1299 if (BufferIsValid(vmbuffer))
1300 {
1301 ReleaseBuffer(vmbuffer);
1302 vmbuffer = InvalidBuffer;
1303 }
1304
1305 /* Perform a round of index and heap vacuuming */
1306 vacrel->consider_bypass_optimization = false;
1307 lazy_vacuum(vacrel);
1308
1309 /*
1310 * Vacuum the Free Space Map to make newly-freed space visible on
1311 * upper-level FSM pages. Note that blkno is the previously
1312 * processed block.
1313 */
1314 FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
1315 blkno + 1);
1316 next_fsm_block_to_vacuum = blkno;
1317
1318 /* Report that we are once again scanning the heap */
1321 }
1322
1323 buf = read_stream_next_buffer(stream, &per_buffer_data);
1324
1325 /* The relation is exhausted. */
1326 if (!BufferIsValid(buf))
1327 break;
1328
1329 blk_info = *((uint8 *) per_buffer_data);
1331 page = BufferGetPage(buf);
1332 blkno = BufferGetBlockNumber(buf);
1333
1334 vacrel->scanned_pages++;
1335 if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
1336 vacrel->eager_scanned_pages++;
1337
1338 /* Report as block scanned, update error traceback information */
1341 blkno, InvalidOffsetNumber);
1342
1343 /*
1344 * Pin the visibility map page in case we need to mark the page
1345 * all-visible. In most cases this will be very cheap, because we'll
1346 * already have the correct page pinned anyway.
1347 */
1348 visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
1349
1350 /*
1351 * We need a buffer cleanup lock to prune HOT chains and defragment
1352 * the page in lazy_scan_prune. But when it's not possible to acquire
1353 * a cleanup lock right away, we may be able to settle for reduced
1354 * processing using lazy_scan_noprune.
1355 */
1356 got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
1357
1358 if (!got_cleanup_lock)
1360
1361 /* Check for new or empty pages before lazy_scan_[no]prune call */
1362 if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
1363 vmbuffer))
1364 {
1365 /* Processed as new/empty page (lock and pin released) */
1366 continue;
1367 }
1368
1369 /*
1370 * If we didn't get the cleanup lock, we can still collect LP_DEAD
1371 * items in the dead_items area for later vacuuming, count live and
1372 * recently dead tuples for vacuum logging, and determine if this
1373 * block could later be truncated. If we encounter any xid/mxids that
 1374 * require advancing the relfrozenxid/relminmxid, we'll have to wait
1375 * for a cleanup lock and call lazy_scan_prune().
1376 */
1377 if (!got_cleanup_lock &&
1378 !lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
1379 {
1380 /*
1381 * lazy_scan_noprune could not do all required processing. Wait
1382 * for a cleanup lock, and call lazy_scan_prune in the usual way.
1383 */
1384 Assert(vacrel->aggressive);
1387 got_cleanup_lock = true;
1388 }
1389
1390 /*
1391 * If we have a cleanup lock, we must now prune, freeze, and count
1392 * tuples. We may have acquired the cleanup lock originally, or we may
1393 * have gone back and acquired it after lazy_scan_noprune() returned
1394 * false. Either way, the page hasn't been processed yet.
1395 *
1396 * Like lazy_scan_noprune(), lazy_scan_prune() will count
1397 * recently_dead_tuples and live tuples for vacuum logging, determine
1398 * if the block can later be truncated, and accumulate the details of
1399 * remaining LP_DEAD line pointers on the page into dead_items. These
1400 * dead items include those pruned by lazy_scan_prune() as well as
1401 * line pointers previously marked LP_DEAD.
1402 */
1403 if (got_cleanup_lock)
1404 ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
1405 vmbuffer,
1407 &has_lpdead_items, &vm_page_frozen);
1408
1409 /*
1410 * Count an eagerly scanned page as a failure or a success.
1411 *
1412 * Only lazy_scan_prune() freezes pages, so if we didn't get the
1413 * cleanup lock, we won't have frozen the page. However, we only count
1414 * pages that were too new to require freezing as eager freeze
1415 * failures.
1416 *
1417 * We could gather more information from lazy_scan_noprune() about
1418 * whether or not there were tuples with XIDs or MXIDs older than the
1419 * FreezeLimit or MultiXactCutoff. However, for simplicity, we simply
1420 * exclude pages skipped due to cleanup lock contention from eager
1421 * freeze algorithm caps.
1422 */
1423 if (got_cleanup_lock &&
1424 (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
1425 {
1426 /* Aggressive vacuums do not eager scan. */
1427 Assert(!vacrel->aggressive);
1428
1429 if (vm_page_frozen)
1430 {
1431 if (vacrel->eager_scan_remaining_successes > 0)
1433
1434 if (vacrel->eager_scan_remaining_successes == 0)
1435 {
1436 /*
1437 * Report only once that we disabled eager scanning. We
1438 * may eagerly read ahead blocks in excess of the success
1439 * or failure caps before attempting to freeze them, so we
1440 * could reach here even after disabling additional eager
1441 * scanning.
1442 */
1443 if (vacrel->eager_scan_max_fails_per_region > 0)
1444 ereport(vacrel->verbose ? INFO : DEBUG2,
1445 (errmsg("disabling eager scanning after freezing %u eagerly scanned blocks of relation \"%s.%s.%s\"",
1446 orig_eager_scan_success_limit,
1447 vacrel->dbname, vacrel->relnamespace,
1448 vacrel->relname)));
1449
1450 /*
1451 * If we hit our success cap, permanently disable eager
1452 * scanning by setting the other eager scan management
1453 * fields to their disabled values.
1454 */
1455 vacrel->eager_scan_remaining_fails = 0;
1458 }
1459 }
1460 else if (vacrel->eager_scan_remaining_fails > 0)
1462 }
1463
1464 /*
1465 * Now drop the buffer lock and, potentially, update the FSM.
1466 *
1467 * Our goal is to update the freespace map the last time we touch the
1468 * page. If we'll process a block in the second pass, we may free up
1469 * additional space on the page, so it is better to update the FSM
1470 * after the second pass. If the relation has no indexes, or if index
1471 * vacuuming is disabled, there will be no second heap pass; if this
1472 * particular page has no dead items, the second heap pass will not
1473 * touch this page. So, in those cases, update the FSM now.
1474 *
1475 * Note: In corner cases, it's possible to miss updating the FSM
1476 * entirely. If index vacuuming is currently enabled, we'll skip the
1477 * FSM update now. But if failsafe mode is later activated, or there
1478 * are so few dead tuples that index vacuuming is bypassed, there will
1479 * also be no opportunity to update the FSM later, because we'll never
1480 * revisit this page. Since updating the FSM is desirable but not
1481 * absolutely required, that's OK.
1482 */
1483 if (vacrel->nindexes == 0
1484 || !vacrel->do_index_vacuuming
1485 || !has_lpdead_items)
1486 {
1487 Size freespace = PageGetHeapFreeSpace(page);
1488
1490 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
1491
1492 /*
1493 * Periodically perform FSM vacuuming to make newly-freed space
1494 * visible on upper FSM pages. This is done after vacuuming if the
1495 * table has indexes. There will only be newly-freed space if we
1496 * held the cleanup lock and lazy_scan_prune() was called.
1497 */
1498 if (got_cleanup_lock && vacrel->nindexes == 0 && ndeleted > 0 &&
1499 blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
1500 {
1501 FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
1502 blkno);
1503 next_fsm_block_to_vacuum = blkno;
1504 }
1505 }
1506 else
1508 }
1509
1510 vacrel->blkno = InvalidBlockNumber;
1511 if (BufferIsValid(vmbuffer))
1512 ReleaseBuffer(vmbuffer);
1513
1514 /*
1515 * Report that everything is now scanned. We never skip scanning the last
1516 * block in the relation, so we can pass rel_pages here.
1517 */
1519 rel_pages);
1520
1521 /* now we can compute the new value for pg_class.reltuples */
1522 vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
1523 vacrel->scanned_pages,
1524 vacrel->live_tuples);
1525
1526 /*
1527 * Also compute the total number of surviving heap entries. In the
1528 * (unlikely) scenario that new_live_tuples is -1, take it as zero.
1529 */
1530 vacrel->new_rel_tuples =
1531 Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
1532 vacrel->missed_dead_tuples;
1533
1534 read_stream_end(stream);
1535
1536 /*
1537 * Do index vacuuming (call each index's ambulkdelete routine), then do
1538 * related heap vacuuming
1539 */
1540 if (vacrel->dead_items_info->num_items > 0)
1541 lazy_vacuum(vacrel);
1542
1543 /*
1544 * Vacuum the remainder of the Free Space Map. We must do this whether or
1545 * not there were indexes, and whether or not we bypassed index vacuuming.
1546 * We can pass rel_pages here because we never skip scanning the last
1547 * block of the relation.
1548 */
1549 if (rel_pages > next_fsm_block_to_vacuum)
1550 FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, rel_pages);
1551
1552 /* report all blocks vacuumed */
1554
1555 /* Do final index cleanup (call each index's amvacuumcleanup routine) */
1556 if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
1558}
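
/*
 * Editor's illustration (standalone, not part of this file): a loose sketch
 * of the mid-scan decision made in lazy_scan_heap() above -- pause phase I
 * for a round of index and heap vacuuming once dead-item storage exceeds its
 * budget, but only after at least some TIDs have been collected so that a
 * very small budget still lets each page be processed. The parameter names
 * are hypothetical stand-ins for the real TID-store bookkeeping.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool
should_pause_for_vacuum_cycle(int64_t num_items, int64_t mem_used,
                              int64_t max_bytes)
{
    return num_items > 0 && mem_used > max_bytes;
}

int
main(void)
{
    printf("%d\n", (int) should_pause_for_vacuum_cycle(0, 900, 800));    /* 0 */
    printf("%d\n", (int) should_pause_for_vacuum_cycle(1200, 900, 800)); /* 1 */
    return 0;
}
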
1559
1560/*
1561 * heap_vac_scan_next_block() -- read stream callback to get the next block
1562 * for vacuum to process
1563 *
1564 * Every time lazy_scan_heap() needs a new block to process during its first
1565 * phase, it invokes read_stream_next_buffer() with a stream set up to call
1566 * heap_vac_scan_next_block() to get the next block.
1567 *
1568 * heap_vac_scan_next_block() uses the visibility map, vacuum options, and
1569 * various thresholds to skip blocks which do not need to be processed and
1570 * returns the next block to process or InvalidBlockNumber if there are no
1571 * remaining blocks.
1572 *
1573 * The visibility status of the next block to process and whether or not it
1574 * was eager scanned is set in the per_buffer_data.
1575 *
1576 * callback_private_data contains a reference to the LVRelState, passed to the
1577 * read stream API during stream setup. The LVRelState is an in/out parameter
1578 * here (locally named `vacrel`). Vacuum options and information about the
1579 * relation are read from it. vacrel->skippedallvis is set if we skip a block
1580 * that's all-visible but not all-frozen (to ensure that we don't update
1581 * relfrozenxid in that case). vacrel also holds information about the next
1582 * unskippable block -- as bookkeeping for this function.
1583 */
1584static BlockNumber
1586 void *callback_private_data,
1587 void *per_buffer_data)
1588{
1589 BlockNumber next_block;
1590 LVRelState *vacrel = callback_private_data;
1591 uint8 blk_info = 0;
1592
1593 /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
1594 next_block = vacrel->current_block + 1;
1595
1596 /* Have we reached the end of the relation? */
1597 if (next_block >= vacrel->rel_pages)
1598 {
1600 {
1603 }
1604 return InvalidBlockNumber;
1605 }
1606
1607 /*
1608 * We must be in one of the three following states:
1609 */
1610 if (next_block > vacrel->next_unskippable_block ||
1612 {
1613 /*
1614 * 1. We have just processed an unskippable block (or we're at the
1615 * beginning of the scan). Find the next unskippable block using the
1616 * visibility map.
1617 */
1618 bool skipsallvis;
1619
1620 find_next_unskippable_block(vacrel, &skipsallvis);
1621
1622 /*
1623 * We now know the next block that we must process. It can be the
1624 * next block after the one we just processed, or something further
1625 * ahead. If it's further ahead, we can jump to it, but we choose to
1626 * do so only if we can skip at least SKIP_PAGES_THRESHOLD consecutive
1627 * pages. Since we're reading sequentially, the OS should be doing
1628 * readahead for us, so there's no gain in skipping a page now and
1629 * then. Skipping such a range might even discourage sequential
1630 * detection.
1631 *
1632 * This test also enables more frequent relfrozenxid advancement
1633 * during non-aggressive VACUUMs. If the range has any all-visible
1634 * pages then skipping makes updating relfrozenxid unsafe, which is a
1635 * real downside.
1636 */
1637 if (vacrel->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
1638 {
1639 next_block = vacrel->next_unskippable_block;
1640 if (skipsallvis)
1641 vacrel->skippedallvis = true;
1642 }
1643 }
1644
1645 /* Now we must be in one of the two remaining states: */
1646 if (next_block < vacrel->next_unskippable_block)
1647 {
1648 /*
1649 * 2. We are processing a range of blocks that we could have skipped
1650 * but chose not to. We know that they are all-visible in the VM,
1651 * otherwise they would've been unskippable.
1652 */
1653 vacrel->current_block = next_block;
1655 *((uint8 *) per_buffer_data) = blk_info;
1656 return vacrel->current_block;
1657 }
1658 else
1659 {
1660 /*
1661 * 3. We reached the next unskippable block. Process it. On next
1662 * iteration, we will be back in state 1.
1663 */
1664 Assert(next_block == vacrel->next_unskippable_block);
1665
1666 vacrel->current_block = next_block;
1667 if (vacrel->next_unskippable_allvis)
1670 blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
1671 *((uint8 *) per_buffer_data) = blk_info;
1672 return vacrel->current_block;
1673 }
1674}
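
The "relies on InvalidBlockNumber + 1 overflowing to 0" trick above depends only on well-defined unsigned wraparound. A standalone sketch, assuming InvalidBlockNumber is 0xFFFFFFFF as defined in block.h:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)   /* assumed to match block.h */

int
main(void)
{
    BlockNumber current_block = InvalidBlockNumber;     /* state before the first call */
    BlockNumber next_block = current_block + 1;         /* unsigned wraparound */

    printf("%u\n", next_block);                         /* prints 0 */
    return 0;
}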
1675
1676/*
1677 * Find the next unskippable block in a vacuum scan using the visibility map.
1678 * The next unskippable block and its visibility information are updated in
1679 * vacrel.
1680 *
1681 * Note: our opinion of which blocks can be skipped can go stale immediately.
1682 * It's okay if caller "misses" a page whose all-visible or all-frozen marking
1683 * was concurrently cleared, though. All that matters is that caller scan all
1684 * pages whose tuples might contain XIDs < OldestXmin, or MXIDs < OldestMxact.
1685 * (Actually, non-aggressive VACUUMs can choose to skip all-visible pages with
1686 * older XIDs/MXIDs. The *skippedallvis flag will be set here when the choice
1687 * to skip such a range is actually made, making everything safe.)
1688 */
1689static void
1690find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
1691{
1692 BlockNumber rel_pages = vacrel->rel_pages;
1693 BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
1694 Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
1695 bool next_unskippable_eager_scanned = false;
1696 bool next_unskippable_allvis;
1697
1698 *skipsallvis = false;
1699
1700 for (;; next_unskippable_block++)
1701 {
1702 uint8 mapbits = visibilitymap_get_status(vacrel->rel,
1703 next_unskippable_block,
1704 &next_unskippable_vmbuffer);
1705
1706 next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
1707
1708 /*
1709 * At the start of each eager scan region, normal vacuums with eager
1710 * scanning enabled reset the failure counter, allowing vacuum to
1711 * resume eager scanning if it had been suspended in the previous
1712 * region.
1713 */
1714 if (next_unskippable_block >= vacrel->next_eager_scan_region_start)
1715 {
1719 }
1720
1721 /*
1722 * A block is unskippable if it is not all visible according to the
1723 * visibility map.
1724 */
1725 if (!next_unskippable_allvis)
1726 {
1727 Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
1728 break;
1729 }
1730
1731 /*
1732 * Caller must scan the last page to determine whether it has tuples
1733 * (caller must have the opportunity to set vacrel->nonempty_pages).
1734 * This rule avoids having lazy_truncate_heap() take access-exclusive
1735 * lock on rel to attempt a truncation that fails anyway, just because
1736 * there are tuples on the last page (it is likely that there will be
1737 * tuples on other nearby pages as well, but those can be skipped).
1738 *
1739 * Implement this by always treating the last block as unsafe to skip.
1740 */
1741 if (next_unskippable_block == rel_pages - 1)
1742 break;
1743
1744 /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
1745 if (!vacrel->skipwithvm)
1746 break;
1747
1748 /*
1749 * All-frozen pages cannot contain XIDs < OldestXmin (XIDs that aren't
1750 * already frozen by now), so this page can be skipped.
1751 */
1752 if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
1753 continue;
1754
1755 /*
1756 * Aggressive vacuums cannot skip any all-visible pages that are not
1757 * also all-frozen.
1758 */
1759 if (vacrel->aggressive)
1760 break;
1761
1762 /*
1763 * Normal vacuums with eager scanning enabled only skip all-visible
1764 * but not all-frozen pages if they have hit the failure limit for the
1765 * current eager scan region.
1766 */
1767 if (vacrel->eager_scan_remaining_fails > 0)
1768 {
1769 next_unskippable_eager_scanned = true;
1770 break;
1771 }
1772
1773 /*
1774 * All-visible blocks are safe to skip in a normal vacuum. But
1775 * remember that the final range contains such a block for later.
1776 */
1777 *skipsallvis = true;
1778 }
1779
1780 /* write the local variables back to vacrel */
1781 vacrel->next_unskippable_block = next_unskippable_block;
1782 vacrel->next_unskippable_allvis = next_unskippable_allvis;
1783 vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
1784 vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
1785}
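
A simplified standalone sketch of the skippability rules applied above, assuming the visibility map bit values from visibilitymap.h (0x01 and 0x02); it deliberately ignores the last-page, DISABLE_PAGE_SKIPPING, and eager-scan exceptions handled in the loop:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values assumed to match visibilitymap.h. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* A block is a skip candidate only if its all-visible bit is set; the
 * all-frozen bit additionally makes it skippable by aggressive vacuums. */
static bool
is_skip_candidate(uint8_t mapbits, bool aggressive)
{
    if ((mapbits & VM_ALL_VISIBLE) == 0)
        return false;               /* unskippable: must be scanned */
    if ((mapbits & VM_ALL_FROZEN) != 0)
        return true;                /* safe for any vacuum to skip */
    return !aggressive;             /* all-visible only: normal vacuums may skip */
}

int
main(void)
{
    printf("%d\n", is_skip_candidate(VM_ALL_VISIBLE | VM_ALL_FROZEN, true));    /* 1 */
    printf("%d\n", is_skip_candidate(VM_ALL_VISIBLE, true));                    /* 0 */
    return 0;
}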
1786
1787/*
1788 * lazy_scan_new_or_empty() -- lazy_scan_heap() new/empty page handling.
1789 *
1790 * Must call here to handle both new and empty pages before calling
1791 * lazy_scan_prune or lazy_scan_noprune, since they're not prepared to deal
1792 * with new or empty pages.
1793 *
1794 * It's necessary to consider new pages as a special case, since the rules for
1795 * maintaining the visibility map and FSM with empty pages are a little
1796 * different (though new pages can be truncated away during rel truncation).
1797 *
1798 * Empty pages are not really a special case -- they're just heap pages that
1799 * have no allocated tuples (including even LP_UNUSED items). You might
1800 * wonder why we need to handle them here all the same. It's only necessary
1801 * because of a corner-case involving a hard crash during heap relation
1802 * extension. If we ever make relation-extension crash safe, then it should
1803 * no longer be necessary to deal with empty pages here (or new pages, for
1804 * that matter).
1805 *
1806 * Caller must hold at least a shared lock. We might need to escalate that
1807 * lock to an exclusive lock, so the type of lock the caller holds must be
1808 * indicated via the 'sharelock' argument.
1809 *
1810 * Returns false in common case where caller should go on to call
1811 * lazy_scan_prune (or lazy_scan_noprune). Otherwise returns true, indicating
1812 * that lazy_scan_heap is done processing the page, releasing lock on caller's
1813 * behalf.
1814 *
1815 * No vm_page_frozen output parameter (like that passed to lazy_scan_prune())
1816 * is passed here because neither empty nor new pages can be eagerly frozen.
1817 * New pages are never frozen. Empty pages are always set frozen in the VM at
1818 * the same time that they are set all-visible, and we don't eagerly scan
1819 * frozen pages.
1820 */
1821static bool
1823 Page page, bool sharelock, Buffer vmbuffer)
1824{
1825 Size freespace;
1826
1827 if (PageIsNew(page))
1828 {
1829 /*
1830 * All-zeroes pages can be left over in two ways: either a backend extends
1831 * the relation by a single page but crashes before the newly initialized
1832 * page has been written out, or the relation is bulk-extended (which
1833 * creates a number of empty pages at its tail end) and those pages are
1834 * then entered into the FSM.
1835 *
1836 * Note we do not enter the page into the visibilitymap. That has the
1837 * downside that we repeatedly visit this page in subsequent vacuums,
1838 * but otherwise we'll never discover the space on a promoted standby.
1839 * The harm of repeated checking should normally not be too bad: the
1840 * space usually gets used at some point, otherwise there wouldn't be
1841 * any regular vacuums.
1842 *
1843 * Make sure these pages are in the FSM, to ensure they can be reused.
1844 * Do that by testing if there's any space recorded for the page. If
1845 * not, enter it. We do so after releasing the lock on the heap page;
1846 * the FSM is approximate, after all.
1847 */
1849
1850 if (GetRecordedFreeSpace(vacrel->rel, blkno) == 0)
1851 {
1852 freespace = BLCKSZ - SizeOfPageHeaderData;
1853
1854 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
1855 }
1856
1857 return true;
1858 }
1859
1860 if (PageIsEmpty(page))
1861 {
1862 /*
1863 * It seems likely that caller will always be able to get a cleanup
1864 * lock on an empty page. But don't take any chances -- escalate to
1865 * an exclusive lock (still don't need a cleanup lock, though).
1866 */
1867 if (sharelock)
1868 {
1871
1872 if (!PageIsEmpty(page))
1873 {
1874 /* page isn't new or empty -- keep lock and pin for now */
1875 return false;
1876 }
1877 }
1878 else
1879 {
1880 /* Already have a full cleanup lock (which is more than enough) */
1881 }
1882
1883 /*
1884 * Unlike new pages, empty pages are always set all-visible and
1885 * all-frozen.
1886 */
1887 if (!PageIsAllVisible(page))
1888 {
1890
1891 /* mark buffer dirty before writing a WAL record */
1893
1894 /*
1895 * It's possible that another backend has extended the heap,
1896 * initialized the page, and then failed to WAL-log the page due
1897 * to an ERROR. Since heap extension is not WAL-logged, recovery
1898 * might try to replay our record setting the page all-visible and
1899 * find that the page isn't initialized, which will cause a PANIC.
1900 * To prevent that, check whether the page has been previously
1901 * WAL-logged, and if not, do that now.
1902 */
1903 if (RelationNeedsWAL(vacrel->rel) &&
1905 log_newpage_buffer(buf, true);
1906
1907 PageSetAllVisible(page);
1908 visibilitymap_set(vacrel->rel, blkno, buf,
1910 vmbuffer, InvalidTransactionId,
1914
1915 /* Count the newly all-frozen pages for logging */
1916 vacrel->vm_new_visible_pages++;
1918 }
1919
1920 freespace = PageGetHeapFreeSpace(page);
1922 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
1923 return true;
1924 }
1925
1926 /* page isn't new or empty -- keep lock and pin */
1927 return false;
1928}
1929
1930/* qsort comparator for sorting OffsetNumbers */
1931static int
1932cmpOffsetNumbers(const void *a, const void *b)
1933{
1934 return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
1935}
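
A standalone usage sketch of this comparator pattern with qsort(); pg_cmp_u16() is assumed to behave like the overflow-safe three-way comparison shown here:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint16_t OffsetNumber;

/* Overflow-safe three-way comparison of two offset numbers. */
static int
cmp_offsets(const void *a, const void *b)
{
    OffsetNumber x = *(const OffsetNumber *) a;
    OffsetNumber y = *(const OffsetNumber *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    OffsetNumber deadoffsets[] = {42, 7, 19};

    qsort(deadoffsets, 3, sizeof(OffsetNumber), cmp_offsets);
    printf("%u %u %u\n", (unsigned) deadoffsets[0], (unsigned) deadoffsets[1],
           (unsigned) deadoffsets[2]);      /* prints 7 19 42 */
    return 0;
}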
1936
1937/*
1938 * lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
1939 *
1940 * Caller must hold pin and buffer cleanup lock on the buffer.
1941 *
1942 * vmbuffer is the buffer containing the VM block with visibility information
1943 * for the heap block, blkno. all_visible_according_to_vm is the saved
1944 * visibility status of the heap block looked up earlier by the caller. We
1945 * won't rely entirely on this status, as it may be out of date.
1946 *
1947 * *has_lpdead_items is set to true or false depending on whether, upon return
1948 * from this function, any LP_DEAD items are still present on the page.
1949 *
1950 * *vm_page_frozen is set to true if the page is newly set all-frozen in the
1951 * VM. The caller currently only uses this for determining whether an eagerly
1952 * scanned page was successfully set all-frozen.
1953 *
1954 * Returns the number of tuples deleted from the page during HOT pruning.
1955 */
1956static int
1958 Buffer buf,
1959 BlockNumber blkno,
1960 Page page,
1961 Buffer vmbuffer,
1962 bool all_visible_according_to_vm,
1963 bool *has_lpdead_items,
1964 bool *vm_page_frozen)
1965{
1966 Relation rel = vacrel->rel;
1967 PruneFreezeResult presult;
1968 PruneFreezeParams params = {
1969 .relation = rel,
1970 .buffer = buf,
1971 .reason = PRUNE_VACUUM_SCAN,
1972 .options = HEAP_PAGE_PRUNE_FREEZE,
1973 .vistest = vacrel->vistest,
1974 .cutoffs = &vacrel->cutoffs,
1975 };
1976
1977 Assert(BufferGetBlockNumber(buf) == blkno);
1978
1979 /*
1980 * Prune all HOT-update chains and potentially freeze tuples on this page.
1981 *
1982 * If the relation has no indexes, we can immediately mark would-be dead
1983 * items LP_UNUSED.
1984 *
1985 * The number of tuples removed from the page is returned in
1986 * presult.ndeleted. It should not be confused with presult.lpdead_items;
1987 * presult.lpdead_items's final value can be thought of as the number of
1988 * tuples that were deleted from indexes.
1989 *
1990 * We will update the VM after collecting LP_DEAD items and freezing
1991 * tuples. Pruning will have determined whether or not the page is
1992 * all-visible.
1993 */
1994 if (vacrel->nindexes == 0)
1996
1998 &presult,
1999 &vacrel->offnum,
2000 &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
2001
2004
2005 if (presult.nfrozen > 0)
2006 {
2007 /*
2008 * We don't increment the new_frozen_tuple_pages instrumentation
2009 * counter when nfrozen == 0, since it only counts pages with newly
2010 * frozen tuples (don't confuse that with pages newly set all-frozen
2011 * in VM).
2012 */
2013 vacrel->new_frozen_tuple_pages++;
2014 }
2015
2016 /*
2017 * VACUUM will call heap_page_is_all_visible() during the second pass over
2018 * the heap to determine all_visible and all_frozen for the page -- this
2019 * is a specialized version of the logic from this function. Now that
2020 * we've finished pruning and freezing, make sure that we're in total
2021 * agreement with heap_page_is_all_visible() using an assertion.
2022 */
2023#ifdef USE_ASSERT_CHECKING
2024 if (presult.all_visible)
2025 {
2026 TransactionId debug_cutoff;
2027 bool debug_all_frozen;
2028
2029 Assert(presult.lpdead_items == 0);
2030
2031 if (!heap_page_is_all_visible(vacrel->rel, buf,
2032 vacrel->cutoffs.OldestXmin, &debug_all_frozen,
2033 &debug_cutoff, &vacrel->offnum))
2034 Assert(false);
2035
2036 Assert(presult.all_frozen == debug_all_frozen);
2037
2038 Assert(!TransactionIdIsValid(debug_cutoff) ||
2039 debug_cutoff == presult.vm_conflict_horizon);
2040 }
2041#endif
2042
2043 /*
2044 * Now save details of the LP_DEAD items from the page in vacrel
2045 */
2046 if (presult.lpdead_items > 0)
2047 {
2048 vacrel->lpdead_item_pages++;
2049
2050 /*
2051 * deadoffsets are collected incrementally in
2052 * heap_page_prune_and_freeze() as each dead line pointer is recorded,
2053 * in an indeterminate order, but dead_items_add requires them to be
2054 * sorted.
2055 */
2056 qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
2058
2059 dead_items_add(vacrel, blkno, presult.deadoffsets, presult.lpdead_items);
2060 }
2061
2062 /* Finally, add page-local counts to whole-VACUUM counts */
2063 vacrel->tuples_deleted += presult.ndeleted;
2064 vacrel->tuples_frozen += presult.nfrozen;
2065 vacrel->lpdead_items += presult.lpdead_items;
2066 vacrel->live_tuples += presult.live_tuples;
2067 vacrel->recently_dead_tuples += presult.recently_dead_tuples;
2068
2069 /* Can't truncate this page */
2070 if (presult.hastup)
2071 vacrel->nonempty_pages = blkno + 1;
2072
2073 /* Did we find LP_DEAD items? */
2074 *has_lpdead_items = (presult.lpdead_items > 0);
2075
2076 Assert(!presult.all_visible || !(*has_lpdead_items));
2077 Assert(!presult.all_frozen || presult.all_visible);
2078
2079 /*
2080 * Handle setting visibility map bit based on information from the VM (as
2081 * of last heap_vac_scan_next_block() call), and from all_visible and
2082 * all_frozen variables
2083 */
2084 if (!all_visible_according_to_vm && presult.all_visible)
2085 {
2086 uint8 old_vmbits;
2088
2089 if (presult.all_frozen)
2090 {
2092 flags |= VISIBILITYMAP_ALL_FROZEN;
2093 }
2094
2095 /*
2096 * It should never be the case that the visibility map page is set
2097 * while the page-level bit is clear, but the reverse is allowed (if
2098 * checksums are not enabled). Regardless, set both bits so that we
2099 * get back in sync.
2100 *
2101 * NB: If the heap page is all-visible but the VM bit is not set, we
2102 * don't need to dirty the heap page. However, if checksums are
2103 * enabled, we do need to make sure that the heap page is dirtied
2104 * before passing it to visibilitymap_set(), because it may be logged.
2105 * Given that this situation should only happen in rare cases after a
2106 * crash, it is not worth optimizing.
2107 */
2108 PageSetAllVisible(page);
2110 old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
2112 vmbuffer, presult.vm_conflict_horizon,
2113 flags);
2114
2115 /*
2116 * If the page wasn't already set all-visible and/or all-frozen in the
2117 * VM, count it as newly set for logging.
2118 */
2119 if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
2120 {
2121 vacrel->vm_new_visible_pages++;
2122 if (presult.all_frozen)
2123 {
2125 *vm_page_frozen = true;
2126 }
2127 }
2128 else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
2129 presult.all_frozen)
2130 {
2131 vacrel->vm_new_frozen_pages++;
2132 *vm_page_frozen = true;
2133 }
2134 }
2135
2136 /*
2137 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
2138 * page-level bit is clear. However, it's possible that the bit got
2139 * cleared after heap_vac_scan_next_block() was called, so we must recheck
2140 * with buffer lock before concluding that the VM is corrupt.
2141 */
2142 else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
2143 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
2144 {
2147 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
2148 vacrel->relname, blkno)));
2149
2150 visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
2152 }
2153
2154 /*
2155 * It's possible for the value returned by
2156 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
2157 * wrong for us to see tuples that appear to not be visible to everyone
2158 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
2159 * never moves backwards, but GetOldestNonRemovableTransactionId() is
2160 * conservative and sometimes returns a value that's unnecessarily small,
2161 * so if we see that contradiction it just means that the tuples that we
2162 * think are not visible to everyone yet actually are, and the
2163 * PD_ALL_VISIBLE flag is correct.
2164 *
2165 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
2166 * however.
2167 */
2168 else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
2169 {
2172 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
2173 vacrel->relname, blkno)));
2174
2175 PageClearAllVisible(page);
2177 visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
2179 }
2180
2181 /*
2182 * If the all-visible page is all-frozen but not marked as such yet, mark
2183 * it as all-frozen.
2184 */
2185 else if (all_visible_according_to_vm && presult.all_frozen &&
2186 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
2187 {
2188 uint8 old_vmbits;
2189
2190 /*
2191 * Avoid relying on all_visible_according_to_vm as a proxy for the
2192 * page-level PD_ALL_VISIBLE bit being set, since it might have become
2193 * stale -- even when all_visible is set
2194 */
2195 if (!PageIsAllVisible(page))
2196 {
2197 PageSetAllVisible(page);
2199 }
2200
2201 /*
2202 * Set the page all-frozen (and all-visible) in the VM.
2203 *
2204 * We can pass InvalidTransactionId as our cutoff_xid, since a
2205 * snapshotConflictHorizon sufficient to make everything safe for REDO
2206 * was logged when the page's tuples were frozen.
2207 */
2209 old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
2211 vmbuffer, InvalidTransactionId,
2214
2215 /*
2216 * The page was likely already set all-visible in the VM. However,
2217 * there is a small chance that it was modified sometime between
2218 * setting all_visible_according_to_vm and checking the visibility
2219 * during pruning. Check the return value of old_vmbits anyway to
2220 * ensure the visibility map counters used for logging are accurate.
2221 */
2222 if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
2223 {
2224 vacrel->vm_new_visible_pages++;
2226 *vm_page_frozen = true;
2227 }
2228
2229 /*
2230 * We already checked that the page was not set all-frozen in the VM
2231 * above, so we don't need to test the value of old_vmbits.
2232 */
2233 else
2234 {
2235 vacrel->vm_new_frozen_pages++;
2236 *vm_page_frozen = true;
2237 }
2238 }
2239
2240 return presult.ndeleted;
2241}
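
Viewed from the caller's side, the *vm_page_frozen contract above reduces to the following standalone sketch (a simplification that ignores the per-counter logging done in the branches above; bit values assumed to match visibilitymap.h):

#include <stdbool.h>
#include <stdint.h>

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* The flag ends up set exactly when the page is all-frozen now and its
 * all-frozen bit was not already set in the VM before this call. */
static bool
newly_frozen_in_vm(uint8_t old_vmbits, bool all_frozen_now)
{
    return all_frozen_now && (old_vmbits & VM_ALL_FROZEN) == 0;
}

int
main(void)
{
    /* Page was all-visible but not all-frozen, and is all-frozen now: counts. */
    return newly_frozen_in_vm(VM_ALL_VISIBLE, true) ? 0 : 1;
}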
2242
2243/*
2244 * lazy_scan_noprune() -- lazy_scan_prune() without pruning or freezing
2245 *
2246 * Caller need only hold a pin and share lock on the buffer, unlike
2247 * lazy_scan_prune, which requires a full cleanup lock. While pruning isn't
2248 * performed here, it's quite possible that an earlier opportunistic pruning
2249 * operation left LP_DEAD items behind. We'll at least collect any such items
2250 * in dead_items for removal from indexes.
2251 *
2252 * For aggressive VACUUM callers, we may return false to indicate that a full
2253 * cleanup lock is required for processing by lazy_scan_prune. This is only
2254 * necessary when the aggressive VACUUM needs to freeze some tuple XIDs from
2255 * one or more tuples on the page. We always return true for non-aggressive
2256 * callers.
2257 *
2258 * If this function returns true, *has_lpdead_items gets set to true or false
2259 * depending on whether, upon return from this function, any LP_DEAD items are
2260 * present on the page. If this function returns false, *has_lpdead_items
2261 * is not updated.
2262 */
2263static bool
2265 Buffer buf,
2266 BlockNumber blkno,
2267 Page page,
2268 bool *has_lpdead_items)
2269{
2270 OffsetNumber offnum,
2271 maxoff;
2272 int lpdead_items,
2273 live_tuples,
2274 recently_dead_tuples,
2275 missed_dead_tuples;
2276 bool hastup;
2277 HeapTupleHeader tupleheader;
2278 TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
2279 MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
2281
2282 Assert(BufferGetBlockNumber(buf) == blkno);
2283
2284 hastup = false; /* for now */
2285
2286 lpdead_items = 0;
2287 live_tuples = 0;
2288 recently_dead_tuples = 0;
2289 missed_dead_tuples = 0;
2290
2291 maxoff = PageGetMaxOffsetNumber(page);
2292 for (offnum = FirstOffsetNumber;
2293 offnum <= maxoff;
2294 offnum = OffsetNumberNext(offnum))
2295 {
2296 ItemId itemid;
2297 HeapTupleData tuple;
2298
2299 vacrel->offnum = offnum;
2300 itemid = PageGetItemId(page, offnum);
2301
2302 if (!ItemIdIsUsed(itemid))
2303 continue;
2304
2305 if (ItemIdIsRedirected(itemid))
2306 {
2307 hastup = true;
2308 continue;
2309 }
2310
2311 if (ItemIdIsDead(itemid))
2312 {
2313 /*
2314 * Deliberately don't set hastup=true here. See same point in
2315 * lazy_scan_prune for an explanation.
2316 */
2317 deadoffsets[lpdead_items++] = offnum;
2318 continue;
2319 }
2320
2321 hastup = true; /* page prevents rel truncation */
2322 tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
2323 if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs,
2324 &NoFreezePageRelfrozenXid,
2325 &NoFreezePageRelminMxid))
2326 {
2327 /* Tuple with XID < FreezeLimit (or MXID < MultiXactCutoff) */
2328 if (vacrel->aggressive)
2329 {
2330 /*
2331 * Aggressive VACUUMs must always be able to advance rel's
2332 * relfrozenxid to a value >= FreezeLimit (and be able to
2333 * advance rel's relminmxid to a value >= MultiXactCutoff).
2334 * The ongoing aggressive VACUUM won't be able to do that
2335 * unless it can freeze an XID (or MXID) from this tuple now.
2336 *
2337 * The only safe option is to have caller perform processing
2338 * of this page using lazy_scan_prune. Caller might have to
2339 * wait a while for a cleanup lock, but it can't be helped.
2340 */
2341 vacrel->offnum = InvalidOffsetNumber;
2342 return false;
2343 }
2344
2345 /*
2346 * Non-aggressive VACUUMs are under no obligation to advance
2347 * relfrozenxid (even by one XID). We can be much laxer here.
2348 *
2349 * Currently we always just accept an older final relfrozenxid
2350 * and/or relminmxid value. We never make caller wait or work a
2351 * little harder, even when it likely makes sense to do so.
2352 */
2353 }
2354
2355 ItemPointerSet(&(tuple.t_self), blkno, offnum);
2356 tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
2357 tuple.t_len = ItemIdGetLength(itemid);
2358 tuple.t_tableOid = RelationGetRelid(vacrel->rel);
2359
2360 switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
2361 buf))
2362 {
2364 case HEAPTUPLE_LIVE:
2365
2366 /*
2367 * Count both cases as live, just like lazy_scan_prune
2368 */
2369 live_tuples++;
2370
2371 break;
2372 case HEAPTUPLE_DEAD:
2373
2374 /*
2375 * There is some useful work for pruning to do, that won't be
2376 * done due to failure to get a cleanup lock.
2377 */
2378 missed_dead_tuples++;
2379 break;
2381
2382 /*
2383 * Count in recently_dead_tuples, just like lazy_scan_prune
2384 */
2385 recently_dead_tuples++;
2386 break;
2388
2389 /*
2390 * Do not count these rows as live, just like lazy_scan_prune
2391 */
2392 break;
2393 default:
2394 elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
2395 break;
2396 }
2397 }
2398
2399 vacrel->offnum = InvalidOffsetNumber;
2400
2401 /*
2402 * By here we know for sure that caller can put off freezing and pruning
2403 * this particular page until the next VACUUM. Remember its details now.
2404 * (lazy_scan_prune expects a clean slate, so we have to do this last.)
2405 */
2406 vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
2407 vacrel->NewRelminMxid = NoFreezePageRelminMxid;
2408
2409 /* Save any LP_DEAD items found on the page in dead_items */
2410 if (vacrel->nindexes == 0)
2411 {
2412 /* Using one-pass strategy (since table has no indexes) */
2413 if (lpdead_items > 0)
2414 {
2415 /*
2416 * Perfunctory handling for the corner case where a single pass
2417 * strategy VACUUM cannot get a cleanup lock, and it turns out
2418 * that there are one or more LP_DEAD items: just count the LP_DEAD
2419 * items as missed_dead_tuples instead. (This is a bit dishonest,
2420 * but it beats having to maintain specialized heap vacuuming code
2421 * forever, for vanishingly little benefit.)
2422 */
2423 hastup = true;
2424 missed_dead_tuples += lpdead_items;
2425 }
2426 }
2427 else if (lpdead_items > 0)
2428 {
2429 /*
2430 * Page has LP_DEAD items, and so any references/TIDs that remain in
2431 * indexes will be deleted during index vacuuming (and then marked
2432 * LP_UNUSED in the heap)
2433 */
2434 vacrel->lpdead_item_pages++;
2435
2436 dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
2437
2438 vacrel->lpdead_items += lpdead_items;
2439 }
2440
2441 /*
2442 * Finally, add relevant page-local counts to whole-VACUUM counts
2443 */
2444 vacrel->live_tuples += live_tuples;
2445 vacrel->recently_dead_tuples += recently_dead_tuples;
2446 vacrel->missed_dead_tuples += missed_dead_tuples;
2447 if (missed_dead_tuples > 0)
2448 vacrel->missed_dead_pages++;
2449
2450 /* Can't truncate this page */
2451 if (hastup)
2452 vacrel->nonempty_pages = blkno + 1;
2453
2454 /* Did we find LP_DEAD items? */
2455 *has_lpdead_items = (lpdead_items > 0);
2456
2457 /* Caller won't need to call lazy_scan_prune with same page */
2458 return true;
2459}
2460
2461/*
2462 * Main entry point for index vacuuming and heap vacuuming.
2463 *
2464 * Removes items collected in dead_items from table's indexes, then marks the
2465 * same items LP_UNUSED in the heap. See the comments above lazy_scan_heap
2466 * for full details.
2467 *
2468 * Also empties dead_items, freeing up space for later TIDs.
2469 *
2470 * We may choose to bypass index vacuuming at this point, though only when the
2471 * ongoing VACUUM operation will definitely only have one index scan/round of
2472 * index vacuuming.
2473 */
2474static void
2476{
2477 bool bypass;
2478
2479 /* Should not end up here with no indexes */
2480 Assert(vacrel->nindexes > 0);
2481 Assert(vacrel->lpdead_item_pages > 0);
2482
2483 if (!vacrel->do_index_vacuuming)
2484 {
2485 Assert(!vacrel->do_index_cleanup);
2486 dead_items_reset(vacrel);
2487 return;
2488 }
2489
2490 /*
2491 * Consider bypassing index vacuuming (and heap vacuuming) entirely.
2492 *
2493 * We currently only do this in cases where the number of LP_DEAD items
2494 * for the entire VACUUM operation is close to zero. This avoids sharp
2495 * discontinuities in the duration and overhead of successive VACUUM
2496 * operations that run against the same table with a fixed workload.
2497 * Ideally, successive VACUUM operations will behave as if there are
2498 * exactly zero LP_DEAD items in cases where there are close to zero.
2499 *
2500 * This is likely to be helpful with a table that is continually affected
2501 * by UPDATEs that can mostly apply the HOT optimization, but occasionally
2502 * have small aberrations that lead to just a few heap pages retaining
2503 * only one or two LP_DEAD items. This is pretty common; even when the
2504 * DBA goes out of their way to make UPDATEs use HOT, it is practically
2505 * impossible to predict whether HOT will be applied in 100% of cases.
2506 * It's far easier to ensure that 99%+ of all UPDATEs against a table use
2507 * HOT through careful tuning.
2508 */
2509 bypass = false;
2510 if (vacrel->consider_bypass_optimization && vacrel->rel_pages > 0)
2511 {
2512 BlockNumber threshold;
2513
2514 Assert(vacrel->num_index_scans == 0);
2515 Assert(vacrel->lpdead_items == vacrel->dead_items_info->num_items);
2516 Assert(vacrel->do_index_vacuuming);
2517 Assert(vacrel->do_index_cleanup);
2518
2519 /*
2520 * This crossover point at which we'll start to do index vacuuming is
2521 * expressed as a percentage of the total number of heap pages in the
2522 * table that are known to have at least one LP_DEAD item. This is
2523 * much more important than the total number of LP_DEAD items, since
2524 * it's a proxy for the number of heap pages whose visibility map bits
2525 * cannot be set on account of bypassing index and heap vacuuming.
2526 *
2527 * We apply one further precautionary test: the space currently used
2528 * to store the TIDs (TIDs that now all point to LP_DEAD items) must
2529 * not exceed 32MB. This limits the risk that we will bypass index
2530 * vacuuming again and again until eventually there is a VACUUM whose
2531 * dead_items space is not CPU cache resident.
2532 *
2533 * We don't take any special steps to remember the LP_DEAD items (such
2534 * as counting them in our final update to the stats system) when the
2535 * optimization is applied. Though the accounting used in analyze.c's
2536 * acquire_sample_rows() will recognize the same LP_DEAD items as dead
2537 * rows in its own stats report, that's okay. The discrepancy should
2538 * be negligible. If this optimization is ever expanded to cover more
2539 * cases then this may need to be reconsidered.
2540 */
2541 threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
2542 bypass = (vacrel->lpdead_item_pages < threshold &&
2543 TidStoreMemoryUsage(vacrel->dead_items) < 32 * 1024 * 1024);
2544 }
2545
2546 if (bypass)
2547 {
2548 /*
2549 * There are almost zero TIDs. Behave as if there were precisely
2550 * zero: bypass index vacuuming, but do index cleanup.
2551 *
2552 * We expect that the ongoing VACUUM operation will finish very
2553 * quickly, so there is no point in considering speeding up as a
2554 * failsafe against wraparound failure. (Index cleanup is expected to
2555 * finish very quickly in cases where there were no ambulkdelete()
2556 * calls.)
2557 */
2558 vacrel->do_index_vacuuming = false;
2559 }
2560 else if (lazy_vacuum_all_indexes(vacrel))
2561 {
2562 /*
2563 * We successfully completed a round of index vacuuming. Do related
2564 * heap vacuuming now.
2565 */
2566 lazy_vacuum_heap_rel(vacrel);
2567 }
2568 else
2569 {
2570 /*
2571 * Failsafe case.
2572 *
2573 * We attempted index vacuuming, but didn't finish a full round/full
2574 * index scan. This happens when relfrozenxid or relminmxid is too
2575 * far in the past.
2576 *
2577 * From this point on the VACUUM operation will do no further index
2578 * vacuuming or heap vacuuming. This VACUUM operation won't end up
2579 * back here again.
2580 */
2582 }
2583
2584 /*
2585 * Forget the LP_DEAD items that we just vacuumed (or just decided to not
2586 * vacuum)
2587 */
2588 dead_items_reset(vacrel);
2589}
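
A simplified standalone sketch of the bypass test above, assuming BYPASS_THRESHOLD_PAGES is 0.02 (2% of rel_pages) in the current tree; the 32MB TID-store cap is taken from the comment above:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;

/* Illustrative version of the crossover test; the constants are assumptions. */
static bool
would_bypass_index_vacuuming(BlockNumber rel_pages,
                             BlockNumber lpdead_item_pages,
                             size_t dead_items_mem_bytes)
{
    double threshold = (double) rel_pages * 0.02;   /* BYPASS_THRESHOLD_PAGES */

    return lpdead_item_pages < threshold &&
        dead_items_mem_bytes < (size_t) 32 * 1024 * 1024;
}

int
main(void)
{
    /* 100,000-page table with LP_DEAD items on 50 pages and a tiny TID store. */
    printf("%d\n", would_bypass_index_vacuuming(100000, 50, 1024 * 1024));  /* 1 */
    return 0;
}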
2590
2591/*
2592 * lazy_vacuum_all_indexes() -- Main entry for index vacuuming
2593 *
2594 * Returns true in the common case when all indexes were successfully
2595 * vacuumed. Returns false in rare cases where we determined that the ongoing
2596 * VACUUM operation is at risk of taking too long to finish, leading to
2597 * wraparound failure.
2598 */
2599static bool
2601{
2602 bool allindexes = true;
2603 double old_live_tuples = vacrel->rel->rd_rel->reltuples;
2604 const int progress_start_index[] = {
2607 };
2608 const int progress_end_index[] = {
2612 };
2613 int64 progress_start_val[2];
2614 int64 progress_end_val[3];
2615
2616 Assert(vacrel->nindexes > 0);
2617 Assert(vacrel->do_index_vacuuming);
2618 Assert(vacrel->do_index_cleanup);
2619
2620 /* Precheck for XID wraparound emergencies */
2622 {
2623 /* Wraparound emergency -- don't even start an index scan */
2624 return false;
2625 }
2626
2627 /*
2628 * Report that we are now vacuuming indexes and the number of indexes to
2629 * vacuum.
2630 */
2631 progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX;
2632 progress_start_val[1] = vacrel->nindexes;
2633 pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
2634
2635 if (!ParallelVacuumIsActive(vacrel))
2636 {
2637 for (int idx = 0; idx < vacrel->nindexes; idx++)
2638 {
2639 Relation indrel = vacrel->indrels[idx];
2640 IndexBulkDeleteResult *istat = vacrel->indstats[idx];
2641
2642 vacrel->indstats[idx] = lazy_vacuum_one_index(indrel, istat,
2643 old_live_tuples,
2644 vacrel);
2645
2646 /* Report the number of indexes vacuumed */
2648 idx + 1);
2649
2651 {
2652 /* Wraparound emergency -- end current index scan */
2653 allindexes = false;
2654 break;
2655 }
2656 }
2657 }
2658 else
2659 {
2660 /* Outsource everything to parallel variant */
2661 parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, old_live_tuples,
2662 vacrel->num_index_scans);
2663
2664 /*
2665 * Do a postcheck to consider applying wraparound failsafe now. Note
2666 * that parallel VACUUM only gets the precheck and this postcheck.
2667 */
2669 allindexes = false;
2670 }
2671
2672 /*
2673 * We delete all LP_DEAD items from the first heap pass in all indexes on
2674 * each call here (except calls where we choose to do the failsafe). This
2675 * makes the next call to lazy_vacuum_heap_rel() safe (except in the event
2676 * of the failsafe triggering, which prevents the next call from taking
2677 * place).
2678 */
2679 Assert(vacrel->num_index_scans > 0 ||
2680 vacrel->dead_items_info->num_items == vacrel->lpdead_items);
2681 Assert(allindexes || VacuumFailsafeActive);
2682
2683 /*
2684 * Increase and report the number of index scans. Also, we reset
2685 * PROGRESS_VACUUM_INDEXES_TOTAL and PROGRESS_VACUUM_INDEXES_PROCESSED.
2686 *
2687 * We deliberately include the case where we started a round of bulk
2688 * deletes that we weren't able to finish due to the failsafe triggering.
2689 */
2690 vacrel->num_index_scans++;
2691 progress_end_val[0] = 0;
2692 progress_end_val[1] = 0;
2693 progress_end_val[2] = vacrel->num_index_scans;
2694 pgstat_progress_update_multi_param(3, progress_end_index, progress_end_val);
2695
2696 return allindexes;
2697}
2698
2699/*
2700 * Read stream callback for vacuum's third phase (second pass over the heap).
2701 * Gets the next block from the TID store and returns it or InvalidBlockNumber
2702 * if there are no further blocks to vacuum.
2703 *
2704 * NB: Assumed to be safe to use with READ_STREAM_USE_BATCHING.
2705 */
2706static BlockNumber
2708 void *callback_private_data,
2709 void *per_buffer_data)
2710{
2711 TidStoreIter *iter = callback_private_data;
2712 TidStoreIterResult *iter_result;
2713
2714 iter_result = TidStoreIterateNext(iter);
2715 if (iter_result == NULL)
2716 return InvalidBlockNumber;
2717
2718 /*
2719 * Save the TidStoreIterResult for later, so we can extract the offsets.
2720 * It is safe to copy the result, according to TidStoreIterateNext().
2721 */
2722 memcpy(per_buffer_data, iter_result, sizeof(*iter_result));
2723
2724 return iter_result->blkno;
2725}
2726
2727/*
2728 * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
2729 *
2730 * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
2731 * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
2732 *
2733 * We may also be able to truncate the line pointer array of the heap pages we
2734 * visit. If there is a contiguous group of LP_UNUSED items at the end of the
2735 * array, it can be reclaimed as free space. These LP_UNUSED items usually
2736 * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
2737 * each page to LP_UNUSED, and then consider if it's possible to truncate the
2738 * page's line pointer array).
2739 *
2740 * Note: the reason for doing this as a second pass is we cannot remove the
2741 * tuples until we've removed their index entries, and we want to process
2742 * index entry removal in batches as large as possible.
2743 */
2744static void
2746{
2747 ReadStream *stream;
2748 BlockNumber vacuumed_pages = 0;
2749 Buffer vmbuffer = InvalidBuffer;
2750 LVSavedErrInfo saved_err_info;
2751 TidStoreIter *iter;
2752
2753 Assert(vacrel->do_index_vacuuming);
2754 Assert(vacrel->do_index_cleanup);
2755 Assert(vacrel->num_index_scans > 0);
2756
2757 /* Report that we are now vacuuming the heap */
2760
2761 /* Update error traceback information */
2762 update_vacuum_error_info(vacrel, &saved_err_info,
2765
2766 iter = TidStoreBeginIterate(vacrel->dead_items);
2767
2768 /*
2769 * Set up the read stream for vacuum's second pass through the heap.
2770 *
2771 * It is safe to use batchmode, as vacuum_reap_lp_read_stream_next() does
2772 * not need to wait for IO and does not perform locking. Once we support
2773 * parallelism it should still be fine, as presumably the holder of locks
2774 * would never be blocked by IO while holding the lock.
2775 */
2778 vacrel->bstrategy,
2779 vacrel->rel,
2782 iter,
2783 sizeof(TidStoreIterResult));
2784
2785 while (true)
2786 {
2787 BlockNumber blkno;
2788 Buffer buf;
2789 Page page;
2790 TidStoreIterResult *iter_result;
2791 Size freespace;
2793 int num_offsets;
2794
2795 vacuum_delay_point(false);
2796
2797 buf = read_stream_next_buffer(stream, (void **) &iter_result);
2798
2799 /* The relation is exhausted */
2800 if (!BufferIsValid(buf))
2801 break;
2802
2803 vacrel->blkno = blkno = BufferGetBlockNumber(buf);
2804
2805 Assert(iter_result);
2806 num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
2807 Assert(num_offsets <= lengthof(offsets));
2808
2809 /*
2810 * Pin the visibility map page in case we need to mark the page
2811 * all-visible. In most cases this will be very cheap, because we'll
2812 * already have the correct page pinned anyway.
2813 */
2814 visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
2815
2816 /* We need a non-cleanup exclusive lock to mark dead_items unused */
2818 lazy_vacuum_heap_page(vacrel, blkno, buf, offsets,
2819 num_offsets, vmbuffer);
2820
2821 /* Now that we've vacuumed the page, record its available space */
2822 page = BufferGetPage(buf);
2823 freespace = PageGetHeapFreeSpace(page);
2824
2826 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
2827 vacuumed_pages++;
2828 }
2829
2830 read_stream_end(stream);
2831 TidStoreEndIterate(iter);
2832
2833 vacrel->blkno = InvalidBlockNumber;
2834 if (BufferIsValid(vmbuffer))
2835 ReleaseBuffer(vmbuffer);
2836
2837 /*
2838 * We set all LP_DEAD items from the first heap pass to LP_UNUSED during
2839 * the second heap pass. No more, no less.
2840 */
2841 Assert(vacrel->num_index_scans > 1 ||
2842 (vacrel->dead_items_info->num_items == vacrel->lpdead_items &&
2843 vacuumed_pages == vacrel->lpdead_item_pages));
2844
2846 (errmsg("table \"%s\": removed %" PRId64 " dead item identifiers in %u pages",
2847 vacrel->relname, vacrel->dead_items_info->num_items,
2848 vacuumed_pages)));
2849
2850 /* Revert to the previous phase information for error traceback */
2851 restore_vacuum_error_info(vacrel, &saved_err_info);
2852}
2853
2854/*
2855 * lazy_vacuum_heap_page() -- free page's LP_DEAD items listed in the
2856 * vacrel->dead_items store.
2857 *
2858 * Caller must have an exclusive buffer lock on the buffer (though a full
2859 * cleanup lock is also acceptable). vmbuffer must be valid and already have
2860 * a pin on blkno's visibility map page.
2861 */
2862static void
2864 OffsetNumber *deadoffsets, int num_offsets,
2865 Buffer vmbuffer)
2866{
2867 Page page = BufferGetPage(buffer);
2869 int nunused = 0;
2870 TransactionId visibility_cutoff_xid;
2871 TransactionId conflict_xid = InvalidTransactionId;
2872 bool all_frozen;
2873 LVSavedErrInfo saved_err_info;
2874 uint8 vmflags = 0;
2875
2876 Assert(vacrel->do_index_vacuuming);
2877
2879
2880 /* Update error traceback information */
2881 update_vacuum_error_info(vacrel, &saved_err_info,
2884
2885 /*
2886 * Before marking dead items unused, check whether the page will become
2887 * all-visible once that change is applied. This lets us reap the tuples
2888 * and mark the page all-visible within the same critical section,
2889 * enabling both changes to be emitted in a single WAL record. Since the
2890 * visibility checks may perform I/O and allocate memory, they must be
2891 * done outside the critical section.
2892 */
2893 if (heap_page_would_be_all_visible(vacrel->rel, buffer,
2894 vacrel->cutoffs.OldestXmin,
2895 deadoffsets, num_offsets,
2896 &all_frozen, &visibility_cutoff_xid,
2897 &vacrel->offnum))
2898 {
2899 vmflags |= VISIBILITYMAP_ALL_VISIBLE;
2900 if (all_frozen)
2901 {
2902 vmflags |= VISIBILITYMAP_ALL_FROZEN;
2903 Assert(!TransactionIdIsValid(visibility_cutoff_xid));
2904 }
2905
2906 /*
2907 * Take the lock on the vmbuffer before entering a critical section.
2908 * The heap page lock must also be held while updating the VM to
2909 * ensure consistency.
2910 */
2912 }
2913
2915
2916 for (int i = 0; i < num_offsets; i++)
2917 {
2918 ItemId itemid;
2919 OffsetNumber toff = deadoffsets[i];
2920
2921 itemid = PageGetItemId(page, toff);
2922
2923 Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
2924 ItemIdSetUnused(itemid);
2925 unused[nunused++] = toff;
2926 }
2927
2928 Assert(nunused > 0);
2929
2930 /* Attempt to truncate line pointer array now */
2932
2933 if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
2934 {
2935 /*
2936 * The page is guaranteed to have had dead line pointers, so we always
2937 * set PD_ALL_VISIBLE.
2938 */
2939 PageSetAllVisible(page);
2941 vmbuffer, vmflags,
2942 vacrel->rel->rd_locator);
2943 conflict_xid = visibility_cutoff_xid;
2944 }
2945
2946 /*
2947 * Mark buffer dirty before we write WAL.
2948 */
2949 MarkBufferDirty(buffer);
2950
2951 /* XLOG stuff */
2952 if (RelationNeedsWAL(vacrel->rel))
2953 {
2954 log_heap_prune_and_freeze(vacrel->rel, buffer,
2955 vmflags != 0 ? vmbuffer : InvalidBuffer,
2956 vmflags,
2957 conflict_xid,
2958 false, /* no cleanup lock required */
2960 NULL, 0, /* frozen */
2961 NULL, 0, /* redirected */
2962 NULL, 0, /* dead */
2963 unused, nunused);
2964 }
2965
2967
2968 if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
2969 {
2970 /* Count the newly set VM page for logging */
2971 LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
2972 vacrel->vm_new_visible_pages++;
2973 if (all_frozen)
2975 }
2976
2977 /* Revert to the previous phase information for error traceback */
2978 restore_vacuum_error_info(vacrel, &saved_err_info);
2979}
2980
2981/*
2982 * Trigger the failsafe to avoid wraparound failure when vacrel table has a
2983 * relfrozenxid and/or relminmxid that is dangerously far in the past.
2984 * Triggering the failsafe makes the ongoing VACUUM bypass any further index
2985 * vacuuming and heap vacuuming. Truncating the heap is also bypassed.
2986 *
2987 * Any remaining work (work that VACUUM cannot just bypass) is typically sped
2988 * up when the failsafe triggers. VACUUM stops applying any cost-based delay
2989 * that it started out with.
2990 *
2991 * Returns true when failsafe has been triggered.
2992 */
2993static bool
2995{
2996 /* Don't warn more than once per VACUUM */
2998 return true;
2999
3001 {
3002 const int progress_index[] = {
3005 };
3006 int64 progress_val[2] = {0, 0};
3007
3008 VacuumFailsafeActive = true;
3009
3010 /*
3011 * Abandon use of a buffer access strategy to allow use of all of
3012 * shared buffers. We assume the caller who allocated the memory for
3013 * the BufferAccessStrategy will free it.
3014 */
3015 vacrel->bstrategy = NULL;
3016
3017 /* Disable index vacuuming, index cleanup, and heap rel truncation */
3018 vacrel->do_index_vacuuming = false;
3019 vacrel->do_index_cleanup = false;
3020 vacrel->do_rel_truncate = false;
3021
3022 /* Reset the progress counters */
3023 pgstat_progress_update_multi_param(2, progress_index, progress_val);
3024
3026 (errmsg("bypassing nonessential maintenance of table \"%s.%s.%s\" as a failsafe after %d index scans",
3027 vacrel->dbname, vacrel->relnamespace, vacrel->relname,
3028 vacrel->num_index_scans),
3029 errdetail("The table's relfrozenxid or relminmxid is too far in the past."),
3030 errhint("Consider increasing configuration parameter \"maintenance_work_mem\" or \"autovacuum_work_mem\".\n"
3031 "You might also need to consider other ways for VACUUM to keep up with the allocation of transaction IDs.")));
3032
3033 /* Stop applying cost limits from this point on */
3034 VacuumCostActive = false;
3036
3037 return true;
3038 }
3039
3040 return false;
3041}
3042
3043/*
3044 * lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
3045 */
3046static void
3048{
3049 double reltuples = vacrel->new_rel_tuples;
3050 bool estimated_count = vacrel->scanned_pages < vacrel->rel_pages;
3051 const int progress_start_index[] = {
3054 };
3055 const int progress_end_index[] = {
3058 };
3059 int64 progress_start_val[2];
3060 int64 progress_end_val[2] = {0, 0};
3061
3062 Assert(vacrel->do_index_cleanup);
3063 Assert(vacrel->nindexes > 0);
3064
3065 /*
3066 * Report that we are now cleaning up indexes and the number of indexes to
3067 * cleanup.
3068 */
3069 progress_start_val[0] = PROGRESS_VACUUM_PHASE_INDEX_CLEANUP;
3070 progress_start_val[1] = vacrel->nindexes;
3071 pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
3072
3073 if (!ParallelVacuumIsActive(vacrel))
3074 {
3075 for (int idx = 0; idx < vacrel->nindexes; idx++)
3076 {
3077 Relation indrel = vacrel->indrels[idx];
3078 IndexBulkDeleteResult *istat = vacrel->indstats[idx];
3079
3080 vacrel->indstats[idx] =
3081 lazy_cleanup_one_index(indrel, istat, reltuples,
3082 estimated_count, vacrel);
3083
3084 /* Report the number of indexes cleaned up */
3086 idx + 1);
3087 }
3088 }
3089 else
3090 {
3091 /* Outsource everything to parallel variant */
3092 parallel_vacuum_cleanup_all_indexes(vacrel->pvs, reltuples,
3093 vacrel->num_index_scans,
3094 estimated_count);
3095 }
3096
3097 /* Reset the progress counters */
3098 pgstat_progress_update_multi_param(2, progress_end_index, progress_end_val);
3099}
3100
3101/*
3102 * lazy_vacuum_one_index() -- vacuum index relation.
3103 *
3104 * Delete all the index tuples containing a TID collected in
3105 * vacrel->dead_items. Also update running statistics. Exact
3106 * details depend on index AM's ambulkdelete routine.
3107 *
3108 * reltuples is the number of heap tuples to be passed to the
3109 * bulkdelete callback. It's always assumed to be estimated.
3110 * See indexam.sgml for more info.
3111 *
3112 * Returns bulk delete stats derived from input stats
3113 */
3114static IndexBulkDeleteResult *
3116 double reltuples, LVRelState *vacrel)
3117{
3118 IndexVacuumInfo ivinfo;
3119 LVSavedErrInfo saved_err_info;
3120
3121 ivinfo.index = indrel;
3122 ivinfo.heaprel = vacrel->rel;
3123 ivinfo.analyze_only = false;
3124 ivinfo.report_progress = false;
3125 ivinfo.estimated_count = true;
3126 ivinfo.message_level = DEBUG2;
3127 ivinfo.num_heap_tuples = reltuples;
3128 ivinfo.strategy = vacrel->bstrategy;
3129
3130 /*
3131 * Update error traceback information.
3132 *
3133 * The index name is saved during this phase and restored immediately
3134 * after this phase. See vacuum_error_callback.
3135 */
3136 Assert(vacrel->indname == NULL);
3137 vacrel->indname = pstrdup(RelationGetRelationName(indrel));
3138 update_vacuum_error_info(vacrel, &saved_err_info,
3141
3142 /* Do bulk deletion */
3143 istat = vac_bulkdel_one_index(&ivinfo, istat, vacrel->dead_items,
3144 vacrel->dead_items_info);
3145
3146 /* Revert to the previous phase information for error traceback */
3147 restore_vacuum_error_info(vacrel, &saved_err_info);
3148 pfree(vacrel->indname);
3149 vacrel->indname = NULL;
3150
3151 return istat;
3152}
3153
3154/*
3155 * lazy_cleanup_one_index() -- do post-vacuum cleanup for index relation.
3156 *
3157 * Calls index AM's amvacuumcleanup routine. reltuples is the number
3158 * of heap tuples and estimated_count is true if reltuples is an
3159 * estimated value. See indexam.sgml for more info.
3160 *
3161 * Returns bulk delete stats derived from input stats
3162 */
3163static IndexBulkDeleteResult *
3165 double reltuples, bool estimated_count,
3166 LVRelState *vacrel)
3167{
3168 IndexVacuumInfo ivinfo;
3169 LVSavedErrInfo saved_err_info;
3170
3171 ivinfo.index = indrel;
3172 ivinfo.heaprel = vacrel->rel;
3173 ivinfo.analyze_only = false;
3174 ivinfo.report_progress = false;
3175 ivinfo.estimated_count = estimated_count;
3176 ivinfo.message_level = DEBUG2;
3177
3178 ivinfo.num_heap_tuples = reltuples;
3179 ivinfo.strategy = vacrel->bstrategy;
3180
3181 /*
3182 * Update error traceback information.
3183 *
3184 * The index name is saved during this phase and restored immediately
3185 * after this phase. See vacuum_error_callback.
3186 */
3187 Assert(vacrel->indname == NULL);
3188 vacrel->indname = pstrdup(RelationGetRelationName(indrel));
3189 update_vacuum_error_info(vacrel, &saved_err_info,
3192
3193 istat = vac_cleanup_one_index(&ivinfo, istat);
3194
3195 /* Revert to the previous phase information for error traceback */
3196 restore_vacuum_error_info(vacrel, &saved_err_info);
3197 pfree(vacrel->indname);
3198 vacrel->indname = NULL;
3199
3200 return istat;
3201}
3202
3203/*
3204 * should_attempt_truncation - should we attempt to truncate the heap?
3205 *
3206 * Don't even think about it unless we have a shot at releasing a goodly
3207 * number of pages. Otherwise, the time taken isn't worth it, mainly because
3208 * an AccessExclusive lock must be replayed on any hot standby, where it can
3209 * be particularly disruptive.
3210 *
3211 * Also don't attempt it if wraparound failsafe is in effect. The entire
3212 * system might be refusing to allocate new XIDs at this point. The system
3213 * definitely won't return to normal unless and until VACUUM actually advances
3214 * the oldest relfrozenxid -- which hasn't happened for target rel just yet.
3215 * If lazy_truncate_heap attempted to acquire an AccessExclusiveLock to
3216 * truncate the table under these circumstances, an XID exhaustion error might
3217 * make it impossible for VACUUM to fix the underlying XID exhaustion problem.
3218 * There is very little chance of truncation working out when the failsafe is
3219 * in effect in any case. lazy_scan_prune makes the optimistic assumption
3220 * that any LP_DEAD items it encounters will always be LP_UNUSED by the time
3221 * we're called.
3222 */
3223static bool
3225{
3226 BlockNumber possibly_freeable;
3227
3228 if (!vacrel->do_rel_truncate || VacuumFailsafeActive)
3229 return false;
3230
3231 possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
3232 if (possibly_freeable > 0 &&
3233 (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
3234 possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
3235 return true;
3236
3237 return false;
3238}
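
A standalone sketch of this heuristic, assuming REL_TRUNCATE_MINIMUM is 1000 pages and REL_TRUNCATE_FRACTION is 16 as currently defined; treat the exact values as assumptions:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t BlockNumber;

/* Truncation is attempted only when enough trailing pages look freeable. */
static bool
worth_attempting_truncation(BlockNumber rel_pages, BlockNumber nonempty_pages)
{
    BlockNumber possibly_freeable = rel_pages - nonempty_pages;

    return possibly_freeable > 0 &&
        (possibly_freeable >= 1000 ||           /* REL_TRUNCATE_MINIMUM */
         possibly_freeable >= rel_pages / 16);  /* REL_TRUNCATE_FRACTION */
}

int
main(void)
{
    printf("%d\n", worth_attempting_truncation(1600, 1400));    /* 1: 200 >= 1600/16 */
    printf("%d\n", worth_attempting_truncation(1600, 1550));    /* 0: only 50 pages */
    return 0;
}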
3239
3240/*
3241 * lazy_truncate_heap - try to truncate off any empty pages at the end
3242 */
3243static void
3245{
3246 BlockNumber orig_rel_pages = vacrel->rel_pages;
3247 BlockNumber new_rel_pages;
3248 bool lock_waiter_detected;
3249 int lock_retry;
3250
3251 /* Report that we are now truncating */
3254
3255 /* Update error traceback information one last time */
3258
3259 /*
3260 * Loop until no more truncating can be done.
3261 */
3262 do
3263 {
3264 /*
3265 * We need full exclusive lock on the relation in order to do
3266 * truncation. If we can't get it, give up rather than waiting --- we
3267 * don't want to block other backends, and we don't want to deadlock
3268 * (which is quite possible considering we already hold a lower-grade
3269 * lock).
3270 */
3271 lock_waiter_detected = false;
3272 lock_retry = 0;
3273 while (true)
3274 {
3276 break;
3277
3278 /*
3279 * Check for interrupts while trying to (re-)acquire the exclusive
3280 * lock.
3281 */
3283
3284 if (++lock_retry > (VACUUM_TRUNCATE_LOCK_TIMEOUT /
3286 {
3287 /*
3288 * We failed to establish the lock in the specified number of
3289 * retries. This means we give up truncating.
3290 */
3291 ereport(vacrel->verbose ? INFO : DEBUG2,
3292 (errmsg("\"%s\": stopping truncate due to conflicting lock request",
3293 vacrel->relname)));
3294 return;
3295 }
3296
3297 (void) WaitLatch(MyLatch,
3300 WAIT_EVENT_VACUUM_TRUNCATE);
3302 }
3303
3304 /*
3305 * Now that we have exclusive lock, look to see if the rel has grown
3306 * whilst we were vacuuming with non-exclusive lock. If so, give up;
3307 * the newly added pages presumably contain non-deletable tuples.
3308 */
3309 new_rel_pages = RelationGetNumberOfBlocks(vacrel->rel);
3310 if (new_rel_pages != orig_rel_pages)
3311 {
3312 /*
3313 * Note: we intentionally don't update vacrel->rel_pages with the
3314 * new rel size here. If we did, it would amount to assuming that
3315 * the new pages are empty, which is unlikely. Leaving the numbers
3316 * alone amounts to assuming that the new pages have the same
3317 * tuple density as existing ones, which is less unlikely.
3318 */
3320 return;
3321 }
3322
3323 /*
3324 * Scan backwards from the end to verify that the end pages actually
3325 * contain no tuples. This is *necessary*, not optional, because
3326 * other backends could have added tuples to these pages whilst we
3327 * were vacuuming.
3328 */
3329 new_rel_pages = count_nondeletable_pages(vacrel, &lock_waiter_detected);
3330 vacrel->blkno = new_rel_pages;
3331
3332 if (new_rel_pages >= orig_rel_pages)
3333 {
3334 /* can't do anything after all */
3336 return;
3337 }
3338
3339 /*
3340 * Okay to truncate.
3341 */
3342 RelationTruncate(vacrel->rel, new_rel_pages);
3343
3344 /*
3345 * We can release the exclusive lock as soon as we have truncated.
3346 * Other backends can't safely access the relation until they have
3347 * processed the smgr invalidation that smgrtruncate sent out ... but
3348 * that should happen as part of standard invalidation processing once
3349 * they acquire lock on the relation.
3350 */
3352
3353 /*
3354 * Update statistics. Here, it *is* correct to adjust rel_pages
3355 * without also touching reltuples, since the tuple count wasn't
3356 * changed by the truncation.
3357 */
3358 vacrel->removed_pages += orig_rel_pages - new_rel_pages;
3359 vacrel->rel_pages = new_rel_pages;
3360
3361 ereport(vacrel->verbose ? INFO : DEBUG2,
3362 (errmsg("table \"%s\": truncated %u to %u pages",
3363 vacrel->relname,
3364 orig_rel_pages, new_rel_pages)));
3365 orig_rel_pages = new_rel_pages;
3366 } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
3367}
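
/*
 * Illustrative sketch (not part of vacuumlazy.c): the bounded lock-retry
 * pattern that lazy_truncate_heap() uses above, reduced to plain C. The names
 * try_exclusive_lock(), sleep_ms(), and the two timing constants are
 * hypothetical stand-ins for ConditionalLockRelation(), WaitLatch(),
 * VACUUM_TRUNCATE_LOCK_TIMEOUT and VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL; the
 * values below are chosen only for illustration.
 */
#include <stdbool.h>

#define LOCK_TIMEOUT_MS       5000	/* total time we are willing to retry */
#define LOCK_WAIT_INTERVAL_MS 50	/* pause between attempts */

extern bool try_exclusive_lock(void);	/* assumed: non-blocking lock attempt */
extern void sleep_ms(int ms);			/* assumed: latch-style wait */

static bool
acquire_lock_with_retries(void)
{
	int			lock_retry = 0;

	while (true)
	{
		if (try_exclusive_lock())
			return true;		/* lock acquired; caller may truncate */

		/* Give up after TIMEOUT / INTERVAL failed attempts */
		if (++lock_retry > (LOCK_TIMEOUT_MS / LOCK_WAIT_INTERVAL_MS))
			return false;		/* conflicting lock traffic; stop trying */

		sleep_ms(LOCK_WAIT_INTERVAL_MS);
	}
}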
3368
3369/*
3370 * Rescan end pages to verify that they are (still) empty of tuples.
3371 *
3372 * Returns number of nondeletable pages (last nonempty page + 1).
3373 */
3374static BlockNumber
3375count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
3376{
3377 BlockNumber blkno;
3378 BlockNumber prefetchedUntil;
3379 instr_time starttime;
3380
3381 /* Initialize the starttime if we check for conflicting lock requests */
3382 INSTR_TIME_SET_CURRENT(starttime);
3383
3384 /*
3385 * Start checking blocks at what we believe relation end to be and move
3386 * backwards. (Strange coding of loop control is needed because blkno is
3387 * unsigned.) To make the scan faster, we prefetch a few blocks at a time
3388 * in forward direction, so that OS-level readahead can kick in.
3389 */
3390 blkno = vacrel->rel_pages;
3392 "prefetch size must be power of 2");
3393 prefetchedUntil = InvalidBlockNumber;
3394 while (blkno > vacrel->nonempty_pages)
3395 {
3396 Buffer buf;
3397 Page page;
3398 OffsetNumber offnum,
3399 maxoff;
3400 bool hastup;
3401
3402 /*
3403 * Check if another process requests a lock on our relation. We are
3404 * holding an AccessExclusiveLock here, so they will be waiting. We
3405 * only do this once per VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL, and we
3406 * only check if that interval has elapsed once every 32 blocks to
3407 * keep the number of system calls and actual shared lock table
3408 * lookups to a minimum.
3409 */
3410 if ((blkno % 32) == 0)
3411 {
3412 instr_time currenttime;
3413 instr_time elapsed;
3414
3415 INSTR_TIME_SET_CURRENT(currenttime);
3416 elapsed = currenttime;
3417 INSTR_TIME_SUBTRACT(elapsed, starttime);
3418 if ((INSTR_TIME_GET_MICROSEC(elapsed) / 1000)
3419 >= VACUUM_TRUNCATE_LOCK_CHECK_INTERVAL)
3420 {
3421 if (LockHasWaitersRelation(vacrel->rel, AccessExclusiveLock))
3422 {
3423 ereport(vacrel->verbose ? INFO : DEBUG2,
3424 (errmsg("table \"%s\": suspending truncate due to conflicting lock request",
3425 vacrel->relname)));
3426
3427 *lock_waiter_detected = true;
3428 return blkno;
3429 }
3430 starttime = currenttime;
3431 }
3432 }
3433
3434 /*
3435 * We don't insert a vacuum delay point here, because we have an
3436 * exclusive lock on the table which we want to hold for as short a
3437 * time as possible. We still need to check for interrupts however.
3438 */
3439 CHECK_FOR_INTERRUPTS();
3440
3441 blkno--;
3442
3443 /* If we haven't prefetched this lot yet, do so now. */
3444 if (prefetchedUntil > blkno)
3445 {
3446 BlockNumber prefetchStart;
3447 BlockNumber pblkno;
3448
3449 prefetchStart = blkno & ~(PREFETCH_SIZE - 1);
3450 for (pblkno = prefetchStart; pblkno <= blkno; pblkno++)
3451 {
3452 PrefetchBuffer(vacrel->rel, MAIN_FORKNUM, pblkno);
3453 CHECK_FOR_INTERRUPTS();
3454 }
3455 prefetchedUntil = prefetchStart;
3456 }
3457
3458 buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
3459 vacrel->bstrategy);
3460
3461 /* In this phase we only need shared access to the buffer */
3462 LockBuffer(buf, BUFFER_LOCK_SHARE);
3463
3464 page = BufferGetPage(buf);
3465
3466 if (PageIsNew(page) || PageIsEmpty(page))
3467 {
3468 UnlockReleaseBuffer(buf);
3469 continue;
3470 }
3471
3472 hastup = false;
3473 maxoff = PageGetMaxOffsetNumber(page);
3474 for (offnum = FirstOffsetNumber;
3475 offnum <= maxoff;
3476 offnum = OffsetNumberNext(offnum))
3477 {
3478 ItemId itemid;
3479
3480 itemid = PageGetItemId(page, offnum);
3481
3482 /*
3483 * Note: any non-unused item should be taken as a reason to keep
3484 * this page. Even an LP_DEAD item makes truncation unsafe, since
3485 * we must not have cleaned out its index entries.
3486 */
3487 if (ItemIdIsUsed(itemid))
3488 {
3489 hastup = true;
3490 break; /* can stop scanning */
3491 }
3492 } /* scan along page */
3493
3494 UnlockReleaseBuffer(buf);
3495
3496 /* Done scanning if we found a tuple here */
3497 if (hastup)
3498 return blkno + 1;
3499 }
3500
3501 /*
3502 * If we fall out of the loop, all the previously-thought-to-be-empty
3503 * pages still are; we need not bother to look at the last known-nonempty
3504 * page.
3505 */
3506 return vacrel->nonempty_pages;
3507}
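
/*
 * Illustrative sketch (not part of vacuumlazy.c): the backward scan with
 * forward prefetching used by count_nondeletable_pages(), reduced to plain C.
 * Masking with ~(PREFETCH_SZ - 1) rounds a block number down to the start of
 * its power-of-two-sized batch, so each batch is issued in ascending order
 * even though the scan itself walks backwards. prefetch_block() and
 * block_is_empty() are hypothetical stand-ins for PrefetchBuffer() and the
 * page inspection done above.
 */
#include <stdbool.h>
#include <stdint.h>

#define PREFETCH_SZ ((uint32_t) 32)		/* must be a power of two */

extern void prefetch_block(uint32_t blkno);	/* assumed: readahead hint */
extern bool block_is_empty(uint32_t blkno);	/* assumed: page inspection */

static uint32_t
count_trailing_empty_blocks(uint32_t nblocks, uint32_t nonempty_floor)
{
	uint32_t	blkno = nblocks;
	uint32_t	prefetched_until = UINT32_MAX;

	while (blkno > nonempty_floor)
	{
		blkno--;

		/* Prefetch the whole batch containing blkno, in forward order */
		if (prefetched_until > blkno)
		{
			uint32_t	start = blkno & ~(PREFETCH_SZ - 1);

			for (uint32_t p = start; p <= blkno; p++)
				prefetch_block(p);
			prefetched_until = start;
		}

		if (!block_is_empty(blkno))
			return blkno + 1;	/* last nonempty page + 1, as above */
	}
	return nonempty_floor;
}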
3508
3509/*
3510 * Allocate dead_items and dead_items_info (either using palloc, or in dynamic
3511 * shared memory). Sets both in vacrel for caller.
3512 *
3513 * Also handles parallel initialization as part of allocating dead_items in
3514 * DSM when required.
3515 */
3516static void
3517dead_items_alloc(LVRelState *vacrel, int nworkers)
3518{
3519 VacDeadItemsInfo *dead_items_info;
3520 int vac_work_mem = AmAutoVacuumWorkerProcess() &&
3521 autovacuum_work_mem != -1 ?
3522 autovacuum_work_mem : maintenance_work_mem;
3523
3524 /*
3525 * Initialize state for a parallel vacuum. As of now, only one worker can
3526 * be used for an index, so we invoke parallelism only if there are at
3527 * least two indexes on a table.
3528 */
3529 if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
3530 {
3531 /*
3532 * Since parallel workers cannot access data in temporary tables, we
3533 * can't perform parallel vacuum on them.
3534 */
3535 if (RelationUsesLocalBuffers(vacrel->rel))
3536 {
3537 /*
3538 * Give warning only if the user explicitly tries to perform a
3539 * parallel vacuum on the temporary table.
3540 */
3541 if (nworkers > 0)
3542 ereport(WARNING,
3543 (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
3544 vacrel->relname)));
3545 }
3546 else
3547 vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
3548 vacrel->nindexes, nworkers,
3549 vac_work_mem,
3550 vacrel->verbose ? INFO : DEBUG2,
3551 vacrel->bstrategy);
3552
3553 /*
3554 * If parallel mode started, dead_items and dead_items_info spaces are
3555 * allocated in DSM.
3556 */
3557 if (ParallelVacuumIsActive(vacrel))
3558 {
3559 vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs,
3560 &vacrel->dead_items_info);
3561 return;
3562 }
3563 }
3564
3565 /*
3566 * Serial VACUUM case. Allocate both dead_items and dead_items_info
3567 * locally.
3568 */
3569
3570 dead_items_info = (VacDeadItemsInfo *) palloc(sizeof(VacDeadItemsInfo));
3571 dead_items_info->max_bytes = vac_work_mem * (Size) 1024;
3572 dead_items_info->num_items = 0;
3573 vacrel->dead_items_info = dead_items_info;
3574
3575 vacrel->dead_items = TidStoreCreateLocal(dead_items_info->max_bytes, true);
3576}
3577
3578/*
3579 * Add the given block number and offset numbers to dead_items.
3580 */
3581static void
3582dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
3583 int num_offsets)
3584{
3585 const int prog_index[2] = {
3586 PROGRESS_VACUUM_NUM_DEAD_ITEM_IDS,
3587 PROGRESS_VACUUM_DEAD_TUPLE_BYTES
3588 };
3589 int64 prog_val[2];
3590
3591 TidStoreSetBlockOffsets(vacrel->dead_items, blkno, offsets, num_offsets);
3592 vacrel->dead_items_info->num_items += num_offsets;
3593
3594 /* update the progress information */
3595 prog_val[0] = vacrel->dead_items_info->num_items;
3596 prog_val[1] = TidStoreMemoryUsage(vacrel->dead_items);
3597 pgstat_progress_update_multi_param(2, prog_index, prog_val);
3598}
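
/*
 * Illustrative sketch (not part of vacuumlazy.c as shown here): how a
 * consumer, such as the phase III heap vacuuming code elsewhere in this file,
 * might walk the dead items accumulated via dead_items_add(). Assumes a
 * backend context; process_block() is a hypothetical stand-in for the
 * per-block reaping work.
 */
static void
dead_items_walk(LVRelState *vacrel)
{
	TidStoreIter *iter;
	TidStoreIterResult *result;
	OffsetNumber offsets[MaxOffsetNumber];

	iter = TidStoreBeginIterate(vacrel->dead_items);
	while ((result = TidStoreIterateNext(iter)) != NULL)
	{
		int			num_offsets;

		/* Fetch this block's dead offsets into a local array */
		num_offsets = TidStoreGetBlockOffsets(result, offsets, lengthof(offsets));

		/* process_block(vacrel, result->blkno, offsets, num_offsets); */
		(void) num_offsets;
	}
	TidStoreEndIterate(iter);
}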
3599
3600/*
3601 * Forget all collected dead items.
3602 */
3603static void
3604dead_items_reset(LVRelState *vacrel)
3605{
3606 if (ParallelVacuumIsActive(vacrel))
3607 {
3608 parallel_vacuum_reset_dead_items(vacrel->pvs);
3609 vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs,
3610 &vacrel->dead_items_info);
3611 return;
3612 }
3613
3614 /* Recreate the tidstore with the same max_bytes limitation */
3615 TidStoreDestroy(vacrel->dead_items);
3616 vacrel->dead_items = TidStoreCreateLocal(vacrel->dead_items_info->max_bytes, true);
3617
3618 /* Reset the counter */
3619 vacrel->dead_items_info->num_items = 0;
3620}
3621
3622/*
3623 * Perform cleanup for resources allocated in dead_items_alloc
3624 */
3625static void
3626dead_items_cleanup(LVRelState *vacrel)
3627{
3628 if (!ParallelVacuumIsActive(vacrel))
3629 {
3630 /* Don't bother with pfree here */
3631 return;
3632 }
3633
3634 /* End parallel mode */
3635 parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
3636 vacrel->pvs = NULL;
3637}
3638
3639#ifdef USE_ASSERT_CHECKING
3640
3641/*
3642 * Wrapper for heap_page_would_be_all_visible() which can be used for callers
3643 * that expect no LP_DEAD on the page. Currently assert-only, but there is no
3644 * reason not to use it outside of asserts.
3645 */
3646static bool
3647heap_page_is_all_visible(Relation rel, Buffer buf,
3648 TransactionId OldestXmin,
3649 bool *all_frozen,
3650 TransactionId *visibility_cutoff_xid,
3651 OffsetNumber *logging_offnum)
3652{
3653
3654 return heap_page_would_be_all_visible(rel, buf,
3655 OldestXmin,
3656 NULL, 0,
3657 all_frozen,
3658 visibility_cutoff_xid,
3659 logging_offnum);
3660}
3661#endif
3662
3663/*
3664 * Check whether the heap page in buf is all-visible except for the dead
3665 * tuples referenced in the deadoffsets array.
3666 *
3667 * Vacuum uses this to check if a page would become all-visible after reaping
3668 * known dead tuples. This function does not remove the dead items.
3669 *
3670 * This cannot be called in a critical section, as the visibility checks may
3671 * perform IO and allocate memory.
3672 *
3673 * Returns true if the page is all-visible other than the provided
3674 * deadoffsets and false otherwise.
3675 *
3676 * OldestXmin is used to determine visibility.
3677 *
3678 * Output parameters:
3679 *
3680 * - *all_frozen: true if every tuple on the page is frozen
3681 * - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
3682 * - *logging_offnum: OffsetNumber of current tuple being processed;
3683 * used by vacuum's error callback system.
3684 *
3685 * Callers looking to verify that the page is already all-visible can call
3686 * heap_page_is_all_visible().
3687 *
3688 * This logic is closely related to heap_prune_record_unchanged_lp_normal().
3689 * If you modify this function, ensure consistency with that code. An
3690 * assertion cross-checks that both remain in agreement. Do not introduce new
3691 * side-effects.
3692 */
3693static bool
3694heap_page_would_be_all_visible(Relation rel, Buffer buf,
3695 TransactionId OldestXmin,
3696 OffsetNumber *deadoffsets,
3697 int ndeadoffsets,
3698 bool *all_frozen,
3699 TransactionId *visibility_cutoff_xid,
3700 OffsetNumber *logging_offnum)
3701{
3702 Page page = BufferGetPage(buf);
3703 BlockNumber blockno = BufferGetBlockNumber(buf);
3704 OffsetNumber offnum,
3705 maxoff;
3706 bool all_visible = true;
3707 int matched_dead_count = 0;
3708
3709 *visibility_cutoff_xid = InvalidTransactionId;
3710 *all_frozen = true;
3711
3712 Assert(ndeadoffsets == 0 || deadoffsets);
3713
3714#ifdef USE_ASSERT_CHECKING
3715 /* Confirm input deadoffsets[] is strictly sorted */
3716 if (ndeadoffsets > 1)
3717 {
3718 for (int i = 1; i < ndeadoffsets; i++)
3719 Assert(deadoffsets[i - 1] < deadoffsets[i]);
3720 }
3721#endif
3722
3723 maxoff = PageGetMaxOffsetNumber(page);
3724 for (offnum = FirstOffsetNumber;
3725 offnum <= maxoff && all_visible;
3726 offnum = OffsetNumberNext(offnum))
3727 {
3728 ItemId itemid;
3729 HeapTupleData tuple;
3730
3731 /*
3732 * Set the offset number so that we can display it along with any
3733 * error that occurred while processing this tuple.
3734 */
3735 *logging_offnum = offnum;
3736 itemid = PageGetItemId(page, offnum);
3737
3738 /* Unused or redirect line pointers are of no interest */
3739 if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
3740 continue;
3741
3742 ItemPointerSet(&(tuple.t_self), blockno, offnum);
3743
3744 /*
3745 * Dead line pointers can have index pointers pointing to them. So
3746 * they can't be treated as visible
3747 */
3748 if (ItemIdIsDead(itemid))
3749 {
3750 if (!deadoffsets ||
3751 matched_dead_count >= ndeadoffsets ||
3752 deadoffsets[matched_dead_count] != offnum)
3753 {
3754 *all_frozen = all_visible = false;
3755 break;
3756 }
3757 matched_dead_count++;
3758 continue;
3759 }
3760
3761 Assert(ItemIdIsNormal(itemid));
3762
3763 tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
3764 tuple.t_len = ItemIdGetLength(itemid);
3765 tuple.t_tableOid = RelationGetRelid(rel);
3766
3767 /* Visibility checks may do IO or allocate memory */
3768 Assert(CritSectionCount == 0);
3769 switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
3770 {
3771 case HEAPTUPLE_LIVE:
3772 {
3773 TransactionId xmin;
3774
3775 /* Check comments in lazy_scan_prune. */
3776 if (!HeapTupleHeaderXminCommitted(tuple.t_data))
3777 {
3778 all_visible = false;
3779 *all_frozen = false;
3780 break;
3781 }
3782
3783 /*
3784 * The inserter definitely committed. But is it old enough
3785 * that everyone sees it as committed?
3786 */
3787 xmin = HeapTupleHeaderGetXmin(tuple.t_data);
3788 if (!TransactionIdPrecedes(xmin, OldestXmin))
3789 {
3790 all_visible = false;
3791 *all_frozen = false;
3792 break;
3793 }
3794
3795 /* Track newest xmin on page. */
3796 if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
3797 TransactionIdIsNormal(xmin))
3798 *visibility_cutoff_xid = xmin;
3799
3800 /* Check whether this tuple is already frozen or not */
3801 if (all_visible && *all_frozen &&
3802 heap_tuple_needs_eventual_freeze(tuple.t_data))
3803 *all_frozen = false;
3804 }
3805 break;
3806
3807 case HEAPTUPLE_DEAD:
3808 case HEAPTUPLE_RECENTLY_DEAD:
3809 case HEAPTUPLE_INSERT_IN_PROGRESS:
3810 case HEAPTUPLE_DELETE_IN_PROGRESS:
3811 {
3812 all_visible = false;
3813 *all_frozen = false;
3814 break;
3815 }
3816 default:
3817 elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
3818 break;
3819 }
3820 } /* scan along page */
3821
3822 /* Clear the offset information once we have processed the given page. */
3823 *logging_offnum = InvalidOffsetNumber;
3824
3825 return all_visible;
3826}
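
/*
 * Illustrative sketch (not part of vacuumlazy.c): the single-pass matching of
 * dead line pointers against the strictly sorted deadoffsets[] array, as done
 * by heap_page_would_be_all_visible() above, reduced to plain C. Because both
 * the page scan and the exception list advance in increasing offset order,
 * one cursor suffices: any dead item not found at the cursor position is
 * unexpected and the page cannot be considered all-visible.
 */
#include <stdbool.h>
#include <stddef.h>

static bool
only_expected_items_dead(const unsigned *dead_on_page, size_t ndead_on_page,
						 const unsigned *expected, size_t nexpected)
{
	size_t		matched = 0;

	for (size_t i = 0; i < ndead_on_page; i++)
	{
		/* expected[] must be strictly sorted, mirroring the Assert above */
		if (matched >= nexpected || expected[matched] != dead_on_page[i])
			return false;		/* unexpected dead item */
		matched++;
	}
	return true;
}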
3827
3828/*
3829 * Update index statistics in pg_class if the statistics are accurate.
3830 */
3831static void
3832update_relstats_all_indexes(LVRelState *vacrel)
3833{
3834 Relation *indrels = vacrel->indrels;
3835 int nindexes = vacrel->nindexes;
3836 IndexBulkDeleteResult **indstats = vacrel->indstats;
3837
3838 Assert(vacrel->do_index_cleanup);
3839
3840 for (int idx = 0; idx < nindexes; idx++)
3841 {
3842 Relation indrel = indrels[idx];
3843 IndexBulkDeleteResult *istat = indstats[idx];
3844
3845 if (istat == NULL || istat->estimated_count)
3846 continue;
3847
3848 /* Update index statistics */
3849 vac_update_relstats(indrel,
3850 istat->num_pages,
3851 istat->num_index_tuples,
3852 0, 0,
3853 false,
3854 InvalidTransactionId,
3855 InvalidMultiXactId,
3856 NULL, NULL, false);
3857 }
3858}
3859
3860/*
3861 * Error context callback for errors occurring during vacuum. The error
3862 * context messages for index phases should match the messages set in parallel
3863 * vacuum. If you change this function for those phases, change
3864 * parallel_vacuum_error_callback() as well.
3865 */
3866static void
3867vacuum_error_callback(void *arg)
3868{
3869 LVRelState *errinfo = arg;
3870
3871 switch (errinfo->phase)
3872 {
3873 case VACUUM_ERRCB_PHASE_SCAN_HEAP:
3874 if (BlockNumberIsValid(errinfo->blkno))
3875 {
3876 if (OffsetNumberIsValid(errinfo->offnum))
3877 errcontext("while scanning block %u offset %u of relation \"%s.%s\"",
3878 errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
3879 else
3880 errcontext("while scanning block %u of relation \"%s.%s\"",
3881 errinfo->blkno, errinfo->relnamespace, errinfo->relname);
3882 }
3883 else
3884 errcontext("while scanning relation \"%s.%s\"",
3885 errinfo->relnamespace, errinfo->relname);
3886 break;
3887
3888 case VACUUM_ERRCB_PHASE_VACUUM_HEAP:
3889 if (BlockNumberIsValid(errinfo->blkno))
3890 {
3891 if (OffsetNumberIsValid(errinfo->offnum))
3892 errcontext("while vacuuming block %u offset %u of relation \"%s.%s\"",
3893 errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
3894 else
3895 errcontext("while vacuuming block %u of relation \"%s.%s\"",
3896 errinfo->blkno, errinfo->relnamespace, errinfo->relname);
3897 }
3898 else
3899 errcontext("while vacuuming relation \"%s.%s\"",
3900 errinfo->relnamespace, errinfo->relname);
3901 break;
3902
3904 errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
3905 errinfo->indname, errinfo->relnamespace, errinfo->relname);
3906 break;
3907
3909 errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
3910 errinfo->indname, errinfo->relnamespace, errinfo->relname);
3911 break;
3912
3913 case VACUUM_ERRCB_PHASE_TRUNCATE:
3914 if (BlockNumberIsValid(errinfo->blkno))
3915 errcontext("while truncating relation \"%s.%s\" to %u blocks",
3916 errinfo->relnamespace, errinfo->relname, errinfo->blkno);
3917 break;
3918
3919 case VACUUM_ERRCB_PHASE_UNKNOWN:
3920 default:
3921 return; /* do nothing; the errinfo may not be
3922 * initialized */
3923 }
3924}
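
/*
 * Illustrative sketch (not part of vacuumlazy.c as shown here): the usual way
 * an error context callback such as vacuum_error_callback() is pushed onto
 * error_context_stack for the duration of some work and popped afterwards,
 * mirroring the registration done by heap_vacuum_rel() earlier in this file.
 * with_vacuum_error_context() is a hypothetical wrapper used only for
 * illustration.
 */
static void
with_vacuum_error_context(LVRelState *vacrel)
{
	ErrorContextCallback errcallback;

	/* Push our callback so errors report the current vacuum phase/block */
	errcallback.callback = vacuum_error_callback;
	errcallback.arg = vacrel;
	errcallback.previous = error_context_stack;
	error_context_stack = &errcallback;

	/* ... perform work whose errors should carry vacuum context ... */

	/* Pop the callback again */
	error_context_stack = errcallback.previous;
}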
3925
3926/*
3927 * Updates the information required for vacuum error callback. This also saves
3928 * the current information which can be later restored via restore_vacuum_error_info.
3929 */
3930static void
3931update_vacuum_error_info(LVRelState *vacrel, LVSavedErrInfo *saved_vacrel,
3932 int phase, BlockNumber blkno, OffsetNumber offnum)
3933{
3934 if (saved_vacrel)
3935 {
3936 saved_vacrel->offnum = vacrel->offnum;
3937 saved_vacrel->blkno = vacrel->blkno;
3938 saved_vacrel->phase = vacrel->phase;
3939 }
3940
3941 vacrel->blkno = blkno;
3942 vacrel->offnum = offnum;
3943 vacrel->phase = phase;
3944}
3945
3946/*
3947 * Restores the vacuum information saved via a prior call to update_vacuum_error_info.
3948 */
3949static void
3950restore_vacuum_error_info(LVRelState *vacrel,
3951 const LVSavedErrInfo *saved_vacrel)
3952{
3953 vacrel->blkno = saved_vacrel->blkno;
3954 vacrel->offnum = saved_vacrel->offnum;
3955 vacrel->phase = saved_vacrel->phase;
3956}
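
/*
 * Illustrative sketch (not part of vacuumlazy.c as shown here): how callers in
 * this file typically bracket a sub-operation with update_vacuum_error_info()
 * and restore_vacuum_error_info(), so that nested work reports its own phase
 * and block number while the caller's error information is preserved.
 * do_work_on_block() is a hypothetical wrapper used only for illustration.
 */
static void
do_work_on_block(LVRelState *vacrel, BlockNumber blkno)
{
	LVSavedErrInfo saved_err_info;

	/* Report this phase/block in any error, saving the caller's state */
	update_vacuum_error_info(vacrel, &saved_err_info,
							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
							 InvalidOffsetNumber);

	/* ... block-level work goes here ... */

	/* Revert to the error information saved above */
	restore_vacuum_error_info(vacrel, &saved_err_info);
}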