1/*-------------------------------------------------------------------------
2 * worker.c
3 * PostgreSQL logical replication worker (apply)
4 *
5 * Copyright (c) 2016-2025, PostgreSQL Global Development Group
6 *
7 * IDENTIFICATION
8 * src/backend/replication/logical/worker.c
9 *
10 * NOTES
11 * This file contains the worker which applies logical changes as they come
12 * from remote logical replication stream.
13 *
14 * The main worker (apply) is started by logical replication worker
15 * launcher for every enabled subscription in a database. It uses
16 * walsender protocol to communicate with publisher.
17 *
18 * This module includes server facing code and shares libpqwalreceiver
19 * module with walreceiver for providing the libpq specific functionality.
20 *
21 *
22 * STREAMED TRANSACTIONS
23 * ---------------------
24 * Streamed transactions (large transactions exceeding a memory limit on the
25 * upstream) are applied using one of two approaches:
26 *
27 * 1) Write to temporary files and apply when the final commit arrives
28 *
29 * This approach is used when the user has set the subscription's streaming
30 * option as on.
31 *
32 * Unlike the regular (non-streamed) case, handling streamed transactions has
33 * to handle aborts of both the toplevel transaction and subtransactions. This
34 * is achieved by tracking offsets for subtransactions, which is then used
35 * to truncate the file with serialized changes.
36 *
37 * The files are placed in the temporary file directory by default, and the
38 * filenames include both the XID of the toplevel transaction and the OID of
39 * the subscription. This is necessary so that different workers processing
40 * remote transactions with the same XID don't interfere with each other.
41 *
42 * We use BufFiles instead of normal temporary files because the BufFile
43 * infrastructure (a) supports temporary files that exceed the OS file size
44 * limit, (b) provides automatic cleanup on error, and (c) lets these files
45 * survive across local transactions, so that they can be opened at stream
46 * start and closed at stream stop. We decided to use the FileSet
47 * infrastructure because without it the files are deleted when they are
48 * closed, and keeping stream files open across stream start/stop would
49 * consume a lot of memory (more than 8K for each BufFile, and there could be
50 * multiple such BufFiles, as the subscriber could receive multiple
51 * start/stop streams for different transactions before getting the
52 * commit). Moreover, if we didn't use FileSet, we would also need to invent
53 * a new way to pass filenames to the BufFile APIs so that we could open
54 * the desired file across multiple stream-open calls for the same
55 * transaction.
56 *
57 * 2) Parallel apply workers.
58 *
59 * This approach is used when the user has set the subscription's streaming
60 * option as parallel. See logical/applyparallelworker.c for information about
61 * this approach.
62 *
63 * TWO_PHASE TRANSACTIONS
64 * ----------------------
65 * Two phase transactions are replayed at prepare and then committed or
66 * rolled back at commit prepared and rollback prepared respectively. It is
67 * possible to have a prepared transaction that arrives at the apply worker
68 * when the tablesync is busy doing the initial copy. In this case, the apply
69 * worker skips all the prepared operations [e.g. inserts] while the tablesync
70 * is still busy (see the condition of should_apply_changes_for_rel). The
71 * tablesync worker might not get such a prepared transaction because say it
72 * was prior to the initial consistent point but might have got some later
73 * commits. Now, the tablesync worker will exit without doing anything for the
74 * prepared transaction skipped by the apply worker as the sync location for it
75 * will be already ahead of the apply worker's current location. This would lead
76 * to an "empty prepare", because later when the apply worker does the commit
77 * prepare, there is nothing in it (the inserts were skipped earlier).
78 *
79 * To avoid this and similar prepare confusions, the subscription's two_phase
80 * commit is enabled only after the initial sync is over. The two_phase option
81 * has been implemented as a tri-state with values DISABLED, PENDING, and
82 * ENABLED.
83 *
84 * Even if the user specifies they want a subscription with two_phase = on,
85 * internally it will start with a tri-state of PENDING which only becomes
86 * ENABLED after all tablesync initializations are completed - i.e. when all
87 * tablesync workers have reached their READY state. In other words, the value
88 * PENDING is only a temporary state for subscription start-up.
89 *
90 * Until the two_phase is properly available (ENABLED) the subscription will
91 * behave as if two_phase = off. When the apply worker detects that all
92 * tablesyncs have become READY (while the tri-state was PENDING) it will
93 * restart the apply worker process. This happens in
94 * ProcessSyncingTablesForApply.
95 *
96 * When the (re-started) apply worker finds that all tablesyncs are READY for a
97 * two_phase tri-state of PENDING it starts streaming messages with the
98 * two_phase option which in turn enables the decoding of two-phase commits at
99 * the publisher. Then, it updates the tri-state value from PENDING to ENABLED.
100 * Now, it is possible that during the time we have not enabled two_phase, the
101 * publisher (replication server) would have skipped some prepares but we
102 * ensure that such prepares are sent along with commit prepare, see
103 * ReorderBufferFinishPrepared.
104 *
105 * If the subscription has no tables then a two_phase tri-state PENDING is
106 * left unchanged. This lets the user still do an ALTER SUBSCRIPTION REFRESH
107 * PUBLICATION which might otherwise be disallowed (see below).
108 *
109 * If ever a user needs to be aware of the tri-state value, they can fetch it
110 * from the pg_subscription catalog (see column subtwophasestate).
111 *
112 * Finally, to avoid problems mentioned in previous paragraphs from any
113 * subsequent (not READY) tablesyncs (need to toggle two_phase option from 'on'
114 * to 'off' and then again back to 'on') there is a restriction for
115 * ALTER SUBSCRIPTION REFRESH PUBLICATION. This command is not permitted when
116 * the two_phase tri-state is ENABLED, except when copy_data = false.
117 *
118 * We can get a prepare of the same GID more than once in genuine cases
119 * where we have defined multiple subscriptions for publications on the same
120 * server and the prepared transaction has operations on tables subscribed to
121 * by those subscriptions. For such cases, if we use the GID sent by the
122 * publisher, one of the prepares will be successful and the others will fail,
123 * in which case the server will send them again. Now, this can lead to a
124 * deadlock if the user has set synchronous_standby_names for all the
125 * subscriptions on the subscriber. To avoid such deadlocks, we generate a
126 * unique GID (consisting of the subscription oid and the xid of the prepared
127 * transaction) for each prepare transaction on the subscriber.
128 *
129 * FAILOVER
130 * --------
131 * The logical slot on the primary can be synced to the standby by specifying
132 * failover = true when creating the subscription. Enabling failover allows us
133 * to smoothly transition to the promoted standby, ensuring that we can
134 * subscribe to the new primary without losing any data.
135 *
136 * RETAIN DEAD TUPLES
137 * ------------------
138 * Each apply worker that has enabled the retain_dead_tuples option maintains
139 * a non-removable transaction ID (oldest_nonremovable_xid) in shared memory to
140 * prevent dead rows from being removed prematurely when the apply worker still
141 * needs them to detect update_deleted conflicts. Additionally, this helps to
142 * retain the required commit_ts module information, which further helps to
143 * detect update_origin_differs and delete_origin_differs conflicts reliably, as
144 * otherwise, vacuum freeze could remove the required information.
145 *
146 * The logical replication launcher manages an internal replication slot named
147 * "pg_conflict_detection". It asynchronously aggregates the non-removable
148 * transaction ID from all apply workers to determine the appropriate xmin for
149 * the slot, thereby retaining necessary tuples.
150 *
151 * The non-removable transaction ID in the apply worker is advanced to the
152 * oldest running transaction ID once all concurrent transactions on the
153 * publisher have been applied and flushed locally. The process involves:
154 *
155 * - RDT_GET_CANDIDATE_XID:
156 * Call GetOldestActiveTransactionId() to take oldestRunningXid as the
157 * candidate xid.
158 *
159 * - RDT_REQUEST_PUBLISHER_STATUS:
160 * Send a message to the walsender requesting the publisher status, which
161 * includes the latest WAL write position and information about transactions
162 * that are in the commit phase.
163 *
164 * - RDT_WAIT_FOR_PUBLISHER_STATUS:
165 * Wait for the status from the walsender. After receiving the first status,
166 * do not proceed if there are concurrent remote transactions that are still
167 * in the commit phase. These transactions might have been assigned an
168 * earlier commit timestamp but have not yet written the commit WAL record.
169 * Continue to request the publisher status (RDT_REQUEST_PUBLISHER_STATUS)
170 * until all these transactions have completed.
171 *
172 * - RDT_WAIT_FOR_LOCAL_FLUSH:
173 * Advance the non-removable transaction ID if the current flush location has
174 * reached or surpassed the last received WAL position.
175 *
176 * - RDT_STOP_CONFLICT_INFO_RETENTION:
177 * This phase is required only when max_retention_duration is defined. We
178 * enter this phase if the wait time in either the
179 * RDT_WAIT_FOR_PUBLISHER_STATUS or RDT_WAIT_FOR_LOCAL_FLUSH phase exceeds
180 * the configured max_retention_duration. In this phase,
181 * pg_subscription.subretentionactive is updated to false within a new
182 * transaction, and oldest_nonremovable_xid is set to InvalidTransactionId.
183 *
184 * - RDT_RESUME_CONFLICT_INFO_RETENTION:
185 * This phase is required only when max_retention_duration is defined. We
186 * enter this phase if the retention was previously stopped, and the time
187 * required to advance the non-removable transaction ID in the
188 * RDT_WAIT_FOR_LOCAL_FLUSH phase has decreased to within acceptable limits
189 * (or if max_retention_duration is set to 0). During this phase,
190 * pg_subscription.subretentionactive is updated to true within a new
191 * transaction, and the worker will be restarted.
192 *
193 * The overall state progression is: GET_CANDIDATE_XID ->
194 * REQUEST_PUBLISHER_STATUS -> WAIT_FOR_PUBLISHER_STATUS -> (loop to
195 * REQUEST_PUBLISHER_STATUS till concurrent remote transactions end) ->
196 * WAIT_FOR_LOCAL_FLUSH -> loop back to GET_CANDIDATE_XID.
197 *
198 * Retaining the dead tuples for this period is sufficient for ensuring
199 * eventual consistency using last-update-wins strategy, as dead tuples are
200 * useful for detecting conflicts only during the application of concurrent
201 * transactions from remote nodes. After applying and flushing all remote
202 * transactions that occurred concurrently with the tuple DELETE, any
203 * subsequent UPDATE from a remote node should have a later timestamp. In such
204 * cases, it is acceptable to detect an update_missing scenario and convert the
205 * UPDATE to an INSERT when applying it. But, for concurrent remote
206 * transactions with earlier timestamps than the DELETE, detecting
207 * update_deleted is necessary, as the UPDATEs in remote transactions should be
208 * ignored if their timestamp is earlier than that of the dead tuples.
209 *
210 * Note that advancing the non-removable transaction ID is not supported if the
211 * publisher is also a physical standby. This is because the logical walsender
212 * on the standby can only get the WAL replay position but there may be more
213 * WALs that are being replicated from the primary and those WALs could have
214 * earlier commit timestamp.
215 *
216 * Similarly, when the publisher has subscribed to another publisher,
217 * information necessary for conflict detection cannot be retained for
218 * changes from origins other than the publisher. This is because publisher
219 * lacks the information on concurrent transactions of other publishers to
220 * which it subscribes. As the information on concurrent transactions is
221 * unavailable beyond subscriber's immediate publishers, the non-removable
222 * transaction ID might be advanced prematurely before changes from other
223 * origins have been fully applied.
224 *
225 * XXX Retaining information for changes from other origins might be possible
226 * by requesting the subscription on that origin to enable retain_dead_tuples
227 * and fetching the conflict detection slot.xmin along with the publisher's
228 * status. In the RDT_WAIT_FOR_PUBLISHER_STATUS phase, the apply worker could
229 * wait for the remote slot's xmin to reach the oldest active transaction ID,
230 * ensuring that all transactions from other origins have been applied on the
231 * publisher, thereby getting the latest WAL position that includes all
232 * concurrent changes. However, this approach may impact performance, so it
233 * might not be worth the effort.
234 *
235 * XXX It seems feasible to get the latest commit's WAL location from the
236 * publisher and wait till that is applied. However, we can't do that
237 * because commit timestamps can regress as a commit with a later LSN is not
238 * guaranteed to have a later timestamp than those with earlier LSNs. Having
239 * said that, even if that is possible, it won't improve performance much as
240 * the apply always lags and moves slowly as compared with the transactions
241 * on the publisher.
242 *-------------------------------------------------------------------------
243 */
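/*
 * Illustrative sketch, not part of worker.c: the phase progression described
 * in the RETAIN DEAD TUPLES notes above can be read as a small state machine.
 * The enum and function names below carry a "Sketch" suffix or "SK_" prefix
 * because they are hypothetical stand-ins, and the two boolean inputs are
 * assumptions that summarize "concurrent remote transactions are still in the
 * commit phase" and "the local flush position has reached the last received
 * WAL position". The optional stop/resume retention phases are left out.
 */

```c
#include <stdbool.h>

/* Stand-ins for the four core phases named in the header comment. */
typedef enum
{
	SK_RDT_GET_CANDIDATE_XID,
	SK_RDT_REQUEST_PUBLISHER_STATUS,
	SK_RDT_WAIT_FOR_PUBLISHER_STATUS,
	SK_RDT_WAIT_FOR_LOCAL_FLUSH
} RdtPhaseSketch;

/*
 * Return the phase that follows 'phase'.
 *
 * remote_commits_pending: publisher reported transactions still committing.
 * local_flush_caught_up:  local flush position >= last received WAL position.
 */
static RdtPhaseSketch
rdt_next_phase_sketch(RdtPhaseSketch phase,
					  bool remote_commits_pending,
					  bool local_flush_caught_up)
{
	switch (phase)
	{
		case SK_RDT_GET_CANDIDATE_XID:
			return SK_RDT_REQUEST_PUBLISHER_STATUS;

		case SK_RDT_REQUEST_PUBLISHER_STATUS:
			return SK_RDT_WAIT_FOR_PUBLISHER_STATUS;

		case SK_RDT_WAIT_FOR_PUBLISHER_STATUS:
			/* Keep asking until no remote transaction is mid-commit. */
			return remote_commits_pending ?
				SK_RDT_REQUEST_PUBLISHER_STATUS :
				SK_RDT_WAIT_FOR_LOCAL_FLUSH;

		case SK_RDT_WAIT_FOR_LOCAL_FLUSH:
			/* Once flushed, the xid can advance and a new round begins. */
			return local_flush_caught_up ?
				SK_RDT_GET_CANDIDATE_XID :
				SK_RDT_WAIT_FOR_LOCAL_FLUSH;
	}

	return phase;				/* not reached for valid input */
}
```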
244
245#include "postgres.h"
246
247#include <sys/stat.h>
248#include <unistd.h>
249
250#include "access/commit_ts.h"
251#include "access/table.h"
252#include "access/tableam.h"
253#include "access/twophase.h"
254#include "access/xact.h"
255#include "catalog/indexing.h"
256#include "catalog/pg_inherits.h"
260#include "commands/tablecmds.h"
261#include "commands/trigger.h"
262#include "executor/executor.h"
264#include "libpq/pqformat.h"
265#include "miscadmin.h"
266#include "optimizer/optimizer.h"
268#include "pgstat.h"
269#include "postmaster/bgworker.h"
270#include "postmaster/interrupt.h"
271#include "postmaster/walwriter.h"
272#include "replication/conflict.h"
277#include "replication/origin.h"
278#include "replication/slot.h"
282#include "storage/buffile.h"
283#include "storage/ipc.h"
284#include "storage/lmgr.h"
285#include "storage/procarray.h"
286#include "tcop/tcopprot.h"
287#include "utils/acl.h"
288#include "utils/guc.h"
289#include "utils/inval.h"
290#include "utils/lsyscache.h"
291#include "utils/memutils.h"
292#include "utils/pg_lsn.h"
293#include "utils/rel.h"
294#include "utils/rls.h"
295#include "utils/snapmgr.h"
296#include "utils/syscache.h"
297#include "utils/usercontext.h"
298
299#define NAPTIME_PER_CYCLE 1000 /* max sleep time between cycles (1s) */
300
301typedef struct FlushPosition
302{
307
309
310typedef struct ApplyExecutionData
311{
312 EState *estate; /* executor state, used to track resources */
313
314 LogicalRepRelMapEntry *targetRel; /* replication target rel */
315 ResultRelInfo *targetRelInfo; /* ResultRelInfo for same */
316
317 /* These fields are used when the target relation is partitioned: */
318 ModifyTableState *mtstate; /* dummy ModifyTable state */
319 PartitionTupleRouting *proute; /* partition routing info */
321
322/* Struct for saving and restoring apply errcontext information */
324{
325 LogicalRepMsgType command; /* 0 if invalid */
327
328 /* Remote node information */
329 int remote_attnum; /* -1 if invalid */
334
335/*
336 * The action to be taken for the changes in the transaction.
337 *
338 * TRANS_LEADER_APPLY:
339 * This action means that we are in the leader apply worker or table sync
340 * worker. The changes of the transaction are either directly applied or
341 * are read from temporary files (for streaming transactions) and then
342 * applied by the worker.
343 *
344 * TRANS_LEADER_SERIALIZE:
345 * This action means that we are in the leader apply worker or table sync
346 * worker. Changes are written to temporary files and then applied when the
347 * final commit arrives.
348 *
349 * TRANS_LEADER_SEND_TO_PARALLEL:
350 * This action means that we are in the leader apply worker and need to send
351 * the changes to the parallel apply worker.
352 *
353 * TRANS_LEADER_PARTIAL_SERIALIZE:
354 * This action means that we are in the leader apply worker and have sent some
355 * changes directly to the parallel apply worker and the remaining changes are
356 * serialized to a file, due to timeout while sending data. The parallel apply
357 * worker will apply these serialized changes when the final commit arrives.
358 *
359 * We can't use TRANS_LEADER_SERIALIZE for this case because, in addition to
360 * serializing changes, the leader worker also needs to serialize the
361 * STREAM_XXX message to a file, and wait for the parallel apply worker to
362 * finish the transaction when processing the transaction finish command. So
363 * this new action was introduced to keep the code and logic clear.
364 *
365 * TRANS_PARALLEL_APPLY:
366 * This action means that we are in the parallel apply worker and changes of
367 * the transaction are applied directly by the worker.
368 */
369typedef enum
370{
371 /* The action for non-streaming transactions. */
373
374 /* Actions for streaming transactions. */
380
381/*
382 * The phases involved in advancing the non-removable transaction ID.
383 *
384 * See comments atop worker.c for details of the transition between these
385 * phases.
386 */
387typedef enum
388{
396
397/*
398 * Critical information for managing phase transitions within the
399 * RetainDeadTuplesPhase.
400 */
402{
403 RetainDeadTuplesPhase phase; /* current phase */
404 XLogRecPtr remote_lsn; /* WAL write position on the publisher */
405
406 /*
407 * Oldest transaction ID that was in the commit phase on the publisher.
408 * Use FullTransactionId to prevent issues with transaction ID wraparound,
409 * where a new remote_oldestxid could falsely appear to originate from the
410 * past and block advancement.
411 */
413
414 /*
415 * Next transaction ID to be assigned on the publisher. Use
416 * FullTransactionId for consistency and to allow straightforward
417 * comparisons with remote_oldestxid.
418 */
420
421 TimestampTz reply_time; /* when the publisher responds with status */
422
423 /*
424 * Publisher transaction ID that must be awaited to complete before
425 * entering the final phase (RDT_WAIT_FOR_LOCAL_FLUSH). Use
426 * FullTransactionId for the same reason as remote_nextxid.
427 */
429
430 TransactionId candidate_xid; /* candidate for the non-removable
431 * transaction ID */
432 TimestampTz flushpos_update_time; /* when the remote flush position was
433 * updated in final phase
434 * (RDT_WAIT_FOR_LOCAL_FLUSH) */
435
436 long table_sync_wait_time; /* time spent waiting for table sync
437 * to finish */
438
439 /*
440 * The following fields are used to determine the timing for the next
441 * round of transaction ID advancement.
442 */
443 TimestampTz last_recv_time; /* when the last message was received */
444 TimestampTz candidate_xid_time; /* when the candidate_xid is decided */
445 int xid_advance_interval; /* how much time (ms) to wait before
446 * attempting to advance the
447 * non-removable transaction ID */
449
450/*
451 * The minimum (100ms) and maximum (3 minutes) intervals for advancing
452 * non-removable transaction IDs. The maximum interval is a bit arbitrary but
453 * is sufficient to not cause any undue network traffic.
454 */
455#define MIN_XID_ADVANCE_INTERVAL 100
456#define MAX_XID_ADVANCE_INTERVAL 180000
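/*
 * Illustrative sketch, not part of worker.c: one plausible way to keep the
 * advancement interval between the two bounds above is to double it while no
 * new candidate xid is found and to reset it once one is found. That policy,
 * the function name, and the SK_ constants are assumptions made for this
 * example; the real adjustment logic may take further settings into account.
 */

```c
#include <stdbool.h>

#define SK_MIN_XID_ADVANCE_INTERVAL 100		/* ms, mirrors the minimum above */
#define SK_MAX_XID_ADVANCE_INTERVAL 180000	/* ms, mirrors the maximum above */

/* Return the next wait interval in milliseconds. */
static int
next_xid_advance_interval_sketch(int current_ms, bool new_xid_found)
{
	/* Progress was made (or nothing is set yet): poll quickly again. */
	if (new_xid_found || current_ms <= 0)
		return SK_MIN_XID_ADVANCE_INTERVAL;

	/* Back off, clamping so the doubling can never exceed the maximum. */
	if (current_ms > SK_MAX_XID_ADVANCE_INTERVAL / 2)
		return SK_MAX_XID_ADVANCE_INTERVAL;

	return current_ms * 2;
}
```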
457
458/* errcontext tracker */
460{
461 .command = 0,
462 .rel = NULL,
463 .remote_attnum = -1,
464 .remote_xid = InvalidTransactionId,
465 .finish_lsn = InvalidXLogRecPtr,
466 .origin_name = NULL,
467};
468
470
473
474/* per stream context for streaming transactions */
476
478
480static bool MySubscriptionValid = false;
481
483
486
487/* fields valid only when processing streamed transaction */
488static bool in_streamed_transaction = false;
489
491
492/*
493 * The number of changes applied by parallel apply worker during one streaming
494 * block.
495 */
497
498/* Are we initializing an apply worker? */
500
501/*
502 * We enable skipping all data modification changes (INSERT, UPDATE, etc.) for
503 * the subscription if the remote transaction's finish LSN matches the subskiplsn.
504 * Once we start skipping changes, we don't stop it until we skip all changes of
505 * the transaction even if pg_subscription is updated and MySubscription->skiplsn
506 * gets changed or reset during that. Also, in streaming transaction cases (streaming = on),
507 * we don't skip receiving and spooling the changes since we decide whether or not
508 * to skip applying the changes when starting to apply changes. The subskiplsn is
509 * cleared after successfully skipping the transaction or applying non-empty
510 * transaction. The latter prevents the mistakenly specified subskiplsn from
511 * being left. Note that we cannot skip the streaming transactions when using
512 * parallel apply workers because we cannot get the finish LSN before applying
513 * the changes. So, we don't start parallel apply worker when finish LSN is set
514 * by the user.
515 */
517#define is_skipping_changes() (unlikely(XLogRecPtrIsValid(skip_xact_finish_lsn)))
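/*
 * Illustrative sketch, not part of worker.c: the rule in the comment above
 * (start skipping only when the transaction's finish LSN equals the
 * configured skip LSN, then keep skipping until the transaction ends) can be
 * modelled with one variable. All names here are hypothetical stand-ins, and
 * 0 is assumed to represent an invalid LSN.
 */

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SkLsn;			/* stand-in for XLogRecPtr */
#define SK_INVALID_LSN ((SkLsn) 0)

/* Finish LSN of the transaction being skipped, or invalid if not skipping. */
static SkLsn sk_skip_xact_finish_lsn = SK_INVALID_LSN;

static bool
sk_is_skipping_changes(void)
{
	return sk_skip_xact_finish_lsn != SK_INVALID_LSN;
}

/* Called at transaction start with the subscription's configured skip LSN. */
static void
sk_maybe_start_skipping(SkLsn subskiplsn, SkLsn finish_lsn)
{
	if (subskiplsn == SK_INVALID_LSN || subskiplsn != finish_lsn)
		return;

	/* Latch the value so later catalog changes do not stop the skip. */
	sk_skip_xact_finish_lsn = finish_lsn;
}

/* Called once the whole transaction has been skipped. */
static void
sk_stop_skipping(void)
{
	sk_skip_xact_finish_lsn = SK_INVALID_LSN;
}
```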
518
519/* BufFile handle of the current streaming file */
520static BufFile *stream_fd = NULL;
521
522/*
523 * The remote WAL position that has been applied and flushed locally. We record
524 * and use this information both while sending feedback to the server and
525 * advancing oldest_nonremovable_xid.
526 */
528
529typedef struct SubXactInfo
530{
531 TransactionId xid; /* XID of the subxact */
532 int fileno; /* file number in the buffile */
533 off_t offset; /* offset in the file */
535
536/* Sub-transaction data for the current streaming transaction */
537typedef struct ApplySubXactData
538{
539 uint32 nsubxacts; /* number of sub-transactions */
540 uint32 nsubxacts_max; /* current capacity of subxacts */
541 TransactionId subxact_last; /* xid of the last sub-transaction */
542 SubXactInfo *subxacts; /* sub-xact offset in changes file */
544
546
547static inline void subxact_filename(char *path, Oid subid, TransactionId xid);
548static inline void changes_filename(char *path, Oid subid, TransactionId xid);
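/*
 * Illustrative sketch, not part of worker.c: the header comment says that
 * stream file names include both the subscription OID and the toplevel XID so
 * that workers do not collide. The "%u-%u.changes" and "%u-%u.subxacts"
 * formats below are assumptions for this example, as are the typedefs and the
 * explicit buffer-length parameter.
 */

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned int SkOid;		/* stand-in for Oid */
typedef unsigned int SkXid;		/* stand-in for TransactionId */

/* Name of the file holding the serialized changes of one transaction. */
static void
changes_filename_sketch(char *path, size_t pathlen, SkOid subid, SkXid xid)
{
	snprintf(path, pathlen, "%u-%u.changes", subid, xid);
}

/* Name of the file holding the subtransaction offsets of one transaction. */
static void
subxact_filename_sketch(char *path, size_t pathlen, SkOid subid, SkXid xid)
{
	snprintf(path, pathlen, "%u-%u.subxacts", subid, xid);
}
```

/*
 * Two subscriptions that receive the same remote XID get different names
 * because the subscription OID differs.
 */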
549
550/*
551 * Information about subtransactions of a given toplevel transaction.
552 */
553static void subxact_info_write(Oid subid, TransactionId xid);
554static void subxact_info_read(Oid subid, TransactionId xid);
555static void subxact_info_add(TransactionId xid);
556static inline void cleanup_subxact_info(void);
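/*
 * Illustrative sketch, not part of worker.c: the header comment explains that
 * subtransaction aborts are handled by tracking the file offset at which each
 * subtransaction started and truncating the changes file back to it. The
 * function below shows only the bookkeeping half of that idea; the struct and
 * function names are hypothetical, "long" stands in for off_t, and the actual
 * file truncation is left to the caller.
 */

```c
#include <stdint.h>

typedef uint32_t SkTransactionId;	/* stand-in for TransactionId */

typedef struct SubXactInfoSketch
{
	SkTransactionId xid;		/* XID of the subxact */
	int			fileno;			/* file number in the buffile */
	long		offset;			/* offset of its first change */
} SubXactInfoSketch;

/*
 * Look up 'subxid' and report where the changes file should be truncated.
 * On success, the aborted subxact and every later entry are dropped from the
 * array and the index of the aborted entry is returned. Returns -1 (leaving
 * everything untouched) if the subxact wrote no changes.
 */
static int
subxact_abort_sketch(SubXactInfoSketch *subxacts, uint32_t *nsubxacts,
					 SkTransactionId subxid, int *fileno, long *offset)
{
	for (uint32_t i = *nsubxacts; i > 0; i--)
	{
		if (subxacts[i - 1].xid == subxid)
		{
			*fileno = subxacts[i - 1].fileno;
			*offset = subxacts[i - 1].offset;
			*nsubxacts = i - 1;
			return (int) (i - 1);
		}
	}

	return -1;
}
```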
557
558/*
559 * Serialize and deserialize changes for a toplevel transaction.
560 */
561static void stream_open_file(Oid subid, TransactionId xid,
562 bool first_segment);
563static void stream_write_change(char action, StringInfo s);
565static void stream_close_file(void);
566
567static void send_feedback(XLogRecPtr recvpos, bool force, bool requestReply);
568
570 bool status_received);
573 bool status_received);
574static void get_candidate_xid(RetainDeadTuplesData *rdt_data);
577 bool status_received);
578static void wait_for_local_flush(RetainDeadTuplesData *rdt_data);
582static bool update_retention_status(bool active);
585 bool new_xid_found);
586
587static void apply_worker_exit(void);
588
589static void apply_handle_commit_internal(LogicalRepCommitData *commit_data);
591 ResultRelInfo *relinfo,
592 TupleTableSlot *remoteslot);
594 ResultRelInfo *relinfo,
595 TupleTableSlot *remoteslot,
596 LogicalRepTupleData *newtup,
597 Oid localindexoid);
599 ResultRelInfo *relinfo,
600 TupleTableSlot *remoteslot,
601 Oid localindexoid);
602static bool FindReplTupleInLocalRel(ApplyExecutionData *edata, Relation localrel,
603 LogicalRepRelation *remoterel,
604 Oid localidxoid,
605 TupleTableSlot *remoteslot,
606 TupleTableSlot **localslot);
607static bool FindDeletedTupleInLocalRel(Relation localrel,
608 Oid localidxoid,
609 TupleTableSlot *remoteslot,
610 TransactionId *delete_xid,
611 RepOriginId *delete_origin,
612 TimestampTz *delete_time);
614 TupleTableSlot *remoteslot,
615 LogicalRepTupleData *newtup,
616 CmdType operation);
617
618/* Functions for skipping changes */
619static void maybe_start_skipping_changes(XLogRecPtr finish_lsn);
620static void stop_skipping_changes(void);
621static void clear_subscription_skip_lsn(XLogRecPtr finish_lsn);
622
623/* Functions for apply error callback */
624static inline void set_apply_error_context_xact(TransactionId xid, XLogRecPtr lsn);
625static inline void reset_apply_error_context_info(void);
626
629
630static void replorigin_reset(int code, Datum arg);
631
632/*
633 * Form the origin name for the subscription.
634 *
635 * This is a common function for tablesync and other workers. Tablesync workers
636 * must pass a valid relid. Other callers must pass relid = InvalidOid.
637 *
638 * Return the name in the supplied buffer.
639 */
640void
642 char *originname, Size szoriginname)
643{
644 if (OidIsValid(relid))
645 {
646 /* Replication origin name for tablesync workers. */
647 snprintf(originname, szoriginname, "pg_%u_%u", suboid, relid);
648 }
649 else
650 {
651 /* Replication origin name for non-tablesync workers. */
652 snprintf(originname, szoriginname, "pg_%u", suboid);
653 }
654}
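/*
 * Illustrative sketch, not part of worker.c: a standalone copy of the naming
 * rule above, useful for seeing the two name shapes side by side. The
 * function name and the SkOid typedef are stand-ins introduced for this
 * example; 0 is assumed to be the invalid OID.
 */

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned int SkOid;		/* stand-in for Oid */
#define SK_INVALID_OID ((SkOid) 0)

/* Write the origin name for a subscription (and optional table) to the buffer. */
static void
origin_name_sketch(SkOid suboid, SkOid relid,
				   char *originname, size_t szoriginname)
{
	if (relid != SK_INVALID_OID)
		snprintf(originname, szoriginname, "pg_%u_%u", suboid, relid);	/* tablesync */
	else
		snprintf(originname, szoriginname, "pg_%u", suboid);	/* other workers */
}
```

/*
 * With suboid = 16384, a tablesync worker for relid = 16390 gets
 * "pg_16384_16390", while a caller passing the invalid OID gets "pg_16384".
 */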
655
656/*
657 * Should this worker apply changes for given relation.
658 *
659 * This is mainly needed for initial relation data sync as that runs in
660 * separate worker process running in parallel and we need some way to skip
661 * changes coming to the leader apply worker during the sync of a table.
662 *
663 * Note we need a less-than-or-equal comparison for the SYNCDONE state because
664 * it might hold the position of the end of the initial slot consistent point
665 * WAL record + 1 (i.e. the start of the next record), and the next record can
666 * be the COMMIT of the transaction we are now processing (which is what we set
667 * remote_final_lsn to in apply_handle_begin).
668 *
669 * Note that for streaming transactions that are being applied in the parallel
670 * apply worker, we disallow applying changes if the target table in the
671 * subscription is not in the READY state, because we cannot decide whether to
672 * apply the change as we won't know remote_final_lsn by that time.
673 *
674 * We already checked this in pa_can_start() before assigning the
675 * streaming transaction to the parallel worker, but it also needs to be
676 * checked here because if the user executes ALTER SUBSCRIPTION ... REFRESH
677 * PUBLICATION in parallel, the new table can be added to pg_subscription_rel
678 * while applying this transaction.
679 */
680static bool
682{
683 switch (MyLogicalRepWorker->type)
684 {
686 return MyLogicalRepWorker->relid == rel->localreloid;
687
689 /* We don't synchronize rel's that are in unknown state. */
690 if (rel->state != SUBREL_STATE_READY &&
691 rel->state != SUBREL_STATE_UNKNOWN)
693 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
694 errmsg("logical replication parallel apply worker for subscription \"%s\" will stop",
696 errdetail("Cannot handle streamed replication transactions using parallel apply workers until all tables have been synchronized.")));
697
698 return rel->state == SUBREL_STATE_READY;
699
700 case WORKERTYPE_APPLY:
701 return (rel->state == SUBREL_STATE_READY ||
702 (rel->state == SUBREL_STATE_SYNCDONE &&
703 rel->statelsn <= remote_final_lsn));
704
706 /* Should never happen. */
707 elog(ERROR, "sequence synchronization worker is not expected to apply changes");
708 break;
709
711 /* Should never happen. */
712 elog(ERROR, "Unknown worker type");
713 }
714
715 return false; /* dummy for compiler */
716}
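/*
 * Illustrative sketch, not part of worker.c: the WORKERTYPE_APPLY branch
 * above reduces to a two-part predicate. The sketch isolates it so the
 * boundary case (statelsn equal to remote_final_lsn) is easy to check. The
 * names are hypothetical, and the 'r' and 's' state codes are stand-ins
 * assumed for this example.
 */

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SkXLogRecPtr;	/* stand-in for XLogRecPtr */

#define SK_STATE_READY		'r'	/* assumed code for the READY state */
#define SK_STATE_SYNCDONE	's'	/* assumed code for the SYNCDONE state */

/* Should the leader apply worker apply a change for this relation? */
static bool
leader_should_apply_sketch(char state, SkXLogRecPtr statelsn,
						   SkXLogRecPtr remote_final_lsn)
{
	return state == SK_STATE_READY ||
		(state == SK_STATE_SYNCDONE && statelsn <= remote_final_lsn);
}
```

/*
 * A SYNCDONE table whose statelsn equals the transaction's final LSN is
 * applied; with a strict "<" the commit that starts exactly at the sync
 * position would be skipped.
 */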
717
718/*
719 * Begin one step (one INSERT, UPDATE, etc) of a replication transaction.
720 *
721 * Start a transaction, if this is the first step (else we keep using the
722 * existing transaction).
723 * Also provide a global snapshot and ensure we run in ApplyMessageContext.
724 */
725static void
727{
729
730 if (!IsTransactionState())
731 {
734 }
735
737
739}
740
741/*
742 * Finish up one step of a replication transaction.
743 * Callers of begin_replication_step() must also call this.
744 *
745 * We don't close out the transaction here, but we should increment
746 * the command counter to make the effects of this step visible.
747 */
748static void
750{
752
754}
755
756/*
757 * Handle streamed transactions for both the leader apply worker and the
758 * parallel apply workers.
759 *
760 * In the streaming case (receiving a block of the streamed transaction), for
761 * serialize mode, simply redirect it to a file for the proper toplevel
762 * transaction, and for parallel mode, the leader apply worker will send the
763 * changes to parallel apply workers and the parallel apply worker will define
764 * savepoints if needed. (LOGICAL_REP_MSG_RELATION or LOGICAL_REP_MSG_TYPE
765 * messages will be applied by both leader apply worker and parallel apply
766 * workers).
767 *
768 * Returns true for streamed transactions (when the change is either serialized
769 * to file or sent to parallel apply worker), false otherwise (regular mode or
770 * needs to be processed by parallel apply worker).
771 *
772 * Exception: If the message being processed is LOGICAL_REP_MSG_RELATION
773 * or LOGICAL_REP_MSG_TYPE, return false even if the message needs to be sent
774 * to a parallel apply worker.
775 */
776static bool
778{
779 TransactionId current_xid;
781 TransApplyAction apply_action;
782 StringInfoData original_msg;
783
784 apply_action = get_transaction_apply_action(stream_xid, &winfo);
785
786 /* not in streaming mode */
787 if (apply_action == TRANS_LEADER_APPLY)
788 return false;
789
791
792 /*
793 * The parallel apply worker needs the xid in this message to decide
794 * whether to define a savepoint, so save the original message that has
795 * not moved the cursor after the xid. We will serialize this message to a
796 * file in PARTIAL_SERIALIZE mode.
797 */
798 original_msg = *s;
799
800 /*
801 * We should have received XID of the subxact as the first part of the
802 * message, so extract it.
803 */
804 current_xid = pq_getmsgint(s, 4);
805
806 if (!TransactionIdIsValid(current_xid))
808 (errcode(ERRCODE_PROTOCOL_VIOLATION),
809 errmsg_internal("invalid transaction ID in streamed replication transaction")));
810
811 switch (apply_action)
812 {
815
816 /* Add the new subxact to the array (unless already there). */
817 subxact_info_add(current_xid);
818
819 /* Write the change to the current file */
821 return true;
822
824 Assert(winfo);
825
826 /*
827 * XXX The publisher side doesn't always send relation/type update
828 * messages after the streaming transaction, so also update the
829 * relation/type in leader apply worker. See function
830 * cleanup_rel_sync_cache.
831 */
832 if (pa_send_data(winfo, s->len, s->data))
833 return (action != LOGICAL_REP_MSG_RELATION &&
835
836 /*
837 * Switch to serialize mode when we are not able to send the
838 * change to parallel apply worker.
839 */
840 pa_switch_to_partial_serialize(winfo, false);
841
842 /* fall through */
844 stream_write_change(action, &original_msg);
845
846 /* Same reason as TRANS_LEADER_SEND_TO_PARALLEL case. */
847 return (action != LOGICAL_REP_MSG_RELATION &&
849
852
853 /* Define a savepoint for a subxact if needed. */
854 pa_start_subtrans(current_xid, stream_xid);
855 return false;
856
857 default:
858 elog(ERROR, "unexpected apply action: %d", (int) apply_action);
859 return false; /* silence compiler warning */
860 }
861}
862
863/*
864 * Executor state preparation for evaluation of constraint expressions,
865 * indexes and triggers for the specified relation.
866 *
867 * Note that the caller must open and close any indexes to be updated.
868 */
869static ApplyExecutionData *
871{
872 ApplyExecutionData *edata;
873 EState *estate;
874 RangeTblEntry *rte;
875 List *perminfos = NIL;
876 ResultRelInfo *resultRelInfo;
877
878 edata = (ApplyExecutionData *) palloc0(sizeof(ApplyExecutionData));
879 edata->targetRel = rel;
880
881 edata->estate = estate = CreateExecutorState();
882
883 rte = makeNode(RangeTblEntry);
884 rte->rtekind = RTE_RELATION;
885 rte->relid = RelationGetRelid(rel->localrel);
886 rte->relkind = rel->localrel->rd_rel->relkind;
887 rte->rellockmode = AccessShareLock;
888
889 addRTEPermissionInfo(&perminfos, rte);
890
891 ExecInitRangeTable(estate, list_make1(rte), perminfos,
893
894 edata->targetRelInfo = resultRelInfo = makeNode(ResultRelInfo);
895
896 /*
897 * Use Relation opened by logicalrep_rel_open() instead of opening it
898 * again.
899 */
900 InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
901
902 /*
903 * We put the ResultRelInfo in the es_opened_result_relations list, even
904 * though we don't populate the es_result_relations array. That's a bit
905 * bogus, but it's enough to make ExecGetTriggerResultRel() find them.
906 *
907 * ExecOpenIndices() is not called here either; each execution path that
908 * performs an apply operation is responsible for that.
909 */
911 lappend(estate->es_opened_result_relations, resultRelInfo);
912
913 estate->es_output_cid = GetCurrentCommandId(true);
914
915 /* Prepare to catch AFTER triggers. */
917
918 /* other fields of edata remain NULL for now */
919
920 return edata;
921}
922
923/*
924 * Finish any operations related to the executor state created by
925 * create_edata_for_relation().
926 */
927static void
929{
930 EState *estate = edata->estate;
931
932 /* Handle any queued AFTER triggers. */
933 AfterTriggerEndQuery(estate);
934
935 /* Shut down tuple routing, if any was done. */
936 if (edata->proute)
937 ExecCleanupTupleRouting(edata->mtstate, edata->proute);
938
939 /*
940 * Cleanup. It might seem that we should call ExecCloseResultRelations()
941 * here, but we intentionally don't. It would close the rel we added to
942 * es_opened_result_relations above, which is wrong because we took no
943 * corresponding refcount. We rely on ExecCleanupTupleRouting() to close
944 * any other relations opened during execution.
945 */
946 ExecResetTupleTable(estate->es_tupleTable, false);
947 FreeExecutorState(estate);
948 pfree(edata);
949}
950
951/*
952 * Evaluates default values for local columns that we can't map to remote
953 * relation columns.
954 *
955 * This allows us to support tables which have more columns on the downstream
956 * than on the upstream.
957 */
958static void
960 TupleTableSlot *slot)
961{
963 int num_phys_attrs = desc->natts;
964 int i;
965 int attnum,
966 num_defaults = 0;
967 int *defmap;
968 ExprState **defexprs;
969 ExprContext *econtext;
970
971 econtext = GetPerTupleExprContext(estate);
972
973 /* We got all the data via replication, no need to evaluate anything. */
974 if (num_phys_attrs == rel->remoterel.natts)
975 return;
976
977 defmap = (int *) palloc(num_phys_attrs * sizeof(int));
978 defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *));
979
980 Assert(rel->attrmap->maplen == num_phys_attrs);
981 for (attnum = 0; attnum < num_phys_attrs; attnum++)
982 {
984 Expr *defexpr;
985
986 if (cattr->attisdropped || cattr->attgenerated)
987 continue;
988
989 if (rel->attrmap->attnums[attnum] >= 0)
990 continue;
991
992 defexpr = (Expr *) build_column_default(rel->localrel, attnum + 1);
993
994 if (defexpr != NULL)
995 {
996 /* Run the expression through planner */
997 defexpr = expression_planner(defexpr);
998
999 /* Initialize executable expression in copycontext */
1000 defexprs[num_defaults] = ExecInitExpr(defexpr, NULL);
1001 defmap[num_defaults] = attnum;
1002 num_defaults++;
1003 }
1004 }
1005
1006 for (i = 0; i < num_defaults; i++)
1007 slot->tts_values[defmap[i]] =
1008 ExecEvalExpr(defexprs[i], econtext, &slot->tts_isnull[defmap[i]]);
1009}
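/*
 * Illustrative sketch, not part of worker.c: the two-pass pattern above
 * (collect the unmapped columns that have a default, then evaluate each
 * default into its column position), reduced to plain C. The names
 * default_fn, fill_unmapped_defaults and default_forty_two are hypothetical
 * stand-ins; real code goes through build_column_default(), ExecInitExpr()
 * and ExecEvalExpr() as shown above. The fixed-size defmap is an assumption
 * made to keep the sketch self-contained.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled default expression. */
typedef int (*default_fn) (void);

#define MAX_ATTS 64

/*
 * Fill values[] for local columns that have no remote counterpart.
 *
 * attrmap[i] < 0 means local column i is not mapped to a remote column.
 * defaults[i] may be NULL when column i has no default expression; such
 * columns are left untouched. Returns the number of defaults evaluated.
 */
int
fill_unmapped_defaults(const int *attrmap, const default_fn *defaults,
					   int natts, int *values, bool *isnull)
{
	int			defmap[MAX_ATTS];
	int			num_defaults = 0;

	assert(natts <= MAX_ATTS);

	/* Pass 1: remember which columns need a default evaluated. */
	for (int attnum = 0; attnum < natts; attnum++)
	{
		if (attrmap[attnum] >= 0)
			continue;			/* value arrives via replication */

		if (defaults[attnum] != NULL)
			defmap[num_defaults++] = attnum;
	}

	/* Pass 2: evaluate each collected default into its column position. */
	for (int i = 0; i < num_defaults; i++)
	{
		values[defmap[i]] = defaults[defmap[i]] ();
		isnull[defmap[i]] = false;
	}

	return num_defaults;
}

/* Hypothetical default expression used in the usage example. */
int
default_forty_two(void)
{
	return 42;
}
```

/*
 * With attrmap = {0, -1, -1} and a default only on the second column, only
 * that column is filled; the third column keeps whatever NULL marker the
 * caller stored earlier.
 */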
1010
1011/*
1012 * Store tuple data into slot.
1013 *
1014 * Incoming data can be either text or binary format.
1015 */
1016static void
1018 LogicalRepTupleData *tupleData)
1019{
1020 int natts = slot->tts_tupleDescriptor->natts;
1021 int i;
1022
1023 ExecClearTuple(slot);
1024
1025 /* Call the "in" function for each non-dropped, non-null attribute */
1026 Assert(natts == rel->attrmap->maplen);
1027 for (i = 0; i < natts; i++)
1028 {
1030 int remoteattnum = rel->attrmap->attnums[i];
1031
1032 if (!att->attisdropped && remoteattnum >= 0)
1033 {
1034 StringInfo colvalue = &tupleData->colvalues[remoteattnum];
1035
1036 Assert(remoteattnum < tupleData->ncols);
1037
1038 /* Set attnum for error callback */
1040
1041 if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_TEXT)
1042 {
1043 Oid typinput;
1044 Oid typioparam;
1045
1046 getTypeInputInfo(att->atttypid, &typinput, &typioparam);
1047 slot->tts_values[i] =
1048 OidInputFunctionCall(typinput, colvalue->data,
1049 typioparam, att->atttypmod);
1050 slot->tts_isnull[i] = false;
1051 }
1052 else if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_BINARY)
1053 {
1054 Oid typreceive;
1055 Oid typioparam;
1056
1057 /*
1058 * In some code paths we may be asked to re-parse the same
1059 * tuple data. Reset the StringInfo's cursor so that works.
1060 */
1061 colvalue->cursor = 0;
1062
1063 getTypeBinaryInputInfo(att->atttypid, &typreceive, &typioparam);
1064 slot->tts_values[i] =
1065 OidReceiveFunctionCall(typreceive, colvalue,
1066 typioparam, att->atttypmod);
1067
1068 /* Trouble if it didn't eat the whole buffer */
1069 if (colvalue->cursor != colvalue->len)
1070 ereport(ERROR,
1071 (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
1072 errmsg("incorrect binary data format in logical replication column %d",
1073 remoteattnum + 1)));
1074 slot->tts_isnull[i] = false;
1075 }
1076 else
1077 {
1078 /*
1079 * NULL value from remote. (We don't expect to see
1080 * LOGICALREP_COLUMN_UNCHANGED here, but if we do, treat it as
1081 * NULL.)
1082 */
1083 slot->tts_values[i] = (Datum) 0;
1084 slot->tts_isnull[i] = true;
1085 }
1086
1087 /* Reset attnum for error callback */
1089 }
1090 else
1091 {
1092 /*
1093 * We assign NULL to dropped attributes and missing values
1094 * (missing values should be later filled using
1095 * slot_fill_defaults).
1096 */
1097 slot->tts_values[i] = (Datum) 0;
1098 slot->tts_isnull[i] = true;
1099 }
1100 }
1101
1103}
1104
1105/*
1106 * Replace updated columns with data from the LogicalRepTupleData struct.
1107 * This is somewhat similar to heap_modify_tuple but also calls the type
1108 * input functions on the user data.
1109 *
1110 * "slot" is filled with a copy of the tuple in "srcslot", replacing
1111 * columns provided in "tupleData" and leaving others as-is.
1112 *
1113 * Caution: unreplaced pass-by-ref columns in "slot" will point into the
1114 * storage for "srcslot". This is OK for current usage, but someday we may
1115 * need to materialize "slot" at the end to make it independent of "srcslot".
1116 */
1117static void
1120 LogicalRepTupleData *tupleData)
1121{
1122 int natts = slot->tts_tupleDescriptor->natts;
1123 int i;
1124
1125 /* We'll fill "slot" with a virtual tuple, so we must start with ... */
1126 ExecClearTuple(slot);
1127
1128 /*
1129 * Copy all the column data from srcslot, so that we'll have valid values
1130 * for unreplaced columns.
1131 */
1132 Assert(natts == srcslot->tts_tupleDescriptor->natts);
1133 slot_getallattrs(srcslot);
1134 memcpy(slot->tts_values, srcslot->tts_values, natts * sizeof(Datum));
1135 memcpy(slot->tts_isnull, srcslot->tts_isnull, natts * sizeof(bool));
1136
1137 /* Call the "in" function for each replaced attribute */
1138 Assert(natts == rel->attrmap->maplen);
1139 for (i = 0; i < natts; i++)
1140 {
1142 int remoteattnum = rel->attrmap->attnums[i];
1143
1144 if (remoteattnum < 0)
1145 continue;
1146
1147 Assert(remoteattnum < tupleData->ncols);
1148
1149 if (tupleData->colstatus[remoteattnum] != LOGICALREP_COLUMN_UNCHANGED)
1150 {
1151 StringInfo colvalue = &tupleData->colvalues[remoteattnum];
1152
1153 /* Set attnum for error callback */
1155
1156 if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_TEXT)
1157 {
1158 Oid typinput;
1159 Oid typioparam;
1160
1161 getTypeInputInfo(att->atttypid, &typinput, &typioparam);
1162 slot->tts_values[i] =
1163 OidInputFunctionCall(typinput, colvalue->data,
1164 typioparam, att->atttypmod);
1165 slot->tts_isnull[i] = false;
1166 }
1167 else if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_BINARY)
1168 {
1169 Oid typreceive;
1170 Oid typioparam;
1171
1172 /*
1173 * In some code paths we may be asked to re-parse the same
1174 * tuple data. Reset the StringInfo's cursor so that works.
1175 */
1176 colvalue->cursor = 0;
1177
1178 getTypeBinaryInputInfo(att->atttypid, &typreceive, &typioparam);
1179 slot->tts_values[i] =
1180 OidReceiveFunctionCall(typreceive, colvalue,
1181 typioparam, att->atttypmod);
1182
1183 /* Trouble if it didn't eat the whole buffer */
1184 if (colvalue->cursor != colvalue->len)
1185 ereport(ERROR,
1186 (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
1187 errmsg("incorrect binary data format in logical replication column %d",
1188 remoteattnum + 1)));
1189 slot->tts_isnull[i] = false;
1190 }
1191 else
1192 {
1193 /* must be LOGICALREP_COLUMN_NULL */
1194 slot->tts_values[i] = (Datum) 0;
1195 slot->tts_isnull[i] = true;
1196 }
1197
1198 /* Reset attnum for error callback */
1200 }
1201 }
1202
1203 /* And finally, declare that "slot" contains a valid virtual tuple */
1205}
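/*
 * Illustrative sketch, not part of worker.c: the "copy everything, then
 * overwrite only the replaced columns" approach described above, reduced to
 * plain C with integer columns. The names col_status and modify_row are
 * hypothetical; the three states only loosely mirror the
 * LOGICALREP_COLUMN_* status values, and no type input functions are
 * called here.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-column status of the incoming (remote) tuple. */
typedef enum
{
	COL_NULL,					/* remote sent NULL */
	COL_UNCHANGED,				/* remote did not send the column */
	COL_VALUE					/* remote sent a new value */
} col_status;

/*
 * Build dst from src, replacing only the columns the remote tuple changed.
 *
 * attrmap[i] is the remote column feeding local column i, or < 0 if the
 * local column has no remote counterpart.
 */
void
modify_row(int *dst, bool *dst_isnull,
		   const int *src, const bool *src_isnull,
		   const int *attrmap, int natts,
		   const col_status *status, const int *newvals)
{
	/* Start from a full copy so unreplaced columns keep valid values. */
	memcpy(dst, src, (size_t) natts * sizeof(int));
	memcpy(dst_isnull, src_isnull, (size_t) natts * sizeof(bool));

	for (int i = 0; i < natts; i++)
	{
		int			remote = attrmap[i];

		if (remote < 0)
			continue;			/* no remote column for this attribute */

		if (status[remote] == COL_UNCHANGED)
			continue;			/* keep the copied value */

		if (status[remote] == COL_VALUE)
		{
			dst[i] = newvals[remote];
			dst_isnull[i] = false;
		}
		else
		{
			/* COL_NULL */
			dst[i] = 0;
			dst_isnull[i] = true;
		}
	}
}
```

/*
 * Unlike this sketch, the real function leaves unreplaced pass-by-reference
 * columns pointing into the source slot's storage, which is the caveat the
 * header comment of the function above warns about.
 */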
1206
1207/*
1208 * Handle BEGIN message.
1209 */
1210static void
1212{
1213 LogicalRepBeginData begin_data;
1214
1215 /* There must not be an active streaming transaction. */
1217
1218 logicalrep_read_begin(s, &begin_data);
1219 set_apply_error_context_xact(begin_data.xid, begin_data.final_lsn);
1220
1221 remote_final_lsn = begin_data.final_lsn;
1222
1224
1225 in_remote_transaction = true;
1226
1228}
1229
1230/*
1231 * Handle COMMIT message.
1232 *
1233 * TODO, support tracking of multiple origins
1234 */
1235static void
1237{
1238 LogicalRepCommitData commit_data;
1239
1240 logicalrep_read_commit(s, &commit_data);
1241
1242 if (commit_data.commit_lsn != remote_final_lsn)
1243 ereport(ERROR,
1244 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1245 errmsg_internal("incorrect commit LSN %X/%08X in commit message (expected %X/%08X)",
1246 LSN_FORMAT_ARGS(commit_data.commit_lsn),
1248
1249 apply_handle_commit_internal(&commit_data);
1250
1251 /*
1252 * Process any tables that are being synchronized in parallel, as well as
1253 * any newly added tables or sequences.
1254 */
1255 ProcessSyncingRelations(commit_data.end_lsn);
1256
1259}
1260
1261/*
1262 * Handle BEGIN PREPARE message.
1263 */
1264static void
1266{
1267 LogicalRepPreparedTxnData begin_data;
1268
1269 /* Tablesync should never receive prepare. */
1270 if (am_tablesync_worker())
1271 ereport(ERROR,
1272 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1273 errmsg_internal("tablesync worker received a BEGIN PREPARE message")));
1274
1275 /* There must not be an active streaming transaction. */
1277
1278 logicalrep_read_begin_prepare(s, &begin_data);
1279 set_apply_error_context_xact(begin_data.xid, begin_data.prepare_lsn);
1280
1281 remote_final_lsn = begin_data.prepare_lsn;
1282
1284
1285 in_remote_transaction = true;
1286
1288}
1289
1290/*
1291 * Common function to prepare the GID.
1292 */
1293static void
1295{
1296 char gid[GIDSIZE];
1297
1298 /*
1299 * Compute a unique GID for two_phase transactions. We don't use the GID of
1300 * the prepared transaction sent by the server, as that can lead to deadlock
1301 * when multiple subscriptions from the same node point to publications on
1302 * the same node. See comments atop worker.c.
1303 */
1305 gid, sizeof(gid));
1306
1307 /*
1308 * BeginTransactionBlock is necessary to balance the EndTransactionBlock
1309 * called within the PrepareTransactionBlock below.
1310 */
1311 if (!IsTransactionBlock())
1312 {
1314 CommitTransactionCommand(); /* Completes the preceding Begin command. */
1315 }
1316
1317 /*
1318 * Update origin state so we can restart streaming from correct position
1319 * in case of crash.
1320 */
1321 replorigin_session_origin_lsn = prepare_data->end_lsn;
1323
1325}
1326
1327/*
1328 * Handle PREPARE message.
1329 */
1330static void
1332{
1333 LogicalRepPreparedTxnData prepare_data;
1334
1335 logicalrep_read_prepare(s, &prepare_data);
1336
1337 if (prepare_data.prepare_lsn != remote_final_lsn)
1338 ereport(ERROR,
1339 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1340 errmsg_internal("incorrect prepare LSN %X/%08X in prepare message (expected %X/%08X)",
1341 LSN_FORMAT_ARGS(prepare_data.prepare_lsn),
1343
1344 /*
1345 * Unlike commit, here, we always prepare the transaction even though no
1346 * change has happened in this transaction or all changes are skipped. It
1347 * is done this way because at commit prepared time, we won't know whether
1348 * we have skipped preparing a transaction because of those reasons.
1349 *
1350 * XXX, We can optimize such that at commit prepared time, we first check
1351 * whether we have prepared the transaction or not but that doesn't seem
1352 * worthwhile because such cases shouldn't be common.
1353 */
1355
1356 apply_handle_prepare_internal(&prepare_data);
1357
1360 pgstat_report_stat(false);
1361
1362 /*
1363 * It is okay not to set the local_end LSN for the prepare because we
1364 * always flush the prepare record. So, we can send the acknowledgment of
1365 * the remote_end LSN as soon as prepare is finished.
1366 *
1367 * XXX For the sake of consistency with commit, we could have set it with
1368 * the LSN of prepare but as of now we don't track that value similar to
1369 * XactLastCommitEnd, and adding it for this purpose doesn't seem worth
1370 * it.
1371 */
1373
1374 in_remote_transaction = false;
1375
1376 /*
1377 * Process any tables that are being synchronized in parallel, as well as
1378 * any newly added tables or sequences.
1379 */
1380 ProcessSyncingRelations(prepare_data.end_lsn);
1381
1382 /*
1383 * Since we have already prepared the transaction, in a case where the
1384 * server crashes before clearing the subskiplsn, it will be left but the
1385 * transaction won't be resent. But that's okay because it's a rare case
1386 * and the subskiplsn will be cleared when finishing the next transaction.
1387 */
1390
1393}
1394
1395/*
1396 * Handle a COMMIT PREPARED of a previously PREPARED transaction.
1397 *
1398 * Note that we don't need to wait here if the transaction was prepared in a
1399 * parallel apply worker. In that case, we have already waited for the prepare
1400 * to finish in apply_handle_stream_prepare() which will ensure all the
1401 * operations in that transaction have happened in the subscriber, so no
1402 * concurrent transaction can cause deadlock or transaction dependency issues.
1403 */
1404static void
1406{
1408 char gid[GIDSIZE];
1409
1410 logicalrep_read_commit_prepared(s, &prepare_data);
1411 set_apply_error_context_xact(prepare_data.xid, prepare_data.commit_lsn);
1412
1413 /* Compute GID for two_phase transactions. */
1415 gid, sizeof(gid));
1416
1417 /* There is no transaction when COMMIT PREPARED is called */
1419
1420 /*
1421 * Update origin state so we can restart streaming from correct position
1422 * in case of crash.
1423 */
1426
1427 FinishPreparedTransaction(gid, true);
1430 pgstat_report_stat(false);
1431
1433 in_remote_transaction = false;
1434
1435 /*
1436 * Process any tables that are being synchronized in parallel, as well as
1437 * any newly added tables or sequences.
1438 */
1439 ProcessSyncingRelations(prepare_data.end_lsn);
1440
1442
1445}
1446
1447/*
1448 * Handle a ROLLBACK PREPARED of a previously PREPARED TRANSACTION.
1449 *
1450 * Note that we don't need to wait here if the transaction was prepared in a
1451 * parallel apply worker. In that case, we have already waited for the prepare
1452 * to finish in apply_handle_stream_prepare() which will ensure all the
1453 * operations in that transaction have happened in the subscriber, so no
1454 * concurrent transaction can cause deadlock or transaction dependency issues.
1455 */
1456static void
1458{
1460 char gid[GIDSIZE];
1461
1462 logicalrep_read_rollback_prepared(s, &rollback_data);
1463 set_apply_error_context_xact(rollback_data.xid, rollback_data.rollback_end_lsn);
1464
1465 /* Compute GID for two_phase transactions. */
1467 gid, sizeof(gid));
1468
1469 /*
1470 * It is possible that we haven't received prepare because it occurred
1471 * before walsender reached a consistent point or the two_phase was still
1472 * not enabled by that time, so in such cases, we need to skip rollback
1473 * prepared.
1474 */
1475 if (LookupGXact(gid, rollback_data.prepare_end_lsn,
1476 rollback_data.prepare_time))
1477 {
1478 /*
1479 * Update origin state so we can restart streaming from correct
1480 * position in case of crash.
1481 */
1484
1485 /* There is no transaction when ABORT/ROLLBACK PREPARED is called */
1487 FinishPreparedTransaction(gid, false);
1490
1492 }
1493
1494 pgstat_report_stat(false);
1495
1496 /*
1497 * It is okay not to set the local_end LSN for the rollback of prepared
1498 * transaction because we always flush the WAL record for it. See
1499 * apply_handle_prepare.
1500 */
1502 in_remote_transaction = false;
1503
1504 /*
1505 * Process any tables that are being synchronized in parallel, as well as
1506 * any newly added tables or sequences.
1507 */
1509
1512}
1513
1514/*
1515 * Handle STREAM PREPARE.
1516 */
1517static void
1519{
1520 LogicalRepPreparedTxnData prepare_data;
1522 TransApplyAction apply_action;
1523
1524 /* Save the message before it is consumed. */
1525 StringInfoData original_msg = *s;
1526
1528 ereport(ERROR,
1529 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1530 errmsg_internal("STREAM PREPARE message without STREAM STOP")));
1531
1532 /* Tablesync should never receive prepare. */
1533 if (am_tablesync_worker())
1534 ereport(ERROR,
1535 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1536 errmsg_internal("tablesync worker received a STREAM PREPARE message")));
1537
1538 logicalrep_read_stream_prepare(s, &prepare_data);
1539 set_apply_error_context_xact(prepare_data.xid, prepare_data.prepare_lsn);
1540
1541 apply_action = get_transaction_apply_action(prepare_data.xid, &winfo);
1542
1543 switch (apply_action)
1544 {
1545 case TRANS_LEADER_APPLY:
1546
1547 /*
1548 * The transaction has been serialized to file, so replay all the
1549 * spooled operations.
1550 */
1552 prepare_data.xid, prepare_data.prepare_lsn);
1553
1554 /* Mark the transaction as prepared. */
1555 apply_handle_prepare_internal(&prepare_data);
1556
1558
1559 /*
1560 * It is okay not to set the local_end LSN for the prepare because
1561 * we always flush the prepare record. See apply_handle_prepare.
1562 */
1564
1565 in_remote_transaction = false;
1566
1567 /* Unlink the files with serialized changes and subxact info. */
1569
1570 elog(DEBUG1, "finished processing the STREAM PREPARE command");
1571 break;
1572
1574 Assert(winfo);
1575
1576 if (pa_send_data(winfo, s->len, s->data))
1577 {
1578 /* Finish processing the streaming transaction. */
1579 pa_xact_finish(winfo, prepare_data.end_lsn);
1580 break;
1581 }
1582
1583 /*
1584 * Switch to serialize mode when we are not able to send the
1585 * change to parallel apply worker.
1586 */
1587 pa_switch_to_partial_serialize(winfo, true);
1588
1589 /* fall through */
1591 Assert(winfo);
1592
1593 stream_open_and_write_change(prepare_data.xid,
1595 &original_msg);
1596
1598
1599 /* Finish processing the streaming transaction. */
1600 pa_xact_finish(winfo, prepare_data.end_lsn);
1601 break;
1602
1604
1605 /*
1606 * If the parallel apply worker is applying spooled messages then
1607 * close the file before preparing.
1608 */
1609 if (stream_fd)
1611
1613
1614 /* Mark the transaction as prepared. */
1615 apply_handle_prepare_internal(&prepare_data);
1616
1618
1620
1621 /*
1622 * It is okay not to set the local_end LSN for the prepare because
1623 * we always flush the prepare record. See apply_handle_prepare.
1624 */
1626
1629
1631
1632 elog(DEBUG1, "finished processing the STREAM PREPARE command");
1633 break;
1634
1635 default:
1636 elog(ERROR, "unexpected apply action: %d", (int) apply_action);
1637 break;
1638 }
1639
1640 pgstat_report_stat(false);
1641
1642 /*
1643 * Process any tables that are being synchronized in parallel, as well as
1644 * any newly added tables or sequences.
1645 */
1646 ProcessSyncingRelations(prepare_data.end_lsn);
1647
1648 /*
1649 * Similar to prepare case, the subskiplsn could be left in a case of
1650 * server crash but it's okay. See the comments in apply_handle_prepare().
1651 */
1654
1656
1658}
1659
1660/*
1661 * Handle ORIGIN message.
1662 *
1663 * TODO, support tracking of multiple origins
1664 */
1665static void
1667{
1668 /*
1669 * ORIGIN message can only come inside streaming transaction or inside
1670 * remote transaction and before any actual writes.
1671 */
1675 ereport(ERROR,
1676 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1677 errmsg_internal("ORIGIN message sent out of order")));
1678}
1679
1680/*
1681 * Initialize fileset (if not already done).
1682 *
1683 * Create a new file when first_segment is true, otherwise open the existing
1684 * file.
1685 */
1686void
1687stream_start_internal(TransactionId xid, bool first_segment)
1688{
1690
1691 /*
1692 * Initialize the worker's stream_fileset if we haven't yet. This will be
1693 * used for the entire duration of the worker so create it in a permanent
1694 * context. We create this on the very first streaming message from any
1695 * transaction and then use it for this and other streaming transactions.
1696 * Now, we could create a fileset at the start of the worker as well but
1697 * then we won't be sure that it will ever be used.
1698 */
1700 {
1701 MemoryContext oldctx;
1702
1704
1707
1708 MemoryContextSwitchTo(oldctx);
1709 }
1710
1711 /* Open the spool file for this transaction. */
1712 stream_open_file(MyLogicalRepWorker->subid, xid, first_segment);
1713
1714 /* If this is not the first segment, open existing subxact file. */
1715 if (!first_segment)
1717
1719}
1720
1721/*
1722 * Handle STREAM START message.
1723 */
1724static void
1726{
1727 bool first_segment;
1729 TransApplyAction apply_action;
1730
1731 /* Save the message before it is consumed. */
1732 StringInfoData original_msg = *s;
1733
1735 ereport(ERROR,
1736 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1737 errmsg_internal("duplicate STREAM START message")));
1738
1739 /* There must not be an active streaming transaction. */
1741
1742 /* notify handle methods we're processing a remote transaction */
1744
1745 /* extract XID of the top-level transaction */
1746 stream_xid = logicalrep_read_stream_start(s, &first_segment);
1747
1749 ereport(ERROR,
1750 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1751 errmsg_internal("invalid transaction ID in streamed replication transaction")));
1752
1754
1755 /* Try to allocate a worker for the streaming transaction. */
1756 if (first_segment)
1758
1759 apply_action = get_transaction_apply_action(stream_xid, &winfo);
1760
1761 switch (apply_action)
1762 {
1764
1765 /*
1766 * Function stream_start_internal starts a transaction. This
1767 * transaction will be committed on the stream stop unless it is a
1768 * tablesync worker in which case it will be committed after
1769 * processing all the messages. We need this transaction for
1770 * handling the BufFile, used for serializing the streaming data
1771 * and subxact info.
1772 */
1773 stream_start_internal(stream_xid, first_segment);
1774 break;
1775
1777 Assert(winfo);
1778
1779 /*
1780 * Once we start serializing the changes, the parallel apply
1781 * worker will wait for the leader to release the stream lock
1782 * until the end of the transaction. So, we don't need to release
1783 * the lock or increment the stream count in that case.
1784 */
1785 if (pa_send_data(winfo, s->len, s->data))
1786 {
1787 /*
1788 * Unlock the shared object lock so that the parallel apply
1789 * worker can continue to receive changes.
1790 */
1791 if (!first_segment)
1793
1794 /*
1795 * Increment the number of streaming blocks waiting to be
1796 * processed by parallel apply worker.
1797 */
1799
1800 /* Cache the parallel apply worker for this transaction. */
1802 break;
1803 }
1804
1805 /*
1806 * Switch to serialize mode when we are not able to send the
1807 * change to parallel apply worker.
1808 */
1809 pa_switch_to_partial_serialize(winfo, !first_segment);
1810
1811 /* fall through */
1813 Assert(winfo);
1814
1815 /*
1816 * Open the spool file unless it was already opened when switching
1817 * to serialize mode. The transaction started in
1818 * stream_start_internal will be committed on the stream stop.
1819 */
1820 if (apply_action != TRANS_LEADER_SEND_TO_PARALLEL)
1821 stream_start_internal(stream_xid, first_segment);
1822
1824
1825 /* Cache the parallel apply worker for this transaction. */
1827 break;
1828
1830 if (first_segment)
1831 {
1832 /* Hold the lock until the end of the transaction. */
1835
1836 /*
1837 * Signal the leader apply worker, as it may be waiting for
1838 * us.
1839 */
1842 }
1843
1845 break;
1846
1847 default:
1848 elog(ERROR, "unexpected apply action: %d", (int) apply_action);
1849 break;
1850 }
1851
1853}
1854
1855/*
1856 * Update the information about subxacts and close the file.
1857 *
1858 * This function should be called when the stream_start_internal function has
1859 * been called.
1860 */
1861void
1863{
1864 /*
1865 * Serialize information about subxacts for the toplevel transaction, then
1866 * close the stream messages spool file.
1867 */
1870
1871 /* We must be in a valid transaction state */
1873
1874 /* Commit the per-stream transaction */
1876
1877 /* Reset per-stream context */
1879}
1880
1881/*
1882 * Handle STREAM STOP message.
1883 */
1884static void
1886{
1888 TransApplyAction apply_action;
1889
1891 ereport(ERROR,
1892 (errcode(ERRCODE_PROTOCOL_VIOLATION),
1893 errmsg_internal("STREAM STOP message without STREAM START")));
1894
1895 apply_action = get_transaction_apply_action(stream_xid, &winfo);
1896
1897 switch (apply_action)
1898 {
1901 break;
1902
1904 Assert(winfo);
1905
1906 /*
1907 * Lock before sending the STREAM_STOP message so that the leader
1908 * can hold the lock first and the parallel apply worker will wait
1909 * for leader to release the lock. See Locking Considerations atop
1910 * applyparallelworker.c.
1911 */
1913
1914 if (pa_send_data(winfo, s->len, s->data))
1915 {
1917 break;
1918 }
1919
1920 /*
1921 * Switch to serialize mode when we are not able to send the
1922 * change to parallel apply worker.
1923 */
1924 pa_switch_to_partial_serialize(winfo, true);
1925
1926 /* fall through */
1931 break;
1932
1934 elog(DEBUG1, "applied %u changes in the streaming chunk",
1936
1937 /*
1938 * By the time parallel apply worker is processing the changes in
1939 * the current streaming block, the leader apply worker may have
1940 * sent multiple streaming blocks. This can cause the parallel apply
1941 * worker to start waiting even when more streaming chunks are still
1942 * in the queue. So, try to lock only if there is no message left
1943 * in the queue. See Locking Considerations atop
1944 * applyparallelworker.c.
1945 *
1946 * Note that here we have a race condition where we can start
1947 * waiting even when there are pending streaming chunks. This can
1948 * happen if the leader sends another streaming block and acquires
1949 * the stream lock again after the parallel apply worker checks
1950 * that there is no pending streaming block and before it actually
1951 * starts waiting on a lock. We can handle this case by not
1952 * allowing the leader to increment the stream block count during
1953 * the time parallel apply worker acquires the lock but it is not
1954 * clear whether that is worth the complexity.
1955 *
1956 * Now, if this missed chunk contains rollback to savepoint, then
1957 * there is a risk of deadlock which probably shouldn't happen
1958 * after restart.
1959 */
1961 break;
1962
1963 default:
1964 elog(ERROR, "unexpected apply action: %d", (int) apply_action);
1965 break;
1966 }
1967
1970
1971 /*
1972 * The parallel apply worker could be in a transaction in which case we
1973 * need to report the state as STATE_IDLEINTRANSACTION.
1974 */
1977 else
1979
1981}
1982
1983/*
1984 * Helper function to handle STREAM ABORT message when the transaction was
1985 * serialized to file.
1986 */
1987static void
1989{
1990 /*
1991 * If the two XIDs are the same, it's in fact an abort of the toplevel xact, so
1992 * just delete the files with serialized info.
1993 */
1994 if (xid == subxid)
1996 else
1997 {
1998 /*
1999 * OK, so it's a subxact. We need to read the subxact file for the
2000 * toplevel transaction, determine the offset tracked for the subxact,
2001 * and truncate the file with changes. We also remove the subxacts
2002 * with higher offsets (or rather higher XIDs).
2003 *
2004 * We intentionally scan the array from the tail, because we're likely
2005 * aborting a change for one of the most recent subtransactions.
2006 *
2007 * We can't use binary search here because subxact XIDs won't
2008 * necessarily arrive in sorted order. Consider the case where we have
2009 * released the savepoints for multiple subtransactions and then
2010 * performed a rollback to savepoint for one of the earlier
2011 * subtransactions.
2012 */
2013 int64 i;
2014 int64 subidx;
2015 BufFile *fd;
2016 bool found = false;
2017 char path[MAXPGPATH];
2018
2019 subidx = -1;
2022
2023 for (i = subxact_data.nsubxacts; i > 0; i--)
2024 {
2025 if (subxact_data.subxacts[i - 1].xid == subxid)
2026 {
2027 subidx = (i - 1);
2028 found = true;
2029 break;
2030 }
2031 }
2032
2033 /*
2034 * If it's an empty sub-transaction then we will not find the subxid
2035 * here so just cleanup the subxact info and return.
2036 */
2037 if (!found)
2038 {
2039 /* Cleanup the subxact info */
2043 return;
2044 }
2045
2046 /* open the changes file */
2049 O_RDWR, false);
2050
2051 /* OK, truncate the file at the right offset */
2053 subxact_data.subxacts[subidx].offset);
2055
2056 /* discard the subxacts added later */
2057 subxact_data.nsubxacts = subidx;
2058
2059 /* write the updated subxact list */
2061
2064 }
2065}
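/*
 * Illustrative sketch, not part of worker.c: the tail-first linear scan used
 * above to find an aborted subtransaction and discard everything recorded
 * after it. The names subxact_entry and abort_subxact are hypothetical; the
 * real code additionally truncates the BufFile at the returned offset and
 * rewrites the subxact file.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record: where a subtransaction's first change was spooled. */
typedef struct
{
	uint32_t	xid;
	int64_t		offset;
} subxact_entry;

/*
 * Find subxid scanning from the tail, since the aborted subtransaction is
 * likely one of the most recent. XIDs are not assumed to be sorted, so
 * binary search is not an option.
 *
 * On success, shrink *nsubxacts so that the aborted entry and all later
 * ones are dropped, and return the offset at which the changes file should
 * be truncated. Return -1 when the subxid is unknown (an empty
 * subtransaction), leaving *nsubxacts untouched.
 */
int64_t
abort_subxact(const subxact_entry *subxacts, int *nsubxacts, uint32_t subxid)
{
	for (int i = *nsubxacts; i > 0; i--)
	{
		if (subxacts[i - 1].xid == subxid)
		{
			*nsubxacts = i - 1;
			return subxacts[i - 1].offset;
		}
	}

	return -1;
}
```

/*
 * For entries {101, 0}, {105, 40}, {103, 90}, aborting 105 yields offset 40
 * and leaves a single entry, even though 103 was recorded later with a
 * lower XID.
 */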
2066
2067/*
2068 * Handle STREAM ABORT message.
2069 */
2070static void
2072{
2073 TransactionId xid;
2074 TransactionId subxid;
2075 LogicalRepStreamAbortData abort_data;
2077 TransApplyAction apply_action;
2078
2079 /* Save the message before it is consumed. */
2080 StringInfoData original_msg = *s;
2081 bool toplevel_xact;
2082
2084 ereport(ERROR,
2085 (errcode(ERRCODE_PROTOCOL_VIOLATION),
2086 errmsg_internal("STREAM ABORT message without STREAM STOP")));
2087
2088 /* We receive abort information only when we can apply in parallel. */
2089 logicalrep_read_stream_abort(s, &abort_data,
2091
2092 xid = abort_data.xid;
2093 subxid = abort_data.subxid;
2094 toplevel_xact = (xid == subxid);
2095
2096 set_apply_error_context_xact(subxid, abort_data.abort_lsn);
2097
2098 apply_action = get_transaction_apply_action(xid, &winfo);
2099
2100 switch (apply_action)
2101 {
2102 case TRANS_LEADER_APPLY:
2103
2104 /*
2105 * We are in the leader apply worker and the transaction has been
2106 * serialized to file.
2107 */
2108 stream_abort_internal(xid, subxid);
2109
2110 elog(DEBUG1, "finished processing the STREAM ABORT command");
2111 break;
2112
2114 Assert(winfo);
2115
2116 /*
2117 * For the case of aborting the subtransaction, we increment the
2118 * number of streaming blocks and take the lock again before
2119 * sending the STREAM_ABORT to ensure that the parallel apply
2120 * worker will wait on the lock for the next set of changes after
2121 * processing the STREAM_ABORT message if it is not already
2122 * waiting for STREAM_STOP message.
2123 *
2124 * It is important to perform this locking before sending the
2125 * STREAM_ABORT message so that the leader can hold the lock first
2126 * and the parallel apply worker will wait for the leader to
2127 * release the lock. This is the same as what we do in
2128 * apply_handle_stream_stop. See Locking Considerations atop
2129 * applyparallelworker.c.
2130 */
2131 if (!toplevel_xact)
2132 {
2136 }
2137
2138 if (pa_send_data(winfo, s->len, s->data))
2139 {
2140 /*
2141 * Unlike STREAM_COMMIT and STREAM_PREPARE, we don't need to
2142 * wait here for the parallel apply worker to finish as that
2143 * is not required to maintain the commit order and won't have
2144 * the risk of failures due to transaction dependencies and
2145 * deadlocks. However, it is possible that before the parallel
2146 * worker finishes and we clear the worker info, the xid
2147 * wraparound happens on the upstream and a new transaction
2148 * with the same xid can appear and that can lead to duplicate
2149 * entries in ParallelApplyTxnHash. Yet another problem could
2150 * be that we may have serialized the changes in partial
2151 * serialize mode and the file containing xact changes may
2152 * already exist, and after xid wraparound trying to create
2153 * the file for the same xid can lead to an error. To avoid
2154 * these problems, we decide to wait for the aborts to finish.
2155 *
2156 * Note, it is okay to not update the flush location position
2157 * for aborts as in the worst case that means such a transaction
2158 * won't be sent again after restart.
2159 */
2160 if (toplevel_xact)
2162
2163 break;
2164 }
2165
2166 /*
2167 * Switch to serialize mode when we are not able to send the
2168 * change to parallel apply worker.
2169 */
2170 pa_switch_to_partial_serialize(winfo, true);
2171
2172 /* fall through */
2174 Assert(winfo);
2175
2176 /*
2177 * Parallel apply worker might have applied some changes, so write
2178 * the STREAM_ABORT message so that it can rollback the
2179 * subtransaction if needed.
2180 */
2182 &original_msg);
2183
2184 if (toplevel_xact)
2185 {
2188 }
2189 break;
2190
2192
2193 /*
2194 * If the parallel apply worker is applying spooled messages then
2195 * close the file before aborting.
2196 */
2197 if (toplevel_xact && stream_fd)
2199
2200 pa_stream_abort(&abort_data);
2201
2202 /*
2203 * We need to wait after processing rollback to savepoint for the
2204 * next set of changes.
2205 *
2206 * We have a race condition here due to which we can start waiting
2207 * here when there are more chunks of streams in the queue. See
2208 * apply_handle_stream_stop.
2209 */
2210 if (!toplevel_xact)
2212
2213 elog(DEBUG1, "finished processing the STREAM ABORT command");
2214 break;
2215
2216 default:
2217 elog(ERROR, "unexpected apply action: %d", (int) apply_action);
2218 break;
2219 }
2220
2222}
2223
2224/*
2225 * Ensure that the passed location is the fileset's end.
2226 */
2227static void
2228ensure_last_message(FileSet *stream_fileset, TransactionId xid, int fileno,
2229 off_t offset)
2230{
2231 char path[MAXPGPATH];
2232 BufFile *fd;
2233 int last_fileno;
2234 off_t last_offset;
2235
2237
2239
2241
2242 fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
2243
2244 BufFileSeek(fd, 0, 0, SEEK_END);
2245 BufFileTell(fd, &last_fileno, &last_offset);
2246
2248
2250
2251 if (last_fileno != fileno || last_offset != offset)
2252 elog(ERROR, "unexpected message left in streaming transaction's changes file \"%s\"",
2253 path);
2254}
2255
2256/*
2257 * Common spoolfile processing.
2258 */
2259void
2261 XLogRecPtr lsn)
2262{
2263 int nchanges;
2264 char path[MAXPGPATH];
2265 char *buffer = NULL;
2266 MemoryContext oldcxt;
2267 ResourceOwner oldowner;
2268 int fileno;
2269 off_t offset;
2270
2273
2274 /* Make sure we have an open transaction */
2276
2277 /*
2278 * Allocate file handle and memory required to process all the messages in
2279 * TopTransactionContext to avoid them getting reset after each message is
2280 * processed.
2281 */
2283
2284 /* Open the spool file for the committed/prepared transaction */
2286 elog(DEBUG1, "replaying changes from file \"%s\"", path);
2287
2288 /*
2289 * Make sure the file is owned by the toplevel transaction so that the
2290 * file will not be accidentally closed when aborting a subtransaction.
2291 */
2292 oldowner = CurrentResourceOwner;
2294
2295 stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
2296
2297 CurrentResourceOwner = oldowner;
2298
2299 buffer = palloc(BLCKSZ);
2300
2301 MemoryContextSwitchTo(oldcxt);
2302
2303 remote_final_lsn = lsn;
2304
2305 /*
2306 * Make sure the apply_dispatch handler methods are aware we're in a
2307 * remote transaction.
2308 */
2309 in_remote_transaction = true;
2311
2313
2314 /*
2315 * Read the entries one by one and pass them through the same logic as in
2316 * apply_dispatch.
2317 */
2318 nchanges = 0;
2319 while (true)
2320 {
2322 size_t nbytes;
2323 int len;
2324
2326
2327 /* read length of the on-disk record */
2328 nbytes = BufFileReadMaybeEOF(stream_fd, &len, sizeof(len), true);
2329
2330 /* have we reached end of the file? */
2331 if (nbytes == 0)
2332 break;
2333
2334 /* do we have a correct length? */
2335 if (len <= 0)
2336 elog(ERROR, "incorrect length %d in streaming transaction's changes file \"%s\"",
2337 len, path);
2338
2339 /* make sure we have a sufficiently large buffer */
2340 buffer = repalloc(buffer, len);
2341
2342 /* and finally read the data into the buffer */
2343 BufFileReadExact(stream_fd, buffer, len);
2344
2345 BufFileTell(stream_fd, &fileno, &offset);
2346
2347 /* init a stringinfo using the buffer and call apply_dispatch */
2348 initReadOnlyStringInfo(&s2, buffer, len);
2349
2350 /* Ensure we are reading the data into our memory context. */
2352
2354
2356
2357 MemoryContextSwitchTo(oldcxt);
2358
2359 nchanges++;
2360
2361 /*
2362 * It is possible the file has been closed because we have processed
2363 * the transaction end message like stream_commit in which case that
2364 * must be the last message.
2365 */
2366 if (!stream_fd)
2367 {
2368 ensure_last_message(stream_fileset, xid, fileno, offset);
2369 break;
2370 }
2371
2372 if (nchanges % 1000 == 0)
2373 elog(DEBUG1, "replayed %d changes from file \"%s\"",
2374 nchanges, path);
2375 }
2376
2377 if (stream_fd)
2379
2380 elog(DEBUG1, "replayed %d (all) changes from file \"%s\"",
2381 nchanges, path);
2382
2383 return;
2384}
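The replay loop above reads length-prefixed records until end of file. A minimal sketch of that on-disk framing follows, using plain stdio in place of the BufFile API; `sketch_write_record` and `sketch_replay_records` are hypothetical helpers that are not part of worker.c, and the sketch only counts records and bytes where the real loop dispatches each message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Append one record: an int length followed by that many payload bytes. */
int
sketch_write_record(FILE *fp, const char *data, int len)
{
	assert(fp != NULL && data != NULL && len > 0);

	if (fwrite(&len, sizeof(len), 1, fp) != 1)
		return -1;
	if (fwrite(data, 1, (size_t) len, fp) != (size_t) len)
		return -1;
	return 0;
}

/*
 * Read records from the current position to end of file.  Returns the
 * number of records read and adds their payload sizes to *total_bytes,
 * or returns -1 on a bad length, a short read, or allocation failure.
 */
int
sketch_replay_records(FILE *fp, long *total_bytes)
{
	char	   *buffer = NULL;
	int			nchanges = 0;

	assert(fp != NULL && total_bytes != NULL);

	for (;;)
	{
		int			len;
		char	   *tmp;
		size_t		nread = fread(&len, 1, sizeof(len), fp);

		/* Zero bytes without an error means a clean end of file. */
		if (nread == 0)
		{
			if (ferror(fp))
				goto fail;
			break;
		}

		/* A partial header or a non-positive length is corruption. */
		if (nread != sizeof(len) || len <= 0)
			goto fail;

		/* Resize the buffer to hold this record. */
		tmp = realloc(buffer, (size_t) len);
		if (tmp == NULL)
			goto fail;
		buffer = tmp;

		if (fread(buffer, 1, (size_t) len, fp) != (size_t) len)
			goto fail;

		*total_bytes += len;
		nchanges++;
	}

	free(buffer);
	return nchanges;

fail:
	free(buffer);
	return -1;
}
```

Distinguishing "zero bytes read" from "partial header read" lets the reader treat the first as normal termination and the second as an error, which is the same distinction the loop above draws before validating the length.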
2385
2386/*
2387 * Handle STREAM COMMIT message.
2388 */
2389static void
2391{
2392 TransactionId xid;
2393 LogicalRepCommitData commit_data;
2395 TransApplyAction apply_action;
2396
2397 /* Save the message before it is consumed. */
2398 StringInfoData original_msg = *s;
2399
2401 ereport(ERROR,
2402 (errcode(ERRCODE_PROTOCOL_VIOLATION),
2403 errmsg_internal("STREAM COMMIT message without STREAM STOP")));
2404
2405 xid = logicalrep_read_stream_commit(s, &commit_data);
2406 set_apply_error_context_xact(xid, commit_data.commit_lsn);
2407
2408 apply_action = get_transaction_apply_action(xid, &winfo);
2409
2410 switch (apply_action)
2411 {
2412 case TRANS_LEADER_APPLY:
2413
2414 /*
2415 * The transaction has been serialized to file, so replay all the
2416 * spooled operations.
2417 */
2419 commit_data.commit_lsn);
2420
2421 apply_handle_commit_internal(&commit_data);
2422
2423 /* Unlink the files with serialized changes and subxact info. */
2425
2426 elog(DEBUG1, "finished processing the STREAM COMMIT command");
2427 break;
2428
2430 Assert(winfo);
2431
2432 if (pa_send_data(winfo, s->len, s->data))
2433 {
2434 /* Finish processing the streaming transaction. */
2435 pa_xact_finish(winfo, commit_data.end_lsn);
2436 break;
2437 }
2438
2439 /*
2440 * Switch to serialize mode when we are not able to send the
2441 * change to parallel apply worker.
2442 */
2443 pa_switch_to_partial_serialize(winfo, true);
2444
2445 /* fall through */
2447 Assert(winfo);
2448
2450 &original_msg);
2451
2453
2454 /* Finish processing the streaming transaction. */
2455 pa_xact_finish(winfo, commit_data.end_lsn);
2456 break;
2457
2459
2460 /*
2461 * If the parallel apply worker is applying spooled messages then
2462 * close the file before committing.
2463 */
2464 if (stream_fd)
2466
2467 apply_handle_commit_internal(&commit_data);
2468
2470
2471 /*
2472 * It is important to set the transaction state as finished before
2473 * releasing the lock. See pa_wait_for_xact_finish.
2474 */
2477
2479
2480 elog(DEBUG1, "finished processing the STREAM COMMIT command");
2481 break;
2482
2483 default:
2484 elog(ERROR, "unexpected apply action: %d", (int) apply_action);
2485 break;
2486 }
2487
2488 /*
2489 * Process any tables that are being synchronized in parallel, as well as
2490 * any newly added tables or sequences.
2491 */
2492 ProcessSyncingRelations(commit_data.end_lsn);
2493
2495
2497}
2498
2499/*
2500 * Helper function for apply_handle_commit and apply_handle_stream_commit.
2501 */
2502static void
2504{
2505 if (is_skipping_changes())
2506 {
2508
2509 /*
2510 * Start a new transaction to clear the subskiplsn, if not started
2511 * yet.
2512 */
2513 if (!IsTransactionState())
2515 }
2516
2517 if (IsTransactionState())
2518 {
2519 /*
2520 * The transaction is either non-empty or skipped, so we clear the
2521 * subskiplsn.
2522 */
2524
2525 /*
2526 * Update origin state so we can restart streaming from the correct
2527 * position in case of a crash.
2528 */
2531
2533
2534 if (IsTransactionBlock())
2535 {
2536 EndTransactionBlock(false);
2538 }
2539
2540 pgstat_report_stat(false);
2541
2543 }
2544 else
2545 {
2546 /* Process any invalidation messages that might have accumulated. */
2549 }
2550
2551 in_remote_transaction = false;
2552}
2553
2554/*
2555 * Handle RELATION message.
2556 *
2557 * Note we don't do validation against local schema here. The validation
2558 * against local schema is postponed until first change for given relation
2559 * comes as we only care about it when applying changes for it anyway and we
2560 * do less locking this way.
2561 */
2562static void
2564{
2565 LogicalRepRelation *rel;
2566
2568 return;
2569
2570 rel = logicalrep_read_rel(s);
2572
2573 /* Also reset all entries in the partition map that refer to remoterel. */
2575}
2576
2577/*
2578 * Handle TYPE message.
2579 *
2580 * This implementation pays no attention to TYPE messages; we expect the user
2581 * to have set things up so that the incoming data is acceptable to the input
2582 * functions for the locally subscribed tables. Hence, we just read and
2583 * discard the message.
2584 */
2585static void
2587{
2588 LogicalRepTyp typ;
2589
2591 return;
2592
2593 logicalrep_read_typ(s, &typ);
2594}
2595
2596/*
2597 * Check that we (the subscription owner) have sufficient privileges on the
2598 * target relation to perform the given operation.
2599 */
2600static void
2602{
2603 Oid relid;
2604 AclResult aclresult;
2605
2606 relid = RelationGetRelid(rel);
2607 aclresult = pg_class_aclcheck(relid, GetUserId(), mode);
2608 if (aclresult != ACLCHECK_OK)
2609 aclcheck_error(aclresult,
2610 get_relkind_objtype(rel->rd_rel->relkind),
2611 get_rel_name(relid));
2612
2613 /*
2614 * We lack the infrastructure to honor RLS policies. It might be possible
2615 * to add such infrastructure here, but tablesync workers lack it, too, so
2616 * we don't bother. RLS does not ordinarily apply to TRUNCATE commands,
2617 * but it seems dangerous to replicate a TRUNCATE and then refuse to
2618 * replicate subsequent INSERTs, so we forbid all commands the same.
2619 */
2620 if (check_enable_rls(relid, InvalidOid, false) == RLS_ENABLED)
2621 ereport(ERROR,
2622 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
2623 errmsg("user \"%s\" cannot replicate into relation with row-level security enabled: \"%s\"",
2626}
2627
2628/*
2629 * Handle INSERT message.
2630 */
2631
2632static void
2634{
2636 LogicalRepTupleData newtup;
2637 LogicalRepRelId relid;
2638 UserContext ucxt;
2639 ApplyExecutionData *edata;
2640 EState *estate;
2641 TupleTableSlot *remoteslot;
2642 MemoryContext oldctx;
2643 bool run_as_owner;
2644
2645 /*
2646 * Quick return if we are skipping data modification changes or handling
2647 * streamed transactions.
2648 */
2649 if (is_skipping_changes() ||
2651 return;
2652
2654
2655 relid = logicalrep_read_insert(s, &newtup);
2658 {
2659 /*
2660 * The relation can't become interesting in the middle of the
2661 * transaction so it's safe to unlock it.
2662 */
2665 return;
2666 }
2667
2668 /*
2669 * Make sure that any user-supplied code runs as the table owner, unless
2670 * the user has opted out of that behavior.
2671 */
2672 run_as_owner = MySubscription->runasowner;
2673 if (!run_as_owner)
2674 SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
2675
2676 /* Set relation for error callback */
2678
2679 /* Initialize the executor state. */
2680 edata = create_edata_for_relation(rel);
2681 estate = edata->estate;
2682 remoteslot = ExecInitExtraTupleSlot(estate,
2684 &TTSOpsVirtual);
2685
2686 /* Process and store remote tuple in the slot */
2688 slot_store_data(remoteslot, rel, &newtup);
2689 slot_fill_defaults(rel, estate, remoteslot);
2690 MemoryContextSwitchTo(oldctx);
2691
2692 /* For a partitioned table, insert the tuple into a partition. */
2693 if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
2695 remoteslot, NULL, CMD_INSERT);
2696 else
2697 {
2698 ResultRelInfo *relinfo = edata->targetRelInfo;
2699
2700 ExecOpenIndices(relinfo, false);
2701 apply_handle_insert_internal(edata, relinfo, remoteslot);
2702 ExecCloseIndices(relinfo);
2703 }
2704
2705 finish_edata(edata);
2706
2707 /* Reset relation for error callback */
2709
2710 if (!run_as_owner)
2711 RestoreUserContext(&ucxt);
2712
2714
2716}
2717
2718/*
2719 * Workhorse for apply_handle_insert()
2720 * relinfo is for the relation we're actually inserting into
2721 * (could be a child partition of edata->targetRelInfo)
2722 */
2723static void
2725 ResultRelInfo *relinfo,
2726 TupleTableSlot *remoteslot)
2727{
2728 EState *estate = edata->estate;
2729
2730 /* Caller should have opened indexes already. */
2731 Assert(relinfo->ri_IndexRelationDescs != NULL ||
2732 !relinfo->ri_RelationDesc->rd_rel->relhasindex ||
2734
2735 /* Caller will not have done this bit. */
2737 InitConflictIndexes(relinfo);
2738
2739 /* Do the insert. */
2741 ExecSimpleRelationInsert(relinfo, estate, remoteslot);
2742}
2743
2744/*
2745 * Check if the logical replication relation is updatable and throw
2746 * appropriate error if it isn't.
2747 */
2748static void
2750{
2751 /*
2752 * For partitioned tables, we only need to care if the target partition is
2753 * updatable (aka has PK or RI defined for it).
2754 */
2755 if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
2756 return;
2757
2758 /* Updatable, no error. */
2759 if (rel->updatable)
2760 return;
2761
2762 /*
2763 * We are in error mode so it's fine this is somewhat slow. It's better to
2764 * give the user a correct error.
2765 */
2767 {
2768 ereport(ERROR,
2769 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
2770 errmsg("publisher did not send replica identity column "
2771 "expected by the logical replication target relation \"%s.%s\"",
2772 rel->remoterel.nspname, rel->remoterel.relname)));
2773 }
2774
2775 ereport(ERROR,
2776 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
2777 errmsg("logical replication target relation \"%s.%s\" has "
2778 "neither REPLICA IDENTITY index nor PRIMARY "
2779 "KEY and published relation does not have "
2780 "REPLICA IDENTITY FULL",
2781 rel->remoterel.nspname, rel->remoterel.relname)));
2782}
2783
2784/*
2785 * Handle UPDATE message.
2786 *
2787 * TODO: FDW support
2788 */
2789static void
2791{
2793 LogicalRepRelId relid;
2794 UserContext ucxt;
2795 ApplyExecutionData *edata;
2796 EState *estate;
2797 LogicalRepTupleData oldtup;
2798 LogicalRepTupleData newtup;
2799 bool has_oldtup;
2800 TupleTableSlot *remoteslot;
2801 RTEPermissionInfo *target_perminfo;
2802 MemoryContext oldctx;
2803 bool run_as_owner;
2804
2805 /*
2806 * Quick return if we are skipping data modification changes or handling
2807 * streamed transactions.
2808 */
2809 if (is_skipping_changes() ||
2811 return;
2812
2814
2815 relid = logicalrep_read_update(s, &has_oldtup, &oldtup,
2816 &newtup);
2819 {
2820 /*
2821 * The relation can't become interesting in the middle of the
2822 * transaction so it's safe to unlock it.
2823 */
2826 return;
2827 }
2828
2829 /* Set relation for error callback */
2831
2832 /* Check if we can do the update. */
2834
2835 /*
2836 * Make sure that any user-supplied code runs as the table owner, unless
2837 * the user has opted out of that behavior.
2838 */
2839 run_as_owner = MySubscription->runasowner;
2840 if (!run_as_owner)
2841 SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
2842
2843 /* Initialize the executor state. */
2844 edata = create_edata_for_relation(rel);
2845 estate = edata->estate;
2846 remoteslot = ExecInitExtraTupleSlot(estate,
2848 &TTSOpsVirtual);
2849
2850 /*
2851 * Populate updatedCols so that per-column triggers can fire, and so
2852 * executor can correctly pass down indexUnchanged hint. This could
2853 * include more columns than were actually changed on the publisher
2854 * because the logical replication protocol doesn't contain that
2855 * information. But it would for example exclude columns that only exist
2856 * on the subscriber, since we are not touching those.
2857 */
2858 target_perminfo = list_nth(estate->es_rteperminfos, 0);
2859 for (int i = 0; i < remoteslot->tts_tupleDescriptor->natts; i++)
2860 {
2862 int remoteattnum = rel->attrmap->attnums[i];
2863
2864 if (!att->attisdropped && remoteattnum >= 0)
2865 {
2866 Assert(remoteattnum < newtup.ncols);
2867 if (newtup.colstatus[remoteattnum] != LOGICALREP_COLUMN_UNCHANGED)
2868 target_perminfo->updatedCols =
2869 bms_add_member(target_perminfo->updatedCols,
2871 }
2872 }
2873
2874 /* Build the search tuple. */
2876 slot_store_data(remoteslot, rel,
2877 has_oldtup ? &oldtup : &newtup);
2878 MemoryContextSwitchTo(oldctx);
2879
2880 /* For a partitioned table, apply update to correct partition. */
2881 if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
2883 remoteslot, &newtup, CMD_UPDATE);
2884 else
2886 remoteslot, &newtup, rel->localindexoid);
2887
2888 finish_edata(edata);
2889
2890 /* Reset relation for error callback */
2892
2893 if (!run_as_owner)
2894 RestoreUserContext(&ucxt);
2895
2897
2899}
2900
2901/*
2902 * Workhorse for apply_handle_update()
2903 * relinfo is for the relation we're actually updating in
2904 * (could be a child partition of edata->targetRelInfo)
2905 */
2906static void
2908 ResultRelInfo *relinfo,
2909 TupleTableSlot *remoteslot,
2910 LogicalRepTupleData *newtup,
2911 Oid localindexoid)
2912{
2913 EState *estate = edata->estate;
2914 LogicalRepRelMapEntry *relmapentry = edata->targetRel;
2915 Relation localrel = relinfo->ri_RelationDesc;
2916 EPQState epqstate;
2917 TupleTableSlot *localslot = NULL;
2918 ConflictTupleInfo conflicttuple = {0};
2919 bool found;
2920 MemoryContext oldctx;
2921
2922 EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
2923 ExecOpenIndices(relinfo, false);
2924
2925 found = FindReplTupleInLocalRel(edata, localrel,
2926 &relmapentry->remoterel,
2927 localindexoid,
2928 remoteslot, &localslot);
2929
2930 /*
2931 * Tuple found.
2932 *
2933 * Note this will fail if there are other conflicting unique indexes.
2934 */
2935 if (found)
2936 {
2937 /*
2938 * Report the conflict if the tuple was modified by a different
2939 * origin.
2940 */
2941 if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,
2942 &conflicttuple.origin, &conflicttuple.ts) &&
2943 conflicttuple.origin != replorigin_session_origin)
2944 {
2945 TupleTableSlot *newslot;
2946
2947 /* Store the new tuple for conflict reporting */
2948 newslot = table_slot_create(localrel, &estate->es_tupleTable);
2949 slot_store_data(newslot, relmapentry, newtup);
2950
2951 conflicttuple.slot = localslot;
2952
2954 remoteslot, newslot,
2955 list_make1(&conflicttuple));
2956 }
2957
2958 /* Process and store remote tuple in the slot */
2960 slot_modify_data(remoteslot, localslot, relmapentry, newtup);
2961 MemoryContextSwitchTo(oldctx);
2962
2963 EvalPlanQualSetSlot(&epqstate, remoteslot);
2964
2965 InitConflictIndexes(relinfo);
2966
2967 /* Do the actual update. */
2969 ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
2970 remoteslot);
2971 }
2972 else
2973 {
2975 TupleTableSlot *newslot = localslot;
2976
2977 /*
2978 * Detecting whether the tuple was recently deleted or never existed
2979 * is crucial to avoid misleading the user during conflict handling.
2980 */
2981 if (FindDeletedTupleInLocalRel(localrel, localindexoid, remoteslot,
2982 &conflicttuple.xmin,
2983 &conflicttuple.origin,
2984 &conflicttuple.ts) &&
2985 conflicttuple.origin != replorigin_session_origin)
2987 else
2989
2990 /* Store the new tuple for conflict reporting */
2991 slot_store_data(newslot, relmapentry, newtup);
2992
2993 /*
2994 * The tuple to be updated could not be found or was deleted. Do
2995 * nothing except for emitting a log message.
2996 */
2997 ReportApplyConflict(estate, relinfo, LOG, type, remoteslot, newslot,
2998 list_make1(&conflicttuple));
2999 }
3000
3001 /* Cleanup. */
3002 ExecCloseIndices(relinfo);
3003 EvalPlanQualEnd(&epqstate);
3004}
3005
3006/*
3007 * Handle DELETE message.
3008 *
3009 * TODO: FDW support
3010 */
3011static void
3013{
3015 LogicalRepTupleData oldtup;
3016 LogicalRepRelId relid;
3017 UserContext ucxt;
3018 ApplyExecutionData *edata;
3019 EState *estate;
3020 TupleTableSlot *remoteslot;
3021 MemoryContext oldctx;
3022 bool run_as_owner;
3023
3024 /*
3025 * Quick return if we are skipping data modification changes or handling
3026 * streamed transactions.
3027 */
3028 if (is_skipping_changes() ||
3030 return;
3031
3033
3034 relid = logicalrep_read_delete(s, &oldtup);
3037 {
3038 /*
3039 * The relation can't become interesting in the middle of the
3040 * transaction so it's safe to unlock it.
3041 */
3044 return;
3045 }
3046
3047 /* Set relation for error callback */
3049
3050 /* Check if we can do the delete. */
3052
3053 /*
3054 * Make sure that any user-supplied code runs as the table owner, unless
3055 * the user has opted out of that behavior.
3056 */
3057 run_as_owner = MySubscription->runasowner;
3058 if (!run_as_owner)
3059 SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
3060
3061 /* Initialize the executor state. */
3062 edata = create_edata_for_relation(rel);
3063 estate = edata->estate;
3064 remoteslot = ExecInitExtraTupleSlot(estate,
3066 &TTSOpsVirtual);
3067
3068 /* Build the search tuple. */
3070 slot_store_data(remoteslot, rel, &oldtup);
3071 MemoryContextSwitchTo(oldctx);
3072
3073 /* For a partitioned table, apply delete to correct partition. */
3074 if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
3076 remoteslot, NULL, CMD_DELETE);
3077 else
3078 {
3079 ResultRelInfo *relinfo = edata->targetRelInfo;
3080
3081 ExecOpenIndices(relinfo, false);
3082 apply_handle_delete_internal(edata, relinfo,
3083 remoteslot, rel->localindexoid);
3084 ExecCloseIndices(relinfo);
3085 }
3086
3087 finish_edata(edata);
3088
3089 /* Reset relation for error callback */
3091
3092 if (!run_as_owner)
3093 RestoreUserContext(&ucxt);
3094
3096
3098}
3099
3100/*
3101 * Workhorse for apply_handle_delete()
3102 * relinfo is for the relation we're actually deleting from
3103 * (could be a child partition of edata->targetRelInfo)
3104 */
3105static void
3107 ResultRelInfo *relinfo,
3108 TupleTableSlot *remoteslot,
3109 Oid localindexoid)
3110{
3111 EState *estate = edata->estate;
3112 Relation localrel = relinfo->ri_RelationDesc;
3113 LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
3114 EPQState epqstate;
3115 TupleTableSlot *localslot;
3116 ConflictTupleInfo conflicttuple = {0};
3117 bool found;
3118
3119 EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
3120
3121 /* Caller should have opened indexes already. */
3122 Assert(relinfo->ri_IndexRelationDescs != NULL ||
3123 !localrel->rd_rel->relhasindex ||
3124 RelationGetIndexList(localrel) == NIL);
3125
3126 found = FindReplTupleInLocalRel(edata, localrel, remoterel, localindexoid,
3127 remoteslot, &localslot);
3128
3129 /* If found, delete it. */
3130 if (found)
3131 {
3132 /*
3133 * Report the conflict if the tuple was modified by a different
3134 * origin.
3135 */
3136 if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,
3137 &conflicttuple.origin, &conflicttuple.ts) &&
3138 conflicttuple.origin != replorigin_session_origin)
3139 {
3140 conflicttuple.slot = localslot;
3142 remoteslot, NULL,
3143 list_make1(&conflicttuple));
3144 }
3145
3146 EvalPlanQualSetSlot(&epqstate, localslot);
3147
3148 /* Do the actual delete. */
3150 ExecSimpleRelationDelete(relinfo, estate, &epqstate, localslot);
3151 }
3152 else
3153 {
3154 /*
3155 * The tuple to be deleted could not be found. Do nothing except for
3156 * emitting a log message.
3157 */
3158 ReportApplyConflict(estate, relinfo, LOG, CT_DELETE_MISSING,
3159 remoteslot, NULL, list_make1(&conflicttuple));
3160 }
3161
3162 /* Cleanup. */
3163 EvalPlanQualEnd(&epqstate);
3164}
3165
3166/*
3167 * Try to find a tuple received from the publication side (in 'remoteslot') in
3168 * the corresponding local relation using either the replica identity index,
3169 * the primary key, another usable index or, if needed, a sequential scan.
3170 *
3171 * Local tuple, if found, is returned in '*localslot'.
3172 */
3173static bool
3175 LogicalRepRelation *remoterel,
3176 Oid localidxoid,
3177 TupleTableSlot *remoteslot,
3178 TupleTableSlot **localslot)
3179{
3180 EState *estate = edata->estate;
3181 bool found;
3182
3183 /*
3184 * Regardless of the top-level operation, we're performing a read here, so
3185 * check for SELECT privileges.
3186 */
3188
3189 *localslot = table_slot_create(localrel, &estate->es_tupleTable);
3190
3191 Assert(OidIsValid(localidxoid) ||
3192 (remoterel->replident == REPLICA_IDENTITY_FULL));
3193
3194 if (OidIsValid(localidxoid))
3195 {
3196#ifdef USE_ASSERT_CHECKING
3197 Relation idxrel = index_open(localidxoid, AccessShareLock);
3198
3199 /* Index must be PK, RI, or usable for REPLICA IDENTITY FULL tables */
3200 Assert(GetRelationIdentityOrPK(localrel) == localidxoid ||
3201 (remoterel->replident == REPLICA_IDENTITY_FULL &&
3203 edata->targetRel->attrmap)));
3205#endif
3206
3207 found = RelationFindReplTupleByIndex(localrel, localidxoid,
3209 remoteslot, *localslot);
3210 }
3211 else
3213 remoteslot, *localslot);
3214
3215 return found;
3216}
3217
3218/*
3219 * Determine whether the index can reliably locate the deleted tuple in the
3220 * local relation.
3221 *
3222 * An index may exclude deleted tuples if it was re-indexed or re-created during
3223 * change application. Therefore, an index is considered usable only if the
3224 * conflict detection slot's xmin (conflict_detection_xmin) is greater than the
3225 * index tuple's xmin. This ensures that any tuples deleted prior to the index
3226 * creation or re-indexing are not relevant for conflict detection in the
3227 * current apply worker.
3228 *
3229 * Note that indexes may also be excluded if they were modified by other DDL
3230 * operations, such as ALTER INDEX. However, this is acceptable, as the
3231 * likelihood of such DDL changes coinciding with the need to scan dead
3232 * tuples for update_deleted detection is low.
3233 */
3234static bool
3236 TransactionId conflict_detection_xmin)
3237{
3238 HeapTuple index_tuple;
3239 TransactionId index_xmin;
3240
3241 index_tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(localindexoid));
3242
3243 if (!HeapTupleIsValid(index_tuple)) /* should not happen */
3244 elog(ERROR, "cache lookup failed for index %u", localindexoid);
3245
3246 /*
3247 * No need to check for a frozen transaction ID, as
3248 * TransactionIdPrecedes() manages it internally, treating it as falling
3249 * behind the conflict_detection_xmin.
3250 */
3251 index_xmin = HeapTupleHeaderGetXmin(index_tuple->t_data);
3252
3253 ReleaseSysCache(index_tuple);
3254
3255 return TransactionIdPrecedes(index_xmin, conflict_detection_xmin);
3256}
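The usability check above relies on a wraparound-aware transaction ID comparison in which special IDs such as the frozen one sort before every normal ID. The sketch below illustrates that style of circular comparison under stated assumptions: `sketch_xid_precedes`, `sketch_index_usable`, and `SKETCH_FIRST_NORMAL_XID` are hypothetical names, and the actual behavior is defined by TransactionIdPrecedes() itself, not by this code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary: IDs below this value are special (e.g. frozen). */
#define SKETCH_FIRST_NORMAL_XID 3u

/* Return true if id1 is logically older than id2. */
bool
sketch_xid_precedes(uint32_t id1, uint32_t id2)
{
	/* Special IDs compare plainly, so they precede every normal ID. */
	if (id1 < SKETCH_FIRST_NORMAL_XID || id2 < SKETCH_FIRST_NORMAL_XID)
		return id1 < id2;

	/*
	 * Normal IDs compare modulo 2^32: id1 precedes id2 when the wrapped
	 * difference is negative.  The conversion to int32_t assumes the
	 * usual two's-complement behavior.
	 */
	return (int32_t) (id1 - id2) < 0;
}

/* The index is usable only if it was created before the threshold. */
bool
sketch_index_usable(uint32_t index_xmin, uint32_t conflict_detection_xmin)
{
	assert(conflict_detection_xmin >= SKETCH_FIRST_NORMAL_XID);

	return sketch_xid_precedes(index_xmin, conflict_detection_xmin);
}
```

With this ordering, an index whose catalog row has a frozen xmin is always considered usable, which matches the comment above about frozen transaction IDs falling behind conflict_detection_xmin.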
3257
3258/*
3259 * Attempts to locate a deleted tuple in the local relation that matches the
3260 * values of the tuple received from the publication side (in 'remoteslot').
3261 * The search is performed using either the replica identity index, primary
3262 * key, other available index, or a sequential scan if necessary.
3263 *
3264 * Returns true if the deleted tuple is found. If found, the transaction ID,
3265 * origin, and commit timestamp of the deletion are stored in '*delete_xid',
3266 * '*delete_origin', and '*delete_time' respectively.
3267 */
3268static bool
3270 TupleTableSlot *remoteslot,
3271 TransactionId *delete_xid, RepOriginId *delete_origin,
3272 TimestampTz *delete_time)
3273{
3274 TransactionId oldestxmin;
3275
3276 /*
3277 * Return false if either dead tuples are not retained or commit timestamp
3278 * data is not available.
3279 */
3281 return false;
3282
3283 /*
3284 * For conflict detection, we use the leader worker's
3285 * oldest_nonremovable_xid value instead of invoking
3286 * GetOldestNonRemovableTransactionId() or using the conflict detection
3287 * slot's xmin. The oldest_nonremovable_xid acts as a threshold to
3288 * identify tuples that were recently deleted. These deleted tuples are no
3289 * longer visible to concurrent transactions. However, if a remote update
3290 * matches such a tuple, we log an update_deleted conflict.
3291 *
3292 * While GetOldestNonRemovableTransactionId() and slot.xmin may return
3293 * transaction IDs older than oldest_nonremovable_xid, for our current
3294 * purpose, it is acceptable to treat tuples deleted by transactions prior
3295 * to oldest_nonremovable_xid as update_missing conflicts.
3296 */
3298 {
3300 }
3301 else
3302 {
3303 LogicalRepWorker *leader;
3304
3305 /*
3306 * Obtain the information from the leader apply worker as only the
3307 * leader manages oldest_nonremovable_xid (see
3308 * maybe_advance_nonremovable_xid() for details).
3309 */
3310 LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
3313 false);
3314 if (!leader)
3315 {
3316 ereport(ERROR,
3317 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
3318 errmsg("could not detect conflict as the leader apply worker has exited")));
3319 }
3320
3321 SpinLockAcquire(&leader->relmutex);
3322 oldestxmin = leader->oldest_nonremovable_xid;
3323 SpinLockRelease(&leader->relmutex);
3324 LWLockRelease(LogicalRepWorkerLock);
3325 }
3326
3327 /*
3328 * Return false if the leader apply worker has stopped retaining
3329 * information for detecting conflicts. This implies that update_deleted
3330 * can no longer be reliably detected.
3331 */
3332 if (!TransactionIdIsValid(oldestxmin))
3333 return false;
3334
3335 if (OidIsValid(localidxoid) &&
3336 IsIndexUsableForFindingDeletedTuple(localidxoid, oldestxmin))
3337 return RelationFindDeletedTupleInfoByIndex(localrel, localidxoid,
3338 remoteslot, oldestxmin,
3339 delete_xid, delete_origin,
3340 delete_time);
3341 else
3342 return RelationFindDeletedTupleInfoSeq(localrel, remoteslot,
3343 oldestxmin, delete_xid,
3344 delete_origin, delete_time);
3345}
3346
3347/*
3348 * This handles insert, update, delete on a partitioned table.
3349 */
3350static void
3352 TupleTableSlot *remoteslot,
3353 LogicalRepTupleData *newtup,
3354 CmdType operation)
3355{
3356 EState *estate = edata->estate;
3357 LogicalRepRelMapEntry *relmapentry = edata->targetRel;
3358 ResultRelInfo *relinfo = edata->targetRelInfo;
3359 Relation parentrel = relinfo->ri_RelationDesc;
3360 ModifyTableState *mtstate;
3361 PartitionTupleRouting *proute;
3362 ResultRelInfo *partrelinfo;
3363 Relation partrel;
3364 TupleTableSlot *remoteslot_part;
3365 TupleConversionMap *map;
3366 MemoryContext oldctx;
3367 LogicalRepRelMapEntry *part_entry = NULL;
3368 AttrMap *attrmap = NULL;
3369
3370 /* ModifyTableState is needed for ExecFindPartition(). */
3371 edata->mtstate = mtstate = makeNode(ModifyTableState);
3372 mtstate->ps.plan = NULL;
3373 mtstate->ps.state = estate;
3374 mtstate->operation = operation;
3375 mtstate->resultRelInfo = relinfo;
3376
3377 /* ... as is PartitionTupleRouting. */
3378 edata->proute = proute = ExecSetupPartitionTupleRouting(estate, parentrel);
3379
3380 /*
3381 * Find the partition to which the "search tuple" belongs.
3382 */
3383 Assert(remoteslot != NULL);
3385 partrelinfo = ExecFindPartition(mtstate, relinfo, proute,
3386 remoteslot, estate);
3387 Assert(partrelinfo != NULL);
3388 partrel = partrelinfo->ri_RelationDesc;
3389
3390 /*
3391 * Check for supported relkind. We need this since partitions might be of
3392 * unsupported relkinds; and the set of partitions can change, so checking
3393 * at CREATE/ALTER SUBSCRIPTION would be insufficient.
3394 */
3395 CheckSubscriptionRelkind(partrel->rd_rel->relkind,
3396 relmapentry->remoterel.relkind,
3398 RelationGetRelationName(partrel));
3399
3400 /*
3401 * To perform any of the operations below, the tuple must match the
3402 * partition's rowtype. Convert if needed or just copy, using a dedicated
3403 * slot to store the tuple in any case.
3404 */
3405 remoteslot_part = partrelinfo->ri_PartitionTupleSlot;
3406 if (remoteslot_part == NULL)
3407 remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
3408 map = ExecGetRootToChildMap(partrelinfo, estate);
3409 if (map != NULL)
3410 {
3411 attrmap = map->attrMap;
3412 remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
3413 remoteslot_part);
3414 }
3415 else
3416 {
3417 remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
3418 slot_getallattrs(remoteslot_part);
3419 }
3420 MemoryContextSwitchTo(oldctx);
3421
3422 /* Check if we can do the update or delete on the leaf partition. */
3423 if (operation == CMD_UPDATE || operation == CMD_DELETE)
3424 {
3425 part_entry = logicalrep_partition_open(relmapentry, partrel,
3426 attrmap);
3427 check_relation_updatable(part_entry);
3428 }
3429
3430 switch (operation)
3431 {
3432 case CMD_INSERT:
3433 apply_handle_insert_internal(edata, partrelinfo,
3434 remoteslot_part);
3435 break;
3436
3437 case CMD_DELETE:
3438 apply_handle_delete_internal(edata, partrelinfo,
3439 remoteslot_part,
3440 part_entry->localindexoid);
3441 break;
3442
3443 case CMD_UPDATE:
3444
3445 /*
3446 * For UPDATE, depending on whether or not the updated tuple
3447 * satisfies the partition's constraint, perform a simple UPDATE
3448 * of the partition or move the updated tuple into a different
3449 * suitable partition.
3450 */
3451 {
3452 TupleTableSlot *localslot;
3453 ResultRelInfo *partrelinfo_new;
3454 Relation partrel_new;
3455 bool found;
3456 EPQState epqstate;
3457 ConflictTupleInfo conflicttuple = {0};
3458
3459 /* Get the matching local tuple from the partition. */
3460 found = FindReplTupleInLocalRel(edata, partrel,
3461 &part_entry->remoterel,
3462 part_entry->localindexoid,
3463 remoteslot_part, &localslot);
3464 if (!found)
3465 {
3467 TupleTableSlot *newslot = localslot;
3468
3469 /*
3470 * Detecting whether the tuple was recently deleted or
3471 * never existed is crucial to avoid misleading the user
3472 * during conflict handling.
3473 */
3474 if (FindDeletedTupleInLocalRel(partrel,
3475 part_entry->localindexoid,
3476 remoteslot_part,
3477 &conflicttuple.xmin,
3478 &conflicttuple.origin,
3479 &conflicttuple.ts) &&
3480 conflicttuple.origin != replorigin_session_origin)
3482 else
3484
3485 /* Store the new tuple for conflict reporting */
3486 slot_store_data(newslot, part_entry, newtup);
3487
3488 /*
3489 * The tuple to be updated could not be found or was
3490 * deleted. Do nothing except for emitting a log message.
3491 */
3492 ReportApplyConflict(estate, partrelinfo, LOG,
3493 type, remoteslot_part, newslot,
3494 list_make1(&conflicttuple));
3495
3496 return;
3497 }
3498
3499 /*
3500 * Report the conflict if the tuple was modified by a
3501 * different origin.
3502 */
3503 if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,
3504 &conflicttuple.origin,
3505 &conflicttuple.ts) &&
3506 conflicttuple.origin != replorigin_session_origin)
3507 {
3508 TupleTableSlot *newslot;
3509
3510 /* Store the new tuple for conflict reporting */
3511 newslot = table_slot_create(partrel, &estate->es_tupleTable);
3512 slot_store_data(newslot, part_entry, newtup);
3513
3514 conflicttuple.slot = localslot;
3515
3516 ReportApplyConflict(estate, partrelinfo, LOG, CT_UPDATE_ORIGIN_DIFFERS,
3517 remoteslot_part, newslot,
3518 list_make1(&conflicttuple));
3519 }
3520
3521 /*
3522 * Apply the update to the local tuple, putting the result in
3523 * remoteslot_part.
3524 */
3526 slot_modify_data(remoteslot_part, localslot, part_entry,
3527 newtup);
3528 MemoryContextSwitchTo(oldctx);
3529
3530 EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
3531
3532 /*
3533 * Does the updated tuple still satisfy the current
3534 * partition's constraint?
3535 */
3536 if (!partrel->rd_rel->relispartition ||
3537 ExecPartitionCheck(partrelinfo, remoteslot_part, estate,
3538 false))
3539 {
3540 /*
3541 * Yes, so simply UPDATE the partition. We don't call
3542 * apply_handle_update_internal() here, which would
3543 * normally do the following work, to avoid repeating some
3544 * work already done above to find the local tuple in the
3545 * partition.
3546 */
3547 InitConflictIndexes(partrelinfo);
3548
3549 EvalPlanQualSetSlot(&epqstate, remoteslot_part);
3551 ACL_UPDATE);
3552 ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,
3553 localslot, remoteslot_part);
3554 }
3555 else
3556 {
3557 /* Move the tuple into the new partition. */
3558
3559 /*
3560 * New partition will be found using tuple routing, which
3561 * can only occur via the parent table. We might need to
3562 * convert the tuple to the parent's rowtype. Note that
3563 * this is the tuple found in the partition, not the
3564 * original search tuple received by this function.
3565 */
3566 if (map)
3567 {
3568 TupleConversionMap *PartitionToRootMap =
3570 RelationGetDescr(parentrel));
3571
3572 remoteslot =
3573 execute_attr_map_slot(PartitionToRootMap->attrMap,
3574 remoteslot_part, remoteslot);
3575 }
3576 else
3577 {
3578 remoteslot = ExecCopySlot(remoteslot, remoteslot_part);
3579 slot_getallattrs(remoteslot);
3580 }
3581
3582 /* Find the new partition. */
3584 partrelinfo_new = ExecFindPartition(mtstate, relinfo,
3585 proute, remoteslot,
3586 estate);
3587 MemoryContextSwitchTo(oldctx);
3588 Assert(partrelinfo_new != partrelinfo);
3589 partrel_new = partrelinfo_new->ri_RelationDesc;
3590
3591 /* Check that new partition also has supported relkind. */
3592 CheckSubscriptionRelkind(partrel_new->rd_rel->relkind,
3593 relmapentry->remoterel.relkind,
3595 RelationGetRelationName(partrel_new));
3596
3597 /* DELETE old tuple found in the old partition. */
3598 EvalPlanQualSetSlot(&epqstate, localslot);
3600 ExecSimpleRelationDelete(partrelinfo, estate, &epqstate, localslot);
3601
3602 /* INSERT new tuple into the new partition. */
3603
3604 /*
3605 * Convert the replacement tuple to match the destination
3606 * partition rowtype.
3607 */
3609 remoteslot_part = partrelinfo_new->ri_PartitionTupleSlot;
3610 if (remoteslot_part == NULL)
3611 remoteslot_part = table_slot_create(partrel_new,
3612 &estate->es_tupleTable);
3613 map = ExecGetRootToChildMap(partrelinfo_new, estate);
3614 if (map != NULL)
3615 {
3616 remoteslot_part = execute_attr_map_slot(map->attrMap,
3617 remoteslot,
3618 remoteslot_part);
3619 }
3620 else
3621 {
3622 remoteslot_part = ExecCopySlot(remoteslot_part,
3623 remoteslot);
3624 slot_getallattrs(remoteslot);
3625 }
3626 MemoryContextSwitchTo(oldctx);
3627 apply_handle_insert_internal(edata, partrelinfo_new,
3628 remoteslot_part);
3629 }
3630
3631 EvalPlanQualEnd(&epqstate);
3632 }
3633 break;
3634
3635 default:
3636 elog(ERROR, "unrecognized CmdType: %d", (int) operation);
3637 break;
3638 }
3639}
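
The UPDATE branch above either updates the row in place or, when the new tuple no longer satisfies the partition constraint, deletes it from the old partition and inserts it into the one found by tuple routing. The following is a minimal, self-contained sketch of that decision only. It is not the executor code: `Partition`, `route`, `find_row` and `apply_update` are hypothetical names, rows are reduced to a single integer key, and partitions are assumed to be simple half-open ranges that cover every key used.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ROWS 8

typedef struct Partition
{
    int         lo;             /* inclusive lower bound of the range */
    int         hi;             /* exclusive upper bound of the range */
    int         rows[MAX_ROWS]; /* stored keys (hypothetical row format) */
    int         nrows;
} Partition;

typedef enum UpdateResult
{
    UPDATE_MISSING,             /* row to update was not found */
    UPDATE_IN_PLACE,            /* new key still fits the same partition */
    UPDATE_MOVED                /* row deleted from old, inserted into new */
} UpdateResult;

/* Tuple routing: find the partition whose range contains the key. */
static int
route(const Partition *parts, int nparts, int key)
{
    for (int i = 0; i < nparts; i++)
        if (key >= parts[i].lo && key < parts[i].hi)
            return i;
    return -1;
}

/* Locate a row inside one partition. */
static int
find_row(const Partition *p, int key)
{
    for (int i = 0; i < p->nrows; i++)
        if (p->rows[i] == key)
            return i;
    return -1;
}

static UpdateResult
apply_update(Partition *parts, int nparts, int old_key, int new_key)
{
    int         src = route(parts, nparts, old_key);
    int         dst = route(parts, nparts, new_key);
    int         pos;

    if (src < 0)
        return UPDATE_MISSING;
    pos = find_row(&parts[src], old_key);
    if (pos < 0)
        return UPDATE_MISSING;  /* only reported; nothing is modified */

    assert(dst >= 0);           /* assumption: ranges cover the new key */

    if (dst == src)
    {
        parts[src].rows[pos] = new_key;     /* simple UPDATE */
        return UPDATE_IN_PLACE;
    }

    /* DELETE from the old partition ... */
    parts[src].nrows--;
    parts[src].rows[pos] = parts[src].rows[parts[src].nrows];

    /* ... then INSERT into the new one. */
    assert(parts[dst].nrows < MAX_ROWS);
    parts[dst].rows[parts[dst].nrows++] = new_key;
    return UPDATE_MOVED;
}
```

The sketch leaves out rowtype conversion between parent and partition, conflict reporting, and permission checks, all of which the real function performs.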
3640
3641/*
3642 * Handle TRUNCATE message.
3643 *
3644 * TODO: FDW support
3645 */
3646static void
3648{
3649 bool cascade = false;
3650 bool restart_seqs = false;
3651 List *remote_relids = NIL;
3652 List *remote_rels = NIL;
3653 List *rels = NIL;
3654 List *part_rels = NIL;
3655 List *relids = NIL;
3656 List *relids_logged = NIL;
3657 ListCell *lc;
3658 LOCKMODE lockmode = AccessExclusiveLock;
3659
3660 /*
3661 * Quick return if we are skipping data modification changes or handling
3662 * streamed transactions.
3663 */
3664 if (is_skipping_changes() ||
3666 return;
3667
3669
3670 remote_relids = logicalrep_read_truncate(s, &cascade, &restart_seqs);
3671
3672 foreach(lc, remote_relids)
3673 {
3674 LogicalRepRelId relid = lfirst_oid(lc);
3676
3677 rel = logicalrep_rel_open(relid, lockmode);
3679 {
3680 /*
3681 * The relation can't become interesting in the middle of the
3682 * transaction so it's safe to unlock it.
3683 */
3684 logicalrep_rel_close(rel, lockmode);
3685 continue;
3686 }
3687
3688 remote_rels = lappend(remote_rels, rel);
3690 rels = lappend(rels, rel->localrel);
3691 relids = lappend_oid(relids, rel->localreloid);
3693 relids_logged = lappend_oid(relids_logged, rel->localreloid);
3694
3695 /*
3696 * Truncate partitions if we got a message to truncate a partitioned
3697 * table.
3698 */
3699 if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
3700 {
3701 ListCell *child;
3702 List *children = find_all_inheritors(rel->localreloid,
3703 lockmode,
3704 NULL);
3705
3706 foreach(child, children)
3707 {
3708 Oid childrelid = lfirst_oid(child);
3709 Relation childrel;
3710
3711 if (list_member_oid(relids, childrelid))
3712 continue;
3713
3714 /* find_all_inheritors already got lock */
3715 childrel = table_open(childrelid, NoLock);
3716
3717 /*
3718 * Ignore temp tables of other backends. See similar code in
3719 * ExecuteTruncate().
3720 */
3721 if (RELATION_IS_OTHER_TEMP(childrel))
3722 {
3723 table_close(childrel, lockmode);
3724 continue;
3725 }
3726
3728 rels = lappend(rels, childrel);
3729 part_rels = lappend(part_rels, childrel);
3730 relids = lappend_oid(relids, childrelid);
3731 /* Log this relation only if needed for logical decoding */
3732 if (RelationIsLogicallyLogged(childrel))
3733 relids_logged = lappend_oid(relids_logged, childrelid);
3734 }
3735 }
3736 }
3737
3738 /*
3739 * Even if we used CASCADE on the upstream primary we explicitly default
3740 * to replaying changes without further cascading. This might later be
3741 * made configurable through a user-specified option.
3742 *
3743 * MySubscription->runasowner tells us whether we want to execute
3744 * replication actions as the subscription owner; the last argument to
3745 * TruncateGuts tells it whether we want to switch to the table owner.
3746 * Those are exactly opposite conditions.
3747 */
3749 relids,
3750 relids_logged,
3752 restart_seqs,
3754 foreach(lc, remote_rels)
3755 {
3756 LogicalRepRelMapEntry *rel = lfirst(lc);
3757
3759 }
3760 foreach(lc, part_rels)
3761 {
3762 Relation rel = lfirst(lc);
3763
3764 table_close(rel, NoLock);
3765 }
3766
3768}
3769
3770
3771/*
3772 * Logical replication protocol message dispatcher.
3773 */
3774void
3776{
3778 LogicalRepMsgType saved_command;
3779
3780 /*
3781 * Set the current command being applied. Since this function can be
3782 * called recursively when applying spooled changes, save the current
3783 * command.
3784 */
3785 saved_command = apply_error_callback_arg.command;
3787
3788 switch (action)
3789 {
3792 break;
3793
3796 break;
3797
3800 break;
3801
3804 break;
3805
3808 break;
3809
3812 break;
3813
3816 break;
3817
3820 break;
3821
3824 break;
3825
3827
3828 /*
3829 * Logical replication does not use generic logical messages yet.
3830 * They could, however, be used by other applications that use this
3831 * output plugin.
3832 */
3833 break;
3834
3837 break;
3838
3841 break;
3842
3845 break;
3846
3849 break;
3850
3853 break;
3854
3857 break;
3858
3861 break;
3862
3865 break;
3866
3869 break;
3870
3871 default:
3872 ereport(ERROR,
3873 (errcode(ERRCODE_PROTOCOL_VIOLATION),
3874 errmsg("invalid logical replication message type \"??? (%d)\"", action)));
3875 }
3876
3877 /* Reset the current command */
3878 apply_error_callback_arg.command = saved_command;
3879}
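
The dispatcher saves the current command on entry and restores it on exit because it can be re-entered while spooled changes are replayed. A minimal sketch of that save/restore pattern follows. The names `current_command` and `dispatch` are hypothetical, and the nested call merely stands in for replaying a spooled change.

```c
#include <assert.h>

/* 0 means "no command in progress" (an assumption of this sketch). */
static int  current_command = 0;

/*
 * Handle one action; if nested_action is nonzero, handle it recursively
 * first, the way a commit might replay spooled changes.  Returns the
 * command that is visible after the nested call has finished.
 */
static int
dispatch(int action, int nested_action)
{
    int         saved_command = current_command;
    int         seen_after_nested;

    current_command = action;

    if (nested_action != 0)
        (void) dispatch(nested_action, 0);

    /* The nested call restored our value before returning. */
    seen_after_nested = current_command;

    /* Reset the current command for our own caller. */
    current_command = saved_command;
    return seen_after_nested;
}
```

Without the saved copy, the outer invocation would report the nested command in any error context raised after the nested call returns.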
3880
3881/*
3882 * Figure out which write/flush positions to report to the walsender process.
3883 *
3884 * We can't simply report back the last LSN the walsender sent us because the
3885 * local transaction might not yet be flushed to disk locally. Instead we
3886 * build a list that associates local with remote LSNs for every commit. When
3887 * reporting back the flush position to the sender we iterate that list and
3888 * check which entries on it are already locally flushed. Those we can report
3889 * as having been flushed.
3890 *
3891 * *have_pending_txes is set to true if there are outstanding transactions
3892 * that need to be flushed.
3893 */
3894static void
3896 bool *have_pending_txes)
3897{
3898 dlist_mutable_iter iter;
3899 XLogRecPtr local_flush = GetFlushRecPtr(NULL);
3900
3902 *flush = InvalidXLogRecPtr;
3903
3905 {
3906 FlushPosition *pos =
3907 dlist_container(FlushPosition, node, iter.cur);
3908
3909 *write = pos->remote_end;
3910
3911 if (pos->local_end <= local_flush)
3912 {
3913 *flush = pos->remote_end;
3914 dlist_delete(iter.cur);
3915 pfree(pos);
3916 }
3917 else
3918 {
3919 /*
3920 * Don't want to uselessly iterate over the rest of the list which
3921 * could potentially be long. Instead get the last element and
3922 * grab the write position from there.
3923 */
3925 &lsn_mapping);
3926 *write = pos->remote_end;
3927 *have_pending_txes = true;
3928 return;
3929 }
3930 }
3931
3932 *have_pending_txes = !dlist_is_empty(&lsn_mapping);
3933}
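
The bookkeeping described above get_flush_position() can be illustrated with a small, self-contained sketch. It is not the worker's implementation: a fixed-size array replaces the dlist, memory management is omitted, and `Lsn`, `LsnPair`, `LsnQueue`, `queue_push` and `queue_report` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr */

typedef struct LsnPair
{
    Lsn         local_end;      /* end of the commit in local WAL */
    Lsn         remote_end;     /* end of the same commit on the publisher */
} LsnPair;

#define QUEUE_CAP 16

typedef struct LsnQueue
{
    LsnPair     items[QUEUE_CAP];
    size_t      head;           /* oldest entry not yet reported as flushed */
    size_t      tail;           /* one past the newest entry */
} LsnQueue;

/* Remember one local/remote pair, in commit order. */
static void
queue_push(LsnQueue *q, Lsn local_end, Lsn remote_end)
{
    assert(q->tail < QUEUE_CAP);
    q->items[q->tail].local_end = local_end;
    q->items[q->tail].remote_end = remote_end;
    q->tail++;
}

/*
 * Compute the write and flush positions to report, given how far local
 * WAL has been flushed.  Entries that are flushed are consumed.  Returns
 * true if some entries are still waiting for a local flush.
 */
static bool
queue_report(LsnQueue *q, Lsn local_flush, Lsn *write, Lsn *flush)
{
    *write = 0;
    *flush = 0;

    while (q->head < q->tail)
    {
        LsnPair    *pos = &q->items[q->head];

        *write = pos->remote_end;

        if (pos->local_end <= local_flush)
        {
            *flush = pos->remote_end;
            q->head++;
        }
        else
        {
            /* Skip the rest; the newest entry gives the write position. */
            *write = q->items[q->tail - 1].remote_end;
            return true;
        }
    }

    return false;
}
```

With three commits queued and the local flush pointer between the second and third, the sketch reports the third commit's remote LSN as written and the second's as flushed.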
3934
3935/*
3936 * Store current remote/local lsn pair in the tracking list.
3937 */
3938void
3940{
3941 FlushPosition *flushpos;
3942
3943 /*
3944 * Skip for parallel apply workers, because the lsn_mapping is maintained
3945 * by the leader apply worker.
3946 */
3948 return;
3949
3950 /* Need to do this in permanent context */
3952
3953 /* Track commit lsn */
3954 flushpos = (FlushPosition *) palloc(sizeof(FlushPosition));
3955 flushpos->local_end = local_lsn;
3956 flushpos->remote_end = remote_lsn;
3957
3958 dlist_push_tail(&lsn_mapping, &flushpos->node);
3960}
3961
3962
3963/* Update statistics of the worker. */
3964static void
3965UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
3966{
3967 MyLogicalRepWorker->last_lsn = last_lsn;
3970 if (reply)
3971 {
3972 MyLogicalRepWorker->reply_lsn = last_lsn;
3973 MyLogicalRepWorker->reply_time = send_time;
3974 }
3975}
3976
3977/*
3978 * Apply main loop.
3979 */
3980static void
3982{
3983 TimestampTz last_recv_timestamp = GetCurrentTimestamp();
3984 bool ping_sent = false;
3985 TimeLineID tli;
3986 ErrorContextCallback errcallback;
3987 RetainDeadTuplesData rdt_data = {0};
3988
3989 /*
3990 * Init the ApplyMessageContext which we clean up after each replication
3991 * protocol message.
3992 */
3994 "ApplyMessageContext",
3996
3997 /*
3998 * This memory context is used for per-stream data when the streaming mode
3999 * is enabled. This context is reset on each stream stop.
4000 */
4002 "LogicalStreamingContext",
4004
4005 /* mark as idle, before starting to loop */
4007
4008 /*
4009 * Push apply error context callback. Fields will be filled while applying
4010 * a change.
4011 */
4012 errcallback.callback = apply_error_callback;
4013 errcallback.previous = error_context_stack;
4014 error_context_stack = &errcallback;
4016
4017 /* This outer loop iterates once per wait. */
4018 for (;;)
4019 {
4021 int rc;
4022 int len;
4023 char *buf = NULL;
4024 bool endofstream = false;
4025 long wait_time;
4026
4028
4030
4032
4033 if (len != 0)
4034 {
4035 /* Loop to process all available data (without blocking). */
4036 for (;;)
4037 {
4039
4040 if (len == 0)
4041 {
4042 break;
4043 }
4044 else if (len < 0)
4045 {
4046 ereport(LOG,
4047 (errmsg("data stream from publisher has ended")));
4048 endofstream = true;
4049 break;
4050 }
4051 else
4052 {
4053 int c;
4055
4057 {
4058 ConfigReloadPending = false;
4060 }
4061
4062 /* Reset timeout. */
4063 last_recv_timestamp = GetCurrentTimestamp();
4064 ping_sent = false;
4065
4066 rdt_data.last_recv_time = last_recv_timestamp;
4067
4068 /* Ensure we are reading the data into our memory context. */
4070
4072
4073 c = pq_getmsgbyte(&s);
4074
4075 if (c == PqReplMsg_WALData)
4076 {
4077 XLogRecPtr start_lsn;
4078 XLogRecPtr end_lsn;
4079 TimestampTz send_time;
4080
4081 start_lsn = pq_getmsgint64(&s);
4082 end_lsn = pq_getmsgint64(&s);
4083 send_time = pq_getmsgint64(&s);
4084
4085 if (last_received < start_lsn)
4086 last_received = start_lsn;
4087
4088 if (last_received < end_lsn)
4089 last_received = end_lsn;
4090
4091 UpdateWorkerStats(last_received, send_time, false);
4092
4093 apply_dispatch(&s);
4094
4095 maybe_advance_nonremovable_xid(&rdt_data, false);
4096 }
4097 else if (c == PqReplMsg_Keepalive)
4098 {
4099 XLogRecPtr end_lsn;
4101 bool reply_requested;
4102
4103 end_lsn = pq_getmsgint64(&s);
4105 reply_requested = pq_getmsgbyte(&s);
4106
4107 if (last_received < end_lsn)
4108 last_received = end_lsn;
4109
4110 send_feedback(last_received, reply_requested, false);
4111
4112 maybe_advance_nonremovable_xid(&rdt_data, false);
4113
4114 UpdateWorkerStats(last_received, timestamp, true);
4115 }
4116 else if (c == PqReplMsg_PrimaryStatusUpdate)
4117 {
4118 rdt_data.remote_lsn = pq_getmsgint64(&s);
4121 rdt_data.reply_time = pq_getmsgint64(&s);
4122
4123 /*
4124 * This should never happen, see
4125 * ProcessStandbyPSRequestMessage. But if it happens
4126 * due to a bug, we don't want to proceed as it can
4127 * incorrectly advance oldest_nonremovable_xid.
4128 */
4129 if (!XLogRecPtrIsValid(rdt_data.remote_lsn))
4130 elog(ERROR, "cannot get the latest WAL position from the publisher");
4131
4132 maybe_advance_nonremovable_xid(&rdt_data, true);
4133
4134 UpdateWorkerStats(last_received, rdt_data.reply_time, false);
4135 }
4136 /* other message types are purposefully ignored */
4137
4139 }
4140
4142 }
4143 }
4144
4145 /* confirm all writes so far */
4146 send_feedback(last_received, false, false);
4147
4148 /* Reset the timestamp if no message was received */
4149 rdt_data.last_recv_time = 0;
4150
4151 maybe_advance_nonremovable_xid(&rdt_data, false);
4152
4154 {
4155 /*
4156 * If we didn't get any transactions for a while, there might be
4157 * unconsumed invalidation messages in the queue; consume them
4158 * now.
4159 */
4162
4163 /*
4164 * Process any relations that are being synchronized in parallel
4165 * and any newly added tables or sequences.
4166 */
4167 ProcessSyncingRelations(last_received);
4168 }
4169
4170 /* Cleanup the memory. */
4173
4174 /* Check if we need to exit the streaming loop. */
4175 if (endofstream)
4176 break;
4177
4178 /*
4179 * Wait for more data or latch. If we have unflushed transactions,
4180 * wake up after WalWriterDelay to see if they've been flushed yet (in
4181 * which case we should send a feedback message). Otherwise, there's
4182 * no particular urgency about waking up unless we get data or a
4183 * signal.
4184 */
4186 wait_time = WalWriterDelay;
4187 else
4188 wait_time = NAPTIME_PER_CYCLE;
4189
4190 /*
4191 * Make sure we wake up when it's possible to advance the non-removable
4192 * transaction ID, or when the retention duration may have exceeded
4193 * max_retention_duration.
4194 */
4196 {
4197 if (rdt_data.phase == RDT_GET_CANDIDATE_XID &&
4198 rdt_data.xid_advance_interval)
4199 wait_time = Min(wait_time, rdt_data.xid_advance_interval);
4200 else if (MySubscription->maxretention > 0)
4201 wait_time = Min(wait_time, MySubscription->maxretention);
4202 }
4203
4207 fd, wait_time,
4208 WAIT_EVENT_LOGICAL_APPLY_MAIN);
4209
4210 if (rc & WL_LATCH_SET)
4211 {
4214 }
4215
4217 {
4218 ConfigReloadPending = false;
4220 }
4221
4222 if (rc & WL_TIMEOUT)
4223 {
4224 /*
4225 * We didn't receive anything new. If we haven't heard anything
4226 * from the server for more than wal_receiver_timeout / 2, ping
4227 * the server. Also, if it's been longer than
4228 * wal_receiver_status_interval since the last update we sent,
4229 * send a status update to the primary anyway, to report any
4230 * progress in applying WAL.
4231 */
4232 bool requestReply = false;
4233
4234 /*
4235 * Check if time since last receive from primary has reached the
4236 * configured limit.
4237 */
4238 if (wal_receiver_timeout > 0)
4239 {
4241 TimestampTz timeout;
4242
4243 timeout =
4244 TimestampTzPlusMilliseconds(last_recv_timestamp,
4246
4247 if (now >= timeout)
4248 ereport(ERROR,
4249 (errcode(ERRCODE_CONNECTION_FAILURE),
4250 errmsg("terminating logical replication worker due to timeout")));
4251
4252 /* Check to see if it's time for a ping. */
4253 if (!ping_sent)
4254 {
4255 timeout = TimestampTzPlusMilliseconds(last_recv_timestamp,
4256 (wal_receiver_timeout / 2));
4257 if (now >= timeout)
4258 {
4259 requestReply = true;
4260 ping_sent = true;
4261 }
4262 }
4263 }
4264
4265 send_feedback(last_received, requestReply, requestReply);
4266
4267 maybe_advance_nonremovable_xid(&rdt_data, false);
4268
4269 /*
4270 * Force reporting to ensure long idle periods don't lead to
4271 * arbitrarily delayed stats. Stats can only be reported outside
4272 * of (implicit or explicit) transactions. That shouldn't lead to
4273 * stats being delayed for long, because transactions are either
4274 * sent as a whole on commit or streamed. Streamed transactions
4275 * are spilled to disk and applied on commit.
4276 */
4277 if (!IsTransactionState())
4278 pgstat_report_stat(true);
4279 }
4280 }
4281
4282 /* Pop the error context stack */
4283 error_context_stack = errcallback.previous;
4285
4286 /* All done */
4288}
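
The WL_TIMEOUT branch of the main loop combines two checks: give up once nothing has arrived for the full timeout, and send a single ping once half of it has elapsed. The sketch below isolates that decision. It is only an illustration under simplifying assumptions: times are plain millisecond counters rather than TimestampTz values, and `IdleAction` and `idle_action` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum IdleAction
{
    IDLE_NOTHING,               /* keep waiting */
    IDLE_PING,                  /* request a reply from the server */
    IDLE_TIMEOUT                /* connection is considered dead */
} IdleAction;

/*
 * Decide what to do after a wait timed out.  timeout_ms <= 0 disables
 * both checks.  *ping_sent is set when a ping is requested so that only
 * one ping is sent per idle period; the caller resets it whenever data
 * arrives.
 */
static IdleAction
idle_action(int64_t now_ms, int64_t last_recv_ms, int timeout_ms,
            bool *ping_sent)
{
    if (timeout_ms <= 0)
        return IDLE_NOTHING;

    if (now_ms >= last_recv_ms + timeout_ms)
        return IDLE_TIMEOUT;

    if (!*ping_sent && now_ms >= last_recv_ms + timeout_ms / 2)
    {
        *ping_sent = true;
        return IDLE_PING;
    }

    return IDLE_NOTHING;
}
```

In the real loop the timeout case raises an error, and the ping case sets requestReply before calling send_feedback().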
4289
4290/*
4291 * Send a Standby Status Update message to server.
4292 *
4293 * 'recvpos' is the latest LSN up to which we have received data; 'force'
4294 * is set if we need to send a response to avoid timeouts.
4295 */
4296static void
4297send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)
4298{
4299 static StringInfo reply_message = NULL;
4300 static TimestampTz send_time = 0;
4301
4302 static XLogRecPtr last_recvpos = InvalidXLogRecPtr;
4303 static XLogRecPtr last_writepos = InvalidXLogRecPtr;
4304
4305 XLogRecPtr writepos;
4306 XLogRecPtr flushpos;
4308 bool have_pending_txes;
4309
4310 /*
4311 * If the user doesn't want status to be reported to the publisher, be
4312 * sure to exit before doing anything at all.
4313 */
4314 if (!force && wal_receiver_status_interval <= 0)
4315 return;
4316
4317 /* It's legal to not pass a recvpos */
4318 if (recvpos < last_recvpos)
4319 recvpos = last_recvpos;
4320
4321 get_flush_position(&writepos, &flushpos, &have_pending_txes);
4322
4323 /*
4324 * With no outstanding transactions to flush, we can report the latest
4325 * received position. This is important for synchronous replication.
4326 */
4327 if (!have_pending_txes)
4328 flushpos = writepos = recvpos;
4329
4330 if (writepos < last_writepos)
4331 writepos = last_writepos;
4332
4333 if (flushpos < last_flushpos)
4334 flushpos = last_flushpos;
4335
4337
4338 /* if we've already reported everything we're good */
4339 if (!force &&
4340 writepos == last_writepos &&
4341 flushpos == last_flushpos &&
4342 !TimestampDifferenceExceeds(send_time, now,
4344 return;
4345 send_time = now;
4346
4347 if (!reply_message)
4348 {
4350
4352 MemoryContextSwitchTo(oldctx);
4353 }
4354 else
4356
4358 pq_sendint64(reply_message, recvpos); /* write */
4359 pq_sendint64(reply_message, flushpos); /* flush */
4360 pq_sendint64(reply_message, writepos); /* apply */
4361 pq_sendint64(reply_message, now); /* sendTime */
4362 pq_sendbyte(reply_message, requestReply); /* replyRequested */
4363
4364 elog(DEBUG2, "sending feedback (force %d) to recv %X/%08X, write %X/%08X, flush %X/%08X",
4365 force,
4366 LSN_FORMAT_ARGS(recvpos),
4367 LSN_FORMAT_ARGS(writepos),
4368 LSN_FORMAT_ARGS(flushpos));
4369
4372
4373 if (recvpos > last_recvpos)
4374 last_recvpos = recvpos;
4375 if (writepos > last_writepos)
4376 last_writepos = writepos;
4377 if (flushpos > last_flushpos)
4378 last_flushpos = flushpos;
4379}
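
send_feedback() never reports a position lower than one it has already reported, and it skips the message when nothing changed and no reply is due. A compact sketch of that clamping logic follows. It is an illustration only: `FeedbackState` and `prepare_feedback` are hypothetical names, the static variables of the real function are gathered into a struct, and the status-interval check is reduced to a boolean supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr */

typedef struct FeedbackState
{
    Lsn         last_recv;
    Lsn         last_write;
    Lsn         last_flush;
} FeedbackState;

/*
 * Clamp the positions so they never move backwards and decide whether a
 * message needs to be sent.  Returns true if feedback should be sent, in
 * which case the remembered positions are advanced.
 */
static bool
prepare_feedback(FeedbackState *st, bool force, bool have_pending_txes,
                 bool interval_elapsed, Lsn *recv, Lsn *write, Lsn *flush)
{
    if (*recv < st->last_recv)
        *recv = st->last_recv;

    /* Nothing left to flush: everything received counts as flushed. */
    if (!have_pending_txes)
        *flush = *write = *recv;

    if (*write < st->last_write)
        *write = st->last_write;
    if (*flush < st->last_flush)
        *flush = st->last_flush;

    /* Already reported everything and no reply is due. */
    if (!force &&
        *write == st->last_write &&
        *flush == st->last_flush &&
        !interval_elapsed)
        return false;

    if (*recv > st->last_recv)
        st->last_recv = *recv;
    if (*write > st->last_write)
        st->last_write = *write;
    if (*flush > st->last_flush)
        st->last_flush = *flush;

    return true;
}
```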
4380
4381/*
4382 * Attempt to advance the non-removable transaction ID.
4383 *
4384 * See comments atop worker.c for details.
4385 */
4386static void
4388 bool status_received)
4389{
4390 if (!can_advance_nonremovable_xid(rdt_data))
4391 return;
4392
4393 process_rdt_phase_transition(rdt_data, status_received);
4394}
4395
4396/*
4397 * Preliminary check to determine if advancing the non-removable transaction ID
4398 * is allowed.
4399 */
4400static bool
4402{
4403 /*
4404 * It is sufficient to manage non-removable transaction ID for a
4405 * subscription by the main apply worker to detect update_deleted reliably
4406 * even for table sync or parallel apply workers.
4407 */
4409 return false;
4410
4411 /* No need to advance if retaining dead tuples is not required */
4413 return false;
4414
4415 return true;
4416}
4417
4418/*
4419 * Process phase transitions during the non-removable transaction ID
4420 * advancement. See comments atop worker.c for details of the transition.
4421 */
4422static void
4424 bool status_received)
4425{
4426 switch (rdt_data->phase)
4427 {
4429 get_candidate_xid(rdt_data);
4430 break;
4432 request_publisher_status(rdt_data);
4433 break;
4435 wait_for_publisher_status(rdt_data, status_received);
4436 break;
4438 wait_for_local_flush(rdt_data);
4439 break;
4442 break;
4445 break;
4446 }
4447}
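
The phase handling above is a small state machine that is re-entered whenever something relevant happens. The sketch below models only the four phases whose handlers appear in this listing. The enum and field names are hypothetical, and two transitions are assumptions rather than facts taken from the listing: that a status reply showing unfinished remote transactions leads to a new status request, and that a completed local flush returns to the first phase.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum Phase
{
    PHASE_GET_CANDIDATE,        /* pick a candidate transaction ID */
    PHASE_REQUEST_STATUS,       /* ask the publisher for its status */
    PHASE_WAIT_STATUS,          /* wait for the publisher's reply */
    PHASE_WAIT_LOCAL_FLUSH      /* wait until remote changes are flushed */
} Phase;

typedef struct PhaseInput
{
    bool        candidate_found;    /* a newer candidate XID exists */
    bool        status_received;    /* publisher reply has arrived */
    bool        remote_xacts_done;  /* concurrent remote xacts finished */
    bool        local_flush_done;   /* local flush reached the remote LSN */
} PhaseInput;

/* Return the phase that follows 'cur' for the given observations. */
static Phase
next_phase(Phase cur, const PhaseInput *in)
{
    switch (cur)
    {
        case PHASE_GET_CANDIDATE:
            return in->candidate_found ? PHASE_REQUEST_STATUS : cur;

        case PHASE_REQUEST_STATUS:
            return PHASE_WAIT_STATUS;

        case PHASE_WAIT_STATUS:
            if (!in->status_received)
                return cur;
            /* Assumption: ask again while remote xacts are running. */
            return in->remote_xacts_done ?
                PHASE_WAIT_LOCAL_FLUSH : PHASE_REQUEST_STATUS;

        case PHASE_WAIT_LOCAL_FLUSH:
            /* Assumption: start a new cycle once the flush is done. */
            return in->local_flush_done ? PHASE_GET_CANDIDATE : cur;
    }

    return cur;
}
```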
4448
4449/*
4450 * Workhorse for the RDT_GET_CANDIDATE_XID phase.
4451 */
4452static void
4454{
4455 TransactionId oldest_running_xid;
4457
4458 /*
4459 * Use last_recv_time when applying changes in the loop to avoid
4460 * unnecessary system time retrieval. If last_recv_time is not available,
4461 * obtain the current timestamp.
4462 */
4463 now = rdt_data->last_recv_time ? rdt_data->last_recv_time : GetCurrentTimestamp();
4464
4465 /*
4466 * Compute the candidate_xid and request the publisher status at most once
4467 * per xid_advance_interval. Refer to adjust_xid_advance_interval() for
4468 * details on how this value is dynamically adjusted. This is to avoid
4469 * using CPU and network resources without making much progress.
4470 */
4472 rdt_data->xid_advance_interval))
4473 return;
4474
4475 /*
4476 * Immediately update the timer, even if the function returns later
4477 * without setting candidate_xid due to inactivity on the subscriber. This
4478 * avoids frequent calls to GetOldestActiveTransactionId.
4479 */
4480 rdt_data->candidate_xid_time = now;
4481
4482 /*
4483 * Consider transactions in the current database, as only dead tuples from
4484 * this database are required for conflict detection.
4485 */
4486 oldest_running_xid = GetOldestActiveTransactionId(false, false);
4487
4488 /*
4489 * Oldest active transaction ID (oldest_running_xid) can't be behind any
4490 * of its previously computed values.
4491 */
4493 oldest_running_xid));
4494
4495 /* Return if the oldest_nonremovable_xid cannot be advanced */
4497 oldest_running_xid))
4498 {
4499 adjust_xid_advance_interval(rdt_data, false);
4500 return;
4501 }
4502
4503 adjust_xid_advance_interval(rdt_data, true);
4504
4505 rdt_data->candidate_xid = oldest_running_xid;
4507
4508 /* process the next phase */
4509 process_rdt_phase_transition(rdt_data, false);
4510}
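
The comments in the function above describe a throttle: the expensive work runs at most once per interval, and the timer is updated as soon as the interval check passes, even if the function then returns early. A minimal sketch of that pattern follows. `Throttle` and `throttle_try` are hypothetical names, timestamps are plain microsecond counters, and the dynamic adjustment of the interval is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct Throttle
{
    int64_t     last_us;        /* time of the last attempt, microseconds */
    int         interval_ms;    /* minimum spacing between attempts */
} Throttle;

/*
 * Return true if enough time has passed since the last attempt.  The
 * timer is updated immediately, so a caller that bails out afterwards
 * still waits a full interval before trying again.
 */
static bool
throttle_try(Throttle *t, int64_t now_us)
{
    if (now_us - t->last_us < (int64_t) t->interval_ms * 1000)
        return false;

    t->last_us = now_us;
    return true;
}
```

Updating the timer before doing the work is what keeps repeated early returns from turning into a busy loop of expensive calls.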
4511
4512/*
4513 * Workhorse for the RDT_REQUEST_PUBLISHER_STATUS phase.
4514 */
4515static void
4517{
4518 static StringInfo request_message = NULL;
4519
4520 if (!request_message)
4521 {
4523
4524 request_message = makeStringInfo();
4525 MemoryContextSwitchTo(oldctx);
4526 }
4527 else
4528 resetStringInfo(request_message);
4529
4530 /*
4531 * Send the current time to update the remote walsender's latest reply
4532 * message received time.
4533 */
4535 pq_sendint64(request_message, GetCurrentTimestamp());
4536
4537 elog(DEBUG2, "sending publisher status request message");
4538
4539 /* Send a request for the publisher status */
4541 request_message->data, request_message->len);
4542
4544
4545 /*
4546 * Skip calling maybe_advance_nonremovable_xid() since further transition
4547 * is possible only once we receive the publisher status message.
4548 */
4549}
4550
4551/*
4552 * Workhorse for the RDT_WAIT_FOR_PUBLISHER_STATUS phase.
4553 */
4554static void
4556 bool status_received)
4557{
4558 /*
4559 * Return if we have requested but not yet received the publisher status.
4560 */
4561 if (!status_received)
4562 return;
4563
4564 /*
4565 * We don't need to maintain oldest_nonremovable_xid if we decide to stop
4566 * retaining conflict information for this worker.
4567 */
4569 {
4571 return;
4572 }
4573
4575 rdt_data->remote_wait_for = rdt_data->remote_nextxid;
4576
4577 /*
4578 * Check if all remote concurrent transactions that were active at the
4579 * first status request have now completed. If completed, proceed to the
4580 * next phase; otherwise, continue checking the publisher status until
4581 * these transactions finish.
4582 *
4583 * It's possible that transactions in the commit phase during the last
4584 * cycle have now finished committing, but remote_oldestxid remains older
4585 * than remote_wait_for. This can happen if some old transaction entered
4586 * the commit phase when we requested status in this cycle. We do not
4587 * handle this case explicitly as it's rare and the benefit doesn't
4588 * justify the required complexity. Tracking would require either caching
4589 * all xids at the publisher or sending them to subscribers. The condition
4590 * will resolve naturally once the remaining transactions are finished.
4591 *
4592 * Directly advancing the non-removable transaction ID is possible if
4593 * there are no activities on the publisher since the last advancement
4594 * cycle. However, it requires maintaining two fields, last_remote_nextxid
4595 * and last_remote_lsn, within the structure for comparison with the
4596 * current cycle's values. Considering the minimal cost of continuing in
4597 * RDT_WAIT_FOR_LOCAL_FLUSH without awaiting changes, we opted not to
4598 * advance the transaction ID here.
4599 */
4601 rdt_data->remote_oldestxid))
4602 rdt_data->phase = RDT_WAIT_FOR_LOCAL_FLUSH;
4603 else
4605
4606 /* process the next phase */
4607 process_rdt_phase_transition(rdt_data, false);
4608}
4609
4610/*
4611 * Workhorse for the RDT_WAIT_FOR_LOCAL_FLUSH phase.
4612 */
4613static void
4615{
4616 Assert(XLogRecPtrIsValid(rdt_data->remote_lsn) &&
4618
4619 /*
4620 * We expect the publisher and subscriber clocks to be kept in sync by a
4621 * time synchronization service such as NTP. Otherwise, we will advance this
4622 * worker's oldest_nonremovable_xid prematurely, leading to the removal of rows
4623 * required to detect update_deleted reliably. This check primarily
4624 * addresses scenarios where the publisher's clock falls behind; if the
4625 * publisher's clock is ahead, subsequent transactions will naturally bear
4626 * later commit timestamps, conforming to the design outlined atop
4627 * worker.c.
4628 *
4629 * XXX Consider waiting for the publisher's clock to catch up with the
4630 * subscriber's before proceeding to the next phase.
4631 */
4633 rdt_data->candidate_xid_time, 0))
4634 ereport(ERROR,
4635 errmsg_internal("oldest_nonremovable_xid transaction ID could be advanced prematurely"),
4636 errdetail_internal("The clock on the publisher is behind that of the subscriber."));
4637
4638 /*
4639 * Do not attempt to advance the non-removable transaction ID when table
4640 * sync is in progress. During this time, changes from a single
4641 * transaction may be applied by multiple table sync workers corresponding
4642 * to the target tables. So, it's necessary for all table sync workers to
4643 * apply and flush the corresponding changes before advancing the
4644 * transaction ID; otherwise, dead tuples that are still needed for
4645 * conflict detection in table sync workers could be removed prematurely.
4646 * However, confirming the apply and flush progress across all table sync
4647 * workers is complex and not worth the effort, so we simply return if not
4648 * all tables are in the READY state.
4649 *
4650 * Advancing the transaction ID is necessary even when no tables are
4651 * currently subscribed, to avoid retaining dead tuples unnecessarily.
4652 * While it might seem safe to skip all phases and directly assign
4653 * candidate_xid to oldest_nonremovable_xid during the
4654 * RDT_GET_CANDIDATE_XID phase in such cases, this is unsafe. If users
4655 * concurrently add tables to the subscription, the apply worker may not
4656 * process invalidations in time. Consequently,
4657 * HasSubscriptionTablesCached() might miss the new tables, leading to
4658 * premature advancement of oldest_nonremovable_xid.
4659 *
4660 * Performing the check during RDT_WAIT_FOR_LOCAL_FLUSH is safe, as
4661 * invalidations are guaranteed to be processed before applying changes
4662 * from newly added tables while waiting for the local flush to reach
4663 * remote_lsn.
4664 *
4665 * Additionally, even if we check for subscription tables during
4666 * RDT_GET_CANDIDATE_XID, they might be dropped before reaching
4667 * RDT_WAIT_FOR_LOCAL_FLUSH. Therefore, it's still necessary to verify
4668 * subscription tables at this stage to prevent unnecessary tuple
4669 * retention.
4670 */
4672 {
4674
4675 now = rdt_data->last_recv_time
4676 ? rdt_data->last_recv_time : GetCurrentTimestamp();
4677
4678 /*
4679 * Record the time spent waiting for table sync; it is needed for the
4680 * timeout check in should_stop_conflict_info_retention().
4681 */
4682 rdt_data->table_sync_wait_time =
4684
4685 return;
4686 }
4687
4688 /*
4689 * We don't need to maintain oldest_nonremovable_xid if we decide to stop
4690 * retaining conflict information for this worker.
4691 */
4693 {
4695 return;
4696 }
4697
4698 /*
4699 * Update and check the remote flush position if we are applying changes
4700 * in a loop. This is done at most once per WalWriterDelay to avoid
4701 * performing costly operations in get_flush_position() too frequently
4702 * during change application.
4703 */
4704 if (last_flushpos < rdt_data->remote_lsn && rdt_data->last_recv_time &&
4706 rdt_data->last_recv_time, WalWriterDelay))
4707 {
4708 XLogRecPtr writepos;
4709 XLogRecPtr flushpos;
4710 bool have_pending_txes;
4711
4712 /* Fetch the latest remote flush position */
4713 get_flush_position(&writepos, &flushpos, &have_pending_txes);
4714
4715 if (flushpos > last_flushpos)
4716 last_flushpos = flushpos;
4717
4718 rdt_data->flushpos_update_time = rdt_data->last_recv_time;
4719 }
4720
4721 /* Return to wait for the changes to be applied */
4722 if (last_flushpos < rdt_data->remote_lsn)
4723 return;
4724
4725 /*
4726 * Reaching this point implies should_stop_conflict_info_retention()
4727 * returned false earlier, meaning that the most recent duration for
4728 * advancing the non-removable transaction ID is within the
4729 * max_retention_duration or max_retention_duration is set to 0.
4730 *
4731 * Therefore, if conflict info retention was previously stopped due to a
4732 * timeout, it is now safe to resume retention.
4733 */
4735 {
4737 return;
4738 }
4739
4740 /*
4741 * Reaching here means the remote WAL position has been received, and all
4742 * transactions up to that position on the publisher have been applied and
4743 * flushed locally. So, we can advance the non-removable transaction ID.
4744 */
4748
4749 elog(DEBUG2, "confirmed flush up to remote lsn %X/%08X: new oldest_nonremovable_xid %u",
4750 LSN_FORMAT_ARGS(rdt_data->remote_lsn),
4751 rdt_data->candidate_xid);
4752
4753 /* Notify launcher to update the xmin of the conflict slot */
4755
4757
4758 /* process the next phase */
4759 process_rdt_phase_transition(rdt_data, false);
4760}
4761
4762/*
4763 * Check whether conflict information retention should be stopped due to
4764 * exceeding the maximum wait time (max_retention_duration).
4765 *
4766 * If retention should be stopped, return true. Otherwise, return false.
4767 */
4768static bool
4770{
4772
4775 rdt_data->phase == RDT_WAIT_FOR_LOCAL_FLUSH);
4776
4778 return false;
4779
4780 /*
4781 * Use last_recv_time when applying changes in the loop to avoid
4782 * unnecessary system time retrieval. If last_recv_time is not available,
4783 * obtain the current timestamp.
4784 */
4785 now = rdt_data->last_recv_time ? rdt_data->last_recv_time : GetCurrentTimestamp();
4786
4787 /*
4788 * Return early if the wait time has not exceeded the configured maximum
4789 * (max_retention_duration). Time spent waiting for table synchronization
4790 * is excluded from this calculation, as it occurs infrequently.
4791 */
4794 rdt_data->table_sync_wait_time))
4795 return false;
4796
4797 return true;
4798}
4799
4800/*
4801 * Workhorse for the RDT_STOP_CONFLICT_INFO_RETENTION phase.
4802 */
4803static void
4805{
4806 /* Stop retention if it has not been stopped yet */
4808 {
4809 /*
4810 * If the retention status cannot be updated (e.g., due to an active
4811 * transaction), skip further processing to avoid inconsistent
4812 * retention behavior.
4813 */
4814 if (!update_retention_status(false))
4815 return;
4816
4820
4821 ereport(LOG,
4822 errmsg("logical replication worker for subscription \"%s\" has stopped retaining the information for detecting conflicts",
4824 errdetail("Retention is stopped because the apply process has not caught up with the publisher within the configured max_retention_duration."));
4825 }
4826
4828
4829 /*
4830 * If retention has been stopped, reset to the initial phase to retry
4831 * resuming retention. This reset is required to recalculate the current
4832 * wait time and resume retention if the time falls within
4833 * max_retention_duration.
4834 */
4836}
4837
4838/*
4839 * Workhorse for the RDT_RESUME_CONFLICT_INFO_RETENTION phase.
4840 */
4841static void
4843{
4844 /* We can't resume retention without updating retention status. */
4845 if (!update_retention_status(true))
4846 return;
4847
4848 ereport(LOG,
4849 errmsg("logical replication worker for subscription \"%s\" will resume retaining the information for detecting conflicts",
4852 ? errdetail("Retention is re-enabled because the apply process has caught up with the publisher within the configured max_retention_duration.")
4853 : errdetail("Retention is re-enabled because max_retention_duration has been set to unlimited."));
4854
4855 /*
4856 * Restart the worker to let the launcher initialize
4857 * oldest_nonremovable_xid at startup.
4858 *
4859 * While it's technically possible to derive this value on-the-fly using
4860 * the conflict detection slot's xmin, doing so risks a race condition:
4861 * the launcher might clean slot.xmin just after retention resumes. This
4862 * would make oldest_nonremovable_xid unreliable, especially during xid
4863 * wraparound.
4864 *
4865 * Although this can be prevented by introducing heavyweight locking, the
4866 * complexity it would bring doesn't seem worthwhile given how rarely
4867 * retention is resumed.
4868 */
4870}
4871
4872/*
4873 * Updates pg_subscription.subretentionactive to the given value within a
4874 * new transaction.
4875 *
4876 * If already inside an active transaction, skips the update and returns
4877 * false.
4878 *
4879 * Returns true if the update is successfully performed.
4880 */
4881static bool
4883{
4884 /*
4885 * Do not update the catalog during an active transaction. The transaction
4886 * may be started during change application, leading to a possible
4887 * rollback of catalog updates if the application fails subsequently.
4888 */
4889 if (IsTransactionState())
4890 return false;
4891
4893
4894 /*
4895 * Updating pg_subscription might involve TOAST table access, so ensure we
4896 * have a valid snapshot.
4897 */
4899
4900 /* Update pg_subscription.subretentionactive */
4902
4905
4906 /* Notify launcher to update the conflict slot */
4908
4910
4911 return true;
4912}
4913
4914/*
4915 * Reset all data fields of RetainDeadTuplesData except those used to
4916 * determine the timing for the next round of transaction ID advancement. We
4917 * can even use flushpos_update_time in the next round to decide whether to get
4918 * the latest flush position.
4919 */
4920static void
4922{
4923 rdt_data->phase = RDT_GET_CANDIDATE_XID;
4924 rdt_data->remote_lsn = InvalidXLogRecPtr;
4927 rdt_data->reply_time = 0;
4930 rdt_data->table_sync_wait_time = 0;
4931}
4932
4933/*
4934 * Adjust the interval for advancing non-removable transaction IDs.
4935 *
4936 * If there is no activity on the node or retention has been stopped, we
4937 * progressively double the interval used to advance the non-removable
4938 * transaction ID. This helps conserve CPU and network resources when
4939 * there's little benefit to frequent updates.
4940 *
4941 * The interval is capped by the lowest of the following:
4942 * - wal_receiver_status_interval (if set and retention is active),
4943 * - a default maximum of 3 minutes,
4944 * - max_retention_duration (if retention is active).
4945 *
4946 * This ensures the interval never exceeds the retention boundary, even if other
4947 * limits are higher. Once activity resumes on the node and retention is
4948 * active, the interval is reset to the lesser of 100ms and max_retention_duration,
4949 * allowing timely advancement of the non-removable transaction ID.
4950 *
4951 * XXX The use of wal_receiver_status_interval is a bit arbitrary, so we can
4952 * consider another interval or a separate GUC if the need arises.
4953 */
4954static void
4956{
4957 if (rdt_data->xid_advance_interval && !new_xid_found)
4958 {
4959 int max_interval = wal_receiver_status_interval
4962
4963 /*
4964 * No new transaction ID has been assigned since the last check, so
4965 * double the interval, but not beyond the maximum allowable value.
4966 */
4967 rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval * 2,
4968 max_interval);
4969 }
4970 else if (rdt_data->xid_advance_interval &&
4972 {
4973 /*
4974 * Retention has been stopped, so double the interval, capped at a
4975 * maximum of 3 minutes. The wal_receiver_status_interval is
4976 * intentionally not used as an upper bound, since the likelihood of
4977 * retention resuming is lower than that of general activity resuming.
4978 */
4979 rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval * 2,
4981 }
4982 else
4983 {
4984 /*
4985 * A new transaction ID was found or the interval is not yet
4986 * initialized, so set the interval to the minimum value.
4987 */
4989 }
4990
4991 /*
4992 * Ensure the wait time remains within the maximum retention time limit
4993 * when retention is active.
4994 */
4996 rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval,
4998}
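/*
 * Illustrative sketch, not part of worker.c. The backoff policy described in
 * the header comment of the function above (double the interval while idle or
 * while retention is stopped, reset to the minimum on activity, and never
 * exceed the retention limit while retention is active) can be modeled as a
 * pure function. The function name, parameter list, and the two constants
 * (100 ms minimum and 3 minute maximum, taken from the comments) are
 * assumptions for the example; the status interval is taken in seconds, as
 * wal_receiver_status_interval is.
 */

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MIN_INTERVAL_MS 100      /* "100ms" from the comment above */
#define SKETCH_MAX_INTERVAL_MS 180000   /* "3 minutes" from the comment above */

static int
sketch_min(int a, int b)
{
    return (a < b) ? a : b;
}

/*
 * Compute the next interval in milliseconds. current_ms == 0 means "not yet
 * initialized"; status_interval_s == 0 and max_retention_ms == 0 mean "unset".
 */
static int
sketch_next_interval(int current_ms, bool new_xid_found, bool retention_active,
                     int status_interval_s, int max_retention_ms)
{
    int next;

    assert(current_ms >= 0 && current_ms <= SKETCH_MAX_INTERVAL_MS);

    if (current_ms > 0 && !new_xid_found)
    {
        /* Idle: double, capped by the lower of the two upper bounds. */
        int cap = SKETCH_MAX_INTERVAL_MS;

        if (status_interval_s > 0 &&
            status_interval_s < SKETCH_MAX_INTERVAL_MS / 1000)
            cap = status_interval_s * 1000;
        next = sketch_min(current_ms * 2, cap);
    }
    else if (current_ms > 0 && !retention_active)
    {
        /* Retention stopped: double, capped only by the default maximum. */
        next = sketch_min(current_ms * 2, SKETCH_MAX_INTERVAL_MS);
    }
    else
    {
        /* Activity seen, or first call: start from the minimum. */
        next = SKETCH_MIN_INTERVAL_MS;
    }

    /* Never wait longer than the retention limit while retention is active. */
    if (retention_active && max_retention_ms > 0)
        next = sketch_min(next, max_retention_ms);

    return next;
}
```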
4999
5000/*
5001 * Exit routine for apply workers due to subscription parameter changes.
5002 */
5003static void
5005{
5007 {
5008 /*
5009 * Don't stop the parallel apply worker as the leader will detect the
5010 * subscription parameter change and restart logical replication later
5011 * anyway. This also prevents the leader from reporting errors when
5012 * trying to communicate with a stopped parallel apply worker, which
5013 * would accidentally disable subscriptions if disable_on_error was
5014 * set.
5015 */
5016 return;
5017 }
5018
5019 /*
5020 * Reset the last-start time for this apply worker so that the launcher
5021 * will restart it without waiting for wal_retrieve_retry_interval if the
5022 * subscription is still active, and so that we won't leak that hash table
5023 * entry if it isn't.
5024 */
5027
5028 proc_exit(0);
5029}
5030
5031/*
5032 * Reread subscription info if needed.
5033 *
5034 * For significant changes, we react by exiting the current process; a new
5035 * one will be launched afterwards if needed.
5036 */
5037void
5039{
5040 MemoryContext oldctx;
5042 bool started_tx = false;
5043
5044 /* When cache state is valid there is nothing to do here. */
5046 return;
5047
5048 /* This function might be called inside or outside of transaction. */
5049 if (!IsTransactionState())
5050 {
5052 started_tx = true;
5053 }
5054
5055 /* Ensure allocations in permanent context. */
5057
5059
5060 /*
5061 * Exit if the subscription was removed. This normally should not happen
5062 * as the worker gets killed during DROP SUBSCRIPTION.
5063 */
5064 if (!newsub)
5065 {
5066 ereport(LOG,
5067 (errmsg("logical replication worker for subscription \"%s\" will stop because the subscription was removed",
5068 MySubscription->name)));
5069
5070 /* Ensure we remove no-longer-useful entry for worker's start time */
5073
5074 proc_exit(0);
5075 }
5076
5077 /* Exit if the subscription was disabled. */
5078 if (!newsub->enabled)
5079 {
5080 ereport(LOG,
5081 (errmsg("logical replication worker for subscription \"%s\" will stop because the subscription was disabled",
5082 MySubscription->name)));
5083
5085 }
5086
5087 /* !slotname should never happen when enabled is true. */
5088 Assert(newsub->slotname);
5089
5090 /* two-phase cannot be altered while the worker is running */
5091 Assert(newsub->twophasestate == MySubscription->twophasestate);
5092
5093 /*
5094 * Exit if any parameter that affects the remote connection was changed.
5095 * The launcher will start a new worker but note that the parallel apply
5096 * worker won't restart if the streaming option's value is changed from
5097 * 'parallel' to any other value or the server decides not to stream the
5098 * in-progress transaction.
5099 */
5100 if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
5101 strcmp(newsub->name, MySubscription->name) != 0 ||
5102 strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
5103 newsub->binary != MySubscription->binary ||
5104 newsub->stream != MySubscription->stream ||
5105 newsub->passwordrequired != MySubscription->passwordrequired ||
5106 strcmp(newsub->origin, MySubscription->origin) != 0 ||
5107 newsub->owner != MySubscription->owner ||
5108 !equal(newsub->publications, MySubscription->publications))
5109 {
5111 ereport(LOG,
5112 (errmsg("logical replication parallel apply worker for subscription \"%s\" will stop because of a parameter change",
5113 MySubscription->name)));
5114 else
5115 ereport(LOG,
5116 (errmsg("logical replication worker for subscription \"%s\" will restart because of a parameter change",
5117 MySubscription->name)));
5118
5120 }
5121
5122 /*
5123 * Exit if the subscription owner's superuser privileges have been
5124 * revoked.
5125 */
5126 if (!newsub->ownersuperuser && MySubscription->ownersuperuser)
5127 {
5129 ereport(LOG,
5130 errmsg("logical replication parallel apply worker for subscription \"%s\" will stop because the subscription owner's superuser privileges have been revoked",
5132 else
5133 ereport(LOG,
5134 errmsg("logical replication worker for subscription \"%s\" will restart because the subscription owner's superuser privileges have been revoked",
5136
5138 }
5139
5140 /* Also check for other changes that should never happen. */
5141 if (newsub->dbid != MySubscription->dbid)
5142 {
5143 elog(ERROR, "subscription %u changed unexpectedly",
5145 }
5146
5147 /* Clean old subscription info and switch to new one. */
5150
5151 MemoryContextSwitchTo(oldctx);
5152
5153 /* Change synchronous commit according to the user's wishes */
5154 SetConfigOption("synchronous_commit", MySubscription->synccommit,
5156
5157 if (started_tx)
5159
5160 MySubscriptionValid = true;
5161}
5162
5163/*
5164 * Callback from subscription syscache invalidation.
5165 */
5166static void
5167subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)
5168{
5169 MySubscriptionValid = false;
5170}
5171
5172/*
5173 * subxact_info_write
5174 * Store information about subxacts for a toplevel transaction.
5175 *
5176 * For each subxact we store the offset of its first change in the main file.
5177 * The file is always over-written as a whole.
5178 *
5179 * XXX We should only store subxacts that were not aborted yet.
5180 */
5181static void
5183{
5184 char path[MAXPGPATH];
5185 Size len;
5186 BufFile *fd;
5187
5189
5190 /* construct the subxact filename */
5191 subxact_filename(path, subid, xid);
5192
5193 /* Delete the subxacts file, if it exists. */
5194 if (subxact_data.nsubxacts == 0)
5195 {
5198
5199 return;
5200 }
5201
5202 /*
5203 * Create the subxact file if it is not already created; otherwise, open the
5204 * existing file.
5205 */
5207 true);
5208 if (fd == NULL)
5210
5212
5213 /* Write the subxact count and subxact info */
5216
5218
5219 /* free the memory allocated for subxact info */
5221}
5222
5223/*
5224 * subxact_info_read
5225 * Restore information about subxacts of a streamed transaction.
5226 *
5227 * Read information about subxacts into the structure subxact_data that can be
5228 * used later.
5229 */
5230static void
5232{
5233 char path[MAXPGPATH];
5234 Size len;
5235 BufFile *fd;
5236 MemoryContext oldctx;
5237
5241
5242 /*
5243 * If the subxact file doesn't exist that means we don't have any subxact
5244 * info.
5245 */
5246 subxact_filename(path, subid, xid);
5248 true);
5249 if (fd == NULL)
5250 return;
5251
5252 /* read number of subxact items */
5254
5256
5257 /* we keep the maximum as a power of 2 */
5259
5260 /*
5261 * Allocate subxact information in the logical streaming context. We need
5262 * this information during the complete stream so that we can add the
5263 * subtransaction info to it. On stream stop we will flush this information
5264 * to the subxact file and reset the logical streaming context.
5265 */
5268 sizeof(SubXactInfo));
5269 MemoryContextSwitchTo(oldctx);
5270
5271 if (len > 0)
5273
5275}
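/*
 * Illustrative sketch, not part of worker.c. The comment "we keep the maximum
 * as a power of 2" means the array capacity is rounded up from the number of
 * entries read from the file. One portable way to do that rounding is shown
 * below; the helper name is hypothetical and the loop is a stand-in for
 * whatever bit-manipulation helper the real code uses.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Return the smallest power of two that is >= n (and at least 1). */
static uint32_t
sketch_capacity_for(uint32_t n)
{
    uint32_t cap = 1;

    /* Larger inputs would overflow the shift below. */
    assert(n <= (UINT32_C(1) << 31));

    while (cap < n)
        cap <<= 1;

    return cap;
}
```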
5276
5277/*
5278 * subxact_info_add
5279 * Add information about a subxact (offset in the main file).
5280 */
5281static void
5283{
5284 SubXactInfo *subxacts = subxact_data.subxacts;
5285 int64 i;
5286
5287 /* We must have a valid top level stream xid and a stream fd. */
5289 Assert(stream_fd != NULL);
5290
5291 /*
5292 * If the XID matches the toplevel transaction, we don't want to add it.
5293 */
5294 if (stream_xid == xid)
5295 return;
5296
5297 /*
5298 * In most cases we're checking the same subxact as we've already seen in
5299 * the last call, so make sure to ignore it (this change comes later).
5300 */
5301 if (subxact_data.subxact_last == xid)
5302 return;
5303
5304 /* OK, remember we're processing this XID. */
5306
5307 /*
5308 * Check if the transaction is already present in the array of subxacts. We
5309 * intentionally scan the array from the tail, because we're likely adding
5310 * a change for the most recent subtransactions.
5311 *
5312 * XXX Can we rely on the subxact XIDs arriving in sorted order? That
5313 * would allow us to use binary search here.
5314 */
5315 for (i = subxact_data.nsubxacts; i > 0; i--)
5316 {
5317 /* found, so we're done */
5318 if (subxacts[i - 1].xid == xid)
5319 return;
5320 }
5321
5322 /* This is a new subxact, so we need to add it to the array. */
5323 if (subxact_data.nsubxacts == 0)
5324 {
5325 MemoryContext oldctx;
5326
5328
5329 /*
5330 * Allocate this memory for subxacts in per-stream context, see
5331 * subxact_info_read.
5332 */
5334 subxacts = palloc(subxact_data.nsubxacts_max * sizeof(SubXactInfo));
5335 MemoryContextSwitchTo(oldctx);
5336 }
5338 {
5340 subxacts = repalloc(subxacts,
5342 }
5343
5344 subxacts[subxact_data.nsubxacts].xid = xid;
5345
5346 /*
5347 * Get the current offset of the stream file and store it as the offset of
5348 * this subxact.
5349 */
5351 &subxacts[subxact_data.nsubxacts].fileno,
5352 &subxacts[subxact_data.nsubxacts].offset);
5353
5355 subxact_data.subxacts = subxacts;
5356}
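/*
 * Illustrative sketch, not part of worker.c. The bookkeeping done by the
 * function above (skip the toplevel XID, skip the XID seen in the previous
 * call, scan the array from the tail, and double the array when it is full)
 * can be reproduced with plain malloc/realloc as below. All type and function
 * names are hypothetical, the initial capacity is an arbitrary small value
 * chosen for the demo, and XID 0 is assumed to mean "none".
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_INITIAL_CAPACITY 4   /* hypothetical; kept small for the demo */

typedef struct SketchSubXact
{
    uint32_t xid;       /* subtransaction ID */
    int      fileno;    /* segment number within the changes file */
    long     offset;    /* offset of the subxact's first change */
} SketchSubXact;

typedef struct SketchSubXactArray
{
    SketchSubXact *items;
    uint32_t n;         /* entries in use */
    uint32_t max;       /* allocated entries */
    uint32_t last;      /* XID seen in the previous call, 0 if none */
} SketchSubXactArray;

/* Returns true only if a new entry was appended. */
static bool
sketch_subxact_add(SketchSubXactArray *a, uint32_t top_xid, uint32_t xid,
                   int fileno, long offset)
{
    uint32_t i;

    assert(a != NULL && xid != 0);

    /* Changes of the toplevel transaction are not tracked. */
    if (xid == top_xid)
        return false;

    /* Fast path: same subxact as in the previous call. */
    if (a->last == xid)
        return false;
    a->last = xid;

    /* Scan from the tail; recent subxacts are the most likely matches. */
    for (i = a->n; i > 0; i--)
    {
        if (a->items[i - 1].xid == xid)
            return false;
    }

    /* Grow by doubling when the array is full (or not yet allocated). */
    if (a->n == a->max)
    {
        uint32_t new_max = (a->max == 0) ? SKETCH_INITIAL_CAPACITY : a->max * 2;
        SketchSubXact *grown = realloc(a->items, new_max * sizeof(SketchSubXact));

        if (grown == NULL)
            abort();
        a->items = grown;
        a->max = new_max;
    }

    a->items[a->n].xid = xid;
    a->items[a->n].fileno = fileno;
    a->items[a->n].offset = offset;
    a->n++;

    return true;
}
```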
5357
5358/* format filename for file containing the info about subxacts */
5359static inline void
5360subxact_filename(char *path, Oid subid, TransactionId xid)
5361{
5362 snprintf(path, MAXPGPATH, "%u-%u.subxacts", subid, xid);
5363}
5364
5365/* format filename for file containing serialized changes */
5366static inline void
5367changes_filename(char *path, Oid subid, TransactionId xid)
5368{
5369 snprintf(path, MAXPGPATH, "%u-%u.changes", subid, xid);
5370}
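/*
 * Illustrative sketch, not part of worker.c. Because the subscription OID is
 * part of the name, two subscriptions that stream a remote transaction with
 * the same XID end up with different files. The standalone helper below uses
 * the same "%u-%u.changes" format; the helper name and the use of
 * unsigned int in place of Oid and TransactionId are assumptions for the
 * example.
 */

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<subid>-<xid>.changes" into path, truncating if size is too small. */
static void
sketch_changes_filename(char *path, size_t size, unsigned int subid,
                        unsigned int xid)
{
    assert(path != NULL && size > 0);
    snprintf(path, size, "%u-%u.changes", subid, xid);
}
```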
5371
5372/*
5373 * stream_cleanup_files
5374 * Cleanup files for a subscription / toplevel transaction.
5375 *
5376 * Remove files with serialized changes and subxact info for a particular
5377 * toplevel transaction. Each subscription has a separate set of files
5378 * for any toplevel transaction.
5379 */
5380void
5382{
5383 char path[MAXPGPATH];
5384
5385 /* Delete the changes file. */
5386 changes_filename(path, subid, xid);
5388
5389 /* Delete the subxact file, if it exists. */
5390 subxact_filename(path, subid, xid);
5392}
5393
5394/*
5395 * stream_open_file
5396 * Open a file that we'll use to serialize changes for a toplevel
5397 * transaction.
5398 *
5399 * Open a file for streamed changes from a toplevel transaction identified
5400 * by stream_xid (global variable). If it's the first chunk of streamed
5401 * changes for this transaction, create the buffile, otherwise open the
5402 * previously created file.
5403 */
5404static void
5405stream_open_file(Oid subid, TransactionId xid, bool first_segment)
5406{
5407 char path[MAXPGPATH];
5408 MemoryContext oldcxt;
5409
5410 Assert(OidIsValid(subid));
5412 Assert(stream_fd == NULL);
5413
5414
5415 changes_filename(path, subid, xid);
5416 elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
5417
5418 /*
5419 * Create/open the buffiles under the logical streaming context so that we
5420 * have those files until stream stop.
5421 */
5423
5424 /*
5425 * If this is the first streamed segment, create the changes file.
5426 * Otherwise, just open the file for writing, in append mode.
5427 */
5428 if (first_segment)
5430 path);
5431 else
5432 {
5433 /*
5434 * Open the file and seek to the end of the file because we always
5435 * append to the changes file.
5436 */
5438 path, O_RDWR, false);
5439 BufFileSeek(stream_fd, 0, 0, SEEK_END);
5440 }
5441
5442 MemoryContextSwitchTo(oldcxt);
5443}
5444
5445/*
5446 * stream_close_file
5447 * Close the currently open file with streamed changes.
5448 */
5449static void
5451{
5452 Assert(stream_fd != NULL);
5453
5455
5456 stream_fd = NULL;
5457}
5458
5459/*
5460 * stream_write_change
5461 * Serialize a change to a file for the current toplevel transaction.
5462 *
5463 * The change is serialized in a simple format, with length (not including
5464 * the length), action code (identifying the message type) and message
5465 * contents (without the subxact TransactionId value).
5466 */
5467static void
5469{
5470 int len;
5471
5472 Assert(stream_fd != NULL);
5473
5474 /* total on-disk size, including the action type character */
5475 len = (s->len - s->cursor) + sizeof(char);
5476
5477 /* first write the size */
5478 BufFileWrite(stream_fd, &len, sizeof(len));
5479
5480 /* then the action */
5481 BufFileWrite(stream_fd, &action, sizeof(action));
5482
5483 /* and finally the remaining part of the buffer (after the XID) */
5484 len = (s->len - s->cursor);
5485
5487}
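/*
 * Illustrative sketch, not part of worker.c. The on-disk record layout
 * described above is: an int length (which counts the action byte and the
 * payload but not itself), one action byte, then the payload. The helpers
 * below write and read one such record in a memory buffer instead of a
 * BufFile; their names and signatures are hypothetical, and the record uses
 * the host's native int size and byte order.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one record to out; returns bytes written, or 0 if it does not fit. */
static size_t
sketch_write_change(unsigned char *out, size_t cap, char action,
                    const void *payload, int payload_len)
{
    int    len;
    size_t total;

    assert(out != NULL && payload_len >= 0);

    /* Length on disk includes the action byte but not the length itself. */
    len = payload_len + (int) sizeof(char);
    total = sizeof(len) + (size_t) len;
    if (total > cap)
        return 0;

    memcpy(out, &len, sizeof(len));
    memcpy(out + sizeof(len), &action, sizeof(action));
    if (payload_len > 0)
        memcpy(out + sizeof(len) + sizeof(action), payload, (size_t) payload_len);

    return total;
}

/* Parse one record from in; returns bytes consumed, or 0 if it is incomplete. */
static size_t
sketch_read_change(const unsigned char *in, size_t avail, char *action,
                   const unsigned char **payload, int *payload_len)
{
    int len;

    if (avail < sizeof(len))
        return 0;
    memcpy(&len, in, sizeof(len));

    if (len < (int) sizeof(char) || avail - sizeof(len) < (size_t) len)
        return 0;

    *action = (char) in[sizeof(len)];
    *payload = in + sizeof(len) + sizeof(char);
    *payload_len = len - (int) sizeof(char);

    return sizeof(len) + (size_t) len;
}
```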
5488
5489/*
5490 * stream_open_and_write_change
5491 * Serialize a message to a file for the given transaction.
5492 *
5493 * This function is similar to stream_write_change except that it opens the
5494 * target file before writing the message, if it is not already open, and
5495 * closes the file at the end.
5496 */
5497static void
5499{
5501
5502 if (!stream_fd)
5503 stream_start_internal(xid, false);
5504
5507}
5508
5509/*
5510 * Sets streaming options including replication slot name and origin start
5511 * position. Workers need these options for logical replication.
5512 */
5513void
5515 char *slotname,
5516 XLogRecPtr *origin_startpos)
5517{
5518 int server_version;
5519
5520 options->logical = true;
5521 options->startpoint = *origin_startpos;
5522 options->slotname = slotname;
5523
5525 options->proto.logical.proto_version =
5530
5531 options->proto.logical.publication_names = MySubscription->publications;
5532 options->proto.logical.binary = MySubscription->binary;
5533
5534 /*
5535 * Assign the appropriate option value for streaming option according to
5536 * the 'streaming' mode and the publisher's ability to support that mode.
5537 */
5538 if (server_version >= 160000 &&
5539 MySubscription->stream == LOGICALREP_STREAM_PARALLEL)
5540 {
5541 options->proto.logical.streaming_str = "parallel";
5543 }
5544 else if (server_version >= 140000 &&
5545 MySubscription->stream != LOGICALREP_STREAM_OFF)
5546 {
5547 options->proto.logical.streaming_str = "on";
5549 }
5550 else
5551 {
5552 options->proto.logical.streaming_str = NULL;
5554 }
5555
5556 options->proto.logical.twophase = false;
5557 options->proto.logical.origin = pstrdup(MySubscription->origin);
5558}
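/*
 * Illustrative sketch, not part of worker.c. The streaming option sent to the
 * publisher depends on both the subscription's setting and the publisher's
 * version, using the thresholds visible above (160000 for "parallel", 140000
 * for "on"). The enum and function below are hypothetical stand-ins that
 * isolate just that decision.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum SketchStreamMode
{
    SKETCH_STREAM_OFF,
    SKETCH_STREAM_ON,
    SKETCH_STREAM_PARALLEL
} SketchStreamMode;

/* Returns the option string to send, or NULL when streaming is not requested. */
static const char *
sketch_streaming_str(int server_version, SketchStreamMode mode)
{
    if (server_version >= 160000 && mode == SKETCH_STREAM_PARALLEL)
        return "parallel";
    else if (server_version >= 140000 && mode != SKETCH_STREAM_OFF)
        return "on";

    return NULL;
}
```

/*
 * Note how a subscription configured for parallel streaming falls back to
 * "on" against a publisher whose version is at least 140000 but below
 * 160000.
 */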
5559
5560/*
5561 * Cleanup the memory for subxacts and reset the related variables.
5562 */
5563static inline void
5565{
5568
5569 subxact_data.subxacts = NULL;
5573}
5574
5575/*
5576 * Common function to run the apply loop with error handling. Disable the
5577 * subscription, if necessary.
5578 *
5579 * Note that we don't handle FATAL errors, which are probably caused by
5580 * system resource errors and are not repeatable.
5581 */
5582void
5583start_apply(XLogRecPtr origin_startpos)
5584{
5585 PG_TRY();
5586 {
5587 LogicalRepApplyLoop(origin_startpos);
5588 }
5589 PG_CATCH();
5590 {
5591 /*
5592 * Reset the origin state to prevent the advancement of origin
5593 * progress if we fail to apply. Otherwise, this will result in
5594 * transaction loss as that transaction won't be sent again by the
5595 * server.
5596 */
5597 replorigin_reset(0, (Datum) 0);
5598
5601 else
5602 {
5603 /*
5604 * Report the worker failed while applying changes. Abort the
5605 * current transaction so that the stats message is sent in an
5606 * idle state.
5607 */
5611
5612 PG_RE_THROW();
5613 }
5614 }
5615 PG_END_TRY();
5616}
5617
5618/*
5619 * Runs the leader apply worker.
5620 *
5621 * It sets up replication origin, streaming options and then starts streaming.
5622 */
5623static void
5625{
5626 char originname[NAMEDATALEN];
5627 XLogRecPtr origin_startpos = InvalidXLogRecPtr;
5628 char *slotname = NULL;
5630 RepOriginId originid;
5631 TimeLineID startpointTLI;
5632 char *err;
5633 bool must_use_password;
5634
5635 slotname = MySubscription->slotname;
5636
5637 /*
5638 * This shouldn't happen if the subscription is enabled, but guard against
5639 * DDL bugs or manual catalog changes. (libpqwalreceiver will crash if
5640 * slot is NULL.)
5641 */
5642 if (!slotname)
5643 ereport(ERROR,
5644 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
5645 errmsg("subscription has no replication slot set")));
5646
5647 /* Setup replication origin tracking. */
5649 originname, sizeof(originname));
5651 originid = replorigin_by_name(originname, true);
5652 if (!OidIsValid(originid))
5653 originid = replorigin_create(originname);
5654 replorigin_session_setup(originid, 0);
5655 replorigin_session_origin = originid;
5656 origin_startpos = replorigin_session_get_progress(false);
5658
5659 /* Is the use of a password mandatory? */
5660 must_use_password = MySubscription->passwordrequired &&
5662
5664 true, must_use_password,
5666
5667 if (LogRepWorkerWalRcvConn == NULL)
5668 ereport(ERROR,
5669 (errcode(ERRCODE_CONNECTION_FAILURE),
5670 errmsg("apply worker for subscription \"%s\" could not connect to the publisher: %s",
5671 MySubscription->name, err)));
5672
5673 /*
5674 * We don't really use the output of identify_system for anything, but it does
5675 * some initializations on the upstream so let's still call it.
5676 */
5677 (void) walrcv_identify_system(LogRepWorkerWalRcvConn, &startpointTLI);
5678
5680
5681 set_stream_options(&options, slotname, &origin_startpos);
5682
5683 /*
5684 * Even when the two_phase mode is requested by the user, it remains as
5685 * the tri-state PENDING until all tablesyncs have reached READY state.
5686 * Only then can it become ENABLED.
5687 *
5688 * Note: If the subscription has no tables then leave the state as
5689 * PENDING, which allows ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
5690 * work.
5691 */
5692 if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
5694 {
5695 /* Start streaming with two_phase enabled */
5696 options.proto.logical.twophase = true;
5698
5700
5701 /*
5702 * Updating pg_subscription might involve TOAST table access, so
5703 * ensure we have a valid snapshot.
5704 */
5706
5707 UpdateTwoPhaseState(MySubscription->oid, LOGICALREP_TWOPHASE_STATE_ENABLED);
5708 MySubscription->twophasestate = LOGICALREP_TWOPHASE_STATE_ENABLED;
5711 }
5712 else
5713 {
5715 }
5716
5718 (errmsg_internal("logical replication apply worker for subscription \"%s\" two_phase is %s",
5720 MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_DISABLED ? "DISABLED" :
5721 MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING ? "PENDING" :
5722 MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED ? "ENABLED" :
5723 "?")));
5724
5725 /* Run the main loop. */
5726 start_apply(origin_startpos);
5727}
5728
5729/*
5730 * Common initialization for leader apply worker, parallel apply worker,
5731 * tablesync worker and sequencesync worker.
5732 *
5733 * Initialize the database connection, in-memory subscription and necessary
5734 * config options.
5735 */
5736void
5738{
5739 MemoryContext oldctx;
5740
5741 /* Run as replica session replication role. */
5742 SetConfigOption("session_replication_role", "replica",
5744
5745 /* Connect to our database. */
5748 0);
5749
5750 /*
5751 * Set always-secure search path, so malicious users can't redirect user
5752 * code (e.g. pg_index.indexprs).
5753 */
5754 SetConfigOption("search_path", "", PGC_SUSET, PGC_S_OVERRIDE);
5755
5756 /* Load the subscription into persistent memory context. */
5758 "ApplyContext",
5762
5763 /*
5764 * Lock the subscription to prevent it from being concurrently dropped,
5765 * then re-verify its existence. After the initialization, the worker will
5766 * be terminated gracefully if the subscription is dropped.
5767 */
5768 LockSharedObject(SubscriptionRelationId, MyLogicalRepWorker->subid, 0,
5771 if (!MySubscription)
5772 {
5773 ereport(LOG,
5774 (errmsg("logical replication worker for subscription %u will not start because the subscription was removed during startup",
5776
5777 /* Ensure we remove no-longer-useful entry for worker's start time */
5780
5781 proc_exit(0);
5782 }
5783
5784 MySubscriptionValid = true;
5785 MemoryContextSwitchTo(oldctx);
5786
5787 if (!MySubscription->enabled)
5788 {
5789 ereport(LOG,
5790 (errmsg("logical replication worker for subscription \"%s\" will not start because the subscription was disabled during startup",
5791 MySubscription->name)));
5792
5794 }
5795
5796 /*
5797 * Restart the worker if retain_dead_tuples was enabled during startup.
5798 *
5799 * At this point, the replication slot used for conflict detection might
5800 * not exist yet, or could be dropped soon if the launcher perceives
5801 * retain_dead_tuples as disabled. To avoid unnecessary tracking of
5802 * oldest_nonremovable_xid when the slot is absent or at risk of being
5803 * dropped, a restart is initiated.
5804 *
5805 * The oldest_nonremovable_xid should be initialized only when the
5806 * subscription's retention is active before launching the worker. See
5807 * logicalrep_worker_launch.
5808 */
5809 if (am_leader_apply_worker() &&
5813 {
5814 ereport(LOG,
5815 errmsg("logical replication worker for subscription \"%s\" will restart because the option %s was enabled during startup",
5816 MySubscription->name, "retain_dead_tuples"));
5817
5819 }
5820
5821 /* Setup synchronous commit according to the user's wishes */
5822 SetConfigOption("synchronous_commit", MySubscription->synccommit,
5824
5825 /*
5826 * Keep us informed about subscription or role changes. Note that the
5827 * role's superuser privilege can be revoked.
5828 */
5829 CacheRegisterSyscacheCallback(SUBSCRIPTIONOID,
5831 (Datum) 0);
5832
5835 (Datum) 0);
5836
5837 if (am_tablesync_worker())
5838 ereport(LOG,
5839 errmsg("logical replication table synchronization worker for subscription \"%s\", table \"%s\" has started",
5842 else if (am_sequencesync_worker())
5843 ereport(LOG,
5844 errmsg("logical replication sequence synchronization worker for subscription \"%s\" has started",
5846 else
5847 ereport(LOG,
5848 errmsg("logical replication apply worker for subscription \"%s\" has started",
5850
5852}
5853
5854/*
5855 * Reset the origin state.
5856 */
5857static void
5859{
5863}
5864
5865/*
5866 * Common function to set up the leader apply, tablesync, and sequencesync workers.
5867 */
5868void
5870{
5871 /* Attach to slot */
5872 logicalrep_worker_attach(worker_slot);
5873
5875
5876 /* Set up signal handling */
5878 pqsignal(SIGTERM, die);
5880
5881 /*
5882 * We don't currently need any ResourceOwner in a walreceiver process, but
5883 * if we did, we could call CreateAuxProcessResourceOwner here.
5884 */
5885
5886 /* Initialize stats to a sane value */
5889
5890 /* Load the libpq-specific functions */
5891 load_file("libpqwalreceiver", false);
5892
5894
5895 /*
5896 * Register a callback to reset the origin state before aborting any
5897 * pending transaction during shutdown (see ShutdownPostgres()). This will
5898 * avoid origin advancement for an incomplete transaction, which could
5899 * otherwise lead to its loss as such a transaction won't be sent by the
5900 * server again.
5901 *
5902 * Note that even a LOG or DEBUG statement placed after setting the origin
5903 * state may process a shutdown signal before committing the current apply
5904 * operation. So, it is important to register such a callback here.
5905 */
5907
5908 /* Connect to the origin and start the replication. */
5909 elog(DEBUG1, "connecting to publisher using connection string \"%s\"",
5911
5912 /*
5913 * Set up a syscache callback so that we know when something changes in
5914 * the subscription relation state.
5915 */
5916 CacheRegisterSyscacheCallback(SUBSCRIPTIONRELMAP,
5918 (Datum) 0);
5919}
5920
5921/* Logical Replication Apply worker entry point */
5922void
5924{
5925 int worker_slot = DatumGetInt32(main_arg);
5926
5928
5929 SetupApplyOrSyncWorker(worker_slot);
5930
5932
5934
5935 proc_exit(0);
5936}
5937
5938/*
5939 * After error recovery, disable the subscription in a new transaction
5940 * and exit cleanly.
5941 */
5942void
5944{
5945 /*
5946 * Emit the error message, and recover from the error state to an idle
5947 * state
5948 */
5950
5954
5956
5957 /*
5958 * Report that the worker failed during sequence synchronization, table
5959 * synchronization, or apply.
5960 */
5963
5964 /* Disable the subscription */
5966
5967 /*
5968 * Updating pg_subscription might involve TOAST table access, so ensure we
5969 * have a valid snapshot.
5970 */
5972
5976
5977 /* Ensure we remove the no-longer-useful entry for the worker's start time */
5980
5981 /* Report that the subscription has been disabled, then exit */
5982 ereport(LOG,
5983 errmsg("subscription \"%s\" has been disabled because of an error",
5985
5986 /*
5987 * Skip the track_commit_timestamp check when disabling the worker due to
5988 * an error, as verifying commit timestamps is unnecessary in this
5989 * context.
5990 */
5994
5995 proc_exit(0);
5996}
5997
5998/*
5999 * Is the current process a logical replication worker?
6000 */
6001bool
6003{
6004 return MyLogicalRepWorker != NULL;
6005}
6006
6007/*
6008 * Is the current process a logical replication parallel apply worker?
6009 */
6010bool
6012{
6014}
6015
6016/*
6017 * Start skipping changes of the transaction if the given LSN matches the
6018 * LSN specified by subscription's skiplsn.
6019 */
6020static void
6022{
6026
6027 /*
6028 * Return quickly if skipping this transaction was not requested. This
6029 * function is called for every remote transaction, and we assume that
6030 * skipping a transaction is rare.
6031 */
6033 MySubscription->skiplsn != finish_lsn))
6034 return;
6035
6036 /* Start skipping all changes of this transaction */
6037 skip_xact_finish_lsn = finish_lsn;
6038
6039 ereport(LOG,
6040 errmsg("logical replication starts skipping transaction at LSN %X/%08X",
6042}
6043
6044/*
6045 * Stop skipping changes by resetting skip_xact_finish_lsn if enabled.
6046 */
6047static void
6049{
6050 if (!is_skipping_changes())
6051 return;
6052
6053 ereport(LOG,
6054 errmsg("logical replication completed skipping transaction at LSN %X/%08X",
6056
6057 /* Stop skipping changes */
6059}
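
The start/stop skip logic above can be illustrated with a minimal, self-contained sketch. It is not the worker's code: every `sketch_*` name and the plain `uint64_t` typedef are stand-ins invented for illustration. The first half of the start condition (no skip LSN configured) is an assumption, because that line is elided in the listing, and so is the idea that "skipping" means the stored finish LSN is valid. The formatting helper shows how `%X/%08X` prints a 64-bit LSN as its high and low 32-bit halves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;              /* stand-in for PostgreSQL's type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical stand-in for the worker's static skip_xact_finish_lsn. */
static XLogRecPtr sketch_skip_finish_lsn = InvalidXLogRecPtr;

/* Format an LSN as "%X/%08X": high 32 bits, a slash, then low 32 bits. */
static void
sketch_format_lsn(XLogRecPtr lsn, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%X/%08X",
             (unsigned int) (lsn >> 32), (unsigned int) lsn);
}

/* Assumed meaning of "is skipping": a valid finish LSN is stored. */
static bool
sketch_is_skipping(void)
{
    return sketch_skip_finish_lsn != InvalidXLogRecPtr;
}

/*
 * Begin skipping only when a skip LSN is configured and it equals the
 * transaction's finish LSN. Returns true if skipping was started.
 */
static bool
sketch_maybe_start_skipping(XLogRecPtr skiplsn, XLogRecPtr finish_lsn)
{
    if (skiplsn == InvalidXLogRecPtr || skiplsn != finish_lsn)
        return false;

    sketch_skip_finish_lsn = finish_lsn;
    return true;
}

/* Stop skipping by resetting the stored finish LSN. */
static void
sketch_stop_skipping(void)
{
    sketch_skip_finish_lsn = InvalidXLogRecPtr;
}
```

Calling `sketch_maybe_start_skipping(lsn, lsn)` with a nonzero `lsn` turns skipping on, and `sketch_stop_skipping()` turns it off again. Any mismatch between the two arguments leaves the state untouched.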
6060
6061/*
6062 * Clear subskiplsn of pg_subscription catalog.
6063 *
6064 * finish_lsn is the transaction's finish LSN, which is compared against
6065 * subskiplsn. If they do not match, we raise a warning when clearing
6066 * subskiplsn, to alert the user to cases such as a mistakenly specified
6067 * subskiplsn.
6068 */
6069static void
6071{
6072 Relation rel;
6073 Form_pg_subscription subform;
6074 HeapTuple tup;
6075 XLogRecPtr myskiplsn = MySubscription->skiplsn;
6076 bool started_tx = false;
6077
6079 return;
6080
6081 if (!IsTransactionState())
6082 {
6084 started_tx = true;
6085 }
6086
6087 /*
6088 * Updating pg_subscription might involve TOAST table access, so ensure we
6089 * have a valid snapshot.
6090 */
6092
6093 /*
6094 * Protect subskiplsn of pg_subscription from being concurrently updated
6095 * while clearing it.
6096 */
6097 LockSharedObject(SubscriptionRelationId, MySubscription->oid, 0,
6099
6100 rel = table_open(SubscriptionRelationId, RowExclusiveLock);
6101
6102 /* Fetch the existing tuple. */
6103 tup = SearchSysCacheCopy1(SUBSCRIPTIONOID,
6105
6106 if (!HeapTupleIsValid(tup))
6107 elog(ERROR, "subscription \"%s\" does not exist", MySubscription->name);
6108
6109 subform = (Form_pg_subscription) GETSTRUCT(tup);
6110
6111 /*
6112 * Clear the subskiplsn. If the user has already changed subskiplsn before
6113 * we clear it, we don't update the catalog and the replication origin
6114 * state won't get advanced. So in the worst case, if the server crashes
6115 * before sending an acknowledgment of the flush position, the transaction
6116 * will be sent again and the user needs to set subskiplsn again. We could
6117 * reduce the possibility by logging a replication origin WAL record to
6118 * advance the origin LSN instead, but there is no way to advance the
6119 * origin timestamp, and it doesn't seem worth doing anything about
6120 * it since it's a very rare case.
6121 */
6122 if (subform->subskiplsn == myskiplsn)
6123 {
6124 bool nulls[Natts_pg_subscription];
6125 bool replaces[Natts_pg_subscription];
6126 Datum values[Natts_pg_subscription];
6127
6128 memset(values, 0, sizeof(values));
6129 memset(nulls, false, sizeof(nulls));
6130 memset(replaces, false, sizeof(replaces));
6131
6132 /* reset subskiplsn */
6133 values[Anum_pg_subscription_subskiplsn - 1] = LSNGetDatum(InvalidXLogRecPtr);
6134 replaces[Anum_pg_subscription_subskiplsn - 1] = true;
6135
6136 tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
6137 replaces);
6138 CatalogTupleUpdate(rel, &tup->t_self, tup);
6139
6140 if (myskiplsn != finish_lsn)
6142 errmsg("skip-LSN of subscription \"%s\" cleared", MySubscription->name),
6143 errdetail("Remote transaction's finish WAL location (LSN) %X/%08X did not match skip-LSN %X/%08X.",
6144 LSN_FORMAT_ARGS(finish_lsn),
6145 LSN_FORMAT_ARGS(myskiplsn)));
6146 }
6147
6148 heap_freetuple(tup);
6149 table_close(rel, NoLock);
6150
6152
6153 if (started_tx)
6155}
6156
6157/* Error callback to give more context info about the change being applied */
6158void
6160{
6162
6164 return;
6165
6166 Assert(errarg->origin_name);
6167
6168 if (errarg->rel == NULL)
6169 {
6170 if (!TransactionIdIsValid(errarg->remote_xid))
6171 errcontext("processing remote data for replication origin \"%s\" during message type \"%s\"",
6172 errarg->origin_name,
6174 else if (!XLogRecPtrIsValid(errarg->finish_lsn))
6175 errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" in transaction %u",
6176 errarg->origin_name,
6178 errarg->remote_xid);
6179 else
6180 errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" in transaction %u, finished at %X/%08X",
6181 errarg->origin_name,
6183 errarg->remote_xid,
6184 LSN_FORMAT_ARGS(errarg->finish_lsn));
6185 }
6186 else
6187 {
6188 if (errarg->remote_attnum < 0)
6189 {
6190 if (!XLogRecPtrIsValid(errarg->finish_lsn))
6191 errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" in transaction %u",
6192 errarg->origin_name,
6194 errarg->rel->remoterel.nspname,
6195 errarg->rel->remoterel.relname,
6196 errarg->remote_xid);
6197 else
6198 errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" in transaction %u, finished at %X/%08X",
6199 errarg->origin_name,
6201 errarg->rel->remoterel.nspname,
6202 errarg->rel->remoterel.relname,
6203 errarg->remote_xid,
6204 LSN_FORMAT_ARGS(errarg->finish_lsn));
6205 }
6206 else
6207 {
6208 if (!XLogRecPtrIsValid(errarg->finish_lsn))
6209 errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" column \"%s\" in transaction %u",
6210 errarg->origin_name,
6212 errarg->rel->remoterel.nspname,
6213 errarg->rel->remoterel.relname,
6214 errarg->rel->remoterel.attnames[errarg->remote_attnum],
6215 errarg->remote_xid);
6216 else
6217 errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" column \"%s\" in transaction %u, finished at %X/%08X",
6218 errarg->origin_name,
6220 errarg->rel->remoterel.nspname,
6221 errarg->rel->remoterel.relname,
6222 errarg->rel->remoterel.attnames[errarg->remote_attnum],
6223 errarg->remote_xid,
6224 LSN_FORMAT_ARGS(errarg->finish_lsn));
6225 }
6226 }
6227}
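
The branching in the error callback above can be read as "start from a base message, then add relation, column, transaction, and finish-LSN details when they are known". The sketch below is one way to express that, under stated assumptions: `SketchErrArg` and both `sketch_*` functions are hypothetical, the message type is passed in as text because the call that produces it is elided in the listing, and the message is built incrementally purely for brevity, whereas the listing uses one complete format string per case.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for ApplyErrorCallbackArg. */
typedef struct SketchErrArg
{
    const char *origin_name;
    const char *msgtype;     /* message type, already rendered as text */
    const char *nspname;     /* schema name; used only if relname is set */
    const char *relname;     /* NULL when no relation is being processed */
    const char *attname;     /* NULL when no column is being processed */
    uint32_t    remote_xid;  /* 0 means no transaction */
    uint64_t    finish_lsn;  /* 0 means finish LSN not known */
} SketchErrArg;

/* Append formatted text to buf; does nothing once the buffer is full. */
static void
sketch_append(char *buf, size_t buflen, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int     n;

    if (*used >= buflen)
        return;

    va_start(ap, fmt);
    n = vsnprintf(buf + *used, buflen - *used, fmt, ap);
    va_end(ap);

    if (n > 0)
        *used += (size_t) n;
}

static void
sketch_error_context(const SketchErrArg *arg, char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen > 0)
        buf[0] = '\0';

    sketch_append(buf, buflen, &used,
                  "processing remote data for replication origin \"%s\" during message type \"%s\"",
                  arg->origin_name, arg->msgtype);

    if (arg->relname != NULL)
    {
        sketch_append(buf, buflen, &used,
                      " for replication target relation \"%s.%s\"",
                      arg->nspname, arg->relname);
        if (arg->attname != NULL)
            sketch_append(buf, buflen, &used, " column \"%s\"", arg->attname);
    }
    else if (arg->remote_xid == 0)
        return;                 /* no relation and no transaction: base only */

    /* As in the listing, the relation cases always name the transaction. */
    sketch_append(buf, buflen, &used, " in transaction %u",
                  (unsigned int) arg->remote_xid);

    if (arg->finish_lsn != 0)
        sketch_append(buf, buflen, &used, ", finished at %X/%08X",
                      (unsigned int) (arg->finish_lsn >> 32),
                      (unsigned int) arg->finish_lsn);
}
```

Whole format strings per case, as in the listing, are generally easier to translate than concatenated fragments, so this sketch only illustrates the decision structure.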
6228
6229/* Set transaction information of apply error callback */
6230static inline void
6232{
6235}
6236
6237/* Reset all information of apply error callback */
6238static inline void
6240{
6245}
6246
6247/*
6248 * Request wakeup of the workers for the given subscription OID
6249 * at commit of the current transaction.
6250 *
6251 * This is used to ensure that the workers process assorted changes
6252 * as soon as possible.
6253 */
6254void
6256{
6257 MemoryContext oldcxt;
6258
6262 MemoryContextSwitchTo(oldcxt);
6263}
6264
6265/*
6266 * Wake up the workers of any subscriptions that were changed in this xact.
6267 */
6268void
6270{
6271 if (isCommit && on_commit_wakeup_workers_subids != NIL)
6272 {
6273 ListCell *lc;
6274
6275 LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
6277 {
6278 Oid subid = lfirst_oid(lc);
6279 List *workers;
6280 ListCell *lc2;
6281
6282 workers = logicalrep_workers_find(subid, true, false);
6283 foreach(lc2, workers)
6284 {
6285 LogicalRepWorker *worker = (LogicalRepWorker *) lfirst(lc2);
6286
6288 }
6289 }
6290 LWLockRelease(LogicalRepWorkerLock);
6291 }
6292
6293 /* The List storage will be reclaimed automatically in xact cleanup. */
6295}
6296
6297/*
6298 * Allocate the origin name in a long-lived context for the error context message.
6299 */
6300void
6302{
6304 originname);
6305}
6306
6307/*
6308 * Return the action to be taken for the given transaction. See
6309 * TransApplyAction for information on each of the actions.
6310 *
6311 * *winfo is assigned to the destination parallel worker info when the leader
6312 * apply worker has to pass all the transaction's changes to the parallel
6313 * apply worker.
6314 */
6315static TransApplyAction
6317{
6318 *winfo = NULL;
6319
6321 {
6322 return TRANS_PARALLEL_APPLY;
6323 }
6324
6325 /*
6326 * If we are processing this transaction using a parallel apply worker,
6327 * we either send the changes to the parallel worker or, if the worker
6328 * is busy, serialize the changes to a file that will later be
6329 * processed by the parallel worker.
6330 */
6331 *winfo = pa_find_worker(xid);
6332
6333 if (*winfo && (*winfo)->serialize_changes)
6334 {
6336 }
6337 else if (*winfo)
6338 {
6340 }
6341
6342 /*
6343 * If no parallel worker is involved in processing this transaction,
6344 * we either apply the change directly or serialize it to a file
6345 * that will be applied when the transaction finish message is
6346 * processed.
6347 */
6348 else if (in_streamed_transaction)
6349 {
6351 }
6352 else
6353 {
6354 return TRANS_LEADER_APPLY;
6355 }
6356}
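
The decision made by the function above can be sketched as a small pure function. This is an illustration under assumptions, not the worker's implementation: the `Sketch*` types and `SKETCH_*` values are invented here, the worker lookup is replaced by a pointer argument, and the first condition and three of the return values are elided in the listing, so their mapping is inferred from the surrounding comments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the apply actions named in the listing. */
typedef enum SketchApplyAction
{
    SKETCH_LEADER_APPLY,
    SKETCH_LEADER_SERIALIZE,
    SKETCH_LEADER_SEND_TO_PARALLEL,
    SKETCH_LEADER_PARTIAL_SERIALIZE,
    SKETCH_PARALLEL_APPLY
} SketchApplyAction;

/* Hypothetical stand-in for the parallel apply worker info. */
typedef struct SketchWorkerInfo
{
    bool serialize_changes;   /* worker is busy; changes go to a file */
} SketchWorkerInfo;

/*
 * Pick an action for a transaction.
 *
 * is_parallel_worker: assumed condition for the first (elided) check.
 * winfo: parallel worker assigned to the transaction, or NULL if none.
 * in_streamed_transaction: true while inside a streamed transaction.
 */
static SketchApplyAction
sketch_apply_action(bool is_parallel_worker,
                    const SketchWorkerInfo *winfo,
                    bool in_streamed_transaction)
{
    if (is_parallel_worker)
        return SKETCH_PARALLEL_APPLY;

    /* A parallel worker exists: send to it, or spill to a file if busy. */
    if (winfo != NULL && winfo->serialize_changes)
        return SKETCH_LEADER_PARTIAL_SERIALIZE;   /* inferred mapping */
    else if (winfo != NULL)
        return SKETCH_LEADER_SEND_TO_PARALLEL;    /* inferred mapping */

    /* No parallel worker: serialize streamed changes, else apply now. */
    else if (in_streamed_transaction)
        return SKETCH_LEADER_SERIALIZE;           /* inferred mapping */
    else
        return SKETCH_LEADER_APPLY;
}
```

Keeping the decision free of side effects makes each branch easy to check in isolation. The function in the listing additionally reports the chosen worker back to the caller through `*winfo`.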
AclResult
Definition: acl.h:182
@ ACLCHECK_OK
Definition: acl.h:183
void aclcheck_error(AclResult aclerr, ObjectType objtype, const char *objectname)
Definition: aclchk.c:2652
AclResult pg_class_aclcheck(Oid table_oid, Oid roleid, AclMode mode)
Definition: aclchk.c:4037
void pa_set_xact_state(ParallelApplyWorkerShared *wshared, ParallelTransState xact_state)
void pa_unlock_stream(TransactionId xid, LOCKMODE lockmode)
void pa_stream_abort(LogicalRepStreamAbortData *abort_data)
void pa_lock_stream(TransactionId xid, LOCKMODE lockmode)
void pa_set_fileset_state(ParallelApplyWorkerShared *wshared, PartialFileSetState fileset_state)
void pa_reset_subtrans(void)
void pa_lock_transaction(TransactionId xid, LOCKMODE lockmode)
ParallelApplyWorkerShared * MyParallelShared
void pa_start_subtrans(TransactionId current_xid, TransactionId top_xid)
void pa_switch_to_partial_serialize(ParallelApplyWorkerInfo *winfo, bool stream_locked)
void pa_xact_finish(ParallelApplyWorkerInfo *winfo, XLogRecPtr remote_lsn)
bool pa_send_data(ParallelApplyWorkerInfo *winfo, Size nbytes, const void *data)
void pa_allocate_worker(TransactionId xid)
void pa_set_stream_apply_worker(ParallelApplyWorkerInfo *winfo)
ParallelApplyWorkerInfo * pa_find_worker(TransactionId xid)
void pa_unlock_transaction(TransactionId xid, LOCKMODE lockmode)
void pa_decr_and_wait_stream_block(void)
static uint32 pg_atomic_add_fetch_u32(volatile pg_atomic_uint32 *ptr, int32 add_)
Definition: atomics.h:422
static void check_relation_updatable(LogicalRepRelMapEntry *rel)
Definition: worker.c:2749
static void subxact_filename(char *path, Oid subid, TransactionId xid)
Definition: worker.c:5360
static void begin_replication_step(void)
Definition: worker.c:726
static void end_replication_step(void)
Definition: worker.c:749
static ApplyExecutionData * create_edata_for_relation(LogicalRepRelMapEntry *rel)
Definition: worker.c:870
static void cleanup_subxact_info(void)
Definition: worker.c:5564
void set_stream_options(WalRcvStreamOptions *options, char *slotname, XLogRecPtr *origin_startpos)
Definition: worker.c:5514
static void apply_handle_stream_prepare(StringInfo s)
Definition: worker.c:1518
static void apply_handle_insert_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo, TupleTableSlot *remoteslot)
Definition: worker.c:2724
static void subxact_info_add(TransactionId xid)
Definition: worker.c:5282
static bool should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4769
static XLogRecPtr last_flushpos
Definition: worker.c:527
void stream_cleanup_files(Oid subid, TransactionId xid)
Definition: worker.c:5381
MemoryContext ApplyMessageContext
Definition: worker.c:471
static bool should_apply_changes_for_rel(LogicalRepRelMapEntry *rel)
Definition: worker.c:681
static void apply_handle_type(StringInfo s)
Definition: worker.c:2586
static bool can_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4401
static void wait_for_local_flush(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4614
static void apply_handle_truncate(StringInfo s)
Definition: worker.c:3647
RetainDeadTuplesPhase
Definition: worker.c:388
@ RDT_WAIT_FOR_PUBLISHER_STATUS
Definition: worker.c:391
@ RDT_RESUME_CONFLICT_INFO_RETENTION
Definition: worker.c:394
@ RDT_GET_CANDIDATE_XID
Definition: worker.c:389
@ RDT_REQUEST_PUBLISHER_STATUS
Definition: worker.c:390
@ RDT_WAIT_FOR_LOCAL_FLUSH
Definition: worker.c:392
@ RDT_STOP_CONFLICT_INFO_RETENTION
Definition: worker.c:393
static void UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
Definition: worker.c:3965
static void get_candidate_xid(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4453
static void subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)
Definition: worker.c:5167
static TransApplyAction get_transaction_apply_action(TransactionId xid, ParallelApplyWorkerInfo **winfo)
Definition: worker.c:6316
TransApplyAction
Definition: worker.c:370
@ TRANS_LEADER_SERIALIZE
Definition: worker.c:375
@ TRANS_PARALLEL_APPLY
Definition: worker.c:378
@ TRANS_LEADER_SEND_TO_PARALLEL
Definition: worker.c:376
@ TRANS_LEADER_APPLY
Definition: worker.c:372
@ TRANS_LEADER_PARTIAL_SERIALIZE
Definition: worker.c:377
static bool handle_streamed_transaction(LogicalRepMsgType action, StringInfo s)
Definition: worker.c:777
static void stream_open_and_write_change(TransactionId xid, char action, StringInfo s)
Definition: worker.c:5498
struct ApplyExecutionData ApplyExecutionData
static void changes_filename(char *path, Oid subid, TransactionId xid)
Definition: worker.c:5367
bool InitializingApplyWorker
Definition: worker.c:499
static void apply_worker_exit(void)
Definition: worker.c:5004
static BufFile * stream_fd
Definition: worker.c:520
static void apply_handle_update(StringInfo s)
Definition: worker.c:2790
struct RetainDeadTuplesData RetainDeadTuplesData
void stream_stop_internal(TransactionId xid)
Definition: worker.c:1862
static void apply_handle_stream_commit(StringInfo s)
Definition: worker.c:2390
void start_apply(XLogRecPtr origin_startpos)
Definition: worker.c:5583
static void stop_skipping_changes(void)
Definition: worker.c:6048
struct ApplySubXactData ApplySubXactData
#define NAPTIME_PER_CYCLE
Definition: worker.c:299
static bool FindReplTupleInLocalRel(ApplyExecutionData *edata, Relation localrel, LogicalRepRelation *remoterel, Oid localidxoid, TupleTableSlot *remoteslot, TupleTableSlot **localslot)
Definition: worker.c:3174
static void get_flush_position(XLogRecPtr *write, XLogRecPtr *flush, bool *have_pending_txes)
Definition: worker.c:3895
static bool update_retention_status(bool active)
Definition: worker.c:4882
static uint32 parallel_stream_nchanges
Definition: worker.c:496
static void apply_handle_commit_prepared(StringInfo s)
Definition: worker.c:1405
static void LogicalRepApplyLoop(XLogRecPtr last_received)
Definition: worker.c:3981
void LogicalRepWorkersWakeupAtCommit(Oid subid)
Definition: worker.c:6255
#define MAX_XID_ADVANCE_INTERVAL
Definition: worker.c:456
bool IsLogicalWorker(void)
Definition: worker.c:6002
static ApplySubXactData subxact_data
Definition: worker.c:545
static void apply_handle_tuple_routing(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, CmdType operation)
Definition: worker.c:3351
static ApplyErrorCallbackArg apply_error_callback_arg
Definition: worker.c:459
bool in_remote_transaction
Definition: worker.c:484
static XLogRecPtr skip_xact_finish_lsn
Definition: worker.c:516
static void stream_open_file(Oid subid, TransactionId xid, bool first_segment)
Definition: worker.c:5405
static void apply_handle_delete(StringInfo s)
Definition: worker.c:3012
void apply_dispatch(StringInfo s)
Definition: worker.c:3775
static void adjust_xid_advance_interval(RetainDeadTuplesData *rdt_data, bool new_xid_found)
Definition: worker.c:4955
#define is_skipping_changes()
Definition: worker.c:517
static void stream_write_change(char action, StringInfo s)
Definition: worker.c:5468
static void clear_subscription_skip_lsn(XLogRecPtr finish_lsn)
Definition: worker.c:6070
static void replorigin_reset(int code, Datum arg)
Definition: worker.c:5858
static void apply_handle_update_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, Oid localindexoid)
Definition: worker.c:2907
static void ensure_last_message(FileSet *stream_fileset, TransactionId xid, int fileno, off_t offset)
Definition: worker.c:2228
#define MIN_XID_ADVANCE_INTERVAL
Definition: worker.c:455
static void apply_handle_begin(StringInfo s)
Definition: worker.c:1211
void DisableSubscriptionAndExit(void)
Definition: worker.c:5943
static dlist_head lsn_mapping
Definition: worker.c:308
bool IsLogicalParallelApplyWorker(void)
Definition: worker.c:6011
void AtEOXact_LogicalRepWorkers(bool isCommit)
Definition: worker.c:6269
static void slot_store_data(TupleTableSlot *slot, LogicalRepRelMapEntry *rel, LogicalRepTupleData *tupleData)
Definition: worker.c:1017
void ReplicationOriginNameForLogicalRep(Oid suboid, Oid relid, char *originname, Size szoriginname)
Definition: worker.c:641
static void finish_edata(ApplyExecutionData *edata)
Definition: worker.c:928
static void slot_modify_data(TupleTableSlot *slot, TupleTableSlot *srcslot, LogicalRepRelMapEntry *rel, LogicalRepTupleData *tupleData)
Definition: worker.c:1118
static void set_apply_error_context_xact(TransactionId xid, XLogRecPtr lsn)
Definition: worker.c:6231
ErrorContextCallback * apply_error_context_stack
Definition: worker.c:469
static void stream_abort_internal(TransactionId xid, TransactionId subxid)
Definition: worker.c:1988
static void apply_handle_commit(StringInfo s)
Definition: worker.c:1236
static bool IsIndexUsableForFindingDeletedTuple(Oid localindexoid, TransactionId conflict_detection_xmin)
Definition: worker.c:3235
void stream_start_internal(TransactionId xid, bool first_segment)
Definition: worker.c:1687
static List * on_commit_wakeup_workers_subids
Definition: worker.c:482
static void apply_handle_stream_abort(StringInfo s)
Definition: worker.c:2071
static void apply_handle_relation(StringInfo s)
Definition: worker.c:2563
void set_apply_error_context_origin(char *originname)
Definition: worker.c:6301
static void wait_for_publisher_status(RetainDeadTuplesData *rdt_data, bool status_received)
Definition: worker.c:4555
struct ApplyErrorCallbackArg ApplyErrorCallbackArg
MemoryContext ApplyContext
Definition: worker.c:472
static void subxact_info_write(Oid subid, TransactionId xid)
Definition: worker.c:5182
static void TargetPrivilegesCheck(Relation rel, AclMode mode)
Definition: worker.c:2601
static void apply_handle_prepare(StringInfo s)
Definition: worker.c:1331
static void apply_handle_rollback_prepared(StringInfo s)
Definition: worker.c:1457
static void run_apply_worker()
Definition: worker.c:5624
void SetupApplyOrSyncWorker(int worker_slot)
Definition: worker.c:5869
static void apply_handle_stream_stop(StringInfo s)
Definition: worker.c:1885
static void apply_handle_origin(StringInfo s)
Definition: worker.c:1666
static void request_publisher_status(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4516
static void send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)
Definition: worker.c:4297
static void reset_retention_data_fields(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4921
static void process_rdt_phase_transition(RetainDeadTuplesData *rdt_data, bool status_received)
Definition: worker.c:4423
static void maybe_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data, bool status_received)
Definition: worker.c:4387
WalReceiverConn * LogRepWorkerWalRcvConn
Definition: worker.c:477
static void resume_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4842
static XLogRecPtr remote_final_lsn
Definition: worker.c:485
static bool MySubscriptionValid
Definition: worker.c:480
void apply_error_callback(void *arg)
Definition: worker.c:6159
void store_flush_position(XLogRecPtr remote_lsn, XLogRecPtr local_lsn)
Definition: worker.c:3939
static MemoryContext LogicalStreamingContext
Definition: worker.c:475
void maybe_reread_subscription(void)
Definition: worker.c:5038
static void apply_handle_commit_internal(LogicalRepCommitData *commit_data)
Definition: worker.c:2503
void InitializeLogRepWorker(void)
Definition: worker.c:5737
static bool in_streamed_transaction
Definition: worker.c:488
struct SubXactInfo SubXactInfo
static void apply_handle_begin_prepare(StringInfo s)
Definition: worker.c:1265
struct FlushPosition FlushPosition
void ApplyWorkerMain(Datum main_arg)
Definition: worker.c:5923
void apply_spooled_messages(FileSet *stream_fileset, TransactionId xid, XLogRecPtr lsn)
Definition: worker.c:2260
static void apply_handle_stream_start(StringInfo s)
Definition: worker.c:1725
static void maybe_start_skipping_changes(XLogRecPtr finish_lsn)
Definition: worker.c:6021
static void stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4804
Subscription * MySubscription
Definition: worker.c:479
static void apply_handle_prepare_internal(LogicalRepPreparedTxnData *prepare_data)
Definition: worker.c:1294
static void stream_close_file(void)
Definition: worker.c:5450
static TransactionId stream_xid
Definition: worker.c:490
static void apply_handle_insert(StringInfo s)
Definition: worker.c:2633
static void slot_fill_defaults(LogicalRepRelMapEntry *rel, EState *estate, TupleTableSlot *slot)
Definition: worker.c:959
static void subxact_info_read(Oid subid, TransactionId xid)
Definition: worker.c:5231
static bool FindDeletedTupleInLocalRel(Relation localrel, Oid localidxoid, TupleTableSlot *remoteslot, TransactionId *delete_xid, RepOriginId *delete_origin, TimestampTz *delete_time)
Definition: worker.c:3269
static void apply_handle_delete_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo, TupleTableSlot *remoteslot, Oid localindexoid)
Definition: worker.c:3106
static void reset_apply_error_context_info(void)
Definition: worker.c:6239
long TimestampDifferenceMilliseconds(TimestampTz start_time, TimestampTz stop_time)
Definition: timestamp.c:1757
bool TimestampDifferenceExceeds(TimestampTz start_time, TimestampTz stop_time, int msec)
Definition: timestamp.c:1781
TimestampTz GetCurrentTimestamp(void)
Definition: timestamp.c:1645
Datum now(PG_FUNCTION_ARGS)
Definition: timestamp.c:1609
void pgstat_report_activity(BackendState state, const char *cmd_str)
@ STATE_IDLE
@ STATE_IDLEINTRANSACTION
@ STATE_RUNNING
void BackgroundWorkerUnblockSignals(void)
Definition: bgworker.c:930
void BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid, uint32 flags)
Definition: bgworker.c:890
Bitmapset * bms_make_singleton(int x)
Definition: bitmapset.c:216
Bitmapset * bms_add_member(Bitmapset *a, int x)
Definition: bitmapset.c:815
static Datum values[MAXATTR]
Definition: bootstrap.c:153
BufFile * BufFileOpenFileSet(FileSet *fileset, const char *name, int mode, bool missing_ok)
Definition: buffile.c:291
void BufFileReadExact(BufFile *file, void *ptr, size_t size)
Definition: buffile.c:654
void BufFileTell(BufFile *file, int *fileno, off_t *offset)
Definition: buffile.c:833
void BufFileWrite(BufFile *file, const void *ptr, size_t size)
Definition: buffile.c:676
size_t BufFileReadMaybeEOF(BufFile *file, void *ptr, size_t size, bool eofOK)
Definition: buffile.c:664
void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset)
Definition: buffile.c:928
BufFile * BufFileCreateFileSet(FileSet *fileset, const char *name)
Definition: buffile.c:267
int BufFileSeek(BufFile *file, int fileno, off_t offset, int whence)
Definition: buffile.c:740
void BufFileClose(BufFile *file)
Definition: buffile.c:412
void BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
Definition: buffile.c:364
#define Min(x, y)
Definition: c.h:1008
#define likely(x)
Definition: c.h:406
int64_t int64
Definition: c.h:540
uint64_t uint64
Definition: c.h:544
uint32_t uint32
Definition: c.h:543
uint32 TransactionId
Definition: c.h:662
#define OidIsValid(objectId)
Definition: c.h:779
size_t Size
Definition: c.h:615
bool track_commit_timestamp
Definition: commit_ts.c:109
void ReportApplyConflict(EState *estate, ResultRelInfo *relinfo, int elevel, ConflictType type, TupleTableSlot *searchslot, TupleTableSlot *remoteslot, List *conflicttuples)
Definition: conflict.c:104
void InitConflictIndexes(ResultRelInfo *relInfo)
Definition: conflict.c:139
bool GetTupleTransactionInfo(TupleTableSlot *localslot, TransactionId *xmin, RepOriginId *localorigin, TimestampTz *localts)
Definition: conflict.c:63
ConflictType
Definition: conflict.h:32
@ CT_UPDATE_DELETED
Definition: conflict.h:43
@ CT_DELETE_MISSING
Definition: conflict.h:52
@ CT_UPDATE_ORIGIN_DIFFERS
Definition: conflict.h:37
@ CT_UPDATE_MISSING
Definition: conflict.h:46
@ CT_DELETE_ORIGIN_DIFFERS
Definition: conflict.h:49
int64 TimestampTz
Definition: timestamp.h:39
void load_file(const char *filename, bool restricted)
Definition: dfmgr.c:149
int errmsg_internal(const char *fmt,...)
Definition: elog.c:1170
void EmitErrorReport(void)
Definition: elog.c:1704
int errdetail_internal(const char *fmt,...)
Definition: elog.c:1243
int errdetail(const char *fmt,...)
Definition: elog.c:1216
ErrorContextCallback * error_context_stack
Definition: elog.c:95
void FlushErrorState(void)
Definition: elog.c:1884
int errcode(int sqlerrcode)
Definition: elog.c:863
int errmsg(const char *fmt,...)
Definition: elog.c:1080
#define LOG
Definition: elog.h:31
#define PG_RE_THROW()
Definition: elog.h:405
#define errcontext
Definition: elog.h:198
#define PG_TRY(...)
Definition: elog.h:372
#define WARNING
Definition: elog.h:36
#define DEBUG2
Definition: elog.h:29
#define PG_END_TRY(...)
Definition: elog.h:397
#define DEBUG1
Definition: elog.h:30
#define ERROR
Definition: elog.h:39
#define PG_CATCH(...)
Definition: elog.h:382
#define elog(elevel,...)
Definition: elog.h:226
#define ereport(elevel,...)
Definition: elog.h:150
bool equal(const void *a, const void *b)
Definition: equalfuncs.c:223
void err(int eval, const char *fmt,...)
Definition: err.c:43
ExprState * ExecInitExpr(Expr *node, PlanState *parent)
Definition: execExpr.c:143
void ExecCloseIndices(ResultRelInfo *resultRelInfo)
Definition: execIndexing.c:238
void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
Definition: execIndexing.c:160
bool ExecPartitionCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate, bool emitError)
Definition: execMain.c:1856
void EvalPlanQualInit(EPQState *epqstate, EState *parentestate, Plan *subplan, List *auxrowmarks, int epqParam, List *resultRelations)
Definition: execMain.c:2718
void InitResultRelInfo(ResultRelInfo *resultRelInfo, Relation resultRelationDesc, Index resultRelationIndex, ResultRelInfo *partition_root_rri, int instrument_options)
Definition: execMain.c:1243
void EvalPlanQualEnd(EPQState *epqstate)
Definition: execMain.c:3182
PartitionTupleRouting * ExecSetupPartitionTupleRouting(EState *estate, Relation rel)
ResultRelInfo * ExecFindPartition(ModifyTableState *mtstate, ResultRelInfo *rootResultRelInfo, PartitionTupleRouting *proute, TupleTableSlot *slot, EState *estate)
void ExecCleanupTupleRouting(ModifyTableState *mtstate, PartitionTupleRouting *proute)
void CheckSubscriptionRelkind(char localrelkind, char remoterelkind, const char *nspname, const char *relname)
bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode, TupleTableSlot *searchslot, TupleTableSlot *outslot)
bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid, LockTupleMode lockmode, TupleTableSlot *searchslot, TupleTableSlot *outslot)
void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo, EState *estate, EPQState *epqstate, TupleTableSlot *searchslot)
bool RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot, TransactionId oldestxmin, TransactionId *delete_xid, RepOriginId *delete_origin, TimestampTz *delete_time)
void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo, EState *estate, EPQState *epqstate, TupleTableSlot *searchslot, TupleTableSlot *slot)
void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo, EState *estate, TupleTableSlot *slot)
bool RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid, TupleTableSlot *searchslot, TransactionId oldestxmin, TransactionId *delete_xid, RepOriginId *delete_origin, TimestampTz *delete_time)
void ExecResetTupleTable(List *tupleTable, bool shouldFree)
Definition: execTuples.c:1380
const TupleTableSlotOps TTSOpsVirtual
Definition: execTuples.c:84
TupleTableSlot * ExecStoreVirtualTuple(TupleTableSlot *slot)
Definition: execTuples.c:1741
TupleTableSlot * ExecInitExtraTupleSlot(EState *estate, TupleDesc tupledesc, const TupleTableSlotOps *tts_ops)
Definition: execTuples.c:2020
TupleConversionMap * ExecGetRootToChildMap(ResultRelInfo *resultRelInfo, EState *estate)
Definition: execUtils.c:1326
void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos, Bitmapset *unpruned_relids)
Definition: execUtils.c:773
void FreeExecutorState(EState *estate)
Definition: execUtils.c:192
EState * CreateExecutorState(void)
Definition: execUtils.c:88
#define GetPerTupleExprContext(estate)
Definition: executor.h:656
#define GetPerTupleMemoryContext(estate)
Definition: executor.h:661
#define EvalPlanQualSetSlot(epqstate, slot)
Definition: executor.h:289
static Datum ExecEvalExpr(ExprState *state, ExprContext *econtext, bool *isNull)
Definition: executor.h:393
void FileSetInit(FileSet *fileset)
Definition: fileset.c:52
Datum OidReceiveFunctionCall(Oid functionId, StringInfo buf, Oid typioparam, int32 typmod)
Definition: fmgr.c:1772
Datum OidInputFunctionCall(Oid functionId, char *str, Oid typioparam, int32 typmod)
Definition: fmgr.c:1754
struct Latch * MyLatch
Definition: globals.c:63
void ProcessConfigFile(GucContext context)
Definition: guc-file.l:120
void SetConfigOption(const char *name, const char *value, GucContext context, GucSource source)
Definition: guc.c:4196
@ PGC_S_OVERRIDE
Definition: guc.h:123
@ PGC_SUSET
Definition: guc.h:78
@ PGC_SIGHUP
Definition: guc.h:75
@ PGC_BACKEND
Definition: guc.h:77
HeapTuple heap_modify_tuple(HeapTuple tuple, TupleDesc tupleDesc, const Datum *replValues, const bool *replIsnull, const bool *doReplace)
Definition: heaptuple.c:1210
void heap_freetuple(HeapTuple htup)
Definition: heaptuple.c:1435
#define HeapTupleIsValid(tuple)
Definition: htup.h:78
static TransactionId HeapTupleHeaderGetXmin(const HeapTupleHeaderData *tup)
Definition: htup_details.h:324
static void * GETSTRUCT(const HeapTupleData *tuple)
Definition: htup_details.h:728
static void dlist_delete(dlist_node *node)
Definition: ilist.h:405
#define dlist_tail_element(type, membername, lhead)
Definition: ilist.h:612
#define dlist_foreach_modify(iter, lhead)
Definition: ilist.h:640
static bool dlist_is_empty(const dlist_head *head)
Definition: ilist.h:336
static void dlist_push_tail(dlist_head *head, dlist_node *node)
Definition: ilist.h:364
#define DLIST_STATIC_INIT(name)
Definition: ilist.h:281
#define dlist_container(type, membername, ptr)
Definition: ilist.h:593
void index_close(Relation relation, LOCKMODE lockmode)
Definition: indexam.c:177
Relation index_open(Oid relationId, LOCKMODE lockmode)
Definition: indexam.c:133
void CatalogTupleUpdate(Relation heapRel, const ItemPointerData *otid, HeapTuple tup)
Definition: indexing.c:313
volatile sig_atomic_t ConfigReloadPending
Definition: interrupt.c:27
void SignalHandlerForConfigReload(SIGNAL_ARGS)
Definition: interrupt.c:61
void AcceptInvalidationMessages(void)
Definition: inval.c:930
void CacheRegisterSyscacheCallback(int cacheid, SyscacheCallbackFunction func, Datum arg)
Definition: inval.c:1812
void before_shmem_exit(pg_on_exit_callback function, Datum arg)
Definition: ipc.c:337
void proc_exit(int code)
Definition: ipc.c:104
int WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock, long timeout, uint32 wait_event_info)
Definition: latch.c:223
void ResetLatch(Latch *latch)
Definition: latch.c:374
List * logicalrep_workers_find(Oid subid, bool only_running, bool acquire_lock)
Definition: launcher.c:293
void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker)
Definition: launcher.c:746
void logicalrep_worker_attach(int slot)
Definition: launcher.c:757
void ApplyLauncherWakeup(void)
Definition: launcher.c:1194
LogicalRepWorker * logicalrep_worker_find(LogicalRepWorkerType wtype, Oid subid, Oid relid, bool only_running)
Definition: launcher.c:258
void logicalrep_worker_wakeup(LogicalRepWorkerType wtype, Oid subid, Oid relid)
Definition: launcher.c:723
LogicalRepWorker * MyLogicalRepWorker
Definition: launcher.c:56
void ApplyLauncherForgetWorkerStartTime(Oid subid)
Definition: launcher.c:1154
List * lappend(List *list, void *datum)
Definition: list.c:339
List * lappend_oid(List *list, Oid datum)
Definition: list.c:375
List * list_append_unique_oid(List *list, Oid datum)
Definition: list.c:1380
bool list_member_oid(const List *list, Oid datum)
Definition: list.c:722
void LockSharedObject(Oid classid, Oid objid, uint16 objsubid, LOCKMODE lockmode)
Definition: lmgr.c:1088
int LOCKMODE
Definition: lockdefs.h:26
#define NoLock
Definition: lockdefs.h:34
#define AccessExclusiveLock
Definition: lockdefs.h:43
#define AccessShareLock
Definition: lockdefs.h:36
#define RowExclusiveLock
Definition: lockdefs.h:38
@ LockTupleExclusive
Definition: lockoptions.h:58
#define LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM
Definition: logicalproto.h:44
#define LOGICALREP_PROTO_STREAM_VERSION_NUM
Definition: logicalproto.h:42
#define LOGICALREP_PROTO_TWOPHASE_VERSION_NUM
Definition: logicalproto.h:43
#define LOGICALREP_COLUMN_UNCHANGED
Definition: logicalproto.h:97
LogicalRepMsgType
Definition: logicalproto.h:58
@ LOGICAL_REP_MSG_INSERT
Definition: logicalproto.h:62
@ LOGICAL_REP_MSG_TRUNCATE
Definition: logicalproto.h:65
@ LOGICAL_REP_MSG_STREAM_STOP
Definition: logicalproto.h:74
@ LOGICAL_REP_MSG_BEGIN
Definition: logicalproto.h:59
@ LOGICAL_REP_MSG_STREAM_PREPARE
Definition: logicalproto.h:77
@ LOGICAL_REP_MSG_STREAM_ABORT
Definition: logicalproto.h:76
@ LOGICAL_REP_MSG_BEGIN_PREPARE
Definition: logicalproto.h:69
@ LOGICAL_REP_MSG_STREAM_START
Definition: logicalproto.h:73
@ LOGICAL_REP_MSG_COMMIT
Definition: logicalproto.h:60
@ LOGICAL_REP_MSG_PREPARE
Definition: logicalproto.h:70
@ LOGICAL_REP_MSG_RELATION
Definition: logicalproto.h:66
@ LOGICAL_REP_MSG_MESSAGE
Definition: logicalproto.h:68
@ LOGICAL_REP_MSG_ROLLBACK_PREPARED
Definition: logicalproto.h:72
@ LOGICAL_REP_MSG_COMMIT_PREPARED
Definition: logicalproto.h:71
@ LOGICAL_REP_MSG_TYPE
Definition: logicalproto.h:67
@ LOGICAL_REP_MSG_DELETE
Definition: logicalproto.h:64
@ LOGICAL_REP_MSG_STREAM_COMMIT
Definition: logicalproto.h:75
@ LOGICAL_REP_MSG_ORIGIN
Definition: logicalproto.h:61
@ LOGICAL_REP_MSG_UPDATE
Definition: logicalproto.h:63
uint32 LogicalRepRelId
Definition: logicalproto.h:101
#define LOGICALREP_PROTO_VERSION_NUM
Definition: logicalproto.h:41
#define LOGICALREP_COLUMN_BINARY
Definition: logicalproto.h:99
#define LOGICALREP_COLUMN_TEXT
Definition: logicalproto.h:98
char * get_rel_name(Oid relid)
Definition: lsyscache.c:2095
void getTypeInputInfo(Oid type, Oid *typInput, Oid *typIOParam)
Definition: lsyscache.c:3041
char * get_namespace_name(Oid nspid)
Definition: lsyscache.c:3533
void getTypeBinaryInputInfo(Oid type, Oid *typReceive, Oid *typIOParam)
Definition: lsyscache.c:3107
bool LWLockAcquire(LWLock *lock, LWLockMode mode)
Definition: lwlock.c:1174
void LWLockRelease(LWLock *lock)
Definition: lwlock.c:1894
@ LW_SHARED
Definition: lwlock.h:113
char * MemoryContextStrdup(MemoryContext context, const char *string)
Definition: mcxt.c:1746
void MemoryContextReset(MemoryContext context)
Definition: mcxt.c:400
MemoryContext TopTransactionContext
Definition: mcxt.c:171
char * pstrdup(const char *in)
Definition: mcxt.c:1759
void * repalloc(void *pointer, Size size)
Definition: mcxt.c:1610
void pfree(void *pointer)
Definition: mcxt.c:1594
void * palloc0(Size size)
Definition: mcxt.c:1395
MemoryContext TopMemoryContext
Definition: mcxt.c:166
void * palloc(Size size)
Definition: mcxt.c:1365
#define AllocSetContextCreate
Definition: memutils.h:129
#define ALLOCSET_DEFAULT_SIZES
Definition: memutils.h:160
#define RESUME_INTERRUPTS()
Definition: miscadmin.h:136
#define CHECK_FOR_INTERRUPTS()
Definition: miscadmin.h:123
#define HOLD_INTERRUPTS()
Definition: miscadmin.h:134
Oid GetUserId(void)
Definition: miscinit.c:469
char * GetUserNameFromId(Oid roleid, bool noerr)
Definition: miscinit.c:988
CmdType
Definition: nodes.h:273
@ CMD_INSERT
Definition: nodes.h:277
@ CMD_DELETE
Definition: nodes.h:278
@ CMD_UPDATE
Definition: nodes.h:276
#define makeNode(_type_)
Definition: nodes.h:161
ObjectType get_relkind_objtype(char relkind)
TimestampTz replorigin_session_origin_timestamp
Definition: origin.c:165
RepOriginId replorigin_by_name(const char *roname, bool missing_ok)
Definition: origin.c:226
RepOriginId replorigin_create(const char *roname)
Definition: origin.c:257
RepOriginId replorigin_session_origin
Definition: origin.c:163
void replorigin_session_setup(RepOriginId node, int acquired_by)
Definition: origin.c:1120
XLogRecPtr replorigin_session_get_progress(bool flush)
Definition: origin.c:1273
XLogRecPtr replorigin_session_origin_lsn
Definition: origin.c:164
#define InvalidRepOriginId
Definition: origin.h:33
static MemoryContext MemoryContextSwitchTo(MemoryContext context)
Definition: palloc.h:124
RTEPermissionInfo * addRTEPermissionInfo(List **rteperminfos, RangeTblEntry *rte)
#define ACL_DELETE
Definition: parsenodes.h:79
uint64 AclMode
Definition: parsenodes.h:74
#define ACL_INSERT
Definition: parsenodes.h:76
#define ACL_UPDATE
Definition: parsenodes.h:78
@ RTE_RELATION
Definition: parsenodes.h:1043
@ DROP_RESTRICT
Definition: parsenodes.h:2398
#define ACL_SELECT
Definition: parsenodes.h:77
#define ACL_TRUNCATE
Definition: parsenodes.h:80
int16 attnum
Definition: pg_attribute.h:74
FormData_pg_attribute * Form_pg_attribute
Definition: pg_attribute.h:202
void * arg
static uint32 pg_ceil_log2_32(uint32 num)
Definition: pg_bitutils.h:258
#define NAMEDATALEN
#define MAXPGPATH
List * find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
Definition: pg_inherits.c:255
#define lfirst(lc)
Definition: pg_list.h:172
#define NIL
Definition: pg_list.h:68
#define list_make1(x1)
Definition: pg_list.h:212
static void * list_nth(const List *list, int n)
Definition: pg_list.h:299
#define lfirst_oid(lc)
Definition: pg_list.h:174
static Datum LSNGetDatum(XLogRecPtr X)
Definition: pg_lsn.h:31
static char ** options
void FreeSubscription(Subscription *sub)
void DisableSubscription(Oid subid)
void UpdateDeadTupleRetentionStatus(Oid subid, bool active)
Subscription * GetSubscription(Oid subid, bool missing_ok)
FormData_pg_subscription * Form_pg_subscription
#define die(msg)
long pgstat_report_stat(bool force)
Definition: pgstat.c:694
void pgstat_report_subscription_error(Oid subid, LogicalRepWorkerType wtype)
int64 timestamp
Expr * expression_planner(Expr *expr)
Definition: planner.c:6763
#define pqsignal
Definition: port.h:531
int pgsocket
Definition: port.h:29
#define snprintf
Definition: port.h:239
#define PGINVALID_SOCKET
Definition: port.h:31
static Datum ObjectIdGetDatum(Oid X)
Definition: postgres.h:262
uint64_t Datum
Definition: postgres.h:70
static int32 DatumGetInt32(Datum X)
Definition: postgres.h:212
#define InvalidOid
Definition: postgres_ext.h:37
unsigned int Oid
Definition: postgres_ext.h:32
unsigned int pq_getmsgint(StringInfo msg, int b)
Definition: pqformat.c:415
int pq_getmsgbyte(StringInfo msg)
Definition: pqformat.c:399
int64 pq_getmsgint64(StringInfo msg)
Definition: pqformat.c:453
static void pq_sendbyte(StringInfo buf, uint8 byt)
Definition: pqformat.h:160
static void pq_sendint64(StringInfo buf, uint64 i)
Definition: pqformat.h:152
TransactionId GetOldestActiveTransactionId(bool inCommitOnly, bool allDbs)
Definition: procarray.c:2833
void logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
Definition: proto.c:98
LogicalRepRelId logicalrep_read_delete(StringInfo in, LogicalRepTupleData *oldtup)
Definition: proto.c:561
void logicalrep_read_rollback_prepared(StringInfo in, LogicalRepRollbackPreparedTxnData *rollback_data)
Definition: proto.c:325
void logicalrep_read_begin_prepare(StringInfo in, LogicalRepPreparedTxnData *begin_data)
Definition: proto.c:134
void logicalrep_read_typ(StringInfo in, LogicalRepTyp *ltyp)
Definition: proto.c:757
LogicalRepRelId logicalrep_read_update(StringInfo in, bool *has_oldtuple, LogicalRepTupleData *oldtup, LogicalRepTupleData *newtup)
Definition: proto.c:487
List * logicalrep_read_truncate(StringInfo in, bool *cascade, bool *restart_seqs)
Definition: proto.c:615
void logicalrep_read_stream_abort(StringInfo in, LogicalRepStreamAbortData *abort_data, bool read_abort_info)
Definition: proto.c:1187
void logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
Definition: proto.c:63
void logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData *prepare_data)
Definition: proto.c:267
LogicalRepRelation * logicalrep_read_rel(StringInfo in)
Definition: proto.c:698
const char * logicalrep_message_type(LogicalRepMsgType action)
Definition: proto.c:1212
void logicalrep_read_stream_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
Definition: proto.c:365
TransactionId logicalrep_read_stream_commit(StringInfo in, LogicalRepCommitData *commit_data)
Definition: proto.c:1132
LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
Definition: proto.c:428
void logicalrep_read_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
Definition: proto.c:228
TransactionId logicalrep_read_stream_start(StringInfo in, bool *first_segment)
Definition: proto.c:1082
#define PqReplMsg_WALData
Definition: protocol.h:77
#define PqReplMsg_PrimaryStatusRequest
Definition: protocol.h:83
#define PqReplMsg_Keepalive
Definition: protocol.h:75
#define PqReplMsg_PrimaryStatusUpdate
Definition: protocol.h:76
#define PqReplMsg_StandbyStatusUpdate
Definition: protocol.h:84
#define RelationGetRelid(relation)
Definition: rel.h:515
#define RelationIsLogicallyLogged(relation)
Definition: rel.h:711
#define RelationGetDescr(relation)
Definition: rel.h:541
#define RelationGetRelationName(relation)
Definition: rel.h:549
#define RELATION_IS_OTHER_TEMP(relation)
Definition: rel.h:668
#define RelationGetNamespace(relation)
Definition: rel.h:556
List * RelationGetIndexList(Relation relation)
Definition: relcache.c:4836
ResourceOwner TopTransactionResourceOwner
Definition: resowner.c:175
ResourceOwner CurrentResourceOwner
Definition: resowner.c:173
Node * build_column_default(Relation rel, int attrno)
int check_enable_rls(Oid relid, Oid checkAsUser, bool noError)
Definition: rls.c:52
@ RLS_ENABLED
Definition: rls.h:45
Snapshot GetTransactionSnapshot(void)
Definition: snapmgr.c:271
void PushActiveSnapshot(Snapshot snapshot)
Definition: snapmgr.c:680
void PopActiveSnapshot(void)
Definition: snapmgr.c:773
#define SpinLockRelease(lock)
Definition: spin.h:61
#define SpinLockAcquire(lock)
Definition: spin.h:59
void logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
Definition: relation.c:584
LogicalRepRelMapEntry * logicalrep_partition_open(LogicalRepRelMapEntry *root, Relation partrel, AttrMap *map)
Definition: relation.c:646
bool IsIndexUsableForReplicaIdentityFull(Relation idxrel, AttrMap *attrmap)
Definition: relation.c:834
Oid GetRelationIdentityOrPK(Relation rel)
Definition: relation.c:904
void logicalrep_relmap_update(LogicalRepRelation *remoterel)
Definition: relation.c:164
void logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
Definition: relation.c:517
LogicalRepRelMapEntry * logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
Definition: relation.c:361
StringInfo makeStringInfo(void)
Definition: stringinfo.c:72
void resetStringInfo(StringInfo str)
Definition: stringinfo.c:126
static void initReadOnlyStringInfo(StringInfo str, char *data, int len)
Definition: stringinfo.h:157
TransactionId remote_xid
Definition: worker.c:330
LogicalRepMsgType command
Definition: worker.c:325
XLogRecPtr finish_lsn
Definition: worker.c:331
LogicalRepRelMapEntry * rel
Definition: worker.c:326
ResultRelInfo * targetRelInfo
Definition: worker.c:315
EState * estate
Definition: worker.c:312
PartitionTupleRouting * proute
Definition: worker.c:319
ModifyTableState * mtstate
Definition: worker.c:318
LogicalRepRelMapEntry * targetRel
Definition: worker.c:314
uint32 nsubxacts
Definition: worker.c:539
uint32 nsubxacts_max
Definition: worker.c:540
SubXactInfo * subxacts
Definition: worker.c:542
TransactionId subxact_last
Definition: worker.c:541
AttrMap
Definition: attmap.h:35
int maplen
Definition: attmap.h:37
AttrNumber * attnums
Definition: attmap.h:36
bool attgenerated
Definition: tupdesc.h:78
bool attisdropped
Definition: tupdesc.h:77
TimestampTz ts
Definition: conflict.h:78
RepOriginId origin
Definition: conflict.h:77
TransactionId xmin
Definition: conflict.h:75
TupleTableSlot * slot
Definition: conflict.h:71
List * es_rteperminfos
Definition: execnodes.h:668
List * es_tupleTable
Definition: execnodes.h:712
List * es_opened_result_relations
Definition: execnodes.h:688
CommandId es_output_cid
Definition: execnodes.h:682
struct ErrorContextCallback * previous
Definition: elog.h:297
void(* callback)(void *arg)
Definition: elog.h:298
dlist_node node
Definition: worker.c:303
XLogRecPtr remote_end
Definition: worker.c:305
XLogRecPtr local_end
Definition: worker.c:304
ItemPointerData t_self
Definition: htup.h:65
HeapTupleHeader t_data
Definition: htup.h:68
List
Definition: pg_list.h:54
XLogRecPtr final_lsn
Definition: logicalproto.h:129
TransactionId xid
Definition: logicalproto.h:131
TimestampTz committime
Definition: logicalproto.h:138
LogicalRepRelation remoterel
StringInfoData * colvalues
Definition: logicalproto.h:87
TimestampTz last_recv_time
LogicalRepWorkerType type
TimestampTz reply_time
FileSet * stream_fileset
TransactionId oldest_nonremovable_xid
XLogRecPtr reply_lsn
TimestampTz last_send_time
CmdType operation
Definition: execnodes.h:1404
ResultRelInfo * resultRelInfo
Definition: execnodes.h:1408
PlanState ps
Definition: execnodes.h:1403
ParallelApplyWorkerShared * shared
pg_atomic_uint32 pending_stream_count
Plan * plan
Definition: execnodes.h:1165
EState * state
Definition: execnodes.h:1167
Bitmapset * updatedCols
Definition: parsenodes.h:1326
RTEKind rtekind
Definition: parsenodes.h:1078
Form_pg_class rd_rel
Definition: rel.h:111
TupleTableSlot * ri_PartitionTupleSlot
Definition: execnodes.h:619
List * ri_onConflictArbiterIndexes
Definition: execnodes.h:580
Relation ri_RelationDesc
Definition: execnodes.h:480
RelationPtr ri_IndexRelationDescs
Definition: execnodes.h:486
TimestampTz flushpos_update_time
Definition: worker.c:432
FullTransactionId remote_oldestxid
Definition: worker.c:412
FullTransactionId remote_wait_for
Definition: worker.c:428
TimestampTz last_recv_time
Definition: worker.c:443
TimestampTz candidate_xid_time
Definition: worker.c:444
long table_sync_wait_time
Definition: worker.c:436
FullTransactionId remote_nextxid
Definition: worker.c:419
RetainDeadTuplesPhase phase
Definition: worker.c:403
XLogRecPtr remote_lsn
Definition: worker.c:404
TimestampTz reply_time
Definition: worker.c:421
TransactionId candidate_xid
Definition: worker.c:430
int xid_advance_interval
Definition: worker.c:445
off_t offset
Definition: worker.c:533
TransactionId xid
Definition: worker.c:531
int fileno
Definition: worker.c:532
XLogRecPtr skiplsn
AttrMap * attrMap
Definition: tupconvert.h:28
TupleDesc tts_tupleDescriptor
Definition: tuptable.h:122
bool * tts_isnull
Definition: tuptable.h:126
Datum * tts_values
Definition: tuptable.h:124
dlist_node * cur
Definition: ilist.h:200
void CheckSubDeadTupleRetention(bool check_guc, bool sub_disabled, int elevel_for_sub_disabled, bool retain_dead_tuples, bool retention_active, bool max_retention_set)
void ProcessSyncingRelations(XLogRecPtr current_lsn)
Definition: syncutils.c:155
void InvalidateSyncingRelStates(Datum arg, int cacheid, uint32 hashvalue)
Definition: syncutils.c:101
#define FirstLowInvalidHeapAttributeNumber
Definition: sysattr.h:27
void ReleaseSysCache(HeapTuple tuple)
Definition: syscache.c:264
HeapTuple SearchSysCache1(int cacheId, Datum key1)
Definition: syscache.c:220
#define SearchSysCacheCopy1(cacheId, key1)
Definition: syscache.h:91
void table_close(Relation relation, LOCKMODE lockmode)
Definition: table.c:126
Relation table_open(Oid relationId, LOCKMODE lockmode)
Definition: table.c:40
TupleTableSlot * table_slot_create(Relation relation, List **reglist)
Definition: tableam.c:92
void ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged, DropBehavior behavior, bool restart_seqs, bool run_as_table_owner)
Definition: tablecmds.c:1975
bool AllTablesyncsReady(void)
Definition: tablesync.c:1600
bool HasSubscriptionTablesCached(void)
Definition: tablesync.c:1630
void UpdateTwoPhaseState(Oid suboid, char new_state)
Definition: tablesync.c:1651
#define InvalidTransactionId
Definition: transam.h:31
#define FullTransactionIdPrecedesOrEquals(a, b)
Definition: transam.h:52
static bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
Definition: transam.h:282
static FullTransactionId FullTransactionIdFromU64(uint64 value)
Definition: transam.h:81
#define TransactionIdEquals(id1, id2)
Definition: transam.h:43
#define TransactionIdIsValid(xid)
Definition: transam.h:41
#define InvalidFullTransactionId
Definition: transam.h:56
#define FullTransactionIdIsValid(x)
Definition: transam.h:55
static bool TransactionIdPrecedes(TransactionId id1, TransactionId id2)
Definition: transam.h:263
void AfterTriggerEndQuery(EState *estate)
Definition: trigger.c:5124
void AfterTriggerBeginQuery(void)
Definition: trigger.c:5104
TupleConversionMap * convert_tuples_by_name(TupleDesc indesc, TupleDesc outdesc)
Definition: tupconvert.c:103
TupleTableSlot * execute_attr_map_slot(AttrMap *attrMap, TupleTableSlot *in_slot, TupleTableSlot *out_slot)
Definition: tupconvert.c:193
static FormData_pg_attribute * TupleDescAttr(TupleDesc tupdesc, int i)
Definition: tupdesc.h:160
static CompactAttribute * TupleDescCompactAttr(TupleDesc tupdesc, int i)
Definition: tupdesc.h:175
static TupleTableSlot * ExecClearTuple(TupleTableSlot *slot)
Definition: tuptable.h:457
static void slot_getallattrs(TupleTableSlot *slot)
Definition: tuptable.h:371
static TupleTableSlot * ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
Definition: tuptable.h:524
void TwoPhaseTransactionGid(Oid subid, TransactionId xid, char *gid_res, int szgid)
Definition: twophase.c:2747
bool LookupGXact(const char *gid, XLogRecPtr prepare_end_lsn, TimestampTz origin_prepare_timestamp)
Definition: twophase.c:2688
void FinishPreparedTransaction(const char *gid, bool isCommit)
Definition: twophase.c:1497
void SwitchToUntrustedUser(Oid userid, UserContext *context)
Definition: usercontext.c:33
void RestoreUserContext(UserContext *context)
Definition: usercontext.c:87
#define TimestampTzPlusMilliseconds(tz, ms)
Definition: timestamp.h:85
const char * type
#define WL_SOCKET_READABLE
Definition: waiteventset.h:35
#define WL_TIMEOUT
Definition: waiteventset.h:37
#define WL_EXIT_ON_PM_DEATH
Definition: waiteventset.h:39
#define WL_LATCH_SET
Definition: waiteventset.h:34
static StringInfoData reply_message
Definition: walreceiver.c:132
int wal_receiver_status_interval
Definition: walreceiver.c:88
int wal_receiver_timeout
Definition: walreceiver.c:89
#define walrcv_startstreaming(conn, options)
Definition: walreceiver.h:451
#define walrcv_connect(conninfo, replication, logical, must_use_password, appname, err)
Definition: walreceiver.h:435
#define walrcv_send(conn, buffer, nbytes)
Definition: walreceiver.h:457
#define walrcv_server_version(conn)
Definition: walreceiver.h:447
#define walrcv_endstreaming(conn, next_tli)
Definition: walreceiver.h:453
#define walrcv_identify_system(conn, primary_tli)
Definition: walreceiver.h:443
#define walrcv_receive(conn, buffer, wait_fd)
Definition: walreceiver.h:455
int WalWriterDelay
Definition: walwriter.c:70
#define SIGHUP
Definition: win32_port.h:158
@ PARALLEL_TRANS_STARTED
@ PARALLEL_TRANS_FINISHED
static bool am_parallel_apply_worker(void)
@ WORKERTYPE_TABLESYNC
@ WORKERTYPE_UNKNOWN
@ WORKERTYPE_SEQUENCESYNC
@ WORKERTYPE_PARALLEL_APPLY
@ WORKERTYPE_APPLY
@ FS_SERIALIZE_DONE
static bool am_sequencesync_worker(void)
static bool am_tablesync_worker(void)
static bool am_leader_apply_worker(void)
bool IsTransactionOrTransactionBlock(void)
Definition: xact.c:5007
bool PrepareTransactionBlock(const char *gid)
Definition: xact.c:4010
bool IsTransactionState(void)
Definition: xact.c:388
void CommandCounterIncrement(void)
Definition: xact.c:1101
void StartTransactionCommand(void)
Definition: xact.c:3077
void SetCurrentStatementStartTimestamp(void)
Definition: xact.c:915
bool IsTransactionBlock(void)
Definition: xact.c:4989
void BeginTransactionBlock(void)
Definition: xact.c:3942
void CommitTransactionCommand(void)
Definition: xact.c:3175
bool EndTransactionBlock(bool chain)
Definition: xact.c:4062
void AbortOutOfAnyTransaction(void)
Definition: xact.c:4880
CommandId GetCurrentCommandId(bool used)
Definition: xact.c:830
#define GIDSIZE
Definition: xact.h:31
XLogRecPtr GetFlushRecPtr(TimeLineID *insertTLI)
Definition: xlog.c:6571
XLogRecPtr XactLastCommitEnd
Definition: xlog.c:257
#define XLogRecPtrIsValid(r)
Definition: xlogdefs.h:29
#define LSN_FORMAT_ARGS(lsn)
Definition: xlogdefs.h:47
uint16 RepOriginId
Definition: xlogdefs.h:69
uint64 XLogRecPtr
Definition: xlogdefs.h:21
#define InvalidXLogRecPtr
Definition: xlogdefs.h:28
uint32 TimeLineID
Definition: xlogdefs.h:63