doxygen.postgresql.org/backend_2replication_2logical_2worker_8c_source.html

/*-------------------------------------------------------------------------

 * worker.c

 *     PostgreSQL logical replication worker (apply)

 *

 * Copyright (c) 2016-2025, PostgreSQL Global Development Group

 *

 * IDENTIFICATION

 *    src/backend/replication/logical/worker.c

 *

 * NOTES

 *    This file contains the worker which applies logical changes as they come

 *    from remote logical replication stream.

 *

 *    The main worker (apply) is started by logical replication worker

 *    launcher for every enabled subscription in a database. It uses

 *    walsender protocol to communicate with publisher.

 *

 *    This module includes server facing code and shares libpqwalreceiver

 *    module with walreceiver for providing the libpq specific functionality.

 *

 *

 * STREAMED TRANSACTIONS

 * ---------------------

 * Streamed transactions (large transactions exceeding a memory limit on the

 * upstream) are applied using one of two approaches:

 *

 * 1) Write to temporary files and apply when the final commit arrives

 *

 * This approach is used when the user has set the subscription's streaming

 * option as on.

 *

 * Unlike the regular (non-streamed) case, handling streamed transactions has

 * to handle aborts of both the toplevel transaction and subtransactions. This

 * is achieved by tracking offsets for subtransactions, which is then used

 * to truncate the file with serialized changes.

 *

 * The files are placed in tmp file directory by default, and the filenames

 * include both the XID of the toplevel transaction and OID of the

 * subscription. This is necessary so that different workers processing a

 * remote transaction with the same XID doesn't interfere.

 *

 * We use BufFiles instead of using normal temporary files because (a) the

 * BufFile infrastructure supports temporary files that exceed the OS file size

 * limit, (b) provides a way for automatic clean up on the error and (c) provides

 * a way to survive these files across local transactions and allow to open and

 * close at stream start and close. We decided to use FileSet

 * infrastructure as without that it deletes the files on the closure of the

 * file and if we decide to keep stream files open across the start/stop stream

 * then it will consume a lot of memory (more than 8K for each BufFile and

 * there could be multiple such BufFiles as the subscriber could receive

 * multiple start/stop streams for different transactions before getting the

 * commit). Moreover, if we don't use FileSet then we also need to invent

 * a new way to pass filenames to BufFile APIs so that we are allowed to open

 * the file we desired across multiple stream-open calls for the same

 * transaction.

 *

 * 2) Parallel apply workers.

 *

 * This approach is used when the user has set the subscription's streaming

 * option as parallel. See logical/applyparallelworker.c for information about

 * this approach.

 *

 * TWO_PHASE TRANSACTIONS

 * ----------------------

 * Two phase transactions are replayed at prepare and then committed or

 * rolled back at commit prepared and rollback prepared respectively. It is

 * possible to have a prepared transaction that arrives at the apply worker

 * when the tablesync is busy doing the initial copy. In this case, the apply

 * worker skips all the prepared operations [e.g. inserts] while the tablesync

 * is still busy (see the condition of should_apply_changes_for_rel). The

 * tablesync worker might not get such a prepared transaction because say it

 * was prior to the initial consistent point but might have got some later

 * commits. Now, the tablesync worker will exit without doing anything for the

 * prepared transaction skipped by the apply worker as the sync location for it

 * will be already ahead of the apply worker's current location. This would lead

 * to an "empty prepare", because later when the apply worker does the commit

 * prepare, there is nothing in it (the inserts were skipped earlier).

 *

 * To avoid this, and similar prepare confusions the subscription's two_phase

 * commit is enabled only after the initial sync is over. The two_phase option

 * has been implemented as a tri-state with values DISABLED, PENDING, and

 * ENABLED.

 *

 * Even if the user specifies they want a subscription with two_phase = on,

 * internally it will start with a tri-state of PENDING which only becomes

 * ENABLED after all tablesync initializations are completed - i.e. when all

 * tablesync workers have reached their READY state. In other words, the value

 * PENDING is only a temporary state for subscription start-up.

 *

 * Until the two_phase is properly available (ENABLED) the subscription will

 * behave as if two_phase = off. When the apply worker detects that all

 * tablesyncs have become READY (while the tri-state was PENDING) it will

 * restart the apply worker process. This happens in

 * ProcessSyncingTablesForApply.

 *

 * When the (re-started) apply worker finds that all tablesyncs are READY for a

 * two_phase tri-state of PENDING it start streaming messages with the

 * two_phase option which in turn enables the decoding of two-phase commits at

 * the publisher. Then, it updates the tri-state value from PENDING to ENABLED.

 * Now, it is possible that during the time we have not enabled two_phase, the

 * publisher (replication server) would have skipped some prepares but we

 * ensure that such prepares are sent along with commit prepare, see

 * ReorderBufferFinishPrepared.

 *

 * If the subscription has no tables then a two_phase tri-state PENDING is

 * left unchanged. This lets the user still do an ALTER SUBSCRIPTION REFRESH

 * PUBLICATION which might otherwise be disallowed (see below).

 *

 * If ever a user needs to be aware of the tri-state value, they can fetch it

 * from the pg_subscription catalog (see column subtwophasestate).

 *

 * Finally, to avoid problems mentioned in previous paragraphs from any

 * subsequent (not READY) tablesyncs (need to toggle two_phase option from 'on'

 * to 'off' and then again back to 'on') there is a restriction for

 * ALTER SUBSCRIPTION REFRESH PUBLICATION. This command is not permitted when

 * the two_phase tri-state is ENABLED, except when copy_data = false.

 *

 * We can get prepare of the same GID more than once for the genuine cases

 * where we have defined multiple subscriptions for publications on the same

 * server and prepared transaction has operations on tables subscribed to those

 * subscriptions. For such cases, if we use the GID sent by publisher one of

 * the prepares will be successful and others will fail, in which case the

 * server will send them again. Now, this can lead to a deadlock if user has

 * set synchronous_standby_names for all the subscriptions on subscriber. To

 * avoid such deadlocks, we generate a unique GID (consisting of the

 * subscription oid and the xid of the prepared transaction) for each prepare

 * transaction on the subscriber.

 *

 * FAILOVER

 * ----------------------

 * The logical slot on the primary can be synced to the standby by specifying

 * failover = true when creating the subscription. Enabling failover allows us

 * to smoothly transition to the promoted standby, ensuring that we can

 * subscribe to the new primary without losing any data.

 *

 * RETAIN DEAD TUPLES

 * ----------------------

 * Each apply worker that enabled retain_dead_tuples option maintains a

 * non-removable transaction ID (oldest_nonremovable_xid) in shared memory to

 * prevent dead rows from being removed prematurely when the apply worker still

 * needs them to detect update_deleted conflicts. Additionally, this helps to

 * retain the required commit_ts module information, which further helps to

 * detect update_origin_differs and delete_origin_differs conflicts reliably, as

 * otherwise, vacuum freeze could remove the required information.

 *

 * The logical replication launcher manages an internal replication slot named

 * "pg_conflict_detection". It asynchronously aggregates the non-removable

 * transaction ID from all apply workers to determine the appropriate xmin for

 * the slot, thereby retaining necessary tuples.

 *

 * The non-removable transaction ID in the apply worker is advanced to the

 * oldest running transaction ID once all concurrent transactions on the

 * publisher have been applied and flushed locally. The process involves:

 *

 * - RDT_GET_CANDIDATE_XID:

 *   Call GetOldestActiveTransactionId() to take oldestRunningXid as the

 *   candidate xid.

 *

 * - RDT_REQUEST_PUBLISHER_STATUS:

 *   Send a message to the walsender requesting the publisher status, which

 *   includes the latest WAL write position and information about transactions

 *   that are in the commit phase.

 *

 * - RDT_WAIT_FOR_PUBLISHER_STATUS:

 *   Wait for the status from the walsender. After receiving the first status,

 *   do not proceed if there are concurrent remote transactions that are still

 *   in the commit phase. These transactions might have been assigned an

 *   earlier commit timestamp but have not yet written the commit WAL record.

 *   Continue to request the publisher status (RDT_REQUEST_PUBLISHER_STATUS)

 *   until all these transactions have completed.

 *

 * - RDT_WAIT_FOR_LOCAL_FLUSH:

 *   Advance the non-removable transaction ID if the current flush location has

 *   reached or surpassed the last received WAL position.

 *

 * - RDT_STOP_CONFLICT_INFO_RETENTION:

 *   This phase is required only when max_retention_duration is defined. We

 *   enter this phase if the wait time in either the

 *   RDT_WAIT_FOR_PUBLISHER_STATUS or RDT_WAIT_FOR_LOCAL_FLUSH phase exceeds

 *   configured max_retention_duration. In this phase,

 *   pg_subscription.subretentionactive is updated to false within a new

 *   transaction, and oldest_nonremovable_xid is set to InvalidTransactionId.

 *

 * - RDT_RESUME_CONFLICT_INFO_RETENTION:

 *   This phase is required only when max_retention_duration is defined. We

 *   enter this phase if the retention was previously stopped, and the time

 *   required to advance the non-removable transaction ID in the

 *   RDT_WAIT_FOR_LOCAL_FLUSH phase has decreased to within acceptable limits

 *   (or if max_retention_duration is set to 0). During this phase,

 *   pg_subscription.subretentionactive is updated to true within a new

 *   transaction, and the worker will be restarted.

 *

 * The overall state progression is: GET_CANDIDATE_XID ->

 * REQUEST_PUBLISHER_STATUS -> WAIT_FOR_PUBLISHER_STATUS -> (loop to

 * REQUEST_PUBLISHER_STATUS till concurrent remote transactions end) ->

 * WAIT_FOR_LOCAL_FLUSH -> loop back to GET_CANDIDATE_XID.

 *

 * Retaining the dead tuples for this period is sufficient for ensuring

 * eventual consistency using last-update-wins strategy, as dead tuples are

 * useful for detecting conflicts only during the application of concurrent

 * transactions from remote nodes. After applying and flushing all remote

 * transactions that occurred concurrently with the tuple DELETE, any

 * subsequent UPDATE from a remote node should have a later timestamp. In such

 * cases, it is acceptable to detect an update_missing scenario and convert the

 * UPDATE to an INSERT when applying it. But, for concurrent remote

 * transactions with earlier timestamps than the DELETE, detecting

 * update_deleted is necessary, as the UPDATEs in remote transactions should be

 * ignored if their timestamp is earlier than that of the dead tuples.

 *

 * Note that advancing the non-removable transaction ID is not supported if the

 * publisher is also a physical standby. This is because the logical walsender

 * on the standby can only get the WAL replay position but there may be more

 * WALs that are being replicated from the primary and those WALs could have

 * earlier commit timestamp.

 *

 * Similarly, when the publisher has subscribed to another publisher,

 * information necessary for conflict detection cannot be retained for

 * changes from origins other than the publisher. This is because publisher

 * lacks the information on concurrent transactions of other publishers to

 * which it subscribes. As the information on concurrent transactions is

 * unavailable beyond subscriber's immediate publishers, the non-removable

 * transaction ID might be advanced prematurely before changes from other

 * origins have been fully applied.

 *

 * XXX Retaining information for changes from other origins might be possible

 * by requesting the subscription on that origin to enable retain_dead_tuples

 * and fetching the conflict detection slot.xmin along with the publisher's

 * status. In the RDT_WAIT_FOR_PUBLISHER_STATUS phase, the apply worker could

 * wait for the remote slot's xmin to reach the oldest active transaction ID,

 * ensuring that all transactions from other origins have been applied on the

 * publisher, thereby getting the latest WAL position that includes all

 * concurrent changes. However, this approach may impact performance, so it

 * might not worth the effort.

 *

 * XXX It seems feasible to get the latest commit's WAL location from the

 * publisher and wait till that is applied. However, we can't do that

 * because commit timestamps can regress as a commit with a later LSN is not

 * guaranteed to have a later timestamp than those with earlier LSNs. Having

 * said that, even if that is possible, it won't improve performance much as

 * the apply always lag and moves slowly as compared with the transactions

 * on the publisher.

 *-------------------------------------------------------------------------

 */


#include "postgres.h"


#include <sys/stat.h>

#include <unistd.h>


#include "access/commit_ts.h"

#include "access/table.h"

#include "access/tableam.h"

#include "access/twophase.h"

#include "access/xact.h"

#include "catalog/indexing.h"

#include "catalog/pg_inherits.h"

#include "catalog/pg_subscription.h"

#include "catalog/pg_subscription_rel.h"

#include "commands/subscriptioncmds.h"

#include "commands/tablecmds.h"

#include "commands/trigger.h"

#include "executor/executor.h"

#include "executor/execPartition.h"

#include "libpq/pqformat.h"

#include "miscadmin.h"

#include "optimizer/optimizer.h"

#include "parser/parse_relation.h"

#include "pgstat.h"

#include "postmaster/bgworker.h"

#include "postmaster/interrupt.h"

#include "postmaster/walwriter.h"

#include "replication/conflict.h"

#include "replication/logicallauncher.h"

#include "replication/logicalproto.h"

#include "replication/logicalrelation.h"

#include "replication/logicalworker.h"

#include "replication/origin.h"

#include "replication/slot.h"

#include "replication/walreceiver.h"

#include "replication/worker_internal.h"

#include "rewrite/rewriteHandler.h"

#include "storage/buffile.h"

#include "storage/ipc.h"

#include "storage/lmgr.h"

#include "storage/procarray.h"

#include "tcop/tcopprot.h"

#include "utils/acl.h"

#include "utils/guc.h"

#include "utils/inval.h"

#include "utils/lsyscache.h"

#include "utils/memutils.h"

#include "utils/pg_lsn.h"

#include "utils/rel.h"

#include "utils/rls.h"

#include "utils/snapmgr.h"

#include "utils/syscache.h"

#include "utils/usercontext.h"


#define NAPTIME_PER_CYCLE 1000  /* max sleep time between cycles (1s) */


typedef struct FlushPosition

{

    dlist_node  node;

    XLogRecPtr  local_end;

    XLogRecPtr  remote_end;

} FlushPosition;


static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping);


typedef struct ApplyExecutionData

{

    EState     *estate;         /* executor state, used to track resources */


    LogicalRepRelMapEntry *targetRel;   /* replication target rel */

    ResultRelInfo *targetRelInfo;   /* ResultRelInfo for same */


    /* These fields are used when the target relation is partitioned: */

    ModifyTableState *mtstate;  /* dummy ModifyTable state */

    PartitionTupleRouting *proute;  /* partition routing info */

} ApplyExecutionData;


/* Struct for saving and restoring apply errcontext information */

typedef struct ApplyErrorCallbackArg

{

    LogicalRepMsgType command;  /* 0 if invalid */

    LogicalRepRelMapEntry *rel;


    /* Remote node information */

    int         remote_attnum;  /* -1 if invalid */

    TransactionId remote_xid;

    XLogRecPtr  finish_lsn;

    char       *origin_name;

} ApplyErrorCallbackArg;


/*

 * The action to be taken for the changes in the transaction.

 *

 * TRANS_LEADER_APPLY:

 * This action means that we are in the leader apply worker or table sync

 * worker. The changes of the transaction are either directly applied or

 * are read from temporary files (for streaming transactions) and then

 * applied by the worker.

 *

 * TRANS_LEADER_SERIALIZE:

 * This action means that we are in the leader apply worker or table sync

 * worker. Changes are written to temporary files and then applied when the

 * final commit arrives.

 *

 * TRANS_LEADER_SEND_TO_PARALLEL:

 * This action means that we are in the leader apply worker and need to send

 * the changes to the parallel apply worker.

 *

 * TRANS_LEADER_PARTIAL_SERIALIZE:

 * This action means that we are in the leader apply worker and have sent some

 * changes directly to the parallel apply worker and the remaining changes are

 * serialized to a file, due to timeout while sending data. The parallel apply

 * worker will apply these serialized changes when the final commit arrives.

 *

 * We can't use TRANS_LEADER_SERIALIZE for this case because, in addition to

 * serializing changes, the leader worker also needs to serialize the

 * STREAM_XXX message to a file, and wait for the parallel apply worker to

 * finish the transaction when processing the transaction finish command. So

 * this new action was introduced to keep the code and logic clear.

 *

 * TRANS_PARALLEL_APPLY:

 * This action means that we are in the parallel apply worker and changes of

 * the transaction are applied directly by the worker.

 */

typedef enum

{

    /* The action for non-streaming transactions. */

    TRANS_LEADER_APPLY,


    /* Actions for streaming transactions. */

    TRANS_LEADER_SERIALIZE,

    TRANS_LEADER_SEND_TO_PARALLEL,

    TRANS_LEADER_PARTIAL_SERIALIZE,

    TRANS_PARALLEL_APPLY,

} TransApplyAction;


/*

 * The phases involved in advancing the non-removable transaction ID.

 *

 * See comments atop worker.c for details of the transition between these

 * phases.

 */

typedef enum

{

    RDT_GET_CANDIDATE_XID,

    RDT_REQUEST_PUBLISHER_STATUS,

    RDT_WAIT_FOR_PUBLISHER_STATUS,

    RDT_WAIT_FOR_LOCAL_FLUSH,

    RDT_STOP_CONFLICT_INFO_RETENTION,

    RDT_RESUME_CONFLICT_INFO_RETENTION,

} RetainDeadTuplesPhase;


/*

 * Critical information for managing phase transitions within the

 * RetainDeadTuplesPhase.

 */

typedef struct RetainDeadTuplesData

{

    RetainDeadTuplesPhase phase;    /* current phase */

    XLogRecPtr  remote_lsn;     /* WAL write position on the publisher */


    /*

     * Oldest transaction ID that was in the commit phase on the publisher.

     * Use FullTransactionId to prevent issues with transaction ID wraparound,

     * where a new remote_oldestxid could falsely appear to originate from the

     * past and block advancement.

     */

    FullTransactionId remote_oldestxid;


    /*

     * Next transaction ID to be assigned on the publisher. Use

     * FullTransactionId for consistency and to allow straightforward

     * comparisons with remote_oldestxid.

     */

    FullTransactionId remote_nextxid;


    TimestampTz reply_time;     /* when the publisher responds with status */


    /*

     * Publisher transaction ID that must be awaited to complete before

     * entering the final phase (RDT_WAIT_FOR_LOCAL_FLUSH). Use

     * FullTransactionId for the same reason as remote_nextxid.

     */

    FullTransactionId remote_wait_for;


    TransactionId candidate_xid;    /* candidate for the non-removable

                                     * transaction ID */

    TimestampTz flushpos_update_time;   /* when the remote flush position was

                                         * updated in final phase

                                         * (RDT_WAIT_FOR_LOCAL_FLUSH) */


    long        table_sync_wait_time;   /* time spent waiting for table sync

                                         * to finish */


    /*

     * The following fields are used to determine the timing for the next

     * round of transaction ID advancement.

     */

    TimestampTz last_recv_time; /* when the last message was received */

    TimestampTz candidate_xid_time; /* when the candidate_xid is decided */

    int         xid_advance_interval;   /* how much time (ms) to wait before

                                         * attempting to advance the

                                         * non-removable transaction ID */

} RetainDeadTuplesData;


/*

 * The minimum (100ms) and maximum (3 minutes) intervals for advancing

 * non-removable transaction IDs. The maximum interval is a bit arbitrary but

 * is sufficient to not cause any undue network traffic.

 */

#define MIN_XID_ADVANCE_INTERVAL 100

#define MAX_XID_ADVANCE_INTERVAL 180000


/* errcontext tracker */

static ApplyErrorCallbackArg apply_error_callback_arg =

{

    .command = 0,

    .rel = NULL,

    .remote_attnum = -1,

    .remote_xid = InvalidTransactionId,

    .finish_lsn = InvalidXLogRecPtr,

    .origin_name = NULL,

};


ErrorContextCallback *apply_error_context_stack = NULL;


MemoryContext ApplyMessageContext = NULL;

MemoryContext ApplyContext = NULL;


/* per stream context for streaming transactions */

static MemoryContext LogicalStreamingContext = NULL;


WalReceiverConn *LogRepWorkerWalRcvConn = NULL;


Subscription *MySubscription = NULL;

static bool MySubscriptionValid = false;


static List *on_commit_wakeup_workers_subids = NIL;


bool        in_remote_transaction = false;

static XLogRecPtr remote_final_lsn = InvalidXLogRecPtr;


/* fields valid only when processing streamed transaction */

static bool in_streamed_transaction = false;


static TransactionId stream_xid = InvalidTransactionId;


/*

 * The number of changes applied by parallel apply worker during one streaming

 * block.

 */

static uint32 parallel_stream_nchanges = 0;


/* Are we initializing an apply worker? */

bool        InitializingApplyWorker = false;


/*

 * We enable skipping all data modification changes (INSERT, UPDATE, etc.) for

 * the subscription if the remote transaction's finish LSN matches the subskiplsn.

 * Once we start skipping changes, we don't stop it until we skip all changes of

 * the transaction even if pg_subscription is updated and MySubscription->skiplsn

 * gets changed or reset during that. Also, in streaming transaction cases (streaming = on),

 * we don't skip receiving and spooling the changes since we decide whether or not

 * to skip applying the changes when starting to apply changes. The subskiplsn is

 * cleared after successfully skipping the transaction or applying non-empty

 * transaction. The latter prevents the mistakenly specified subskiplsn from

 * being left. Note that we cannot skip the streaming transactions when using

 * parallel apply workers because we cannot get the finish LSN before applying

 * the changes. So, we don't start parallel apply worker when finish LSN is set

 * by the user.

 */

static XLogRecPtr skip_xact_finish_lsn = InvalidXLogRecPtr;

#define is_skipping_changes() (unlikely(XLogRecPtrIsValid(skip_xact_finish_lsn)))


/* BufFile handle of the current streaming file */

static BufFile *stream_fd = NULL;


/*

 * The remote WAL position that has been applied and flushed locally. We record

 * and use this information both while sending feedback to the server and

 * advancing oldest_nonremovable_xid.

 */

static XLogRecPtr last_flushpos = InvalidXLogRecPtr;


typedef struct SubXactInfo

{

    TransactionId xid;          /* XID of the subxact */

    int         fileno;         /* file number in the buffile */

    off_t       offset;         /* offset in the file */

} SubXactInfo;


/* Sub-transaction data for the current streaming transaction */

typedef struct ApplySubXactData

{

    uint32      nsubxacts;      /* number of sub-transactions */

    uint32      nsubxacts_max;  /* current capacity of subxacts */

    TransactionId subxact_last; /* xid of the last sub-transaction */

    SubXactInfo *subxacts;      /* sub-xact offset in changes file */

} ApplySubXactData;


static ApplySubXactData subxact_data = {0, 0, InvalidTransactionId, NULL};


static inline void subxact_filename(char *path, Oid subid, TransactionId xid);

static inline void changes_filename(char *path, Oid subid, TransactionId xid);


/*

 * Information about subtransactions of a given toplevel transaction.

 */

static void subxact_info_write(Oid subid, TransactionId xid);

static void subxact_info_read(Oid subid, TransactionId xid);

static void subxact_info_add(TransactionId xid);

static inline void cleanup_subxact_info(void);


/*

 * Serialize and deserialize changes for a toplevel transaction.

 */

static void stream_open_file(Oid subid, TransactionId xid,

                             bool first_segment);

static void stream_write_change(char action, StringInfo s);

static void stream_open_and_write_change(TransactionId xid, char action, StringInfo s);

static void stream_close_file(void);


static void send_feedback(XLogRecPtr recvpos, bool force, bool requestReply);


static void maybe_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data,

                                           bool status_received);

static bool can_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data);

static void process_rdt_phase_transition(RetainDeadTuplesData *rdt_data,

                                         bool status_received);

static void get_candidate_xid(RetainDeadTuplesData *rdt_data);

static void request_publisher_status(RetainDeadTuplesData *rdt_data);

static void wait_for_publisher_status(RetainDeadTuplesData *rdt_data,

                                      bool status_received);

static void wait_for_local_flush(RetainDeadTuplesData *rdt_data);

static bool should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data);

static void stop_conflict_info_retention(RetainDeadTuplesData *rdt_data);

static void resume_conflict_info_retention(RetainDeadTuplesData *rdt_data);

static bool update_retention_status(bool active);

static void reset_retention_data_fields(RetainDeadTuplesData *rdt_data);

static void adjust_xid_advance_interval(RetainDeadTuplesData *rdt_data,

                                        bool new_xid_found);


static void apply_worker_exit(void);


static void apply_handle_commit_internal(LogicalRepCommitData *commit_data);

static void apply_handle_insert_internal(ApplyExecutionData *edata,

                                         ResultRelInfo *relinfo,

                                         TupleTableSlot *remoteslot);

static void apply_handle_update_internal(ApplyExecutionData *edata,

                                         ResultRelInfo *relinfo,

                                         TupleTableSlot *remoteslot,

                                         LogicalRepTupleData *newtup,

                                         Oid localindexoid);

static void apply_handle_delete_internal(ApplyExecutionData *edata,

                                         ResultRelInfo *relinfo,

                                         TupleTableSlot *remoteslot,

                                         Oid localindexoid);

static bool FindReplTupleInLocalRel(ApplyExecutionData *edata, Relation localrel,

                                    LogicalRepRelation *remoterel,

                                    Oid localidxoid,

                                    TupleTableSlot *remoteslot,

                                    TupleTableSlot **localslot);

static bool FindDeletedTupleInLocalRel(Relation localrel,

                                       Oid localidxoid,

                                       TupleTableSlot *remoteslot,

                                       TransactionId *delete_xid,

                                       RepOriginId *delete_origin,

                                       TimestampTz *delete_time);

static void apply_handle_tuple_routing(ApplyExecutionData *edata,

                                       TupleTableSlot *remoteslot,

                                       LogicalRepTupleData *newtup,

                                       CmdType operation);


/* Functions for skipping changes */

static void maybe_start_skipping_changes(XLogRecPtr finish_lsn);

static void stop_skipping_changes(void);

static void clear_subscription_skip_lsn(XLogRecPtr finish_lsn);


/* Functions for apply error callback */

static inline void set_apply_error_context_xact(TransactionId xid, XLogRecPtr lsn);

static inline void reset_apply_error_context_info(void);


static TransApplyAction get_transaction_apply_action(TransactionId xid,

                                                     ParallelApplyWorkerInfo **winfo);


static void replorigin_reset(int code, Datum arg);


/*

 * Form the origin name for the subscription.

 *

 * This is a common function for tablesync and other workers. Tablesync workers

 * must pass a valid relid. Other callers must pass relid = InvalidOid.

 *

 * Return the name in the supplied buffer.

 */

void

ReplicationOriginNameForLogicalRep(Oid suboid, Oid relid,

                                   char *originname, Size szoriginname)

{

    if (OidIsValid(relid))

    {

        /* Replication origin name for tablesync workers. */

        snprintf(originname, szoriginname, "pg_%u_%u", suboid, relid);

    }

    else

    {

        /* Replication origin name for non-tablesync workers. */

        snprintf(originname, szoriginname, "pg_%u", suboid);

    }

}


/*

 * Should this worker apply changes for given relation.

 *

 * This is mainly needed for initial relation data sync as that runs in

 * separate worker process running in parallel and we need some way to skip

 * changes coming to the leader apply worker during the sync of a table.

 *

 * Note we need to do smaller or equals comparison for SYNCDONE state because

 * it might hold position of end of initial slot consistent point WAL

 * record + 1 (ie start of next record) and next record can be COMMIT of

 * transaction we are now processing (which is what we set remote_final_lsn

 * to in apply_handle_begin).

 *

 * Note that for streaming transactions that are being applied in the parallel

 * apply worker, we disallow applying changes if the target table in the

 * subscription is not in the READY state, because we cannot decide whether to

 * apply the change as we won't know remote_final_lsn by that time.

 *

 * We already checked this in pa_can_start() before assigning the

 * streaming transaction to the parallel worker, but it also needs to be

 * checked here because if the user executes ALTER SUBSCRIPTION ... REFRESH

 * PUBLICATION in parallel, the new table can be added to pg_subscription_rel

 * while applying this transaction.

 */

static bool

should_apply_changes_for_rel(LogicalRepRelMapEntry *rel)

{

    switch (MyLogicalRepWorker->type)

    {

        case WORKERTYPE_TABLESYNC:

            return MyLogicalRepWorker->relid == rel->localreloid;


        case WORKERTYPE_PARALLEL_APPLY:

            /* We don't synchronize rel's that are in unknown state. */

            if (rel->state != SUBREL_STATE_READY &&

                rel->state != SUBREL_STATE_UNKNOWN)

                ereport(ERROR,

                        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

                         errmsg("logical replication parallel apply worker for subscription \"%s\" will stop",

                                MySubscription->name),

                         errdetail("Cannot handle streamed replication transactions using parallel apply workers until all tables have been synchronized.")));


            return rel->state == SUBREL_STATE_READY;


        case WORKERTYPE_APPLY:

            return (rel->state == SUBREL_STATE_READY ||

                    (rel->state == SUBREL_STATE_SYNCDONE &&

                     rel->statelsn <= remote_final_lsn));


        case WORKERTYPE_SEQUENCESYNC:

            /* Should never happen. */

            elog(ERROR, "sequence synchronization worker is not expected to apply changes");

            break;


        case WORKERTYPE_UNKNOWN:

            /* Should never happen. */

            elog(ERROR, "Unknown worker type");

    }


    return false;               /* dummy for compiler */

}


/*

 * Begin one step (one INSERT, UPDATE, etc) of a replication transaction.

 *

 * Start a transaction, if this is the first step (else we keep using the

 * existing transaction).

 * Also provide a global snapshot and ensure we run in ApplyMessageContext.

 */

static void

begin_replication_step(void)

{

    SetCurrentStatementStartTimestamp();


    if (!IsTransactionState())

    {

        StartTransactionCommand();

        maybe_reread_subscription();

    }


    PushActiveSnapshot(GetTransactionSnapshot());


    MemoryContextSwitchTo(ApplyMessageContext);

}


/*

 * Finish up one step of a replication transaction.

 * Callers of begin_replication_step() must also call this.

 *

 * We don't close out the transaction here, but we should increment

 * the command counter to make the effects of this step visible.

 */

static void

end_replication_step(void)

{

    PopActiveSnapshot();


    CommandCounterIncrement();

}


/*

 * Handle streamed transactions for both the leader apply worker and the

 * parallel apply workers.

 *

 * In the streaming case (receiving a block of the streamed transaction), for

 * serialize mode, simply redirect it to a file for the proper toplevel

 * transaction, and for parallel mode, the leader apply worker will send the

 * changes to parallel apply workers and the parallel apply worker will define

 * savepoints if needed. (LOGICAL_REP_MSG_RELATION or LOGICAL_REP_MSG_TYPE

 * messages will be applied by both leader apply worker and parallel apply

 * workers).

 *

 * Returns true for streamed transactions (when the change is either serialized

 * to file or sent to parallel apply worker), false otherwise (regular mode or

 * needs to be processed by parallel apply worker).

 *

 * Exception: If the message being processed is LOGICAL_REP_MSG_RELATION

 * or LOGICAL_REP_MSG_TYPE, return false even if the message needs to be sent

 * to a parallel apply worker.

 */

static bool

handle_streamed_transaction(LogicalRepMsgType action, StringInfo s)

{

    TransactionId current_xid;

    ParallelApplyWorkerInfo *winfo;

    TransApplyAction apply_action;

    StringInfoData original_msg;


    apply_action = get_transaction_apply_action(stream_xid, &winfo);


    /* not in streaming mode */

    if (apply_action == TRANS_LEADER_APPLY)

        return false;


    Assert(TransactionIdIsValid(stream_xid));


    /*

     * The parallel apply worker needs the xid in this message to decide

     * whether to define a savepoint, so save the original message that has

     * not moved the cursor after the xid. We will serialize this message to a

     * file in PARTIAL_SERIALIZE mode.

     */

    original_msg = *s;


    /*

     * We should have received XID of the subxact as the first part of the

     * message, so extract it.

     */

    current_xid = pq_getmsgint(s, 4);


    if (!TransactionIdIsValid(current_xid))

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("invalid transaction ID in streamed replication transaction")));


    switch (apply_action)

    {

        case TRANS_LEADER_SERIALIZE:

            Assert(stream_fd);


            /* Add the new subxact to the array (unless already there). */

            subxact_info_add(current_xid);


            /* Write the change to the current file */

            stream_write_change(action, s);

            return true;


        case TRANS_LEADER_SEND_TO_PARALLEL:

            Assert(winfo);


            /*

             * XXX The publisher side doesn't always send relation/type update

             * messages after the streaming transaction, so also update the

             * relation/type in leader apply worker. See function

             * cleanup_rel_sync_cache.

             */

            if (pa_send_data(winfo, s->len, s->data))

                return (action != LOGICAL_REP_MSG_RELATION &&

                        action != LOGICAL_REP_MSG_TYPE);


            /*

             * Switch to serialize mode when we are not able to send the

             * change to parallel apply worker.

             */

            pa_switch_to_partial_serialize(winfo, false);


            /* fall through */

        case TRANS_LEADER_PARTIAL_SERIALIZE:

            stream_write_change(action, &original_msg);


            /* Same reason as TRANS_LEADER_SEND_TO_PARALLEL case. */

            return (action != LOGICAL_REP_MSG_RELATION &&

                    action != LOGICAL_REP_MSG_TYPE);


        case TRANS_PARALLEL_APPLY:

            parallel_stream_nchanges += 1;


            /* Define a savepoint for a subxact if needed. */

            pa_start_subtrans(current_xid, stream_xid);

            return false;


        default:

            elog(ERROR, "unexpected apply action: %d", (int) apply_action);

            return false;       /* silence compiler warning */

    }

}


/*

 * Executor state preparation for evaluation of constraint expressions,

 * indexes and triggers for the specified relation.

 *

 * Note that the caller must open and close any indexes to be updated.

 */

static ApplyExecutionData *

create_edata_for_relation(LogicalRepRelMapEntry *rel)

{

    ApplyExecutionData *edata;

    EState     *estate;

    RangeTblEntry *rte;

    List       *perminfos = NIL;

    ResultRelInfo *resultRelInfo;


    edata = (ApplyExecutionData *) palloc0(sizeof(ApplyExecutionData));

    edata->targetRel = rel;


    edata->estate = estate = CreateExecutorState();


    rte = makeNode(RangeTblEntry);

    rte->rtekind = RTE_RELATION;

    rte->relid = RelationGetRelid(rel->localrel);

    rte->relkind = rel->localrel->rd_rel->relkind;

    rte->rellockmode = AccessShareLock;


    addRTEPermissionInfo(&perminfos, rte);


    ExecInitRangeTable(estate, list_make1(rte), perminfos,

                       bms_make_singleton(1));


    edata->targetRelInfo = resultRelInfo = makeNode(ResultRelInfo);


    /*

     * Use Relation opened by logicalrep_rel_open() instead of opening it

     * again.

     */

    InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);


    /*

     * We put the ResultRelInfo in the es_opened_result_relations list, even

     * though we don't populate the es_result_relations array.  That's a bit

     * bogus, but it's enough to make ExecGetTriggerResultRel() find them.

     *

     * ExecOpenIndices() is not called here either, each execution path doing

     * an apply operation being responsible for that.

     */

    estate->es_opened_result_relations =

        lappend(estate->es_opened_result_relations, resultRelInfo);


    estate->es_output_cid = GetCurrentCommandId(true);


    /* Prepare to catch AFTER triggers. */

    AfterTriggerBeginQuery();


    /* other fields of edata remain NULL for now */


    return edata;

}


/*

 * Finish any operations related to the executor state created by

 * create_edata_for_relation().

 */

static void

finish_edata(ApplyExecutionData *edata)

{

    EState     *estate = edata->estate;


    /* Handle any queued AFTER triggers. */

    AfterTriggerEndQuery(estate);


    /* Shut down tuple routing, if any was done. */

    if (edata->proute)

        ExecCleanupTupleRouting(edata->mtstate, edata->proute);


    /*

     * Cleanup.  It might seem that we should call ExecCloseResultRelations()

     * here, but we intentionally don't.  It would close the rel we added to

     * es_opened_result_relations above, which is wrong because we took no

     * corresponding refcount.  We rely on ExecCleanupTupleRouting() to close

     * any other relations opened during execution.

     */

    ExecResetTupleTable(estate->es_tupleTable, false);

    FreeExecutorState(estate);

    pfree(edata);

}


/*

 * Executes default values for columns for which we can't map to remote

 * relation columns.

 *

 * This allows us to support tables which have more columns on the downstream

 * than on the upstream.

 */

static void

slot_fill_defaults(LogicalRepRelMapEntry *rel, EState *estate,

                   TupleTableSlot *slot)

{

    TupleDesc   desc = RelationGetDescr(rel->localrel);

    int         num_phys_attrs = desc->natts;

    int         i;

    int         attnum,

                num_defaults = 0;

    int        *defmap;

    ExprState **defexprs;

    ExprContext *econtext;


    econtext = GetPerTupleExprContext(estate);


    /* We got all the data via replication, no need to evaluate anything. */

    if (num_phys_attrs == rel->remoterel.natts)

        return;


    defmap = (int *) palloc(num_phys_attrs * sizeof(int));

    defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *));


    Assert(rel->attrmap->maplen == num_phys_attrs);

    for (attnum = 0; attnum < num_phys_attrs; attnum++)

    {

        CompactAttribute *cattr = TupleDescCompactAttr(desc, attnum);

        Expr       *defexpr;


        if (cattr->attisdropped || cattr->attgenerated)

            continue;


        if (rel->attrmap->attnums[attnum] >= 0)

            continue;


        defexpr = (Expr *) build_column_default(rel->localrel, attnum + 1);


        if (defexpr != NULL)

        {

            /* Run the expression through planner */

            defexpr = expression_planner(defexpr);


            /* Initialize executable expression in copycontext */

            defexprs[num_defaults] = ExecInitExpr(defexpr, NULL);

            defmap[num_defaults] = attnum;

            num_defaults++;

        }

    }


    for (i = 0; i < num_defaults; i++)

        slot->tts_values[defmap[i]] =

            ExecEvalExpr(defexprs[i], econtext, &slot->tts_isnull[defmap[i]]);

}


/*

 * Store tuple data into slot.

 *

 * Incoming data can be either text or binary format.

 */

static void

slot_store_data(TupleTableSlot *slot, LogicalRepRelMapEntry *rel,

                LogicalRepTupleData *tupleData)

{

    int         natts = slot->tts_tupleDescriptor->natts;

    int         i;


    ExecClearTuple(slot);


    /* Call the "in" function for each non-dropped, non-null attribute */

    Assert(natts == rel->attrmap->maplen);

    for (i = 0; i < natts; i++)

    {

        Form_pg_attribute att = TupleDescAttr(slot->tts_tupleDescriptor, i);

        int         remoteattnum = rel->attrmap->attnums[i];


        if (!att->attisdropped && remoteattnum >= 0)

        {

            StringInfo  colvalue = &tupleData->colvalues[remoteattnum];


            Assert(remoteattnum < tupleData->ncols);


            /* Set attnum for error callback */

            apply_error_callback_arg.remote_attnum = remoteattnum;


            if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_TEXT)

            {

                Oid         typinput;

                Oid         typioparam;


                getTypeInputInfo(att->atttypid, &typinput, &typioparam);

                slot->tts_values[i] =

                    OidInputFunctionCall(typinput, colvalue->data,

                                         typioparam, att->atttypmod);

                slot->tts_isnull[i] = false;

            }

            else if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_BINARY)

            {

                Oid         typreceive;

                Oid         typioparam;


                /*

                 * In some code paths we may be asked to re-parse the same

                 * tuple data.  Reset the StringInfo's cursor so that works.

                 */

                colvalue->cursor = 0;


                getTypeBinaryInputInfo(att->atttypid, &typreceive, &typioparam);

                slot->tts_values[i] =

                    OidReceiveFunctionCall(typreceive, colvalue,

                                           typioparam, att->atttypmod);


                /* Trouble if it didn't eat the whole buffer */

                if (colvalue->cursor != colvalue->len)

                    ereport(ERROR,

                            (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),

                             errmsg("incorrect binary data format in logical replication column %d",

                                    remoteattnum + 1)));

                slot->tts_isnull[i] = false;

            }

            else

            {

                /*

                 * NULL value from remote.  (We don't expect to see

                 * LOGICALREP_COLUMN_UNCHANGED here, but if we do, treat it as

                 * NULL.)

                 */

                slot->tts_values[i] = (Datum) 0;

                slot->tts_isnull[i] = true;

            }


            /* Reset attnum for error callback */

            apply_error_callback_arg.remote_attnum = -1;

        }

        else

        {

            /*

             * We assign NULL to dropped attributes and missing values

             * (missing values should be later filled using

             * slot_fill_defaults).

             */

            slot->tts_values[i] = (Datum) 0;

            slot->tts_isnull[i] = true;

        }

    }


    ExecStoreVirtualTuple(slot);

}


/*

 * Replace updated columns with data from the LogicalRepTupleData struct.

 * This is somewhat similar to heap_modify_tuple but also calls the type

 * input functions on the user data.

 *

 * "slot" is filled with a copy of the tuple in "srcslot", replacing

 * columns provided in "tupleData" and leaving others as-is.

 *

 * Caution: unreplaced pass-by-ref columns in "slot" will point into the

 * storage for "srcslot".  This is OK for current usage, but someday we may

 * need to materialize "slot" at the end to make it independent of "srcslot".

 */

static void

slot_modify_data(TupleTableSlot *slot, TupleTableSlot *srcslot,

                 LogicalRepRelMapEntry *rel,

                 LogicalRepTupleData *tupleData)

{

    int         natts = slot->tts_tupleDescriptor->natts;

    int         i;


    /* We'll fill "slot" with a virtual tuple, so we must start with ... */

    ExecClearTuple(slot);


    /*

     * Copy all the column data from srcslot, so that we'll have valid values

     * for unreplaced columns.

     */

    Assert(natts == srcslot->tts_tupleDescriptor->natts);

    slot_getallattrs(srcslot);

    memcpy(slot->tts_values, srcslot->tts_values, natts * sizeof(Datum));

    memcpy(slot->tts_isnull, srcslot->tts_isnull, natts * sizeof(bool));


    /* Call the "in" function for each replaced attribute */

    Assert(natts == rel->attrmap->maplen);

    for (i = 0; i < natts; i++)

    {

        Form_pg_attribute att = TupleDescAttr(slot->tts_tupleDescriptor, i);

        int         remoteattnum = rel->attrmap->attnums[i];


        if (remoteattnum < 0)

            continue;


        Assert(remoteattnum < tupleData->ncols);


        if (tupleData->colstatus[remoteattnum] != LOGICALREP_COLUMN_UNCHANGED)

        {

            StringInfo  colvalue = &tupleData->colvalues[remoteattnum];


            /* Set attnum for error callback */

            apply_error_callback_arg.remote_attnum = remoteattnum;


            if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_TEXT)

            {

                Oid         typinput;

                Oid         typioparam;


                getTypeInputInfo(att->atttypid, &typinput, &typioparam);

                slot->tts_values[i] =

                    OidInputFunctionCall(typinput, colvalue->data,

                                         typioparam, att->atttypmod);

                slot->tts_isnull[i] = false;

            }

            else if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_BINARY)

            {

                Oid         typreceive;

                Oid         typioparam;


                /*

                 * In some code paths we may be asked to re-parse the same

                 * tuple data.  Reset the StringInfo's cursor so that works.

                 */

                colvalue->cursor = 0;


                getTypeBinaryInputInfo(att->atttypid, &typreceive, &typioparam);

                slot->tts_values[i] =

                    OidReceiveFunctionCall(typreceive, colvalue,

                                           typioparam, att->atttypmod);


                /* Trouble if it didn't eat the whole buffer */

                if (colvalue->cursor != colvalue->len)

                    ereport(ERROR,

                            (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),

                             errmsg("incorrect binary data format in logical replication column %d",

                                    remoteattnum + 1)));

                slot->tts_isnull[i] = false;

            }

            else

            {

                /* must be LOGICALREP_COLUMN_NULL */

                slot->tts_values[i] = (Datum) 0;

                slot->tts_isnull[i] = true;

            }


            /* Reset attnum for error callback */

            apply_error_callback_arg.remote_attnum = -1;

        }

    }


    /* And finally, declare that "slot" contains a valid virtual tuple */

    ExecStoreVirtualTuple(slot);

}


/*

 * Handle BEGIN message.

 */

static void

apply_handle_begin(StringInfo s)

{

    LogicalRepBeginData begin_data;


    /* There must not be an active streaming transaction. */

    Assert(!TransactionIdIsValid(stream_xid));


    logicalrep_read_begin(s, &begin_data);

    set_apply_error_context_xact(begin_data.xid, begin_data.final_lsn);


    remote_final_lsn = begin_data.final_lsn;


    maybe_start_skipping_changes(begin_data.final_lsn);


    in_remote_transaction = true;


    pgstat_report_activity(STATE_RUNNING, NULL);

}


/*

 * Handle COMMIT message.

 *

 * TODO, support tracking of multiple origins

 */

static void

apply_handle_commit(StringInfo s)

{

    LogicalRepCommitData commit_data;


    logicalrep_read_commit(s, &commit_data);


    if (commit_data.commit_lsn != remote_final_lsn)

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("incorrect commit LSN %X/%08X in commit message (expected %X/%08X)",

                                 LSN_FORMAT_ARGS(commit_data.commit_lsn),

                                 LSN_FORMAT_ARGS(remote_final_lsn))));


    apply_handle_commit_internal(&commit_data);


    /*

     * Process any tables that are being synchronized in parallel, as well as

     * any newly added tables or sequences.

     */

    ProcessSyncingRelations(commit_data.end_lsn);


    pgstat_report_activity(STATE_IDLE, NULL);

    reset_apply_error_context_info();

}


/*

 * Handle BEGIN PREPARE message.

 */

static void

apply_handle_begin_prepare(StringInfo s)

{

    LogicalRepPreparedTxnData begin_data;


    /* Tablesync should never receive prepare. */

    if (am_tablesync_worker())

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("tablesync worker received a BEGIN PREPARE message")));


    /* There must not be an active streaming transaction. */

    Assert(!TransactionIdIsValid(stream_xid));


    logicalrep_read_begin_prepare(s, &begin_data);

    set_apply_error_context_xact(begin_data.xid, begin_data.prepare_lsn);


    remote_final_lsn = begin_data.prepare_lsn;


    maybe_start_skipping_changes(begin_data.prepare_lsn);


    in_remote_transaction = true;


    pgstat_report_activity(STATE_RUNNING, NULL);

}


/*

 * Common function to prepare the GID.

 */

static void

apply_handle_prepare_internal(LogicalRepPreparedTxnData *prepare_data)

{

    char        gid[GIDSIZE];


    /*

     * Compute unique GID for two_phase transactions. We don't use GID of

     * prepared transaction sent by server as that can lead to deadlock when

     * we have multiple subscriptions from same node point to publications on

     * the same node. See comments atop worker.c

     */

    TwoPhaseTransactionGid(MySubscription->oid, prepare_data->xid,

                           gid, sizeof(gid));


    /*

     * BeginTransactionBlock is necessary to balance the EndTransactionBlock

     * called within the PrepareTransactionBlock below.

     */

    if (!IsTransactionBlock())

    {

        BeginTransactionBlock();

        CommitTransactionCommand(); /* Completes the preceding Begin command. */

    }


    /*

     * Update origin state so we can restart streaming from correct position

     * in case of crash.

     */

    replorigin_session_origin_lsn = prepare_data->end_lsn;

    replorigin_session_origin_timestamp = prepare_data->prepare_time;


    PrepareTransactionBlock(gid);

}


/*

 * Handle PREPARE message.

 */

static void

apply_handle_prepare(StringInfo s)

{

    LogicalRepPreparedTxnData prepare_data;


    logicalrep_read_prepare(s, &prepare_data);


    if (prepare_data.prepare_lsn != remote_final_lsn)

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("incorrect prepare LSN %X/%08X in prepare message (expected %X/%08X)",

                                 LSN_FORMAT_ARGS(prepare_data.prepare_lsn),

                                 LSN_FORMAT_ARGS(remote_final_lsn))));


    /*

     * Unlike commit, here, we always prepare the transaction even though no

     * change has happened in this transaction or all changes are skipped. It

     * is done this way because at commit prepared time, we won't know whether

     * we have skipped preparing a transaction because of those reasons.

     *

     * XXX, We can optimize such that at commit prepared time, we first check

     * whether we have prepared the transaction or not but that doesn't seem

     * worthwhile because such cases shouldn't be common.

     */

    begin_replication_step();


    apply_handle_prepare_internal(&prepare_data);


    end_replication_step();

    CommitTransactionCommand();

    pgstat_report_stat(false);


    /*

     * It is okay not to set the local_end LSN for the prepare because we

     * always flush the prepare record. So, we can send the acknowledgment of

     * the remote_end LSN as soon as prepare is finished.

     *

     * XXX For the sake of consistency with commit, we could have set it with

     * the LSN of prepare but as of now we don't track that value similar to

     * XactLastCommitEnd, and adding it for this purpose doesn't seems worth

     * it.

     */

    store_flush_position(prepare_data.end_lsn, InvalidXLogRecPtr);


    in_remote_transaction = false;


    /*

     * Process any tables that are being synchronized in parallel, as well as

     * any newly added tables or sequences.

     */

    ProcessSyncingRelations(prepare_data.end_lsn);


    /*

     * Since we have already prepared the transaction, in a case where the

     * server crashes before clearing the subskiplsn, it will be left but the

     * transaction won't be resent. But that's okay because it's a rare case

     * and the subskiplsn will be cleared when finishing the next transaction.

     */

    stop_skipping_changes();

    clear_subscription_skip_lsn(prepare_data.prepare_lsn);


    pgstat_report_activity(STATE_IDLE, NULL);

    reset_apply_error_context_info();

}


/*

 * Handle a COMMIT PREPARED of a previously PREPARED transaction.

 *

 * Note that we don't need to wait here if the transaction was prepared in a

 * parallel apply worker. In that case, we have already waited for the prepare

 * to finish in apply_handle_stream_prepare() which will ensure all the

 * operations in that transaction have happened in the subscriber, so no

 * concurrent transaction can cause deadlock or transaction dependency issues.

 */

static void

apply_handle_commit_prepared(StringInfo s)

{

    LogicalRepCommitPreparedTxnData prepare_data;

    char        gid[GIDSIZE];


    logicalrep_read_commit_prepared(s, &prepare_data);

    set_apply_error_context_xact(prepare_data.xid, prepare_data.commit_lsn);


    /* Compute GID for two_phase transactions. */

    TwoPhaseTransactionGid(MySubscription->oid, prepare_data.xid,

                           gid, sizeof(gid));


    /* There is no transaction when COMMIT PREPARED is called */

    begin_replication_step();


    /*

     * Update origin state so we can restart streaming from correct position

     * in case of crash.

     */

    replorigin_session_origin_lsn = prepare_data.end_lsn;

    replorigin_session_origin_timestamp = prepare_data.commit_time;


    FinishPreparedTransaction(gid, true);

    end_replication_step();

    CommitTransactionCommand();

    pgstat_report_stat(false);


    store_flush_position(prepare_data.end_lsn, XactLastCommitEnd);

    in_remote_transaction = false;


    /*

     * Process any tables that are being synchronized in parallel, as well as

     * any newly added tables or sequences.

     */

    ProcessSyncingRelations(prepare_data.end_lsn);


    clear_subscription_skip_lsn(prepare_data.end_lsn);


    pgstat_report_activity(STATE_IDLE, NULL);

    reset_apply_error_context_info();

}


/*

 * Handle a ROLLBACK PREPARED of a previously PREPARED TRANSACTION.

 *

 * Note that we don't need to wait here if the transaction was prepared in a

 * parallel apply worker. In that case, we have already waited for the prepare

 * to finish in apply_handle_stream_prepare() which will ensure all the

 * operations in that transaction have happened in the subscriber, so no

 * concurrent transaction can cause deadlock or transaction dependency issues.

 */

static void

apply_handle_rollback_prepared(StringInfo s)

{

    LogicalRepRollbackPreparedTxnData rollback_data;

    char        gid[GIDSIZE];


    logicalrep_read_rollback_prepared(s, &rollback_data);

    set_apply_error_context_xact(rollback_data.xid, rollback_data.rollback_end_lsn);


    /* Compute GID for two_phase transactions. */

    TwoPhaseTransactionGid(MySubscription->oid, rollback_data.xid,

                           gid, sizeof(gid));


    /*

     * It is possible that we haven't received prepare because it occurred

     * before walsender reached a consistent point or the two_phase was still

     * not enabled by that time, so in such cases, we need to skip rollback

     * prepared.

     */

    if (LookupGXact(gid, rollback_data.prepare_end_lsn,

                    rollback_data.prepare_time))

    {

        /*

         * Update origin state so we can restart streaming from correct

         * position in case of crash.

         */

        replorigin_session_origin_lsn = rollback_data.rollback_end_lsn;

        replorigin_session_origin_timestamp = rollback_data.rollback_time;


        /* There is no transaction when ABORT/ROLLBACK PREPARED is called */

        begin_replication_step();

        FinishPreparedTransaction(gid, false);

        end_replication_step();

        CommitTransactionCommand();


        clear_subscription_skip_lsn(rollback_data.rollback_end_lsn);

    }


    pgstat_report_stat(false);


    /*

     * It is okay not to set the local_end LSN for the rollback of prepared

     * transaction because we always flush the WAL record for it. See

     * apply_handle_prepare.

     */

    store_flush_position(rollback_data.rollback_end_lsn, InvalidXLogRecPtr);

    in_remote_transaction = false;


    /*

     * Process any tables that are being synchronized in parallel, as well as

     * any newly added tables or sequences.

     */

    ProcessSyncingRelations(rollback_data.rollback_end_lsn);


    pgstat_report_activity(STATE_IDLE, NULL);

    reset_apply_error_context_info();

}


/*

 * Handle STREAM PREPARE.

 */

static void

apply_handle_stream_prepare(StringInfo s)

{

    LogicalRepPreparedTxnData prepare_data;

    ParallelApplyWorkerInfo *winfo;

    TransApplyAction apply_action;


    /* Save the message before it is consumed. */

    StringInfoData original_msg = *s;


    if (in_streamed_transaction)

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("STREAM PREPARE message without STREAM STOP")));


    /* Tablesync should never receive prepare. */

    if (am_tablesync_worker())

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("tablesync worker received a STREAM PREPARE message")));


    logicalrep_read_stream_prepare(s, &prepare_data);

    set_apply_error_context_xact(prepare_data.xid, prepare_data.prepare_lsn);


    apply_action = get_transaction_apply_action(prepare_data.xid, &winfo);


    switch (apply_action)

    {

        case TRANS_LEADER_APPLY:


            /*

             * The transaction has been serialized to file, so replay all the

             * spooled operations.

             */

            apply_spooled_messages(MyLogicalRepWorker->stream_fileset,

                                   prepare_data.xid, prepare_data.prepare_lsn);


            /* Mark the transaction as prepared. */

            apply_handle_prepare_internal(&prepare_data);


            CommitTransactionCommand();


            /*

             * It is okay not to set the local_end LSN for the prepare because

             * we always flush the prepare record. See apply_handle_prepare.

             */

            store_flush_position(prepare_data.end_lsn, InvalidXLogRecPtr);


            in_remote_transaction = false;


            /* Unlink the files with serialized changes and subxact info. */

            stream_cleanup_files(MyLogicalRepWorker->subid, prepare_data.xid);


            elog(DEBUG1, "finished processing the STREAM PREPARE command");

            break;


        case TRANS_LEADER_SEND_TO_PARALLEL:

            Assert(winfo);


            if (pa_send_data(winfo, s->len, s->data))

            {

                /* Finish processing the streaming transaction. */

                pa_xact_finish(winfo, prepare_data.end_lsn);

                break;

            }


            /*

             * Switch to serialize mode when we are not able to send the

             * change to parallel apply worker.

             */

            pa_switch_to_partial_serialize(winfo, true);


            /* fall through */

        case TRANS_LEADER_PARTIAL_SERIALIZE:

            Assert(winfo);


            stream_open_and_write_change(prepare_data.xid,

                                         LOGICAL_REP_MSG_STREAM_PREPARE,

                                         &original_msg);


            pa_set_fileset_state(winfo->shared, FS_SERIALIZE_DONE);


            /* Finish processing the streaming transaction. */

            pa_xact_finish(winfo, prepare_data.end_lsn);

            break;


        case TRANS_PARALLEL_APPLY:


            /*

             * If the parallel apply worker is applying spooled messages then

             * close the file before preparing.

             */

            if (stream_fd)

                stream_close_file();


            begin_replication_step();


            /* Mark the transaction as prepared. */

            apply_handle_prepare_internal(&prepare_data);


            end_replication_step();


            CommitTransactionCommand();


            /*

             * It is okay not to set the local_end LSN for the prepare because

             * we always flush the prepare record. See apply_handle_prepare.

             */

            MyParallelShared->last_commit_end = InvalidXLogRecPtr;


            pa_set_xact_state(MyParallelShared, PARALLEL_TRANS_FINISHED);

            pa_unlock_transaction(MyParallelShared->xid, AccessExclusiveLock);


            pa_reset_subtrans();


            elog(DEBUG1, "finished processing the STREAM PREPARE command");

            break;


        default:

            elog(ERROR, "unexpected apply action: %d", (int) apply_action);

            break;

    }


    pgstat_report_stat(false);


    /*

     * Process any tables that are being synchronized in parallel, as well as

     * any newly added tables or sequences.

     */

    ProcessSyncingRelations(prepare_data.end_lsn);


    /*

     * Similar to prepare case, the subskiplsn could be left in a case of

     * server crash but it's okay. See the comments in apply_handle_prepare().

     */

    stop_skipping_changes();

    clear_subscription_skip_lsn(prepare_data.prepare_lsn);


    pgstat_report_activity(STATE_IDLE, NULL);


    reset_apply_error_context_info();

}


/*

 * Handle ORIGIN message.

 *

 * TODO, support tracking of multiple origins

 */

static void

apply_handle_origin(StringInfo s)

{

    /*

     * ORIGIN message can only come inside streaming transaction or inside

     * remote transaction and before any actual writes.

     */

    if (!in_streamed_transaction &&

        (!in_remote_transaction ||

         (IsTransactionState() && !am_tablesync_worker())))

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("ORIGIN message sent out of order")));

}


/*

 * Initialize fileset (if not already done).

 *

 * Create a new file when first_segment is true, otherwise open the existing

 * file.

 */

void

stream_start_internal(TransactionId xid, bool first_segment)

{

    begin_replication_step();


    /*

     * Initialize the worker's stream_fileset if we haven't yet. This will be

     * used for the entire duration of the worker so create it in a permanent

     * context. We create this on the very first streaming message from any

     * transaction and then use it for this and other streaming transactions.

     * Now, we could create a fileset at the start of the worker as well but

     * then we won't be sure that it will ever be used.

     */

    if (!MyLogicalRepWorker->stream_fileset)

    {

        MemoryContext oldctx;


        oldctx = MemoryContextSwitchTo(ApplyContext);


        MyLogicalRepWorker->stream_fileset = palloc(sizeof(FileSet));

        FileSetInit(MyLogicalRepWorker->stream_fileset);


        MemoryContextSwitchTo(oldctx);

    }


    /* Open the spool file for this transaction. */

    stream_open_file(MyLogicalRepWorker->subid, xid, first_segment);


    /* If this is not the first segment, open existing subxact file. */

    if (!first_segment)

        subxact_info_read(MyLogicalRepWorker->subid, xid);


    end_replication_step();

}


/*

 * Handle STREAM START message.

 */

static void

apply_handle_stream_start(StringInfo s)

{

    bool        first_segment;

    ParallelApplyWorkerInfo *winfo;

    TransApplyAction apply_action;


    /* Save the message before it is consumed. */

    StringInfoData original_msg = *s;


    if (in_streamed_transaction)

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("duplicate STREAM START message")));


    /* There must not be an active streaming transaction. */

    Assert(!TransactionIdIsValid(stream_xid));


    /* notify handle methods we're processing a remote transaction */

    in_streamed_transaction = true;


    /* extract XID of the top-level transaction */

    stream_xid = logicalrep_read_stream_start(s, &first_segment);


    if (!TransactionIdIsValid(stream_xid))

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("invalid transaction ID in streamed replication transaction")));


    set_apply_error_context_xact(stream_xid, InvalidXLogRecPtr);


    /* Try to allocate a worker for the streaming transaction. */

    if (first_segment)

        pa_allocate_worker(stream_xid);


    apply_action = get_transaction_apply_action(stream_xid, &winfo);


    switch (apply_action)

    {

        case TRANS_LEADER_SERIALIZE:


            /*

             * Function stream_start_internal starts a transaction. This

             * transaction will be committed on the stream stop unless it is a

             * tablesync worker in which case it will be committed after

             * processing all the messages. We need this transaction for

             * handling the BufFile, used for serializing the streaming data

             * and subxact info.

             */

            stream_start_internal(stream_xid, first_segment);

            break;


        case TRANS_LEADER_SEND_TO_PARALLEL:

            Assert(winfo);


            /*

             * Once we start serializing the changes, the parallel apply

             * worker will wait for the leader to release the stream lock

             * until the end of the transaction. So, we don't need to release

             * the lock or increment the stream count in that case.

             */

            if (pa_send_data(winfo, s->len, s->data))

            {

                /*

                 * Unlock the shared object lock so that the parallel apply

                 * worker can continue to receive changes.

                 */

                if (!first_segment)

                    pa_unlock_stream(winfo->shared->xid, AccessExclusiveLock);


                /*

                 * Increment the number of streaming blocks waiting to be

                 * processed by parallel apply worker.

                 */

                pg_atomic_add_fetch_u32(&winfo->shared->pending_stream_count, 1);


                /* Cache the parallel apply worker for this transaction. */

                pa_set_stream_apply_worker(winfo);

                break;

            }


            /*

             * Switch to serialize mode when we are not able to send the

             * change to parallel apply worker.

             */

            pa_switch_to_partial_serialize(winfo, !first_segment);


            /* fall through */

        case TRANS_LEADER_PARTIAL_SERIALIZE:

            Assert(winfo);


            /*

             * Open the spool file unless it was already opened when switching

             * to serialize mode. The transaction started in

             * stream_start_internal will be committed on the stream stop.

             */

            if (apply_action != TRANS_LEADER_SEND_TO_PARALLEL)

                stream_start_internal(stream_xid, first_segment);


            stream_write_change(LOGICAL_REP_MSG_STREAM_START, &original_msg);


            /* Cache the parallel apply worker for this transaction. */

            pa_set_stream_apply_worker(winfo);

            break;


        case TRANS_PARALLEL_APPLY:

            if (first_segment)

            {

                /* Hold the lock until the end of the transaction. */

                pa_lock_transaction(MyParallelShared->xid, AccessExclusiveLock);

                pa_set_xact_state(MyParallelShared, PARALLEL_TRANS_STARTED);


                /*

                 * Signal the leader apply worker, as it may be waiting for

                 * us.

                 */

                logicalrep_worker_wakeup(WORKERTYPE_APPLY,

                                         MyLogicalRepWorker->subid, InvalidOid);

            }


            parallel_stream_nchanges = 0;

            break;


        default:

            elog(ERROR, "unexpected apply action: %d", (int) apply_action);

            break;

    }


    pgstat_report_activity(STATE_RUNNING, NULL);

}


/*

 * Update the information about subxacts and close the file.

 *

 * This function should be called when the stream_start_internal function has

 * been called.

 */

void

stream_stop_internal(TransactionId xid)

{

    /*

     * Serialize information about subxacts for the toplevel transaction, then

     * close the stream messages spool file.

     */

    subxact_info_write(MyLogicalRepWorker->subid, xid);

    stream_close_file();


    /* We must be in a valid transaction state */

    Assert(IsTransactionState());


    /* Commit the per-stream transaction */

    CommitTransactionCommand();


    /* Reset per-stream context */

    MemoryContextReset(LogicalStreamingContext);

}


/*

 * Handle STREAM STOP message.

 */

static void

apply_handle_stream_stop(StringInfo s)

{

    ParallelApplyWorkerInfo *winfo;

    TransApplyAction apply_action;


    if (!in_streamed_transaction)

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("STREAM STOP message without STREAM START")));


    apply_action = get_transaction_apply_action(stream_xid, &winfo);


    switch (apply_action)

    {

        case TRANS_LEADER_SERIALIZE:

            stream_stop_internal(stream_xid);

            break;


        case TRANS_LEADER_SEND_TO_PARALLEL:

            Assert(winfo);


            /*

             * Lock before sending the STREAM_STOP message so that the leader

             * can hold the lock first and the parallel apply worker will wait

             * for leader to release the lock. See Locking Considerations atop

             * applyparallelworker.c.

             */

            pa_lock_stream(winfo->shared->xid, AccessExclusiveLock);


            if (pa_send_data(winfo, s->len, s->data))

            {

                pa_set_stream_apply_worker(NULL);

                break;

            }


            /*

             * Switch to serialize mode when we are not able to send the

             * change to parallel apply worker.

             */

            pa_switch_to_partial_serialize(winfo, true);


            /* fall through */

        case TRANS_LEADER_PARTIAL_SERIALIZE:

            stream_write_change(LOGICAL_REP_MSG_STREAM_STOP, s);

            stream_stop_internal(stream_xid);

            pa_set_stream_apply_worker(NULL);

            break;


        case TRANS_PARALLEL_APPLY:

            elog(DEBUG1, "applied %u changes in the streaming chunk",

                 parallel_stream_nchanges);


            /*

             * By the time parallel apply worker is processing the changes in

             * the current streaming block, the leader apply worker may have

             * sent multiple streaming blocks. This can lead to parallel apply

             * worker start waiting even when there are more chunk of streams

             * in the queue. So, try to lock only if there is no message left

             * in the queue. See Locking Considerations atop

             * applyparallelworker.c.

             *

             * Note that here we have a race condition where we can start

             * waiting even when there are pending streaming chunks. This can

             * happen if the leader sends another streaming block and acquires

             * the stream lock again after the parallel apply worker checks

             * that there is no pending streaming block and before it actually

             * starts waiting on a lock. We can handle this case by not

             * allowing the leader to increment the stream block count during

             * the time parallel apply worker acquires the lock but it is not

             * clear whether that is worth the complexity.

             *

             * Now, if this missed chunk contains rollback to savepoint, then

             * there is a risk of deadlock which probably shouldn't happen

             * after restart.

             */

            pa_decr_and_wait_stream_block();

            break;


        default:

            elog(ERROR, "unexpected apply action: %d", (int) apply_action);

            break;

    }


    in_streamed_transaction = false;

    stream_xid = InvalidTransactionId;


    /*

     * The parallel apply worker could be in a transaction in which case we

     * need to report the state as STATE_IDLEINTRANSACTION.

     */

    if (IsTransactionOrTransactionBlock())

        pgstat_report_activity(STATE_IDLEINTRANSACTION, NULL);

    else

        pgstat_report_activity(STATE_IDLE, NULL);


    reset_apply_error_context_info();

}


/*

 * Helper function to handle STREAM ABORT message when the transaction was

 * serialized to file.

 */

static void

stream_abort_internal(TransactionId xid, TransactionId subxid)

{

    /*

     * If the two XIDs are the same, it's in fact abort of toplevel xact, so

     * just delete the files with serialized info.

     */

    if (xid == subxid)

        stream_cleanup_files(MyLogicalRepWorker->subid, xid);

    else

    {

        /*

         * OK, so it's a subxact. We need to read the subxact file for the

         * toplevel transaction, determine the offset tracked for the subxact,

         * and truncate the file with changes. We also remove the subxacts

         * with higher offsets (or rather higher XIDs).

         *

         * We intentionally scan the array from the tail, because we're likely

         * aborting a change for the most recent subtransactions.

         *

         * We can't use the binary search here as subxact XIDs won't

         * necessarily arrive in sorted order, consider the case where we have

         * released the savepoint for multiple subtransactions and then

         * performed rollback to savepoint for one of the earlier

         * sub-transaction.

         */

        int64       i;

        int64       subidx;

        BufFile    *fd;

        bool        found = false;

        char        path[MAXPGPATH];


        subidx = -1;

        begin_replication_step();

        subxact_info_read(MyLogicalRepWorker->subid, xid);


        for (i = subxact_data.nsubxacts; i > 0; i--)

        {

            if (subxact_data.subxacts[i - 1].xid == subxid)

            {

                subidx = (i - 1);

                found = true;

                break;

            }

        }


        /*

         * If it's an empty sub-transaction then we will not find the subxid

         * here so just cleanup the subxact info and return.

         */

        if (!found)

        {

            /* Cleanup the subxact info */

            cleanup_subxact_info();

            end_replication_step();

            CommitTransactionCommand();

            return;

        }


        /* open the changes file */

        changes_filename(path, MyLogicalRepWorker->subid, xid);

        fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path,

                                O_RDWR, false);


        /* OK, truncate the file at the right offset */

        BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,

                               subxact_data.subxacts[subidx].offset);

        BufFileClose(fd);


        /* discard the subxacts added later */

        subxact_data.nsubxacts = subidx;


        /* write the updated subxact list */

        subxact_info_write(MyLogicalRepWorker->subid, xid);


        end_replication_step();

        CommitTransactionCommand();

    }

}


/*

 * Handle STREAM ABORT message.

 */

static void

apply_handle_stream_abort(StringInfo s)

{

    TransactionId xid;

    TransactionId subxid;

    LogicalRepStreamAbortData abort_data;

    ParallelApplyWorkerInfo *winfo;

    TransApplyAction apply_action;


    /* Save the message before it is consumed. */

    StringInfoData original_msg = *s;

    bool        toplevel_xact;


    if (in_streamed_transaction)

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("STREAM ABORT message without STREAM STOP")));


    /* We receive abort information only when we can apply in parallel. */

    logicalrep_read_stream_abort(s, &abort_data,

                                 MyLogicalRepWorker->parallel_apply);


    xid = abort_data.xid;

    subxid = abort_data.subxid;

    toplevel_xact = (xid == subxid);


    set_apply_error_context_xact(subxid, abort_data.abort_lsn);


    apply_action = get_transaction_apply_action(xid, &winfo);


    switch (apply_action)

    {

        case TRANS_LEADER_APPLY:


            /*

             * We are in the leader apply worker and the transaction has been

             * serialized to file.

             */

            stream_abort_internal(xid, subxid);


            elog(DEBUG1, "finished processing the STREAM ABORT command");

            break;


        case TRANS_LEADER_SEND_TO_PARALLEL:

            Assert(winfo);


            /*

             * For the case of aborting the subtransaction, we increment the

             * number of streaming blocks and take the lock again before

             * sending the STREAM_ABORT to ensure that the parallel apply

             * worker will wait on the lock for the next set of changes after

             * processing the STREAM_ABORT message if it is not already

             * waiting for STREAM_STOP message.

             *

             * It is important to perform this locking before sending the

             * STREAM_ABORT message so that the leader can hold the lock first

             * and the parallel apply worker will wait for the leader to

             * release the lock. This is the same as what we do in

             * apply_handle_stream_stop. See Locking Considerations atop

             * applyparallelworker.c.

             */

            if (!toplevel_xact)

            {

                pa_unlock_stream(xid, AccessExclusiveLock);

                pg_atomic_add_fetch_u32(&winfo->shared->pending_stream_count, 1);

                pa_lock_stream(xid, AccessExclusiveLock);

            }


            if (pa_send_data(winfo, s->len, s->data))

            {

                /*

                 * Unlike STREAM_COMMIT and STREAM_PREPARE, we don't need to

                 * wait here for the parallel apply worker to finish as that

                 * is not required to maintain the commit order and won't have

                 * the risk of failures due to transaction dependencies and

                 * deadlocks. However, it is possible that before the parallel

                 * worker finishes and we clear the worker info, the xid

                 * wraparound happens on the upstream and a new transaction

                 * with the same xid can appear and that can lead to duplicate

                 * entries in ParallelApplyTxnHash. Yet another problem could

                 * be that we may have serialized the changes in partial

                 * serialize mode and the file containing xact changes may

                 * already exist, and after xid wraparound trying to create

                 * the file for the same xid can lead to an error. To avoid

                 * these problems, we decide to wait for the aborts to finish.

                 *

                 * Note, it is okay to not update the flush location position

                 * for aborts as in worst case that means such a transaction

                 * won't be sent again after restart.

                 */

                if (toplevel_xact)

                    pa_xact_finish(winfo, InvalidXLogRecPtr);


                break;

            }


            /*

             * Switch to serialize mode when we are not able to send the

             * change to parallel apply worker.

             */

            pa_switch_to_partial_serialize(winfo, true);


            /* fall through */

        case TRANS_LEADER_PARTIAL_SERIALIZE:

            Assert(winfo);


            /*

             * Parallel apply worker might have applied some changes, so write

             * the STREAM_ABORT message so that it can rollback the

             * subtransaction if needed.

             */

            stream_open_and_write_change(xid, LOGICAL_REP_MSG_STREAM_ABORT,

                                         &original_msg);


            if (toplevel_xact)

            {

                pa_set_fileset_state(winfo->shared, FS_SERIALIZE_DONE);

                pa_xact_finish(winfo, InvalidXLogRecPtr);

            }

            break;


        case TRANS_PARALLEL_APPLY:


            /*

             * If the parallel apply worker is applying spooled messages then

             * close the file before aborting.

             */

            if (toplevel_xact && stream_fd)

                stream_close_file();


            pa_stream_abort(&abort_data);


            /*

             * We need to wait after processing rollback to savepoint for the

             * next set of changes.

             *

             * We have a race condition here due to which we can start waiting

             * here when there are more chunk of streams in the queue. See

             * apply_handle_stream_stop.

             */

            if (!toplevel_xact)

                pa_decr_and_wait_stream_block();


            elog(DEBUG1, "finished processing the STREAM ABORT command");

            break;


        default:

            elog(ERROR, "unexpected apply action: %d", (int) apply_action);

            break;

    }


    reset_apply_error_context_info();

}


/*

 * Ensure that the passed location is fileset's end.

 */

static void

ensure_last_message(FileSet *stream_fileset, TransactionId xid, int fileno,

                    off_t offset)

{

    char        path[MAXPGPATH];

    BufFile    *fd;

    int         last_fileno;

    off_t       last_offset;


    Assert(!IsTransactionState());


    begin_replication_step();


    changes_filename(path, MyLogicalRepWorker->subid, xid);


    fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);


    BufFileSeek(fd, 0, 0, SEEK_END);

    BufFileTell(fd, &last_fileno, &last_offset);


    BufFileClose(fd);


    end_replication_step();


    if (last_fileno != fileno || last_offset != offset)

        elog(ERROR, "unexpected message left in streaming transaction's changes file \"%s\"",

             path);

}


/*

 * Common spoolfile processing.

 */

void

apply_spooled_messages(FileSet *stream_fileset, TransactionId xid,

                       XLogRecPtr lsn)

{

    int         nchanges;

    char        path[MAXPGPATH];

    char       *buffer = NULL;

    MemoryContext oldcxt;

    ResourceOwner oldowner;

    int         fileno;

    off_t       offset;


    if (!am_parallel_apply_worker())

        maybe_start_skipping_changes(lsn);


    /* Make sure we have an open transaction */

    begin_replication_step();


    /*

     * Allocate file handle and memory required to process all the messages in

     * TopTransactionContext to avoid them getting reset after each message is

     * processed.

     */

    oldcxt = MemoryContextSwitchTo(TopTransactionContext);


    /* Open the spool file for the committed/prepared transaction */

    changes_filename(path, MyLogicalRepWorker->subid, xid);

    elog(DEBUG1, "replaying changes from file \"%s\"", path);


    /*

     * Make sure the file is owned by the toplevel transaction so that the

     * file will not be accidentally closed when aborting a subtransaction.

     */

    oldowner = CurrentResourceOwner;

    CurrentResourceOwner = TopTransactionResourceOwner;


    stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);


    CurrentResourceOwner = oldowner;


    buffer = palloc(BLCKSZ);


    MemoryContextSwitchTo(oldcxt);


    remote_final_lsn = lsn;


    /*

     * Make sure the handle apply_dispatch methods are aware we're in a remote

     * transaction.

     */

    in_remote_transaction = true;

    pgstat_report_activity(STATE_RUNNING, NULL);


    end_replication_step();


    /*

     * Read the entries one by one and pass them through the same logic as in

     * apply_dispatch.

     */

    nchanges = 0;

    while (true)

    {

        StringInfoData s2;

        size_t      nbytes;

        int         len;


        CHECK_FOR_INTERRUPTS();


        /* read length of the on-disk record */

        nbytes = BufFileReadMaybeEOF(stream_fd, &len, sizeof(len), true);


        /* have we reached end of the file? */

        if (nbytes == 0)

            break;


        /* do we have a correct length? */

        if (len <= 0)

            elog(ERROR, "incorrect length %d in streaming transaction's changes file \"%s\"",

                 len, path);


        /* make sure we have sufficiently large buffer */

        buffer = repalloc(buffer, len);


        /* and finally read the data into the buffer */

        BufFileReadExact(stream_fd, buffer, len);


        BufFileTell(stream_fd, &fileno, &offset);


        /* init a stringinfo using the buffer and call apply_dispatch */

        initReadOnlyStringInfo(&s2, buffer, len);


        /* Ensure we are reading the data into our memory context. */

        oldcxt = MemoryContextSwitchTo(ApplyMessageContext);


        apply_dispatch(&s2);


        MemoryContextReset(ApplyMessageContext);


        MemoryContextSwitchTo(oldcxt);


        nchanges++;


        /*

         * It is possible the file has been closed because we have processed

         * the transaction end message like stream_commit in which case that

         * must be the last message.

         */

        if (!stream_fd)

        {

            ensure_last_message(stream_fileset, xid, fileno, offset);

            break;

        }


        if (nchanges % 1000 == 0)

            elog(DEBUG1, "replayed %d changes from file \"%s\"",

                 nchanges, path);

    }


    if (stream_fd)

        stream_close_file();


    elog(DEBUG1, "replayed %d (all) changes from file \"%s\"",

         nchanges, path);


    return;

}


/*

 * Handle STREAM COMMIT message.

 */

static void

apply_handle_stream_commit(StringInfo s)

{

    TransactionId xid;

    LogicalRepCommitData commit_data;

    ParallelApplyWorkerInfo *winfo;

    TransApplyAction apply_action;


    /* Save the message before it is consumed. */

    StringInfoData original_msg = *s;


    if (in_streamed_transaction)

        ereport(ERROR,

                (errcode(ERRCODE_PROTOCOL_VIOLATION),

                 errmsg_internal("STREAM COMMIT message without STREAM STOP")));


    xid = logicalrep_read_stream_commit(s, &commit_data);

    set_apply_error_context_xact(xid, commit_data.commit_lsn);


    apply_action = get_transaction_apply_action(xid, &winfo);


    switch (apply_action)

    {

        case TRANS_LEADER_APPLY:


            /*

             * The transaction has been serialized to file, so replay all the

             * spooled operations.

             */

            apply_spooled_messages(MyLogicalRepWorker->stream_fileset, xid,

                                   commit_data.commit_lsn);


            apply_handle_commit_internal(&commit_data);


            /* Unlink the files with serialized changes and subxact info. */

            stream_cleanup_files(MyLogicalRepWorker->subid, xid);


            elog(DEBUG1, "finished processing the STREAM COMMIT command");

            break;


        case TRANS_LEADER_SEND_TO_PARALLEL:

            Assert(winfo);


            if (pa_send_data(winfo, s->len, s->data))

            {

                /* Finish processing the streaming transaction. */

                pa_xact_finish(winfo, commit_data.end_lsn);

                break;

            }


            /*

             * Switch to serialize mode when we are not able to send the

             * change to parallel apply worker.

             */

            pa_switch_to_partial_serialize(winfo, true);


            /* fall through */

        case TRANS_LEADER_PARTIAL_SERIALIZE:

            Assert(winfo);


            stream_open_and_write_change(xid, LOGICAL_REP_MSG_STREAM_COMMIT,

                                         &original_msg);


            pa_set_fileset_state(winfo->shared, FS_SERIALIZE_DONE);


            /* Finish processing the streaming transaction. */

            pa_xact_finish(winfo, commit_data.end_lsn);

            break;


        case TRANS_PARALLEL_APPLY:


            /*

             * If the parallel apply worker is applying spooled messages then

             * close the file before committing.

             */

            if (stream_fd)

                stream_close_file();


            apply_handle_commit_internal(&commit_data);


            MyParallelShared->last_commit_end = XactLastCommitEnd;


            /*

             * It is important to set the transaction state as finished before

             * releasing the lock. See pa_wait_for_xact_finish.

             */

            pa_set_xact_state(MyParallelShared, PARALLEL_TRANS_FINISHED);

            pa_unlock_transaction(xid, AccessExclusiveLock);


            pa_reset_subtrans();


            elog(DEBUG1, "finished processing the STREAM COMMIT command");

            break;


        default:

            elog(ERROR, "unexpected apply action: %d", (int) apply_action);

            break;

    }


    /*

     * Process any tables that are being synchronized in parallel, as well as

     * any newly added tables or sequences.

     */

    ProcessSyncingRelations(commit_data.end_lsn);


    pgstat_report_activity(STATE_IDLE, NULL);


    reset_apply_error_context_info();

}


/*

 * Helper function for apply_handle_commit and apply_handle_stream_commit.

 */

static void

apply_handle_commit_internal(LogicalRepCommitData *commit_data)

{

    if (is_skipping_changes())

    {

        stop_skipping_changes();


        /*

         * Start a new transaction to clear the subskiplsn, if not started

         * yet.

         */

        if (!IsTransactionState())

            StartTransactionCommand();

    }


    if (IsTransactionState())

    {

        /*

         * The transaction is either non-empty or skipped, so we clear the

         * subskiplsn.

         */

        clear_subscription_skip_lsn(commit_data->commit_lsn);


        /*

         * Update origin state so we can restart streaming from correct

         * position in case of crash.

         */

        replorigin_session_origin_lsn = commit_data->end_lsn;

        replorigin_session_origin_timestamp = commit_data->committime;


        CommitTransactionCommand();


        if (IsTransactionBlock())

        {

            EndTransactionBlock(false);

            CommitTransactionCommand();

        }


        pgstat_report_stat(false);


        store_flush_position(commit_data->end_lsn, XactLastCommitEnd);

    }

    else

    {

        /* Process any invalidation messages that might have accumulated. */

        AcceptInvalidationMessages();

        maybe_reread_subscription();

    }


    in_remote_transaction = false;

}


/*

 * Handle RELATION message.

 *

 * Note we don't do validation against local schema here. The validation

 * against local schema is postponed until first change for given relation

 * comes as we only care about it when applying changes for it anyway and we

 * do less locking this way.

 */

static void

apply_handle_relation(StringInfo s)

{

    LogicalRepRelation *rel;


    if (handle_streamed_transaction(LOGICAL_REP_MSG_RELATION, s))

        return;


    rel = logicalrep_read_rel(s);

    logicalrep_relmap_update(rel);


    /* Also reset all entries in the partition map that refer to remoterel. */

    logicalrep_partmap_reset_relmap(rel);

}


/*

 * Handle TYPE message.

 *

 * This implementation pays no attention to TYPE messages; we expect the user

 * to have set things up so that the incoming data is acceptable to the input

 * functions for the locally subscribed tables.  Hence, we just read and

 * discard the message.

 */

static void

apply_handle_type(StringInfo s)

{

    LogicalRepTyp typ;


    if (handle_streamed_transaction(LOGICAL_REP_MSG_TYPE, s))

        return;


    logicalrep_read_typ(s, &typ);

}


/*

 * Check that we (the subscription owner) have sufficient privileges on the

 * target relation to perform the given operation.

 */

static void

TargetPrivilegesCheck(Relation rel, AclMode mode)

{

    Oid         relid;

    AclResult   aclresult;


    relid = RelationGetRelid(rel);

    aclresult = pg_class_aclcheck(relid, GetUserId(), mode);

    if (aclresult != ACLCHECK_OK)

        aclcheck_error(aclresult,

                       get_relkind_objtype(rel->rd_rel->relkind),

                       get_rel_name(relid));


    /*

     * We lack the infrastructure to honor RLS policies.  It might be possible

     * to add such infrastructure here, but tablesync workers lack it, too, so

     * we don't bother.  RLS does not ordinarily apply to TRUNCATE commands,

     * but it seems dangerous to replicate a TRUNCATE and then refuse to

     * replicate subsequent INSERTs, so we forbid all commands the same.

     */

    if (check_enable_rls(relid, InvalidOid, false) == RLS_ENABLED)

        ereport(ERROR,

                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),

                 errmsg("user \"%s\" cannot replicate into relation with row-level security enabled: \"%s\"",

                        GetUserNameFromId(GetUserId(), true),

                        RelationGetRelationName(rel))));

}


/*

 * Handle INSERT message.

 */


static void

apply_handle_insert(StringInfo s)

{

    LogicalRepRelMapEntry *rel;

    LogicalRepTupleData newtup;

    LogicalRepRelId relid;

    UserContext ucxt;

    ApplyExecutionData *edata;

    EState     *estate;

    TupleTableSlot *remoteslot;

    MemoryContext oldctx;

    bool        run_as_owner;


    /*

     * Quick return if we are skipping data modification changes or handling

     * streamed transactions.

     */

    if (is_skipping_changes() ||

        handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))

        return;


    begin_replication_step();


    relid = logicalrep_read_insert(s, &newtup);

    rel = logicalrep_rel_open(relid, RowExclusiveLock);

    if (!should_apply_changes_for_rel(rel))

    {

        /*

         * The relation can't become interesting in the middle of the

         * transaction so it's safe to unlock it.

         */

        logicalrep_rel_close(rel, RowExclusiveLock);

        end_replication_step();

        return;

    }


    /*

     * Make sure that any user-supplied code runs as the table owner, unless

     * the user has opted out of that behavior.

     */

    run_as_owner = MySubscription->runasowner;

    if (!run_as_owner)

        SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);


    /* Set relation for error callback */

    apply_error_callback_arg.rel = rel;


    /* Initialize the executor state. */

    edata = create_edata_for_relation(rel);

    estate = edata->estate;

    remoteslot = ExecInitExtraTupleSlot(estate,

                                        RelationGetDescr(rel->localrel),

                                        &TTSOpsVirtual);


    /* Process and store remote tuple in the slot */

    oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

    slot_store_data(remoteslot, rel, &newtup);

    slot_fill_defaults(rel, estate, remoteslot);

    MemoryContextSwitchTo(oldctx);


    /* For a partitioned table, insert the tuple into a partition. */

    if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)

        apply_handle_tuple_routing(edata,

                                   remoteslot, NULL, CMD_INSERT);

    else

    {

        ResultRelInfo *relinfo = edata->targetRelInfo;


        ExecOpenIndices(relinfo, false);

        apply_handle_insert_internal(edata, relinfo, remoteslot);

        ExecCloseIndices(relinfo);

    }


    finish_edata(edata);


    /* Reset relation for error callback */

    apply_error_callback_arg.rel = NULL;


    if (!run_as_owner)

        RestoreUserContext(&ucxt);


    logicalrep_rel_close(rel, NoLock);


    end_replication_step();

}


/*

 * Workhorse for apply_handle_insert()

 * relinfo is for the relation we're actually inserting into

 * (could be a child partition of edata->targetRelInfo)

 */

static void

apply_handle_insert_internal(ApplyExecutionData *edata,

                             ResultRelInfo *relinfo,

                             TupleTableSlot *remoteslot)

{

    EState     *estate = edata->estate;


    /* Caller should have opened indexes already. */

    Assert(relinfo->ri_IndexRelationDescs != NULL ||

           !relinfo->ri_RelationDesc->rd_rel->relhasindex ||

           RelationGetIndexList(relinfo->ri_RelationDesc) == NIL);


    /* Caller will not have done this bit. */

    Assert(relinfo->ri_onConflictArbiterIndexes == NIL);

    InitConflictIndexes(relinfo);


    /* Do the insert. */

    TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_INSERT);

    ExecSimpleRelationInsert(relinfo, estate, remoteslot);

}


/*

 * Check if the logical replication relation is updatable and throw

 * appropriate error if it isn't.

 */

static void

check_relation_updatable(LogicalRepRelMapEntry *rel)

{

    /*

     * For partitioned tables, we only need to care if the target partition is

     * updatable (aka has PK or RI defined for it).

     */

    if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)

        return;


    /* Updatable, no error. */

    if (rel->updatable)

        return;


    /*

     * We are in error mode so it's fine this is somewhat slow. It's better to

     * give user correct error.

     */

    if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))

    {

        ereport(ERROR,

                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

                 errmsg("publisher did not send replica identity column "

                        "expected by the logical replication target relation \"%s.%s\"",

                        rel->remoterel.nspname, rel->remoterel.relname)));

    }


    ereport(ERROR,

            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

             errmsg("logical replication target relation \"%s.%s\" has "

                    "neither REPLICA IDENTITY index nor PRIMARY "

                    "KEY and published relation does not have "

                    "REPLICA IDENTITY FULL",

                    rel->remoterel.nspname, rel->remoterel.relname)));

}


/*

 * Handle UPDATE message.

 *

 * TODO: FDW support

 */

static void

apply_handle_update(StringInfo s)

{

    LogicalRepRelMapEntry *rel;

    LogicalRepRelId relid;

    UserContext ucxt;

    ApplyExecutionData *edata;

    EState     *estate;

    LogicalRepTupleData oldtup;

    LogicalRepTupleData newtup;

    bool        has_oldtup;

    TupleTableSlot *remoteslot;

    RTEPermissionInfo *target_perminfo;

    MemoryContext oldctx;

    bool        run_as_owner;


    /*

     * Quick return if we are skipping data modification changes or handling

     * streamed transactions.

     */

    if (is_skipping_changes() ||

        handle_streamed_transaction(LOGICAL_REP_MSG_UPDATE, s))

        return;


    begin_replication_step();


    relid = logicalrep_read_update(s, &has_oldtup, &oldtup,

                                   &newtup);

    rel = logicalrep_rel_open(relid, RowExclusiveLock);

    if (!should_apply_changes_for_rel(rel))

    {

        /*

         * The relation can't become interesting in the middle of the

         * transaction so it's safe to unlock it.

         */

        logicalrep_rel_close(rel, RowExclusiveLock);

        end_replication_step();

        return;

    }


    /* Set relation for error callback */

    apply_error_callback_arg.rel = rel;


    /* Check if we can do the update. */

    check_relation_updatable(rel);


    /*

     * Make sure that any user-supplied code runs as the table owner, unless

     * the user has opted out of that behavior.

     */

    run_as_owner = MySubscription->runasowner;

    if (!run_as_owner)

        SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);


    /* Initialize the executor state. */

    edata = create_edata_for_relation(rel);

    estate = edata->estate;

    remoteslot = ExecInitExtraTupleSlot(estate,

                                        RelationGetDescr(rel->localrel),

                                        &TTSOpsVirtual);


    /*

     * Populate updatedCols so that per-column triggers can fire, and so

     * executor can correctly pass down indexUnchanged hint.  This could

     * include more columns than were actually changed on the publisher

     * because the logical replication protocol doesn't contain that

     * information.  But it would for example exclude columns that only exist

     * on the subscriber, since we are not touching those.

     */

    target_perminfo = list_nth(estate->es_rteperminfos, 0);

    for (int i = 0; i < remoteslot->tts_tupleDescriptor->natts; i++)

    {

        CompactAttribute *att = TupleDescCompactAttr(remoteslot->tts_tupleDescriptor, i);

        int         remoteattnum = rel->attrmap->attnums[i];


        if (!att->attisdropped && remoteattnum >= 0)

        {

            Assert(remoteattnum < newtup.ncols);

            if (newtup.colstatus[remoteattnum] != LOGICALREP_COLUMN_UNCHANGED)

                target_perminfo->updatedCols =

                    bms_add_member(target_perminfo->updatedCols,

                                   i + 1 - FirstLowInvalidHeapAttributeNumber);

        }

    }


    /* Build the search tuple. */

    oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

    slot_store_data(remoteslot, rel,

                    has_oldtup ? &oldtup : &newtup);

    MemoryContextSwitchTo(oldctx);


    /* For a partitioned table, apply update to correct partition. */

    if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)

        apply_handle_tuple_routing(edata,

                                   remoteslot, &newtup, CMD_UPDATE);

    else

        apply_handle_update_internal(edata, edata->targetRelInfo,

                                     remoteslot, &newtup, rel->localindexoid);


    finish_edata(edata);


    /* Reset relation for error callback */

    apply_error_callback_arg.rel = NULL;


    if (!run_as_owner)

        RestoreUserContext(&ucxt);


    logicalrep_rel_close(rel, NoLock);


    end_replication_step();

}


/*

 * Workhorse for apply_handle_update()

 * relinfo is for the relation we're actually updating in

 * (could be a child partition of edata->targetRelInfo)

 */

static void

apply_handle_update_internal(ApplyExecutionData *edata,

                             ResultRelInfo *relinfo,

                             TupleTableSlot *remoteslot,

                             LogicalRepTupleData *newtup,

                             Oid localindexoid)

{

    EState     *estate = edata->estate;

    LogicalRepRelMapEntry *relmapentry = edata->targetRel;

    Relation    localrel = relinfo->ri_RelationDesc;

    EPQState    epqstate;

    TupleTableSlot *localslot = NULL;

    ConflictTupleInfo conflicttuple = {0};

    bool        found;

    MemoryContext oldctx;


    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);

    ExecOpenIndices(relinfo, false);


    found = FindReplTupleInLocalRel(edata, localrel,

                                    &relmapentry->remoterel,

                                    localindexoid,

                                    remoteslot, &localslot);


    /*

     * Tuple found.

     *

     * Note this will fail if there are other conflicting unique indexes.

     */

    if (found)

    {

        /*

         * Report the conflict if the tuple was modified by a different

         * origin.

         */

        if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,

                                    &conflicttuple.origin, &conflicttuple.ts) &&

            conflicttuple.origin != replorigin_session_origin)

        {

            TupleTableSlot *newslot;


            /* Store the new tuple for conflict reporting */

            newslot = table_slot_create(localrel, &estate->es_tupleTable);

            slot_store_data(newslot, relmapentry, newtup);


            conflicttuple.slot = localslot;


            ReportApplyConflict(estate, relinfo, LOG, CT_UPDATE_ORIGIN_DIFFERS,

                                remoteslot, newslot,

                                list_make1(&conflicttuple));

        }


        /* Process and store remote tuple in the slot */

        oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

        slot_modify_data(remoteslot, localslot, relmapentry, newtup);

        MemoryContextSwitchTo(oldctx);


        EvalPlanQualSetSlot(&epqstate, remoteslot);


        InitConflictIndexes(relinfo);


        /* Do the actual update. */

        TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_UPDATE);

        ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,

                                 remoteslot);

    }

    else

    {

        ConflictType type;

        TupleTableSlot *newslot = localslot;


        /*

         * Detecting whether the tuple was recently deleted or never existed

         * is crucial to avoid misleading the user during conflict handling.

         */

        if (FindDeletedTupleInLocalRel(localrel, localindexoid, remoteslot,

                                       &conflicttuple.xmin,

                                       &conflicttuple.origin,

                                       &conflicttuple.ts) &&

            conflicttuple.origin != replorigin_session_origin)

            type = CT_UPDATE_DELETED;

        else

            type = CT_UPDATE_MISSING;


        /* Store the new tuple for conflict reporting */

        slot_store_data(newslot, relmapentry, newtup);


        /*

         * The tuple to be updated could not be found or was deleted.  Do

         * nothing except for emitting a log message.

         */

        ReportApplyConflict(estate, relinfo, LOG, type, remoteslot, newslot,

                            list_make1(&conflicttuple));

    }


    /* Cleanup. */

    ExecCloseIndices(relinfo);

    EvalPlanQualEnd(&epqstate);

}


/*

 * Handle DELETE message.

 *

 * TODO: FDW support

 */

static void

apply_handle_delete(StringInfo s)

{

    LogicalRepRelMapEntry *rel;

    LogicalRepTupleData oldtup;

    LogicalRepRelId relid;

    UserContext ucxt;

    ApplyExecutionData *edata;

    EState     *estate;

    TupleTableSlot *remoteslot;

    MemoryContext oldctx;

    bool        run_as_owner;


    /*

     * Quick return if we are skipping data modification changes or handling

     * streamed transactions.

     */

    if (is_skipping_changes() ||

        handle_streamed_transaction(LOGICAL_REP_MSG_DELETE, s))

        return;


    begin_replication_step();


    relid = logicalrep_read_delete(s, &oldtup);

    rel = logicalrep_rel_open(relid, RowExclusiveLock);

    if (!should_apply_changes_for_rel(rel))

    {

        /*

         * The relation can't become interesting in the middle of the

         * transaction so it's safe to unlock it.

         */

        logicalrep_rel_close(rel, RowExclusiveLock);

        end_replication_step();

        return;

    }


    /* Set relation for error callback */

    apply_error_callback_arg.rel = rel;


    /* Check if we can do the delete. */

    check_relation_updatable(rel);


    /*

     * Make sure that any user-supplied code runs as the table owner, unless

     * the user has opted out of that behavior.

     */

    run_as_owner = MySubscription->runasowner;

    if (!run_as_owner)

        SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);


    /* Initialize the executor state. */

    edata = create_edata_for_relation(rel);

    estate = edata->estate;

    remoteslot = ExecInitExtraTupleSlot(estate,

                                        RelationGetDescr(rel->localrel),

                                        &TTSOpsVirtual);


    /* Build the search tuple. */

    oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

    slot_store_data(remoteslot, rel, &oldtup);

    MemoryContextSwitchTo(oldctx);


    /* For a partitioned table, apply delete to correct partition. */

    if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)

        apply_handle_tuple_routing(edata,

                                   remoteslot, NULL, CMD_DELETE);

    else

    {

        ResultRelInfo *relinfo = edata->targetRelInfo;


        ExecOpenIndices(relinfo, false);

        apply_handle_delete_internal(edata, relinfo,

                                     remoteslot, rel->localindexoid);

        ExecCloseIndices(relinfo);

    }


    finish_edata(edata);


    /* Reset relation for error callback */

    apply_error_callback_arg.rel = NULL;


    if (!run_as_owner)

        RestoreUserContext(&ucxt);


    logicalrep_rel_close(rel, NoLock);


    end_replication_step();

}


/*

 * Workhorse for apply_handle_delete()

 * relinfo is for the relation we're actually deleting from

 * (could be a child partition of edata->targetRelInfo)

 */

static void

apply_handle_delete_internal(ApplyExecutionData *edata,

                             ResultRelInfo *relinfo,

                             TupleTableSlot *remoteslot,

                             Oid localindexoid)

{

    EState     *estate = edata->estate;

    Relation    localrel = relinfo->ri_RelationDesc;

    LogicalRepRelation *remoterel = &edata->targetRel->remoterel;

    EPQState    epqstate;

    TupleTableSlot *localslot;

    ConflictTupleInfo conflicttuple = {0};

    bool        found;


    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);


    /* Caller should have opened indexes already. */

    Assert(relinfo->ri_IndexRelationDescs != NULL ||

           !localrel->rd_rel->relhasindex ||

           RelationGetIndexList(localrel) == NIL);


    found = FindReplTupleInLocalRel(edata, localrel, remoterel, localindexoid,

                                    remoteslot, &localslot);


    /* If found delete it. */

    if (found)

    {

        /*

         * Report the conflict if the tuple was modified by a different

         * origin.

         */

        if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,

                                    &conflicttuple.origin, &conflicttuple.ts) &&

            conflicttuple.origin != replorigin_session_origin)

        {

            conflicttuple.slot = localslot;

            ReportApplyConflict(estate, relinfo, LOG, CT_DELETE_ORIGIN_DIFFERS,

                                remoteslot, NULL,

                                list_make1(&conflicttuple));

        }


        EvalPlanQualSetSlot(&epqstate, localslot);


        /* Do the actual delete. */

        TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_DELETE);

        ExecSimpleRelationDelete(relinfo, estate, &epqstate, localslot);

    }

    else

    {

        /*

         * The tuple to be deleted could not be found.  Do nothing except for

         * emitting a log message.

         */

        ReportApplyConflict(estate, relinfo, LOG, CT_DELETE_MISSING,

                            remoteslot, NULL, list_make1(&conflicttuple));

    }


    /* Cleanup. */

    EvalPlanQualEnd(&epqstate);

}


/*

 * Try to find a tuple received from the publication side (in 'remoteslot') in

 * the corresponding local relation using either replica identity index,

 * primary key, index or if needed, sequential scan.

 *

 * Local tuple, if found, is returned in '*localslot'.

 */

static bool

FindReplTupleInLocalRel(ApplyExecutionData *edata, Relation localrel,

                        LogicalRepRelation *remoterel,

                        Oid localidxoid,

                        TupleTableSlot *remoteslot,

                        TupleTableSlot **localslot)

{

    EState     *estate = edata->estate;

    bool        found;


    /*

     * Regardless of the top-level operation, we're performing a read here, so

     * check for SELECT privileges.

     */

    TargetPrivilegesCheck(localrel, ACL_SELECT);


    *localslot = table_slot_create(localrel, &estate->es_tupleTable);


    Assert(OidIsValid(localidxoid) ||

           (remoterel->replident == REPLICA_IDENTITY_FULL));


    if (OidIsValid(localidxoid))

    {

#ifdef USE_ASSERT_CHECKING

        Relation    idxrel = index_open(localidxoid, AccessShareLock);


        /* Index must be PK, RI, or usable for REPLICA IDENTITY FULL tables */

        Assert(GetRelationIdentityOrPK(localrel) == localidxoid ||

               (remoterel->replident == REPLICA_IDENTITY_FULL &&

                IsIndexUsableForReplicaIdentityFull(idxrel,

                                                    edata->targetRel->attrmap)));

        index_close(idxrel, AccessShareLock);

#endif


        found = RelationFindReplTupleByIndex(localrel, localidxoid,

                                             LockTupleExclusive,

                                             remoteslot, *localslot);

    }

    else

        found = RelationFindReplTupleSeq(localrel, LockTupleExclusive,

                                         remoteslot, *localslot);


    return found;

}


/*

 * Determine whether the index can reliably locate the deleted tuple in the

 * local relation.

 *

 * An index may exclude deleted tuples if it was re-indexed or re-created during

 * change application. Therefore, an index is considered usable only if the

 * conflict detection slot.xmin (conflict_detection_xmin) is greater than the

 * index tuple's xmin. This ensures that any tuples deleted prior to the index

 * creation or re-indexing are not relevant for conflict detection in the

 * current apply worker.

 *

 * Note that indexes may also be excluded if they were modified by other DDL

 * operations, such as ALTER INDEX. However, this is acceptable, as the

 * likelihood of such DDL changes coinciding with the need to scan dead

 * tuples for the update_deleted is low.

 */

static bool

IsIndexUsableForFindingDeletedTuple(Oid localindexoid,

                                    TransactionId conflict_detection_xmin)

{

    HeapTuple   index_tuple;

    TransactionId index_xmin;


    index_tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(localindexoid));


    if (!HeapTupleIsValid(index_tuple)) /* should not happen */

        elog(ERROR, "cache lookup failed for index %u", localindexoid);


    /*

     * No need to check for a frozen transaction ID, as

     * TransactionIdPrecedes() manages it internally, treating it as falling

     * behind the conflict_detection_xmin.

     */

    index_xmin = HeapTupleHeaderGetXmin(index_tuple->t_data);


    ReleaseSysCache(index_tuple);


    return TransactionIdPrecedes(index_xmin, conflict_detection_xmin);

}


/*

 * Attempts to locate a deleted tuple in the local relation that matches the

 * values of the tuple received from the publication side (in 'remoteslot').

 * The search is performed using either the replica identity index, primary

 * key, other available index, or a sequential scan if necessary.

 *

 * Returns true if the deleted tuple is found. If found, the transaction ID,

 * origin, and commit timestamp of the deletion are stored in '*delete_xid',

 * '*delete_origin', and '*delete_time' respectively.

 */

static bool

FindDeletedTupleInLocalRel(Relation localrel, Oid localidxoid,

                           TupleTableSlot *remoteslot,

                           TransactionId *delete_xid, RepOriginId *delete_origin,

                           TimestampTz *delete_time)

{

    TransactionId oldestxmin;


    /*

     * Return false if either dead tuples are not retained or commit timestamp

     * data is not available.

     */

    if (!MySubscription->retaindeadtuples || !track_commit_timestamp)

        return false;


    /*

     * For conflict detection, we use the leader worker's

     * oldest_nonremovable_xid value instead of invoking

     * GetOldestNonRemovableTransactionId() or using the conflict detection

     * slot's xmin. The oldest_nonremovable_xid acts as a threshold to

     * identify tuples that were recently deleted. These deleted tuples are no

     * longer visible to concurrent transactions. However, if a remote update

     * matches such a tuple, we log an update_deleted conflict.

     *

     * While GetOldestNonRemovableTransactionId() and slot.xmin may return

     * transaction IDs older than oldest_nonremovable_xid, for our current

     * purpose, it is acceptable to treat tuples deleted by transactions prior

     * to oldest_nonremovable_xid as update_missing conflicts.

     */

    if (am_leader_apply_worker())

    {

        oldestxmin = MyLogicalRepWorker->oldest_nonremovable_xid;

    }

    else

    {

        LogicalRepWorker *leader;


        /*

         * Obtain the information from the leader apply worker as only the

         * leader manages oldest_nonremovable_xid (see

         * maybe_advance_nonremovable_xid() for details).

         */

        LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);

        leader = logicalrep_worker_find(WORKERTYPE_APPLY,

                                        MyLogicalRepWorker->subid, InvalidOid,

                                        false);

        if (!leader)

        {

            ereport(ERROR,

                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

                     errmsg("could not detect conflict as the leader apply worker has exited")));

        }


        SpinLockAcquire(&leader->relmutex);

        oldestxmin = leader->oldest_nonremovable_xid;

        SpinLockRelease(&leader->relmutex);

        LWLockRelease(LogicalRepWorkerLock);

    }


    /*

     * Return false if the leader apply worker has stopped retaining

     * information for detecting conflicts. This implies that update_deleted

     * can no longer be reliably detected.

     */

    if (!TransactionIdIsValid(oldestxmin))

        return false;


    if (OidIsValid(localidxoid) &&

        IsIndexUsableForFindingDeletedTuple(localidxoid, oldestxmin))

        return RelationFindDeletedTupleInfoByIndex(localrel, localidxoid,

                                                   remoteslot, oldestxmin,

                                                   delete_xid, delete_origin,

                                                   delete_time);

    else

        return RelationFindDeletedTupleInfoSeq(localrel, remoteslot,

                                               oldestxmin, delete_xid,

                                               delete_origin, delete_time);

}


/*

 * This handles insert, update, delete on a partitioned table.

 */

static void

apply_handle_tuple_routing(ApplyExecutionData *edata,

                           TupleTableSlot *remoteslot,

                           LogicalRepTupleData *newtup,

                           CmdType operation)

{

    EState     *estate = edata->estate;

    LogicalRepRelMapEntry *relmapentry = edata->targetRel;

    ResultRelInfo *relinfo = edata->targetRelInfo;

    Relation    parentrel = relinfo->ri_RelationDesc;

    ModifyTableState *mtstate;

    PartitionTupleRouting *proute;

    ResultRelInfo *partrelinfo;

    Relation    partrel;

    TupleTableSlot *remoteslot_part;

    TupleConversionMap *map;

    MemoryContext oldctx;

    LogicalRepRelMapEntry *part_entry = NULL;

    AttrMap    *attrmap = NULL;


    /* ModifyTableState is needed for ExecFindPartition(). */

    edata->mtstate = mtstate = makeNode(ModifyTableState);

    mtstate->ps.plan = NULL;

    mtstate->ps.state = estate;

    mtstate->operation = operation;

    mtstate->resultRelInfo = relinfo;


    /* ... as is PartitionTupleRouting. */

    edata->proute = proute = ExecSetupPartitionTupleRouting(estate, parentrel);


    /*

     * Find the partition to which the "search tuple" belongs.

     */

    Assert(remoteslot != NULL);

    oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

    partrelinfo = ExecFindPartition(mtstate, relinfo, proute,

                                    remoteslot, estate);

    Assert(partrelinfo != NULL);

    partrel = partrelinfo->ri_RelationDesc;


    /*

     * Check for supported relkind.  We need this since partitions might be of

     * unsupported relkinds; and the set of partitions can change, so checking

     * at CREATE/ALTER SUBSCRIPTION would be insufficient.

     */

    CheckSubscriptionRelkind(partrel->rd_rel->relkind,

                             relmapentry->remoterel.relkind,

                             get_namespace_name(RelationGetNamespace(partrel)),

                             RelationGetRelationName(partrel));


    /*

     * To perform any of the operations below, the tuple must match the

     * partition's rowtype. Convert if needed or just copy, using a dedicated

     * slot to store the tuple in any case.

     */

    remoteslot_part = partrelinfo->ri_PartitionTupleSlot;

    if (remoteslot_part == NULL)

        remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);

    map = ExecGetRootToChildMap(partrelinfo, estate);

    if (map != NULL)

    {

        attrmap = map->attrMap;

        remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,

                                                remoteslot_part);

    }

    else

    {

        remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);

        slot_getallattrs(remoteslot_part);

    }

    MemoryContextSwitchTo(oldctx);


    /* Check if we can do the update or delete on the leaf partition. */

    if (operation == CMD_UPDATE || operation == CMD_DELETE)

    {

        part_entry = logicalrep_partition_open(relmapentry, partrel,

                                               attrmap);

        check_relation_updatable(part_entry);

    }


    switch (operation)

    {

        case CMD_INSERT:

            apply_handle_insert_internal(edata, partrelinfo,

                                         remoteslot_part);

            break;


        case CMD_DELETE:

            apply_handle_delete_internal(edata, partrelinfo,

                                         remoteslot_part,

                                         part_entry->localindexoid);

            break;


        case CMD_UPDATE:


            /*

             * For UPDATE, depending on whether or not the updated tuple

             * satisfies the partition's constraint, perform a simple UPDATE

             * of the partition or move the updated tuple into a different

             * suitable partition.

             */

            {

                TupleTableSlot *localslot;

                ResultRelInfo *partrelinfo_new;

                Relation    partrel_new;

                bool        found;

                EPQState    epqstate;

                ConflictTupleInfo conflicttuple = {0};


                /* Get the matching local tuple from the partition. */

                found = FindReplTupleInLocalRel(edata, partrel,

                                                &part_entry->remoterel,

                                                part_entry->localindexoid,

                                                remoteslot_part, &localslot);

                if (!found)

                {

                    ConflictType type;

                    TupleTableSlot *newslot = localslot;


                    /*

                     * Detecting whether the tuple was recently deleted or

                     * never existed is crucial to avoid misleading the user

                     * during conflict handling.

                     */

                    if (FindDeletedTupleInLocalRel(partrel,

                                                   part_entry->localindexoid,

                                                   remoteslot_part,

                                                   &conflicttuple.xmin,

                                                   &conflicttuple.origin,

                                                   &conflicttuple.ts) &&

                        conflicttuple.origin != replorigin_session_origin)

                        type = CT_UPDATE_DELETED;

                    else

                        type = CT_UPDATE_MISSING;


                    /* Store the new tuple for conflict reporting */

                    slot_store_data(newslot, part_entry, newtup);


                    /*

                     * The tuple to be updated could not be found or was

                     * deleted.  Do nothing except for emitting a log message.

                     */

                    ReportApplyConflict(estate, partrelinfo, LOG,

                                        type, remoteslot_part, newslot,

                                        list_make1(&conflicttuple));


                    return;

                }


                /*

                 * Report the conflict if the tuple was modified by a

                 * different origin.

                 */

                if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,

                                            &conflicttuple.origin,

                                            &conflicttuple.ts) &&

                    conflicttuple.origin != replorigin_session_origin)

                {

                    TupleTableSlot *newslot;


                    /* Store the new tuple for conflict reporting */

                    newslot = table_slot_create(partrel, &estate->es_tupleTable);

                    slot_store_data(newslot, part_entry, newtup);


                    conflicttuple.slot = localslot;


                    ReportApplyConflict(estate, partrelinfo, LOG, CT_UPDATE_ORIGIN_DIFFERS,

                                        remoteslot_part, newslot,

                                        list_make1(&conflicttuple));

                }


                /*

                 * Apply the update to the local tuple, putting the result in

                 * remoteslot_part.

                 */

                oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

                slot_modify_data(remoteslot_part, localslot, part_entry,

                                 newtup);

                MemoryContextSwitchTo(oldctx);


                EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);


                /*

                 * Does the updated tuple still satisfy the current

                 * partition's constraint?

                 */

                if (!partrel->rd_rel->relispartition ||

                    ExecPartitionCheck(partrelinfo, remoteslot_part, estate,

                                       false))

                {

                    /*

                     * Yes, so simply UPDATE the partition.  We don't call

                     * apply_handle_update_internal() here, which would

                     * normally do the following work, to avoid repeating some

                     * work already done above to find the local tuple in the

                     * partition.

                     */

                    InitConflictIndexes(partrelinfo);


                    EvalPlanQualSetSlot(&epqstate, remoteslot_part);

                    TargetPrivilegesCheck(partrelinfo->ri_RelationDesc,

                                          ACL_UPDATE);

                    ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,

                                             localslot, remoteslot_part);

                }

                else

                {

                    /* Move the tuple into the new partition. */


                    /*

                     * New partition will be found using tuple routing, which

                     * can only occur via the parent table.  We might need to

                     * convert the tuple to the parent's rowtype.  Note that

                     * this is the tuple found in the partition, not the

                     * original search tuple received by this function.

                     */

                    if (map)

                    {

                        TupleConversionMap *PartitionToRootMap =

                            convert_tuples_by_name(RelationGetDescr(partrel),

                                                   RelationGetDescr(parentrel));


                        remoteslot =

                            execute_attr_map_slot(PartitionToRootMap->attrMap,

                                                  remoteslot_part, remoteslot);

                    }

                    else

                    {

                        remoteslot = ExecCopySlot(remoteslot, remoteslot_part);

                        slot_getallattrs(remoteslot);

                    }


                    /* Find the new partition. */

                    oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

                    partrelinfo_new = ExecFindPartition(mtstate, relinfo,

                                                        proute, remoteslot,

                                                        estate);

                    MemoryContextSwitchTo(oldctx);

                    Assert(partrelinfo_new != partrelinfo);

                    partrel_new = partrelinfo_new->ri_RelationDesc;


                    /* Check that new partition also has supported relkind. */

                    CheckSubscriptionRelkind(partrel_new->rd_rel->relkind,

                                             relmapentry->remoterel.relkind,

                                             get_namespace_name(RelationGetNamespace(partrel_new)),

                                             RelationGetRelationName(partrel_new));


                    /* DELETE old tuple found in the old partition. */

                    EvalPlanQualSetSlot(&epqstate, localslot);

                    TargetPrivilegesCheck(partrelinfo->ri_RelationDesc, ACL_DELETE);

                    ExecSimpleRelationDelete(partrelinfo, estate, &epqstate, localslot);


                    /* INSERT new tuple into the new partition. */


                    /*

                     * Convert the replacement tuple to match the destination

                     * partition rowtype.

                     */

                    oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

                    remoteslot_part = partrelinfo_new->ri_PartitionTupleSlot;

                    if (remoteslot_part == NULL)

                        remoteslot_part = table_slot_create(partrel_new,

                                                            &estate->es_tupleTable);

                    map = ExecGetRootToChildMap(partrelinfo_new, estate);

                    if (map != NULL)

                    {

                        remoteslot_part = execute_attr_map_slot(map->attrMap,

                                                                remoteslot,

                                                                remoteslot_part);

                    }

                    else

                    {

                        remoteslot_part = ExecCopySlot(remoteslot_part,

                                                       remoteslot);

                        slot_getallattrs(remoteslot);

                    }

                    MemoryContextSwitchTo(oldctx);

                    apply_handle_insert_internal(edata, partrelinfo_new,

                                                 remoteslot_part);

                }


                EvalPlanQualEnd(&epqstate);

            }

            break;


        default:

            elog(ERROR, "unrecognized CmdType: %d", (int) operation);

            break;

    }

}


/*

 * Handle TRUNCATE message.

 *

 * TODO: FDW support

 */

static void

apply_handle_truncate(StringInfo s)

{

    bool        cascade = false;

    bool        restart_seqs = false;

    List       *remote_relids = NIL;

    List       *remote_rels = NIL;

    List       *rels = NIL;

    List       *part_rels = NIL;

    List       *relids = NIL;

    List       *relids_logged = NIL;

    ListCell   *lc;

    LOCKMODE    lockmode = AccessExclusiveLock;


    /*

     * Quick return if we are skipping data modification changes or handling

     * streamed transactions.

     */

    if (is_skipping_changes() ||

        handle_streamed_transaction(LOGICAL_REP_MSG_TRUNCATE, s))

        return;


    begin_replication_step();


    remote_relids = logicalrep_read_truncate(s, &cascade, &restart_seqs);


    foreach(lc, remote_relids)

    {

        LogicalRepRelId relid = lfirst_oid(lc);

        LogicalRepRelMapEntry *rel;


        rel = logicalrep_rel_open(relid, lockmode);

        if (!should_apply_changes_for_rel(rel))

        {

            /*

             * The relation can't become interesting in the middle of the

             * transaction so it's safe to unlock it.

             */

            logicalrep_rel_close(rel, lockmode);

            continue;

        }


        remote_rels = lappend(remote_rels, rel);

        TargetPrivilegesCheck(rel->localrel, ACL_TRUNCATE);

        rels = lappend(rels, rel->localrel);

        relids = lappend_oid(relids, rel->localreloid);

        if (RelationIsLogicallyLogged(rel->localrel))

            relids_logged = lappend_oid(relids_logged, rel->localreloid);


        /*

         * Truncate partitions if we got a message to truncate a partitioned

         * table.

         */

        if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)

        {

            ListCell   *child;

            List       *children = find_all_inheritors(rel->localreloid,

                                                       lockmode,

                                                       NULL);


            foreach(child, children)

            {

                Oid         childrelid = lfirst_oid(child);

                Relation    childrel;


                if (list_member_oid(relids, childrelid))

                    continue;


                /* find_all_inheritors already got lock */

                childrel = table_open(childrelid, NoLock);


                /*

                 * Ignore temp tables of other backends.  See similar code in

                 * ExecuteTruncate().

                 */

                if (RELATION_IS_OTHER_TEMP(childrel))

                {

                    table_close(childrel, lockmode);

                    continue;

                }


                TargetPrivilegesCheck(childrel, ACL_TRUNCATE);

                rels = lappend(rels, childrel);

                part_rels = lappend(part_rels, childrel);

                relids = lappend_oid(relids, childrelid);

                /* Log this relation only if needed for logical decoding */

                if (RelationIsLogicallyLogged(childrel))

                    relids_logged = lappend_oid(relids_logged, childrelid);

            }

        }

    }


    /*

     * Even if we used CASCADE on the upstream primary we explicitly default

     * to replaying changes without further cascading. This might be later

     * changeable with a user specified option.

     *

     * MySubscription->runasowner tells us whether we want to execute

     * replication actions as the subscription owner; the last argument to

     * TruncateGuts tells it whether we want to switch to the table owner.

     * Those are exactly opposite conditions.

     */

    ExecuteTruncateGuts(rels,

                        relids,

                        relids_logged,

                        DROP_RESTRICT,

                        restart_seqs,

                        !MySubscription->runasowner);

    foreach(lc, remote_rels)

    {

        LogicalRepRelMapEntry *rel = lfirst(lc);


        logicalrep_rel_close(rel, NoLock);

    }

    foreach(lc, part_rels)

    {

        Relation    rel = lfirst(lc);


        table_close(rel, NoLock);

    }


    end_replication_step();

}


/*

 * Logical replication protocol message dispatcher.

 */

void

apply_dispatch(StringInfo s)

{

    LogicalRepMsgType action = pq_getmsgbyte(s);

    LogicalRepMsgType saved_command;


    /*

     * Set the current command being applied. Since this function can be

     * called recursively when applying spooled changes, save the current

     * command.

     */

    saved_command = apply_error_callback_arg.command;

    apply_error_callback_arg.command = action;


    switch (action)

    {

        case LOGICAL_REP_MSG_BEGIN:

            apply_handle_begin(s);

            break;


        case LOGICAL_REP_MSG_COMMIT:

            apply_handle_commit(s);

            break;


        case LOGICAL_REP_MSG_INSERT:

            apply_handle_insert(s);

            break;


        case LOGICAL_REP_MSG_UPDATE:

            apply_handle_update(s);

            break;


        case LOGICAL_REP_MSG_DELETE:

            apply_handle_delete(s);

            break;


        case LOGICAL_REP_MSG_TRUNCATE:

            apply_handle_truncate(s);

            break;


        case LOGICAL_REP_MSG_RELATION:

            apply_handle_relation(s);

            break;


        case LOGICAL_REP_MSG_TYPE:

            apply_handle_type(s);

            break;


        case LOGICAL_REP_MSG_ORIGIN:

            apply_handle_origin(s);

            break;


        case LOGICAL_REP_MSG_MESSAGE:


            /*

             * Logical replication does not use generic logical messages yet.

             * Although, it could be used by other applications that use this

             * output plugin.

             */

            break;


        case LOGICAL_REP_MSG_STREAM_START:

            apply_handle_stream_start(s);

            break;


        case LOGICAL_REP_MSG_STREAM_STOP:

            apply_handle_stream_stop(s);

            break;


        case LOGICAL_REP_MSG_STREAM_ABORT:

            apply_handle_stream_abort(s);

            break;


        case LOGICAL_REP_MSG_STREAM_COMMIT:

            apply_handle_stream_commit(s);

            break;


        case LOGICAL_REP_MSG_BEGIN_PREPARE:

            apply_handle_begin_prepare(s);

            break;


        case LOGICAL_REP_MSG_PREPARE:

            apply_handle_prepare(s);

            break;


        case LOGICAL_REP_MSG_COMMIT_PREPARED:

            apply_handle_commit_prepared(s);

            break;


        case LOGICAL_REP_MSG_ROLLBACK_PREPARED:

            apply_handle_rollback_prepared(s);

            break;


        case LOGICAL_REP_MSG_STREAM_PREPARE:

            apply_handle_stream_prepare(s);

            break;


        default:

            ereport(ERROR,

                    (errcode(ERRCODE_PROTOCOL_VIOLATION),

                     errmsg("invalid logical replication message type \"??? (%d)\"", action)));

    }


    /* Reset the current command */

    apply_error_callback_arg.command = saved_command;

}


/*

 * Figure out which write/flush positions to report to the walsender process.

 *

 * We can't simply report back the last LSN the walsender sent us because the

 * local transaction might not yet be flushed to disk locally. Instead we

 * build a list that associates local with remote LSNs for every commit. When

 * reporting back the flush position to the sender we iterate that list and

 * check which entries on it are already locally flushed. Those we can report

 * as having been flushed.

 *

 * The have_pending_txes is true if there are outstanding transactions that

 * need to be flushed.

 */

static void

get_flush_position(XLogRecPtr *write, XLogRecPtr *flush,

                   bool *have_pending_txes)

{

    dlist_mutable_iter iter;

    XLogRecPtr  local_flush = GetFlushRecPtr(NULL);


    *write = InvalidXLogRecPtr;

    *flush = InvalidXLogRecPtr;


    dlist_foreach_modify(iter, &lsn_mapping)

    {

        FlushPosition *pos =

            dlist_container(FlushPosition, node, iter.cur);


        *write = pos->remote_end;


        if (pos->local_end <= local_flush)

        {

            *flush = pos->remote_end;

            dlist_delete(iter.cur);

            pfree(pos);

        }

        else

        {

            /*

             * Don't want to uselessly iterate over the rest of the list which

             * could potentially be long. Instead get the last element and

             * grab the write position from there.

             */

            pos = dlist_tail_element(FlushPosition, node,

                                     &lsn_mapping);

            *write = pos->remote_end;

            *have_pending_txes = true;

            return;

        }

    }


    *have_pending_txes = !dlist_is_empty(&lsn_mapping);

}


/*

 * Store current remote/local lsn pair in the tracking list.

 */

void

store_flush_position(XLogRecPtr remote_lsn, XLogRecPtr local_lsn)

{

    FlushPosition *flushpos;


    /*

     * Skip for parallel apply workers, because the lsn_mapping is maintained

     * by the leader apply worker.

     */

    if (am_parallel_apply_worker())

        return;


    /* Need to do this in permanent context */

    MemoryContextSwitchTo(ApplyContext);


    /* Track commit lsn  */

    flushpos = (FlushPosition *) palloc(sizeof(FlushPosition));

    flushpos->local_end = local_lsn;

    flushpos->remote_end = remote_lsn;


    dlist_push_tail(&lsn_mapping, &flushpos->node);

    MemoryContextSwitchTo(ApplyMessageContext);

}


/* Update statistics of the worker. */

static void

UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)

{

    MyLogicalRepWorker->last_lsn = last_lsn;

    MyLogicalRepWorker->last_send_time = send_time;

    MyLogicalRepWorker->last_recv_time = GetCurrentTimestamp();

    if (reply)

    {

        MyLogicalRepWorker->reply_lsn = last_lsn;

        MyLogicalRepWorker->reply_time = send_time;

    }

}


/*

 * Apply main loop.

 */

static void

LogicalRepApplyLoop(XLogRecPtr last_received)

{

    TimestampTz last_recv_timestamp = GetCurrentTimestamp();

    bool        ping_sent = false;

    TimeLineID  tli;

    ErrorContextCallback errcallback;

    RetainDeadTuplesData rdt_data = {0};


    /*

     * Init the ApplyMessageContext which we clean up after each replication

     * protocol message.

     */

    ApplyMessageContext = AllocSetContextCreate(ApplyContext,

                                                "ApplyMessageContext",

                                                ALLOCSET_DEFAULT_SIZES);


    /*

     * This memory context is used for per-stream data when the streaming mode

     * is enabled. This context is reset on each stream stop.

     */

    LogicalStreamingContext = AllocSetContextCreate(ApplyContext,

                                                    "LogicalStreamingContext",

                                                    ALLOCSET_DEFAULT_SIZES);


    /* mark as idle, before starting to loop */

    pgstat_report_activity(STATE_IDLE, NULL);


    /*

     * Push apply error context callback. Fields will be filled while applying

     * a change.

     */

    errcallback.callback = apply_error_callback;

    errcallback.previous = error_context_stack;

    error_context_stack = &errcallback;

    apply_error_context_stack = error_context_stack;


    /* This outer loop iterates once per wait. */

    for (;;)

    {

        pgsocket    fd = PGINVALID_SOCKET;

        int         rc;

        int         len;

        char       *buf = NULL;

        bool        endofstream = false;

        long        wait_time;


        CHECK_FOR_INTERRUPTS();


        MemoryContextSwitchTo(ApplyMessageContext);


        len = walrcv_receive(LogRepWorkerWalRcvConn, &buf, &fd);


        if (len != 0)

        {

            /* Loop to process all available data (without blocking). */

            for (;;)

            {

                CHECK_FOR_INTERRUPTS();


                if (len == 0)

                {

                    break;

                }

                else if (len < 0)

                {

                    ereport(LOG,

                            (errmsg("data stream from publisher has ended")));

                    endofstream = true;

                    break;

                }

                else

                {

                    int         c;

                    StringInfoData s;


                    if (ConfigReloadPending)

                    {

                        ConfigReloadPending = false;

                        ProcessConfigFile(PGC_SIGHUP);

                    }


                    /* Reset timeout. */

                    last_recv_timestamp = GetCurrentTimestamp();

                    ping_sent = false;


                    rdt_data.last_recv_time = last_recv_timestamp;


                    /* Ensure we are reading the data into our memory context. */

                    MemoryContextSwitchTo(ApplyMessageContext);


                    initReadOnlyStringInfo(&s, buf, len);


                    c = pq_getmsgbyte(&s);


                    if (c == PqReplMsg_WALData)

                    {

                        XLogRecPtr  start_lsn;

                        XLogRecPtr  end_lsn;

                        TimestampTz send_time;


                        start_lsn = pq_getmsgint64(&s);

                        end_lsn = pq_getmsgint64(&s);

                        send_time = pq_getmsgint64(&s);


                        if (last_received < start_lsn)

                            last_received = start_lsn;


                        if (last_received < end_lsn)

                            last_received = end_lsn;


                        UpdateWorkerStats(last_received, send_time, false);


                        apply_dispatch(&s);


                        maybe_advance_nonremovable_xid(&rdt_data, false);

                    }

                    else if (c == PqReplMsg_Keepalive)

                    {

                        XLogRecPtr  end_lsn;

                        TimestampTz timestamp;

                        bool        reply_requested;


                        end_lsn = pq_getmsgint64(&s);

                        timestamp = pq_getmsgint64(&s);

                        reply_requested = pq_getmsgbyte(&s);


                        if (last_received < end_lsn)

                            last_received = end_lsn;


                        send_feedback(last_received, reply_requested, false);


                        maybe_advance_nonremovable_xid(&rdt_data, false);


                        UpdateWorkerStats(last_received, timestamp, true);

                    }

                    else if (c == PqReplMsg_PrimaryStatusUpdate)

                    {

                        rdt_data.remote_lsn = pq_getmsgint64(&s);

                        rdt_data.remote_oldestxid = FullTransactionIdFromU64((uint64) pq_getmsgint64(&s));

                        rdt_data.remote_nextxid = FullTransactionIdFromU64((uint64) pq_getmsgint64(&s));

                        rdt_data.reply_time = pq_getmsgint64(&s);


                        /*

                         * This should never happen, see

                         * ProcessStandbyPSRequestMessage. But if it happens

                         * due to a bug, we don't want to proceed as it can

                         * incorrectly advance oldest_nonremovable_xid.

                         */

                        if (!XLogRecPtrIsValid(rdt_data.remote_lsn))

                            elog(ERROR, "cannot get the latest WAL position from the publisher");


                        maybe_advance_nonremovable_xid(&rdt_data, true);


                        UpdateWorkerStats(last_received, rdt_data.reply_time, false);

                    }

                    /* other message types are purposefully ignored */


                    MemoryContextReset(ApplyMessageContext);

                }


                len = walrcv_receive(LogRepWorkerWalRcvConn, &buf, &fd);

            }

        }


        /* confirm all writes so far */

        send_feedback(last_received, false, false);


        /* Reset the timestamp if no message was received */

        rdt_data.last_recv_time = 0;


        maybe_advance_nonremovable_xid(&rdt_data, false);


        if (!in_remote_transaction && !in_streamed_transaction)

        {

            /*

             * If we didn't get any transactions for a while there might be

             * unconsumed invalidation messages in the queue, consume them

             * now.

             */

            AcceptInvalidationMessages();

            maybe_reread_subscription();


            /*

             * Process any relations that are being synchronized in parallel

             * and any newly added tables or sequences.

             */

            ProcessSyncingRelations(last_received);

        }


        /* Cleanup the memory. */

        MemoryContextReset(ApplyMessageContext);

        MemoryContextSwitchTo(TopMemoryContext);


        /* Check if we need to exit the streaming loop. */

        if (endofstream)

            break;


        /*

         * Wait for more data or latch.  If we have unflushed transactions,

         * wake up after WalWriterDelay to see if they've been flushed yet (in

         * which case we should send a feedback message).  Otherwise, there's

         * no particular urgency about waking up unless we get data or a

         * signal.

         */

        if (!dlist_is_empty(&lsn_mapping))

            wait_time = WalWriterDelay;

        else

            wait_time = NAPTIME_PER_CYCLE;


        /*

         * Ensure to wake up when it's possible to advance the non-removable

         * transaction ID, or when the retention duration may have exceeded

         * max_retention_duration.

         */

        if (MySubscription->retentionactive)

        {

            if (rdt_data.phase == RDT_GET_CANDIDATE_XID &&

                rdt_data.xid_advance_interval)

                wait_time = Min(wait_time, rdt_data.xid_advance_interval);

            else if (MySubscription->maxretention > 0)

                wait_time = Min(wait_time, MySubscription->maxretention);

        }


        rc = WaitLatchOrSocket(MyLatch,

                               WL_SOCKET_READABLE | WL_LATCH_SET |

                               WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,

                               fd, wait_time,

                               WAIT_EVENT_LOGICAL_APPLY_MAIN);


        if (rc & WL_LATCH_SET)

        {

            ResetLatch(MyLatch);

            CHECK_FOR_INTERRUPTS();

        }


        if (ConfigReloadPending)

        {

            ConfigReloadPending = false;

            ProcessConfigFile(PGC_SIGHUP);

        }


        if (rc & WL_TIMEOUT)

        {

            /*

             * We didn't receive anything new. If we haven't heard anything

             * from the server for more than wal_receiver_timeout / 2, ping

             * the server. Also, if it's been longer than

             * wal_receiver_status_interval since the last update we sent,

             * send a status update to the primary anyway, to report any

             * progress in applying WAL.

             */

            bool        requestReply = false;


            /*

             * Check if time since last receive from primary has reached the

             * configured limit.

             */

            if (wal_receiver_timeout > 0)

            {

                TimestampTz now = GetCurrentTimestamp();

                TimestampTz timeout;


                timeout =

                    TimestampTzPlusMilliseconds(last_recv_timestamp,

                                                wal_receiver_timeout);


                if (now >= timeout)

                    ereport(ERROR,

                            (errcode(ERRCODE_CONNECTION_FAILURE),

                             errmsg("terminating logical replication worker due to timeout")));


                /* Check to see if it's time for a ping. */

                if (!ping_sent)

                {

                    timeout = TimestampTzPlusMilliseconds(last_recv_timestamp,

                                                          (wal_receiver_timeout / 2));

                    if (now >= timeout)

                    {

                        requestReply = true;

                        ping_sent = true;

                    }

                }

            }


            send_feedback(last_received, requestReply, requestReply);


            maybe_advance_nonremovable_xid(&rdt_data, false);


            /*

             * Force reporting to ensure long idle periods don't lead to

             * arbitrarily delayed stats. Stats can only be reported outside

             * of (implicit or explicit) transactions. That shouldn't lead to

             * stats being delayed for long, because transactions are either

             * sent as a whole on commit or streamed. Streamed transactions

             * are spilled to disk and applied on commit.

             */

            if (!IsTransactionState())

                pgstat_report_stat(true);

        }

    }


    /* Pop the error context stack */

    error_context_stack = errcallback.previous;

    apply_error_context_stack = error_context_stack;


    /* All done */

    walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli);

}


/*

 * Send a Standby Status Update message to server.

 *

 * 'recvpos' is the latest LSN we've received data to, force is set if we need

 * to send a response to avoid timeouts.

 */

static void

send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)

{

    static StringInfo reply_message = NULL;

    static TimestampTz send_time = 0;


    static XLogRecPtr last_recvpos = InvalidXLogRecPtr;

    static XLogRecPtr last_writepos = InvalidXLogRecPtr;


    XLogRecPtr  writepos;

    XLogRecPtr  flushpos;

    TimestampTz now;

    bool        have_pending_txes;


    /*

     * If the user doesn't want status to be reported to the publisher, be

     * sure to exit before doing anything at all.

     */

    if (!force && wal_receiver_status_interval <= 0)

        return;


    /* It's legal to not pass a recvpos */

    if (recvpos < last_recvpos)

        recvpos = last_recvpos;


    get_flush_position(&writepos, &flushpos, &have_pending_txes);


    /*

     * No outstanding transactions to flush, we can report the latest received

     * position. This is important for synchronous replication.

     */

    if (!have_pending_txes)

        flushpos = writepos = recvpos;


    if (writepos < last_writepos)

        writepos = last_writepos;


    if (flushpos < last_flushpos)

        flushpos = last_flushpos;


    now = GetCurrentTimestamp();


    /* if we've already reported everything we're good */

    if (!force &&

        writepos == last_writepos &&

        flushpos == last_flushpos &&

        !TimestampDifferenceExceeds(send_time, now,

                                    wal_receiver_status_interval * 1000))

        return;

    send_time = now;


    if (!reply_message)

    {

        MemoryContext oldctx = MemoryContextSwitchTo(ApplyContext);


        reply_message = makeStringInfo();

        MemoryContextSwitchTo(oldctx);

    }

    else

        resetStringInfo(reply_message);


    pq_sendbyte(reply_message, PqReplMsg_StandbyStatusUpdate);

    pq_sendint64(reply_message, recvpos);   /* write */

    pq_sendint64(reply_message, flushpos);  /* flush */

    pq_sendint64(reply_message, writepos);  /* apply */

    pq_sendint64(reply_message, now);   /* sendTime */

    pq_sendbyte(reply_message, requestReply);   /* replyRequested */


    elog(DEBUG2, "sending feedback (force %d) to recv %X/%08X, write %X/%08X, flush %X/%08X",

         force,

         LSN_FORMAT_ARGS(recvpos),

         LSN_FORMAT_ARGS(writepos),

         LSN_FORMAT_ARGS(flushpos));


    walrcv_send(LogRepWorkerWalRcvConn,

                reply_message->data, reply_message->len);


    if (recvpos > last_recvpos)

        last_recvpos = recvpos;

    if (writepos > last_writepos)

        last_writepos = writepos;

    if (flushpos > last_flushpos)

        last_flushpos = flushpos;

}


/*

 * Attempt to advance the non-removable transaction ID.

 *

 * See comments atop worker.c for details.

 */

static void

maybe_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data,

                               bool status_received)

{

    if (!can_advance_nonremovable_xid(rdt_data))

        return;


    process_rdt_phase_transition(rdt_data, status_received);

}


/*

 * Preliminary check to determine if advancing the non-removable transaction ID

 * is allowed.

 */

static bool

can_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data)

{

    /*

     * It is sufficient to manage non-removable transaction ID for a

     * subscription by the main apply worker to detect update_deleted reliably

     * even for table sync or parallel apply workers.

     */

    if (!am_leader_apply_worker())

        return false;


    /* No need to advance if retaining dead tuples is not required */

    if (!MySubscription->retaindeadtuples)

        return false;


    return true;

}


/*

 * Process phase transitions during the non-removable transaction ID

 * advancement. See comments atop worker.c for details of the transition.

 */

static void

process_rdt_phase_transition(RetainDeadTuplesData *rdt_data,

                             bool status_received)

{

    switch (rdt_data->phase)

    {

        case RDT_GET_CANDIDATE_XID:

            get_candidate_xid(rdt_data);

            break;

        case RDT_REQUEST_PUBLISHER_STATUS:

            request_publisher_status(rdt_data);

            break;

        case RDT_WAIT_FOR_PUBLISHER_STATUS:

            wait_for_publisher_status(rdt_data, status_received);

            break;

        case RDT_WAIT_FOR_LOCAL_FLUSH:

            wait_for_local_flush(rdt_data);

            break;

        case RDT_STOP_CONFLICT_INFO_RETENTION:

            stop_conflict_info_retention(rdt_data);

            break;

        case RDT_RESUME_CONFLICT_INFO_RETENTION:

            resume_conflict_info_retention(rdt_data);

            break;

    }

}


/*

 * Workhorse for the RDT_GET_CANDIDATE_XID phase.

 */

static void

get_candidate_xid(RetainDeadTuplesData *rdt_data)

{

    TransactionId oldest_running_xid;

    TimestampTz now;


    /*

     * Use last_recv_time when applying changes in the loop to avoid

     * unnecessary system time retrieval. If last_recv_time is not available,

     * obtain the current timestamp.

     */

    now = rdt_data->last_recv_time ? rdt_data->last_recv_time : GetCurrentTimestamp();


    /*

     * Compute the candidate_xid and request the publisher status at most once

     * per xid_advance_interval. Refer to adjust_xid_advance_interval() for

     * details on how this value is dynamically adjusted. This is to avoid

     * using CPU and network resources without making much progress.

     */

    if (!TimestampDifferenceExceeds(rdt_data->candidate_xid_time, now,

                                    rdt_data->xid_advance_interval))

        return;


    /*

     * Immediately update the timer, even if the function returns later

     * without setting candidate_xid due to inactivity on the subscriber. This

     * avoids frequent calls to GetOldestActiveTransactionId.

     */

    rdt_data->candidate_xid_time = now;


    /*

     * Consider transactions in the current database, as only dead tuples from

     * this database are required for conflict detection.

     */

    oldest_running_xid = GetOldestActiveTransactionId(false, false);


    /*

     * Oldest active transaction ID (oldest_running_xid) can't be behind any

     * of its previously computed value.

     */

    Assert(TransactionIdPrecedesOrEquals(MyLogicalRepWorker->oldest_nonremovable_xid,

                                         oldest_running_xid));


    /* Return if the oldest_nonremovable_xid cannot be advanced */

    if (TransactionIdEquals(MyLogicalRepWorker->oldest_nonremovable_xid,

                            oldest_running_xid))

    {

        adjust_xid_advance_interval(rdt_data, false);

        return;

    }


    adjust_xid_advance_interval(rdt_data, true);


    rdt_data->candidate_xid = oldest_running_xid;

    rdt_data->phase = RDT_REQUEST_PUBLISHER_STATUS;


    /* process the next phase */

    process_rdt_phase_transition(rdt_data, false);

}


/*

 * Workhorse for the RDT_REQUEST_PUBLISHER_STATUS phase.

 */

static void

request_publisher_status(RetainDeadTuplesData *rdt_data)

{

    static StringInfo request_message = NULL;


    if (!request_message)

    {

        MemoryContext oldctx = MemoryContextSwitchTo(ApplyContext);


        request_message = makeStringInfo();

        MemoryContextSwitchTo(oldctx);

    }

    else

        resetStringInfo(request_message);


    /*

     * Send the current time to update the remote walsender's latest reply

     * message received time.

     */

    pq_sendbyte(request_message, PqReplMsg_PrimaryStatusRequest);

    pq_sendint64(request_message, GetCurrentTimestamp());


    elog(DEBUG2, "sending publisher status request message");


    /* Send a request for the publisher status */

    walrcv_send(LogRepWorkerWalRcvConn,

                request_message->data, request_message->len);


    rdt_data->phase = RDT_WAIT_FOR_PUBLISHER_STATUS;


    /*

     * Skip calling maybe_advance_nonremovable_xid() since further transition

     * is possible only once we receive the publisher status message.

     */

}


/*

 * Workhorse for the RDT_WAIT_FOR_PUBLISHER_STATUS phase.

 */

static void

wait_for_publisher_status(RetainDeadTuplesData *rdt_data,

                          bool status_received)

{

    /*

     * Return if we have requested but not yet received the publisher status.

     */

    if (!status_received)

        return;


    /*

     * We don't need to maintain oldest_nonremovable_xid if we decide to stop

     * retaining conflict information for this worker.

     */

    if (should_stop_conflict_info_retention(rdt_data))

    {

        rdt_data->phase = RDT_STOP_CONFLICT_INFO_RETENTION;

        return;

    }


    if (!FullTransactionIdIsValid(rdt_data->remote_wait_for))

        rdt_data->remote_wait_for = rdt_data->remote_nextxid;


    /*

     * Check if all remote concurrent transactions that were active at the

     * first status request have now completed. If completed, proceed to the

     * next phase; otherwise, continue checking the publisher status until

     * these transactions finish.

     *

     * It's possible that transactions in the commit phase during the last

     * cycle have now finished committing, but remote_oldestxid remains older

     * than remote_wait_for. This can happen if some old transaction came in

     * the commit phase when we requested status in this cycle. We do not

     * handle this case explicitly as it's rare and the benefit doesn't

     * justify the required complexity. Tracking would require either caching

     * all xids at the publisher or sending them to subscribers. The condition

     * will resolve naturally once the remaining transactions are finished.

     *

     * Directly advancing the non-removable transaction ID is possible if

     * there are no activities on the publisher since the last advancement

     * cycle. However, it requires maintaining two fields, last_remote_nextxid

     * and last_remote_lsn, within the structure for comparison with the

     * current cycle's values. Considering the minimal cost of continuing in

     * RDT_WAIT_FOR_LOCAL_FLUSH without awaiting changes, we opted not to

     * advance the transaction ID here.

     */

    if (FullTransactionIdPrecedesOrEquals(rdt_data->remote_wait_for,

                                          rdt_data->remote_oldestxid))

        rdt_data->phase = RDT_WAIT_FOR_LOCAL_FLUSH;

    else

        rdt_data->phase = RDT_REQUEST_PUBLISHER_STATUS;


    /* process the next phase */

    process_rdt_phase_transition(rdt_data, false);

}


/*

 * Workhorse for the RDT_WAIT_FOR_LOCAL_FLUSH phase.

 */

static void

wait_for_local_flush(RetainDeadTuplesData *rdt_data)

{

    Assert(XLogRecPtrIsValid(rdt_data->remote_lsn) &&

           TransactionIdIsValid(rdt_data->candidate_xid));


    /*

     * We expect the publisher and subscriber clocks to be in sync using time

     * sync service like NTP. Otherwise, we will advance this worker's

     * oldest_nonremovable_xid prematurely, leading to the removal of rows

     * required to detect update_deleted reliably. This check primarily

     * addresses scenarios where the publisher's clock falls behind; if the

     * publisher's clock is ahead, subsequent transactions will naturally bear

     * later commit timestamps, conforming to the design outlined atop

     * worker.c.

     *

     * XXX Consider waiting for the publisher's clock to catch up with the

     * subscriber's before proceeding to the next phase.

     */

    if (TimestampDifferenceExceeds(rdt_data->reply_time,

                                   rdt_data->candidate_xid_time, 0))

        ereport(ERROR,

                errmsg_internal("oldest_nonremovable_xid transaction ID could be advanced prematurely"),

                errdetail_internal("The clock on the publisher is behind that of the subscriber."));


    /*

     * Do not attempt to advance the non-removable transaction ID when table

     * sync is in progress. During this time, changes from a single

     * transaction may be applied by multiple table sync workers corresponding

     * to the target tables. So, it's necessary for all table sync workers to

     * apply and flush the corresponding changes before advancing the

     * transaction ID, otherwise, dead tuples that are still needed for

     * conflict detection in table sync workers could be removed prematurely.

     * However, confirming the apply and flush progress across all table sync

     * workers is complex and not worth the effort, so we simply return if not

     * all tables are in the READY state.

     *

     * Advancing the transaction ID is necessary even when no tables are

     * currently subscribed, to avoid retaining dead tuples unnecessarily.

     * While it might seem safe to skip all phases and directly assign

     * candidate_xid to oldest_nonremovable_xid during the

     * RDT_GET_CANDIDATE_XID phase in such cases, this is unsafe. If users

     * concurrently add tables to the subscription, the apply worker may not

     * process invalidations in time. Consequently,

     * HasSubscriptionTablesCached() might miss the new tables, leading to

     * premature advancement of oldest_nonremovable_xid.

     *

     * Performing the check during RDT_WAIT_FOR_LOCAL_FLUSH is safe, as

     * invalidations are guaranteed to be processed before applying changes

     * from newly added tables while waiting for the local flush to reach

     * remote_lsn.

     *

     * Additionally, even if we check for subscription tables during

     * RDT_GET_CANDIDATE_XID, they might be dropped before reaching

     * RDT_WAIT_FOR_LOCAL_FLUSH. Therefore, it's still necessary to verify

     * subscription tables at this stage to prevent unnecessary tuple

     * retention.

     */

    if (HasSubscriptionTablesCached() && !AllTablesyncsReady())

    {

        TimestampTz now;


        now = rdt_data->last_recv_time

            ? rdt_data->last_recv_time : GetCurrentTimestamp();


        /*

         * Record the time spent waiting for table sync, it is needed for the

         * timeout check in should_stop_conflict_info_retention().

         */

        rdt_data->table_sync_wait_time =

            TimestampDifferenceMilliseconds(rdt_data->candidate_xid_time, now);


        return;

    }


    /*

     * We don't need to maintain oldest_nonremovable_xid if we decide to stop

     * retaining conflict information for this worker.

     */

    if (should_stop_conflict_info_retention(rdt_data))

    {

        rdt_data->phase = RDT_STOP_CONFLICT_INFO_RETENTION;

        return;

    }


    /*

     * Update and check the remote flush position if we are applying changes

     * in a loop. This is done at most once per WalWriterDelay to avoid

     * performing costly operations in get_flush_position() too frequently

     * during change application.

     */

    if (last_flushpos < rdt_data->remote_lsn && rdt_data->last_recv_time &&

        TimestampDifferenceExceeds(rdt_data->flushpos_update_time,

                                   rdt_data->last_recv_time, WalWriterDelay))

    {

        XLogRecPtr  writepos;

        XLogRecPtr  flushpos;

        bool        have_pending_txes;


        /* Fetch the latest remote flush position */

        get_flush_position(&writepos, &flushpos, &have_pending_txes);


        if (flushpos > last_flushpos)

            last_flushpos = flushpos;


        rdt_data->flushpos_update_time = rdt_data->last_recv_time;

    }


    /* Return to wait for the changes to be applied */

    if (last_flushpos < rdt_data->remote_lsn)

        return;


    /*

     * Reaching this point implies should_stop_conflict_info_retention()

     * returned false earlier, meaning that the most recent duration for

     * advancing the non-removable transaction ID is within the

     * max_retention_duration or max_retention_duration is set to 0.

     *

     * Therefore, if conflict info retention was previously stopped due to a

     * timeout, it is now safe to resume retention.

     */

    if (!MySubscription->retentionactive)

    {

        rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION;

        return;

    }


    /*

     * Reaching here means the remote WAL position has been received, and all

     * transactions up to that position on the publisher have been applied and

     * flushed locally. So, we can advance the non-removable transaction ID.

     */

    SpinLockAcquire(&MyLogicalRepWorker->relmutex);

    MyLogicalRepWorker->oldest_nonremovable_xid = rdt_data->candidate_xid;

    SpinLockRelease(&MyLogicalRepWorker->relmutex);


    elog(DEBUG2, "confirmed flush up to remote lsn %X/%08X: new oldest_nonremovable_xid %u",

         LSN_FORMAT_ARGS(rdt_data->remote_lsn),

         rdt_data->candidate_xid);


    /* Notify launcher to update the xmin of the conflict slot */

    ApplyLauncherWakeup();


    reset_retention_data_fields(rdt_data);


    /* process the next phase */

    process_rdt_phase_transition(rdt_data, false);

}


/*

 * Check whether conflict information retention should be stopped due to

 * exceeding the maximum wait time (max_retention_duration).

 *

 * If retention should be stopped, return true. Otherwise, return false.

 */

static bool

should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)

{

    TimestampTz now;


    Assert(TransactionIdIsValid(rdt_data->candidate_xid));

    Assert(rdt_data->phase == RDT_WAIT_FOR_PUBLISHER_STATUS ||

           rdt_data->phase == RDT_WAIT_FOR_LOCAL_FLUSH);


    if (!MySubscription->maxretention)

        return false;


    /*

     * Use last_recv_time when applying changes in the loop to avoid

     * unnecessary system time retrieval. If last_recv_time is not available,

     * obtain the current timestamp.

     */

    now = rdt_data->last_recv_time ? rdt_data->last_recv_time : GetCurrentTimestamp();


    /*

     * Return early if the wait time has not exceeded the configured maximum

     * (max_retention_duration). Time spent waiting for table synchronization

     * is excluded from this calculation, as it occurs infrequently.

     */

    if (!TimestampDifferenceExceeds(rdt_data->candidate_xid_time, now,

                                    MySubscription->maxretention +

                                    rdt_data->table_sync_wait_time))

        return false;


    return true;

}


/*

 * Workhorse for the RDT_STOP_CONFLICT_INFO_RETENTION phase.

 */

static void

stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)

{

    /* Stop retention if not yet */

    if (MySubscription->retentionactive)

    {

        /*

         * If the retention status cannot be updated (e.g., due to active

         * transaction), skip further processing to avoid inconsistent

         * retention behavior.

         */

        if (!update_retention_status(false))

            return;


        SpinLockAcquire(&MyLogicalRepWorker->relmutex);

        MyLogicalRepWorker->oldest_nonremovable_xid = InvalidTransactionId;

        SpinLockRelease(&MyLogicalRepWorker->relmutex);


        ereport(LOG,

                errmsg("logical replication worker for subscription \"%s\" has stopped retaining the information for detecting conflicts",

                       MySubscription->name),

                errdetail("Retention is stopped because the apply process has not caught up with the publisher within the configured max_retention_duration."));

    }


    Assert(!TransactionIdIsValid(MyLogicalRepWorker->oldest_nonremovable_xid));


    /*

     * If retention has been stopped, reset to the initial phase to retry

     * resuming retention. This reset is required to recalculate the current

     * wait time and resume retention if the time falls within

     * max_retention_duration.

     */

    reset_retention_data_fields(rdt_data);

}


/*

 * Workhorse for the RDT_RESUME_CONFLICT_INFO_RETENTION phase.

 */

static void

resume_conflict_info_retention(RetainDeadTuplesData *rdt_data)

{

    /* We can't resume retention without updating retention status. */

    if (!update_retention_status(true))

        return;


    ereport(LOG,

            errmsg("logical replication worker for subscription \"%s\" will resume retaining the information for detecting conflicts",

                   MySubscription->name),

            MySubscription->maxretention

            ? errdetail("Retention is re-enabled because the apply process has caught up with the publisher within the configured max_retention_duration.")

            : errdetail("Retention is re-enabled because max_retention_duration has been set to unlimited."));


    /*

     * Restart the worker to let the launcher initialize

     * oldest_nonremovable_xid at startup.

     *

     * While it's technically possible to derive this value on-the-fly using

     * the conflict detection slot's xmin, doing so risks a race condition:

     * the launcher might clean slot.xmin just after retention resumes. This

     * would make oldest_nonremovable_xid unreliable, especially during xid

     * wraparound.

     *

     * Although this can be prevented by introducing heavy weight locking, the

     * complexity it will bring doesn't seem worthwhile given how rarely

     * retention is resumed.

     */

    apply_worker_exit();

}


/*

 * Updates pg_subscription.subretentionactive to the given value within a

 * new transaction.

 *

 * If already inside an active transaction, skips the update and returns

 * false.

 *

 * Returns true if the update is successfully performed.

 */

static bool

update_retention_status(bool active)

{

    /*

     * Do not update the catalog during an active transaction. The transaction

     * may be started during change application, leading to a possible

     * rollback of catalog updates if the application fails subsequently.

     */

    if (IsTransactionState())

        return false;


    StartTransactionCommand();


    /*

     * Updating pg_subscription might involve TOAST table access, so ensure we

     * have a valid snapshot.

     */

    PushActiveSnapshot(GetTransactionSnapshot());


    /* Update pg_subscription.subretentionactive */

    UpdateDeadTupleRetentionStatus(MySubscription->oid, active);


    PopActiveSnapshot();

    CommitTransactionCommand();


    /* Notify launcher to update the conflict slot */

    ApplyLauncherWakeup();


    MySubscription->retentionactive = active;


    return true;

}


/*

 * Reset all data fields of RetainDeadTuplesData except those used to

 * determine the timing for the next round of transaction ID advancement. We

 * can even use flushpos_update_time in the next round to decide whether to get

 * the latest flush position.

 */

static void

reset_retention_data_fields(RetainDeadTuplesData *rdt_data)

{

    rdt_data->phase = RDT_GET_CANDIDATE_XID;

    rdt_data->remote_lsn = InvalidXLogRecPtr;

    rdt_data->remote_oldestxid = InvalidFullTransactionId;

    rdt_data->remote_nextxid = InvalidFullTransactionId;

    rdt_data->reply_time = 0;

    rdt_data->remote_wait_for = InvalidFullTransactionId;

    rdt_data->candidate_xid = InvalidTransactionId;

    rdt_data->table_sync_wait_time = 0;

}


/*

 * Adjust the interval for advancing non-removable transaction IDs.

 *

 * If there is no activity on the node or retention has been stopped, we

 * progressively double the interval used to advance non-removable transaction

 * ID. This helps conserve CPU and network resources when there's little benefit

 * to frequent updates.

 *

 * The interval is capped by the lowest of the following:

 * - wal_receiver_status_interval (if set and retention is active),

 * - a default maximum of 3 minutes,

 * - max_retention_duration (if retention is active).

 *

 * This ensures the interval never exceeds the retention boundary, even if other

 * limits are higher. Once activity resumes on the node and the retention is

 * active, the interval is reset to lesser of 100ms and max_retention_duration,

 * allowing timely advancement of non-removable transaction ID.

 *

 * XXX The use of wal_receiver_status_interval is a bit arbitrary so we can

 * consider the other interval or a separate GUC if the need arises.

 */

static void

adjust_xid_advance_interval(RetainDeadTuplesData *rdt_data, bool new_xid_found)

{

    if (rdt_data->xid_advance_interval && !new_xid_found)

    {

        int         max_interval = wal_receiver_status_interval

            ? wal_receiver_status_interval * 1000

            : MAX_XID_ADVANCE_INTERVAL;


        /*

         * No new transaction ID has been assigned since the last check, so

         * double the interval, but not beyond the maximum allowable value.

         */

        rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval * 2,

                                             max_interval);

    }

    else if (rdt_data->xid_advance_interval &&

             !MySubscription->retentionactive)

    {

        /*

         * Retention has been stopped, so double the interval-capped at a

         * maximum of 3 minutes. The wal_receiver_status_interval is

         * intentionally not used as a upper bound, since the likelihood of

         * retention resuming is lower than that of general activity resuming.

         */

        rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval * 2,

                                             MAX_XID_ADVANCE_INTERVAL);

    }

    else

    {

        /*

         * A new transaction ID was found or the interval is not yet

         * initialized, so set the interval to the minimum value.

         */

        rdt_data->xid_advance_interval = MIN_XID_ADVANCE_INTERVAL;

    }


    /*

     * Ensure the wait time remains within the maximum retention time limit

     * when retention is active.

     */

    if (MySubscription->retentionactive)

        rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval,

                                             MySubscription->maxretention);

}


/*

 * Exit routine for apply workers due to subscription parameter changes.

 */

static void

apply_worker_exit(void)

{

    if (am_parallel_apply_worker())

    {

        /*

         * Don't stop the parallel apply worker as the leader will detect the

         * subscription parameter change and restart logical replication later

         * anyway. This also prevents the leader from reporting errors when

         * trying to communicate with a stopped parallel apply worker, which

         * would accidentally disable subscriptions if disable_on_error was

         * set.

         */

        return;

    }


    /*

     * Reset the last-start time for this apply worker so that the launcher

     * will restart it without waiting for wal_retrieve_retry_interval if the

     * subscription is still active, and so that we won't leak that hash table

     * entry if it isn't.

     */

    if (am_leader_apply_worker())

        ApplyLauncherForgetWorkerStartTime(MyLogicalRepWorker->subid);


    proc_exit(0);

}


/*

 * Reread subscription info if needed.

 *

 * For significant changes, we react by exiting the current process; a new

 * one will be launched afterwards if needed.

 */

void

maybe_reread_subscription(void)

{

    MemoryContext oldctx;

    Subscription *newsub;

    bool        started_tx = false;


    /* When cache state is valid there is nothing to do here. */

    if (MySubscriptionValid)

        return;


    /* This function might be called inside or outside of transaction. */

    if (!IsTransactionState())

    {

        StartTransactionCommand();

        started_tx = true;

    }


    /* Ensure allocations in permanent context. */

    oldctx = MemoryContextSwitchTo(ApplyContext);


    newsub = GetSubscription(MyLogicalRepWorker->subid, true);


    /*

     * Exit if the subscription was removed. This normally should not happen

     * as the worker gets killed during DROP SUBSCRIPTION.

     */

    if (!newsub)

    {

        ereport(LOG,

                (errmsg("logical replication worker for subscription \"%s\" will stop because the subscription was removed",

                        MySubscription->name)));


        /* Ensure we remove no-longer-useful entry for worker's start time */

        if (am_leader_apply_worker())

            ApplyLauncherForgetWorkerStartTime(MyLogicalRepWorker->subid);


        proc_exit(0);

    }


    /* Exit if the subscription was disabled. */

    if (!newsub->enabled)

    {

        ereport(LOG,

                (errmsg("logical replication worker for subscription \"%s\" will stop because the subscription was disabled",

                        MySubscription->name)));


        apply_worker_exit();

    }


    /* !slotname should never happen when enabled is true. */

    Assert(newsub->slotname);


    /* two-phase cannot be altered while the worker is running */

    Assert(newsub->twophasestate == MySubscription->twophasestate);


    /*

     * Exit if any parameter that affects the remote connection was changed.

     * The launcher will start a new worker but note that the parallel apply

     * worker won't restart if the streaming option's value is changed from

     * 'parallel' to any other value or the server decides not to stream the

     * in-progress transaction.

     */

    if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||

        strcmp(newsub->name, MySubscription->name) != 0 ||

        strcmp(newsub->slotname, MySubscription->slotname) != 0 ||

        newsub->binary != MySubscription->binary ||

        newsub->stream != MySubscription->stream ||

        newsub->passwordrequired != MySubscription->passwordrequired ||

        strcmp(newsub->origin, MySubscription->origin) != 0 ||

        newsub->owner != MySubscription->owner ||

        !equal(newsub->publications, MySubscription->publications))

    {

        if (am_parallel_apply_worker())

            ereport(LOG,

                    (errmsg("logical replication parallel apply worker for subscription \"%s\" will stop because of a parameter change",

                            MySubscription->name)));

        else

            ereport(LOG,

                    (errmsg("logical replication worker for subscription \"%s\" will restart because of a parameter change",

                            MySubscription->name)));


        apply_worker_exit();

    }


    /*

     * Exit if the subscription owner's superuser privileges have been

     * revoked.

     */

    if (!newsub->ownersuperuser && MySubscription->ownersuperuser)

    {

        if (am_parallel_apply_worker())

            ereport(LOG,

                    errmsg("logical replication parallel apply worker for subscription \"%s\" will stop because the subscription owner's superuser privileges have been revoked",

                           MySubscription->name));

        else

            ereport(LOG,

                    errmsg("logical replication worker for subscription \"%s\" will restart because the subscription owner's superuser privileges have been revoked",

                           MySubscription->name));


        apply_worker_exit();

    }


    /* Check for other changes that should never happen too. */

    if (newsub->dbid != MySubscription->dbid)

    {

        elog(ERROR, "subscription %u changed unexpectedly",

             MyLogicalRepWorker->subid);

    }


    /* Clean old subscription info and switch to new one. */

    FreeSubscription(MySubscription);

    MySubscription = newsub;


    MemoryContextSwitchTo(oldctx);


    /* Change synchronous commit according to the user's wishes */

    SetConfigOption("synchronous_commit", MySubscription->synccommit,

                    PGC_BACKEND, PGC_S_OVERRIDE);


    if (started_tx)

        CommitTransactionCommand();


    MySubscriptionValid = true;

}


/*

 * Callback from subscription syscache invalidation.

 */

static void

subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)

{

    MySubscriptionValid = false;

}


/*

 * subxact_info_write

 *    Store information about subxacts for a toplevel transaction.

 *

 * For each subxact we store offset of its first change in the main file.

 * The file is always over-written as a whole.

 *

 * XXX We should only store subxacts that were not aborted yet.

 */

static void

subxact_info_write(Oid subid, TransactionId xid)

{

    char        path[MAXPGPATH];

    Size        len;

    BufFile    *fd;


    Assert(TransactionIdIsValid(xid));


    /* construct the subxact filename */

    subxact_filename(path, subid, xid);


    /* Delete the subxacts file, if exists. */

    if (subxact_data.nsubxacts == 0)

    {

        cleanup_subxact_info();

        BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, true);


        return;

    }


    /*

     * Create the subxact file if it not already created, otherwise open the

     * existing file.

     */

    fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDWR,

                            true);

    if (fd == NULL)

        fd = BufFileCreateFileSet(MyLogicalRepWorker->stream_fileset, path);


    len = sizeof(SubXactInfo) * subxact_data.nsubxacts;


    /* Write the subxact count and subxact info */

    BufFileWrite(fd, &subxact_data.nsubxacts, sizeof(subxact_data.nsubxacts));

    BufFileWrite(fd, subxact_data.subxacts, len);


    BufFileClose(fd);


    /* free the memory allocated for subxact info */

    cleanup_subxact_info();

}


/*

 * subxact_info_read

 *    Restore information about subxacts of a streamed transaction.

 *

 * Read information about subxacts into the structure subxact_data that can be

 * used later.

 */

static void

subxact_info_read(Oid subid, TransactionId xid)

{

    char        path[MAXPGPATH];

    Size        len;

    BufFile    *fd;

    MemoryContext oldctx;


    Assert(!subxact_data.subxacts);

    Assert(subxact_data.nsubxacts == 0);

    Assert(subxact_data.nsubxacts_max == 0);


    /*

     * If the subxact file doesn't exist that means we don't have any subxact

     * info.

     */

    subxact_filename(path, subid, xid);

    fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY,

                            true);

    if (fd == NULL)

        return;


    /* read number of subxact items */

    BufFileReadExact(fd, &subxact_data.nsubxacts, sizeof(subxact_data.nsubxacts));


    len = sizeof(SubXactInfo) * subxact_data.nsubxacts;


    /* we keep the maximum as a power of 2 */

    subxact_data.nsubxacts_max = 1 << pg_ceil_log2_32(subxact_data.nsubxacts);


    /*

     * Allocate subxact information in the logical streaming context. We need

     * this information during the complete stream so that we can add the sub

     * transaction info to this. On stream stop we will flush this information

     * to the subxact file and reset the logical streaming context.

     */

    oldctx = MemoryContextSwitchTo(LogicalStreamingContext);

    subxact_data.subxacts = palloc(subxact_data.nsubxacts_max *

                                   sizeof(SubXactInfo));

    MemoryContextSwitchTo(oldctx);


    if (len > 0)

        BufFileReadExact(fd, subxact_data.subxacts, len);


    BufFileClose(fd);

}


/*

 * subxact_info_add

 *    Add information about a subxact (offset in the main file).

 */

static void

subxact_info_add(TransactionId xid)

{

    SubXactInfo *subxacts = subxact_data.subxacts;

    int64       i;


    /* We must have a valid top level stream xid and a stream fd. */

    Assert(TransactionIdIsValid(stream_xid));

    Assert(stream_fd != NULL);


    /*

     * If the XID matches the toplevel transaction, we don't want to add it.

     */

    if (stream_xid == xid)

        return;


    /*

     * In most cases we're checking the same subxact as we've already seen in

     * the last call, so make sure to ignore it (this change comes later).

     */

    if (subxact_data.subxact_last == xid)

        return;


    /* OK, remember we're processing this XID. */

    subxact_data.subxact_last = xid;


    /*

     * Check if the transaction is already present in the array of subxact. We

     * intentionally scan the array from the tail, because we're likely adding

     * a change for the most recent subtransactions.

     *

     * XXX Can we rely on the subxact XIDs arriving in sorted order? That

     * would allow us to use binary search here.

     */

    for (i = subxact_data.nsubxacts; i > 0; i--)

    {

        /* found, so we're done */

        if (subxacts[i - 1].xid == xid)

            return;

    }


    /* This is a new subxact, so we need to add it to the array. */

    if (subxact_data.nsubxacts == 0)

    {

        MemoryContext oldctx;


        subxact_data.nsubxacts_max = 128;


        /*

         * Allocate this memory for subxacts in per-stream context, see

         * subxact_info_read.

         */

        oldctx = MemoryContextSwitchTo(LogicalStreamingContext);

        subxacts = palloc(subxact_data.nsubxacts_max * sizeof(SubXactInfo));

        MemoryContextSwitchTo(oldctx);

    }

    else if (subxact_data.nsubxacts == subxact_data.nsubxacts_max)

    {

        subxact_data.nsubxacts_max *= 2;

        subxacts = repalloc(subxacts,

                            subxact_data.nsubxacts_max * sizeof(SubXactInfo));

    }


    subxacts[subxact_data.nsubxacts].xid = xid;


    /*

     * Get the current offset of the stream file and store it as offset of

     * this subxact.

     */

    BufFileTell(stream_fd,

                &subxacts[subxact_data.nsubxacts].fileno,

                &subxacts[subxact_data.nsubxacts].offset);


    subxact_data.nsubxacts++;

    subxact_data.subxacts = subxacts;

}


/* format filename for file containing the info about subxacts */

static inline void

subxact_filename(char *path, Oid subid, TransactionId xid)

{

    snprintf(path, MAXPGPATH, "%u-%u.subxacts", subid, xid);

}


/* format filename for file containing serialized changes */

static inline void

changes_filename(char *path, Oid subid, TransactionId xid)

{

    snprintf(path, MAXPGPATH, "%u-%u.changes", subid, xid);

}


/*

 * stream_cleanup_files

 *    Cleanup files for a subscription / toplevel transaction.

 *

 * Remove files with serialized changes and subxact info for a particular

 * toplevel transaction. Each subscription has a separate set of files

 * for any toplevel transaction.

 */

void

stream_cleanup_files(Oid subid, TransactionId xid)

{

    char        path[MAXPGPATH];


    /* Delete the changes file. */

    changes_filename(path, subid, xid);

    BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, false);


    /* Delete the subxact file, if it exists. */

    subxact_filename(path, subid, xid);

    BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, true);

}


/*

 * stream_open_file

 *    Open a file that we'll use to serialize changes for a toplevel

 * transaction.

 *

 * Open a file for streamed changes from a toplevel transaction identified

 * by stream_xid (global variable). If it's the first chunk of streamed

 * changes for this transaction, create the buffile, otherwise open the

 * previously created file.

 */

static void

stream_open_file(Oid subid, TransactionId xid, bool first_segment)

{

    char        path[MAXPGPATH];

    MemoryContext oldcxt;


    Assert(OidIsValid(subid));

    Assert(TransactionIdIsValid(xid));

    Assert(stream_fd == NULL);


    changes_filename(path, subid, xid);

    elog(DEBUG1, "opening file \"%s\" for streamed changes", path);


    /*

     * Create/open the buffiles under the logical streaming context so that we

     * have those files until stream stop.

     */

    oldcxt = MemoryContextSwitchTo(LogicalStreamingContext);


    /*

     * If this is the first streamed segment, create the changes file.

     * Otherwise, just open the file for writing, in append mode.

     */

    if (first_segment)

        stream_fd = BufFileCreateFileSet(MyLogicalRepWorker->stream_fileset,

                                         path);

    else

    {

        /*

         * Open the file and seek to the end of the file because we always

         * append the changes file.

         */

        stream_fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset,

                                       path, O_RDWR, false);

        BufFileSeek(stream_fd, 0, 0, SEEK_END);

    }


    MemoryContextSwitchTo(oldcxt);

}


/*

 * stream_close_file

 *    Close the currently open file with streamed changes.

 */

static void

stream_close_file(void)

{

    Assert(stream_fd != NULL);


    BufFileClose(stream_fd);


    stream_fd = NULL;

}


/*

 * stream_write_change

 *    Serialize a change to a file for the current toplevel transaction.

 *

 * The change is serialized in a simple format, with length (not including

 * the length), action code (identifying the message type) and message

 * contents (without the subxact TransactionId value).

 */

static void

stream_write_change(char action, StringInfo s)

{

    int         len;


    Assert(stream_fd != NULL);


    /* total on-disk size, including the action type character */

    len = (s->len - s->cursor) + sizeof(char);


    /* first write the size */

    BufFileWrite(stream_fd, &len, sizeof(len));


    /* then the action */

    BufFileWrite(stream_fd, &action, sizeof(action));


    /* and finally the remaining part of the buffer (after the XID) */

    len = (s->len - s->cursor);


    BufFileWrite(stream_fd, &s->data[s->cursor], len);

}


/*

 * stream_open_and_write_change

 *    Serialize a message to a file for the given transaction.

 *

 * This function is similar to stream_write_change except that it will open the

 * target file if not already before writing the message and close the file at

 * the end.

 */

static void

stream_open_and_write_change(TransactionId xid, char action, StringInfo s)

{

    Assert(!in_streamed_transaction);


    if (!stream_fd)

        stream_start_internal(xid, false);


    stream_write_change(action, s);

    stream_stop_internal(xid);

}


/*

 * Sets streaming options including replication slot name and origin start

 * position. Workers need these options for logical replication.

 */

void

set_stream_options(WalRcvStreamOptions *options,

                   char *slotname,

                   XLogRecPtr *origin_startpos)

{

    int         server_version;


    options->logical = true;

    options->startpoint = *origin_startpos;

    options->slotname = slotname;


    server_version = walrcv_server_version(LogRepWorkerWalRcvConn);

    options->proto.logical.proto_version =

        server_version >= 160000 ? LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM :

        server_version >= 150000 ? LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :

        server_version >= 140000 ? LOGICALREP_PROTO_STREAM_VERSION_NUM :

        LOGICALREP_PROTO_VERSION_NUM;


    options->proto.logical.publication_names = MySubscription->publications;

    options->proto.logical.binary = MySubscription->binary;


    /*

     * Assign the appropriate option value for streaming option according to

     * the 'streaming' mode and the publisher's ability to support that mode.

     */

    if (server_version >= 160000 &&

        MySubscription->stream == LOGICALREP_STREAM_PARALLEL)

    {

        options->proto.logical.streaming_str = "parallel";

        MyLogicalRepWorker->parallel_apply = true;

    }

    else if (server_version >= 140000 &&

             MySubscription->stream != LOGICALREP_STREAM_OFF)

    {

        options->proto.logical.streaming_str = "on";

        MyLogicalRepWorker->parallel_apply = false;

    }

    else

    {

        options->proto.logical.streaming_str = NULL;

        MyLogicalRepWorker->parallel_apply = false;

    }


    options->proto.logical.twophase = false;

    options->proto.logical.origin = pstrdup(MySubscription->origin);

}


/*

 * Cleanup the memory for subxacts and reset the related variables.

 */

static inline void

cleanup_subxact_info()

{

    if (subxact_data.subxacts)

        pfree(subxact_data.subxacts);


    subxact_data.subxacts = NULL;

    subxact_data.subxact_last = InvalidTransactionId;

    subxact_data.nsubxacts = 0;

    subxact_data.nsubxacts_max = 0;

}


/*

 * Common function to run the apply loop with error handling. Disable the

 * subscription, if necessary.

 *

 * Note that we don't handle FATAL errors which are probably because

 * of system resource error and are not repeatable.

 */

void

start_apply(XLogRecPtr origin_startpos)

{

    PG_TRY();

    {

        LogicalRepApplyLoop(origin_startpos);

    }

    PG_CATCH();

    {

        /*

         * Reset the origin state to prevent the advancement of origin

         * progress if we fail to apply. Otherwise, this will result in

         * transaction loss as that transaction won't be sent again by the

         * server.

         */

        replorigin_reset(0, (Datum) 0);


        if (MySubscription->disableonerr)

            DisableSubscriptionAndExit();

        else

        {

            /*

             * Report the worker failed while applying changes. Abort the

             * current transaction so that the stats message is sent in an

             * idle state.

             */

            AbortOutOfAnyTransaction();

            pgstat_report_subscription_error(MySubscription->oid,

                                             MyLogicalRepWorker->type);


            PG_RE_THROW();

        }

    }

    PG_END_TRY();

}


/*

 * Runs the leader apply worker.

 *

 * It sets up replication origin, streaming options and then starts streaming.

 */

static void

run_apply_worker()

{

    char        originname[NAMEDATALEN];

    XLogRecPtr  origin_startpos = InvalidXLogRecPtr;

    char       *slotname = NULL;

    WalRcvStreamOptions options;

    RepOriginId originid;

    TimeLineID  startpointTLI;

    char       *err;

    bool        must_use_password;


    slotname = MySubscription->slotname;


    /*

     * This shouldn't happen if the subscription is enabled, but guard against

     * DDL bugs or manual catalog changes.  (libpqwalreceiver will crash if

     * slot is NULL.)

     */

    if (!slotname)

        ereport(ERROR,

                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

                 errmsg("subscription has no replication slot set")));


    /* Setup replication origin tracking. */

    ReplicationOriginNameForLogicalRep(MySubscription->oid, InvalidOid,

                                       originname, sizeof(originname));

    StartTransactionCommand();

    originid = replorigin_by_name(originname, true);

    if (!OidIsValid(originid))

        originid = replorigin_create(originname);

    replorigin_session_setup(originid, 0);

    replorigin_session_origin = originid;

    origin_startpos = replorigin_session_get_progress(false);

    CommitTransactionCommand();


    /* Is the use of a password mandatory? */

    must_use_password = MySubscription->passwordrequired &&

        !MySubscription->ownersuperuser;


    LogRepWorkerWalRcvConn = walrcv_connect(MySubscription->conninfo, true,

                                            true, must_use_password,

                                            MySubscription->name, &err);


    if (LogRepWorkerWalRcvConn == NULL)

        ereport(ERROR,

                (errcode(ERRCODE_CONNECTION_FAILURE),

                 errmsg("apply worker for subscription \"%s\" could not connect to the publisher: %s",

                        MySubscription->name, err)));


    /*

     * We don't really use the output identify_system for anything but it does

     * some initializations on the upstream so let's still call it.

     */

    (void) walrcv_identify_system(LogRepWorkerWalRcvConn, &startpointTLI);


    set_apply_error_context_origin(originname);


    set_stream_options(&options, slotname, &origin_startpos);


    /*

     * Even when the two_phase mode is requested by the user, it remains as

     * the tri-state PENDING until all tablesyncs have reached READY state.

     * Only then, can it become ENABLED.

     *

     * Note: If the subscription has no tables then leave the state as

     * PENDING, which allows ALTER SUBSCRIPTION ... REFRESH PUBLICATION to

     * work.

     */

    if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&

        AllTablesyncsReady())

    {

        /* Start streaming with two_phase enabled */

        options.proto.logical.twophase = true;

        walrcv_startstreaming(LogRepWorkerWalRcvConn, &options);


        StartTransactionCommand();


        /*

         * Updating pg_subscription might involve TOAST table access, so

         * ensure we have a valid snapshot.

         */

        PushActiveSnapshot(GetTransactionSnapshot());


        UpdateTwoPhaseState(MySubscription->oid, LOGICALREP_TWOPHASE_STATE_ENABLED);

        MySubscription->twophasestate = LOGICALREP_TWOPHASE_STATE_ENABLED;

        PopActiveSnapshot();

        CommitTransactionCommand();

    }

    else

    {

        walrcv_startstreaming(LogRepWorkerWalRcvConn, &options);

    }


    ereport(DEBUG1,

            (errmsg_internal("logical replication apply worker for subscription \"%s\" two_phase is %s",

                             MySubscription->name,

                             MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_DISABLED ? "DISABLED" :

                             MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING ? "PENDING" :

                             MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED ? "ENABLED" :

                             "?")));


    /* Run the main loop. */

    start_apply(origin_startpos);

}


/*

 * Common initialization for leader apply worker, parallel apply worker,

 * tablesync worker and sequencesync worker.

 *

 * Initialize the database connection, in-memory subscription and necessary

 * config options.

 */

void

InitializeLogRepWorker(void)

{

    MemoryContext oldctx;


    /* Run as replica session replication role. */

    SetConfigOption("session_replication_role", "replica",

                    PGC_SUSET, PGC_S_OVERRIDE);


    /* Connect to our database. */

    BackgroundWorkerInitializeConnectionByOid(MyLogicalRepWorker->dbid,

                                              MyLogicalRepWorker->userid,

                                              0);


    /*

     * Set always-secure search path, so malicious users can't redirect user

     * code (e.g. pg_index.indexprs).

     */

    SetConfigOption("search_path", "", PGC_SUSET, PGC_S_OVERRIDE);


    /* Load the subscription into persistent memory context. */

    ApplyContext = AllocSetContextCreate(TopMemoryContext,

                                         "ApplyContext",

                                         ALLOCSET_DEFAULT_SIZES);

    StartTransactionCommand();

    oldctx = MemoryContextSwitchTo(ApplyContext);


    /*

     * Lock the subscription to prevent it from being concurrently dropped,

     * then re-verify its existence. After the initialization, the worker will

     * be terminated gracefully if the subscription is dropped.

     */

    LockSharedObject(SubscriptionRelationId, MyLogicalRepWorker->subid, 0,

                     AccessShareLock);

    MySubscription = GetSubscription(MyLogicalRepWorker->subid, true);

    if (!MySubscription)

    {

        ereport(LOG,

                (errmsg("logical replication worker for subscription %u will not start because the subscription was removed during startup",

                        MyLogicalRepWorker->subid)));


        /* Ensure we remove no-longer-useful entry for worker's start time */

        if (am_leader_apply_worker())

            ApplyLauncherForgetWorkerStartTime(MyLogicalRepWorker->subid);


        proc_exit(0);

    }


    MySubscriptionValid = true;

    MemoryContextSwitchTo(oldctx);


    if (!MySubscription->enabled)

    {

        ereport(LOG,

                (errmsg("logical replication worker for subscription \"%s\" will not start because the subscription was disabled during startup",

                        MySubscription->name)));


        apply_worker_exit();

    }


    /*

     * Restart the worker if retain_dead_tuples was enabled during startup.

     *

     * At this point, the replication slot used for conflict detection might

     * not exist yet, or could be dropped soon if the launcher perceives

     * retain_dead_tuples as disabled. To avoid unnecessary tracking of

     * oldest_nonremovable_xid when the slot is absent or at risk of being

     * dropped, a restart is initiated.

     *

     * The oldest_nonremovable_xid should be initialized only when the

     * subscription's retention is active before launching the worker. See

     * logicalrep_worker_launch.

     */

    if (am_leader_apply_worker() &&

        MySubscription->retaindeadtuples &&

        MySubscription->retentionactive &&

        !TransactionIdIsValid(MyLogicalRepWorker->oldest_nonremovable_xid))

    {

        ereport(LOG,

                errmsg("logical replication worker for subscription \"%s\" will restart because the option %s was enabled during startup",

                       MySubscription->name, "retain_dead_tuples"));


        apply_worker_exit();

    }


    /* Setup synchronous commit according to the user's wishes */

    SetConfigOption("synchronous_commit", MySubscription->synccommit,

                    PGC_BACKEND, PGC_S_OVERRIDE);


    /*

     * Keep us informed about subscription or role changes. Note that the

     * role's superuser privilege can be revoked.

     */

    CacheRegisterSyscacheCallback(SUBSCRIPTIONOID,

                                  subscription_change_cb,

                                  (Datum) 0);


    CacheRegisterSyscacheCallback(AUTHOID,

                                  subscription_change_cb,

                                  (Datum) 0);


    if (am_tablesync_worker())

        ereport(LOG,

                errmsg("logical replication table synchronization worker for subscription \"%s\", table \"%s\" has started",

                       MySubscription->name,

                       get_rel_name(MyLogicalRepWorker->relid)));

    else if (am_sequencesync_worker())

        ereport(LOG,

                errmsg("logical replication sequence synchronization worker for subscription \"%s\" has started",

                       MySubscription->name));

    else

        ereport(LOG,

                errmsg("logical replication apply worker for subscription \"%s\" has started",

                       MySubscription->name));


    CommitTransactionCommand();

}


/*

 * Reset the origin state.

 */

static void

replorigin_reset(int code, Datum arg)

{

    replorigin_session_origin = InvalidRepOriginId;

    replorigin_session_origin_lsn = InvalidXLogRecPtr;

    replorigin_session_origin_timestamp = 0;

}


/*

 * Common function to setup the leader apply, tablesync and sequencesync worker.

 */

void

SetupApplyOrSyncWorker(int worker_slot)

{

    /* Attach to slot */

    logicalrep_worker_attach(worker_slot);


    Assert(am_tablesync_worker() || am_sequencesync_worker() || am_leader_apply_worker());


    /* Setup signal handling */

    pqsignal(SIGHUP, SignalHandlerForConfigReload);

    pqsignal(SIGTERM, die);

    BackgroundWorkerUnblockSignals();


    /*

     * We don't currently need any ResourceOwner in a walreceiver process, but

     * if we did, we could call CreateAuxProcessResourceOwner here.

     */


    /* Initialise stats to a sanish value */

    MyLogicalRepWorker->last_send_time = MyLogicalRepWorker->last_recv_time =

        MyLogicalRepWorker->reply_time = GetCurrentTimestamp();


    /* Load the libpq-specific functions */

    load_file("libpqwalreceiver", false);


    InitializeLogRepWorker();


    /*

     * Register a callback to reset the origin state before aborting any

     * pending transaction during shutdown (see ShutdownPostgres()). This will

     * avoid origin advancement for an in-complete transaction which could

     * otherwise lead to its loss as such a transaction won't be sent by the

     * server again.

     *

     * Note that even a LOG or DEBUG statement placed after setting the origin

     * state may process a shutdown signal before committing the current apply

     * operation. So, it is important to register such a callback here.

     */

    before_shmem_exit(replorigin_reset, (Datum) 0);


    /* Connect to the origin and start the replication. */

    elog(DEBUG1, "connecting to publisher using connection string \"%s\"",

         MySubscription->conninfo);


    /*

     * Setup callback for syscache so that we know when something changes in

     * the subscription relation state.

     */

    CacheRegisterSyscacheCallback(SUBSCRIPTIONRELMAP,

                                  InvalidateSyncingRelStates,

                                  (Datum) 0);

}


/* Logical Replication Apply worker entry point */

void

ApplyWorkerMain(Datum main_arg)

{

    int         worker_slot = DatumGetInt32(main_arg);


    InitializingApplyWorker = true;


    SetupApplyOrSyncWorker(worker_slot);


    InitializingApplyWorker = false;


    run_apply_worker();


    proc_exit(0);

}


/*

 * After error recovery, disable the subscription in a new transaction

 * and exit cleanly.

 */

void

DisableSubscriptionAndExit(void)

{

    /*

     * Emit the error message, and recover from the error state to an idle

     * state

     */

    HOLD_INTERRUPTS();


    EmitErrorReport();

    AbortOutOfAnyTransaction();

    FlushErrorState();


    RESUME_INTERRUPTS();


    /*

     * Report the worker failed during sequence synchronization, table

     * synchronization, or apply.

     */

    pgstat_report_subscription_error(MyLogicalRepWorker->subid,

                                     MyLogicalRepWorker->type);


    /* Disable the subscription */

    StartTransactionCommand();


    /*

     * Updating pg_subscription might involve TOAST table access, so ensure we

     * have a valid snapshot.

     */

    PushActiveSnapshot(GetTransactionSnapshot());


    DisableSubscription(MySubscription->oid);

    PopActiveSnapshot();

    CommitTransactionCommand();


    /* Ensure we remove no-longer-useful entry for worker's start time */

    if (am_leader_apply_worker())

        ApplyLauncherForgetWorkerStartTime(MyLogicalRepWorker->subid);


    /* Notify the subscription has been disabled and exit */

    ereport(LOG,

            errmsg("subscription \"%s\" has been disabled because of an error",

                   MySubscription->name));


    /*

     * Skip the track_commit_timestamp check when disabling the worker due to

     * an error, as verifying commit timestamps is unnecessary in this

     * context.

     */

    CheckSubDeadTupleRetention(false, true, WARNING,

                               MySubscription->retaindeadtuples,

                               MySubscription->retentionactive, false);


    proc_exit(0);

}


/*

 * Is current process a logical replication worker?

 */

bool

IsLogicalWorker(void)

{

    return MyLogicalRepWorker != NULL;

}


/*

 * Is current process a logical replication parallel apply worker?

 */

bool

IsLogicalParallelApplyWorker(void)

{

    return IsLogicalWorker() && am_parallel_apply_worker();

}


/*

 * Start skipping changes of the transaction if the given LSN matches the

 * LSN specified by subscription's skiplsn.

 */

static void

maybe_start_skipping_changes(XLogRecPtr finish_lsn)

{

    Assert(!is_skipping_changes());

    Assert(!in_remote_transaction);

    Assert(!in_streamed_transaction);


    /*

     * Quick return if it's not requested to skip this transaction. This

     * function is called for every remote transaction and we assume that

     * skipping the transaction is not used often.

     */

    if (likely(!XLogRecPtrIsValid(MySubscription->skiplsn) ||

               MySubscription->skiplsn != finish_lsn))

        return;


    /* Start skipping all changes of this transaction */

    skip_xact_finish_lsn = finish_lsn;


    ereport(LOG,

            errmsg("logical replication starts skipping transaction at LSN %X/%08X",

                   LSN_FORMAT_ARGS(skip_xact_finish_lsn)));

}


/*

 * Stop skipping changes by resetting skip_xact_finish_lsn if enabled.

 */

static void

stop_skipping_changes(void)

{

    if (!is_skipping_changes())

        return;


    ereport(LOG,

            errmsg("logical replication completed skipping transaction at LSN %X/%08X",

                   LSN_FORMAT_ARGS(skip_xact_finish_lsn)));


    /* Stop skipping changes */

    skip_xact_finish_lsn = InvalidXLogRecPtr;

}


/*

 * Clear subskiplsn of pg_subscription catalog.

 *

 * finish_lsn is the transaction's finish LSN that is used to check if the

 * subskiplsn matches it. If not matched, we raise a warning when clearing the

 * subskiplsn in order to inform users for cases e.g., where the user mistakenly

 * specified the wrong subskiplsn.

 */

static void

clear_subscription_skip_lsn(XLogRecPtr finish_lsn)

{

    Relation    rel;

    Form_pg_subscription subform;

    HeapTuple   tup;

    XLogRecPtr  myskiplsn = MySubscription->skiplsn;

    bool        started_tx = false;


    if (likely(!XLogRecPtrIsValid(myskiplsn)) || am_parallel_apply_worker())

        return;


    if (!IsTransactionState())

    {

        StartTransactionCommand();

        started_tx = true;

    }


    /*

     * Updating pg_subscription might involve TOAST table access, so ensure we

     * have a valid snapshot.

     */

    PushActiveSnapshot(GetTransactionSnapshot());


    /*

     * Protect subskiplsn of pg_subscription from being concurrently updated

     * while clearing it.

     */

    LockSharedObject(SubscriptionRelationId, MySubscription->oid, 0,

                     AccessShareLock);


    rel = table_open(SubscriptionRelationId, RowExclusiveLock);


    /* Fetch the existing tuple. */

    tup = SearchSysCacheCopy1(SUBSCRIPTIONOID,

                              ObjectIdGetDatum(MySubscription->oid));


    if (!HeapTupleIsValid(tup))

        elog(ERROR, "subscription \"%s\" does not exist", MySubscription->name);


    subform = (Form_pg_subscription) GETSTRUCT(tup);


    /*

     * Clear the subskiplsn. If the user has already changed subskiplsn before

     * clearing it we don't update the catalog and the replication origin

     * state won't get advanced. So in the worst case, if the server crashes

     * before sending an acknowledgment of the flush position the transaction

     * will be sent again and the user needs to set subskiplsn again. We can

     * reduce the possibility by logging a replication origin WAL record to

     * advance the origin LSN instead but there is no way to advance the

     * origin timestamp and it doesn't seem to be worth doing anything about

     * it since it's a very rare case.

     */

    if (subform->subskiplsn == myskiplsn)

    {

        bool        nulls[Natts_pg_subscription];

        bool        replaces[Natts_pg_subscription];

        Datum       values[Natts_pg_subscription];


        memset(values, 0, sizeof(values));

        memset(nulls, false, sizeof(nulls));

        memset(replaces, false, sizeof(replaces));


        /* reset subskiplsn */

        values[Anum_pg_subscription_subskiplsn - 1] = LSNGetDatum(InvalidXLogRecPtr);

        replaces[Anum_pg_subscription_subskiplsn - 1] = true;


        tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,

                                replaces);

        CatalogTupleUpdate(rel, &tup->t_self, tup);


        if (myskiplsn != finish_lsn)

            ereport(WARNING,

                    errmsg("skip-LSN of subscription \"%s\" cleared", MySubscription->name),

                    errdetail("Remote transaction's finish WAL location (LSN) %X/%08X did not match skip-LSN %X/%08X.",

                              LSN_FORMAT_ARGS(finish_lsn),

                              LSN_FORMAT_ARGS(myskiplsn)));

    }


    heap_freetuple(tup);

    table_close(rel, NoLock);


    PopActiveSnapshot();


    if (started_tx)

        CommitTransactionCommand();

}


/* Error callback to give more context info about the change being applied */

void

apply_error_callback(void *arg)

{

    ApplyErrorCallbackArg *errarg = &apply_error_callback_arg;


    if (apply_error_callback_arg.command == 0)

        return;


    Assert(errarg->origin_name);


    if (errarg->rel == NULL)

    {

        if (!TransactionIdIsValid(errarg->remote_xid))

            errcontext("processing remote data for replication origin \"%s\" during message type \"%s\"",

                       errarg->origin_name,

                       logicalrep_message_type(errarg->command));

        else if (!XLogRecPtrIsValid(errarg->finish_lsn))

            errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" in transaction %u",

                       errarg->origin_name,

                       logicalrep_message_type(errarg->command),

                       errarg->remote_xid);

        else

            errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" in transaction %u, finished at %X/%08X",

                       errarg->origin_name,

                       logicalrep_message_type(errarg->command),

                       errarg->remote_xid,

                       LSN_FORMAT_ARGS(errarg->finish_lsn));

    }

    else

    {

        if (errarg->remote_attnum < 0)

        {

            if (!XLogRecPtrIsValid(errarg->finish_lsn))

                errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" in transaction %u",

                           errarg->origin_name,

                           logicalrep_message_type(errarg->command),

                           errarg->rel->remoterel.nspname,

                           errarg->rel->remoterel.relname,

                           errarg->remote_xid);

            else

                errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" in transaction %u, finished at %X/%08X",

                           errarg->origin_name,

                           logicalrep_message_type(errarg->command),

                           errarg->rel->remoterel.nspname,

                           errarg->rel->remoterel.relname,

                           errarg->remote_xid,

                           LSN_FORMAT_ARGS(errarg->finish_lsn));

        }

        else

        {

            if (!XLogRecPtrIsValid(errarg->finish_lsn))

                errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" column \"%s\" in transaction %u",

                           errarg->origin_name,

                           logicalrep_message_type(errarg->command),

                           errarg->rel->remoterel.nspname,

                           errarg->rel->remoterel.relname,

                           errarg->rel->remoterel.attnames[errarg->remote_attnum],

                           errarg->remote_xid);

            else

                errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" column \"%s\" in transaction %u, finished at %X/%08X",

                           errarg->origin_name,

                           logicalrep_message_type(errarg->command),

                           errarg->rel->remoterel.nspname,

                           errarg->rel->remoterel.relname,

                           errarg->rel->remoterel.attnames[errarg->remote_attnum],

                           errarg->remote_xid,

                           LSN_FORMAT_ARGS(errarg->finish_lsn));

        }

    }

}


/* Set transaction information of apply error callback */

static inline void

set_apply_error_context_xact(TransactionId xid, XLogRecPtr lsn)

{

    apply_error_callback_arg.remote_xid = xid;

    apply_error_callback_arg.finish_lsn = lsn;

}


/* Reset all information of apply error callback */

static inline void

reset_apply_error_context_info(void)

{

    apply_error_callback_arg.command = 0;

    apply_error_callback_arg.rel = NULL;

    apply_error_callback_arg.remote_attnum = -1;

    set_apply_error_context_xact(InvalidTransactionId, InvalidXLogRecPtr);

}


/*

 * Request wakeup of the workers for the given subscription OID

 * at commit of the current transaction.

 *

 * This is used to ensure that the workers process assorted changes

 * as soon as possible.

 */

void

LogicalRepWorkersWakeupAtCommit(Oid subid)

{

    MemoryContext oldcxt;


    oldcxt = MemoryContextSwitchTo(TopTransactionContext);

    on_commit_wakeup_workers_subids =

        list_append_unique_oid(on_commit_wakeup_workers_subids, subid);

    MemoryContextSwitchTo(oldcxt);

}


/*

 * Wake up the workers of any subscriptions that were changed in this xact.

 */

void

AtEOXact_LogicalRepWorkers(bool isCommit)

{

    if (isCommit && on_commit_wakeup_workers_subids != NIL)

    {

        ListCell   *lc;


        LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);

        foreach(lc, on_commit_wakeup_workers_subids)

        {

            Oid         subid = lfirst_oid(lc);

            List       *workers;

            ListCell   *lc2;


            workers = logicalrep_workers_find(subid, true, false);

            foreach(lc2, workers)

            {

                LogicalRepWorker *worker = (LogicalRepWorker *) lfirst(lc2);


                logicalrep_worker_wakeup_ptr(worker);

            }

        }

        LWLockRelease(LogicalRepWorkerLock);

    }


    /* The List storage will be reclaimed automatically in xact cleanup. */

    on_commit_wakeup_workers_subids = NIL;

}


/*

 * Allocate the origin name in long-lived context for error context message.

 */

void

set_apply_error_context_origin(char *originname)

{

    apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext,

                                                               originname);

}


/*

 * Return the action to be taken for the given transaction. See

 * TransApplyAction for information on each of the actions.

 *

 * *winfo is assigned to the destination parallel worker info when the leader

 * apply worker has to pass all the transaction's changes to the parallel

 * apply worker.

 */

static TransApplyAction

get_transaction_apply_action(TransactionId xid, ParallelApplyWorkerInfo **winfo)

{

    *winfo = NULL;


    if (am_parallel_apply_worker())

    {

        return TRANS_PARALLEL_APPLY;

    }


    /*

     * If we are processing this transaction using a parallel apply worker

     * then either we send the changes to the parallel worker or if the worker

     * is busy then serialize the changes to the file which will later be

     * processed by the parallel worker.

     */

    *winfo = pa_find_worker(xid);


    if (*winfo && (*winfo)->serialize_changes)

    {

        return TRANS_LEADER_PARTIAL_SERIALIZE;

    }

    else if (*winfo)

    {

        return TRANS_LEADER_SEND_TO_PARALLEL;

    }


    /*

     * If there is no parallel worker involved to process this transaction

     * then we either directly apply the change or serialize it to a file

     * which will later be applied when the transaction finish message is

     * processed.

     */

    else if (in_streamed_transaction)

    {

        return TRANS_LEADER_SERIALIZE;

    }

    else

    {

        return TRANS_LEADER_APPLY;

    }

}

acl.h

AclResult
AclResult
Definition: acl.h:182

ACLCHECK_OK
@ ACLCHECK_OK
Definition: acl.h:183

aclcheck_error
void aclcheck_error(AclResult aclerr, ObjectType objtype, const char *objectname)
Definition: aclchk.c:2652

pg_class_aclcheck
AclResult pg_class_aclcheck(Oid table_oid, Oid roleid, AclMode mode)
Definition: aclchk.c:4037

pa_set_xact_state
void pa_set_xact_state(ParallelApplyWorkerShared *wshared, ParallelTransState xact_state)
Definition: applyparallelworker.c:1315

pa_unlock_stream
void pa_unlock_stream(TransactionId xid, LOCKMODE lockmode)
Definition: applyparallelworker.c:1555

pa_stream_abort
void pa_stream_abort(LogicalRepStreamAbortData *abort_data)
Definition: applyparallelworker.c:1424

pa_lock_stream
void pa_lock_stream(TransactionId xid, LOCKMODE lockmode)
Definition: applyparallelworker.c:1548

pa_set_fileset_state
void pa_set_fileset_state(ParallelApplyWorkerShared *wshared, PartialFileSetState fileset_state)
Definition: applyparallelworker.c:1506

pa_reset_subtrans
void pa_reset_subtrans(void)
Definition: applyparallelworker.c:1410

pa_lock_transaction
void pa_lock_transaction(TransactionId xid, LOCKMODE lockmode)
Definition: applyparallelworker.c:1581

MyParallelShared
ParallelApplyWorkerShared * MyParallelShared
Definition: applyparallelworker.c:239

pa_start_subtrans
void pa_start_subtrans(TransactionId current_xid, TransactionId top_xid)
Definition: applyparallelworker.c:1370

pa_switch_to_partial_serialize
void pa_switch_to_partial_serialize(ParallelApplyWorkerInfo *winfo, bool stream_locked)
Definition: applyparallelworker.c:1219

pa_xact_finish
void pa_xact_finish(ParallelApplyWorkerInfo *winfo, XLogRecPtr remote_lsn)
Definition: applyparallelworker.c:1626

pa_send_data
bool pa_send_data(ParallelApplyWorkerInfo *winfo, Size nbytes, const void *data)
Definition: applyparallelworker.c:1154

pa_allocate_worker
void pa_allocate_worker(TransactionId xid)
Definition: applyparallelworker.c:471

pa_set_stream_apply_worker
void pa_set_stream_apply_worker(ParallelApplyWorkerInfo *winfo)
Definition: applyparallelworker.c:1342

pa_find_worker
ParallelApplyWorkerInfo * pa_find_worker(TransactionId xid)
Definition: applyparallelworker.c:519

pa_unlock_transaction
void pa_unlock_transaction(TransactionId xid, LOCKMODE lockmode)
Definition: applyparallelworker.c:1588

pa_decr_and_wait_stream_block
void pa_decr_and_wait_stream_block(void)
Definition: applyparallelworker.c:1599

pg_atomic_add_fetch_u32
static uint32 pg_atomic_add_fetch_u32(volatile pg_atomic_uint32 *ptr, int32 add_)
Definition: atomics.h:422

check_relation_updatable
static void check_relation_updatable(LogicalRepRelMapEntry *rel)
Definition: worker.c:2749

subxact_filename
static void subxact_filename(char *path, Oid subid, TransactionId xid)
Definition: worker.c:5360

begin_replication_step
static void begin_replication_step(void)
Definition: worker.c:726

end_replication_step
static void end_replication_step(void)
Definition: worker.c:749

create_edata_for_relation
static ApplyExecutionData * create_edata_for_relation(LogicalRepRelMapEntry *rel)
Definition: worker.c:870

cleanup_subxact_info
static void cleanup_subxact_info(void)
Definition: worker.c:5564

set_stream_options
void set_stream_options(WalRcvStreamOptions *options, char *slotname, XLogRecPtr *origin_startpos)
Definition: worker.c:5514

apply_handle_stream_prepare
static void apply_handle_stream_prepare(StringInfo s)
Definition: worker.c:1518

apply_handle_insert_internal
static void apply_handle_insert_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo, TupleTableSlot *remoteslot)
Definition: worker.c:2724

subxact_info_add
static void subxact_info_add(TransactionId xid)
Definition: worker.c:5282

should_stop_conflict_info_retention
static bool should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4769

last_flushpos
static XLogRecPtr last_flushpos
Definition: worker.c:527

stream_cleanup_files
void stream_cleanup_files(Oid subid, TransactionId xid)
Definition: worker.c:5381

ApplyMessageContext
MemoryContext ApplyMessageContext
Definition: worker.c:471

should_apply_changes_for_rel
static bool should_apply_changes_for_rel(LogicalRepRelMapEntry *rel)
Definition: worker.c:681

apply_handle_type
static void apply_handle_type(StringInfo s)
Definition: worker.c:2586

can_advance_nonremovable_xid
static bool can_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4401

wait_for_local_flush
static void wait_for_local_flush(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4614

apply_handle_truncate
static void apply_handle_truncate(StringInfo s)
Definition: worker.c:3647

RetainDeadTuplesPhase
RetainDeadTuplesPhase
Definition: worker.c:388

RDT_WAIT_FOR_PUBLISHER_STATUS
@ RDT_WAIT_FOR_PUBLISHER_STATUS
Definition: worker.c:391

RDT_RESUME_CONFLICT_INFO_RETENTION
@ RDT_RESUME_CONFLICT_INFO_RETENTION
Definition: worker.c:394

RDT_GET_CANDIDATE_XID
@ RDT_GET_CANDIDATE_XID
Definition: worker.c:389

RDT_REQUEST_PUBLISHER_STATUS
@ RDT_REQUEST_PUBLISHER_STATUS
Definition: worker.c:390

RDT_WAIT_FOR_LOCAL_FLUSH
@ RDT_WAIT_FOR_LOCAL_FLUSH
Definition: worker.c:392

RDT_STOP_CONFLICT_INFO_RETENTION
@ RDT_STOP_CONFLICT_INFO_RETENTION
Definition: worker.c:393

UpdateWorkerStats
static void UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
Definition: worker.c:3965

get_candidate_xid
static void get_candidate_xid(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4453

subscription_change_cb
static void subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)
Definition: worker.c:5167

get_transaction_apply_action
static TransApplyAction get_transaction_apply_action(TransactionId xid, ParallelApplyWorkerInfo **winfo)
Definition: worker.c:6316

TransApplyAction
TransApplyAction
Definition: worker.c:370

TRANS_LEADER_SERIALIZE
@ TRANS_LEADER_SERIALIZE
Definition: worker.c:375

TRANS_PARALLEL_APPLY
@ TRANS_PARALLEL_APPLY
Definition: worker.c:378

TRANS_LEADER_SEND_TO_PARALLEL
@ TRANS_LEADER_SEND_TO_PARALLEL
Definition: worker.c:376

TRANS_LEADER_APPLY
@ TRANS_LEADER_APPLY
Definition: worker.c:372

TRANS_LEADER_PARTIAL_SERIALIZE
@ TRANS_LEADER_PARTIAL_SERIALIZE
Definition: worker.c:377

handle_streamed_transaction
static bool handle_streamed_transaction(LogicalRepMsgType action, StringInfo s)
Definition: worker.c:777

stream_open_and_write_change
static void stream_open_and_write_change(TransactionId xid, char action, StringInfo s)
Definition: worker.c:5498

ApplyExecutionData
struct ApplyExecutionData ApplyExecutionData

changes_filename
static void changes_filename(char *path, Oid subid, TransactionId xid)
Definition: worker.c:5367

InitializingApplyWorker
bool InitializingApplyWorker
Definition: worker.c:499

apply_worker_exit
static void apply_worker_exit(void)
Definition: worker.c:5004

stream_fd
static BufFile * stream_fd
Definition: worker.c:520

apply_handle_update
static void apply_handle_update(StringInfo s)
Definition: worker.c:2790

RetainDeadTuplesData
struct RetainDeadTuplesData RetainDeadTuplesData

stream_stop_internal
void stream_stop_internal(TransactionId xid)
Definition: worker.c:1862

apply_handle_stream_commit
static void apply_handle_stream_commit(StringInfo s)
Definition: worker.c:2390

start_apply
void start_apply(XLogRecPtr origin_startpos)
Definition: worker.c:5583

stop_skipping_changes
static void stop_skipping_changes(void)
Definition: worker.c:6048

ApplySubXactData
struct ApplySubXactData ApplySubXactData

NAPTIME_PER_CYCLE
#define NAPTIME_PER_CYCLE
Definition: worker.c:299

FindReplTupleInLocalRel
static bool FindReplTupleInLocalRel(ApplyExecutionData *edata, Relation localrel, LogicalRepRelation *remoterel, Oid localidxoid, TupleTableSlot *remoteslot, TupleTableSlot **localslot)
Definition: worker.c:3174

get_flush_position
static void get_flush_position(XLogRecPtr *write, XLogRecPtr *flush, bool *have_pending_txes)
Definition: worker.c:3895

update_retention_status
static bool update_retention_status(bool active)
Definition: worker.c:4882

parallel_stream_nchanges
static uint32 parallel_stream_nchanges
Definition: worker.c:496

apply_handle_commit_prepared
static void apply_handle_commit_prepared(StringInfo s)
Definition: worker.c:1405

LogicalRepApplyLoop
static void LogicalRepApplyLoop(XLogRecPtr last_received)
Definition: worker.c:3981

LogicalRepWorkersWakeupAtCommit
void LogicalRepWorkersWakeupAtCommit(Oid subid)
Definition: worker.c:6255

MAX_XID_ADVANCE_INTERVAL
#define MAX_XID_ADVANCE_INTERVAL
Definition: worker.c:456

IsLogicalWorker
bool IsLogicalWorker(void)
Definition: worker.c:6002

subxact_data
static ApplySubXactData subxact_data
Definition: worker.c:545

apply_handle_tuple_routing
static void apply_handle_tuple_routing(ApplyExecutionData *edata, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, CmdType operation)
Definition: worker.c:3351

apply_error_callback_arg
static ApplyErrorCallbackArg apply_error_callback_arg
Definition: worker.c:459

in_remote_transaction
bool in_remote_transaction
Definition: worker.c:484

skip_xact_finish_lsn
static XLogRecPtr skip_xact_finish_lsn
Definition: worker.c:516

stream_open_file
static void stream_open_file(Oid subid, TransactionId xid, bool first_segment)
Definition: worker.c:5405

apply_handle_delete
static void apply_handle_delete(StringInfo s)
Definition: worker.c:3012

apply_dispatch
void apply_dispatch(StringInfo s)
Definition: worker.c:3775

adjust_xid_advance_interval
static void adjust_xid_advance_interval(RetainDeadTuplesData *rdt_data, bool new_xid_found)
Definition: worker.c:4955

is_skipping_changes
#define is_skipping_changes()
Definition: worker.c:517

stream_write_change
static void stream_write_change(char action, StringInfo s)
Definition: worker.c:5468

clear_subscription_skip_lsn
static void clear_subscription_skip_lsn(XLogRecPtr finish_lsn)
Definition: worker.c:6070

replorigin_reset
static void replorigin_reset(int code, Datum arg)
Definition: worker.c:5858

apply_handle_update_internal
static void apply_handle_update_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo, TupleTableSlot *remoteslot, LogicalRepTupleData *newtup, Oid localindexoid)
Definition: worker.c:2907

ensure_last_message
static void ensure_last_message(FileSet *stream_fileset, TransactionId xid, int fileno, off_t offset)
Definition: worker.c:2228

MIN_XID_ADVANCE_INTERVAL
#define MIN_XID_ADVANCE_INTERVAL
Definition: worker.c:455

apply_handle_begin
static void apply_handle_begin(StringInfo s)
Definition: worker.c:1211

DisableSubscriptionAndExit
void DisableSubscriptionAndExit(void)
Definition: worker.c:5943

lsn_mapping
static dlist_head lsn_mapping
Definition: worker.c:308

IsLogicalParallelApplyWorker
bool IsLogicalParallelApplyWorker(void)
Definition: worker.c:6011

AtEOXact_LogicalRepWorkers
void AtEOXact_LogicalRepWorkers(bool isCommit)
Definition: worker.c:6269

slot_store_data
static void slot_store_data(TupleTableSlot *slot, LogicalRepRelMapEntry *rel, LogicalRepTupleData *tupleData)
Definition: worker.c:1017

ReplicationOriginNameForLogicalRep
void ReplicationOriginNameForLogicalRep(Oid suboid, Oid relid, char *originname, Size szoriginname)
Definition: worker.c:641

finish_edata
static void finish_edata(ApplyExecutionData *edata)
Definition: worker.c:928

slot_modify_data
static void slot_modify_data(TupleTableSlot *slot, TupleTableSlot *srcslot, LogicalRepRelMapEntry *rel, LogicalRepTupleData *tupleData)
Definition: worker.c:1118

set_apply_error_context_xact
static void set_apply_error_context_xact(TransactionId xid, XLogRecPtr lsn)
Definition: worker.c:6231

apply_error_context_stack
ErrorContextCallback * apply_error_context_stack
Definition: worker.c:469

stream_abort_internal
static void stream_abort_internal(TransactionId xid, TransactionId subxid)
Definition: worker.c:1988

apply_handle_commit
static void apply_handle_commit(StringInfo s)
Definition: worker.c:1236

IsIndexUsableForFindingDeletedTuple
static bool IsIndexUsableForFindingDeletedTuple(Oid localindexoid, TransactionId conflict_detection_xmin)
Definition: worker.c:3235

stream_start_internal
void stream_start_internal(TransactionId xid, bool first_segment)
Definition: worker.c:1687

on_commit_wakeup_workers_subids
static List * on_commit_wakeup_workers_subids
Definition: worker.c:482

apply_handle_stream_abort
static void apply_handle_stream_abort(StringInfo s)
Definition: worker.c:2071

apply_handle_relation
static void apply_handle_relation(StringInfo s)
Definition: worker.c:2563

set_apply_error_context_origin
void set_apply_error_context_origin(char *originname)
Definition: worker.c:6301

wait_for_publisher_status
static void wait_for_publisher_status(RetainDeadTuplesData *rdt_data, bool status_received)
Definition: worker.c:4555

ApplyErrorCallbackArg
struct ApplyErrorCallbackArg ApplyErrorCallbackArg

ApplyContext
MemoryContext ApplyContext
Definition: worker.c:472

subxact_info_write
static void subxact_info_write(Oid subid, TransactionId xid)
Definition: worker.c:5182

TargetPrivilegesCheck
static void TargetPrivilegesCheck(Relation rel, AclMode mode)
Definition: worker.c:2601

apply_handle_prepare
static void apply_handle_prepare(StringInfo s)
Definition: worker.c:1331

apply_handle_rollback_prepared
static void apply_handle_rollback_prepared(StringInfo s)
Definition: worker.c:1457

run_apply_worker
static void run_apply_worker()
Definition: worker.c:5624

SetupApplyOrSyncWorker
void SetupApplyOrSyncWorker(int worker_slot)
Definition: worker.c:5869

apply_handle_stream_stop
static void apply_handle_stream_stop(StringInfo s)
Definition: worker.c:1885

apply_handle_origin
static void apply_handle_origin(StringInfo s)
Definition: worker.c:1666

request_publisher_status
static void request_publisher_status(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4516

send_feedback
static void send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)
Definition: worker.c:4297

reset_retention_data_fields
static void reset_retention_data_fields(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4921

process_rdt_phase_transition
static void process_rdt_phase_transition(RetainDeadTuplesData *rdt_data, bool status_received)
Definition: worker.c:4423

maybe_advance_nonremovable_xid
static void maybe_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data, bool status_received)
Definition: worker.c:4387

LogRepWorkerWalRcvConn
WalReceiverConn * LogRepWorkerWalRcvConn
Definition: worker.c:477

resume_conflict_info_retention
static void resume_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4842

remote_final_lsn
static XLogRecPtr remote_final_lsn
Definition: worker.c:485

MySubscriptionValid
static bool MySubscriptionValid
Definition: worker.c:480

apply_error_callback
void apply_error_callback(void *arg)
Definition: worker.c:6159

store_flush_position
void store_flush_position(XLogRecPtr remote_lsn, XLogRecPtr local_lsn)
Definition: worker.c:3939

LogicalStreamingContext
static MemoryContext LogicalStreamingContext
Definition: worker.c:475

maybe_reread_subscription
void maybe_reread_subscription(void)
Definition: worker.c:5038

apply_handle_commit_internal
static void apply_handle_commit_internal(LogicalRepCommitData *commit_data)
Definition: worker.c:2503

InitializeLogRepWorker
void InitializeLogRepWorker(void)
Definition: worker.c:5737

in_streamed_transaction
static bool in_streamed_transaction
Definition: worker.c:488

SubXactInfo
struct SubXactInfo SubXactInfo

apply_handle_begin_prepare
static void apply_handle_begin_prepare(StringInfo s)
Definition: worker.c:1265

FlushPosition
struct FlushPosition FlushPosition

ApplyWorkerMain
void ApplyWorkerMain(Datum main_arg)
Definition: worker.c:5923

apply_spooled_messages
void apply_spooled_messages(FileSet *stream_fileset, TransactionId xid, XLogRecPtr lsn)
Definition: worker.c:2260

apply_handle_stream_start
static void apply_handle_stream_start(StringInfo s)
Definition: worker.c:1725

maybe_start_skipping_changes
static void maybe_start_skipping_changes(XLogRecPtr finish_lsn)
Definition: worker.c:6021

stop_conflict_info_retention
static void stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
Definition: worker.c:4804

MySubscription
Subscription * MySubscription
Definition: worker.c:479

apply_handle_prepare_internal
static void apply_handle_prepare_internal(LogicalRepPreparedTxnData *prepare_data)
Definition: worker.c:1294

stream_close_file
static void stream_close_file(void)
Definition: worker.c:5450

stream_xid
static TransactionId stream_xid
Definition: worker.c:490

apply_handle_insert
static void apply_handle_insert(StringInfo s)
Definition: worker.c:2633

slot_fill_defaults
static void slot_fill_defaults(LogicalRepRelMapEntry *rel, EState *estate, TupleTableSlot *slot)
Definition: worker.c:959

subxact_info_read
static void subxact_info_read(Oid subid, TransactionId xid)
Definition: worker.c:5231

FindDeletedTupleInLocalRel
static bool FindDeletedTupleInLocalRel(Relation localrel, Oid localidxoid, TupleTableSlot *remoteslot, TransactionId *delete_xid, RepOriginId *delete_origin, TimestampTz *delete_time)
Definition: worker.c:3269

apply_handle_delete_internal
static void apply_handle_delete_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo, TupleTableSlot *remoteslot, Oid localindexoid)
Definition: worker.c:3106

reset_apply_error_context_info
static void reset_apply_error_context_info(void)
Definition: worker.c:6239

TimestampDifferenceMilliseconds
long TimestampDifferenceMilliseconds(TimestampTz start_time, TimestampTz stop_time)
Definition: timestamp.c:1757

TimestampDifferenceExceeds
bool TimestampDifferenceExceeds(TimestampTz start_time, TimestampTz stop_time, int msec)
Definition: timestamp.c:1781

GetCurrentTimestamp
TimestampTz GetCurrentTimestamp(void)
Definition: timestamp.c:1645

now
Datum now(PG_FUNCTION_ARGS)
Definition: timestamp.c:1609

pgstat_report_activity
void pgstat_report_activity(BackendState state, const char *cmd_str)
Definition: backend_status.c:572

STATE_IDLE
@ STATE_IDLE
Definition: backend_status.h:28

STATE_IDLEINTRANSACTION
@ STATE_IDLEINTRANSACTION
Definition: backend_status.h:30

STATE_RUNNING
@ STATE_RUNNING
Definition: backend_status.h:29

BackgroundWorkerUnblockSignals
void BackgroundWorkerUnblockSignals(void)
Definition: bgworker.c:930

BackgroundWorkerInitializeConnectionByOid
void BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid, uint32 flags)
Definition: bgworker.c:890

bgworker.h

bms_make_singleton
Bitmapset * bms_make_singleton(int x)
Definition: bitmapset.c:216

bms_add_member
Bitmapset * bms_add_member(Bitmapset *a, int x)
Definition: bitmapset.c:815

values
static Datum values[MAXATTR]
Definition: bootstrap.c:153

BufFileOpenFileSet
BufFile * BufFileOpenFileSet(FileSet *fileset, const char *name, int mode, bool missing_ok)
Definition: buffile.c:291

BufFileReadExact
void BufFileReadExact(BufFile *file, void *ptr, size_t size)
Definition: buffile.c:654

BufFileTell
void BufFileTell(BufFile *file, int *fileno, off_t *offset)
Definition: buffile.c:833

BufFileWrite
void BufFileWrite(BufFile *file, const void *ptr, size_t size)
Definition: buffile.c:676

BufFileReadMaybeEOF
size_t BufFileReadMaybeEOF(BufFile *file, void *ptr, size_t size, bool eofOK)
Definition: buffile.c:664

BufFileTruncateFileSet
void BufFileTruncateFileSet(BufFile *file, int fileno, off_t offset)
Definition: buffile.c:928

BufFileCreateFileSet
BufFile * BufFileCreateFileSet(FileSet *fileset, const char *name)
Definition: buffile.c:267

BufFileSeek
int BufFileSeek(BufFile *file, int fileno, off_t offset, int whence)
Definition: buffile.c:740

BufFileClose
void BufFileClose(BufFile *file)
Definition: buffile.c:412

BufFileDeleteFileSet
void BufFileDeleteFileSet(FileSet *fileset, const char *name, bool missing_ok)
Definition: buffile.c:364

buffile.h

Min
#define Min(x, y)
Definition: c.h:1008

likely
#define likely(x)
Definition: c.h:406

int64
int64_t int64
Definition: c.h:540

uint64
uint64_t uint64
Definition: c.h:544

uint32
uint32_t uint32
Definition: c.h:543

TransactionId
uint32 TransactionId
Definition: c.h:662

OidIsValid
#define OidIsValid(objectId)
Definition: c.h:779

Size
size_t Size
Definition: c.h:615

track_commit_timestamp
bool track_commit_timestamp
Definition: commit_ts.c:109

commit_ts.h

ReportApplyConflict
void ReportApplyConflict(EState *estate, ResultRelInfo *relinfo, int elevel, ConflictType type, TupleTableSlot *searchslot, TupleTableSlot *remoteslot, List *conflicttuples)
Definition: conflict.c:104

InitConflictIndexes
void InitConflictIndexes(ResultRelInfo *relInfo)
Definition: conflict.c:139

GetTupleTransactionInfo
bool GetTupleTransactionInfo(TupleTableSlot *localslot, TransactionId *xmin, RepOriginId *localorigin, TimestampTz *localts)
Definition: conflict.c:63

conflict.h

ConflictType
ConflictType
Definition: conflict.h:32

CT_UPDATE_DELETED
@ CT_UPDATE_DELETED
Definition: conflict.h:43

CT_DELETE_MISSING
@ CT_DELETE_MISSING
Definition: conflict.h:52

CT_UPDATE_ORIGIN_DIFFERS
@ CT_UPDATE_ORIGIN_DIFFERS
Definition: conflict.h:37

CT_UPDATE_MISSING
@ CT_UPDATE_MISSING
Definition: conflict.h:46

CT_DELETE_ORIGIN_DIFFERS
@ CT_DELETE_ORIGIN_DIFFERS
Definition: conflict.h:49

TimestampTz
int64 TimestampTz
Definition: timestamp.h:39

load_file
void load_file(const char *filename, bool restricted)
Definition: dfmgr.c:149

errmsg_internal
int errmsg_internal(const char *fmt,...)
Definition: elog.c:1170

EmitErrorReport
void EmitErrorReport(void)
Definition: elog.c:1704

errdetail_internal
int errdetail_internal(const char *fmt,...)
Definition: elog.c:1243

errdetail
int errdetail(const char *fmt,...)
Definition: elog.c:1216

error_context_stack
ErrorContextCallback * error_context_stack
Definition: elog.c:95

FlushErrorState
void FlushErrorState(void)
Definition: elog.c:1884

errcode
int errcode(int sqlerrcode)
Definition: elog.c:863

errmsg
int errmsg(const char *fmt,...)
Definition: elog.c:1080

LOG
#define LOG
Definition: elog.h:31

PG_RE_THROW
#define PG_RE_THROW()
Definition: elog.h:405

errcontext
#define errcontext
Definition: elog.h:198

PG_TRY
#define PG_TRY(...)
Definition: elog.h:372

WARNING
#define WARNING
Definition: elog.h:36

DEBUG2
#define DEBUG2
Definition: elog.h:29

PG_END_TRY
#define PG_END_TRY(...)
Definition: elog.h:397

DEBUG1
#define DEBUG1
Definition: elog.h:30

ERROR
#define ERROR
Definition: elog.h:39

PG_CATCH
#define PG_CATCH(...)
Definition: elog.h:382

elog
#define elog(elevel,...)
Definition: elog.h:226

ereport
#define ereport(elevel,...)
Definition: elog.h:150

equal
bool equal(const void *a, const void *b)
Definition: equalfuncs.c:223

err
void err(int eval, const char *fmt,...)
Definition: err.c:43

ExecInitExpr
ExprState * ExecInitExpr(Expr *node, PlanState *parent)
Definition: execExpr.c:143

ExecCloseIndices
void ExecCloseIndices(ResultRelInfo *resultRelInfo)
Definition: execIndexing.c:238

ExecOpenIndices
void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
Definition: execIndexing.c:160

ExecPartitionCheck
bool ExecPartitionCheck(ResultRelInfo *resultRelInfo, TupleTableSlot *slot, EState *estate, bool emitError)
Definition: execMain.c:1856

EvalPlanQualInit
void EvalPlanQualInit(EPQState *epqstate, EState *parentestate, Plan *subplan, List *auxrowmarks, int epqParam, List *resultRelations)
Definition: execMain.c:2718

InitResultRelInfo
void InitResultRelInfo(ResultRelInfo *resultRelInfo, Relation resultRelationDesc, Index resultRelationIndex, ResultRelInfo *partition_root_rri, int instrument_options)
Definition: execMain.c:1243

EvalPlanQualEnd
void EvalPlanQualEnd(EPQState *epqstate)
Definition: execMain.c:3182

ExecSetupPartitionTupleRouting
PartitionTupleRouting * ExecSetupPartitionTupleRouting(EState *estate, Relation rel)
Definition: execPartition.c:218

ExecFindPartition
ResultRelInfo * ExecFindPartition(ModifyTableState *mtstate, ResultRelInfo *rootResultRelInfo, PartitionTupleRouting *proute, TupleTableSlot *slot, EState *estate)
Definition: execPartition.c:265

ExecCleanupTupleRouting
void ExecCleanupTupleRouting(ModifyTableState *mtstate, PartitionTupleRouting *proute)
Definition: execPartition.c:1241

execPartition.h

CheckSubscriptionRelkind
void CheckSubscriptionRelkind(char localrelkind, char remoterelkind, const char *nspname, const char *relname)
Definition: execReplication.c:1121

RelationFindReplTupleSeq
bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode, TupleTableSlot *searchslot, TupleTableSlot *outslot)
Definition: execReplication.c:368

RelationFindReplTupleByIndex
bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid, LockTupleMode lockmode, TupleTableSlot *searchslot, TupleTableSlot *outslot)
Definition: execReplication.c:181

ExecSimpleRelationDelete
void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo, EState *estate, EPQState *epqstate, TupleTableSlot *searchslot)
Definition: execReplication.c:977

RelationFindDeletedTupleInfoSeq
bool RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot, TransactionId oldestxmin, TransactionId *delete_xid, RepOriginId *delete_origin, TimestampTz *delete_time)
Definition: execReplication.c:561

ExecSimpleRelationUpdate
void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo, EState *estate, EPQState *epqstate, TupleTableSlot *searchslot, TupleTableSlot *slot)
Definition: execReplication.c:894

ExecSimpleRelationInsert
void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo, EState *estate, TupleTableSlot *slot)
Definition: execReplication.c:805

RelationFindDeletedTupleInfoByIndex
bool RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid, TupleTableSlot *searchslot, TransactionId oldestxmin, TransactionId *delete_xid, RepOriginId *delete_origin, TimestampTz *delete_time)
Definition: execReplication.c:630

ExecResetTupleTable
void ExecResetTupleTable(List *tupleTable, bool shouldFree)
Definition: execTuples.c:1380

TTSOpsVirtual
const TupleTableSlotOps TTSOpsVirtual
Definition: execTuples.c:84

ExecStoreVirtualTuple
TupleTableSlot * ExecStoreVirtualTuple(TupleTableSlot *slot)
Definition: execTuples.c:1741

ExecInitExtraTupleSlot
TupleTableSlot * ExecInitExtraTupleSlot(EState *estate, TupleDesc tupledesc, const TupleTableSlotOps *tts_ops)
Definition: execTuples.c:2020

ExecGetRootToChildMap
TupleConversionMap * ExecGetRootToChildMap(ResultRelInfo *resultRelInfo, EState *estate)
Definition: execUtils.c:1326

ExecInitRangeTable
void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos, Bitmapset *unpruned_relids)
Definition: execUtils.c:773

FreeExecutorState
void FreeExecutorState(EState *estate)
Definition: execUtils.c:192

CreateExecutorState
EState * CreateExecutorState(void)
Definition: execUtils.c:88

executor.h

GetPerTupleExprContext
#define GetPerTupleExprContext(estate)
Definition: executor.h:656

GetPerTupleMemoryContext
#define GetPerTupleMemoryContext(estate)
Definition: executor.h:661

EvalPlanQualSetSlot
#define EvalPlanQualSetSlot(epqstate, slot)
Definition: executor.h:289

ExecEvalExpr
static Datum ExecEvalExpr(ExprState *state, ExprContext *econtext, bool *isNull)
Definition: executor.h:393

FileSetInit
void FileSetInit(FileSet *fileset)
Definition: fileset.c:52

OidReceiveFunctionCall
Datum OidReceiveFunctionCall(Oid functionId, StringInfo buf, Oid typioparam, int32 typmod)
Definition: fmgr.c:1772

OidInputFunctionCall
Datum OidInputFunctionCall(Oid functionId, char *str, Oid typioparam, int32 typmod)
Definition: fmgr.c:1754

MyLatch
struct Latch * MyLatch
Definition: globals.c:63

ProcessConfigFile
void ProcessConfigFile(GucContext context)
Definition: guc-file.l:120

SetConfigOption
void SetConfigOption(const char *name, const char *value, GucContext context, GucSource source)
Definition: guc.c:4196

guc.h

PGC_S_OVERRIDE
@ PGC_S_OVERRIDE
Definition: guc.h:123

PGC_SUSET
@ PGC_SUSET
Definition: guc.h:78

PGC_SIGHUP
@ PGC_SIGHUP
Definition: guc.h:75

PGC_BACKEND
@ PGC_BACKEND
Definition: guc.h:77

Assert
Assert(PointerIsAligned(start, uint64))

heap_modify_tuple
HeapTuple heap_modify_tuple(HeapTuple tuple, TupleDesc tupleDesc, const Datum *replValues, const bool *replIsnull, const bool *doReplace)
Definition: heaptuple.c:1210

heap_freetuple
void heap_freetuple(HeapTuple htup)
Definition: heaptuple.c:1435

HeapTupleIsValid
#define HeapTupleIsValid(tuple)
Definition: htup.h:78

HeapTupleHeaderGetXmin
static TransactionId HeapTupleHeaderGetXmin(const HeapTupleHeaderData *tup)
Definition: htup_details.h:324

GETSTRUCT
static void * GETSTRUCT(const HeapTupleData *tuple)
Definition: htup_details.h:728

dlist_delete
static void dlist_delete(dlist_node *node)
Definition: ilist.h:405

dlist_tail_element
#define dlist_tail_element(type, membername, lhead)
Definition: ilist.h:612

dlist_foreach_modify
#define dlist_foreach_modify(iter, lhead)
Definition: ilist.h:640

dlist_is_empty
static bool dlist_is_empty(const dlist_head *head)
Definition: ilist.h:336

dlist_push_tail
static void dlist_push_tail(dlist_head *head, dlist_node *node)
Definition: ilist.h:364

DLIST_STATIC_INIT
#define DLIST_STATIC_INIT(name)
Definition: ilist.h:281

dlist_container
#define dlist_container(type, membername, ptr)
Definition: ilist.h:593

index_close
void index_close(Relation relation, LOCKMODE lockmode)
Definition: indexam.c:177

index_open
Relation index_open(Oid relationId, LOCKMODE lockmode)
Definition: indexam.c:133

CatalogTupleUpdate
void CatalogTupleUpdate(Relation heapRel, const ItemPointerData *otid, HeapTuple tup)
Definition: indexing.c:313

indexing.h

write
#define write(a, b, c)
Definition: win32.h:14

ConfigReloadPending
volatile sig_atomic_t ConfigReloadPending
Definition: interrupt.c:27

SignalHandlerForConfigReload
void SignalHandlerForConfigReload(SIGNAL_ARGS)
Definition: interrupt.c:61

interrupt.h

AcceptInvalidationMessages
void AcceptInvalidationMessages(void)
Definition: inval.c:930

CacheRegisterSyscacheCallback
void CacheRegisterSyscacheCallback(int cacheid, SyscacheCallbackFunction func, Datum arg)
Definition: inval.c:1812

inval.h

before_shmem_exit
void before_shmem_exit(pg_on_exit_callback function, Datum arg)
Definition: ipc.c:337

proc_exit
void proc_exit(int code)
Definition: ipc.c:104

ipc.h

i
int i
Definition: isn.c:77

WaitLatchOrSocket
int WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock, long timeout, uint32 wait_event_info)
Definition: latch.c:223

ResetLatch
void ResetLatch(Latch *latch)
Definition: latch.c:374

logicalrep_workers_find
List * logicalrep_workers_find(Oid subid, bool only_running, bool acquire_lock)
Definition: launcher.c:293

logicalrep_worker_wakeup_ptr
void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker)
Definition: launcher.c:746

logicalrep_worker_attach
void logicalrep_worker_attach(int slot)
Definition: launcher.c:757

ApplyLauncherWakeup
void ApplyLauncherWakeup(void)
Definition: launcher.c:1194

logicalrep_worker_find
LogicalRepWorker * logicalrep_worker_find(LogicalRepWorkerType wtype, Oid subid, Oid relid, bool only_running)
Definition: launcher.c:258

logicalrep_worker_wakeup
void logicalrep_worker_wakeup(LogicalRepWorkerType wtype, Oid subid, Oid relid)
Definition: launcher.c:723

MyLogicalRepWorker
LogicalRepWorker * MyLogicalRepWorker
Definition: launcher.c:56

ApplyLauncherForgetWorkerStartTime
void ApplyLauncherForgetWorkerStartTime(Oid subid)
Definition: launcher.c:1154

lappend
List * lappend(List *list, void *datum)
Definition: list.c:339

lappend_oid
List * lappend_oid(List *list, Oid datum)
Definition: list.c:375

list_append_unique_oid
List * list_append_unique_oid(List *list, Oid datum)
Definition: list.c:1380

list_member_oid
bool list_member_oid(const List *list, Oid datum)
Definition: list.c:722

LockSharedObject
void LockSharedObject(Oid classid, Oid objid, uint16 objsubid, LOCKMODE lockmode)
Definition: lmgr.c:1088

lmgr.h

LOCKMODE
int LOCKMODE
Definition: lockdefs.h:26

NoLock
#define NoLock
Definition: lockdefs.h:34

AccessExclusiveLock
#define AccessExclusiveLock
Definition: lockdefs.h:43

AccessShareLock
#define AccessShareLock
Definition: lockdefs.h:36

RowExclusiveLock
#define RowExclusiveLock
Definition: lockdefs.h:38

LockTupleExclusive
@ LockTupleExclusive
Definition: lockoptions.h:58

logicallauncher.h

logicalproto.h

LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM
#define LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM
Definition: logicalproto.h:44

LOGICALREP_PROTO_STREAM_VERSION_NUM
#define LOGICALREP_PROTO_STREAM_VERSION_NUM
Definition: logicalproto.h:42

LOGICALREP_PROTO_TWOPHASE_VERSION_NUM
#define LOGICALREP_PROTO_TWOPHASE_VERSION_NUM
Definition: logicalproto.h:43

LOGICALREP_COLUMN_UNCHANGED
#define LOGICALREP_COLUMN_UNCHANGED
Definition: logicalproto.h:97

LogicalRepMsgType
LogicalRepMsgType
Definition: logicalproto.h:58

LOGICAL_REP_MSG_INSERT
@ LOGICAL_REP_MSG_INSERT
Definition: logicalproto.h:62

LOGICAL_REP_MSG_TRUNCATE
@ LOGICAL_REP_MSG_TRUNCATE
Definition: logicalproto.h:65

LOGICAL_REP_MSG_STREAM_STOP
@ LOGICAL_REP_MSG_STREAM_STOP
Definition: logicalproto.h:74

LOGICAL_REP_MSG_BEGIN
@ LOGICAL_REP_MSG_BEGIN
Definition: logicalproto.h:59

LOGICAL_REP_MSG_STREAM_PREPARE
@ LOGICAL_REP_MSG_STREAM_PREPARE
Definition: logicalproto.h:77

LOGICAL_REP_MSG_STREAM_ABORT
@ LOGICAL_REP_MSG_STREAM_ABORT
Definition: logicalproto.h:76

LOGICAL_REP_MSG_BEGIN_PREPARE
@ LOGICAL_REP_MSG_BEGIN_PREPARE
Definition: logicalproto.h:69

LOGICAL_REP_MSG_STREAM_START
@ LOGICAL_REP_MSG_STREAM_START
Definition: logicalproto.h:73

LOGICAL_REP_MSG_COMMIT
@ LOGICAL_REP_MSG_COMMIT
Definition: logicalproto.h:60

LOGICAL_REP_MSG_PREPARE
@ LOGICAL_REP_MSG_PREPARE
Definition: logicalproto.h:70

LOGICAL_REP_MSG_RELATION
@ LOGICAL_REP_MSG_RELATION
Definition: logicalproto.h:66

LOGICAL_REP_MSG_MESSAGE
@ LOGICAL_REP_MSG_MESSAGE
Definition: logicalproto.h:68

LOGICAL_REP_MSG_ROLLBACK_PREPARED
@ LOGICAL_REP_MSG_ROLLBACK_PREPARED
Definition: logicalproto.h:72

LOGICAL_REP_MSG_COMMIT_PREPARED
@ LOGICAL_REP_MSG_COMMIT_PREPARED
Definition: logicalproto.h:71

LOGICAL_REP_MSG_TYPE
@ LOGICAL_REP_MSG_TYPE
Definition: logicalproto.h:67

LOGICAL_REP_MSG_DELETE
@ LOGICAL_REP_MSG_DELETE
Definition: logicalproto.h:64

LOGICAL_REP_MSG_STREAM_COMMIT
@ LOGICAL_REP_MSG_STREAM_COMMIT
Definition: logicalproto.h:75

LOGICAL_REP_MSG_ORIGIN
@ LOGICAL_REP_MSG_ORIGIN
Definition: logicalproto.h:61

LOGICAL_REP_MSG_UPDATE
@ LOGICAL_REP_MSG_UPDATE
Definition: logicalproto.h:63

LogicalRepRelId
uint32 LogicalRepRelId
Definition: logicalproto.h:101

LOGICALREP_PROTO_VERSION_NUM
#define LOGICALREP_PROTO_VERSION_NUM
Definition: logicalproto.h:41

LOGICALREP_COLUMN_BINARY
#define LOGICALREP_COLUMN_BINARY
Definition: logicalproto.h:99

LOGICALREP_COLUMN_TEXT
#define LOGICALREP_COLUMN_TEXT
Definition: logicalproto.h:98

logicalrelation.h

logicalworker.h

get_rel_name
char * get_rel_name(Oid relid)
Definition: lsyscache.c:2095

getTypeInputInfo
void getTypeInputInfo(Oid type, Oid *typInput, Oid *typIOParam)
Definition: lsyscache.c:3041

get_namespace_name
char * get_namespace_name(Oid nspid)
Definition: lsyscache.c:3533

getTypeBinaryInputInfo
void getTypeBinaryInputInfo(Oid type, Oid *typReceive, Oid *typIOParam)
Definition: lsyscache.c:3107

lsyscache.h

LWLockAcquire
bool LWLockAcquire(LWLock *lock, LWLockMode mode)
Definition: lwlock.c:1174

LWLockRelease
void LWLockRelease(LWLock *lock)
Definition: lwlock.c:1894

LW_SHARED
@ LW_SHARED
Definition: lwlock.h:113

MemoryContextStrdup
char * MemoryContextStrdup(MemoryContext context, const char *string)
Definition: mcxt.c:1746

MemoryContextReset
void MemoryContextReset(MemoryContext context)
Definition: mcxt.c:400

TopTransactionContext
MemoryContext TopTransactionContext
Definition: mcxt.c:171

pstrdup
char * pstrdup(const char *in)
Definition: mcxt.c:1759

repalloc
void * repalloc(void *pointer, Size size)
Definition: mcxt.c:1610

pfree
void pfree(void *pointer)
Definition: mcxt.c:1594

palloc0
void * palloc0(Size size)
Definition: mcxt.c:1395

TopMemoryContext
MemoryContext TopMemoryContext
Definition: mcxt.c:166

palloc
void * palloc(Size size)
Definition: mcxt.c:1365

memutils.h

AllocSetContextCreate
#define AllocSetContextCreate
Definition: memutils.h:129

ALLOCSET_DEFAULT_SIZES
#define ALLOCSET_DEFAULT_SIZES
Definition: memutils.h:160

miscadmin.h

RESUME_INTERRUPTS
#define RESUME_INTERRUPTS()
Definition: miscadmin.h:136

CHECK_FOR_INTERRUPTS
#define CHECK_FOR_INTERRUPTS()
Definition: miscadmin.h:123

HOLD_INTERRUPTS
#define HOLD_INTERRUPTS()
Definition: miscadmin.h:134

GetUserId
Oid GetUserId(void)
Definition: miscinit.c:469

GetUserNameFromId
char * GetUserNameFromId(Oid roleid, bool noerr)
Definition: miscinit.c:988

generate_unaccent_rules.action
action
Definition: generate_unaccent_rules.py:287

CmdType
CmdType
Definition: nodes.h:273

CMD_INSERT
@ CMD_INSERT
Definition: nodes.h:277

CMD_DELETE
@ CMD_DELETE
Definition: nodes.h:278

CMD_UPDATE
@ CMD_UPDATE
Definition: nodes.h:276

makeNode
#define makeNode(_type_)
Definition: nodes.h:161

get_relkind_objtype
ObjectType get_relkind_objtype(char relkind)
Definition: objectaddress.c:6185

optimizer.h

replorigin_session_origin_timestamp
TimestampTz replorigin_session_origin_timestamp
Definition: origin.c:165

replorigin_by_name
RepOriginId replorigin_by_name(const char *roname, bool missing_ok)
Definition: origin.c:226

replorigin_create
RepOriginId replorigin_create(const char *roname)
Definition: origin.c:257

replorigin_session_origin
RepOriginId replorigin_session_origin
Definition: origin.c:163

replorigin_session_setup
void replorigin_session_setup(RepOriginId node, int acquired_by)
Definition: origin.c:1120

replorigin_session_get_progress
XLogRecPtr replorigin_session_get_progress(bool flush)
Definition: origin.c:1273

replorigin_session_origin_lsn
XLogRecPtr replorigin_session_origin_lsn
Definition: origin.c:164

origin.h

InvalidRepOriginId
#define InvalidRepOriginId
Definition: origin.h:33

MemoryContextSwitchTo
static MemoryContext MemoryContextSwitchTo(MemoryContext context)
Definition: palloc.h:124

addRTEPermissionInfo
RTEPermissionInfo * addRTEPermissionInfo(List **rteperminfos, RangeTblEntry *rte)
Definition: parse_relation.c:3980

parse_relation.h

ACL_DELETE
#define ACL_DELETE
Definition: parsenodes.h:79

AclMode
uint64 AclMode
Definition: parsenodes.h:74

ACL_INSERT
#define ACL_INSERT
Definition: parsenodes.h:76

ACL_UPDATE
#define ACL_UPDATE
Definition: parsenodes.h:78

RTE_RELATION
@ RTE_RELATION
Definition: parsenodes.h:1043

DROP_RESTRICT
@ DROP_RESTRICT
Definition: parsenodes.h:2398

ACL_SELECT
#define ACL_SELECT
Definition: parsenodes.h:77

ACL_TRUNCATE
#define ACL_TRUNCATE
Definition: parsenodes.h:80

attnum
int16 attnum
Definition: pg_attribute.h:74

Form_pg_attribute
FormData_pg_attribute * Form_pg_attribute
Definition: pg_attribute.h:202

arg
void * arg
Definition: pg_backup_utils.c:29

pg_ceil_log2_32
static uint32 pg_ceil_log2_32(uint32 num)
Definition: pg_bitutils.h:258

mode
static PgChecksumMode mode
Definition: pg_checksums.c:56

NAMEDATALEN
#define NAMEDATALEN
Definition: pg_config_manual.h:39

MAXPGPATH
#define MAXPGPATH
Definition: pg_config_manual.h:105

len
const void size_t len
Definition: pg_crc32c_sse42.c:28

server_version
static int server_version
Definition: pg_dumpall.c:109

find_all_inheritors
List * find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
Definition: pg_inherits.c:255

pg_inherits.h

lfirst
#define lfirst(lc)
Definition: pg_list.h:172

NIL
#define NIL
Definition: pg_list.h:68

list_make1
#define list_make1(x1)
Definition: pg_list.h:212

list_nth
static void * list_nth(const List *list, int n)
Definition: pg_list.h:299

lfirst_oid
#define lfirst_oid(lc)
Definition: pg_list.h:174

pg_lsn.h

LSNGetDatum
static Datum LSNGetDatum(XLogRecPtr X)
Definition: pg_lsn.h:31

options
static char ** options
Definition: pg_recvlogical.c:59

FreeSubscription
void FreeSubscription(Subscription *sub)
Definition: pg_subscription.c:189

DisableSubscription
void DisableSubscription(Oid subid)
Definition: pg_subscription.c:203

UpdateDeadTupleRetentionStatus
void UpdateDeadTupleRetentionStatus(Oid subid, bool active)
Definition: pg_subscription.c:645

GetSubscription
Subscription * GetSubscription(Oid subid, bool missing_ok)
Definition: pg_subscription.c:72

pg_subscription.h

Form_pg_subscription
FormData_pg_subscription * Form_pg_subscription
Definition: pg_subscription.h:111

pg_subscription_rel.h

die
#define die(msg)
Definition: pg_test_fsync.c:100

buf
static char * buf
Definition: pg_test_fsync.c:72

pgstat_report_stat
long pgstat_report_stat(bool force)
Definition: pgstat.c:694

pgstat.h

pgstat_report_subscription_error
void pgstat_report_subscription_error(Oid subid, LogicalRepWorkerType wtype)
Definition: pgstat_subscription.c:28

timestamp
int64 timestamp
Definition: pgtypes_timestamp.h:10

expression_planner
Expr * expression_planner(Expr *expr)
Definition: planner.c:6763

pqsignal
#define pqsignal
Definition: port.h:531

pgsocket
int pgsocket
Definition: port.h:29

snprintf
#define snprintf
Definition: port.h:239

PGINVALID_SOCKET
#define PGINVALID_SOCKET
Definition: port.h:31

postgres.h

ObjectIdGetDatum
static Datum ObjectIdGetDatum(Oid X)
Definition: postgres.h:262

Datum
uint64_t Datum
Definition: postgres.h:70

DatumGetInt32
static int32 DatumGetInt32(Datum X)
Definition: postgres.h:212

InvalidOid
#define InvalidOid
Definition: postgres_ext.h:37

Oid
unsigned int Oid
Definition: postgres_ext.h:32

pq_getmsgint
unsigned int pq_getmsgint(StringInfo msg, int b)
Definition: pqformat.c:415

pq_getmsgbyte
int pq_getmsgbyte(StringInfo msg)
Definition: pqformat.c:399

pq_getmsgint64
int64 pq_getmsgint64(StringInfo msg)
Definition: pqformat.c:453

pqformat.h

pq_sendbyte
static void pq_sendbyte(StringInfo buf, uint8 byt)
Definition: pqformat.h:160

pq_sendint64
static void pq_sendint64(StringInfo buf, uint64 i)
Definition: pqformat.h:152

c
char * c
Definition: preproc-cursor.c:31

fd
static int fd(const char *x, int i)
Definition: preproc-init.c:105

s2
char * s2
Definition: preproc-strings.c:43

GetOldestActiveTransactionId
TransactionId GetOldestActiveTransactionId(bool inCommitOnly, bool allDbs)
Definition: procarray.c:2833

procarray.h

logicalrep_read_commit
void logicalrep_read_commit(StringInfo in, LogicalRepCommitData *commit_data)
Definition: proto.c:98

logicalrep_read_delete
LogicalRepRelId logicalrep_read_delete(StringInfo in, LogicalRepTupleData *oldtup)
Definition: proto.c:561

logicalrep_read_rollback_prepared
void logicalrep_read_rollback_prepared(StringInfo in, LogicalRepRollbackPreparedTxnData *rollback_data)
Definition: proto.c:325

logicalrep_read_begin_prepare
void logicalrep_read_begin_prepare(StringInfo in, LogicalRepPreparedTxnData *begin_data)
Definition: proto.c:134

logicalrep_read_typ
void logicalrep_read_typ(StringInfo in, LogicalRepTyp *ltyp)
Definition: proto.c:757

logicalrep_read_update
LogicalRepRelId logicalrep_read_update(StringInfo in, bool *has_oldtuple, LogicalRepTupleData *oldtup, LogicalRepTupleData *newtup)
Definition: proto.c:487

logicalrep_read_truncate
List * logicalrep_read_truncate(StringInfo in, bool *cascade, bool *restart_seqs)
Definition: proto.c:615

logicalrep_read_stream_abort
void logicalrep_read_stream_abort(StringInfo in, LogicalRepStreamAbortData *abort_data, bool read_abort_info)
Definition: proto.c:1187

logicalrep_read_begin
void logicalrep_read_begin(StringInfo in, LogicalRepBeginData *begin_data)
Definition: proto.c:63

logicalrep_read_commit_prepared
void logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData *prepare_data)
Definition: proto.c:267

logicalrep_read_rel
LogicalRepRelation * logicalrep_read_rel(StringInfo in)
Definition: proto.c:698

logicalrep_message_type
const char * logicalrep_message_type(LogicalRepMsgType action)
Definition: proto.c:1212

logicalrep_read_stream_prepare
void logicalrep_read_stream_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
Definition: proto.c:365

logicalrep_read_stream_commit
TransactionId logicalrep_read_stream_commit(StringInfo in, LogicalRepCommitData *commit_data)
Definition: proto.c:1132

logicalrep_read_insert
LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup)
Definition: proto.c:428

logicalrep_read_prepare
void logicalrep_read_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
Definition: proto.c:228

logicalrep_read_stream_start
TransactionId logicalrep_read_stream_start(StringInfo in, bool *first_segment)
Definition: proto.c:1082

PqReplMsg_WALData
#define PqReplMsg_WALData
Definition: protocol.h:77

PqReplMsg_PrimaryStatusRequest
#define PqReplMsg_PrimaryStatusRequest
Definition: protocol.h:83

PqReplMsg_Keepalive
#define PqReplMsg_Keepalive
Definition: protocol.h:75

PqReplMsg_PrimaryStatusUpdate
#define PqReplMsg_PrimaryStatusUpdate
Definition: protocol.h:76

PqReplMsg_StandbyStatusUpdate
#define PqReplMsg_StandbyStatusUpdate
Definition: protocol.h:84

newsub
static color newsub(struct colormap *cm, color co)
Definition: regc_color.c:389

rel.h

RelationGetRelid
#define RelationGetRelid(relation)
Definition: rel.h:515

RelationIsLogicallyLogged
#define RelationIsLogicallyLogged(relation)
Definition: rel.h:711

RelationGetDescr
#define RelationGetDescr(relation)
Definition: rel.h:541

RelationGetRelationName
#define RelationGetRelationName(relation)
Definition: rel.h:549

RELATION_IS_OTHER_TEMP
#define RELATION_IS_OTHER_TEMP(relation)
Definition: rel.h:668

RelationGetNamespace
#define RelationGetNamespace(relation)
Definition: rel.h:556

RelationGetIndexList
List * RelationGetIndexList(Relation relation)
Definition: relcache.c:4836

TopTransactionResourceOwner
ResourceOwner TopTransactionResourceOwner
Definition: resowner.c:175

CurrentResourceOwner
ResourceOwner CurrentResourceOwner
Definition: resowner.c:173

build_column_default
Node * build_column_default(Relation rel, int attrno)
Definition: rewriteHandler.c:1228

rewriteHandler.h

check_enable_rls
int check_enable_rls(Oid relid, Oid checkAsUser, bool noError)
Definition: rls.c:52

rls.h

RLS_ENABLED
@ RLS_ENABLED
Definition: rls.h:45

slot.h

GetTransactionSnapshot
Snapshot GetTransactionSnapshot(void)
Definition: snapmgr.c:271

PushActiveSnapshot
void PushActiveSnapshot(Snapshot snapshot)
Definition: snapmgr.c:680

PopActiveSnapshot
void PopActiveSnapshot(void)
Definition: snapmgr.c:773

snapmgr.h

SpinLockRelease
#define SpinLockRelease(lock)
Definition: spin.h:61

SpinLockAcquire
#define SpinLockAcquire(lock)
Definition: spin.h:59

logicalrep_partmap_reset_relmap
void logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
Definition: relation.c:584

logicalrep_partition_open
LogicalRepRelMapEntry * logicalrep_partition_open(LogicalRepRelMapEntry *root, Relation partrel, AttrMap *map)
Definition: relation.c:646

IsIndexUsableForReplicaIdentityFull
bool IsIndexUsableForReplicaIdentityFull(Relation idxrel, AttrMap *attrmap)
Definition: relation.c:834

GetRelationIdentityOrPK
Oid GetRelationIdentityOrPK(Relation rel)
Definition: relation.c:904

logicalrep_relmap_update
void logicalrep_relmap_update(LogicalRepRelation *remoterel)
Definition: relation.c:164

logicalrep_rel_close
void logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
Definition: relation.c:517

logicalrep_rel_open
LogicalRepRelMapEntry * logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
Definition: relation.c:361

makeStringInfo
StringInfo makeStringInfo(void)
Definition: stringinfo.c:72

resetStringInfo
void resetStringInfo(StringInfo str)
Definition: stringinfo.c:126

initReadOnlyStringInfo
static void initReadOnlyStringInfo(StringInfo str, char *data, int len)
Definition: stringinfo.h:157

ApplyErrorCallbackArg
Definition: worker.c:324

ApplyErrorCallbackArg::remote_xid
TransactionId remote_xid
Definition: worker.c:330

ApplyErrorCallbackArg::remote_attnum
int remote_attnum
Definition: worker.c:329

ApplyErrorCallbackArg::command
LogicalRepMsgType command
Definition: worker.c:325

ApplyErrorCallbackArg::finish_lsn
XLogRecPtr finish_lsn
Definition: worker.c:331

ApplyErrorCallbackArg::origin_name
char * origin_name
Definition: worker.c:332

ApplyErrorCallbackArg::rel
LogicalRepRelMapEntry * rel
Definition: worker.c:326

ApplyExecutionData
Definition: worker.c:311

ApplyExecutionData::targetRelInfo
ResultRelInfo * targetRelInfo
Definition: worker.c:315

ApplyExecutionData::estate
EState * estate
Definition: worker.c:312

ApplyExecutionData::proute
PartitionTupleRouting * proute
Definition: worker.c:319

ApplyExecutionData::mtstate
ModifyTableState * mtstate
Definition: worker.c:318

ApplyExecutionData::targetRel
LogicalRepRelMapEntry * targetRel
Definition: worker.c:314

ApplySubXactData
Definition: worker.c:538

ApplySubXactData::nsubxacts
uint32 nsubxacts
Definition: worker.c:539

ApplySubXactData::nsubxacts_max
uint32 nsubxacts_max
Definition: worker.c:540

ApplySubXactData::subxacts
SubXactInfo * subxacts
Definition: worker.c:542

ApplySubXactData::subxact_last
TransactionId subxact_last
Definition: worker.c:541

AttrMap
Definition: attmap.h:35

AttrMap::maplen
int maplen
Definition: attmap.h:37

AttrMap::attnums
AttrNumber * attnums
Definition: attmap.h:36

BufFile
Definition: buffile.c:71

CompactAttribute
Definition: tupdesc.h:69

CompactAttribute::attgenerated
bool attgenerated
Definition: tupdesc.h:78

CompactAttribute::attisdropped
bool attisdropped
Definition: tupdesc.h:77

ConflictTupleInfo
Definition: conflict.h:70

ConflictTupleInfo::ts
TimestampTz ts
Definition: conflict.h:78

ConflictTupleInfo::origin
RepOriginId origin
Definition: conflict.h:77

ConflictTupleInfo::xmin
TransactionId xmin
Definition: conflict.h:75

ConflictTupleInfo::slot
TupleTableSlot * slot
Definition: conflict.h:71

EPQState
Definition: execnodes.h:1299

EState
Definition: execnodes.h:655

EState::es_rteperminfos
List * es_rteperminfos
Definition: execnodes.h:668

EState::es_tupleTable
List * es_tupleTable
Definition: execnodes.h:712

EState::es_opened_result_relations
List * es_opened_result_relations
Definition: execnodes.h:688

EState::es_output_cid
CommandId es_output_cid
Definition: execnodes.h:682

ErrorContextCallback
Definition: elog.h:296

ErrorContextCallback::previous
struct ErrorContextCallback * previous
Definition: elog.h:297

ErrorContextCallback::callback
void(* callback)(void *arg)
Definition: elog.h:298

ExprContext
Definition: execnodes.h:268

ExprState
Definition: execnodes.h:85

Expr
Definition: primnodes.h:189

FileSet
Definition: fileset.h:23

FlushPosition
Definition: worker.c:302

FlushPosition::node
dlist_node node
Definition: worker.c:303

FlushPosition::remote_end
XLogRecPtr remote_end
Definition: worker.c:305

FlushPosition::local_end
XLogRecPtr local_end
Definition: worker.c:304

FullTransactionId
Definition: transam.h:66

HeapTupleData
Definition: htup.h:63

HeapTupleData::t_self
ItemPointerData t_self
Definition: htup.h:65

HeapTupleData::t_data
HeapTupleHeader t_data
Definition: htup.h:68

List
Definition: pg_list.h:54

LogicalRepBeginData
Definition: logicalproto.h:128

LogicalRepBeginData::final_lsn
XLogRecPtr final_lsn
Definition: logicalproto.h:129

LogicalRepBeginData::xid
TransactionId xid
Definition: logicalproto.h:131

LogicalRepCommitData
Definition: logicalproto.h:135

LogicalRepCommitData::end_lsn
XLogRecPtr end_lsn
Definition: logicalproto.h:137

LogicalRepCommitData::committime
TimestampTz committime
Definition: logicalproto.h:138

LogicalRepCommitData::commit_lsn
XLogRecPtr commit_lsn
Definition: logicalproto.h:136

LogicalRepCommitPreparedTxnData
Definition: logicalproto.h:157

LogicalRepCommitPreparedTxnData::xid
TransactionId xid
Definition: logicalproto.h:161

LogicalRepCommitPreparedTxnData::commit_time
TimestampTz commit_time
Definition: logicalproto.h:160

LogicalRepCommitPreparedTxnData::end_lsn
XLogRecPtr end_lsn
Definition: logicalproto.h:159

LogicalRepCommitPreparedTxnData::commit_lsn
XLogRecPtr commit_lsn
Definition: logicalproto.h:158

LogicalRepPreparedTxnData
Definition: logicalproto.h:145

LogicalRepPreparedTxnData::xid
TransactionId xid
Definition: logicalproto.h:149

LogicalRepPreparedTxnData::prepare_time
TimestampTz prepare_time
Definition: logicalproto.h:148

LogicalRepPreparedTxnData::end_lsn
XLogRecPtr end_lsn
Definition: logicalproto.h:147

LogicalRepPreparedTxnData::prepare_lsn
XLogRecPtr prepare_lsn
Definition: logicalproto.h:146

LogicalRepRelMapEntry
Definition: logicalrelation.h:20

LogicalRepRelMapEntry::localrel
Relation localrel
Definition: logicalrelation.h:32

LogicalRepRelMapEntry::remoterel
LogicalRepRelation remoterel
Definition: logicalrelation.h:21

LogicalRepRelMapEntry::localindexoid
Oid localindexoid
Definition: logicalrelation.h:35

LogicalRepRelMapEntry::updatable
bool updatable
Definition: logicalrelation.h:34

LogicalRepRelMapEntry::state
char state
Definition: logicalrelation.h:38

LogicalRepRelMapEntry::attrmap
AttrMap * attrmap
Definition: logicalrelation.h:33

LogicalRepRelMapEntry::localreloid
Oid localreloid
Definition: logicalrelation.h:31

LogicalRepRelMapEntry::statelsn
XLogRecPtr statelsn
Definition: logicalrelation.h:39

LogicalRepRelation
Definition: logicalproto.h:105

LogicalRepRelation::relkind
char relkind
Definition: logicalproto.h:114

LogicalRepRelation::attnames
char ** attnames
Definition: logicalproto.h:111

LogicalRepRelation::replident
char replident
Definition: logicalproto.h:113

LogicalRepRelation::relname
char * relname
Definition: logicalproto.h:109

LogicalRepRelation::natts
int natts
Definition: logicalproto.h:110

LogicalRepRelation::nspname
char * nspname
Definition: logicalproto.h:108

LogicalRepRollbackPreparedTxnData
Definition: logicalproto.h:174

LogicalRepRollbackPreparedTxnData::prepare_time
TimestampTz prepare_time
Definition: logicalproto.h:177

LogicalRepRollbackPreparedTxnData::prepare_end_lsn
XLogRecPtr prepare_end_lsn
Definition: logicalproto.h:175

LogicalRepRollbackPreparedTxnData::rollback_end_lsn
XLogRecPtr rollback_end_lsn
Definition: logicalproto.h:176

LogicalRepRollbackPreparedTxnData::xid
TransactionId xid
Definition: logicalproto.h:179

LogicalRepRollbackPreparedTxnData::rollback_time
TimestampTz rollback_time
Definition: logicalproto.h:178

LogicalRepStreamAbortData
Definition: logicalproto.h:187

LogicalRepStreamAbortData::abort_lsn
XLogRecPtr abort_lsn
Definition: logicalproto.h:190

LogicalRepStreamAbortData::xid
TransactionId xid
Definition: logicalproto.h:188

LogicalRepStreamAbortData::subxid
TransactionId subxid
Definition: logicalproto.h:189

LogicalRepTupleData
Definition: logicalproto.h:85

LogicalRepTupleData::colvalues
StringInfoData * colvalues
Definition: logicalproto.h:87

LogicalRepTupleData::ncols
int ncols
Definition: logicalproto.h:91

LogicalRepTupleData::colstatus
char * colstatus
Definition: logicalproto.h:89

LogicalRepTyp
Definition: logicalproto.h:120

LogicalRepWorker
Definition: worker_internal.h:39

LogicalRepWorker::last_recv_time
TimestampTz last_recv_time
Definition: worker_internal.h:107

LogicalRepWorker::type
LogicalRepWorkerType type
Definition: worker_internal.h:41

LogicalRepWorker::parallel_apply
bool parallel_apply
Definition: worker_internal.h:88

LogicalRepWorker::reply_time
TimestampTz reply_time
Definition: worker_internal.h:109

LogicalRepWorker::stream_fileset
FileSet * stream_fileset
Definition: worker_internal.h:79

LogicalRepWorker::subid
Oid subid
Definition: worker_internal.h:62

LogicalRepWorker::dbid
Oid dbid
Definition: worker_internal.h:56

LogicalRepWorker::relid
Oid relid
Definition: worker_internal.h:65

LogicalRepWorker::oldest_nonremovable_xid
TransactionId oldest_nonremovable_xid
Definition: worker_internal.h:102

LogicalRepWorker::reply_lsn
XLogRecPtr reply_lsn
Definition: worker_internal.h:108

LogicalRepWorker::last_lsn
XLogRecPtr last_lsn
Definition: worker_internal.h:105

LogicalRepWorker::last_send_time
TimestampTz last_send_time
Definition: worker_internal.h:106

LogicalRepWorker::userid
Oid userid
Definition: worker_internal.h:59

LogicalRepWorker::relmutex
slock_t relmutex
Definition: worker_internal.h:68

MemoryContextData
Definition: memnodes.h:118

ModifyTableState
Definition: execnodes.h:1402

ModifyTableState::operation
CmdType operation
Definition: execnodes.h:1404

ModifyTableState::resultRelInfo
ResultRelInfo * resultRelInfo
Definition: execnodes.h:1408

ModifyTableState::ps
PlanState ps
Definition: execnodes.h:1403

ParallelApplyWorkerInfo
Definition: worker_internal.h:206

ParallelApplyWorkerInfo::serialize_changes
bool serialize_changes
Definition: worker_internal.h:226

ParallelApplyWorkerInfo::shared
ParallelApplyWorkerShared * shared
Definition: worker_internal.h:234

ParallelApplyWorkerShared::pending_stream_count
pg_atomic_uint32 pending_stream_count
Definition: worker_internal.h:180

ParallelApplyWorkerShared::xid
TransactionId xid
Definition: worker_internal.h:159

ParallelApplyWorkerShared::last_commit_end
XLogRecPtr last_commit_end
Definition: worker_internal.h:186

PartitionTupleRouting
Definition: execPartition.c:92

PlanState::plan
Plan * plan
Definition: execnodes.h:1165

PlanState::state
EState * state
Definition: execnodes.h:1167

RTEPermissionInfo
Definition: parsenodes.h:1317

RTEPermissionInfo::updatedCols
Bitmapset * updatedCols
Definition: parsenodes.h:1326

RangeTblEntry
Definition: parsenodes.h:1058

RangeTblEntry::rtekind
RTEKind rtekind
Definition: parsenodes.h:1078

RelationData
Definition: rel.h:56

RelationData::rd_rel
Form_pg_class rd_rel
Definition: rel.h:111

ResourceOwnerData
Definition: resowner.c:113

ResultRelInfo
Definition: execnodes.h:473

ResultRelInfo::ri_PartitionTupleSlot
TupleTableSlot * ri_PartitionTupleSlot
Definition: execnodes.h:619

ResultRelInfo::ri_onConflictArbiterIndexes
List * ri_onConflictArbiterIndexes
Definition: execnodes.h:580

ResultRelInfo::ri_RelationDesc
Relation ri_RelationDesc
Definition: execnodes.h:480

ResultRelInfo::ri_IndexRelationDescs
RelationPtr ri_IndexRelationDescs
Definition: execnodes.h:486

RetainDeadTuplesData
Definition: worker.c:402

RetainDeadTuplesData::flushpos_update_time
TimestampTz flushpos_update_time
Definition: worker.c:432

RetainDeadTuplesData::remote_oldestxid
FullTransactionId remote_oldestxid
Definition: worker.c:412

RetainDeadTuplesData::remote_wait_for
FullTransactionId remote_wait_for
Definition: worker.c:428

RetainDeadTuplesData::last_recv_time
TimestampTz last_recv_time
Definition: worker.c:443

RetainDeadTuplesData::candidate_xid_time
TimestampTz candidate_xid_time
Definition: worker.c:444

RetainDeadTuplesData::table_sync_wait_time
long table_sync_wait_time
Definition: worker.c:436

RetainDeadTuplesData::remote_nextxid
FullTransactionId remote_nextxid
Definition: worker.c:419

RetainDeadTuplesData::phase
RetainDeadTuplesPhase phase
Definition: worker.c:403

RetainDeadTuplesData::remote_lsn
XLogRecPtr remote_lsn
Definition: worker.c:404

RetainDeadTuplesData::reply_time
TimestampTz reply_time
Definition: worker.c:421

RetainDeadTuplesData::candidate_xid
TransactionId candidate_xid
Definition: worker.c:430

RetainDeadTuplesData::xid_advance_interval
int xid_advance_interval
Definition: worker.c:445

StringInfoData
Definition: stringinfo.h:47

StringInfoData::cursor
int cursor
Definition: stringinfo.h:51

StringInfoData::data
char * data
Definition: stringinfo.h:48

StringInfoData::len
int len
Definition: stringinfo.h:49

SubXactInfo
Definition: worker.c:530

SubXactInfo::offset
off_t offset
Definition: worker.c:533

SubXactInfo::xid
TransactionId xid
Definition: worker.c:531

SubXactInfo::fileno
int fileno
Definition: worker.c:532

Subscription
Definition: pg_subscription.h:122

Subscription::origin
char * origin
Definition: pg_subscription.h:159

Subscription::passwordrequired
bool passwordrequired
Definition: pg_subscription.h:140

Subscription::disableonerr
bool disableonerr
Definition: pg_subscription.h:137

Subscription::conninfo
char * conninfo
Definition: pg_subscription.h:155

Subscription::stream
char stream
Definition: pg_subscription.h:134

Subscription::synccommit
char * synccommit
Definition: pg_subscription.h:157

Subscription::runasowner
bool runasowner
Definition: pg_subscription.h:141

Subscription::dbid
Oid dbid
Definition: pg_subscription.h:124

Subscription::binary
bool binary
Definition: pg_subscription.h:132

Subscription::owner
Oid owner
Definition: pg_subscription.h:129

Subscription::retaindeadtuples
bool retaindeadtuples
Definition: pg_subscription.h:146

Subscription::enabled
bool enabled
Definition: pg_subscription.h:131

Subscription::ownersuperuser
bool ownersuperuser
Definition: pg_subscription.h:130

Subscription::skiplsn
XLogRecPtr skiplsn
Definition: pg_subscription.h:126

Subscription::maxretention
int32 maxretention
Definition: pg_subscription.h:148

Subscription::twophasestate
char twophasestate
Definition: pg_subscription.h:136

Subscription::slotname
char * slotname
Definition: pg_subscription.h:156

Subscription::name
char * name
Definition: pg_subscription.h:128

Subscription::retentionactive
bool retentionactive
Definition: pg_subscription.h:151

Subscription::publications
List * publications
Definition: pg_subscription.h:158

Subscription::oid
Oid oid
Definition: pg_subscription.h:123

TupleConversionMap
Definition: tupconvert.h:25

TupleConversionMap::attrMap
AttrMap * attrMap
Definition: tupconvert.h:28

TupleDescData
Definition: tupdesc.h:136

TupleDescData::natts
int natts
Definition: tupdesc.h:137

TupleTableSlot
Definition: tuptable.h:114

TupleTableSlot::tts_tupleDescriptor
TupleDesc tts_tupleDescriptor
Definition: tuptable.h:122

TupleTableSlot::tts_isnull
bool * tts_isnull
Definition: tuptable.h:126

TupleTableSlot::tts_values
Datum * tts_values
Definition: tuptable.h:124

UserContext
Definition: usercontext.h:16

WalRcvStreamOptions
Definition: walreceiver.h:168

WalReceiverConn
Definition: libpqwalreceiver.c:46

dlist_head
Definition: ilist.h:152

dlist_mutable_iter
Definition: ilist.h:199

dlist_mutable_iter::cur
dlist_node * cur
Definition: ilist.h:200

dlist_node
Definition: ilist.h:138

options
Definition: oid2name.c:30

CheckSubDeadTupleRetention
void CheckSubDeadTupleRetention(bool check_guc, bool sub_disabled, int elevel_for_sub_disabled, bool retain_dead_tuples, bool retention_active, bool max_retention_set)
Definition: subscriptioncmds.c:2809

subscriptioncmds.h

ProcessSyncingRelations
void ProcessSyncingRelations(XLogRecPtr current_lsn)
Definition: syncutils.c:155

InvalidateSyncingRelStates
void InvalidateSyncingRelStates(Datum arg, int cacheid, uint32 hashvalue)
Definition: syncutils.c:101

FirstLowInvalidHeapAttributeNumber
#define FirstLowInvalidHeapAttributeNumber
Definition: sysattr.h:27

ReleaseSysCache
void ReleaseSysCache(HeapTuple tuple)
Definition: syscache.c:264

SearchSysCache1
HeapTuple SearchSysCache1(int cacheId, Datum key1)
Definition: syscache.c:220

syscache.h

SearchSysCacheCopy1
#define SearchSysCacheCopy1(cacheId, key1)
Definition: syscache.h:91

table_close
void table_close(Relation relation, LOCKMODE lockmode)
Definition: table.c:126

table_open
Relation table_open(Oid relationId, LOCKMODE lockmode)
Definition: table.c:40

table.h

table_slot_create
TupleTableSlot * table_slot_create(Relation relation, List **reglist)
Definition: tableam.c:92

tableam.h

ExecuteTruncateGuts
void ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged, DropBehavior behavior, bool restart_seqs, bool run_as_table_owner)
Definition: tablecmds.c:1975

tablecmds.h

AllTablesyncsReady
bool AllTablesyncsReady(void)
Definition: tablesync.c:1600

HasSubscriptionTablesCached
bool HasSubscriptionTablesCached(void)
Definition: tablesync.c:1630

UpdateTwoPhaseState
void UpdateTwoPhaseState(Oid suboid, char new_state)
Definition: tablesync.c:1651

tcopprot.h

InvalidTransactionId
#define InvalidTransactionId
Definition: transam.h:31

FullTransactionIdPrecedesOrEquals
#define FullTransactionIdPrecedesOrEquals(a, b)
Definition: transam.h:52

TransactionIdPrecedesOrEquals
static bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
Definition: transam.h:282

FullTransactionIdFromU64
static FullTransactionId FullTransactionIdFromU64(uint64 value)
Definition: transam.h:81

TransactionIdEquals
#define TransactionIdEquals(id1, id2)
Definition: transam.h:43

TransactionIdIsValid
#define TransactionIdIsValid(xid)
Definition: transam.h:41

InvalidFullTransactionId
#define InvalidFullTransactionId
Definition: transam.h:56

FullTransactionIdIsValid
#define FullTransactionIdIsValid(x)
Definition: transam.h:55

TransactionIdPrecedes
static bool TransactionIdPrecedes(TransactionId id1, TransactionId id2)
Definition: transam.h:263

AfterTriggerEndQuery
void AfterTriggerEndQuery(EState *estate)
Definition: trigger.c:5124

AfterTriggerBeginQuery
void AfterTriggerBeginQuery(void)
Definition: trigger.c:5104

trigger.h

convert_tuples_by_name
TupleConversionMap * convert_tuples_by_name(TupleDesc indesc, TupleDesc outdesc)
Definition: tupconvert.c:103

execute_attr_map_slot
TupleTableSlot * execute_attr_map_slot(AttrMap *attrMap, TupleTableSlot *in_slot, TupleTableSlot *out_slot)
Definition: tupconvert.c:193

TupleDescAttr
static FormData_pg_attribute * TupleDescAttr(TupleDesc tupdesc, int i)
Definition: tupdesc.h:160

TupleDescCompactAttr
static CompactAttribute * TupleDescCompactAttr(TupleDesc tupdesc, int i)
Definition: tupdesc.h:175

ExecClearTuple
static TupleTableSlot * ExecClearTuple(TupleTableSlot *slot)
Definition: tuptable.h:457

slot_getallattrs
static void slot_getallattrs(TupleTableSlot *slot)
Definition: tuptable.h:371

ExecCopySlot
static TupleTableSlot * ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
Definition: tuptable.h:524

TwoPhaseTransactionGid
void TwoPhaseTransactionGid(Oid subid, TransactionId xid, char *gid_res, int szgid)
Definition: twophase.c:2747

LookupGXact
bool LookupGXact(const char *gid, XLogRecPtr prepare_end_lsn, TimestampTz origin_prepare_timestamp)
Definition: twophase.c:2688

FinishPreparedTransaction
void FinishPreparedTransaction(const char *gid, bool isCommit)
Definition: twophase.c:1497

twophase.h

ListCell
Definition: pg_list.h:46

unistd.h

SwitchToUntrustedUser
void SwitchToUntrustedUser(Oid userid, UserContext *context)
Definition: usercontext.c:33

RestoreUserContext
void RestoreUserContext(UserContext *context)
Definition: usercontext.c:87

usercontext.h

TimestampTzPlusMilliseconds
#define TimestampTzPlusMilliseconds(tz, ms)
Definition: timestamp.h:85

type
const char * type
Definition: wait_event_funcs.c:27

WL_SOCKET_READABLE
#define WL_SOCKET_READABLE
Definition: waiteventset.h:35

WL_TIMEOUT
#define WL_TIMEOUT
Definition: waiteventset.h:37

WL_EXIT_ON_PM_DEATH
#define WL_EXIT_ON_PM_DEATH
Definition: waiteventset.h:39

WL_LATCH_SET
#define WL_LATCH_SET
Definition: waiteventset.h:34

reply_message
static StringInfoData reply_message
Definition: walreceiver.c:132

wal_receiver_status_interval
int wal_receiver_status_interval
Definition: walreceiver.c:88

wal_receiver_timeout
int wal_receiver_timeout
Definition: walreceiver.c:89

walreceiver.h

walrcv_startstreaming
#define walrcv_startstreaming(conn, options)
Definition: walreceiver.h:451

walrcv_connect
#define walrcv_connect(conninfo, replication, logical, must_use_password, appname, err)
Definition: walreceiver.h:435

walrcv_send
#define walrcv_send(conn, buffer, nbytes)
Definition: walreceiver.h:457

walrcv_server_version
#define walrcv_server_version(conn)
Definition: walreceiver.h:447

walrcv_endstreaming
#define walrcv_endstreaming(conn, next_tli)
Definition: walreceiver.h:453

walrcv_identify_system
#define walrcv_identify_system(conn, primary_tli)
Definition: walreceiver.h:443

walrcv_receive
#define walrcv_receive(conn, buffer, wait_fd)
Definition: walreceiver.h:455

WalWriterDelay
int WalWriterDelay
Definition: walwriter.c:70

walwriter.h

SIGHUP
#define SIGHUP
Definition: win32_port.h:158

worker_internal.h

PARALLEL_TRANS_STARTED
@ PARALLEL_TRANS_STARTED
Definition: worker_internal.h:123

PARALLEL_TRANS_FINISHED
@ PARALLEL_TRANS_FINISHED
Definition: worker_internal.h:124

am_parallel_apply_worker
static bool am_parallel_apply_worker(void)
Definition: worker_internal.h:389

WORKERTYPE_TABLESYNC
@ WORKERTYPE_TABLESYNC
Definition: worker_internal.h:32

WORKERTYPE_UNKNOWN
@ WORKERTYPE_UNKNOWN
Definition: worker_internal.h:31

WORKERTYPE_SEQUENCESYNC
@ WORKERTYPE_SEQUENCESYNC
Definition: worker_internal.h:33

WORKERTYPE_PARALLEL_APPLY
@ WORKERTYPE_PARALLEL_APPLY
Definition: worker_internal.h:35

WORKERTYPE_APPLY
@ WORKERTYPE_APPLY
Definition: worker_internal.h:34

FS_SERIALIZE_DONE
@ FS_SERIALIZE_DONE
Definition: worker_internal.h:147

am_sequencesync_worker
static bool am_sequencesync_worker(void)
Definition: worker_internal.h:376

am_tablesync_worker
static bool am_tablesync_worker(void)
Definition: worker_internal.h:370

am_leader_apply_worker
static bool am_leader_apply_worker(void)
Definition: worker_internal.h:382

IsTransactionOrTransactionBlock
bool IsTransactionOrTransactionBlock(void)
Definition: xact.c:5007

PrepareTransactionBlock
bool PrepareTransactionBlock(const char *gid)
Definition: xact.c:4010

IsTransactionState
bool IsTransactionState(void)
Definition: xact.c:388

CommandCounterIncrement
void CommandCounterIncrement(void)
Definition: xact.c:1101

StartTransactionCommand
void StartTransactionCommand(void)
Definition: xact.c:3077

SetCurrentStatementStartTimestamp
void SetCurrentStatementStartTimestamp(void)
Definition: xact.c:915

IsTransactionBlock
bool IsTransactionBlock(void)
Definition: xact.c:4989

BeginTransactionBlock
void BeginTransactionBlock(void)
Definition: xact.c:3942

CommitTransactionCommand
void CommitTransactionCommand(void)
Definition: xact.c:3175

EndTransactionBlock
bool EndTransactionBlock(bool chain)
Definition: xact.c:4062

AbortOutOfAnyTransaction
void AbortOutOfAnyTransaction(void)
Definition: xact.c:4880

GetCurrentCommandId
CommandId GetCurrentCommandId(bool used)
Definition: xact.c:830

xact.h

GIDSIZE
#define GIDSIZE
Definition: xact.h:31

GetFlushRecPtr
XLogRecPtr GetFlushRecPtr(TimeLineID *insertTLI)
Definition: xlog.c:6571

XactLastCommitEnd
XLogRecPtr XactLastCommitEnd
Definition: xlog.c:257

XLogRecPtrIsValid
#define XLogRecPtrIsValid(r)
Definition: xlogdefs.h:29

LSN_FORMAT_ARGS
#define LSN_FORMAT_ARGS(lsn)
Definition: xlogdefs.h:47

RepOriginId
uint16 RepOriginId
Definition: xlogdefs.h:69

XLogRecPtr
uint64 XLogRecPtr
Definition: xlogdefs.h:21

InvalidXLogRecPtr
#define InvalidXLogRecPtr
Definition: xlogdefs.h:28

TimeLineID
uint32 TimeLineID
Definition: xlogdefs.h:63