I am using a PostgreSQL database cluster and have an issue with low disk space. After investigating, I found that it is caused by WAL files.

The WAL files have reduced my free disk space dramatically. I now need to free up some space without losing data or corrupting PostgreSQL. To do that, I need to remove WAL files.

My cluster has one primary node and two standby nodes, so I need to free up the space without interrupting service.

What are the recommended steps to remove WAL files without any interruption to my PostgreSQL cluster?

1 Answer

Don't remove WAL segments manually. Instead, find out what keeps PostgreSQL from removing them and fix that condition.

There are several possibilities:

  1. a stale replication slot (most likely)

    Find out with this query on the primary:

    SELECT slot_name,
           active,
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS age
    FROM pg_replication_slots;
    

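    If there is a slot with a high age (the value is the number of bytes of WAL between the slot's restart_lsn and the current WAL position), that is your problem.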

    Examine the standby whose slot is behind and look into its log to find out why it is not replicating. Either fix that problem so that the standby can catch up or abandon that standby by dropping the replication slot:

    SELECT pg_drop_replication_slot('bad_slot');
    
  2. the archiver got stuck

    Examine the contents of pg_stat_archiver on the primary. If that tells you that the archiver has problems, look at the log file to see detailed error messages. Fix the problem so that archiving can resume.

    If you want to stop archiving (which will break your backup!), you can set archive_command to something like /bin/true and reload.

  3. a much too high wal_keep_size (PostgreSQL 13 and later) or wal_keep_segments (older versions)

    If that parameter on the primary is your problem, simply reduce the value and reload.
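
For the second and third possibilities, the checks and the fix could look roughly like the sketch below. It assumes PostgreSQL 13 or later (on older versions the parameter is wal_keep_segments and takes a number of segments) and a role that is allowed to run ALTER SYSTEM. The value '1GB' is only a placeholder, not a recommendation.

```sql
-- Possibility 2: is the archiver failing?
-- A growing failed_count and a recent last_failed_time indicate a stuck archiver.
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count,
       last_failed_wal,
       last_failed_time
FROM pg_stat_archiver;

-- Possibility 3: how much WAL is the primary told to keep for standbys?
SHOW wal_keep_size;

-- Reduce the value (placeholder shown) and reload the configuration.
-- No restart is needed for this parameter.
ALTER SYSTEM SET wal_keep_size = '1GB';
SELECT pg_reload_conf();
```

If the archiver is the culprit, the details are in the server log rather than in pg_stat_archiver, so this query only tells you where to look next.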

Once you have fixed the problem, WAL will get removed. That can take a while, since WAL is removed during checkpoints. You can force a checkpoint with the CHECKPOINT SQL statement.
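
A minimal way to force the checkpoint and then watch the WAL directory shrink, assuming PostgreSQL 10 or later and a role that may run CHECKPOINT and call pg_ls_waldir() (for example a superuser):

```sql
-- Force a checkpoint so that WAL segments that are no longer needed
-- can be removed or recycled.
CHECKPOINT;

-- Count the segments in the WAL directory and add up their size.
-- Run this before and after to see the effect.
SELECT count(*) AS segments,
       pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();
```

If the numbers do not go down after a checkpoint, one of the conditions listed above is probably still holding WAL back.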

2 Comments

If a standby is unable to catch up with the primary because of a bad replication slot, what needs to be done? Will dropping the slot and cloning the standby again solve the problem? @Laurenz Albe
Yes, that is the way.
