
A huge pile of WAL files is generated in our master-standby replication setup. The WAL files are archived on one of the standby nodes, and every 2 hours we use tar to compress the archived WALs on that node. Even so, the archive grows very large; with 30 or 90 days of backups it becomes a real storage issue, and it also takes more time to download and replay the WALs during restoration.

I have set the following options:

wal_level = replica
wal_compression = on
archive_mode = always
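
For reference, the values the running server is actually using can be confirmed from psql on whichever node you are inspecting; a minimal sketch using the standard pg_settings view:

-- Show the WAL/archiving settings in effect on this node
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('wal_level', 'wal_compression', 'archive_mode',
               'archive_timeout', 'checkpoint_timeout', 'max_wal_size');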

The following parameters are commented out / left at their defaults:

archive_timeout
checkpoint_timeout

Is there any other way we can reduce the number of WAL files generated, or an easier way to manage them? pg_waldump shows that around 70-90% of the data is full-page images.

Also, can I put the above parameters into effect by changing them on the standby node? Does the standby archive the same WAL sent by the master, or does it regenerate WAL based on the standby's own configuration?

Update: modified to the values below:

        name        | setting | unit
--------------------+---------+------
 archive_timeout    | 0       | s
 checkpoint_timeout | 3600    | s
 checkpoint_warning | 3600    | s
 max_wal_size       | 4000    | MB
 min_wal_size       | 2000    | MB
 shared_buffers     | 458752  | 8kB
 wal_buffers        | 4096    | 8kB
 wal_compression    | on      |
 wal_level          | replica |

I am still seeing 3-4 WAL files generated every minute. I am making these changes on the hot standby node (the one the backup is taken from). Should I change them on the master instead? Do the master's settings affect the standby's WAL generation?
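
One way to put a number on the WAL rate, rather than counting 16MB segments by hand, is to diff WAL positions over a fixed interval. A minimal psql sketch, assuming it is run on the primary (on a hot standby you would use pg_last_wal_replay_lsn() instead of pg_current_wal_lsn()):

-- Record the current WAL position, wait a minute, then measure the difference
SELECT pg_current_wal_lsn() AS lsn_start \gset
SELECT pg_sleep(60);
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'lsn_start'))
       AS wal_generated_in_60s;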

Example pg_waldump output showing FPI size = 87%:

pg_waldump --stats 0000000100000498000000B2
Type                                           N      (%)          Record size      (%)             FPI size      (%)        Combined size      (%)
----                                           -      ---          -----------      ---             --------      ---        -------------      ---
XLOG                                           1 (  0.00)                  114 (  0.01)                    0 (  0.00)                  114 (  0.00)
Transaction                                 3070 ( 10.35)               104380 (  4.86)                    0 (  0.00)               104380 (  0.63)
Storage                                        0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
CLOG                                           0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Database                                       0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Tablespace                                     0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
MultiXact                                      0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
RelMap                                         0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Standby                                        2 (  0.01)                  100 (  0.00)                    0 (  0.00)                  100 (  0.00)
Heap2                                        590 (  1.99)                33863 (  1.58)                46192 (  0.32)                80055 (  0.48)
Heap                                        6679 ( 22.51)               578232 ( 26.92)              4482508 ( 30.92)              5060740 ( 30.41)
Btree                                      19330 ( 65.14)              1430918 ( 66.62)              9967524 ( 68.76)             11398442 ( 68.48)
Hash                                           0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Gin                                            0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Gist                                           0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Sequence                                       0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
SPGist                                         0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
BRIN                                           0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
CommitTs                                       4 (  0.01)                  120 (  0.01)                    0 (  0.00)                  120 (  0.00)
ReplicationOrigin                              0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
Generic                                        0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
LogicalMessage                                 0 (  0.00)                    0 (  0.00)                    0 (  0.00)                    0 (  0.00)
                                        --------                      --------                      --------                      --------
Total                                      29676                       2147727 [12.90%]             14496224 [87.10%]             16643951 [100%]
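
To see whether that 87% figure holds across the whole archive rather than for a single segment, one rough approach is to loop pg_waldump over a batch of archived files and keep only the Total line; a shell sketch, with the archive path as a placeholder:

# Print the Total line (record size vs. FPI size) for each archived segment
for f in /path/to/wal_archive/0000000100000498*; do
    printf '%s: ' "$(basename "$f")"
    pg_waldump --stats "$f" | tail -n 1
done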

After enabling log_checkpoints=on:

2022-06-15 07:08:57 UTC [11] LOG:  checkpoint starting: time
2022-06-15 07:29:57 UTC [11] LOG:  checkpoint complete: wrote 67010 buffers (14.6%); 0 WAL file(s) added, 12 removed, 56 recycled; write=1259.767 s, sync=0.010 s, total=1259.961 s; sync files=253, longest=0.003 s, average=0.001 s; distance=1125728 kB, estimate=2176006 kB
2022-06-15 07:38:57 UTC [11] LOG:  checkpoint starting: time
2022-06-15 07:59:57 UTC [11] LOG:  checkpoint complete: wrote 61886 buffers (13.5%); 0 WAL file(s) added, 20 removed, 10 recycled; write=1259.740 s, sync=0.005 s, total=1259.878 s; sync files=185, longest=0.002 s, average=0.001 s; distance=491822 kB, estimate=2007588 kB
    Many data changes lead to much WAL, that's life. You can increase max_wal_size and checkpoint_timeout to reduce the number of checkpoints and full page images in the WAL, which will reduce the amount of WAL somewhat at the price of longer crash recovery. Commented Jun 10, 2022 at 1:13
  • @LaurenzAlbe checkpoint_timeout is not set. Based on the number of WAL files, I think none of them are empty, and none of them were generated because a checkpoint was reached. By the way, I ended up at cybertec-postgresql.com/en/… and enabled wal_compression=on. I am already using tar to keep them compressed; need to see the difference. Thank you! Commented Jun 10, 2022 at 5:27
  • A checkpoint does not cause a WAL switch. The intention of my suggestion is to get fewer full 8kB page images in the WAL. The first time a page is dirtied after a checkpoint, the whole page is written to WAL. Commented Jun 10, 2022 at 5:33
  • @LaurenzAlbe Got it. Is there any rule of thumb for setting a decent value for checkpoint_timeout? pg_waldump shows that around 70-90% of the data is FPI. Commented Jun 10, 2022 at 6:01

3 Answers


wal_compression=on

This may be counter-productive. This type of compression has to compress each WAL record in isolation, without the larger context, so it is not very effective. However, when you then recompress whole WAL files offline, where the compressor does have access to the larger context, the first round of compression interferes with the better-situated second attempt.

For example, if I take the WAL from 1,000,000 pgbench transactions, they occupy 889192448 raw bytes without wal_compression, and 637534208 with it.

But then after passing them through 'xz' (a very slow but very thorough compressor), the first set takes 129393020 bytes but the 2nd one takes 155769400. So turning on compression too soon cost me 20% more space.
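
If you want to repeat this comparison on your own archive, the measurement is just "total raw size vs. total size after xz"; a rough sketch, assuming GNU du and placeholder paths:

# Total size of the raw segments
du -cb /path/to/wal_archive/0000000100000498* | tail -n 1

# Compress copies with xz (-k keeps the originals) and total the results
xz -k -9 /path/to/wal_archive/0000000100000498*
du -cb /path/to/wal_archive/0000000100000498*.xz | tail -n 1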

You could use pg_waldump --stats ... on some WAL files to see what is actually in them. If it is mostly FPI, then you could try to make the checkpoints further apart to reduce the FPI frequency. But if you don't have much FPI to start with, that would be ineffective. If you can isolate what is causing so much WAL, maybe you can do something about it. For example, if you do a lot of degenerate updates where a column is set to the same value it already had, adding a WHERE clause to suppress those cases could spare you a lot of WAL generation.
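
To illustrate that last point, suppressing a degenerate update only needs an extra predicate; a hypothetical example (table and column names are made up):

-- Degenerate form: the row is rewritten and WAL-logged even when nothing changes
UPDATE accounts SET status = 'active' WHERE id = 42;

-- Suppressed form: the row is skipped entirely if it already has the value
UPDATE accounts SET status = 'active'
WHERE id = 42
  AND status IS DISTINCT FROM 'active';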


1 Comment

Thanks for pointing to pg_waldump, nice tool. As per pg_waldump, the FPI size in each WAL file is around 70%-90%. Does this mean checkpoints should be further apart, and that WAL files are being generated unnecessarily before enough data has changed in the DB?

The WAL files being generated are a reflection of your primary machine's activity. Increasing checkpoint_timeout will help reduce your overall machine activity, making it easier to process the WAL logs.

Standby archiving processes the logs as sent by the primary; they are binary identical. Is it a cold standby, or are you processing logs on the standby as they are sent?
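
If you want to verify the "binary identical" claim for yourself and both nodes happen to archive, comparing checksums of the same segment name from each archive is enough; a sketch with placeholder paths:

# The same segment archived on the primary and on the standby should hash identically
sha256sum /primary_archive/0000000100000498000000B2 \
          /standby_archive/0000000100000498000000B2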

2 Comments

It's a hot standby. As soon as any change appears on the primary, it is available on the standby as well. So are the archived logs I am getting newly created by the standby, or the same ones shipped by the primary?
They are the same ones shipped by the primary.

Since a high percentage of your WAL consists of full page images, you can reduce the amount of WAL considerably by having checkpoints less often. A full page image is written to WAL whenever a page becomes dirty for the first time after a checkpoint. The price you have to pay is a longer crash recovery time.

To reduce the rate of checkpoints, change these parameters:

  • checkpoint_timeout (default 5 minutes): set it to something high like 1 hour

  • max_wal_size (default 1GB): set it higher than the amount of WAL that is written within one hour, to match the checkpoint_timeout setting

These settings have to be made on the primary server, where WAL is generated, not on the standby. Best practice is to use the same settings on both servers.
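
As a concrete sketch of that change on the primary (both parameters take effect on a reload, no restart needed; the values are examples):

-- Run on the primary; size max_wal_size to at least one hour's worth of WAL
ALTER SYSTEM SET checkpoint_timeout = '1h';
ALTER SYSTEM SET max_wal_size = '8GB';
SELECT pg_reload_conf();

Editing postgresql.conf and reloading has the same effect if you prefer not to use ALTER SYSTEM.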

6 Comments

I configured checkpoint_timeout=3600 and max_wal_size=4G and restarted the Docker container running PostgreSQL. I still see multiple WAL files generated every minute, 3-4 files of 16MB each. Isn't this abnormal? Also, I used the command pg_waldump --stats 0000000100000385000000EF and got FPI as 70-90%. Should I specify an LSN instead?
Sorry, my bad. The .conf had the parameter set to 4GB, and the terminal shows max_wal_size with setting 4096 and unit MB (i.e. 4GB).
I have updated the question with more details after the changes. Still 3-4 WAL files every minute.
You have to change it on the primary. Try with max_wal_size = 10GB to be on the safe side. Use log_checkpoints = on to see how often you get a checkpoint. The number of full page images should decrease over time.
I have increased the values to max_wal_size = 8GB and checkpoint_timeout = 1800. I am still seeing multiple WAL files per minute, and an example WAL file shows FPI around 80%: I used the command pg_waldump --stats 00000001000003BE000000CB and got FPI size=14070852 [84.65%].
