I know that Postgres uses a clock-sweep (approximate LRU) algorithm to evict data from its cache, but I'm having a hard time understanding how data gets into shared_buffers in the first place.
Please note that my intention is not to make this naive query faster; an index is always the best option. But I want to understand how the cache behaves in the absence of indexes.
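For context, here is a minimal sketch of the clock-sweep idea mentioned above. This is not PostgreSQL's actual implementation; the usage-count cap of 5 mirrors the real buffer manager, but the rest (class name, slot bookkeeping) is simplified for illustration.

```python
# Toy clock-sweep replacement policy, similar in spirit to what
# PostgreSQL uses for shared_buffers. Illustrative only.
class ClockSweepCache:
    MAX_USAGE = 5  # PostgreSQL caps a buffer's usage_count at 5

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.pages = [None] * nbuffers   # page held in each buffer slot
        self.usage = [0] * nbuffers      # usage_count per slot
        self.hand = 0                    # clock hand position
        self.slot_of = {}                # page -> slot index

    def access(self, page):
        """Return True on a cache hit, False on a miss (page loaded)."""
        slot = self.slot_of.get(page)
        if slot is not None:
            # Hit: bump usage_count, saturating at MAX_USAGE.
            self.usage[slot] = min(self.usage[slot] + 1, self.MAX_USAGE)
            return True
        # Miss: sweep the hand until a slot with usage_count 0 is found,
        # decrementing counts as we pass; that slot is the victim.
        while self.usage[self.hand] != 0:
            self.usage[self.hand] -= 1
            self.hand = (self.hand + 1) % self.nbuffers
        victim = self.hand
        self.hand = (self.hand + 1) % self.nbuffers
        old = self.pages[victim]
        if old is not None:
            del self.slot_of[old]        # evict the old page
        self.pages[victim] = page
        self.slot_of[page] = victim
        self.usage[victim] = 1
        return False

cache = ClockSweepCache(4)
for p in [1, 2, 3, 4, 1, 5]:   # the final access forces an eviction
    cache.access(p)
```

Here page 1 survives the eviction because its repeated access bumped its usage_count, while page 2 is swept out when page 5 arrives.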
Let's take the query execution plan below as an example (I have deliberately not created an index):
performance_test=# explain (analyze,buffers) select count(*) from users;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=48214.95..48214.96 rows=1 width=0) (actual time=3874.445..3874.445 rows=1 loops=1)
Buffers: shared read=35715
-> Seq Scan on users (cost=0.00..45714.96 rows=999996 width=0) (actual time=6.024..3526.606 rows=1000000 loops=1)
Buffers: shared read=35715
Planning time: 0.114 ms
Execution time: 3874.509 ms
We can see that none of the data was found in shared_buffers, i.e. shared read=35715. (Note that a "read" only means the page was not in shared_buffers; it may still have been served from the OS page cache rather than physical disk.)
Now let's execute the same query again.
performance_test=# explain (analyze,buffers) select count(*) from users;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=48214.95..48214.96 rows=1 width=0) (actual time=426.385..426.385 rows=1 loops=1)
Buffers: shared hit=32 read=35683
-> Seq Scan on users (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.036..285.363 rows=1000000 loops=1)
Buffers: shared hit=32 read=35683
Planning time: 0.048 ms
Execution time: 426.431 ms
Only 32 pages/blocks were found in memory (shared hit=32). On every repetition, the shared hit count increases by another 32.
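There is a plausible explanation for that specific number, hedged since the plans alone don't prove it: for a sequential scan of a table larger than a quarter of shared_buffers, PostgreSQL reads through a small 256 kB ring buffer rather than the whole cache. With the default 8 kB block size, that ring is exactly 32 pages:

```python
# Assumed defaults: 8 kB PostgreSQL block size (BLCKSZ) and the
# 256 kB bulk-read ring buffer used for large sequential scans.
block_size = 8 * 1024             # bytes per page
ring_bytes = 256 * 1024           # bulk-read ring buffer size
pages_in_ring = ring_bytes // block_size
print(pages_in_ring)              # -> 32, matching the shared hit step
```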
performance_test=# explain (analyze,buffers) select count(*) from users;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=48214.95..48214.96 rows=1 width=0) (actual time=416.829..416.829 rows=1 loops=1)
Buffers: shared hit=64 read=35651
-> Seq Scan on users (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.034..273.417 rows=1000000 loops=1)
Buffers: shared hit=64 read=35651
Planning time: 0.050 ms
Execution time: 416.874 ms
My shared_buffers is 1GB and the table size is 279MB, so the whole table could fit in memory. But that is not what happens, and the cache evidently works a bit differently. Can someone explain how PostgreSQL plans this and moves the data from disk into shared_buffers?
Is there a mechanism that controls how many pages can move into shared_buffers with each query?
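The hit pattern above can be reproduced with a toy model (again an assumption, not PostgreSQL source): suppose each scan gets a fresh 32-page ring; pages already resident in the cache are hit in place and bypass the ring; and whatever the ring holds when the scan finishes stays cached. The table size matches the plans above (35715 pages).

```python
# Toy simulation of repeated sequential scans through a 32-page ring.
# Illustrative assumptions only, not PostgreSQL internals.
TABLE_PAGES = 35715
RING_SIZE = 32

cached = set()   # table pages currently resident in shared_buffers

def seq_scan():
    """Scan every table page once; return (shared hit, shared read)."""
    ring = []            # FIFO of pages this scan pulled in via the ring
    hits = reads = 0
    for page in range(TABLE_PAGES):
        if page in cached:
            hits += 1                        # already resident: no ring traffic
        else:
            reads += 1
            if len(ring) == RING_SIZE:
                cached.discard(ring.pop(0))  # recycle the oldest ring slot
            ring.append(page)
            cached.add(page)
    return hits, reads

for _ in range(3):
    print(seq_scan())
# Follows the same pattern as the plans above:
# (0, 35715), (32, 35683), (64, 35651)
```

Each scan leaves its last 32 ring pages behind, and the previous scan's leftovers are hit in place rather than recycled, so the resident set (and the shared hit count) grows by 32 per scan.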