
I'm using PostgreSQL 9.6.5. I have a query that doesn't seem all that complicated, and I was wondering why it is so "slow" (it's not really that slow, but I don't have a lot of data, only a few thousand rows).

Here is the query:

SELECT o0.* 
FROM "orders" AS o0 
JOIN "balances" AS b1 ON b1."id" = o0."balance_id" 
JOIN "users" AS u3 ON u3."id" = b1."user_id" 
WHERE (u3."partner_id" = 3) 
ORDER BY o0."id" DESC LIMIT 10;

And here is the query plan:

Limit  (cost=0.43..12.84 rows=10 width=148) (actual time=0.062..53.866 rows=4 loops=1)
  ->  Nested Loop  (cost=0.43..4750.03 rows=3826 width=148) (actual time=0.061..53.864 rows=4 loops=1)
        Join Filter: (b1.user_id = u3.id)
        Rows Removed by Join Filter: 67404
        ->  Nested Loop  (cost=0.43..3945.32 rows=17856 width=152) (actual time=0.025..38.457 rows=16852 loops=1)
              ->  Index Scan Backward using orders_pkey on orders o0  (cost=0.29..897.80 rows=17856 width=148) (actual time=0.016..11.558 rows=16852 loops=1)
              ->  Index Scan using balances_pkey on balances b1  (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=16852)
                    Index Cond: (id = o0.balance_id)
        ->  Materialize  (cost=0.00..1.19 rows=3 width=4) (actual time=0.000..0.000 rows=4 loops=16852)
              ->  Seq Scan on users u3  (cost=0.00..1.18 rows=3 width=4) (actual time=0.023..0.030 rows=4 loops=1)
                    Filter: (partner_id = 3)
                    Rows Removed by Filter: 12
Planning time: 0.780 ms
Execution time: 54.053 ms

I also tried it without the LIMIT and got quite a different plan:

Sort  (cost=874.23..883.80 rows=3826 width=148) (actual time=11.361..11.362 rows=4 loops=1)
  Sort Key: o0.id DESC
  Sort Method: quicksort  Memory: 26kB
  ->  Hash Join  (cost=3.77..646.55 rows=3826 width=148) (actual time=11.300..11.346 rows=4 loops=1)
        Hash Cond: (o0.balance_id = b1.id)
        ->  Seq Scan on orders o0  (cost=0.00..537.56 rows=17856 width=148) (actual time=0.012..8.464 rows=16852 loops=1)
        ->  Hash  (cost=3.55..3.55 rows=18 width=4) (actual time=0.125..0.125 rows=24 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Hash Join  (cost=1.21..3.55 rows=18 width=4) (actual time=0.046..0.089 rows=24 loops=1)
                    Hash Cond: (b1.user_id = u3.id)
                    ->  Seq Scan on balances b1  (cost=0.00..1.84 rows=84 width=8) (actual time=0.011..0.029 rows=96 loops=1)
                    ->  Hash  (cost=1.18..1.18 rows=3 width=4) (actual time=0.028..0.028 rows=4 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on users u3  (cost=0.00..1.18 rows=3 width=4) (actual time=0.014..0.021 rows=4 loops=1)
                                Filter: (partner_id = 3)
                                Rows Removed by Filter: 12
Planning time: 0.569 ms
Execution time: 11.420 ms

And here it is without the WHERE clause (but with the LIMIT):

Limit  (cost=0.43..4.74 rows=10 width=148) (actual time=0.023..0.066 rows=10 loops=1)
  ->  Nested Loop  (cost=0.43..7696.26 rows=17856 width=148) (actual time=0.022..0.065 rows=10 loops=1)
        Join Filter: (b1.user_id = u3.id)
        Rows Removed by Join Filter: 139
        ->  Nested Loop  (cost=0.43..3945.32 rows=17856 width=152) (actual time=0.009..0.029 rows=10 loops=1)
              ->  Index Scan Backward using orders_pkey on orders o0  (cost=0.29..897.80 rows=17856 width=148) (actual time=0.007..0.015 rows=10 loops=1)
              ->  Index Scan using balances_pkey on balances b1  (cost=0.14..0.16 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=10)
                    Index Cond: (id = o0.balance_id)
        ->  Materialize  (cost=0.00..1.21 rows=14 width=4) (actual time=0.001..0.001 rows=15 loops=10)
              ->  Seq Scan on users u3  (cost=0.00..1.14 rows=14 width=4) (actual time=0.005..0.007 rows=16 loops=1)
Planning time: 0.286 ms
Execution time: 0.097 ms

As you can see, it's much faster without the WHERE clause. Can someone point me to resources that explain these plans, so I can understand them better? And what can I do to make these queries faster? Or should I not worry, because even with about 100 times more data they will still be fast enough? (50 ms is fine for me, to be honest.)

1 Answer

PostgreSQL thinks it will be fastest if it scans orders backward in id order (the requested sort order) until it has found enough rows whose users entry satisfies the WHERE condition.

However, it seems that the data distribution is such that it scans almost 17,000 orders without ever collecting the 10 matching rows the LIMIT asks for.
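
A quick sanity check (a sketch against the same schema as above) is to count how many rows match at all. The plan shows only 4, fewer than the LIMIT of 10, so the backward index scan can never stop early and ends up reading the whole table:

-- Count all matching rows; if this is below the LIMIT,
-- the backward scan of orders cannot terminate early.
SELECT count(*)
FROM "orders" AS o0
JOIN "balances" AS b1 ON b1."id" = o0."balance_id"
JOIN "users" AS u3 ON u3."id" = b1."user_id"
WHERE u3."partner_id" = 3;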

Since PostgreSQL doesn't know how values correlate across tables, there is not much you can do to change that estimate.

You can force PostgreSQL to plan the query without the LIMIT clause like this:

SELECT *
FROM (<your query without ORDER BY and LIMIT> OFFSET 0) q
ORDER BY id DESC LIMIT 10;

With a top-N sort this should perform better.
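
Applied to the query from the question, the rewrite would look roughly like this (a sketch: OFFSET 0 doesn't change the result, but it prevents the planner from flattening the subquery, so the join is planned without knowledge of the outer LIMIT):

SELECT *
FROM (SELECT o0.*
      FROM "orders" AS o0
      JOIN "balances" AS b1 ON b1."id" = o0."balance_id"
      JOIN "users" AS u3 ON u3."id" = b1."user_id"
      WHERE u3."partner_id" = 3
      OFFSET 0) q   -- OFFSET 0 acts as an optimization fence
ORDER BY id DESC
LIMIT 10;

On 9.6 a CTE (WITH ... AS) would act as a similar fence, since CTEs are always materialized before PostgreSQL 12.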
