Optimize query with nested selects

Question

Is it possible to optimize the following query? webdte.docto is a very large table with millions of entries and has indexes on all queried columns. The final sort order is important.

SELECT 
   id_doc,
   id_tip_doc,
   id_est_doc,
   folios.nro_fol,
   seleccionable
FROM
(
   SELECT distinct(nro_fol)
   FROM webdte.docto 
   WHERE
      id_tip_doc IN
      (
         SELECT distinct(id_tip_doc)
         FROM webdte.docto
         WHERE id_doc IN
         (
            SELECT id_doc
            FROM webdte.lib_doc
            WHERE id_lib = 37
         )
      ) AND
      id_doc IN
      (
         SELECT id_doc
         FROM webdte.lib_doc
         WHERE id_lib = 37
      )
) AS folios JOIN webdte.docto AS docs ON docs.nro_fol = folios.nro_fol
ORDER BY id_tip_doc, folios.nro_fol, id_est_doc;

Sorry, here is the EXPLAIN output for my first query approach. The answer from Egalitarian is already good, but can it be made even faster? Thank you!

Sort  (cost=13745.13..13805.42 rows=24115 width=22)
  Sort Key: docs.id_tip_doc, docto.nro_fol, docs.id_est_doc
  ->  Hash Join  (cost=9240.19..11492.84 rows=24115 width=22)
        Hash Cond: (docto.nro_fol = docs.nro_fol)
        ->  HashAggregate  (cost=4424.81..4665.91 rows=24110 width=6)
              ->  Hash Semi Join  (cost=733.75..4364.54 rows=24110 width=6)
                    Hash Cond: (docto.id_doc = lib_doc.id_doc)
                    ->  Seq Scan on docto  (cost=0.00..2885.28 rows=105128 width=10)
                    ->  Hash  (cost=432.38..432.38 rows=24110 width=4)
                          ->  Seq Scan on lib_doc  (cost=0.00..432.38 rows=24110 width=4)
                                Filter: (id_lib = 37)
        ->  Hash  (cost=2885.28..2885.28 rows=105128 width=22)
              ->  Seq Scan on docto docs  (cost=0.00..2885.28 rows=105128 width=22)

Could you show us the results from EXPLAIN and EXPLAIN ANALYZE? Without this information it's next to impossible to optimize the query, because you can't see where the actual problems are. You can only guess. – Frank Heikens, commented Jul 23, 2012 at 9:51

Erwin Brandstetter · Accepted Answer · 2012-07-23 13:53:15Z


I think you can simplify to:

SELECT id_doc
      ,id_tip_doc
      ,id_est_doc
      ,nro_fol
      ,seleccionable
FROM   webdte.docto d
WHERE  EXISTS (
   SELECT 1
   FROM   webdte.docto   d0
   JOIN   webdte.lib_doc l USING (id_doc)
   WHERE  l.id_lib = 37
   AND    d0.nro_fol = d.nro_fol
   )
ORDER  BY id_tip_doc, nro_fol, id_est_doc;

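With EXISTS, DISTINCT is no longer needed: each row of webdte.docto is returned at most once, no matter how many matching rows the subquery finds. This can speed up the query considerably if there are many duplicates in nro_fol.
Your original query was quite redundant (for example, it reads webdte.lib_doc twice with the same filter).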
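To check the rewrite on a small scale, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The rows are made up for illustration, and `DISTINCT nro_fol` is written without parentheses (it means the same as `distinct(nro_fol)`). It runs the question's query and the EXISTS version side by side, and also counts the rows of a plain JOIN without DISTINCT to show why the original needed DISTINCT: documents 1 and 2 are both in library 37 and share folio 100.

```python
import sqlite3

# Hypothetical toy data; the real tables live in PostgreSQL.
# ATTACH lets SQLite accept the schema-qualified names webdte.docto / webdte.lib_doc.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS webdte")
conn.executescript("""
    CREATE TABLE webdte.docto (id_doc INTEGER PRIMARY KEY, id_tip_doc INTEGER,
                               id_est_doc INTEGER, nro_fol INTEGER,
                               seleccionable INTEGER);
    CREATE TABLE webdte.lib_doc (id_lib INTEGER, id_doc INTEGER);
    INSERT INTO webdte.docto VALUES
        (1, 33, 1, 100, 1), (2, 33, 2, 100, 0), (3, 34, 1, 101, 1),
        (4, 34, 1, 102, 1), (5, 33, 3, 103, 0), (6, 52, 1, 100, 1);
    -- docs 1 and 2 are both in library 37 and share folio 100
    INSERT INTO webdte.lib_doc VALUES (37, 1), (37, 2), (37, 3), (38, 4), (38, 5);
""")

# The question's query.
ORIGINAL = """
SELECT id_doc, id_tip_doc, id_est_doc, folios.nro_fol, seleccionable
FROM (SELECT DISTINCT nro_fol
      FROM webdte.docto
      WHERE id_tip_doc IN (SELECT DISTINCT id_tip_doc FROM webdte.docto
                           WHERE id_doc IN (SELECT id_doc FROM webdte.lib_doc
                                            WHERE id_lib = 37))
        AND id_doc IN (SELECT id_doc FROM webdte.lib_doc WHERE id_lib = 37)
     ) AS folios
JOIN webdte.docto AS docs ON docs.nro_fol = folios.nro_fol
ORDER BY id_tip_doc, folios.nro_fol, id_est_doc
"""

# The EXISTS rewrite from this answer.
EXISTS_VERSION = """
SELECT id_doc, id_tip_doc, id_est_doc, nro_fol, seleccionable
FROM   webdte.docto d
WHERE  EXISTS (SELECT 1
               FROM   webdte.docto d0
               JOIN   webdte.lib_doc l USING (id_doc)
               WHERE  l.id_lib = 37
               AND    d0.nro_fol = d.nro_fol)
ORDER  BY id_tip_doc, nro_fol, id_est_doc
"""

# Same condition as a plain JOIN without DISTINCT: each row is repeated once
# per library-37 document that has the same folio.
NAIVE_JOIN = """
SELECT d.id_doc
FROM   webdte.docto d
JOIN   webdte.docto d0 ON d0.nro_fol = d.nro_fol
JOIN   webdte.lib_doc l ON l.id_doc = d0.id_doc AND l.id_lib = 37
"""

original_rows = conn.execute(ORIGINAL).fetchall()
exists_rows = conn.execute(EXISTS_VERSION).fetchall()
naive_rows = conn.execute(NAIVE_JOIN).fetchall()

print(exists_rows == original_rows)       # True
print(len(exists_rows), len(naive_rows))  # 4 7
```

SQLite timings say nothing about PostgreSQL; on the real data, compare the two queries with EXPLAIN ANALYZE.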


Egalitarian · Accepted Answer · 2012-07-23 06:17:41Z

I think the WHERE clause that fetches the distinct id_tip_doc values is not of much significance, since the id_doc IN (...) condition already limits the rows to documents of library 37 and you select distinct(nro_fol) anyway. That said, one of the best ways to optimize this query is to create the proper indexes and then rewrite the query.

You can create the following indexes (though this also depends on your other queries):
1. webdte.lib_doc: id_lib
2. webdte.docto: id_doc + nro_fol
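The two suggested indexes could be created roughly as below. This is a sketch that uses SQLite (Python's sqlite3 module) as a stand-in: the index names and column types are hypothetical, and in PostgreSQL the tables would be schema-qualified, e.g. `CREATE INDEX lib_doc_id_lib_idx ON webdte.lib_doc (id_lib);`. SQLite's EXPLAIN QUERY PLAN is only a rough counterpart of PostgreSQL's EXPLAIN, and its wording varies between versions.

```python
import sqlite3

# Hypothetical table layouts; only the indexed columns matter here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docto (id_doc INTEGER, id_tip_doc INTEGER, id_est_doc INTEGER,
                        nro_fol INTEGER, seleccionable INTEGER);
    CREATE TABLE lib_doc (id_lib INTEGER, id_doc INTEGER);

    -- 1. lib_doc: id_lib
    CREATE INDEX lib_doc_id_lib_idx ON lib_doc (id_lib);
    -- 2. docto: id_doc + nro_fol (composite index, in that column order)
    CREATE INDEX docto_id_doc_nro_fol_idx ON docto (id_doc, nro_fol);
""")

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

lib_plan = plan("SELECT id_doc FROM lib_doc WHERE id_lib = 37")
folio_plan = plan(
    "SELECT DISTINCT nro_fol FROM docto "
    "WHERE id_doc IN (SELECT id_doc FROM lib_doc WHERE id_lib = 37)"
)
print(lib_plan)    # should mention lib_doc_id_lib_idx
print(folio_plan)  # exact wording depends on the SQLite version
```

The composite index puts id_doc first because the filter is on id_doc; nro_fol second lets the database read the folio from the index as well.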

SELECT id_doc, id_tip_doc, id_est_doc, folios.nro_fol, seleccionable
FROM (
   SELECT distinct(nro_fol)
   FROM webdte.docto
   WHERE id_doc IN (SELECT id_doc FROM webdte.lib_doc WHERE id_lib = 37)
) folios
JOIN webdte.docto docs ON docs.nro_fol = folios.nro_fol
ORDER BY id_tip_doc, folios.nro_fol, id_est_doc;
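A minimal sketch (Python's sqlite3 as a stand-in for PostgreSQL, made-up rows) that runs the question's query and this simplified version side by side. Document 4 has a type (34) that also occurs in library 37 but is not itself in the library, so the id_tip_doc filter alone would let it through; the id_doc IN (...) condition is what excludes it, which is why dropping the id_tip_doc filter does not change the result here.

```python
import sqlite3

# Hypothetical toy data; the real tables live in PostgreSQL.
# ATTACH lets SQLite accept the schema-qualified names webdte.docto / webdte.lib_doc.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS webdte")
conn.executescript("""
    CREATE TABLE webdte.docto (id_doc INTEGER PRIMARY KEY, id_tip_doc INTEGER,
                               id_est_doc INTEGER, nro_fol INTEGER,
                               seleccionable INTEGER);
    CREATE TABLE webdte.lib_doc (id_lib INTEGER, id_doc INTEGER);
    INSERT INTO webdte.docto VALUES
        (1, 33, 1, 100, 1), (2, 33, 2, 100, 0), (3, 34, 1, 101, 1),
        (4, 34, 1, 102, 1), (5, 33, 3, 103, 0), (6, 52, 1, 100, 1);
    INSERT INTO webdte.lib_doc VALUES (37, 1), (37, 2), (37, 3), (38, 4), (38, 5);
""")

# The question's query, including the id_tip_doc IN (...) filter.
ORIGINAL = """
SELECT id_doc, id_tip_doc, id_est_doc, folios.nro_fol, seleccionable
FROM (SELECT DISTINCT nro_fol
      FROM webdte.docto
      WHERE id_tip_doc IN (SELECT DISTINCT id_tip_doc FROM webdte.docto
                           WHERE id_doc IN (SELECT id_doc FROM webdte.lib_doc
                                            WHERE id_lib = 37))
        AND id_doc IN (SELECT id_doc FROM webdte.lib_doc WHERE id_lib = 37)
     ) AS folios
JOIN webdte.docto AS docs ON docs.nro_fol = folios.nro_fol
ORDER BY id_tip_doc, folios.nro_fol, id_est_doc
"""

# This answer's version: the id_tip_doc filter is dropped.
SIMPLIFIED = """
SELECT id_doc, id_tip_doc, id_est_doc, folios.nro_fol, seleccionable
FROM (SELECT DISTINCT nro_fol
      FROM webdte.docto
      WHERE id_doc IN (SELECT id_doc FROM webdte.lib_doc WHERE id_lib = 37)
     ) AS folios
JOIN webdte.docto AS docs ON docs.nro_fol = folios.nro_fol
ORDER BY id_tip_doc, folios.nro_fol, id_est_doc
"""

original_rows = conn.execute(ORIGINAL).fetchall()
simplified_rows = conn.execute(SIMPLIFIED).fetchall()
print(simplified_rows == original_rows)  # True
```

The two can only differ when id_tip_doc is NULL for a document of library 37: NULL IN (...) is not true, so the original query drops that document's folio while the simplified one keeps it.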
