11

I'm trying to optimise a PostgreSQL 8.4 query. After greatly simplifying the original query to find out why it chooses a bad query plan, I reached the point where running the query under EXPLAIN ANALYZE takes only 0.5s, while running it normally takes 2.8s. It seems obvious, then, that what EXPLAIN ANALYZE shows me is not what normally happens, so its output is useless, isn't it? What is going on here, and how do I see what it's really doing?

5 Comments
  • Is the query returning lots of data? My understanding is that EXPLAIN ANALYZE discards the data -- perhaps you're gaining back time not having to transfer it through a pipe or network connection? Commented Aug 6, 2010 at 2:26
  • About 75,000 rows so I wouldn't say "lots". Certainly shouldn't take much time on a LAN. Commented Aug 6, 2010 at 2:27
  • Apparently that's enough data that it takes about 1.3s, which at a transfer rate of 100Mbps would be about 16.25MB, or roughly 220 bytes per row. Commented Aug 6, 2010 at 2:30
  • No, the rows are very small. More like 50 bytes per row. Commented Aug 6, 2010 at 3:49
  • @EMP did you ever find the answer to this? I'm seeing the same issue: EXPLAIN ANALYZE takes 40s while the query execution takes 1m, and only 10 rows and 3 columns of data are returned, mostly integers. Commented Nov 11 at 21:57
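
The back-of-the-envelope numbers in the comments above can be checked with a few lines of Python. This is only a sketch under idealised assumptions: the link is taken to run at its full nominal rate, and protocol overhead, latency, and text conversion of the rows are ignored. The function names are made up for illustration.

```python
def transfer_seconds(rows, bytes_per_row, link_mbps):
    """Ideal time to push rows * bytes_per_row over a link of link_mbps megabits/s."""
    total_bits = rows * bytes_per_row * 8
    return total_bits / (link_mbps * 1_000_000)


def implied_bytes_per_row(seconds, rows, link_mbps):
    """Row size that would be needed to keep the link busy for `seconds`."""
    total_bytes = seconds * link_mbps * 1_000_000 / 8
    return total_bytes / rows


# 75,000 rows of about 50 bytes each on a 100 Mbps LAN
print(f"{transfer_seconds(75_000, 50, 100):.2f} s")

# Row size implied by 1.3 s of pure transfer at 100 Mbps
print(f"{implied_bytes_per_row(1.3, 75_000, 100):.0f} bytes/row")
```

Under these assumptions, 50-byte rows account for roughly 0.3s of raw transfer, so bandwidth alone would not explain the whole gap between 0.5s and 2.8s.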

2 Answers

5

Most likely, the data pages are in the OS disk cache when you are manually running with EXPLAIN ANALYZE in order to try and optimize the query. When run in a normal environment, the pages probably aren't in the cache already and have to be fetched from disk, increasing the runtime.

2 Comments

I don't understand - if they were in the cache when I ran EXPLAIN ANALYZE then why aren't they in there when I run without EXPLAIN immediately after?
Sorry, I misunderstood the order. Now I would say it's more likely that the difference is network throughput. I recommend adding a LIMIT clause and trying varying numbers of records (like 1, 5, 10, 100, 1000, 10000, etc.) until you reach your max, then comparing the times. I'm guessing it will scale roughly as "a+(t*n)", where a is your EXPLAIN ANALYZE time, t is a roughly constant transfer time per row, and n is your number of rows. Obviously this won't be exact, but I expect it to trend towards it.
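
One way to test the "a+(t*n)" guess is to fit a straight line to the timings collected at each LIMIT. The sketch below uses an ordinary least-squares fit in plain Python. The timings in it are synthetic, generated from the model itself purely to show the mechanics; they are not measurements, and `fit_line` is a made-up helper name.

```python
def fit_line(ns, times):
    """Least-squares fit of times ~ a + t * n; returns (a, t)."""
    count = len(ns)
    mean_n = sum(ns) / count
    mean_time = sum(times) / count
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (y - mean_time) for n, y in zip(ns, times))
    t = sxy / sxx                # estimated seconds per row
    a = mean_time - t * mean_n   # estimated fixed cost in seconds
    return a, t


# LIMIT values to try; replace `times` with your own wall-clock measurements.
limits = [1, 10, 100, 1_000, 10_000, 75_000]
# SYNTHETIC placeholder data built from a = 0.5 s and t = 30 microseconds/row.
times = [0.5 + 3e-5 * n for n in limits]

a, t = fit_line(limits, times)
print(f"fixed cost a = {a:.3f} s, per-row cost t = {t * 1e6:.1f} us")
```

If real measurements fall close to a line and a comes out near the EXPLAIN ANALYZE time, that supports the idea that the extra time is spent delivering rows rather than executing the plan.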
5

It shows less time because:

1) The Total runtime shown by EXPLAIN ANALYZE includes executor start-up and shut-down time, as well as the time to run any triggers that are fired, but it does not include parsing, rewriting, or planning time.

2) Since no output rows are delivered to the client, network transmission costs and I/O conversion costs are not included.

Warning!

The measurement overhead added by EXPLAIN ANALYZE can be significant, especially on machines with slow gettimeofday() operating-system calls. So, on PostgreSQL 9.2 or later, it's advisable to use EXPLAIN (ANALYZE TRUE, TIMING FALSE). The TIMING option was added in 9.2, so it is not available on 8.4.
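
To see the costs that EXPLAIN ANALYZE leaves out, you can time the query from the client, including fetching every row. Below is a minimal sketch using Python's generic DB-API interface. It runs against an in-memory SQLite table only as a stand-in so that it is self-contained; with PostgreSQL you would pass a connection from a driver such as psycopg2 instead (an assumption, not something from the question). `time_query` and the table `t` are hypothetical names.

```python
import sqlite3
import time


def time_query(conn, sql):
    """Execute sql on a DB-API connection, fetch all rows, return (seconds, row_count)."""
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(sql)
    rows = cur.fetchall()  # forces every row to be delivered to the client
    elapsed = time.perf_counter() - start
    cur.close()
    return elapsed, len(rows)


# Stand-in data: 75,000 small rows in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    ((i, "x" * 50) for i in range(75_000)),
)

elapsed, row_count = time_query(conn, "SELECT id, payload FROM t")
print(f"{row_count} rows fetched in {elapsed:.3f} s")
```

Comparing this client-side figure with the Total runtime reported by EXPLAIN ANALYZE gives a rough idea of how much time goes to planning, data conversion, and transfer rather than to executing the plan.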

1 Comment

Thanks. I actually downloaded the data to my machine and it still takes about 13 seconds, while under EXPLAIN ANALYZE it takes about 0.5 seconds. How can I optimize data writing?
