3

I have a function in Postgres 8.3.5 that selects data from multiple tables and dumps the result in a single table:

create or replace function test_function_2(startdate timestamp, enddate timestamp)
returns void as $$
begin
     delete from cl_final_report;

     INSERT INTO cl_final_report
     SELECT 
         b.batchkey AS batchnumber, 
         pv.productkey, 
         p.name AS productname, 
         avg(r.value) AS avgchemlean, 
         sum(r.auxvalue) AS totalweight, 
         max(o.time) AS timecompleted
     FROM result r
     LEFT JOIN physicalvalue pv ON r.physicalvaluekey = pv.physicalvaluekey
     LEFT JOIN product p ON pv.productkey = p.productkey
     LEFT JOIN object o ON r.objectkey = o.objectkey
     LEFT JOIN batch b ON o.batchkey = b.batchkey
     WHERE pv.name = 'CL'::text AND
         and o.time between startdate and enddate
     GROUP BY b.batchkey, pv.productkey, p.name
end
$$ language plpgsql;

This function takes 113 seconds to complete using PgAdmin and executing this command:

select test_function_2('05/02/2013', '05/03/2013')

However, if I replace the input variables in the function with literals like this:

create or replace function test_function_2(startdate timestamp, enddate timestamp)
returns void as $$
begin
     delete from cl_final_report;

     INSERT INTO cl_final_report
     SELECT 
         b.batchkey AS batchnumber, 
         pv.productkey, 
         p.name AS productname, 
         avg(r.value) AS avgchemlean, 
         sum(r.auxvalue) AS totalweight, 
         max(o.time) AS timecompleted
     FROM result r
     LEFT JOIN physicalvalue pv ON r.physicalvaluekey = pv.physicalvaluekey
     LEFT JOIN product p ON pv.productkey = p.productkey
     LEFT JOIN object o ON r.objectkey = o.objectkey
     LEFT JOIN batch b ON o.batchkey = b.batchkey
     WHERE pv.name = 'CL'::text AND
         and o.time between '05/02/2013' and '05/03/2013'
     GROUP BY b.batchkey, pv.productkey, p.name
end
$$ language plpgsql;

The function executes in less than 5 seconds.

I'm new to Postgres so there's probably something I'm missing, but I can't seem to find an answer anywhere.

3
  • Do a truncate instead of delete from cl_final_report; Commented May 3, 2013 at 15:06
  • @JustBob: For small tables, DELETE is actually faster. TRUNCATE excels with bigger tables. Commented May 3, 2013 at 16:07
  • 1
    @ErwinBrandstetter I learn something new everyday, thanks. Here's a link that goes into detail on delete vs truncate stackoverflow.com/questions/11419536/…. Commented May 3, 2013 at 18:21

3 Answers 3

8

@A.H's explanation is accurate for PostgreSQL 9.1 or older. So it is applicable for the OP, who is using the outdated version 8.3.

However, PostgreSQL 9.2 brought a substantial update in this area. PL/pgSQL functions have become a lot smarter about when to replan. I am quoting the release notes for 9.2 here

E.5.3.1.3. Optimizer

Allow the planner to generate custom plans for specific parameter values even when using prepared statements (Tom Lane)

In the past, a prepared statement always had a single "generic" plan that was used for all parameter values, which was frequently much inferior to the plans used for non-prepared statements containing explicit constant values. Now, the planner attempts to generate custom plans for specific parameter values. A generic plan will only be used after custom plans have repeatedly proven to provide no benefit. This change should eliminate the performance penalties formerly seen from use of prepared statements (including non-dynamic statements in PL/pgSQL).

Bold emphasis mine.

Ergo: One solution for the OP would be to upgrade to PostgreSQL 9.2+ and everything should just work fine automatically.

Sign up to request clarification or add additional context in comments.

Comments

6

The query planner / optimizer can calculate a better plan when it has constants at hand.

When no constants are used the planner must generate a plan which is acceptable for all possible values of startdate and enddate. If the difference between these two values is very large, then a large part of the table must be fetched. In this case indizes are not used in most cases, because random access costs are higher than linear reads.

But when there are constants, then the planner can calculate based on gathered statistics, that the query will only touch a tiny fraction of the table and hence an index might be faster.

This is a common irk with the PostgreSQL query planner. The manual contains some hints in the PREPARE section (PREPARE is used internally by pl/pgsql):

In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. PostgreSQL collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan might be suboptimal. To examine the query plan PostgreSQL has chosen for a prepared statement, use EXPLAIN.

2 Comments

While this is true, there is a new chapter to the story. I added an answer.
@ErwinBrandstetter: Thanks for giving an up-to-date answer.
4

You can force a new plan on each execution if you make the query dynamic:

create or replace function test_function_2(
    startdate timestamp, enddate timestamp
) returns void as $function$
begin
    delete from cl_final_report;

    execute $$
        INSERT INTO cl_final_report
        SELECT 
            b.batchkey AS batchnumber, 
            pv.productkey, 
            p.name AS productname, 
            avg(r.value) AS avgchemlean, 
            sum(r.auxvalue) AS totalweight, 
            max(o.time) AS timecompleted
        FROM result r
        LEFT JOIN physicalvalue pv ON r.physicalvaluekey = pv.physicalvaluekey
        LEFT JOIN product p ON pv.productkey = p.productkey
        LEFT JOIN object o ON r.objectkey = o.objectkey
        LEFT JOIN batch b ON o.batchkey = b.batchkey
        WHERE pv.name = 'CL'::text AND
            and o.time between $1 and $2
        GROUP BY b.batchkey, pv.productkey, p.name
    $$ using startdate, enddate;
end;
$function$ language plpgsql;

For it to work in 8.3 without using do string concatenation:

create or replace function test_function_2(
    startdate timestamp, enddate timestamp
) returns void as $function$
begin
    delete from cl_final_report;

    execute $$
        INSERT INTO cl_final_report
        SELECT 
            b.batchkey AS batchnumber, 
            pv.productkey, 
            p.name AS productname, 
            avg(r.value) AS avgchemlean, 
            sum(r.auxvalue) AS totalweight, 
            max(o.time) AS timecompleted
        FROM result r
        LEFT JOIN physicalvalue pv ON r.physicalvaluekey = pv.physicalvaluekey
        LEFT JOIN product p ON pv.productkey = p.productkey
        LEFT JOIN object o ON r.objectkey = o.objectkey
        LEFT JOIN batch b ON o.batchkey = b.batchkey
        WHERE pv.name = 'CL'::text AND
            and o.time between '$$ || startdate || $$' and '$$ || enddate || $$'
        GROUP BY b.batchkey, pv.productkey, p.name
    $$;
end;
$function$ language plpgsql;

7 Comments

I get a syntax error near the "using" statement when I execute that.
@striking13 When you create the function or when you select the function?
@striking13 Did you copy and paste that exact code? If not what is the exact code?
I copied and pasted the exact code. Here's the error message: ERROR: syntax error at or near "using" LINE 18: $$ using $1 , $2. (The "^" is directly under the "u" in using)
@striking13 using only exists in 8.4+. What is your version?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.