
I have the following Query/View:

CREATE OR REPLACE VIEW "SumAndSalesPerCountryYear" AS 
 SELECT date_part('year'::text, "Invoice"."InvoiceDate") AS year,
    "Invoice"."BillingCountry" AS country,
    sum("Invoice"."Total") AS total
   FROM "Invoice"
  GROUP BY date_part('year'::text, "Invoice"."InvoiceDate"), "Invoice"."BillingCountry"
  ORDER BY date_part('year'::text, "Invoice"."InvoiceDate") DESC, sum("Invoice"."Total") DESC;

My table structure is as follows:

CREATE TABLE "Invoice"
(
  "InvoiceId" integer NOT NULL,
  "CustomerId" integer NOT NULL,
  "InvoiceDate" timestamp without time zone NOT NULL,
  "BillingAddress" character varying(70),
  "BillingCity" character varying(40),
  "BillingState" character varying(40),
  "BillingCountry" character varying(40),
  "BillingPostalCode" character varying(10),
  "Total" numeric(10,2) NOT NULL,
  CONSTRAINT "PK_Invoice" PRIMARY KEY ("InvoiceId"),
  CONSTRAINT "FK_InvoiceCustomerId" FOREIGN KEY ("CustomerId")
      REFERENCES "Customer" ("CustomerId") MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

The current execution plan is

Sort  (cost=33.65..34.54 rows=354 width=21) (actual time=0.691..0.698 rows=101 loops=1)
  Sort Key: (date_part('year'::text, "Invoice"."InvoiceDate")), (sum("Invoice"."Total"))
  Sort Method: quicksort  Memory: 32kB
  ->  HashAggregate  (cost=14.24..18.67 rows=354 width=21) (actual time=0.540..0.567 rows=101 loops=1)
        ->  Seq Scan on "Invoice"  (cost=0.00..11.15 rows=412 width=21) (actual time=0.015..0.216 rows=412 loops=1)
Total runtime: 0.753 ms

My task is to optimize the query by using indices; however, I cannot think of a way to use indices to optimize aggregate results.

  • Indexes will rarely be helpful without a where clause. You should mention the typical use case (with another explain analyze) if you want a useful answer. Otherwise, getting the full list of all invoices in the amount of time you suggested seems reasonable. Naively, I would think an index on date_part('year'::text, "Invoice"."InvoiceDate"), BillingCountry, or some combination of the two could be useful, depending on which you want to limit search results by (a sketch follows these comments). Commented Dec 6, 2014 at 21:21
  • Your query runs for less than one millisecond. How fast do you need it to be? One nanosecond? Commented Dec 6, 2014 at 21:46
  • Consider using a materialized view if you don't need fresh data: postgresql.org/docs/9.3/static/rules-materializedviews.html. The view can be refreshed every night; its results will then include data only up to the previous day, but it will be very fast (a sketch of this also follows these comments). Commented Dec 6, 2014 at 23:24
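
For reference, here is a minimal sketch of the expression index suggested in the first comment. The index name is made up, and whether the planner ever uses it depends on your queries filtering on these columns:

-- Hypothetical composite expression index on invoice year and billing country.
-- date_part() on timestamp WITHOUT time zone is IMMUTABLE, so it is allowed
-- in an index; on timestamptz it would be rejected as merely STABLE.
CREATE INDEX "IX_Invoice_YearCountry"
    ON "Invoice" (date_part('year'::text, "InvoiceDate"), "BillingCountry");

A query can only use this index if it repeats the indexed expression verbatim, e.g. WHERE date_part('year'::text, "InvoiceDate") = 2013.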
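
And a sketch of the materialized view suggested in the last comment (available since PostgreSQL 9.3); the view name is invented:

-- Hypothetical materialized view; stores the aggregate physically on disk.
CREATE MATERIALIZED VIEW "SumAndSalesPerCountryYearMat" AS
SELECT date_part('year'::text, "InvoiceDate") AS year,
       "BillingCountry"                       AS country,
       sum("Total")                           AS total
FROM "Invoice"
GROUP BY 1, 2;

-- Run nightly (e.g. from cron); readers then see data as of the last refresh.
REFRESH MATERIALIZED VIEW "SumAndSalesPerCountryYearMat";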

2 Answers


You can try to penalize HashAggregate with "SET enable_hashagg TO off", but with data this small there will probably be no benefit from an index. In this use case HashAggregate is usually the fastest aggregation method, and sorting 32kB is very quick.
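
If you want to see the alternative plan for yourself, the setting can be toggled for the current session only; this is a diagnostic knob, not something to leave on in production:

-- Session-local experiment: discourage HashAggregate so the planner
-- falls back to sorting the input and using GroupAggregate.
SET enable_hashagg TO off;
EXPLAIN ANALYZE SELECT * FROM "SumAndSalesPerCountryYear";
RESET enable_hashagg;  -- restore the default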

But you are trying to run a performance benchmark on a table with 412 rows, which is meaningless. Thinking about performance only makes sense on data of a size corresponding to 2..3 years of production usage.
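
One way to benchmark at such a scale (a sketch only: the row count, date range, and country list are invented, and it assumes "CustomerId" values 1-59 already exist in "Customer" so the foreign key is satisfied) is to bulk-load synthetic invoices:

-- Hypothetical bulk load: one million synthetic invoices over ~3 years.
INSERT INTO "Invoice" ("InvoiceId", "CustomerId", "InvoiceDate",
                       "BillingCountry", "Total")
SELECT g,
       (g % 59) + 1,                                  -- recycle customer ids
       timestamp '2012-01-01' + (g % 1096) * interval '1 day',
       (ARRAY['USA','Canada','Germany','Brazil'])[(g % 4) + 1],
       round((random() * 25)::numeric, 2)
FROM generate_series(1000, 1000999) AS g;             -- ids avoid existing rows

ANALYZE "Invoice";  -- refresh planner statistics before re-running EXPLAIN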




As noted by Pavel, Ramfjord, and horse, an index is of little use with such a tiny amount of data. The table is so small that it is faster for Postgres to read a disk page or two and process everything in memory.

Further, you already have the best possible plan for your query. You're asking Postgres to compute an aggregate over an entire table and return it in a certain order. Postgres computes the aggregate in memory without sorting the data first, accumulating intermediate results in a hash table; it then sorts the small number of grouped results according to your criteria.
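
Where an index could start to pay off on a much larger table is when only part of the table has to be aggregated. A hypothetical example, assuming the expression index sketched under the question exists and the table holds millions of rows:

-- With a selective filter that matches the indexed expression verbatim,
-- the planner can scan the index instead of the whole table.
EXPLAIN ANALYZE
SELECT "BillingCountry", sum("Total")
FROM "Invoice"
WHERE date_part('year'::text, "InvoiceDate") = 2013
GROUP BY "BillingCountry";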

