How to optimize a MAX SQL query with GROUP BY DATE

Question

I'm trying to optimize a query from a table with 3M rows.

The columns are value, datetime and point_id.

SELECT DATE(datetime), MAX(value) FROM historical_points WHERE point_id=1 GROUP BY DATE(datetime);

This query takes 2 seconds.

I tried indexing the point_id=1 but the results were not much better.

Is it possible to index the MAX query or is there a better way to do it? Maybe with an INNER JOIN?

EDIT: This is the explain analyze of similar one, that is tackling the case better. This one also ha performance problem.

EXPLAIN ANALYZE SELECT DATE(datetime), MAX(value), MIN(value) FROM buildings_hispoint WHERE point_id=64 AND datetime BETWEEN '2017-09-01 00:00:00' AND '2017-10-01 00:00:00' GROUP BY DATE(datetime);

>GroupAggregate  (cost=84766.65..92710.99 rows=336803 width=68) (actual time=1461.060..2701.145 rows=21 loops=1)
>  Group Key: (date(datetime))
>  ->  Sort  (cost=84766.65..85700.23 rows=373430 width=14) (actual time=1408.445..1547.929 rows=523621 loops=1)
>        Sort Key: (date(datetime))
>        Sort Method: external sort  Disk: 11944kB
>        ->  Bitmap Heap Scan on buildings_hispoint  (cost=10476.02..43820.81 rows=373430 width=14) (actual time=148.970..731.154 rows=523621 loops=1)
>              Recheck Cond: (point_id = 64)
>              Filter: ((datetime >= '2017-09-01 00:00:00+02'::timestamp with time zone) AND (datetime               Rows Removed by Filter: 35712
>              Heap Blocks: exact=14422
>              ->  Bitmap Index Scan on buildings_measurementdatapoint_ffb10c68  (cost=0.00..10382.67 rows=561898 width=0) (actual time=125.150..125.150 rows=559333 loops=1)
>                    Index Cond: (point_id = 64)
>Planning time: 0.284 ms
>Execution time: 2704.566 ms

might sound strange, but a covering index to optimize on (point_id, datetime, value) as a single index. — DRapp
– DRapp, Commented Sep 27, 2017 at 16:08

Michel Milezzi · Accepted Answer · 2017-09-27 18:49:17Z

1

Without seeing EXPLAIN output is difficult to say something. My guess is that you must include DATE() call on index definition:

CREATE INDEX historical_points_idx ON historical_points (DATE(datetime), point_id);

Also, if point_id has more distinct values than DATE(datetime) then you must reverse column order:

CREATE INDEX historical_points_idx ON historical_points (point_id, DATE(datetime));

Keep in mind that cardinality of columns is very important to the planner, columns with high selectivity is preferred to go first.

answered Sep 27, 2017 at 18:49

Michel Milezzi

11.4k3 gold badges24 silver badges36 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Rodolfo del Valle Over a year ago

I tried with: CREATE INDEX historical_points_idx ON historical_points (DATE(datetime AT TIME ZONE 'UTC'), point_id); but it didn't make any difference

Andrew Lazarus · Accepted Answer · 2017-09-27 22:22:13Z

1

SELECT DISTINCT ON (DATE(datetime)) DATE(datetime), value 
FROM historical_points WHERE point_id=1
ORDER BY DATE(datetime) DESC, value DESC;

Put an computed index on DATE(datetime), value. [I hope those aren't your real column names. Using reserved words like VALUE as a column name is a recipe for confusion.]

The SELECT DISTINCT will work like a GROUP ON. The ORDER BY replaces the MAX, and will be fast if indexed.

I owe this technique to @ErwinBrandstetter.

answered Sep 27, 2017 at 22:22

Andrew Lazarus

19.5k3 gold badges38 silver badges57 bronze badges

2 Comments

Rodolfo del Valle Over a year ago

This is a bit faster, but not so much. To be honest, I don't know how much I can push it down. Do you think it looks feasible to reduce it way more?

Andrew Lazarus Over a year ago

What are your indexes. This should run fast.

Collectives™ on Stack Overflow

How to optimize a MAX SQL query with GROUP BY DATE

2 Answers 2

1 Comment

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related