I'm currently working on a web service that supports multiple databases. I'm trying to optimize tables and fix missing indexes. The following is the MySQL query:
SELECT 'UTC' AS timezone, pak.id AS package_id, rel.unique_id AS relay, sns.unique_id AS sensor, pak.rtime AS time,
sns.units AS sensor_units, typ.name AS sensor_type, dat.data AS sensor_data,
loc.altitude AS altitude, Y(loc.location) AS latitude, X(loc.location) as longitude,
loc.speed as speed, loc.climb as climb, loc.track as track,
loc.longitude_error as longitude_error, loc.latitude_error as latitude_error, loc.altitude_error as altitude_error,
loc.speed_error as speed_error, loc.climb_error as climb_error, loc.track_error as track_error
FROM sensor_data dat
LEFT OUTER JOIN package_location loc on dat.package_id = loc.package_id
LEFT OUTER JOIN data_package pak ON dat.package_id = pak.id
LEFT OUTER JOIN relays rel ON pak.relay_id = rel.id
LEFT OUTER JOIN sensors sns ON dat.sensor_id = sns.id
LEFT OUTER JOIN sensor_types typ ON sns.sensor_type = typ.id
WHERE typ.name='Temperature'
AND rel.unique_id='OneWireTester'
AND pak.rtime > '2015-01-01'
AND pak.rtime < '2016-01-01'
and the EXPLAIN output...
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | rel | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using where |
| 1 | SIMPLE | pak | ref | PRIMARY,fk_package_relay_id | fk_package_relay_id | 9 | BigSense.rel.id | 1 | Using index condition; Using where |
| 1 | SIMPLE | dat | ref | fk_sensor_package_id,fk_sensor_sensor_id | fk_sensor_package_id | 9 | BigSense.pak.id | 1 | NULL |
| 1 | SIMPLE | sns | eq_ref | PRIMARY,fk_sensors_type_id | PRIMARY | 8 | BigSense.dat.sensor_id | 1 | NULL |
| 1 | SIMPLE | loc | eq_ref | PRIMARY | PRIMARY | 8 | BigSense.pak.id | 1 | NULL |
| 1 | SIMPLE | typ | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
...seems pretty straightforward. rel and typ are both read with a full scan (type ALL, key NULL), so I need to add an index on the relays table and on sensor_types to optimize the query.
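As a minimal sketch of what those two indexes might look like: the columns come from the WHERE clause above (relays.unique_id and sensor_types.name), while the index names are hypothetical. It assumes both columns are VARCHAR; a MySQL TEXT column would need a prefix length.

```sql
-- Hypothetical index names; columns taken from the WHERE clause of the query.
-- Assumes unique_id and name are VARCHAR columns (a MySQL TEXT column
-- would need a prefix length, e.g. name(64)).
CREATE INDEX idx_relays_unique_id ON relays (unique_id);
CREATE INDEX idx_sensor_types_name ON sensor_types (name);
```

Both tables are estimated at 5 rows in the plan above, so the measurable gain at this data size may be small; re-running EXPLAIN afterwards shows whether the optimizer actually picks the new indexes.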
The tables for the PostgreSQL version are pretty much identical, and the query is nearly the same:
SELECT 'UTC' AS timezone, pak.id AS package_id, rel.unique_id AS relay, sns.unique_id AS sensor, pak.rtime AS time,
sns.units AS sensor_units, typ.name AS sensor_type, dat.data AS sensor_data,
loc.altitude AS altitude, ST_Y(loc.location::geometry) AS latitude, ST_X(loc.location::geometry) as longitude,
loc.speed as speed, loc.climb as climb, loc.track as track,
loc.longitude_error as longitude_error, loc.latitude_error as latitude_error, loc.altitude_error as altitude_error,
loc.speed_error as speed_error, loc.climb_error as climb_error, loc.track_error as track_error
FROM sensor_data dat
LEFT OUTER JOIN package_location loc on dat.package_id = loc.package_id
LEFT OUTER JOIN data_package pak ON dat.package_id = pak.id
LEFT OUTER JOIN relays rel ON pak.relay_id = rel.id
LEFT OUTER JOIN sensors sns ON dat.sensor_id = sns.id
LEFT OUTER JOIN sensor_types typ ON sns.sensor_type = typ.id
WHERE typ.name='Temperature'
AND rel.unique_id='OneWireTester'
AND pak.rtime > '2015-01-01'
AND pak.rtime < '2016-01-01';
If I run EXPLAIN ANALYZE on it, I get the following:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=36.23..131.80 rows=1 width=477) (actual time=0.074..3.933 rows=76 loops=1)
-> Nested Loop (cost=36.09..131.60 rows=1 width=349) (actual time=0.068..3.782 rows=76 loops=1)
-> Nested Loop (cost=35.94..130.58 rows=4 width=267) (actual time=0.062..2.472 rows=620 loops=1)
-> Hash Join (cost=35.67..128.73 rows=4 width=247) (actual time=0.053..0.611 rows=620 loops=1)
Hash Cond: (dat.sensor_id = sns.id)
-> Seq Scan on sensor_data dat (cost=0.00..89.46 rows=946 width=21) (actual time=0.007..0.178 rows=1006 loops=1)
-> Hash (cost=35.64..35.64 rows=2 width=238) (actual time=0.037..0.037 rows=11 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Hash Join (cost=20.68..35.64 rows=2 width=238) (actual time=0.019..0.035 rows=11 loops=1)
Hash Cond: (sns.sensor_type = typ.id)
-> Seq Scan on sensors sns (cost=0.00..13.60 rows=360 width=188) (actual time=0.002..0.005 rows=31 loops=1)
-> Hash (cost=20.62..20.62 rows=4 width=66) (actual time=0.010..0.010 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on sensor_types typ (cost=0.00..20.62 rows=4 width=66) (actual time=0.006..0.008 rows=1 loops=1)
Filter: ((name)::text = 'Temperature'::text)
Rows Removed by Filter: 4
-> Index Scan using data_package_pkey on data_package pak (cost=0.28..0.45 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=620)
Index Cond: (id = dat.package_id)
Filter: ((rtime > '2015-01-01 00:00:00'::timestamp without time zone) AND (rtime < '2016-01-01 00:00:00'::timestamp without time zone))
-> Index Scan using relays_pkey on relays rel (cost=0.14..0.24 rows=1 width=94) (actual time=0.002..0.002 rows=0 loops=620)
Index Cond: (id = pak.relay_id)
Filter: ((unique_id)::text = 'OneWireTester'::text)
Rows Removed by Filter: 1
-> Index Scan using package_location_pkey on package_location loc (cost=0.14..0.18 rows=1 width=140) (actual time=0.001..0.001 rows=0 loops=76)
Index Cond: (dat.package_id = package_id)
Planning time: 0.959 ms
Execution time: 4.030 ms
(27 rows)
The table schema has the same foreign keys and general structure, so I'd expect to see the same indexes required. However, I've been looking through several guides on PostgreSQL's EXPLAIN statement, and from what I've gathered, the Seq Scan statements are indicators of missing indexes, meaning I am missing indexes on sensors, sensor_data, and sensor_types.
Am I interpreting the results of these EXPLAIN statements correctly? What should I be looking for in order to optimize both databases?

"and from what I've gathered, the Seq Scan statements are indicators of missing indexes"
Wrong.
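One way to check this on the PostgreSQL side, sketched under the assumption that sensor_types has the id and name columns used in the query above: temporarily discourage sequential scans in the current session and compare the plans. enable_seqscan is a standard planner setting; it penalizes sequential scans rather than forbidding them, so this is a diagnostic, not a fix.

```sql
-- Session-local experiment; do not leave this setting off in production.
SET enable_seqscan = off;

-- Compare this plan and timing with the same statement run under the default setting.
EXPLAIN ANALYZE
SELECT id FROM sensor_types WHERE name = 'Temperature';

RESET enable_seqscan;
```

If a usable index exists and the plan with it is no faster, the Seq Scan was simply the cheaper choice. In the plan above sensor_types holds 5 rows and sensors 31, so reading the whole table is likely cheaper than an index lookup.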