
I have a sensor_status table that logs the status of hundreds of sensors every 45 minutes:

CREATE TABLE sensor_status (
  status_id INT AUTO_INCREMENT PRIMARY KEY,
  sensor_id INT NOT NULL,
  status ENUM('Online','Offline','Unknown') NOT NULL,
  status_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  ...
);

Over time, this table will grow very large and most queries only care about:

  • The latest status for each sensor
  • The last 24 hours of data
  • The last week of data

Right now, every dashboard/API request scans the full sensor_status table with a WHERE status_timestamp >= … filter, which is becoming slow.

I'm fairly new to database management and organization, so I'm curious: what are common practices for "caching" time-series tables so that every request doesn't have to query the full time-series table?

I'm currently considering doing the following:

  1. Separate "latest" table maintained by an insert trigger (rough sketch after this list)

  2. Snapshot tables with event scheduler. Rebuild a sensor_status_24hr and sensor_status_1wk every day.
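For option 1, this is roughly what I have in mind (untested sketch; the sensor_status_latest table and the trigger name are just placeholders I made up):

CREATE TABLE sensor_status_latest (
  sensor_id INT PRIMARY KEY,
  status ENUM('Online','Offline','Unknown') NOT NULL,
  status_timestamp TIMESTAMP NOT NULL
);

-- Keep exactly one row per sensor, overwritten on every new reading.
CREATE TRIGGER sensor_status_after_insert
AFTER INSERT ON sensor_status
FOR EACH ROW
  INSERT INTO sensor_status_latest (sensor_id, status, status_timestamp)
  VALUES (NEW.sensor_id, NEW.status, NEW.status_timestamp)
  ON DUPLICATE KEY UPDATE
    status = NEW.status,
    status_timestamp = NEW.status_timestamp;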

Thank you!

  • I'm no pro, but an index on status_timestamp for filtering the most recent records would be a good start. Commented Jun 17 at 17:26
  • Also, you might get a better response from the DBA community: dba.stackexchange.com Commented Jun 17 at 17:28
  • mariadb.com/docs/server/server-usage/partitioning-tables/… Commented Jun 17 at 17:41
  • @IłyaBursov - PARTITION is unlikely to help performance. Commented Jun 17 at 20:48
  • Use Summary Tables for Sensor Data Commented Jun 17 at 20:49

1 Answer

Add an index on your timestamp column:

ALTER TABLE sensor_status ADD INDEX (status_timestamp);

This will help the queries you describe examine only recent data instead of the whole table. You can verify the index is used with EXPLAIN.
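For example, with a typical dashboard query like the one below (I'm guessing at the exact columns you select), EXPLAIN should report the new index in the key column and a range access type rather than a full table scan:

EXPLAIN
SELECT sensor_id, status, status_timestamp
FROM sensor_status
WHERE status_timestamp >= NOW() - INTERVAL 24 HOUR;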

The performance of a query is proportional to the number of rows it examines. If your table has no index, the query has to examine every row of the table to check if status_timestamp >= ....

With an index, the query can instead seek directly to the first matching row and skip all the rows with a smaller status_timestamp. It never examines them, so they add nothing to the query's cost.

That should keep your queries fast for a long time, at least until the table has hundreds of millions of rows. If you collect data from 1,000 sensors every 45 minutes, that's about 32,000 rows per day, so it will take roughly 8.5 years to reach 100 million rows.

Once you do reach the limits of what an index can do for these queries, you might look into caching or partitioning. But in the short term, I think the index is the simpler and more effective solution.
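If you do eventually try partitioning, the usual pattern for a TIMESTAMP column is RANGE partitioning on UNIX_TIMESTAMP(). A rough, untested sketch (the partition boundaries are made up; note that the partitioning column must be part of every unique key, so the primary key has to be extended first):

-- Extend the primary key so it includes the partitioning column.
ALTER TABLE sensor_status
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (status_id, status_timestamp);

-- Range-partition by time so old data can be dropped by partition.
ALTER TABLE sensor_status
PARTITION BY RANGE (UNIX_TIMESTAMP(status_timestamp)) (
  PARTITION p2025h2 VALUES LESS THAN (UNIX_TIMESTAMP('2026-01-01')),
  PARTITION p2026h1 VALUES LESS THAN (UNIX_TIMESTAMP('2026-07-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);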


3 Comments

Good idea, I didn't know about that. I'll probably use this more often in general.
Does every sensor get a new row every 45 minutes? If not, the suggested index may have missing info.
