
I have a sensor_status table that logs the status of hundreds of sensors every 45 minutes:

CREATE TABLE sensor_status (
  status_id INT AUTO_INCREMENT PRIMARY KEY,
  sensor_id INT NOT NULL,
  status ENUM('Online','Offline','Unknown') NOT NULL,
  status_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  ...
);

Over time, this table will grow very large and most queries only care about:

  • The latest status for each sensor
  • The last 24 hours of data
  • The last week of data

Right now, every dashboard/API request scans the full sensor_status table with a WHERE status_timestamp >= … filter, which is becoming slow.

I'm fairly new to database management and organization, so I'm curious: what are common practices for "caching" time-series tables so that every request doesn't have to query the full time-series table?

I'm currently considering doing the following:

  1. Separate "latest" table maintained by an insert trigger (rough sketch after this list)

  2. Snapshot tables with event scheduler. Rebuild a sensor_status_24hr and sensor_status_1wk every day.
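For option 1, this is roughly what I have in mind (untested sketch; the sensor_status_latest table and the trigger name are just placeholders I made up):

CREATE TABLE sensor_status_latest (
  sensor_id INT PRIMARY KEY,
  status ENUM('Online','Offline','Unknown') NOT NULL,
  status_timestamp TIMESTAMP NOT NULL
);

-- Keep exactly one row per sensor, overwritten on every new reading.
CREATE TRIGGER sensor_status_after_insert
AFTER INSERT ON sensor_status
FOR EACH ROW
  INSERT INTO sensor_status_latest (sensor_id, status, status_timestamp)
  VALUES (NEW.sensor_id, NEW.status, NEW.status_timestamp)
  ON DUPLICATE KEY UPDATE
    status = NEW.status,
    status_timestamp = NEW.status_timestamp;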

Thank you!

  • I'm no pro, but an index on status_timestamp for filtering the most recent records would be a good start. Commented Jun 17 at 17:26
  • Also, you might get a better response from the DBA community: dba.stackexchange.com Commented Jun 17 at 17:28
  • mariadb.com/docs/server/server-usage/partitioning-tables/… Commented Jun 17 at 17:41
  • @IłyaBursov - PARTITION is unlikely to help performance. Commented Jun 17 at 20:48
  • Use Summary Tables for Sensor Data Commented Jun 17 at 20:49

1 Answer

Add an index on your timestamp column:

ALTER TABLE sensor_status ADD INDEX (status_timestamp);

This will help the queries you describe examine only recent data instead of the whole table. You can verify the index is used with EXPLAIN.
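For example, with a typical dashboard query like the one below (I'm guessing at the exact columns you select), EXPLAIN should report the new index in the key column and a range access type rather than a full table scan:

EXPLAIN
SELECT sensor_id, status, status_timestamp
FROM sensor_status
WHERE status_timestamp >= NOW() - INTERVAL 24 HOUR;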

The performance of a query is proportional to the number of rows it examines. If your table has no index, the query has to examine every row of the table to check if status_timestamp >= ....

With an index, the query can instead seek directly to the first matching row and skip all the rows with a smaller status_timestamp. It never examines them, so they add nothing to the query's cost.

That should keep your queries fast for a long time, at least until the table has hundreds of millions of rows. If you collect data from 1,000 sensors every 45 minutes, that's about 32,000 rows per day, so it will take roughly 8.5 years to reach 100 million rows.

Once you do reach the limits of what an index can do for these queries, you might look into caching or partitioning. But in the short term, I think the index is the simpler and more effective solution.
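If you do eventually try partitioning, the usual pattern for a TIMESTAMP column is RANGE partitioning on UNIX_TIMESTAMP(). A rough, untested sketch (the partition boundaries are made up; note that the partitioning column must be part of every unique key, so the primary key has to be extended first):

-- Extend the primary key so it includes the partitioning column.
ALTER TABLE sensor_status
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (status_id, status_timestamp);

-- Range-partition by time so old data can be dropped by partition.
ALTER TABLE sensor_status
PARTITION BY RANGE (UNIX_TIMESTAMP(status_timestamp)) (
  PARTITION p2025h2 VALUES LESS THAN (UNIX_TIMESTAMP('2026-01-01')),
  PARTITION p2026h1 VALUES LESS THAN (UNIX_TIMESTAMP('2026-07-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);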


3 Comments

Good idea, I didn't know about that. I'll probably use this more often in general.
Does every sensor get a new row every 45 minutes? If not, the suggested index may have missing info.
