
I'm using PostgreSQL 9.6. I have a table like this:

mac   sn         loc   time     date      vin1    vin2    vin3
1a34 4as11111111 aaaa  7:06:18  1/1/2018  447.42  472.32  682.59
1a34 4as11111111 aaaa  7:06:43  1/1/2018  455.97  476.25  682.59
1a34 4as11111111 aaaa  7:07:35  1/1/2018  470.88  484.2   682.5

I need to calculate the average of vin1, vin2, vin3 within time intervals of 300 seconds (5 minutes), starting from the first timestamp (e.g. 7:06:18 to 7:11:18), for the dates in a given range. I can select the data I need with this query:

select * from table
where sn='4as11111111' and date between '2018-01-01' and '2018-01-02';

But I don't know how to group it by 300 sec time intervals and calculate average for vin1, vin2, vin3 columns for those 5 min interval, to get something like this:

mac  sn          loc     time     date      vin1_av  vin2_av  vin3_av
1a34 4as11111111 aaaa   7:06:18  1/1/2018  450.0    480.32   600.59
1a34 4as11111111 aaaa   7:11:18  1/1/2018  460.0    490.25   782.59
1a34 4as11111111 aaaa   7:16:18  1/1/2018  470.88   500.2    600.5

Any help would be greatly appreciated.

  • What about time intervals without any existing rows? Include them in the result with 0 or NULL, or no row at all? Can the range span more than one day? Also, your example data & result do not match. Nor does the query (different date). Commented May 10, 2018 at 2:32
  • All the rows are guaranteed to have data, there could be 0.0V but not NULL. There could be time slots with no data, you are right, so those slots should be ignored. Span can be weeks, months. Sorry about the query dates (I fixed it), they don't match the results I want, I just copied one of my experiment queries as an example Commented May 10, 2018 at 16:23
  • Thank you Erwin. My real table has a first column "mac" (see the edited question, sorry I didn't include it in the original post, I didn't think it mattered), and it changes the query so that I can't make your solution work. I feel like it should work, but I still can't figure out the changes. Commented May 21, 2018 at 22:29
  • Looks like mac and loc are always the same for the same sn. Is that so? If not, how would that reflect on the desired result? I updated the solution on an educated guess. Commented May 21, 2018 at 23:49
  • @Erwin: No, mac and loc could be different, but it doesn't matter much as I'm mostly interested in selecting by sn. It's weird, I played with your query at dbfiddle (dbfiddle.uk/…) and it works perfectly there with the example data, but when I try it with real data (as in dbfiddle now), there is "no rows returned" (see dbfiddle link). And on my local db I get "ERROR: ORDER/GROUP BY expression not found in targetlist". So again, I missed something simple there. Commented May 22, 2018 at 18:38

1 Answer


DB design

While you can work with separate date and time columns, there is really no advantage over a single timestamp column. I would adapt:

ALTER TABLE tbl ADD COLUMN ts timestamp;
UPDATE tbl SET ts = date + time;  -- assuming actual date and time types
ALTER TABLE tbl DROP COLUMN date, DROP COLUMN time;

If date and time are not actual date and time data types, use to_timestamp().
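For illustration, a minimal sketch of that conversion, assuming date and time are text columns in the formats shown in the sample data (adjust the format string to whatever your columns actually contain):

```sql
-- Hypothetical: date stored as text like '1/1/2018', time as text like '7:06:18'
UPDATE tbl
SET    ts = to_timestamp(date || ' ' || time, 'FMMM/FMDD/YYYY FMHH24:MI:SS');
```

The FM modifiers make the template tolerant of missing leading zeros; to_timestamp() returns timestamptz, so cast if you want plain timestamp.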

Query

Then the query is a bit simpler:

SELECT *
FROM  (
   SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
   FROM   tbl
   WHERE  sn = '4as11111111'
   AND    ts >= '2018-01-01'
   AND    ts <  '2018-01-02'
   GROUP  BY 1
   ) grid
CROSS  JOIN LATERAL (
   SELECT round(avg(vin1), 2) AS vin1_av
        , round(avg(vin2), 2) AS vin2_av
        , round(avg(vin3), 2) AS vin3_av
   FROM   tbl
   WHERE  sn =  grid.sn
   AND    ts >= grid.ts
   AND    ts <  grid.ts + interval '5 min'
   ) avg;

db<>fiddle here

Generate a grid of start times in the first subquery grid, running from the first to the last qualifying row in the given time frame.

Join to rows that fall in each partition with a LATERAL join and immediately aggregate averages in the subquery avg. Due to the aggregates, it always returns a row even if no entries are found. Averages default to NULL in this case.

The result includes all time slots between the first and last qualifying row in the given time frame. Various other result compositions would make sense, too. Like including all times slots in the given time frame or just time slots with actual values. All possible, I had to pick one interpretation.
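For instance, if you only need slots that actually contain rows and do not insist on aligning slots to the first row's timestamp, a simpler sketch (my variant, not part of the answer above) groups by fixed 5-minute buckets derived from the epoch:

```sql
-- Buckets aligned to multiples of 300 s since the epoch,
-- not to the first qualifying row; empty slots simply don't appear.
SELECT sn
     , to_timestamp(floor(extract(epoch FROM ts) / 300) * 300) AS slot_start
     , round(avg(vin1)::numeric, 2) AS vin1_av
     , round(avg(vin2)::numeric, 2) AS vin2_av
     , round(avg(vin3)::numeric, 2) AS vin3_av
FROM   tbl
WHERE  sn = '4as11111111'
AND    ts >= '2018-01-01'
AND    ts <  '2018-01-02'
GROUP  BY 1, 2
ORDER  BY 2;
```

Note the different slot alignment: 7:06:18 lands in the 7:05:00 bucket here, whereas the grid query above starts slots at the first row's timestamp.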

Index

At least have this multicolumn index:

CREATE INDEX foo_idx ON tbl (sn, ts);

Or on (sn, ts, vin1, vin2, vin3) to allow index-only scans - if some preconditions are met and especially if table rows are much wider than in the demo.
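That wider, covering variant would look like this (index name is illustrative):

```sql
-- Covering index: lets Postgres answer the aggregate from the index alone
-- (index-only scan), provided the visibility map is reasonably up to date.
CREATE INDEX foo_covering_idx ON tbl (sn, ts, vin1, vin2, vin3);
```

Index-only scans only pay off if the table is vacuumed often enough that most heap pages are all-visible; otherwise the narrower index is usually the better choice.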


Based on your original table

As requested and clarified in the comments, and later updated again in the question to include the columns mac and loc. I assume you want separate averages per (mac, loc).

This version keeps date and time as separate columns, takes the vin* columns as type float, and excludes time slots without rows:

The updated query also moves the set-returning function generate_series() to the FROM list, which is cleaner before Postgres 10:

SELECT t.mac, sn.sn, t.loc, ts.ts::time AS time, ts.ts::date AS date
     , t.vin1_av, t.vin2_av, t.vin3_av
FROM  (SELECT text '4as11111111') sn(sn)  -- provide sn here once
CROSS  JOIN LATERAL (
   SELECT min(date+time) AS min_ts, max(date+time) AS max_ts
   FROM   tbl
   WHERE  sn = sn.sn
   AND    date+time >= '2018-01-01 0:0'   -- provide time frame here
   AND    date+time <  '2018-01-02 0:0'
   ) grid
CROSS  JOIN LATERAL generate_series(min_ts, max_ts, interval '5 min') ts(ts)
CROSS  JOIN LATERAL (
   SELECT mac, loc
        , round(avg(vin1)::numeric, 2) AS vin1_av  -- cast to numeric for round()
        , round(avg(vin2)::numeric, 2) AS vin2_av  -- but rounding is optional
        , round(avg(vin3)::numeric, 2) AS vin3_av
   FROM   tbl
   WHERE  sn = sn.sn
   AND    date+time >= ts.ts
   AND    date+time <  ts.ts + interval '5 min'
   GROUP  BY mac, loc
   HAVING count(*) > 0  -- exclude empty slots
   ) t;

Create a multicolumn expression index to support this:

CREATE INDEX bar_idx ON tbl (sn, (date+time));

db<>fiddle here

But I would much rather use timestamp all along.


Comments

Thank you very much, Erwin. Is it possible to do what I want with the existing DB, with separate date and time columns? There are indexes sn_idx and date_idx, as those queries happen frequently, but vin1 etc. are all float and not indexed. And you are right, I don't need empty time slots filled with NULLs (at some parts of the day there may be no actual values).
@lugger1: I added another solution.
Thank you Erwin. When I'm trying this in postgres, I'm getting the "ERROR: ORDER/GROUP BY expression not found in targetlist" for some reason. Am I missing something?
@lugger1: I tested with your version 9.6 and it works. See the added fiddle.
It was working, but now I'm getting the "ERROR: ORDER/GROUP BY expression not found in targetlist" error. I found that a query like "select sn, generate_series(min(date+time), max(date+time), interval '5 min') AS ts from tbl where sn = '4as11111111' and date+time >= '2018-01-01 0:0' and date+time < '2018-01-01 10:0' group by 1 limit 10;" works, but the same query without "limit 10" (even though the output is 8 rows, not 10) gives the "ORDER/GROUP BY expression not found in targetlist" error. It seems to be related to the sn_index that was created to improve speed. Any suggestions how to fix this?
