3

I'm trying to do a query where I want to ignore the first and the last row of the result query. In order to do this a hit was given using window functions like the query gave to me above

SELECT lag(timestamp_min)    OVER (ORDER BY timestamp_min) AS timestamp_min,
       lag(type)             OVER (ORDER BY timestamp_min) AS type,
       lag(sum_first_medium) OVER (ORDER BY timestamp_min),
FROM (SELECT to_timestamp(
                floor(
                   (extract('epoch' FROM TIMESTAMP) / 300)
                ) * 300
             ) AS timestamp_min,
             type,
             floor(sum(medium[1])) AS sum_first_medium
      FROM default_dataset
      WHERE type = 'ap_clients.wlan0'
        AND timestamp > current_timestamp - INTERVAL '85 minutes'
        AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
      GROUP BY timestamp_min, type) lagme
OFFSET 2;

The problem is the last query is not returning anything:

ws_controller_hist=> SELECT lag(timestamp_min) OVER (ORDER BY timestamp_min) AS timestamp_min, lag(type) OVER (ORDER BY timestamp_min) AS type, lag(sum_first_medium) OVER (ORDER BY timestamp_min) FROM (SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) AS timestamp_min, type, floor(sum(medium[1])) AS sum_first_medium FROM default_dataset WHERE type = 'ap_clients.wlan0' AND timestamp > current_timestamp - INTERVAL '85 minutes' AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638' GROUP BY timestamp_min, type) lagme OFFSET 2;
 timestamp_min | type | lag
---------------+------+-----
(0 rows)

But I have data for the type "ap_clients.wlan0"

ws_controller_hist=> select * from default_dataset where type ='ap_clients.wlan0' order by timestamp desc limit 3;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organiza
tion_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+-------------------
-------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
(3 rows)

I need to have a query retrieving the sum of all mediums in the last hour grouped by 5 minutes.

My first approach to resolve my problem to ignore the first record I used offset(1) and to ignore the last i was trying to do a limit in my id field, ordering by timestamp desc.

ws_controller_hist=>  
SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) 
AS timestamp_min,
       TYPE,
       floor(sum(medium[1]))
FROM default_dataset
WHERE TYPE LIKE 'ap_clients.wlan0'
  AND TIMESTAMP > CURRENT_TIMESTAMP - interval '85 minutes'
  AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
  AND id NOT IN
    (SELECT id
     FROM default_dataset
     ORDER BY TIMESTAMP DESC
     LIMIT 1)
GROUP BY timestamp_min,
         TYPE
ORDER BY timestamp_min ASC
OFFSET 1;

     timestamp_min      |       type       | floor
------------------------+------------------+-------
 2017-12-19 14:20:00+00 | ap_clients.wlan0 |    38
 2017-12-19 14:25:00+00 | ap_clients.wlan0 |    37
 2017-12-19 14:30:00+00 | ap_clients.wlan0 |    39
 2017-12-19 14:35:00+00 | ap_clients.wlan0 |    42
 2017-12-19 14:40:00+00 | ap_clients.wlan0 |    43
 2017-12-19 14:45:00+00 | ap_clients.wlan0 |    44
 2017-12-19 14:50:00+00 | ap_clients.wlan0 |    45
 2017-12-19 14:55:00+00 | ap_clients.wlan0 |    45
 2017-12-19 15:00:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:05:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:10:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:15:00+00 | ap_clients.wlan0 |    52
 2017-12-19 15:20:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:25:00+00 | ap_clients.wlan0 |    53
 2017-12-19 15:30:00+00 | ap_clients.wlan0 |    49
 2017-12-19 15:35:00+00 | ap_clients.wlan0 |    39
 2017-12-19 15:40:00+00 | ap_clients.wlan0 |    16

But my last query is not ignoring the last record because i have the same records dont using the subquery " and id not in (select id from default_dataset order by timestamp desc limit 1) ".

If I try to make a query to see the results for the type "ap_clients.wlan0" I have

ws_controller_hist=> select * from default_dataset where organization_id='ce4b69af-bdce-4f1b-ba71-dd03544205d5' and type ='ap_clients.wlan0' order by timestamp desc limit 5;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organiza
tion_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+-------------------
-------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 e67baf28-6d5b-43a5-85e2-fcf2d04a0b2e | 2018-01-02 10:06:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 c7ce16ce-9cda-423f-b32b-f4d6dce859e6 | 2018-01-02 10:01:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}

What can I do?

2
  • Will you have data for every possible 5-minute slot in your chosen time-window, as in your example? If so why not select only the slots you want, rather than selecting an extra one at the beginning and an extra one at the end and then trying to remove them? Commented Jan 2, 2018 at 13:18
  • I've edited my question and added a new query for type ap_clients.wlan0 Commented Jan 2, 2018 at 14:07

1 Answer 1

1

A simple solution is to use lag and lead window functions with a parameter that cannot be NULL, in such way lag will return NULL for the first row and lead will return NULL for the last row, so you can simple filter for rows where both of them are NOT NULL:

SELECT
    t2.timestamp_min,
    t2.type,
    t2.sum_first_medium
FROM (
    SELECT
        t1.*,
        lead(1) OVER(ORDER BY t1.timestamp_min) AS lead,
        lag(1) OVER(ORDER BY t1.timestamp_min) AS lag
    FROM (
        SELECT
            to_timestamp(
              floor(
                (extract('epoch' FROM TIMESTAMP) / 300)
              ) * 300
            ) AS timestamp_min,
            type,
            floor(sum(medium[1])) AS sum_first_medium
        FROM default_dataset
        WHERE
            type = 'ap_clients.wlan0'
            AND timestamp > current_timestamp - INTERVAL '85 minutes'
            AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
        GROUP BY timestamp_min, type
    ) t1
) t2
WHERE
    t2.lag IS NOT NULL -- Only first row will return NULL, skip it
    AND t2.lead IS NOT NULL -- Only last row will return NULL, skip it
ORDER BY t2.timestamp_min

Notice I have used lead(1) and lag(1) just because 1 is a non-NULL expression, you could use any non-NULL expression or even a column there (since guaranteed to be NOT NULL).

Another possible solution would be applying two row_number() calls, one use ORDER BY timestamp_min ASC and another using ORDER BY timestamp_min DESC, and then filtering the rows which those are <> 1. But that would require two sorts of the dataset (one for ASC and one for DESC), while the lag/lead solution requires only one (although it may be harder to understand).

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.