PostgreSQL: detecting the first/last rows of result set

Question

Is there any way to embed a flag in a select that indicates that it is the first or the last row of a result set? I'm thinking something to the effect of:

> SELECT is_first_row() AS f, is_last_row() AS l FROM blah;
  f  |  l
-----------
  t  |  f
  f  |  f
  f  |  f
  f  |  f
  f  |  t

The answer might be in window functions but I've only just learned about them, and I question their efficiency.

SELECT first_value(unique_column) OVER () = unique_column, last_value(unique_column) OVER () = unique_column, * FROM blah;

seems to do what I want. Unfortunately, I don't even fully understand that syntax, but since unique_column is unique and NOT NULL it should deliver unambiguous results. But if it does sorting, then the cure might be worse than the disease. (Actually, in my tests, unique_column is not sorted, so that's something.)

EXPLAIN ANALYZE doesn't indicate there's an efficiency problem, but when has it ever told me what I needed to know?

And I might need to use this in an aggregate function, but I've just been told window functions aren't allowed there. 😕

Edit: Actually, I just added ORDER BY unique_column to the above query and the rows identified as first and last were thrown into the middle of the result set. It's as if first_value()/last_value() really means "the first/last value I picked up before I began sorting." I don't think I can safely do this optimally. Not unless a much better understanding of the use of the OVER keyword is to be had.
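
A possible explanation, sketched using the placeholder names blah and unique_column from the query above: window functions are evaluated before the query's final ORDER BY, and an empty OVER () window sees the rows in whatever order the plan happens to produce them. Giving the window its own ORDER BY, plus an explicit frame so that last_value() looks at the whole partition, should make the flags line up with the sorted output:

```sql
-- Sketch only: blah and unique_column are the placeholder names from the question.
SELECT first_value(unique_column) OVER w = unique_column AS f,
       last_value(unique_column)  OVER w = unique_column AS l,
       *
FROM blah
WINDOW w AS (ORDER BY unique_column
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY unique_column;  -- same ordering as the window, so the flags land on the outer rows
```

The explicit frame matters because, once a window has an ORDER BY, the default frame ends at the current row, so last_value() would otherwise return the current row's own value on every row.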

I'm running PostgreSQL 9.6 in a Debian 9.5 environment.

This isn't a duplicate, because I'm trying to get the first row and last row of the result set to identify themselves, while "Postgres: get min, max, aggregate values in one select" is just going for the minimum and maximum values for a column in a result set.

What is the purpose of knowing the first and last record? What do you want to do with it afterwards? And yes, a window function is the best solution for such problems. If you need to aggregate, just use the result of your first query as a table in your aggregation query, and it works without a problem.
– Grzegorz Grabek, Commented Aug 31, 2018 at 15:20
The result set will be sent off to another system asynchronously, and that system will want to know which part of the result set it's looking at. It would be difficult to apply these notations after the result set is produced.
– Opux, Commented Aug 31, 2018 at 15:25
There is no such thing as the "first" or "last" row in a relational database. Rows in a table are not sorted in any way. So unless you specify a sort definition you can't tell what the "first" row is.
– user330315, Commented Aug 31, 2018 at 16:44
You have an empty window definition, OVER (), which means: anything goes! Compare that to the window definitions in my answer, which do impose an order.
– joop, Commented Aug 31, 2018 at 17:22
@Opux, to be honest, I think you have overcomplicated things a lot. If you don't order the records, then first and last are totally random. More importantly, when the data goes to another system it can be read in a completely different order than in your statement. You will have just two records flagged as first and last out of a few hundred, thousands, or millions. The usefulness is close to zero, or I am missing something very important and don't understand why you want to do it that way.
– Grzegorz Grabek, Commented Aug 31, 2018 at 17:36
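
Grzegorz Grabek's suggestion of using the flagged query as a table inside the aggregation query might look roughly like the following. This is a hedged sketch: blah and unique_column are the question's placeholders, and the alias flagged plus the output column names are made up for illustration.

```sql
-- Window functions cannot appear inside an aggregate call,
-- but an aggregate can read flags computed in a subquery.
SELECT count(*)                            AS total_rows,
       min(unique_column) FILTER (WHERE f) AS first_key,  -- value on the row flagged first
       min(unique_column) FILTER (WHERE l) AS last_key    -- value on the row flagged last
FROM (
    SELECT unique_column,
           row_number() OVER (ORDER BY unique_column)      = 1 AS f,
           row_number() OVER (ORDER BY unique_column DESC) = 1 AS l
    FROM blah
) AS flagged;  -- hypothetical alias
```

The FILTER clause is available from PostgreSQL 9.4 onward, so it should work on the 9.6 installation mentioned in the question.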

joop · Accepted Answer · 2018-08-31 16:41:27Z

1

You can use the lead() and lag() window functions (over the appropriate window) and compare them to NULL:

-- \i tmp.sql

CREATE TABLE ztable
( id SERIAL PRIMARY KEY
  , starttime TIMESTAMP
);

INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '1 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '2 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '3 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '4 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '5 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '6 minute');

SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY id )
ORDER BY id
        ;


SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY starttime )
ORDER BY id
        ;

SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY starttime )
ORDER BY random()
        ;

Result:

INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  1 | 2018-08-31 18:38:45.567393 | f        | t
  2 | 2018-08-31 18:37:45.575586 | f        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  5 | 2018-08-31 18:34:45.600619 | f        | f
  6 | 2018-08-31 18:33:45.60907  | t        | f
(6 rows)

 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  1 | 2018-08-31 18:38:45.567393 | t        | f
  2 | 2018-08-31 18:37:45.575586 | f        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  5 | 2018-08-31 18:34:45.600619 | f        | f
  6 | 2018-08-31 18:33:45.60907  | f        | t
(6 rows)

 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  2 | 2018-08-31 18:37:45.575586 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  6 | 2018-08-31 18:33:45.60907  | f        | t
  5 | 2018-08-31 18:34:45.600619 | f        | f
  1 | 2018-08-31 18:38:45.567393 | t        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
(6 rows)

[updated: added a randomly sorted case]
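
Note that lead() looks at the next row and lag() at the previous one, so in the queries above the is_first and is_last labels come out swapped relative to the window order (in the first result, id 1 is flagged is_last). A minimal variant with the two functions exchanged, reusing the ztable definition from this answer, could look like this:

```sql
SELECT id, starttime
     , (lag(id)  OVER www IS NULL) AS is_first  -- no previous row in window order
     , (lead(id) OVER www IS NULL) AS is_last   -- no following row in window order
FROM ztable
WINDOW www AS (ORDER BY id)
ORDER BY id;
```

The NULL test is only unambiguous because id is a primary key and therefore never NULL itself; with a nullable column, a NULL from lag() or lead() could also be a genuine neighbouring value.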

edited Aug 31, 2018 at 16:41

answered Aug 31, 2018 at 16:09


5 Comments

Opux Over a year ago

What's that first SELECT supposed to be? The data seems wrong. It also seems I'd still need to sort. I tried removing the ORDER BYs from it, and the data was also wrong.

joop Over a year ago

I added an extra query to illustrate that the result does not depend on the sorting order.

Opux Over a year ago

My expectation is that the first record will be called the first and the last will be called the last. The 1st query is fine for Jesus (Matt 20:16). The 2nd gives correct results, but it sorts.

joop Over a year ago

The first/last decision is made based on the window definition; the final order of the results is unrelated. (And apart from the ordering, the results for queries #2 and #3 are identical.)

Opux Over a year ago

I just edited my question w/a new roadblock. Maybe it makes it clearer what I'm trying to accomplish.

Abelisto · Accepted Answer · 2018-08-31 22:51:45Z

1

It is simple using window functions with particular frames:

with t(x, y) as (select generate_series(1,5), random()) 
select *,
  count(*) over (rows between unbounded preceding and current row),
  count(*) over (rows between current row and unbounded following)
from t;
┌───┬───────────────────┬───────┬───────┐
│ x │         y         │ count │ count │
├───┼───────────────────┼───────┼───────┤
│ 1 │ 0.543995119165629 │     1 │     5 │
│ 2 │ 0.886343683116138 │     2 │     4 │
│ 3 │ 0.124682310037315 │     3 │     3 │
│ 4 │ 0.668972567655146 │     4 │     2 │
│ 5 │ 0.266671542543918 │     5 │     1 │
└───┴───────────────────┴───────┴───────┘

As you can see, count(*) over (rows between unbounded preceding and current row) returns the number of rows from the beginning of the data set to the current row, and count(*) over (rows between current row and unbounded following) returns the number of rows from the current row to the end of the data set. A value of 1 indicates the first or last row, respectively.

This works until you sort the result set with ORDER BY. In that case you need to repeat the same ordering in the window definitions:

with t(x, y) as (select generate_series(1,5), random()) 
select *,
  count(*) over (order by y rows between unbounded preceding and current row),
  count(*) over (order by y rows between current row and unbounded following)
from t order by y;
┌───┬───────────────────┬───────┬───────┐
│ x │         y         │ count │ count │
├───┼───────────────────┼───────┼───────┤
│ 1 │ 0.125781774986535 │     1 │     5 │
│ 4 │  0.25046408502385 │     2 │     4 │
│ 5 │ 0.538880597334355 │     3 │     3 │
│ 3 │ 0.802807193249464 │     4 │     2 │
│ 2 │ 0.869908029679209 │     5 │     1 │
└───┴───────────────────┴───────┴───────┘
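
If boolean flags like the f and l columns in the question are wanted instead of counts, one option (a sketch built on the same query; the column names is_first and is_last are made up here) is to compare each count to 1:

```sql
WITH t(x, y) AS (SELECT generate_series(1,5), random())
SELECT *,
       count(*) OVER (ORDER BY y ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 AS is_first,
       count(*) OVER (ORDER BY y ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) = 1 AS is_last
FROM t
ORDER BY y;  -- must match the ORDER BY used inside both windows
```

Because y comes from random(), the row that gets each flag will differ between runs; only the position of the flags (first and last row of the sorted output) stays the same.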

PS: As mentioned by a_horse_with_no_name in the comment:

there is no such thing as the "first" or "last" row without sorting.

answered Aug 31, 2018 at 22:51


2 Comments

Opux Over a year ago

That seems to work, too. A bit wordy though. Was I just lucky that first_value(unique_column) OVER () = unique_column worked for me and it might not work somewhere down the line? I found that if I sorted, then I was able to get it to work by putting my main query in a subquery, then put the first_value() stuff in the superquery.

Abelisto Over a year ago

@Opux Here is a great article with a detailed explanation of how window functions work: red-gate.com/simple-talk/sql/t-sql-programming/…

Bruno Paulino · Accepted Answer · 2018-08-31 15:47:37Z

0

In fact, Window Functions are a great approach and for that requirement of yours, they are awesome.

Regarding efficiency, window functions work over the data set already at hand, which means the DBMS only adds some extra processing to infer the first/last values.

Just one thing I'd like to suggest: I like to put an ORDER BY criteria inside the OVER clause, just to ensure the data set order is the same between multiple executions, thus returning the same values to you.

answered Aug 31, 2018 at 15:47


1 Comment

Opux Over a year ago

It looks like you are right about the sorting. I just edited my question w/a new development. If I need to sort, then I don't think this thing is worth doing. My next endeavour is to better understand OVER, in case that offers a solution.

Pallav Kabra · Accepted Answer · 2018-08-31 16:42:43Z

0

Try using

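-- Each branch needs parentheses so that ORDER BY / LIMIT apply to it
-- alone and not to the result of the UNION ALL.
(SELECT columns
 FROM mytable
 Join conditions
 WHERE conditions
 ORDER BY date DESC
 LIMIT 1)

UNION ALL

(SELECT columns
 FROM mytable
 Join conditions
 WHERE conditions
 ORDER BY date ASC
 LIMIT 1)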

This SELECT cut the processing time in half. You can also add an index.

edited Aug 31, 2018 at 16:42 by user330315

answered Aug 31, 2018 at 15:17


3 Comments

Opux Over a year ago

Yeah, but, therein lies the sorting, which is the kind of thing I wanted to avoid, if possible. AFAIK, window functions don't require sorting

Pallav Kabra Over a year ago

Refer, stackoverflow.com/questions/1485391/…

user330315 Over a year ago

@Opux: there is no such thing as the "first" or "last" row without sorting. And window functions - if you use them correctly - will require sorting as well
