Is there an aggregate function that could return first non-null value within a group?

Question

I'm using Oracle XE 10g.

Please read my question carefully. I have a weird use case for this, but please bear with me.

Let's say I have the following records:

Table person
Name  YearOfBirth
a     null
a     2001
a     2002
b     1990
b     null
c     null
c     2001
c     2009

Basically if I do the following query:

select
  p.Name, max(p.YearOfBirth)
from
  person p
group by
  p.Name

That will give me records with distinct Names, each paired with the maximum YearOfBirth within its group. In the given example, for the group where Name='a', the maximum YearOfBirth is 2002.

If max() is an aggregate function that returns the maximum value of a column in a given group, is there a function that returns the first value within the group that is not null? Instead of the maximum value, I want the first value it can find, as long as it is not null.

Please don't ask me why I can't simply use min() or max() instead.

Obviously I can't use rownum here as some might suggest because doing so will limit the number of groups I could get.
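In procedural terms, the requested aggregate behaves roughly like the Python sketch below. It is not an Oracle feature. first_non_null_by_group is a hypothetical helper, and "first" simply means first in the order the rows are supplied. A table does not guarantee any such order without an ORDER BY. Groups whose values are all null map to None instead of being dropped.

```python
from __future__ import annotations

from typing import Iterable, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def first_non_null_by_group(rows: Iterable[tuple[K, Optional[V]]]) -> dict[K, Optional[V]]:
    """For each key, return the first non-None value in iteration order.

    Keys whose values are all None map to None, so no group is dropped.
    """
    result: dict = {}
    for key, value in rows:
        if key not in result:
            result[key] = value      # first row of the group (may be None)
        elif result[key] is None:
            result[key] = value      # replace a None placeholder; a found value is kept
    return result


# The rows from the question, in the order they are listed there.
person = [("a", None), ("a", 2001), ("a", 2002),
          ("b", 1990), ("b", None),
          ("c", None), ("c", 2001), ("c", 2009)]
print(first_non_null_by_group(person))  # {'a': 2001, 'b': 1990, 'c': 2001}
```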

How do you define "first"? Rows in a table don't have a defined order unless your table is an IOT (Index Organized Table) or you are processing rows returned from a SELECT with an "ORDER BY". – George3, Commented Oct 17, 2011 at 4:12
Please define "first". Data in a table is unordered; the order in which results are returned could change at any time. The concept of "first" only makes sense if it can be defined in terms of the data. – Shannon Severance, Commented Oct 17, 2011 at 4:12
@George3: Even in an IOT, there is no defined order, and it is possible to get results back that are not in primary-key order, especially if a fast full scan of the primary key index is performed. See: asktom.oracle.com/pls/apex/… – Shannon Severance, Commented Oct 17, 2011 at 4:17
@Shannon Severance Good point: an IOT has no defined order for retrieval; it is only ordered by primary key for logical storage. – George3, Commented Oct 17, 2011 at 4:30
@Shannon Yeah, I know it doesn't make sense to ask for the "first" row without a definite definition of "first". But that's the point: the solution itself should not depend on any particular way of picking the first. That's exactly the "spec". I know it doesn't make sense, but what the heck, it's a long story. I've never had this use case before. – supertonsky, Commented Oct 17, 2011 at 5:47

Adam Wenger · Accepted Answer · 2011-10-17 04:23:02Z

11

I may be misunderstanding why ROW_NUMBER would not work for you. I do not have Oracle, but I did test this in SQL Server, and I believe it provides the results you requested:

WITH soTable AS
(
   SELECT 'a' AS Name, null AS YearOfBirth
   UNION ALL SELECT 'a', 2001
   UNION ALL SELECT 'a', 2002
   UNION ALL SELECT 'b', 1990
   UNION ALL SELECT 'b', null
   UNION ALL SELECT 'b', 1994
   UNION ALL SELECT 'b', 1981
   UNION ALL SELECT 'c', null
   UNION ALL SELECT 'c', 2009
   UNION ALL SELECT 'c', 2001
)
, soTableNoNulls AS
(
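   -- Number the non-null rows within each Name. ORDER BY so.Name is constant
   -- inside a partition, so which row gets number 1 is arbitrary.
   SELECT so.Name, so.YearOfBirth, ROW_NUMBER() OVER (PARTITION BY so.Name ORDER BY so.Name ASC) AS RowNumber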
   FROM soTable AS so
   WHERE so.YearOfBirth IS NOT NULL
)
SELECT nn.Name, nn.YearOfBirth
FROM soTableNoNulls AS nn
WHERE nn.RowNumber = 1
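The same ROW_NUMBER() idea can be run with Python's built-in sqlite3 module, which stands in for Oracle or SQL Server in this sketch. SQLite needs version 3.25 or later for window functions. The id column is a hypothetical addition. Ordering by id instead of by Name makes the row numbered 1 repeatable, and that is the kind of ordering key the first comment on this answer assumes.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle / SQL Server.
conn = sqlite3.connect(":memory:")
# Hypothetical "id" column: an explicit ordering key so "first" is repeatable.
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, year_of_birth INTEGER)"
)
conn.executemany(
    "INSERT INTO person (name, year_of_birth) VALUES (?, ?)",
    [("a", None), ("a", 2001), ("a", 2002),
     ("b", 1990), ("b", None), ("b", 1994), ("b", 1981),
     ("c", None), ("c", 2009), ("c", 2001)],
)

query = """
WITH numbered AS (
    SELECT name, year_of_birth,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
    FROM person
    WHERE year_of_birth IS NOT NULL   -- drop NULLs before numbering
)
SELECT name, year_of_birth FROM numbered WHERE rn = 1 ORDER BY name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('a', 2001), ('b', 1990), ('c', 2009)]
```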

edited Oct 17, 2011 at 4:23

answered Oct 17, 2011 at 4:14

Adam Wenger



7 Comments

Adam Wenger Over a year ago

I'm making the assumption here that there is a primary key driving order so the 'first' record would be consistent.

Shannon Severance Over a year ago

It doesn't look like you use the RowNumber column from soTableNoNulls. If it's not needed, it would be best to remove it. I think you could cut that down to one CTE instead of two. (Not counting the CTE with test data.) (CTE = Common Table Expression, usually called subquery factoring in Oracle.)

Adam Wenger Over a year ago

Thanks, noticed that too late after I posted the answer. It's removed now.

supertonsky Over a year ago

Fantastic! I don't know how "Partition By" exactly works but you made it work. Thanks Adam. BTW, there's no primary key. It is possible to get more than one record with the same names and the same YearOfBirths. Would that be a problem?

Adam Wenger Over a year ago

Brent Ozar wrote a good post about how PARTITION BY works in ROW_NUMBER (his post has information on other aggregate functions as well) brentozar.com/archive/2011/07/leaving-windows-open
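The rough Python model below shows what PARTITION BY does inside ROW_NUMBER(). The row_number helper is hypothetical and only mimics the numbering. Rows are grouped by the partition key and ordered within each group. The numbering then starts again at 1 for every group.

```python
from itertools import groupby
from operator import itemgetter


def row_number(rows, partition_key, order_key):
    """Mimic ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) on a list of rows.

    Returns (row, n) pairs; n restarts at 1 in every partition.
    """
    # Sorting by (partition, order) puts each partition's rows next to each
    # other, already in the window's ORDER BY order.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    numbered = []
    for _, group in groupby(ordered, key=partition_key):
        for n, row in enumerate(group, start=1):
            numbered.append((row, n))
    return numbered


# Non-null rows as (id, name, year_of_birth); id is a hypothetical ordering key.
rows = [(2, "a", 2001), (3, "a", 2002),
        (4, "b", 1990), (6, "b", 1994), (7, "b", 1981),
        (9, "c", 2009), (10, "c", 2001)]
first_rows = [row for row, n in row_number(rows, itemgetter(1), itemgetter(0)) if n == 1]
print(first_rows)  # [(2, 'a', 2001), (4, 'b', 1990), (9, 'c', 2009)]
```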


David Faber · 2015-01-22 15:24:59Z

2

If by "first" you mean the record with the lowest birth year, then you can do the following:

WITH s1 AS
(
   SELECT 'a' AS name, NULL AS birth_year FROM dual
   UNION ALL SELECT 'a', 2001 FROM dual
   UNION ALL SELECT 'a', 2002 FROM dual
   UNION ALL SELECT 'b', 1990 FROM dual
   UNION ALL SELECT 'b', null FROM dual
   UNION ALL SELECT 'b', 1994 FROM dual
   UNION ALL SELECT 'b', 1981 FROM dual
   UNION ALL SELECT 'c', null FROM dual
   UNION ALL SELECT 'c', 2009 FROM dual
   UNION ALL SELECT 'c', 2001 FROM dual
)
SELECT name, birth_year FROM (
    SELECT name, birth_year
         , FIRST_VALUE(birth_year IGNORE NULLS) OVER ( PARTITION BY name ORDER BY birth_year ) AS first_birth_year
      FROM s1
) WHERE birth_year = first_birth_year

The advantage of using FIRST_VALUE() over ROW_NUMBER() is that the former returns multiple rows in the event of ties. For example, if your data had another 'a' born in 2001, the result would look like this:

NAME  BIRTH_YEAR
a     2001
a     2001
b     1981
c     2001

The ROW_NUMBER() solution would return only one of the above rows. However, that could also be solved by using RANK().

If there is some other way of defining "first" (e.g., an entry date column), simply use that in the ORDER BY clause of FIRST_VALUE().
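The RANK() variant mentioned above can be tried with Python's sqlite3 module, which stands in for Oracle in this sketch. The sketch does not rely on IGNORE NULLS. It filters NULLs in the WHERE clause before ranking instead. The extra 'a' row born in 2001 is the tie from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s1 (name TEXT, birth_year INTEGER)")
conn.executemany(
    "INSERT INTO s1 VALUES (?, ?)",
    [("a", None), ("a", 2001), ("a", 2002), ("a", 2001),  # extra 'a' born in 2001
     ("b", 1990), ("b", None), ("b", 1994), ("b", 1981),
     ("c", None), ("c", 2009), ("c", 2001)],
)

# RANK() gives tied rows the same rank, so both 2001 rows for 'a' get rank 1.
query = """
SELECT name, birth_year FROM (
    SELECT name, birth_year,
           RANK() OVER (PARTITION BY name ORDER BY birth_year) AS rnk
    FROM s1
    WHERE birth_year IS NOT NULL
) WHERE rnk = 1
ORDER BY name
"""
result = conn.execute(query).fetchall()
print(result)  # [('a', 2001), ('a', 2001), ('b', 1981), ('c', 2001)]
```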

answered Jan 22, 2015 at 15:24

David Faber


1 Comment

SQLServerSteve Over a year ago

Just FYI for the benefit of anyone looking for a T-SQL equivalent, this solution also works for SQL Server - even though its FIRST_VALUE lacks the IGNORE NULLS clause. You can simply ORDER BY the column DESC if the other values are all null. This helps avoid a lot of awkward joins in pivot queries, as I'm finding out first-hand right now (thanks for the solution)

Alexi Theodore · 2020-09-29 06:57:52Z

1

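In PostgreSQL, you can define a custom aggregate. The transition function must be declared STRICT. With a strict transition function, PostgreSQL skips NULL inputs and uses the first non-NULL input as the initial state. A non-strict version starts from a NULL state, so SELECT $1 keeps returning NULL.

CREATE OR REPLACE FUNCTION first_agg ( anyelement, anyelement )
RETURNS anyelement AS
$$
    SELECT $1;  -- keep the current state: the first non-NULL value seen
$$
LANGUAGE SQL
IMMUTABLE
STRICT  -- NULL inputs are skipped; the first non-NULL input becomes the state
;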

then:

CREATE AGGREGATE first (
        sfunc    = first_agg,
        basetype = anyelement,
        stype    = anyelement
);

test it:

select first((case when a = 1 then null else a end) ORDER BY a NULLS FIRST) from generate_series(1, 100) a; -- => "2"
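Whether first_agg skips NULLs depends on whether it is declared STRICT. With a strict transition function and no INITCOND, PostgreSQL ignores NULL inputs and uses the first non-NULL input as the initial state. A non-strict one is called on every row, starting from a NULL state. The simplified Python model below shows both cases. It is not PostgreSQL code, and run_strict_aggregate and non_strict are hypothetical helper names.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def run_strict_aggregate(sfunc: Callable[[T, T], T],
                         values: Iterable[Optional[T]]) -> Optional[T]:
    """Model a STRICT transition function with no initial condition."""
    state: Optional[T] = None
    started = False
    for value in values:
        if value is None:
            continue                      # strict: NULL rows are ignored
        if not started:
            state, started = value, True  # first non-NULL input becomes the state
        else:
            state = sfunc(state, value)
    return state


def non_strict(sfunc, values):
    """Model a non-strict transition function: called on every row, state starts NULL."""
    state = None
    for value in values:
        state = sfunc(state, value)
    return state


def first_agg(state, value):
    return state                          # like SELECT $1: keep the current state


# Same input as the test query: 1 becomes NULL, rows ordered by a.
inputs = [None if a == 1 else a for a in range(1, 101)]
print(run_strict_aggregate(first_agg, inputs))  # 2
print(non_strict(first_agg, inputs))            # None
```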

answered Sep 29, 2020 at 6:57

Alexi Theodore


3 Comments

Gill Bates Over a year ago

Re-writing it in Plpgsql would make it 2x faster.

Alexi Theodore Over a year ago

@GillBates Not sure about that. Have you tested it, or do you have a reason to claim so? I specifically used SQL because it should be faster, based on the explanation here: stackoverflow.com/questions/24755468/…

Gill Bates Over a year ago

Yes, it's based on practical experience.

Travesty3 · 2023-05-04 18:52:55Z

I found this question while searching for a similar solution for MSSQL.

The main problem I had with the solutions above is that they omit any groups that have no non-null values at all.

With the help from the answers here, combined with the answers from this other question, I came up with this solution for SQL Server:

WITH soTable AS (
  SELECT 'a' AS Name, null AS YearOfBirth
  UNION ALL SELECT 'a', 2001
  UNION ALL SELECT 'a', 2002
  UNION ALL SELECT 'b', 1990
  UNION ALL SELECT 'b', null
  UNION ALL SELECT 'b', 1994
  UNION ALL SELECT 'b', 1981
  UNION ALL SELECT 'c', null
  UNION ALL SELECT 'c', 2009
  UNION ALL SELECT 'c', 2001
  UNION ALL SELECT 'd', null
)
SELECT
  Name,
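  -- STRING_AGG skips NULLs. Appending '|' before CHARINDEX guarantees a match,
  -- so a group with a single value no longer yields a negative SUBSTRING length.
  -- Without WITHIN GROUP (ORDER BY ...), the concatenation order is not guaranteed.
  SUBSTRING(STRING_AGG(YearOfBirth, '|'), 1, CHARINDEX('|', STRING_AGG(YearOfBirth, '|') + '|') - 1) AS YearOfBirth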
FROM
  soTable
GROUP BY
  Name;
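The delimiter trick can be tried with Python's sqlite3 module. In SQLite, group_concat() plays the role of STRING_AGG() and instr() plays the role of CHARINDEX(). This sketch relies on that substitution and adds a hypothetical single-value group 'e'. Appending '|' before searching keeps the substring length from going negative when a group has only one value. As with STRING_AGG, the concatenation order is not guaranteed, so the 'a' and 'c' results can be either of their non-null values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE so_table (name TEXT, year_of_birth INTEGER)")
conn.executemany(
    "INSERT INTO so_table VALUES (?, ?)",
    [("a", None), ("a", 2001), ("a", 2002),
     ("c", None), ("c", 2009), ("c", 2001),
     ("d", None),                 # only NULLs: comes back as NULL, not dropped
     ("e", 1999)],                # hypothetical single-value group: no '|' in the list
)

# group_concat() skips NULLs and returns NULL for an all-NULL group.
# Appending '|' guarantees instr() finds a delimiter, so the length is never negative.
query = """
SELECT name,
       substr(s, 1, instr(s || '|', '|') - 1) AS year_of_birth
FROM (SELECT name, group_concat(year_of_birth, '|') AS s
      FROM so_table GROUP BY name)
ORDER BY name
"""
rows = dict(conn.execute(query).fetchall())
print(rows["d"], rows["e"])  # None 1999
```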
