PostgreSQL GROUP BY issue

Question

Suppose I have the table

      id      |       name       | number  |                address
--------------+------------------+---------+-------------------------------------
 1            | channel A        |      0  | http://stream01
 2            | channel B        |      2  | http://stream02
 3            | channel C        |      16 | http://stream03
 4            | channel B        |      2  | http://stream04
 5            | channel B        |      16 | http://stream05
 6            | channel C        |      16 | http://stream06
 7            | channel A        |      7  | http://stream07
 8            | channel A        |      5  | http://stream08
 9            | channel A        |      0  | http://stream09
...etc

I want to remove duplicate channels (rows with the same name and number). But I want the result to contain the other columns along with name and number.

The problem is deciding which id and address to keep once the duplicates are removed. I'm happy to take the first one found. So, for example, the result from the above table should be

      id      |       name       | number  |                address
--------------+------------------+---------+-------------------------------------
 1            | channel A        |      0  | http://stream01
 2            | channel B        |      2  | http://stream02
 3            | channel C        |      16 | http://stream03
 5            | channel B        |      16 | http://stream05
 7            | channel A        |      7  | http://stream07
 8            | channel A        |      5  | http://stream08
...etc

I realise I'll probably need a SELECT name,number FROM table GROUP BY name,number somewhere in my query, and that the query should start with SELECT id,name,number,address FROM (..), but I can't think of a way to do this in one query.

Any ideas?

By "remove", do you mean you want to delete the rows, or just leave them out of the result?
– user330315, Commented Mar 8, 2012 at 14:08
Just leave them out of the result. I have a feeling I almost have it: SELECT id,name,number,address FROM table AS t JOIN (SELECT name,number FROM table GROUP BY name,number) AS j USING(name,number). It didn't quite work.
– tbh1, Commented Mar 8, 2012 at 14:15
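The attempt in the comment above returns every row: each row's (name, number) pair always appears in the grouped subquery, so the join filters nothing out. A minimal sketch of one way to repair it, assuming "first" means the lowest id (as in the expected result): let the subquery return one id per group and join on id instead. The question never names its table, so the sample rows are supplied in a CTE with the hypothetical name channels.

```sql
-- Sample rows from the question; "channels" is a hypothetical name.
WITH channels(id, name, number, address) AS (
  VALUES (1, 'channel A',  0, 'http://stream01'),
         (2, 'channel B',  2, 'http://stream02'),
         (3, 'channel C', 16, 'http://stream03'),
         (4, 'channel B',  2, 'http://stream04'),
         (5, 'channel B', 16, 'http://stream05'),
         (6, 'channel C', 16, 'http://stream06'),
         (7, 'channel A',  7, 'http://stream07'),
         (8, 'channel A',  5, 'http://stream08'),
         (9, 'channel A',  0, 'http://stream09')
)
SELECT t.id, t.name, t.number, t.address
FROM channels AS t
-- The subquery yields one id per (name, number) group: the lowest.
-- Joining on id (not on name, number) keeps only that row from each group.
JOIN (SELECT min(id) AS id
      FROM channels
      GROUP BY name, number) AS j USING (id)
ORDER BY t.id;
```

With this data it should return ids 1, 2, 3, 5, 7 and 8, the same rows as the expected result in the question.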

Gavin · Accepted Answer · 2012-03-08 14:18:13Z

4

SELECT DISTINCT ON (name,number)
       id,
       name,
       number,
       address
  FROM the_table  -- "table" is a reserved word in PostgreSQL; use your real table name
 ORDER BY name,number,id;
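A self-contained sketch of the same query against the question's sample rows (again in a CTE with the hypothetical name channels). DISTINCT ON keeps the first row of each (name, number) group according to the ORDER BY, so the trailing id is what makes "first" mean "lowest id"; without it, PostgreSQL does not guarantee which row of a group survives.

```sql
-- Sample rows from the question; "channels" is a hypothetical name.
WITH channels(id, name, number, address) AS (
  VALUES (1, 'channel A',  0, 'http://stream01'),
         (2, 'channel B',  2, 'http://stream02'),
         (3, 'channel C', 16, 'http://stream03'),
         (4, 'channel B',  2, 'http://stream04'),
         (5, 'channel B', 16, 'http://stream05'),
         (6, 'channel C', 16, 'http://stream06'),
         (7, 'channel A',  7, 'http://stream07'),
         (8, 'channel A',  5, 'http://stream08'),
         (9, 'channel A',  0, 'http://stream09')
)
-- One row per (name, number); id as the last sort key picks the lowest id.
SELECT DISTINCT ON (name, number)
       id, name, number, address
FROM channels
ORDER BY name, number, id;
```

With this data it should return ids 1, 8, 7, 2, 5 and 3, sorted by name and then number.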

answered Mar 8, 2012 at 14:18

Gavin

1 Comment

tbh1 Over a year ago

Thank you. I tried DISTINCT ON earlier today, but it didn't work. That's because my actual query has an ORDER BY at the end and was throwing the error "SELECT DISTINCT ON expressions must match initial ORDER BY expressions". I can fix this by wrapping the entire query in parentheses, prepending SELECT * FROM, and putting my ORDER BY at the end.
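A sketch of the workaround described in this comment. The ORDER BY attached to DISTINCT ON must begin with the DISTINCT ON expressions, so the deduplication happens in a subquery and the final ordering goes on the outer query. ORDER BY address DESC is only a hypothetical stand-in for the real query's ordering, which the comment does not show, and channels is again a hypothetical name. The subquery carries an alias because PostgreSQL versions before 16 require one for a subquery in FROM.

```sql
-- Sample rows from the question; "channels" is a hypothetical name.
WITH channels(id, name, number, address) AS (
  VALUES (1, 'channel A',  0, 'http://stream01'),
         (2, 'channel B',  2, 'http://stream02'),
         (3, 'channel C', 16, 'http://stream03'),
         (4, 'channel B',  2, 'http://stream04'),
         (5, 'channel B', 16, 'http://stream05'),
         (6, 'channel C', 16, 'http://stream06'),
         (7, 'channel A',  7, 'http://stream07'),
         (8, 'channel A',  5, 'http://stream08'),
         (9, 'channel A',  0, 'http://stream09')
)
SELECT *
FROM (
  -- Inner query: this ORDER BY must start with name, number.
  SELECT DISTINCT ON (name, number)
         id, name, number, address
  FROM channels
  ORDER BY name, number, id
) AS deduped
-- Outer query: free to sort the surviving rows any way you like.
ORDER BY address DESC;
```

The outer query re-sorts the six surviving rows, here from http://stream08 down to http://stream01.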

MK_ · Accepted Answer · 2012-03-08 14:20:38Z

0

That should be enough:

 SELECT MIN(id), name, number, address FROM table GROUP BY name, number

answered Mar 8, 2012 at 14:20

MK_

1 Comment

user330315 Over a year ago

It will throw an error: you need to either apply an aggregate function to address or include it in the GROUP BY clause.
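Of the two fixes in that comment, only aggregating address actually removes duplicates. Adding address to the GROUP BY makes each distinct (name, number, address) combination its own group, and since every sample row has a different address, nothing is collapsed. A sketch using the question's rows in a CTE with the hypothetical name channels:

```sql
-- Sample rows from the question; "channels" is a hypothetical name.
WITH channels(id, name, number, address) AS (
  VALUES (1, 'channel A',  0, 'http://stream01'),
         (2, 'channel B',  2, 'http://stream02'),
         (3, 'channel C', 16, 'http://stream03'),
         (4, 'channel B',  2, 'http://stream04'),
         (5, 'channel B', 16, 'http://stream05'),
         (6, 'channel C', 16, 'http://stream06'),
         (7, 'channel A',  7, 'http://stream07'),
         (8, 'channel A',  5, 'http://stream08'),
         (9, 'channel A',  0, 'http://stream09')
)
-- Grouping by address too: rows 2 and 4 (both channel B, 2) have different
-- addresses, so they fall into different groups and both survive.
SELECT min(id) AS id, name, number, address
FROM channels
GROUP BY name, number, address
ORDER BY min(id);
```

This runs without error but returns all nine rows, so it does not answer the question.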

Mike Sherrill 'Cat Recall' · Accepted Answer · 2012-03-08 14:36:05Z

I think the most understandable way to do this is with views or common table expressions. I'll use common table expressions.

create table test (
  id integer primary key,
  name varchar(20) not null,
  number integer not null,
  address varchar(30) not null
);

insert into test values 
(1, 'channel A', 0, 'http://stream01'),
(2, 'channel B', 2,   'http://stream02'),
(3, 'channel C', 16,  'http://stream03'),
(4, 'channel B', 2,   'http://stream04'),
(5, 'channel B', 16,  'http://stream05'),
(6, 'channel C', 16, 'http://stream06'),
(7, 'channel A', 7, 'http://stream07'),
(8, 'channel A', 5, 'http://stream08'),
(9, 'channel A', 0, 'http://stream09');

with unique_name_num as (
  select distinct name, number
  from test
),
min_id as (
  select number, name, min(id) id
  from test
  group by number, name
)
select t.*
from test t
inner join unique_name_num u on u.name = t.name and u.number = t.number
inner join min_id m on m.number = t.number and m.name = t.name and m.id = t.id
order by t.name, t.number

user330315 · Accepted Answer · 2012-03-08 14:52:02Z

0

SELECT min(id),
       name,
       number,
       min(address)
FROM the_table
GROUP BY name, number;

Edit:
If you need the id and address to come from the same row, the following is another solution:

SELECT id, 
       name, 
       number, 
       address
FROM ( 
  SELECT id,
         name,
         number,
         address, 
         row_number() over (partition by name, number order by id) as rn
  FROM the_table
) t
WHERE rn = 1

edited Mar 8, 2012 at 14:52

answered Mar 8, 2012 at 14:19

user330315

4 Comments

Mike Sherrill 'Cat Recall' Over a year ago

If min(id) and min(address) aren't in the same row, this will have the effect of manufacturing a row that isn't in the original table, won't it?

user330315 Over a year ago

Yes, you are correct. I understood the question to mean that tbh1 does not care which values are returned.

tbh1 Over a year ago

Thanks for this. My actual query is a fair bit more involved than the dumbed-down one in the question. Wrapping everything in min() doesn't strike me as the best solution in my case, but thanks.

user330315 Over a year ago

@tbh1: the version with the windowing function might be more flexible then.
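Mike Sherrill's point about min(id) and min(address) can be seen with two rows for one channel where the lowest id and the alphabetically smallest address belong to different rows. These rows are invented for illustration and are not from the question; channels is a hypothetical name.

```sql
-- Two hypothetical rows for the same channel: the lowest id (1) and the
-- smallest address (http://stream01) are on different rows.
WITH channels(id, name, number, address) AS (
  VALUES (1, 'channel A', 0, 'http://stream09'),
         (9, 'channel A', 0, 'http://stream01')
)
-- Each min() is computed independently, so the result pairs id 1 with
-- http://stream01, a combination that appears in neither input row.
SELECT min(id) AS id, name, number, min(address) AS address
FROM channels
GROUP BY name, number;
```

The row_number() version avoids this, because it returns one whole original row from each group.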
