SQL - insert rows preserving overall number of duplicate values

Question

I have a table, say, Product (Id, Name):

Id	Name
1	'one'
2	'two'
3	'three'

I have a new version of data that is needed to update the table, with possible duplicate values.

For example:

Name
'one'
'two'
'two'
'two'
'four'

I need the result table to consist of as many rows of each "Name" as there are in the second table, i.e.

Id	Name
1	'one'
2	'two'
3	'three'
4	'two'
5	'two'
6	'four'

So, if the value, like 'one' already exists, we don't need to insert a duplicate, but if there are more rows of the same value in the second table, like 'two' or 'four', we need to insert respectively 2 and 1 more values into the table.

How do I do it using SQL?

I've tried answers that suggest inserting into the table without duplicates, which is not my task.

delete "one" "two" and "four" in the table, then insert your new data. — topsail
– topsail, Commented Jul 3, 2024 at 14:13
@topsail that is not the preferable solution for my task, but thanks for your suggestion — Victor Khotin
– Victor Khotin, Commented Jul 3, 2024 at 15:18

Søren Kongstad · Accepted Answer · 2024-07-03 14:13:26Z

First I set up som tables for the testing:

create temp table Product(id  integer primary key generated always as identity, name text);
insert into Product (name) values ('one'),('two'),('three');
select * from Product;

create  temp table newdata(name text);
insert into newdata values ('one'),('two'),('two'),('two'),('four');
select * from newdata;

Then I use the analytical function "row_number()" which creates row numbers on the result. That is just a way of counting how many instances of each name is in your list

select name,row_number() over (partition by name order by name) rn from newdata;


+------+----+
| name | rn |
+------+----+
| four |  1 |
| one  |  1 |
| two  |  1 |
| two  |  2 |
| two  |  3 |
+------+----+

For product:

select name,row_number() over (partition by name order by name) rn from product


+-------+----+
| name  | rn |
+-------+----+
| one   |  1 |
| three |  1 |
| two   |  1 |
+-------+----+

Finding the differences between the two tells me which names I am missing:

select name,row_number() over (partition by name order by name) rn from newdata
except
select name,row_number() over (partition by name order by name) rn from product;


+------+----+
| name | rn |
+------+----+
| two  |  3 |
| four |  1 |
| two  |  2 |
+------+----+

Then I just need to insert the missing names:

insert into product (name)
select name from (
select name,row_number() over (partition by name order by name) rn from newdata
except
select name,row_number() over (partition by name order by name) rn from product
) a;
select * from product


+----+-------+
| id | name  |
+----+-------+
|  1 | one   |
|  2 | two   |
|  3 | three |
|  4 | two   |
|  5 | four  |
|  6 | two   |
+----+-------+

Collectives™ on Stack Overflow

SQL - insert rows preserving overall number of duplicate values

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related