Insert unique rows into a table with PostgreSQL

Question

I have some code statistics in Greenplum table A:

| id  | file   | repo | lang | line |
-------------------------------------
| a   | /a.txt | r1   | txt  | 3    |
| a   | /b.c   | r1   | c    | 5    |
| b   | /x.java| r1   | java | 33   |
| c   | /f.cpp | r2   | c++  | 23   |
| a   | /a.txt | r3   | txt  | 3    |
| a   | /b.c   | r3   | c    | 5    |

But the last two rows show code that actually comes from repo r1, because their commit id is the same as in the first two rows. I want to remove these duplicate rows and insert the result into table B:

| id  | file   | repo | lang | line |
-------------------------------------
| a   | /a.txt | r1   | txt  | 3    |
| a   | /b.c   | r1   | c    | 5    |
| b   | /x.java| r1   | java | 33   |
| c   | /f.cpp | r2   | c++  | 23   |

A row is uniquely identified by id + file + repo.

Thanks in advance.

Salman Arshad · Accepted Answer · 2019-10-22 14:39:02Z

You can use NOT EXISTS to keep a row only when no other row with the same id and file has a smaller repo:

SELECT *
FROM t
WHERE NOT EXISTS (
    SELECT 1
    FROM t AS x
    WHERE x.id   = t.id
    AND   x.file = t.file
    AND   x.repo < t.repo
)
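Because the question asks for the result to go into table B, here is a minimal sketch that wraps the same NOT EXISTS query in INSERT ... SELECT and runs it on the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for Greenplum/PostgreSQL (an assumption; the query uses only standard SQL that both accept), and the lowercase table names a and b and the function name dedupe_not_exists are illustrative.

```python
import sqlite3

# Sample data from the question: (id, file, repo, lang, line).
ROWS = [
    ("a", "/a.txt", "r1", "txt", 3),
    ("a", "/b.c", "r1", "c", 5),
    ("b", "/x.java", "r1", "java", 33),
    ("c", "/f.cpp", "r2", "c++", 23),
    ("a", "/a.txt", "r3", "txt", 3),
    ("a", "/b.c", "r3", "c", 5),
]

def dedupe_not_exists(rows):
    """Copy rows of a into b, keeping only the smallest repo per (id, file)."""
    con = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
    cols = "id TEXT, file TEXT, repo TEXT, lang TEXT, line INTEGER"
    con.execute(f"CREATE TABLE a ({cols})")
    con.execute(f"CREATE TABLE b ({cols})")
    con.executemany("INSERT INTO a VALUES (?, ?, ?, ?, ?)", rows)
    # Same NOT EXISTS filter as the answer, wrapped in INSERT ... SELECT.
    con.execute("""
        INSERT INTO b
        SELECT *
        FROM a AS t
        WHERE NOT EXISTS (
            SELECT 1
            FROM a AS x
            WHERE x.id   = t.id
            AND   x.file = t.file
            AND   x.repo < t.repo
        )
    """)
    result = con.execute("SELECT * FROM b ORDER BY id, file").fetchall()
    con.close()
    return result

for row in dedupe_not_exists(ROWS):
    print(row)
```

The four printed rows match table B from the question: the r3 copies of a, /a.txt and a, /b.c are dropped because an r1 row with the same id and file exists.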

edited Oct 22, 2019 at 14:39

answered Oct 22, 2019 at 12:45

4 Comments

jamee Over a year ago

What does x.repo < t.repo mean?

Salman Arshad Over a year ago

You seem to have different repos for a, /a.txt: r1 and r3. This condition keeps r1 and discards r3, which seems to match the expected output.

jamee Over a year ago

There may be more than two repos. Can this condition keep just one and discard the others?

Salman Arshad Over a year ago

That is what it should do. With < it keeps the smallest repo (compared as strings) and discards all others; replace < with > to keep the greatest instead. It assumes that two rows with the same id and file always have different repo values.
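To illustrate the comment above, here is a minimal sketch that runs the NOT EXISTS filter with either operator on hypothetical data where one file appears in three repos. It uses Python's sqlite3 module as a stand-in for PostgreSQL, and the function keep_one_repo and its op parameter are illustrative names, not part of the answer.

```python
import sqlite3

def keep_one_repo(rows, op):
    """Return the (id, file, repo) rows that survive the NOT EXISTS filter.

    op="<" keeps the smallest repo per (id, file); op=">" keeps the greatest.
    """
    if op not in ("<", ">"):
        raise ValueError("op must be '<' or '>'")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT, file TEXT, repo TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # op is validated above, so interpolating it into the SQL text is safe here.
    query = f"""
        SELECT id, file, repo
        FROM t
        WHERE NOT EXISTS (
            SELECT 1
            FROM t AS x
            WHERE x.id = t.id AND x.file = t.file AND x.repo {op} t.repo
        )
        ORDER BY id, file
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Hypothetical data: the same file appears in three repos.
rows = [("a", "/a.txt", "r1"), ("a", "/a.txt", "r2"), ("a", "/a.txt", "r3")]
print(keep_one_repo(rows, "<"))  # [('a', '/a.txt', 'r1')]
print(keep_one_repo(rows, ">"))  # [('a', '/a.txt', 'r3')]
```

Two caveats follow from the strict comparison. If two rows share id, file, and repo, neither has a strictly smaller repo, so both are kept (the assumption stated in the comment). And because repo is compared as text, "r10" sorts before "r2".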

Gordon Linoff · Accepted Answer · 2019-10-22 12:46:36Z

Aggregation would seem to do what you want:

select id, file, min(repo) as repo, lang, line
from t
group by id, file, lang, line;
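A minimal sketch of the aggregation approach on the question's sample rows, again using Python's sqlite3 module as a stand-in for Greenplum/PostgreSQL (an assumption). The function name dedupe_group_by is illustrative, and an ORDER BY is added only to make the output order stable.

```python
import sqlite3

# Sample data from the question: (id, file, repo, lang, line).
ROWS = [
    ("a", "/a.txt", "r1", "txt", 3),
    ("a", "/b.c", "r1", "c", 5),
    ("b", "/x.java", "r1", "java", 33),
    ("c", "/f.cpp", "r2", "c++", 23),
    ("a", "/a.txt", "r3", "txt", 3),
    ("a", "/b.c", "r3", "c", 5),
]

def dedupe_group_by(rows):
    """Collapse rows that agree on id, file, lang, and line, keeping MIN(repo)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT, file TEXT, repo TEXT, lang TEXT, line INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT id, file, MIN(repo) AS repo, lang, line
        FROM t
        GROUP BY id, file, lang, line
        ORDER BY id, file
    """).fetchall()
    con.close()
    return result

for row in dedupe_group_by(ROWS):
    print(row)

# Hypothetical caveat: if copies differ in lang or line, they fall into
# separate groups, so both rows survive.
mismatched = [("a", "/a.txt", "r1", "txt", 3), ("a", "/a.txt", "r3", "txt", 4)]
print(len(dedupe_group_by(mismatched)))  # 2
```

On the sample data this yields the same four rows as table B. Unlike the NOT EXISTS version, it only treats rows as duplicates when lang and line also match.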

answered Oct 22, 2019 at 12:46
