Function to flag duplicates within a query in PostgreSQL

Question

I would like to write a function that flags duplicates in specified columns in PostgreSQL.

For example, if I had the following table:

country | landscape | household
--------------------------------
TZA     | L01       | HH02
TZA     | L01       | HH03
KEN     | L02       | HH01
RWA     | L03       | HH01

I would like to be able to run the following query:

SELECT country,
       landscape,
       household,
       flag_duplicates(country, landscape) AS flag
FROM mytable

And get the following result:

country | landscape | household | flag
---------------------------------------
TZA     | L01       | HH02      | duplicated
TZA     | L01       | HH03      | duplicated
KEN     | L02       | HH01      |
RWA     | L03       | HH01      |

Inside the body of the function, I think I need something like:

IF (country || landscape IN (SELECT country || landscape FROM mytable
                             GROUP BY country || landscape
                             HAVING count(*) > 1)) THEN 'duplicated'
ELSE NULL

But I am confused about how to pass all of those as arguments. I appreciate the help. I am using PostgreSQL version 9.3.

Marcin H. · Accepted Answer · 2016-12-02 21:36:48Z

You don't need a function to accomplish that. Calling a function for every row in the result set is not a good idea for performance reasons. A much better solution is to use pure SQL (even with subqueries) and give the database engine a chance to optimize it. For your example it would be something like this:

SELECT t.country, t.landscape, t.household,
       CASE WHEN duplicates.cnt > 1 THEN 'duplicated' END AS flag
FROM mytable t
JOIN (
    -- one row per (country, landscape) pair with its household count
    SELECT country, landscape, count(household) AS cnt
    FROM mytable GROUP BY country, landscape
) duplicates ON duplicates.country = t.country AND duplicates.landscape = t.landscape;

which produces exactly the same result.
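
To experiment with the queries in this thread, the sample table from the question can be recreated with a short script. This is only a minimal sketch: the question does not show the table definition, so the column types (text) and the use of a temporary table are assumptions.

```sql
-- Assumed schema: the question gives no column types, so text is used here.
CREATE TEMPORARY TABLE mytable (
    country   text,
    landscape text,
    household text
);

-- Sample rows copied from the question.
INSERT INTO mytable (country, landscape, household) VALUES
    ('TZA', 'L01', 'HH02'),
    ('TZA', 'L01', 'HH03'),
    ('KEN', 'L02', 'HH01'),
    ('RWA', 'L03', 'HH01');
```

With this data, only the two TZA / L01 rows share a (country, landscape) pair, so they are the only rows that should be flagged.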

Update: if you want to use a function at all costs, here is a working example:

CREATE FUNCTION find_duplicates(arg_country varchar, arg_landscape varchar) returns varchar AS $$
BEGIN
    RETURN CASE WHEN count(household)>1 THEN 'duplicated' END FROM mytable
    WHERE country=arg_country AND landscape=arg_landscape
    GROUP BY country,landscape;
END
$$
LANGUAGE plpgsql STABLE;
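
Assuming the function above has been created, it could be called once per row as sketched below. This mirrors the query from the question, except that it uses the name find_duplicates from this answer rather than the flag_duplicates name the question proposed.

```sql
-- Calls find_duplicates once per row; each call runs its own lookup on mytable.
SELECT country,
       landscape,
       household,
       find_duplicates(country, landscape) AS flag
FROM mytable;
```

Because every row triggers a separate aggregate over mytable, this form is likely to be slower than the join on large tables, which is the performance concern raised at the top of this answer.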

Community · Accepted Answer · 2017-05-23 12:33:50Z

select
  *,
  (count(*) over (partition by country, landscape)) > 1 as flag
from
  mytable;
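
The query above returns a boolean flag. If the text label from the question is preferred, the same window function can be wrapped in a CASE expression. This is a sketch of that variant, not part of the original answer:

```sql
-- count(*) OVER (...) gives every row the size of its (country, landscape) group.
SELECT country,
       landscape,
       household,
       CASE WHEN count(*) OVER (PARTITION BY country, landscape) > 1
            THEN 'duplicated'
       END AS flag  -- NULL when the pair occurs only once
FROM mytable;
```

Window functions are available in PostgreSQL 9.3, so this needs neither a join nor a user-defined function.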

For a function, see @MarcinH's answer, but add STABLE to the function's definition to make its calls faster.

edited May 23, 2017 at 12:33

CommunityBot

answered Dec 2, 2016 at 19:52

Abelisto
