I would like to write a function that flags duplicates in specified columns in PostgreSQL.

For example, if I had the following table:

country | landscape | household
--------------------------------
TZA     | L01       | HH02
TZA     | L01       | HH03
KEN     | L02       | HH01
RWA     | L03       | HH01

I would like to be able to run the following query:

SELECT country,
       landscape,
       household,
       flag_duplicates(country, landscape) AS flag
FROM mytable

And get the following result:

country | landscape | household | flag
---------------------------------------
TZA     | L01       | HH02      | duplicated
TZA     | L01       | HH03      | duplicated
KEN     | L02       | HH01      |
RWA     | L03       | HH01      |

Inside the body of the function, I think I need something like:

IF (country || landscape IN (SELECT country || landscape FROM mytable
                             GROUP BY country || landscape
                             HAVING count(*) > 1)) THEN 'duplicated'
ELSE NULL

But I am confused about how to pass all of those as arguments. I appreciate the help. I am using PostgreSQL version 9.3.

2 Answers

You don't need a function to accomplish that. Calling a function for every row in the result set is not a good idea for performance reasons. A much better solution is to use pure SQL (even with subqueries) and give the database engine a chance to optimize it. In your example it would be something like this:

SELECT t.country, t.landscape, t.household,
       CASE WHEN duplicates.cnt > 1 THEN 'duplicated' END AS flag
FROM mytable t
JOIN (
    -- the subquery must expose the grouping columns so the join can use them
    SELECT country, landscape, count(household) AS cnt
    FROM mytable GROUP BY country, landscape
) duplicates ON duplicates.country = t.country AND duplicates.landscape = t.landscape;

which produces exactly the same result.

Update: if you want to use a function at all costs, here is a working example:

CREATE FUNCTION find_duplicates(arg_country varchar, arg_landscape varchar)
RETURNS varchar AS $$
BEGIN
    -- Count the rows sharing this country/landscape pair; CASE without ELSE returns NULL.
    RETURN (SELECT CASE WHEN count(household) > 1 THEN 'duplicated' END
            FROM mytable
            WHERE country = arg_country AND landscape = arg_landscape);
END;
$$
LANGUAGE plpgsql STABLE;
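
Since the question was about how to pass the columns as arguments, here is a minimal usage sketch. It assumes the find_duplicates function above has already been created and that mytable has the three columns shown in the question; each row's own column values are passed as the two arguments.

```sql
-- Assumes find_duplicates(varchar, varchar) exists and mytable has
-- the columns country, landscape and household from the question.
SELECT country,
       landscape,
       household,
       find_duplicates(country, landscape) AS flag
FROM mytable;
```

Note that the function runs its own query against mytable for every row returned, which is the per-row cost the pure SQL version avoids.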

select
  *,
  (count(*) over (partition by country, landscape)) > 1 as flag
from
  mytable;

For a function, see @MarcinH's answer, but add STABLE to the function's definition to make its calls faster.
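
The window-function query returns a boolean flag. If the text flag from the question ('duplicated' or NULL) is wanted instead, one possible variant wraps the same window count in a CASE expression. This is only a sketch: mytable_demo is a hypothetical temporary table loaded with the sample rows from the question, and the text column types are an assumption.

```sql
-- Hypothetical demo table; the column types (text) are an assumption.
CREATE TEMP TABLE mytable_demo (country text, landscape text, household text);

INSERT INTO mytable_demo VALUES
    ('TZA', 'L01', 'HH02'),
    ('TZA', 'L01', 'HH03'),
    ('KEN', 'L02', 'HH01'),
    ('RWA', 'L03', 'HH01');

-- Count the rows in each (country, landscape) group without collapsing them,
-- then map counts above 1 to the text flag. CASE without ELSE yields NULL.
SELECT country,
       landscape,
       household,
       CASE WHEN count(*) OVER (PARTITION BY country, landscape) > 1
            THEN 'duplicated'
       END AS flag
FROM mytable_demo;
```

With the sample rows, only the two TZA / L01 rows should get the 'duplicated' flag. The row order is not guaranteed without an ORDER BY.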
