Finding duplicate values in MySQL

Question

I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?

Since you mentioned find all records, I am assuming you need to know the KEYS as well as the duplicated VALUES in that varchar column.
– TechTravelThink, Commented Mar 27, 2009 at 4:34
I can find the keys easy enough after I get the values, I really just want a list of all the duplicate values.
– Jon Tackabury, Commented Mar 27, 2009 at 13:49

the Tin Man · Accepted Answer · 2012-08-21 23:33:10Z

1844

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
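A minimal sketch of the GROUP BY / HAVING approach, run against Python's built-in sqlite3 module as a stand-in for MySQL. The people table, its rows, and the find_duplicate_values helper are hypothetical names made up for illustration. The sketch repeats COUNT(*) in the HAVING clause instead of using the alias, because not every database accepts an alias there.

```python
import sqlite3


def find_duplicate_values(conn, table, column):
    """Return (value, count) pairs for values that occur more than once."""
    # Identifiers cannot be bound as query parameters, so this sketch
    # assumes `table` and `column` are trusted names, not user input.
    sql = (
        f"SELECT {column}, COUNT(*) AS c FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1 ORDER BY {column}"
    )
    return conn.execute(sql).fetchall()


# Hypothetical sample data: 'alice' appears 3 times, 'bob' twice, 'carol' once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",), ("alice",)],
)

duplicates = find_duplicate_values(conn, "people", "name")
print(duplicates)  # [('alice', 3), ('bob', 2)]
```

Values that occur only once ('carol' here) are filtered out by HAVING, which is applied after the rows have been grouped.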

edited Aug 21, 2012 at 23:33 by the Tin Man

answered Mar 27, 2009 at 4:24 by levik

13 Comments

NobleUplift Over a year ago

But how is this useful if you can't get the IDs of the rows with duplicate values? Yes, you can do a new query matching for each duplicate value, but is it possible to simply list the duplicates?

Matt R. Over a year ago

@NobleUplift You can do a GROUP_CONCAT(id) and it will list the IDs. See my answer for an example.

User Over a year ago

What would it mean if it said ERROR: column "c" does not exist LINE 1?

Monica Heddneck Over a year ago

I'm confused why this is the accepted answer and why it has so many upvotes. The OP asked, "I would like to find all the records that have duplicate values in this column." This answer returns a table of counts. -1

John Hunt Over a year ago

For those that don't understand how HAVING works - it's simply a filter on the result set, so happens after the main query.


simhumileco · Accepted Answer · 2019-01-24 12:35:19Z

296

SELECT varchar_col
FROM table
GROUP BY varchar_col
HAVING COUNT(*) > 1;

edited Jan 24, 2019 at 12:35 by simhumileco

answered Mar 27, 2009 at 4:27 by maxyfc

2 Comments

wmassingham Over a year ago

Superior to @levik's answer since it doesn't add an extra column. Makes it useful for use with IN()/NOT IN().

RisingSun Over a year ago

This answer is exactly the same as levik's answer, just written differently as IDs of duplicate values are still omitted from the result. levik's answer just uses an alias of the count and this one does not. Perhaps this one is a bit cleaner if you don't need the duplicate count.

techtheatre · Accepted Answer · 2022-12-28 10:42:07Z

247

SELECT  *
FROM    mytable mto
WHERE   EXISTS
        (
        SELECT  1
        FROM    mytable mti
        WHERE   mti.varchar_column = mto.varchar_column
        LIMIT 1, 1
        )
ORDER BY varchar_column

This query returns complete records, not just the distinct values of varchar_column.

This query doesn't use COUNT(*). If there are lots of duplicates, COUNT(*) is expensive, and you don't need the whole count; you just need to know whether there are two rows with the same value.

This is achieved by the LIMIT 1, 1 at the bottom of the correlated subquery (essentially meaning "return the second row"). EXISTS returns true only if that second row exists (i.e. there are at least two rows with the same value of varchar_column).

Having an index on varchar_column will, of course, speed up this query greatly.
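A hedged sketch of the same "does another matching row exist?" idea. This variant swaps LIMIT 1, 1 for an explicit primary-key comparison, so the subquery can only match a different row with the same value. It assumes the table has an id primary key (the answer does not show one), and it runs against Python's built-in sqlite3 as a stand-in for MySQL, with made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, varchar_column TEXT)"
)
# Hypothetical rows: 'x' and 'y' are duplicated, 'z' is unique.
conn.executemany(
    "INSERT INTO mytable (id, varchar_column) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "x"), (4, "z"), (5, "y")],
)

rows = conn.execute(
    """
    SELECT mto.id, mto.varchar_column
    FROM mytable AS mto
    WHERE EXISTS (
        SELECT 1
        FROM mytable AS mti
        WHERE mti.varchar_column = mto.varchar_column
          AND mti.id <> mto.id   -- a different row with the same value
    )
    ORDER BY mto.varchar_column, mto.id
    """
).fetchall()

print(rows)  # [(1, 'x'), (3, 'x'), (2, 'y'), (5, 'y')]
```

Every row whose value also appears in at least one other row is returned in full, and the unique 'z' row is left out.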

edited Dec 28, 2022 at 10:42 by techtheatre

answered Mar 27, 2009 at 10:54 by Quassnoi

14 Comments

trante Over a year ago

Very good. I added ORDER BY varchar_column DESC to the end of query.

Rémi Breton Over a year ago

This should be the accepted answer, as GROUP BY and HAVING return only one of the possible duplicates. It also performs better with an indexed field than COUNT(*), and it lets you ORDER BY to group the duplicate records together.

TryHarder Over a year ago

As stated in the comments above, this query allows you to list all duplicated rows. Very useful.

Clox Over a year ago

Looking at this, I don't understand how it would work at all. Won't the inner condition always be true, since any row in the outer table is also available in the inner table, so every row will always at least match itself? I tried the query and got the result I suspected: every row returned. But with so many upvotes I'm doubting myself. Isn't the inner query missing something like "AND mto.id<>mti.id"? It does work for me when I add that.

Clox Over a year ago

@Quassnoi Alright. I've tried putting it on sqlfiddle but I've given up since every query I try to run, apart from creating the schema gets timed out. I did figure out that just removing "EXISTS" also makes the query work correctly for me.


Novocaine · Accepted Answer · 2021-08-12 12:52:08Z

195

Building off of levik's answer, to get the IDs of the duplicate rows you can use GROUP_CONCAT if your server supports it (this returns a comma-separated list of ids).

SELECT GROUP_CONCAT(id), name, COUNT(*) c
FROM documents
GROUP BY name
HAVING c > 1;
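A small sketch of the GROUP_CONCAT approach, using Python's built-in sqlite3 as a stand-in for MySQL (SQLite also provides GROUP_CONCAT, with a comma as the default separator). The sample rows are invented for illustration. The comma-separated string is split back into integers on the Python side, and the ids are sorted there because the order inside the concatenated string is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical rows: 'a.txt' appears three times, the others once.
conn.executemany(
    "INSERT INTO documents (id, name) VALUES (?, ?)",
    [(1, "a.txt"), (2, "b.txt"), (3, "a.txt"), (4, "a.txt"), (5, "c.txt")],
)

rows = conn.execute(
    """
    SELECT GROUP_CONCAT(id), name, COUNT(*)
    FROM documents
    GROUP BY name
    HAVING COUNT(*) > 1
    """
).fetchall()

# Turn each comma-separated id string into a sorted list of integers.
ids_by_name = {
    name: sorted(int(part) for part in id_list.split(","))
    for id_list, name, _count in rows
}

print(ids_by_name)  # {'a.txt': [1, 3, 4]}
```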

edited Aug 12, 2021 at 12:52 by Novocaine

answered Feb 19, 2015 at 0:56 by Matt R.

3 Comments

Armfoot Over a year ago

Really appreciated Matt. This is truly helpful! For those trying to update in phpmyadmin if you leave the id together with the function like this: SELECT id, GROUP_CONCAT(id), name, COUNT(*) c [...] it enables inline editing and it should update all the rows involved (or at least the first one matched), but unfortunately the edit generates a Javascript error...

CMCDragonkai Over a year ago

How would you then calculate how many ids are subject to duplication?

MailBlade Over a year ago

How do I get the IDs listed individually, from first to last, rather than grouped, with their respective values in the columns next to them? So instead of grouping, it just shows ID 1 and its value, ID 2 and its value, even if the values for the IDs are the same.

slfan · Accepted Answer · 2019-05-08 09:02:24Z

25

To get all the rows that contain duplicated data, I used this:

SELECT * FROM TableName INNER JOIN(
  SELECT DuplicatedData FROM TableName GROUP BY DuplicatedData HAVING COUNT(DuplicatedData) > 1 ORDER BY DuplicatedData)
  temp ON TableName.DuplicatedData = temp.DuplicatedData;

TableName = the table you are working with.

DuplicatedData = the column that holds the duplicated data you are looking for.
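A sketch of this join-back pattern: the derived table finds the duplicated values, and the join returns every full row that carries one of them. It uses Python's built-in sqlite3 as a stand-in for MySQL. The items table, its code and note columns, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, code TEXT, note TEXT)"
)
# Hypothetical rows: codes 'A1' and 'C3' are duplicated, 'B2' is unique.
conn.executemany(
    "INSERT INTO items (id, code, note) VALUES (?, ?, ?)",
    [
        (1, "A1", "first"),
        (2, "B2", "only"),
        (3, "A1", "second"),
        (4, "C3", "one"),
        (5, "C3", "two"),
    ],
)

full_rows = conn.execute(
    """
    SELECT items.id, items.code, items.note
    FROM items
    INNER JOIN (
        -- values of `code` that occur more than once
        SELECT code FROM items GROUP BY code HAVING COUNT(*) > 1
    ) AS dup ON items.code = dup.code
    ORDER BY items.code, items.id
    """
).fetchall()

for row in full_rows:
    print(row)
```

The ORDER BY sits on the outer query here, since ordering the derived table does not control the order of the final result.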

edited May 8, 2019 at 9:02 by slfan

answered May 8, 2019 at 8:40 by udi

3 Comments

warmwhisky Over a year ago

This one shows each duplicate in its own row. That's what I need. Thanks.

RationalRabbit Over a year ago

Yes, exactly what I was looking for also. Displays all data and columns in rows which have a duplicate in a specific column.

Denis Viunyk Over a year ago

Thx, it is the best for me.

TechTravelThink · Accepted Answer · 2009-03-27 04:29:28Z

19

Assuming your table is named TableABC, the column you want to check is Col, and the primary key of TableABC is Key:

SELECT a.Key, b.Key, a.Col 
FROM TableABC a, TableABC b
WHERE a.Col = b.Col 
AND a.Key <> b.Key

The advantage of this approach over the answer above is that it gives you the keys.
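A hedged sketch of the self-join, run with Python's built-in sqlite3 as a stand-in for MySQL. It differs from the query above in two ways that are assumptions of this sketch: the key column is called id instead of Key, and the keys are compared with < instead of <>, so each duplicate pair is reported once instead of in both orders. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableABC (id INTEGER PRIMARY KEY, Col TEXT)")
# Hypothetical rows: 'red' appears three times, the others once.
conn.executemany(
    "INSERT INTO TableABC (id, Col) VALUES (?, ?)",
    [(1, "red"), (2, "blue"), (3, "red"), (4, "red"), (5, "green")],
)

pairs = conn.execute(
    """
    SELECT a.id, b.id, a.Col
    FROM TableABC AS a
    JOIN TableABC AS b
      ON a.Col = b.Col
     AND a.id < b.id     -- '<' keeps (1, 3) and drops the mirrored (3, 1)
    ORDER BY a.id, b.id
    """
).fetchall()

print(pairs)  # [(1, 3, 'red'), (1, 4, 'red'), (3, 4, 'red')]
```

A value that occurs n times still yields n * (n - 1) / 2 pairs, so this form is best suited to columns with few repeats per value.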

answered Mar 27, 2009 at 4:29 by TechTravelThink

3 Comments

Fabien Snauwaert Over a year ago

+1 Because it's handy. Though, ironically, the result itself contains duplicates (it lists a and b, then b and a.)

Michael Over a year ago

@FabienSnauwaert You can get rid of some of the duplicates by comparing less than (or greater than)

bcag2 Over a year ago

@TechTravelThink your answer is very clear, thanks for that, but on a large table it takes some time (about 2 minutes on a table with more than 20,000 entries), and after the first 25 results are shown, if I click to show the next ones, phpMyAdmin shows the error "#1052 - Column 'id' in order clause is ambiguous"

AbsoluteƵERØ · Accepted Answer · 2017-08-01 22:29:58Z

16

Taking @maxyfc's answer further, I needed to find all of the rows that were returned with the duplicate values, so I could edit them in MySQL Workbench:

SELECT * FROM table
   WHERE field IN (
     SELECT field FROM table GROUP BY field HAVING count(*) > 1
   ) ORDER BY field

answered Aug 1, 2017 at 22:29 by AbsoluteƵERØ

demongolem · Accepted Answer · 2014-05-22 15:05:46Z

12

SELECT * 
FROM `dps` 
WHERE pid IN (SELECT pid FROM `dps` GROUP BY pid HAVING COUNT(pid)>1)

edited May 22, 2014 at 15:05 by demongolem

answered May 22, 2014 at 14:48 by strustam

1 Comment

Oddman Over a year ago

No, because this is quite possibly the slowest of the lot. Subselects are notoriously slow, as they're executed for every row returned.

davejal · Accepted Answer · 2015-11-24 13:23:41Z

11

To find which values in the name column of the employee table are duplicated, the query below is helpful:

Select name from employee group by name having count(*)>1;

edited Nov 24, 2015 at 13:23 by davejal

answered Nov 24, 2015 at 12:12 by user5599549

Jonathan Bird · Accepted Answer · 2017-05-05 02:38:17Z

11

My final query incorporated a few of the answers here that helped - combining group by, count & GROUP_CONCAT.

SELECT GROUP_CONCAT(id), `magento_simple`, COUNT(*) c 
FROM product_variant 
GROUP BY `magento_simple` HAVING c > 1;

This provides the ids of the duplicated rows (comma-separated), the barcode I needed, and the number of duplicates.

Change table and columns accordingly.

answered May 5, 2017 at 2:38 by Jonathan Bird

Mahbub · Accepted Answer · 2019-04-30 11:39:38Z

9

I am not seeing any JOIN approaches, which have many uses in terms of duplicates.

This approach gives you actual doubled results.

SELECT t1.* FROM my_table as t1 
LEFT JOIN my_table as t2 
ON t1.name=t2.name and t1.id!=t2.id 
WHERE t2.id IS NOT NULL 
ORDER BY t1.name

edited Apr 30, 2019 at 11:39 by Mahbub

answered Apr 20, 2018 at 10:33 by Adam Fischer

1 Comment

Drew Over a year ago

FYI - You'll want to 'select distinct somecol ..' if there is a potential for more than 1 duplicate record to exist otherwise the results will contain duplicates of the duplicated rows that were found.

davejal · Accepted Answer · 2017-02-23 15:11:07Z

8

The queries above work fine if you need to check a single column for duplicate values, for example email.

But if you need to check a combination of several columns, this query works:

SELECT COUNT(CONCAT(name,email)) AS tot,
       name,
       email
FROM users
GROUP BY CONCAT(name,email)
-- keep only the name/email combinations that occur more than once;
-- tot is the number of occurrences
HAVING tot>1
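One caveat worth sketching: grouping on a concatenated string can merge rows whose individual columns differ but whose concatenation is identical, whereas grouping on the columns themselves cannot. The sketch below is an illustration only, using Python's built-in sqlite3 as a stand-in for MySQL (SQLite's || operator plays the role of CONCAT), with hypothetical users rows chosen to trigger the collision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
# Rows 1 and 2 differ in both columns but concatenate to the same string.
# Rows 3 and 4 are genuine duplicates.
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "ab", "c@example.com"),
        (2, "a", "bc@example.com"),
        (3, "ann", "ann@example.com"),
        (4, "ann", "ann@example.com"),
    ],
)

# Grouping on the concatenated key reports a false duplicate.
by_concat = conn.execute(
    """
    SELECT name || email AS k, COUNT(*)
    FROM users
    GROUP BY name || email
    HAVING COUNT(*) > 1
    ORDER BY k
    """
).fetchall()

# Grouping on the columns themselves reports only the real one.
by_columns = conn.execute(
    """
    SELECT name, email, COUNT(*)
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()

print(by_concat)   # two groups, one of them a collision
print(by_columns)  # [('ann', 'ann@example.com', 2)]
```

If the concatenated form is needed for other reasons, putting a separator that cannot occur in the data between the columns is one way to avoid the collision.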

edited Feb 23, 2017 at 15:11 by davejal

answered May 30, 2016 at 7:42 by user2235601

1 Comment

Avatar Over a year ago

Exactly what was needed! Here my query, checking 3 fields for duplicates:

SELECT COUNT(CONCAT(userid,event,datetime)) AS total, userid, event, datetime FROM mytable GROUP BY CONCAT(userid, event, datetime ) HAVING total>1

Lukasz Szozda · Accepted Answer · 2018-07-12 17:40:11Z

8

I prefer to use window functions (MySQL 8.0+) to find duplicates because I can see the entire row:

WITH cte AS (
  SELECT *
    ,COUNT(*) OVER(PARTITION BY col_name) AS num_of_duplicates_group
    ,ROW_NUMBER() OVER(PARTITION BY col_name ORDER BY col_name2) AS pos_in_group
  FROM table
)
SELECT *
FROM cte
WHERE num_of_duplicates_group > 1;
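A runnable sketch of the window-function approach, using Python's built-in sqlite3 as a stand-in for MySQL 8.0+. It assumes the bundled SQLite is version 3.25 or newer, which is when SQLite gained window functions. The table t, the use of id to order rows inside each group, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col_name TEXT)")
# Hypothetical rows: 'x' appears three times, 'y' once.
conn.executemany(
    "INSERT INTO t (id, col_name) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "x"), (4, "x")],
)

rows = conn.execute(
    """
    WITH cte AS (
        SELECT id,
               col_name,
               -- size of the group this row belongs to
               COUNT(*) OVER (PARTITION BY col_name) AS num_in_group,
               -- position of this row inside its group
               ROW_NUMBER() OVER (
                   PARTITION BY col_name ORDER BY id
               ) AS pos_in_group
        FROM t
    )
    SELECT id, col_name, num_in_group, pos_in_group
    FROM cte
    WHERE num_in_group > 1
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Filtering on pos_in_group > 1 instead would keep only the extra copies, which is a common starting point for cleaning duplicates up.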

answered Jul 12, 2018 at 17:40 by Lukasz Szozda

AbsoluteƵERØ · Accepted Answer · 2017-08-01 22:19:23Z

7

SELECT t.*,
  (SELECT COUNT(*) FROM city AS tt
   WHERE tt.name = t.name) AS count
FROM `city` AS t
WHERE (
  SELECT COUNT(*) FROM city AS tt
  WHERE tt.name = t.name
) > 1
ORDER BY count DESC

Replace city with your table name and name with your column name.

edited Aug 1, 2017 at 22:19 by AbsoluteƵERØ

answered Jan 25, 2013 at 5:59 by Lalit Patel

AsgarAli · Accepted Answer · 2018-02-21 06:46:22Z

6

SELECT ColumnA, COUNT( * )
FROM Table
GROUP BY ColumnA
HAVING COUNT( * ) > 1

edited Feb 21, 2018 at 6:46 by AsgarAli

answered Mar 27, 2009 at 4:28 by Scott Ferguson

2 Comments

Kafoso Over a year ago

This is incorrect as it also finds unique occurrences. 0 should be 1.

Hashim Aziz Over a year ago

No idea why this is so low, the simplest answer that worked for me by far. I still find it crazy that something as basic as identifying duplicates is a four-line command in SQL, but that's better than some of the convoluted 10-line answers that were somehow voted higher than this one.

Nhlanhla R. · Accepted Answer · 2020-10-29 22:57:32Z

6

I improved from this:

SELECT 
    col, 
    COUNT(col)
FROM
    table_name
GROUP BY col
HAVING COUNT(col) > 1;

answered Oct 29, 2020 at 22:57 by Nhlanhla R.

David Robertson · Accepted Answer · 2021-02-24 01:07:08Z

5

As a variation on Levik's answer that allows you to find also the ids of the duplicate results, I used the following:

SELECT * FROM table1 WHERE column1 IN (SELECT column1 AS duplicate_value FROM table1 GROUP BY column1 HAVING COUNT(*) > 1)

answered Feb 24, 2021 at 1:07 by David Robertson

Vardkin · Accepted Answer · 2023-08-08 11:22:23Z

5

If you want to remove duplicates from the result, use DISTINCT.

Otherwise, use this query to list the duplicated values and how often each occurs:

SELECT user_name, COUNT(user_ID) AS user_count
FROM users
GROUP BY user_name
HAVING user_count > 1;

edited Aug 8, 2023 at 11:22 by Vardkin

answered Jan 14, 2019 at 7:21 by Hassan Latif Butt

Moseleyi · Accepted Answer · 2013-02-21 08:59:37Z

3

SELECT 
    t.*,
    (SELECT COUNT(*) FROM city AS tt WHERE tt.name=t.name) AS count 
FROM `city` AS t 
WHERE 
    (SELECT count(*) FROM city AS tt WHERE tt.name=t.name) > 1 ORDER BY count DESC

edited Feb 21, 2013 at 8:59 by Moseleyi

answered Feb 21, 2013 at 8:37 by magesh

1 Comment

NobleUplift Over a year ago

Doing the same subquery twice seems inefficient.

Chandresh · Accepted Answer · 2016-05-30 13:52:46Z

3

The following will find all product_id that are used more than once. You only get a single record for each product_id.

SELECT product_id FROM oc_product_reward GROUP BY product_id HAVING count( product_id ) >1

Code taken from: http://chandreshrana.blogspot.in/2014/12/find-duplicate-records-based-on-any.html

answered May 30, 2016 at 13:52 by Chandresh

kodabear · Accepted Answer · 2016-07-08 16:59:46Z

3

CREATE TABLE tbl_master
    (`id` int, `email` varchar(15));

INSERT INTO tbl_master
    (`id`, `email`) VALUES
    (1, '[email protected]'),
    (2, '[email protected]'),
    (3, '[email protected]'),
    (4, '[email protected]'),
    (5, '[email protected]');

Query:

SELECT id, email FROM tbl_master
WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1)

edited Jul 8, 2016 at 16:59 by kodabear

answered Mar 4, 2016 at 7:55 by Bijesh Sheth

Hassaan · Accepted Answer · 2016-06-30 12:33:34Z

2

SELECT DISTINCT a.email FROM `users` a LEFT JOIN `users` b ON a.email = b.email WHERE a.id != b.id;

edited Jun 30, 2016 at 12:33 by Hassaan

answered Jul 1, 2013 at 18:17 by Pawel Furmaniak

5 Comments

NobleUplift Over a year ago

Worth noting that this is unbearably slow or might not even finish if the column being queried for is not indexed. Otherwise, I was able to change a.email to a.* and get all the IDs of the rows with duplicates.

Michael Over a year ago

@NobleUplift What are you talking about?

NobleUplift Over a year ago

@Michael Well since this is three years old I can't test on whatever version of MySQL I was using, but I tried this same query on a database where the column I selected did not have an index on it, so it took quite a few seconds to finish. Changing it to SELECT DISTINCT a.* resolved almost instantly.

Michael Over a year ago

@NobleUplift Ah ok. I can understand it being slow... the part that I am concerned about is "might not even finish".

NobleUplift Over a year ago

@Michael I don't remember which table in our system I had to run this query on, but for the ones with a few million records they probably would have finished, but in a time that took so long that I gave up on seeing when it actually would finish.

Iwan Ross · Accepted Answer · 2021-09-21 14:36:59Z

2

Thanks to @novocaine for his great answer; his solution worked for me. I altered it slightly to include a percentage of the recurring values, which I needed in my case. The altered version is below. It formats the percentage to two decimal places. If you change the 2 in FORMAT(..., 2) to 0, it displays no decimals; change it to 1 and it displays one decimal place, and so on.

SELECT GROUP_CONCAT(id), name, COUNT(*) c, 
COUNT(*) OVER() AS totalRecords, 
CONCAT(FORMAT(COUNT(*)/COUNT(*) OVER()*100,2),'%') as recurringPercentage
FROM table
GROUP BY name
HAVING c > 1

answered Sep 21, 2021 at 14:36 by Iwan Ross

user3162712 · Accepted Answer · 2016-02-05 03:31:24Z

1

To remove duplicate rows across multiple fields, first concatenate the fields into a new key that identifies each distinct row, then use GROUP BY on that key to collapse the rows that share it:

Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
Create index x_tmp_cfs on tmp(cfs);
Create table unduptable select f1,f2,... from tmp group by cfs;

edited Feb 5, 2016 at 3:31

answered Feb 4, 2016 at 9:58 by user3162712

2 Comments

Robert Over a year ago

can you also add an explanation?

maxhb Over a year ago

Why not use CREATE TEMPORARY TABLE ...? A little explanation of your solution would be great.

Andrew LaPrise · Accepted Answer · 2016-09-06 14:21:26Z

One very late contribution... in case it helps anyone waaaaaay down the line... I had a task to find matching pairs of transactions (actually both sides of account-to-account transfers) in a banking app, to identify which ones were the 'from' and 'to' for each inter-account-transfer transaction, so we ended up with this:

SELECT 
    LEAST(primaryid, secondaryid) AS transactionid1,
    GREATEST(primaryid, secondaryid) AS transactionid2
FROM (
    SELECT table1.transactionid AS primaryid, 
        table2.transactionid AS secondaryid
    FROM financial_transactions table1
    INNER JOIN financial_transactions table2 
    ON table1.accountid = table2.accountid
    AND table1.transactionid <> table2.transactionid 
    AND table1.transactiondate = table2.transactiondate
    AND table1.sourceref = table2.destinationref
    AND table1.amount = (0 - table2.amount)
) AS DuplicateResultsTable
GROUP BY transactionid1
ORDER BY transactionid1;

The derived table DuplicateResultsTable contains the matching (i.e. duplicate) transactions, but each pair appears twice, once in each order. The outer SELECT uses LEAST and GREATEST so that the two transaction IDs always come out in the same order, which makes it safe to GROUP BY the first one and so eliminate the mirrored matches. The query ran through nearly a million records and identified 12,000+ matches in just under 2 seconds. Of course, transactionid is the primary index, which really helped.
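A simplified sketch of the pair-normalization idea, run with Python's built-in sqlite3 as a stand-in for MySQL. Several things here are assumptions of the sketch, not part of the query above: SQLite has no LEAST/GREATEST, so the two-argument scalar MIN and MAX are used instead. SELECT DISTINCT replaces the GROUP BY. The hypothetical tx table carries only a date and an amount, without the account and reference columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx ("
    "transactionid INTEGER PRIMARY KEY, transactiondate TEXT, amount INTEGER)"
)
# Hypothetical rows: (1, 2) and (4, 5) are opposite sides of a transfer,
# and 3 has no counterpart.
conn.executemany(
    "INSERT INTO tx (transactionid, transactiondate, amount) VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 100),
        (2, "2020-01-01", -100),
        (3, "2020-01-02", 50),
        (4, "2020-01-03", -70),
        (5, "2020-01-03", 70),
    ],
)

pairs = conn.execute(
    """
    SELECT DISTINCT
        MIN(a.transactionid, b.transactionid) AS id1,  -- smaller id first
        MAX(a.transactionid, b.transactionid) AS id2   -- larger id second
    FROM tx AS a
    JOIN tx AS b
      ON a.transactionid <> b.transactionid
     AND a.transactiondate = b.transactiondate
     AND a.amount = -b.amount
    ORDER BY id1
    """
).fetchall()

print(pairs)  # [(1, 2), (4, 5)]
```

The join matches each pair in both directions, and putting the smaller id first makes the two matches identical so that DISTINCT collapses them into one row.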

Vipin Jain · Accepted Answer · 2017-05-01 09:41:26Z

1

Select column_name, column_name1,column_name2, count(1) as temp from table_name group by column_name having temp > 1

edited May 1, 2017 at 9:41

answered Dec 18, 2015 at 18:21 by Vipin Jain

Triyugi Narayan Mani · Accepted Answer · 2018-11-15 09:42:05Z

1

Try using this query:

SELECT name, COUNT(*) value_count FROM company_master GROUP BY name HAVING value_count > 1;

edited Nov 15, 2018 at 9:42 by Triyugi Narayan Mani

answered Nov 15, 2018 at 9:16 by Atul Akabari
