I have a table with columns latitude and longitude. In most cases the value extends well past the decimal point, e.g. -81.7770051972473. For a few records, though, the value has only two decimal places, e.g. -81.77.

How do I find duplicates and remove one of the duplicates for only the records that extend beyond two decimal places?

3 Answers

1

Using some creative substring, float, and charindex logic, I came up with this:

delete l1
from 
    latlong l1
    inner join (
        select
            id,
            -- keep everything up to and including the second digit after the '.'
            substring(cast(latitude as char), 1, instr(cast(latitude as char), '.') + 2) as truncatedLat
        from
            latlong
    ) l2 on
        l1.id <> l2.id
        and l1.latitude = cast(l2.truncatedLat as float)

Before running, try select * in lieu of delete l1 first to make sure you're deleting the right rows.

I should note that this worked on SQL Server using functions I know exist in MySQL, but I wasn't able to test it against a MySQL instance, so there may be some little tweaking that needs to be done. For example, in SQL Server, I used charindex instead of instr, but both should work similarly.
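
To make the string logic easier to check outside the database, here is a minimal Python sketch that mirrors the intent of the truncatedLat expression. The function name truncate_two_decimals is hypothetical, and unlike the SQL expression it leaves a value with no decimal point unchanged. Note that this truncates rather than rounds: ROUND(-81.7770051972473, 2) would give -81.78, not -81.77.

```python
def truncate_two_decimals(text: str) -> str:
    """Keep the digits before the '.', the '.', and two digits after it."""
    dot = text.find(".")      # 0-based index, -1 when there is no '.'
    if dot == -1:
        return text           # assumption: values without a '.' are left as-is
    return text[: dot + 3]    # slice end is exclusive, so +3 keeps two decimals


print(truncate_two_decimals("-81.7770051972473"))  # -81.77
print(truncate_two_decimals("-81.77"))             # -81.77
```

Running the same check against a handful of real values from the table is a cheap way to confirm the truncation behaves as expected before deleting anything.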

1

Not sure how to do that purely in SQL.

I have used scripting languages like PHP or CFML to solve similar needs: build a query to pull the records, then loop over the record set and perform some comparison. If the comparison is true, then VERY CAREFULLY call another function, passing in the record ID, to delete the record. I would probably even leave the record in the table and mark another column, such as isDeleted, instead.
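
A rough Python sketch of that loop-and-flag idea follows. It uses an in-memory SQLite database as a stand-in for MySQL, and the table name latlong, the isDeleted column, and the function names are all assumptions made for illustration. The comparison itself is also an assumption: a row is flagged when it has at most two decimal places and some other row truncates to the same latitude and longitude.

```python
import sqlite3
from decimal import Decimal, ROUND_DOWN


def trunc2(value):
    """Truncate (not round) a float to two decimals via its string form."""
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


def mark_deleted(conn, record_id):
    """Soft delete: flag the row instead of removing it."""
    conn.execute("UPDATE latlong SET isDeleted = 1 WHERE id = ?", (record_id,))


def flag_short_duplicates(conn):
    rows = conn.execute(
        "SELECT id, latitude, longitude FROM latlong WHERE isDeleted = 0 ORDER BY id"
    ).fetchall()
    # Truncated keys that belong to at least one full-precision row.
    precise_keys = set()
    for _, lat, lon in rows:
        key = (trunc2(lat), trunc2(lon))
        if (Decimal(str(lat)), Decimal(str(lon))) != key:
            precise_keys.add(key)
    flagged = []
    for record_id, lat, lon in rows:
        key = (trunc2(lat), trunc2(lon))
        is_short = (Decimal(str(lat)), Decimal(str(lon))) == key
        if is_short and key in precise_keys:
            mark_deleted(conn, record_id)
            flagged.append(record_id)
    conn.commit()
    return flagged


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE latlong (id INTEGER PRIMARY KEY, latitude REAL, "
    "longitude REAL, isDeleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO latlong (id, latitude, longitude) VALUES (?, ?, ?)",
    [(1, -81.7770051972473, 41.4993123), (2, -81.77, 41.49), (3, -80.12, 40.5)],
)
print(flag_short_duplicates(conn))  # [2]
```

Because the rows are only flagged, a wrong comparison can be undone by resetting isDeleted, which is the main appeal of this approach over a hard delete.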

If you are more ambitious than I, it looks like these threads are close to what you want:

Deleting Duplicates in MySQL

finding multi column duplicates mysql

1

Using an external programming language (Perl, PHP, Java, Assembly...):

  • Select * from database
  • For each row, select * from database where newLat >= round(oldLat,2) and newLat < round(oldLat,2) + .01, with the same criteria for longitude
  • Keep one of them based on whatever criteria you choose. If lowest primary key, sort by that and skip the first result.
  • Delete everything else.
  • Repeat for the next row, skipping any records you have already deleted.
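
The steps above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a tested MySQL solution: it uses an in-memory SQLite database as a stand-in, the table name latlong and the function names are hypothetical, and it keeps the lowest id in each group. It also swaps the round-based range check for a truncation key, because rounding can push a value into the neighbouring bucket (round(-81.777, 2) is -81.78, which would not group with -81.77).

```python
import sqlite3
from decimal import Decimal, ROUND_DOWN


def trunc2(value):
    """Truncate (not round) a float to two decimals via its string form."""
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


def delete_duplicates(conn):
    """Keep the lowest id per truncated (latitude, longitude); delete the rest."""
    rows = conn.execute(
        "SELECT id, latitude, longitude FROM latlong ORDER BY id"
    ).fetchall()
    seen = set()
    to_delete = []
    for row_id, lat, lon in rows:
        key = (trunc2(lat), trunc2(lon))
        if key in seen:
            to_delete.append(row_id)   # a lower id already owns this key
        else:
            seen.add(key)
    conn.executemany("DELETE FROM latlong WHERE id = ?", [(i,) for i in to_delete])
    conn.commit()
    return to_delete


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE latlong (id INTEGER PRIMARY KEY, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO latlong VALUES (?, ?, ?)",
    [
        (1, -81.7770051972473, 41.4993123),
        (2, -81.77, 41.49),
        (3, -81.7712, 41.4955),
        (4, -80.12, 40.5),
    ],
)
print(delete_duplicates(conn))  # [2, 3]
```

Note that this removes every later row in a group, including full-precision ones such as id 3 here. If only the two-decimal rows should go, the loop needs an extra check on the row's own precision before adding it to to_delete.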

If for some reason you want to identify everything with more than two decimal places of precision:

select * from database where latitude != round(latitude, 2) or longitude != round(longitude, 2)
