How to delete duplicate rows?

I’ve got a table that may contain rows with duplicates value in one of the columns. If duplicates exist in that column, I want to delete the duplicates, leaving only one. Is there a one statement sql query for doing that? Note that the timestamp and id will be different between duplicates. Only the order_number will be a duplicate.

table:
email_queue

fields:
id (key)
order_number (may be duplicate)
timestamp

A single query won’t be enough here. You will have to write a procedure for this:


DECLARE
    
    CURSOR c1 IS
        SELECT order_number, COUNT(order_number) count 
        FROM table WHERE count > 1 GROUP BY order_number;
    ordnu VARCHAR(255);
    cnt INT;
    
BEGIN
    
    OPEN c1;
    LOOP
        FETCH c1 INTO ordnu, cnt;
        EXIT WHEN c1%NOTFOUND;
        DELETE FROM table WHERE order_number = ordnu;
        COMMIT;
    END LOOP;
    CLOSE c1;
    
END;
/

EDIT: This is Oracle syntax. But it should be easy to rewrite it into MySQL.

Or just use this SELECT query and store the returned result set:


        SELECT order_number, COUNT(order_number) count 
        FROM table WHERE count > 1 GROUP BY order_number;

Then run the DELETE query in a for loop in some server side language (or construct one big DELETE query with IN clause).

what’s up with all the looping? :wink:

you can do the whole shebang in one query –

delete email_queue
  from email_queue
inner
  join ( select order_number 
              , MAX(id) AS last_id
           from email_queue
         group
             By order_number
         having count(*) > 1 ) as dupes
    ON dupes.order_number = email_queue.order_number
   and dupes.last_id > email_queue.id 

this deletes all rows where the id value isn’t the last id value in each order_number

you could also do it by the timestamp

Thanks, Rudy! BTW, I just got your book in the mail. It’s awesome and has already done wonders for SQL even though I’m only halfway through it.

thanks for the kind words :slight_smile: