Removing duplicate rows with MYSQL

I’d like to delete duplicate rows in a MYSQL table without creating a new table.

Generally, how is that done?

How are they duplicated? Can you provide the schema? You won’t need to create a new table; DELETE statements should suffice.

George

How do you use DELETE to leave one of the duplicate rows, but DELETE the rest?

generally, start by defining which columns constitute a “duplicate”

i put “duplicate” in quotes because in almost every case where someone wants duplicates removed, they also have an auto_increment on the table – which, by definition, means that there are actually no duplicate rows, just rows with the same values in certain columns or combinations of columns

once you’ve decided which columns determine your dupes, look at the values in the other columns, and ask yourself if you care which rows should provide the values for these other columns

to illustrate this scenario, consider the following three rows in a users table –


userid email           username age
  42   fred@gmail.com  fred     -
   9   fred@gmail.com  -        -
  37   fred@gmail.com  todd     25

the “duplicates” column is the email, and your objective is to eliminate the dupes and leave only one row with fred@gmail.com

but look at the other columns – if you kept row 9, you’d be throwing away some valuable information

have i given you enough information for your rather general question?

These would be pure dupes except for the auto increment.

GOPalmer suggests using DELETE, but it’s not apparent to me how to use DELETE to remove all the dupes except one.

DELETE is tricky for the following reason…

what this means is you have to first extract the data you want to keep, save this in another table, and then you can either delete from your main table and insert from this saved table, or, if your extract was comprehensive enough, just drop and rename the tables

so, there’s nothing in the “manual” that let’s me order a primary key and remove the second and subsequent records of a duplicated key by default and reatain all rows with unique keys and only the first row with duplicate keys?

yes, there is a technique you can use, if you are sure that the duplicate rows have nothing in them that you care about

using the ALTER IGNORE TABLE syntax (note: the IGNORE is important), declare a UNIQUE INDEX on the column(s) that determine the dupes

and vwalah

:cool:

as in:


mysql_query("ALTER IGNORE TABLE stk") or die(mysql_error());

???

EDIT:


mysql_query("ALTER IGNORE TABLE stk INDEX (pic)") or die(mysql_error());

no :slight_smile:

first of all, you wouldn’t set up a php program to do this, as you are only going to do it once, and it’s a lot easier just typing the sql into a front end app like the mysql query browser or heidisql or phpmyadmin

but secondly, your ALTER statement doesn’t actually alter anything – you have to ADD UNIQUE INDEX

I’ve always used php and haven’t manually entered scripts with phpmyadmin before (until now). I just found the edit button to enter scripts.

I just entered: ALTER TABLE stk2 ADD INDEX ( pic )

What do I enter now to delete the dupes?

Obviously, ALTER IGNORE TABLE stk INDEX (pic), failed.

may i suggest that you read my previous posts again, because i mentioned it a couple of times already

you have to declare a special type of index, that rhymes with you neek

:smiley:

OK.

just did: ALTER TABLE stk2 ADD UNIQUE (pic)

then: ALTER IGNORE TABLE ‘stk2’ INDEX (‘pic’)

which worked, but no deletions.

I’m dancin, where do my feet need to go?

since this statement did not include the IGNORE keyword, and assuming that your table in fact did have duplicate pic values, the only possible result is that this statement failed, and you just didn’t notice it

this just adds another non-unique index (whether or not you removed the one you created earlier) and so it would’ve completed, but since it didn’t specify UNIQUE, you still have the dupes

but if you were to combine the above two statements …

(i’m trying real hard not to just hand you the answer, although i’ve pretty much done just that three times now…)

:slight_smile:

DROP UNIQUE INDEX (pic) ON stk2
–> #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘UNIQUE INDEX (pic) ON stk2’ at line 1

ALTER TABLE stk2 ADD UNIQUE INDEX (pic)
–> #1062 - Duplicate entry ‘LAMS-RM1507.JPG’ for key ‘pic’

ALTER IGNORE TABLE stk2 UNIQUE INDEX (pic) –> Didn’t work for sure

I’ll bet I’m close, but I don’t see it.

i’m sorry, i’m not familiar with the “Didn’t work for sure” error message

have you looked up the syntax in the manual?

Yes I have.

SQL CREATE INDEX Statement

says: CREATE UNIQUE INDEX index_name ON table_name (column_name)

is the syntax.

Except, the error says there’s a duplicate entry for key pic suggesting that’s the problem, but you know that, right?

And, you’re saying it should still work, right?

okay, i’ve decided i will give you the answer

but first…

could you please do a SHOW CREATE TABLE for your table

i want to make sure i clean it up properly before adding another index

CREATE TABLE stk2 (
id int(11) NOT NULL AUTO_INCREMENT,
timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
cli varchar(10) NOT NULL,
mfr text NOT NULL,
stknum text NOT NULL,
pic varchar(200) NOT NULL,
width varchar(5) NOT NULL,
margin text NOT NULL,
msrp decimal(11,2) NOT NULL,
name text NOT NULL,
description text NOT NULL,
headline text NOT NULL,
ben1 text NOT NULL,
ben2 text NOT NULL,
ben3 text NOT NULL,
PRIMARY KEY (id),
KEY pic_2 (pic),
KEY pic_3 (pic),
KEY pic_4 (pic)
) ENGINE=MyISAM AUTO_INCREMENT=231 DEFAULT CHARSET=latin1

these are redundant and unnecessary

run this –


ALTER TABLE stk2 DROP INDEX pic_2;
ALTER TABLE stk2 DROP INDEX pic_3;
ALTER TABLE stk2 DROP INDEX pic_4;

you might want to do another SHOW CREATE TABLE for yourself, to ensure those indexes are gone

then do this –


[B][COLOR="Blue"]ALTER IGNORE TABLE stk2 ADD UNIQUE (pic)[/COLOR][/B]

and then check to see if there are any dupes