I am using a table I was given of all the towns in the UK and I have realised it contains duplicates
e.g
971 Birmingham West Midlands
978 Birmingham West Midlands
993 Birmingham West Midlands
1006 Birmingham West Midlands
1047 Birmingham West Midlands
My table structure is as follows:
CREATE TABLE `towns` (
`town_id` int(11) NOT NULL,
`town` varchar(150) collate utf8_unicode_ci NOT NULL,
`county` varchar(150) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`town_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci COMMENT='Towns';
I have done a search for some code to remove duplicates and found the following code but wanted to get a second opinion before using it as the site is live and I don’t want to break it:)
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;
If this is correct, then can you advise me of what to put in the square bracket?
If there is a better solution please let me know.
first, which database system is this? it looks like MySQL but i’m just double-checking because we have a MySQL forum but you chose to post in the Databases forum instead
second, does this query return anything –
SELECT MIN(town_id)
, town
, county
FROM towns
GROUP
BY town
, county
HAVING COUNT(*) > 1
just to be safe, back up your table data first, then try this –
DELETE towns
FROM ( SELECT MIN(town_id) AS min_id
, town
, county
FROM towns
GROUP
BY town
, county
HAVING COUNT(*) > 1 ) AS dupes
INNER
JOIN towns
ON towns.town = dupes.town
AND towns.county = dupes.county
WHERE towns.town_id > dupes.min_id
Thanks Rudy, back up made but get the following error:
#1104 - The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay
i don’t know why you would get that, nor how to fix it, but i can think of a couple of ways to get around it…
CREATE TABLE towns_to_delete
SELECT towns.town_id
FROM ( SELECT MIN(town_id) AS min_id
, town
, county
FROM towns
GROUP
BY town
, county
HAVING COUNT(*) > 1 ) AS dupes
INNER
JOIN towns
ON towns.town = dupes.town
AND towns.county = dupes.county
WHERE towns.town_id > dupes.min_id
if this runs, you should inspect a few ids from the towns_to_delete table, and convince yourself that those ids do need to be deleted, then run this –
DELETE FROM towns WHERE town_id IN
( SELECT town_id FROM towns_to_delete )
#1104 - The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is okay
CREATE TABLE towns_to_delete SELECT towns.town_id
FROM (
SELECT MIN( town_id ) AS min_id, town, county
FROM towns
GROUP
BY town, county
HAVING COUNT( * ) >1
) AS dupes
INNER
JOIN towns ON towns.town = dupes.town
AND towns.county = dupes.county
WHERE towns.town_id > dupes.min_id
I tried the following from post 8 again and it still works. I guess you know it is not exactly the same query
SELECT MIN( town_id ) , town, county
FROM towns
GROUP
BY town, county
HAVING COUNT( * ) >1
LIMIT 0 , 30
After running the ALTER TABLE query, I then tried your previous CREATE TABLE towns_to_delete and it now worked.
The ids in the towns_to_delete where indeed the culprits so I ran the other query to delete them from the towns table and it is all fine now.
So I would just like to say thank you so much for all your time on this. It is really appreciated as I am trying to run before I can walk with this site.