I have a company database with name, address and city. I have been able to get rid of duplicates with matches at the beginning of the name with the same address, the same name and address, and the same name with the same beginning of the address.
But I now have some situations like
DiMio and Di Mio
P S Gourmet Coffee and Ps Gourmet Coffee
Both have the same addresses and city but there are too many cases of names being different with the same address that I can't just use an address match. Some part of the name needs to match also.
I have tried several things, including
SELECT id, name, address, city, count( * ) AS n
GROUP BY name like SUBSTRING(name FROM 5 FOR 4) , address, left( city, 4 )
HAVING n >1
But that didn't really work well at all.
Is there a way to do this in MySQL or do I need to do it with a php routine of some sort?
try this --
, COUNT(*) AS n
BY REPLACE(name,' ','')</font>
HAVING n > 1
Thanks, that works for those names pretty well. And also gives me one more tool in the toolbox to go after the others.
Like if one starts with The and the second one doesn't. And I realize at some point it becomes an individual case and decision. So I'll head down that road. But the replace really helps. Thanks for the input.