Help with complex SELECT query on duplicates

rvlach · May 10, 2011, 9:09am

Please need help with following problem. In my table I have entries such as:

initial name address postcode
I SMITH Frogs Lane TN52 2RE
J SMITH Frogs House TN52 2RE
T SMITH Rose Cottage CT23 5DE
F JONES Tally Ho Road TN52 2RE

Firstly, I need to count how many entries there are with the same name AND the same postcode. For the moment I can only count the number of entries with the same name.

$sql = mysql_query(“select name, count() as Num from profile group by name having count() > 1”);

Secondly, I need to select entries with the same name AND the same postcode resulting in:

I SMITH Frogs Lane TN52 2RE
J SMITH Frogs House TN52 2RE

This selects only duplicates with the same name:

$sql=“SELECT id, title, add_1, add_2, add_3, add_4, tel, tel2, email, mail, emailing, pc, initial, profile.name FROM profile INNER JOIN (SELECT name FROM profile GROUP BY name HAVING COUNT(id) > 1) dup ON profile.name = dup.name ORDER BY name ASC”;

Would very much appreciate someone’s help.

DarthGuido · May 10, 2011, 9:25am

Let’s put these queries in a more readable format:

Query 1:


SELECT 
    name
  , COUNT(*) AS Num 
FROM profile 
GROUP BY name 
HAVING count(*) > 1

Query 2:


SELECT 
    id
  , title
  , add_1
  , add_2
  , add_3
  , add_4
  , tel
  , tel2
  , email 
  , mail
  , emailing
  , pc
  , initial
  , profile.name 
FROM profile 
INNER JOIN 
  (SELECT 
       name 
   FROM profile 
   GROUP BY name 
   HAVING COUNT(id) > 1
) AS dup 
ON profile.name = dup.name 
ORDER BY name ASC

If you look closely at these two queries, you’ll see that resolving the first one, you’ll also have resolved the second (the subquery in the second query = the first query).

Do you understand how the GROUP BY works? Why does query 1 give you the duplicate names?

Kiran_Dangol · May 10, 2011, 9:34am

Hi rvlach,
you can do like given code

SELECT * FROM table_name group by name having postcode like postcode

hope that it will help you

rvlach · May 10, 2011, 9:35am

Yes I’m no pro when it comes to more complex queries other than the simple SELECT, UPDATE and DELET and I do not really understand how GROUP BY works.

This is a sample code I found searching the web and works for counting and selecting duplicates on the name only but this may probably be done some other way.

DarthGuido · May 10, 2011, 10:00am

As the name says, it groups rows based on column values, and then you can do certain operations (like SUM, COUNT, MIN, MAX) on other columns (MySQL :: MySQL 5.0 Reference Manual :: 11.15.1 GROUP BY (Aggregate) Functions).

So in the case of your query 1, grouping by Name, and using the function COUNT(*) what you get is for each name the number of rows found with that name.

Combining this with the HAVING clause, in the end you’ll extract only those names that occur more than once in the table.

If you want to do the same for duplicates of Name+postcode, then all you have to do is add the postcode column to the SELECT and GROUP BY:


SELECT 
    name
  [B][COLOR="Red"], postcode[/COLOR][/B]
  , COUNT(*) AS Num 
FROM profile 
GROUP BY 
    name 
  [B][COLOR="Red"], postcode[/COLOR][/B]
HAVING count(*) > 1

This will group all rows with the same name+postcode, count the number of rows for each combination, and return only those name+postcode combinations that are present in more than 1 row.

rvlach · May 10, 2011, 10:45am

guido,

Thanks for your help :). It all seems so easy when explained correctly. All records are displaying as they should.

rvlach · May 10, 2011, 11:35am

Guido,

I thought it was solved but looking again I see that it’s not quite what is required.

For the moment for each duplicate I have only have one record displayed. But I need all duplicates to be displayed as a manual analysis has to be made to determine whether to link the records together as they may be same family members so purchases made can be grouped under one entry. For example I need to display both

I SMITH Frogs Lane TN52 2RE
J SMITH Frogs Lane TN52 2RE

as these may be 2 families living at the same property. Unfortunately no code can do this for us as only we can determine that.

How would I go about that.

DarthGuido · May 10, 2011, 12:07pm

That’s where you have to modify query 2:


SELECT 
    id
  , title
  , add_1
  , add_2
  , add_3
  , add_4
  , tel
  , tel2
  , email 
  , mail
  , emailing
  , pc
  , initial
  , profile.name 
  [B][COLOR="Red"], profile.postcode[/COLOR][/B]
FROM profile 
INNER JOIN 
  (SELECT 
       name
     [B][COLOR="red"], postcode[/COLOR][/B] 
   FROM profile 
   GROUP BY 
       name 
     [B][COLOR="red"], postcode[/COLOR][/B]
   HAVING COUNT(id) > 1
) AS dup 
ON  profile.name = dup.name 
[B][COLOR="Red"]AND profile.postcode = dup.postcode[/COLOR][/B] 
ORDER BY name ASC

rvlach · May 10, 2011, 12:44pm

Brilliant. Thanks Guido. All is as it should be now.