Please need help with following problem. In my table I have entries such as:
initial name address postcode
I SMITH Frogs Lane TN52 2RE
J SMITH Frogs House TN52 2RE
T SMITH Rose Cottage CT23 5DE
F JONES Tally Ho Road TN52 2RE
Firstly, I need to count how many entries there are with the same name AND the same postcode. For the moment I can only count the number of entries with the same name.
$sql = mysql_query(“select name, count() as Num from profile group by name having count() > 1”);
Secondly, I need to select entries with the same name AND the same postcode resulting in:
I SMITH Frogs Lane TN52 2RE
J SMITH Frogs House TN52 2RE
This selects only duplicates with the same name:
$sql=“SELECT id, title, add_1, add_2, add_3, add_4, tel, tel2, email, mail, emailing, pc, initial, profile.name FROM profile INNER JOIN (SELECT name FROM profile GROUP BY name HAVING COUNT(id) > 1) dup ON profile.name = dup.name ORDER BY name ASC”;
Let’s put these queries in a more readable format:
Query 1:
SELECT
name
, COUNT(*) AS Num
FROM profile
GROUP BY name
HAVING count(*) > 1
Query 2:
SELECT
id
, title
, add_1
, add_2
, add_3
, add_4
, tel
, tel2
, email
, mail
, emailing
, pc
, initial
, profile.name
FROM profile
INNER JOIN
(SELECT
name
FROM profile
GROUP BY name
HAVING COUNT(id) > 1
) AS dup
ON profile.name = dup.name
ORDER BY name ASC
If you look closely at these two queries, you’ll see that resolving the first one, you’ll also have resolved the second (the subquery in the second query = the first query).
Do you understand how the GROUP BY works? Why does query 1 give you the duplicate names?
Yes I’m no pro when it comes to more complex queries other than the simple SELECT, UPDATE and DELET and I do not really understand how GROUP BY works.
This is a sample code I found searching the web and works for counting and selecting duplicates on the name only but this may probably be done some other way.
So in the case of your query 1, grouping by Name, and using the function COUNT(*) what you get is for each name the number of rows found with that name.
Combining this with the HAVING clause, in the end you’ll extract only those names that occur more than once in the table.
If you want to do the same for duplicates of Name+postcode, then all you have to do is add the postcode column to the SELECT and GROUP BY:
SELECT
name
[B][COLOR="Red"], postcode[/COLOR][/B]
, COUNT(*) AS Num
FROM profile
GROUP BY
name
[B][COLOR="Red"], postcode[/COLOR][/B]
HAVING count(*) > 1
This will group all rows with the same name+postcode, count the number of rows for each combination, and return only those name+postcode combinations that are present in more than 1 row.
I thought it was solved but looking again I see that it’s not quite what is required.
For the moment for each duplicate I have only have one record displayed. But I need all duplicates to be displayed as a manual analysis has to be made to determine whether to link the records together as they may be same family members so purchases made can be grouped under one entry. For example I need to display both
I SMITH Frogs Lane TN52 2RE
J SMITH Frogs Lane TN52 2RE
as these may be 2 families living at the same property. Unfortunately no code can do this for us as only we can determine that.
SELECT
id
, title
, add_1
, add_2
, add_3
, add_4
, tel
, tel2
, email
, mail
, emailing
, pc
, initial
, profile.name
[B][COLOR="Red"], profile.postcode[/COLOR][/B]
FROM profile
INNER JOIN
(SELECT
name
[B][COLOR="red"], postcode[/COLOR][/B]
FROM profile
GROUP BY
name
[B][COLOR="red"], postcode[/COLOR][/B]
HAVING COUNT(id) > 1
) AS dup
ON profile.name = dup.name
[B][COLOR="Red"]AND profile.postcode = dup.postcode[/COLOR][/B]
ORDER BY name ASC