Handling duplicates in mysql

Hi All!

please, I need some help to get started with a script for finding and handling duplicate rows in a mysql table.
Here’s the table structure :


+------+---------------+----------+-------------+----------+
|  id  |   IP address  |    user  | visited_id  |    date  |
+------+---------------+----------+-------------+----------+
|  1   |  123.1.2.3    |  user_1  |       10    | 05.05.10 |
+------+---------------+----------+-------------+----------+
|  2   |  123.2.3.4    |  user_2  |       12    | 05.05.10 |
+------+---------------+----------+-------------+----------+
|  3   |  123.1.2.3    |  user_1  |       10    | 06.06.10 |
+------+---------------+----------+-------------+----------+

Now I need to list ALL the rows where the “visited id” is the logged user’s ID :


$q_hits = mysql_query("SELECT * FROM profile_hits WHERE visited_id = '$user_ID'");

As you can see, sometimes ONE unique visitor (column user) is inserted more than once, depending on the date he/she visited a page. I’d like to list ALL the visits to the page with ID “visited_id” BUT to avoid echoing the same user more than once. Is it possible to create an array within the


while ($hits_rows = mysql_fetch_array($q_hits))

loop and then to increment a variable if the user is the same (take the last visit date)?
Any help/idea is highly appreciated!
Regards


SELECT users.username
  FROM users
LEFT OUTER JOIN 
  (SELECT receiver_id AS id
   FROM contacts
   WHERE asker_id = 2
   AND contacts.status = 1
     UNION
   SELECT asker_id
   FROM contacts
   WHERE receiver_id = 2
   AND contacts.status = 1
  ) AS c
ON users.id = c.id
WHERE c.id IS NULL

And yes, I remember you told me some versions of MySQL can’t do this either, but hey, maybe it’s time to upgrade :slight_smile:

Try it without your modifications. :slight_smile:
status has to be 1, because in the subquery you want to select those who ARE friends of John, so the LEFT JOIN will show only those users that AREN’T friends of him (c.id IS NULL)

if i remember correctly, there are versions of mysql where the OR in the ON clause results in poor performance

it has to do with the fact that only one index can be used for each query operation (such as a join)

whereas the UNION is always efficient as each SELECT uses its own index

Huh… that’s a bit more complicated than I thought…
The query above works fine if a user requested contact from another user : the status is either 1 or 0. BUT what if they haven’t requested contacts yet? How can I select the username from TABLE users if it isn’t in the contacts TABLE together with my ID?
I tried to add some conditions to the query. Guess what? I didn’t succeed…
Can you please help me?

Thank you guys for your helpful posts!
I’ve tried both guido’s solutions with some modifications :


SELECT users.*
FROM users
LEFT OUTER JOIN
 (SELECT receiver_id AS id
   FROM contacts
   WHERE asker_id = '$user_ID'
   AND contacts.status = 0
 UNION
   SELECT asker_id AS id
   FROM contacts
   WHERE receiver_id = '$user_ID'
   AND contacts.status = 0
  ) AS c
ON users.id = c.id
WHERE c.id IS NULL
  LIMIT 5

to list ALL the users from the TABLE users that are NOT in confirmed relation with the logged user (status = 0), nor they have requested contact from he, nor the logged user have requested contact from them (huh, I hope it’s clear enough…). The query however lists ALL the members regardless of the status…

Sorry, I’m just too tired of this I guess…
Here’s the final, working query :


SELECT users.*
FROM users
LEFT OUTER JOIN
  (SELECT receiver_id AS id
   FROM contacts
   WHERE asker_id = '$user_ID'
   AND contacts.status = 1
	 UNION
   SELECT asker_id
   FROM contacts
   WHERE receiver_id = '$user_ID'
   AND contacts.status = 1
  ) AS c
ON users.id = c.id
WHERE c.id IS NULL AND users.id != '$user_ID'
  LIMIT 5

Thank you so much r937!!! I think I’d never figured it out!
:smiley:

SELECT users.username
  FROM contacts
INNER
  JOIN users
    ON users.id = contacts.receiver_id
 WHERE contacts.asker_id = 2
   AND contacts.status = 0
UNION 
SELECT users.username
  FROM contacts
INNER
  JOIN users
    ON users.id = contacts.asker_id
 WHERE contacts.receiver_id = 2
   AND contacts.status = 0

SELECT users.username
  FROM users
LEFT OUTER
  JOIN contacts 
    ON ((users.id = contacts.receiver_id AND contacts.asker_id = 2) OR
        (users.id = contacts.asker_id AND contacts.receiver_id = 2))
    AND contacts.status = 1
WHERE contacts.status IS NULL

I agree, BUT… I still need some help to join 2 tables :


TABLE users :

+------+--------------+
|  id  |   username   |
+------+--------------+
|   2  |      john    |
+------+--------------+
|   5  |     peter    |
+------+--------------+
|  9   |     michael  |
+------+--------------+


TABLE contacts :

+-----------+--------------+-----------+
|  asker_id |  receiver_id |  status   |
+-----------+--------------+-----------+
|       2   |         9    |       0   |
+-----------+--------------+-----------+
|       5   |         2    |       1   |
+-----------+--------------+-----------+

TABLE ‘users’ holds all the users with their id’s. If ‘john’ wants to be ‘friends’ with ‘michael’, he send a request and the status will be 0 until confirmed by michael (status 1).

Now what I want is a mysql query to go through the TABLE users and select all the users which are NOT friends with ‘john’ (status is 0 at the TABLE contacts where either asker_id OR receiver_id is equal to john’s AND the corresponding user’s ID). I tried 100s of combinations but I simply got lost… :confused:
Can someone help me please???

THNX!!!

running a query inside a loop is guaranteed poor performance – you should use a join instead

by the way, DISTINCT is ~not~ a function, so remove those needless parentheses

change this –

DISTINCT (visitor_username), ip_address, timestamp 

to this –

DISTINCT visitor_username, ip_address, timestamp 

Thank you r937!

In the meantime I managed to solve the issue with this mysql query :


SELECT DISTINCT (visitor_username), ip_address, timestamp 
FROM profile_hits 
WHERE profile_id = '$user_ID' 
GROUP BY visitor_username 
ORDER BY id DESC

and then later in the loop looked again at the same table to see the total number of visits to the desired page :


while ($hits_rows = mysql_fetch_array($q_hits)) {

$visitor    = $hits_rows['visitor_username'];
$ip_address = $hits_rows['ip_address'];

$q_all_hits = mysql_query("SELECT timestamp FROM profile_hits WHERE profile_id = '$user_ID' AND visitor_username = '$visitor' ORDER BY timestamp DESC");
$nr_of_visits = mysql_num_rows($q_all_hits);
$row_all_visits = mysql_fetch_assoc($q_all_hits);

This works fine, however I’m a bit afraid that it will generate a large amount of queries when the table gets bigger… Thoughts?
THNX
:smiley:

if you are looking to use arrays for this, then it is not a mysql question

if you are looking to do this with mysql, you can use GROUP BY along with aggregate functions that operate on the other columns you want to show

SELECT user
     , MIN(id)         AS min_id
     , MAX(IPaddress)  AS max_ip
     , AVG(visited_id) AS avg_visited_id
     , MAX(date)       AS max_date
  FROM profile_hits 
 WHERE visited_id = $user_ID
GROUP
    BY user