Cleanup Query?

Recently I had an issue where friend requests were only half deleted.

I have a friend table which stores 3 records… user_id1, user_id2, and status of the request. For each friend request there are two records created.

So for instance, the stored data would look like:

1, 2, accepted
2, 1, accepted

The problem is that the remove friend script was recently corrupted and only removed 1 of the 2 friend records so to many many it appears they still have a relationship whereas to their counterpart it does not.

What is the best way to clean up records which do not have a counterpart?

Thanks

Quick and dirty, something like this would work (WARNING - backup your database before running!!!)


DELETE
  FROM Friends
 WHERE CAST(user_id1 AS CHAR(10)) + 
    CAST(user_id2 AS CHAR(10)) NOT IN (SELECT CAST(user_id2 AS CHAR(10)) + 
             CAST(user_id1 AS CHAR(10)) 
           FROM Friends)

I’m very doubtful about what Dave’s suggesting.

If I understood correctly, the “main” filter is:


WHERE request_status = "accepted"

and to retrieve the left overs, a self join is required.

[QUOTE=itmitică;5149828]I’m very doubtful about what Dave’s suggesting.

If I understood correctly, the “main” filter is:


WHERE request_status = "accepted"

and to retrieve the left overs, a self join is required.[/QUOTE]

Not to do what he was asking, which was to cleanup the mess left behind from a faulty query eariler. According to the OP, there are two records for each request, one where user_id1 = friendA and user_id2 = friendB, and one where user_id1 = friendB and user_id2 = friendA. I was just looking for the records where there is no reciprocal record.

The OP was looking for an one-off query, not to correct the process.

  1. The table has multiple distinct values for the request status field. Hence, the first filter is about those accepted requests.

  2. After CAST()+CAST(), ‘6789’: could mean (6, 789), (67, 89), (678, 9) while ‘9876’ could mean (9, 876), (98, 76), (987, 6)

  3. Huge performance penalties with the use of CAST()+CAST().

[QUOTE=itmitică;5149844]2. After CAST()+CAST(), ‘6789’: could mean (6, 789), (67, 89), (678, 9) while ‘9876’ could mean (9, 876), (98, 76), (987, 6)[/quote]not if CHAR(10) works as intended, i.e. with trailing blanks

did you test it?

[QUOTE=itmitică;5149844]3. Huge performance penalties with the use of CAST()+CAST().[/QUOTE]define “huge”

Not this one, but I’d never do something like that because…

…it’s big big big…

…wrong.

[QUOTE=itmitică;5149860]…it’s big big big…

…wrong.[/QUOTE]no, it’s not big, and neither is it wrong

please don’t be unnecessarily alarmist

Yeah, it’s very well known that size it’s a matter of opinion! :wink:

Yeah, it is, since the days of surrogate fixed size fields db structures, before FoxPro 1.0, in the days of Pascal pointers.

I guess the word would be informative. :slight_smile:

[QUOTE=itmitică;5149866]Yeah, [ CAST ] is, since the days of surrogate fixed size fields db structures, before FoxPro 1.0, in the days of Pascal pointers.[/quote]perhaps <snip/> you forgot that we were talking about using the CAST function

which is, i repeat, ~not~ wrong

<snip/>

<snip/>

The use of CAST(), like the use of any other aggregate function in the WHERE clause, brings huge performance penalties with it. <snip/>

At the same time, regarding the fixed size fields and values, Dave and you are advertising flat file databases techniques over the normal relational databases techniques.

[QUOTE=itmitică;5149871]The use of CAST(), like the use of any other aggregate function in the WHERE clause, brings huge performance penalties with it. [/quote]first of all, CAST is not an aggregate function

and once again, i request you to define “huge” in specific terms

[QUOTE=itmitică;5149871]At the same time, regarding the fixed size fields and values, Dave and you are advertising flat file databases techniques over the normal relational databases techniques.[/QUOTE]nonsense

<snip/>

The assumption: based on OP data so far, a request for friendship is always expressed as records couples: (1|2, 2|1).

The solution: look for records that are singletons, no matter the request status: pending, denied, accepted.

The given: the friendship initiator always starts with the accepted value for status. However, this may change for the initiator, since, later on, it may choose to deny the friendship.

Possible actions:

1.1 When user 1 initiates a friendship request, two records are inserted:


    1|2|accepted
    2|1|pending

1.2 When user 1 denies the friendship it has initiated, the corresponding records are updated:


    1|2|denied
    2|1|pending

2.1 When the user 2 accepts the request, the corresponding record is updated:


    1|2|accepted
    2|1|accepted

2.2 When the user 2 denies the requests, the corresponding record is updated:


    1|2|accepted
    2|1|denied

along with the rest of the variations.

The study case table:


FRIENDS_REQUESTS
================
user_id1 | user_id2 | request_status | id
-----------------------------------------
       1 |        2 | accepted       |  1
       2 |        1 | accepted       |  2
       1 |        3 | accepted       |  3
       3 |        1 | accepted       |  4
       2 |        3 | accepted       |  5
       3 |        2 | pending        |  6

The cleanup scenario:

The record having id = 2 has been deleted.
The record having id = 1 is now unmatched.
It needs to be deleted.

The delete process will take id (PK) for the delete filter.
The id set to be deleted is determined like this:


    SELECT fr.id
      FROM friends_requests fr
     LEFT 
OUTER JOIN friends_requests AS fr2
        ON ( fr.user_id1 = fr2.user_id2
       AND   fr.user_id2 = fr2.user_id1 )
     WHERE fr2.request_status IS NULL

The query has an average execution time of about 1.5ms for the test table above.

<hr>

To address the penalties issues.

The query proposed by Dave:


 SELECT friends_requests.id
   FROM friends_requests
  WHERE CAST(user_id1 AS CHAR(10)) 
     || CAST(user_id2 AS CHAR(10)) 
 NOT IN ( SELECT CAST( user_id2 AS CHAR(10) ) 
              || CAST( user_id1 AS CHAR(10) ) 
            FROM friends_requests )

has an average execution time of about 2.0ms for the same test table above.

The larger the data, I personally believe that this difference will become much much bigger.

Another thing that’s plain wrong, to me.

( 1 = 1 AND 2 = 2 ) 

is far better comparison test, performance wise (or otherwise), then

( '1         2         ' = '1         2         ' )

All this while the WHERE…NOT IN, for me, it’s a JOIN gone wrong.

very nicely done

carefully laid out, rationally explained, non-confrontationally

:award:

Thanks to both of you guys for your help!

I went with Imitica’s solution since the friend table has over 5,000,000 records.