Cleanup Query?

britton · July 11, 2012, 4:50pm

Recently I had an issue where friend requests were only half deleted.

I have a friend table which stores 3 records… user_id1, user_id2, and status of the request. For each friend request there are two records created.

So for instance, the stored data would look like:

1, 2, accepted
2, 1, accepted

The problem is that the remove friend script was recently corrupted and only removed 1 of the 2 friend records so to many many it appears they still have a relationship whereas to their counterpart it does not.

What is the best way to clean up records which do not have a counterpart?

Thanks

DaveMaxwell · July 11, 2012, 5:11pm

Quick and dirty, something like this would work (WARNING - backup your database before running!!!)


DELETE
  FROM Friends
 WHERE CAST(user_id1 AS CHAR(10)) + 
    CAST(user_id2 AS CHAR(10)) NOT IN (SELECT CAST(user_id2 AS CHAR(10)) + 
             CAST(user_id1 AS CHAR(10)) 
           FROM Friends)

system · July 11, 2012, 6:51pm

I’m very doubtful about what Dave’s suggesting.

If I understood correctly, the “main” filter is:


WHERE request_status = "accepted"

and to retrieve the left overs, a self join is required.

DaveMaxwell · July 11, 2012, 7:10pm

[QUOTE=itmitică;5149828]I’m very doubtful about what Dave’s suggesting.

If I understood correctly, the “main” filter is:


WHERE request_status = "accepted"

and to retrieve the left overs, a self join is required.[/QUOTE]

Not to do what he was asking, which was to cleanup the mess left behind from a faulty query eariler. According to the OP, there are two records for each request, one where user_id1 = friendA and user_id2 = friendB, and one where user_id1 = friendB and user_id2 = friendA. I was just looking for the records where there is no reciprocal record.

The OP was looking for an one-off query, not to correct the process.

system · July 11, 2012, 7:18pm

The table has multiple distinct values for the request status field. Hence, the first filter is about those accepted requests.
After CAST()+CAST(), ‘6789’: could mean (6, 789), (67, 89), (678, 9) while ‘9876’ could mean (9, 876), (98, 76), (987, 6)
Huge performance penalties with the use of CAST()+CAST().

r937 · July 11, 2012, 7:33pm

[QUOTE=itmitică;5149844]2. After CAST()+CAST(), ‘6789’: could mean (6, 789), (67, 89), (678, 9) while ‘9876’ could mean (9, 876), (98, 76), (987, 6)[/quote]not if CHAR(10) works as intended, i.e. with trailing blanks

did you test it?

[QUOTE=itmitică;5149844]3. Huge performance penalties with the use of CAST()+CAST().[/QUOTE]define “huge”

system · July 11, 2012, 7:37pm

Not this one, but I’d never do something like that because…

…it’s big big big…

…wrong.

r937 · July 11, 2012, 7:43pm

[QUOTE=itmitică;5149860]…it’s big big big…

…wrong.[/QUOTE]no, it’s not big, and neither is it wrong

please don’t be unnecessarily alarmist

system · July 11, 2012, 7:47pm

Yeah, it’s very well known that size it’s a matter of opinion!

Yeah, it is, since the days of surrogate fixed size fields db structures, before FoxPro 1.0, in the days of Pascal pointers.

I guess the word would be informative.

r937 · July 11, 2012, 7:51pm

[QUOTE=itmitică;5149866]Yeah, [ CAST ] is, since the days of surrogate fixed size fields db structures, before FoxPro 1.0, in the days of Pascal pointers.[/quote]perhaps <snip/> you forgot that we were talking about using the CAST function

which is, i repeat, ~not~ wrong

<snip/>

system · July 11, 2012, 7:56pm

<snip/>

The use of CAST(), like the use of any other aggregate function in the WHERE clause, brings huge performance penalties with it. <snip/>

At the same time, regarding the fixed size fields and values, Dave and you are advertising flat file databases techniques over the normal relational databases techniques.

r937 · July 11, 2012, 11:43pm

[QUOTE=itmitică;5149871]The use of CAST(), like the use of any other aggregate function in the WHERE clause, brings huge performance penalties with it. [/quote]first of all, CAST is not an aggregate function

and once again, i request you to define “huge” in specific terms

[QUOTE=itmitică;5149871]At the same time, regarding the fixed size fields and values, Dave and you are advertising flat file databases techniques over the normal relational databases techniques.[/QUOTE]nonsense

<snip/>

system · July 12, 2012, 8:46am

The assumption: based on OP data so far, a request for friendship is always expressed as records couples: (1|2, 2|1).

The solution: look for records that are singletons, no matter the request status: pending, denied, accepted.

The given: the friendship initiator always starts with the accepted value for status. However, this may change for the initiator, since, later on, it may choose to deny the friendship.

Possible actions:

1.1 When user 1 initiates a friendship request, two records are inserted:


    1|2|accepted
    2|1|pending

1.2 When user 1 denies the friendship it has initiated, the corresponding records are updated:


    1|2|denied
    2|1|pending

2.1 When the user 2 accepts the request, the corresponding record is updated:


    1|2|accepted
    2|1|accepted

2.2 When the user 2 denies the requests, the corresponding record is updated:


    1|2|accepted
    2|1|denied

along with the rest of the variations.

The study case table:


FRIENDS_REQUESTS
================
user_id1 | user_id2 | request_status | id
-----------------------------------------
       1 |        2 | accepted       |  1
       2 |        1 | accepted       |  2
       1 |        3 | accepted       |  3
       3 |        1 | accepted       |  4
       2 |        3 | accepted       |  5
       3 |        2 | pending        |  6

The cleanup scenario:

The record having id = 2 has been deleted.
The record having id = 1 is now unmatched.
It needs to be deleted.

The delete process will take id (PK) for the delete filter.
The id set to be deleted is determined like this:


    SELECT fr.id
      FROM friends_requests fr
     LEFT 
OUTER JOIN friends_requests AS fr2
        ON ( fr.user_id1 = fr2.user_id2
       AND   fr.user_id2 = fr2.user_id1 )
     WHERE fr2.request_status IS NULL

The query has an average execution time of about 1.5ms for the test table above.

<hr>

To address the penalties issues.

The query proposed by Dave:


 SELECT friends_requests.id
   FROM friends_requests
  WHERE CAST(user_id1 AS CHAR(10)) 
     || CAST(user_id2 AS CHAR(10)) 
 NOT IN ( SELECT CAST( user_id2 AS CHAR(10) ) 
              || CAST( user_id1 AS CHAR(10) ) 
            FROM friends_requests )

has an average execution time of about 2.0ms for the same test table above.

The larger the data, I personally believe that this difference will become much much bigger.

Another thing that’s plain wrong, to me.

( 1 = 1 AND 2 = 2 )

is far better comparison test, performance wise (or otherwise), then

( '1         2         ' = '1         2         ' )

All this while the WHERE…NOT IN, for me, it’s a JOIN gone wrong.

r937 · July 12, 2012, 10:41am

very nicely done

carefully laid out, rationally explained, non-confrontationally

:award:

britton · July 12, 2012, 1:48pm

Thanks to both of you guys for your help!

I went with Imitica’s solution since the friend table has over 5,000,000 records.