britton — 2012-07-11T12:50:38-04:00 — #1
Recently I had an issue where friend requests were only half deleted.
I have a friend table which stores 3 records... user_id1, user_id2, and status of the request. For each friend request there are two records created.
So for instance, the stored data would look like:
1, 2, accepted
2, 1, accepted
The problem is that the remove friend script was recently corrupted and only removed 1 of the 2 friend records so to many many it appears they still have a relationship whereas to their counterpart it does not.
What is the best way to clean up records which do not have a counterpart?
davemaxwell — 2012-07-11T13:11:55-04:00 — #2
Quick and dirty, something like this would work (WARNING - backup your database before running!!!)
WHERE CAST(user_id1 AS CHAR(10)) +
CAST(user_id2 AS CHAR(10)) NOT IN (SELECT CAST(user_id2 AS CHAR(10)) +
CAST(user_id1 AS CHAR(10))
system — 2012-07-11T14:51:46-04:00 — #3
I'm very doubtful about what Dave's suggesting.
If I understood correctly, the "main" filter is:
WHERE request_status = "accepted"
and to retrieve the left overs, a self join is required.
davemaxwell — 2012-07-11T15:10:19-04:00 — #4
Not to do what he was asking, which was to cleanup the mess left behind from a faulty query eariler. According to the OP, there are two records for each request, one where user_id1 = friendA and user_id2 = friendB, and one where user_id1 = friendB and user_id2 = friendA. I was just looking for the records where there is no reciprocal record.
The OP was looking for an one-off query, not to correct the process.
system — 2012-07-11T15:18:10-04:00 — #5
The table has multiple distinct values for the request status field. Hence, the first filter is about those accepted requests.
After CAST()+CAST(), '6789': could mean (6, 789), (67, 89), (678, 9) while '9876' could mean (9, 876), (98, 76), (987, 6)
Huge performance penalties with the use of CAST()+CAST().
r937 — 2012-07-11T15:33:21-04:00 — #6
not if CHAR(10) works as intended, i.e. with trailing blanks
did you test it?
system — 2012-07-11T15:37:55-04:00 — #7
Not this one, but I'd never do something like that because...
...it's big big big...
r937 — 2012-07-11T15:43:37-04:00 — #8
no, it's not big, and neither is it wrong
please don't be unnecessarily alarmist
system — 2012-07-11T15:47:26-04:00 — #9
Yeah, it's very well known that size it's a matter of opinion!
Yeah, it is, since the days of surrogate fixed size fields db structures, before FoxPro 1.0, in the days of Pascal pointers.
I guess the word would be informative.
r937 — 2012-07-11T15:51:38-04:00 — #10
perhaps <snip/> you forgot that we were talking about using the CAST function
which is, i repeat, ~not~ wrong
system — 2012-07-11T15:56:07-04:00 — #11
The use of CAST(), like the use of any other aggregate function in the WHERE clause, brings huge performance penalties with it. <snip/>
At the same time, regarding the fixed size fields and values, Dave and you are advertising flat file databases techniques over the normal relational databases techniques.
r937 — 2012-07-11T19:43:23-04:00 — #12
first of all, CAST is not an aggregate function
and once again, i request you to define "huge" in specific terms
system — 2012-07-12T04:46:39-04:00 — #13
The assumption: based on OP data so far, a request for friendship is always expressed as records couples: (1|2, 2|1).
The solution: look for records that are singletons, no matter the request status: pending, denied, accepted.
The given: the friendship initiator always starts with the accepted value for status. However, this may change for the initiator, since, later on, it may choose to deny the friendship.
1.1 When user 1 initiates a friendship request, two records are inserted:
1.2 When user 1 denies the friendship it has initiated, the corresponding records are updated:
2.1 When the user 2 accepts the request, the corresponding record is updated:
2.2 When the user 2 denies the requests, the corresponding record is updated:
along with the rest of the variations.
The study case table:
user_id1 | user_id2 | request_status | id
1 | 2 | accepted | 1
2 | 1 | accepted | 2
1 | 3 | accepted | 3
3 | 1 | accepted | 4
2 | 3 | accepted | 5
3 | 2 | pending | 6
The cleanup scenario:
The record having id = 2 has been deleted.
The record having id = 1 is now unmatched.
It needs to be deleted.
The delete process will take id (PK) for the delete filter.
The id set to be deleted is determined like this:
FROM friends_requests fr
OUTER JOIN friends_requests AS fr2
ON ( fr.user_id1 = fr2.user_id2
AND fr.user_id2 = fr2.user_id1 )
WHERE fr2.request_status IS NULL
The query has an average execution time of about 1.5ms for the test table above.
To address the penalties issues.
The query proposed by Dave:
WHERE CAST(user_id1 AS CHAR(10))
|| CAST(user_id2 AS CHAR(10))
NOT IN ( SELECT CAST( user_id2 AS CHAR(10) )
|| CAST( user_id1 AS CHAR(10) )
FROM friends_requests )
has an average execution time of about 2.0ms for the same test table above.
The larger the data, I personally believe that this difference will become much much bigger.
Another thing that's plain wrong, to me.
( 1 = 1 AND 2 = 2 )
is far better comparison test, performance wise (or otherwise), then
( '1 2 ' = '1 2 ' )
All this while the WHERE...NOT IN, for me, it's a JOIN gone wrong.
r937 — 2012-07-12T06:41:08-04:00 — #14
very nicely done
carefully laid out, rationally explained, non-confrontationally
britton — 2012-07-12T09:48:21-04:00 — #15
Thanks to both of you guys for your help!
I went with Imitica's solution since the friend table has over 5,000,000 records.
mittineague — 2014-09-20T15:47:30-04:00 — #16
This topic is now archived. It is frozen and cannot be changed in any way.