forbes — 2011-01-31T07:45:38-05:00 — #1
I've developed a bookmarking application, and it's become apparent that I need to deal with duplicate bookmarks in a sensible way.
The long-term aim is to have a way of "scoring" bookmarks, so that when someone adds a bookmark that already exists, the score for the original is incremented.
The reasons for doing this are, I no doubt, obvious.
So far, the bookmarks table looks like this:
CREATE TABLE `bookmarks` (
`id` int(11) NOT NULL auto_increment,
`user_id` mediumint(11) NOT NULL,
`datetime` datetime default NULL,
`status` enum('public','private') NOT NULL default 'public',
PRIMARY KEY (`id`),
KEY `datetime` (`datetime`),
FULLTEXT KEY `title` (`title`,`snippet`)
I have a few ideas in my mind how I might build a scoring system, but I'd much prefer to run this by you guys first, before committing to something that's either not practical or scales poorly over time.
Any help would be much appreciated!
forbes — 2011-02-01T03:24:31-05:00 — #2
As I said previously.
r937 — 2011-01-31T17:38:17-05:00 — #3
first thing you have to do is remove everything from that table that relates to which user(s) bookmarked a particular bookmark, and put that in a separate table
forbes — 2011-01-31T17:50:52-05:00 — #4
Hi, and thanks for the reply!
That's one of the options I'm thinking of going with — placing the URL itself into a separate table, along with the score and the ID of the originating bookmark.
I was also thinking of chaining the bookmark URLs together, so that all other bookmarks would then fall into a parent-child relationship, along the lines of:
id, parent_id, child_id, url, score
That way, when an originating parent bookmark is viewed, all of the child bookmarks can be retrieved.
Is that a viable method, or are there better ways of doing this?
r937 — 2011-01-31T19:21:15-05:00 — #5
on what basis are two urls related?
r937 — 2011-02-01T05:04:27-05:00 — #6
okay, but what does "originating" mean?
i still don't see how two urls can be related
i'm quite familiar with the parent-child structure, but i just don't see it applying here
forbes — 2011-02-01T05:55:55-05:00 — #7
Someone will have added the first originating bookmark to which all duplicates are a child of.
r937 — 2011-02-01T06:12:20-05:00 — #8
no, that's not a good idea
duplicate the url each time? no
forbes — 2011-02-01T06:19:09-05:00 — #9
That's what I'm trying to figure out. But then there's the issue of determining exact matching URLs, which becomes a parameter cleansing issue.
Rather than me guessing — which is exactly what I'm doing right now — I was hoping someone would actually explain the most appropriate way of doing this.
guelphdad — 2011-02-01T07:49:09-05:00 — #10
Add a UNIQUE key on URL. Create a column called counter or something like that.
See INSERT ON DUPLICATE UPDATE syntax in the manual.
That way a duplicate is not inserted but your Counter column is updated instead.
forbes — 2011-02-01T15:48:30-05:00 — #11
I'm using INSERT ON DUPLICATE UPDATE elsewhere in the application, so I'm at least familiar with that. In fact, I'm doing almost exactly as you're recommending for the analytics side of things (user visits to bookmarks).
Right, I can see how this would work now.
I'm assuming I'm right to keep the parent-child method, so I can keep track of all those who added the same bookmark afterwards, yes?
Thanks for your time! Much appreciated.
guelphdad — 2011-02-01T16:07:05-05:00 — #12
No, you don't need a parent-child method at all, the link isn't a child of the same link, it is a duplicate of the link. You would use parent-child if you were drilling down:
Electronics -> Televisions - Sanyo -> Model123
If you want to know who added the bookmarks then you allow all inserts into the table with duplicate URLs and those who entered them. You would use a GROUP BY clause to group the URLs together and get a count of how many you had.
You would be using three tables. The one you have above (bookmarks) with the user_id column removed entirely, a user table defining information about your users and a users_bookmarks table with two columns userid, url_id and a new row for each person who enters a bookmark. You may have 50 people with different userid(s) with the same url_id in the table.
forbes — 2011-02-03T11:07:26-05:00 — #13
This is actually the one option I thought there might be a way of avoiding.
As I said, I'd thought through a few ideas, and this was the only one I could think of that would work fully. But I thought it was too complex and there was, perhaps, a better way.
OK, if that's the way, so be it.
And thanks for your time, here. Always appreciated.