everactive — 2010-02-20T18:38:04-05:00 — #1
I want to have the tags feature in my PHP application, just like this forum ("Separate tags using a comma." when posting).
Should I have a separate table or just a column? How it's ideally done?
Please advise. Thanks.
dan_grossman — 2010-02-20T19:35:13-05:00 — #2
Separate table, a one-to-many relationship to the object being tagged. That allows you to efficiently query the tags, so that you can do things like create a tag cloud, a tag page, top tags list, tag suggestions, etc.
everactive — 2010-02-20T19:50:34-05:00 — #3
You mean like this, right?
dan_grossman — 2010-02-20T20:01:40-05:00 — #4
Yes. Though you don't need a tagid, the (videoid, word) pairs are unique already.
everactive — 2010-02-20T20:06:01-05:00 — #5
Thanks, much appreciated.
mike_borozdin — 2010-02-22T12:31:15-05:00 — #6
Have you thought about data redundancy this schema creates, as well as updates anomalities?
Here is an example of data stored in the suggested schema?
post#1 | tag#1
post#2 | tag#2
post#3 | tag#1
post#1 | tag#1
See a problem?
I would create two tables. One will contain tags (tagId, tagName) and the other one will link posts with tags (postId, tagId) and yes the latter uses many-to-many.
dan_grossman — 2010-02-22T12:47:28-05:00 — #7
The primary key (post_id, tag) must be unique, so row #1 and row #4 could not both exist. Nor does the schema create update anomalies. If the author of post #1 wants to change his tags, that doesn't mean the tag post #3 previously had in common should change. As long as tags are user defined and not a prewritten safe list maintained by someone else, your schema requires more queries and more maintenance with no practical benefit.
r937 — 2010-02-22T12:47:43-05:00 — #8
yes, you've repeated post#1|tag#1 and that wouldn't be allowed
but i think i know what you were trying to suggest, and i think it's wrong
the issue of "redundancy" is not going to go away if you substitute a numeric id for the tag name
as often as the tag name would have repeated, the numeric id will also repeat just as much
"redundancy" is a poor excuse to use a numeric id
furthermore, note that with a numeric id, an extra join is required
mike_borozdin — 2010-02-22T14:14:20-05:00 — #9
I made a type, of course they couldn't both exist.
As for update anomalies, I mean if you have many posts tagged with "tag A" and suddenly you want to rename this tag to "tag AA" you have to update a whole bunch of entries.
Moreover, if we have a schema similar to this one: tags (tagId, tagTitle, numPosts) and providing we're updating numPosts after tagging a post with a corresponding tag, we are pretty much better off when we need to generate a tag cloud, because we don't have to count the number of posts tagged with all the tags.
r937 — 2010-02-22T15:13:12-05:00 — #10
this is a rational argument for other many-to-many structures, but it hardly applies to tags
'ms jones' might have to become 'mrs smith', but when does the 'mysql' tag have to change to some other name? more to the point, would it still be the mysql tag? this question goes to the root of identity, and of course identity is what candidate and primary keys are all about
this too is a rational argument, but again, it hardly supports the need for a numeric id, it also works just as well for a tags table where the numeric id is missing
mike_borozdin — 2010-02-25T06:54:55-05:00 — #11
There can be a type in a tag that you only notice having published a few posts with it. Also, I remember making a huge tag revamp and I think I changed the names of some tags.
Okay, we can actually go without a numeric id, I just think it's better to join tables rather than perform counting when displaying a tag cloud. Although, it's just words, I didn't evaluate the performance in both cases.
r937 — 2010-02-25T09:25:13-05:00 — #12
fixing typos and making mass changes can both be accomplished easily with simple UPDATE statements
you do not need a numeric id to make occasional updates
and there's the key -- you don't update tags regularly, just occasionally
meanwhile, during normal operations, you are saved the unnecessary overhead that the extra join requires