Group similar results, listing users

r937 · February 24, 2011, 3:23pm

yikes (no offence)

okay, try this –

SELECT bookmarks.url
     , m.dupes
     , bookmarks.title
     , SUBSTR(bookmarks.snippet, 1, 100) as snippet
     , bookmarks.datetime
     , [COLOR="Red"]t.tags[/COLOR]
     , [COLOR="Blue"]l.users[/COLOR]
  FROM ( SELECT url
              , COUNT(*) AS dupes
              , MIN(datetime) AS earliest
           FROM bookmarks
         GROUP
             BY url ) AS m
INNER
  JOIN bookmarks
    ON bookmarks.url = m.url
   AND bookmarks.datetime = m.earliest
[COLOR="red"]LEFT OUTER
  JOIN ( SELECT bookmark_id
              , GROUP_CONCAT(tag) AS tags
           FROM tags
         GROUP
             BY bookmark_id ) AS t
    ON t.bookmark_id = bookmarks.id[/COLOR]
[COLOR="blue"]LEFT OUTER
  JOIN ( SELECT bookmark_id
              , GROUP_CONCAT(user_id) AS users
           FROM links
          WHERE status = 'public'
         GROUP
             BY bookmark_id ) AS l
    ON l.bookmark_id = bookmarks.id[/COLOR]
ORDER 
    BY bookmarks.datetime DESC LIMIT 0, 9

Forbes · February 24, 2011, 4:30pm

OK, I tried that and it hung phpMyAdmin as well as my web browser, when I mirrored your code into mine.

r937 · February 24, 2011, 4:39pm

make (bookmark_id,tag) the PK in tabs table

make (bookmark_id,user_id) the PK in links table

Forbes · February 25, 2011, 9:20am

Just so I know what you’re asking, you want me to make the bookmark_id and tag columns in the tags table primary keys, and similarly, the bookmark_id and user_id columns in the links table primary keys, too?

r937 · February 25, 2011, 10:28am

not primary keys (plural), because a table can have only one primary key – but it can be a composite key, consisting of more than one column

Forbes · February 25, 2011, 10:35am

Ah, OK.

I tried bookmark_id and tag in the tags table, but got an error:

#1170 - BLOB/TEXT column ‘tag’ used in key specification without a key length

Incidentally, tag is already an index of type FULLTEXT; used in the search function of the website.

BTW — I bought your book “Simply SQL”. I thought it was the least I could do, given all of the wonderful help you’ve provided me with, not just here, but at least two or three times previously.

r937 · February 25, 2011, 10:45am

i’m not sure what kind of tags you anticipate having, but i will guess that the longest one is not going to reach 100 characters, never mind the 65K that a TEXT column reserves

and i doubt fulltext searching is necessary

Forbes · February 25, 2011, 10:48am

I don’t want to impose limits on tag length as I’m using comma-delimited tagging, not the silly space-delimited, single word style.

So in some cases, the tags are much longer that 100 characters.

r937 · February 25, 2011, 10:52am

bad move, sorry to say

and fulltext indexing of comma-delimited strings is definitely wrong

you need to redesign it so that each tag on a particular bookmark gets its own row, so that the “tag” column contains only one word (or phrase, if you allow phrases as tags), and a bookmark has as many rows in the tags table as there are tags on that bookmark

basic one-to-many design

r937 · February 25, 2011, 11:04am

see where i did this in the query –

LEFT OUTER
  JOIN ( SELECT bookmark_id
              , GROUP_CONCAT(tag) AS tags
           FROM tags
         GROUP
             BY bookmark_id ) AS t
    ON t.bookmark_id = bookmarks.id

that join was predicated on the assumption that each tag is in a separate row, and the grouping in the subquery produces the comma-delimited list as a result column

if you had like all the tags for a bookmark in a TEXT column, there would’ve been no need to do any grouping of rows

Forbes · February 25, 2011, 11:14am

You’ve misunderstood me — each tag is a completely separate entry in the tags table.

I’m no SQL expert, but I’m no idiot, either!

So what I’m saying is (and I should have said this first time around), the user enters their tags as a comma-delimited “flat” list, in a text field. Then, the Under Cloud adds each tag as a unique entry, with the ID of the bookmark they belong to.

Forbes · February 25, 2011, 3:05pm

OK, I’ve done some rummaging around within the tags table and the longest tag I can find is well under 100 characters long.

So, with that in mind, I’ve changed the tag column type to varchar that’s 100 in length.

Then I tried making the bookmark_id and tag Primary Keys, but I get another error:

#1062 - Duplicate entry ‘4167-shapes’ for key 1

r937 · February 25, 2011, 3:15pm

my apologies, i did not know about this step --[indent]“Then, the Under Cloud adds each tag as a unique entry, with the ID of the bookmark they belong to.”[/indent]

you understand the error message, right?

Forbes · February 25, 2011, 3:41pm

You mean error 1062? No.

There’s bound to be gazillions of duplicate entries in the tag column, that’s something that can’t be avoided.

r937 · February 25, 2011, 3:46pm

no, you can have the same tag in there as many times as you need it

however, you cannot put the same tag on the same bookmark more than once

it’s the combinations that are unique

the error message is telling you that bookmark 4167 already has the ‘shapes’ tag, so you can’t link that bookmark with that tag again

Forbes · February 25, 2011, 3:53pm

No, the error is happening when I try creating a Primary Key from the two columns you mention.

r937 · February 25, 2011, 4:16pm

you cannot create the PK if the table already contains dupes – remove the dupes and then try creating the PK again

Forbes · February 25, 2011, 4:22pm

But what do you mean by duplicates; duplicate Primary Key or duplicate tags?

As I explained, I cannot remove the duplicate tags.

r937 · February 25, 2011, 5:30pm

duplicate rows where the combination bookmark_id and tag are present more than once

SELECT bookmark_id 
     , tag
  FROM tags
GROUP
    BY bookmark_id 
     , tag
HAVING COUNT(*) > 1

Forbes · February 25, 2011, 7:41pm

Right, OK. That’s done; unnecessary duplicates removed!

I added bookmark_id and tag as a combined Primary Key (showing as type BTREE), but when I run the home page, the browser still hangs.

I’ve attached a screen shot, to illustrate the schema.