Best way to do joins

I have a table called BR_writing that contains writing submissions.
I have a table called BR_categories with a list of categories such as Fantasy, Science Fiction, Anime, etc.
I then have another table called BR_writing_to_cat of just ids to link the two together. IE: writing_id, category_id

Normally, if I wanted to, say, select all entries in the Science Fiction category, it would be something like:

SELECT W.writing_id, title, summary 
FROM BR_writing W, BR_writing_to_cat C
WHERE W.writing_id = C.writing_id AND category_id = '".$scifi_id."' 
LIMIT 0, 10

But what if I want to select all writing entries in Anime AND Fantasy? Below is the query I have now, but it seems like there might be a better or more optimal one out there.

SELECT writing_id, title, summary
FROM BR_writing_to_cat A, BR_writing W

INNER JOIN BR_writing_to_cat B 
ON A.writing_id = B.writing_id 
AND B.category_id = ".$id1." 

WHERE A.writing_id = W.id AND A.category_id = ".$id2."

If I wanted to do three at once, i’d do:

SELECT writing_id, title, summary
FROM BR_writing_to_cat A, BR_writing W

INNER JOIN BR_writing_to_cat B 
ON A.writing_id = B.writing_id 
AND B.category_id = ".$id1." 

INNER JOIN BR_writing_to_cat C 
ON B.writing_id = C.writing_id 
AND C.cat_id = ".$id2." 

WHERE A.writing_id = W.id AND A.category_id = ".$id3."

I’m wondering if there’s a faster query than this, and one that I could scale to maybe 5 category selections at once. Thanks for your help!

indeed there is a more efficient and scalable way. Using group by will make sure that only a single unique writing exists within the result set regardless of the number of categories it appears in. The same query can be used regardless of the number of categories. For optimization purposes you will also want to make sure there is an index on w2c.category_id, w2c.writing_id.

(untested)


SELECT
      w.writing_id
     ,w.title
     ,w.summary
  FROM
     BR_writing w
 INNER
  JOIN
     BR_writing_to_cat w2c
    ON
     w.writing_id = w2c.writing_id
 WHERE
     w2c.category_id IN (1,2,3,4)
 GROUP
    BY
     w.writing_id

Hi Oddz. Thanks so much for your help! I gave this a try and the result set ended up giving me entries that are in one of the entered categories. What I want to select are writing entries that are in all of the entered categories. On my site, writing entries can be classified into multiple categories. So I’d like a way to filter entries that are in BOTH sci fi and fantasy, for example. Is this possible?

Here’s the exact query I used:

SELECT
     w.id
    ,w.title
    ,w.summary
 FROM
    BR_writing w
 INNER
 JOIN
    BR_gallery_contents w2c
   ON
    w.id = w2c.medium_id
 WHERE
    w2c.gallery_id IN (367, 19)
 GROUP
   BY
    w.id
ORDER BY updated DESC
SELECT w.id
     , w.title
     , w.summary
  FROM ( SELECT medium_id
           FROM BR_gallery_contents
          WHERE gallery_id IN (367, 19)
         GROUP
             BY medium_id 
         HAVING COUNT(*) = 2 ) AS these
INNER
  JOIN BR_writing AS w
    ON w.id = these.medium_id

the HAVING count should be equal to however many categories you want the medium to belong to

r937 out of mere curiosity why this way:


SELECT w.id
     , w.title
     , w.summary
  FROM ( SELECT medium_id
           FROM BR_gallery_contents
          WHERE gallery_id IN (367, 19)
         GROUP
             BY medium_id 
         HAVING COUNT(*) = 2 ) AS these
INNER
  JOIN BR_writing AS w
    ON w.id = these.medium_id

instead of this way:


SELECT
     w.id
    ,w.title
    ,w.summary
 FROM
    BR_writing w
 INNER
 JOIN
    BR_gallery_contents w2c
   ON
    w.id = w2c.medium_id
 WHERE
    w2c.gallery_id IN (367, 19)
 GROUP
   BY
    w.id
HAVING
     COUNT(*) = 2

think about it this way – in my subquery, the GROUP BY is performed on only one table, and it’s a small table to boot (only 2 columns)

only the appropriate ids, surviving the HAVING, are then used to join to rows from the w table

in your query, you perform all the joins first, then do a GROUP BY on a much wider row, and throw a lot more stuff away in the HAVING clause

it’s a case of applying the filtering restriction first, then doing the join, versus doing the join, then applying the filtering restriction

I had a feeling the reason related to optimization considering I recall transio posting a similar solution to a problem a few weeks back. thanks.