Would JOIN Make This Select Query More Efficient?

casbboy · March 7, 2011, 4:16am

I am trying to better optimize my queries, and this is one that runs a ton…


SELECT trailers.trailer_id, trailers.trailer_title, films.film_title, films.image 
FROM films,trailers,tags 
WHERE tags.snub='$var' 
AND tags.trailer_id=trailers.trailer_id 
AND trailers.film_id=films.film_id 
ORDER BY trailers.trailer_id DESC 
LIMIT 0,10

This query uses three tables to get the information needed. Is this efficient?

All help appreciated
Ryan

r937 · March 7, 2011, 11:55am

it uses three tables because apparently three tables are needed to satisfy the requirements

would JOINs make it more efficient? no, but it would sure make the query easier to understand and to maintain

could you show us the results of the EXPLAIN, please

system · March 7, 2011, 2:04pm

A JOIN would properly implement ANSI-92 SQL, whereas you’re currently using the ANSI-89 spec. It should improve perfornance a bit, as it will limit your hash by two dimensions at a time (it will first hash the first two tables and then the 3rd, rather than all 3 at once). You may be able to further control the hashing by using a V-sproc with sub-queries doing a primary hash and secondary hash in secluded scopes.

However, more than likely, if your query isn’t performing well, you need to INDEX the keys that you’re joining on (film_id and trailer_id in all tables as well as tags.snub)… that’s the #1 reason a query runs too slowly, and will add huge increases in performance.

r937 · March 7, 2011, 2:25pm

steve, not that i doubt what you’re saying (because i can’t understand it well enough to doubt it)…

… but could you give a reference or two for this hashing business? and how to determine whether to hash two tables or three?

also, what’s a V-sproc? and how do you write a subquery to “do” a hash? and what the heck is a secluded scope (other than perhaps an observatory on a mountaintop)?

casbboy · March 7, 2011, 6:56pm

I definitely have all those columns indexed. Here is my explain:

1 SIMPLE tags ref trailer_id,snub snub 202 const 8696 Using where; Using temporary; Using filesort

1 SIMPLE trailers eq_ref PRIMARY,film_id PRIMARY 4 trailera_trailers.tags.trailer_id 1

1 SIMPLE films eq_ref PRIMARY PRIMARY 4 trailera_trailers.trailers.film_id 1

So 8696 x 1 x 1. Looks pretty good as is, right?

Ryan

system · March 7, 2011, 7:17pm

Hashing is the process of comparing multiple records from 2 tables. Say you have 10 records in each of the 3 tables… hashing them all together would result in 1000 results, which would all be compared against the where clause to filter the results. INNER JOINS could potentially get a query plan that filters the results of the first hash first, which could result in say 20 records hashed against the 2nd 10, instead of 100, which results in a total of 300 records getting hashed instead of 1000.

A V-sproc is a way of organizing joins in a stored procedure to force an optimized query execution plan. It would look like this:


CREATE PROCEDURE vsproc AS BEGIN
SELECT tr.*, 
    f.film_title, 
    f.image
FROM
    (SELECT 
        tr.trailer_id, 
        tr.trailer_title
    FROM tags AS t
        INNER JOIN trailers as tr ON t.trailer_id = tr.trailer_id  
            AND t.snub='$var'
    ) AS tr
    INNER JOIN films AS f
        ON tr.film_id = f.film_id  
ORDER BY tr.trailer_id DESC  
LIMIT 0,10
END

Notice each “scope” only joins 2 tables. You start with the scope that will limit results the most, and increasingly expand # of comparisons on outer bands of the query. This is most effective when joining several tables… you can really optimize execution time a lot by controlling the order of filtering and not trusting the DB to optimize the plan itself (which fails without fail).

Cheers!

system · March 7, 2011, 7:21pm

casb, can you also post a dump of your schema?

r937 · March 7, 2011, 7:42pm

transio, nice explanation, but i think those methods went out of fashion once the database optimizer achieved some measure of competence beyond your rather pessimistic “fails without fail”

i would expect that your two-at-a-time-from-the-inside-out method will today produce the same execution path as what mysql would produce from the original query, applying the most restrictive constraint first, etc.

that’s also how i like to code my complex joins, although i simply linearize them, i don’t nest them – most restrictive table first, and that’s the table that has the WHERE clause condition

r937 · March 7, 2011, 7:44pm

that’s a red flag right there

it’s caused by your ORDER BY

casbboy · March 7, 2011, 7:50pm

Ok,

So any time Order By comes into play, some serious efficiency gets lost?

Thanks
Ryan

r937 · March 7, 2011, 8:36pm

not every time, no – just when the ORDER BY sequence is different from the sequence the data was retrieved in, and it has to put the intermediate rows somewhere

casbboy · March 7, 2011, 8:41pm

Ah, now I see what you mean.

Hmmm. Okay, let me see if I can grab these rows differently.

Cheers
Ryan